# Analyzing multiple samples

Analyzing a single sample is great, but in the real world you probably
have a batch of samples that you need to analyze and then compare.

### 1. Subworkflows
In addition to running command line tools, a workflow step can also
execute another workflow.

Let's copy `main.cwl` to `alignment.cwl`.

Now open `main.cwl` for editing. We are going to replace the `steps` and `outputs` sections.
out: [qc_html, bam_sorted_indexed, featurecounts]
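
For orientation, the complete `steps` section might look like the sketch below. The step name `alignment`, the file `alignment.cwl`, the `fq` input, and the `out` list come from this lesson; the input names `genome` and `gtf` are assumptions and should match whatever inputs your `main.cwl` already declares.

```yaml
# Fragment of main.cwl (sketch): a single step that runs the inner workflow.
steps:
  alignment:
    run: alignment.cwl        # the copy of the original workflow
    in:
      fq: fq                  # pass the outer inputs straight through
      genome: genome          # assumed input name
      gtf: gtf                # assumed input name
    out: [qc_html, bam_sorted_indexed, featurecounts]
```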
In the `outputs` section, every output source now comes from the `alignment` step:

outputSource: alignment/qc_html
outputSource: alignment/bam_sorted_indexed
outputSource: alignment/featurecounts
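
Put together, the `outputs` section could look roughly like this. The sketch assumes that the outer output parameters reuse the names of the step outputs and that each one is a single `File`; adjust the types if your inner workflow declares something different.

```yaml
# Fragment of main.cwl (sketch): outer outputs forwarded from the alignment step.
outputs:
  qc_html:
    type: File                                # assumed type
    outputSource: alignment/qc_html
  bam_sorted_indexed:
    type: File                                # assumed type
    outputSource: alignment/bam_sorted_indexed
  featurecounts:
    type: File                                # assumed type
    outputSource: alignment/featurecounts
```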
We also need a little boilerplate to tell the workflow runner that we want to use subworkflows:

SubworkflowFeatureRequirement: {}
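
In context, that line sits under a top-level `requirements` key. A minimal sketch, assuming `main.cwl` has no other requirements yet:

```yaml
# Fragment of main.cwl (sketch): enable steps that run other workflows.
requirements:
  SubworkflowFeatureRequirement: {}
```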
If you run this workflow, you will get exactly the same results as
before; we have only wrapped the inner workflow in an outer workflow.

### 2. Scattering
The wrapper lets us do something useful. We can modify the outer
workflow to accept a list of files, and then invoke the inner workflow
step for every one of those files. We will need to modify the
`inputs`, `steps`, `outputs`, and `requirements` sections.

First we change the `fq` parameter to expect a list of files:
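
As a sketch, the `inputs` section could then read as follows. Only the change to `fq` comes from this lesson; the `genome` and `gtf` entries and their types are assumptions standing in for the inputs your workflow already has.

```yaml
# Fragment of main.cwl (sketch): fq is now an array of files.
inputs:
  fq: File[]          # was a single File
  genome: Directory   # assumed name and type
  gtf: File           # assumed name and type
```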
Next, we add `scatter` to the alignment step. This means the runner will
run `alignment.cwl` once for each value in the list given to the `fq` parameter.

out: [qc_html, bam_sorted_indexed, featurecounts]
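
A scattered step might look like the sketch below. `scatter: fq` names the step input to iterate over; the other inputs (here the assumed names `genome` and `gtf`) are passed unchanged to every invocation.

```yaml
# Fragment of main.cwl (sketch): run alignment.cwl once per file in fq.
steps:
  alignment:
    run: alignment.cwl
    scatter: fq               # iterate over the elements of the fq list
    in:
      fq: fq
      genome: genome          # assumed input name
      gtf: gtf                # assumed input name
    out: [qc_html, bam_sorted_indexed, featurecounts]
```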
Because the scatter produces multiple outputs, each output parameter
becomes a list as well:

outputSource: alignment/qc_html
outputSource: alignment/bam_sorted_indexed
outputSource: alignment/featurecounts
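
In practice this means changing each output type to an array. A sketch, assuming the outputs were previously declared as single `File` values:

```yaml
# Fragment of main.cwl (sketch): one list element per scattered invocation.
outputs:
  qc_html:
    type: File[]
    outputSource: alignment/qc_html
  bam_sorted_indexed:
    type: File[]
    outputSource: alignment/bam_sorted_indexed
  featurecounts:
    type: File[]
    outputSource: alignment/featurecounts
```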
Finally, we need a little more boilerplate to tell the workflow runner
that we want to use scatter:

SubworkflowFeatureRequirement: {}
ScatterFeatureRequirement: {}
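
Both entries belong under the same `requirements` key. A minimal sketch, assuming no other requirements are declared:

```yaml
# Fragment of main.cwl (sketch): enable subworkflows and scatter.
requirements:
  SubworkflowFeatureRequirement: {}
  ScatterFeatureRequirement: {}
```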
### 3. Running with list inputs

The `fq` parameter needs to be a list. You write a list in YAML by
starting each list item with a dash. Example `main-input.yaml`:
location: rnaseq/raw_fastq/Mov10_oe_1.subset.fq
format: http://edamontology.org/format_1930
location: rnaseq/raw_fastq/Mov10_oe_2.subset.fq
format: http://edamontology.org/format_1930
location: rnaseq/raw_fastq/Mov10_oe_3.subset.fq
format: http://edamontology.org/format_1930
location: rnaseq/raw_fastq/Irrel_kd_1.subset.fq
format: http://edamontology.org/format_1930
location: rnaseq/raw_fastq/Irrel_kd_2.subset.fq
format: http://edamontology.org/format_1930
location: rnaseq/raw_fastq/Irrel_kd_3.subset.fq
format: http://edamontology.org/format_1930
location: hg19-chr1-STAR-index
location: rnaseq/reference_data/chr1-hg19_genes.gtf
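
To show how the dashes and indentation fit together, here is a sketch of the file's overall shape using the first two samples. The `class` fields and the key names `genome` and `gtf` are assumptions; the paths and format URI are the ones listed above.

```yaml
# main-input.yaml (sketch): each dash starts one element of the fq list.
fq:
  - class: File
    location: rnaseq/raw_fastq/Mov10_oe_1.subset.fq
    format: http://edamontology.org/format_1930
  - class: File
    location: rnaseq/raw_fastq/Mov10_oe_2.subset.fq
    format: http://edamontology.org/format_1930
  # ...one entry per remaining sample, in the same form
genome:                       # assumed key name
  class: Directory
  location: hg19-chr1-STAR-index
gtf:                          # assumed key name
  class: File
  location: rnaseq/reference_data/chr1-hg19_genes.gtf
```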
Now you can run the workflow the same way as in Lesson 2.

### 4. Combining results
Each instance of the alignment workflow produces its own featureCounts
file. However, to compare results easily, we need them in a
single file with all the results.

The easiest way to do this is to run `featureCounts` just once at the
end of the workflow, with all the bam files listed on the command
line.

We'll need to modify a few things.
First, modify `featureCounts.cwl` so that it accepts either a
single bam file or a list of bam files.
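
One way to express "either a file or a list of files" in CWL is a union type, written as a list of alternatives. The sketch below assumes the tool's bam input is named `counts_input_bam` (the name used in the step later in this lesson) and that it also takes a `gtf` input; any `inputBinding` your tool already declares would stay as it is.

```yaml
# Fragment of featureCounts.cwl (sketch): accept one bam file or many.
inputs:
  gtf: File                 # assumed existing input
  counts_input_bam:
    type:
      - File                # a single bam file
      - File[]              # or a list of bam files
```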
Second, in `alignment.cwl`, remove the `featureCounts` step as well as the `featurecounts` output parameter.

Third, in `main.cwl`, remove `featurecounts` from the `alignment` step
outputs and add a new step:
out: [qc_html, bam_sorted_indexed]
run: featureCounts.cwl
counts_input_bam: alignment/bam_sorted_indexed
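
Assembled, the two steps might look like the sketch below. The scattered `alignment` step hands its list of bam files to a single `featureCounts` invocation. The `genome` and `gtf` input names are assumptions, and the output name `featurecounts` is taken from the `outputSource` used later in this lesson.

```yaml
# Fragment of main.cwl (sketch): scatter the alignment, then count once.
steps:
  alignment:
    run: alignment.cwl
    scatter: fq
    in:
      fq: fq
      genome: genome          # assumed input name
      gtf: gtf                # assumed input name
    out: [qc_html, bam_sorted_indexed]
  featureCounts:
    run: featureCounts.cwl
    in:
      counts_input_bam: alignment/bam_sorted_indexed   # the whole list
      gtf: gtf                # assumed input name
    out: [featurecounts]
```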
Last, we modify the `featurecounts` output parameter. Instead of a
list of files produced by the `alignment` step, it is now a single
file produced by the new `featureCounts` step.

outputSource: featureCounts/featurecounts
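
The resulting `outputs` section could look like this sketch, assuming the other two outputs are still lists from the scattered step:

```yaml
# Fragment of main.cwl (sketch): per-sample lists plus one combined counts file.
outputs:
  qc_html:
    type: File[]
    outputSource: alignment/qc_html
  bam_sorted_indexed:
    type: File[]
    outputSource: alignment/bam_sorted_indexed
  featurecounts:
    type: File                                  # a single file, no longer a list
    outputSource: featureCounts/featurecounts
```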
Run this workflow to get a single `featurecounts.tsv` file with a column for each bam file.