---
title: "Analyzing multiple samples"
questions:
- "Key question (FIXME)"
objectives:
- "First learning objective. (FIXME)"
keypoints:
- "First key point. Brief Answer to questions. (FIXME)"
---

# Analyzing multiple samples

Analyzing a single sample is great, but in the real world you probably
have a batch of samples that you need to analyze and then compare.

### 1. Subworkflows

In addition to running command line tools, a workflow step can also
execute another workflow, called a subworkflow.

Let's copy `main.cwl` to `alignment.cwl`.

Now, open `main.cwl` for editing. We are going to replace the `steps` and `outputs` sections.
The new `steps` section has a single step that runs `alignment.cwl`:

```yaml
steps:
  alignment:
    run: alignment.cwl
    in:
      fq: fq
      genome: genome
      gtf: gtf
    out: [qc_html, bam_sorted_indexed, featurecounts]
```

In the `outputs` section, every output source now comes from the `alignment` step:

```yaml
outputs:
  qc_html:
    type: File
    outputSource: alignment/qc_html
  bam_sorted_indexed:
    type: File
    outputSource: alignment/bam_sorted_indexed
  featurecounts:
    type: File
    outputSource: alignment/featurecounts
```

We also need a little boilerplate to tell the workflow runner that we want to use subworkflows:

```yaml
requirements:
  SubworkflowFeatureRequirement: {}
```

If you run this workflow, you will get exactly the same results as
before. We have only wrapped the inner workflow in an outer workflow.

### 2. Scattering

The wrapper lets us do something useful. We can modify the outer
workflow to accept a list of files, and then invoke the inner workflow
step once for each of those files. We will need to modify the
`inputs`, `steps`, `outputs`, and `requirements` sections.

First, we change the `fq` parameter to expect a list of files by giving it the array type `File[]`.

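Here is a sketch of the updated `inputs` section. It assumes that the other two inputs keep the names `genome` and `gtf` from Lesson 2, as the input file in section 3 suggests. Only the type of `fq` changes.

```yaml
inputs:
  fq: File[]          # now a list of FASTQ files, one per sample
  genome: Directory   # assumed name from Lesson 2, unchanged
  gtf: File           # assumed name from Lesson 2, unchanged
```
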
Next, we add `scatter` to the alignment step. This means it will
run `alignment.cwl` once for each value in the list in the `fq` parameter.

```yaml
steps:
  alignment:
    run: alignment.cwl
    scatter: fq
    in:
      fq: fq
      genome: genome
      gtf: gtf
    out: [qc_html, bam_sorted_indexed, featurecounts]
```

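Because the scattered step runs once per input file, each of its
outputs is now a list, so each workflow output parameter becomes a list as well:

```yaml
outputs:
  qc_html:
    type: File[]
    outputSource: alignment/qc_html
  bam_sorted_indexed:
    type: File[]
    outputSource: alignment/bam_sorted_indexed
  featurecounts:
    type: File[]
    outputSource: alignment/featurecounts
```
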
Finally, we need a little more boilerplate to tell the workflow runner
that we want to use scatter:

```yaml
requirements:
  SubworkflowFeatureRequirement: {}
  ScatterFeatureRequirement: {}
```

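With these changes in place, the complete outer `main.cwl` should look roughly like the sketch below. It assumes a `cwlVersion: v1.2` / `class: Workflow` header and the input names `genome` and `gtf` from Lesson 2. Adjust these if your copy differs.

```yaml
# Sketch of the complete outer workflow after adding scatter.
# Assumed: cwlVersion v1.2 and the input names genome/gtf from Lesson 2.
cwlVersion: v1.2
class: Workflow

requirements:
  SubworkflowFeatureRequirement: {}  # lets a step run another workflow
  ScatterFeatureRequirement: {}      # lets a step use the scatter field

inputs:
  fq: File[]          # one FASTQ file per sample
  genome: Directory   # STAR index, shared by every sample
  gtf: File           # gene annotation, shared by every sample

steps:
  alignment:
    run: alignment.cwl
    scatter: fq       # run alignment.cwl once per element of fq
    in:
      fq: fq
      genome: genome
      gtf: gtf
    out: [qc_html, bam_sorted_indexed, featurecounts]

outputs:
  qc_html:
    type: File[]
    outputSource: alignment/qc_html
  bam_sorted_indexed:
    type: File[]
    outputSource: alignment/bam_sorted_indexed
  featurecounts:
    type: File[]
    outputSource: alignment/featurecounts
```

Inputs that are not named in `scatter` (here `genome` and `gtf`) are passed unchanged to every instance of the step.
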
### 3. Running with list inputs

The `fq` parameter needs to be a list. You write a list in YAML by
starting each list item with a dash. For example, `main-input.yaml`:

```yaml
fq:
  - class: File
    location: rnaseq/raw_fastq/Mov10_oe_1.subset.fq
    format: http://edamontology.org/format_1930
  - class: File
    location: rnaseq/raw_fastq/Mov10_oe_2.subset.fq
    format: http://edamontology.org/format_1930
  - class: File
    location: rnaseq/raw_fastq/Mov10_oe_3.subset.fq
    format: http://edamontology.org/format_1930
  - class: File
    location: rnaseq/raw_fastq/Irrel_kd_1.subset.fq
    format: http://edamontology.org/format_1930
  - class: File
    location: rnaseq/raw_fastq/Irrel_kd_2.subset.fq
    format: http://edamontology.org/format_1930
  - class: File
    location: rnaseq/raw_fastq/Irrel_kd_3.subset.fq
    format: http://edamontology.org/format_1930
genome:
  class: Directory
  location: hg19-chr1-STAR-index
gtf:
  class: File
  location: rnaseq/reference_data/chr1-hg19_genes.gtf
```

Now you can run the workflow the same way as in Lesson 2, using `main-input.yaml` as the input file.

### 4. Combining results

Each instance of the alignment workflow produces its own featureCounts
file. However, to compare results easily, we want a single file
containing all of the results.

The easiest way to do this is to run `featureCounts` just once, at the
end of the workflow, with all the BAM files listed on the command line.

We'll need to modify a few things.

First, we modify `featureCounts.cwl` so that its `counts_input_bam`
input accepts either a single BAM file or a list of BAM files.

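Here is a minimal sketch of the changed input. It assumes the rest of `featureCounts.cwl` stays as it was. The `type` becomes a union of a single `File` and an array of `File`s, with the array written in long form. Any other fields the input already has (for example `secondaryFiles`, if present) would stay next to `type`.

```yaml
inputs:
  counts_input_bam:
    # Union type: accept one BAM file or a list of BAM files.
    type:
      - File
      - type: array
        items: File
```

If this input is placed on the command line through an `inputBinding`, CWL adds each array element as a separate argument unless `itemSeparator` is set. This should give `featureCounts` the list of BAM files it expects.
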
Second, in `alignment.cwl` we remove the `featureCounts` step, as well as the `featurecounts` output parameter.

Third, in `main.cwl` we remove `featurecounts` from the outputs of the
`alignment` step and add a new `featureCounts` step that receives all of the BAM files at once:

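```yaml
steps:
  alignment:
    run: alignment.cwl
    scatter: fq
    in:
      fq: fq
      genome: genome
      gtf: gtf
    out: [qc_html, bam_sorted_indexed]

  featureCounts:
    requirements:
      InlineJavascriptRequirement: {}
    run: featureCounts.cwl
    in:
      counts_input_bam: alignment/bam_sorted_indexed
      gtf: gtf
    out: [featurecounts]
```
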
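Last, we modify the `featurecounts` output parameter. Instead of a
list of files produced by the `alignment` step, it is now a single
file produced by the new `featureCounts` step:

```yaml
outputs:
  qc_html:
    type: File[]
    outputSource: alignment/qc_html
  bam_sorted_indexed:
    type: File[]
    outputSource: alignment/bam_sorted_indexed
  featurecounts:
    type: File
    outputSource: featureCounts/featurecounts
```
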
Run this workflow to get a single `featurecounts.tsv` file with a column of counts for each BAM file.