title: "Analyzing multiple samples"
- "Key question (FIXME)"
- "First learning objective. (FIXME)"
- "First key point. Brief Answer to questions. (FIXME)"
Analyzing a single sample is great, but in the real world you probably
have a batch of samples that you need to analyze and then compare.
In addition to running command line tools, a workflow step can also
execute another workflow.

Let's copy `main.cwl` to `alignment.cwl`.

Now open `main.cwl` for editing. We are going to replace the `steps` and `outputs` sections.
out: [qc_html, bam_sorted_indexed, featurecounts]
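Put together, the replacement `steps` section could look like the sketch below. It assumes the outer workflow passes three inputs straight through to the inner workflow. Only `fq` is named in this episode; `genome` and `gtf` are assumed names, so substitute whatever your `main.cwl` actually declares.

```yaml
steps:
  alignment:
    # Run the copied workflow as a single step of the outer workflow.
    run: alignment.cwl
    in:
      fq: fq
      genome: genome   # assumed input name
      gtf: gtf         # assumed input name
    out: [qc_html, bam_sorted_indexed, featurecounts]
```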
In the `outputs` section, all the output sources come from the `alignment` step:
outputSource: alignment/qc_html
outputSource: alignment/bam_sorted_indexed
outputSource: alignment/featurecounts
We also need a little boilerplate to tell the workflow runner that we want to use subworkflows:
SubworkflowFeatureRequirement: {}
If you run this workflow, you will get exactly the same results as
before; we have only wrapped the inner workflow in an outer workflow.
The wrapper lets us do something useful. We can modify the outer
workflow to accept a list of files, and then invoke the inner workflow
step for every one of those files. We will need to modify the
`inputs`, `steps`, `outputs`, and `requirements` sections.

First we change the `fq` parameter to expect a list of files:
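In CWL, appending `[]` to a type turns it into an array of that type. A minimal sketch of the changed `inputs` section follows; the names and types of the two other inputs (`genome` as a `Directory`, `gtf` as a `File`) are assumptions and should match what your workflow already declares.

```yaml
inputs:
  fq: File[]          # was: File
  genome: Directory   # assumed name and type
  gtf: File           # assumed name and type
```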
Next, we add `scatter` to the alignment step. This means it will
run `alignment.cwl` once for each value in the `fq` list.
out: [qc_html, bam_sorted_indexed, featurecounts]
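As a sketch, the scattered step might read as follows. The only change from the non-scattered step is the `scatter` line; the `genome` and `gtf` input names are assumptions.

```yaml
steps:
  alignment:
    run: alignment.cwl
    scatter: fq        # one run of alignment.cwl per element of fq
    in:
      fq: fq
      genome: genome   # assumed; passed unchanged to every run
      gtf: gtf         # assumed; passed unchanged to every run
    out: [qc_html, bam_sorted_indexed, featurecounts]
```

Inputs that are not named in `scatter` are passed as-is to every invocation of the step.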
Because the scatter produces multiple outputs, each output parameter
becomes a list as well:
outputSource: alignment/qc_html
outputSource: alignment/bam_sorted_indexed
outputSource: alignment/featurecounts
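A sketch of the full `outputs` section after this change, assuming each output was declared as a single `File` before scattering:

```yaml
outputs:
  qc_html:
    type: File[]       # one report per input file
    outputSource: alignment/qc_html
  bam_sorted_indexed:
    type: File[]
    outputSource: alignment/bam_sorted_indexed
  featurecounts:
    type: File[]
    outputSource: alignment/featurecounts
```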
Finally, we need a little more boilerplate to tell the workflow runner
that we want to use scatter:
SubworkflowFeatureRequirement: {}
ScatterFeatureRequirement: {}
# 3. Running with list inputs

The `fq` parameter needs to be a list. You write a list in YAML by
starting each list item with a dash. Here is an example `main-input.yaml`:
location: rnaseq/raw_fastq/Mov10_oe_1.subset.fq
format: http://edamontology.org/format_1930
location: rnaseq/raw_fastq/Mov10_oe_2.subset.fq
format: http://edamontology.org/format_1930
location: rnaseq/raw_fastq/Mov10_oe_3.subset.fq
format: http://edamontology.org/format_1930
location: rnaseq/raw_fastq/Irrel_kd_1.subset.fq
format: http://edamontology.org/format_1930
location: rnaseq/raw_fastq/Irrel_kd_2.subset.fq
format: http://edamontology.org/format_1930
location: rnaseq/raw_fastq/Irrel_kd_3.subset.fq
format: http://edamontology.org/format_1930
location: hg19-chr1-STAR-index
location: rnaseq/reference_data/chr1-hg19_genes.gtf
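To make the list structure explicit, here is a shortened sketch of such an input file showing only the first two samples. The parameter names `genome` and `gtf`, and the `class` values, are assumptions based on the kinds of paths listed above.

```yaml
fq:
  - class: File
    location: rnaseq/raw_fastq/Mov10_oe_1.subset.fq
    format: http://edamontology.org/format_1930
  - class: File
    location: rnaseq/raw_fastq/Mov10_oe_2.subset.fq
    format: http://edamontology.org/format_1930
  # ...one entry per remaining sample
genome:                # assumed parameter name
  class: Directory
  location: hg19-chr1-STAR-index
gtf:                   # assumed parameter name
  class: File
  location: rnaseq/reference_data/chr1-hg19_genes.gtf
```

Each dash starts a new list item, and the indented keys beneath it belong to that item.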
Now you can run the workflow the same way as in Lesson 2.

# 4. Combining results
Each instance of the alignment workflow produces its own featureCounts
file. However, to compare results easily, we need them in a
single file with all the results.
The easiest way to do this is to run `featureCounts` just once at the
end of the workflow, with all the BAM files listed on the command line.
We'll need to modify a few things.

First, we modify `featureCounts.cwl` to accept either a
single BAM file or a list of BAM files.
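In CWL, a parameter can accept more than one type when its `type` is written as a list of alternatives. The fragment below is a sketch under assumptions: `counts_input_bam` is the input name used by the workflow step later in this episode, `gtf` is an assumed name for the annotation input, and any `inputBinding` your tool description already has is omitted here and should be kept.

```yaml
inputs:
  gtf: File            # assumed name for the annotation input
  counts_input_bam:
    type:
      - File           # a single BAM file
      - File[]         # or a list of BAM files
```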
Second, we remove the `featureCounts` step from `alignment.cwl`, along with its `featurecounts` output parameter.

Third, in `main.cwl` we remove `featurecounts` from the `alignment` step
outputs, and add a new step:
out: [qc_html, bam_sorted_indexed]
run: featureCounts.cwl
counts_input_bam: alignment/bam_sorted_indexed
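Assembled into one piece, the two steps might look like the sketch below. The `genome` and `gtf` input names are assumptions, and the `out` name of the new step is inferred from the `featureCounts/featurecounts` output source used in this episode.

```yaml
steps:
  alignment:
    run: alignment.cwl
    scatter: fq
    in:
      fq: fq
      genome: genome   # assumed input name
      gtf: gtf         # assumed input name
    out: [qc_html, bam_sorted_indexed]
  featureCounts:
    run: featureCounts.cwl
    in:
      # The scattered step yields a list of BAM files, passed here as one input.
      counts_input_bam: alignment/bam_sorted_indexed
      gtf: gtf         # assumed input name
    out: [featurecounts]
```

The new step has no `scatter`, so it runs once and receives the whole list of BAM files.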
Last, we modify the `featurecounts` output parameter. Instead of a
list of files produced by the `alignment` step, it is now a single
file produced by the new `featureCounts` step.
outputSource: featureCounts/featurecounts
Run this workflow to get a single `featurecounts.tsv` file with a column for each BAM file.