title: "Analyzing Multiple Samples"
- "How can you run the same workflow over multiple samples?"
- "Modify the workflow to process multiple samples, then perform a joint analysis."
- "Separate the part of the workflow that you want to run multiple times into a subworkflow."
- "Use a scatter step to run the subworkflow over a list of inputs."
- "The result of a scatter is an array, which can be used in a combine step to get a single result."
In the previous lesson, we finished converting the original shell
script into CWL. This lesson shows how to change the workflow so that
it can analyze multiple samples in parallel.

In addition to running command line tools, a workflow step can also
run another workflow, called a subworkflow.

First, copy `main.cwl` to `alignment.cwl`.

Next, open `main.cwl` for editing. We are going to replace the `steps` and `outputs` sections.

Remove all the steps and replace them with a single `alignment` step
that runs the `alignment.cwl` file we just created:
```yaml
out: [qc_html, bam_sorted_indexed, featurecounts]
```
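A fuller sketch of the `alignment` step follows. It assumes that, besides `fq`, the workflow has two other inputs, called `genome` and `gtf` here. These two names are placeholders for whatever your `main.cwl` actually declares. Each key under `in` names an input of `alignment.cwl`, and each value names the workflow input connected to it.

```yaml
steps:
  alignment:
    # run a whole Workflow (the copied alignment.cwl) as a single step
    run: alignment.cwl
    in:
      fq: fq
      genome: genome   # hypothetical input name
      gtf: gtf         # hypothetical input name
    out: [qc_html, bam_sorted_indexed, featurecounts]
```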
In the `outputs` section, every output now comes from the `alignment` step:
```yaml
outputSource: alignment/qc_html
outputSource: alignment/bam_sorted_indexed
outputSource: alignment/featurecounts
```
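Written out in full, the `outputs` section might look like the sketch below. It assumes that each output is a single `File`, which is what one run of the inner workflow produces. Keep any other fields that your existing output declarations already have.

```yaml
outputs:
  qc_html:
    type: File
    outputSource: alignment/qc_html   # <step name>/<step output name>
  bam_sorted_indexed:
    type: File
    outputSource: alignment/bam_sorted_indexed
  featurecounts:
    type: File
    outputSource: alignment/featurecounts
```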
We also need to add `SubworkflowFeatureRequirement` to tell the workflow
runner that we are using subworkflows:
```yaml
SubworkflowFeatureRequirement: {}
```
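The `requirements` section sits at the top level of `main.cwl`, alongside `inputs`, `steps`, and `outputs`. Below is a minimal sketch of the start of the file. The `cwlVersion` value is an assumption, so keep the version your files already declare.

```yaml
cwlVersion: v1.2   # assumption: use the version your files already declare
class: Workflow

requirements:
  # allows a step's `run` to refer to another Workflow
  SubworkflowFeatureRequirement: {}
```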
If you run this workflow, you will get exactly the same results as
before, because all we have done so far is wrap the inner workflow in
an outer workflow.

The wrapper step is what lets us do something useful. We can modify the
outer workflow to accept a list of files, and then run the inner
workflow once for each of those files. We will need to modify
the `inputs`, `steps`, `outputs`, and `requirements` sections.
First, we change the `fq` parameter to expect a list of files.
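Here is a minimal sketch of the changed declaration, showing only `fq`. `File[]` is CWL shorthand for an array whose items are `File`. If your existing `fq` declaration has other fields, such as `format`, keep them and change only `type`.

```yaml
inputs:
  fq:
    type: File[]   # shorthand for {type: array, items: File}
  # the other inputs are unchanged
```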
Next, we add `scatter` to the alignment step. This means we want to
run `alignment.cwl` once for each value in the list in the `fq`
parameter:
```yaml
out: [qc_html, bam_sorted_indexed, featurecounts]
```
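A sketch of the complete scattered step follows. It uses `genome` and `gtf` as placeholder names for the other two inputs. These are hypothetical, so use the names your workflow declares. The value of `scatter` names a step input (a key under `in`), not a workflow input. Here the two happen to share the name `fq`. Inputs that are not listed in `scatter` are passed unchanged to every run.

```yaml
steps:
  alignment:
    run: alignment.cwl
    # run alignment.cwl once for each element of the fq array
    scatter: fq
    in:
      fq: fq
      genome: genome   # hypothetical input name, not scattered
      gtf: gtf         # hypothetical input name, not scattered
    out: [qc_html, bam_sorted_indexed, featurecounts]
```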
Because the scattered step runs once per input file and produces one
result per run, each output parameter becomes a list as well:
```yaml
outputSource: alignment/qc_html
outputSource: alignment/bam_sorted_indexed
outputSource: alignment/featurecounts
```
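Written out in full, the outputs section might look like the sketch below. It assumes that each output was a single `File` before the change. Each array holds one entry per run of the scattered step.

```yaml
outputs:
  qc_html:
    type: File[]   # one entry per input FASTQ file
    outputSource: alignment/qc_html
  bam_sorted_indexed:
    type: File[]
    outputSource: alignment/bam_sorted_indexed
  featurecounts:
    type: File[]
    outputSource: alignment/featurecounts
```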
We also need to add `ScatterFeatureRequirement` to tell the workflow
runner that we are using scatter:
```yaml
SubworkflowFeatureRequirement: {}
ScatterFeatureRequirement: {}
```
# Input parameter lists

The `fq` parameter needs to be a list. In YAML, you write a list by
starting each list item with a dash. Here is an example `main-input.yaml`:
```yaml
location: rnaseq/raw_fastq/Mov10_oe_1.subset.fq
format: http://edamontology.org/format_1930
location: rnaseq/raw_fastq/Mov10_oe_2.subset.fq
format: http://edamontology.org/format_1930
location: rnaseq/raw_fastq/Mov10_oe_3.subset.fq
format: http://edamontology.org/format_1930
location: rnaseq/raw_fastq/Irrel_kd_1.subset.fq
format: http://edamontology.org/format_1930
location: rnaseq/raw_fastq/Irrel_kd_2.subset.fq
format: http://edamontology.org/format_1930
location: rnaseq/raw_fastq/Irrel_kd_3.subset.fq
format: http://edamontology.org/format_1930
location: hg19-chr1-STAR-index
location: rnaseq/reference_data/chr1-hg19_genes.gtf
```
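Putting the pieces together, a complete `main-input.yaml` might look like the sketch below. Each list item under `fq` is a full `File` object, so it starts with `- class: File`. Several details are assumptions about the inputs declared in `main.cwl` and should be matched to your workflow: the input names `genome` and `gtf`, and passing the STAR index as a `Directory`.

```yaml
fq:
  - class: File
    location: rnaseq/raw_fastq/Mov10_oe_1.subset.fq
    format: http://edamontology.org/format_1930
  - class: File
    location: rnaseq/raw_fastq/Mov10_oe_2.subset.fq
    format: http://edamontology.org/format_1930
  - class: File
    location: rnaseq/raw_fastq/Mov10_oe_3.subset.fq
    format: http://edamontology.org/format_1930
  - class: File
    location: rnaseq/raw_fastq/Irrel_kd_1.subset.fq
    format: http://edamontology.org/format_1930
  - class: File
    location: rnaseq/raw_fastq/Irrel_kd_2.subset.fq
    format: http://edamontology.org/format_1930
  - class: File
    location: rnaseq/raw_fastq/Irrel_kd_3.subset.fq
    format: http://edamontology.org/format_1930
genome:            # hypothetical input name
  class: Directory # assumption: the STAR index is passed as a directory
  location: hg19-chr1-STAR-index
gtf:               # hypothetical input name
  class: File
  location: rnaseq/reference_data/chr1-hg19_genes.gtf
```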
If you run the workflow, you will get results for each of the input files.

Each instance of the alignment workflow produces its own
`featurecounts.tsv` file. However, to compare results
easily, we would like a single file with all the results.

We can modify the workflow to run `featureCounts` once, at the end of
the workflow, passing all of the bam files to it on a single command line.

We will need to change a few things.
First, we modify `featureCounts.cwl` to accept either a
single bam file or a list of bam files.
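Here is a sketch of the changed input using the explicit union form. Listing two types means the parameter accepts either a single `File` or an array of `File` objects. Only the name `counts_input_bam` comes from this lesson, so keep the parameter's existing `inputBinding` and any other fields. If that `inputBinding` has no `itemSeparator`, CWL adds each file in an array to the command line as a separate argument. featureCounts takes multiple input files in exactly this form.

```yaml
inputs:
  counts_input_bam:
    # accept either one BAM file or a list of BAM files
    type:
      - File
      - type: array
        items: File
    # keep the existing inputBinding for this parameter unchanged
```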
Second, in `alignment.cwl`, we remove the `featureCounts` step and the `featurecounts` output parameter.

Third, in `main.cwl`, we remove `featurecounts` from the outputs of the `alignment` step
and add a new step:
```yaml
out: [qc_html, bam_sorted_indexed]
run: featureCounts.cwl
counts_input_bam: alignment/bam_sorted_indexed
```
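A sketch of the whole new step follows. It passes the array of bam files from the scattered `alignment` step straight to `featureCounts.cwl`. Because the step has no `scatter`, the tool runs once on all of the files. The `gtf` input is a hypothetical name for the annotation file that `featureCounts.cwl` needs, so use the names your tool actually declares.

```yaml
steps:
  # the alignment step stays as before, minus featurecounts in its out list
  featureCounts:
    run: featureCounts.cwl
    # no scatter: featureCounts runs once, on the whole array
    in:
      counts_input_bam: alignment/bam_sorted_indexed   # File[] from the scatter
      gtf: gtf   # hypothetical input name for the annotation file
    out: [featurecounts]
```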
Last, we modify the `featurecounts` output parameter. Instead of a
list of files produced by the `alignment` step, it is now a single
file produced by the new `featureCounts` step:
```yaml
outputSource: featureCounts/featurecounts
```
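In full, the output parameter might look like the sketch below. Its `type` is a single `File` rather than `File[]`, because the `featureCounts` step now runs only once.

```yaml
outputs:
  # qc_html and bam_sorted_indexed remain File[] outputs of the alignment step
  featurecounts:
    type: File
    outputSource: featureCounts/featurecounts
```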
Run this workflow to get a single `featurecounts.tsv` file with a
column for each bam file.