# Running and debugging a workflow

### 1. The input parameter file

CWL input values are provided in a YAML or JSON file.
Create one by right-clicking in the explorer, selecting "New File", and
naming the file "main-input.yaml".

This file gives the values for the parameters declared in the `inputs`
section of our workflow. Our workflow takes `fq`, `genome` and `gtf`
as input parameters.

When setting inputs, Files and Directories are given as an object with
`class: File` or `class: Directory`. This distinguishes them from
plain strings that may or may not be file paths.

Note: if you don't have example sequence data or the STAR index files, see the Appendix below.
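
For data on the local filesystem, `main-input.yaml` looks like this:

    fq:
      class: File
      location: rnaseq/raw_fastq/Mov10_oe_1.subset.fq
      format: http://edamontology.org/format_1930
    genome:
      class: Directory
      location: hg19-chr1-STAR-index
    gtf:
      class: File
      location: rnaseq/reference_data/chr1-hg19_genes.gtf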

On Arvados, the same inputs reference data stored in Keep:

    fq:
      class: File
      location: keep:9178fe1b80a08a422dbe02adfd439764+925/raw_fastq/Mov10_oe_1.subset.fq
      format: http://edamontology.org/format_1930
    genome:
      class: Directory
      location: keep:02a12ce9e2707610991bd29d38796b57+2912
    gtf:
      class: File
      location: keep:9178fe1b80a08a422dbe02adfd439764+925/reference_data/chr1-hg19_genes.gtf
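
Because the runner also accepts JSON, the same input object can be produced by a script. The following is a minimal sketch, not part of the original exercise: it builds the local-file version of the inputs as a Python dictionary and serializes it. The helper names `file_input` and `directory_input` are hypothetical.

```python
import json


def file_input(location, fmt=None):
    """Build a CWL File input object (hypothetical helper)."""
    obj = {"class": "File", "location": location}
    if fmt is not None:
        obj["format"] = fmt
    return obj


def directory_input(location):
    """Build a CWL Directory input object (hypothetical helper)."""
    return {"class": "Directory", "location": location}


# Values taken from the local-file example in this section.
inputs = {
    "fq": file_input(
        "rnaseq/raw_fastq/Mov10_oe_1.subset.fq",
        fmt="http://edamontology.org/format_1930",
    ),
    "genome": directory_input("hg19-chr1-STAR-index"),
    "gtf": file_input("rnaseq/reference_data/chr1-hg19_genes.gtf"),
}

# Serialize the input object; the result is a valid JSON input document.
input_json = json.dumps(inputs, indent=2)
print(input_json)
```

Saving the printed text to a file (for example under the assumed name `main-input.json`) should let you pass it to `cwl-runner` in place of the YAML file.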

### 2. Running the workflow

Type this into the terminal:

    cwl-runner main.cwl main-input.yaml

On Arvados with vscode, select "main.cwl" and then choose "Terminal -> Run task -> Run CWL workflow on Arvados".

### 3. Debugging the workflow

A workflow can fail for many reasons, including bad input, bugs in the
code, or running out of memory. In this case, the STAR workflow might
fail with an out of memory error.

To help diagnose these errors, the workflow runner produces logs that
record what happened, either in the terminal or the web interface.
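
Bad input is often the cheapest failure to rule out before reading the logs. As a rough sketch (the function name `missing_inputs` is hypothetical, and it assumes the input object has already been loaded into a dictionary), the following checks that every top-level `File` or `Directory` with a local path exists. It skips locations with a URI scheme such as `keep:` and does not descend into arrays or nested records.

```python
import os


def missing_inputs(inputs, base_dir="."):
    """Return (parameter, location) pairs whose local path does not exist.

    Relative locations are resolved against base_dir, which should be the
    directory containing the input file.
    """
    missing = []
    for name, value in inputs.items():
        if not isinstance(value, dict):
            continue  # plain strings, numbers, etc. are not checked
        if value.get("class") not in ("File", "Directory"):
            continue
        location = value.get("location") or value.get("path")
        if not location:
            continue
        if ":" in location.split("/")[0]:
            continue  # has a URI scheme (keep:, http:, file:), not a bare path
        if not os.path.exists(os.path.join(base_dir, location)):
            missing.append((name, location))
    return missing
```

An empty list means every checked path was found; anything else names the parameters to fix in `main-input.yaml`.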

Some errors you might see in the logs that would indicate an out of
memory problem:

    EXITING: fatal error trying to allocate genome arrays, exception thrown: std::bad_alloc
    Possible cause 1: not enough RAM. Check if you have enough RAM 5711762337 bytes
    Possible cause 2: not enough virtual memory allowed with ulimit. SOLUTION: run ulimit -v 5711762337

or

    Container exited with code: 137

(Exit code 137 most commonly occurs when a container runs out of memory and is terminated by the operating system.)

If this happens, you will need to request more RAM.
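
When a log is long, it can help to search it for these messages rather than read it end to end. The sketch below is one possible way to do that; the function name `find_oom_lines` is hypothetical, and the patterns cover only the messages quoted in this section.

```python
import re

# Patterns based on the log messages quoted in this section.
OOM_PATTERNS = [
    re.compile(r"std::bad_alloc"),
    re.compile(r"not enough RAM"),
    re.compile(r"exited with code: 137\b"),
]


def find_oom_lines(log_text):
    """Return the log lines that match any out-of-memory pattern."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in OOM_PATTERNS)
    ]
```

Calling `find_oom_lines(open("workflow.log").read())` on a saved log (the file name is an assumption) returns the matching lines, or an empty list if none of the patterns appear.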

### 4. Setting runtime RAM requirements

By default, a step is allocated 256 MB of RAM. From the STAR error message:

> Check if you have enough RAM 5711762337 bytes

We can see that STAR requires quite a bit more RAM than that (about 5.7 GB). To
request more RAM, add a `requirements` section with
`ResourceRequirement` to the `STAR` step:

    STAR:
      requirements:
        ResourceRequirement:
          ramMin: ...   # RAM to request, larger than the amount STAR reported
      run: bio-cwl-tools/STAR/STAR-Align.cwl

Resource requirements you can set include:

* `coresMin`: CPU cores
* `ramMin`: RAM (in mebibytes)
* `tmpdirMin`: temporary directory available space (in mebibytes)
* `outdirMin`: output directory available space (in mebibytes)
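
STAR reports its requirement in bytes, while `ramMin` is expressed in mebibytes, so the number needs converting. The following sketch does the arithmetic; the function name `ram_min_mib` and the `headroom` factor are assumptions for illustration, and the 1.5 margin is arbitrary rather than a recommended value.

```python
import math

MIB = 2**20  # bytes per mebibyte


def ram_min_mib(required_bytes, headroom=1.0):
    """Convert a byte count to whole mebibytes, rounded up.

    headroom is a hypothetical safety factor applied before rounding.
    """
    return math.ceil(required_bytes * headroom / MIB)


star_bytes = 5711762337  # from the STAR error message

print(ram_min_mib(star_bytes))                # 5448
print(ram_min_mib(star_bytes, headroom=1.5))  # 8171
```

Any `ramMin` at or above the first number covers the genome arrays STAR tried to allocate; the process may need more in total, which is why some margin is usually added.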

After setting the RAM requirements, re-run the workflow.

### 5. Workflow results

The CWL runner will print a results JSON object to standard output. It will look something like this (it may include additional fields):

    {
        "bam_sorted_indexed": {
            "location": "file:///home/username/rnaseq-cwl-training-exercises/Aligned.sortedByCoord.out.bam",
            "basename": "Aligned.sortedByCoord.out.bam",
            ...
                    "basename": "Aligned.sortedByCoord.out.bam.bai",
                    "location": "file:///home/username/rnaseq-cwl-training-exercises/Aligned.sortedByCoord.out.bam.bai",
            ...
            "location": "file:///home/username/rnaseq-cwl-training-exercises/Mov10_oe_1.subset_fastqc.html",
            "basename": "Mov10_oe_1.subset_fastqc.html",
            ...
    }

This has the same structure as `main-input.yaml`. Each output
parameter is listed, with the `location` field of each `File` object
indicating where the output file can be found.
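
Since the results are plain JSON, a short script can pull out the output locations. This is a sketch under assumptions: the function name `output_paths` is hypothetical, the example object is shortened, and `qc_html` is an assumed name for the FastQC output parameter.

```python
import json
from urllib.parse import unquote, urlparse


def output_paths(results):
    """Map each output that has a location to a local path.

    file:// URIs are converted to filesystem paths; any other location
    (for example keep:) is returned unchanged.
    """
    paths = {}
    for name, value in results.items():
        if isinstance(value, dict) and "location" in value:
            parsed = urlparse(value["location"])
            if parsed.scheme == "file":
                paths[name] = unquote(parsed.path)
            else:
                paths[name] = value["location"]
    return paths


# Shortened example; "qc_html" is an assumed output parameter name.
example = json.loads("""
{
  "bam_sorted_indexed": {
    "class": "File",
    "location": "file:///home/username/rnaseq-cwl-training-exercises/Aligned.sortedByCoord.out.bam",
    "basename": "Aligned.sortedByCoord.out.bam"
  },
  "qc_html": {
    "class": "File",
    "location": "file:///home/username/rnaseq-cwl-training-exercises/Mov10_oe_1.subset_fastqc.html",
    "basename": "Mov10_oe_1.subset_fastqc.html"
  }
}
""")

for name, path in output_paths(example).items():
    print(name, path)
```

To use it on a real run, redirect the runner's standard output to a file, for example `cwl-runner main.cwl main-input.yaml > results.json`, and load that file with `json.load`.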
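
# Appendix

## Downloading sample and reference data

Start from your rnaseq-cwl-exercises directory. The local input file expects the data under `rnaseq/`, so create that directory and download into it:

    mkdir rnaseq
    cd rnaseq
    wget --mirror --no-parent --no-host --cut-dirs=1 https://download.pirca.arvadosapi.com/c=9178fe1b80a08a422dbe02adfd439764+925/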

## Downloading or generating STAR index

Running STAR requires index files generated from the reference.

The pre-built index is a rather large download (4 GB). Depending on your bandwidth, it may be faster to generate it yourself.

To download the index, go to the "Terminal" tab in the lower vscode panel. If necessary, select `bash` from the dropdown list in the upper right corner.

    mkdir hg19-chr1-STAR-index
    cd hg19-chr1-STAR-index
    wget --mirror --no-parent --no-host --cut-dirs=1 https://download.pirca.arvadosapi.com/c=02a12ce9e2707610991bd29d38796b57+2912/

To generate the index instead, create `chr1-star-index.yaml`:

    # ...
      location: rnaseq/reference_data/chr1.fa
      format: http://edamontology.org/format_1930
    IndexName: 'hg19-chr1-STAR-index'
    # ...
      location: rnaseq/reference_data/chr1-hg19_genes.gtf

Next, go to the "Terminal" tab in the lower vscode panel. If
necessary, select `bash` from the dropdown list in the upper right
corner. Generate the index with your local cwl-runner:

    cwl-runner bio-cwl-tools/STAR/STAR-Index.cwl chr1-star-index.yaml