4 title: "Writing a CWL workflow"
7 Copyright (C) The Arvados Authors. All rights reserved.
9 SPDX-License-Identifier: CC-BY-SA-3.0
12 {% include 'what_is_cwl' %}
14 {% include 'tutorial_expectations' %}
16 h2. Developing workflows
18 For an introduction and and detailed documentation about writing CWL, see the "CWL User Guide":https://www.commonwl.org/user_guide and the "CWL Specification":http://commonwl.org/v1.1 .
20 See "Writing Portable High-Performance Workflows":{{site.baseurl}}/user/cwl/cwl-style.html and "Arvados CWL Extensions":{{site.baseurl}}/user/cwl/cwl-extensions.html for additional information about using CWL on Arvados.
22 See "Repositories of CWL Tools and Workflows":https://www.commonwl.org/#Repositories_of_CWL_Tools_and_Workflows for links to repositories of existing tools for reuse.
24 See "Software for working with CWL":https://www.commonwl.org/#Software_for_working_with_CWL for links to software tools to help create CWL documents.
28 You can create new workflows in the browser using "Arvados Composer":{{site.baseurl}}/user/composer/composer.html
30 h2. Registering a workflow to use in Workbench
32 Use @--create-workflow@ to register a CWL workflow with Arvados. This enables you to share workflows with other Arvados users, and run them by clicking the <span class="btn btn-sm btn-primary"><i class="fa fa-fw fa-gear"></i> Run a process...</span> button on the Workbench Dashboard and on the command line by UUID.
35 <pre><code>~/arvados/doc/user/cwl/bwa-mem$ <span class="userinput">arvados-cwl-runner --create-workflow bwa-mem.cwl</span>
36 arvados-cwl-runner 1.0.20160628195002, arvados-python-client 0.1.20160616015107, cwltool 1.0.20160629140624
37 2016-07-01 12:21:01 arvados.arv-run[15796] INFO: Upload local files: "bwa-mem.cwl"
38 2016-07-01 12:21:01 arvados.arv-run[15796] INFO: Uploaded to qr1hi-4zz18-7e0hedrmkuyoei3
39 2016-07-01 12:21:01 arvados.cwl-runner[15796] INFO: Created template qr1hi-p5p6p-rjleou1dwr167v5
40 qr1hi-p5p6p-rjleou1dwr167v5
44 You can provide a partial input file to set default values for the workflow input parameters. You can also use the @--name@ option to set the name of the workflow:
47 <pre><code>~/arvados/doc/user/cwl/bwa-mem$ <span class="userinput">arvados-cwl-runner --name "My workflow with defaults" --create-workflow bwa-mem.cwl bwa-mem-template.yml</span>
48 arvados-cwl-runner 1.0.20160628195002, arvados-python-client 0.1.20160616015107, cwltool 1.0.20160629140624
49 2016-07-01 14:09:50 arvados.arv-run[3730] INFO: Upload local files: "bwa-mem.cwl"
50 2016-07-01 14:09:50 arvados.arv-run[3730] INFO: Uploaded to qr1hi-4zz18-0f91qkovk4ml18o
51 2016-07-01 14:09:50 arvados.cwl-runner[3730] INFO: Created template qr1hi-p5p6p-0deqe6nuuyqns2i
52 qr1hi-p5p6p-zuniv58hn8d0qd8
56 h3. Running registered workflows at the command line
58 You can run a registered workflow at the command line by its UUID:
61 <pre><code>~/arvados/doc/user/cwl/bwa-mem$ <span class="userinput">arvados-cwl-runner qr1hi-p5p6p-zuniv58hn8d0qd8 --help</span>
62 /home/peter/work/scripts/venv/bin/arvados-cwl-runner 0d62edcb9d25bf4dcdb20d8872ea7b438e12fc59 1.0.20161209192028, arvados-python-client 0.1.20161212125425, cwltool 1.0.20161207161158
63 Resolved 'qr1hi-p5p6p-zuniv58hn8d0qd8' to 'keep:655c6cd07550151b210961ed1d3852cf+57/bwa-mem.cwl'
64 usage: qr1hi-p5p6p-zuniv58hn8d0qd8 [-h] [--PL PL] --group_id GROUP_ID
65 --read_p1 READ_P1 [--read_p2 READ_P2]
66 [--reference REFERENCE] --sample_id
71 job_order Job input json file
74 -h, --help show this help message and exit
77 --read_p1 READ_P1 The reads, in fastq format.
78 --read_p2 READ_P2 For mate paired reads, the second file (optional).
80 The index files produced by `bwa index`
87 When developing a workflow, it is often helpful to run it on the local host to avoid the overhead of submitting to the cluster. To execute a workflow only on the local host (without submitting jobs to an Arvados cluster) you can use the @cwltool@ command. Note that when using @cwltool@ you must have the input data accessible on the local file system using either @arv-mount@ or @arv-get@ to fetch the data from Keep.
90 <pre><code>~/arvados/doc/user/cwl/bwa-mem$ <span class="userinput">arv-get 2463fa9efeb75e099685528b3b9071e0+438/ .</span>
91 156 MiB / 156 MiB 100.0%
92 ~/arvados/doc/user/cwl/bwa-mem$ <span class="userinput">arv-get ae480c5099b81e17267b7445e35b4bc7+180/ .</span>
93 23 MiB / 23 MiB 100.0%
94 ~/arvados/doc/user/cwl/bwa-mem$ <span class="userinput">cwltool bwa-mem-input.yml bwa-mem-input-local.yml</span>
95 cwltool 1.0.20160629140624
96 [job bwa-mem.cwl] /home/example/arvados/doc/user/cwl/bwa-mem$ docker \
99 --volume=/home/example/arvados/doc/user/cwl/bwa-mem/19.fasta.ann:/var/lib/cwl/job979368791_bwa-mem/19.fasta.ann:ro \
100 --volume=/home/example/arvados/doc/user/cwl/bwa-mem/HWI-ST1027_129_D0THKACXX.1_1.fastq:/var/lib/cwl/job979368791_bwa-mem/HWI-ST1027_129_D0THKACXX.1_1.fastq:ro \
101 --volume=/home/example/arvados/doc/user/cwl/bwa-mem/19.fasta.sa:/var/lib/cwl/job979368791_bwa-mem/19.fasta.sa:ro \
102 --volume=/home/example/arvados/doc/user/cwl/bwa-mem/19.fasta.amb:/var/lib/cwl/job979368791_bwa-mem/19.fasta.amb:ro \
103 --volume=/home/example/arvados/doc/user/cwl/bwa-mem/19.fasta.pac:/var/lib/cwl/job979368791_bwa-mem/19.fasta.pac:ro \
104 --volume=/home/example/arvados/doc/user/cwl/bwa-mem/HWI-ST1027_129_D0THKACXX.1_2.fastq:/var/lib/cwl/job979368791_bwa-mem/HWI-ST1027_129_D0THKACXX.1_2.fastq:ro \
105 --volume=/home/example/arvados/doc/user/cwl/bwa-mem/19.fasta.bwt:/var/lib/cwl/job979368791_bwa-mem/19.fasta.bwt:ro \
106 --volume=/home/example/arvados/doc/user/cwl/bwa-mem:/var/spool/cwl:rw \
107 --volume=/tmp/tmpgzyou9:/tmp:rw \
108 --workdir=/var/spool/cwl \
114 --env=HOME=/var/spool/cwl \
121 '@RG ID:arvados_tutorial PL:illumina SM:HWI-ST1027_129' \
122 /var/lib/cwl/job979368791_bwa-mem/19.fasta \
123 /var/lib/cwl/job979368791_bwa-mem/HWI-ST1027_129_D0THKACXX.1_1.fastq \
124 /var/lib/cwl/job979368791_bwa-mem/HWI-ST1027_129_D0THKACXX.1_2.fastq > /home/example/arvados/doc/user/cwl/bwa-mem/HWI-ST1027_129_D0THKACXX.1_1.sam
125 [M::bwa_idx_load_from_disk] read 0 ALT contigs
126 [M::process] read 100000 sequences (10000000 bp)...
127 [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 4745, 1, 0)
128 [M::mem_pestat] skip orientation FF as there are not enough pairs
129 [M::mem_pestat] analyzing insert size distribution for orientation FR...
130 [M::mem_pestat] (25, 50, 75) percentile: (154, 181, 214)
131 [M::mem_pestat] low and high boundaries for computing mean and std.dev: (34, 334)
132 [M::mem_pestat] mean and std.dev: (185.63, 44.88)
133 [M::mem_pestat] low and high boundaries for proper pairs: (1, 394)
134 [M::mem_pestat] skip orientation RF as there are not enough pairs
135 [M::mem_pestat] skip orientation RR as there are not enough pairs
136 [M::mem_process_seqs] Processed 100000 reads in 9.848 CPU sec, 9.864 real sec
137 [main] Version: 0.7.12-r1039
138 [main] CMD: bwa mem -t 1 -R @RG ID:arvados_tutorial PL:illumina SM:HWI-ST1027_129 /var/lib/cwl/job979368791_bwa-mem/19.fasta /var/lib/cwl/job979368791_bwa-mem/HWI-ST1027_129_D0THKACXX.1_1.fastq /var/lib/cwl/job979368791_bwa-mem/HWI-ST1027_129_D0THKACXX.1_2.fastq
139 [main] Real time: 10.061 sec; CPU: 10.032 sec
140 Final process status is success
144 "path": "/home/example/arvados/doc/user/cwl/bwa-mem/HWI-ST1027_129_D0THKACXX.1_1.sam",
145 "checksum": "sha1$0c668cca45fef02397bb5302880526d300ee4dac",
152 If you get the error @JavascriptException: Long-running script killed after 20 seconds.@ this may be due to the Dockerized Node.js engine taking too long to start. You may address this by installing Node.js locally (run @apt-get install nodejs@ on Debian or Ubuntu) or by specifying a longer timeout with the @--eval-timeout@ option. For example, run the workflow with @cwltool --eval-timeout=40@ for a 40-second timeout.
154 h2. Making workflows directly executable
156 You can make a workflow file directly executable (@cwl-runner@ should be an alias to @arvados-cwl-runner@) by adding the following line to the top of the file:
159 <pre><code>#!/usr/bin/env cwl-runner
164 <pre><code>~/arvados/doc/user/cwl/bwa-mem$ <span class="userinput">./bwa-mem.cwl bwa-mem-input.yml</span>
165 arvados-cwl-runner 1.0.20160628195002, arvados-python-client 0.1.20160616015107, cwltool 1.0.20160629140624
166 2016-06-30 14:56:36 arvados.arv-run[27002] INFO: Upload local files: "bwa-mem.cwl"
167 2016-06-30 14:56:36 arvados.arv-run[27002] INFO: Uploaded to qr1hi-4zz18-h7ljh5u76760ww2
168 2016-06-30 14:56:40 arvados.cwl-runner[27002] INFO: Submitted job qr1hi-8i9sb-fm2n3b1w0l6bskg
169 2016-06-30 14:56:41 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (qr1hi-8i9sb-fm2n3b1w0l6bskg) is Running
170 2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (qr1hi-8i9sb-fm2n3b1w0l6bskg) is Complete
171 2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Overall process status is success
174 "path": "keep:54325254b226664960de07b3b9482349+154/HWI-ST1027_129_D0THKACXX.1_1.sam",
175 "checksum": "sha1$0dc46a3126d0b5d4ce213b5f0e86e2d05a54755a",
183 You can even make an input file directly executable the same way with the following two lines at the top:
186 <pre><code>#!/usr/bin/env cwl-runner
187 cwl:tool: <span class="userinput">bwa-mem.cwl</span>
192 <pre><code>~/arvados/doc/user/cwl/bwa-mem$ <span class="userinput">./bwa-mem-input.yml</span>
193 arvados-cwl-runner 1.0.20160628195002, arvados-python-client 0.1.20160616015107, cwltool 1.0.20160629140624
194 2016-06-30 14:56:36 arvados.arv-run[27002] INFO: Upload local files: "bwa-mem.cwl"
195 2016-06-30 14:56:36 arvados.arv-run[27002] INFO: Uploaded to qr1hi-4zz18-h7ljh5u76760ww2
196 2016-06-30 14:56:40 arvados.cwl-runner[27002] INFO: Submitted job qr1hi-8i9sb-fm2n3b1w0l6bskg
197 2016-06-30 14:56:41 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (qr1hi-8i9sb-fm2n3b1w0l6bskg) is Running
198 2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (qr1hi-8i9sb-fm2n3b1w0l6bskg) is Complete
199 2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Overall process status is success
202 "path": "keep:54325254b226664960de07b3b9482349+154/HWI-ST1027_129_D0THKACXX.1_1.sam",
203 "checksum": "sha1$0dc46a3126d0b5d4ce213b5f0e86e2d05a54755a",
211 h2(#migrate). Differences in running CWL on the legacy jobs API vs containers API
213 Most users can ignore this section.
215 When migrating your Arvados cluster from using the jobs API (--api=jobs) (sometimes referred to as "crunch v1") to the containers API (--api=containers) ("crunch v2") there are a few differences in behavior:
217 A tool may fail to find an input file that could be found when run under the jobs API. This is because tools are limited to accessing collections explicitly listed in the input, and further limited to those individual files or subdirectories that are listed. For example, given an explicit file input @/dir/subdir/file1.txt@, a tool will not be allowed to implicitly access a file in the parent directory @/dir/file2.txt@. Use @secondaryFiles@ or a @Directory@ for files that need to be grouped together.
219 A tool may fail when attempting to rename or delete a file in the output directory. This may happen because files listed in @InitialWorkDirRequirement@ appear in the output directory as normal files (not symlinks) but cannot be moved, renamed or deleted unless marked as "writable" in CWL. These files will be added to the output collection but without any additional copies of the underlying data.
221 A tool may fail when attempting to access the network. This may happen because, unlike the jobs API, under the containers API network access is disabled by default. Tools which require network access should add @arv:APIRequirement: {}@ to the @requirements@ section.