-~$ git clone https://github.com/curoverse/arvados
-~$ cd arvados/doc/user/cwl/bwa-mem
+~/arvados/doc/user/cwl/bwa-mem$ arvados-cwl-runner bwa-mem.cwl bwa-mem-input.yml
+arvados-cwl-runner 1.0.20160628195002, arvados-python-client 0.1.20160616015107, cwltool 1.0.20160629140624
+2016-06-30 14:56:36 arvados.arv-run[27002] INFO: Upload local files: "bwa-mem.cwl"
+2016-06-30 14:56:36 arvados.arv-run[27002] INFO: Uploaded to zzzzz-4zz18-h7ljh5u76760ww2
+2016-06-30 14:56:40 arvados.cwl-runner[27002] INFO: Submitted job zzzzz-8i9sb-fm2n3b1w0l6bskg
+2016-06-30 14:56:41 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (zzzzz-8i9sb-fm2n3b1w0l6bskg) is Running
+2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (zzzzz-8i9sb-fm2n3b1w0l6bskg) is Complete
+2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Overall process status is success
+{
+ "aligned_sam": {
+ "location": "keep:54325254b226664960de07b3b9482349+154/HWI-ST1027_129_D0THKACXX.1_1.sam",
+ "checksum": "sha1$0dc46a3126d0b5d4ce213b5f0e86e2d05a54755a",
+ "class": "File",
+ "size": 30738986
+ }
+}
-The tutorial data is hosted on "https://cloud.curoverse.com":https://cloud.curoverse.com (also referred to by the identifier *qr1hi*). If you are using a different Arvados instance, you may need to copy the data to your own instance. The easiest way to do this is with "arv-copy":{{site.baseurl}}/user/topics/arv-copy.html (this requires signing up for a free cloud.curoverse.com account).
+h3. Referencing files
+
+When running a workflow on an Arvados cluster, the input files must be stored in Keep. There are several ways this can happen.
+
+A URI reference to Keep uses the @keep:@ scheme followed by either the portable data hash or UUID of the collection and then the location of the file inside the collection. For example, @keep:2463fa9efeb75e099685528b3b9071e0+438/19.fasta.bwt@ or @keep:zzzzz-4zz18-zzzzzzzzzzzzzzz/19.fasta.bwt@.
+
+If you reference a file in "arv-mount":{{site.baseurl}}/user/tutorials/tutorial-keep-mount-gnu-linux.html, such as @/home/example/keep/by_id/2463fa9efeb75e099685528b3b9071e0+438/19.fasta.bwt@, then @arvados-cwl-runner@ will automatically determine the appropriate Keep URI reference.
+
+If you reference a local file which is not in @arv-mount@, then @arvados-cwl-runner@ will upload the file to Keep and use the Keep URI reference from the upload.
+
+You can also execute CWL files that have been uploaded Keep:
-~$ arv-copy --src qr1hi --dst settings 2463fa9efeb75e099685528b3b9071e0+438
-~$ arv-copy --src qr1hi --dst settings ae480c5099b81e17267b7445e35b4bc7+180
+
+~/arvados/doc/user/cwl/bwa-mem$ arv-put --portable-data-hash --name "bwa-mem.cwl" bwa-mem.cwl
+2020-08-20 13:40:02 arvados.arv_put[12976] INFO: Collection saved as 'bwa-mem.cwl'
+f141fc27e7cfa7f7b6d208df5e0ee01b+59
+~/arvados/doc/user/cwl/bwa-mem$ arvados-cwl-runner keep:f141fc27e7cfa7f7b6d208df5e0ee01b+59/bwa-mem.cwl bwa-mem-input.yml
+arvados-cwl-runner 1.0.20160628195002, arvados-python-client 0.1.20160616015107, cwltool 1.0.20160629140624
+2016-06-30 14:56:36 arvados.arv-run[27002] INFO: Uploaded to zzzzz-4zz18-h7ljh5u76760ww2
+2016-06-30 14:56:40 arvados.cwl-runner[27002] INFO: Submitted job zzzzz-8i9sb-fm2n3b1w0l6bskg
+2016-06-30 14:56:41 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (zzzzz-8i9sb-fm2n3b1w0l6bskg) is Running
+2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (zzzzz-8i9sb-fm2n3b1w0l6bskg) is Complete
+2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Overall process status is success
+{
+ "aligned_sam": {
+ "location": "keep:54325254b226664960de07b3b9482349+154/HWI-ST1027_129_D0THKACXX.1_1.sam",
+ "checksum": "sha1$0dc46a3126d0b5d4ce213b5f0e86e2d05a54755a",
+ "class": "File",
+ "size": 30738986
+ }
+}
-If you do not wish to create an account on "https://cloud.curoverse.com":https://cloud.curoverse.com, you may download the files anonymously and upload them to your local Arvados instance:
+Note: uploading a workflow file to Keep is _not_ the same as registering the workflow for use in Workbench. See "Registering a workflow to use in Workbench":#registering below.
+
+h3. Work reuse
-"https://cloud.curoverse.com/collections/2463fa9efeb75e099685528b3b9071e0+438":https://cloud.curoverse.com/collections/2463fa9efeb75e099685528b3b9071e0+438
+Workflows submitted with @arvados-cwl-runner@ will take advantage of Arvados job reuse. If you submit a workflow which is identical to one that has run before, it will short cut the execution and return the result of the previous run. This also applies to individual workflow steps. For example, a two step workflow where the first step has run before will reuse results for first step and only execute the new second step. You can disable this behavior with @--disable-reuse@.
-"https://cloud.curoverse.com/collections/ae480c5099b81e17267b7445e35b4bc7+180":https://cloud.curoverse.com/collections/ae480c5099b81e17267b7445e35b4bc7+180
+h3(#docker). Docker images
-h2. Submitting a workflow to an Arvados cluster
+Docker images referenced by the workflow must be uploaded to Arvados. This requires @docker@ to be installed and usable by the user running @arvados-cwl-runner@. If the image is not present in the local Docker instance, @arvados-cwl-runner@ will first attempt to pull the image using @docker pull@, then upload it.
-{% include 'arvados_cwl_runner' %}
+If there is already a Docker image in Arvados with the same name, it will use the existing image. In this case, the submitter will not use Docker.
-h2. Work reuse
+The @--match-submitter-images@ option will check the id of the image in the local Docker instance and compare it to the id of the image already in Arvados with the same name and tag. If they are different, it will choose the image matching the local image id, which will be uploaded it if necessary. This helpful for development, if you locally rebuild the image with the 'latest' tag, the @--match-submitter-images@ will ensure that the newer version is used.
-Workflows submitted with @arvados-cwl-runner@ will take advantage of Arvados job reuse. If you submit a workflow which is identical to one that has run before, it will short cut the execution and return the result of the previous run. This also applies to individual workflow steps. For example, a two step workflow where the first step has run before will reuse results for first step and only execute the new second step. You can disable this behavior with @--disable-reuse@.
+h3(#dependencies). Dependencies
-h2. Referencing files
+Dependencies include collections and Docker images referenced by the workflow. Dependencies are automatically uploaded by @arvados-cwl-runner@ if they are not present or need to be updated. When running a workflow, dependencies that already exist somewhere on the Arvados instance (from a previous upload) will not be uploaded or copied, regardless of the project they are located in. Sometimes this creates problems when sharing a workflow run with others. In this case, use @--copy-deps@ to indicate that you want all dependencies to be copied into the destination project (specified by @--project-uuid@).
-When running a workflow on an Arvados cluster, the input files must be stored in Keep. There are several ways this can happen.
+h3. Command line options
-A URI reference to Keep uses the @keep:@ scheme followed by the portable data hash, collection size, and path to the file inside the collection. For example, @keep:2463fa9efeb75e099685528b3b9071e0+438/19.fasta.bwt@.
+See "arvados-cwl-runner options":{{site.baseurl}}/user/cwl/cwl-run-options.html
-If you reference a file in "arv-mount":{{site.baseurl}}/user/tutorials/tutorial-keep-mount.html, such as @/home/example/keep/by_id/2463fa9efeb75e099685528b3b9071e0+438/19.fasta.bwt@, then @arvados-cwl-runner@ will automatically determine the appropriate Keep URI reference.
+h2(#registering). Registering a workflow to use in Workbench
-If you reference a local file which is not in @arv-mount@, then @arvados-cwl-runner@ will upload the file to Keep and use the Keep URI reference from the upload.
+Use @--create-workflow@ to register a CWL workflow with Arvados. Use @--project-uuid@ to upload the workflow to a specific project, along with its dependencies. You can share the workflow with other Arvados users by sharing that project. You can run the workflow by clicking the @ to update an existing workflow.
+
+
+~/arvados/doc/user/cwl/bwa-mem$ arvados-cwl-runner --update-workflow zzzzz-p5p6p-zuniv58hn8d0qd8 bwa-mem.cwl
+arvados-cwl-runner 1.0.20160628195002, arvados-python-client 0.1.20160616015107, cwltool 1.0.20160629140624
+2016-07-01 12:21:01 arvados.arv-run[15796] INFO: Upload local files: "bwa-mem.cwl"
+2016-07-01 12:21:01 arvados.arv-run[15796] INFO: Uploaded to zzzzz-4zz18-7e0hedrmkuyoei3
+2016-07-01 12:21:01 arvados.cwl-runner[15796] INFO: Created template zzzzz-p5p6p-zuniv58hn8d0qd8
+zzzzz-p5p6p-zuniv58hn8d0qd8
+
+
-h2. Registering a workflow to use in Workbench
+When using @--create-workflow@ or @--update-workflow@, the @--copy-deps@ and @--match-submitter-images@ options are enabled by default.
-Use @--create-workflow@ to register a CWL workflow with Arvados. This enables you to share workflows with other Arvados users, and run them by clicking the Run a process... button on the Workbench Dashboard.
+h3. Running registered workflows at the command line
-{% include 'register_cwl_workflow' %}
+You can run a registered workflow at the command line by its UUID:
-h2. Making workflows directly executable
+
+~/arvados/doc/user/cwl/bwa-mem$ arvados-cwl-runner pirca-7fd4e-3nqbw08vtjl8ybz --help
+INFO /home/peter/work/scripts/venv3/bin/arvados-cwl-runner 2.1.0.dev20200814195416, arvados-python-client 2.1.0.dev20200814195416, cwltool 3.0.20200807132242
+INFO Resolved 'pirca-7fd4e-3nqbw08vtjl8ybz' to 'arvwf:pirca-7fd4e-3nqbw08vtjl8ybz#main'
+usage: pirca-7fd4e-3nqbw08vtjl8ybz [-h] [--PL PL] [--group_id GROUP_ID]
+ [--read_p1 READ_P1] [--read_p2 READ_P2]
+ [--reference REFERENCE]
+ [--sample_id SAMPLE_ID]
+ [job_order]
+
+positional arguments:
+ job_order Job input json file
+
+optional arguments:
+ -h, --help show this help message and exit
+ --PL PL
+ --group_id GROUP_ID
+ --read_p1 READ_P1 The reads, in fastq format.
+ --read_p2 READ_P2 For mate paired reads, the second file (optional).
+ --reference REFERENCE
+ The index files produced by `bwa index`
+ --sample_id SAMPLE_ID
+
+
+
+h2(#executable). Make a workflow file directly executable
You can make a workflow file directly executable (@cwl-runner@ should be an alias to @arvados-cwl-runner@) by adding the following line to the top of the file:
@@ -113,10 +211,10 @@ You can make a workflow file directly executable (@cwl-runner@ should be an alia
~/arvados/doc/user/cwl/bwa-mem$ ./bwa-mem.cwl bwa-mem-input.yml
arvados-cwl-runner 1.0.20160628195002, arvados-python-client 0.1.20160616015107, cwltool 1.0.20160629140624
2016-06-30 14:56:36 arvados.arv-run[27002] INFO: Upload local files: "bwa-mem.cwl"
-2016-06-30 14:56:36 arvados.arv-run[27002] INFO: Uploaded to qr1hi-4zz18-h7ljh5u76760ww2
-2016-06-30 14:56:40 arvados.cwl-runner[27002] INFO: Submitted job qr1hi-8i9sb-fm2n3b1w0l6bskg
-2016-06-30 14:56:41 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (qr1hi-8i9sb-fm2n3b1w0l6bskg) is Running
-2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (qr1hi-8i9sb-fm2n3b1w0l6bskg) is Complete
+2016-06-30 14:56:36 arvados.arv-run[27002] INFO: Uploaded to zzzzz-4zz18-h7ljh5u76760ww2
+2016-06-30 14:56:40 arvados.cwl-runner[27002] INFO: Submitted job zzzzz-8i9sb-fm2n3b1w0l6bskg
+2016-06-30 14:56:41 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (zzzzz-8i9sb-fm2n3b1w0l6bskg) is Running
+2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (zzzzz-8i9sb-fm2n3b1w0l6bskg) is Complete
2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Overall process status is success
{
"aligned_sam": {
@@ -141,10 +239,10 @@ cwl:tool: bwa-mem.cwl
~/arvados/doc/user/cwl/bwa-mem$ ./bwa-mem-input.yml
arvados-cwl-runner 1.0.20160628195002, arvados-python-client 0.1.20160616015107, cwltool 1.0.20160629140624
2016-06-30 14:56:36 arvados.arv-run[27002] INFO: Upload local files: "bwa-mem.cwl"
-2016-06-30 14:56:36 arvados.arv-run[27002] INFO: Uploaded to qr1hi-4zz18-h7ljh5u76760ww2
-2016-06-30 14:56:40 arvados.cwl-runner[27002] INFO: Submitted job qr1hi-8i9sb-fm2n3b1w0l6bskg
-2016-06-30 14:56:41 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (qr1hi-8i9sb-fm2n3b1w0l6bskg) is Running
-2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (qr1hi-8i9sb-fm2n3b1w0l6bskg) is Complete
+2016-06-30 14:56:36 arvados.arv-run[27002] INFO: Uploaded to zzzzz-4zz18-h7ljh5u76760ww2
+2016-06-30 14:56:40 arvados.cwl-runner[27002] INFO: Submitted job zzzzz-8i9sb-fm2n3b1w0l6bskg
+2016-06-30 14:56:41 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (zzzzz-8i9sb-fm2n3b1w0l6bskg) is Running
+2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (zzzzz-8i9sb-fm2n3b1w0l6bskg) is Complete
2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Overall process status is success
{
"aligned_sam": {
@@ -157,75 +255,6 @@ arvados-cwl-runner 1.0.20160628195002, arvados-python-client 0.1.20160616015107,
-h2. Developing workflows
-
-For an introduction and and detailed documentation about writing CWL, see the "CWL User Guide":http://commonwl.org/v1.0/UserGuide.html and the "CWL Specification":http://commonwl.org/v1.0 .
-
-To run on Arvados, a workflow should provide a @DockerRequirement@ in the @hints@ section.
-
-When developing a workflow, it is often helpful to run it on the local host to avoid the overhead of submitting to the cluster. To execute a workflow only on the local host (without submitting jobs to an Arvados cluster) you can use the @cwltool@ command. Note that you must also have the input data accessible on the local host. You can use @arv-get@ to fetch the data from Keep.
-
-
-~/arvados/doc/user/cwl/bwa-mem$ arv-get 2463fa9efeb75e099685528b3b9071e0+438/ .
-156 MiB / 156 MiB 100.0%
-~/arvados/doc/user/cwl/bwa-mem$ arv-get ae480c5099b81e17267b7445e35b4bc7+180/ .
-23 MiB / 23 MiB 100.0%
-~/arvados/doc/user/cwl/bwa-mem$ cwltool bwa-mem-input.yml bwa-mem-input-local.yml
-cwltool 1.0.20160629140624
-[job bwa-mem.cwl] /home/example/arvados/doc/user/cwl/bwa-mem$ docker \
- run \
- -i \
- --volume=/home/example/arvados/doc/user/cwl/bwa-mem/19.fasta.ann:/var/lib/cwl/job979368791_bwa-mem/19.fasta.ann:ro \
- --volume=/home/example/arvados/doc/user/cwl/bwa-mem/HWI-ST1027_129_D0THKACXX.1_1.fastq:/var/lib/cwl/job979368791_bwa-mem/HWI-ST1027_129_D0THKACXX.1_1.fastq:ro \
- --volume=/home/example/arvados/doc/user/cwl/bwa-mem/19.fasta.sa:/var/lib/cwl/job979368791_bwa-mem/19.fasta.sa:ro \
- --volume=/home/example/arvados/doc/user/cwl/bwa-mem/19.fasta.amb:/var/lib/cwl/job979368791_bwa-mem/19.fasta.amb:ro \
- --volume=/home/example/arvados/doc/user/cwl/bwa-mem/19.fasta.pac:/var/lib/cwl/job979368791_bwa-mem/19.fasta.pac:ro \
- --volume=/home/example/arvados/doc/user/cwl/bwa-mem/HWI-ST1027_129_D0THKACXX.1_2.fastq:/var/lib/cwl/job979368791_bwa-mem/HWI-ST1027_129_D0THKACXX.1_2.fastq:ro \
- --volume=/home/example/arvados/doc/user/cwl/bwa-mem/19.fasta.bwt:/var/lib/cwl/job979368791_bwa-mem/19.fasta.bwt:ro \
- --volume=/home/example/arvados/doc/user/cwl/bwa-mem:/var/spool/cwl:rw \
- --volume=/tmp/tmpgzyou9:/tmp:rw \
- --workdir=/var/spool/cwl \
- --read-only=true \
- --log-driver=none \
- --user=1001 \
- --rm \
- --env=TMPDIR=/tmp \
- --env=HOME=/var/spool/cwl \
- biodckr/bwa \
- bwa \
- mem \
- -t \
- 1 \
- -R \
- '@RG ID:arvados_tutorial PL:illumina SM:HWI-ST1027_129' \
- /var/lib/cwl/job979368791_bwa-mem/19.fasta \
- /var/lib/cwl/job979368791_bwa-mem/HWI-ST1027_129_D0THKACXX.1_1.fastq \
- /var/lib/cwl/job979368791_bwa-mem/HWI-ST1027_129_D0THKACXX.1_2.fastq > /home/example/arvados/doc/user/cwl/bwa-mem/HWI-ST1027_129_D0THKACXX.1_1.sam
-[M::bwa_idx_load_from_disk] read 0 ALT contigs
-[M::process] read 100000 sequences (10000000 bp)...
-[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 4745, 1, 0)
-[M::mem_pestat] skip orientation FF as there are not enough pairs
-[M::mem_pestat] analyzing insert size distribution for orientation FR...
-[M::mem_pestat] (25, 50, 75) percentile: (154, 181, 214)
-[M::mem_pestat] low and high boundaries for computing mean and std.dev: (34, 334)
-[M::mem_pestat] mean and std.dev: (185.63, 44.88)
-[M::mem_pestat] low and high boundaries for proper pairs: (1, 394)
-[M::mem_pestat] skip orientation RF as there are not enough pairs
-[M::mem_pestat] skip orientation RR as there are not enough pairs
-[M::mem_process_seqs] Processed 100000 reads in 9.848 CPU sec, 9.864 real sec
-[main] Version: 0.7.12-r1039
-[main] CMD: bwa mem -t 1 -R @RG ID:arvados_tutorial PL:illumina SM:HWI-ST1027_129 /var/lib/cwl/job979368791_bwa-mem/19.fasta /var/lib/cwl/job979368791_bwa-mem/HWI-ST1027_129_D0THKACXX.1_1.fastq /var/lib/cwl/job979368791_bwa-mem/HWI-ST1027_129_D0THKACXX.1_2.fastq
-[main] Real time: 10.061 sec; CPU: 10.032 sec
-Final process status is success
-{
- "aligned_sam": {
- "size": 30738959,
- "path": "/home/example/arvados/doc/user/cwl/bwa-mem/HWI-ST1027_129_D0THKACXX.1_1.sam",
- "checksum": "sha1$0c668cca45fef02397bb5302880526d300ee4dac",
- "class": "File"
- }
-}
-
-
+h2(#setup). Setting up arvados-cwl-runner
-If you get the error @JavascriptException: Long-running script killed after 20 seconds.@ this may be due to the Dockerized Node.js engine taking too long to start. You may address this by installing Node.js locally (run @apt-get install nodejs@ on Debian or Ubuntu) or by specifying a longer timeout with the @--eval-timeout@ option. For example, run the workflow with @cwltool --eval-timeout=40@ for a 40-second timeout.
+See "Arvados CWL Runner":{{site.baseurl}}/sdk/python/arvados-cwl-runner.html