X-Git-Url: https://git.arvados.org/arvados.git/blobdiff_plain/d071c34ca20aa86a5a053abcffb7414dbd8f4933..2d4b14cf28c71d3aa7e0417a2951806c15e29fb1:/doc/user/topics/arv-run.html.textile.liquid diff --git a/doc/user/topics/arv-run.html.textile.liquid b/doc/user/topics/arv-run.html.textile.liquid index 0d7d8c1643..93fc2c0f34 100644 --- a/doc/user/topics/arv-run.html.textile.liquid +++ b/doc/user/topics/arv-run.html.textile.liquid @@ -4,7 +4,11 @@ navsection: userguide title: "Using arv-run" ... -The @arv-run@ command enables you create Arvados pipelines at the command line that fan out to multiple concurrent tasks across Arvado compute nodes. +{% include 'crunch1only_begin' %} +On those sites, the features described here are not yet implemented. +{% include 'crunch1only_end' %} + +The @arv-run@ command enables you create Arvados pipelines at the command line that fan out to multiple concurrent tasks across Arvados compute nodes. {% include 'tutorial_expectations' %} @@ -24,16 +28,18 @@ HWI-ST1027_129_D0THKACXX.1_1.fastq:30:ATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACT $ arv-run grep -H -n ATTGGAGGAAAGATGAGTGAC HWI-ST1027_129_D0THKACXX.1_1.fastq Running pipeline qr1hi-d1hrv-mg3bju0u7r6w241 [...] -Thu Oct 16 17:30:41 2014 qr1hi-8i9sb-8wdaabnughiolpy 13541 0 stderr run-command: grep -H -n ATTGGAGGAAAGATGAGTGAC /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq -Thu Oct 16 17:30:41 2014 qr1hi-8i9sb-8wdaabnughiolpy 13541 0 stderr /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq:14:TCTGGCCCCTGTTGTCTGCATGTAACTTAATACCACAACCAGGCATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCCCAACCTA -Thu Oct 16 17:30:41 2014 qr1hi-8i9sb-8wdaabnughiolpy 13541 0 stderr /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq:18:AACCAGGCATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCACAACCTAGGCCAGTAAGTAGTGCTTGTGCTCATCTCCTTGGCT -Thu Oct 16 17:30:41 2014 qr1hi-8i9sb-8wdaabnughiolpy 13541 0 stderr /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq:30:ATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCACAACCTAGGCCAGTAAGTAGTGCTTGTGCTCATCTCCTTGGCTGTGATACG -Thu Oct 16 17:30:42 2014 qr1hi-8i9sb-8wdaabnughiolpy 13541 0 stderr run-command: completed with exit code 0 (success) + 0 stderr run-command: grep -H -n ATTGGAGGAAAGATGAGTGAC /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq + 0 stderr /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq:14:TCTGGCCCCTGTTGTCTGCATGTAACTTAATACCACAACCAGGCATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCCCAACCTA + 0 stderr /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq:18:AACCAGGCATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCACAACCTAGGCCAGTAAGTAGTGCTTGTGCTCATCTCCTTGGCT + 0 stderr /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq:30:ATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCACAACCTAGGCCAGTAAGTAGTGCTTGTGCTCATCTCCTTGGCTGTGATACG + 0 stderr run-command: completed with exit code 0 (success) [...] -A key feature of @arv-run@ is the ability to introspect the command line to determine which arguments are file inputs, and transform those paths so they are usable inside the Arvados container. In the above example, @HWI-ST1027_129_D0THKACXX.1_2.fastq@ is transformed into @/keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq@. In the above example, @arv-run@ works together with @arv-mount@ to identify that the file is already part of an Arvados collection. In this case, it will use the existing collection without any upload step. If you specify a file that is only available on the local filesystem, @arv-run@ will upload a new collection and use that. +A key feature of @arv-run@ is the ability to introspect the command line to determine which arguments are file inputs, and transform those paths so they are usable inside the Arvados container. In the above example, @HWI-ST1027_129_D0THKACXX.1_2.fastq@ is transformed into @/keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq@. @arv-run@ also works together with @arv-mount@ to identify that the file is already part of an Arvados collection. In this case, it will use the existing collection without any upload step. If you specify a file that is only available on the local filesystem, @arv-run@ will upload a new collection. + +If you find that @arv-run@ is incorrectly rewriting one of your command line arguments, place a backslash @\@ at the beginning of the affected argument to quote it (suppress rewriting). h2. Parallel tasks @@ -41,61 +47,112 @@ h2. Parallel tasks
+$ cd ~/keep/by_id/3229739b505d2b878b62aed09895a55a+142
+$ ls *.fastq
 HWI-ST1027_129_D0THKACXX.1_1.fastq  HWI-ST1027_129_D0THKACXX.1_2.fastq
 $ arv-run grep -H -n ATTGGAGGAAAGATGAGTGAC -- *.fastq
 Running pipeline qr1hi-d1hrv-mg3bju0u7r6w241
 [...]
-Thu Oct 16 19:27:42 2014 qr1hi-8i9sb-r0n6w78aq0knsoj 2331 0 stderr run-command: parallelizing on input0 with items [u'/keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq', u'/keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_2.fastq']
+ 0 stderr run-command: parallelizing on input0 with items [u'/keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq', u'/keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_2.fastq']
 [...]
-Thu Oct 16 19:27:45 2014 qr1hi-8i9sb-r0n6w78aq0knsoj 2331 1 stderr run-command: grep -H -n ATTGGAGGAAAGATGAGTGAC /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq
-Thu Oct 16 19:27:46 2014 qr1hi-8i9sb-r0n6w78aq0knsoj 2331 2 stderr run-command: grep -H -n ATTGGAGGAAAGATGAGTGAC /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_2.fastq
+ 1 stderr run-command: grep -H -n ATTGGAGGAAAGATGAGTGAC /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq
+ 2 stderr run-command: grep -H -n ATTGGAGGAAAGATGAGTGAC /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_2.fastq
 [...]
-Thu Oct 16 19:27:46 2014 qr1hi-8i9sb-r0n6w78aq0knsoj 2331 1 stderr /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq:14:TCTGGCCCCTGTTGTCTGCATGTAACTTAATACCACAACCAGGCATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCCCAACCTA
-Thu Oct 16 19:27:46 2014 qr1hi-8i9sb-r0n6w78aq0knsoj 2331 1 stderr /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq:18:AACCAGGCATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCACAACCTAGGCCAGTAAGTAGTGCTTGTGCTCATCTCCTTGGCT
-Thu Oct 16 19:27:46 2014 qr1hi-8i9sb-r0n6w78aq0knsoj 2331 1 stderr /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq:30:ATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCACAACCTAGGCCAGTAAGTAGTGCTTGTGCTCATCTCCTTGGCTGTGATACG
-Thu Oct 16 19:27:47 2014 qr1hi-8i9sb-r0n6w78aq0knsoj 2331 1 stderr run-command: completed with exit code 0 (success)
-Thu Oct 16 19:27:47 2014 qr1hi-8i9sb-r0n6w78aq0knsoj 2331 2 stderr /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_2.fastq:34:CTGGCCCCTGTTGTCTGCATGTAACTTAATACCACAACCAGGCATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCACAACCTAG
-Thu Oct 16 19:27:47 2014 qr1hi-8i9sb-r0n6w78aq0knsoj 2331 2 stderr run-command: completed with exit code 0 (success)
+ 1 stderr /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq:14:TCTGGCCCCTGTTGTCTGCATGTAACTTAATACCACAACCAGGCATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCCCAACCTA
+ 1 stderr /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq:18:AACCAGGCATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCACAACCTAGGCCAGTAAGTAGTGCTTGTGCTCATCTCCTTGGCT
+ 1 stderr /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq:30:ATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCACAACCTAGGCCAGTAAGTAGTGCTTGTGCTCATCTCCTTGGCTGTGATACG
+ 1 stderr run-command: completed with exit code 0 (success)
+ 2 stderr /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_2.fastq:34:CTGGCCCCTGTTGTCTGCATGTAACTTAATACCACAACCAGGCATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCACAACCTAG
+ 2 stderr run-command: completed with exit code 0 (success)
 
-You may use also stdin @<@ redirection on multiple files. This will create a separate task for each input file. Because the syntax is designed to mimic standard shell syntax, it is necessary to quote the metacharacters @<@, @>@ and @|@ as either @\<@, @\>@ and @\|@ or @'<'@, @'>'@ and @'|'@. +You may specify @--batch-size N@ (or the short form @-bN@) after the @--@ but before listing any files to specify how many files to provide put on the command line for each task. See "Putting it all together" below for an example. - -
-$ arv-run grep -H -n ATTGGAGGAAAGATGAGTGAC \< *.fastq \> output.txt
-
-
+h2. Redirection -You are only permitted to supply a single file name for stdout @>@ redirection. If there are multiple tasks, their output will be collated at the end of the pipeline. Alternately, you may use "run-command":run-command.html parameter substitution in the file name to generate different filenames for each task. +You may use standard input (@<@) and standard output (@>@) redirection. This will create a separate task for each file listed in standard input. You are only permitted to supply a single file name for stdout @>@ redirection. If there are multiple tasks with their output sent to the same file, the output will be collated at the end of the pipeline. -Multiple commands connected by pipes all execute in the same container. If you need to capture intermediate results of a pipe, use the @tee@ command. +(Note: because the syntax is designed to mimic standard shell syntax, it is necessary to quote the metacharacters @<@, @>@ and @|@ as either @\<@, @\>@ and @\|@ or @'<'@, @'>'@ and @'|'@.) -@arv-run@ commands always run inside a Docker image. By default, this is "arvados/jobs". Use @arv --docker-image IMG@ to specify the image to use. Note: the Docker image must be uploaded to Arvados using @arv keep docker@. +{% include 'arv_run_redirection' %} -Use @arv-run --dry-run@ to print out the final Arvados pipeline generated by @arv-run@ without submitting it. +You may use "run-command":run-command.html parameter substitution in the output file name to generate different filenames for each task: -By default, the pipeline will be submitted to your configured Arvado instance. Use @arv-run --local@ to run the command locally using "arv-crunch-job". + +
+$ cd ~/keep/by_id/3229739b505d2b878b62aed09895a55a+142
+$ ls *.fastq
+$ arv-run grep -H -n ATTGGAGGAAAGATGAGTGAC \< *.fastq \> '$(task.uuid).txt'
+[...]
+ 1 stderr run-command: grep -H -n ATTGGAGGAAAGATGAGTGAC < /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq > qr1hi-ot0gb-hmmxf2zubfpmhfk.txt
+ 2 stderr run-command: grep -H -n ATTGGAGGAAAGATGAGTGAC < /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_2.fastq > qr1hi-ot0gb-iu2xgy4hkx4mmri.txt
+ 1 stderr run-command: completed with exit code 0 (success)
+ 1 stderr run-command: the following output files will be saved to keep:
+ 1 stderr run-command:          363 ./qr1hi-ot0gb-hmmxf2zubfpmhfk.txt
+ 1 stderr run-command: start writing output to keep
+ 1 stderr upload wrote 363 total 363
+ 2 stderr run-command: completed with exit code 0 (success)
+ 2 stderr run-command: the following output files will be saved to keep:
+ 2 stderr run-command:          121 ./qr1hi-ot0gb-iu2xgy4hkx4mmri.txt
+ 2 stderr run-command: start writing output to keep
+ 2 stderr upload wrote 121 total 121
+[...]
+
+
-You may specify @--batch-size N@ after the @--@ but before listing any files to specify how many files to provide put on the command line for each task. +h2. Pipes -h1. Examples +Multiple commands may be connected by pipes and execute in the same container: -Run one @grep@ task per file, with each input files piped from stdin. Redirect the output to output.txt. + +
+$ cd ~/keep/by_id/3229739b505d2b878b62aed09895a55a+142
+$ ls *.fastq
+$ arv-run cat -- *.fastq \| grep -H -n ATTGGAGGAAAGATGAGTGAC \> output.txt
+[...]
+ 1 stderr run-command: cat /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq | grep -H -n ATTGGAGGAAAGATGAGTGAC > output.txt
+ 2 stderr run-command: cat /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_2.fastq | grep -H -n ATTGGAGGAAAGATGAGTGAC > output.txt
+[...]
+
+
+If you need to capture intermediate results of a pipe, use the @tee@ command. -Run @cat | grep@ once per file. Redirect the output to output.txt. +h2. Running a shell script
-$ arv-run cat -- *.fastq \| grep -H -n ATTGGAGGAAAGATGAGTGAC \> output.txt
+$ echo 'echo hello world' > hello.sh
+$ arv-run /bin/sh hello.sh
+Upload local files: "hello.sh"
+Uploaded to qr1hi-4zz18-23u3hxugbm71qmn
+Running pipeline qr1hi-d1hrv-slcnhq5czo764b1
+[...]
+ 0 stderr run-command: /bin/sh /keep/5d3a4131b7d8f233f2a917d8a5c3c2b2+52/hello.sh
+ 0 stderr hello world
+ 0 stderr run-command: completed with exit code 0 (success)
+[...]
 
-Run @bwa@ for pairs of fastq files in "inputs" using the reference human_g1k_v37.fasta. +h2. Additional options + +* @--docker-image IMG@ : By default, commands run based in a container created from the @default_docker_image_for_jobs@ setting on the API server. Use this option to specify a different image to use. Note: the Docker image must be uploaded to Arvados using @arv keep docker@. +* @--dry-run@ : Print out the final Arvados pipeline generated by @arv-run@ without submitting it. +* @--local@ : By default, the pipeline will be submitted to your configured Arvados instance. Use this option to run the command locally using @arv-run-pipeline-instance --run-jobs-here@. +* @--ignore-rcode@ : Some commands use non-zero exit codes to indicate nonfatal conditions (e.g. @grep@ returns 1 when no match is found). Set this to indicate that commands that return non-zero return codes should not be considered failed. +* @--no-wait@ : Do not wait and display logs after submitting command, just exit. + +h2. Putting it all together: bwa mem
-arv-run --docker-image arvados/jobs-java-bwa-samtools bwa mem reference/human_g1k_v37.fasta -- --batch-size 2 inputs/*.fastq \> '$(task.uuid).sam'
+$ cd ~/keep/by_id/d0136bc494c21f79fc1b6a390561e6cb+2778
+$ arv-run --docker-image arvados/jobs-java-bwa-samtools bwa mem ../3514b8e5da0e8d109946bc809b20a78a+5698/human_g1k_v37.fasta -- --batch-size 2 *.fastq.gz \> '$(task.uuid).sam'
+ 0 stderr run-command: parallelizing on input0 with items [[u'/keep/d0136bc494c21f79fc1b6a390561e6cb+2778/HWI-ST1027_129_D0THKACXX.1_1.fastq.gz', u'/keep/d0136bc494c21f79fc1b6a390561e6cb+2778/HWI-ST1027_129_D0THKACXX.1_2.fastq.gz'], [u'/keep/d0136bc494c21f79fc1b6a390561e6cb+2778/HWI-ST1027_129_D0THKACXX.2_1.fastq.gz', u'/keep/d0136bc494c21f79fc1b6a390561e6cb+2778/HWI-ST1027_129_D0THKACXX.2_2.fastq.gz']]
+[...]
+ 1 stderr run-command: bwa mem /keep/3514b8e5da0e8d109946bc809b20a78a+5698/human_g1k_v37.fasta /keep/d0136bc494c21f79fc1b6a390561e6cb+2778/HWI-ST1027_129_D0THKACXX.1_1.fastq.gz /keep/d0136bc494c21f79fc1b6a390561e6cb+2778/HWI-ST1027_129_D0THKACXX.1_2.fastq.gz > qr1hi-ot0gb-a4bzzyqqz4ubair.sam
+ 2 stderr run-command: bwa mem /keep/3514b8e5da0e8d109946bc809b20a78a+5698/human_g1k_v37.fasta /keep/d0136bc494c21f79fc1b6a390561e6cb+2778/HWI-ST1027_129_D0THKACXX.2_1.fastq.gz /keep/d0136bc494c21f79fc1b6a390561e6cb+2778/HWI-ST1027_129_D0THKACXX.2_2.fastq.gz > qr1hi-ot0gb-14j9ncw0ymkxq0v.sam