--- layout: default navsection: userguide title: "Using arv-run" ... The @arv-run@ command enables you create Arvados pipelines at the command line that fan out to multiple concurrent tasks across Arvado compute nodes. {% include 'tutorial_expectations' %} h1. Usage Using @arv-run@ you can write and test command lines interactively, then insert @arv-run@ at the beginning of the command line to run the command on Arvados. For example:
$ cd ~/keep/by_id/3229739b505d2b878b62aed09895a55a+142
$ ls *.fastq
HWI-ST1027_129_D0THKACXX.1_1.fastq  HWI-ST1027_129_D0THKACXX.1_2.fastq
$ grep -H -n ATTGGAGGAAAGATGAGTGAC HWI-ST1027_129_D0THKACXX.1_1.fastq
HWI-ST1027_129_D0THKACXX.1_1.fastq:14:TCTGGCCCCTGTTGTCTGCATGTAACTTAATACCACAACCAGGCATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCCCAACCTA
HWI-ST1027_129_D0THKACXX.1_1.fastq:18:AACCAGGCATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCACAACCTAGGCCAGTAAGTAGTGCTTGTGCTCATCTCCTTGGCT
HWI-ST1027_129_D0THKACXX.1_1.fastq:30:ATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCACAACCTAGGCCAGTAAGTAGTGCTTGTGCTCATCTCCTTGGCTGTGATACG
$ arv-run grep -H -n ATTGGAGGAAAGATGAGTGAC HWI-ST1027_129_D0THKACXX.1_1.fastq
Running pipeline qr1hi-d1hrv-mg3bju0u7r6w241
[...]
Thu Oct 16 17:30:41 2014 qr1hi-8i9sb-8wdaabnughiolpy 13541 0 stderr run-command: grep -H -n ATTGGAGGAAAGATGAGTGAC /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq
Thu Oct 16 17:30:41 2014 qr1hi-8i9sb-8wdaabnughiolpy 13541 0 stderr /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq:14:TCTGGCCCCTGTTGTCTGCATGTAACTTAATACCACAACCAGGCATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCCCAACCTA
Thu Oct 16 17:30:41 2014 qr1hi-8i9sb-8wdaabnughiolpy 13541 0 stderr /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq:18:AACCAGGCATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCACAACCTAGGCCAGTAAGTAGTGCTTGTGCTCATCTCCTTGGCT
Thu Oct 16 17:30:41 2014 qr1hi-8i9sb-8wdaabnughiolpy 13541 0 stderr /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq:30:ATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCACAACCTAGGCCAGTAAGTAGTGCTTGTGCTCATCTCCTTGGCTGTGATACG
Thu Oct 16 17:30:42 2014 qr1hi-8i9sb-8wdaabnughiolpy 13541 0 stderr run-command: completed with exit code 0 (success)
[...]
A key feature of @arv-run@ is the ability to introspect the command line to determine which arguments are file inputs, and transform those paths so they are usable inside the Arvados container. In the above example, @HWI-ST1027_129_D0THKACXX.1_2.fastq@ is transformed into @/keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq@. In the above example, @arv-run@ works together with @arv-mount@ to identify that the file is already part of an Arvados collection. In this case, it will use the existing collection without any upload step. If you specify a file that is only available on the local filesystem, @arv-run@ will upload a new collection and use that. h2. Parallel tasks @arv-run@ will parallelize over files listed on the command line after @--@.
HWI-ST1027_129_D0THKACXX.1_1.fastq  HWI-ST1027_129_D0THKACXX.1_2.fastq
$ arv-run grep -H -n ATTGGAGGAAAGATGAGTGAC -- *.fastq
Running pipeline qr1hi-d1hrv-mg3bju0u7r6w241
[...]
Thu Oct 16 19:27:42 2014 qr1hi-8i9sb-r0n6w78aq0knsoj 2331 0 stderr run-command: parallelizing on input0 with items [u'/keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq', u'/keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_2.fastq']
[...]
Thu Oct 16 19:27:45 2014 qr1hi-8i9sb-r0n6w78aq0knsoj 2331 1 stderr run-command: grep -H -n ATTGGAGGAAAGATGAGTGAC /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq
Thu Oct 16 19:27:46 2014 qr1hi-8i9sb-r0n6w78aq0knsoj 2331 2 stderr run-command: grep -H -n ATTGGAGGAAAGATGAGTGAC /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_2.fastq
[...]
Thu Oct 16 19:27:46 2014 qr1hi-8i9sb-r0n6w78aq0knsoj 2331 1 stderr /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq:14:TCTGGCCCCTGTTGTCTGCATGTAACTTAATACCACAACCAGGCATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCCCAACCTA
Thu Oct 16 19:27:46 2014 qr1hi-8i9sb-r0n6w78aq0knsoj 2331 1 stderr /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq:18:AACCAGGCATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCACAACCTAGGCCAGTAAGTAGTGCTTGTGCTCATCTCCTTGGCT
Thu Oct 16 19:27:46 2014 qr1hi-8i9sb-r0n6w78aq0knsoj 2331 1 stderr /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_1.fastq:30:ATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCACAACCTAGGCCAGTAAGTAGTGCTTGTGCTCATCTCCTTGGCTGTGATACG
Thu Oct 16 19:27:47 2014 qr1hi-8i9sb-r0n6w78aq0knsoj 2331 1 stderr run-command: completed with exit code 0 (success)
Thu Oct 16 19:27:47 2014 qr1hi-8i9sb-r0n6w78aq0knsoj 2331 2 stderr /keep/3229739b505d2b878b62aed09895a55a+142/HWI-ST1027_129_D0THKACXX.1_2.fastq:34:CTGGCCCCTGTTGTCTGCATGTAACTTAATACCACAACCAGGCATAGGGGAAAGATTGGAGGAAAGATGAGTGACAGCATCAACTTCTCTCACAACCTAG
Thu Oct 16 19:27:47 2014 qr1hi-8i9sb-r0n6w78aq0knsoj 2331 2 stderr run-command: completed with exit code 0 (success)
You may use also stdin @<@ redirection on multiple files. This will create a separate task for each input file. Because the syntax is designed to mimic standard shell syntax, it is necessary to quote the metacharacters @<@, @>@ and @|@ as either @\<@, @\>@ and @\|@ or @'<'@, @'>'@ and @'|'@.
$ arv-run grep -H -n ATTGGAGGAAAGATGAGTGAC \< *.fastq \> output.txt
You are only permitted to supply a single file name for stdout @>@ redirection. If there are multiple tasks, their output will be collated at the end of the pipeline. Alternately, you may use "run-command":run-command.html parameter substitution in the file name to generate different filenames for each task. Multiple commands connected by pipes all execute in the same container. If you need to capture intermediate results of a pipe, use the @tee@ command. @arv-run@ commands always run inside a Docker image. By default, this is "arvados/jobs". Use @arv --docker-image IMG@ to specify the image to use. Note: the Docker image must be uploaded to Arvados using @arv keep docker@. Use @arv-run --dry-run@ to print out the final Arvados pipeline generated by @arv-run@ without submitting it. By default, the pipeline will be submitted to your configured Arvado instance. Use @arv-run --local@ to run the command locally using "arv-crunch-job". You may specify @--batch-size N@ after the @--@ but before listing any files to specify how many files to provide put on the command line for each task. h1. Examples Run one @grep@ task per file, with each input files piped from stdin. Redirect the output to output.txt. Run @cat | grep@ once per file. Redirect the output to output.txt.
$ arv-run cat -- *.fastq \| grep -H -n ATTGGAGGAAAGATGAGTGAC \> output.txt
Run @bwa@ for pairs of fastq files in "inputs" using the reference human_g1k_v37.fasta.
arv-run --docker-image arvados/jobs-java-bwa-samtools bwa mem reference/human_g1k_v37.fasta -- --batch-size 2 inputs/*.fastq \> '$(task.uuid).sam'