X-Git-Url: https://git.arvados.org/arvados.git/blobdiff_plain/2183113c4c357e07719251854e3d249cdcd394dd..91e4be41d77d023a06d53879b974ecf5d3b82d94:/doc/user/tutorials/tutorial-new-pipeline.textile
diff --git a/doc/user/tutorials/tutorial-new-pipeline.textile b/doc/user/tutorials/tutorial-new-pipeline.textile
index 6dbafbc627..3eb9796267 100644
--- a/doc/user/tutorials/tutorial-new-pipeline.textile
+++ b/doc/user/tutorials/tutorial-new-pipeline.textile
@@ -1,15 +1,16 @@
 ---
 layout: default
 navsection: userguide
-title: "Construct a pipeline"
-navorder: 115
----
+navmenu: Tutorials
+title: "Constructing a Crunch pipeline"
+navorder: 15
+...
 
-h1. Tutorial: Construct a pipeline
+h1. Tutorial: Constructing a Crunch pipeline
 
-A pipeline in Arvados is a sequence of crunch scripts, in which the output from the previous script is fed in as the input to the next script.
+A pipeline in Arvados is a collection of crunch scripts, in which the output from one script may be used as the input to another script.
 
-*This tutorial assumes that you are "logged into an Arvados VM instance":ssh-access.html#login, and have a "working environment.":check-environment.html*
+*This tutorial assumes that you are "logged into an Arvados VM instance":{{site.basedoc}}/user/getting_started/ssh-access.html#login, and have a "working environment.":{{site.basedoc}}/user/getting_started/check-environment.html*
 
 h2. Create a new script
 
@@ -61,9 +62,7 @@ EOF
 * @"components"@ is a set of scripts that make up the pipeline
 * Each component is listed with a human-readable name (@"do_hash"@ and @"filter"@ in this example)
 * Each item in @"components"@ is a single Arvados job, and uses the same format that we saw previously with @arv job create@
-* @"output_of"@ indicates that the @"input"@ of @"filter"@ is the @"output"@ of the @"do_hash"@ component
-
-The @"output_of"@ specifies a _dependency_. Arvados uses the dependencies between jobs to automatically determine the correct order to run the jobs.
+* @"output_of"@ indicates that the @"input"@ of @"filter"@ is the @"output"@ of the @"do_hash"@ component. This is a _dependency_. Arvados uses the dependencies between jobs to automatically determine the correct order to run the jobs.
 
 Now, use @arv pipeline_template create@ tell Arvados about your pipeline template:
 
@@ -100,8 +99,7 @@ Arvados adds each pipeline component to the job queue as its dependencies are sa
 
 The Keep locators of the output of each of @"do_hash"@ and @"filter"@ component are available from the output log shown above. The output is also available on the Workbench by navigating to %(rarr)→% Compute %(rarr)→% Pipeline instances %(rarr)→% pipeline uuid under the *id* column %(rarr)→% components.
 
-

-$ arv keep get e2ccd204bca37c77c0ba59fc470cd0f7+162+K@qr1hi/md5sum.txt
+
$ arv keep get e2ccd204bca37c77c0ba59fc470cd0f7+162+K@qr1hi/md5sum.txt
 0f1d6bcf55c34bed7f92a805d2d89bbf alice.txt
 504938460ef369cd275e4ef58994cffe bob.txt
 8f3b36aff310e06f3c5b9e95678ff77a carol.txt
@@ -117,8 +115,7 @@ h3. Running a pipeline with different parameters
 Notice that the pipeline definition explicitly specifies the Keep locator for the input:
 
 
-

-...
+
...
     "do_hash":{
       "script_parameters":{
         "input": "887cd41e9c613463eab2f0d885c6dd96+83"
@@ -130,17 +127,32 @@ Notice that the pipeline definition explicitly specifies the Keep locator for th
 
 What if we want to run the pipeline on a different input block?  One option is to define a new pipeline template, but would potentially result in clutter with many pipeline templates defined for one-off jobs.  Instead, you can override values in the input of the component like this:
 
-

-$ arv pipeline run --template qr1hi-p5p6p-uf9gi9nolgakm85 do_hash::input=33a9f3842b01ea3fdf27cc582f5ea2af
+
+
$ arv pipeline run --template qr1hi-d1hrv-vxzkp38nlde9yyr do_hash::input=33a9f3842b01ea3fdf27cc582f5ea2af
+2013-12-17 20:31:24 +0000 -- pipeline_instance qr1hi-d1hrv-tlkq20687akys8e
+do_hash qr1hi-8i9sb-rffhuay4jryl2n2 queued 2013-12-17T20:31:24Z
+filter  -                           -
+2013-12-17 20:31:34 +0000 -- pipeline_instance qr1hi-d1hrv-tlkq20687akys8e
+do_hash qr1hi-8i9sb-rffhuay4jryl2n2 {:done=>1, :running=>1, :failed=>0, :todo=>0}
+filter  -                           -
+2013-12-17 20:31:44 +0000 -- pipeline_instance qr1hi-d1hrv-tlkq20687akys8e
+do_hash qr1hi-8i9sb-rffhuay4jryl2n2 {:done=>1, :running=>1, :failed=>0, :todo=>0}
+filter  -                           -
+2013-12-17 20:31:55 +0000 -- pipeline_instance qr1hi-d1hrv-tlkq20687akys8e
+do_hash qr1hi-8i9sb-rffhuay4jryl2n2 880b55fb4470b148a447ff38cacdd952+54+K@qr1hi
+filter  qr1hi-8i9sb-j347g1sqovdh0op queued 2013-12-17T20:31:55Z
+2013-12-17 20:32:05 +0000 -- pipeline_instance qr1hi-d1hrv-tlkq20687akys8e
+do_hash qr1hi-8i9sb-rffhuay4jryl2n2 880b55fb4470b148a447ff38cacdd952+54+K@qr1hi
+filter  qr1hi-8i9sb-j347g1sqovdh0op fb728f0ffe152058fa64b9aeed344cb5+54
 
+Now check the output:
+
-

-$ arv keep get 880b55fb4470b148a447ff38cacdd952+54+K@qr1hi/md5sum.txt
-44b8ae3fde7a8a88d2f7ebd237625b4f var-GS000016015-ASM.tsv.bz2
-$ arv keep get fb728f0ffe152058fa64b9aeed344cb5+54
+
$ arv keep ls -s fb728f0ffe152058fa64b9aeed344cb5+54
+0 0-filter.txt
 
-Since the hash of @var-GS000016015-ASM.tsv.bz2@ does not start with 0, the filter script has no output in this pipeline instance.
+Here the filter script output is empty, which means that none of the files in the collection have a hash that starts with 0.