The tutorial encourages users to copy and paste command text.
However, most of the JSON needs to have a user-specific value for
"repository", which is easy to overlook. Since we're telling users to
save JSON with @cat@, we can take advantage of shell variable
expansion to make the right thing happen automatically.
I added accompanying notes to help explain what's going on for people
who aren't copying instructions so literally.
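For illustration, a minimal sketch of the heredoc behavior the tutorial relies on (the file name here is hypothetical): with an unquoted delimiter, the shell expands @$USER@ before the JSON is written to disk.

```shell
# Unquoted EOF delimiter: the shell performs parameter expansion
# inside the heredoc, so $USER is replaced before the file is saved.
cat > the_pipeline <<EOF
{
  "repository": "$USER"
}
EOF

# The saved file contains the actual login name,
# not the literal string $USER.
cat the_pipeline
```

(Quoting the delimiter as @<<"EOF"@ would suppress expansion and save the literal text @$USER@ instead, which is why the tutorial uses the unquoted form.)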
Conflicts:
doc/user/tutorials/tutorial-firstscript.html.textile.liquid
doc/user/tutorials/tutorial-new-pipeline.html.textile.liquid
"script_parameters":{
"input": "887cd41e9c613463eab2f0d885c6dd96+83"
},
- "repository":"<b>you</b>",
+ "repository":"$USER",
"script_version":"master"
},
"filter":{
"output_of":"do_hash"
}
},
- "repository":"<b>you</b>",
+ "repository":"$USER",
"script_version":"master"
}
}
~$ <span class="userinput">arv pipeline_template create --pipeline-template "$(cat the_pipeline)"</span></code></pre>
</notextile>
+(Your shell should automatically fill in @$USER@ with your login name. The JSON that gets saved should have @"repository"@ pointed at your personal git repository.)
+
You can run this pipeline from the command line using @arv pipeline run@, filling in the UUID that you received from @arv pipeline_template create@:
<notextile>
notextile. <pre>~/<b>you</b>/crunch_scripts$ <code class="userinput">nano parallel-hash.py</code></pre>
-Add the following code to compute the md5 hash of each file in a collection:
+Add the following code to compute the md5 hash of each file in a collection:
<notextile> {% code 'parallel_hash_script_py' as python %} </notextile>
<pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">cat >~/the_job <<EOF
{
"script": "parallel-hash.py",
- "repository": "<b>you</b>",
+ "repository": "$USER",
"script_version": "master",
"script_parameters":
{
</code></pre>
</notextile>
+(Your shell should automatically fill in @$USER@ with your login name. The job JSON that gets saved should have @"repository"@ pointed at your personal git repository.)
+
Because the job ran in parallel, each instance of parallel-hash creates a separate @md5sum.txt@ as output. Arvados automatically collates these files into a single collection, which is the output of the job:
<notextile>
"dataclass": "Collection"
}
},
- "repository":"<b>you</b>",
+ "repository":"$USER",
"script_version":"master"
}
}
</code></pre>
</notextile>
+(Your shell should automatically fill in @$USER@ with your login name. The JSON that gets saved should have @"repository"@ pointed at your personal git repository.)
+
Your new pipeline template will appear on the Workbench "Compute %(rarr)→% Pipeline templates":https://{{ site.arvados_workbench_host }}/pipeline_instances page. You can run the "pipeline using Workbench":tutorial-pipeline-workbench.html.
"dataclass": "Collection"
}
},
- "repository":"<b>you</b>",
+ "repository":"$USER",
"script_version":"master",
"output_is_persistent":true
}
* @"name"@ is a human-readable name for the pipeline.
* @"components"@ is a set of scripts that make up the pipeline.
* The component is listed with a human-readable name (@"do_hash"@ in this example).
-* @"repository"@ is the name of a git repository to search for the script version. You can access a list of available git repositories on the Arvados Workbench under "Compute %(rarr)→% Code repositories":https://{{site.arvados_workbench_host}}/repositories.
+* @"repository"@ is the name of a git repository to search for the script version. You can access a list of available git repositories on the Arvados Workbench under "Compute %(rarr)→% Code repositories":https://{{site.arvados_workbench_host}}/repositories. Your shell should automatically fill in @$USER@ with your login name, so that the final JSON has @"repository"@ pointed at your personal git repository.
* @"script_version"@ specifies the version of the script that you wish to run. This can be in the form of an explicit git revision hash, a tag, or a branch (in which case it will use the HEAD of the specified branch). Arvados logs the script version that was used in the run, enabling you to go back and re-run any past job with the guarantee that the exact same code will be used as was used in the previous run.
* @"script"@ specifies the filename of the script to run. Crunch expects to find this in the @crunch_scripts/@ subdirectory of the git repository.
* @"script_parameters"@ describes the parameters for the script. In this example, there is one parameter called @input@ which is @required@ and is a @Collection@.
"dataclass": "Collection"
}
},
- "repository":"<b>you</b>",
+ "repository":"$USER",
"script_version":"master",
"output_is_persistent":false
},
"output_of":"do_hash"
}
},
- "repository":"<b>you</b>",
+ "repository":"$USER",
"script_version":"master",
"output_is_persistent":true
}
* @"output_of"@ indicates that the @output@ of the @do_hash@ component is connected to the @"input"@ of @do_filter@. This is a _dependency_. Arvados uses the dependencies between jobs to automatically determine the correct order to run the jobs.
+(Your shell should automatically fill in @$USER@ with your login name. The JSON that gets saved should have @"repository"@ pointed at your personal git repository.)
+
Now, use @arv pipeline_template create@ to register your pipeline template in Arvados:
<notextile>