A pipeline in Arvados is a collection of crunch scripts, in which the output from one script may be used as the input to another script.
-*This tutorial assumes that you are "logged into an Arvados VM instance":{{site.baseurl}}/user/getting_started/ssh-access.html#login, and have a "working environment.":{{site.baseurl}}/user/getting_started/check-environment.html*
+{% include 'tutorial_expectations' %}
+
+This tutorial uses *@you@* to denote your username. Replace *@you@* with your username in all the following examples.
h2. Create a new script
-Our second script will filter the output of @hash.py@ and only include hashes that start with 0. Create a new script in @crunch_scripts/@ called @0-filter.py@:
+Our second script will filter the output of @hash.py@ and only include hashes that start with 0. Create a new script in <notextile><code>~/<b>you</b>/crunch_scripts/</code></notextile> called @0-filter.py@:
<notextile> {% code '0_filter_py' as python %} </notextile>
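The filtering rule described above can be sketched in plain Python, independent of the Arvados SDK. This is only an illustration of the rule, not the contents of @0-filter.py@: the function name @keep_zero_hashes@ is hypothetical, the sample lines are made up, and it assumes each line of the @hash.py@ output begins with the hash followed by whitespace.

```python
def keep_zero_hashes(lines):
    """Return only the lines whose first field (the hash) starts with '0'."""
    kept = []
    for line in lines:
        fields = line.split()
        # Skip blank lines; keep a line only if its hash begins with "0".
        if fields and fields[0].startswith("0"):
            kept.append(line)
    return kept


# Made-up sample values, assuming a "<hash> <filename>" line layout.
sample = [
    "0f1d6bcf55c34bed7f92a805d2d89bbf alpha.txt",
    "504938460ef369cd275e4ef58994cffe beta.txt",
    "",
]
print(keep_zero_hashes(sample))
```

Only the first sample line survives, because it is the only one whose hash begins with @0@.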
-Now add it to git:
+Now add it to your repository:
<notextile>
-<pre><code>$ <span class="userinput">git add 0-filter.py</span>
-$ <span class="userinput">git commit -m"zero filter"</span>
-$ <span class="userinput">git push origin master</span>
+<pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">chmod +x 0-filter.py</span>
+~/<b>you</b>/crunch_scripts$ <span class="userinput">git add 0-filter.py</span>
+~/<b>you</b>/crunch_scripts$ <span class="userinput">git commit -m"zero filter"</span>
+~/<b>you</b>/crunch_scripts$ <span class="userinput">git push origin master</span>
</code></pre>
</notextile>
Next, create a file that contains the pipeline definition:
<notextile>
-<pre><code>$ <span class="userinput">cat >the_pipeline <<EOF
+<pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">cat >~/the_pipeline <<EOF
{
- "name":"Filter md5 hash values",
+ "name":"Filter MD5 hash values",
"components":{
"do_hash":{
"script":"hash.py",
"dataclass": "Collection"
}
},
- "script_version":"you:master"
+ "repository":"$USER",
+ "script_version":"master",
+ "output_is_persistent":false
},
- "filter":{
+ "do_filter":{
"script":"0-filter.py",
"script_parameters":{
"input":{
"output_of":"do_hash"
}
},
- "script_version":"you:master"
+ "repository":"$USER",
+ "script_version":"master",
+ "output_is_persistent":true
}
}
}
EOF
</span></code></pre>
</notextile>
-* @"output_of"@ indicates that the @input@ of the @do_hash@ component is connected to the @output@ of @filter@. This is a _dependency_. Arvados uses the dependencies between jobs to automatically determine the correct order to run the jobs.
+* @"output_of"@ indicates that the @output@ of the @do_hash@ component should be used as the @"input"@ of @do_filter@. Arvados uses these dependencies between jobs to automatically determine the correct order to run them.
+
+(Because the here-document delimiter @EOF@ is unquoted, your shell replaces @$USER@ with your login name when it writes the file, so the saved JSON should have @"repository"@ set to the name of your personal Git repository.)
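To make the idea of dependency ordering concrete, here is a minimal sketch of how a run order could be derived from @"output_of"@ references. It is not how Arvados itself schedules jobs. The helper names @dependencies@ and @run_order@ are hypothetical, and the @do_hash@ input shown here is a simplified placeholder.

```python
import json

# Components mirroring the pipeline above; listed out of order on purpose.
PIPELINE = json.loads("""
{
  "components": {
    "do_filter": {
      "script": "0-filter.py",
      "script_parameters": {"input": {"output_of": "do_hash"}}
    },
    "do_hash": {
      "script": "hash.py",
      "script_parameters": {"input": {"dataclass": "Collection"}}
    }
  }
}
""")


def dependencies(component):
    """Return the set of component names referenced via "output_of"."""
    deps = set()
    for value in component.get("script_parameters", {}).values():
        if isinstance(value, dict) and "output_of" in value:
            deps.add(value["output_of"])
    return deps


def run_order(components):
    """Order components so each one comes after everything it depends on."""
    remaining = {name: dependencies(c) for name, c in components.items()}
    order = []
    while remaining:
        # A component is ready once all of its dependencies are already ordered.
        ready = sorted(n for n, deps in remaining.items() if deps <= set(order))
        if not ready:
            raise ValueError("circular or unknown dependency")
        order.extend(ready)
        for name in ready:
            del remaining[name]
    return order


print(run_order(PIPELINE["components"]))
```

Even though @do_filter@ is listed first in this sample, the sketch places @do_hash@ ahead of it, because @do_filter@ names @do_hash@ in its @"output_of"@ entry.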
-Now, use @arv pipeline_template create@ tell Arvados about your pipeline template:
+Now, use @arv pipeline_template create@ to register your pipeline template in Arvados:
<notextile>
-<pre><code>$ <span class="userinput">arv pipeline_template create --pipeline-template "$(cat the_pipeline)"</span>
+<pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">arv pipeline_template create --pipeline-template "$(cat ~/the_pipeline)"</span>
</code></pre>
</notextile>
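If registration fails, or a component never runs, it can help to sanity-check the saved file locally. The following is a rough sketch that only confirms the text parses as JSON and that every @"output_of"@ names a component defined in the same template. The function @check_template@ is hypothetical and is not part of the @arv@ command-line tool or the Arvados SDK.

```python
import json


def check_template(text):
    """Return a list of problems found in a pipeline template JSON string."""
    try:
        template = json.loads(text)
    except ValueError as error:
        return ["not valid JSON: %s" % error]
    components = template.get("components", {})
    problems = []
    for name, component in components.items():
        for param, value in component.get("script_parameters", {}).items():
            # Each "output_of" must point at a component in this template.
            if isinstance(value, dict) and "output_of" in value:
                source = value["output_of"]
                if source not in components:
                    problems.append(
                        "%s.%s refers to unknown component %s"
                        % (name, param, source))
    return problems


good = ('{"components": {"do_hash": {}, "do_filter": '
        '{"script_parameters": {"input": {"output_of": "do_hash"}}}}}')
bad = ('{"components": {"do_filter": '
       '{"script_parameters": {"input": {"output_of": "filter"}}}}}')
print(check_template(good))
print(check_template(bad))
```

To check the file written earlier, pass the contents of @~/the_pipeline@ to @check_template@. An empty list means neither of these two problems was found.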
-Your new pipeline template will appear on the "Workbench %(rarr)→% Compute %(rarr)→% Pipeline templates":http://{{ site.arvados_workbench_host }}/pipeline_instances page.
+Your new pipeline template will appear on the Workbench "Compute %(rarr)→% Pipeline templates":https://{{ site.arvados_workbench_host }}/pipeline_templates page.
+For more information and examples of writing pipelines, see the "pipeline template reference":{{site.baseurl}}/api/schema/PipelineTemplate.html.