title: "Writing a pipeline"
In this tutorial, we will write the "hash" script demonstrated in the first tutorial.

*This tutorial assumes that you are "logged into an Arvados VM instance":{{site.baseurl}}/user/getting_started/ssh-access.html#login, and have a "working environment.":{{site.baseurl}}/user/getting_started/check-environment.html*

This tutorial uses *@you@* to denote your username. Replace *@you@* with your username in all the following examples.

As discussed in the previous tutorial, all Crunch scripts are managed through the @git@ revision control system.

First, do some basic configuration for @git@ (you only need to do this the first time):
<notextile>
<pre><code>~$ <span class="userinput">git config --global user.name "Your Name"</span>
~$ <span class="userinput">git config --global user.email <b>you</b>@example.com</span></code></pre>
</notextile>
On the Arvados Workbench, navigate to "Compute %(rarr)→% Code repositories":http://{{site.arvados_workbench_host}}/repositories . You should see a repository with your username listed in the *name* column. Next to *name* is the column *push_url*. Copy the *push_url* value associated with your repository. It should look like <notextile><code>git@git.{{ site.arvados_api_host }}:<b>you</b>.git</code></notextile>.

Next, on the Arvados virtual machine, clone your git repository:
<notextile>
<pre><code>~$ <span class="userinput">git clone git@git.{{ site.arvados_api_host }}:<b>you</b>.git</span>
Cloning into '<b>you</b>'...</code></pre>
</notextile>
This will create a git checkout in a directory called *@you@*.
{% include 'notebox_begin' %}
For more information about using @git@, try

notextile. <pre><code>$ <span class="userinput">man gittutorial</span></code></pre>

or <b>"click here to search Google for git tutorials":http://google.com/#q=git+tutorial</b>
{% include 'notebox_end' %}
h2. Creating a Crunch script

Start by entering the *@you@* directory created by @git clone@. Next, create a subdirectory called @crunch_scripts@ and change to that directory:
<notextile>
<pre><code>~$ <span class="userinput">cd <b>you</b></span>
~/<b>you</b>$ <span class="userinput">mkdir crunch_scripts</span>
~/<b>you</b>$ <span class="userinput">cd crunch_scripts</span></code></pre>
</notextile>
Next, using @nano@ or your favorite Unix text editor, create a new file called @hash.py@ in the @crunch_scripts@ directory.

notextile. <pre>~/<b>you</b>/crunch_scripts$ <code class="userinput">nano hash.py</code></pre>

Add the following code to compute the md5 hash of each file in a collection:
<notextile> {% code 'tutorial_hash_script_py' as python %} </notextile>
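
The included script is not reproduced here. As a rough local illustration of the same idea (one MD5 digest per file), the sketch below hashes ordinary files in a directory using only the Python standard library. The function names @md5_of_file@ and @hash_directory@ are hypothetical, and a directory on disk stands in for a collection; this is not the tutorial's @hash.py@ and does not use the Arvados SDK.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of one file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat for large inputs.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_directory(root):
    """Map each file under root (by relative path) to its MD5 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            digests[os.path.relpath(full_path, root)] = md5_of_file(full_path)
    return digests


if __name__ == "__main__":
    # Assumption: a temporary directory stands in for an Arvados collection.
    with tempfile.TemporaryDirectory() as workdir:
        with open(os.path.join(workdir, "sample.txt"), "wb") as handle:
            handle.write(b"hello\n")
        for relative_path, digest in sorted(hash_directory(workdir).items()):
            print(digest, relative_path)
```

Running the sketch directly prints one digest and relative path per line, which is handy for checking by hand what a hashing job should report for a small input.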
Make the file executable:

notextile. <pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">chmod +x hash.py</span></code></pre>
{% include 'notebox_begin' %}
The steps below describe how to execute the script after committing changes to git. To run a script locally for testing, please see "debugging a Crunch script":{{site.baseurl}}/user/topics/tutorial-job-debug.html .

{% include 'notebox_end' %}
Next, add the file to @git@ staging. This tells @git@ that the file should be included on the next commit.

notextile. <pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">git add hash.py</span></code></pre>

Next, commit your changes to git. All staged changes are recorded into the local @git@ repository:
<notextile>
<pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">git commit -m"my first script"</span>
[master (root-commit) 27fd88b] my first script
 1 file changed, 33 insertions(+)
 create mode 100755 crunch_scripts/hash.py</code></pre>
</notextile>
Finally, upload your changes to the Arvados server:

<notextile>
<pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">git push origin master</span>
Counting objects: 4, done.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (4/4), 682 bytes, done.
Total 4 (delta 0), reused 0 (delta 0)
To git@git.qr1hi.arvadosapi.com:you.git
 * [new branch]      master -> master</code></pre>
</notextile>
h2. Create a pipeline template

Next, create a file that contains the pipeline definition:
<notextile>
<pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">cd ~</span>
~$ <span class="userinput">cat >the_pipeline <<EOF
{
  "name":"My first pipeline",
  "components":{
    "do_hash":{
      "script":"hash.py",
      "script_parameters":{
        "input":{
          "required": true,
          "dataclass": "Collection"
        }
      },
      "script_version":"<b>you</b>:master"
    }
  }
}
EOF
</span></code></pre>
</notextile>
* @cat@ is a standard Unix utility that simply copies standard input to standard output
* @<<EOF@ tells the shell to direct the following lines into the standard input for @cat@ up until it sees the line @EOF@
* @>the_pipeline@ redirects standard output to a file called @the_pipeline@
* @"name"@ is a human-readable name for the pipeline
* @"components"@ is a set of scripts that make up the pipeline
* The component is listed with a human-readable name (@"do_hash"@ in this example)
* @"script"@ specifies the name of the script to run. The script is searched for in the @crunch_scripts/@ subdirectory of the @git@ checkout specified by @"script_version"@.
* @"script_version"@ specifies the version of the script that you wish to run. This can be an explicit @git@ revision hash, or take the form "repository:branch" (in which case the HEAD of the specified branch is used). Arvados logs the script version used in each run, so you can re-run any past job with the guarantee that exactly the same code is used. You can access a list of available @git@ repositories on the Arvados Workbench under "Compute %(rarr)→% Code repositories":http://{{site.arvados_workbench_host}}/repositories .
* @"script_parameters"@ describes the parameters for the script. In this example, there is one parameter called @input@ which is @required@ and is a @Collection@.
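
The fields described in the list above can also be assembled programmatically. The following is a minimal sketch, assuming a single-component pipeline; the helper names @build_pipeline_template@ and @find_missing_fields@ are hypothetical, the script name @hash.py@ is taken from the file created earlier, and the checks are illustrative only (they are not the validation Arvados itself performs).

```python
import json


def build_pipeline_template(name, component, script, script_version, parameters):
    """Assemble a one-component template with the fields listed above."""
    return {
        "name": name,
        "components": {
            component: {
                "script": script,
                "script_parameters": parameters,
                "script_version": script_version,
            }
        },
    }


def find_missing_fields(template):
    """Return the names of missing fields (illustrative, not Arvados's own checks)."""
    problems = []
    if not template.get("name"):
        problems.append("name")
    components = template.get("components") or {}
    if not components:
        problems.append("components")
    for label, component in components.items():
        for field in ("script", "script_version", "script_parameters"):
            if field not in component:
                problems.append(label + "." + field)
    return problems


if __name__ == "__main__":
    template = build_pipeline_template(
        name="My first pipeline",
        component="do_hash",
        script="hash.py",
        script_version="you:master",  # replace "you" with your username
        parameters={"input": {"required": True, "dataclass": "Collection"}},
    )
    assert find_missing_fields(template) == []
    # The printed JSON could be saved as the_pipeline instead of using cat.
    print(json.dumps(template, indent=2))
```

Generating the file this way avoids hand-typed brace and comma mistakes, at the cost of one more script to maintain.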
Now, use @arv pipeline_template create@ to tell Arvados about your pipeline template:
<notextile>
<pre><code>~$ <span class="userinput">arv pipeline_template create --pipeline-template "$(cat the_pipeline)"</span></code></pre>
</notextile>
Your new pipeline template will appear on the "Workbench %(rarr)→% Compute %(rarr)→% Pipeline templates":http://{{ site.arvados_workbench_host }}/pipeline_instances page. You can run the "pipeline using Workbench":tutorial-pipeline-workbench.html.
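
If the @arv pipeline_template create@ call is rejected, a common cause is a malformed @the_pipeline@ file. As a hedged sketch, the helper below (the name @build_create_command@ is hypothetical) checks that the template text parses as JSON and then builds the same command shown above as an argument list. It only constructs the command; it does not run @arv@.

```python
import json


def build_create_command(template_text):
    """Check that the text is valid JSON, then return the argv for arv."""
    # json.loads raises ValueError (JSONDecodeError) on malformed input.
    json.loads(template_text)
    return [
        "arv",
        "pipeline_template",
        "create",
        "--pipeline-template",
        template_text,
    ]


if __name__ == "__main__":
    # Assumption: a tiny stand-in template rather than the full the_pipeline file.
    command = build_create_command('{"name": "My first pipeline"}')
    print(command[:4])
```

On a machine where @arv@ is installed, the returned list could be handed to @subprocess.run(command, check=True)@; passing a list rather than a shell string sidesteps quoting problems with the JSON text.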