X-Git-Url: https://git.arvados.org/arvados.git/blobdiff_plain/41c7c826a7e4c3a074a6ab5a719bf4c88e9a0e28..a66dcf3c878be422520771e5bde3791248dba001:/doc/user/tutorials/tutorial-firstscript.html.textile.liquid

diff --git a/doc/user/tutorials/tutorial-firstscript.html.textile.liquid b/doc/user/tutorials/tutorial-firstscript.html.textile.liquid
index 4c49d19355..245e89066b 100644
--- a/doc/user/tutorials/tutorial-firstscript.html.textile.liquid
+++ b/doc/user/tutorials/tutorial-firstscript.html.textile.liquid
@@ -2,12 +2,9 @@
 layout: default
 navsection: userguide
 navmenu: Tutorials
-title: "Writing a Crunch script"
-
+title: "Writing a pipeline"
 ...
 
-h1. Writing a Crunch script
-
 In this tutorial, we will write the "hash" script demonstrated in the first tutorial.
 
 *This tutorial assumes that you are "logged into an Arvados VM instance":{{site.baseurl}}/user/getting_started/ssh-access.html#login, and have a "working environment.":{{site.baseurl}}/user/getting_started/check-environment.html*
@@ -25,7 +22,7 @@ First, you should do some basic configuration for git (you only need to do this
 ~$ git config --global user.email you@example.com
 
-On the Arvados Workbench, navigate to _Compute %(rarr)→% Code repositories._ You should see two repositories, one named "arvados" (under the *name* column) and a second with your user name. Next to *name* is the column *push_url*. Copy the *push_url* cell associated with your repository. This should look like git@git.{{ site.arvados_api_host }}:you.git.
+On the Arvados Workbench, navigate to "Compute %(rarr)→% Code repositories":http://{{site.arvados_workbench_host}}/repositories . You should see a repository with your user name listed in the *name* column. Next to *name* is the column *push_url*. Copy the *push_url* value associated with your repository. This should look like git@git.{{ site.arvados_api_host }}:you.git.
 
 Next, on the Arvados virtual machine, clone your git repository:
 
@@ -60,12 +57,17 @@ notextile.
~/you/crunch_scripts$ nano hash.p
 
 Add the following code to compute the md5 hash of each file in a collection:
 
-
{% include 'tutorial_hash_script_py' %}
+ {% code 'tutorial_hash_script_py' as python %}
 
 Make the file executable:
 
 notextile.
~/you/crunch_scripts$ chmod +x hash.py
+{% include 'notebox_begin' %}
+The steps below describe how to execute the script after committing changes to git. To run a script locally for testing, please see "debugging a crunch script":{{site.baseurl}}/user/topics/tutorial-job-debug.html .
+
+{% include 'notebox_end' %}
+
 Next, add the file to @git@ staging. This tells @git@ that the file should be included on the next commit.
 
 notextile.
~/you/crunch_scripts$ git add hash.py
@@ -91,34 +93,47 @@ To git@git.qr1hi.arvadosapi.com:you.git * [new branch] master -> master
-You should now be able to run your script using Crunch, similar to how we did it in the "first tutorial.":tutorial-job1.html The field @"script_version"@ should be @you:master@ to tell Crunch to run the script at the head of the "master" git branch, which you just uploaded.
+h2. Create a pipeline template
+
+Next, create a file that contains the pipeline definition:
 
-
~/you/crunch_scripts$ cat >~/the_job <<EOF
-{
- "script": "hash.py",
- "script_version": "you:master",
- "script_parameters":
- {
-  "input": "c1bad4b39ca5a924e481008009d94e32+210"
- }
-}
-EOF
-~/you/crunch_scripts$ arv job create --job "$(cat ~/the_job)"
-{
- ...
- "uuid":"qr1hi-xxxxx-xxxxxxxxxxxxxxx"
- ...
-}
-~/you/crunch_scripts$ arv job get --uuid qr1hi-xxxxx-xxxxxxxxxxxxxxx
+
~/you/crunch_scripts$ cd ~
+~$ cat >the_pipeline <<EOF
 {
- ...
- "output":"880b55fb4470b148a447ff38cacdd952+54",
- ...
+  "name":"My first pipeline",
+  "components":{
+    "do_hash":{
+      "script":"hash.py",
+      "script_parameters":{
+        "input":{
+          "required": true,
+          "dataclass": "Collection"
+        }
+      },
+      "script_version":"you:master"
+    }
+  }
 }
-~/you/crunch_scripts$ arv keep get 880b55fb4470b148a447ff38cacdd952+54/md5sum.txt
-44b8ae3fde7a8a88d2f7ebd237625b4f var-GS000016015-ASM.tsv.bz2
+EOF
+
+
+
+* @cat@ is a standard Unix utility that simply copies standard input to standard output
+* @>the_pipeline@ redirects standard output to a file called @the_pipeline@
+* @"name"@ is a human-readable name for the pipeline
+* @"components"@ is a set of scripts that make up the pipeline
+* The component is listed with a human-readable name (@"do_hash"@ in this example)
+* @"script"@ specifies the name of the script to run. Arvados looks for the script in the "crunch_scripts/" subdirectory of the @git@ checkout specified by @"script_version"@.
+* @"script_version"@ specifies the version of the script that you wish to run. This can be an explicit @git@ revision hash, or take the form "repository:branch" (in which case the HEAD of the specified branch is used). Arvados logs the script version that was used in the run, so you can go back and re-run any past job with the guarantee that exactly the same code will be used. You can access a list of available @git@ repositories on the Arvados Workbench under "Compute %(rarr)→% Code repositories":http://{{site.arvados_workbench_host}}/repositories .
+* @"script_parameters"@ describes the parameters for the script. In this example, there is one parameter called @input@ which is @required@ and is a @Collection@.
+
+Now, use @arv pipeline_template create@ to tell Arvados about your pipeline template:
+
+
+
~$ arv pipeline_template create --pipeline-template "$(cat the_pipeline)"
 
 
-Next, "debugging a crunch script.":tutorial-job-debug.html
+Your new pipeline template will appear on the "Workbench %(rarr)→% Compute %(rarr)→% Pipeline templates":http://{{ site.arvados_workbench_host }}/pipeline_instances page. You can run the "pipeline using Workbench":tutorial-pipeline-workbench.html .
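
Before passing @the_pipeline@ to @arv pipeline_template create@, it can help to confirm that the file is valid JSON and carries the fields used in the template added by this change. The following is a minimal sketch and not part of the Arvados SDK: @check_pipeline_template@ is a hypothetical helper, and the set of keys it checks is an assumption taken from the example template in this tutorial rather than from the Arvados API schema.

```python
import json

# The example template from this tutorial, reproduced for illustration.
TEMPLATE = """
{
  "name": "My first pipeline",
  "components": {
    "do_hash": {
      "script": "hash.py",
      "script_parameters": {
        "input": {"required": true, "dataclass": "Collection"}
      },
      "script_version": "you:master"
    }
  }
}
"""

# Assumption: these are the per-component keys the tutorial's example uses.
COMPONENT_KEYS = ("script", "script_version", "script_parameters")


def check_pipeline_template(text):
    """Return a list of problems found in a template; empty if none (hypothetical helper)."""
    try:
        template = json.loads(text)
    except ValueError as err:  # json.JSONDecodeError is a subclass of ValueError
        return ["not valid JSON: %s" % err]
    if not isinstance(template, dict):
        return ["top level must be a JSON object"]

    problems = []
    if not isinstance(template.get("name"), str):
        problems.append('missing "name"')

    components = template.get("components")
    if not isinstance(components, dict) or not components:
        problems.append('missing or empty "components"')
        return problems

    for label, component in components.items():
        if not isinstance(component, dict):
            problems.append('component "%s" must be a JSON object' % label)
            continue
        for key in COMPONENT_KEYS:
            if key not in component:
                problems.append('component "%s" is missing "%s"' % (label, key))
    return problems


if __name__ == "__main__":
    for problem in check_pipeline_template(TEMPLATE):
        print(problem)
```

An empty result only means the file matches the shape of the tutorial's example; Arvados itself remains the authority on whether the template is accepted.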