X-Git-Url: https://git.arvados.org/arvados.git/blobdiff_plain/7024cc159936593350aaf7939d700102f6510787..224f384d411bb1b4cccc7165c55bb64fd5c695ad:/doc/user/tutorials/tutorial-firstscript.html.textile.liquid
diff --git a/doc/user/tutorials/tutorial-firstscript.html.textile.liquid b/doc/user/tutorials/tutorial-firstscript.html.textile.liquid
index 41f8a84c3b..d4caafef5c 100644
--- a/doc/user/tutorials/tutorial-firstscript.html.textile.liquid
+++ b/doc/user/tutorials/tutorial-firstscript.html.textile.liquid
@@ -3,122 +3,105 @@ layout: default
 navsection: userguide
 navmenu: Tutorials
 title: "Writing a Crunch script"
-
 ...
-h1. Writing a Crunch script
-
-In this tutorial, we will write the "hash" script demonstrated in the first tutorial.
-
-*This tutorial assumes that you are "logged into an Arvados VM instance":{{site.basedoc}}/user/getting_started/ssh-access.html#login, and have a "working environment.":{{site.basedoc}}/user/getting_started/check-environment.html*
-
-This tutorial uses *@you@* to denote your username. Replace *@you@* with your user name in all the following examples.
-
-h2. Setting up Git
-
-As discussed in the previous tutorial, all Crunch scripts are managed through the @git@ revision control system.
-
-First, you should do some basic configuration for git (you only need to do this the first time):
-
-
-
~$ git config --global user.name "Your Name"
-~$ git config --global user.email you@example.com
-
-
-On the Arvados Workbench, navigate to _Compute %(rarr)→% Code repositories._ You should see two repositories, one named "arvados" (under the *name* column) and a second with your user name. Next to *name* is the column *push_url*. Copy the *push_url* cell associated with your repository. This should look like git@git.{{ site.arvados_api_host }}:you.git.
-
-Next, on the Arvados virtual machine, clone your git repository:
-
-
~$ git clone git@git.{{ site.arvados_api_host }}:you.git
-Cloning into 'you'...
-
-
-This will create an git checkout in the directory called *@you@*.
-
-{% include 'notebox_begin' %}
-For more information about using @git@, try
+{% include 'pipeline_deprecation_notice' %}
-notextile.
$ man gittutorial
+This tutorial demonstrates how to write a script using the Arvados Python SDK. The SDK provides access to advanced features that are not available through the @run-command@ wrapper, such as scheduling concurrent tasks across nodes.
-or "click here to search Google for git tutorials":http://google.com/#q=git+tutorial
-{% include 'notebox_end' %}
+{% include 'tutorial_expectations' %}
-h2. Creating a Crunch script
+This tutorial uses @$USER@ to denote your username. Replace @$USER@ with your user name in all the following examples.
-Start by entering the *@you@* directory created by @git clone@. Next create a subdirectory called @crunch_scripts@ and change to that directory:
+Start by creating a directory called @tutorial@ in your home directory. Next, create a subdirectory called @crunch_scripts@ and change to that directory:
-
~$ cd you
-~/you$ mkdir crunch_scripts
-~/you$ cd crunch_scripts
+
~$ cd $HOME
+~$ mkdir -p tutorial/crunch_scripts
+~$ cd tutorial/crunch_scripts
+
 Next, using @nano@ or your favorite Unix text editor, create a new file called @hash.py@ in the @crunch_scripts@ directory.
-notextile.
~/you/crunch_scripts$ nano hash.py
+notextile.
~/tutorial/crunch_scripts$ nano hash.py
-Add the following code to compute the md5 hash of each file in a collection:
+Add the following code to compute the MD5 hash of each file in a collection:
-
{% include 'tutorial_hash_script_py' %}
+ {% code 'tutorial_hash_script_py' as python %}
 Make the file executable:
-notextile.
~/you/crunch_scripts$ chmod +x hash.py
+notextile.
~/tutorial/crunch_scripts$ chmod +x hash.py
-Next, add the file to @git@ staging. This tells @git@ that the file should be included on the next commit.
-
-notextile.
~/you/crunch_scripts$ git add hash.py
-
-Next, commit your changes to git. All staged changes are recorded into the local @git@ repository:
+Next, create a job submission record, which describes a specific invocation of your script:
-
~/you/crunch_scripts$ git commit -m"my first script"
-[master (root-commit) 27fd88b] my first script
- 1 file changed, 33 insertions(+)
- create mode 100755 crunch_scripts/hash.py
+
~/tutorial/crunch_scripts$ cat >~/the_job <<EOF
+{
+ "repository":"",
+ "script":"hash.py",
+ "script_version":"$HOME/tutorial",
+ "script_parameters":{
+   "input":"c1bad4b39ca5a924e481008009d94e32+210"
+ }
+}
+EOF
+
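The heredoc above relies on the shell to expand @$HOME@, which happens because the @EOF@ delimiter is unquoted. If you would rather generate the job record programmatically, for example to avoid quoting mistakes, the following minimal Python sketch builds the same JSON with the standard @json@ module. The helper name @make_job_record@ is hypothetical and not part of the Arvados SDK. The sketch assumes your scripts live in @~/tutorial/crunch_scripts@ as set up above.

```python
import json
import os


def make_job_record(script, input_pdh, repo_dir="~/tutorial"):
    """Build a job record for arv-crunch-job.

    Hypothetical helper, not part of the Arvados SDK: it mirrors the
    hand-written record above, with an empty "repository" and a local
    directory (the one containing crunch_scripts/) as "script_version".
    """
    return {
        "repository": "",
        "script": script,
        # Expand "~" here, as the shell expands $HOME in the unquoted heredoc.
        "script_version": os.path.expanduser(repo_dir),
        "script_parameters": {"input": input_pdh},
    }


if __name__ == "__main__":
    job = make_job_record("hash.py", "c1bad4b39ca5a924e481008009d94e32+210")
    # Print the record; redirect stdout to ~/the_job to reuse it below.
    print(json.dumps(job, indent=1))
```

For example, saving this as @make_job.py@ (an illustrative name) and running @python make_job.py > ~/the_job@ produces a file you can pass to @arv-crunch-job --job "$(cat ~/the_job)"@ in the same way.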
-Finally, upload your changes to the Arvados server:
+You can now run your script on your local workstation or VM using @arv-crunch-job@:
-
~/you/crunch_scripts$ git push origin master
-Counting objects: 4, done.
-Compressing objects: 100% (2/2), done.
-Writing objects: 100% (4/4), 682 bytes, done.
-Total 4 (delta 0), reused 0 (delta 0)
-To git@git.qr1hi.arvadosapi.com:you.git
- * [new branch]      master -> master
+
~/tutorial/crunch_scripts$ arv-crunch-job --job "$(cat ~/the_job)"
+2014-08-06_15:16:22 qr1hi-8i9sb-qyrat80ef927lam 14473  check slurm allocation
+2014-08-06_15:16:22 qr1hi-8i9sb-qyrat80ef927lam 14473  node localhost - 1 slots
+2014-08-06_15:16:23 qr1hi-8i9sb-qyrat80ef927lam 14473  start
+2014-08-06_15:16:23 qr1hi-8i9sb-qyrat80ef927lam 14473  script hash.py
+2014-08-06_15:16:23 qr1hi-8i9sb-qyrat80ef927lam 14473  script_version $HOME/tutorial
+2014-08-06_15:16:23 qr1hi-8i9sb-qyrat80ef927lam 14473  script_parameters {"input":"c1bad4b39ca5a924e481008009d94e32+210"}
+2014-08-06_15:16:23 qr1hi-8i9sb-qyrat80ef927lam 14473  runtime_constraints {"max_tasks_per_node":0}
+2014-08-06_15:16:23 qr1hi-8i9sb-qyrat80ef927lam 14473  start level 0
+2014-08-06_15:16:23 qr1hi-8i9sb-qyrat80ef927lam 14473  status: 0 done, 0 running, 1 todo
+2014-08-06_15:16:23 qr1hi-8i9sb-qyrat80ef927lam 14473 0 job_task qr1hi-ot0gb-lptn85mwkrn9pqo
+2014-08-06_15:16:23 qr1hi-8i9sb-qyrat80ef927lam 14473 0 child 14478 started on localhost.1
+2014-08-06_15:16:23 qr1hi-8i9sb-qyrat80ef927lam 14473  status: 0 done, 1 running, 0 todo
+2014-08-06_15:16:24 qr1hi-8i9sb-qyrat80ef927lam 14473 0 stderr crunchstat: Running [stdbuf --output=0 --error=0 /home/$USER/tutorial/crunch_scripts/hash.py]
+2014-08-06_15:16:24 qr1hi-8i9sb-qyrat80ef927lam 14473 0 child 14478 on localhost.1 exit 0 signal 0 success=true
+2014-08-06_15:16:24 qr1hi-8i9sb-qyrat80ef927lam 14473 0 success in 1 seconds
+2014-08-06_15:16:24 qr1hi-8i9sb-qyrat80ef927lam 14473 0 output
+2014-08-06_15:16:25 qr1hi-8i9sb-qyrat80ef927lam 14473  wait for last 0 children to finish
+2014-08-06_15:16:25 qr1hi-8i9sb-qyrat80ef927lam 14473  status: 1 done, 0 running, 1 todo
+2014-08-06_15:16:25 qr1hi-8i9sb-qyrat80ef927lam 14473  start level 1
+2014-08-06_15:16:25 qr1hi-8i9sb-qyrat80ef927lam 14473  status: 1 done, 0 running, 1 todo
+2014-08-06_15:16:25 qr1hi-8i9sb-qyrat80ef927lam 14473 1 job_task qr1hi-ot0gb-e3obm0lv6k6p56a
+2014-08-06_15:16:25 qr1hi-8i9sb-qyrat80ef927lam 14473 1 child 14504 started on localhost.1
+2014-08-06_15:16:25 qr1hi-8i9sb-qyrat80ef927lam 14473  status: 1 done, 1 running, 0 todo
+2014-08-06_15:16:26 qr1hi-8i9sb-qyrat80ef927lam 14473 1 stderr crunchstat: Running [stdbuf --output=0 --error=0 /home/$USER/tutorial/crunch_scripts/hash.py]
+2014-08-06_15:16:35 qr1hi-8i9sb-qyrat80ef927lam 14473 1 child 14504 on localhost.1 exit 0 signal 0 success=true
+2014-08-06_15:16:35 qr1hi-8i9sb-qyrat80ef927lam 14473 1 success in 10 seconds
+2014-08-06_15:16:35 qr1hi-8i9sb-qyrat80ef927lam 14473 1 output 8c20281b9840f624a486e4f1a78a1da8+105+A234be74ceb5ea31db6e11b6be26f3eb76d288ad0@54987018
+2014-08-06_15:16:35 qr1hi-8i9sb-qyrat80ef927lam 14473  wait for last 0 children to finish
+2014-08-06_15:16:35 qr1hi-8i9sb-qyrat80ef927lam 14473  status: 2 done, 0 running, 0 todo
+2014-08-06_15:16:35 qr1hi-8i9sb-qyrat80ef927lam 14473  release job allocation
+2014-08-06_15:16:35 qr1hi-8i9sb-qyrat80ef927lam 14473  Freeze not implemented
+2014-08-06_15:16:35 qr1hi-8i9sb-qyrat80ef927lam 14473  collate
+2014-08-06_15:16:36 qr1hi-8i9sb-qyrat80ef927lam 14473  collated output manifest text to send to API server is 105 bytes with access tokens
+2014-08-06_15:16:36 qr1hi-8i9sb-qyrat80ef927lam 14473  output hash c1b44b6dc41ef334cf1136033ca950e6+54
+2014-08-06_15:16:37 qr1hi-8i9sb-qyrat80ef927lam 14473  finish
+2014-08-06_15:16:38 qr1hi-8i9sb-qyrat80ef927lam 14473  log manifest is 7fe8cf1d45d438a3ca3ac4a184b7aff4+83
+
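In the sample log above, the job-level result appears on a line ending in @output hash@ followed by a locator, while per-task results appear on plain @output@ lines carrying a signed locator. If you save this log to a file, a few lines of Python can pick out the job-level value instead of copying it by hand. This is a sketch that assumes the log format matches the sample shown here. @find_output_hash@ is a hypothetical helper, and other Arvados versions may format their logs differently.

```python
import re
from typing import Iterable, Optional

# Job-level line of the form "... output hash <32 hex digits>+<size>",
# as in the sample log. Task-level "output" lines (no word "hash") are ignored.
OUTPUT_HASH_RE = re.compile(r"\soutput hash ([0-9a-f]{32}\+\d+)\s*$")


def find_output_hash(log_lines: Iterable[str]) -> Optional[str]:
    """Return the last hash reported on an 'output hash' line, or None."""
    found = None
    for line in log_lines:
        match = OUTPUT_HASH_RE.search(line)
        if match:
            found = match.group(1)
    return found


if __name__ == "__main__":
    sample = [
        "2014-08-06_15:16:35 qr1hi-8i9sb-qyrat80ef927lam 14473 1 output "
        "8c20281b9840f624a486e4f1a78a1da8+105+A234be74ceb5ea31db6e11b6be26f3eb76d288ad0@54987018",
        "2014-08-06_15:16:36 qr1hi-8i9sb-qyrat80ef927lam 14473  output hash "
        "c1b44b6dc41ef334cf1136033ca950e6+54",
    ]
    print(find_output_hash(sample))  # c1b44b6dc41ef334cf1136033ca950e6+54
```

Any iterable of lines works, so an open file object such as @open("job.log")@ can be passed directly.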
-You should now be able to run your script using Crunch, similar to how we did it in the "first tutorial.":tutorial-job1.html The field @"script_version"@ should be @you:master@ to tell Crunch to run the script at the head of the "master" git branch, which you just uploaded.
+Although the job ran locally, its output has been saved to Keep, the Arvados file store. The "output hash" line (third from the bottom) gives the portable data hash of the Arvados collection that holds the script's output. Copy the output hash and use @arv-ls@ to list the contents of your output collection, and @arv-get@ to download it to the current directory:
-
~/you/crunch_scripts$ cat >~/the_job <<EOF
-{
- "script": "hash.py",
- "script_version": "you:master",
- "script_parameters":
- {
-  "input": "c1bad4b39ca5a924e481008009d94e32+210"
- }
-}
-EOF
-~/you/crunch_scripts$ arv job create --job "$(cat ~/the_job)"
-{
- ...
- "uuid":"qr1hi-xxxxx-xxxxxxxxxxxxxxx"
- ...
-}
-~/you/crunch_scripts$ arv job get --uuid qr1hi-xxxxx-xxxxxxxxxxxxxxx
-{
- ...
- "output":"880b55fb4470b148a447ff38cacdd952+54",
- ...
-}
-~/you/crunch_scripts$ arv keep get 880b55fb4470b148a447ff38cacdd952+54/md5sum.txt
-44b8ae3fde7a8a88d2f7ebd237625b4f var-GS000016015-ASM.tsv.bz2
+
~/tutorial/crunch_scripts$ arv-ls c1b44b6dc41ef334cf1136033ca950e6+54
+./md5sum.txt
+~/tutorial/crunch_scripts$ arv-get c1b44b6dc41ef334cf1136033ca950e6+54/ .
+0 MiB / 0 MiB 100.0%
+~/tutorial/crunch_scripts$ cat md5sum.txt
+44b8ae3fde7a8a88d2f7ebd237625b4f c1bad4b39ca5a924e481008009d94e32+210/var-GS000016015-ASM.tsv.bz2
 
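The output hash used above is a _portable data hash_. As we understand the Keep manifest format, it is the MD5 digest of the collection's manifest text after every locator hint other than the size hint has been removed, followed by @+@ and the length in bytes of that stripped text. The removed hints include the @+A...@ permission signature visible on the task @output@ line of the log. The Python sketch below illustrates that rule under this assumption. @strip_hints@ and @portable_data_hash@ are hypothetical helper names. In practice the Arvados API server computes this value for you.

```python
import hashlib
import re

# A block locator token: 32 hex digits, optionally followed by "+hint" parts.
LOCATOR_RE = re.compile(r"^[0-9a-f]{32}(\+\S+)?$")


def strip_hints(manifest_text: str) -> str:
    """Drop every locator hint except the numeric size hint (sketch)."""
    stripped_lines = []
    for line in manifest_text.split("\n"):
        tokens = line.split(" ")
        # tokens[0] is the stream name; locators and file tokens follow it.
        for i in range(1, len(tokens)):
            if LOCATOR_RE.match(tokens[i]):
                parts = tokens[i].split("+")
                kept = [parts[0]] + [p for p in parts[1:] if p.isdigit()]
                tokens[i] = "+".join(kept)
        stripped_lines.append(" ".join(tokens))
    return "\n".join(stripped_lines)


def portable_data_hash(manifest_text: str) -> str:
    """MD5 of the stripped manifest, then '+' and its length in bytes."""
    data = strip_hints(manifest_text).encode("utf-8")
    return "%s+%d" % (hashlib.md5(data).hexdigest(), len(data))
```

Because signatures are stripped before hashing, a signed and an unsigned copy of the same manifest produce the same identifier. This is what lets the hash refer to the content independently of any particular access token.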
-Next, "debugging a crunch script.":tutorial-job-debug.html
+Running locally is convenient for development and debugging because it permits a fast, iterative development cycle. Your job run is also recorded by Arvados and will appear in the *Recent jobs and pipelines* panel on the "Workbench Dashboard":{{site.arvados_workbench_host}}. This provides limited provenance by recording the input parameters, the execution log, and the output. However, running locally does not let you scale out to multiple nodes, and it does not store the complete system snapshot required to achieve reproducibility; to do that, you need to "submit a job to the Arvados cluster":{{site.baseurl}}/user/tutorials/tutorial-submit-job.html.