---
layout: default
navsection: userguide
navmenu: Tutorials
title: "Writing a Crunch script"
...
This tutorial demonstrates how to create a new Arvados pipeline using the Arvados Python SDK. The Arvados SDK supports access to advanced features not available using the @run-command@ wrapper, such as scheduling parallel tasks across nodes.
{% include 'tutorial_expectations' %}
This tutorial uses @$USER@ to denote your user name. Replace @$USER@ with your user name in all of the following examples.
h2. Setting up Git
All Crunch scripts are managed through the Git revision control system. Before you start using Git, you should do some basic configuration (you only need to do this the first time):
<notextile>
<pre><code>~$ git config --global user.name "Your Name"
~$ git config --global user.email $USER@example.com
</code></pre>
</notextile>
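Next, clone your Git repository. Its URL has the form <notextile><code>git@git.{{ site.arvados_api_host }}:$USER.git</code></notextile>:

<notextile>
<pre><code>~$ cd $HOME # (or wherever you want to install)
~$ git clone git@git.{{ site.arvados_api_host }}:$USER.git
Cloning into '$USER'...
</code></pre>
</notextile>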
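{% include 'notebox_begin' %}
For more information about using Git, try

notextile. <pre><code>$ man gittutorial</code></pre>

or *"search Google for Git tutorials":http://google.com/#q=git+tutorial*.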
{% include 'notebox_end' %}
h2. Creating a Crunch script
Start by entering the @$USER@ directory created by @git clone@. Next, create a subdirectory called @crunch_scripts@ and change to that directory:

<notextile>
<pre><code>~$ cd $USER
~/$USER$ mkdir crunch_scripts
~/$USER$ cd crunch_scripts
</code></pre>
</notextile>
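Use an editor to create a new file called @hash.py@ in the @crunch_scripts@ directory:

notextile. <pre><code>~/$USER/crunch_scripts$ nano hash.py</code></pre>

Add the following code to compute the MD5 hash of each file in a collection:

Make the file executable:

notextile. <pre><code>~/$USER/crunch_scripts$ chmod +x hash.py</code></pre>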
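
The per-file hashing step can be tried out in isolation before involving Crunch. The following is a minimal sketch only, assuming ordinary files on a local disk rather than an Arvados collection. It uses just the Python standard library and makes no Arvados SDK calls, so it is not a substitute for @hash.py@. The function names @md5_of_file@ and @hash_directory@ are hypothetical and chosen for illustration.

```python
import hashlib
import os


def md5_of_file(path, block_size=65536):
    """Return the hex MD5 digest of one file, reading it block by block."""
    digestor = hashlib.md5()
    with open(path, "rb") as input_file:
        # Read fixed-size blocks so large files never sit fully in memory.
        for block in iter(lambda: input_file.read(block_size), b""):
            digestor.update(block)
    return digestor.hexdigest()


def hash_directory(root):
    """Return (digest, relative_path) pairs for every file under root,
    sorted by relative path. Hypothetical helper, not an Arvados API."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            relative_path = os.path.relpath(full_path, root)
            results.append((md5_of_file(full_path), relative_path))
    return sorted(results, key=lambda pair: pair[1])
```

Calling @hash_directory(".")@ returns one pair per file, which can be printed as @digest path@ lines in the style of @md5sum@. Reading in blocks keeps memory use flat regardless of file size, which matters for the large files typically stored in collections.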
{% include 'notebox_begin' %}
The steps below describe how to execute the script after committing changes to Git. To run a single script locally for testing (bypassing the job queue), see "debugging a Crunch script":{{site.baseurl}}/user/topics/tutorial-job-debug.html.
{% include 'notebox_end' %}
Next, add the file to the staging area. This tells @git@ that the file should be included on the next commit.
notextile. <pre><code>~/$USER/crunch_scripts$ git add hash.py</code></pre>
Next, commit your changes. All staged changes are recorded in the local Git repository:

<notextile>
<pre><code>~/$USER/crunch_scripts$ git commit -m"my first script"
[master (root-commit) 27fd88b] my first script
 1 file changed, 45 insertions(+)
 create mode 100755 crunch_scripts/hash.py
</code></pre>
</notextile>
Finally, push your commit to the remote repository:

<notextile>
<pre><code>~/$USER/crunch_scripts$ git push origin master
Counting objects: 4, done.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (4/4), 682 bytes, done.
Total 4 (delta 0), reused 0 (delta 0)
To git@git.qr1hi.arvadosapi.com:$USER.git
 * [new branch]      master -&gt; master
</code></pre>
</notextile>
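Next, create a file that contains the pipeline definition:

<notextile>
<pre><code>~/$USER/crunch_scripts$ cd ~
~$ cat &gt;the_pipeline &lt;&lt;EOF
{
  "name":"My md5 pipeline",
  "components":{
    "do_hash":{
      "script":"hash.py",
      "script_parameters":{
        "input":{
          "required": true,
          "dataclass": "Collection"
        }
      },
      "repository":"$USER",
      "script_version":"master",
      "output_is_persistent":true,
      "runtime_constraints":{
        "docker_image":"arvados/jobs-java-bwa-samtools"
      }
    }
  }
}
EOF
</code></pre>
</notextile>

Now use @arv pipeline_template create@ to register the pipeline template in Arvados:

notextile. <pre><code>~$ arv pipeline_template create --pipeline-template "$(cat the_pipeline)"</code></pre>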
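
Because the template is handed to @arv@ as a single JSON string, a stray comma or missing brace in @the_pipeline@ is easy to introduce. The following is a minimal sketch, assuming you want to sanity-check the text before registering it. The helper name @check_template@ and the particular keys it looks for are hypothetical illustrations based on the template above, not Arvados's own validation rules, and @example_user@ stands in for your user name.

```python
import json

# Same structure as the template above; "example_user" is a placeholder.
TEMPLATE_TEXT = """
{
  "name": "My md5 pipeline",
  "components": {
    "do_hash": {
      "script": "hash.py",
      "script_parameters": {
        "input": {"required": true, "dataclass": "Collection"}
      },
      "repository": "example_user",
      "script_version": "master",
      "output_is_persistent": true,
      "runtime_constraints": {
        "docker_image": "arvados/jobs-java-bwa-samtools"
      }
    }
  }
}
"""


def check_template(text):
    """Return a list of problems found in the template text (empty if none).

    These checks are illustrative assumptions, not Arvados's validation.
    """
    try:
        template = json.loads(text)
    except ValueError as err:
        return ["not valid JSON: {}".format(err)]
    if not isinstance(template, dict):
        return ["top level is not a JSON object"]
    components = template.get("components")
    if not isinstance(components, dict) or not components:
        return ["no components defined"]
    problems = []
    for name, component in sorted(components.items()):
        if not isinstance(component, dict):
            problems.append("component {!r} is not a JSON object".format(name))
            continue
        # Keys every component in the example template provides.
        for key in ("script", "repository", "script_version"):
            if key not in component:
                problems.append(
                    "component {!r} is missing {!r}".format(name, key))
    return problems
```

An empty list from @check_template(open("the_pipeline").read())@ means the file parsed and every component names a script, repository, and version. It says nothing about whether the script or Docker image actually exists.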