---
layout: default
navsection: userguide
navmenu: Tutorials
title: "Running on an Arvados cluster"
...
This tutorial demonstrates how to create a pipeline to run your Crunch script on an Arvados cluster. Cluster jobs can scale out to multiple nodes, and they use @git@ and @docker@ to store the complete system snapshot required for reproducibility.
{% include 'tutorial_expectations' %}
This tutorial uses @$USER@ to denote your username. Replace @$USER@ with your username in all of the following examples.
h2. Setting up Git
All Crunch scripts are managed through the Git revision control system. Before you start using Git, you should do some basic configuration (you only need to do this the first time):
~$ git config --global user.name "Your Name"
~$ git config --global user.email $USER@example.com
Your repository's URL has the form <notextile><code>git@git.{{ site.arvados_api_host }}:$USER.git</code></notextile>. Clone your repository:
~$ cd $HOME # (or wherever you want to install)
~$ git clone git@git.{{ site.arvados_api_host }}:$USER.git
Cloning into '$USER'...
{% include 'notebox_begin' %}
For more information about using @git@, try
$ man gittutorial
or *"search Google for Git tutorials":http://google.com/#q=git+tutorial*.
{% include 'notebox_end' %}
h2. Creating a Crunch script
Start by entering the @$USER@ directory created by @git clone@. Next, create a subdirectory called @crunch_scripts@ and change to that directory:
~$ cd $USER
~/$USER$ mkdir crunch_scripts
~/$USER$ cd crunch_scripts
Next, create a new file called @hash.py@ using @nano@ or another text editor:
~/$USER/crunch_scripts$ nano hash.py
Add the following code to compute the MD5 hash of each file in a collection. (If you have already completed "Writing a Crunch script":tutorial-firstscript.html, you can copy the @hash.py@ file you created previously.)
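As a rough illustration of the hashing step only, the following sketch computes the MD5 digest of every file under a local directory using Python's standard @hashlib@ module. It is not the tutorial's @hash.py@: it does not read an Arvados collection, and the names @md5_of_file@ and @md5_manifest@ are hypothetical.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_manifest(root):
    """Map each file path (relative to root) to its MD5 digest.

    Hypothetical helper: a local directory stands in for a collection here.
    """
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            result[os.path.relpath(full, root)] = md5_of_file(full)
    return result
```

Reading in fixed-size chunks keeps memory use bounded even when the input files are large.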
Make the file executable:
~/$USER/crunch_scripts$ chmod +x hash.py
Next, add the file to the staging area. This tells @git@ that the file should be included on the next commit.
notextile. ~/$USER/crunch_scripts$ git add hash.py
Next, commit your changes. All staged changes are recorded into the local git repository:
~/$USER/crunch_scripts$ git commit -m"my first script"
[master (root-commit) 27fd88b] my first script
1 file changed, 45 insertions(+)
create mode 100755 crunch_scripts/hash.py
Finally, push your commit to your remote repository:
~/$USER/crunch_scripts$ git push origin master
Counting objects: 4, done.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (4/4), 682 bytes, done.
Total 4 (delta 0), reused 0 (delta 0)
To git@git.qr1hi.arvadosapi.com:$USER.git
* [new branch] master -> master
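Next, return to your home directory and create a file that contains the pipeline definition, then use it to create a pipeline template:
~/$USER/crunch_scripts$ cd ~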
~$ cat >the_pipeline <<EOF
{
  "name":"My md5 pipeline",
  "components":{
    "do_hash":{
      "script":"hash.py",
      "script_parameters":{
        "input":{
          "required": true,
          "dataclass": "Collection"
        }
      },
      "repository":"$USER",
      "script_version":"master",
      "runtime_constraints":{
        "docker_image":"arvados/jobs-java-bwa-samtools"
      }
    }
  }
}
EOF
~$ arv pipeline_template create --pipeline-template "$(cat the_pipeline)"
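A malformed template is easiest to catch before it is submitted. The sketch below is one hypothetical way to check, in Python, that the template text parses as JSON and that each component carries the fields used in the example. The function name @check_template@ and the list of expected keys are assumptions drawn from the example template, not validation rules defined by Arvados.

```python
import json

# Keys that the example template sets on its component. Treating them as
# mandatory is an assumption of this sketch, not a rule stated by Arvados.
EXPECTED_COMPONENT_KEYS = (
    "script",
    "script_parameters",
    "repository",
    "script_version",
)


def check_template(text):
    """Parse template JSON and return a list of human-readable problems."""
    try:
        template = json.loads(text)
    except ValueError as err:  # json.JSONDecodeError subclasses ValueError
        return ["not valid JSON: %s" % err]

    if not isinstance(template, dict):
        return ["top level must be a JSON object"]

    problems = []
    if "name" not in template:
        problems.append('missing top-level "name"')

    components = template.get("components")
    if not isinstance(components, dict) or not components:
        problems.append('"components" must be a non-empty object')
        return problems

    for cname, comp in sorted(components.items()):
        if not isinstance(comp, dict):
            problems.append('component "%s" must be an object' % cname)
            continue
        for key in EXPECTED_COMPONENT_KEYS:
            if key not in comp:
                problems.append('component "%s" is missing "%s"' % (cname, key))
    return problems
```

For example, @check_template(open("the_pipeline").read())@ returns an empty list when none of these checks fail.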