---
layout: default
navsection: userguide
title: "Using Crunch to run external programs"
...

This tutorial demonstrates how to use Crunch to run an external program by writing a wrapper using the Python SDK.

*This tutorial assumes that you are "logged into an Arvados VM instance":{{site.baseurl}}/user/getting_started/ssh-access.html#login, and have a "working environment.":{{site.baseurl}}/user/getting_started/check-environment.html*

In this tutorial, you will use the external program @md5sum@ to compute hashes instead of the built-in Python library used in earlier tutorials.

Start by entering the @crunch_scripts@ directory of your git repository:
~$ cd you/crunch_scripts
Next, using @nano@ or your favorite Unix text editor, create a new file called @run-md5sum.py@ in the @crunch_scripts@ directory.
~/you/crunch_scripts$ nano run-md5sum.py
Add the following code to use the @md5sum@ program to compute the hash of each file in a collection:

{% code 'run_md5sum_py' as python %}

Make the file executable:
~/you/crunch_scripts$ chmod +x run-md5sum.py
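For orientation, the core idea of such a wrapper (calling @md5sum@ as a child process and parsing its output) can be sketched in plain Python. This is a minimal illustration, not the contents of @run-md5sum.py@: the function name @md5sum_of@ is hypothetical, it uses no Arvados SDK calls, and it assumes @md5sum@ is on the @PATH@ and the file is on the local filesystem.

```python
import subprocess


def md5sum_of(path):
    """Run the external md5sum program on one file and return its hex digest.

    Hypothetical helper for illustration; not part of the Arvados SDK.
    """
    # check_output raises CalledProcessError if md5sum exits non-zero,
    # for example when the file does not exist.
    output = subprocess.check_output(["md5sum", path])
    # md5sum prints "<digest>  <filename>"; keep only the digest field.
    return output.decode("utf-8").split()[0]
```

Passing the command as a list rather than a shell string avoids quoting problems with unusual file names.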
Next, stage the file with @git@, then commit and push:
~/you/crunch_scripts$ git add run-md5sum.py
~/you/crunch_scripts$ git commit -m"run external md5sum program"
~/you/crunch_scripts$ git push origin master
You should now be able to run your new script using Crunch. Create a pipeline template whose @"script"@ field refers to the new @run-md5sum.py@ script:
~/you/crunch_scripts$ cat >~/the_pipeline <<EOF
{
  "name":"Run external md5sum program",
  "components":{
    "do_hash":{
      "script":"run-md5sum.py",
      "script_parameters":{
        "input":{
          "required": true,
          "dataclass": "Collection"
        }
      },
      "script_version":"you:master"
    }
  }
}
EOF
~/you/crunch_scripts$ arv pipeline_template create --pipeline-template "$(cat ~/the_pipeline)"
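If the template is rejected, a quick local check of the JSON can help narrow down the problem. The sketch below is an optional aid, not part of Arvados: @check_template@ and @REQUIRED_KEYS@ are hypothetical names, and the key list simply mirrors the fields used in the template above rather than the server's actual validation rules.

```python
import json

# Assumption: these are the fields used by the template in this tutorial,
# not an authoritative list of what the Arvados API requires.
REQUIRED_KEYS = ("script", "script_version", "script_parameters")


def check_template(text):
    """Parse a pipeline template and return a list of problems found."""
    # json.loads raises ValueError if the text is not well-formed JSON.
    template = json.loads(text)
    problems = []
    for name, component in template.get("components", {}).items():
        for key in REQUIRED_KEYS:
            if key not in component:
                problems.append("%s: missing %r" % (name, key))
    return problems
```

For example, calling @check_template@ on the contents of @~/the_pipeline@ should return an empty list for the template shown above.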
Your new pipeline template will appear on the "Workbench %(rarr)&rarr;% Compute %(rarr)&rarr;% Pipeline templates":http://{{ site.arvados_workbench_host }}/pipeline_instances page. You can run the "pipeline using Workbench":tutorial-pipeline-workbench.html.