title: "Using Crunch to run external programs"

This tutorial demonstrates how to use Crunch to run an external program by writing a wrapper script with the Python SDK.

*This tutorial assumes that you are "logged into an Arvados VM instance":{{site.baseurl}}/user/getting_started/ssh-access.html#login, and have a "working environment.":{{site.baseurl}}/user/getting_started/check-environment.html*

In this tutorial, you will use the external program @md5sum@ to compute hashes instead of the built-in Python library used in earlier tutorials.

Start by entering the @crunch_scripts@ directory of your Git working tree:

notextile. <pre><code>~$ <span class="userinput">cd <b>you</b>/crunch_scripts</span></code></pre>

Next, using @nano@ or your favorite Unix text editor, create a new file called @run-md5sum.py@ in the @crunch_scripts@ directory.

notextile. <pre>~/<b>you</b>/crunch_scripts$ <code class="userinput">nano run-md5sum.py</code></pre>

Add the following code to use the @md5sum@ program to compute the hash of each file in a collection:

<notextile> {% code 'run_md5sum_py' as python %} </notextile>

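The included @run-md5sum.py@ script uses the Arvados Python SDK. To illustrate only the wrapper pattern itself, here is a minimal standard-library sketch (not the script above) that runs @md5sum@ once per file through @subprocess@ and parses its output. It assumes the collection's files are available as an ordinary local directory and that GNU @md5sum@ is on the @PATH@; the function names @md5sum_file@ and @md5sum_tree@ are hypothetical.

```python
import subprocess
from pathlib import Path


def md5sum_file(path):
    """Return the hex MD5 digest of one file, computed by the external md5sum program."""
    # check=True raises CalledProcessError if md5sum exits with a non-zero status.
    result = subprocess.run(
        ["md5sum", "--", str(path)],
        check=True,
        capture_output=True,
        text=True,
    )
    # md5sum prints the digest, two spaces, then the file name. GNU md5sum adds a
    # leading backslash when the name contains characters it has to escape.
    return result.stdout.split()[0].lstrip("\\")


def md5sum_tree(root):
    """Map each regular file under root (as a relative path) to its md5sum digest.

    Assumption: root is a plain local directory standing in for the collection;
    this sketch does not use the Arvados SDK.
    """
    root = Path(root)
    return {
        str(path.relative_to(root)): md5sum_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

Passing the command as a list avoids invoking a shell, so file names containing spaces need no quoting, and @check=True@ turns a failing @md5sum@ into a Python exception instead of silently producing empty output.
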
Make the file executable:

notextile. <pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">chmod +x run-md5sum.py</span></code></pre>

Next, use Git to stage the file, commit, and push:

<notextile>
<pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">git add run-md5sum.py</span>
~/<b>you</b>/crunch_scripts$ <span class="userinput">git commit -m "run external md5sum program"</span>
~/<b>you</b>/crunch_scripts$ <span class="userinput">git push origin master</span>
</code></pre>
</notextile>

You can now register a pipeline template that runs your new script with Crunch. In the template below, @"script"@ refers to the new @run-md5sum.py@ script:

<notextile>
<pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">cat >~/the_pipeline <<EOF
{
  "name":"Run external md5sum program",
  "components":{
    "do_hash":{
      "script":"run-md5sum.py",
      "script_parameters":{
        "input":{
          "required": true,
          "dataclass": "Collection"
        }
      },
      "repository":"$USER",
      "script_version":"master"
    }
  }
}
EOF
</span>~/<b>you</b>/crunch_scripts$ <span class="userinput">arv pipeline_template create --pipeline-template "$(cat ~/the_pipeline)"</span>
</code></pre>
</notextile>

(Because the @EOF@ delimiter is unquoted, your shell replaces @$USER@ with your login name when it writes the file. The saved JSON should therefore have @"repository"@ pointing at your personal Git repository.)

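If you would rather not rely on shell quoting and heredoc expansion, the same kind of template can be assembled in Python. The sketch below is hypothetical: @build_template@ and @create_command@ are illustrative helpers, and the component name @do_hash@ and parameter name @input@ are assumptions rather than names Crunch requires. It substitutes the user name explicitly, which is what the unquoted heredoc does for @$USER@, and only builds the @arv pipeline_template create@ argument list without running it.

```python
import json


def build_template(user):
    """Return the md5sum pipeline template as a dict, with repository set to user."""
    return {
        "name": "Run external md5sum program",
        "components": {
            # Hypothetical component and parameter names, for illustration only.
            "do_hash": {
                "script": "run-md5sum.py",
                "script_parameters": {
                    "input": {"required": True, "dataclass": "Collection"},
                },
                # The value the shell would have substituted for $USER.
                "repository": user,
                "script_version": "master",
            },
        },
    }


def create_command(user):
    """Return the arv command line, as an argument list, that registers the template."""
    return [
        "arv", "pipeline_template", "create",
        "--pipeline-template", json.dumps(build_template(user)),
    ]
```

You could then run it with @subprocess.run(create_command(getpass.getuser()), check=True)@. Passing a list means the serialized JSON reaches @arv@ as a single argument, with no shell quoting involved.
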
Your new pipeline template will appear on the Workbench "Compute %(rarr)→% Pipeline templates":https://{{ site.arvados_workbench_host }}/pipeline_instances page. You can then run the pipeline "using Workbench":tutorial-pipeline-workbench.html.