import hashlib
import arvados
-# Jobs consist of one of more tasks. A task is a single invocation of a crunch script.
+# Jobs consist of one or more tasks. A task is a single invocation of
+# a crunch script.
# Get the current task
this_task = arvados.current_task()
# The task sequence was not 0, so it must be a parallel worker task
# created by the first task
- # Instead of getting "input" from the "script_parameters" field of the job object,
- # we get it from the "parameters" field of the task object
+ # Instead of getting "input" from the "script_parameters" field of
+ # the job object, we get it from the "parameters" field of the
+ # task object
this_task_input = this_task['parameters']['input']
collection = arvados.CollectionReader(this_task_input)
out = arvados.CollectionWriter()
out.set_current_file_name("md5sum.txt")
- # There should only be one file in the collection, so get the first one.
- # collection.all_files() returns an iterator so we need to make in into a list
- # for indexed access.
+ # There should only be one file in the collection, so get the
+ # first one. collection.all_files() returns an iterator so we
+ # need to make it into a list for indexed access.
input_file = list(collection.all_files())[0]
# Everything after this is the same as the first tutorial.
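The md5 step itself is deferred to the first tutorial. As a minimal self-contained sketch of that step (the literal byte chunks below are a stand-in for iterating over `input_file.readall()`, which is what the crunch script would actually consume):

```python
import hashlib

def md5_of_chunks(chunks):
    """Compute the hex MD5 digest of an iterable of byte chunks,
    so the whole file never has to be held in memory at once."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()

# In the crunch script the chunks would come from input_file.readall();
# literal data is used here so the sketch runs on its own.
print(md5_of_chunks([b"hello ", b"world"]))
```

The resulting digest line would then be written to @out@, the CollectionWriter opened above, as in the first tutorial.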
notextile. <pre><code>$ <span class="userinput">man gittutorial</span></code></pre>
-or "click here search Google for git tutorials":http://google.com/#q=git+tutorial
+or "click here to search Google for git tutorials":http://google.com/#q=git+tutorial
{% include notebox-end.html %}
h2. Creating a Crunch script
h2. Using arv-crunch-job to run the job in your VM
-Instead of a git commit hash, we provide the path to the directory in the "script_version" parameter. The script specified in "script" will actually be searched for in the "crunch_scripts/" subdirectory of the directory specified "script_version". Although we are running the script locally, the script still requires access to the Arvados API server and Keep storage service, and the job will be recorded on then Arvados Workbench job history.
+Instead of a git commit hash, we provide the path to the directory in the "script_version" parameter. The script specified in "script" will actually be searched for in the "crunch_scripts/" subdirectory of the directory specified in "script_version". Although we are running the script locally, the script still requires access to the Arvados API server and Keep storage service. The job will be recorded in the Arvados job history, and visible in Workbench.
<notextile>
<pre><code>$ <span class="userinput">cat >the_job <<EOF
bc. 2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 0 stderr hello world
-The script's printout is captured in the log, which is useful for print statement debugging. However, although it the script returned a status code of 0 (success), the job failed. Why? For a job to complete successfully scripts must explicitly add their output to Keep, and then tell Arvados about it. Here is a second try:
+The script's output is captured in the log, which is useful for print statement debugging. However, although this script returned a status code of 0 (success), the job failed. Why? For a job to complete successfully, scripts must explicitly add their output to Keep and then tell Arvados about it. Here is a second try:
<notextile>
<pre><code>$ <span class="userinput">cat >hello-world.py <<EOF
h1. Tutorial: Search PGP data by trait
-Here you will use the Python SDK to find public WGS data for people who have a certain medical condition.
+Here you will use the Python SDK to find public WGS data for people who have reported a certain medical condition.
<!-- _Define WGS_ -->
<!-- _Explain the motivation in this example a little better. If I'm
reading this right, the workflow is
-traits -> people with those traits -> presense of a specific genetic
+traits -> people with those traits -> presence of a specific genetic
variant in the people with the reported traits_ -->
<!-- _Rather than having the user do this through the Python command line,