X-Git-Url: https://git.arvados.org/arvados.git/blobdiff_plain/7791c7e1b09341ce1fed131c6b11c91da8217c3f..30b02581c938c05b804f7510a1fc8e850830b9cd:/doc/user/topics/tutorial-parallel.html.textile.liquid?ds=sidebyside
diff --git a/doc/user/topics/tutorial-parallel.html.textile.liquid b/doc/user/topics/tutorial-parallel.html.textile.liquid
index 23b7cfbecd..021d736385 100644
--- a/doc/user/topics/tutorial-parallel.html.textile.liquid
+++ b/doc/user/topics/tutorial-parallel.html.textile.liquid
@@ -1,14 +1,10 @@
 ---
 layout: default
 navsection: userguide
-navmenu: Tutorials
 title: "Parallel Crunch tasks"
-
 ...
-h1. Parallel Crunch tasks
-
-In the tutorial "writing a crunch script,":tutorial-firstscript.html our script used a "for" loop to compute the md5 hashes for each file in sequence. This approach, while simple, is not able to take advantage of the compute cluster with multiple nodes and cores to speed up computation by running tasks in parallel. This tutorial will demonstrate how to create parallel Crunch tasks.
+In the previous tutorials, we used @arvados.job_setup.one_task_per_input_file()@ to automatically parallelize our jobs by creating a separate task per file. For some types of jobs, you may need to split the work up differently, for example creating tasks to process different segments of a single large file. This tutorial will demonstrate how to create Crunch tasks directly.
 Start by entering the @crunch_scripts@ directory of your git repository:
@@ -21,7 +17,7 @@ Next, using @nano@ or your favorite Unix text editor, create a new file called @
 notextile.
~/you/crunch_scripts$ nano parallel-hash.py
-Add the following code to compute the md5 hash of each file in a
+Add the following code to compute the md5 hash of each file in a collection:
~/you/crunch_scripts$ cat >~/the_job <<EOF
{
"script": "parallel-hash.py",
- "script_version": "you:master",
+ "repository": "$USER",
+ "script_version": "master",
"script_parameters":
{
"input": "887cd41e9c613463eab2f0d885c6dd96+83"
@@ -66,13 +63,13 @@ EOF
~/you/crunch_scripts$ arv keep get e2ccd204bca37c77c0ba59fc470cd0f7+162
-md5sum.txt
-md5sum.txt
-md5sum.txt
+~/you/crunch_scripts$ arv keep ls e2ccd204bca37c77c0ba59fc470cd0f7+162
+./md5sum.txt
~/you/crunch_scripts$ arv keep get e2ccd204bca37c77c0ba59fc470cd0f7+162/md5sum.txt
0f1d6bcf55c34bed7f92a805d2d89bbf alice.txt
504938460ef369cd275e4ef58994cffe bob.txt
@@ -80,9 +77,4 @@ md5sum.txt
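
Note on the hunks above: the body of @parallel-hash.py@ is not part of this diff, so the following is only a minimal local sketch of the idea the tutorial describes (one unit of work per file, each producing a line in the @md5sum.txt@ format shown above). It deliberately does not use the Arvados SDK: a thread pool stands in for Crunch tasks, and the names @md5_of_file@ and @hash_files_in_parallel@ are hypothetical helpers introduced for illustration.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of one file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_files_in_parallel(directory, max_workers=4):
    """Hash every regular file in `directory`, one worker call per file.

    Assumption: a local thread pool is used here as a stand-in for
    Crunch tasks; it is not how Arvados schedules work on a cluster.
    Returns lines in the "<md5> <filename>" form, sorted by filename.
    """
    names = sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )
    paths = [os.path.join(directory, name) for name in names]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so digests line up with names.
        digests = list(pool.map(md5_of_file, paths))
    return ["%s %s" % (digest, name) for digest, name in zip(digests, names)]
```

Calling @hash_files_in_parallel("some_directory")@ and joining the result with newlines gives text in the same shape as the @md5sum.txt@ output in the last hunk. Splitting a single large file into segments, as the new introduction mentions, would need a different partitioning step that this sketch does not attempt.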