X-Git-Url: https://git.arvados.org/arvados.git/blobdiff_plain/88b5b089f38a587292bb68772251b8275c0e27c7..77aeb6fdedab3d2aac25120e5a99317155c9f26e:/doc/user/topics/tutorial-parallel.html.textile.liquid
diff --git a/doc/user/topics/tutorial-parallel.html.textile.liquid b/doc/user/topics/tutorial-parallel.html.textile.liquid
index 2e4cf7832e..6d0058b5e9 100644
--- a/doc/user/topics/tutorial-parallel.html.textile.liquid
+++ b/doc/user/topics/tutorial-parallel.html.textile.liquid
@@ -1,59 +1,60 @@
---
layout: default
navsection: userguide
-title: "Parallel Crunch tasks"
+title: "Concurrent Crunch tasks"
...
-In the tutorial "writing a crunch script,":tutorial-firstscript.html our script used a "for" loop to compute the md5 hashes for each file in sequence. This approach, while simple, is not able to take advantage of the compute cluster with multiple nodes and cores to speed up computation by running tasks in parallel. This tutorial will demonstrate how to create parallel Crunch tasks.
+In the previous tutorials, we used @arvados.job_setup.one_task_per_input_file()@ to automatically create concurrent jobs by creating a separate task per file. For some types of jobs, you may need to split the work up differently, for example creating tasks to process different segments of a single large file. This tutorial will demonstrate how to create Crunch tasks directly.
-Start by entering the @crunch_scripts@ directory of your git repository:
+Start by entering the @crunch_scripts@ directory of your Git repository:
-~$ cd you/crunch_scripts
+
+~$ cd $USER/crunch_scripts
-~/you/crunch_scripts$ nano parallel-hash.py
+notextile. ~/$USER/crunch_scripts$ nano concurrent-hash.py
-Add the following code to compute the md5 hash of each file in a
+Add the following code to compute the MD5 hash of each file in a collection:
-~/you/crunch_scripts$ chmod +x parallel-hash.py
+notextile. ~/$USER/crunch_scripts$ chmod +x concurrent-hash.py
-Next, add the file to @git@ staging, commit and push:
+Add the file to the Git staging area, commit, and push:
-~/you/crunch_scripts$ git add parallel-hash.py
-~/you/crunch_scripts$ git commit -m"parallel hash"
-~/you/crunch_scripts$ git push origin master
+~/$USER/crunch_scripts$ git add concurrent-hash.py
+~/$USER/crunch_scripts$ git commit -m"concurrent hash"
+~/$USER/crunch_scripts$ git push origin master
-~/you/crunch_scripts$ cat >~/the_job <<EOF
+~/$USER/crunch_scripts$ cat >~/the_job <<EOF
{
- "script": "parallel-hash.py",
- "script_version": "you:master",
+ "script": "concurrent-hash.py",
+ "repository": "$USER/$USER",
+ "script_version": "master",
"script_parameters":
{
"input": "887cd41e9c613463eab2f0d885c6dd96+83"
}
}
EOF
-~/you/crunch_scripts$ arv job create --job "$(cat ~/the_job)"
+~/$USER/crunch_scripts$ arv job create --job "$(cat ~/the_job)"
{
...
"uuid":"qr1hi-xxxxx-xxxxxxxxxxxxxxx"
...
}
-~/you/crunch_scripts$ arv job get --uuid qr1hi-xxxxx-xxxxxxxxxxxxxxx
+~/$USER/crunch_scripts$ arv job get --uuid qr1hi-xxxxx-xxxxxxxxxxxxxxx
{
...
"output":"e2ccd204bca37c77c0ba59fc470cd0f7+162",
@@ -62,23 +63,16 @@ EOF
-~/you/crunch_scripts$ arv keep get e2ccd204bca37c77c0ba59fc470cd0f7+162
-md5sum.txt
-md5sum.txt
-md5sum.txt
-~/you/crunch_scripts$ arv keep get e2ccd204bca37c77c0ba59fc470cd0f7+162/md5sum.txt
+~/$USER/crunch_scripts$ arv keep ls e2ccd204bca37c77c0ba59fc470cd0f7+162
+./md5sum.txt
+~/$USER/crunch_scripts$ arv keep get e2ccd204bca37c77c0ba59fc470cd0f7+162/md5sum.txt
0f1d6bcf55c34bed7f92a805d2d89bbf alice.txt
504938460ef369cd275e4ef58994cffe bob.txt
8f3b36aff310e06f3c5b9e95678ff77a carol.txt
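
The @md5sum.txt@ output above has one digest line per input file. The script body of @concurrent-hash.py@ is not part of this diff, so the following is only a rough, standard-library sketch of the same idea (one unit of work per file, results gathered into md5sum-style lines). It does not use the Arvados SDK or Crunch tasks; the function names @md5_of_file@ and @hash_files_concurrently@ and the sample file contents are assumptions made for illustration, so the digests it prints will not match the ones shown above.

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_files_concurrently(paths, max_workers=4):
    """Hash every file as its own unit of work (hypothetical helper).

    Returns "digest filename" lines, ordered by path, so the result is
    deterministic regardless of which worker finishes first.
    """
    paths = sorted(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even though the work runs concurrently.
        digests = list(pool.map(md5_of_file, paths))
    return ["%s %s" % (d, os.path.basename(p)) for d, p in zip(digests, paths)]


if __name__ == "__main__":
    # Made-up sample inputs; a real job would read files from its input collection.
    with tempfile.TemporaryDirectory() as tmp:
        sample_paths = []
        for name in ("alice.txt", "bob.txt", "carol.txt"):
            path = os.path.join(tmp, name)
            with open(path, "w") as f:
                f.write("contents of %s\n" % name)
            sample_paths.append(path)
        print("\n".join(hash_files_concurrently(sample_paths)))
```

Threads are used here only to keep the sketch short. In a Crunch job the concurrency comes from the scheduler running separate tasks, possibly on different nodes, rather than from threads inside one process.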