-To test changes to a script by running a job, the change must be pushed into @git@, the job queued asynchronously, and the actual execution may be run on any compute server. As a result, debugging a script can be difficult and time consuming. This tutorial demonstrates using @arv-crunch-job@ to run your job in your local VM. This avoids the job queue and allows you to execute the script from your uncomitted git tree.
+To test changes to a script by running a job, the change must be pushed to your hosted repository, and the job might have to wait in the queue before it runs. This cycle can be an inefficient way to develop and debug scripts. This tutorial demonstrates an alternative: using @arv-crunch-job@ to run your job in your local VM. This avoids the job queue and allows you to execute the script directly from your git working tree without committing or pushing.
-Instead of a git commit hash, we provide the path to the directory in the "script_version" parameter. The script specified in "script" will actually be searched for in the "crunch_scripts/" subdirectory of the directory specified "script_version". Although we are running the script locally, the script still requires access to the Arvados API server and Keep storage service. The job will be recorded in the Arvados job history, and visible in Workbench.
+Instead of a Git commit hash, we provide the path to the directory in the "script_version" parameter. The script specified in "script" is expected to be in the @crunch_scripts/@ subdirectory of the directory specified "script_version". Although we are running the script locally, the script still requires access to the Arvados API server and Keep storage service. The job will be recorded in the Arvados job history, and visible in Workbench.
2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 check slurm allocation
2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 node localhost - 1 slots
2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 start
2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 script hello-world.py
2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 check slurm allocation
2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 node localhost - 1 slots
2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 start
2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 script hello-world.py
2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 script_parameters {}
2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 runtime_constraints {"max_tasks_per_node":0}
2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 start level 0
2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 script_parameters {}
2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 runtime_constraints {"max_tasks_per_node":0}
2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 start level 0
2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 0 stderr hello world
2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827 0 child 29834 on localhost.1 exit 0 signal 0 success=
2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827 0 failure (#1, permanent) after 0 seconds
2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 0 stderr hello world
2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827 0 child 29834 on localhost.1 exit 0 signal 0 success=
2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827 0 failure (#1, permanent) after 0 seconds
2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827 Every node has failed -- giving up on this round
2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827 wait for last 0 children to finish
2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827 status: 0 done, 0 running, 0 todo
2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827 Every node has failed -- giving up on this round
2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827 wait for last 0 children to finish
2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827 status: 0 done, 0 running, 0 todo
The script's output is captured in the log, which is useful for print statement debugging. However, although this script returned a status code of 0 (success), the job failed. Why? For a job to complete successfully scripts must explicitly add their output to Keep, and then tell Arvados about it. Here is a second try:
<notextile>
The script's output is captured in the log, which is useful for print statement debugging. However, although this script returned a status code of 0 (success), the job failed. Why? For a job to complete successfully scripts must explicitly add their output to Keep, and then tell Arvados about it. Here is a second try:
<notextile>
~/<b>you</b>/crunch_scripts$ <span class="userinput">chmod +x hello-world-fixed.py</span>
~/<b>you</b>/crunch_scripts$ <span class="userinput">cat >~/the_job <<EOF
{
~/<b>you</b>/crunch_scripts$ <span class="userinput">chmod +x hello-world-fixed.py</span>
~/<b>you</b>/crunch_scripts$ <span class="userinput">cat >~/the_job <<EOF
{
2013-12-12_21:56:59 qr1hi-8i9sb-79260ykfew5trzl 31578 check slurm allocation
2013-12-12_21:56:59 qr1hi-8i9sb-79260ykfew5trzl 31578 node localhost - 1 slots
2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578 start
2013-12-12_21:56:59 qr1hi-8i9sb-79260ykfew5trzl 31578 check slurm allocation
2013-12-12_21:56:59 qr1hi-8i9sb-79260ykfew5trzl 31578 node localhost - 1 slots
2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578 start
-2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578 script hello-world.py
-2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578 script_version /home/you/you
+2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578 script hello-world-fixed.py
+2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578 script_version /home/<b>you</b>/<b>you</b>
2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578 script_parameters {}
2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578 runtime_constraints {"max_tasks_per_node":0}
2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578 start level 0
2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578 script_parameters {}
2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578 runtime_constraints {"max_tasks_per_node":0}
2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578 start level 0
2013-12-12_21:57:02 qr1hi-8i9sb-79260ykfew5trzl 31578 Freeze not implemented
2013-12-12_21:57:02 qr1hi-8i9sb-79260ykfew5trzl 31578 collate
2013-12-12_21:57:02 qr1hi-8i9sb-79260ykfew5trzl 31578 output 576c44d762ba241b0a674aa43152b52a+53
2013-12-12_21:57:02 qr1hi-8i9sb-79260ykfew5trzl 31578 Freeze not implemented
2013-12-12_21:57:02 qr1hi-8i9sb-79260ykfew5trzl 31578 collate
2013-12-12_21:57:02 qr1hi-8i9sb-79260ykfew5trzl 31578 output 576c44d762ba241b0a674aa43152b52a+53