navsection: installguide
title: Test Slurm dispatch

Copyright (C) The Arvados Authors. All rights reserved.

SPDX-License-Identifier: CC-BY-SA-3.0
{% include 'notebox_begin_warning' %}
@crunch-dispatch-slurm@ is only relevant for on-premises clusters that will spool jobs to Slurm. Skip this section if you use LSF or if you are installing a cloud cluster.
{% include 'notebox_end' %}

h2. Test compute node setup

You should now be able to submit Slurm jobs that run in Docker containers. On the node where you're running the dispatcher, you can test this by running:

<notextile>
<pre><code>~$ <span class="userinput">sudo -u <b>crunch</b> srun -N1 docker run busybox echo OK</span>
</code></pre>
</notextile>

If it works, this command prints @OK@ (it may also show some status messages from Slurm and/or Docker). If it does not print @OK@, double-check your compute node setup, and confirm that the @crunch@ user can submit Slurm jobs.
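If you want to script this check, the following is a minimal sketch rather than part of Arvados. It runs the same command and looks for a line that is exactly @OK@, since Slurm and Docker may print status messages around it. The names @SMOKE_TEST_CMD@, @printed_ok@ and @run_smoke_test@, and the 300-second timeout, are assumptions made for this example.

```python
import subprocess

# The command from the step above; "crunch" is the account used in this guide.
SMOKE_TEST_CMD = ["sudo", "-u", "crunch", "srun", "-N1",
                  "docker", "run", "busybox", "echo", "OK"]


def printed_ok(output):
    """Return True if any line of the output is exactly "OK".

    Status messages from Slurm or Docker may surround it, so each line
    is checked separately instead of comparing the whole output.
    """
    return any(line.strip() == "OK" for line in output.splitlines())


def run_smoke_test(timeout_seconds=300):
    """Run the smoke test and report whether it exited 0 and printed OK.

    The timeout is an arbitrary value chosen for this sketch.
    """
    result = subprocess.run(SMOKE_TEST_CMD, capture_output=True, text=True,
                            timeout=timeout_seconds)
    return result.returncode == 0 and printed_ok(result.stdout)
```

Calling @run_smoke_test()@ on the dispatch node returns @True@ only when the job ran and printed @OK@. A @False@ result points back to the compute node setup or to the @crunch@ user's Slurm permissions.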
h2. Test the dispatcher

Make sure all of your compute nodes are set up with "Docker":../crunch2/install-compute-node-docker.html or "Singularity":../crunch2/install-compute-node-singularity.html.

On the dispatch node, start monitoring the @crunch-dispatch-slurm@ logs:

<notextile>
<pre><code># <span class="userinput">journalctl -o cat -fu crunch-dispatch-slurm.service</span>
</code></pre>
</notextile>

In another terminal window, use the diagnostics tool to run a simple container:

<notextile>
<pre><code># <span class="userinput">arvados-client sudo diagnostics</span>
INFO 5: running health check (same as `arvados-server check`)
INFO 10: getting discovery document from https://zzzzz.arvadosapi.com/discovery/v1/apis/arvados/v1/rest
...
INFO 160: running a container
INFO ... container request submitted, waiting up to 10m for container to run
</code></pre>
</notextile>

The next time @crunch-dispatch-slurm@ polls the API server for new containers, it should dispatch the new container and log messages like:

<notextile>
<pre><code>2016/08/05 13:52:54 Monitoring container zzzzz-dz642-hdp2vpu9nq14tx0 started
2016/08/05 13:53:04 About to submit queued container zzzzz-dz642-hdp2vpu9nq14tx0
2016/08/05 13:53:04 sbatch succeeded: Submitted batch job 8102
</code></pre>
</notextile>
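The @sbatch succeeded@ line does not repeat the container UUID, so matching a container to its Slurm job ID means pairing that line with the preceding @About to submit@ line. The sketch below does this for log text in the format shown above. The function name @map_containers_to_jobs@ is hypothetical, and the pairing assumes the two messages appear back to back for each container, which may not hold on a busy dispatcher.

```python
import re

# Patterns taken from the sample dispatcher log lines in this guide.
SUBMIT_RE = re.compile(r"About to submit queued container (\S+)")
SBATCH_RE = re.compile(r"sbatch succeeded: Submitted batch job (\d+)")


def map_containers_to_jobs(log_lines):
    """Pair each "About to submit" line with the next "sbatch succeeded" line.

    Returns a dict mapping container UUID to Slurm job ID (as a string).
    """
    jobs = {}
    pending_uuid = None
    for line in log_lines:
        submitted = SUBMIT_RE.search(line)
        if submitted:
            pending_uuid = submitted.group(1)
            continue
        succeeded = SBATCH_RE.search(line)
        if succeeded and pending_uuid is not None:
            jobs[pending_uuid] = succeeded.group(1)
            pending_uuid = None
    return jobs
```

Fed the three sample lines above, this returns a single entry that maps @zzzzz-dz642-hdp2vpu9nq14tx0@ to job @8102@.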
Before the container finishes, Slurm's @squeue@ command will show the new job in the list of queued and running jobs. For example, you might see:

<notextile>
<pre><code>~$ <span class="userinput">squeue --long</span>
Fri Aug  5 13:57:50 2016
JOBID PARTITION     NAME   USER   STATE  TIME TIMELIMIT NODES NODELIST(REASON)
 8103   compute zzzzz-dz crunch RUNNING  1:56 UNLIMITED     1 compute0
</code></pre>
</notextile>

The job's name corresponds to the container's UUID (the @NAME@ column in this listing is truncated). You can get more information about the job by running, e.g., <notextile><code>scontrol show job Name=<b>UUID</b></code></notextile>.
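Because the job name is the container UUID, you can also ask @squeue@ for that one job by name instead of scanning the full listing. The sketch below builds such a command line with the standard @squeue@ options @--noheader@, @--name@ and @--format@ (@%i@ is the job ID, @%T@ the job state) and parses the result. The helper names and the UUID pattern, which is modeled on the UUIDs shown in this guide, are assumptions made for illustration.

```python
import re

# Shape of the container UUIDs shown in this guide, e.g.
# zzzzz-dz642-hdp2vpu9nq14tx0 (assumed here: 5 + "dz642" + 15 characters).
CONTAINER_UUID_RE = re.compile(r"^[a-z0-9]{5}-dz642-[a-z0-9]{15}$")


def squeue_args(container_uuid):
    """Build a squeue command line that lists jobs named after one container."""
    if not CONTAINER_UUID_RE.match(container_uuid):
        raise ValueError("not a container UUID: %r" % container_uuid)
    return ["squeue", "--noheader",
            "--name=" + container_uuid,
            "--format=%i %T"]


def parse_squeue_output(output):
    """Turn "JOBID STATE" lines into a list of (job_id, state) tuples.

    Job IDs stay strings because Slurm IDs are not always plain integers.
    """
    jobs = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 2:
            jobs.append((fields[0], fields[1]))
    return jobs
```

You could pass the list from @squeue_args@ to @subprocess.run@ and feed its standard output to @parse_squeue_output@. An empty list means Slurm no longer has a queued or running job with that name.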
When the container finishes, the dispatcher logs its final state:

<notextile>
<pre><code>2016/08/05 13:53:14 Container zzzzz-dz642-hdp2vpu9nq14tx0 now in state "Complete" with locked_by_uuid ""
2016/08/05 13:53:14 Monitoring container zzzzz-dz642-hdp2vpu9nq14tx0 finished
</code></pre>
</notextile>
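To pull the outcome out of a longer log, you can look for the last reported state of a given container and the time between its @started@ and @finished@ monitoring messages. This is a rough sketch that assumes the timestamp and message formats shown in the samples above. @summarize@ is a hypothetical helper, not an Arvados tool.

```python
import re
from datetime import datetime

# Formats assumed from the sample log lines in this guide.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (?P<msg>.*)$")
STATE_RE = re.compile(
    r'^Container (?P<uuid>\S+) now in state "(?P<state>[^"]+)"')
MONITOR_RE = re.compile(
    r"^Monitoring container (?P<uuid>\S+) (?P<event>started|finished)$")


def summarize(log_lines, container_uuid):
    """Return (last state seen, seconds between started and finished).

    Either element is None if the log does not contain that information.
    """
    last_state = None
    times = {}
    for line in log_lines:
        parsed = LINE_RE.match(line.strip())
        if not parsed:
            continue
        when = datetime.strptime(parsed.group("ts"), "%Y/%m/%d %H:%M:%S")
        msg = parsed.group("msg")
        state = STATE_RE.match(msg)
        if state and state.group("uuid") == container_uuid:
            last_state = state.group("state")
        event = MONITOR_RE.match(msg)
        if event and event.group("uuid") == container_uuid:
            times[event.group("event")] = when
    seconds = None
    if "started" in times and "finished" in times:
        seconds = (times["finished"] - times["started"]).total_seconds()
    return last_state, seconds
```

With the sample lines in this section (monitoring started at 13:52:54 and finished at 13:53:14), @summarize@ returns @("Complete", 20.0)@. The @Complete@ state only says the container ran to completion, so check the container record for the actual results.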
After the container finishes, you can get the container record by UUID *from a shell server* to see its results:

<notextile>
<pre><code>shell:~$ <span class="userinput">arv get <b>zzzzz-dz642-hdp2vpu9nq14tx0</b></span>
...
 "log":"a01df2f7e5bc1c2ad59c60a837e90dc6+166",
 "output":"d41d8cd98f00b204e9800998ecf8427e+0",
...
</code></pre>
</notextile>
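The @log@ and @output@ values have the form @hash+size@. The sketch below is illustrative only. It assumes, based on a general understanding of Arvados rather than anything stated on this page, that the hash is the MD5 of the collection's manifest text and the number is the manifest's length in bytes. Under that assumption, the @output@ value shown above is what an empty manifest produces, which is consistent with a container that wrote no output files. The function names are hypothetical.

```python
import hashlib
import re

# "<32 hex digits>+<decimal size>", as in the log and output fields above.
LOCATOR_RE = re.compile(r"^([0-9a-f]{32})\+(\d+)$")


def parse_locator(locator):
    """Split "<md5 hex>+<size>" into (md5 hex, size as int)."""
    match = LOCATOR_RE.match(locator)
    if not match:
        raise ValueError("unexpected locator format: %r" % locator)
    return match.group(1), int(match.group(2))


def hash_plus_size(manifest_text):
    """Build "<md5 of the text>+<byte length>" for the given manifest text.

    Assumption: this mirrors how the values in the container record are
    derived; treat it as an illustration, not a specification.
    """
    data = manifest_text.encode("utf-8")
    return "%s+%d" % (hashlib.md5(data).hexdigest(), len(data))
```

For example, @hash_plus_size("")@ yields @d41d8cd98f00b204e9800998ecf8427e+0@, matching the @output@ field above, and @parse_locator@ splits the @log@ value into its hash and the size @166@.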
The @log@ and @output@ fields each reference a collection, which you can inspect with standard Keep tools. For example, to list the files in the collection referenced in the @log@ field and print one of them:

<notextile>
<pre><code>~$ <span class="userinput">arv keep ls <b>a01df2f7e5bc1c2ad59c60a837e90dc6+166</b></span>
~$ <span class="userinput">arv-get <b>a01df2f7e5bc1c2ad59c60a837e90dc6+166</b>/stdout.txt</span>
2016-08-05T13:53:06.201011Z Hello, Crunch!
</code></pre>
</notextile>

If the container does not dispatch successfully, refer to the @crunch-dispatch-slurm@ logs for information about why it failed.