git.arvados.org - arvados.git/commit
13078: Fix jobs stuck in "held" state in old SLURM versions.
author Tom Clegg <tclegg@veritasgenetics.com>
Wed, 7 Mar 2018 14:19:40 +0000 (09:19 -0500)
committer Tom Clegg <tclegg@veritasgenetics.com>
Wed, 7 Mar 2018 16:38:17 +0000 (11:38 -0500)
commit e1f97dcf68d197525228781e2a861cc3e64a0231
tree b248677345f9d99f8f460797352028ba7b93a0fc
parent f7d0830ae819e2a62115642be449fa79f2fc8152

In SLURM 14 and 15, if a queued job has feature constraints which
become invalid (e.g., when "scontrol reconfigure" clears all node
features), the job is put on hold with priority=0, and it stays in
this state even after the features reappear. "scontrol release"
recovers a job from this state.

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tclegg@veritasgenetics.com>
services/crunch-dispatch-slurm/crunch-dispatch-slurm_test.go
services/crunch-dispatch-slurm/slurm.go
services/crunch-dispatch-slurm/squeue.go
services/crunch-dispatch-slurm/squeue_test.go