20457: Include delayed supervisor containers in overquota metric. (branch: 20457-max-supervisors-overquota)
author Tom Clegg <tom@curii.com>
Tue, 2 May 2023 21:16:05 +0000 (17:16 -0400)
committer Tom Clegg <tom@curii.com>
Tue, 2 May 2023 21:16:05 +0000 (17:16 -0400)
commit afe02764a9209c0c4b0ec75df52b4851ec8ce01f
tree eae27e2e873d8c33d0ab4bd345b39661d0e80ddd
parent d33c63515b46bd5d9ad4dc07efc734743b7d530b

Previously, supervisor containers that had high enough priority to
run, but weren't scheduled because of SupervisorFraction, were not
counted in the containers_not_allocated_over_quota metric. This caused
the metric to show a misleading time series: as non-supervisor
containers made their way through the queue, the delayed supervisor
containers flapped between "not allocated because quota" (counted) and
"not allocated because SupervisorFraction" (not counted).

With this change, un-mappable supervisors always count toward the
containers_not_allocated_over_quota metric.
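
The counting change can be illustrated with a minimal, self-contained sketch.
It is not the code in run_queue.go: the container type, the runQueue function,
the capacity parameter, and both metric helpers are hypothetical names chosen
for illustration, and maxSupervisors stands in for whatever limit
SupervisorFraction yields.

```go
package main

import "fmt"

// container is a hypothetical, simplified stand-in for a queue entry.
type container struct {
	UUID       string
	Supervisor bool
}

// result groups the outcome of one simplified scheduling pass.
type result struct {
	scheduled    []container
	overQuota    []container // not allocated because no capacity is left
	overMaxSuper []container // not allocated because of the supervisor limit
}

// runQueue walks the queue in priority order. capacity is the number of
// containers that can be started; maxSupervisors is an assumed limit on
// supervisors (0 means no limit).
func runQueue(sorted []container, capacity, maxSupervisors int) result {
	var res result
	supervisors := 0
	for i, ctr := range sorted {
		if ctr.Supervisor {
			supervisors++
			if maxSupervisors > 0 && supervisors > maxSupervisors {
				// Delayed by the supervisor limit, regardless of quota.
				res.overMaxSuper = append(res.overMaxSuper, ctr)
				continue
			}
		}
		if len(res.scheduled) >= capacity {
			// Out of capacity: everything from here on is over quota.
			res.overQuota = append(res.overQuota, sorted[i:]...)
			break
		}
		res.scheduled = append(res.scheduled, ctr)
	}
	return res
}

// overQuotaMetricBefore models the old behavior: delayed supervisors are ignored.
func overQuotaMetricBefore(res result) int { return len(res.overQuota) }

// overQuotaMetricAfter models the new behavior: delayed supervisors always count.
func overQuotaMetricAfter(res result) int {
	return len(res.overQuota) + len(res.overMaxSuper)
}

func main() {
	queue := []container{
		{UUID: "sup1", Supervisor: true},
		{UUID: "w1", Supervisor: false},
		{UUID: "sup2", Supervisor: true},
		{UUID: "sup3", Supervisor: true},
	}
	for _, capacity := range []int{1, 2} {
		res := runQueue(queue, capacity, 1)
		fmt.Printf("capacity=%d before=%d after=%d\n",
			capacity, overQuotaMetricBefore(res), overQuotaMetricAfter(res))
	}
}
```

In this sketch, raising capacity from 1 to 2 drops the old-style count from 3
to 0 even though two supervisors are still waiting, while the new-style count
goes from 3 to 2.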

This also applies the "unlock if previously locked but now delayed due
to SupervisorFraction" logic to supervisor processes, which was
previously overlooked. This prevents supervisors from staying in
Locked state after being bumped by higher-priority containers.
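
The unlock step can be sketched the same way. This is an illustration under
stated assumptions, not the scheduler's implementation: the state constants,
the container type, and unlockDelayed are hypothetical, and a real scheduler
would request the state change through the API rather than mutate a struct.

```go
package main

import "fmt"

// state is a hypothetical, simplified container state.
type state string

const (
	Queued state = "Queued"
	Locked state = "Locked"
)

// container is a hypothetical stand-in for a queue entry.
type container struct {
	UUID  string
	State state
}

// unlockDelayed returns every Locked container in either delayed group to
// Queued and reports how many were unlocked. Passing the supervisor-limited
// group alongside the over-quota group is the point of the sketch: both
// kinds of delayed container get the same treatment.
func unlockDelayed(overQuota, overMaxSuper []*container) int {
	unlocked := 0
	for _, group := range [][]*container{overMaxSuper, overQuota} {
		for _, ctr := range group {
			if ctr.State == Locked {
				ctr.State = Queued
				unlocked++
			}
		}
	}
	return unlocked
}

func main() {
	bumped := &container{UUID: "sup2", State: Locked}  // locked earlier, now delayed
	waiting := &container{UUID: "sup3", State: Queued} // never locked
	worker := &container{UUID: "w9", State: Locked}    // locked earlier, now over quota

	n := unlockDelayed([]*container{worker}, []*container{bumped, waiting})
	fmt.Println("unlocked:", n, bumped.State, waiting.State, worker.State)
}
```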

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
lib/dispatchcloud/scheduler/run_queue.go