From 0888c5335f4e1868cdcebda6b6f76c2138978c6b Mon Sep 17 00:00:00 2001
From: Tom Clegg
Date: Tue, 19 Sep 2017 22:14:37 -0400
Subject: [PATCH] 12084: Fix dispatcher getting bogged down on "too many open files".

Previously, handling the "too many open files" error from popen3
consisted of sleeping 1 second and trying the next job in the queue.
When lots of jobs were queued, this meant getting stuck in
start_jobs() for a long time, futilely trying to start different
jobs, which all failed for the same reason. Worse, being stuck here
meant none of the working jobs could finish, which meant no file
descriptors could be freed.

Arvados-DCO-1.1-Signed-off-by: Tom Clegg
---
 services/api/lib/crunch_dispatch.rb | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/services/api/lib/crunch_dispatch.rb b/services/api/lib/crunch_dispatch.rb
index ca6e28ba19..3cabc1e3ce 100644
--- a/services/api/lib/crunch_dispatch.rb
+++ b/services/api/lib/crunch_dispatch.rb
@@ -429,8 +429,11 @@ class CrunchDispatch
           i, o, e, t = Open3.popen3(*cmd_args)
         rescue
           $stderr.puts "dispatch: popen3: #{$!}"
-          sleep 1
-          next
+          # This is a dispatch problem like "Too many open files";
+          # retrying another job right away would be futile. Just return
+          # and hope things are better next time, after (at least) a
+          # did_recently() delay.
+          return
         end
 
         $stderr.puts "dispatch: job #{job.uuid}"
-- 
2.39.5
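The fix hinges on the difference between `next` and `return` inside the job loop. The following is a minimal, self-contained Ruby sketch of that difference, not the real CrunchDispatch#start_jobs: the `start_jobs` method, its `bail_on_error:` keyword, and the `spawner` lambda (a stand-in for Open3.popen3) are all hypothetical. It counts how many spawn attempts are made when every spawn fails with Errno::EMFILE ("Too many open files").

```ruby
# Hypothetical, simplified dispatch loop (not Arvados code).
# `spawner` stands in for Open3.popen3 and raises when a job cannot start.
# Returns [started_jobs, spawn_attempts].
def start_jobs(queue, spawner, bail_on_error:)
  started = []
  attempts = 0
  queue.each do |job|
    begin
      attempts += 1
      spawner.call(job)
    rescue StandardError => e
      warn "dispatch: popen3: #{e}"
      # Old behavior: move on to the next job (the real code also slept 1s).
      next unless bail_on_error
      # New behavior: a dispatcher-wide problem such as EMFILE would hit
      # every job, so leave the loop and let a later dispatch cycle retry.
      return [started, attempts]
    end
    started << job
  end
  [started, attempts]
end

# Simulate a dispatcher that has run out of file descriptors.
emfile = ->(_job) { raise Errno::EMFILE }
queue = %w[job-1 job-2 job-3 job-4 job-5]

p start_jobs(queue, emfile, bail_on_error: false) # prints [[], 5]
p start_jobs(queue, emfile, bail_on_error: true)  # prints [[], 1]
```

With a queue of N jobs, the old behavior makes N futile attempts (plus N one-second sleeps in the real code) before the loop can return and let running jobs be reaped; the new behavior gives up after the first failure.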