17301: Report warning about OOM killer when exit code 137
author Peter Amstutz <peter.amstutz@curii.com>
Tue, 19 Apr 2022 19:40:56 +0000 (15:40 -0400)
committer Peter Amstutz <peter.amstutz@curii.com>
Tue, 19 Apr 2022 19:41:13 +0000 (15:41 -0400)
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

sdk/cwl/arvados_cwl/arvcontainer.py

index e2c2f2e67bfef0b7b1bcd0f09f8cd1d212639f78..c85443a23af10cbe88bdb7bba1f51978d431efcb 100644
@@ -392,6 +392,10 @@ class ArvadosContainer(JobBase):
                     processStatus = "success"
                 else:
                     processStatus = "permanentFail"
+
+                if rcode == 137:
+                    logger.warning("%s job was killed on the compute instance.  The most common reason is that it attempted to allocate too much RAM and was targeted by the Out Of Memory (OOM) killer.  Try resubmitting with a higher 'ramMin'.",
+                                 self.arvrunner.label(self))
             else:
                 processStatus = "permanentFail"
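
Note on the check above: by common shell and container-runtime convention, a process terminated by signal N is reported with exit code 128 + N, so 137 corresponds to signal 9 (SIGKILL), which is the signal the Linux OOM killer sends. The following is a minimal standalone sketch of that mapping, not part of this commit; the helper name `describe_exit_code` is hypothetical and it assumes a POSIX platform where SIGKILL is defined.

```python
import signal


def describe_exit_code(rcode):
    """Return the signal name implied by a 128+N exit code, or None.

    Hypothetical helper for illustration only; it is not part of
    arvados_cwl. It assumes the 128+N convention applies to rcode.
    """
    if rcode > 128:
        try:
            # signal.Signals raises ValueError for numbers that are
            # not valid signals on this platform.
            return signal.Signals(rcode - 128).name
        except ValueError:
            return None
    return None


if __name__ == "__main__":
    # On Linux, 137 - 128 = 9, which is SIGKILL.
    print(describe_exit_code(137))
```

A SIGKILL can also come from other sources (for example an operator or a runtime timeout), which is consistent with the warning text calling the OOM killer only "the most common reason".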