radhika [Tue, 12 Apr 2016 19:29:58 +0000 (15:29 -0400)]
Merge branch 'master' into 8724-keep-block-check-script
Peter Amstutz [Tue, 12 Apr 2016 17:45:08 +0000 (13:45 -0400)]
Rename "rebuild" back to "reboot" and change "rebuild" to mean "build
--no-cache". "arvbox start" no longer fails if the container is already
running. Update docs. no issue #
Brett Smith [Tue, 12 Apr 2016 15:36:31 +0000 (11:36 -0400)]
4083: crunchstat-summary imports _strptime.
Refs #4083 for rationale.
Refs #8933 where this was reported.
See also d9014288.
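The rationale is recorded in #4083 and is not reproduced here. Assuming it is the widely reported CPython race, where the lazy import inside time.strptime() can fail when the first calls arrive concurrently from several threads, the workaround looks roughly like this minimal Python sketch. parse_all() is a hypothetical helper used only for illustration.

```python
# Import _strptime eagerly, at module load time, so that time.strptime()
# never has to perform its lazy import while several threads race to call it.
import _strptime  # noqa: F401  (imported for its side effect only)
import threading
import time


def parse_all(stamps, fmt="%Y-%m-%dT%H:%M:%S"):
    """Hypothetical helper: parse timestamps concurrently, one thread each."""
    results = [None] * len(stamps)

    def worker(index, stamp):
        results[index] = time.strptime(stamp, fmt)

    threads = [threading.Thread(target=worker, args=(i, s))
               for i, s in enumerate(stamps)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```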
Brett Smith [Tue, 12 Apr 2016 14:51:28 +0000 (10:51 -0400)]
Merge branch '8912-node-manager-patch-nodes-wip'
Closes #8913, #8923. (The branch name has a typo.)
Brett Smith [Fri, 8 Apr 2016 22:56:44 +0000 (18:56 -0400)]
8912: Node Manager search_for_now uses overridden methods.
This wasn't possible in the original implementation because of the way
we used to proxy methods to self.real. Now that we proxy them
transparently, we can call methods on the Node Manager driver, and let
them be proxied to the underlying libcloud driver if needed.
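The transparent proxying described above can be sketched with Python's `__getattr__`, which runs only when normal attribute lookup fails. NodeManagerDriver and LibcloudLikeDriver are hypothetical stand-ins, and the real Node Manager driver may be structured differently; the point is that search_for_now looks its list method up on `self`, so an override on the wrapper wins and anything else falls through to the underlying driver.

```python
class LibcloudLikeDriver:
    """Hypothetical stand-in for the underlying libcloud driver."""

    def list_nodes(self):
        return ["compute0", "compute1"]

    def list_sizes(self):
        return ["small", "large"]


class NodeManagerDriver:
    """Wraps a real driver; methods defined here override the real ones."""

    def __init__(self, real):
        self.real = real

    def __getattr__(self, name):
        # Called only when normal lookup fails, so anything defined on this
        # class wins and everything else is forwarded to self.real.
        return getattr(self.real, name)

    def list_nodes(self):
        # Example override: decorate the names returned by the real driver.
        return [name + ".example" for name in self.real.list_nodes()]

    def search_for_now(self, term, list_method):
        # Look the list method up on self, not self.real, so overrides apply.
        matches = [item for item in getattr(self, list_method)() if term in item]
        if not matches:
            raise ValueError("{!r} not found".format(term))
        return matches[0]
```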
radhika [Tue, 12 Apr 2016 14:17:24 +0000 (10:17 -0400)]
8724: some more cleanup of tests.
Peter Amstutz [Mon, 11 Apr 2016 20:57:17 +0000 (16:57 -0400)]
Add args.ignore_docker_for_reuse=False to cwl-runner crunch script. refs #8857
radhika [Tue, 12 Apr 2016 03:37:22 +0000 (23:37 -0400)]
Merge branch 'master' into 8724-keep-block-check-script
radhika [Tue, 12 Apr 2016 03:36:11 +0000 (23:36 -0400)]
8724: test updates
Peter Amstutz [Mon, 11 Apr 2016 20:22:17 +0000 (16:22 -0400)]
Add --help to test_with_arvbox.sh, no issue #
Peter Amstutz [Mon, 11 Apr 2016 20:19:11 +0000 (16:19 -0400)]
Move Go and Python dependencies into their own directories, which "reset"
does not delete because they hold code downloaded from external
sources. Add -data-manager-token to the keepstore invocation for datamanager
testing.
radhika [Mon, 11 Apr 2016 14:02:33 +0000 (10:02 -0400)]
closes #7658
Merge branch '7658-websockets-reconnect-on-close'
radhika [Mon, 11 Apr 2016 14:02:15 +0000 (10:02 -0400)]
Merge branch 'master' into 7658-websockets-reconnect-on-close
radhika [Mon, 11 Apr 2016 14:00:36 +0000 (10:00 -0400)]
7658: update connect error test to use stream handler to read the log file, instead of using a temp file.
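Reading log output through a stream handler, rather than a temp file, can be sketched with the standard logging module and an in-memory io.StringIO. capture_log() and the logger name in the test are hypothetical; the actual test may attach its handler differently.

```python
import io
import logging


def capture_log(logger_name, action):
    """Run action() and return everything logged to logger_name meanwhile."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger = logging.getLogger(logger_name)
    logger.addHandler(handler)
    try:
        action()
    finally:
        # Always detach the handler so later tests are unaffected.
        logger.removeHandler(handler)
    return stream.getvalue()
```

An in-memory stream avoids creating and cleaning up files, and the handler is removed even if action() raises.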
Peter Amstutz [Sun, 10 Apr 2016 02:11:32 +0000 (22:11 -0400)]
Merge branch '8799-make-drained-nodes-idle' closes #8799
Peter Amstutz [Fri, 8 Apr 2016 21:25:25 +0000 (17:25 -0400)]
8799: shutdown_eligible() returns "node is draining" when in drain state. Add comments about iterating over cloud_nodes to check for "down" nodes. Fix tests.
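A minimal sketch of a check that returns either True or a string explaining why a node is not eligible, in the spirit of the "node is draining" result above. The state names, the idle test, and the fallback message are assumptions, not Node Manager's actual logic.

```python
def shutdown_eligible(slurm_state, idle_long_enough):
    """Return True if the node may be shut down, else a string saying why not.

    Hypothetical sketch: the state names and the idle test are assumptions.
    """
    if slurm_state in ("drain", "drng"):
        # A draining node might be resumed; shutting it down here could race
        # with resume_node(), so report the reason instead.
        return "node is draining"
    if slurm_state == "idle" and idle_long_enough:
        return True
    return "node state is {!r}".format(slurm_state)
```

Because a non-empty string is truthy, callers must test the result with `is True` rather than plain truthiness.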
Brett Smith [Fri, 8 Apr 2016 20:31:17 +0000 (16:31 -0400)]
Pin dockercleaner's docker-py requirement to 1.7.2.
Finishes the job started by 8680c874. It only seems to be really
necessary on wheezy (because docker-py or its requirements abandoned
support for Python 3.2), but since we're pinning more as a general
rule, might as well make it universal. Closes #8904, #8922.
Peter Amstutz [Fri, 8 Apr 2016 20:14:59 +0000 (16:14 -0400)]
7658: Clean up & handle subscription filters consistently across EventClient,
_EventClient and PollClient.
radhika [Fri, 8 Apr 2016 15:23:53 +0000 (11:23 -0400)]
7658: improve the log verification in case of unexpected close.
radhika [Fri, 8 Apr 2016 14:32:33 +0000 (10:32 -0400)]
7658: add test that verifies reconnect retry behavior
Brett Smith [Fri, 8 Apr 2016 14:12:22 +0000 (10:12 -0400)]
Merge branch '8904-support-python3.2'
Closes #8904.
Tom Clegg [Thu, 7 Apr 2016 02:26:41 +0000 (22:26 -0400)]
8904: Avoid installing pip >= 8 in a Python 3.2 virtualenv.
Ward Vandewege [Fri, 8 Apr 2016 01:51:11 +0000 (21:51 -0400)]
Package crunchstat-summary.
closes #8911
Brett Smith [Thu, 7 Apr 2016 21:30:44 +0000 (17:30 -0400)]
Merge branch '8872-node-manager-create-search-handling-wip'
Closes #8872, #8900.
Brett Smith [Wed, 6 Apr 2016 18:23:11 +0000 (14:23 -0400)]
8872: Bugfix Node Manager's node search after node create failure.
search_for raises ValueError if the thing isn't found. create_node
seems to be expecting it to return None instead. Bring create_node in
line with search_for's documented API.
In order to get the tests to pass, I had to separate out the raw
search code from the caching, and use that in create_node. Otherwise,
the cloud node from the "node found" test would be cached and returned
in the "node not found" test.
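The split described above, an uncached search that raises ValueError when nothing matches plus a cached search_for() on top of it, can be sketched as follows. CachingDriver and its create callback are hypothetical; only the ValueError contract and the idea of bypassing the cache in create_node come from the commit message.

```python
class CachingDriver:
    """Hypothetical driver with a cached search_for() and an uncached search."""

    def __init__(self, list_nodes):
        self._list_nodes = list_nodes  # callable returning current node names
        self._cache = {}

    def _search_uncached(self, term):
        # Raises ValueError, never returns None, when nothing matches.
        for node in self._list_nodes():
            if node == term:
                return node
        raise ValueError("{!r} not found".format(term))

    def search_for(self, term):
        if term not in self._cache:
            self._cache[term] = self._search_uncached(term)
        return self._cache[term]

    def create_node(self, name, create):
        try:
            return create(name)
        except Exception as create_error:
            # The cloud may have created the node even though the call failed.
            # Search without the cache so an earlier cached result is not reused.
            try:
                return self._search_uncached(name)
            except ValueError:
                raise create_error
```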
radhika [Thu, 7 Apr 2016 15:21:57 +0000 (11:21 -0400)]
Merge branch 'master' into 7658-websockets-reconnect-on-close
radhika [Thu, 7 Apr 2016 15:17:25 +0000 (11:17 -0400)]
8724: performKeepBlockCheck() returns error when any of the listed blocks are not found.
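keep-block-check itself is written in Go; for consistency with the other examples here, this is a Python sketch of the "fail if any listed block is missing" rule. perform_block_check() and block_exists are hypothetical, block_exists stands in for a real Keep lookup, and continuing past the first miss so every missing block is reported is a design choice of the sketch.

```python
def perform_block_check(locators, block_exists):
    """Check every locator; return None if all exist, else an error message.

    Hypothetical sketch: block_exists stands in for a real Keep lookup.
    """
    missing = [loc for loc in locators if not block_exists(loc)]
    if missing:
        return "{} of {} blocks not found: {}".format(
            len(missing), len(locators), ", ".join(missing))
    return None
```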
radhika [Thu, 7 Apr 2016 13:51:06 +0000 (09:51 -0400)]
8724: test assertion improvements
Peter Amstutz [Thu, 7 Apr 2016 02:16:33 +0000 (22:16 -0400)]
Remove over-quoting from crunchrunner and certificate volume mounts. refs #8893
radhika [Wed, 6 Apr 2016 22:30:35 +0000 (18:30 -0400)]
8724: add keep-block-check script
Peter Amstutz [Wed, 6 Apr 2016 19:51:56 +0000 (15:51 -0400)]
8799: Nodes in "drain" state are not automatically eligible for shutdown to
avoid a race between starting a shutdown and resume_node().
Brett Smith [Wed, 6 Apr 2016 19:50:23 +0000 (15:50 -0400)]
Merge branch '8879-cwl-runner-job-owner-wip'
Closes #8879, #8887.
Brett Smith [Tue, 5 Apr 2016 19:45:21 +0000 (15:45 -0400)]
8879: Clean indentation in CWL SDK tests.
Brett Smith [Tue, 5 Apr 2016 19:37:35 +0000 (15:37 -0400)]
8879: cwl-runner --submit respects --project-uuid.
Peter Amstutz [Wed, 6 Apr 2016 15:22:31 +0000 (11:22 -0400)]
8799: Nodes whose slurm_state is "down" are checked with sinfo and are either re-enabled or made eligible for shutdown.
Brett Smith [Wed, 6 Apr 2016 16:13:06 +0000 (12:13 -0400)]
Merge branch '8810-crunch-improve-docker-loading-wip'
Closes #8810, #8888.
Brett Smith [Tue, 5 Apr 2016 20:21:20 +0000 (16:21 -0400)]
8810: crunch-job reports errors when checking if Docker image is loaded.
Since the check was previously in an `if !` condition, errors in it
would cause us to enter the branch.
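crunch-job is Perl, but the pitfall is general: a negated check treats "the check failed" the same as "the image is not loaded". A Python sketch that keeps those outcomes separate, assuming a check command that prints something when the image is present (for example `docker images -q arvados/jobs`, which prints matching image IDs). image_loaded() is hypothetical.

```python
import subprocess


def image_loaded(check_cmd):
    """Return True if check_cmd printed output, False if it printed nothing.

    Raises RuntimeError if the check itself fails, instead of letting the
    failure look like "image not loaded".
    """
    proc = subprocess.run(check_cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, universal_newlines=True)
    if proc.returncode != 0:
        raise RuntimeError("image check {!r} failed (exit {}): {}".format(
            check_cmd, proc.returncode, proc.stderr.strip()))
    return bool(proc.stdout.strip())
```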
Brett Smith [Wed, 6 Apr 2016 15:37:12 +0000 (11:37 -0400)]
Merge branch '8893-crunch-job-crunchrunner-quoting-wip'
Closes #8893, #8895.
Brett Smith [Wed, 6 Apr 2016 14:32:03 +0000 (10:32 -0400)]
8893: crunch-job doesn't pass empty strings to `docker run`.
We solve this issue by requiring $VOLUME_CRUNCHRUNNER and
$VOLUME_CERTS to contain their own quoting. Because of that, we clear
their values first, to make sure we don't inherit values that might
break the `docker run` invocation.
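The commit solves this with shell quoting inside crunch-job. A different technique with the same goal, sketched in Python: build `docker run` as an argument list and append a `--volume` flag only when its source is set, so no empty string can reach Docker. docker_run_args() and its parameter names are hypothetical; the container-side paths follow the mounts described elsewhere in this log.

```python
def docker_run_args(image, command, crunchrunner=None, certs=None):
    """Build an argv list for `docker run`, adding volume flags only when set.

    Hypothetical sketch: parameter names are assumptions; container paths
    follow /usr/local/bin/crunchrunner and /etc/arvados/ca-certificates.crt.
    """
    args = ["docker", "run"]
    if crunchrunner:
        args += ["--volume", crunchrunner + ":/usr/local/bin/crunchrunner:ro"]
    if certs:
        args += ["--volume", certs + ":/etc/arvados/ca-certificates.crt:ro"]
    args.append(image)
    args += command
    return args
```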
Nico Cesar [Tue, 5 Apr 2016 18:15:32 +0000 (14:15 -0400)]
Merge branch '8712-fuse-cache-reload-bug'
closes #8712
Peter Amstutz [Tue, 5 Apr 2016 17:22:06 +0000 (13:22 -0400)]
8712: Propagate return value of clear() from super method. Test cache clearing
collections with subdirs.
radhika [Tue, 5 Apr 2016 16:00:19 +0000 (12:00 -0400)]
7658: update EventClient.on_closed to retry on connect errors.
radhika [Mon, 4 Apr 2016 19:41:56 +0000 (15:41 -0400)]
7658: add reconnect logic when a websocket is closed unexpectedly.
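A minimal Python sketch of reconnecting with retries after an unexpected close or a connect error. connect_with_retry() is hypothetical, `connect` stands in for opening a new websocket, and the attempt limit and fixed delay are assumptions rather than EventClient's actual values.

```python
import time


def connect_with_retry(connect, max_attempts=5, delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or max_attempts is exhausted.

    Hypothetical sketch: connect stands in for opening a new websocket.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except Exception as error:
            last_error = error
            if attempt < max_attempts:
                sleep(delay)  # wait before the next attempt
    raise last_error
```

Passing `sleep` as a parameter lets tests run without real delays.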
Peter Amstutz [Mon, 4 Apr 2016 19:40:33 +0000 (15:40 -0400)]
8712: Set self.collection = None when clearing the contents of a
CollectionDirectory, so that it gets properly reloaded on update().
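The two 8712 fixes (propagate clear()'s return value, and drop the loaded collection so update() reloads it) can be sketched as below. The class names echo the commit messages, but the assumption that clear() returns True when it actually cleared and False when the directory is in use is the sketch's own, as are load_collection and load_count.

```python
class Directory:
    """Hypothetical base class: clear() empties the cache unless in use."""

    def __init__(self):
        self._entries = {}
        self.in_use = False

    def clear(self):
        if self.in_use:
            return False  # could not clear
        self._entries = {}
        return True


class CollectionDirectory(Directory):
    def __init__(self, load_collection):
        super().__init__()
        self._load_collection = load_collection  # callable returning a dict
        self.collection = None
        self.load_count = 0

    def clear(self):
        cleared = super().clear()
        if cleared:
            # Forget the loaded collection so the next update() reloads it.
            self.collection = None
        return cleared  # propagate the base class's result

    def update(self):
        if self.collection is None:
            self.collection = self._load_collection()
            self._entries = dict(self.collection)
            self.load_count += 1
```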
Peter Amstutz [Mon, 4 Apr 2016 18:59:10 +0000 (14:59 -0400)]
8712: Test case that reproduces cache-spill bug.
Brett Smith [Fri, 1 Apr 2016 19:50:01 +0000 (15:50 -0400)]
Merge branch '8811-srun-sync-tempfail-wip'
Closes #8811, #8862.
Brett Smith [Thu, 31 Mar 2016 21:46:51 +0000 (17:46 -0400)]
8811: crunch-job srun_sync detects and reports SLURM tempfails.
preprocess_stderr needed updating to check for these tempfails even in
cases where the child process does not have a slotindex.
Peter Amstutz [Fri, 1 Apr 2016 19:46:37 +0000 (15:46 -0400)]
Merge branch '8816-compute-node-update-exception' close #8816
Peter Amstutz [Fri, 1 Apr 2016 19:35:08 +0000 (15:35 -0400)]
8816: Use is_cloud_exception to determine if exception is a "cloud error". Add
test that exceptions don't crash ComputeNodeUpdateActor.
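A Python sketch of logging errors instead of re-throwing them, with cloud errors handled differently from unrecognized ones. The commit messages do not say what "slightly differently" means, so the warning-versus-traceback split and the error_streak counter are assumptions; CloudError and this is_cloud_exception() are hypothetical stand-ins.

```python
import functools


class CloudError(Exception):
    """Hypothetical stand-in for an error reported by the cloud provider."""


def is_cloud_exception(error):
    # Assumption: the real is_cloud_exception() is more involved; here any
    # CloudError counts as a cloud error.
    return isinstance(error, CloudError)


def throttle_errors(method):
    """Log exceptions from method instead of letting them kill the actor."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        try:
            result = method(self, *args, **kwargs)
        except Exception as error:
            if is_cloud_exception(error):
                # Expected and often transient: count it and log briefly.
                self.error_streak += 1
                self.logger.warning("cloud error #%d: %s", self.error_streak, error)
            else:
                # Unrecognized: log with a traceback so it can be diagnosed.
                self.logger.exception("unexpected error in %s", method.__name__)
            return None
        self.error_streak = 0
        return result
    return wrapper
```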
Ward Vandewege [Fri, 1 Apr 2016 19:16:49 +0000 (15:16 -0400)]
Fix package building by pinning docker-py to version 1.7.2
No issue #
Brett Smith [Fri, 1 Apr 2016 18:47:19 +0000 (14:47 -0400)]
Merge branch '8782-reapchildren-after-signal-wip'
Closes #8782, #8860, #8870.
Brett Smith [Fri, 1 Apr 2016 18:37:34 +0000 (14:37 -0400)]
8782: Remove WIFEXITED check from crunch-job reapchildren.
The intent of this check was to avoid reaping children that got
SIGSTOP. But from the waitpid(2) man page, you must pass specific
flags for waitpid to return those children. Without those flags,
waitpid will only return the pids of children that have terminated.
Meanwhile, WIFEXITED only returns true if the exit code indicates that
the child terminated normally. It returns false if the child was
killed by a signal like SIGINT or SIGKILL. This means children so
killed were not reaped by reapchildren, leading to infinite loops.
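crunch-job is Perl, but the waitpid(2) semantics are the same from Python's os module. This sketch shows that a child killed by a signal is reported by waitpid() with WIFEXITED false and WIFSIGNALED true, so a reaper that requires WIFEXITED will never record it. reap() is a hypothetical helper.

```python
import os
import signal
import subprocess


def reap(pid):
    """Wait for pid and describe how it terminated.

    With options=0, waitpid() reports only terminated children; stopped
    children would need os.WUNTRACED. So any pid it returns has finished and
    must be reaped, whether it exited normally or was killed by a signal.
    """
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    return ("unknown", status)


if __name__ == "__main__":
    child = subprocess.Popen(["sleep", "30"])
    child.send_signal(signal.SIGKILL)  # WIFEXITED is false for this child
    print(reap(child.pid))
```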
Peter Amstutz [Fri, 1 Apr 2016 18:32:14 +0000 (14:32 -0400)]
Merge branch '8857-cwl-job-reuse' closes #8857
Peter Amstutz [Fri, 1 Apr 2016 17:28:03 +0000 (13:28 -0400)]
8816: Handle cloud errors slightly differently from unrecognized errors.
Peter Amstutz [Fri, 1 Apr 2016 17:11:51 +0000 (13:11 -0400)]
8816: ComputeNodeUpdateActor._throttle_errors logs errors instead of re-throwing them.
Ward Vandewege [Fri, 1 Apr 2016 14:59:46 +0000 (10:59 -0400)]
A few more fixes for run-build-packages-python-and-ruby.sh, and a small
safeguard for run-build-packages.sh.
refs #8864
Peter Amstutz [Fri, 1 Apr 2016 14:01:39 +0000 (10:01 -0400)]
Merge branch 'master' into 8857-cwl-job-reuse
Conflicts:
sdk/cwl/arvados_cwl/__init__.py
Ward Vandewege [Fri, 1 Apr 2016 01:38:50 +0000 (21:38 -0400)]
Add build/run-build-packages-python-and-ruby.sh script to handle upload
to pypi and rubygems.
refs #8864
Peter Amstutz [Thu, 31 Mar 2016 22:29:45 +0000 (18:29 -0400)]
Merge branch '8828-which-crunchrunner' closes #8828
Peter Amstutz [Thu, 31 Mar 2016 21:25:05 +0000 (17:25 -0400)]
8828: Fix bind mount point for certificates.
Peter Amstutz [Thu, 31 Mar 2016 19:58:47 +0000 (15:58 -0400)]
8828: Move logic for checking $(which crunchrunner) into script that runs before invoking Docker on the compute node.
Ward Vandewege [Thu, 31 Mar 2016 17:49:54 +0000 (13:49 -0400)]
Build newer cwltool version.
No issue #
Peter Amstutz [Thu, 31 Mar 2016 15:54:40 +0000 (11:54 -0400)]
Merge branch '8840-lock-job-record' closes #8840
Peter Amstutz [Thu, 31 Mar 2016 15:35:49 +0000 (11:35 -0400)]
Merge branch '8654-arv-jobs-cwl-runner' closes #8654
Peter Amstutz [Thu, 31 Mar 2016 15:30:31 +0000 (11:30 -0400)]
8654: Update test because input cwl files changed.
Peter Amstutz [Thu, 31 Mar 2016 15:00:15 +0000 (11:00 -0400)]
8654: Fix versionstring(). Improve help text / comments / style tweaks.
Peter Amstutz [Thu, 31 Mar 2016 14:32:09 +0000 (10:32 -0400)]
Merge branch 'master' into 8654-arv-jobs-cwl-runner
Peter Amstutz [Thu, 31 Mar 2016 14:31:59 +0000 (10:31 -0400)]
8654: Rename tests/inp/ to test/input/
Peter Amstutz [Thu, 31 Mar 2016 14:21:17 +0000 (10:21 -0400)]
8857: Add --ignore-docker-for-reuse option to assist workflow development.
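With argparse, a `store_true` flag named `--ignore-docker-for-reuse` becomes the attribute `ignore_docker_for_reuse` with a default of False. A Namespace built by hand, as a crunch script might do instead of calling parse_args(), gets no such default, which is presumably why the cwl-runner crunch script sets `args.ignore_docker_for_reuse=False` explicitly. crunch_script_args() is a hypothetical illustration.

```python
import argparse

parser = argparse.ArgumentParser(prog="arvados-cwl-runner")
# store_true gives the attribute a default of False on parsed Namespaces.
parser.add_argument("--ignore-docker-for-reuse", action="store_true")


def crunch_script_args():
    # Hypothetical: a hand-built Namespace must set every attribute the
    # runner reads, because parse_args() defaults do not apply to it.
    args = argparse.Namespace()
    args.ignore_docker_for_reuse = False
    return args
```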
Peter Amstutz [Wed, 30 Mar 2016 19:00:10 +0000 (15:00 -0400)]
8840: Use 'with_lock' instead of 'transaction' in Job.lock method.
Peter Amstutz [Wed, 30 Mar 2016 18:58:30 +0000 (14:58 -0400)]
8654: Add missing test_submit
Peter Amstutz [Wed, 30 Mar 2016 18:45:12 +0000 (14:45 -0400)]
8654: Add comments
Peter Amstutz [Wed, 30 Mar 2016 18:00:24 +0000 (14:00 -0400)]
8654: Print uuid of uploaded docker image on stderr instead of stdout.
radhika [Wed, 30 Mar 2016 13:47:29 +0000 (09:47 -0400)]
closes #8703
Merge branch '8703-job-components'
radhika [Wed, 30 Mar 2016 13:47:08 +0000 (09:47 -0400)]
Merge branch 'master' into 8703-job-components
Peter Amstutz [Wed, 30 Mar 2016 13:33:49 +0000 (09:33 -0400)]
8654: Update test_with_arvbox.sh
Peter Amstutz [Wed, 30 Mar 2016 01:11:06 +0000 (21:11 -0400)]
crunchrunner crunch script selects between $JOB_PARAMETER_CRUNCHRUNNER
or /usr/local/bin/crunchrunner, refs #8827
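A Python sketch of the selection rule: use $JOB_PARAMETER_CRUNCHRUNNER when it is set and non-empty, otherwise fall back to the installed binary. select_crunchrunner() is hypothetical, and treating an empty value as unset is an assumption of the sketch.

```python
import os

DEFAULT_CRUNCHRUNNER = "/usr/local/bin/crunchrunner"


def select_crunchrunner(environ=None):
    """Prefer $JOB_PARAMETER_CRUNCHRUNNER; otherwise use the installed binary."""
    if environ is None:
        environ = os.environ
    # An empty value falls through to the default, like an unset one.
    return environ.get("JOB_PARAMETER_CRUNCHRUNNER") or DEFAULT_CRUNCHRUNNER
```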
Peter Amstutz [Tue, 29 Mar 2016 20:28:27 +0000 (16:28 -0400)]
8654: Pin pyasn1_modules to version that is compatible with pyasn1==0.1.7.
Peter Amstutz [Tue, 29 Mar 2016 20:28:19 +0000 (16:28 -0400)]
8654: Fix version string produced by arvados-cwl-runner.
Peter Amstutz [Tue, 29 Mar 2016 19:34:48 +0000 (15:34 -0400)]
Fix sdk/cwl test refs #8815
Peter Amstutz [Tue, 29 Mar 2016 18:16:58 +0000 (14:16 -0400)]
Merge branch '8815-crunchrunner-everywhere' closes #8815
Peter Amstutz [Tue, 29 Mar 2016 17:34:40 +0000 (13:34 -0400)]
Merge branch '8815-crunchrunner-everywhere' into 8654-arv-jobs-cwl-runner
Conflicts:
sdk/cwl/arvados_cwl/__init__.py
Peter Amstutz [Tue, 29 Mar 2016 17:30:17 +0000 (13:30 -0400)]
8654: Make --submit --wait the default mode.
radhika [Tue, 29 Mar 2016 17:10:49 +0000 (13:10 -0400)]
8703: better organized tests
Peter Amstutz [Tue, 29 Mar 2016 17:06:22 +0000 (13:06 -0400)]
8815: Fix syntax errors.
Peter Amstutz [Tue, 29 Mar 2016 16:18:15 +0000 (12:18 -0400)]
8815: Now expect /usr/local/bin/crunchrunner. Bind mount host certificates to
/etc/arvados/ca-certificates.crt
radhika [Tue, 29 Mar 2016 15:54:01 +0000 (11:54 -0400)]
Merge branch 'master' into 8703-job-components
Ward Vandewege [Tue, 29 Mar 2016 15:24:59 +0000 (11:24 -0400)]
Build a package for crunchrunner.
refs #8815
Peter Amstutz [Tue, 29 Mar 2016 13:38:16 +0000 (09:38 -0400)]
8815: Rely on system-provided crunchrunner. Also use arvados/jobs by default if no docker provided.
Peter Amstutz [Tue, 29 Mar 2016 13:23:51 +0000 (09:23 -0400)]
8815: Crunch-job bind mounts crunchrunner binary and certificates from host.
Updated arvbox to compile and install crunchrunner.
Peter Amstutz [Mon, 28 Mar 2016 14:06:32 +0000 (10:06 -0400)]
8654: Bump cwltool version dependency and print cwl version string in cwl-runner crunch script.
Peter Amstutz [Mon, 28 Mar 2016 13:37:37 +0000 (09:37 -0400)]
Merge branch 'master' into 8654-arv-jobs-cwl-runner
Conflicts:
docker/jobs/Dockerfile
Brett Smith [Sun, 27 Mar 2016 20:47:07 +0000 (16:47 -0400)]
Merge branch '8800-queue-query'
Closes #8800, #8809.
Brett Smith [Sun, 27 Mar 2016 20:43:55 +0000 (16:43 -0400)]
8800: Document the new queue_position implementation.
For the benefit of future readers.
Ward Vandewege [Sun, 27 Mar 2016 02:19:09 +0000 (22:19 -0400)]
Fix package build and test of the arvados-cwl-runner package for
ubuntu1204 and centos6.
refs #8671
Tom Clegg [Fri, 25 Mar 2016 19:59:34 +0000 (15:59 -0400)]
8800: Drop queue_position support.
Ward Vandewege [Sat, 26 Mar 2016 01:50:28 +0000 (21:50 -0400)]
Bump up the iteration for the python-arvados-cwl-runner package so that
it gets rebuilt.
refs #8671
Ward Vandewege [Sat, 26 Mar 2016 01:36:19 +0000 (21:36 -0400)]
Fix a few more dependencies for the python-arvados-cwl-runner package.
refs #8671
Peter Amstutz [Fri, 25 Mar 2016 20:35:10 +0000 (16:35 -0400)]
8654: Passes 100% of CWL conformance tests when running cwl-runner in a crunch job!
Peter Amstutz [Fri, 25 Mar 2016 17:54:29 +0000 (13:54 -0400)]
8654: --version reports versions for arvados-cwl-runner, arvados-python-client,
and cwltool.