Tom Clegg [Wed, 20 Apr 2016 14:30:04 +0000 (10:30 -0400)]
Merge branch '8697-ruby187-compat'
refs #8697
refs #8689
Nico Cesar [Wed, 20 Apr 2016 13:58:57 +0000 (09:58 -0400)]
Merge branch '9014-keep-block-check-package'
closes #9014
Nico Cesar [Wed, 20 Apr 2016 13:28:55 +0000 (09:28 -0400)]
adding new package for block checks
refs #9014
Peter Amstutz [Wed, 20 Apr 2016 13:10:33 +0000 (09:10 -0400)]
Merge branch '8953-no-double-count' refs #8953
Tom Clegg [Tue, 19 Apr 2016 20:57:41 +0000 (16:57 -0400)]
6833: Fix excessive debug logging in TokenExpiryTest and subsequent tests.
Excessive logging was introduced (seemingly unintentionally) in d3313e65.
refs #6833
Peter Amstutz [Sat, 16 Apr 2016 02:48:13 +0000 (22:48 -0400)]
Don't double-count nodes that are shutting down. refs #8953
Tom Clegg [Tue, 19 Apr 2016 18:44:45 +0000 (14:44 -0400)]
Merge branch '9009-keep-web-close-conns'
closes #9009
Tom Clegg [Tue, 19 Apr 2016 15:43:09 +0000 (11:43 -0400)]
9009: Fix missing Close() in collectionreader.
Tom Clegg [Tue, 19 Apr 2016 15:17:07 +0000 (11:17 -0400)]
Merge branch '9004-close-keep-connections'
refs #9004
refs #9005
Tom Clegg [Tue, 19 Apr 2016 15:17:00 +0000 (11:17 -0400)]
Change Check to Assert to avoid crash after failure. No issue #
Tom Clegg [Tue, 19 Apr 2016 14:18:51 +0000 (10:18 -0400)]
9005: Workaround: Close idle connections aggressively.
Currently, the SDK code never reuses connections anyway, so it's best
to shut them down right away.
Peter Amstutz [Sat, 16 Apr 2016 02:31:21 +0000 (22:31 -0400)]
Don't issue drain when shutdown has been cancelled. refs #8953
Peter Amstutz [Sat, 16 Apr 2016 02:20:54 +0000 (22:20 -0400)]
Don't try to drain node if no nodename associated. refs #8953
Peter Amstutz [Fri, 15 Apr 2016 20:18:32 +0000 (16:18 -0400)]
Merge branch '8953-node-manager-FSM' closes #8953
Peter Amstutz [Fri, 15 Apr 2016 19:50:06 +0000 (15:50 -0400)]
8953: Don't start shutdown on 'drng*' or 'alloc*'.
Peter Amstutz [Fri, 15 Apr 2016 18:06:29 +0000 (14:06 -0400)]
8953: Assign to tuple (eligible, reason)
Peter Amstutz [Fri, 15 Apr 2016 15:29:50 +0000 (11:29 -0400)]
8953: shutdown_eligible() returns a tuple. Report reason for shutdown decision.
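A minimal sketch of the "(eligible, reason)" return pattern named in the 8953 commits above. This is not the Node Manager implementation: the function signature, the idle_window_open parameter, and the reason strings are hypothetical. Only the tuple shape and the 'drng*' / 'alloc*' exclusion come from the log.

```python
def shutdown_eligible(slurm_state, idle_window_open):
    """Return (eligible, reason) so callers can log why a decision was made.

    slurm_state and idle_window_open are hypothetical inputs for this sketch.
    """
    # Nodes that are draining or allocated should not start a shutdown.
    if slurm_state.startswith(("drng", "alloc")):
        return (False, "node state is %s" % slurm_state)
    if not idle_window_open:
        return (False, "shutdown window is closed")
    return (True, "node is idle and shutdown window is open")


# Assign to a tuple so the reason is available for logging.
eligible, reason = shutdown_eligible("alloc*", True)
print("shutdown eligible: %s (%s)" % (eligible, reason))
```

Returning the reason alongside the boolean keeps the decision and its explanation in one place, so the caller cannot report a reason that disagrees with the result.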
Peter Amstutz [Fri, 15 Apr 2016 15:13:12 +0000 (11:13 -0400)]
8953: Clarify how to use return value from consulting transitions table and shutdown_eligible().
Peter Amstutz [Fri, 15 Apr 2016 14:37:34 +0000 (10:37 -0400)]
8953: Fix indentation of shutdown_eligible().
Peter Amstutz [Fri, 15 Apr 2016 14:32:16 +0000 (10:32 -0400)]
8953: Add missing transitions.py
Peter Amstutz [Fri, 15 Apr 2016 14:26:46 +0000 (10:26 -0400)]
Fixup test_node_undrained_when_shutdown_cancelled and test_alloc_node_undrained_when_shutdown_cancelled.
Peter Amstutz [Fri, 15 Apr 2016 12:10:13 +0000 (08:10 -0400)]
8953: Tests pass, with some removed due to removal of the corresponding behavior.
Peter Amstutz [Thu, 14 Apr 2016 20:02:16 +0000 (16:02 -0400)]
8953: Node manager shutdown policy change WIP. Still fixing tests.
Peter Amstutz [Thu, 14 Apr 2016 13:57:41 +0000 (09:57 -0400)]
Add /var/lib/gopath and /var/lib/pip permissions fixup to Arvbox createusers.sh
to fix demo build, no issue #
Tom Clegg [Wed, 13 Apr 2016 14:57:09 +0000 (10:57 -0400)]
8697: Fix Locator.parse() (was failing on locators with hints).
Tom Clegg [Wed, 13 Apr 2016 14:24:14 +0000 (10:24 -0400)]
8697: ruby 1.8.7 compatibility in arvados/collection.
Tom Clegg [Tue, 22 Mar 2016 21:29:29 +0000 (17:29 -0400)]
8697: Move error messages from stdout to stderr.
Tom Clegg [Tue, 22 Mar 2016 21:28:39 +0000 (17:28 -0400)]
8697: Relax version constraints so gem can be used in ruby187/ree projects.
Nico Cesar [Wed, 13 Apr 2016 17:30:35 +0000 (13:30 -0400)]
Merge branch '8959-python-gflags-dependencies'
Nico Cesar [Wed, 13 Apr 2016 17:29:08 +0000 (13:29 -0400)]
Revert "I <3 pythong-gflags"
This reverts commit e935e107a4ac6250ae64878262c3145d7a62b8e8.
Nico Cesar [Wed, 13 Apr 2016 15:43:39 +0000 (11:43 -0400)]
Merge branch '8959-python-gflags-dependencies'
Nico Cesar [Wed, 13 Apr 2016 15:41:16 +0000 (11:41 -0400)]
I <3 pythong-gflags
closes #8959
Nico Cesar [Wed, 13 Apr 2016 14:42:17 +0000 (10:42 -0400)]
8959: pinning out the version of python-gflags
refs #8959
Brett Smith [Tue, 12 Apr 2016 20:20:39 +0000 (16:20 -0400)]
Merge branch '8893-crunch-job-volumes-array-wip'
Closes #8893, #8921.
Brett Smith [Fri, 8 Apr 2016 14:56:34 +0000 (10:56 -0400)]
8893: Safer quoting of crunch-job's conditional volume switches.
Packing arguments into an array allows us to have a variable
number of switches, with correct word splitting, even when the
individual arguments in the array contain whitespace.
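The commit above concerns shell arrays in crunch-job. As a rough Python analogue (not the crunch-job code), the same idea is to build the command as a list of arguments rather than one string, so each switch stays a single word even if it contains whitespace. The function name, image name, and paths below are hypothetical.

```python
import shlex


def docker_run_args(image, volumes):
    """Build an argument list with a variable number of --volume switches."""
    args = ["docker", "run"]
    for host_path, container_path in volumes:
        # Each switch is one list element, so embedded spaces are preserved.
        args.append("--volume=%s:%s:ro" % (host_path, container_path))
    args.append(image)
    return args


args = docker_run_args("example/image", [("/tmp/my certs", "/etc/ssl/certs")])
# shlex.join shows how the list would need to be quoted as a shell command.
print(shlex.join(args))
```

With no volumes, the list simply has no --volume elements, so no empty string is ever passed to the command.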
Peter Amstutz [Tue, 12 Apr 2016 19:35:10 +0000 (15:35 -0400)]
Propagate designated stdout stream from keepdocker.main() to put.main().
no issue #
radhika [Tue, 12 Apr 2016 19:30:53 +0000 (15:30 -0400)]
closes #8724
Merge branch '8724-keep-block-check-script'
radhika [Tue, 12 Apr 2016 19:29:58 +0000 (15:29 -0400)]
Merge branch 'master' into 8724-keep-block-check-script
Peter Amstutz [Tue, 12 Apr 2016 17:45:08 +0000 (13:45 -0400)]
Rename "rebuild" back to "reboot" and change "rebuild" to mean "build
--no-cache". "arvbox start" no longer fails if the container is already
running. Update docs. no issue #
Brett Smith [Tue, 12 Apr 2016 15:36:31 +0000 (11:36 -0400)]
4083: crunchstat-summary imports _strptime.
Refs #4083 for rationale.
Refs #8933 where this was reported.
See also d9014288.
Brett Smith [Tue, 12 Apr 2016 14:51:28 +0000 (10:51 -0400)]
Merge branch '8912-node-manager-patch-nodes-wip'
Closes #8913, #8923. (The branch name has a typo.)
Brett Smith [Fri, 8 Apr 2016 22:56:44 +0000 (18:56 -0400)]
8912: Node Manager search_for_now uses overridden methods.
This wasn't possible in the original implementation because of the way
we used to proxy methods to self.real. Now that we proxy them
transparently, we can call methods on the Node Manager driver, and let
them be proxied to the underlying libcloud driver if needed.
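A minimal sketch of transparent proxying as described in the 8912 commit above, assuming a wrapper that forwards unknown attributes to self.real. The class names, the fake driver, and the simplified search_for_now signature are hypothetical and are not the Node Manager or libcloud API.

```python
class FakeCloudDriver:
    """Hypothetical stand-in for an underlying cloud driver."""

    def list_nodes(self):
        return ["node-a", "node-b"]

    def list_sizes(self):
        return ["small"]


class DriverWrapper:
    def __init__(self, real):
        self.real = real

    def __getattr__(self, name):
        # Only called when normal lookup fails, so methods defined on the
        # wrapper take precedence and everything else goes to self.real.
        return getattr(self.real, name)

    def list_nodes(self):
        # Overridden method: hide one node from the raw listing.
        return [n for n in self.real.list_nodes() if n != "node-b"]

    def search_for_now(self, term, list_method):
        # Look the method up on self, not self.real, so overrides are used.
        matches = [item for item in getattr(self, list_method)() if item == term]
        if len(matches) != 1:
            raise ValueError("%r matched %d items" % (term, len(matches)))
        return matches[0]


driver = DriverWrapper(FakeCloudDriver())
print(driver.search_for_now("small", "list_sizes"))
```

Because the lookup goes through the wrapper, a search over list_nodes sees the overridden listing, while list_sizes falls through to the underlying driver.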
radhika [Tue, 12 Apr 2016 14:17:24 +0000 (10:17 -0400)]
8724: some more cleanup of tests.
Peter Amstutz [Mon, 11 Apr 2016 20:57:17 +0000 (16:57 -0400)]
Add args.ignore_docker_for_reuse=False to cwl-runner crunch script. refs #8857
radhika [Tue, 12 Apr 2016 03:37:22 +0000 (23:37 -0400)]
Merge branch 'master' into 8724-keep-block-check-script
radhika [Tue, 12 Apr 2016 03:36:11 +0000 (23:36 -0400)]
8724: test updates
Peter Amstutz [Mon, 11 Apr 2016 20:22:17 +0000 (16:22 -0400)]
Add --help to test_with_arvbox.sh, no issue #
Peter Amstutz [Mon, 11 Apr 2016 20:19:11 +0000 (16:19 -0400)]
Separate out Go and Python dependencies into their own directories, which
don't get deleted by "reset" because they hold code downloaded from external
dependencies. Add -data-manager-token to keepstore invocation for datamanager
testing.
radhika [Mon, 11 Apr 2016 14:02:33 +0000 (10:02 -0400)]
closes #7658
Merge branch '7658-websockets-reconnect-on-close'
radhika [Mon, 11 Apr 2016 14:02:15 +0000 (10:02 -0400)]
Merge branch 'master' into 7658-websockets-reconnect-on-close
radhika [Mon, 11 Apr 2016 14:00:36 +0000 (10:00 -0400)]
7658: update connect error test to use stream handler to read the log file, instead of using a temp file.
Peter Amstutz [Sun, 10 Apr 2016 02:11:32 +0000 (22:11 -0400)]
Merge branch '8799-make-drained-nodes-idle' closes #8799
Peter Amstutz [Fri, 8 Apr 2016 21:25:25 +0000 (17:25 -0400)]
8799: shutdown_eligible() returns "node is draining" when in drain state. Add comments about iterating over cloud_nodes to check for "down" nodes. Fix tests.
Brett Smith [Fri, 8 Apr 2016 20:31:17 +0000 (16:31 -0400)]
Pin dockercleaner's docker-py requirement to 1.7.2.
Finishes the job started by 8680c874. It only seems to be really
necessary on wheezy (because docker-py or its requirements abandoned
support for Python 3.2), but since we're pinning more as a general
rule, might as well make it universal. Closes #8904, #8922.
Peter Amstutz [Fri, 8 Apr 2016 20:14:59 +0000 (16:14 -0400)]
7658: Clean up & handle subscription filters consistently across EventClient,
_EventClient and PollClient.
radhika [Fri, 8 Apr 2016 15:23:53 +0000 (11:23 -0400)]
7658: improve the log verification in case of unexpected close.
radhika [Fri, 8 Apr 2016 14:32:33 +0000 (10:32 -0400)]
7658: add test that verifies reconnect retry behavior
Brett Smith [Fri, 8 Apr 2016 14:12:22 +0000 (10:12 -0400)]
Merge branch '8904-support-python3.2'
Closes #8904.
Tom Clegg [Thu, 7 Apr 2016 02:26:41 +0000 (22:26 -0400)]
8904: Avoid installing pip >= 8 in a Python 3.2 virtualenv.
Ward Vandewege [Fri, 8 Apr 2016 01:51:11 +0000 (21:51 -0400)]
Package crunchstat-summary.
closes #8911
Brett Smith [Thu, 7 Apr 2016 21:30:44 +0000 (17:30 -0400)]
Merge branch '8872-node-manager-create-search-handling-wip'
Closes #8872, #8900.
Brett Smith [Wed, 6 Apr 2016 18:23:11 +0000 (14:23 -0400)]
8872: Bugfix Node Manager's node search after node create failure.
search_for raises ValueError if the thing isn't found. create_node
seems to be expecting it to return None instead. Bring create_node in
line with search_for's documented API.
In order to get the tests to pass, I had to separate out the raw
search code from the caching, and use that in create_node. Otherwise,
the cloud node from the "node found" test would be cached and returned
in the "node not found" test.
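A minimal sketch of the mismatch described in the 8872 commit above: a search function that raises ValueError when nothing is found, and a caller that handles the exception instead of testing for None. Both function names and bodies are hypothetical simplifications, and the sketch leaves out the caching mentioned in the commit.

```python
def search_for(term, items):
    """Return the single item equal to term, or raise ValueError."""
    matches = [item for item in items if item == term]
    if len(matches) != 1:
        raise ValueError("%r matched %d items" % (term, len(matches)))
    return matches[0]


def find_after_failed_create(name, cloud_nodes):
    """After a failed create, check whether the node exists anyway."""
    try:
        return search_for(name, cloud_nodes)
    except ValueError:
        # Not found: the create really did fail.
        return None


print(find_after_failed_create("compute1", ["compute0", "compute1"]))
print(find_after_failed_create("compute9", ["compute0", "compute1"]))
```

A caller written as `if search_for(...) is None` would never reach its "not found" branch, because the exception propagates first.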
radhika [Thu, 7 Apr 2016 15:21:57 +0000 (11:21 -0400)]
Merge branch 'master' into 7658-websockets-reconnect-on-close
radhika [Thu, 7 Apr 2016 15:17:25 +0000 (11:17 -0400)]
8724: performKeepBlockCheck() returns error when any of the listed blocks are not found.
radhika [Thu, 7 Apr 2016 13:51:06 +0000 (09:51 -0400)]
8724: test assertion improvements
Peter Amstutz [Thu, 7 Apr 2016 02:16:33 +0000 (22:16 -0400)]
Remove over-quoting from crunchrunner and certificate volume mounts. refs #8893
radhika [Wed, 6 Apr 2016 22:30:35 +0000 (18:30 -0400)]
8724: add keep-block-check script
Peter Amstutz [Wed, 6 Apr 2016 19:51:56 +0000 (15:51 -0400)]
8799: Nodes in "drain" state are not automatically eligible for shutdown to
avoid a race between starting a shutdown and resume_node().
Brett Smith [Wed, 6 Apr 2016 19:50:23 +0000 (15:50 -0400)]
Merge branch '8879-cwl-runner-job-owner-wip'
Closes #8879, #8887.
Brett Smith [Tue, 5 Apr 2016 19:45:21 +0000 (15:45 -0400)]
8879: Clean indentation in CWL SDK tests.
Brett Smith [Tue, 5 Apr 2016 19:37:35 +0000 (15:37 -0400)]
8879: cwl-runner --submit respects --project-uuid.
Peter Amstutz [Wed, 6 Apr 2016 15:22:31 +0000 (11:22 -0400)]
8799: Nodes whose slurm_state is "down" are checked with sinfo and either reenabled or treated as valid for shutdown.
Brett Smith [Wed, 6 Apr 2016 16:13:06 +0000 (12:13 -0400)]
Merge branch '8810-crunch-improve-docker-loading-wip'
Closes #8810, #8888.
Brett Smith [Tue, 5 Apr 2016 20:21:20 +0000 (16:21 -0400)]
8810: crunch-job reports errors when checking if Docker image is loaded.
Since the check was previously in an `if !` condition, errors in it
would cause us to enter the branch.
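A rough Python sketch of the problem described in the 8810 commit above: a plain "not success" test treats a failed check the same as a negative answer. The exit-code convention used here (0 means loaded, 1 means not loaded, anything else means the check itself failed) is an assumption for illustration, as are the function names.

```python
import subprocess
import sys


def image_is_loaded(check_cmd):
    """Return True/False for a clean answer; raise if the check failed."""
    returncode = subprocess.run(check_cmd).returncode
    if returncode == 0:
        return True
    if returncode == 1:
        return False
    # Any other exit code means the check could not give an answer.
    raise RuntimeError("image check failed with exit code %d" % returncode)


def fake_check(code):
    """Hypothetical check command that just exits with the given code."""
    return [sys.executable, "-c", "import sys; sys.exit(%d)" % code]


print(image_is_loaded(fake_check(0)))
print(image_is_loaded(fake_check(1)))
```

Separating the three outcomes lets the caller report the error instead of silently proceeding as if the image were missing.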
Brett Smith [Wed, 6 Apr 2016 15:37:12 +0000 (11:37 -0400)]
Merge branch '8893-crunch-job-crunchrunner-quoting-wip'
Closes #8893, #8895.
Brett Smith [Wed, 6 Apr 2016 14:32:03 +0000 (10:32 -0400)]
8893: crunch-job doesn't pass empty strings to `docker run`.
We solve this issue by requiring $VOLUME_CRUNCHRUNNER and
$VOLUME_CERTS to contain their own quoting. Because of that, we clear
their values first, to make sure we don't inherit values that might
break the `docker run` invocation.
Nico Cesar [Tue, 5 Apr 2016 18:15:32 +0000 (14:15 -0400)]
Merge branch '8712-fuse-cache-reload-bug'
closes #8712
Peter Amstutz [Tue, 5 Apr 2016 17:22:06 +0000 (13:22 -0400)]
8712: Propagate return value of clear() from super method. Test cache clearing
collections with subdirs.
radhika [Tue, 5 Apr 2016 16:00:19 +0000 (12:00 -0400)]
7658: update EventClient.on_closed to retry on connect errors.
radhika [Mon, 4 Apr 2016 19:41:56 +0000 (15:41 -0400)]
7658: add reconnect logic when a websocket is closed unexpectedly.
Peter Amstutz [Mon, 4 Apr 2016 19:40:33 +0000 (15:40 -0400)]
8712: Set self.collection = None when clearing the contents of a
CollectionDirectory, so that it gets properly reloaded on update().
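A minimal sketch of the two 8712 fixes above: clear() resets the cached collection so the next update() reloads it, and it passes along the return value of the parent clear(). The classes, the loader callback, and the dict-based contents are hypothetical and much simpler than the FUSE driver.

```python
class Directory:
    def __init__(self):
        self._entries = {}

    def clear(self):
        self._entries.clear()
        return True


class CachedCollectionDirectory(Directory):
    def __init__(self, loader):
        super().__init__()
        self._loader = loader  # hypothetical callable that fetches contents
        self.collection = None

    def update(self):
        # Reload only when there is no cached collection.
        if self.collection is None:
            self.collection = self._loader()
            self._entries = dict(self.collection)
        return True

    def clear(self):
        cleared = super().clear()
        # Without this reset, update() would see a cached collection and
        # leave the directory empty.
        self.collection = None
        return cleared


d = CachedCollectionDirectory(lambda: {"subdir/a.txt": b"data"})
d.update()
d.clear()
d.update()
print(sorted(d._entries))
```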
Peter Amstutz [Mon, 4 Apr 2016 18:59:10 +0000 (14:59 -0400)]
8712: Test case that reproduces cache-spill bug.
Brett Smith [Fri, 1 Apr 2016 19:50:01 +0000 (15:50 -0400)]
Merge branch '8811-srun-sync-tempfail-wip'
Closes #8811, #8862.
Brett Smith [Thu, 31 Mar 2016 21:46:51 +0000 (17:46 -0400)]
8811: crunch-job srun_sync detects and reports SLURM tempfails.
preprocess_stderr needed updating to check for these tempfails even in
cases where the child process does not have a slotindex.
Peter Amstutz [Fri, 1 Apr 2016 19:46:37 +0000 (15:46 -0400)]
Merge branch '8816-compute-node-update-exception' closes #8816
Peter Amstutz [Fri, 1 Apr 2016 19:35:08 +0000 (15:35 -0400)]
8816: Use is_cloud_exception to determine if exception is a "cloud error". Add
test that exceptions don't crash ComputeNodeUpdateActor.
Ward Vandewege [Fri, 1 Apr 2016 19:16:49 +0000 (15:16 -0400)]
Fix package building by pinning docker-py to version 1.7.2
No issue #
Brett Smith [Fri, 1 Apr 2016 18:47:19 +0000 (14:47 -0400)]
Merge branch '8782-reapchildren-after-signal-wip'
Closes #8782, #8860, #8870.
Brett Smith [Fri, 1 Apr 2016 18:37:34 +0000 (14:37 -0400)]
8782: Remove WIFEXITED check from crunch-job reapchildren.
The intent of this check was to avoid reaping children that got
SIGSTOP. But from the waitpid(2) man page, you must pass specific
flags for waitpid to return those children. Without those flags,
waitpid will only return the pids of children that have terminated.
Meanwhile, WIFEXITED only returns true if the exit code indicates that
the child terminated normally. It returns false if the child was
killed by a signal like SIGINT or SIGKILL. This means children so
killed were not reaped by reapchildren, leading to infinite loops.
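The behavior described in the 8782 commit above can be seen with a short Python sketch on a POSIX system. It is an illustration of waitpid and WIFEXITED semantics, not the crunch-job (Perl) code; the helper name is hypothetical.

```python
import os
import signal


def spawn_and_reap(kill_with=None):
    """Fork a child, optionally have it kill itself, then reap it.

    Returns (reaped, exited_normally, killed_by_signal).
    """
    pid = os.fork()
    if pid == 0:
        # Child process.
        if kill_with is not None:
            os.kill(os.getpid(), kill_with)
        os._exit(0)
    # Parent: with no extra flags, waitpid returns only terminated children.
    reaped_pid, status = os.waitpid(pid, 0)
    return (reaped_pid == pid, os.WIFEXITED(status), os.WIFSIGNALED(status))


print(spawn_and_reap())
print(spawn_and_reap(signal.SIGKILL))
```

In both cases waitpid returns the child, but WIFEXITED is true only for the normal exit. A reaper that skips children when WIFEXITED is false would therefore never account for the signal-killed child.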
Peter Amstutz [Fri, 1 Apr 2016 18:32:14 +0000 (14:32 -0400)]
Merge branch '8857-cwl-job-reuse' closes #8857
Peter Amstutz [Fri, 1 Apr 2016 17:28:03 +0000 (13:28 -0400)]
8816: Handle cloud errors slightly differently from unrecognized errors.
Peter Amstutz [Fri, 1 Apr 2016 17:11:51 +0000 (13:11 -0400)]
8816: ComputeNodeUpdateActor._throttle_errors logs errors instead of re-throwing them.
Ward Vandewege [Fri, 1 Apr 2016 14:59:46 +0000 (10:59 -0400)]
A few more fixes for run-build-packages-python-and-ruby.sh, and a small
safeguard for run-build-packages.sh.
refs #8864
Peter Amstutz [Fri, 1 Apr 2016 14:01:39 +0000 (10:01 -0400)]
Merge branch 'master' into 8857-cwl-job-reuse
Conflicts:
sdk/cwl/arvados_cwl/__init__.py
Ward Vandewege [Fri, 1 Apr 2016 01:38:50 +0000 (21:38 -0400)]
Add build/run-build-packages-python-and-ruby.sh script to handle upload
to pypi and rubygems.
refs #8864
Peter Amstutz [Thu, 31 Mar 2016 22:29:45 +0000 (18:29 -0400)]
Merge branch '8828-which-crunchrunner' closes #8828
Peter Amstutz [Thu, 31 Mar 2016 21:25:05 +0000 (17:25 -0400)]
8828: Fix bind mount point for certificates.
Peter Amstutz [Thu, 31 Mar 2016 19:58:47 +0000 (15:58 -0400)]
8828: Move logic for checking $(which crunchrunner) into script that runs before invoking Docker on the compute node.
Ward Vandewege [Thu, 31 Mar 2016 17:49:54 +0000 (13:49 -0400)]
Build newer cwltool version.
No issue #
Peter Amstutz [Thu, 31 Mar 2016 15:54:40 +0000 (11:54 -0400)]
Merge branch '8840-lock-job-record' closes #8840