radhika [Thu, 21 Apr 2016 18:34:20 +0000 (14:34 -0400)]
closes #8936
Merge branch '8936-ttl-in-signing-key'
Peter Amstutz [Thu, 21 Apr 2016 18:02:54 +0000 (14:02 -0400)]
Fix race conditions in test_node_undrained_when_shutdown_cancelled
and test_boot_new_node_when_all_nodes_busy. refs #8953
radhika [Thu, 21 Apr 2016 17:23:32 +0000 (13:23 -0400)]
8936: updated blob_test.rb to continue to use the default blob_signature_ttl.
radhika [Thu, 21 Apr 2016 17:04:59 +0000 (13:04 -0400)]
8936: update go tests to use a blob-signature-ttl different than 1s.
Peter Amstutz [Thu, 21 Apr 2016 15:03:19 +0000 (11:03 -0400)]
Bump cwltool dependency and pin version so it doesn't break again due to
external changes. no issue #
radhika [Thu, 21 Apr 2016 15:00:11 +0000 (11:00 -0400)]
8936: update the blob_test to use a specific blob_signature_ttl to ensure consistent results.
radhika [Thu, 21 Apr 2016 13:52:50 +0000 (09:52 -0400)]
8936: update comment on keepstore and go fmt
radhika [Thu, 21 Apr 2016 13:32:11 +0000 (09:32 -0400)]
8936: address review comments
radhika [Thu, 21 Apr 2016 11:23:00 +0000 (07:23 -0400)]
Merge branch '8936-ttl-in-signing-key-TC' into 8936-ttl-in-signing-key
Peter Amstutz [Wed, 20 Apr 2016 20:19:20 +0000 (16:19 -0400)]
Arvbox run websockets in separate puma server instead of in API server process.
no issue #
Peter Amstutz [Wed, 20 Apr 2016 18:26:15 +0000 (14:26 -0400)]
Don't shut down if state is ('down', 'closed', 'boot wait', *) refs #8953
Tom Clegg [Wed, 20 Apr 2016 14:30:04 +0000 (10:30 -0400)]
Merge branch '8697-ruby187-compat'
refs #8697
refs #8689
Nico Cesar [Wed, 20 Apr 2016 13:58:57 +0000 (09:58 -0400)]
Merge branch '9014-keep-block-check-package'
closes #9014
Nico Cesar [Wed, 20 Apr 2016 13:28:55 +0000 (09:28 -0400)]
adding new package for block checks
refs #9014
Peter Amstutz [Wed, 20 Apr 2016 13:10:33 +0000 (09:10 -0400)]
Merge branch '8953-no-double-count' refs #8953
Tom Clegg [Tue, 19 Apr 2016 20:57:41 +0000 (16:57 -0400)]
6833: Fix excessive debug logging in TokenExpiryTest and subsequent tests.
Excessive logging was introduced (seemingly unintentionally) in d3313e65.
refs #6833
Peter Amstutz [Sat, 16 Apr 2016 02:48:13 +0000 (22:48 -0400)]
Don't double-count nodes that are shutting down. refs #8953
Tom Clegg [Tue, 19 Apr 2016 18:44:45 +0000 (14:44 -0400)]
Merge branch '9009-keep-web-close-conns'
closes #9009
Tom Clegg [Tue, 19 Apr 2016 15:43:09 +0000 (11:43 -0400)]
9009: Fix missing Close() in collectionreader.
Tom Clegg [Tue, 19 Apr 2016 15:17:07 +0000 (11:17 -0400)]
Merge branch '9004-close-keep-connections'
refs #9004
refs #9005
Tom Clegg [Tue, 19 Apr 2016 15:17:00 +0000 (11:17 -0400)]
Change Check to Assert to avoid crash after failure. No issue #
Tom Clegg [Tue, 19 Apr 2016 14:18:51 +0000 (10:18 -0400)]
9005: Workaround: Close idle connections aggressively.
Currently, the SDK code never reuses connections anyway, so it's best
to shut them down right away.
Tom Clegg [Mon, 18 Apr 2016 20:23:01 +0000 (16:23 -0400)]
8936: Warn about disruptive effect of modifying blob_signature_ttl and blob_signing_key.
radhika [Mon, 18 Apr 2016 15:44:16 +0000 (11:44 -0400)]
8936: update blob-signing-ttl related documentation.
radhika [Sat, 16 Apr 2016 22:16:46 +0000 (18:16 -0400)]
8936: consider blobSigningTtl while generating and verifying signatures.
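A minimal sketch of why the TTL matters to signatures, assuming an HMAC over a hypothetical "hash@token@expiry@ttl" message. The function names, the message layout, and the choice of SHA-1 are illustrative assumptions, not necessarily the format keepstore or the API server uses. Because the TTL is part of the signed data, a verifier configured with a different TTL rejects signatures issued under the old one.

```python
import hashlib
import hmac


def sign_blob(key, blob_hash, token, expiry, ttl):
    """Return a hex signature covering the blob hash, token, expiry and TTL.

    Hypothetical message layout, for illustration only.
    """
    message = "@".join([blob_hash, token, format(expiry, "x"), format(ttl, "x")])
    return hmac.new(key, message.encode(), hashlib.sha1).hexdigest()


def verify_blob(key, blob_hash, token, expiry, ttl, signature, now):
    """Check the signature against the verifier's own key and TTL."""
    if now > expiry:
        # The signature has expired, whatever its value.
        return False
    expected = sign_blob(key, blob_hash, token, expiry, ttl)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)
```

In this sketch, changing either the key or the TTL on one side makes every outstanding signature fail verification, which is the kind of disruption a configuration warning would need to call out.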
Peter Amstutz [Sat, 16 Apr 2016 02:31:21 +0000 (22:31 -0400)]
Don't issue drain when shutdown has been cancelled. refs #8953
Peter Amstutz [Sat, 16 Apr 2016 02:20:54 +0000 (22:20 -0400)]
Don't try to drain node if no nodename associated. refs #8953
Peter Amstutz [Fri, 15 Apr 2016 20:18:32 +0000 (16:18 -0400)]
Merge branch '8953-node-manager-FSM' closes #8953
Peter Amstutz [Fri, 15 Apr 2016 19:50:06 +0000 (15:50 -0400)]
8953: Don't start shutdown on 'drng*' or 'alloc*'.
Peter Amstutz [Fri, 15 Apr 2016 18:06:29 +0000 (14:06 -0400)]
8953: Assign to tuple (eligible, reason)
Peter Amstutz [Fri, 15 Apr 2016 15:29:50 +0000 (11:29 -0400)]
8953: shutdown_eligible() returns a tuple. Report reason for shutdown decision.
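A rough sketch of the (eligible, reason) return shape, under stated assumptions: the parameters, state names, and reason strings below are hypothetical and are not Node Manager's actual logic. The point is only that the caller gets the decision together with an explanation it can log.

```python
def shutdown_eligible(state, shutdown_window_open):
    """Return (eligible, reason) instead of a bare boolean.

    Hypothetical rules, for illustration only.
    """
    if state != "idle":
        return (False, "node state is {}".format(state))
    if not shutdown_window_open:
        return (False, "shutdown window is closed")
    return (True, "node is idle and shutdown window is open")


def describe(state, shutdown_window_open):
    # Unpack the tuple so the reason can be reported with the decision.
    eligible, reason = shutdown_eligible(state, shutdown_window_open)
    return "{}: {}".format("shutdown" if eligible else "keep", reason)
```

Note that a non-empty tuple is always truthy, so callers in this sketch must unpack the result rather than test it directly in an `if`.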
Peter Amstutz [Fri, 15 Apr 2016 15:13:12 +0000 (11:13 -0400)]
8953: Clarify how to use return value from consulting transitions table and shutdown_eligible().
Peter Amstutz [Fri, 15 Apr 2016 14:37:34 +0000 (10:37 -0400)]
8953: Fix indentation of shutdown_eligible().
Peter Amstutz [Fri, 15 Apr 2016 14:32:16 +0000 (10:32 -0400)]
8953: Add missing transitions.py
Peter Amstutz [Fri, 15 Apr 2016 14:26:46 +0000 (10:26 -0400)]
Fixup test_node_undrained_when_shutdown_cancelled and test_alloc_node_undrained_when_shutdown_cancelled.
Peter Amstutz [Fri, 15 Apr 2016 12:10:13 +0000 (08:10 -0400)]
8953: Tests pass, with some removed due to removal of the corresponding behavior.
Peter Amstutz [Thu, 14 Apr 2016 20:02:16 +0000 (16:02 -0400)]
8953: Node manager shutdown policy change WIP. Still fixing tests.
Peter Amstutz [Thu, 14 Apr 2016 13:57:41 +0000 (09:57 -0400)]
Add /var/lib/gopath and /var/lib/pip permissions fixup to Arvbox createusers.sh
to fix demo build, no issue #
Tom Clegg [Wed, 13 Apr 2016 14:57:09 +0000 (10:57 -0400)]
8697: Fix Locator.parse() (was failing on locators with hints).
Tom Clegg [Wed, 13 Apr 2016 14:24:14 +0000 (10:24 -0400)]
8697: ruby 1.8.7 compatibility in arvados/collection.
Tom Clegg [Tue, 22 Mar 2016 21:29:29 +0000 (17:29 -0400)]
8697: Move error messages from stdout to stderr.
Tom Clegg [Tue, 22 Mar 2016 21:28:39 +0000 (17:28 -0400)]
8697: Relax version constraints so gem can be used in ruby187/ree projects.
Nico Cesar [Wed, 13 Apr 2016 17:30:35 +0000 (13:30 -0400)]
Merge branch '8959-python-gflags-dependencies'
Nico Cesar [Wed, 13 Apr 2016 17:29:08 +0000 (13:29 -0400)]
Revert "I <3 pythong-gflags"
This reverts commit e935e107a4ac6250ae64878262c3145d7a62b8e8.
Nico Cesar [Wed, 13 Apr 2016 15:43:39 +0000 (11:43 -0400)]
Merge branch '8959-python-gflags-dependencies'
Nico Cesar [Wed, 13 Apr 2016 15:41:16 +0000 (11:41 -0400)]
I <3 pythong-gflags
closes #8959
Nico Cesar [Wed, 13 Apr 2016 14:42:17 +0000 (10:42 -0400)]
8959: pinning out the version of python-gflags
refs #8959
Brett Smith [Tue, 12 Apr 2016 20:20:39 +0000 (16:20 -0400)]
Merge branch '8893-crunch-job-volumes-array-wip'
Closes #8893, #8921.
Brett Smith [Fri, 8 Apr 2016 14:56:34 +0000 (10:56 -0400)]
8893: Safer quoting of crunch-job's conditional volume switches.
Packing arguments into an array allows us to have a variable number of
switches, with correct word splitting, even when the individual
arguments in the array have whitespace.
Peter Amstutz [Tue, 12 Apr 2016 19:35:10 +0000 (15:35 -0400)]
Propagate designated stdout stream from keepdocker.main() to put.main().
no issue #
radhika [Tue, 12 Apr 2016 19:30:53 +0000 (15:30 -0400)]
closes #8724
Merge branch '8724-keep-block-check-script'
radhika [Tue, 12 Apr 2016 19:29:58 +0000 (15:29 -0400)]
Merge branch 'master' into 8724-keep-block-check-script
Peter Amstutz [Tue, 12 Apr 2016 17:45:08 +0000 (13:45 -0400)]
Rename "rebuild" back to "reboot" and change "rebuild" to mean "build
--no-cache". "arvbox start" no longer fails if the container is already
running. Update docs. no issue #
Brett Smith [Tue, 12 Apr 2016 15:36:31 +0000 (11:36 -0400)]
4083: crunchstat-summary imports _strptime.
Refs #4083 for rationale.
Refs #8933 where this was reported.
See also d9014288.
Brett Smith [Tue, 12 Apr 2016 14:51:28 +0000 (10:51 -0400)]
Merge branch '8912-node-manager-patch-nodes-wip'
Closes #8913, #8923. (The branch name has a typo.)
Brett Smith [Fri, 8 Apr 2016 22:56:44 +0000 (18:56 -0400)]
8912: Node Manager search_for_now uses overridden methods.
This wasn't possible in the original implementation because of the way
we used to proxy methods to self.real. Now that we proxy them
transparently, we can call methods on the Node Manager driver, and let
them be proxied to the underlying libcloud driver if needed.
radhika [Tue, 12 Apr 2016 14:17:24 +0000 (10:17 -0400)]
8724: some more cleanup of tests.
Peter Amstutz [Mon, 11 Apr 2016 20:57:17 +0000 (16:57 -0400)]
Add args.ignore_docker_for_reuse=False to cwl-runner crunch script. refs #8857
radhika [Tue, 12 Apr 2016 03:37:22 +0000 (23:37 -0400)]
Merge branch 'master' into 8724-keep-block-check-script
radhika [Tue, 12 Apr 2016 03:36:11 +0000 (23:36 -0400)]
8724: test updates
Peter Amstutz [Mon, 11 Apr 2016 20:22:17 +0000 (16:22 -0400)]
Add --help to test_with_arvbox.sh, no issue #
Peter Amstutz [Mon, 11 Apr 2016 20:19:11 +0000 (16:19 -0400)]
Separate out Go and Python dependencies into separate directories that don't
get deleted by "reset" because they include code from downloading external
dependencies. Add -data-manager-token to keepstore invocation for datamanager
testing.
radhika [Mon, 11 Apr 2016 14:02:33 +0000 (10:02 -0400)]
closes #7658
Merge branch '7658-websockets-reconnect-on-close'
radhika [Mon, 11 Apr 2016 14:02:15 +0000 (10:02 -0400)]
Merge branch 'master' into 7658-websockets-reconnect-on-close
radhika [Mon, 11 Apr 2016 14:00:36 +0000 (10:00 -0400)]
7658: update connect error test to use stream handler to read the log file, instead of using a temp file.
Peter Amstutz [Sun, 10 Apr 2016 02:11:32 +0000 (22:11 -0400)]
Merge branch '8799-make-drained-nodes-idle' closes #8799
Peter Amstutz [Fri, 8 Apr 2016 21:25:25 +0000 (17:25 -0400)]
8799: shutdown_eligible() returns "node is draining" when in drain state. Add comments about iterating over cloud_nodes to check for "down" nodes. Fix tests.
Brett Smith [Fri, 8 Apr 2016 20:31:17 +0000 (16:31 -0400)]
Pin dockercleaner's docker-py requirement to 1.7.2.
Finishes the job started by 8680c874. It only seems to be really
necessary on wheezy (because docker-py or its requirements abandoned
support for Python 3.2), but since we're pinning more as a general
rule, might as well make it universal. Closes #8904, #8922.
Peter Amstutz [Fri, 8 Apr 2016 20:14:59 +0000 (16:14 -0400)]
7658: Clean up & handle subscription filters consistently across EventClient,
_EventClient and PollClient.
radhika [Fri, 8 Apr 2016 15:23:53 +0000 (11:23 -0400)]
7658: improve the log verification in case of unexpected close.
radhika [Fri, 8 Apr 2016 14:32:33 +0000 (10:32 -0400)]
7658: add test that verifies reconnect retry behavior
Brett Smith [Fri, 8 Apr 2016 14:12:22 +0000 (10:12 -0400)]
Merge branch '8904-support-python3.2'
Closes #8904.
Tom Clegg [Thu, 7 Apr 2016 02:26:41 +0000 (22:26 -0400)]
8904: Avoid installing pip >= 8 in a Python 3.2 virtualenv.
Ward Vandewege [Fri, 8 Apr 2016 01:51:11 +0000 (21:51 -0400)]
Package crunchstat-summary.
closes #8911
Brett Smith [Thu, 7 Apr 2016 21:30:44 +0000 (17:30 -0400)]
Merge branch '8872-node-manager-create-search-handling-wip'
Closes #8872, #8900.
Brett Smith [Wed, 6 Apr 2016 18:23:11 +0000 (14:23 -0400)]
8872: Bugfix Node Manager's node search after node create failure.
search_for raises ValueError if the thing isn't found. create_node
seems to be expecting it to return None instead. Bring create_node in
line with search_for's documented API.
In order to get the tests to pass, I had to separate out the raw
search code from the caching, and use that in create_node. Otherwise,
the cloud node from the "node found" test would be cached and returned
in the "node not found" test.
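The interaction described above can be sketched as follows. This is a simplified illustration, not the Node Manager code: FakeDriver, _search_uncached, and find_after_failed_create are hypothetical names, and nodes are plain strings. It shows a search that raises ValueError when nothing matches, a cached wrapper around it, and a caller that uses the raw search and turns "not found" into None.

```python
class FakeDriver:
    """Hypothetical stand-in for a cloud driver wrapper."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._cache = {}

    def _search_uncached(self, name):
        # Raw search: always consults the current node list.
        matches = [node for node in self.nodes if node == name]
        if not matches:
            raise ValueError("{!r} not found".format(name))
        return matches[0]

    def search_for(self, name):
        # Cached search: remembers earlier hits, raises ValueError on a miss.
        if name not in self._cache:
            self._cache[name] = self._search_uncached(name)
        return self._cache[name]

    def find_after_failed_create(self, name):
        # Uses the raw search so a stale cache entry cannot mask "not found",
        # and converts the documented ValueError into None for the caller.
        try:
            return self._search_uncached(name)
        except ValueError:
            return None
```

With the cached lookup, a node found once keeps being returned even after it disappears from the listing, which is the cross-test leakage the commit message describes.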
radhika [Thu, 7 Apr 2016 15:21:57 +0000 (11:21 -0400)]
Merge branch 'master' into 7658-websockets-reconnect-on-close
radhika [Thu, 7 Apr 2016 15:17:25 +0000 (11:17 -0400)]
8724: performKeepBlockCheck() returns error when any of the listed blocks are not found.
radhika [Thu, 7 Apr 2016 13:51:06 +0000 (09:51 -0400)]
8724: test assertion improvements
Peter Amstutz [Thu, 7 Apr 2016 02:16:33 +0000 (22:16 -0400)]
Remove over-quoting from crunchrunner and certificate volume mounts. refs #8893
radhika [Wed, 6 Apr 2016 22:30:35 +0000 (18:30 -0400)]
8724: add keep-block-check script
Peter Amstutz [Wed, 6 Apr 2016 19:51:56 +0000 (15:51 -0400)]
8799: Nodes in "drain" state are not automatically eligible for shutdown to
avoid a race between starting a shutdown and resume_node().
Brett Smith [Wed, 6 Apr 2016 19:50:23 +0000 (15:50 -0400)]
Merge branch '8879-cwl-runner-job-owner-wip'
Closes #8879, #8887.
Brett Smith [Tue, 5 Apr 2016 19:45:21 +0000 (15:45 -0400)]
8879: Clean indentation in CWL SDK tests.
Brett Smith [Tue, 5 Apr 2016 19:37:35 +0000 (15:37 -0400)]
8879: cwl-runner --submit respects --project-uuid.
Peter Amstutz [Wed, 6 Apr 2016 15:22:31 +0000 (11:22 -0400)]
8799: Nodes whose slurm_state is "down" are checked with sinfo and either reenabled or considered valid for shutdown.
Brett Smith [Wed, 6 Apr 2016 16:13:06 +0000 (12:13 -0400)]
Merge branch '8810-crunch-improve-docker-loading-wip'
Closes #8810, #8888.
Brett Smith [Tue, 5 Apr 2016 20:21:20 +0000 (16:21 -0400)]
8810: crunch-job reports errors when checking if Docker image is loaded.
Since the check was previously in an `if !` condition, errors in it
would cause us to enter the branch.
Brett Smith [Wed, 6 Apr 2016 15:37:12 +0000 (11:37 -0400)]
Merge branch '8893-crunch-job-crunchrunner-quoting-wip'
Closes #8893, #8895.
Brett Smith [Wed, 6 Apr 2016 14:32:03 +0000 (10:32 -0400)]
8893: crunch-job doesn't pass empty strings to `docker run`.
We solve this issue by requiring $VOLUME_CRUNCHRUNNER and
$VOLUME_CERTS to contain their own quoting. Because of that, we clear
their values first, to make sure we don't inherit values that might
break the `docker run` invocation.
Nico Cesar [Tue, 5 Apr 2016 18:15:32 +0000 (14:15 -0400)]
Merge branch '8712-fuse-cache-reload-bug'
closes #8712
Peter Amstutz [Tue, 5 Apr 2016 17:22:06 +0000 (13:22 -0400)]
8712: Propagate return value of clear() from super method. Test cache clearing
collections with subdirs.
radhika [Tue, 5 Apr 2016 16:00:19 +0000 (12:00 -0400)]
7658: update EventClient.on_closed to retry on connect errors.
radhika [Mon, 4 Apr 2016 19:41:56 +0000 (15:41 -0400)]
7658: add reconnect logic when a websocket is closed unexpectedly.
Peter Amstutz [Mon, 4 Apr 2016 19:40:33 +0000 (15:40 -0400)]
8712: Set self.collection = None when clearing the contents of a
CollectionDirectory, so that it gets properly reloaded on update().
Peter Amstutz [Mon, 4 Apr 2016 18:59:10 +0000 (14:59 -0400)]
8712: Test case that reproduces cache-spill bug.
Brett Smith [Fri, 1 Apr 2016 19:50:01 +0000 (15:50 -0400)]
Merge branch '8811-srun-sync-tempfail-wip'
Closes #8811, #8862.
Brett Smith [Thu, 31 Mar 2016 21:46:51 +0000 (17:46 -0400)]
8811: crunch-job srun_sync detects and reports SLURM tempfails.
preprocess_stderr needed updating to check for these tempfails even in
cases where the child process does not have a slotindex.
Peter Amstutz [Fri, 1 Apr 2016 19:46:37 +0000 (15:46 -0400)]
Merge branch '8816-compute-node-update-exception' closes #8816
Peter Amstutz [Fri, 1 Apr 2016 19:35:08 +0000 (15:35 -0400)]
8816: Use is_cloud_exception to determine if exception is a "cloud error". Add
test that exceptions don't crash ComputeNodeUpdateActor.