Tom Clegg [Thu, 26 May 2016 19:49:49 +0000 (15:49 -0400)]
9272: Fix some race conditions in flaky tests.
Peter Amstutz [Fri, 27 May 2016 15:25:22 +0000 (11:25 -0400)]
Arvbox installs binaries for go 1.6 instead of golang Debian package
(which is stuck at 1.3) no issue #
Ward Vandewege [Thu, 26 May 2016 15:37:06 +0000 (11:37 -0400)]
Package ruamel.yaml, which is a new dependency of schema-salad.
No issue #
Peter Amstutz [Thu, 26 May 2016 14:09:04 +0000 (10:09 -0400)]
Merge branch '9303-actor-dead-dead' refs #9303
Peter Amstutz [Thu, 26 May 2016 13:51:24 +0000 (09:51 -0400)]
9303: Fetch arv_node before trying to shut down node, because monitor actor may
go away once the node has been successfully shut down. Also handle case of
node_finished_shutdown called after shutdown actor is stopped.
Peter Amstutz [Wed, 25 May 2016 20:33:49 +0000 (16:33 -0400)]
Log watchdog exception refs #9303
Peter Amstutz [Wed, 25 May 2016 20:30:10 +0000 (16:30 -0400)]
Merge branch '9303-kill-nodemanager-on-dead-actor' refs #9303
Peter Amstutz [Wed, 25 May 2016 20:29:30 +0000 (16:29 -0400)]
9303: Watchdog kill node manager on any error
Ward Vandewege [Tue, 24 May 2016 19:41:08 +0000 (15:41 -0400)]
Build distribution packages for the version of python-schema-salad that
python-cwltool now depends on.
refs #8653
Ward Vandewege [Tue, 24 May 2016 17:10:53 +0000 (13:10 -0400)]
Build distribution packages for the version of python-cwltool that
python-arvados-cwl-runner now depends on.
refs #8653
radhika [Tue, 24 May 2016 16:41:17 +0000 (12:41 -0400)]
closes #8556
Merge branch '8556-azure-trash'
radhika [Tue, 24 May 2016 16:39:28 +0000 (12:39 -0400)]
8556: implement trash/untrash for azure volumes.
Tom Clegg [Fri, 20 May 2016 14:01:55 +0000 (10:01 -0400)]
Merge branch 'wtsi-hgi-9231-rename-redunancy-to-replication-desired'
closes #9231
Peter Amstutz [Thu, 19 May 2016 20:02:17 +0000 (16:02 -0400)]
Merge branch '8653-cwl-runner-handle-files' closes #8653
Peter Amstutz [Thu, 19 May 2016 19:48:46 +0000 (15:48 -0400)]
8653: Fix tests.
Peter Amstutz [Thu, 19 May 2016 17:50:47 +0000 (13:50 -0400)]
8653: Use load_tool.fetch_document() instead of Loader() to read raw document.
Peter Amstutz [Thu, 19 May 2016 15:32:00 +0000 (11:32 -0400)]
8653: add cwlVersion so files validate correctly.
Peter Amstutz [Thu, 19 May 2016 02:08:48 +0000 (22:08 -0400)]
8653: Fix pathmapper API
Peter Amstutz [Thu, 19 May 2016 02:06:19 +0000 (22:06 -0400)]
8653: Set basedir for CollectionFsAccess
Peter Amstutz [Thu, 19 May 2016 02:01:15 +0000 (22:01 -0400)]
8653: Update load_tool in cwl-runner crunch script
Joshua C. Randall [Wed, 18 May 2016 13:35:37 +0000 (14:35 +0100)]
Renames 'redundancy' to 'replication_desired'
Peter Amstutz [Wed, 18 May 2016 21:42:44 +0000 (17:42 -0400)]
8653: Check that parameters are basestring before matching regex.
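The type check described above can be sketched as follows. This is a minimal illustration, not the code from the commit: the pattern `KEEP_PATH` and the helper `match_keep_path` are hypothetical names, and `str` is used as the Python 3 equivalent of the Python 2 `basestring` check.

```python
import re

# Hypothetical pattern for illustration only; the commit does not show the real regex.
KEEP_PATH = re.compile(r"^keep:([0-9a-f]{32}\+\d+)/(.*)$")

def match_keep_path(value):
    """Return a regex match for string values, or None for anything else."""
    # Guard first: re.match() raises TypeError on ints, dicts, lists, etc.
    # (`str` here plays the role of Python 2's `basestring`.)
    if not isinstance(value, str):
        return None
    return KEEP_PATH.match(value)
```

Without the guard, a non-string parameter such as an integer or a nested object would raise a TypeError instead of simply being skipped.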
Peter Amstutz [Wed, 18 May 2016 20:40:48 +0000 (16:40 -0400)]
8653: Update cwl-runner to match changes in sdk/arvados-cwl-runner
Peter Amstutz [Wed, 18 May 2016 20:33:57 +0000 (16:33 -0400)]
8653: cwl-runner crunch script rewrites keep file paths into CWL File objects.
Clean up argument handling in arvados-cwl-runner so that --create-template
doesn't require a job object, and that --help doesn't present options that are
irrelevant or don't work.
Peter Amstutz [Wed, 18 May 2016 15:00:59 +0000 (11:00 -0400)]
Merge branch '9018-nodemanager-kill-instead-of-killpg' closes #9018
Peter Amstutz [Wed, 18 May 2016 14:59:03 +0000 (10:59 -0400)]
9018: Change os.killpg() -> os.kill, don't create new process group.
Peter Amstutz [Wed, 18 May 2016 13:27:15 +0000 (09:27 -0400)]
Merge branch '8236-nodemanager-watchdog' closes #8236
Peter Amstutz [Tue, 17 May 2016 20:59:20 +0000 (16:59 -0400)]
8236: Restore os.killpg(). Create a new process group so that it won't kill
the parent process by accident. Watchdog process now only monitors specific
actors.
Brett Smith [Tue, 17 May 2016 20:22:50 +0000 (16:22 -0400)]
Merge branch '9049-arv-copy-filters-wip'
Closes #9049, #9225.
Brett Smith [Tue, 17 May 2016 16:38:39 +0000 (12:38 -0400)]
9049: arv-copy checks and updates pipeline template filters.
Peter Amstutz [Tue, 17 May 2016 15:44:05 +0000 (11:44 -0400)]
8236: Add comment to BogusActor.ping()
Peter Amstutz [Tue, 17 May 2016 15:16:47 +0000 (11:16 -0400)]
Merge branch 'master' into 8236-nodemanager-watchdog
Peter Amstutz [Tue, 17 May 2016 15:15:53 +0000 (11:15 -0400)]
Merge branch '9161-node-state-fixes' closes #9161
Peter Amstutz [Tue, 17 May 2016 15:15:16 +0000 (11:15 -0400)]
8236: Add watchdog actor. This calls ping() on every other actor to check that
it is responsive. If an actor fails to respond, kill node manager.
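A rough sketch of the watchdog idea, assuming plain threads rather than the actor library used by node manager. `StubActor`, `unresponsive`, and `watchdog_check` are hypothetical names, and the `kill` callback stands in for whatever actually terminates the process.

```python
import threading

class StubActor:
    """Hypothetical stand-in for an actor; a hung actor never answers ping()."""
    def __init__(self, name, hung=False):
        self.name = name
        self._hung = hung

    def ping(self, done):
        # A healthy actor answers immediately; a hung one never sets the event.
        if not self._hung:
            done.set()

def unresponsive(actors, timeout=0.1):
    """Ping every actor and return the names of those that did not answer in time."""
    pending = []
    for actor in actors:
        done = threading.Event()
        threading.Thread(target=actor.ping, args=(done,), daemon=True).start()
        pending.append((actor.name, done))
    return [name for name, done in pending if not done.wait(timeout)]

def watchdog_check(actors, kill, timeout=0.1):
    """Call kill() once if any actor failed to respond; return the failed names."""
    failed = unresponsive(actors, timeout)
    if failed:
        kill()
    return failed
```

Passing `kill` as a parameter keeps the sketch testable; a real watchdog would presumably end the process at that point.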
Peter Amstutz [Tue, 17 May 2016 13:18:06 +0000 (09:18 -0400)]
9161: Remove unused "paired()" function
Peter Amstutz [Tue, 17 May 2016 12:50:51 +0000 (08:50 -0400)]
Merge branch 'master' into 9161-node-state-fixes
Ward Vandewege [Tue, 17 May 2016 01:05:37 +0000 (21:05 -0400)]
Remove hardcoded -v in call to run_upload_packages.py
refs #9224
Ward Vandewege [Mon, 16 May 2016 21:33:44 +0000 (17:33 -0400)]
When running run-build-packages-python-and-ruby.sh with --debug, pass
--verbose to the upload command.
refs #9224
Peter Amstutz [Mon, 16 May 2016 20:36:41 +0000 (16:36 -0400)]
9161: Remove spurious prints
Peter Amstutz [Mon, 16 May 2016 18:30:06 +0000 (14:30 -0400)]
9161: Don't automatically consider nodes with job_uuid set to be 'busy'.
Peter Amstutz [Mon, 16 May 2016 14:29:50 +0000 (10:29 -0400)]
9161: Decisions to start and stop compute nodes are now based on an explicit
set of states: booting, unpaired, idle, busy, down, shutdown. Refactor to
remove 'shutdowns' dict and fold into cloud_nodes. Nodes_wanted uses same
computation of node state as used for decision to shut down nodes. Nodes for
which the state is unclear are either idle (if in the boot grace period) or
down (if older).
Peter Amstutz [Fri, 13 May 2016 20:36:02 +0000 (16:36 -0400)]
9161: Put nodes tagged _nodemanager_recently_booted back into the node list.
Peter Amstutz [Fri, 13 May 2016 20:09:10 +0000 (16:09 -0400)]
9161: Add _nodemanager_recently_booted as a new way of remembering nodes which are in an intermediate state between being created and showing up in the cloud node list.
Tom Clegg [Fri, 13 May 2016 19:38:51 +0000 (15:38 -0400)]
Accept auth tokens with uppercase letters.
No issue #
Peter Amstutz [Fri, 13 May 2016 18:26:30 +0000 (14:26 -0400)]
9161: Adjusting behavior to accommodate down/broken/missing nodes.
Brett Smith [Fri, 13 May 2016 15:25:36 +0000 (11:25 -0400)]
Merge branch '9213-fix-arv-gems-wip'
Closes #9213, #9215.
Brett Smith [Thu, 12 May 2016 20:48:43 +0000 (16:48 -0400)]
9213: Update arv's `gem install` suggestions.
This makes it match what it actually loads.
Brett Smith [Thu, 12 May 2016 20:40:37 +0000 (16:40 -0400)]
9213: Improve gem loading in `arv`.
* Include the exception string in the error message.
* Separate stdlib loading problems from gem loading problems.
* Load gems with more dependencies first, to avoid situations like
this:
irb(main):001:0> require 'active_support/inflector'
=> true
irb(main):002:0> require 'arvados/google_api_client'
Gem::LoadError: Unable to activate arvados-0.1.20160420143004, because activesupport-4.2.6 conflicts with activesupport (< 4.2.6, >= 3)
Brett Smith [Thu, 12 May 2016 20:37:59 +0000 (16:37 -0400)]
9213: Fix google-api-client dependency range in gemspecs.
Brett Smith [Fri, 13 May 2016 14:55:31 +0000 (10:55 -0400)]
Merge branch '9135-eventclient-run-forever-wip'
Closes #9135, #9157.
Brett Smith [Mon, 9 May 2016 16:54:23 +0000 (12:54 -0400)]
9135: Bring EventClient's public interface closer to PollClient's.
* Restore the run_forever method, which was previously inherited from
WebSocketClient.
* Remove the connect and close_connection methods, which are
WebSocketClient implementation details that don't make sense as part
of the public interface. (A running EventClient will just reconnect
if you call close_connection on it.)
Brett Smith [Mon, 9 May 2016 16:57:42 +0000 (12:57 -0400)]
9135: Make EventClient initialization more consistent.
* DRY up the setup code. This includes always trying to close the
connection after failure, since we were doing that in the initial
connection.
* Make the client a daemon thread, for consistency with PollClient.
Brett Smith [Mon, 9 May 2016 16:40:10 +0000 (12:40 -0400)]
9135: Clean imports in test_events.
Brett Smith [Mon, 9 May 2016 16:18:28 +0000 (12:18 -0400)]
9135: Add basic tests for Python events listeners.
These ensure that both classes have the core methods subscribe,
unsubscribe, run_forever, and close.
Rename the test file to test_events, to better match other test
patterns, and account for the fact it tests both classes in the
module.
Peter Amstutz [Fri, 13 May 2016 14:11:39 +0000 (10:11 -0400)]
9161: Eliminate 'booted' list and put nodes directly into cloud_nodes list.
Refactor logic for registering cloud nodes. Refactor computation of nodes
wanted; explicitly model 'unpaired' and 'down'.
Tom Clegg [Fri, 13 May 2016 13:37:49 +0000 (09:37 -0400)]
9188: Update SetBlobMetadata func signature.
refs #9188
Tom Clegg [Thu, 12 May 2016 15:34:51 +0000 (11:34 -0400)]
Merge branch '8128-crunch2-auth-api'
closes #8128
Tom Clegg [Thu, 12 May 2016 14:23:31 +0000 (10:23 -0400)]
8128: Fix test race.
Tom Clegg [Thu, 12 May 2016 13:08:14 +0000 (09:08 -0400)]
8128: Fix flaky test: pipe the "echo UUID" script to sh, not to "echo UUID".
Tom Clegg [Wed, 11 May 2016 15:01:28 +0000 (11:01 -0400)]
8128: Use row lock during Container update, add comments.
Tom Clegg [Tue, 10 May 2016 14:45:05 +0000 (10:45 -0400)]
8128: Add arvados.v1.api_client_authorizations.current
Tom Clegg [Mon, 9 May 2016 19:33:05 +0000 (15:33 -0400)]
8128: Add runtime tokens for containers, and locks for multiple dispatchers
Tom Clegg [Thu, 5 May 2016 21:50:44 +0000 (17:50 -0400)]
8128: Update crunch-dispatch-local to use new Locked state.
Tom Clegg [Thu, 5 May 2016 21:15:51 +0000 (17:15 -0400)]
8128: Update crunch-dispatch-slurm to use new Locked state.
Tom Clegg [Thu, 5 May 2016 19:46:20 +0000 (15:46 -0400)]
8128: Add Locked state to Container model.
Tom Clegg [Thu, 28 Apr 2016 15:16:50 +0000 (11:16 -0400)]
8128: De-dup container unit tests
Peter Amstutz [Wed, 11 May 2016 20:55:00 +0000 (16:55 -0400)]
9161: There's a window between when a node pings for the first time and the
value of 'slurm_state' is synchronized by crunch-dispatch. In this window, the
node will still report as 'down'. Check first_ping_at and implement a grace
period during which the node will be considered 'idle'.
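The grace-period rule above might look roughly like this. It is only a sketch: the function name `effective_state` and the five-minute default are assumptions, not values taken from the commit.

```python
from datetime import datetime, timedelta

def effective_state(slurm_state, first_ping_at, now, grace=timedelta(minutes=5)):
    """Treat a node reporting 'down' as 'idle' while it is within the grace period.

    The 5-minute default is an assumed value for illustration.
    """
    # Only the 'down' report is ambiguous; other states are taken at face value.
    if slurm_state != "down" or first_ping_at is None:
        return slurm_state
    # Shortly after the first ping, slurm_state may not be synchronized yet.
    if now - first_ping_at < grace:
        return "idle"
    return "down"
```

A node that pinged one minute ago and still reports 'down' is counted as 'idle'; the same report ten minutes after the first ping is believed.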
Peter Amstutz [Wed, 11 May 2016 15:42:44 +0000 (11:42 -0400)]
Merge branch '8886-async-permission-update' refs #8886
Peter Amstutz [Wed, 11 May 2016 13:50:23 +0000 (09:50 -0400)]
8886: Restore behavior in group_permissions to call
calculate_group_permissions when cache is empty and async_permissions_update is
not true.
radhika [Tue, 10 May 2016 17:35:24 +0000 (13:35 -0400)]
closes #8017
Merge branch '8017-slurm-runtime-constraints'
radhika [Tue, 10 May 2016 17:34:34 +0000 (13:34 -0400)]
8017: RuntimeConstraints uses int64
radhika [Tue, 10 May 2016 17:23:28 +0000 (13:23 -0400)]
closes #8464
Merge branch '8464-crunch2-stdout'
radhika [Tue, 10 May 2016 15:59:16 +0000 (11:59 -0400)]
8017: RuntimeConstraints uses int64
radhika [Tue, 10 May 2016 15:28:54 +0000 (11:28 -0400)]
Merge branch '8017-slurm-runtime-constraints' of git.curoverse.com:arvados into 8017-slurm-runtime-constraints
radhika [Tue, 10 May 2016 15:25:45 +0000 (11:25 -0400)]
8464: stdout handling
Peter Amstutz [Mon, 9 May 2016 20:40:58 +0000 (16:40 -0400)]
8886: Add timestamp checking to permission updates.
radhika [Mon, 9 May 2016 16:08:15 +0000 (12:08 -0400)]
8017: mem-per-cpu
radhika [Tue, 3 May 2016 19:09:54 +0000 (15:09 -0400)]
8017: pass ram and vcpus runtime_constraints from Container to sbatch command.
radhika [Mon, 9 May 2016 16:08:15 +0000 (12:08 -0400)]
8017: mem-per-cpu
radhika [Mon, 9 May 2016 14:34:47 +0000 (10:34 -0400)]
Merge branch 'master' into 8017-slurm-runtime-constraints
radhika [Thu, 5 May 2016 21:51:47 +0000 (17:51 -0400)]
8464: Add stdout redirection in crunch2.
Tom Clegg [Thu, 5 May 2016 14:54:58 +0000 (10:54 -0400)]
Merge branch '9017-apiserver-short-tests'
refs #9017
Tom Clegg [Thu, 5 May 2016 14:09:07 +0000 (10:09 -0400)]
9017: Skip some slow API server tests in --short mode.
Tom Clegg [Wed, 4 May 2016 20:20:29 +0000 (16:20 -0400)]
Update API server and Workbench bundles to latest arvados gems.
No issue #
Peter Amstutz [Wed, 4 May 2016 18:55:03 +0000 (14:55 -0400)]
8886: Experimental asynchronous permissions update.
Add configuration parameter 'async_permissions_update' (default false). If
true, do not delete permission cache in #invalidate_permissions_cache, but
instead trigger "NOTIFY invalidate_permissions_cache" on the database.
Add script/permission-updater.rb which runs as an independent process. It
blocks on "LISTEN invalidate_permissions_cache" and updates the permission
cache whenever notified.
This is not ready for use; in particular it creates a race condition
recomputing permissions with effects such as not being able to read back API
records that were just created.
Tom Clegg [Wed, 4 May 2016 18:53:19 +0000 (14:53 -0400)]
Fix compatibility with latest azure-sdk-for-go.
No issue #
Tom Clegg [Wed, 4 May 2016 17:37:41 +0000 (13:37 -0400)]
Merge branch '9068-drop-abandoned-conns'
closes #9068
Tom Clegg [Wed, 4 May 2016 14:16:32 +0000 (10:16 -0400)]
9068: Fix inconsistent receiver names.
Tom Clegg [Fri, 29 Apr 2016 16:57:09 +0000 (12:57 -0400)]
9068: Do not use coverage tools when using non-default test flags ({gostuff}_test=...)
Tom Clegg [Fri, 29 Apr 2016 16:55:24 +0000 (12:55 -0400)]
9068: Move buffer allocation from volumes to GetBlockHandler.
This makes the Volume interface more idiomatic: Get() accepts a buffer
to read into, and returns a number of bytes read, much like the Read()
method of an io.Reader.
It also makes it possible for GetBlockHandler to notice, while waiting
for a buffer, that the client has disconnected: In this case, it
releases the network socket and never asks any volumes to do any work.
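The buffer-passing shape described above can be illustrated with a Python analogy. The actual change is in Go; `MemoryVolume`, `handle_get`, and `client_connected` are hypothetical names, and the disconnect check is simplified to a single test before the buffer is allocated rather than a wait on a buffer pool.

```python
class MemoryVolume:
    """Hypothetical in-memory volume; not the keepstore Volume interface itself."""
    def __init__(self, blocks):
        self._blocks = dict(blocks)

    def get(self, locator, buf):
        """Copy the block into buf and return the number of bytes read."""
        data = self._blocks[locator]
        if len(data) > len(buf):
            raise ValueError("buffer too small")
        buf[:len(data)] = data
        return len(data)

def handle_get(volume, locator, client_connected, buffer_size=64):
    """Allocate the buffer in the handler; skip the volume if the client is gone."""
    if not client_connected():
        # Client went away: no volume is asked to do any work.
        return None
    buf = bytearray(buffer_size)
    n = volume.get(locator, buf)
    return bytes(buf[:n])
```

Because the handler owns the buffer, the volume only fills it and reports a byte count, in the same spirit as a reader's read-into call.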
Tom Clegg [Fri, 29 Apr 2016 14:02:39 +0000 (10:02 -0400)]
9068: Drop PUT requests if the client disconnects before we get a buffer.
Tom Clegg [Tue, 3 May 2016 20:42:00 +0000 (16:42 -0400)]
Relax arvados-cli gem dependency version constraints in order to be
compatible with the latest arvados gem.
No issue #
Brett Smith [Tue, 3 May 2016 19:22:33 +0000 (15:22 -0400)]
Merge branch '9120-node-manager-search-ex-methods-wip'
Closes #9120, #9124.
Brett Smith [Mon, 2 May 2016 21:06:09 +0000 (17:06 -0400)]
9120: search_for_now falls back to real driver methods when needed.
This fixes a regression introduced in
32eb510594.
Brett Smith [Mon, 2 May 2016 20:59:21 +0000 (16:59 -0400)]
9120: Add tests for BaseComputeNodeDriver's search_for methods.
Brett Smith [Tue, 3 May 2016 19:22:00 +0000 (15:22 -0400)]
Merge branch '9118-arv-put-nameerror-fix-wip'
Closes #9118, #9127.
Brett Smith [Mon, 2 May 2016 21:47:45 +0000 (17:47 -0400)]
9118: Fix arv-put crash when finishing without output.
radhika [Tue, 3 May 2016 19:09:54 +0000 (15:09 -0400)]
8017: pass ram and vcpus runtime_constraints from Container to sbatch command.
Tom Clegg [Tue, 3 May 2016 16:48:01 +0000 (12:48 -0400)]
Merge branch '9119-oj-load-strict'
refs #9119
Peter Amstutz [Tue, 3 May 2016 15:04:24 +0000 (11:04 -0400)]
9119: Use Oj strict mode for decoding JSON.