Peter Amstutz [Fri, 13 May 2016 18:26:30 +0000 (14:26 -0400)]
9161: Adjusting behavior to accommodate down/broken/missing nodes.
Peter Amstutz [Fri, 13 May 2016 14:11:39 +0000 (10:11 -0400)]
9161: Eliminate 'booted' list and put nodes directly into cloud_nodes list.
Refactor logic for registering cloud nodes. Refactor computation of nodes
wanted; explicitly model 'unpaired' and 'down'.
Peter Amstutz [Wed, 11 May 2016 20:55:00 +0000 (16:55 -0400)]
9161: There's a window between when a node pings for the first time and when
the value of 'slurm_state' is synchronized by crunch-dispatch. In this window,
the node will still report as 'down'. Check first_ping_at and implement a grace
period during which the node will be considered 'idle'.
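
A minimal sketch of the grace-period check described above, not the node manager's actual code. The function name, the state strings other than 'down' and 'idle', and the five-minute value are assumptions for illustration; the commit does not state the real grace period.

```python
from datetime import datetime, timedelta

# Hypothetical grace period; the commit message does not give the real value.
PING_GRACE = timedelta(minutes=5)


def effective_state(slurm_state, first_ping_at, now, grace=PING_GRACE):
    """Return the state to act on for a node record.

    slurm_state   -- state string last synchronized for the node
    first_ping_at -- datetime of the node's first ping, or None if it
                     has never pinged
    now           -- current time, passed in so the check is testable
    """
    if slurm_state == 'down' and first_ping_at is not None:
        if now - first_ping_at < grace:
            # The node pinged recently, so 'down' is probably just a
            # slurm_state value that has not been synchronized yet.
            return 'idle'
    return slurm_state
```

A node that has never pinged, or whose first ping is older than the grace period, keeps its reported state.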
Peter Amstutz [Wed, 11 May 2016 15:42:44 +0000 (11:42 -0400)]
Merge branch '8886-async-permission-update' refs #8886
Peter Amstutz [Wed, 11 May 2016 13:50:23 +0000 (09:50 -0400)]
8886: Restore behavior in group_permissions to call
calculate_group_permissions when cache is empty and async_permissions_update is
not true.
radhika [Tue, 10 May 2016 17:35:24 +0000 (13:35 -0400)]
closes #8017
Merge branch '8017-slurm-runtime-constraints'
radhika [Tue, 10 May 2016 17:34:34 +0000 (13:34 -0400)]
8017: RuntimeConstraints uses int64
radhika [Tue, 10 May 2016 17:23:28 +0000 (13:23 -0400)]
closes #8464
Merge branch '8464-crunch2-stdout'
radhika [Tue, 10 May 2016 15:59:16 +0000 (11:59 -0400)]
8017: RuntimeConstraints uses int64
radhika [Tue, 10 May 2016 15:28:54 +0000 (11:28 -0400)]
Merge branch '8017-slurm-runtime-constraints' of git.curoverse.com:arvados into 8017-slurm-runtime-constraints
radhika [Tue, 10 May 2016 15:25:45 +0000 (11:25 -0400)]
8464: stdout handling
Peter Amstutz [Mon, 9 May 2016 20:40:58 +0000 (16:40 -0400)]
8886: Add timestamp checking to permission updates.
radhika [Mon, 9 May 2016 16:08:15 +0000 (12:08 -0400)]
8017: mem-per-cpu
radhika [Tue, 3 May 2016 19:09:54 +0000 (15:09 -0400)]
8017: pass ram and vcpus runtime_constraints from Container to sbatch command.
radhika [Mon, 9 May 2016 16:08:15 +0000 (12:08 -0400)]
8017: mem-per-cpu
radhika [Mon, 9 May 2016 14:34:47 +0000 (10:34 -0400)]
Merge branch 'master' into 8017-slurm-runtime-constraints
radhika [Thu, 5 May 2016 21:51:47 +0000 (17:51 -0400)]
8464: Add stdout redirection in crunch2.
Tom Clegg [Thu, 5 May 2016 14:54:58 +0000 (10:54 -0400)]
Merge branch '9017-apiserver-short-tests'
refs #9017
Tom Clegg [Thu, 5 May 2016 14:09:07 +0000 (10:09 -0400)]
9017: Skip some slow API server tests in --short mode.
Tom Clegg [Wed, 4 May 2016 20:20:29 +0000 (16:20 -0400)]
Update API server and Workbench bundles to latest arvados gems.
No issue #
Peter Amstutz [Wed, 4 May 2016 18:55:03 +0000 (14:55 -0400)]
8886: Experimental asynchronous permissions update.
Add configuration parameter 'async_permissions_update' (default false). If
true, do not delete permission cache in #invalidate_permissions_cache, but
instead trigger "NOTIFY invalidate_permissions_cache" on the database.
Add script/permission-updater.rb which runs as an independent process. It
blocks on "LISTEN invalidate_permissions_cache" and updates the permission
cache whenever notified.
This is not ready for use; in particular it creates a race condition
recomputing permissions with effects such as not being able to read back API
records that were just created.
Tom Clegg [Wed, 4 May 2016 18:53:19 +0000 (14:53 -0400)]
Fix compatibility with latest azure-sdk-for-go.
No issue #
Tom Clegg [Wed, 4 May 2016 17:37:41 +0000 (13:37 -0400)]
Merge branch '9068-drop-abandoned-conns'
closes #9068
Tom Clegg [Wed, 4 May 2016 14:16:32 +0000 (10:16 -0400)]
9068: Fix inconsistent receiver names.
Tom Clegg [Fri, 29 Apr 2016 16:57:09 +0000 (12:57 -0400)]
9068: Do not use coverage tools when using non-default test flags ({gostuff}_test=...)
Tom Clegg [Fri, 29 Apr 2016 16:55:24 +0000 (12:55 -0400)]
9068: Move buffer allocation from volumes to GetBlockHandler.
This makes the Volume interface more idiomatic: Get() accepts a buffer
to read into, and returns a number of bytes read, much like the Read()
method of an io.Reader.
It also makes it possible for GetBlockHandler to notice, while waiting
for a buffer, that the client has disconnected: In this case, it
releases the network socket and never asks any volumes to do any work.
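
The buffer-passing design above can be sketched in Python, where the closest analogue is a readinto()-style method. This is an illustration only, not the keepstore Go code: MemoryVolume, get_block, and the client_connected callable are hypothetical names.

```python
class MemoryVolume:
    """Toy in-memory volume (hypothetical, for illustration only)."""

    def __init__(self, blocks):
        self.blocks = blocks  # maps locator -> bytes

    def get(self, locator, buf):
        """Copy the block into the caller's buffer; return bytes read."""
        data = self.blocks[locator]
        if len(data) > len(buf):
            raise ValueError("buffer too small")
        buf[:len(data)] = data
        return len(data)


def get_block(volumes, locator, buf, client_connected):
    """Handler-side logic: the handler owns the buffer, not the volume.

    client_connected is a hypothetical callable reporting whether the
    client is still there once a buffer has become available.
    """
    if not client_connected():
        # Client went away while we waited for a buffer: do no volume work.
        return None
    for vol in volumes:
        try:
            n = vol.get(locator, buf)
        except KeyError:
            continue  # this volume does not have the block; try the next
        return bytes(buf[:n])
    raise KeyError(locator)
```

Because the handler allocates the buffer, it can check for a disconnected client before any volume is asked to read.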
Tom Clegg [Fri, 29 Apr 2016 14:02:39 +0000 (10:02 -0400)]
9068: Drop PUT requests if the client disconnects before we get a buffer.
Tom Clegg [Tue, 3 May 2016 20:42:00 +0000 (16:42 -0400)]
Relax arvados-cli gem dependency version constraints in order to be
compatible with the latest arvados gem.
No issue #
Brett Smith [Tue, 3 May 2016 19:22:33 +0000 (15:22 -0400)]
Merge branch '9120-node-manager-search-ex-methods-wip'
Closes #9120, #9124.
Brett Smith [Mon, 2 May 2016 21:06:09 +0000 (17:06 -0400)]
9120: search_for_now falls back to real driver methods when needed.
This fixes a regression introduced in
32eb510594.
Brett Smith [Mon, 2 May 2016 20:59:21 +0000 (16:59 -0400)]
9120: Add tests for BaseComputeNodeDriver's search_for methods.
Brett Smith [Tue, 3 May 2016 19:22:00 +0000 (15:22 -0400)]
Merge branch '9118-arv-put-nameerror-fix-wip'
Closes #9118, #9127.
Brett Smith [Mon, 2 May 2016 21:47:45 +0000 (17:47 -0400)]
9118: Fix arv-put crash when finishing without output.
radhika [Tue, 3 May 2016 19:09:54 +0000 (15:09 -0400)]
8017: pass ram and vcpus runtime_constraints from Container to sbatch command.
Tom Clegg [Tue, 3 May 2016 16:48:01 +0000 (12:48 -0400)]
Merge branch '9119-oj-load-strict'
refs #9119
Peter Amstutz [Tue, 3 May 2016 15:04:24 +0000 (11:04 -0400)]
9119: Use Oj strict mode for decoding JSON.
Peter Amstutz [Tue, 3 May 2016 13:23:34 +0000 (09:23 -0400)]
Update version pin for cwltool fpm packages. refs #8653
Peter Amstutz [Tue, 3 May 2016 13:20:46 +0000 (09:20 -0400)]
Merge branch '8998-optimize-decode-www-form-component' closes #8998
Ward Vandewege [Mon, 2 May 2016 21:29:30 +0000 (17:29 -0400)]
Fix inverted test for pypi/gem upload logic. Make upload more verbose.
No issue #
Tom Clegg [Mon, 2 May 2016 21:13:35 +0000 (17:13 -0400)]
Use "grep -xF ... >/dev/null" instead of "grep -qxF ..."
1. -q "Exit immediately with zero status if any match is found, even if
an error was detected." --grep(1)
Depending on buffering and timing, if grep exits early (before
consuming stdin) "docker images" can receive SIGPIPE and exit
non-zero. We use "set -o pipefail" here, so this fails the "docker
load" phase and then the whole job.
2. "Portable shell scripts should avoid both -q and -s and should
redirect standard and error output to /dev/null instead." --grep(1)
No issue #
Tom Clegg [Mon, 2 May 2016 20:27:16 +0000 (16:27 -0400)]
Merge branch '8653-cwl-to-template'
refs #8653
Tom Clegg [Wed, 27 Apr 2016 18:58:11 +0000 (14:58 -0400)]
8653: Add arvados-cwl-runner --create-template flag
Tom Clegg [Thu, 21 Apr 2016 19:15:56 +0000 (15:15 -0400)]
8653: DRY testing code.
Tom Clegg [Thu, 21 Apr 2016 14:55:18 +0000 (10:55 -0400)]
8653: Turn off debug messages / verbose logging in test suite.
Tom Clegg [Thu, 21 Apr 2016 14:54:11 +0000 (10:54 -0400)]
8653: Fix whitespace.
Peter Amstutz [Fri, 29 Apr 2016 13:11:15 +0000 (09:11 -0400)]
8998: Monkey patch URI.decode_www_form_component to validate efficiently.
Rack uses the standard library method URI.decode_www_form_component to process
parameters. This method first validates the string with a regular expression,
and then decodes it using another regular expression. Ruby 2.1 and earlier have
a bug in the validation; the regular expression that is used generates many
backtracking points, which results in exponential memory growth when matching
large strings. The fix is to monkey-patch the version of the method from Ruby
2.2 which checks that the string is not invalid instead of checking it is
valid.
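
The idea of checking that a string is not invalid can be illustrated in Python. This is a sketch of the approach, not the Ruby patch itself; the function names are hypothetical. Instead of matching the whole string against a pattern of valid tokens, it searches for a single '%' that is not followed by two hex digits.

```python
import re

# A '%' that is NOT followed by two hex digits marks an invalid escape.
_INVALID_ESCAPE = re.compile(rb'%(?![0-9A-Fa-f]{2})')
# Tokens that need decoding: '+' (space) or a valid %XX escape.
_TOKEN = re.compile(rb'\+|%[0-9A-Fa-f]{2}')


def has_invalid_escape(s):
    """Return True if s contains a malformed percent escape."""
    return _INVALID_ESCAPE.search(s.encode('utf-8')) is not None


def decode_form_component(s):
    """Decode an application/x-www-form-urlencoded component."""
    if has_invalid_escape(s):
        raise ValueError("invalid %-encoding")

    def replace(match):
        token = match.group()
        if token == b'+':
            return b' '
        return bytes([int(token[1:], 16)])

    return _TOKEN.sub(replace, s.encode('utf-8')).decode('utf-8')
```

The validation pattern is a fixed-width lookahead with nothing to backtrack over, so it makes a single pass over the input.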
Peter Amstutz [Mon, 2 May 2016 20:01:26 +0000 (16:01 -0400)]
Fix arvbox to only run pipelines with crunch-dispatch0 to avoid submitting duplicated jobs. no issue #
radhika [Mon, 2 May 2016 18:36:02 +0000 (14:36 -0400)]
refs #8937
Merge branch '8937-arv-put-cache-check'
radhika [Wed, 27 Apr 2016 15:56:01 +0000 (11:56 -0400)]
8937: refactor cache check logic into a check_cache method and update all references.
radhika [Mon, 25 Apr 2016 13:36:19 +0000 (09:36 -0400)]
8937: invalidate cache and create new one if there are errors on head request during ResumeCache.
radhika [Thu, 21 Apr 2016 18:22:46 +0000 (14:22 -0400)]
8937: updated arvados_testutil.py to skip setting resp_body to writer when it is a boolean.
radhika [Thu, 21 Apr 2016 14:13:15 +0000 (10:13 -0400)]
8937: Return True for Head requests in KeepClients. The tests in KeepClientRetryHeadTestCase are failing due to this and need to be worked on.
radhika [Wed, 20 Apr 2016 22:28:19 +0000 (18:28 -0400)]
8937: test updates
radhika [Wed, 20 Apr 2016 21:28:31 +0000 (17:28 -0400)]
8937: bypass cache for all head requests.
radhika [Wed, 20 Apr 2016 19:54:03 +0000 (15:54 -0400)]
8937: add head request to python keep client.
Tom Clegg [Mon, 2 May 2016 18:22:47 +0000 (14:22 -0400)]
Read resource object from a file, e.g., arv collection create --collection /tmp/foo.json
No issue #
Tom Clegg [Sat, 30 Apr 2016 17:18:42 +0000 (13:18 -0400)]
Log a banner at the top of each test.
No issue #
Brett Smith [Fri, 29 Apr 2016 21:28:15 +0000 (17:28 -0400)]
Merge branch '8963-arv-copy-link-properties-wip'
Closes #8963, #9110.
Brett Smith [Thu, 28 Apr 2016 21:32:38 +0000 (17:32 -0400)]
8963: arv-keepdocker copies metadata links' properties.
Tom Clegg [Fri, 29 Apr 2016 17:00:11 +0000 (13:00 -0400)]
Merge branch '9066-max-requests'
refs #9066
Tom Clegg [Thu, 28 Apr 2016 13:11:33 +0000 (09:11 -0400)]
9066: Add keepstore -max-requests argument.
Tom Clegg [Tue, 26 Apr 2016 22:07:07 +0000 (18:07 -0400)]
Merge branch '9017-skip-slow-tests'
refs #9017
Tom Clegg [Tue, 26 Apr 2016 22:06:53 +0000 (18:06 -0400)]
9017: Add run-tests.sh --short flag to skip (some) slow tests.
Peter Amstutz [Tue, 26 Apr 2016 17:00:25 +0000 (13:00 -0400)]
Merge branch '8931-event-thread-catch-exceptions' closes #8931
Peter Amstutz [Tue, 26 Apr 2016 16:01:45 +0000 (12:01 -0400)]
8931: Fix indentation mistakes. Fix tests.
Ward Vandewege [Tue, 26 Apr 2016 15:58:53 +0000 (11:58 -0400)]
Fix centos6 packages build.
No issue #
Peter Amstutz [Tue, 26 Apr 2016 13:51:53 +0000 (09:51 -0400)]
8931: Use RetryLoop around websocket reconnect. Create a new _EventClient
object on each loop iteration. Handle unexpected exceptions in PollClient
retry loop.
radhika [Mon, 25 Apr 2016 21:31:12 +0000 (17:31 -0400)]
closes #8936
Merge branch '8936-ttl-in-signing-key'
radhika [Mon, 25 Apr 2016 21:30:50 +0000 (17:30 -0400)]
Merge branch 'master' into 8936-ttl-in-signing-key
Tom Clegg [Mon, 25 Apr 2016 21:15:43 +0000 (17:15 -0400)]
Merge branch '8831-crunchrunner-doc'
closes #8831
Tom Clegg [Mon, 25 Apr 2016 21:15:08 +0000 (17:15 -0400)]
8831: Add crunchrunner to shell node dependencies.
Tom Clegg [Mon, 18 Apr 2016 21:41:50 +0000 (17:41 -0400)]
8831: Add crunchrunner to compute node dependencies.
radhika [Mon, 25 Apr 2016 19:38:24 +0000 (15:38 -0400)]
8936: add test to verify blobSignatureTTL from discovery when it is not provided.
Ward Vandewege [Mon, 25 Apr 2016 15:38:52 +0000 (11:38 -0400)]
Fix building packages: work around pip having a mind of its own with
regard to downloading zip or tar archives.
This requires the addition of the 'unzip' tool to the docker images used
to build packages, so they will need to be rebuilt.
No issue #
radhika [Fri, 22 Apr 2016 22:51:17 +0000 (18:51 -0400)]
8936: update keep-block-check and keep-rsync to properly use blob-signature-ttl to perform the signature verification.
Ward Vandewege [Fri, 22 Apr 2016 01:51:12 +0000 (21:51 -0400)]
Fix a bunch of misspellings in our Go code (all in comments).
Thank you https://goreportcard.com/report/github.com/curoverse/arvados#misspell
No issue #
Peter Amstutz [Thu, 21 Apr 2016 20:11:05 +0000 (16:11 -0400)]
Bump cwltool fpm package version to fix package builds. no issue #
Peter Amstutz [Thu, 21 Apr 2016 19:39:00 +0000 (15:39 -0400)]
8931: Use RetryLoop to retry api calls. Add max_wait to RetryLoop. Add test
for api error retry.
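
A minimal sketch of a retry loop with a capped wait, to illustrate what a max_wait setting does. It is not the SDK's RetryLoop class: retry_call and every parameter name other than max_wait are assumptions.

```python
import time


def retry_call(func, num_retries=3, initial_wait=1.0, backoff_growth=2.0,
               max_wait=None, sleep=time.sleep):
    """Call func() until it succeeds or the retries are used up.

    The wait between attempts grows by backoff_growth each time but
    never exceeds max_wait (when max_wait is given). The sleep
    function is injectable so the loop can be tested without waiting.
    """
    wait = initial_wait
    for attempt in range(num_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == num_retries:
                raise  # out of retries: surface the last error
        sleep(wait if max_wait is None else min(wait, max_wait))
        wait *= backoff_growth
```

Without a cap, exponential backoff can leave a long-running client waiting many minutes between attempts.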
radhika [Thu, 21 Apr 2016 18:34:20 +0000 (14:34 -0400)]
closes #8936
Merge branch '8936-ttl-in-signing-key'
Peter Amstutz [Thu, 21 Apr 2016 18:02:54 +0000 (14:02 -0400)]
Fix race conditions in test_node_undrained_when_shutdown_cancelled
and test_boot_new_node_when_all_nodes_busy. refs #8953
radhika [Thu, 21 Apr 2016 17:23:32 +0000 (13:23 -0400)]
8936: updated blob_test.rb to continue to use the default blob_signature_ttl.
radhika [Thu, 21 Apr 2016 17:04:59 +0000 (13:04 -0400)]
8936: update go tests to use a blob-signature-ttl different than 1s.
Peter Amstutz [Thu, 21 Apr 2016 15:03:19 +0000 (11:03 -0400)]
Bump cwltool dependency and pin version so it doesn't break again due to
external changes. no issue #
radhika [Thu, 21 Apr 2016 15:00:11 +0000 (11:00 -0400)]
8936: update the blob_test to use a specific blob_signature_ttl to ensure consistent results.
radhika [Thu, 21 Apr 2016 13:52:50 +0000 (09:52 -0400)]
8936: update comment on keepstore and go fmt
radhika [Thu, 21 Apr 2016 13:32:11 +0000 (09:32 -0400)]
8936: address review comments
radhika [Thu, 21 Apr 2016 11:23:00 +0000 (07:23 -0400)]
Merge branch '8936-ttl-in-signing-key-TC' into 8936-ttl-in-signing-key
Peter Amstutz [Wed, 20 Apr 2016 20:19:20 +0000 (16:19 -0400)]
Arvbox: run websockets in a separate puma server instead of in the API server process.
no issue #
Peter Amstutz [Wed, 20 Apr 2016 18:26:15 +0000 (14:26 -0400)]
Don't shut down if state is ('down', 'closed', 'boot wait', *) refs #8953
Tom Clegg [Wed, 20 Apr 2016 14:30:04 +0000 (10:30 -0400)]
Merge branch '8697-ruby187-compat'
refs #8697
refs #8689
Nico Cesar [Wed, 20 Apr 2016 13:58:57 +0000 (09:58 -0400)]
Merge branch '9014-keep-block-check-package'
closes #9014
Nico Cesar [Wed, 20 Apr 2016 13:28:55 +0000 (09:28 -0400)]
adding new package for block checks
refs #9014
Peter Amstutz [Wed, 20 Apr 2016 13:10:33 +0000 (09:10 -0400)]
Merge branch '8953-no-double-count' refs #8953
Tom Clegg [Tue, 19 Apr 2016 20:57:41 +0000 (16:57 -0400)]
6833: Fix excessive debug logging in TokenExpiryTest and subsequent tests.
Excessive logging was introduced (seemingly unintentionally) in
d3313e65.
refs #6833
Peter Amstutz [Sat, 16 Apr 2016 02:48:13 +0000 (22:48 -0400)]
Don't double-count nodes that are shutting down. refs #8953
Tom Clegg [Tue, 19 Apr 2016 18:44:45 +0000 (14:44 -0400)]
Merge branch '9009-keep-web-close-conns'
closes #9009
Tom Clegg [Tue, 19 Apr 2016 15:43:09 +0000 (11:43 -0400)]
9009: Fix missing Close() in collectionreader.
Tom Clegg [Tue, 19 Apr 2016 15:17:07 +0000 (11:17 -0400)]
Merge branch '9004-close-keep-connections'
refs #9004
refs #9005
Tom Clegg [Tue, 19 Apr 2016 15:17:00 +0000 (11:17 -0400)]
Change Check to Assert to avoid crash after failure. No issue #
Tom Clegg [Tue, 19 Apr 2016 14:18:51 +0000 (10:18 -0400)]
9005: Workaround: Close idle connections aggressively.
Currently, the SDK code never reuses connections anyway, so it's best
to shut them down right away.