Peter Amstutz [Tue, 17 Nov 2015 16:11:20 +0000 (11:11 -0500)]
3137: Add stat counters for bytes uploaded/downloaded (keep) and read/written (fuse).
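The commit message doesn't show how the counters are built. Below is a minimal sketch of a lock-protected running total, assuming several transfer threads report into one counter. `StatCounter` and its `add`/`get` methods are hypothetical names and are not the SDK's API.

```python
import threading


class StatCounter:
    """Thread-safe running total (hypothetical), e.g. bytes uploaded."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def add(self, n):
        # Many worker threads may report transfers concurrently.
        with self._lock:
            self._value += n

    def get(self):
        with self._lock:
            return self._value
```

Each transfer path would call `add(len(chunk))` after moving a chunk, and a status reporter would call `get()`.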
Brett Smith [Tue, 17 Nov 2015 03:42:31 +0000 (22:42 -0500)]
7313: crunch-job reports an error when a task doesn't record state.
Closes #7313.
Brett Smith [Fri, 13 Nov 2015 14:29:40 +0000 (09:29 -0500)]
Merge branch '7696-pysdk-all-keep-service-types-wip'
Closes #7696, #7758.
Brett Smith [Wed, 11 Nov 2015 22:08:39 +0000 (17:08 -0500)]
7696: Improve PySDK KeepClient.ThreadLimiter.
* Move the calculation of how many threads to allow into the class.
* Teach it to handle cases where max_replicas_per_service is known and
greater than 1. This will never happen today, but is an anticipated
improvement.
* Update docstrings to reflect current reality.
These are all changes I made while debugging the previous race
condition.
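Here is a minimal sketch of the thread-count calculation. It assumes writers should be capped at the number of services needed to reach the wanted copy count. When the per-service limit is unknown (for example, a proxy might store every copy itself), only one writer runs. Otherwise the cap is ceil(want_copies / max_replicas_per_service). `ThreadLimiterSketch` is hypothetical and is not the PySDK class.

```python
import math
import threading


class ThreadLimiterSketch:
    """Hypothetical limiter capping concurrent Keep writer threads."""

    def __init__(self, want_copies, max_replicas_per_service=None):
        self.want_copies = want_copies
        self.max_threads = self.calc_max_threads(want_copies, max_replicas_per_service)
        self._slots = threading.Semaphore(self.max_threads)

    @staticmethod
    def calc_max_threads(want_copies, max_replicas_per_service):
        # Unknown capacity, or one service can hold every copy: one writer.
        if not max_replicas_per_service or max_replicas_per_service >= want_copies:
            return 1
        # Otherwise start just enough writers to reach the wanted copies.
        return int(math.ceil(want_copies / max_replicas_per_service))

    def __enter__(self):
        self._slots.acquire()
        return self

    def __exit__(self, *exc):
        self._slots.release()
        return False
```

With disk services (one replica each) and two wanted copies, two writers may run. With `max_replicas_per_service=None`, writers are serialized.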
Brett Smith [Wed, 11 Nov 2015 21:50:18 +0000 (16:50 -0500)]
7696: PySDK determines max_replicas_per_service after querying services.
Because max_replicas_per_service was set to 1 in the case where
KeepClient was instantiated with no direct information about available
Keep services, and because ThreadLimiter was being instantiated before
querying available Keep services (via map_new_services), the first
Keep request to talk to non-disk services would let multiple threads
run at once. This fixes that race condition, and adds a test that was
triggering it semi-reliably.
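The fix is about ordering. Below is a minimal sketch under two assumptions: services are dicts with a `service_type` key, and only an all-disk configuration has a known limit of one replica per service. The limit stays unknown (None) until services have been queried, instead of defaulting to 1. `KeepClientSketch` and `limiter_inputs` are hypothetical. `map_new_services` borrows the name from the commit message but not its real signature.

```python
class KeepClientSketch:
    """Hypothetical client illustrating the order of operations only."""

    def __init__(self, fetch_services):
        self._fetch_services = fetch_services  # callable returning service dicts
        self._services = None
        # Unknown until services have been queried; do not guess 1 here.
        self.max_replicas_per_service = None

    def map_new_services(self):
        if self._services is None:
            self._services = self._fetch_services()
            all_disk = bool(self._services) and all(
                s.get("service_type") == "disk" for s in self._services)
            # Disk services hold one replica each; anything else is unknown.
            self.max_replicas_per_service = 1 if all_disk else None
        return self._services

    def limiter_inputs(self, want_copies):
        # Query services *before* sizing the thread limiter, so the limit
        # reflects the real service types rather than a default.
        self.map_new_services()
        return want_copies, self.max_replicas_per_service
```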
Brett Smith [Wed, 11 Nov 2015 17:17:46 +0000 (12:17 -0500)]
7696: PySDK KeepClient uses all service types.
Filter out gateway services from the list of usable services, rather
than selecting only disk and proxy types.
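Switching from an allow-list to a deny-list means that new service types become usable without code changes. This is a minimal sketch, assuming gateway services carry a `service_type` beginning with `gateway:`. The helper name and the `blob` type in the usage are hypothetical.

```python
def usable_services(services):
    """Keep every service type except gateways (hypothetical helper)."""
    # Assumption: gateway services have a service_type like "gateway:...".
    return [s for s in services
            if not s.get("service_type", "").startswith("gateway:")]
```

For example, a list with `disk`, `proxy`, `gateway:test` and `blob` services keeps all but the gateway.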
Brett Smith [Wed, 11 Nov 2015 17:18:46 +0000 (12:18 -0500)]
7696: Clean imports in PySDK arvados.keep module.
Brett Smith [Wed, 11 Nov 2015 15:06:51 +0000 (10:06 -0500)]
7696: Refactor locator builder method in PySDK tests.
Brett Smith [Fri, 13 Nov 2015 14:28:12 +0000 (09:28 -0500)]
Merge branch '7123-crunch-no-record-log-failure-wip'
Closes #7123, #7741.
Brett Smith [Mon, 9 Nov 2015 15:28:51 +0000 (10:28 -0500)]
7123: Crunch doesn't update job log when arv-put fails.
This prevents crunch-job from recording the empty collection as a
job's log. Most other components (Workbench, the log cleaner)
recognize a null log as a special case; less so the empty collection.
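crunch-job is written in Perl, so this Python sketch only illustrates the control flow. It assumes the upload command prints a locator on success and exits non-zero on failure. On failure the caller gets None, meaning the job's log field should be left null. `upload_log` is hypothetical.

```python
import subprocess


def upload_log(upload_cmd):
    """Run an upload command; return the locator it prints, or None.

    None means "leave the job's log null" rather than recording a
    locator for an empty collection.
    """
    result = subprocess.run(upload_cmd, capture_output=True, text=True)
    if result.returncode != 0:
        return None
    locator = result.stdout.strip()
    return locator or None
```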
Brett Smith [Thu, 12 Nov 2015 21:33:48 +0000 (16:33 -0500)]
Merge branch '7645-doc-client-max-body-size-wip'
Closes #7645, #7742. Refs #7356.
Brett Smith [Mon, 9 Nov 2015 17:44:38 +0000 (12:44 -0500)]
7356: Install guide sets client_max_body_size for arv-git-httpd.
Brett Smith [Mon, 9 Nov 2015 17:43:58 +0000 (12:43 -0500)]
7645: Install guide suggests setting client_max_body_size consistently.
Without these changes, the upstream Passenger processes may reject
large request bodies.
Brett Smith [Thu, 12 Nov 2015 21:12:28 +0000 (16:12 -0500)]
Merge branch '6846-workbench-top-nav-login-returns-wip'
Closes #6846, #7739.
Brett Smith [Mon, 9 Nov 2015 17:02:25 +0000 (12:02 -0500)]
6846: Workbench navigation bar login returns user to the same page.
Brett Smith [Thu, 12 Nov 2015 20:31:09 +0000 (15:31 -0500)]
Merge branch '6356-crunch-permfail-task-retry-fix-wip'
Closes #6356, #7738.
Brett Smith [Mon, 9 Nov 2015 13:30:14 +0000 (08:30 -0500)]
6356: crunch-job doesn't create new tasks after job success is set.
#6356 reported that a permanently failed task was retried. Note 3
discusses why this happened and suggests two fixes:
* Only put tempfailed tasks back on the todo list.
* Run `last THISROUND if $main::please_freeze || defined($main::success);`
after we call reapchildren(), since it's the main place where the
value of $main::success can change.
The first change would revert part of
75be7487c2bbd83aa5116aa5f8ade5ddf31501da, which intentionally puts
these tasks back on the todo list to get a correct tasks count.
The current `last if…` line was added in
b306eb48ab12676ffb365ede8197e4f2d7e92011, with the rationale "Don't
create new tasks if $main::success is defined." This change corrects
the code to implement the desired functionality, by checking and
stopping just before we create a new task (functionally, at least).
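crunch-job is Perl, so the Python sketch below only models the loop structure. It assumes `run_task` runs and reaps one task synchronously. It also assumes failed tasks (temporary or permanent) are put back on the todo list, as 75be7487 does for the task count. The `success` check sits immediately before a task is created, so a permanently failed task is never relaunched. `dispatch` is hypothetical.

```python
def dispatch(todo, run_task, please_freeze=lambda: False):
    """Hypothetical crunch-job-style loop.

    run_task(task) returns "ok", "tempfail", or "permfail".
    """
    todo = list(todo)
    started = []
    success = None
    while todo:
        # Check right before creating a new task, after the previous
        # reap, since reaping is where `success` usually changes.
        if please_freeze() or success is not None:
            break
        task = todo.pop(0)
        started.append(task)
        result = run_task(task)
        if result in ("tempfail", "permfail"):
            todo.append(task)  # keeps the task count correct
        if result == "permfail":
            success = False
    if success is None and not todo:
        success = True
    return started, success
```

With tasks `a`, `bad`, `c` where `bad` fails permanently, only `a` and `bad` are started. A temporary failure is retried once the other queued tasks have run.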
Tom Clegg [Thu, 12 Nov 2015 20:00:59 +0000 (15:00 -0500)]
Merge branch '5824-keep-web-workbench' closes #5824
Tom Clegg [Wed, 11 Nov 2015 23:32:50 +0000 (18:32 -0500)]
5824: Fix clear-download-dir helper.
Tom Clegg [Wed, 11 Nov 2015 23:32:23 +0000 (18:32 -0500)]
5824: Fix path and query escapes.
Paths encode spaces as "%20", not "+".
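Workbench is a Rails app, but Python's `urllib.parse` draws the same distinction. `quote()` is for path segments and encodes a space as `%20`. `quote_plus()` produces `+`, which only means "space" in form-encoded query strings. This is a minimal sketch; `keep_web_url` and the download hostname are hypothetical.

```python
from urllib.parse import quote, urlencode


def keep_web_url(base, path, query=None):
    """Build a download URL (hypothetical helper)."""
    # quote() turns " " into "%20" and leaves "/" unescaped.
    url = "%s/%s" % (base.rstrip("/"), quote(path.lstrip("/")))
    if query:
        # urlencode() on a plain dict emits "key=value" pairs, no "[]".
        url += "?" + urlencode(query)
    return url
```

For `dir/my file.txt` with `{"disposition": "attachment"}`, this yields `.../dir/my%20file.txt?disposition=attachment`.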
Rails to_query helper does undesirable things like
"disposition[]=attachment".
Tom Clegg [Wed, 11 Nov 2015 23:29:39 +0000 (18:29 -0500)]
5824: Fix -attachment-only-host test config. Test more preview/download variants.
Tom Clegg [Wed, 11 Nov 2015 17:14:16 +0000 (12:14 -0500)]
Merge branch '5824-keep-web-workbench' refs #5824
Tom Clegg [Wed, 11 Nov 2015 17:11:46 +0000 (12:11 -0500)]
5824: Merge branch 'master' into 5824-keep-web-workbench
Conflicts:
services/keepproxy/keepproxy_test.go
radhika [Wed, 11 Nov 2015 16:01:24 +0000 (11:01 -0500)]
closes #7661
Merge branch '7661-fuse-by-pdh'
radhika [Wed, 11 Nov 2015 16:01:02 +0000 (11:01 -0500)]
Merge branch 'master' into 7661-fuse-by-pdh
Tom Clegg [Wed, 11 Nov 2015 01:48:24 +0000 (20:48 -0500)]
5824: Update/clarify docs and comments.
radhika [Tue, 10 Nov 2015 23:41:55 +0000 (18:41 -0500)]
7661: Pass pdh_only when adding by_id subdir; test now passes.
Tom Clegg [Tue, 10 Nov 2015 16:35:03 +0000 (11:35 -0500)]
Merge branch '5538-test-post-retry' refs #5538
Tom Clegg [Tue, 10 Nov 2015 16:33:32 +0000 (11:33 -0500)]
5538: Update comments to match new tests.
radhika [Tue, 10 Nov 2015 15:52:35 +0000 (10:52 -0500)]
7661: added test with only_pdh (not working yet)
Tom Clegg [Tue, 10 Nov 2015 15:10:55 +0000 (10:10 -0500)]
5538: Test that POST method is not retried.
Tom Clegg [Tue, 10 Nov 2015 07:20:34 +0000 (02:20 -0500)]
Use a different port number for each test case. No issue #
Tom Clegg [Tue, 10 Nov 2015 06:29:11 +0000 (01:29 -0500)]
5824: Support configuration with a download-only host.
radhika [Mon, 9 Nov 2015 20:41:46 +0000 (15:41 -0500)]
Merge branch 'master' into 7661-fuse-by-pdh
Tom Clegg [Mon, 9 Nov 2015 20:00:14 +0000 (15:00 -0500)]
5824: Preserve query in keep_web_url template. Warn when redirecting preview to a single-origin keep_web_url.
Peter Amstutz [Mon, 9 Nov 2015 19:33:09 +0000 (14:33 -0500)]
Merge branch '3585-arpi-project-uuid-wip' closes #3585
radhika [Mon, 9 Nov 2015 19:01:17 +0000 (14:01 -0500)]
Merge branch 'master' into 7661-fuse-by-pdh
radhika [Mon, 9 Nov 2015 18:54:29 +0000 (13:54 -0500)]
closes #5538
Merge branch '5538-arvadosclient-retry'
radhika [Mon, 9 Nov 2015 18:49:31 +0000 (13:49 -0500)]
5538: update the "error" test case to use clearer stub parameters (nil status codes and response body) to avoid confusing the reader.
radhika [Mon, 9 Nov 2015 16:21:35 +0000 (11:21 -0500)]
7661: rename MagicDirectory by_pdh as pdh_only
radhika [Mon, 9 Nov 2015 15:43:13 +0000 (10:43 -0500)]
Merge branch 'master' into 7661-fuse-by-pdh
radhika [Mon, 9 Nov 2015 13:38:29 +0000 (08:38 -0500)]
5538: add a test that simulates an error while requesting the server, so that the error path is tested as well.
Brett Smith [Mon, 9 Nov 2015 11:05:28 +0000 (06:05 -0500)]
3585: Add --project-uuid switch to a-r-p-i.
Tom Clegg [Mon, 9 Nov 2015 08:28:50 +0000 (03:28 -0500)]
5824: Add anonymous-404 and download-by-pdh tests.
Tom Clegg [Sun, 8 Nov 2015 20:52:29 +0000 (15:52 -0500)]
5824: Propagate non-token parts of query string (notably ?disposition=attachment) when redirecting.
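A minimal sketch of the filtering step, assuming the token travels in a query parameter named `api_token`. That name and the `redirect_query` helper are assumptions for illustration. Everything else in the query string, such as `disposition=attachment`, survives the redirect.

```python
from urllib.parse import parse_qsl, urlencode


def redirect_query(query_string, token_params=("api_token",)):
    """Drop credential parameters; keep the rest for the redirect URL."""
    kept = [(k, v) for k, v in parse_qsl(query_string, keep_blank_values=True)
            if k not in token_params]
    return urlencode(kept)
```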
Tom Clegg [Sun, 8 Nov 2015 11:39:05 +0000 (06:39 -0500)]
5824: Support partial content with Range header (only if start==0).
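A minimal sketch of the restriction, assuming the handler has the whole body in memory. Only `bytes=0-N` and `bytes=0-` are honored. Any other Range header is ignored and the full body is sent with 200, which HTTP permits because a server may ignore Range. `serve_range` is hypothetical.

```python
import re


def serve_range(data, range_header):
    """Return (status, headers, body), honoring only ranges starting at 0."""
    if range_header and data:
        m = re.fullmatch(r"bytes=0-(\d*)", range_header.strip())
        if m:
            last = len(data) - 1
            end = min(int(m.group(1)), last) if m.group(1) else last
            headers = {"Content-Range": "bytes 0-%d/%d" % (end, len(data))}
            return 206, headers, data[:end + 1]
    # Any other range (or none): send the full body with 200 OK.
    return 200, {}, data
```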
Tom Clegg [Sat, 7 Nov 2015 09:36:01 +0000 (04:36 -0500)]
5824: Fix disposition=attachment handling.
Propagate disposition=attachment from Workbench to keep-web when
redirecting.
Include a filename in the Content-Disposition header if the request
URL contains "?", so UAs don't mistakenly include the query string as
part of the default filename.
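A sketch of the header rule described above, assuming the handler knows both the file path within the collection and the raw request URI. `content_disposition` is hypothetical. Its escaping is deliberately simple and does not handle the RFC 6266 `filename*` form.

```python
import posixpath


def content_disposition(path, request_uri, attachment):
    """Return a Content-Disposition value, or None for inline display."""
    if not attachment:
        return None
    value = "attachment"
    if "?" in request_uri:
        # Name the file explicitly so the UA doesn't derive it from a URL
        # that ends in a query string.
        name = posixpath.basename(path).replace("\\", "\\\\").replace('"', '\\"')
        value += '; filename="%s"' % name
    return value
```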
Tom Clegg [Sat, 7 Nov 2015 09:06:47 +0000 (04:06 -0500)]
5824: Fixup new keepproxy tests to use simplified test setup.
See 813d35123538b00ab70719e247b6bb0881269460.
Tom Clegg [Sat, 7 Nov 2015 09:03:27 +0000 (04:03 -0500)]
5824: Move "periodically refresh Keep services" func from keepproxy to SDK.
Tom Clegg [Sat, 7 Nov 2015 09:00:50 +0000 (04:00 -0500)]
5824: Fix server shutdown code.
* Pay attention to --num-keep-servers in stop_keep.
* Wait for processes to exit, to avoid start/stop races.
* Tighten exception handling in kill_server_pid() and warn instead of
crashing in various races.
* Log TERM signals.
* Log when a server does not shut down within the given deadline.
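A minimal sketch of the TERM-then-wait pattern using `subprocess.Popen`. It assumes the server is a child of the calling process. The real helper works from pid files, which this sketch does not reproduce. `stop_server` is hypothetical.

```python
import logging
import signal
import subprocess

log = logging.getLogger(__name__)


def stop_server(proc, deadline=10.0):
    """Send TERM to a child server and wait up to `deadline` seconds.

    Returns True once the process has exited, False if it is still running.
    """
    if proc.poll() is not None:
        return True  # already exited, nothing to stop
    log.info("sending TERM to pid %d", proc.pid)
    try:
        proc.send_signal(signal.SIGTERM)
    except ProcessLookupError:
        # Raced with the process exiting on its own: warn, don't crash.
        log.warning("pid %d exited before it could be signalled", proc.pid)
        proc.wait()
        return True
    try:
        # Waiting here avoids a start/stop race with the next server.
        proc.wait(timeout=deadline)
        return True
    except subprocess.TimeoutExpired:
        log.warning("pid %d did not exit within %.1f seconds", proc.pid, deadline)
        return False
```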
Tom Clegg [Sat, 7 Nov 2015 08:54:03 +0000 (03:54 -0500)]
5824: Fix Keep server shutdown, check errors, simplify stderr redirection.
(Oops, we forgot to actually Run() the python command for stop_keep.)
radhika [Sat, 7 Nov 2015 14:25:48 +0000 (09:25 -0500)]
5538: update the test to set resp.body from the string given in the stub rather than hard-coding it (overlooked in previous commit).
radhika [Sat, 7 Nov 2015 14:00:49 +0000 (09:00 -0500)]
5538: correct retryable list and use it to determine whether to close idle connections; add a few more test cases.
radhika [Sat, 7 Nov 2015 13:42:38 +0000 (08:42 -0500)]
Merge branch 'master' into 5538-arvadosclient-retry
Tom Clegg [Sat, 7 Nov 2015 07:22:07 +0000 (02:22 -0500)]
5824: Use fifo2stderr for arv-git-httpd and keep-web logs, too.
Tom Clegg [Fri, 6 Nov 2015 21:58:32 +0000 (16:58 -0500)]
5824: Sync test suite to new keep-web argument names.
Tom Clegg [Fri, 6 Nov 2015 21:53:01 +0000 (16:53 -0500)]
5824: Merge branch 'master' into 5824-keep-web-workbench
radhika [Fri, 6 Nov 2015 15:11:27 +0000 (10:11 -0500)]
5538: much simpler and neater api stub test case array; golint
Tom Clegg [Fri, 6 Nov 2015 04:17:32 +0000 (23:17 -0500)]
7724: Use datamanager token in keep-rsync tests. refs #7724
radhika [Fri, 6 Nov 2015 03:19:54 +0000 (22:19 -0500)]
Merge branch 'master' into 5538-arvadosclient-retry
radhika [Fri, 6 Nov 2015 03:17:59 +0000 (22:17 -0500)]
5538: Merge FailHandler and FailThenSucceedHandler into one APIStub to facilitate testing many more error states; also add update and delete retry tests.
radhika [Fri, 6 Nov 2015 01:13:32 +0000 (20:13 -0500)]
5538: code improvements: use a switch statement instead of an if statement with several status code checks; sleep between retries.
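The Go client's code isn't shown in this log. This is a hedged Python sketch of the behavior described across these 5538 commits: a status-code switch, a sleep between retries, and no retries for POST. The status set below is illustrative and is not the Go client's actual list. `call_with_retries` and `APIError` are hypothetical.

```python
import time

# Illustrative set of transient HTTP statuses worth retrying.
RETRYABLE_STATUS = frozenset({408, 500, 502, 503, 504})
# Methods assumed safe to repeat; POST is never retried.
RETRYABLE_METHODS = frozenset({"GET", "HEAD", "PUT", "DELETE", "OPTIONS"})


class APIError(Exception):
    pass


def call_with_retries(method, send, retries=2, delay=2.0, sleep=time.sleep):
    """Call send() -> (status, body), retrying transient failures."""
    attempts = 1 + (retries if method in RETRYABLE_METHODS else 0)
    error = None
    for attempt in range(attempts):
        if attempt:
            sleep(delay)  # pause between retries
        try:
            status, body = send()
        except ConnectionError as err:
            error = APIError("%s: %s" % (method, err))
            continue
        if 200 <= status < 300:
            return body
        error = APIError("%s: HTTP %d" % (method, status))
        if status not in RETRYABLE_STATUS:
            break  # permanent error: retrying will not help
    raise error
```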
Tom Clegg [Thu, 5 Nov 2015 19:35:38 +0000 (14:35 -0500)]
7724: Use datamanager token in keepproxy index test. refs #7724
Tom Clegg [Thu, 5 Nov 2015 18:46:20 +0000 (13:46 -0500)]
Merge branch '7724-scoped-token' closes #7724
Brett Smith [Thu, 5 Nov 2015 17:12:17 +0000 (12:12 -0500)]
Fix non-packaged API server paths in the install guide.
No issue #.
Tom Clegg [Thu, 5 Nov 2015 16:37:03 +0000 (11:37 -0500)]
Merge branch '5824-keep-web' refs #5824
Tom Clegg [Thu, 5 Nov 2015 16:33:42 +0000 (11:33 -0500)]
7724: Use a scoped token in data manager tests.
Tom Clegg [Thu, 5 Nov 2015 15:54:13 +0000 (10:54 -0500)]
5824: Use ARVADOS_API_TOKEN=foo + -allow-anonymous instead of -anonymous-token=foo.
Tom Clegg [Thu, 5 Nov 2015 15:11:06 +0000 (10:11 -0500)]
5824: Rename -address to -listen
radhika [Wed, 4 Nov 2015 22:18:25 +0000 (17:18 -0500)]
5538: update the newly added TestFail* to use proper client with http.Transport
radhika [Wed, 4 Nov 2015 22:11:10 +0000 (17:11 -0500)]
Merge branch 'master' into 5538-arvadosclient-retry
Conflicts:
sdk/go/arvadosclient/arvadosclient.go
radhika [Wed, 4 Nov 2015 21:39:59 +0000 (16:39 -0500)]
refs #5538
Merge branch '5538-close-idle-connections'
radhika [Wed, 4 Nov 2015 21:38:28 +0000 (16:38 -0500)]
5538: update test to reuse arvados client in TestCreatePipelineTemplate between idle and current connections.
radhika [Wed, 4 Nov 2015 21:25:32 +0000 (16:25 -0500)]
Merge branch 'master' into 5538-close-idle-connections
radhika [Wed, 4 Nov 2015 21:19:51 +0000 (16:19 -0500)]
closes #7719
Merge branch '7719-permit-net-delete'
radhika [Wed, 4 Nov 2015 21:13:29 +0000 (16:13 -0500)]
7719: permit never-delete to be set to false; add warning that datamanager is not yet fully tested.
radhika [Wed, 4 Nov 2015 19:58:46 +0000 (14:58 -0500)]
5538: add test with a connection idle for longer than MaxIdleConnectionDuration
radhika [Wed, 4 Nov 2015 19:36:42 +0000 (14:36 -0500)]
Merge branch 'master' into 5538-close-idle-connections
Brett Smith [Wed, 4 Nov 2015 19:32:01 +0000 (14:32 -0500)]
Merge branch '7713-node-manager-blacklist-broken-nodes-wip'
Closes #7713, #7718.
radhika [Wed, 4 Nov 2015 19:08:24 +0000 (14:08 -0500)]
5538: use a fake Arvados server to generate errors; add tests with retries.
Brett Smith [Wed, 4 Nov 2015 17:20:36 +0000 (12:20 -0500)]
7713: Node Manager blackholes broken nodes that can't shut down.
We are seeing situations on Azure where some nodes in an UNKNOWN state
cannot be shut down. The API call to destroy them always fails.
There are two related halves to this commit. In the first half,
after a cloud shutdown request fails, ComputeNodeShutdownActor checks
whether the node is broken. If it is, it cancels shutdown retries.
In the second half, the daemon checks for this shutdown outcome. When
it happens, it blacklists the broken node: it will immediately filter
it out of node lists from the cloud. It is no longer monitored in any
way or counted as a live node, so Node Manager will boot a replacement
for it.
This lets Node Manager create cloud nodes above max_nodes, up to the
number of broken nodes. We're reasonably bounded for now because
only the Azure driver will ever declare a node broken. Other clouds
will never blacklist nodes this way.
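A minimal sketch of both halves under simplified assumptions. First, a shutdown routine gives up retrying as soon as a failed destroy call reveals a broken node. Second, the daemon blacklists that node and filters it out of later cloud node lists, so it stops counting as live. Node Manager's real actors and messages are not modeled. `shut_down` and `NodeTracker` are hypothetical.

```python
def shut_down(node, destroy, is_broken, max_attempts=3):
    """Hypothetical shutdown; returns "shutdown", "broken" or "failed"."""
    for _ in range(max_attempts):
        if destroy(node):
            return "shutdown"
        # After a failed destroy, check whether the node is broken;
        # if it is, further retries cannot succeed.
        if is_broken(node):
            return "broken"
    return "failed"


class NodeTracker:
    """Hypothetical daemon-side bookkeeping for blacklisted nodes."""

    def __init__(self):
        self.blacklist = set()

    def record_shutdown(self, node_id, outcome):
        if outcome == "broken":
            self.blacklist.add(node_id)

    def live_nodes(self, cloud_nodes):
        # Filter blacklisted nodes out of every cloud listing, so they are
        # no longer monitored or counted and a replacement can be booted.
        return [n for n in cloud_nodes if n["id"] not in self.blacklist]
```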
radhika [Wed, 4 Nov 2015 16:36:24 +0000 (11:36 -0500)]
Merge branch 'master' into 5538-arvadosclient-retry
radhika [Wed, 4 Nov 2015 16:34:35 +0000 (11:34 -0500)]
5538: close any idle connections before a POST or DELETE request.
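Go's `http.Transport` provides `CloseIdleConnections()` for this. Below is a minimal Python sketch of the decision. Like the nearby MaxIdleConnectionDuration test, it assumes connections are dropped only after sitting idle longer than a threshold. The likely rationale is that a server may silently close a long-idle keep-alive connection, and a POST that fails on it cannot simply be retried. `IdleConnectionCloser`, its `close_idle` callback and the 30-second default are hypothetical. For simplicity, idle time is measured from the start of the previous request.

```python
import time


class IdleConnectionCloser:
    """Hypothetical helper: drop stale keep-alive connections before
    requests that will not be retried."""

    def __init__(self, close_idle, max_idle=30.0, clock=time.monotonic):
        self._close_idle = close_idle  # e.g. a transport's "close idle" hook
        self._max_idle = max_idle      # seconds; assumed threshold
        self._clock = clock
        self._last_used = None

    def before_request(self, method):
        now = self._clock()
        idle_too_long = (self._last_used is not None
                         and now - self._last_used > self._max_idle)
        if method in ("POST", "DELETE") and idle_too_long:
            self._close_idle()
        self._last_used = now
```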
radhika [Wed, 4 Nov 2015 15:13:53 +0000 (10:13 -0500)]
5538: retry failed arvados api requests when appropriate.
Tom Clegg [Wed, 4 Nov 2015 05:19:40 +0000 (00:19 -0500)]
Merge branch '7444-dockercleaner-containers' closes #7444
Tom Clegg [Wed, 4 Nov 2015 04:55:11 +0000 (23:55 -0500)]
Merge branch '5824-keep-web'
refs #5824
Tom Clegg [Wed, 4 Nov 2015 04:20:27 +0000 (23:20 -0500)]
Merge branch '5824-keep-web' into 5824-keep-web-workbench
Conflicts:
services/keepproxy/keepproxy.go
Tom Clegg [Wed, 4 Nov 2015 04:06:45 +0000 (23:06 -0500)]
5824: Merge branch 'master' into 5824-keep-web
Tom Clegg [Wed, 4 Nov 2015 03:50:50 +0000 (22:50 -0500)]
5824: Avoid sending empty slices through toRead chan. Fixes race in test case.
Tom Clegg [Tue, 3 Nov 2015 19:06:52 +0000 (14:06 -0500)]
5824: Turn off debug printfs unless enabled by calling program.
radhika [Tue, 3 Nov 2015 15:43:33 +0000 (10:43 -0500)]
closes #7534
Merge branch '7534-superuser-token'
radhika [Tue, 3 Nov 2015 15:43:07 +0000 (10:43 -0500)]
Merge branch 'master' into 7534-superuser-token
radhika [Tue, 3 Nov 2015 15:04:02 +0000 (10:04 -0500)]
7661: add --by-pdh option to FUSE and use this option in crunch-job. Do not start web socket client when --by-pdh is used.
Tom Clegg [Tue, 3 Nov 2015 14:55:50 +0000 (09:55 -0500)]
5824: Use session cookie instead of persistent.
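A cookie with neither `Expires` nor `Max-Age` is a session cookie, which browsers discard when the session ends (RFC 6265). This sketch builds such a header with the standard library. The cookie name used in the usage is an assumption, and `session_cookie_header` is hypothetical.

```python
from http.cookies import SimpleCookie


def session_cookie_header(name, value):
    """Return a Set-Cookie value with no expiry, i.e. a session cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    morsel = cookie[name]
    morsel["path"] = "/"
    morsel["httponly"] = True
    # No "expires" or "max-age": the cookie ends with the browser session.
    return morsel.OutputString()
```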
Tom Clegg [Tue, 3 Nov 2015 14:52:33 +0000 (09:52 -0500)]
5824: Clarity edits in usage docs.
Peter Amstutz [Mon, 2 Nov 2015 22:42:54 +0000 (17:42 -0500)]
Merge branch '7593-cwl-crunchrunner' closes #7593
Tom Clegg [Mon, 2 Nov 2015 20:53:05 +0000 (15:53 -0500)]
7444: Rename kwarg remove_stopped_containers -> remove_containers_onexit
Tom Clegg [Mon, 2 Nov 2015 15:26:39 +0000 (10:26 -0500)]
7444: Set docker container name to {taskUUID}-{attemptNum}.
Tom Clegg [Fri, 30 Oct 2015 22:28:46 +0000 (18:28 -0400)]
7444: Do not remove docker containers with docker --rm; let dockercleaner do it.
Tom Clegg [Mon, 2 Nov 2015 20:46:22 +0000 (15:46 -0500)]
7444: Clean stopped containers at startup.