Tom Clegg [Tue, 10 Mar 2015 06:58:58 +0000 (02:58 -0400)]
5182: Improve error reporting in uploader.
Missing CORS headers (and network errors which force the browser to
assume CORS headers are missing) are reported as error==="". In place
of the enigmatic "error:", we show a message hinting at network/CORS
problems and pointing the user to the browser debug console for
further clues.
Mixed-content errors (https://workbench/*.js attempts AJAX request at
http://proxy/*) don't invoke success/fail handlers at all, so we catch
them ahead of time and show an appropriate message.
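The two checks described above can be sketched as follows. This is a minimal illustration in Python, not the uploader's actual client-side code; the function names and the message text are hypothetical.

```python
from urllib.parse import urlsplit


def is_mixed_content(page_url, request_url):
    """Return True when an https page would send a request to a plain-http URL."""
    page_scheme = urlsplit(page_url).scheme.lower()
    request_scheme = urlsplit(request_url).scheme.lower()
    return page_scheme == "https" and request_scheme == "http"


def describe_upload_error(error):
    """Map a reported error string to a user-facing message.

    An empty string is how missing CORS headers and some network errors
    are reported. The message wording here is an assumption.
    """
    if error == "":
        return ("Network or CORS problem; "
                "check the browser debug console for details.")
    return "error: " + error
```

Because a mixed-content request never reaches the success/fail handlers, a check like is_mixed_content() has to run before the request is sent.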
Brett Smith [Tue, 3 Mar 2015 23:00:21 +0000 (18:00 -0500)]
5319: Add API migration for manifests with bad portable data hashes.
Refer to #5319 for background discussion and rationale. The migration
ensures that collections are still addressable by the bad
portable_data_hash, but the existing collection object has the correct
portable_data_hash.
Tom Clegg [Thu, 5 Mar 2015 19:07:52 +0000 (14:07 -0500)]
5261: When redirecting during an AJAX request, send the target URI in
a JSON object {"href":"..."} instead of responding 302.
This lets us use "redirect_to X" to mean "send the user to page X"
regardless of whether the request is an XHR. Without it, client-side
code never sees the 302 at all: the browser handles the redirect
transparently, and the client-side code typically ends up trying to
parse HTML content as JSON.
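A minimal sketch of the idea, written in Python for illustration rather than as the actual server code. The function name is hypothetical, and the 200 status used for the XHR case is an assumption; the commit only specifies the {"href":"..."} body.

```python
import json


def redirect_response(target_uri, is_xhr):
    """Build a (status, headers, body) triple for "send the user to target_uri"."""
    if is_xhr:
        # Client-side code can read the target and navigate there itself.
        body = json.dumps({"href": target_uri})
        return 200, {"Content-Type": "application/json"}, body
    # Ordinary page loads still get a normal HTTP redirect.
    return 302, {"Location": target_uri}, ""
```

The client-side handler would then check for an "href" key in the parsed response and navigate to it.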
Peter Amstutz [Wed, 4 Mar 2015 19:05:10 +0000 (14:05 -0500)]
4956: Refactor http request patching used in Python SDK.
test_request_too_large uses the published size instead of a hardcoded size. Add a
note that the user must configure the upstream web server to set request size limits.
Brett Smith [Tue, 3 Mar 2015 15:08:17 +0000 (10:08 -0500)]
5313: Rely more on datacenter constructor in Node Manager GCE driver.
When initialized with a datacenter argument, the GCE libcloud driver
acts a lot more like the EC2 one. Many listings are implicitly
limited to that zone, saving us the need to limit searches
ourselves. Let's rely on libcloud instead of our own code.
Brett Smith [Tue, 3 Mar 2015 15:06:24 +0000 (10:06 -0500)]
5313: Revert Node Manager's GCE boot disk destroy code.
After upgrading to libcloud>=0.16, it's redundant to create a node
with ex_disk_auto_delete=True, then destroy the node with
destroy_boot_disk=True. During the destroy process, libcloud will
fail to destroy the boot disk, because Google has already deleted it.
ex_disk_auto_delete is closer to what we want, so just rely on that.
Radhika Chippada [Mon, 2 Mar 2015 21:04:17 +0000 (16:04 -0500)]
5349: Reverted "Time.iso8601(current_job[:created_at])" back to "current_job[:created_at]". All tests and manual testing passed, and no negative side effects were observed.
Peter Amstutz [Mon, 2 Mar 2015 20:51:55 +0000 (15:51 -0500)]
4823: More fixes and cleanups.
* Renamed SynchronizedCollectionBase to RichCollectionBase
* Renamed arvapi parameter of one_task_per_input_file to api_client
* KeepLocator.stripped() returns bare hash if self.size is None
* Permit closing an ArvadosFileWriter more than once
* Fix various docstrings
* Strive to follow PEP 8 spacing guidelines
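The KeepLocator.stripped() change listed above can be illustrated with a small stand-in class. This is a sketch only: LocatorSketch is a hypothetical name, and the "hash+size" format for the sized case is assumed rather than taken from the SDK source.

```python
class LocatorSketch:
    """Hypothetical stand-in for a Keep locator with an optional size."""

    def __init__(self, md5sum, size=None):
        self.md5sum = md5sum
        self.size = size

    def stripped(self):
        # With no known size, return the bare hash instead of
        # trying to format None as an integer.
        if self.size is None:
            return self.md5sum
        return "%s+%d" % (self.md5sum, self.size)
```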
Brett Smith [Mon, 2 Mar 2015 16:27:59 +0000 (11:27 -0500)]
4751: Node Manager considers ping times for stricter node pairing.
Because the pairing decision is currently based on IP address alone,
Node Manager will occasionally pair a cloud node with the wrong
Arvados node after an IP address is reused. Fix that by bringing the
node's first_ping_at into consideration: if it's older than the cloud
node, refuse to pair.
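The stricter rule can be sketched as a single comparison. The function and parameter names below are hypothetical, and the sketch assumes the IP addresses have already been matched and that both timestamps are available as comparable datetime objects.

```python
from datetime import datetime, timezone


def may_pair(cloud_node_created_at, first_ping_at):
    """Return False when the Arvados node's first ping predates the cloud node.

    A first ping older than the cloud node suggests the record belongs to
    an earlier node that used the same IP address.
    """
    return first_ping_at >= cloud_node_created_at
```

For example, a cloud node created at 12:00 should not pair with an Arvados node record whose first ping was at 11:00, even if the IP addresses match.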
This more closely matches the behavior of the EC2 driver, which we
want.
* Upgrade to libcloud 0.16, which adds an ex_disk_auto_delete argument
to GCE's create_node method, with True as the default.
* Set destroy_boot_disk=True when calling destroy_node().
Brett Smith [Wed, 25 Feb 2015 16:37:26 +0000 (11:37 -0500)]
5283: crunch-job doesn't use freeze logic after a job fails.
If the job has failed permanently, we want to go through all the
end-of-job logic. Previously, we were getting sidetracked into
freeze_if_want_freeze, which skips some steps like setting the
permanent job output record. Refs #4472.
Brett Smith [Fri, 27 Feb 2015 19:20:12 +0000 (14:20 -0500)]
5283: Improve reliability of crunch-job output collation.
* Check the results of all pipe opens, exit statuses, and writes.
Log any problems.
* Have fetch_block return undef when it encounters trouble, rather
than dying. create_output_collection already checks for this, so it
effectively bubbles up the error.
* Retry all of the associated API calls.
* Kill the manifest creation pipe if we give up on it, per the TODO.
This probably won't resolve #5283, but hopefully these changes will
give us additional information to help diagnose the problem.
Peter Amstutz [Thu, 26 Feb 2015 18:19:50 +0000 (13:19 -0500)]
4823: Add flush() to ArvadosFile. Fix tests to avoid using internal APIs. Fix
import in _normalize_stream. Make KeepRequestError more generic (now
represents a list of "request errors" instead of "service errors").
Peter Amstutz [Wed, 25 Feb 2015 21:42:10 +0000 (16:42 -0500)]
5309: Fix keepclient and keepproxy bugs related to error handling:
* KeepClient: Handle nil response and nil response Body from Client.Do(GET)
* KeepProxy: Only defer reader.Close() if reader is not nil
* KeepProxy tests: Add test for GET failure to read (404)
Peter Amstutz [Wed, 25 Feb 2015 19:09:09 +0000 (14:09 -0500)]
5310: Use c.get('name') instead of c['name']
* 'name' isn't necessarily present: when obj_uuid is a PDH,
src.collections().get(uuid=obj_uuid).execute() may return a synthetic record
without a name field.
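The difference between the two lookups can be shown with plain dictionaries. The records below are made-up stand-ins for API responses, and display_name is a hypothetical helper.

```python
def display_name(c):
    """Return the record's name, or None when the field is absent."""
    # c['name'] would raise KeyError for a record without a name field.
    return c.get('name')


# Made-up stand-ins for API responses.
named_record = {"uuid": "example-uuid", "name": "my collection"}
synthetic_record = {"portable_data_hash": "d41d8cd98f00b204e9800998ecf8427e+0"}
```

Callers that need a string can pass a default instead, for example c.get('name', '').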