Peter Amstutz [Wed, 4 Mar 2015 19:05:10 +0000 (14:05 -0500)]
4956: Refactor http request patching used in Python SDK.
Test_request_too_large uses the published size instead of a hardcoded size. Note
that the user must configure the upstream web server to enforce request size limits.
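A minimal sketch of the testing idea: the test derives its oversized body from the limit the server publishes, not from a constant baked into the test. It assumes the limit is published in the API discovery document under a `maxRequestSize` key; the helper `request_too_large` is hypothetical and is not the SDK's code.

```python
def request_too_large(body, discovery_doc):
    """Return True if body exceeds the server's published request size limit.

    Assumption: the limit is published as discovery_doc['maxRequestSize'].
    """
    limit = discovery_doc['maxRequestSize']
    return len(body) > limit


# Build the test body from the published limit rather than a hardcoded number.
discovery = {'maxRequestSize': 1024}
oversized = b'x' * (discovery['maxRequestSize'] + 1)
```

A client-side check like this is only informational; the upstream web server remains responsible for actually rejecting oversized requests.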
Brett Smith [Tue, 3 Mar 2015 15:08:17 +0000 (10:08 -0500)]
5313: Rely more on datacenter constructor in Node Manager GCE driver.
When initialized with a datacenter argument, the GCE libcloud driver
acts a lot more like the EC2 one. Many listings are implicitly
limited to that zone, so we no longer need to limit searches
ourselves. Let's rely on libcloud instead of our own code.
Brett Smith [Tue, 3 Mar 2015 15:06:24 +0000 (10:06 -0500)]
5313: Revert Node Manager's GCE boot disk destroy code.
After upgrading to libcloud>=0.16, it's redundant to create a node
with ex_disk_auto_delete=True, then destroy the node with
destroy_boot_disk=True. During the destroy process, libcloud will
fail to destroy the boot disk, because Google has already deleted it.
ex_disk_auto_delete is closer to what we want, so just rely on that.
Radhika Chippada [Mon, 2 Mar 2015 21:04:17 +0000 (16:04 -0500)]
5349: Reverted "Time.iso8601(current_job[:created_at])" back to "current_job[:created_at]". All automated and manual tests passed, and no negative side effects were observed.
Peter Amstutz [Mon, 2 Mar 2015 20:51:55 +0000 (15:51 -0500)]
4823: More fixes and cleanups.
* Renamed SynchronizedCollectionBase to RichCollectionBase
* Renamed arvapi parameter of one_task_per_input_file to api_client
* KeepLocator.stripped() returns bare hash if self.size is None
* Permit closing an ArvadosFileWriter more than once
* Fix various docstrings
* Strive to follow PEP 8 spacing guidelines
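Two of the behaviors listed above can be sketched in a few lines of Python. The classes below are simplified, hypothetical stand-ins, not the SDK's KeepLocator or ArvadosFileWriter; only the stripped() rule (bare hash when size is None) and the repeat-close behavior come from the commit message.

```python
class KeepLocator:
    """Toy locator: md5 hash, optional size, optional hints (illustration only)."""

    def __init__(self, md5sum, size=None, hints=()):
        self.md5sum = md5sum
        self.size = size
        self.hints = list(hints)

    def stripped(self):
        # Drop all hints; return the bare hash when the size is unknown.
        if self.size is None:
            return self.md5sum
        return '{}+{}'.format(self.md5sum, self.size)


class FileWriter:
    """Toy writer whose close() is safe to call more than once."""

    def __init__(self):
        self.closed = False
        self.flush_count = 0

    def close(self):
        if self.closed:
            return  # second and later calls are no-ops
        self.flush_count += 1  # stand-in for flushing buffered data
        self.closed = True
```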
Brett Smith [Mon, 2 Mar 2015 16:27:59 +0000 (11:27 -0500)]
4751: Node Manager considers ping times for stricter node pairing.
Because the pairing decision is currently based on IP address alone,
Node Manager will occasionally pair a cloud node with the wrong
Arvados node after an IP address is reused. Fix that by bringing the
node's first_ping_at into consideration: if it's older than the cloud
node, refuse to pair.
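The stricter pairing rule can be sketched as a small predicate. The function name and the record layout (a dict with 'ip_address' and 'first_ping_at') are hypothetical; allowing a pairing when first_ping_at is None is an assumption, since the message only says to refuse when the ping is older than the cloud node.

```python
from datetime import datetime, timezone


def can_pair(cloud_ip, cloud_created_at, arvados_node):
    """Return True if this Arvados node record may be paired with the cloud node."""
    if arvados_node.get('ip_address') != cloud_ip:
        return False
    first_ping = arvados_node.get('first_ping_at')
    # A first ping older than the cloud node must have come from a previous
    # machine that held the same IP address, so refuse to pair.
    if first_ping is not None and first_ping < cloud_created_at:
        return False
    return True


# Example: an IP address reused by a newer cloud node.
booted = datetime(2015, 3, 2, 12, 0, tzinfo=timezone.utc)
stale_record = {'ip_address': '10.0.0.5',
                'first_ping_at': datetime(2015, 3, 1, 9, 0, tzinfo=timezone.utc)}
```

Comparing timestamps alongside the IP address is what breaks the tie when an address has been reused.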
This more closely matches the behavior of the EC2 driver, which we
want.
* Upgrade to libcloud 0.16, which adds an ex_disk_auto_delete argument
to GCE's create_node method, with True as the default.
* Set destroy_boot_disk=True when calling destroy_node().
Brett Smith [Wed, 25 Feb 2015 16:37:26 +0000 (11:37 -0500)]
5283: crunch-job doesn't use freeze logic after a job fails.
If the job has failed permanently, we want to go through all the
end-of-job logic. Previously, we were getting sidetracked into
freeze_if_want_freeze, which skips some steps like setting the
permanent job output record. Refs #4472.
Brett Smith [Fri, 27 Feb 2015 19:20:12 +0000 (14:20 -0500)]
5283: Improve reliability of crunch-job output collation.
* Check the results of all pipe opens, exit statuses, and writes.
Log any problems.
* Have fetch_block return undef when it encounters trouble, rather
than dying. create_output_collection already checks for this, so it
effectively bubbles up the error.
* Retry all of the associated API calls.
* Kill the manifest creation pipe if we give up on it, per the TODO.
This probably won't resolve #5283, but hopefully these changes will
give us additional information to help diagnose the problem.
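crunch-job itself is Perl; the following is only a Python sketch of the "retry, log every failure, and surface the final error" pattern described above. The helper name, the default number of tries, and the doubling backoff are hypothetical choices, not crunch-job's actual settings.

```python
import time


def retry_call(func, tries=3, wait=1.0, log=print):
    """Call func() until it succeeds or `tries` attempts have failed.

    Each failure is logged; the last one is re-raised so callers still see it.
    The wait between attempts doubles after every failure.
    """
    for attempt in range(1, tries + 1):
        try:
            return func()
        except Exception as error:
            log('attempt {} of {} failed: {}'.format(attempt, tries, error))
            if attempt == tries:
                raise
            time.sleep(wait)
            wait *= 2
```

Re-raising after the last attempt keeps the failure visible to the caller, in the same spirit as fetch_block returning undef so that create_output_collection can report it.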
Peter Amstutz [Thu, 26 Feb 2015 18:19:50 +0000 (13:19 -0500)]
4823: Add flush() to ArvadosFile. Fix tests to avoid using internal APIs. Fix
import in _normalize_stream. Make KeepRequestError more generic (now
represents a list of "request errors" instead of "service errors").
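A hedged sketch of an exception that carries one entry per failed request rather than per service. The constructor signature and the request_errors() accessor are assumptions made for illustration.

```python
class KeepRequestError(Exception):
    """Error carrying the individual failures behind one Keep operation.

    request_errors maps a request target (for example a service URL) to the
    exception raised for that request.
    """

    def __init__(self, message='', request_errors=None):
        self._request_errors = dict(request_errors or {})
        if self._request_errors:
            details = '; '.join('{}: {}'.format(target, err)
                                for target, err in sorted(self._request_errors.items()))
            message = '{} ({})'.format(message, details)
        super().__init__(message)

    def request_errors(self):
        return self._request_errors
```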
Peter Amstutz [Wed, 25 Feb 2015 21:42:10 +0000 (16:42 -0500)]
5309: Fix keepclient and keepproxy bugs related to error handling:
* KeepClient: Handle nil response and nil response Body from Client.Do(GET)
* KeepProxy: Only defer reader.Close() if reader is not nil
* KeepProxy tests: Add test for GET failure to read (404)
Peter Amstutz [Wed, 25 Feb 2015 19:09:09 +0000 (14:09 -0500)]
5310: Use c.get('name') instead of c['name']
* 'name' isn't necessarily present: when obj_uuid is a PDH,
src.collections().get(uuid=obj_uuid).execute() may return a synthetic record
without a name field.
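A short illustration of why dict.get() is the safer lookup here. The records are made-up samples, and the helper `collection_name` is hypothetical.

```python
def collection_name(record, default=None):
    # A synthetic record returned for a portable data hash may lack 'name',
    # so use dict.get() rather than record['name'], which would raise KeyError.
    return record.get('name', default)


named = {'uuid': 'zzzzz-4zz18-0123456789abcde', 'name': 'my data'}
synthetic = {'portable_data_hash': 'd41d8cd98f00b204e9800998ecf8427e+0'}
```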
Peter Amstutz [Tue, 24 Feb 2015 22:13:33 +0000 (17:13 -0500)]
4823: Remove sync_mode() from Collection in favor of writable() flag.
The Collection constructor raises ArgumentError() on a bad manifest. Fix
assertEquals() -> assertEqual().
Peter Amstutz [Mon, 23 Feb 2015 21:36:09 +0000 (16:36 -0500)]
4520: Coerce unicode strings to ascii in put(). Use result.content (returns
literal result bytes) instead of result.text (returns unicode) when processing
Keep responses.
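A Python 3 sketch of the coercion rule (the original code was Python 2, where the distinction was unicode vs. str). Text is encoded to ASCII, which fails loudly on non-ASCII input, and the locator is computed from raw bytes, which is why response bodies should be handled as bytes (like result.content) rather than decoded text (like result.text). The function name is hypothetical; the md5+size locator form is the standard Keep locator.

```python
import hashlib


def prepare_block(data):
    """Return (bytes, locator) for a block to be stored in Keep.

    Text (str) is coerced to ASCII bytes, raising UnicodeEncodeError for
    non-ASCII text; bytes pass through unchanged.
    """
    if isinstance(data, str):
        data = data.encode('ascii')
    locator = '{}+{}'.format(hashlib.md5(data).hexdigest(), len(data))
    return data, locator
```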
Peter Amstutz [Mon, 23 Feb 2015 20:17:19 +0000 (15:17 -0500)]
4520: manifest_text() is utf-8 encoded by default so it can be safely put() to
Keep. Add test that calling put() with a unicode string raises an error.
Fetching user uuid in arv-copy uses num_retries.
Peter Amstutz [Mon, 23 Feb 2015 19:17:12 +0000 (14:17 -0500)]
4520: Better checking of whether a collection already exists at the destination.
Set args.project_uuid to default value (current user uuid) if not set on
command line to simplify code.
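A minimal argparse sketch of filling in the default once, right after parsing, so later code never special-cases a missing project. The option spelling is for illustration, and `current_user_uuid` is a hypothetical parameter standing in for the API call that looks up the current user.

```python
import argparse


def parse_args(argv, current_user_uuid):
    parser = argparse.ArgumentParser(prog='arv-copy')
    parser.add_argument('--project-uuid', dest='project_uuid', default=None)
    args = parser.parse_args(argv)
    # Default to the current user's uuid (their home project) when unset.
    if args.project_uuid is None:
        args.project_uuid = current_user_uuid
    return args
```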
Peter Amstutz [Mon, 23 Feb 2015 18:54:47 +0000 (13:54 -0500)]
4520: Refactor code to create the collection record. Also refactored code
which creates Docker metadata links so that copying any collection which
represents a Docker image will also copy over the metadata.
Peter Amstutz [Mon, 23 Feb 2015 14:44:27 +0000 (09:44 -0500)]
4823: Handle edge cases of files named '.' so that the FUSE test passes. Added
tests for invalid manifests. Defer populating CollectionReader streams until
needed to avoid extra copy of the manifest.
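Deferring stream population can be sketched with functools.cached_property: the manifest text is stored as-is and only split into streams on first access. The class is a hypothetical toy that does no manifest validation; it is not CollectionReader.

```python
import functools


class LazyReader:
    """Keep the manifest text; parse streams only when first needed."""

    def __init__(self, manifest_text):
        self._manifest_text = manifest_text
        self.parse_count = 0

    @functools.cached_property
    def streams(self):
        self.parse_count += 1
        # Each non-empty manifest line is one stream; split it into tokens.
        return [line.split() for line in self._manifest_text.splitlines() if line]
```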
Phil Hodgson [Sat, 21 Feb 2015 09:16:31 +0000 (10:16 +0100)]
4232: Revert experimental change to using find? for each of the jobs in a pipeline rather than simply a where clause: there is no evidence that the switch to find? was speeding up anything overall.
Brett Smith [Mon, 16 Feb 2015 16:06:41 +0000 (11:06 -0500)]
4138: Prepare Node Manager GCE driver for production.
* Set node metadata in more appropriate places.
* Bridge more differences between GCE and EC2, like the fact that
sizes are listed for each location in which they're available, and GCE
doesn't provide node boot times.
* Use more infrastructure from BaseComputeNodeDriver to reduce code
duplication.
* Load as many objects as possible at initialization time, to reduce
API overhead of creating nodes.
Brett Smith [Fri, 13 Feb 2015 20:24:04 +0000 (15:24 -0500)]
4138: Revamp Node Manager driver proxying in BaseComputeNodeDriver.
Accessing attributes through a super() proxy does not invoke
__getattr__ on base classes, so the old implementation made it
impossible for subclasses to be agnostic about whether a method was
implemented in BaseComputeNodeDriver or the real libcloud driver.
This version makes that possible. It's also a little nicer because
now the class will report these method names to dir(), hasattr(), etc.
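The underlying Python behavior: super() looks names up only in the class dictionaries along the MRO, so a base-class __getattr__ is never consulted and super().list_nodes() raises AttributeError. One way to get the behavior described above is to install real class attributes (here, properties) that delegate to the wrapped driver. This is a sketch with a hypothetical fake driver, not the Node Manager implementation.

```python
class FakeLibcloudDriver:
    """Stand-in for a libcloud NodeDriver (hypothetical, for illustration)."""

    def list_nodes(self):
        return ['node-1', 'node-2']

    def list_sizes(self):
        return ['small', 'large']


class BaseComputeNodeDriver:
    def __init__(self, real):
        self.real = real

    def list_sizes(self):
        # Defined here, so it is NOT replaced by a delegating property below.
        return ['base:' + size for size in self.real.list_sizes()]


def _delegate_to_real(attr_name):
    # A property lives in the class dict, so super(), dir() and hasattr() see it.
    return property(lambda self: getattr(self.real, attr_name))


for _name in dir(FakeLibcloudDriver):
    if not _name.startswith('_') and _name not in vars(BaseComputeNodeDriver):
        setattr(BaseComputeNodeDriver, _name, _delegate_to_real(_name))


class GCEDriver(BaseComputeNodeDriver):
    def list_nodes(self):
        # Works whether list_nodes is implemented in the base class or
        # proxied to the real driver.
        return [node.upper() for node in super().list_nodes()]
```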
Brett Smith [Thu, 12 Feb 2015 20:53:16 +0000 (15:53 -0500)]
4138: Fix noop Node Manager EC2 driver tests.
The previous tests simply instantiated the driver, then checked that a
mock method was truthy (which it will always be). This makes the test
work as intended.
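The pitfall in miniature, using unittest.mock: any attribute of a Mock is itself a Mock and therefore truthy, so the first assertion can never fail. Checking the recorded call is what actually tests the code. The driver and `start_node` helper are hypothetical.

```python
from unittest import mock

driver = mock.Mock()

# Bug pattern: this passes whether or not create_node was ever called.
assert driver.create_node


def start_node(drv):
    drv.create_node(name='arv-example')


# Intended check: the call really happened, with the expected argument.
start_node(driver)
driver.create_node.assert_called_once_with(name='arv-example')
```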
Brett Smith [Wed, 11 Feb 2015 20:12:37 +0000 (15:12 -0500)]
4138: Simplify Node Manager GCE credential handling.
Because libcloud's GCE driver accepts a key path as a constructor
argument, it's relatively straightforward to put all the constructor
arguments directly in the Node Manager configuration. No need to
parse out JSON.
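A sketch of passing a configuration section straight through as constructor keyword arguments. The section name, keys, and values are hypothetical placeholders; the resulting dict is what would be handed to the libcloud driver class.

```python
import configparser

CONFIG_TEXT = """
[Cloud Credentials]
user_id = service-account@example.com
key = /etc/arvados-node-manager/gce-key.json
project = example-project
datacenter = us-central1-a
"""


def driver_kwargs(config_text, section='Cloud Credentials'):
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Every key in the section becomes a constructor keyword argument as-is,
    # including the key path, so no JSON parsing is needed.
    return dict(parser.items(section))
```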
Tim Pierce [Fri, 23 Jan 2015 22:24:54 +0000 (17:24 -0500)]
4138: GCE fixes
The 'network_id' parameter needs to be delivered as 'location' in GCE.
The ping_url parameter is now delivered in the node metadata as
'pingUrl'.
When creating a new GCE instance, 'name' is a required parameter and
must begin with a letter. The default name is the UUID of the
corresponding Arvados node, prefixed with 'arv-'.
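A sketch of the naming and argument mapping described above. The regular expression follows GCE's resource-name rule (a lowercase letter first, then lowercase letters, digits, or hyphens, not ending in a hyphen, at most 63 characters). The helper names are hypothetical, and the 'ex_metadata' key is an assumption about how metadata reaches the libcloud create call.

```python
import re

GCE_NAME_RE = re.compile(r'^[a-z]([-a-z0-9]{0,61}[a-z0-9])?$')


def default_node_name(arvados_node):
    # The 'arv-' prefix guarantees the name begins with a letter.
    name = 'arv-' + arvados_node['uuid']
    if not GCE_NAME_RE.match(name):
        raise ValueError('invalid GCE instance name: {!r}'.format(name))
    return name


def create_node_kwargs(arvados_node, network_id, ping_url):
    """Hypothetical mapping from Node Manager's arguments to GCE ones."""
    return {
        'name': default_node_name(arvados_node),
        'location': network_id,  # network_id is delivered as 'location'
        'ex_metadata': {'pingUrl': ping_url},
    }
```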