Brett Smith [Wed, 11 Feb 2015 20:12:37 +0000 (15:12 -0500)]
4138: Simplify Node Manager GCE credential handling.
Because libcloud's GCE driver accepts a key path as a constructor
argument, it's relatively straightforward to put all the constructor
arguments directly in the Node Manager configuration. No need to
parse out JSON.
Tim Pierce [Fri, 23 Jan 2015 22:24:54 +0000 (17:24 -0500)]
4138: GCE fixes
The 'network_id' parameter needs to be delivered as 'location' in GCE.
The ping_url parameter is now delivered in the node metadata as
'pingUrl'.
When creating a new GCE instance, 'name' is a required parameter and
must begin with a letter. The default name is the UUID of the
corresponding Arvados node, prepended with 'arv-'.
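The parameter mapping described above can be sketched as follows. This is a minimal illustration, not the Node Manager code: the function name `gce_create_kwargs` and the shape of the returned dictionary (including the `metadata` key) are assumptions made for the example.

```python
def gce_create_kwargs(arvados_node_uuid, network_id, ping_url):
    """Build illustrative GCE create arguments from Node Manager values.

    Hypothetical helper: the key names mirror the commit message
    ('name', 'location', 'pingUrl'); the overall dict layout is assumed.
    """
    return {
        # 'name' is required and must begin with a letter, so the
        # Arvados node UUID is prefixed with 'arv-'.
        "name": "arv-" + arvados_node_uuid,
        # The configured 'network_id' is delivered as 'location'.
        "location": network_id,
        # The ping URL travels in the node metadata as 'pingUrl'.
        "metadata": {"pingUrl": ping_url},
    }
```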
Tim Pierce [Wed, 21 Jan 2015 18:06:35 +0000 (13:06 -0500)]
4138: general GCE fixes
* JSON credential file
** GCE credentials are delivered as a JSON string (and the key is formatted as a multi-line RSA private key). Let the GCE config file specify a path to the JSON credential file for simplicity.
* Accept NodeSizes addressed by id or name
** In EC2, NodeSizes are identified by the 'id' field. In GCE they are identified by the 'name' field. Allow the Node Manager config module to accept either.
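A lookup that accepts either field might look like the sketch below. The `NodeSize` tuple, the `find_size` helper, and the sample size values are stand-ins invented for illustration; they are not the Node Manager config module or libcloud's own classes.

```python
from collections import namedtuple

# Stand-in for a cloud driver's size object; only the two fields
# relevant to the lookup are modeled here.
NodeSize = namedtuple("NodeSize", ["id", "name"])


def find_size(sizes, key):
    """Return the first size whose 'id' or 'name' equals key."""
    for size in sizes:
        if key in (size.id, size.name):
            return size
    raise KeyError("no node size with id or name %r" % (key,))


# Illustrative values only.
SAMPLE_SIZES = [
    NodeSize(id="m3.medium", name="Medium Instance"),
    NodeSize(id="1000", name="n1-standard-1"),
]
```

With this approach the same configuration key works for a cloud that addresses sizes by 'id' and for one that addresses them by 'name'.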
Tom Clegg [Sat, 14 Feb 2015 21:54:00 +0000 (16:54 -0500)]
Ensure result order is predictable, even if client-provided orders do not specify a complete ordering.
Fixes intermittent test failures. Example (from
https://ci.curoverse.com/job/arvados-api-server/1305/console):
GroupsTest#test_get_all_pages_of_group-owned_objects [/data/1/jenkins/workspace/arvados-api-server/services/api/test/integration/groups_test.rb:31]:
Received 'zzzzz-4zz18-fy296fx3hot09f7' again on page 3.
<nil> expected but was
<true>.
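One way to get a predictable order is to append a unique column as a final tiebreaker, as in this sketch. It is an illustration of the idea only, not the API server code: orders are simplified to bare column names, and `complete_order`, `fetch_page`, and the choice of `uuid` as tiebreaker are assumptions.

```python
def complete_order(client_orders, tiebreaker="uuid"):
    """Return the client's sort columns with a unique column appended."""
    orders = list(client_orders)
    if tiebreaker not in orders:
        orders.append(tiebreaker)
    return orders


def fetch_page(rows, client_orders, offset, limit):
    """Return one page of rows, sorted by a complete ordering.

    Because the last sort column is unique, the result does not depend
    on the order in which the rows happen to arrive.
    """
    orders = complete_order(client_orders)
    ordered = sorted(rows, key=lambda row: tuple(row[col] for col in orders))
    return ordered[offset:offset + limit]
```

Without the tiebreaker, rows that compare equal on the client-provided columns may come back in a different relative order on each request, so an item can appear on two pages.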
mishaz [Fri, 30 Jan 2015 01:25:11 +0000 (01:25 +0000)]
Now fetch Keep Server Status and record it to the log. Renamed some fields and added a comment for a potential improvement to decrease lock contention.
mishaz [Tue, 27 Jan 2015 01:06:21 +0000 (01:06 +0000)]
Renamed BlockDigest's ToString() to String() to implement the fmt.Stringer interface, so that we get more readable error messages when structs contain BlockDigests.
mishaz [Sat, 24 Jan 2015 02:22:01 +0000 (02:22 +0000)]
A bunch of changes, most in response to Peter's review.
Logger:
Edit() and Record() have been replaced with the single Update() method which takes a function as input (suggested by Tom).
lastWrite replaced by nextWriteAllowed, for cleaner logic
Added writeScheduled to reduce the number of writes scheduled and attempted, thereby reducing lock contention
Added sanity-checking of params
A bunch of overdue cleanup
Update documentation to reflect the above changes
Manifest:
Renamed ManifestLine to ManifestStream
Util:
Deleted a lot of crap that proved less useful than I thought.
Moved collection.NumberCollectionsAvailable() to util.NumberItemsAvailable() and made it more generic.
collection:
Just cleanup in response to changes in above packages.
keep:
Switched Mtime from int to int64 to avoid y2038 problems.
Switched the approach for excluding the keep proxy: instead of using "accessible", filter on service_type = disk.
Cleanup in response to changes in above packages.
loggerutil:
Cleanup in response to changes in logger.
mishaz [Wed, 7 Jan 2015 04:16:40 +0000 (04:16 +0000)]
Started focusing on Keep Server responses again. Switched to using blockdigest instead of strings. Added per block info so that we can track block replication across servers.
mishaz [Wed, 24 Dec 2014 20:26:38 +0000 (20:26 +0000)]
Added string copying to try to reduce memory usage, but it didn't seem to work. Cleaned up logging (and logging logic) so that we only see one line per batch.
mishaz [Wed, 24 Dec 2014 01:36:43 +0000 (01:36 +0000)]
Switched from strings to BlockDigests to hold block digests more efficiently. Started clearing out manifest text once we finished with it. Made profiling conditional on a flag (previously it crashed if the flag was not provided). Added a final heap profile once collections were finished.
mishaz [Tue, 23 Dec 2014 23:55:12 +0000 (23:55 +0000)]
Added blockdigest class to store digests more efficiently. This has the nice side effect of reducing how many string slices we use from the SDK, so the large string can get garbage collected once we remove other usages.
mishaz [Tue, 23 Dec 2014 19:33:07 +0000 (19:33 +0000)]
Long overdue checkin of data manager. Current code runs, but uses way too much memory and eventually crashes. This checkin includes heap profiling to track down memory usage.
mishaz [Sat, 22 Nov 2014 00:57:40 +0000 (00:57 +0000)]
Added reporting of disk usage. This is the Collection Storage of each user as described here: https://arvados.org/projects/arvados/wiki/Data_Manager_Design_Doc#Reports-Produced
However, it does not include the size of projects owned by the user (projects and subprojects are each reported as their own users).
Tom Clegg [Fri, 13 Feb 2015 00:04:01 +0000 (19:04 -0500)]
5011: Fix unreliable test.
The collection writer was (sometimes) consuming the last 200 response
even though it could write 3 copies without it. This shouldn't fail
the test: the only reason we count the PUT calls is to verify all
three of the 200 responses were consumed (i.e., none of the 500
responses were counted toward the achieved replication level). To
verify this without being sensitive to extra requests, we simply
arrange for the three 200 responses to be the last ones available.
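The arrangement can be sketched with a fake service that replays canned statuses. This is a simplified stand-in, not the SDK test itself: `FakeKeepService` and `write_copies` are hypothetical names, and the writer here is sequential.

```python
class FakeKeepService:
    """Replays a fixed list of HTTP statuses, one per PUT call."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)
        self.calls = 0

    def put(self):
        self.calls += 1
        return next(self._statuses)


def write_copies(service, want_copies):
    """PUT until want_copies successes or the canned responses run out."""
    copies = 0
    while copies < want_copies:
        try:
            status = service.put()
        except StopIteration:
            break
        # Only 200 responses count toward the replication level.
        if status == 200:
            copies += 1
    return copies


# The three 200 responses are the last ones available, so reaching
# three copies implies every 200 was consumed and no 500 was counted.
service = FakeKeepService([500, 500, 200, 200, 200])
achieved = write_copies(service, want_copies=3)
```

Because no responses remain after the last 200, the check on the achieved replication level does not depend on whether the writer would have issued an extra request.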