Added logger util GetOrCreateMap() and started using it everywhere.
Moved logging of summary info into ReadCollections.Summarize().
Renamed ComputeBlockReplicationCounts(readServers *ReadServers) to ReadServers.Summarize() and moved logging into method.
Split SummarizeReplication() into BucketReplication() and ComputeCounts().
Some general go-fmting, cleanup and documentation.
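For illustration only, a get-or-create helper for nested log maps might look like the sketch below. The signature is an assumption (the log above only names GetOrCreateMap()); the actual logger util may differ.

```go
package main

import "fmt"

// GetOrCreateMap returns the map stored under key in parent, creating and
// storing an empty one if the key is missing. Assumed signature, not taken
// from the commit. Note that a non-map value under key is overwritten.
func GetOrCreateMap(parent map[string]interface{}, key string) map[string]interface{} {
	if child, ok := parent[key].(map[string]interface{}); ok {
		return child
	}
	child := make(map[string]interface{})
	parent[key] = child
	return child
}

func main() {
	root := make(map[string]interface{})
	// First call creates the nested map, second call returns the same one.
	GetOrCreateMap(root, "run_info")["pid"] = 1234
	GetOrCreateMap(root, "run_info")["host"] = "example"
	fmt.Println(len(root), len(GetOrCreateMap(root, "run_info")))
}
```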
mishaz [Sat, 28 Feb 2015 02:36:46 +0000 (02:36 +0000)]
Added lots of unit tests.
Switched collections.ReadCollections.BlockToCollectionIndex to collections.ReadCollections.BlockToCollectionIndices since a block can belong to more than one collection.
Made collection.Summarize a method of ReadCollections.
Made a couple of testing libraries in blockdigest and collection.
Moved collection.MakeBlockDigest() to blockdigest.MakeTestBlockDigest() (in the testing library).
Created collection.MakeTestReadCollections to simplify writing and reading tests (in the testing library).
Created the BlockSet and CollectionIndexSet types to hide some of the awkwardness of using maps as sets and added FromSlice methods.
Moved functions for reading and writing data locally to separate file.
Created separate ReplicationSummaryCounts struct and PrettyPrint method.
Added stats for Collections in addition to Blocks.
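As a rough sketch of the set types mentioned above: a map with empty-struct values can be wrapped in a named type so callers do not deal with the map directly. The element type here is a plain string stand-in for blockdigest.BlockDigest, and Insert and Contains are hypothetical helper names; only BlockSet and FromSlice are named in the commit.

```go
package main

import "fmt"

// BlockDigest is a stand-in for blockdigest.BlockDigest so that the
// sketch stays self-contained.
type BlockDigest string

// BlockSet uses a map with empty-struct values as a set of digests.
type BlockSet map[BlockDigest]struct{}

// Insert adds a digest to the set (hypothetical helper name).
func (bs BlockSet) Insert(digest BlockDigest) {
	bs[digest] = struct{}{}
}

// FromSlice adds every element of digests to the set.
func (bs BlockSet) FromSlice(digests []BlockDigest) {
	for _, d := range digests {
		bs.Insert(d)
	}
}

// Contains reports whether digest is in the set (hypothetical helper name).
func (bs BlockSet) Contains(digest BlockDigest) bool {
	_, ok := bs[digest]
	return ok
}

func main() {
	bs := make(BlockSet)
	bs.FromSlice([]BlockDigest{"a", "b", "a"}) // the duplicate "a" collapses
	fmt.Println(len(bs), bs.Contains("b"), bs.Contains("c"))
}
```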
mishaz [Mon, 23 Feb 2015 23:14:17 +0000 (23:14 +0000)]
Added block to collection index map. Started using collection index to save memory over using long uuid strings to identify collections.
Started running summarize() methods after reading data from disk so that we can modify the summarization code and test it without network i/o.
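The memory saving from collection indices can be sketched as follows: each uuid string is stored once and every other structure refers to it by a small integer. All names here (collectionIndexer, IndexFor, UUIDFor) are hypothetical, and the uuids are made-up placeholders.

```go
package main

import "fmt"

// collectionIndexer assigns a small integer to each collection uuid, so
// other maps can hold ints instead of repeating the uuid strings.
type collectionIndexer struct {
	uuidToIndex map[string]int
	indexToUUID []string
}

func newCollectionIndexer() *collectionIndexer {
	return &collectionIndexer{uuidToIndex: make(map[string]int)}
}

// IndexFor returns the index for uuid, assigning the next free one on
// first use.
func (c *collectionIndexer) IndexFor(uuid string) int {
	if i, ok := c.uuidToIndex[uuid]; ok {
		return i
	}
	i := len(c.indexToUUID)
	c.uuidToIndex[uuid] = i
	c.indexToUUID = append(c.indexToUUID, uuid)
	return i
}

// UUIDFor maps an index back to its uuid.
func (c *collectionIndexer) UUIDFor(index int) string {
	return c.indexToUUID[index]
}

func main() {
	indexer := newCollectionIndexer()
	// Block digest (as a string here) to collection index.
	blockToCollectionIndex := map[string]int{
		"block1": indexer.IndexFor("placeholder-uuid-1"),
		"block2": indexer.IndexFor("placeholder-uuid-2"),
		"block3": indexer.IndexFor("placeholder-uuid-1"),
	}
	fmt.Println(indexer.UUIDFor(blockToCollectionIndex["block3"]))
}
```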
mishaz [Mon, 23 Feb 2015 21:41:19 +0000 (21:41 +0000)]
Added flags to write network data and then read it back. This is useful to speed up development, but should not be used in production since data will be stale.
Unfortunately had to switch the H and L fields in blockdigest from private to public, otherwise they would not be included when the data is written out.
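The effect of field visibility on encoding can be seen in a small sketch. The commit does not say which encoder is used, so encoding/json stands in here purely as an illustration; the type names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// privateDigest has unexported fields, which encoding/json skips.
type privateDigest struct {
	h uint64
	l uint64
}

// PublicDigest has exported fields, which are written out.
type PublicDigest struct {
	H uint64
	L uint64
}

// encode marshals v to JSON and returns it as a string.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(privateDigest{h: 1, l: 2})) // prints {}
	fmt.Println(encode(PublicDigest{H: 1, L: 2}))  // prints {"H":1,"L":2}
}
```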
Tom Clegg [Sat, 14 Feb 2015 21:54:00 +0000 (16:54 -0500)]
Ensure result order is predictable, even if client-provided orders do not specify a complete ordering.
Fixes intermittent test failures. Example (from
https://ci.curoverse.com/job/arvados-api-server/1305/console):
GroupsTest#test_get_all_pages_of_group-owned_objects [/data/1/jenkins/workspace/arvados-api-server/services/api/test/integration/groups_test.rb:31]:
Received 'zzzzz-4zz18-fy296fx3hot09f7' again on page 3.
<nil> expected but was
<true>.
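The idea behind the fix can be illustrated in Go (the API server itself is not written in Go, and this is not its code): when the client-provided order leaves ties, a unique key is used as a final tie-breaker so that every page boundary falls in the same place on each request. The item type and sortForPaging name are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// item is a hypothetical record with a unique UUID and a non-unique Name.
type item struct {
	UUID string
	Name string
}

// sortForPaging orders by the client-provided key (Name) and breaks ties
// on UUID, which makes the ordering complete.
func sortForPaging(items []item) {
	sort.Slice(items, func(i, j int) bool {
		if items[i].Name != items[j].Name {
			return items[i].Name < items[j].Name
		}
		return items[i].UUID < items[j].UUID
	})
}

func main() {
	items := []item{{"c", "x"}, {"a", "x"}, {"b", "w"}}
	sortForPaging(items)
	for _, it := range items {
		fmt.Println(it.Name, it.UUID)
	}
}
```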
mishaz [Fri, 30 Jan 2015 01:25:11 +0000 (01:25 +0000)]
Now fetch Keep Server Status and record it to the log. Renamed some fields and added a comment for a potential improvement to decrease lock contention.
mishaz [Tue, 27 Jan 2015 01:06:21 +0000 (01:06 +0000)]
Renamed BlockDigest's ToString() to String() to implement the fmt.Stringer interface so that we get more readable error messages when structs contain BlockDigests.
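A minimal sketch of the rename's effect, assuming BlockDigest holds two uint64 fields named H and L and prints as 32 hex characters (the exact format is an assumption). The blockInfo struct is hypothetical.

```go
package main

import "fmt"

// BlockDigest is assumed to hold a 128-bit digest as two uint64 halves.
type BlockDigest struct {
	H uint64
	L uint64
}

// String implements fmt.Stringer, so fmt uses it for %v and %s.
func (d BlockDigest) String() string {
	return fmt.Sprintf("%016x%016x", d.H, d.L)
}

// blockInfo is a hypothetical struct that embeds a digest as a field.
type blockInfo struct {
	Digest BlockDigest
	Size   int
}

func main() {
	info := blockInfo{Digest: BlockDigest{H: 0x1, L: 0xabc}, Size: 3}
	// The Digest field is printed via String() rather than as two raw integers.
	fmt.Printf("%v\n", info)
}
```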
mishaz [Sat, 24 Jan 2015 02:22:01 +0000 (02:22 +0000)]
A bunch of changes, most in response to Peter's review.
Logger:
Edit() and Record() have been replaced with the single Update() method which takes a function as input (suggested by Tom).
lastWrite replaced by nextWriteAllowed, for cleaner logic
Added writeScheduled to reduce the number of writes scheduled and attempted, thereby reducing lock contention
Added sanity-checking of params
A bunch of overdue cleanup
Update documentation to reflect the above changes
Manifest:
Renamed ManifestLine to ManifestStream
Util:
Deleted a lot of crap that proved less useful than I thought.
Moved collection.NumberCollectionsAvailable() to util.NumberItemsAvailable() and made it more generic.
collection:
Just cleanup in response to changes in above packages.
keep:
Switched Mtime from int to int64 to avoid y2038 problems.
Switched the approach for avoiding the keep proxy: instead of using "accessible", we now filter on service_type = disk.
Cleanup in response to changes in above packages.
loggerutil:
Cleanup in response to changes in logger.
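The single Update() method described under Logger above could be sketched roughly like this. The signature and field names are assumptions, and the write scheduling (nextWriteAllowed, writeScheduled) is left out; the point is only that the caller's function runs while the logger holds its lock.

```go
package main

import (
	"fmt"
	"sync"
)

// Logger guards a property map with a mutex (hypothetical layout).
type Logger struct {
	mu         sync.Mutex
	properties map[string]interface{}
}

// NewLogger returns a Logger with an empty property map.
func NewLogger() *Logger {
	return &Logger{properties: make(map[string]interface{})}
}

// Update runs mutator while holding the lock, so callers never touch the
// map without synchronization. Assumed signature.
func (l *Logger) Update(mutator func(properties map[string]interface{})) {
	l.mu.Lock()
	defer l.mu.Unlock()
	mutator(l.properties)
}

// Snapshot returns a shallow copy of the properties (hypothetical helper).
func (l *Logger) Snapshot() map[string]interface{} {
	l.mu.Lock()
	defer l.mu.Unlock()
	out := make(map[string]interface{}, len(l.properties))
	for k, v := range l.properties {
		out[k] = v
	}
	return out
}

func main() {
	logger := NewLogger()
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			logger.Update(func(p map[string]interface{}) {
				n, _ := p["count"].(int) // zero if not yet set
				p["count"] = n + 1
			})
		}()
	}
	wg.Wait()
	fmt.Println(logger.Snapshot()["count"])
}
```

Taking a function as input replaces the separate edit and record steps with one call, so a caller cannot modify the log without also going through the lock.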
mishaz [Wed, 7 Jan 2015 04:16:40 +0000 (04:16 +0000)]
Started focusing on Keep Server responses again. Switched to using blockdigest instead of strings. Added per block info so that we can track block replication across servers.
mishaz [Wed, 24 Dec 2014 20:26:38 +0000 (20:26 +0000)]
Added string copying to try to reduce memory usage, but it didn't seem to work. Cleaned up logging (and logging logic) so that we only see one line per batch.
mishaz [Wed, 24 Dec 2014 01:36:43 +0000 (01:36 +0000)]
Switched from strings to BlockDigests to hold block digests more efficiently. Started clearing out manifest text once we finished with it. Made profiling conditional on a flag (previously it crashed if the flag was not provided). Added final heap profile once collections were finished.
mishaz [Tue, 23 Dec 2014 23:55:12 +0000 (23:55 +0000)]
Added blockdigest package to store digests more efficiently. This has the nice side effect of reducing how many string slices we use from the SDK, so the large string can get garbage collected once we remove other usages.
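To illustrate why this helps: a 32-character hex digest parsed into two uint64 values takes 16 bytes and no longer references the string it was sliced from. The sketch below assumes that layout; the FromString name and the H and L fields are illustrative, not confirmed by this commit.

```go
package main

import (
	"fmt"
	"strconv"
)

// BlockDigest holds a 128-bit digest as two uint64 halves (assumed layout).
type BlockDigest struct {
	H uint64
	L uint64
}

// FromString parses a 32-character hex digest. The result holds no
// reference to s, so s (and any larger string it was sliced from) can be
// garbage collected once nothing else refers to it.
func FromString(s string) (BlockDigest, error) {
	if len(s) != 32 {
		return BlockDigest{}, fmt.Errorf("expected 32 hex characters, got %d", len(s))
	}
	h, err := strconv.ParseUint(s[:16], 16, 64)
	if err != nil {
		return BlockDigest{}, err
	}
	l, err := strconv.ParseUint(s[16:], 16, 64)
	if err != nil {
		return BlockDigest{}, err
	}
	return BlockDigest{H: h, L: l}, nil
}

func main() {
	// MD5 of the empty string, used as a well-known sample digest.
	d, err := FromString("d41d8cd98f00b204e9800998ecf8427e")
	fmt.Println(d.H == 0xd41d8cd98f00b204, d.L == 0xe9800998ecf8427e, err)
}
```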
mishaz [Tue, 23 Dec 2014 19:33:07 +0000 (19:33 +0000)]
Long overdue checkin of data manager. Current code runs, but uses way too much memory and eventually crashes. This checkin includes heap profiling to track down memory usage.