mishaz [Tue, 10 Feb 2015 02:39:39 +0000 (02:39 +0000)]
Added different event types for started, partially complete and final log entries.
mishaz [Tue, 10 Feb 2015 02:11:34 +0000 (02:11 +0000)]
Moved some logging code from datamanager to loggerutil.
mishaz [Tue, 10 Feb 2015 01:55:37 +0000 (01:55 +0000)]
Updated logger to do all work in a dedicated goroutine, so we don't need to worry about locking. Small changes to calling code.
mishaz [Fri, 30 Jan 2015 01:32:31 +0000 (01:32 +0000)]
Renamed timestamp fields to begin with "time_"
mishaz [Fri, 30 Jan 2015 01:25:11 +0000 (01:25 +0000)]
Now fetch Keep Server Status and record it to the log. Renamed some fields and added a comment for a potential improvement to decrease lock contention.
mishaz [Tue, 27 Jan 2015 01:27:37 +0000 (01:27 +0000)]
Improved error message to make it clear what's a size and what's a timestamp.
mishaz [Tue, 27 Jan 2015 01:06:21 +0000 (01:06 +0000)]
Renamed BlockDigest's ToString() to String() to implement the fmt.Stringer interface so that we get more readable error messages when structs contain BlockDigests.
mishaz [Sat, 24 Jan 2015 02:22:01 +0000 (02:22 +0000)]
A bunch of changes, most in response to Peter's review.
Logger:
Edit() and Record() have been replaced with the single Update() method which takes a function as input (suggested by Tom).
lastWrite replaced by nextWriteAllowed, for cleaner logic
Added writeScheduled to reduce the number of writes scheduled and attempted, thereby reducing lock contention
Added sanity-checking of params
A bunch of overdue cleanup
Update documentation to reflect the above changes
Manifest:
Renamed ManifestLine to ManifestStream
Util:
Deleted a lot of crap that proved less useful than I thought.
Moved collection.NumberCollectionsAvailable() to util.NumberItemsAvailable() and made it more generic.
collection:
Just cleanup in response to changes in above packages.
keep:
Switched Mtime from int to int64 to avoid y2038 problems.
Switched approach for avoiding keep proxy from using "accessible" to filtering on service_type = disk.
Cleanup in response to changes in above packages.
loggerutil:
Cleanup in response to changes in logger.
mishaz [Wed, 21 Jan 2015 01:34:29 +0000 (01:34 +0000)]
Added comment, ran gofmt.
mishaz [Wed, 21 Jan 2015 01:31:17 +0000 (01:31 +0000)]
Added Logger.MutateLog() on Tom's suggestion. Tried it out in one instance to make sure it works.
mishaz [Wed, 14 Jan 2015 01:40:50 +0000 (01:40 +0000)]
Finished adding logging to keep.GetServerContents but have not tested fully yet.
mishaz [Wed, 14 Jan 2015 00:50:21 +0000 (00:50 +0000)]
ran gofmt
mishaz [Wed, 14 Jan 2015 00:49:31 +0000 (00:49 +0000)]
Broke keep.GetServerContents() into smaller functions.
mishaz [Tue, 13 Jan 2015 23:56:05 +0000 (23:56 +0000)]
Removed channel awareness from keep.GetServerContents().
mishaz [Tue, 13 Jan 2015 23:27:05 +0000 (23:27 +0000)]
gofmt'd all my source code. No other changes.
mishaz [Tue, 13 Jan 2015 01:15:34 +0000 (01:15 +0000)]
Started logging in keep.go. More work to be done.
mishaz [Tue, 13 Jan 2015 00:49:53 +0000 (00:49 +0000)]
Created loggerutil to hold common datamanager logger code. Moved FatalWithMessage to it.
mishaz [Tue, 13 Jan 2015 00:23:17 +0000 (00:23 +0000)]
Moved some logic from datamanager.go to keep.go.
mishaz [Mon, 12 Jan 2015 23:20:50 +0000 (23:20 +0000)]
Started reading collections and keep data in parallel. Moved some logic from datamanager.go to collections.go. Added logging to end of run.
mishaz [Sun, 11 Jan 2015 18:29:13 +0000 (18:29 +0000)]
Merge branch 'master' of git.curoverse.com:arvados into 3408-production-datamanager
Conflicts resolved:
services/api/Gemfile
services/api/Gemfile.lock
mishaz [Fri, 9 Jan 2015 04:53:42 +0000 (04:53 +0000)]
Added recording of fatal errors to logger.
mishaz [Fri, 9 Jan 2015 04:11:29 +0000 (04:11 +0000)]
Added ForceRecord() method to enable writing remaining log changes before exiting.
mishaz [Fri, 9 Jan 2015 04:00:40 +0000 (04:00 +0000)]
Switched Logger edit hooks to write hooks so they'll trigger less often.
mishaz [Thu, 8 Jan 2015 23:22:09 +0000 (23:22 +0000)]
Switched batch size to 50. Added logging of batch size.
mishaz [Thu, 8 Jan 2015 23:17:52 +0000 (23:17 +0000)]
Added memory alloc in use to stats exported to log. Also added EditHooks to Logger, enabling users to add functions to get called on each Edit() call.
mishaz [Thu, 8 Jan 2015 22:35:37 +0000 (22:35 +0000)]
Added structure to data manager log entries, grouping similar fields.
mishaz [Thu, 8 Jan 2015 22:24:00 +0000 (22:24 +0000)]
Added ability to turn off logging by passing an empty string as the event type.
mishaz [Thu, 8 Jan 2015 22:16:49 +0000 (22:16 +0000)]
Started using Logger in data manager and collections.
mishaz [Thu, 8 Jan 2015 21:06:49 +0000 (21:06 +0000)]
Added support for MinimumWriteInterval.
mishaz [Thu, 8 Jan 2015 20:14:10 +0000 (20:14 +0000)]
Fixed bugs in logger, changed interface some, added documentation.
Still need to add support for MinimumWriteInterval.
mishaz [Thu, 8 Jan 2015 01:47:51 +0000 (01:47 +0000)]
Added logger to write log messages that grow over time. Not working yet.
mishaz [Wed, 7 Jan 2015 04:16:40 +0000 (04:16 +0000)]
Started focusing on Keep Server responses again. Switched to using blockdigest instead of strings. Added per block info so that we can track block replication across servers.
mishaz [Wed, 7 Jan 2015 01:45:55 +0000 (01:45 +0000)]
Fixed heap profile writing so that we overwrite previous heap profiles rather than adding to them. Minor cleanup too.
mishaz [Wed, 24 Dec 2014 20:26:38 +0000 (20:26 +0000)]
Added string copying to try to reduce memory usage, but it didn't seem to work. Cleaned up logging (and logging logic) so that we only see one line per batch.
mishaz [Wed, 24 Dec 2014 19:29:08 +0000 (19:29 +0000)]
Started parsing modification date as a timestamp instead of leaving it as a string.
mishaz [Wed, 24 Dec 2014 01:36:43 +0000 (01:36 +0000)]
Switched from strings to BlockDigests to hold block digests more efficiently. Started clearing out manifest text once we finished with it. Made profiling conditional on flag (before it crashed if not provided). Added final heap profile once collections were finished.
Runs to completion!
mishaz [Wed, 24 Dec 2014 00:24:07 +0000 (00:24 +0000)]
Changes to manifest that I forgot to add to previous checkin.
mishaz [Tue, 23 Dec 2014 23:55:12 +0000 (23:55 +0000)]
Added blockdigest class to store digests more efficiently. This has the nice side effect of reducing how many string slices we use from the SDK, so the large string can get garbage collected once we remove other usages.
mishaz [Tue, 23 Dec 2014 19:33:07 +0000 (19:33 +0000)]
Long overdue checkin of data manager. Current code runs, but uses way too much memory and eventually crashes. This checkin includes heap profiling to track down memory usage.
mishaz [Sat, 22 Nov 2014 00:57:40 +0000 (00:57 +0000)]
Added reporting of disk usage. This is the Collection Storage of each user as described here: https://arvados.org/projects/arvados/wiki/Data_Manager_Design_Doc#Reports-Produced
But it does not include the size of projects owned by the user (projects and subprojects are each reported as their own users)
Report is just logged to screen for now.
mishaz [Thu, 16 Oct 2014 20:57:06 +0000 (20:57 +0000)]
Started reading index from keep servers.
Added lots of code to handle unexpected results from keep server.
mishaz [Wed, 15 Oct 2014 20:53:53 +0000 (20:53 +0000)]
Started reading response from keep server.
Tom Clegg [Fri, 13 Feb 2015 21:22:55 +0000 (16:22 -0500)]
Merge branch 'master' of git.curoverse.com:arvados into 3408-production-datamanager
Tom Clegg [Fri, 9 Jan 2015 22:46:15 +0000 (17:46 -0500)]
Add a magic pseudoclass to body, instead of appending a magic div. Selenium seems to like this better. refs #3021
Tom Clegg [Fri, 9 Jan 2015 22:45:23 +0000 (17:45 -0500)]
Diagnostics really do need selenium. refs #3021
Tom Clegg [Fri, 9 Jan 2015 22:09:28 +0000 (17:09 -0500)]
Make angular shim minify-safe. No issue #
Tom Clegg [Thu, 8 Jan 2015 21:22:40 +0000 (16:22 -0500)]
3021: Fix phantomjs races by waiting for pages to appear. refs #3021
Tom Clegg [Thu, 8 Jan 2015 21:04:01 +0000 (16:04 -0500)]
Merge branch '3408-go-sdk-api-errors' refs #3408
Tom Clegg [Thu, 8 Jan 2015 20:50:42 +0000 (15:50 -0500)]
3408: Propagate API error messages to caller.
Peter Amstutz [Thu, 8 Jan 2015 18:49:05 +0000 (13:49 -0500)]
Merge branch '4312-crunch-report-sdk-version' closes #4312
Peter Amstutz [Thu, 8 Jan 2015 18:48:42 +0000 (13:48 -0500)]
4312: Remove redundant parenthesis.
Peter Amstutz [Wed, 7 Jan 2015 21:32:23 +0000 (16:32 -0500)]
4312: Fix dpkg search to use dpkg-query.
Peter Amstutz [Wed, 7 Jan 2015 19:51:16 +0000 (14:51 -0500)]
4312: Call virtualenv pip directly instead of using activate.
Peter Amstutz [Wed, 7 Jan 2015 19:38:41 +0000 (14:38 -0500)]
4312: Use "install" phase of bootstrap script to report the installed versions
of any arvados pip or debian packages. Like virtualenv logic, only reports for
task 0 (since every task starts the same image).
Tom Clegg [Wed, 7 Jan 2015 17:14:42 +0000 (12:14 -0500)]
Merge branch '3021-more-phantomjs' refs #3021
Tom Clegg [Wed, 7 Jan 2015 17:13:51 +0000 (12:13 -0500)]
3021: Use selenium to land on #Advanced tab.
Tom Clegg [Wed, 7 Jan 2015 17:14:21 +0000 (12:14 -0500)]
3021: Merge branch 'master' into 3021-more-phantomjs
Tom Clegg [Wed, 7 Jan 2015 15:01:32 +0000 (10:01 -0500)]
3021: Add random part to magic string.
Tim Pierce [Wed, 7 Jan 2015 14:46:10 +0000 (09:46 -0500)]
Merge branch '4598-crunch-failure-stats'
Fixes #4598.
Tim Pierce [Wed, 7 Jan 2015 14:45:06 +0000 (09:45 -0500)]
4598: actually rename this time
PEBCAK failure led to deleting the file without staging the new one. d'oh.
Tim Pierce [Wed, 7 Jan 2015 14:43:56 +0000 (09:43 -0500)]
4598: rename script
Renamed crunch-failure-report.py to crunch_failure_report.py to permit
importing (and eventually testing).
Tom Clegg [Tue, 6 Jan 2015 22:48:42 +0000 (17:48 -0500)]
3021: Wait for shown.bs.modal before trying to click buttons in the modal.
Remove a stray Headless.new.start.
Tim Pierce [Tue, 6 Jan 2015 21:21:10 +0000 (21:21 +0000)]
4598: catch exceptions more aggressively when looking up pipeline names
Added exception handling for cases where:
* job is not recorded as belonging to any pipeline instance
* pipeline instance has no pipeline template
Peter Amstutz [Tue, 6 Jan 2015 19:06:28 +0000 (14:06 -0500)]
Merge branch '4570-multi-auth-method' refs #4570
Tom Clegg [Tue, 6 Jan 2015 18:52:16 +0000 (13:52 -0500)]
3021: Fix assertion broken in 9c10212.
Peter Amstutz [Tue, 6 Jan 2015 18:24:06 +0000 (13:24 -0500)]
4570: Fix tabs, CSS on log in button.
Tom Clegg [Tue, 6 Jan 2015 17:26:49 +0000 (12:26 -0500)]
3021: Wait for dialog to close before asserting page transition.
Brett Smith [Tue, 6 Jan 2015 17:12:27 +0000 (12:12 -0500)]
Merge branch '4836-first-tab-load-wip'
Closes #4836, #4870.
Brett Smith [Fri, 19 Dec 2014 22:40:13 +0000 (17:40 -0500)]
4836: Trigger Workbench infinite scroll load on tab show.
If an infinite scroller is in the first tab of a show page, but the
user is going to a different tab, we'll queue up the first event
to load data for the container, but when it fires the container won't
be visible so it will decline to load anything. Then you can only get
data to load if you resize the window.
Fire a scroll event when a new tab is shown, to spur the infinite
scroller to load data as appropriate.
Tim Pierce [Tue, 6 Jan 2015 16:03:10 +0000 (11:03 -0500)]
4598: account for queued and cancelled jobs, fix sorting
Per code review:
* Updated report to include job states "Cancelled" and "Queued" as well
as Failed, Running and Complete, and to take these into account when
calculating job counts.
* Fixed sorting for failure classes.
Peter Amstutz [Tue, 6 Jan 2015 13:45:08 +0000 (08:45 -0500)]
Merge branch 'master' into 4570-multi-auth-method
Peter Amstutz [Tue, 6 Jan 2015 13:44:49 +0000 (08:44 -0500)]
4570: Revert to links on log in page instead of form. Fixup documentation to
describe a production setup.
Tom Clegg [Tue, 6 Jan 2015 06:02:06 +0000 (01:02 -0500)]
3021: Use headless helper in performance and diagnostics tests, too.
Tom Clegg [Tue, 6 Jan 2015 05:59:02 +0000 (00:59 -0500)]
3021: 4399: Refactor headless stuff into a module. Clear up new/start/stop use.
* Create one Headless per test process, when encountering the first
test case that needs one.
* Call headless.start & stop exactly once for each test case that uses
it.
Tim Pierce [Mon, 5 Jan 2015 19:22:47 +0000 (14:22 -0500)]
4598: formatting and calculation fixes (code review)
Incorporating code review feedback from #4598-13.
Bugs fixed:
* Correct counting and percentage calculation of job failures.
** Jobs were getting categorized as both "unknown" and as a specific failure type.
* Crashes fixed: should not raise any unhandled exceptions.
Formatting fixes:
* Itemized failures are now sorted in descending order by failure type
* Better horizontal alignment
* Modified formatting to account for updated description.
Peter Amstutz [Mon, 5 Jan 2015 16:37:54 +0000 (11:37 -0500)]
Merge branch '4869-keepalive' refs #4869
Peter Amstutz [Mon, 5 Jan 2015 15:17:42 +0000 (10:17 -0500)]
4869: Client.Timeout and Client.Transport are now correctly set in
DiscoverKeepServers(). Improved comments.
Tom Clegg [Sun, 4 Jan 2015 08:17:45 +0000 (03:17 -0500)]
3021: Clean up headless/selenium/javascript choices.
Tom Clegg [Sat, 3 Jan 2015 06:52:34 +0000 (01:52 -0500)]
3021: Skip angular init if angular is not loaded.
Tom Clegg [Sat, 3 Jan 2015 03:20:59 +0000 (22:20 -0500)]
3021: Silence "invalid regexp" errors while typing regexp; put input in "has-error" state instead.
Tom Clegg [Fri, 2 Jan 2015 22:38:34 +0000 (17:38 -0500)]
3021: 4399: Convert some tests from selenium to phantomjs. Restart Headless less.
Tom Clegg [Wed, 31 Dec 2014 21:33:57 +0000 (16:33 -0500)]
Remove cruft. No issue #
Ward Vandewege [Wed, 31 Dec 2014 15:01:59 +0000 (10:01 -0500)]
Merge branch '4887-invalidate-duplicate-ip-on-old-compute-nodes'
closes #4887
Ward Vandewege [Wed, 31 Dec 2014 15:01:30 +0000 (10:01 -0500)]
Merge branch 'master' into 4887-invalidate-duplicate-ip-on-old-compute-nodes
Ward Vandewege [Wed, 31 Dec 2014 15:00:21 +0000 (10:00 -0500)]
Address review comments:
* change stale_conflicting_nodes to a local variable
* minor performance optimization: add an additional check for ip_address being nil
refs #4887
Tim Pierce [Tue, 30 Dec 2014 21:50:04 +0000 (16:50 -0500)]
Merge branch '4877-dont-delete-stdout'
Fixes #4877
Tim Pierce [Tue, 30 Dec 2014 21:45:42 +0000 (16:45 -0500)]
4877: don't delete /dev/stdout
Fixed the filename check before trying to delete /dev/stdout.
Tim Pierce [Tue, 30 Dec 2014 21:07:02 +0000 (16:07 -0500)]
4598: added failure types and short names
Added the sys/docker failure type. Failures now reported by short
failure name rather than by regex.
Tim Pierce [Tue, 30 Dec 2014 19:48:59 +0000 (14:48 -0500)]
4598: remove more dev/debugging features.
Tim Pierce [Tue, 30 Dec 2014 19:47:31 +0000 (14:47 -0500)]
4598: take out some debug reporting and --match option
Remove debugging features.
Tim Pierce [Tue, 30 Dec 2014 19:42:28 +0000 (14:42 -0500)]
4598: fetch logs from Keep, more failure reporting
Per standup review: fetch logs with a CollectionReader on the log
collection uuid, rather than fetching log records from the API server.
Perform full failure reporting including job URL details.
Ward Vandewege [Tue, 30 Dec 2014 19:31:53 +0000 (14:31 -0500)]
Detect stale compute node records with the same IP address as the new
node on its first ping. Clear the ip_address field on the stale nodes.
Refs #4887
Ward Vandewege [Tue, 30 Dec 2014 18:28:57 +0000 (13:28 -0500)]
Cleanups:
* Remove old commented out code
* Remove superfluous test for presence of file on disk
refs #4887
Tim Pierce [Tue, 30 Dec 2014 16:00:51 +0000 (11:00 -0500)]
4598: bug fixes, added full stats collection
Added code to report full stats on failed, successful, and incomplete
jobs. Perform basic reporting on failed job causes (not yet working).
Peter Amstutz [Tue, 30 Dec 2014 15:39:50 +0000 (10:39 -0500)]
4869: Enable TCP keepalive and adjust connection timeouts to Keep client.
Tom Clegg [Mon, 29 Dec 2014 22:02:01 +0000 (17:02 -0500)]
Fix whitespace, cf. gofmt. refs #4875
Tom Clegg [Mon, 29 Dec 2014 21:59:35 +0000 (16:59 -0500)]
Merge branch '4875-keepclient-test-race' closes #4875
Tom Clegg [Mon, 29 Dec 2014 21:29:17 +0000 (16:29 -0500)]
4875: Merge branch 'master' into 4875-keepclient-test-race
Conflicts:
sdk/go/keepclient/keepclient_test.go
Tom Clegg [Mon, 29 Dec 2014 20:45:30 +0000 (15:45 -0500)]
Fix version strings to comply with PEP-440. No issue #
Tom Clegg [Mon, 29 Dec 2014 20:12:46 +0000 (15:12 -0500)]
Merge branch '4523-owner_uuid-index' refs #4523