arvados.git
Peter Amstutz [Fri, 16 Jan 2015 19:05:48 +0000 (14:05 -0500)]
Add 'apps/arv-web/' from commit 'f9732ad8460d013c2f28363655d0d1b91894dca5'

git-subtree-dir: apps/arv-web
git-subtree-mainline: b97ac7f96234cbbb491bdbaade840ab50802f357
git-subtree-split: f9732ad8460d013c2f28363655d0d1b91894dca5

Peter Amstutz [Fri, 16 Jan 2015 19:05:33 +0000 (14:05 -0500)]
4904: Rename to arv-web.py to reflect a more general purpose tool.

Peter Amstutz [Fri, 16 Jan 2015 18:54:42 +0000 (13:54 -0500)]
4904: CGI sample works.

Peter Amstutz [Fri, 16 Jan 2015 18:30:13 +0000 (13:30 -0500)]
4838: Add --set-executable-bit option to make all files from mounted collections be executable.

Peter Amstutz [Fri, 16 Jan 2015 17:02:55 +0000 (12:02 -0500)]
Configure dockerfile with passenger.  Add sample applications.

Peter Amstutz [Wed, 14 Jan 2015 21:21:52 +0000 (16:21 -0500)]
4904: Renamed "runit.py" to "arv-web-example.py"

Peter Amstutz [Wed, 14 Jan 2015 21:14:05 +0000 (16:14 -0500)]
4904: Fixed event listening.  Terminates properly on signals.  Tested and works now.

Peter Amstutz [Wed, 14 Jan 2015 20:21:04 +0000 (15:21 -0500)]
4904: Chooses most recently modified collection and runs web service on it.

Peter Amstutz [Wed, 14 Jan 2015 19:59:37 +0000 (14:59 -0500)]
4904: Set up fuse, set up event bus, run docker

Peter Amstutz [Tue, 13 Jan 2015 20:15:56 +0000 (15:15 -0500)]
Merge branch '4968-refresh-project-dir' closes #4968

Peter Amstutz [Mon, 12 Jan 2015 21:56:09 +0000 (16:56 -0500)]
Explicitly specify ruby 2.1 in "rvm alias" no issue #

Peter Amstutz [Fri, 9 Jan 2015 02:26:32 +0000 (21:26 -0500)]
4968: Fix polling refresh on project directories

Peter Amstutz [Mon, 12 Jan 2015 18:39:34 +0000 (13:39 -0500)]
Merge branch '4924-arv-edit-error-handling' closes #4924

Peter Amstutz [Mon, 12 Jan 2015 18:38:45 +0000 (13:38 -0500)]
4924: Collapse JSON rescue blocks to reduce duplicate code.

Peter Amstutz [Mon, 12 Jan 2015 17:23:40 +0000 (12:23 -0500)]
4924: Update prints uuid from results, not the uuid originally specified.
Catch Oj::ParseError as well as JSON::ParserError.  Titleize only HTTP status
messages, not every error message.

Peter Amstutz [Mon, 12 Jan 2015 15:33:36 +0000 (10:33 -0500)]
4924: Rename HTTPResponse to ArvadosAPIError.  Use NET::HTTP Response titles
for error codes if no other error is available.

Tom Clegg [Fri, 9 Jan 2015 22:46:15 +0000 (17:46 -0500)]
Add a magic pseudoclass to body, instead of appending a magic div. Selenium seems to like this better. refs #3021

Tom Clegg [Fri, 9 Jan 2015 22:45:23 +0000 (17:45 -0500)]
Diagnostics really do need selenium. refs #3021

Tom Clegg [Fri, 9 Jan 2015 22:09:28 +0000 (17:09 -0500)]
Make angular shim minify-safe. No issue #

Peter Amstutz [Fri, 9 Jan 2015 20:37:42 +0000 (15:37 -0500)]
4924: Distinguish between errors the user can do something about (syntax errors
or well-formed server errors) and errors that the user probably can't recover
from (everything else.)  Prints "Updated object" on success.

Tom Clegg [Thu, 8 Jan 2015 21:22:40 +0000 (16:22 -0500)]
3021: Fix phantomjs races by waiting for pages to appear. refs #3021

Tom Clegg [Thu, 8 Jan 2015 21:04:01 +0000 (16:04 -0500)]
Merge branch '3408-go-sdk-api-errors' refs #3408

Tom Clegg [Thu, 8 Jan 2015 20:50:42 +0000 (15:50 -0500)]
3408: Propagate API error messages to caller.

Peter Amstutz [Thu, 8 Jan 2015 18:49:05 +0000 (13:49 -0500)]
Merge branch '4312-crunch-report-sdk-version' closes #4312

Peter Amstutz [Thu, 8 Jan 2015 18:48:42 +0000 (13:48 -0500)]
4312: Remove redundant parenthesis.

Peter Amstutz [Thu, 8 Jan 2015 15:26:35 +0000 (10:26 -0500)]
4924: Refactor arv edit and arv create to improve error handling.

* Error messages are now added in a comment block at the top of the file, and
  the file is re-opened in the user's editor.
* Does not try to update attributes that are not changed.
* Exiting the editor with the file unchanged exits the editing loop.

Peter Amstutz [Wed, 7 Jan 2015 21:32:23 +0000 (16:32 -0500)]
4312: Fix dpkg search to use dpkg-query.

Peter Amstutz [Wed, 7 Jan 2015 19:51:16 +0000 (14:51 -0500)]
4312: Call virtualenv pip directly instead of using activate.

Peter Amstutz [Wed, 7 Jan 2015 19:38:41 +0000 (14:38 -0500)]
4312: Use "install" phase of bootstrap script to report the installed versions
of any arvados pip or debian packages.  Like virtualenv logic, only reports for
task 0 (since every task starts the same image).

Tom Clegg [Wed, 7 Jan 2015 17:14:42 +0000 (12:14 -0500)]
Merge branch '3021-more-phantomjs' refs #3021

Tom Clegg [Wed, 7 Jan 2015 17:13:51 +0000 (12:13 -0500)]
3021: Use selenium to land on #Advanced tab.

Tom Clegg [Wed, 7 Jan 2015 17:14:21 +0000 (12:14 -0500)]
3021: Merge branch 'master' into 3021-more-phantomjs

Tom Clegg [Wed, 7 Jan 2015 15:01:32 +0000 (10:01 -0500)]
3021: Add random part to magic string.

Tim Pierce [Wed, 7 Jan 2015 14:46:10 +0000 (09:46 -0500)]
Merge branch '4598-crunch-failure-stats'

Fixes #4598.

Tim Pierce [Wed, 7 Jan 2015 14:45:06 +0000 (09:45 -0500)]
4598: actually rename this time

PEBCAK failure led to deleting the file without staging the new one. d'oh.

Tim Pierce [Wed, 7 Jan 2015 14:43:56 +0000 (09:43 -0500)]
4598: rename script

Renamed crunch-failure-report.py to crunch_failure_report.py to permit
importing (and eventually testing).
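
Why the rename permits importing can be illustrated with a small check. This is a sketch, not code from the repository: `is_importable_module_name` is a hypothetical helper, and it only tests whether a name is usable in a Python `import` statement.

```python
import keyword


def is_importable_module_name(name):
    """Return True if `name` can appear in an `import` statement.

    Hypothetical helper for illustration: a module name must be a valid
    identifier and must not be a reserved keyword.
    """
    return name.isidentifier() and not keyword.iskeyword(name)


# The hyphen is parsed as a minus sign, so the old name cannot be imported.
print(is_importable_module_name("crunch-failure-report"))
print(is_importable_module_name("crunch_failure_report"))
```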

Tom Clegg [Tue, 6 Jan 2015 22:48:42 +0000 (17:48 -0500)]
3021: Wait for shown.bs.modal before trying to click buttons in the modal.
Remove a stray Headless.new.start.

Tim Pierce [Tue, 6 Jan 2015 21:21:10 +0000 (21:21 +0000)]
4598: catch exceptions more aggressively when looking up pipeline names

Added exception handling for cases where:
* job is not recorded as belonging to any pipeline instance
* pipeline instance has no pipeline template

Peter Amstutz [Tue, 6 Jan 2015 19:06:28 +0000 (14:06 -0500)]
Merge branch '4570-multi-auth-method' refs #4570

Tom Clegg [Tue, 6 Jan 2015 18:52:16 +0000 (13:52 -0500)]
3021: Fix assertion broken in 9c10212.

Peter Amstutz [Tue, 6 Jan 2015 18:24:06 +0000 (13:24 -0500)]
4570: Fix tabs, CSS on log in button.

Tom Clegg [Tue, 6 Jan 2015 17:26:49 +0000 (12:26 -0500)]
3021: Wait for dialog to close before asserting page transition.

Brett Smith [Tue, 6 Jan 2015 17:12:27 +0000 (12:12 -0500)]
Merge branch '4836-first-tab-load-wip'

Closes #4836, #4870.

Brett Smith [Fri, 19 Dec 2014 22:40:13 +0000 (17:40 -0500)]
4836: Trigger Workbench infinite scroll load on tab show.

If an infinite scroller is in the first tab of a show page, but the
user is going to a different tab, we'll queue up the first event
to load data for the container, but when it fires the container won't
be visible so it will decline to load anything.  Then you can only get
data to load if you resize the window.

Fire a scroll event when a new tab is shown, to spur the infinite
scroller to load data as appropriate.

Tim Pierce [Tue, 6 Jan 2015 16:03:10 +0000 (11:03 -0500)]
4598: account for queued and cancelled jobs, fix sorting

Per code review:
* Updated report to include job states "Cancelled" and "Queued" as well
  as Failed, Running and Complete, and to take these into account when
  calculating job counts.
* Fixed sorting for failure classes.

Peter Amstutz [Tue, 6 Jan 2015 13:45:08 +0000 (08:45 -0500)]
Merge branch 'master' into 4570-multi-auth-method

Peter Amstutz [Tue, 6 Jan 2015 13:44:49 +0000 (08:44 -0500)]
4570: Revert to links on log in page instead of form.  Fixup documentation to
describe a production setup.

Tom Clegg [Tue, 6 Jan 2015 06:02:06 +0000 (01:02 -0500)]
3021: Use headless helper in performance and diagnostics tests, too.

Tom Clegg [Tue, 6 Jan 2015 05:59:02 +0000 (00:59 -0500)]
3021: 4399: Refactor headless stuff into a module. Clear up new/start/stop use.

* Create one Headless per test process, when encountering the first
  test case that needs one.

* Call headless.start & stop exactly once for each test case that uses
  it.

Tim Pierce [Mon, 5 Jan 2015 19:22:47 +0000 (14:22 -0500)]
4598: formatting and calculation fixes (code review)

Incorporating code review feedback from #4598-13.

Bugs fixed:
* Correct counting and percentage calculation of job failures.
** Jobs were getting categorized as both "unknown" and as a specific failure type.
* Crashes fixed: should not raise any unhandled exceptions.

Formatting fixes:
* Itemized failures are now sorted in descending order by failure type
* Better horizontal alignment
* Modified formatting to account for updated description.

Peter Amstutz [Mon, 5 Jan 2015 16:37:54 +0000 (11:37 -0500)]
Merge branch '4869-keepalive' refs #4869

Peter Amstutz [Mon, 5 Jan 2015 15:17:42 +0000 (10:17 -0500)]
4869: Client.Timeout and Client.Transport are now correctly set in
DiscoverKeepServers().  Improved comments.

Tom Clegg [Sun, 4 Jan 2015 08:17:45 +0000 (03:17 -0500)]
3021: Clean up headless/selenium/javascript choices.

Tom Clegg [Sat, 3 Jan 2015 06:52:34 +0000 (01:52 -0500)]
3021: Skip angular init if angular is not loaded.

Tom Clegg [Sat, 3 Jan 2015 03:20:59 +0000 (22:20 -0500)]
3021: Silence "invalid regexp" errors while typing regexp; put input in "has-error" state instead.

Tom Clegg [Fri, 2 Jan 2015 22:38:34 +0000 (17:38 -0500)]
3021: 4399: Convert some tests from selenium to phantomjs. Restart Headless less.

Tom Clegg [Wed, 31 Dec 2014 21:33:57 +0000 (16:33 -0500)]
Remove cruft. No issue #

Ward Vandewege [Wed, 31 Dec 2014 15:01:59 +0000 (10:01 -0500)]
Merge branch '4887-invalidate-duplicate-ip-on-old-compute-nodes'

closes #4887

Ward Vandewege [Wed, 31 Dec 2014 15:01:30 +0000 (10:01 -0500)]
Merge branch 'master' into 4887-invalidate-duplicate-ip-on-old-compute-nodes

Ward Vandewege [Wed, 31 Dec 2014 15:00:21 +0000 (10:00 -0500)]
Address review comments:

* change stale_conflicting_nodes to a local variable
* minor performance optimization: add an additional check for ip_address being nil

refs #4887

Tim Pierce [Tue, 30 Dec 2014 21:50:04 +0000 (16:50 -0500)]
Merge branch '4877-dont-delete-stdout'

Fixes #4877

Tim Pierce [Tue, 30 Dec 2014 21:45:42 +0000 (16:45 -0500)]
4877: don't delete /dev/stdout

Fixed the filename check before trying to delete /dev/stdout.

Tim Pierce [Tue, 30 Dec 2014 21:07:02 +0000 (16:07 -0500)]
4598: added failure types and short names

Added the sys/docker failure type. Failures now reported by short
failure name rather than by regex.

Tim Pierce [Tue, 30 Dec 2014 19:48:59 +0000 (14:48 -0500)]
4598: remove more dev/debugging features.

Tim Pierce [Tue, 30 Dec 2014 19:47:31 +0000 (14:47 -0500)]
4598: take out some debug reporting and --match option

Remove debugging features.

Tim Pierce [Tue, 30 Dec 2014 19:42:28 +0000 (14:42 -0500)]
4598: fetch logs from Keep, more failure reporting

Per standup review: fetch logs with a CollectionReader on the log
collection uuid, rather than fetching log records from the API server.

Perform full failure reporting including job URL details.

Ward Vandewege [Tue, 30 Dec 2014 19:31:53 +0000 (14:31 -0500)]
Detect stale compute node records with the same IP address as the new
node on its first ping. Clear the ip_address field on the stale nodes.

Refs #4887

Ward Vandewege [Tue, 30 Dec 2014 18:28:57 +0000 (13:28 -0500)]
Cleanups:

* Remove old commented out code
* Remove superfluous test for presence of file on disk

refs #4887

Tim Pierce [Tue, 30 Dec 2014 16:00:51 +0000 (11:00 -0500)]
4598: bug fixes, added full stats collection

Added code to report full stats on failed, successful, and incomplete
jobs.  Perform basic reporting on failed job causes (not yet working).

Peter Amstutz [Tue, 30 Dec 2014 15:39:50 +0000 (10:39 -0500)]
4869: Enable TCP keepalive and adjust connection timeouts to Keep client.

Tom Clegg [Mon, 29 Dec 2014 22:02:01 +0000 (17:02 -0500)]
Fix whitespace, cf. gofmt. refs #4875

Tom Clegg [Mon, 29 Dec 2014 21:59:35 +0000 (16:59 -0500)]
Merge branch '4875-keepclient-test-race' closes #4875

Tom Clegg [Mon, 29 Dec 2014 21:29:17 +0000 (16:29 -0500)]
4875: Merge branch 'master' into 4875-keepclient-test-race

Conflicts:
sdk/go/keepclient/keepclient_test.go

Tom Clegg [Mon, 29 Dec 2014 20:45:30 +0000 (15:45 -0500)]
Fix version strings to comply with PEP-440. No issue #

Tom Clegg [Mon, 29 Dec 2014 20:12:46 +0000 (15:12 -0500)]
Merge branch '4523-owner_uuid-index' refs #4523

Peter Amstutz [Mon, 29 Dec 2014 20:11:05 +0000 (15:11 -0500)]
Merge branch '4869-keepproxy' refs #4869

Peter Amstutz [Mon, 29 Dec 2014 19:37:13 +0000 (14:37 -0500)]
4869: Strip all newlines from error responses, not just leading and trailing
whitespace.

Tom Clegg [Mon, 29 Dec 2014 18:58:58 +0000 (13:58 -0500)]
4523: Dry up migration and test cases.

Peter Amstutz [Mon, 29 Dec 2014 18:51:20 +0000 (13:51 -0500)]
4869: Based on Go documentation, don't set a body ReadCloser on the request
when body length is 0.

Tom Clegg [Mon, 29 Dec 2014 17:45:02 +0000 (12:45 -0500)]
4523: Fix column order to match migration order.

Tom Clegg [Mon, 29 Dec 2014 17:44:35 +0000 (12:44 -0500)]
4523: Remove dev-only checks in migration.

Peter Amstutz [Mon, 29 Dec 2014 17:32:38 +0000 (12:32 -0500)]
4869: Correctly handle zero-length blocks in Keep client/Keep proxy.  Remove
X-Block-Size.  Choose default request timeout based on if client is talking to
a proxy or not.  Use double quotes in logging.  Rename "tag" to "requestId".

Tom Clegg [Mon, 29 Dec 2014 17:28:44 +0000 (12:28 -0500)]
4523: Fix whitespace.

Peter Amstutz [Mon, 29 Dec 2014 14:23:45 +0000 (09:23 -0500)]
4869: Keepstore now returns Content-Length headers, and logs the error message
sent to the client on errors.

Peter Amstutz [Mon, 29 Dec 2014 14:09:13 +0000 (09:09 -0500)]
4869: KeepClient now has a default timeout per block request (10 minutes).  In
keepproxy, the timeout is set to 20 seconds per block.  Also rearranged some
keepclient and keepproxy logging to provide better information.

Tom Clegg [Tue, 23 Dec 2014 20:51:49 +0000 (15:51 -0500)]
Merge branch '4754-performance-TC' closes #4754

Ward Vandewege [Tue, 23 Dec 2014 20:47:49 +0000 (15:47 -0500)]
Merge branch '4844-stricter-min-nodes-wip'

refs #4844

Ward Vandewege [Tue, 23 Dec 2014 20:47:23 +0000 (15:47 -0500)]
Merge branch 'master' into 4844-stricter-min-nodes-wip

Ward Vandewege [Tue, 23 Dec 2014 20:44:10 +0000 (15:44 -0500)]
Skip two more CLI tests that need a running API server.

refs #4156

Peter Amstutz [Tue, 23 Dec 2014 14:55:05 +0000 (09:55 -0500)]
4869: Improve logging

Tom Clegg [Sun, 21 Dec 2014 00:28:56 +0000 (19:28 -0500)]
4875: Let the OS choose port numbers for fake servers.

Fixes a race condition where test case N+1 can't listen on port 2990
because test case N hasn't shut down its listener.

Also removes the artificial acceptance requirement that nobody else on
the testing host is using the arbitrarily assigned port range
2990..299x.

Incidental changes:

* rename RunBogusKeepServer to RunFakeKeepServer (to match
  RunSomeFakeKeepServers and fix the misleading implication that the
  resulting server does something bogus).

* return a KeepServer object from RunFakeKeepServer (for better parity
  with RunSomeFakeKeepServers).
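
The idea of letting the OS choose port numbers can be sketched as follows. The change itself is in the Go keepclient tests; this Python version is only an illustration of the same technique, and `listen_on_free_port` is a hypothetical name rather than anything in the repository.

```python
import socket


def listen_on_free_port(host="127.0.0.1"):
    """Open a listening TCP socket on a port picked by the OS.

    Binding to port 0 asks the kernel for any currently unused port, so
    consecutive test cases cannot collide on a hard-coded number.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))
    sock.listen(1)
    # getsockname() reports the port the kernel actually assigned.
    port = sock.getsockname()[1]
    return sock, port
```

A test would call `listen_on_free_port()`, pass the returned port to the client under test, and close the socket when the case finishes.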

Tom Clegg [Sat, 20 Dec 2014 23:49:53 +0000 (18:49 -0500)]
4875: Use range in for loops.

Phil Hodgson [Sat, 20 Dec 2014 18:34:39 +0000 (19:34 +0100)]
Merge branch '4858-graph-not-comparing' refs #4358

Phil Hodgson [Sat, 20 Dec 2014 17:58:03 +0000 (18:58 +0100)]
Merge branch 'master' into 4358-graph-not-comparing

Brett Smith [Fri, 19 Dec 2014 17:09:17 +0000 (12:09 -0500)]
4844: Node Manager doesn't treat min_nodes as min_nodes_idle.

There's a bad interaction between the past bugfixes to (a) implement
min_nodes, and (b) boot new nodes when existing nodes are busy.
Because min_nodes has been implemented at the server wishlist level in
the past, the daemon can't distinguish between "nodes requested to
fulfill min_nodes" and "nodes requested to fulfill jobs."

This commit puts all the responsibility for enforcing min_nodes in the
daemon, so that the server wishlist always represents real job
requirements.  This lets the daemon correctly decide whether or not to
boot a new node when >= min_nodes are busy.

Brett Smith [Thu, 18 Dec 2014 21:20:15 +0000 (16:20 -0500)]
Merge branch '4670-node-manager-robust-tags-wip'

Closes #4670, #4812.

Brett Smith [Fri, 12 Dec 2014 21:16:39 +0000 (16:16 -0500)]
4670: Add a post-create hook to Node Manager for EC2 tagging.

The previous code was relying on the post-create tagging in libcloud's
EC2 driver.  Unfortunately, that's not working out too well for us: if
it fails, you get no indication of that, and it doesn't get retried.
This moves the work up into Node Manager, where failures can be logged
and retried appropriately.

The retry support may be sufficient to resolve #4670.  If it's not,
then the additional logging will help us track down the root cause.
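
The general pattern of logging and retrying a post-create step might look like the sketch below. It is not the Node Manager code: `run_with_retries`, its parameters, and the logger name are all assumptions made for illustration.

```python
import logging
import time

logger = logging.getLogger("post_create_example")  # hypothetical logger name


def run_with_retries(action, attempts=3, delay=0.0):
    """Call action() until it succeeds, logging every failure.

    Re-raises the last error once `attempts` tries have been used, so a
    persistent failure is still visible to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:
            logger.warning("post-create attempt %d/%d failed: %s",
                           attempt, attempts, error)
            if attempt == attempts:
                raise
            time.sleep(delay)
```

Unlike a silent best-effort hook, each failed attempt leaves a log line, and the final failure propagates instead of being swallowed.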

Brett Smith [Fri, 12 Dec 2014 18:18:51 +0000 (13:18 -0500)]
4670: Node Manager handles more libcloud exceptions.

libcloud compute drivers (at least EC2 and GCE) raise bare Exceptions
when there's some problem talking to the cloud service.  The previous
code was expecting to see a LibcloudError, so it wouldn't handle these
errors as intended.

I didn't want to just catch errors with "except Exception" everywhere,
so I added an is_cloud_exception class method to our driver classes to
more accurately identify exceptions that represent trouble talking to
the cloud service.  It recognizes exact Exceptions, plus the other
classes we were catching before.

While I was at this, I gave more specific names to the wrapper methods
in compute node actor decorators, as a debugging aid.
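
A check of this shape could be sketched as below. The class names and the `CLOUD_ERRORS` tuple are hypothetical stand-ins, not the actual Node Manager driver code; only the method name `is_cloud_exception` comes from the message above.

```python
class CloudServiceError(Exception):
    """Hypothetical stand-in for a library-specific error class."""


class ExampleDriver:
    # Exception classes treated as cloud trouble (assumed list for this sketch).
    CLOUD_ERRORS = (CloudServiceError, OSError)

    @classmethod
    def is_cloud_exception(cls, exception):
        # `type(...) is Exception` matches only a bare Exception, not
        # subclasses such as ValueError, which more likely signal a bug
        # in the calling code than trouble talking to the cloud service.
        return (isinstance(exception, cls.CLOUD_ERRORS)
                or type(exception) is Exception)
```

Callers can then write `except Exception as error:` and re-raise whenever `is_cloud_exception(error)` is false, which avoids swallowing unrelated errors.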

Brett Smith [Thu, 18 Dec 2014 16:04:23 +0000 (11:04 -0500)]
4800: run-command calls sys.exit() with an integer.

Closes #4800.

Brett Smith [Thu, 18 Dec 2014 15:42:56 +0000 (10:42 -0500)]
4818: Add missing timeout in Node Manager test.

Refs #4818.