arvados.git
Peter Amstutz [Fri, 19 Feb 2016 02:42:10 +0000 (21:42 -0500)]
6518: Dispatch to slurm using sbatch

radhika [Thu, 18 Feb 2016 19:24:08 +0000 (14:24 -0500)]
closes #8441
Merge branch '8441-project-chooser'

radhika [Thu, 18 Feb 2016 19:23:09 +0000 (14:23 -0500)]
Merge branch 'master' into 8441-project-chooser

Tom Clegg [Thu, 18 Feb 2016 18:57:59 +0000 (13:57 -0500)]
Merge branch '8400-additional-gitignore' of https://github.com/wtsi-hgi/arvados closes #8400

radhika [Thu, 18 Feb 2016 17:55:25 +0000 (12:55 -0500)]
8441: Update project chooser modal to display favorites and top-level-my-projects and remove the now unused build_project_trees and shared_project_tree code.

Peter Amstutz [Thu, 18 Feb 2016 14:36:45 +0000 (09:36 -0500)]
Merge branch '8319-bcbio-cwl' closes #8319

Peter Amstutz [Wed, 17 Feb 2016 17:59:57 +0000 (12:59 -0500)]
8319: Add log message about uploading docker image

radhika [Wed, 17 Feb 2016 14:32:09 +0000 (09:32 -0500)]
closes #8286
Merge branch '8286-fav-projects'

radhika [Wed, 17 Feb 2016 14:30:58 +0000 (09:30 -0500)]
8286: bigger star icon with stronger color contrast.

Peter Amstutz [Wed, 17 Feb 2016 14:21:09 +0000 (09:21 -0500)]
8319: Add environment variable to disable websockets.

radhika [Tue, 16 Feb 2016 17:00:09 +0000 (12:00 -0500)]
8286: also include link to Home in favorites section of the Projects dropdown, so that users do not have to scroll through all of their favorites to reach the MyProjects section.

radhika [Tue, 16 Feb 2016 02:33:01 +0000 (21:33 -0500)]
Merge branch 'master' into 8286-fav-projects

radhika [Tue, 16 Feb 2016 02:32:00 +0000 (21:32 -0500)]
closes #8079
Merge branch '8079-api-client-auth-uuid'

radhika [Tue, 16 Feb 2016 01:46:05 +0000 (20:46 -0500)]
Merge branch 'master' into 8079-api-client-auth-uuid

radhika [Tue, 16 Feb 2016 01:44:59 +0000 (20:44 -0500)]
8286: check if project is starred only when current_user is not null (anonymous user case).

radhika [Mon, 15 Feb 2016 22:28:55 +0000 (17:28 -0500)]
8286: to facilitate in-place star icon refresh without the whole page refresh, it became necessary
to refresh the Projects menu after star/unstar action. Hence, moved the breadcrumbs code from
body.html into a partial.

radhika [Mon, 15 Feb 2016 20:26:05 +0000 (15:26 -0500)]
8286: convert star method into action controller action and refresh the star icon in place rather than a full page refresh.

radhika [Mon, 15 Feb 2016 18:50:48 +0000 (13:50 -0500)]
Merge branch 'master' into 8286-fav-projects

radhika [Mon, 15 Feb 2016 18:43:08 +0000 (13:43 -0500)]
closes #8183
Merge branch '8183-projects-dropdown'

radhika [Mon, 15 Feb 2016 18:42:44 +0000 (13:42 -0500)]
Merge branch 'master' into 8183-projects-dropdown

radhika [Mon, 15 Feb 2016 18:33:43 +0000 (13:33 -0500)]
8286: include favorites and top-level my projects in projects dropdown.

Tom Clegg [Mon, 15 Feb 2016 18:22:03 +0000 (13:22 -0500)]
8409: Use 80% utilization as keep_cache_mb_per_task reporting threshold. refs #8409

radhika [Mon, 15 Feb 2016 17:47:22 +0000 (12:47 -0500)]
8079: add rescue to drop index

radhika [Mon, 15 Feb 2016 17:35:02 +0000 (12:35 -0500)]
Merge branch '8183-projects-dropdown' into 8286-fav-projects

Conflicts:
apps/workbench/app/controllers/application_controller.rb
apps/workbench/app/views/application/_projects_tree_menu.html.erb
apps/workbench/test/controllers/projects_controller_test.rb

radhika [Mon, 15 Feb 2016 16:29:04 +0000 (11:29 -0500)]
Merge branch 'master' into 8183-projects-dropdown

Tom Clegg [Mon, 15 Feb 2016 15:49:39 +0000 (10:49 -0500)]
Merge branch '8178-trash-interface-generic-volume-test' closes #8178

Tom Clegg [Thu, 4 Feb 2016 23:28:43 +0000 (18:28 -0500)]
8178: Stop accepting zeroed data, now that the s3test bug is fixed.

radhika [Wed, 27 Jan 2016 04:31:27 +0000 (23:31 -0500)]
8178: add generic volume tests for trash / untrash interface.

radhika [Mon, 15 Feb 2016 13:40:54 +0000 (08:40 -0500)]
Merge branch 'master' into 8079-api-client-auth-uuid

radhika [Mon, 15 Feb 2016 13:36:47 +0000 (08:36 -0500)]
8183: display top three levels of projects in menu, improve message when there are too many projects, improve the test.

Tom Clegg [Thu, 11 Feb 2016 21:25:54 +0000 (16:25 -0500)]
Process live logs for unfinished jobs in pipeline mode, too.

No issue #

radhika [Sat, 13 Feb 2016 01:57:25 +0000 (20:57 -0500)]
8183: retrieve only 3 levels of projects while building projects dropdown.

Brett Smith [Fri, 12 Feb 2016 23:38:36 +0000 (18:38 -0500)]
8203: crunch-job tempfails after failing to install a Docker image.

* Use `bash -o pipefail` to run the image installer shell script, to
  reliably detect more failures.
* Exit EX_RETRY_UNLOCKED after a failure to install the image.

Refs #8203, because I'm cautiously optimistic that this will reduce
the incidence of the "can't find UID" problem.
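
The effect of `bash -o pipefail` mentioned above can be illustrated with a minimal Python sketch. The pipeline `false | cat` is a made-up stand-in for the image installer script, not the actual crunch-job command, and `run_pipeline` is a hypothetical helper.

```python
import subprocess

def run_pipeline(script, pipefail):
    """Run a shell snippet under bash and return its exit status."""
    cmd = ["bash"]
    if pipefail:
        cmd += ["-o", "pipefail"]
    cmd += ["-c", script]
    return subprocess.run(cmd).returncode

# Hypothetical stand-in for the installer: first stage fails, last stage succeeds.
SCRIPT = "false | cat"

# Without pipefail, bash reports only the status of the last stage (`cat`).
status_default = run_pipeline(SCRIPT, pipefail=False)
# With pipefail, the failure of `false` makes the whole pipeline fail.
status_pipefail = run_pipeline(SCRIPT, pipefail=True)
```

A caller that only checks the pipeline's exit status would miss the failure in the first case and detect it in the second.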

radhika [Fri, 12 Feb 2016 21:50:34 +0000 (16:50 -0500)]
8183: while displaying "my projects" tree, show only the user's projects and omit shared projects,
even when there are no more than 300 projects, to avoid confusion.

radhika [Fri, 12 Feb 2016 21:24:41 +0000 (16:24 -0500)]
Merge branch 'master' into 8183-projects-dropdown

radhika [Fri, 12 Feb 2016 19:03:57 +0000 (14:03 -0500)]
8286: add an integration test to star / unstar project by clicking on the icon.

radhika [Thu, 11 Feb 2016 19:53:31 +0000 (14:53 -0500)]
8286: added test "unshare project and verify that it is no longer included in shared user's starred projects"

Peter Amstutz [Thu, 11 Feb 2016 14:41:17 +0000 (09:41 -0500)]
Merge branch '8409-report-keep-cache' closes #8409

Peter Amstutz [Thu, 11 Feb 2016 14:27:21 +0000 (09:27 -0500)]
8409: Adjust recommended miss rate to below 0.2%
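
As a rough illustration of the threshold in this commit, the sketch below computes a miss rate as a percentage and compares it to 0.2%. The definition of miss rate as misses / (hits + misses) and both function names are assumptions made for the example; the log does not show the actual calculation.

```python
def keep_cache_miss_rate_pct(hits, misses):
    """Miss rate as a percentage of all cache lookups (assumed definition)."""
    total = hits + misses
    return 100.0 * misses / total if total else 0.0

def miss_rate_needs_attention(hits, misses, threshold_pct=0.2):
    """True when the miss rate is above the recommended threshold."""
    return keep_cache_miss_rate_pct(hits, misses) > threshold_pct
```

With 9990 hits and 10 misses the rate is 0.1%, which is under the threshold; with 9950 hits and 50 misses it is 0.5%, which is over it.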

Peter Amstutz [Wed, 10 Feb 2016 21:42:02 +0000 (16:42 -0500)]
8409: Add recommendation if cache miss rate is above 0.5%.  Fix tests.

Peter Amstutz [Wed, 10 Feb 2016 20:33:41 +0000 (15:33 -0500)]
8409: Report Keep cache miss rate & Keep cache utilization

Brett Smith [Wed, 10 Feb 2016 20:02:48 +0000 (15:02 -0500)]
Build python-gflags backports < version 3.0.

No issue #.

radhika [Wed, 10 Feb 2016 18:55:40 +0000 (13:55 -0500)]
Merge branch 'master' into 8286-fav-projects

radhika [Wed, 10 Feb 2016 18:51:20 +0000 (13:51 -0500)]
8079: add down migration to api_client_authorizations_search_index

Brett Smith [Wed, 10 Feb 2016 18:19:11 +0000 (13:19 -0500)]
Pin PySDK's gflags dependency to <3.0.

We've built and tested with 3.0 successfully, but its ChangeLog says:

  * A lot of potentially backwards incompatible changes since 2.0.
  * This version is NOT recommended to use in production. Some of the files and
    documentation has been lost during export; this will be fixed in next
    versions.

We found out about this after 3.0.2 broke our tests.
Take their advice for now.  No issue #.

radhika [Wed, 10 Feb 2016 17:55:22 +0000 (12:55 -0500)]
Merge branch 'master' into 8286-fav-projects

Tom Clegg [Wed, 10 Feb 2016 17:52:08 +0000 (12:52 -0500)]
Merge branch '8341-crunchstat-job-time-axis'

radhika [Wed, 10 Feb 2016 17:23:50 +0000 (12:23 -0500)]
Merge branch 'master' into 8079-api-client-auth-uuid

radhika [Wed, 10 Feb 2016 17:19:23 +0000 (12:19 -0500)]
8079: Added support for get using uuid and list using uuid or api_token, and added tests.

Tom Clegg [Wed, 10 Feb 2016 17:10:53 +0000 (12:10 -0500)]
8341: Fall back to live logs if log collection is saved but missing.

Tom Clegg [Wed, 10 Feb 2016 16:14:54 +0000 (11:14 -0500)]
8341: Update test results.

Tom Clegg [Wed, 10 Feb 2016 03:44:02 +0000 (22:44 -0500)]
8341: Retrieve only the log attributes that actually get used.

Tom Clegg [Tue, 9 Feb 2016 18:53:38 +0000 (13:53 -0500)]
8341: In pipeline mode, process all jobs concurrently.

Tom Clegg [Tue, 9 Feb 2016 16:49:35 +0000 (11:49 -0500)]
8341: Include Keep network activity in net stats.

Tom Clegg [Tue, 9 Feb 2016 15:27:47 +0000 (10:27 -0500)]
8341: Fix up debug labels. Avoid deadlock after exceptions in thread.

Tom Clegg [Mon, 8 Feb 2016 20:47:02 +0000 (15:47 -0500)]
8341: Do not round up Y axis to even numbers, just use max series value.

Remove Y axis labels (so X axis matches other graphs from the same
job), add grid lines.

Tom Clegg [Mon, 8 Feb 2016 14:56:10 +0000 (09:56 -0500)]
8341: Use "time since job start", not "time since task start", as X axis.

Tom Clegg [Wed, 10 Feb 2016 15:51:33 +0000 (10:51 -0500)]
Merge branch '8284-fix-slurm-queue-timestamp-check' closes #8284

Tom Clegg [Wed, 10 Feb 2016 15:22:07 +0000 (10:22 -0500)]
Emit log when installing docker image.

Avoids creating the illusion that "clean work dirs" is taking forever.

No issue #

Tom Clegg [Wed, 10 Feb 2016 15:08:57 +0000 (10:08 -0500)]
Merge branch '7263-better-busy-behavior' refs #7263

Tom Clegg [Fri, 22 Jan 2016 20:02:21 +0000 (15:02 -0500)]
7263: Avoid getting stuck processing stderr for one task for a long time.

Do not sleep(0.1) unless pipes are idle.
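
The "sleep only when idle" idea can be sketched in Python as follows. This is not the crunch-job code; `drain_pipes` and its reader convention (each reader returns a chunk, `b""` when nothing is pending, or `None` at end of stream) are assumptions made for the example.

```python
import time

def drain_pipes(readers, handle, idle_sleep=0.1, sleep=time.sleep):
    """Service every reader once per pass; sleep only if no reader had data."""
    open_readers = list(readers)
    while open_readers:
        got_data = False
        for reader in list(open_readers):
            chunk = reader()
            if chunk is None:
                # End of stream: stop polling this reader.
                open_readers.remove(reader)
            elif chunk:
                # One chunk per reader per pass, so no single pipe starves the others.
                handle(chunk)
                got_data = True
        if open_readers and not got_data:
            # All pipes were idle on this pass, so it is safe to pause.
            sleep(idle_sleep)
```

Taking at most one chunk from each reader per pass keeps a chatty task from monopolizing the loop, and skipping the sleep while data is flowing avoids adding latency when there is work to do.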

Brett Smith [Tue, 9 Feb 2016 22:10:08 +0000 (17:10 -0500)]
crunch-job detects more "io aborted" SLURM errors.

It's seemingly random whether SLURM reports "Aborting, io aborted and
missing step" or "Aborting, missing step and io aborted".  Extend the
regexp to catch both.  No issue #.
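
A pattern that accepts both word orders might look like the Python sketch below. It is only an illustration of the approach: the real check lives in crunch-job, and the name `IO_ABORTED` is made up here.

```python
import re

# Matches either ordering of the two phrases quoted in the commit message.
IO_ABORTED = re.compile(
    r"Aborting, (?:io aborted and missing step|missing step and io aborted)"
)

def is_io_aborted(line):
    """Return True if a log line reports the SLURM 'io aborted' condition."""
    return IO_ABORTED.search(line) is not None
```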

Brett Smith [Tue, 9 Feb 2016 21:42:59 +0000 (16:42 -0500)]
Merge branch '8406-tempfail-after-retry-unlocked'

Closes #8406, #8407.

Brett Smith [Tue, 9 Feb 2016 21:42:12 +0000 (16:42 -0500)]
8406: Update comment to match new code.

Peter Amstutz [Tue, 9 Feb 2016 21:25:45 +0000 (16:25 -0500)]
8406: @job_retry_counts.include? jobrecord.uuid because @job_retry_counts has a default value.

Peter Amstutz [Tue, 9 Feb 2016 20:53:13 +0000 (15:53 -0500)]
8406: Treat EXIT_TEMPFAIL as EXIT_RETRY_UNLOCKED if we have previously gotten
EXIT_RETRY_UNLOCKED (because the job is now in "Running" state.)
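
The two 8406 commits can be pictured with a Python analogue (the original code is Ruby). A `defaultdict` stands in for a hash with a default value: indexing it always yields a number, so a membership test is what tells us whether a job has been seen before. The value of `EXIT_RETRY_UNLOCKED` and the helper names are placeholders for illustration only.

```python
from collections import defaultdict

EXIT_TEMPFAIL = 75          # EX_TEMPFAIL from sysexits.h
EXIT_RETRY_UNLOCKED = 93    # placeholder value for this sketch

# Maps job UUID -> number of EXIT_RETRY_UNLOCKED results seen so far.
job_retry_counts = defaultdict(int)

def effective_exit(job_uuid, exit_code):
    """Treat TEMPFAIL as RETRY_UNLOCKED for jobs that already retried unlocked."""
    # Use `in`, not job_retry_counts[job_uuid]: indexing returns the default
    # for unseen jobs (and, in Python, also inserts the key).
    if exit_code == EXIT_TEMPFAIL and job_uuid in job_retry_counts:
        return EXIT_RETRY_UNLOCKED
    return exit_code

def note_exit(job_uuid, exit_code):
    """Record an exit and return the code the dispatcher should act on."""
    code = effective_exit(job_uuid, exit_code)
    if code == EXIT_RETRY_UNLOCKED:
        job_retry_counts[job_uuid] += 1
    return code
```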

Nico Cesar [Tue, 9 Feb 2016 21:34:34 +0000 (16:34 -0500)]
I "fonud" a typo

no issue #

Peter Amstutz [Tue, 9 Feb 2016 17:32:42 +0000 (12:32 -0500)]
Merge branch '8404-catch-interrupted-syscall' closes #8404

Peter Amstutz [Tue, 9 Feb 2016 17:31:07 +0000 (12:31 -0500)]
8404: Adjust try block to just surround os.wait().

Peter Amstutz [Tue, 9 Feb 2016 16:41:13 +0000 (11:41 -0500)]
8404: catch and continue from interrupted system call from os.wait()
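
A minimal sketch of retrying `os.wait()` after an interrupted system call, with the `try` block kept tight around the call as the follow-up commit describes. The helper name and the injectable `wait` parameter are assumptions for the example. Python 3.5 and later retry most interrupted system calls automatically (PEP 475), so an explicit loop like this mainly matters for older interpreters.

```python
import errno
import os

def wait_retrying(wait=os.wait):
    """Call wait() until it completes without being interrupted by a signal."""
    while True:
        try:
            # Only the system call itself is inside the try block.
            return wait()
        except OSError as e:
            if e.errno != errno.EINTR:
                raise
            # Interrupted system call: loop and try again.
```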

radhika [Tue, 9 Feb 2016 16:11:59 +0000 (11:11 -0500)]
8079: add uuid to api_client_authorizations_search_index and add uuid to all api_client_authorizations test fixtures.

radhika [Tue, 9 Feb 2016 14:57:15 +0000 (09:57 -0500)]
8079: update the migration script to use the api_token as the seed

radhika [Tue, 9 Feb 2016 13:57:50 +0000 (08:57 -0500)]
8079: add uuid to api_client_authorizations

Tom Clegg [Mon, 8 Feb 2016 21:09:28 +0000 (16:09 -0500)]
Fix nodemanager test race. No issue #

Tom Clegg [Mon, 8 Feb 2016 19:32:45 +0000 (14:32 -0500)]
Merge branch '8341-live-crunchstat-summary' refs #8341

Tom Clegg [Mon, 8 Feb 2016 19:18:42 +0000 (14:18 -0500)]
8341: Use a Queue of lines and one thread, instead of a succession of threads and a deque of buffers.
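
The "one thread feeding a Queue of lines" design might be sketched like this. It is an illustration under assumptions, not the code from the commit: `read_lines_in_background` and the `_EOF` sentinel are made-up names.

```python
import queue
import threading

_EOF = object()  # sentinel marking the end of the stream

def read_lines_in_background(source_lines, maxsize=1000):
    """Yield lines that a single producer thread pushes onto a bounded Queue."""
    q = queue.Queue(maxsize=maxsize)

    def producer():
        try:
            for line in source_lines:
                q.put(line)
        finally:
            # Always signal completion, even if the source raised, so the
            # consumer does not block forever on q.get().
            q.put(_EOF)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is _EOF:
            return
        yield item
```

The bounded queue keeps the producer from running arbitrarily far ahead of the consumer, and the sentinel in the `finally` clause is one way to avoid a stuck consumer when the producer fails.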

Tom Clegg [Mon, 8 Feb 2016 01:19:45 +0000 (20:19 -0500)]
8341: Move reader classes to reader.py.

Tom Clegg [Mon, 8 Feb 2016 01:15:00 +0000 (20:15 -0500)]
8341: Use a worker thread to get page N+1 of logs while parsing page N.
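
Fetching page N+1 while the caller parses page N can be sketched with a single-worker thread pool. `fetch_page` is a hypothetical callable that returns a list of items for a page number and an empty list when there are no more pages; it does not correspond to a specific Arvados API call.

```python
from concurrent.futures import ThreadPoolExecutor

def iter_pages_prefetched(fetch_page):
    """Yield pages in order, requesting the next page before yielding the current one."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        n = 0
        future = pool.submit(fetch_page, n)
        while True:
            page = future.result()
            if not page:
                return
            n += 1
            # Start fetching page N+1 in the worker thread...
            future = pool.submit(fetch_page, n)
            # ...while the caller parses page N.
            yield page
```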

Tom Clegg [Mon, 8 Feb 2016 00:43:02 +0000 (19:43 -0500)]
8341: Get job log from logs API if the log has not been written to Keep yet.

Tom Clegg [Mon, 8 Feb 2016 19:29:03 +0000 (14:29 -0500)]
Merge branch '8289-no-extra-orders' closes #8289

Tom Clegg [Mon, 8 Feb 2016 19:28:02 +0000 (14:28 -0500)]
8289: Strip redundant orders, even when provided explicitly by client.
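
One way to think about "redundant orders": once an order clause names a column whose values are unique, later clauses cannot change the result, and a second clause on an already-used column is never consulted. The Python sketch below applies that idea. The function name, the `"column direction"` string format, and the choice of `uuid` as the unique column are assumptions for illustration; the API server's actual implementation is not shown in this log.

```python
def strip_redundant_orders(orders, unique_columns=("uuid",)):
    """Drop order clauses that cannot affect the resulting row order."""
    kept = []
    seen = set()
    for order in orders:
        column = order.split()[0]
        if column in seen:
            # A repeated column never breaks a tie the first one left open.
            continue
        kept.append(order)
        seen.add(column)
        if column in unique_columns:
            # Ordering is now unambiguous; anything after this is redundant.
            break
    return kept
```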

Tom Clegg [Sat, 23 Jan 2016 05:23:49 +0000 (00:23 -0500)]
8289: Do not add fallback orders if client already specified an unambiguous order.

Joshua Randall [Mon, 8 Feb 2016 17:23:53 +0000 (17:23 +0000)]
Adds additional .gitignore entries

Ignores:
apps/workbench/git-commit.version
sdk/perl/install/
services/api/git-commit.version

Peter Amstutz [Mon, 8 Feb 2016 16:28:53 +0000 (11:28 -0500)]
Merge branch '7667-node-manager-logging' refs #7667

Peter Amstutz [Mon, 8 Feb 2016 16:28:11 +0000 (11:28 -0500)]
7667: Store node size in a table, to avoid blocking on booting and shutdown
actors when asking for node size.

Peter Amstutz [Mon, 8 Feb 2016 03:52:51 +0000 (22:52 -0500)]
7667: Fix log message

Tom Clegg [Sat, 6 Feb 2016 00:45:30 +0000 (19:45 -0500)]
Merge branch '8285-fuse-subscribe-websockets' closes #8285

Tom Clegg [Sat, 6 Feb 2016 00:39:42 +0000 (19:39 -0500)]
8285: Test that arvados.events.subscribe() is called only when needed.

Add missing TagsDirectory.want_event_subscribe().

Peter Amstutz [Sat, 6 Feb 2016 00:17:42 +0000 (19:17 -0500)]
8285: Add test for listen_for_events

radhika [Fri, 5 Feb 2016 23:25:11 +0000 (18:25 -0500)]
8183: add test to check build of my projects tree with the new method; update the method implementation
to accommodate testing by making the page_size an argument.

Peter Amstutz [Fri, 5 Feb 2016 21:39:25 +0000 (16:39 -0500)]
8285: Add want_event_subscribe flag to subclasses of fusedir.Directory,
determine whether to call listen_for_events based on it.

radhika [Fri, 5 Feb 2016 17:00:58 +0000 (12:00 -0500)]
8183: When there are more than 200 readable projects, build the tree in steps;
fetch projects under home, then subprojects under those projects and so on
until we exceed the 200 limit. In that case, also display a message in Projects
dropdown informing the user that not all projects are retrieved.
The chooser version of the project tree is untouched (and hence not performing well)
until the favorite projects implementation is in place. This is because
there is no search capability in the chooser dialog and hence no way to choose a
shared project when the limit is exceeded in this new top-down approach.
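
The top-down strategy described above (fetch projects under home, then their subprojects, and so on until the limit is exceeded) can be sketched in Python. The Workbench code is Ruby; `collect_projects_top_down` and `fetch_children` are hypothetical names, and `fetch_children` is assumed to take a list of parent UUIDs and return the projects owned by them as dicts with a `uuid` key.

```python
def collect_projects_top_down(fetch_children, home_uuid, limit=200):
    """Collect projects level by level; return (projects, truncated)."""
    collected = []
    frontier = [home_uuid]
    while frontier:
        children = fetch_children(frontier)
        if not children:
            break
        collected.extend(children)
        if len(collected) > limit:
            # Limit exceeded: stop descending and let the caller show the
            # "not all projects are retrieved" message.
            return collected, True
        # The next level consists of subprojects of what was just fetched.
        frontier = [child["uuid"] for child in children]
    return collected, False
```

The `truncated` flag is what a caller would use to decide whether to display the message in the Projects dropdown.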

Peter Amstutz [Fri, 5 Feb 2016 16:10:43 +0000 (11:10 -0500)]
7667: Combine polling logs into fewer lines for less noise.  Adjust message
when last_ping_at is unexpectedly none to be less severe (can happen in
innocent circumstances).  Report nodes in "booted" list as "booting" since they
are unpaired.  Fix tests.

Brett Smith [Fri, 5 Feb 2016 09:52:43 +0000 (04:52 -0500)]
7868: Update API server's arvados-cli version.

Curoverse clusters are deployed by setting CRUNCH_JOB_BIN,
effectively excluding it from the bundle, but this is not true for
clusters deployed following the install guide.  Out of the box,
they'll use the version of crunch-job that's actually in the
arvados-cli gem in the bundle.

crunch-dispatch has functionality in it that requires a newer
arvados-cli, so update accordingly.  This is not exactly the version
produced by #7868, but it's pretty close.

I think there's a strong case that we should update this version
whenever we make a substantial change to crunch-job.  But since I'm
pushing this without discussion or review, I'm doing the smallest
thing possible.

Refs #7868.

Peter Amstutz [Thu, 4 Feb 2016 23:46:31 +0000 (18:46 -0500)]
7667: Node manager bug fixes and logging improvements.

 * ComputeNodeSetupActor will now finish if there is an unhandled exception.

 * ComputeNodeMonitorActor now explains why a node that is in the shutdown window
is not eligible for shutdown.

 * Logging in nodes_wanted now distinguishes idle/busy/booting/shutting down.

 * Logging by actors is now class name and a portion of the actor urn, so actions
of a specific actor can be consistently identified.

Tom Clegg [Thu, 4 Feb 2016 19:29:39 +0000 (14:29 -0500)]
Recognize another way slurm tells us about node failures.

Retry, instead of giving up, in situations like this:

2016-02-02_08:42:26 wx7k5-8i9sb-guk2lv53z3572dc 40682 3 stderr srun: error: Aborting, io error and missing step on node 0
2016-02-02_08:42:26 wx7k5-8i9sb-guk2lv53z3572dc 40682 3 stderr srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
2016-02-02_08:42:28 wx7k5-8i9sb-guk2lv53z3572dc 40682 3 stderr srun: error: Timed out waiting for job step to complete
2016-02-02_08:42:28 wx7k5-8i9sb-guk2lv53z3572dc 40682 3 child 42984 on compute26.1 exit 0 success=
2016-02-02_08:42:28 wx7k5-8i9sb-guk2lv53z3572dc 40682 3 ERROR: Task process exited 0, but never updated its task record to indicate success and record its output.
2016-02-02_08:42:28 wx7k5-8i9sb-guk2lv53z3572dc 40682 3 failure (#1, permanent) after 560 seconds
2016-02-02_08:42:28 wx7k5-8i9sb-guk2lv53z3572dc 40682 3 task output (0 bytes):

No issue #

Tom Clegg [Thu, 4 Feb 2016 18:17:31 +0000 (13:17 -0500)]
Merge branch '8288-poll-client-close-timeout' refs #8288

Tom Clegg [Mon, 1 Feb 2016 06:58:34 +0000 (01:58 -0500)]
8288: Add timeout option to close() method of event clients.

Previously in EventClient, close() didn't wait for anything. Now, if a
timeout is given, it waits for ws4py to call the closed() callback to
indicate the connection has closed.

Previously in PollClient, close() waited indefinitely for the polling
thread to terminate.  This can take a very long time if, for example,
there are multiple subscriptions and the "get logs" API transaction is
slow.

The only apparent reason a caller would want to wait here at all is to
guarantee the simplifying assumption that the on_event() callback is never
called after close().  Now, instead of letting the thread run until
all events are received and handled, PollClient achieves this the same
way EventClient does: ignore events that arrive after close().
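
The behavior described for PollClient can be approximated with the sketch below. It is not the SDK's implementation: the class name, the `poll` callable, and the choice to wait only when a timeout is given are assumptions. A flag set by `close()` makes the polling thread drop any further events, and `close(timeout)` waits a bounded time for the thread to exit. A callback that is already running when `close()` is called may still finish.

```python
import threading

class PollClientSketch:
    """Polls for events in a thread; stops delivering them once closed."""

    def __init__(self, poll, on_event, interval=0.01):
        self._poll = poll              # hypothetical: returns a list of new events
        self._on_event = on_event
        self._interval = interval
        self._closing = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while not self._closing.is_set():
            for event in self._poll():
                if self._closing.is_set():
                    # close() was called: ignore events that arrive afterwards.
                    return
                self._on_event(event)
            # Sleep between polls, waking immediately if close() is called.
            self._closing.wait(self._interval)

    def close(self, timeout=None):
        """Stop delivering events; wait up to `timeout` seconds if one is given."""
        self._closing.set()
        if timeout is not None:
            self._thread.join(timeout)
```

Because the thread checks the flag before each callback rather than draining every pending event, `close()` does not have to wait for a slow "get logs" round trip to finish.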

radhika [Thu, 4 Feb 2016 15:45:26 +0000 (10:45 -0500)]
8183: set limit on my_toplevel_projects

radhika [Thu, 4 Feb 2016 15:21:00 +0000 (10:21 -0500)]
8183: show only toplevel projects in the Projects dropdown in breadcrumbs.