Brett Smith [Mon, 15 Jun 2015 17:54:36 +0000 (13:54 -0400)]
4410: Crunch retries jobs when all SLURM nodes fail.
See the ticket for detailed background discussion and implementation
rationale, especially notes 13 and 14.
This required a couple of ancillary changes:
* crunch-job now makes a distinction between "task failed because a
node failed," and "task failed for other temporary reason." It uses
this additional information to decide when it should retry tasks
itself, and when it needs to give up and kick the problem up to
crunch-dispatch.
* crunch-job now handles creating log collections itself from
manifests generated by arv-put. This enables it to append to logs
generated during previous attempts to run the job.
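The distinction drawn in the first bullet can be pictured as a small decision function. This is only a sketch in Python. The names `TaskOutcome`, `decide`, `max_attempts`, and `healthy_nodes` are hypothetical, and the retry policy shown is an assumption for illustration, not the logic committed to crunch-job.

```python
from enum import Enum


class TaskOutcome(Enum):
    SUCCESS = "success"
    NODE_FAILURE = "node failure"            # the node itself failed
    TEMPORARY_FAILURE = "temporary failure"  # any other retryable error


def decide(outcome, attempts, max_attempts, healthy_nodes):
    """Return 'done', 'retry', 'escalate', or 'fail' for one task result."""
    if outcome is TaskOutcome.SUCCESS:
        return "done"
    if outcome is TaskOutcome.NODE_FAILURE:
        # With no usable nodes left there is nothing to retry on locally,
        # so hand the problem to the dispatcher (assumed policy).
        return "retry" if healthy_nodes > 0 else "escalate"
    # Other temporary failures are retried locally up to a limit.
    return "retry" if attempts < max_attempts else "fail"
```

Keeping the two failure kinds separate is what lets the caller tell "try again here" apart from "this allocation is unusable, let the dispatcher decide."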
Brett Smith [Mon, 15 Jun 2015 21:04:25 +0000 (17:04 -0400)]
4410: crunch-dispatch logs crunch-job exit later.
This makes it easier to log the exit code, and makes the logs look
nicer because the exit log doesn't interrupt crunch-job's stderr.
Brett Smith [Thu, 18 Jun 2015 15:35:40 +0000 (11:35 -0400)]
Merge branch '6320-api-logins-include-groups-wip'
Refs #6320. Closes #6325.
Brett Smith [Tue, 16 Jun 2015 13:40:58 +0000 (09:40 -0400)]
6320: API virtual machines login method includes groups information.
This is necessary for the virtual machine user setup script to see the
groups information added in #6254.
This required updating a few tests that assumed the active user had no
access to testvm2.
Brett Smith [Tue, 16 Jun 2015 13:18:32 +0000 (09:18 -0400)]
6320: Add tests for API virtual machines login method.
radhika [Thu, 18 Jun 2015 14:49:23 +0000 (10:49 -0400)]
closes #6156
Merge branch '6156-hostnames-in-nodes'
radhika [Thu, 18 Jun 2015 14:47:54 +0000 (10:47 -0400)]
6156: convert the ping-should-fail test when hostname config is malformed into a controller test.
radhika [Thu, 18 Jun 2015 14:22:55 +0000 (10:22 -0400)]
Merge branch 'master' into 6156-hostnames-in-nodes
radhika [Thu, 18 Jun 2015 02:06:46 +0000 (22:06 -0400)]
refs #6277
Merge branch '6277-manifest-validation-api'
radhika [Thu, 18 Jun 2015 02:04:10 +0000 (22:04 -0400)]
6277: since locators are added to multilevel_collection_1 fixture, the collection retrieved after an update will have
a signed manifest; hence, the test needs to compare the old manifest with the new one after stripping signatures.
radhika [Wed, 17 Jun 2015 23:26:41 +0000 (19:26 -0400)]
refs #6277
Merge branch '6277-manifest-validation-api'
radhika [Wed, 17 Jun 2015 23:06:00 +0000 (19:06 -0400)]
6277: rename the before_validation filter check_manifest_validity as log_invalid_manifest_format until we are ready to actually validate manifest formats.
radhika [Wed, 17 Jun 2015 22:54:36 +0000 (18:54 -0400)]
Merge branch 'master' into 6277-manifest-validation-api
radhika [Wed, 17 Jun 2015 22:34:17 +0000 (18:34 -0400)]
6156: use only sprintf formatting for node slot_number config.
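As an illustration of sprintf-style formatting with a slot number, a hostname template can be filled in as below. The template strings and the `hostname_for_slot` helper are hypothetical; the actual config key and format used by the API server do not appear in this log.

```python
def hostname_for_slot(template, slot_number):
    """Fill a printf-style template with the node's slot number."""
    return template % {"slot_number": slot_number}


print(hostname_for_slot("compute%(slot_number)d", 7))
print(hostname_for_slot("node-%(slot_number)04d.example", 7))
```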
radhika [Wed, 17 Jun 2015 21:36:58 +0000 (17:36 -0400)]
Merge branch 'master' into 6156-hostnames-in-nodes
Tom Clegg [Wed, 17 Jun 2015 20:40:29 +0000 (16:40 -0400)]
6277: Simplify manifest-building loop, fix up truncation tests.
Tom Clegg [Wed, 17 Jun 2015 02:30:03 +0000 (22:30 -0400)]
Merge branch '6272-index-eof' closes #6272
Tom Clegg [Tue, 16 Jun 2015 20:14:58 +0000 (16:14 -0400)]
6272: Add blank line to indicate index response EOF. Error out of data manager if not received.
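A client can use the trailing blank line to tell a complete index from a truncated one. The sketch below assumes one entry per line followed by a single empty line; `parse_index` and the entry format in the usage are hypothetical and do not reproduce the data manager's code.

```python
def parse_index(body):
    """Return index entries, or raise if the terminating blank line is missing."""
    # An empty index is just the blank line; otherwise the last entry's
    # newline must be followed by one more newline.
    if body != "\n" and not body.endswith("\n\n"):
        raise ValueError("truncated index response: no blank line at EOF")
    return [line for line in body.split("\n") if line]
```

Without such a marker, a connection that drops mid-response is indistinguishable from a short but complete listing.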
Tom Clegg [Tue, 16 Jun 2015 22:17:03 +0000 (18:17 -0400)]
Do not blow up if HOME is not set. refs #2256
Tom Clegg [Tue, 16 Jun 2015 20:09:04 +0000 (16:09 -0400)]
Merge branch '6222-precompile-regexps' refs #6222
Tom Clegg [Sat, 6 Jun 2015 20:29:48 +0000 (16:29 -0400)]
6222: Precompile all regexps. Remove wasted effort in GetBlockHandler.
radhika [Tue, 16 Jun 2015 15:32:55 +0000 (11:32 -0400)]
6156: support config format for setting a node's hostname
radhika [Mon, 15 Jun 2015 19:02:23 +0000 (15:02 -0400)]
6277: Add check_manifest_validity before_filter in collection model; however, at the moment, this method always returns true after logging the validation error.
radhika [Mon, 15 Jun 2015 15:34:58 +0000 (11:34 -0400)]
refs #6277
Merge branch '6277-manifest-validation'
radhika [Mon, 15 Jun 2015 15:34:37 +0000 (11:34 -0400)]
Merge branch 'master' into 6277-manifest-validation
radhika [Mon, 15 Jun 2015 15:33:45 +0000 (11:33 -0400)]
6277: all that work and missed the basic nil and empty string check!!!
radhika [Mon, 15 Jun 2015 14:45:21 +0000 (10:45 -0400)]
closes #6254
Merge branch '6254-groups-during-setup'
radhika [Sun, 14 Jun 2015 01:31:29 +0000 (21:31 -0400)]
refs #6277 : ruby sdk with manifest validation method
Merge branch '6277-manifest-validation'
radhika [Sun, 14 Jun 2015 01:21:39 +0000 (21:21 -0400)]
6277: extra white space
radhika [Sun, 14 Jun 2015 00:47:04 +0000 (20:47 -0400)]
Merge branch 'master' into 6277-manifest-validation
radhika [Sat, 13 Jun 2015 02:24:49 +0000 (22:24 -0400)]
6254: minor update to Groups text field label to avoid conflict with 'Virtual Machine' label.
radhika [Sat, 13 Jun 2015 01:19:35 +0000 (21:19 -0400)]
6254: better groups text field label
radhika [Sat, 13 Jun 2015 01:06:23 +0000 (21:06 -0400)]
6254: instead of tooltip, use a self explanatory label for groups text field.
radhika [Sat, 13 Jun 2015 00:26:28 +0000 (20:26 -0400)]
7254: add groups to vm link.
radhika [Fri, 12 Jun 2015 18:30:20 +0000 (14:30 -0400)]
6254: remove redundant statement
radhika [Fri, 12 Jun 2015 18:26:52 +0000 (14:26 -0400)]
6254: add groups and verify in test.
Brett Smith [Fri, 12 Jun 2015 18:05:19 +0000 (14:05 -0400)]
Merge branch '6149-quiet-egg-info-stderr-wip'
Closes #6149, #6284.
Brett Smith [Thu, 11 Jun 2015 21:16:03 +0000 (17:16 -0400)]
6149: crunch-job installer handles egg_info errors better.
Capture the command's stderr and confirm that the error refers to
git. If it does, ignore the stderr and set a build tag. Otherwise,
propagate stderr and abort.
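The stderr handling described above might look roughly like this in Python. The function names are hypothetical, and both the substring check for "git" and the treatment of empty stderr as success are assumptions; the installer's real matching rule is not shown in this log.

```python
import subprocess


def classify_stderr(stderr_text):
    """Decide what to do with stderr captured from the command."""
    if not stderr_text.strip():
        return "ok"                 # nothing was reported (assumed to mean success)
    if "git" in stderr_text.lower():
        return "set-build-tag"      # git-related complaint: ignore it, tag the build
    return "abort"                  # anything else is a real error


def run_and_classify(argv):
    """Run argv, capture its stderr, and abort on non-git errors."""
    result = subprocess.run(argv, stderr=subprocess.PIPE,
                            universal_newlines=True)
    action = classify_stderr(result.stderr)
    if action == "abort":
        # Stand-in for "propagate stderr and abort".
        raise RuntimeError(result.stderr)
    return action
```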
radhika [Fri, 12 Jun 2015 15:02:13 +0000 (11:02 -0400)]
6254: slight performance improvement where the can_login link is not retrieved and checked if the groups passed in are the same as those already saved.
radhika [Fri, 12 Jun 2015 14:51:26 +0000 (10:51 -0400)]
6254: add "groups" to user setup process; the comma-separated groups entered in the popup will be saved as an array in the groups property of the user's can_login link.
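Turning the text field into an array could be sketched as follows. `parse_groups` is a hypothetical helper, and trimming whitespace and dropping empty entries are assumptions about the desired behavior.

```python
def parse_groups(text):
    """Split a comma-separated string into a list of non-empty group names."""
    return [name.strip() for name in text.split(",") if name.strip()]


print(parse_groups("docker, admin ,"))
```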
Tom Clegg [Thu, 11 Jun 2015 05:58:49 +0000 (01:58 -0400)]
6277: Catch whitespace errors, and "." and ".." in paths. Rename valid? to validate!.
radhika [Fri, 12 Jun 2015 00:13:45 +0000 (20:13 -0400)]
6277: Add more restrictions to the manifest format: file names cannot start or end with '/',
and stream names and file names must not contain '//'.
Added the tests provided by Tom during review.
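The name restrictions in this commit can be expressed as two small checks. This is a sketch only: `check_stream_name` and `check_file_name` are hypothetical names, and the Ruby SDK's validator is not reproduced here.

```python
def check_stream_name(name):
    """Raise ValueError if a stream name contains an empty path component."""
    if "//" in name:
        raise ValueError("name must not contain '//': %r" % name)
    return name


def check_file_name(name):
    """Raise ValueError if a file name breaks the slash rules."""
    if name.startswith("/") or name.endswith("/"):
        raise ValueError("file name must not start or end with '/': %r" % name)
    return check_stream_name(name)  # the '//' rule applies to both kinds
```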
radhika [Thu, 11 Jun 2015 20:19:05 +0000 (16:19 -0400)]
6277: valid manifest must end with new line.
radhika [Thu, 11 Jun 2015 20:11:28 +0000 (16:11 -0400)]
6277: improve error message for missing file tokens.
radhika [Thu, 11 Jun 2015 18:33:48 +0000 (14:33 -0400)]
6277: add Manifest::valid? method in ruby sdk.
Tom Clegg [Tue, 9 Jun 2015 19:16:28 +0000 (15:16 -0400)]
Update arvados in bundle. refs #6203 refs #6277
Tom Clegg [Tue, 9 Jun 2015 19:12:46 +0000 (15:12 -0400)]
Use default word wrap (no mid-word line breaks). refs #6057
Tom Clegg [Tue, 9 Jun 2015 14:11:35 +0000 (10:11 -0400)]
Merge branch '6203-locator-regexp' refs #6203 refs #6277
Nico Cesar [Wed, 10 Jun 2015 16:26:15 +0000 (12:26 -0400)]
Added shell-image back to the build. I checked on my local machine and it works great.
no issue #
Brett Smith [Wed, 10 Jun 2015 15:24:38 +0000 (11:24 -0400)]
5790: Fix PySDK Docker image listing comparing ints and datetimes.
Closes #5790.
Ward Vandewege [Wed, 10 Jun 2015 14:32:40 +0000 (10:32 -0400)]
Add instructions to the 'Create standard objects' page to create a cluster-wide readable project for standard Arvados Docker images.
refs #6096
radhika [Wed, 10 Jun 2015 14:15:47 +0000 (10:15 -0400)]
closes #6203
Merge branch '6203-collection-perf-api'
radhika [Wed, 10 Jun 2015 14:06:35 +0000 (10:06 -0400)]
Merge branch '6203-collection-perf-api-TC' of git.curoverse.com:arvados into 6203-collection-perf-api-TC
Conflicts:
services/api/app/models/collection.rb
Nico Cesar [Tue, 9 Jun 2015 18:48:29 +0000 (14:48 -0400)]
qr1hi-automated-performance-suite is failing because the test doesn't give enough time for the page to render (now 50s makes the test pass). no issue #
Nico Cesar [Tue, 9 Jun 2015 18:38:18 +0000 (14:38 -0400)]
qr1hi-automated-performance-suite is failing because the test doesn't give enough time for the page to render
no issue #
Nico Cesar [Tue, 9 Jun 2015 15:32:25 +0000 (11:32 -0400)]
openssl self cert creation failed because the "tmp" directory was missing.
api server failed to start because the "tmp/logs" and "tmp/api" directories were missing
no issue #
Peter Amstutz [Tue, 9 Jun 2015 15:22:52 +0000 (11:22 -0400)]
Merge branch '6235-go-sdk-discovery' closes #6235
Brett Smith [Tue, 9 Jun 2015 14:48:36 +0000 (10:48 -0400)]
Merge branch '5790-copy-most-recent-docker-image-wip'
Closes #5790, #6103.
Brett Smith [Wed, 3 Jun 2015 20:59:24 +0000 (16:59 -0400)]
5790: Improve Docker image listing in Python SDK.
* Always fetch all relevant Docker links.
* Support finding images by image hash.
* Show image hashes when listing images by name.
* Like Docker itself, when an image has multiple names and we're not
filtering by name, list each one.
* Better match the API server's priority logic:
* Ignore links to collections that aren't found.
* Links with an image_timestamp always have priority over those that
don't, regardless of their respective created_at timestamps.
The main motivation for this change is to make sure arv-copy gets the
right Docker image when copying a pipeline template recursively.
This implementation goes through some trouble to parse timestamps out
of each Docker link only once.
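The priority rule in the last bullet amounts to a two-part sort key. The sketch below assumes each link is a dict whose timestamps are already parsed into `datetime` objects (parsed once, as the commit describes); the dict layout and the names `priority_key` and `best_first` are hypothetical.

```python
from datetime import datetime


def priority_key(link):
    """Links with an image_timestamp sort above all links without one."""
    ts = link.get("image_timestamp")
    # The boolean decides first; the datetime only breaks ties within
    # the same group, so both sides of a comparison are datetimes.
    return (ts is not None, ts if ts is not None else link["created_at"])


def best_first(links):
    return sorted(links, key=priority_key, reverse=True)


links = [
    {"name": "a", "created_at": datetime(2015, 6, 1), "image_timestamp": None},
    {"name": "b", "created_at": datetime(2014, 1, 2),
     "image_timestamp": datetime(2014, 1, 1)},
]
print([link["name"] for link in best_first(links)])
```

Here "b" is listed first even though "a" was created later, because only "b" carries an image timestamp.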
Brett Smith [Tue, 9 Jun 2015 14:26:44 +0000 (10:26 -0400)]
Merge branch '6152-compute-node-no-arv-wip'
Closes #6152, #6256.
Brett Smith [Fri, 5 Jun 2015 19:46:08 +0000 (15:46 -0400)]
6152: Use Python SDK for compute node installation.
This eliminates an otherwise-needless dependency on `arv` and the
entire Ruby stack.
radhika [Tue, 9 Jun 2015 14:25:59 +0000 (10:25 -0400)]
closes #6093
Merge branch '6093-refresh-docs'
radhika [Tue, 9 Jun 2015 14:21:38 +0000 (10:21 -0400)]
6093: remove "Alternate way to add SSH keys" and add the "Manage account" link blurb to "Adding your keys" section itself.
Tom Clegg [Mon, 8 Jun 2015 19:34:48 +0000 (15:34 -0400)]
6203: Fix loophole allowing locators in bogus manifests to be accepted
without verification and then signed in responses. refs #6277
Tom Clegg [Mon, 8 Jun 2015 16:03:42 +0000 (12:03 -0400)]
6203: Add tests for LOCATOR_REGEXP. Fix regexp to reject "++" and trailing "\n".
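Both rejected inputs come down to anchoring and to requiring content after each "+". The Python pattern below is a simplified, hypothetical stand-in for illustration and is not the SDK's LOCATOR_REGEXP.

```python
import re

# 32 hex digits, an optional "+size", then zero or more "+Hint" parts,
# where every hint must start with an uppercase letter.
LOCATOR = re.compile(r"[0-9a-f]{32}(\+[0-9]+)?(\+[A-Z][A-Za-z0-9@_-]*)*")


def is_locator(token):
    # fullmatch() must consume the whole string. A pattern ending in "$"
    # would still match just before a trailing newline.
    return LOCATOR.fullmatch(token) is not None
```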
radhika [Tue, 9 Jun 2015 14:09:07 +0000 (10:09 -0400)]
Merge branch 'master' into 6093-refresh-docs
Tom Clegg [Mon, 8 Jun 2015 14:10:26 +0000 (10:10 -0400)]
6203: Accept (and discard) hints in client-provided portable_data_hash.
Tom Clegg [Mon, 8 Jun 2015 07:11:22 +0000 (03:11 -0400)]
6203: Remove unused vars. Remove unnecessary newline manipulation.
Peter Amstutz [Mon, 8 Jun 2015 13:36:33 +0000 (09:36 -0400)]
6235: Discovery() returns the requested value directly instead of a single-entry map.
Tom Clegg [Mon, 8 Jun 2015 07:11:22 +0000 (03:11 -0400)]
6203: Remove unused vars. Remove unnecessary newline manipulation.
Tom Clegg [Mon, 8 Jun 2015 06:23:57 +0000 (02:23 -0400)]
6203: Use faster =~ instead of match.
Tom Clegg [Mon, 8 Jun 2015 06:23:37 +0000 (02:23 -0400)]
6203: Fix cheating test.
Tom Clegg [Mon, 8 Jun 2015 06:23:20 +0000 (02:23 -0400)]
6203: Use each_line instead of split.each.
Tom Clegg [Mon, 8 Jun 2015 06:22:33 +0000 (02:22 -0400)]
6203: Remove redundant split before each_line.
Tom Clegg [Mon, 8 Jun 2015 05:50:30 +0000 (01:50 -0400)]
6203: Apply special case only to a 0-byte manifest: don't ignore white space.
Tom Clegg [Mon, 8 Jun 2015 05:49:06 +0000 (01:49 -0400)]
6203: Eliminate unneeded variable.
Tom Clegg [Mon, 8 Jun 2015 05:48:45 +0000 (01:48 -0400)]
6203: Merge pdh validations into one method. Update comments. Add tests.
radhika [Tue, 9 Jun 2015 01:42:30 +0000 (21:42 -0400)]
6203: Use LOCATOR_REGEXP from sdk; also add back updated each_manifest_locator method
with the same updates as munge method since this is simpler than the munge method. This
seems to have further improved performance.
radhika [Mon, 8 Jun 2015 19:39:37 +0000 (15:39 -0400)]
6203: Use manifest.each_line and line.rstrip! instead of manifest.split("\n").
Performance was comparable in both cases; though each_line itself is twice as fast (we need to do strip, which eats away the gain).
radhika [Mon, 8 Jun 2015 18:32:42 +0000 (14:32 -0400)]
6203: further optimization of munge method; also, match[0].sub(/\+A[^+]*/, '') instead of split+append
resulted in another 2 seconds saving in each of create and update operations!
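A Python analogue of the `sub(/\+A[^+]*/, '')` call is shown below for illustration. The helper name `strip_signature` and the sample locator are hypothetical.

```python
import re

SIGNATURE_HINT = re.compile(r"\+A[^+]*")


def strip_signature(locator):
    """Remove the first +A... hint, leaving other hints untouched."""
    # count=1 mirrors Ruby's String#sub, which replaces one occurrence.
    return SIGNATURE_HINT.sub("", locator, count=1)


print(strip_signature("acbd18db4cc2f85cedef654fccc4a4d8+3+Asig@55555555+Zfoo"))
```

A single substitution avoids building an intermediate array of hints and joining it back together, which is the kind of saving the commit message reports.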
radhika [Mon, 8 Jun 2015 16:24:53 +0000 (12:24 -0400)]
6203: compute_pdh, computed_pdh etc etc etc confusion. clean up to make it easier to follow.
radhika [Sat, 6 Jun 2015 23:36:13 +0000 (19:36 -0400)]
6203: Benchmarking revealed that regexp.match(string) is 2.5x more expensive than string =~ regexp. Updated check_signatures method accordingly.
radhika [Sat, 6 Jun 2015 22:59:33 +0000 (18:59 -0400)]
Merge branch 'master' into 6203-collection-perf-api
Nico Cesar [Fri, 5 Jun 2015 18:54:36 +0000 (14:54 -0400)]
bundler 1.10 breaks workbench build because
https://github.com/lucasefe/themes_for_rails/blob/master/themes_for_rails.gemspec
has a bug. And it's a NOTICE in 1.9.9 but it's a FATAL in 1.10.x
this is our workaround.
no issue #
Tom Clegg [Fri, 5 Jun 2015 17:00:31 +0000 (13:00 -0400)]
Merge branch '6146-log-squeue-lost-tasks' refs #6146
Tom Clegg [Thu, 4 Jun 2015 21:09:59 +0000 (17:09 -0400)]
6146: Document how --steps really works. Simplify squeue output format and parsing.
Tom Clegg [Thu, 4 Jun 2015 19:29:35 +0000 (15:29 -0400)]
Merge branch '6074-collections-index' closes #6074
radhika [Thu, 4 Jun 2015 18:10:33 +0000 (14:10 -0400)]
Merge branch 'master' into 6093-refresh-docs
radhika [Thu, 4 Jun 2015 18:09:39 +0000 (14:09 -0400)]
6093: Add button-override css to make any buttons added inside the documentation appear unclickable, to avoid any confusion.
radhika [Thu, 4 Jun 2015 16:51:26 +0000 (12:51 -0400)]
6203: Corrected one dumb switched order of if conditions that caused 5s lag!!
radhika [Thu, 4 Jun 2015 16:16:55 +0000 (12:16 -0400)]
Merge branch 'master' into 6203-collection-perf-api
Tom Clegg [Thu, 4 Jun 2015 15:21:43 +0000 (11:21 -0400)]
6087: Fix MissingAttribute check, and change it to a debug warning for now
because it reveals too many bugs. refs #6087
Tom Clegg [Thu, 4 Jun 2015 14:38:28 +0000 (10:38 -0400)]
6087: Fix MissingAttribute firing for new records during changes_applied. refs #6087
radhika [Thu, 4 Jun 2015 14:27:04 +0000 (10:27 -0400)]
Merge branch 'master' into 6203-collection-perf-api
Conflicts:
apps/workbench/test/integration_performance/collection_unit_test.rb
apps/workbench/test/integration_performance/collections_controller_test.rb
services/api/test/integration/collections_performance_test.rb
services/api/test/unit/collection_performance_test.rb
Tom Clegg [Thu, 4 Jun 2015 13:37:59 +0000 (09:37 -0400)]
Merge branch '6087-collection-timing' closes #6087
Tom Clegg [Wed, 3 Jun 2015 18:50:03 +0000 (14:50 -0400)]
6087: If attributes are accessed but not loaded due to select(), raise instead of returning nil/{}/[].
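The idea of raising rather than silently returning an empty value can be sketched in Python as follows. The `Record` class and its constructor arguments are hypothetical; only the exception name echoes the MissingAttribute check mentioned in this log.

```python
class MissingAttribute(AttributeError):
    """Raised when a field was excluded by select() and then read."""


class Record:
    def __init__(self, all_fields, loaded):
        self._all_fields = set(all_fields)
        self._loaded = dict(loaded)

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails, so the private
        # attributes set in __init__ normally never reach this method.
        if name.startswith("_"):
            raise AttributeError(name)
        if name in self._loaded:
            return self._loaded[name]
        if name in self._all_fields:
            raise MissingAttribute("%s was not selected" % name)
        raise AttributeError(name)
```

Failing loudly makes a caller that forgot to select a field visible at once, instead of letting it treat a missing manifest as an empty one.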
Tom Clegg [Wed, 3 Jun 2015 16:05:11 +0000 (12:05 -0400)]
6087: Remove unneeded CollectionsController#update special case.
ArvadosBase#save now covers the general case of omitting unchanged attributes.
Tom Clegg [Tue, 2 Jun 2015 19:23:51 +0000 (15:23 -0400)]
6087: Reset changed-attrs list after saving. Fix only-send-changed-attrs logic. Add tests.
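The two fixes in this commit, sending only changed attributes and clearing the change list after a save, can be illustrated with a small class. `TrackedRecord` and the `send` callback are hypothetical and do not reflect ArvadosBase's interface.

```python
class TrackedRecord:
    def __init__(self, **attrs):
        self._attrs = dict(attrs)
        self._changed = set()

    def set(self, name, value):
        # Only record a change when the value actually differs.
        if self._attrs.get(name) != value:
            self._attrs[name] = value
            self._changed.add(name)

    def save(self, send):
        """Send only the changed attributes, then forget the changes."""
        payload = {name: self._attrs[name] for name in self._changed}
        send(payload)
        self._changed.clear()  # without this, the next save would resend them
        return payload
```

For a collection with a large manifest_text, omitting the unchanged manifest from an update that only renames the collection keeps the request small.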
Tom Clegg [Wed, 3 Jun 2015 16:00:19 +0000 (12:00 -0400)]
6087: Strengthen "manifest_text is not lost in update" test.
Tom Clegg [Mon, 25 May 2015 17:25:22 +0000 (13:25 -0400)]
6087: Add big-manifest tests, with some finer-grained performance numbers on stderr.