Tim Pierce [Fri, 12 Sep 2014 19:18:15 +0000 (15:18 -0400)]
Bug fix: manifests with extra spaces
Extend the regular expression that matches manifest_text to permit more than
a single space in manifest entries (seen in e.g. 91534558193f42a2f7f8aca872e5a78d+15723).
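As a rough illustration of the change described above, here is a minimal sketch of a manifest-line pattern that tolerates runs of spaces between tokens. The pattern and the name `is_valid_manifest_line` are hypothetical and simplified; this is not the regular expression from the commit.

```python
import re

# Hypothetical, simplified pattern (an assumption, not the project's regex).
# Tokens are separated by " +" (one or more spaces) instead of a single " ".
MANIFEST_LINE = re.compile(
    r"^\S+"                                # stream name, e.g. "." or "./dir"
    r"(?: +[0-9a-f]{32}\+\d+(?:\+\S+)*)+"  # one or more block locators
    r"(?: +\d+:\d+:\S+)+"                  # one or more position:size:name tokens
    r" *$"
)


def is_valid_manifest_line(line):
    """Return True if the line looks like a manifest stream line."""
    return MANIFEST_LINE.match(line) is not None
```

With this sketch, a line such as `.  37b51d194a7513e45b56f6524f2d51f2+3  0:3:bar.txt` (two spaces between tokens) is accepted just like its single-space form.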
Brett Smith [Thu, 11 Sep 2014 15:21:36 +0000 (11:21 -0400)]
Clean up manifest whitespace in split-fastq.
The previous code was generating two spaces between the stream name
and block list, because it had the space from as_manifest() as well as
its own join. This yielded an invalid manifest.
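The double-space problem described above can be avoided by collecting all tokens first and joining them exactly once. The following is a minimal sketch under that assumption; `manifest_line` is a hypothetical helper and does not reproduce as_manifest() or the split-fastq code.

```python
def manifest_line(stream_name, locators, file_tokens):
    """Build one manifest line with exactly one space between tokens.

    Hypothetical helper for illustration only.
    """
    # Strip each piece so a trailing space supplied by a caller cannot
    # combine with the separator added by join().
    tokens = [stream_name.strip()]
    tokens.extend(token.strip() for token in locators)
    tokens.extend(token.strip() for token in file_tokens)
    return " ".join(tokens) + "\n"
```

Even if the stream name arrives as `". "` with a trailing space, the result contains a single space before the block list.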
Brett Smith [Wed, 10 Sep 2014 21:00:21 +0000 (17:00 -0400)]
Prevent PySDK CollectionReader from sending UUIDs to Keep.
No issue #. I tickled this while I was working on #3147. Ward saw
the apparent symptom come up in a Job log, so I'm pushing the fix at
his request. I have a test prepared in my branch, along with a bunch
of test infrastructure.
Brett Smith [Wed, 10 Sep 2014 18:22:01 +0000 (14:22 -0400)]
3846: Improve timeout handling in PySDK KeepClient.
* Catch socket errors (including timeouts) and treat them as
transient with regard to retry logic.
* Increase the default timeout to 5 minutes. Given how long it can
take to PUT 64MiB to a proxy and wait for two servers to return
success, this seems like a reasonable default. Future improvements
could set different timeouts based on the request type and whether
or not we're talking to a proxy.
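The two points above can be sketched as follows. This is not the KeepClient implementation; `call_with_retries`, `DEFAULT_TIMEOUT`, and the retry parameters are hypothetical names chosen for illustration. Only the five-minute value comes from the commit message.

```python
import socket
import time

# Five minutes, as stated in the commit message; the constant name is assumed.
DEFAULT_TIMEOUT = 300


def call_with_retries(request, num_retries=2, delay=0.0):
    """Call request() and retry when it raises a socket error.

    socket.timeout is a subclass of socket.error, so timeouts are
    treated as transient along with other socket-level failures.
    """
    last_error = None
    for attempt in range(num_retries + 1):
        try:
            return request()
        except socket.error as error:
            last_error = error
            if attempt < num_retries:
                time.sleep(delay)
    # Every attempt failed: surface the most recent socket error.
    raise last_error
```

A caller would pass a zero-argument callable that performs one request, for example a function that opens a connection with `timeout=DEFAULT_TIMEOUT`.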
Brett Smith [Wed, 10 Sep 2014 16:39:58 +0000 (12:39 -0400)]
3842: Keep::Manifest concatenates file information from manifest.
The previous implementation failed to consider the possibility that
file information would be spread across multiple lines of a manifest.
This would cause, e.g., the same file to be yielded many times from
each_file.
This requirement makes it impossible to return file size information
without parsing the entire manifest. Because of that, I have reworked
the Ruby SDK API so that method names are more consistent with their
performance characteristics. I have also added some methods to do
some basic file existence checking that do not require parsing the
whole manifest.
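To illustrate why file sizes require a full pass over the manifest, here is a minimal Python sketch that sums the sizes of file tokens sharing a path across all lines. It is not the Ruby SDK code; `file_sizes` is a hypothetical function, and it ignores details such as escaped characters in names.

```python
from collections import OrderedDict


def file_sizes(manifest_text):
    """Return an ordered mapping of file path to total size in bytes.

    Segments of the same file may appear on several manifest lines,
    so sizes are accumulated per path over the entire manifest.
    """
    sizes = OrderedDict()
    for line in manifest_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        stream_name = tokens[0]
        for token in tokens[1:]:
            parts = token.split(":", 2)
            # File tokens look like position:size:name; skip block locators.
            if len(parts) != 3 or not (parts[0].isdigit() and parts[1].isdigit()):
                continue
            path = stream_name + "/" + parts[2]
            sizes[path] = sizes.get(path, 0) + int(parts[1])
    return sizes
```

A file that appears on two lines is reported once, with the combined size, rather than being yielded once per line.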
Peter Amstutz [Tue, 9 Sep 2014 15:53:48 +0000 (11:53 -0400)]
3453: Rename check_project_exists to desired_project_uuid. Now raises and
catches distinct apiclient.errors.Error, ValueError errors for project not
found or invalid uuid.
Peter Amstutz [Mon, 8 Sep 2014 20:54:49 +0000 (16:54 -0400)]
3453: Refactored arv-put to remove support for name links, correctly use
ensure_name_unique to prevent name collisions. arv-keepdocker should now
correctly handle cases where the user provides an image hash instead of
repository/tag. Fixed tests.
Peter Amstutz [Mon, 8 Sep 2014 14:53:32 +0000 (10:53 -0400)]
Added create#ensure_unique_name to discovery document. "Add a subproject"
button now uses "ensure_unique_name" to avoid errors when the user creates more
than one project called "New project". refs #3822
Peter Amstutz [Fri, 5 Sep 2014 20:23:38 +0000 (16:23 -0400)]
3822: Added 'ensure_unique_name' option to #create method for API server to
choose a unique name when there is a name collision in the database.
arv-run-pipeline-instance checks to see if there is an output file with the
same name and contents, and uses 'ensure_unique_name' when creating collection.
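The idea of choosing a unique name on collision can be sketched as below. The suffix format and the function `unique_name` are assumptions made for illustration; the commit message does not say how the API server derives the new name.

```python
def unique_name(desired, existing_names):
    """Return desired if it is unused, otherwise add a numeric suffix.

    The " (N)" suffix scheme is an assumption, not taken from the commit.
    """
    if desired not in existing_names:
        return desired
    counter = 2
    while True:
        candidate = "{} ({})".format(desired, counter)
        if candidate not in existing_names:
            return candidate
        counter += 1
```

For example, creating a second "New project" under this scheme would yield "New project (2)" instead of an error.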
Brett Smith [Wed, 3 Sep 2014 21:52:00 +0000 (17:52 -0400)]
3720: Limit Workbench file rendering for large Collections.
Rendering too many files can cause rendering to take too long, and
there's not much point because it can really strain browsers too.
Arbitrarily cap rendering at 10,000 files.
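A cap like the one described above can be applied without materializing the full file list. This is a minimal Python sketch, not the Workbench code; `files_to_render` and `MAX_RENDERED_FILES` are hypothetical names, and only the 10,000 figure comes from the commit message.

```python
from itertools import islice

# The cap from the commit message; the constant name is assumed.
MAX_RENDERED_FILES = 10000


def files_to_render(files, limit=MAX_RENDERED_FILES):
    """Return (shown, truncated) for an iterable of files.

    shown holds at most limit items; truncated is True when the
    iterable had more items than the limit.
    """
    iterator = iter(files)
    shown = list(islice(iterator, limit))
    # Use a sentinel so a legitimate None item is not mistaken for "no more".
    sentinel = object()
    truncated = next(iterator, sentinel) is not sentinel
    return shown, truncated
```

The `truncated` flag lets a view tell the user that some files were left out.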
Brett Smith [Fri, 29 Aug 2014 21:37:13 +0000 (17:37 -0400)]
3720: Refactor manifest parsing from API server to Ruby SDK.
To date, the API server has been parsing manifests and returning
parsed information in the files attribute. That's convenient, but
causes performance problems in large Collections. Profiling indicates
that most of the API server's response time is spent in the
as_api_response method; rendering the same information twice really
hurts.
This commit removes the API server's need to always parse manifests,
and moves the parsing code to the Ruby SDK. Both the API server and
Workbench use this Gem to parse manifests on an as-needed basis.
This, combined with improvements to the new Keep::Manifest class, lays
the groundwork for future performance gains.
Brett Smith [Wed, 3 Sep 2014 18:57:25 +0000 (14:57 -0400)]
3720: Clean up+skip Ruby SDK tests.
First I changed these to get them passing again. With that done, I
added the skip. We're not currently running these tests, and getting
them going the right way again (e.g., with run_test_server.py or
mocks) is out of scope for my current task.
Peter Amstutz [Fri, 5 Sep 2014 14:16:17 +0000 (10:16 -0400)]
3453: Now lists images by default if called with no parameters. Fixed reported
timestamps on images without a 'timestamp' property. Added 'Docker image'
prefix to default name. Renamed variables in list_images_in_arv() to be more
friendly.
Brett Smith [Thu, 4 Sep 2014 20:05:57 +0000 (16:05 -0400)]
3704: Treat project filters consistently in Workbench chooser.
Before this, a chooser that was loaded on a specific project would
return empty results if the user selected a different project. This
is because it would search for items with two different owner_uuids,
both the project selected at load time, and the project the user
selected later. This patch fixes that by separating the project
filter from other filters, and preseeding it in the same place where
it's updated by the project selection AJAX. Closes #3704, #3778.
Tim Pierce [Thu, 4 Sep 2014 18:55:15 +0000 (14:55 -0400)]
3663: update test_file_reader unit test
Because StreamFileReader.read() calls are now aligned on block
boundaries, the unit test needs to take that into account when testing
the results of individual reads.
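One way to picture block-aligned reads is a read that never crosses a block boundary, so a single call may return fewer bytes than requested. The sketch below assumes that interpretation; `read_aligned` is a hypothetical function and not StreamFileReader.read().

```python
def read_aligned(blocks, position, size):
    """Return up to size bytes starting at position.

    The result never spans two blocks, so it may be shorter than size.
    blocks is a list of bytes objects; this layout is an assumption.
    """
    offset = 0
    for block in blocks:
        if position < offset + len(block):
            start = position - offset
            return block[start:start + size]
        offset += len(block)
    # Position is at or past the end of the data.
    return b""
```

Under this model a test has to compare each individual read against the remainder of the current block, not against the requested size.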
Peter Amstutz [Thu, 4 Sep 2014 18:22:25 +0000 (14:22 -0400)]
3644: Added choose-your-own-adventure README files to the --all and --by-id
directories. Regular expressions are now compiled once at the top for
efficiency and readability.
3710: update the combine_selected_files_into_collection method to also handle the scenarios where a collection uuid or "collection uuid/filename" is passed.
Peter Amstutz [Thu, 4 Sep 2014 17:31:26 +0000 (13:31 -0400)]
3644: Change metafiles to .arvados#collection and .arvados#project. Change
--by-hash to be --by-id since it works with both uuids and portable data
hashes.
Peter Amstutz [Wed, 3 Sep 2014 18:10:13 +0000 (14:10 -0400)]
3644: Re-added support for files with json contents of arvados objects.
Changed block size reported by getattr() to use 512-byte blocks, which seems to
be necessary to get 'du' to compute the right results.
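Tools such as 'du' commonly multiply the reported block count by 512 bytes, so the count has to be expressed in that unit. A minimal sketch of the arithmetic follows; `st_blocks` is a hypothetical helper and not the FUSE driver's getattr().

```python
def st_blocks(size_in_bytes, unit=512):
    """Return the number of unit-sized blocks needed for size_in_bytes.

    Rounds up, so a partially filled final block still counts.
    """
    return (size_in_bytes + unit - 1) // unit
```

For a 513-byte file this gives 2 blocks, which corresponds to 1024 bytes of reported disk usage.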
Peter Amstutz [Tue, 2 Sep 2014 19:59:51 +0000 (15:59 -0400)]
Delete names and description columns from jobs that shouldn't be there. Delete
jobs_owner_uuid_name_unique and pipeline_instance_owner_uuid_name_unique
indexes added by mistake. refs #3036.
Peter Amstutz [Tue, 2 Sep 2014 18:56:07 +0000 (14:56 -0400)]
3453: Add support for --images to get a list of available images. Add support
for --project-uuid to place the image and tags in the specified project. Set a
more useful name for the collection containing the image.
refs #3637 - merge Tom's updates to chooser implementation, where selected items are passed as parameters.
Merge branch '3637-selection-through-chooser'
Peter Amstutz [Fri, 29 Aug 2014 20:35:43 +0000 (16:35 -0400)]
3644: Project and Home directory work. Added support for returning correct
timestamps on projects, collections, files. Should update
CollectionDirectory if the underlying collection record changes. Fixed bugs.