---
layout: default
navsection: api
navmenu: API Methods
title: "collections"

...
{% comment %}
Copyright (C) The Arvados Authors. All rights reserved.

SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}

API endpoint base: @https://{{ site.arvados_api_host }}/arvados/v1/collections@

Object type: @4zz18@

Example UUID: @zzzzz-4zz18-0123456789abcde@

h2. Resource

Collections describe sets of files in terms of data blocks stored in Keep.  See "Keep - Content-Addressable Storage":{{site.baseurl}}/architecture/storage.html and "using collection versioning":../../user/topics/collection-versioning.html for details.

Each collection has, in addition to the "Common resource fields":{{site.baseurl}}/api/resources.html:

table(table table-bordered table-condensed).
|_. Attribute|_. Type|_. Description|_. Example|
|name|string|||
|description|text|Free text description of the group.  Allows "HTML formatting.":{{site.baseurl}}/api/resources.html#descriptions||
|properties|hash|User-defined metadata, may be used in queries using "subproperty filters":{{site.baseurl}}/api/methods.html#subpropertyfilters ||
|portable_data_hash|string|The MD5 sum of the manifest text stripped of block hints other than the size hint.||
|manifest_text|text|The manifest describing how to assemble blocks into files, in the "Arvados manifest format":{{site.baseurl}}/architecture/manifest-format.html||
|replication_desired|number|Minimum storage replication level desired for each data block referenced by this collection. A value of @null@ signifies that the site default replication level (typically 2) is desired.|@2@|
|replication_confirmed|number|Replication level most recently confirmed by the storage system. This field is null when a collection is first created, and is reset to null when the manifest_text changes in a way that introduces a new data block. An integer value indicates the replication level of the _least replicated_ data block in the collection.|@2@, null|
|replication_confirmed_at|datetime|When @replication_confirmed@ was confirmed. If @replication_confirmed@ is null, this field is also null.||
|storage_classes_desired|list|An optional list of storage class names where the blocks should be saved. If not provided, the cluster's default storage class(es) will be set.|@['archival']@|
|storage_classes_confirmed|list|Storage classes most recently confirmed by the storage system. This field is an empty list when a collection is first created.|@'archival']@, @[]@|
|storage_classes_confirmed_at|datetime|When @storage_classes_confirmed@ was confirmed. If @storage_classes_confirmed@ is @[]@, this field is null.||
|trash_at|datetime|If @trash_at@ is non-null and in the past, this collection will be hidden from API calls.  May be untrashed.||
|delete_at|datetime|If @delete_at@ is non-null and in the past, the collection may be permanently deleted.||
|is_trashed|boolean|True if @trash_at@ is in the past, false if not.||
|current_version_uuid|string|UUID of the collection's current version. On new collections, it'll be equal to the @uuid@ attribute.||
|version|number|Version number, starting at 1 on new collections. This attribute is read-only.||
|preserve_version|boolean|When set to true on a current version, it will be persisted. When passing @true@ as part of a bigger update call, both current and newly created versions are persisted.||
|file_count|number|The total number of files in the collection. This attribute is read-only.||
|file_size_total|number|The sum of the file sizes in the collection. This attribute is read-only.||

h3. Conditions of creating a Collection

If a new @portable_data_hash@ is specified when creating or updating a Collection, it must match the cryptographic digest of the supplied @manifest_text@.

h3. Side effects of creating a Collection

Referenced blocks are protected from garbage collection in Keep.

Data can be shared with other users via the Arvados permission model.

h3(#trashing). Trashing collections

Collections can be trashed by updating the record and setting the @trash_at@ field, or with the "delete":#delete method.  The delete method sets @trash_at@ to "now".

The value of @trash_at@ can be set to a time in the future as a feature to automatically expire collections.

When @trash_at@ is set, @delete_at@ will also be set.  Normally @delete_at = trash_at + Collections.DefaultTrashLifetime@.  When the @trash_at@ time is past but @delete_at@ is in the future, the trashed collection is invisible to most API calls unless the @include_trash@ parameter is true.  Collections in the trashed state can be "untrashed":#untrash so long as @delete_at@ has not past.  Collections are also trashed if they are contained in a "trashed group":groups.html#trashing

Once @delete_at@ is past, the collection and all of its previous versions will be deleted permanently and can no longer be untrashed.

h3(#replace_files). Using "replace_files" to create or update a collection

The @replace_files@ option can be used with the "create":#create and "update":#update APIs to efficiently and atomically copy individual files and directory trees from other collections, copy/rename/delete items within an existing collection, and add new items to a collection.

@replace_files@ keys indicate target paths in the new collection, and values specify sources that should be copied to the target paths.
* Each target path must be an absolute canonical path beginning with @/@. It must not contain @.@ or @..@ components, consecutive @/@ characters, or a trailing @/@ after the final component.
* Each source must be one of the following:
** an empty string (signifying that the target path is to be deleted),
** @<PDH>/<path>@ where @<PDH>@ is the portable data hash of a collection on the cluster and @<path>@ is a file or directory in that collection,
** @manifest_text/<path>@ where @<path>@ is an existing file or directory in a collection supplied in the @manifest_text@ attribute in the request, or
** @current/<path>@ where @<path>@ is an existing file or directory in the collection being updated.

In an @update@ request, sources may reference the current portable data hash of the collection being updated. However, in many cases it is more appropriate to use a @current/<path>@ source instead, to ensure the latest content is used even if the collection has been updated since the PDH was last retrieved.

h4(#replace_files-delete). Delete a file

Delete @foo.txt@.

<notextile><pre>
"replace_files": {
  "/foo.txt": ""
}
</pre></notextile>

h4(#replace_files-rename). Rename a file

Rename @foo.txt@ to @bar.txt@.

<notextile><pre>
"replace_files": {
  "/foo.txt": "",
  "/bar.txt": "current/foo.txt"
}
</pre></notextile>

h4(#replace_files-swap). Swap files

Swap contents of files @foo@ and @bar@.

<notextile><pre>
"replace_files": {
  "/foo": "current/bar",
  "/bar": "current/foo"
}
</pre></notextile>

h4(#replace_files-add). Add a file

<notextile><pre>
"replace_files": {
  "/new_directory/new_file.txt": "manifest_text/new_file.txt"
},
"collection": {
  "manifest_text": ". acbd18db4cc2f85cedef654fccc4a4d8+3+A82740cd577ff5745925af5780de5992cbb25d937@668efec4 0:3:new_file.txt\n"
}
</pre></notextile>

h4(#replace_files-replace). Replace all content with new content

Note this is equivalent to omitting the @replace_files@ argument.

<notextile><pre>
"replace_files": {
  "/": "manifest_text/"
},
"collection": {
  "manifest_text": "./new_directory acbd18db4cc2f85cedef654fccc4a4d8+3+A82740cd577ff5745925af5780de5992cbb25d937@668efec4 0:3:new_file.txt\n"
}
</pre></notextile>

h4(#replace_files-rename-and-replace). Atomic rename and replace

Rename @current_file.txt@ to @old_file.txt@ and replace @current_file.txt@ with new content, all in a single atomic operation.

<notextile><pre>
"replace_files": {
  "/current_file.txt": "manifest_text/new_file.txt",
  "/old_file.txt": "current/current_file.txt"
},
"collection": {
  "manifest_text": ". acbd18db4cc2f85cedef654fccc4a4d8+3+A82740cd577ff5745925af5780de5992cbb25d937@668efec4 0:3:new_file.txt\n"
}
</pre></notextile>

h4(#replace_files-combine). Combine collections

Delete all current content, then copy content from other collections into new subdirectories.

<notextile><pre>
"replace_files": {
  "/": "",
  "/copy of collection 1": "1f4b0bc7583c2a7f9102c395f4ffc5e3+45/",
  "/copy of collection 2": "ea10d51bcf88862dbcc36eb292017dfd+45/"
}
</pre></notextile>

h4(#replace_files-extract-subdirectory). Extract a subdirectory

Replace all current content with a copy of a subdirectory from another collection.

<notextile><pre>
"replace_files": {
  "/": "1f4b0bc7583c2a7f9102c395f4ffc5e3+45/subdir"
}
</pre></notextile>

h4(#replace_files-usage-restrictions). Usage restrictions

A target path with a non-empty source cannot be the ancestor of another target path in the same request. For example, the following request is invalid:

<notextile><pre>
"replace_files": {
  "/foo": "fa7aeb5140e2848d39b416daeef4ffc5+45/",
  "/foo/this_will_return_an_error": ""
}
</pre></notextile>

It is an error to supply a non-empty @manifest_text@ that is unused, i.e., the @replace_files@ argument does not contain any values beginning with @"manifest_text/"@. For example, the following request is invalid:

<notextile><pre>
"replace_files": {
  "/foo": "current/bar"
},
"collection": {
  "manifest_text": ". acbd18db4cc2f85cedef654fccc4a4d8+3+A82740cd577ff5745925af5780de5992cbb25d937@668efec4 0:3:new_file.txt\n"
}
</pre></notextile>

Collections on other clusters in a federation cannot be used as sources. Each source must exist on the current cluster and be readable by the current user.

Similarly, if @manifest_text@ is provided, it must only reference data blocks that are stored on the current cluster. This API does not copy data from other clusters in a federation.

h3(#replace_segments). Using "replace_segments" to repack file data

The @replace_segments@ option can be used with the "create":#create or "update":#update API to atomically apply a new file packing, typically with the goal of replacing a number of small blocks with one larger block. The repacking is specified in terms of _block segments_: a block segment is a portion of a stored block that is referenced by a file in a manifest.

@replace_segments@ keys indicate existing block segments in the collection, and values specify replacement segments.
* Each segment is specified as space-separated tokens: @"locator offset length"@ where @locator@ is a signed block locator and @offset@ and @length@ are decimal-encoded integers specifying a portion of the block that is referenced in the collection.
* Each replacement block locator must be properly signed (just as if it appeared in a @manifest_text@).
* Each existing block segment must correspond to an entire contiguous portion of a block referenced by a single file (splitting existing segments is not supported).
* If a segment to be replaced does not match any existing block segment in the manifest, that segment _and all other @replace_segments@ entries referencing the same replacement block_ will be skipped. Other replacements will still be applied. Replacements that are skipped for this reason do not cause the request to fail. This rule ensures that when concurrent clients compute different repackings and request similar replacements such as @a,b,c,d,e → X@ and @a,b,c,d,e,f → Y@, the resulting manifest references @X@ or @Y@ but not both. Otherwise, the effect could be @a,b,c,d,e → X, f → Y@ where @Y@ is just an inefficient way to reference the same data as @f@.

The @replace_files@ and @manifest_text@ options, if present, are applied before @replace_segments@. This means @replace_segments@ can apply to blocks from @manifest_text@ and/or other collections referenced by @replace_files@.

In the following example, two files were originally saved by writing two small blocks (@c410@ and @c93e@). After concatenating the two small blocks and writing a single larger block @ca9c@, the manifest is being updated to reference the larger block.

<notextile><pre>
"collection": {
  "manifest_text": ". c4103f122d27677c9db144cae1394a66+2+A3d02f1f3d8a622b2061ad5afe4853dbea42039e2@674dd351 693e9af84d3dfcc71e640e005bdc5e2e+3+A6528480b63d90a24b60b2ee2409040f050cc5d0c@674dd351 0:2:file1.txt 2:3:file2.txt\n"
},
"replace_segments": {
  "c4103f122d27677c9db144cae1394a66+2+A3d02f1f3d8a622b2061ad5afe4853dbea42039e2@674dd351 0 2": "ca9c491ac66b2c62500882e93f3719a8+5+A312fea6de5807e9e77d844450d36533a599c40f1@674dd351 0 2",
  "693e9af84d3dfcc71e640e005bdc5e2e+3+A6528480b63d90a24b60b2ee2409040f050cc5d0c@674dd351 0 3": "ca9c491ac66b2c62500882e93f3719a8+5+A312fea6de5807e9e77d844450d36533a599c40f1@674dd351 2 3"
}
</pre></notextile>

Resulting manifest:

<notextile><pre>
. ca9c491ac66b2c62500882e93f3719a8+5+A312fea6de5807e9e77d844450d36533a599c40f1@674dd351 0:2:file1.txt 2:3:file2.txt
</pre></notextile>

h2. Methods

See "Common resource methods":{{site.baseurl}}/api/methods.html for more information about @create@, @delete@, @get@, @list@, and @update@.

Required arguments are displayed in %{background:#ccffcc}green%.

Supports federated @get@ only, which may be called with either a uuid or a portable data hash.  When requesting a portable data hash which is not available on the home cluster, the query is forwarded to all the clusters listed in @RemoteClusters@ and returns the first successful result.

h3(#create). create

Create a new Collection.

Arguments:

table(table table-bordered table-condensed).
|_. Argument |_. Type |_. Description |_. Location |_. Example |
|collection|object||query||
|replace_files|object|Initialize files and directories with new content and/or content from other collections|query||
|replace_segments|object|Repack the collection by substituting data blocks|query||

The new collection's content can be initialized by providing a @manifest_text@ key in the provided @collection@ object, or by "using the @replace_files@ option":#replace_files.

An alternative file packing can be applied atomically "using the @replace_segments@ option":#replace_segments.

h3(#delete). delete

Put a Collection in the trash.  This sets the @trash_at@ field to @now@ and @delete_at@ field to @now@ + token TTL.  A trashed collection is invisible to most API calls unless the @include_trash@ parameter is true.

Arguments:

table(table table-bordered table-condensed).
|_. Argument |_. Type |_. Description |_. Location |_. Example |
{background:#ccffcc}.|uuid|string|The UUID of the Collection in question.|path||

h3. get

Gets a Collection's metadata by UUID or portable data hash.  When making a request by portable data hash, attributes other than @portable_data_hash@, @manifest_text@, and @trash_at@ are not returned, even when requested explicitly using the @select@ parameter.

Arguments:

table(table table-bordered table-condensed).
|_. Argument |_. Type |_. Description |_. Location |_. Example |
{background:#ccffcc}.|uuid|string|The UUID or portable data hash of the Collection in question.|path||

h3. list

List collections.

See "common resource list method.":{{site.baseurl}}/api/methods.html#index

table(table table-bordered table-condensed).
|_. Argument |_. Type |_. Description |_. Location |_. Example |
|include_trash|boolean (default false)|Include trashed collections.|query||
|include_old_versions|boolean (default false)|Include past versions of the collection(s) being listed, if any.|query||

Note: Because adding access tokens to manifests can be computationally expensive, the @manifest_text@ field is not included in results by default.  If you need it, pass a @select@ parameter that includes @manifest_text@.

h4. Searching Collections for names of file or directories

You can search collections for specific file or directory names (whole or part) using the following filter in a @list@ query.

<pre>
filters: [["file_names", "ilike", "%sample1234.fastq%"]]
</pre>

Note: @file_names@ is a hidden field used for indexing.  It is not returned by any API call.  On the client, you can programmatically enumerate all the files in a collection using @arv-ls@, the Python SDK @Collection@ class, Go SDK @FileSystem@ struct, the WebDAV API, or the S3-compatible API.

As of this writing (Arvados 2.4), you can also search for directory paths, but _not_ complete file paths.

In other words, this will work (when @dir3@ is a directory):

<pre>
filters: [["file_names", "ilike", "%dir1/dir2/dir3%"]]
</pre>

However, this will _not_ return the desired results (where @sample1234.fastq@ is a file):

<pre>
filters: [["file_names", "ilike", "%dir1/dir2/dir3/sample1234.fastq%"]]
</pre>

As a workaround, you can search for both the directory path and file name separately, and then filter on the client side.

<pre>
filters: [["file_names", "ilike", "%dir1/dir2/dir3%"], ["file_names", "ilike", "%sample1234.fastq%"]]
</pre>

h3(#update). update

Update attributes of an existing Collection.

Arguments:

table(table table-bordered table-condensed).
|_. Argument |_. Type |_. Description |_. Location |_. Example |
{background:#ccffcc}.|uuid|string|The UUID of the Collection in question.|path||
|collection|object||query||
|replace_files|object|Add, delete, and replace files and directories with new content and/or content from other collections|query||
|replace_segments|object|Repack the collection by substituting data blocks|query||

The collection's existing content can be replaced entirely by providing a @manifest_text@ key in the provided @collection@ object, or updated in place by "using the @replace_files@ option":#replace_files.

An alternative file packing can be applied atomically "using the @replace_segments@ option":#replace_segments.

h3(#untrash). untrash

Remove a Collection from the trash.  This sets the @trash_at@ and @delete_at@ fields to @null@.

Arguments:

table(table table-bordered table-condensed).
|_. Argument |_. Type |_. Description |_. Location |_. Example |
{background:#ccffcc}.|uuid|string|The UUID of the Collection to untrash.|path||
|ensure_unique_name|boolean (default false)|Rename collection uniquely if untrashing it would fail with a unique name conflict.|query||


h3. provenance

Returns a list of objects in the database that directly or indirectly contributed to producing this collection, such as the container request that produced this collection as output.

The general algorithm is:

# Visit the container request that produced this collection (via @output_uuid@ or @log_uuid@ attributes of the container request)
# Visit the input collections to that container request (via @mounts@ and @container_image@ of the container request)
# Iterate until there are no more objects to visit

Arguments:

table(table table-bordered table-condensed).
|_. Argument |_. Type |_. Description |_. Location |_. Example |
{background:#ccffcc}.|uuid|string|The UUID of the Collection to get provenance.|path||

h3. used_by

Returns a list of objects in the database this collection directly or indirectly contributed to, such as containers that takes this collection as input.

The general algorithm is:

# Visit containers that take this collection as input (via @mounts@ or @container_image@ of the container)
# Visit collections produced by those containers (via @output@ or @log@ of the container)
# Iterate until there are no more objects to visit

Arguments:

table(table table-bordered table-condensed).
|_. Argument |_. Type |_. Description |_. Location |_. Example |
{background:#ccffcc}.|uuid|string|The UUID of the Collection to get usage.|path||