---
layout: default
navsection: api
navmenu: API Methods
title: "collections"

...
{% comment %}
Copyright (C) The Arvados Authors. All rights reserved.

SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}

API endpoint base: @https://{{ site.arvados_api_host }}/arvados/v1/collections@

Object type: @4zz18@

Example UUID: @zzzzz-4zz18-0123456789abcde@

h2. Resource

Collections describe sets of files in terms of data blocks stored in Keep.  See "Keep - Content-Addressable Storage":{{site.baseurl}}/architecture/storage.html and "using collection versioning":../../user/topics/collection-versioning.html for details.

Each collection has, in addition to the "Common resource fields":{{site.baseurl}}/api/resources.html:

table(table table-bordered table-condensed).
|_. Attribute|_. Type|_. Description|_. Example|
|name|string|||
|description|text|Free text description of the group.  May be HTML formatted, must be appropriately sanitized before display.||
|properties|hash|User-defined metadata, may be used in queries using "subproperty filters":{{site.baseurl}}/api/methods.html#subpropertyfilters ||
|portable_data_hash|string|The MD5 sum of the manifest text stripped of block hints other than the size hint.||
|manifest_text|text|The manifest describing how to assemble blocks into files, in the "Arvados manifest format":{{site.baseurl}}/architecture/manifest-format.html||
|replication_desired|number|Minimum storage replication level desired for each data block referenced by this collection. A value of @null@ signifies that the site default replication level (typically 2) is desired.|@2@|
|replication_confirmed|number|Replication level most recently confirmed by the storage system. This field is null when a collection is first created, and is reset to null when the manifest_text changes in a way that introduces a new data block. An integer value indicates the replication level of the _least replicated_ data block in the collection.|@2@, null|
|replication_confirmed_at|datetime|When @replication_confirmed@ was confirmed. If @replication_confirmed@ is null, this field is also null.||
|storage_classes_desired|list|An optional list of storage class names where the blocks should be saved. If not provided, the cluster's default storage class(es) will be set.|@['archival']@|
|storage_classes_confirmed|list|Storage classes most recently confirmed by the storage system. This field is an empty list when a collection is first created.|@'archival']@, @[]@|
|storage_classes_confirmed_at|datetime|When @storage_classes_confirmed@ was confirmed. If @storage_classes_confirmed@ is @[]@, this field is null.||
|trash_at|datetime|If @trash_at@ is non-null and in the past, this collection will be hidden from API calls.  May be untrashed.||
|delete_at|datetime|If @delete_at@ is non-null and in the past, the collection may be permanently deleted.||
|is_trashed|boolean|True if @trash_at@ is in the past, false if not.||
|current_version_uuid|string|UUID of the collection's current version. On new collections, it'll be equal to the @uuid@ attribute.||
|version|number|Version number, starting at 1 on new collections. This attribute is read-only.||
|preserve_version|boolean|When set to true on a current version, it will be persisted. When passing @true@ as part of a bigger update call, both current and newly created versions are persisted.||
|file_count|number|The total number of files in the collection. This attribute is read-only.||
|file_size_total|number|The sum of the file sizes in the collection. This attribute is read-only.||

h3. Conditions of creating a Collection

If a new @portable_data_hash@ is specified when creating or updating a Collection, it must match the cryptographic digest of the supplied @manifest_text@.

h3. Side effects of creating a Collection

Referenced blocks are protected from garbage collection in Keep.

Data can be shared with other users via the Arvados permission model.

h3(#trashing). Trashing collections

Collections can be trashed by updating the record and setting the @trash_at@ field, or with the "delete":#delete method.  The delete method sets @trash_at@ to "now".

The value of @trash_at@ can be set to a time in the future as a feature to automatically expire collections.

When @trash_at@ is set, @delete_at@ will also be set.  Normally @delete_at = trash_at + Collections.DefaultTrashLifetime@.  When the @trash_at@ time is past but @delete_at@ is in the future, the trashed collection is invisible to most API calls unless the @include_trash@ parameter is true.  Collections in the trashed state can be "untrashed":#untrash so long as @delete_at@ has not past.  Collections are also trashed if they are contained in a "trashed group":groups.html#trashing

Once @delete_at@ is past, the collection and all of its previous versions will be deleted permanently and can no longer be untrashed.

h3(#replace_files). Using "replace_files" to create or update a collection

The @replace_files@ option can be used with the "create":#create and "update":#update APIs to efficiently and atomically copy individual files and directory trees from other collections, copy/rename/delete items within an existing collection, and add new items to a collection.

@replace_files@ keys indicate target paths in the new collection, and values specify sources that should be copied to the target paths.
* Each target path must be an absolute canonical path beginning with @/@. It must not contain @.@ or @..@ components, consecutive @/@ characters, or a trailing @/@ after the final component.
* Each source must be one of the following:
** an empty string (signifying that the target path is to be deleted),
** @<PDH>/<path>@ where @<PDH>@ is the portable data hash of a collection on the cluster and @<path>@ is a file or directory in that collection,
** @manifest_text/<path>@ where @<path>@ is an existing file or directory in a collection supplied in the @manifest_text@ attribute in the request, or
** @current/<path>@ where @<path>@ is an existing file or directory in the collection being updated.

In an @update@ request, sources may reference the current portable data hash of the collection being updated. However, in many cases it is more appropriate to use a @current/<path>@ source instead, to ensure the latest content is used even if the collection has been updated since the PDH was last retrieved.

h4(#replace_files-delete). Delete a file

Delete @foo.txt@.

<notextile><pre>
"replace_files": {
  "/foo.txt": ""
}
</pre></notextile>

h4(#replace_files-rename). Rename a file

Rename @foo.txt@ to @bar.txt@.

<notextile><pre>
"replace_files": {
  "/foo.txt": "",
  "/bar.txt": "current/foo.txt"
}
</pre></notextile>

h4(#replace_files-swap). Swap files

Swap contents of files @foo@ and @bar@.

<notextile><pre>
"replace_files": {
  "/foo": "current/bar",
  "/bar": "current/foo"
}
</pre></notextile>

h4(#replace_files-add). Add a file

<notextile><pre>
"replace_files": {
  "/new_directory/new_file.txt": "manifest_text/new_file.txt"
},
"collection": {
  "manifest_text": ". acbd18db4cc2f85cedef654fccc4a4d8+3+A82740cd577ff5745925af5780de5992cbb25d937@668efec4 0:3:new_file.txt\n"
}
</pre></notextile>

h4(#replace_files-replace). Replace all content with new content

Note this is equivalent to omitting the @replace_files@ argument.

<notextile><pre>
"replace_files": {
  "/": "manifest_text/"
},
"collection": {
  "manifest_text": "./new_directory acbd18db4cc2f85cedef654fccc4a4d8+3+A82740cd577ff5745925af5780de5992cbb25d937@668efec4 0:3:new_file.txt\n"
}
</pre></notextile>

h4(#replace_files-rename-and-replace). Atomic rename and replace

Rename @current_file.txt@ to @old_file.txt@ and replace @current_file.txt@ with new content, all in a single atomic operation.

<notextile><pre>
"replace_files": {
  "/current_file.txt": "manifest_text/new_file.txt",
  "/old_file.txt": "current/current_file.txt"
},
"collection": {
  "manifest_text": ". acbd18db4cc2f85cedef654fccc4a4d8+3+A82740cd577ff5745925af5780de5992cbb25d937@668efec4 0:3:new_file.txt\n"
}
</pre></notextile>

h4(#replace_files-combine). Combine collections

Delete all current content, then copy content from other collections into new subdirectories.

<notextile><pre>
"replace_files": {
  "/": "",
  "/copy of collection 1": "1f4b0bc7583c2a7f9102c395f4ffc5e3+45/",
  "/copy of collection 2": "ea10d51bcf88862dbcc36eb292017dfd+45/"
}
</pre></notextile>

h4(#replace_files-extract-subdirectory). Extract a subdirectory

Replace all current content with a copy of a subdirectory from another collection.

<notextile><pre>
"replace_files": {
  "/": "1f4b0bc7583c2a7f9102c395f4ffc5e3+45/subdir"
}
</pre></notextile>

h4(#replace_files-usage-restrictions). Usage restrictions

A target path with a non-empty source cannot be the ancestor of another target path in the same request. For example, the following request is invalid:

<notextile><pre>
"replace_files": {
  "/foo": "fa7aeb5140e2848d39b416daeef4ffc5+45/",
  "/foo/this_will_return_an_error": ""
}
</pre></notextile>

It is an error to supply a non-empty @manifest_text@ that is unused, i.e., the @replace_files@ argument does not contain any values beginning with @"manifest_text/"@. For example, the following request is invalid:

<notextile><pre>
"replace_files": {
  "/foo": "current/bar"
},
"collection": {
  "manifest_text": ". acbd18db4cc2f85cedef654fccc4a4d8+3+A82740cd577ff5745925af5780de5992cbb25d937@668efec4 0:3:new_file.txt\n"
}
</pre></notextile>

Collections on other clusters in a federation cannot be used as sources. Each source must exist on the current cluster and be readable by the current user.

Similarly, if @manifest_text@ is provided, it must only reference data blocks that are stored on the current cluster. This API does not copy data from other clusters in a federation.

h2. Methods

See "Common resource methods":{{site.baseurl}}/api/methods.html for more information about @create@, @delete@, @get@, @list@, and @update@.

Required arguments are displayed in %{background:#ccffcc}green%.

Supports federated @get@ only, which may be called with either a uuid or a portable data hash.  When requesting a portable data hash which is not available on the home cluster, the query is forwarded to all the clusters listed in @RemoteClusters@ and returns the first successful result.

h3(#create). create

Create a new Collection.

Arguments:

table(table table-bordered table-condensed).
|_. Argument |_. Type |_. Description |_. Location |_. Example |
|collection|object||query||
|replace_files|object|Initialize files and directories with new content and/or content from other collections|query||

The new collection's content can be initialized by providing a @manifest_text@ key in the provided @collection@ object, or by "using the @replace_files@ option":#replace_files.

h3(#delete). delete

Put a Collection in the trash.  This sets the @trash_at@ field to @now@ and @delete_at@ field to @now@ + token TTL.  A trashed collection is invisible to most API calls unless the @include_trash@ parameter is true.

Arguments:

table(table table-bordered table-condensed).
|_. Argument |_. Type |_. Description |_. Location |_. Example |
{background:#ccffcc}.|uuid|string|The UUID of the Collection in question.|path||

h3. get

Gets a Collection's metadata by UUID or portable data hash.  When making a request by portable data hash, attributes other than @portable_data_hash@, @manifest_text@, and @trash_at@ are not returned, even when requested explicitly using the @select@ parameter.

Arguments:

table(table table-bordered table-condensed).
|_. Argument |_. Type |_. Description |_. Location |_. Example |
{background:#ccffcc}.|uuid|string|The UUID or portable data hash of the Collection in question.|path||

h3. list

List collections.

See "common resource list method.":{{site.baseurl}}/api/methods.html#index

table(table table-bordered table-condensed).
|_. Argument |_. Type |_. Description |_. Location |_. Example |
|include_trash|boolean (default false)|Include trashed collections.|query||
|include_old_versions|boolean (default false)|Include past versions of the collection(s) being listed, if any.|query||

Note: Because adding access tokens to manifests can be computationally expensive, the @manifest_text@ field is not included in results by default.  If you need it, pass a @select@ parameter that includes @manifest_text@.

h4. Searching Collections for names of file or directories

You can search collections for specific file or directory names (whole or part) using the following filter in a @list@ query.

<pre>
filters: [["file_names", "ilike", "%sample1234.fastq%"]]
</pre>

Note: @file_names@ is a hidden field used for indexing.  It is not returned by any API call.  On the client, you can programmatically enumerate all the files in a collection using @arv-ls@, the Python SDK @Collection@ class, Go SDK @FileSystem@ struct, the WebDAV API, or the S3-compatible API.

As of this writing (Arvados 2.4), you can also search for directory paths, but _not_ complete file paths.

In other words, this will work (when @dir3@ is a directory):

<pre>
filters: [["file_names", "ilike", "%dir1/dir2/dir3%"]]
</pre>

However, this will _not_ return the desired results (where @sample1234.fastq@ is a file):

<pre>
filters: [["file_names", "ilike", "%dir1/dir2/dir3/sample1234.fastq%"]]
</pre>

As a workaround, you can search for both the directory path and file name separately, and then filter on the client side.

<pre>
filters: [["file_names", "ilike", "%dir1/dir2/dir3%"], ["file_names", "ilike", "%sample1234.fastq%"]]
</pre>

h3(#update). update

Update attributes of an existing Collection.

Arguments:

table(table table-bordered table-condensed).
|_. Argument |_. Type |_. Description |_. Location |_. Example |
{background:#ccffcc}.|uuid|string|The UUID of the Collection in question.|path||
|collection|object||query||
|replace_files|object|Add, delete, and replace files and directories with new content and/or content from other collections|query||

The collection's existing content can be replaced entirely by providing a @manifest_text@ key in the provided @collection@ object, or updated in place by "using the @replace_files@ option":#replace_files.

h3(#untrash). untrash

Remove a Collection from the trash.  This sets the @trash_at@ and @delete_at@ fields to @null@.

Arguments:

table(table table-bordered table-condensed).
|_. Argument |_. Type |_. Description |_. Location |_. Example |
{background:#ccffcc}.|uuid|string|The UUID of the Collection to untrash.|path||
|ensure_unique_name|boolean (default false)|Rename collection uniquely if untrashing it would fail with a unique name conflict.|query||


h3. provenance

Returns a list of objects in the database that directly or indirectly contributed to producing this collection, such as the container request that produced this collection as output.

The general algorithm is:

# Visit the container request that produced this collection (via @output_uuid@ or @log_uuid@ attributes of the container request)
# Visit the input collections to that container request (via @mounts@ and @container_image@ of the container request)
# Iterate until there are no more objects to visit

Arguments:

table(table table-bordered table-condensed).
|_. Argument |_. Type |_. Description |_. Location |_. Example |
{background:#ccffcc}.|uuid|string|The UUID of the Collection to get provenance.|path||

h3. used_by

Returns a list of objects in the database this collection directly or indirectly contributed to, such as containers that takes this collection as input.

The general algorithm is:

# Visit containers that take this collection as input (via @mounts@ or @container_image@ of the container)
# Visit collections produced by those containers (via @output@ or @log@ of the container)
# Iterate until there are no more objects to visit

Arguments:

table(table table-bordered table-condensed).
|_. Argument |_. Type |_. Description |_. Location |_. Example |
{background:#ccffcc}.|uuid|string|The UUID of the Collection to get usage.|path||