+Once you have a @Collection@ object, you can iterate over it to retrieve the names of all files and streams in it. Streams are like subdirectories: you can open them using the "@Collection.find@ method":{{ site.baseurl }}/sdk/python/python.html, and work with the files in them just like you would in the original collection. This example shows how to combine these techniques to iterate over all files in a collection, including the files in its streams.
+
+{% codeblock as python %}
+import arvados.collection
+import collections
+import pathlib
+root_collection = arvados.collection.Collection(...)
+# Start work from the base stream.
+stream_queue = collections.deque([pathlib.PurePosixPath('.')])
+while stream_queue:
+    stream_path = stream_queue.popleft()
+    collection = root_collection.find(str(stream_path))
+    for item_name in collection:
+        try:
+            my_file = collection.open(item_name)
+        except IsADirectoryError:
+            # item_name refers to a stream. Queue it to walk later.
+            stream_queue.append(stream_path / item_name)
+            continue
+        with my_file:
+            ...  # Work with my_file as desired
+{% endcodeblock %}
+
+h3(#copy-files-from-a-collection-to-another-collection). Copy a file between collections
+
+Once you have one or more @Collection@ objects, call the "@Collection.copy@ method":{{ site.baseurl }}/sdk/python/arvados/collection.html#arvados.collection.RichCollectionBase.copy on the destination collection to copy files to it. This method doesn't re-upload data, so it's very efficient.
+
+{% codeblock as python %}
+import arvados.collection
+src_collection = arvados.collection.Collection(...)
+dst_collection = arvados.collection.Collection(...)
+dst_collection.copy(
+    # The path of the source file or directory to copy
+    'ExamplePath',
+    # The path where the source file or directory will be copied.
+    # Pass the empty string like this to copy it to the same path.
+    '',
+    # The collection where the source file or directory comes from.
+    # If not specified, the default is the current collection (so you'll
+    # make multiple copies of the same data in the same collection).
+    source_collection=src_collection,
+    # Pass overwrite=True to force the method to overwrite any data
+    # that already exists at the given path in the current collection.
+    overwrite=False,
+)
+dst_collection.save_new(...) # or dst_collection.save() to update an existing collection
+{% endcodeblock %}
+
+h3(#combine-two-or-more-collections). Combine two or more collections
+
+You can concatenate manifest texts from multiple collections to create a single collection that contains all the data from the source collections. Note that if multiple source collections have data at the same path, the merged collection will have a single file at that path with concatenated data from the source collections.
+
+{% codeblock as python %}
+import arvados.collection
+
+# Retrieve all of the source collection manifest texts
+src_collection_uuid_list = [
+    'zzzzz-4zz18-111111111111111',
+    'zzzzz-4zz18-222222222222222',
+    ...,
+]
+manifest_texts = [
+    arvados.collection.Collection(uuid).manifest_text()
+    for uuid in src_collection_uuid_list
+]
+
+# Initialize a new collection object from the concatenated manifest text
+new_collection = arvados.collection.Collection(''.join(manifest_texts), ...)
+
+# Record the new collection in Arvados
+new_collection.save_new(
+    name='Collection merged by Python SDK',
+    owner_uuid='zzzzz-j7d0g-12345abcde67890',
+)
+{% endcodeblock %}
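+
+Because files at the same path get concatenated, you may want to check for overlapping paths before you merge. The following is a minimal sketch of one way to do that, not part of the SDK. It assumes the standard manifest layout, where each line holds a stream name, then block locators, then file tokens of the form @position:size:name@. The helper names @manifest_paths@ and @overlapping_paths@ and the sample manifests are hypothetical.
+
```python
import re

# A file token looks like "position:size:name". Block locators contain
# no colons in this position, so they do not match.
FILE_TOKEN = re.compile(r'^\d+:\d+:(.+)$')


def manifest_paths(manifest_text):
    """Return the set of file paths named in a manifest text.

    Names are kept exactly as written in the manifest (for example, with
    spaces escaped), which is sufficient for comparing manifests.
    """
    paths = set()
    for line in manifest_text.splitlines():
        tokens = line.split(' ')
        if len(tokens) < 2:
            continue  # Skip blank lines
        stream_name = tokens[0]
        for token in tokens[1:]:
            match = FILE_TOKEN.match(token)
            if match:
                paths.add(f'{stream_name}/{match.group(1)}')
    return paths


def overlapping_paths(manifest_texts):
    """Return the paths that appear in more than one manifest text."""
    seen = set()
    overlaps = set()
    for manifest_text in manifest_texts:
        paths = manifest_paths(manifest_text)
        overlaps |= seen & paths
        seen |= paths
    return overlaps


# Made-up sample manifests for illustration only.
sample_manifests = [
    '. acbd18db4cc2f85cedef654fccc4a4d8+3 0:3:foo.txt\n'
    './sub 37b51d194a7513e45b56f6524f2d51f2+3 0:3:bar.txt\n',
    '. 73feffa4b7f6bb68e44cf984c85f6e88+3 0:3:foo.txt\n',
]
for path in sorted(overlapping_paths(sample_manifests)):
    print(f'{path} appears in more than one source collection')
```
+
+In the merge example above, you could pass @manifest_texts@ to @overlapping_paths@ and stop, or rename files, if the result is not empty.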
+
+h3(#sharing-link). Create a collection sharing link
+
+You can create a sharing link for a collection by creating a new API token that is only allowed to read that collection, then constructing a link to your Keep web server that includes the collection UUID and the new token.
+
+{% codeblock as python %}
+import urllib.parse
+
+# The UUID of the collection you want to share
+collection_uuid = 'zzzzz-4zz18-12345abcde67890'
+
+sharing_token_scopes = [
+    'GET /arvados/v1/keep_services/accessible',
+    f'GET /arvados/v1/collections/{collection_uuid}',
+    f'GET /arvados/v1/collections/{collection_uuid}/',
+]
+sharing_token = arv_client.api_client_authorizations().create(
+    body={
+        'api_client_authorization': {
+            'scopes': sharing_token_scopes,
+        },
+    },
+).execute()
+plain_token = sharing_token['api_token']
+token_parts = plain_token.split('/')
+if token_parts[0] == 'v2':
+    plain_token = token_parts[2]
+
+sharing_url_parts = (
+    # The scheme your Keep web server uses. Change this to 'http' if necessary.
+    'https',
+    # The hostname, and optionally port, your Keep web server uses
+    'collections.zzzzz.example.com',
+    # You shouldn't need to change any other items
+    f'/c={collection_uuid}/t={plain_token}/_/',
+    None,
+    None,
+)
+sharing_url = urllib.parse.urlunsplit(sharing_url_parts)
+print(sharing_url)
+{% endcodeblock %}
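+
+If you build sharing links in more than one place, the token handling and URL assembly above can be factored into small helpers. This is a minimal sketch rather than an SDK feature. The function names @token_secret@ and @build_sharing_url@ are hypothetical, and the token, UUID, and hostname values are placeholders. It mirrors the logic above: a token that starts with @v2@ carries its secret in the third slash-separated part.
+
```python
import urllib.parse


def token_secret(api_token):
    """Return the secret part of a v2 token, or the token unchanged."""
    parts = api_token.split('/')
    if parts[0] == 'v2' and len(parts) >= 3:
        return parts[2]
    return api_token


def build_sharing_url(keep_web_host, collection_uuid, api_token, scheme='https'):
    """Build a Keep web URL that embeds the collection UUID and token."""
    path = f'/c={collection_uuid}/t={token_secret(api_token)}/_/'
    # urlunsplit takes (scheme, netloc, path, query, fragment).
    return urllib.parse.urlunsplit((scheme, keep_web_host, path, '', ''))


# Placeholder values for illustration only.
example_url = build_sharing_url(
    'collections.zzzzz.example.com',
    'zzzzz-4zz18-12345abcde67890',
    'v2/zzzzz-gj3su-12345abcde67890/examplesecret',
)
print(example_url)
```
+
+Keeping the token parsing in its own function means the same code works whether the API returns a bare token or one in the @v2@ format.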
+
+h2(#working-with-containers). Working with containers
+
+If you haven't already, start by reading the "Computing with Crunch":{{ site.baseurl }}/api/execution.html guide. It provides a high-level overview of how users submit work to Arvados as container requests; how Arvados dispatches that work to containers; and how Arvados records the association and results back on the original container request record.
+
+If you have experience running CWL workflows on Workbench 2, those runs go through this same API. When you start a workflow run, Workbench 2 creates a small container request to run a "CWL runner" tool with the specific inputs you gave it. Once Crunch dispatches a container for it, the CWL runner creates additional container requests to run each step of the workflow, and oversees the process until the workflow runs to completion. The UUID of the CWL runner container is recorded in the @container_uuid@ field of the container request you submitted.
+
+The UUID of the CWL runner container is recorded in the @requesting_container_uuid@ field of each container request it creates. You can list container requests with a filter on this field to inspect each step of the workflow individually, as shown below.
+
+The next few examples show how to perform a task with a container request generally, and then provide a more specific example of working with a CWL runner container.
+
+h3(#get-input-of-a-container). Get input of a container
+
+A container request's most varied inputs are recorded in the @mounts@ field, which can include data from Keep, specific collections, Git checkouts, and static files. You might also be interested in the @environment@, @command@, @container_image@, and @secret_mounts@ fields. Refer to the "container requests API documentation":{{ site.baseurl }}/api/methods/container_requests.html for details.
+
+{% codeblock as python %}
+import pprint
+container_request = arv_client.container_requests().get(
+    uuid='zzzzz-xvhdp-12345abcde67890',
+).execute()
+# From here, you can process any of the container request's input fields.
+# Below is an example of listing all the mounts.
+for mount_name, mount_source in container_request['mounts'].items():
+    mount_summary = []
+    # These are the fields that define different types of mounts.
+    # Try to collect them all. Just skip any that aren't set.
+    for key in ['kind', 'uuid', 'portable_data_hash', 'commit', 'path']:
+        try:
+            mount_summary.append(mount_source[key])
+        except KeyError:
+            pass
+    print(f"{mount_name}: {' '.join(mount_summary)}")
+    if mount_source.get('kind') == 'json':
+        pprint.pprint(mount_source.get('content'))
+{% endcodeblock %}
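+
+If you only care about input data that comes from collections, you can filter the mounts by their @kind@. The following is a minimal sketch under that assumption. The helper name @collection_mounts@ is hypothetical, and the sample dictionary is made up to resemble a @mounts@ field, so it runs without an API client.
+
```python
def collection_mounts(mounts):
    """Yield (mount point, collection identifier) pairs for collection mounts.

    `mounts` is a dictionary shaped like a container request's mounts field.
    """
    for mount_point, mount in sorted(mounts.items()):
        if mount.get('kind') != 'collection':
            continue
        # A collection mount can name its source by portable data hash or
        # by UUID. Mounts that name neither are skipped.
        identifier = mount.get('portable_data_hash') or mount.get('uuid')
        if identifier:
            yield mount_point, identifier


# Made-up sample data for illustration only.
sample_mounts = {
    '/keep/input': {
        'kind': 'collection',
        'portable_data_hash': 'd41d8cd98f00b204e9800998ecf8427e+0',
    },
    '/var/spool/cwl': {'kind': 'collection', 'writable': True},
    '/tmp': {'kind': 'tmp', 'capacity': 1073741824},
    '/var/lib/cwl/cwl.input.json': {'kind': 'json', 'content': {}},
}
for mount_point, identifier in collection_mounts(sample_mounts):
    print(f'{mount_point}: {identifier}')
```
+
+With a real record, you would pass @container_request['mounts']@ instead of the sample dictionary.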
+
+h3(#get-input-of-a-cwl-workflow). Get input of a CWL workflow run
+
+When you run a CWL workflow, the CWL inputs are stored in the container request's @mounts@ field as a JSON mount named @/var/lib/cwl/cwl.input.json@.
+
+{% codeblock as python %}
+container_request = arv_client.container_requests().get(
+    uuid='zzzzz-xvhdp-12345abcde67890',
+).execute()
+cwl_input = container_request['mounts']['/var/lib/cwl/cwl.input.json']['content']
+... # Work with the cwl_input dictionary
+{% endcodeblock %}
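+
+One common task with the @cwl_input@ dictionary is finding every file it refers to. In CWL, file and directory inputs are objects with a @class@ of @File@ or @Directory@ and a @location@. The following is a minimal sketch that walks the dictionary recursively to collect those locations. The helper name @cwl_file_locations@ is hypothetical, and the sample input, including its @keep:@ locations, is made up for illustration.
+
```python
def cwl_file_locations(value):
    """Recursively yield the location of every File or Directory object."""
    if isinstance(value, dict):
        if value.get('class') in ('File', 'Directory') and 'location' in value:
            yield value['location']
        # Keep walking: File objects can nest others (e.g. secondaryFiles).
        for child in value.values():
            yield from cwl_file_locations(child)
    elif isinstance(value, list):
        for child in value:
            yield from cwl_file_locations(child)


# Made-up sample data for illustration only.
sample_cwl_input = {
    'reference': {
        'class': 'File',
        'location': 'keep:d41d8cd98f00b204e9800998ecf8427e+0/ref.fa',
        'secondaryFiles': [
            {
                'class': 'File',
                'location': 'keep:d41d8cd98f00b204e9800998ecf8427e+0/ref.fa.fai',
            },
        ],
    },
    'reads': [
        {
            'class': 'File',
            'location': 'keep:d41d8cd98f00b204e9800998ecf8427e+0/reads_1.fastq',
        },
    ],
    'threads': 4,
}
for location in cwl_file_locations(sample_cwl_input):
    print(location)
```
+
+The same helper should also work on other CWL documents that use these objects, since it only relies on the @class@ and @location@ keys.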
+
+h3(#get-output-of-a-container). Get output of a container
+
+A container's output files are saved in a collection. The UUID of that collection is recorded in the @output_uuid@ field of the container request. You can load that collection like any other.
+
+{% codeblock as python %}
+import arvados.collection
+container_request = arv_client.container_requests().get(
+    uuid='zzzzz-xvhdp-12345abcde67890',
+).execute()
+container_output = arvados.collection.Collection(
+    container_request.get('output_uuid'),
+)
+... # Work with the container_output collection object
+{% endcodeblock %}
+
+h3(#get-output-of-a-cwl-workflow). Get output of a CWL workflow run
+
+When you run a CWL workflow, the container request's output collection includes a file named @cwl.output.json@ that provides additional information about other files in the output.
+
+{% codeblock as python %}
+import arvados.collection
+import json
+cwl_container_request = arv_client.container_requests().get(
+    uuid='zzzzz-xvhdp-12345abcde67890',
+).execute()
+cwl_output_collection = arvados.collection.Collection(
+    cwl_container_request['output_uuid'],
+)
+with cwl_output_collection.open('cwl.output.json') as cwl_output_file:
+    cwl_output = json.load(cwl_output_file)
+... # Work with the cwl_output dictionary