doc/sdk/python/cookbook.html.textile.liquid

   1 ---
   2 layout: default
   3 navsection: sdk
   4 navmenu: Python
   5 title: Code cookbook
   6 ...
   7 {% comment %}
   8 Copyright (C) The Arvados Authors. All rights reserved.
   9
  10 SPDX-License-Identifier: CC-BY-SA-3.0
  11 {% endcomment %}
  12
  13 # "Introduction":#introduction
  14 # "Working with the current user":#working-with-current-user
  15 ## "Fetch the current user":#fetch-current-user
  16 ## "List objects shared with the current user":#list-shared-objects
  17 # "Working with projects":#working-with-projects
  18 ## "Create a project":#create-a-project
  19 ## "List the contents of a project":#list-project-contents
  20 # "Working with permissions":#working-with-permissions
  21 ## "Grant permission to an object":#grant-permission
  22 ## "Modify permission on an object":#modify-permission
  23 ## "Revoke permission from an object":#revoke-permission
  24 # "Working with properties":#working-with-properties
  25 ## "Update the properties of an object":#update-properties
  26 ## "Translate between vocabulary identifiers and labels":#translating-between-vocabulary-identifiers-and-labels
  27 ## "Query the vocabulary definition":#querying-the-vocabulary-definition
  28 # "Working with collections":#working-with-collections
  29 ## "Load and update an existing collection":#load-collection
  30 ## "Create and save a new collection":#create-collection
  31 ## "Read a file from a collection":#read-a-file-from-a-collection
  32 ## "Download a file from a collection":#download-a-file-from-a-collection
  33 ## "Write a file to a collection":#write-a-file-into-a-new-collection
  34 ## "Upload a file to a collection":#upload-a-file-into-a-new-collection
  35 ## "Delete a file from a collection":#delete-a-file-from-an-existing-collection
  36 ## "Delete a directory from a collection recursively":#delete-a-directory-from-a-collection
  37 ## "Walk over all files in a collection":#walk-collection
  38 ## "Copy a file between collections":#copy-files-from-a-collection-to-another-collection
  39 ## "Combine two or more collections":#combine-two-or-more-collections
  40 ## "Create a collection sharing link":#sharing-link
  41 # "Working with containers and workflow runs":#working-with-containers
  42 ## "Get input of a container":#get-input-of-a-container
  43 ## "Get input of a CWL workflow run":#get-input-of-a-cwl-workflow
  44 ## "Get output of a container":#get-output-of-a-container
  45 ## "Get output of a CWL workflow run":#get-output-of-a-cwl-workflow
  46 ## "Get logs of a container or CWL workflow run":#get-log-of-a-child-request
  47 ## "Get status of a container or CWL workflow run":#get-state-of-a-cwl-workflow
  48 ## "List child requests of a container or CWL workflow run":#list-failed-child-requests
  49 ## "List child requests of a container request":#list-child-requests-of-container-request
  50 # "Working with the container request queue":#working-with-container-request-queue
  51 ## "List completed container requests":#list-completed-container-requests
  52 ## "Cancel a container request":#cancel-a-container-request
  53 ## "Cancel multiple pending container requests":#cancel-all-container-requests
  54
  55 h2(#introduction). Introduction
  56
  57 This page provides example code to perform various high-level tasks using Arvados' Python SDK. This page assumes you've already read the "API client documentation":{{ site.baseurl }}/sdk/python/api-client.html and understand the basics of using the Python SDK client. You don't have to have the details of every API method memorized, but you should at least be comfortable with the pattern of calling a resource type, API method, and @execute()@, as well as the dictionaries these methods return.
  58
  59 The code examples assume you've built the @arv_client@ object by doing something like:
  60
  61 {% codeblock as python %}
  62 import arvados
  63 arv_client = arvados.api('v1', ...)
  64 {% endcodeblock %}
  65
  66 These examples work no matter how you call @arvados.api()@, or if you use another constructor from the "@arvados.api@ module":{{ site.baseurl }}/sdk/python/arvados/api.html. Just understand that @arv_client@ represents your client object, no matter how you built it.
  67
  68 Whenever you see the Ellipsis object @...@ in these examples, that means you may need or want to fill something in. That might be list items, function arguments, or your own code. Comments will provide additional guidance.
  69
  70 Whenever you see the example UUID @zzzzz-zzzzz-12345abcde67890@, you should substitute the UUID of the object you want to work with.
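
The example UUIDs on this page all share one shape: a five-character cluster ID, a five-character object type code, and a fifteen-character identifier, separated by hyphens. As a minimal sketch, the hypothetical helper below (@describe_uuid@ is an illustration, not part of the SDK) checks that shape and names the object type. Its mapping is an assumption limited to the type codes that appear in this page's examples, so treat it as incomplete.

```python
import re

# Shape shared by the example UUIDs on this page:
# cluster ID, object type code, and identifier, separated by hyphens.
UUID_PATTERN = re.compile(r'([a-z0-9]{5})-([a-z0-9]{5})-([a-z0-9]{15})')

# Only the type codes used in this page's examples (not an exhaustive list).
EXAMPLE_TYPE_CODES = {
    'tpzed': 'user',
    'j7d0g': 'group',
    '4zz18': 'collection',
    'xvhdp': 'container request',
}

def describe_uuid(uuid):
    """Return (cluster_id, type_name) for a string with the UUID shape.

    type_name is None when the type code is not in EXAMPLE_TYPE_CODES.
    Raises ValueError if the string does not have the UUID shape.
    """
    match = UUID_PATTERN.fullmatch(uuid)
    if match is None:
        raise ValueError(f"not shaped like an Arvados UUID: {uuid!r}")
    cluster_id, type_code, _ = match.groups()
    return cluster_id, EXAMPLE_TYPE_CODES.get(type_code)

print(describe_uuid('zzzzz-4zz18-12345abcde67890'))
```

A check like this can catch a pasted placeholder or a truncated UUID before you send it to the API.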
  71
  72 h2(#working-with-current-user). Working with the current user
  73
  74 h3(#fetch-current-user). Fetch the current user
  75
  76 The API provides a "dedicated users method named @current@":{{ site.baseurl }}/api/methods/users.html#current. It returns the user object that is authenticated by your current API token. Use this method to get the current user's UUID to use in other API calls, or to include user details like their name in your output.
  77
  78 {% codeblock as python %}
  79 current_user = arv_client.users().current().execute()
  80 {% endcodeblock %}
  81
  82 h3(#list-shared-objects). List objects shared with the current user
  83
  84 The API provides a "dedicated groups method named @shared@":{{ site.baseurl }}/api/methods/groups.html#shared to do this. Call it like you would any other list method. This example illustrates some popular arguments. Check the API reference for full details of all possible arguments.
  85
  86 {% codeblock as python %}
  87 for item in arvados.util.keyset_list_all(
  88     # Do *not* call the method here, just pass it.
  89     arv_client.groups().shared,
  90     # Pass filters to limit what objects are returned.
  91     # This example returns only subprojects.
  92     filters=[
  93         ['uuid', 'is_a', 'arvados#group'],
  94         ['group_class', '=', 'project'],
  95     ],
  96     # Pass order_key and ascending to control how the contents are sorted.
  97     # This example lists projects in ascending creation time (the default).
  98     order_key='created_at',
  99     ascending=True,
 100 ):
 101     ...  # Work on item as desired
 102 {% endcodeblock %}
 103
 104 h2(#working-with-projects). Working with projects
 105
 106 h3(#create-a-project). Create a project
 107
 108 A project is represented in the Arvados API as a group with its @group_class@ field set to @"project"@.
 109
 110 {% codeblock as python %}
 111 new_project = arv_client.groups().create(
 112     body={
 113         'group': {
 114             'group_class': 'project',
 115             'name': 'Python SDK Test Project',
 116             # owner_uuid can be the UUID for an Arvados user or group.
 117             # Specify the UUID of an existing project to make a subproject.
 118             # If not specified, the current user is the default owner.
 119             'owner_uuid': 'zzzzz-j7d0g-12345abcde67890',
 120         },
 121     },
 122     ensure_unique_name=True,
 123 ).execute()
 124 {% endcodeblock %}
 125
 126 h3(#list-project-contents). List the contents of a project
 127
 128 The API provides a "dedicated groups method named @contents@":{{ site.baseurl }}/api/methods/groups.html#contents to do this. Call it like you would any other list method. This example illustrates some popular arguments. Check the API reference for full details of all possible arguments.
 129
 130 {% codeblock as python %}
 131 current_user = arv_client.users().current().execute()
 132 for item in arvados.util.keyset_list_all(
 133     # Do *not* call the method here, just pass it.
 134     arv_client.groups().contents,
 135     # The UUID of the project whose contents we're listing.
 136     # Pass a user UUID to list their home project.
 137     # This example lists the current user's home project.
 138     uuid=current_user['uuid'],
 139     # Pass filters to limit what objects are returned.
 140     # This example returns only subprojects.
 141     filters=[
 142         ['uuid', 'is_a', 'arvados#group'],
 143         ['group_class', '=', 'project'],
 144     ],
 145     # Pass recursive=True to include results from subprojects in the listing.
 146     recursive=False,
 147     # Pass include_trash=True to include objects in the listing whose
  148     # trashed_at time has passed.
 149     include_trash=False,
 150 ):
 151     ...  # Work on item as desired
 152 {% endcodeblock %}
 153
 154 h2(#working-with-permissions). Working with permissions
 155
 156 In brief, a permission is represented in Arvados as a link object with the following values:
 157
 158 * @link_class@ is @"permission"@.
 159 * @name@ is one of @"can_read"@, @"can_write"@, @"can_manage"@, or @"can_login"@.
 160 * @tail_uuid@ identifies the user or role group that receives the permission.
 161 * @head_uuid@ identifies the Arvados object this permission grants access to.
 162
 163 For details, refer to the "Permissions model documentation":{{ site.baseurl }}/api/permission-model.html. Managing permissions is just a matter of ensuring the desired links exist with the standard @create@, @update@, and @delete@ methods.
 164
 165 h3(#grant-permission). Grant permission to an object
 166
 167 Create a link with values as documented above.
 168
 169 {% codeblock as python %}
 170 permission = arv_client.links().create(
 171     body={
 172         'link': {
 173             'link_class': 'permission',
 174             # Adjust name for the level of permission you want to grant
 175             'name': 'can_read',
 176             # tail_uuid must identify a user or role group
 177             'tail_uuid': 'zzzzz-tpzed-12345abcde67890',
 178             # head_uuid can identify any Arvados object
 179             'head_uuid': 'zzzzz-4zz18-12345abcde67890',
 180         },
 181     },
 182 ).execute()
 183 {% endcodeblock %}
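
If you grant permissions in several places, it can help to build the request body in one function that rejects unknown permission names before any API call is made. The sketch below is one way to do that. The helper name @permission_link_body@ and the @PERMISSION_LEVELS@ tuple are hypothetical; the field names and permission names are the ones listed at the start of this section.

```python
# The four permission names listed at the start of this section.
PERMISSION_LEVELS = ('can_read', 'can_write', 'can_manage', 'can_login')

def permission_link_body(tail_uuid, head_uuid, level='can_read'):
    """Build the body for links().create() that grants one permission.

    tail_uuid receives the permission; head_uuid is the object it applies to.
    Raises ValueError for a permission name outside PERMISSION_LEVELS.
    """
    if level not in PERMISSION_LEVELS:
        raise ValueError(f"unknown permission level: {level!r}")
    return {
        'link': {
            'link_class': 'permission',
            'name': level,
            'tail_uuid': tail_uuid,
            'head_uuid': head_uuid,
        },
    }

body = permission_link_body(
    'zzzzz-tpzed-12345abcde67890',
    'zzzzz-4zz18-12345abcde67890',
    level='can_write',
)
print(body)
# With a client object, you would then run:
#   arv_client.links().create(body=body).execute()
```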
 184
 185 h3(#modify-permission). Modify permission on an object
 186
 187 To modify an existing permission—for example, to change its access level—find the existing link object for the permission, then update it with the new values you want. This example shows changing all read-write permissions on a specific collection to read-only. Adjust the filters appropriately to find the permission(s) you want to modify.
 188
 189 {% codeblock as python %}
 190 import arvados.util
 191 for permission in arvados.util.keyset_list_all(
 192     # Do *not* call the method here, just pass it.
 193     arv_client.links().list,
 194     filters=[
 195         # You should use this filter for all permission searches,
 196         # to exclude other kinds of links.
 197         ['link_class', '=', 'permission'],
 198         # Add other filters as desired.
 199         ['name', '=', 'can_write'],
 200         ['head_uuid', '=', 'zzzzz-4zz18-12345abcde67890'],
 201         ...,
 202     ],
 203 ):
 204     arv_client.links().update(
 205         uuid=permission['uuid'],
 206         body={
 207             'link': {
 208                 'name': 'can_read',
 209             },
  210         },
 211     ).execute()
 212 {% endcodeblock %}
 213
 214 h3(#revoke-permission). Revoke permission from an object
 215
 216 To revoke an existing permission, find the existing link object for the permission, then delete it. This example shows revoking one user's permission to log into any virtual machines. Adjust the filters appropriately to find the permission(s) you want to revoke.
 217
 218 {% codeblock as python %}
 219 import arvados.util
 220 for permission in arvados.util.keyset_list_all(
 221     # Do *not* call the method here, just pass it.
 222     arv_client.links().list,
 223     filters=[
 224         # You should use this filter for all permission searches,
 225         # to exclude other kinds of links.
 226         ['link_class', '=', 'permission'],
 227         # Add other filters as desired.
 228         ['name', '=', 'can_login'],
 229         ['tail_uuid', '=', 'zzzzz-tpzed-12345abcde67890'],
 230         ...,
 231     ],
 232 ):
 233     arv_client.links().delete(
 234         uuid=permission['uuid'],
 235     ).execute()
 236 {% endcodeblock %}
 237
 238 h2(#working-with-properties). Working with properties
 239
 240 Container requests, collections, groups, and links can have metadata properties set through their @properties@ field. For details, refer to the "Metadata properties API reference":{{ site.baseurl }}/api/properties.html.
 241
  242 An Arvados cluster can be configured to use a metadata vocabulary. If this is set up, the vocabulary defines standard identifiers for specific properties and their values. These identifiers can also have more human-friendly aliases. The cluster can also be configured to use the vocabulary strictly, so clients may set _only_ properties that are defined in the vocabulary. For more information about configuring a metadata vocabulary, refer to the "Metadata vocabulary administration documentation":{{ site.baseurl }}/admin/metadata-vocabulary.html.
 243
 244 h3(#update-properties). Update the properties of an object
 245
 246 To set an object's properties to a new value, just call the resource's @update@ method with a new @properties@ field in the body. If you want to make changes to the current set of properties, @get@ the object, build a new dictionary based on its @properties@ field, then call the resource's @update@ method with your new dictionary as the @properties@. Below is an example for a container request.
 247
 248 {% codeblock as python %}
 249 container_request = arv_client.container_requests().get(
 250     uuid='zzzzz-xvhdp-12345abcde67890',
 251 ).execute()
 252 new_properties = dict(container_request['properties'])
  253 ...  # Make your desired changes to new_properties
 254 container_request = arv_client.container_requests().update(
 255     uuid=container_request['uuid'],
 256     body={
 257         'container_request': {
 258             'properties': new_properties,
 259         },
 260     },
 261 ).execute()
 262 {% endcodeblock %}
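
The "build a new dictionary" step is plain Python and can be written without touching the API. The following sketch shows one possible way to do it. The helper @updated_properties@ and the sample property names are hypothetical. It copies the current properties, applies additions and changes, and drops unwanted keys, leaving the original dictionary untouched.

```python
def updated_properties(current, set_values=None, remove_keys=()):
    """Return a new properties dict based on current.

    set_values adds or replaces keys; remove_keys drops keys if present.
    The copy is shallow: nested lists or dicts are shared with current.
    """
    new_properties = dict(current)
    new_properties.update(set_values or {})
    for key in remove_keys:
        new_properties.pop(key, None)
    return new_properties

# Stand-in for container_request['properties'] (sample data only).
current = {'sample': 'A123', 'status': 'draft'}
new_properties = updated_properties(
    current,
    set_values={'status': 'final', 'reviewer': 'example'},
    remove_keys=['sample'],
)
print(new_properties)
```

You would then pass the result as the @properties@ value in the body of the resource's @update@ method.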
 263
 264 h3(#translating-between-vocabulary-identifiers-and-labels). Translate between vocabulary identifiers and labels
 265
 266 Client software might need to present properties to the user in a human-readable form or take input from the user without requiring them to remember identifiers. The "@Vocabulary.convert_to_labels@":{{ site.baseurl }}/sdk/python/arvados/vocabulary.html#arvados.vocabulary.Vocabulary.convert_to_labels and "@Vocabulary.convert_to_identifiers@":{{ site.baseurl }}/sdk/python/arvados/vocabulary.html#arvados.vocabulary.Vocabulary.convert_to_identifiers methods help with these tasks, respectively.
 267
 268 {% codeblock as python %}
 269 import arvados.vocabulary
 270 vocabulary = arvados.vocabulary.load_vocabulary(arv_client)
 271
 272 # The argument should be a mapping of vocabulary keys and values using any
 273 # defined aliases, like this:
 274 #   {'Creature': 'Human', 'Priority': 'Normal'}
 275 # The return value will be an analogous mapping where all the aliases have
 276 # been translated to identifiers, like this:
  277 #   {'IDTAGANIMALS': 'IDVALANIMALS2', 'IDTAGIMPORTANCES': 'IDVALIMPORTANCES1'}
 278 properties_by_identifier = vocabulary.convert_to_identifiers({...})
 279
 280 # You can use this to set metadata properties on objects that support them.
 281 project = arv_client.groups().update(
 282     uuid='zzzzz-j7d0g-12345abcde67890',
 283     body={
 284         'group': {
 285             'properties': properties_by_identifier,
 286         },
 287     },
 288 ).execute()
 289
 290 # You can report properties to the user by their preferred name.
 291 print(f"{project['name']} ({project['group_class']} {project['uuid']}) updated with properties:")
 292 for key, value in vocabulary.convert_to_labels(project['properties']).items():
 293     print(f"↳ {key}: {value}")
 294 {% endcodeblock %}
 295
 296 h3(#querying-the-vocabulary-definition). Query the vocabulary definition
 297
 298 The @arvados.vocabulary@ module provides facilities to interact with the "active metadata vocabulary":{{ site.baseurl }}/admin/metadata-vocabulary.html in the system. The "@Vocabulary@ class":{{ site.baseurl }}/sdk/python/arvados/vocabulary.html#arvados.vocabulary.Vocabulary provides a mapping-like view of a cluster's configured vocabulary.
 299
 300 {% codeblock as python %}
 301 import arvados.vocabulary
 302 vocabulary = arvados.vocabulary.load_vocabulary(arv_client)
 303
 304 # You can use the vocabulary object to access specific keys and values by
 305 # case-insensitive mapping, like this:
 306 #   vocabulary_value = vocabulary[key_alias][value_alias]
 307 # You can also access the `key_aliases` and `value_aliases` mapping
 308 # attributes directly to view the entire vocabulary. The example below
 309 # writes a plaintext table of the vocabulary.
 310 for vocabulary_key in set(vocabulary.key_aliases.values()):
 311     print(
 312         vocabulary_key.identifier,
 313         vocabulary_key.preferred_label,
 314         ', '.join(vocabulary_key.aliases[1:]),
 315         sep='\t',
 316     )
 317     for vocabulary_value in set(vocabulary_key.value_aliases.values()):
 318         print(
 319             f'↳ {vocabulary_value.identifier}',
 320             vocabulary_value.preferred_label,
 321             ', '.join(vocabulary_value.aliases[1:]),
 322             sep='\t',
 323         )
 324 {% endcodeblock %}
 325
 326 h2(#working-with-collections). Working with collections
 327
 328 The "@arvados.collection.Collection@ class":{{ site.baseurl }}/sdk/python/arvados/collection.html#arvados.collection.Collection provides a high-level interface to read, create, and update collections. It orchestrates multiple requests to API and Keep so you don't have to worry about the low-level details of keeping everything in sync. It uses threads to make multiple requests to Keep in parallel.
 329
 330 This page only shows you how to perform common tasks using the @Collection@ class. To see all the supported constructor arguments and methods, refer to "the @Collection@ class documentation":{{ site.baseurl }}/sdk/python/arvados/collection.html#arvados.collection.Collection.
 331
 332 h3(#load-collection). Load and update an existing collection
 333
 334 Construct the @Collection@ class with the UUID of a collection you want to read. You can pass additional constructor arguments as needed.
 335
 336 {% codeblock as python %}
 337 import arvados.collection
 338 collection = arvados.collection.Collection('zzzzz-4zz18-12345abcde67890', ...)
 339 {% endcodeblock %}
 340
 341 If you make changes to the collection and want to update the existing collection, call the "@Collection.save@ method":{{ site.baseurl }}/sdk/python/arvados/collection.html#arvados.collection.Collection.save:
 342
 343 {% codeblock as python %}
 344 collection.save()
 345 {% endcodeblock %}
 346
 347 If you would rather save your changes as a new collection object, call the "@Collection.save_new@ method":{{ site.baseurl }}/sdk/python/arvados/collection.html#arvados.collection.Collection.save_new. This example illustrates some popular arguments. Check the API reference for full details of all possible arguments.
 348
 349 {% codeblock as python %}
 350 collection.save_new(
 351     name='Collection updated by Python SDK',
 352     # owner_uuid can be the UUID for an Arvados user or group.
 353     # Specify the UUID of a project to add this collection to it.
 354     owner_uuid='zzzzz-j7d0g-12345abcde67890',
 355 )
 356 {% endcodeblock %}
 357
 358 h3(#create-collection). Create and save a new collection
 359
 360 Construct the @Collection@ class without an existing collection UUID or manifest text. You can pass additional constructor arguments as needed.
 361
 362 {% codeblock as python %}
 363 import arvados.collection
 364 new_collection = arvados.collection.Collection(...)
 365 {% endcodeblock %}
 366
 367 Usually you'll upload or copy files to the new collection. Once you're done with that and ready to save your changes, call the "@Collection.save_new@ method":{{ site.baseurl }}/sdk/python/arvados/collection.html#arvados.collection.Collection.save_new. This example illustrates some popular arguments. Check the API reference for full details of all possible arguments.
 368
 369 {% codeblock as python %}
 370 new_collection.save_new(
 371     name='Collection created by Python SDK',
 372     # owner_uuid can be the UUID for an Arvados user or group.
 373     # Specify the UUID of a project to add this collection to it.
 374     owner_uuid='zzzzz-j7d0g-12345abcde67890',
 375 )
 376 {% endcodeblock %}
 377
 378 h3(#read-a-file-from-a-collection). Read a file from a collection
 379
 380 Once you have a @Collection@ object, the "@Collection.open@ method":{{ site.baseurl }}/sdk/python/arvados/collection.html#arvados.collection.RichCollectionBase.open lets you open files from a collection the same way you would open files from disk using Python's built-in @open@ function. It returns a file-like object that you can use in many of the same ways you would use any other file object. This example prints all non-empty lines from @ExampleFile@ in your collection:
 381
 382 {% codeblock as python %}
 383 import arvados.collection
 384 collection = arvados.collection.Collection(...)
 385 with collection.open('ExampleFile') as my_file:
 386     # Read from my_file as desired.
 387     # This example prints all non-empty lines from the file to stdout.
 388     for line in my_file:
 389         if not line.isspace():
 390             print(line, end='')
 391 {% endcodeblock %}
 392
 393 h3(#download-a-file-from-a-collection). Download a file from a collection
 394
 395 Once you have a @Collection@ object, the "@Collection.open@ method":{{ site.baseurl }}/sdk/python/arvados/collection.html#arvados.collection.RichCollectionBase.open lets you open files from a collection the same way you would open files from disk using Python's built-in @open@ function. It returns a file-like object that you can use in many of the same ways you would use any other file object. You can pass it as a source to Python's standard "@shutil.copyfileobj@ function":https://docs.python.org/3/library/shutil.html#shutil.copyfileobj to download it. This code downloads @ExampleFile@ from your collection and saves it to the current working directory as @ExampleDownload@:
 396
 397 {% codeblock as python %}
 398 import arvados.collection
 399 import shutil
 400 collection = arvados.collection.Collection(...)
 401 with (
  402     collection.open('ExampleFile') as src_file,
  403     open('ExampleDownload', 'w') as dst_file,
 404 ):
 405     shutil.copyfileobj(src_file, dst_file)
 406 {% endcodeblock %}
 407
 408 h3(#write-a-file-into-a-new-collection). Write a file to a collection
 409
 410 Once you have a @Collection@ object, the "@Collection.open@ method":{{ site.baseurl }}/sdk/python/arvados/collection.html#arvados.collection.RichCollectionBase.open lets you open files from a collection the same way you would open files from disk using Python's built-in @open@ function. Pass a second mode argument like @'w'@ or @'a'@ to write a file in the collection. It returns a file-like object that you can use in many of the same ways you would use any other file object. This example writes @Hello, Arvados!@ to a file named @ExampleHello@ in your collection:
 411
 412 {% codeblock as python %}
 413 import arvados.collection
 414 collection = arvados.collection.Collection(...)
  415 with collection.open('ExampleHello', 'w') as my_file:
  416     # Write to my_file as desired.
  417     # This example writes "Hello, Arvados!" to the file.
 418     print("Hello, Arvados!", file=my_file)
 419 collection.save_new(...)  # or collection.save() to update an existing collection
 420 {% endcodeblock %}
 421
 422 h3(#upload-a-file-into-a-new-collection). Upload a file to a collection
 423
 424 Once you have a @Collection@ object, the "@Collection.open@ method":{{ site.baseurl }}/sdk/python/arvados/collection.html#arvados.collection.RichCollectionBase.open lets you open files from a collection the same way you would open files from disk using Python's built-in @open@ function. Pass a second mode argument like @'w'@ or @'a'@ to write a file in the collection. It returns a file-like object that you can use in many of the same ways you would use any other file object. You can pass it as a destination to Python's standard "@shutil.copyfileobj@ function":https://docs.python.org/3/library/shutil.html#shutil.copyfileobj to upload data from a source file. This example reads @ExampleFile@ from the current working directory and uploads it into your collection as @ExampleUpload@:
 425
 426 {% codeblock as python %}
 427 import arvados.collection
 428 import shutil
 429 collection = arvados.collection.Collection(...)
 430 with (
  431     open('ExampleFile') as src_file,
  432     collection.open('ExampleUpload', 'w') as dst_file,
 433 ):
 434     shutil.copyfileobj(src_file, dst_file)
 435 collection.save_new(...)  # or collection.save() to update an existing collection
 436 {% endcodeblock %}
 437
 438 h3(#delete-a-file-from-an-existing-collection). Delete a file from a collection
 439
 440 Once you have a @Collection@ object, call the "@Collection.remove@ method":{{ site.baseurl }}/sdk/python/arvados/collection.html#arvados.collection.Collection.remove with a file path to remove that file or directory from the collection.
 441
 442 {% codeblock as python %}
 443 import arvados.collection
 444 collection = arvados.collection.Collection(...)
 445 collection.remove('ExamplePath')
 446 collection.save_new(...)  # or collection.save() to update an existing collection
 447 {% endcodeblock %}
 448
 449 h3(#delete-a-directory-from-a-collection). Delete a directory from a collection recursively
 450
 451 Once you have a @Collection@ object, call the "@Collection.remove@ method":{{ site.baseurl }}/sdk/python/arvados/collection.html#arvados.collection.Collection.remove with a directory path and @recursive=True@ to delete everything under that directory from the collection.
 452
 453 {% codeblock as python %}
 454 import arvados.collection
 455 collection = arvados.collection.Collection(...)
 456 collection.remove('ExampleDirectoryPath', recursive=True)
 457 collection.save_new(...)  # or collection.save() to update an existing collection
 458 {% endcodeblock %}
 459
 460 h3(#walk-collection). Walk over all files in a collection
 461
  462 Once you have a @Collection@ object, you can iterate over it to retrieve the names of all files and streams in it. Streams are like subdirectories: you can open them using the "@Collection.find@ method":{{ site.baseurl }}/sdk/python/arvados/collection.html#arvados.collection.RichCollectionBase.find, and work with the files in them just like you would in the original collection. This example shows how to combine these techniques to iterate over all files in a collection, including its streams.
 463
 464 {% codeblock as python %}
 465 import arvados.collection
 466 import collections
 467 import pathlib
 468 root_collection = arvados.collection.Collection(...)
 469 # Start work from the base stream.
 470 stream_queue = collections.deque(['.'])
 471 while stream_queue:
 472     stream_name = stream_queue.popleft()
 473     collection = root_collection.find(stream_name)
 474     for item_name in collection:
 475         try:
 476             my_file = collection.open(item_name)
 477         except IsADirectoryError:
 478             # item_name refers to a stream. Queue it to walk later.
 479             stream_path = pathlib.Path(stream_name, item_name)
 480             stream_queue.append(stream_path.as_posix())
 481             continue
 482         with my_file:
 483             ...  # Work with my_file as desired
 484 {% endcodeblock %}
 485
 486 h3(#copy-files-from-a-collection-to-another-collection). Copy a file between collections
 487
 488 Once you have one or more @Collection@ objects, call the "@Collection.copy@ method":{{ site.baseurl }}/sdk/python/arvados/collection.html#arvados.collection.RichCollectionBase.copy on the destination collection to copy files to it. This method doesn't re-upload data, so it's very efficient.
 489
 490 {% codeblock as python %}
 491 import arvados.collection
 492 src_collection = arvados.collection.Collection(...)
 493 dst_collection = arvados.collection.Collection(...)
 494 dst_collection.copy(
 495     # The path of the source file or directory to copy
 496     'ExamplePath',
 497     # The path where the source file or directory will be copied.
 498     # Pass the empty string like this to copy it to the same path.
 499     '',
 500     # The collection where the source file or directory comes from.
 501     # If not specified, the default is the current collection (so you'll
 502     # make multiple copies of the same data in the same collection).
 503     source_collection=src_collection,
 504     # Pass overwrite=True to force the method to overwrite any data
 505     # that already exists at the given path in the current collection.
 506     overwrite=False,
 507 )
 508 dst_collection.save_new(...)  # or dst_collection.save() to update an existing collection
 509 {% endcodeblock %}
 510
 511 h3(#combine-two-or-more-collections). Combine two or more collections
 512
 513 You can concatenate manifest texts from multiple collections to create a single collection that contains all the data from the source collections. Note that if multiple source collections have data at the same path, the merged collection will have a single file at that path with concatenated data from the source collections.
 514
 515 {% codeblock as python %}
 516 import arvados.collection
 517
 518 # Retrieve all of the source collection manifest texts
 519 src_collection_uuid_list = [
 520     'zzzzz-4zz18-111111111111111',
 521     'zzzzz-4zz18-222222222222222',
 522     ...,
 523 ]
 524 manifest_texts = [
 525     arvados.collection.Collection(uuid).manifest_text()
 526     for uuid in src_collection_uuid_list
 527 ]
 528
 529 # Initialize a new collection object from the concatenated manifest text
 530 new_collection = arvados.collection.Collection(''.join(manifest_texts), ...)
 531
 532 # Record the new collection in Arvados
 533 new_collection.save_new(
 534     name='Collection merged by Python SDK',
 535     owner_uuid='zzzzz-j7d0g-12345abcde67890',
 536 )
 537 {% endcodeblock %}
 538
 539 h3(#sharing-link). Create a collection sharing link
 540
 541 You can create a sharing link for a collection by creating a new API token that is only allowed to read that collection; then constructing a link to your Keep web server that includes the collection UUID and the new token.
 542
 543 {% codeblock as python %}
 544 import urllib.parse
 545
 546 # The UUID of the collection you want to share
 547 collection_uuid = 'zzzzz-4zz18-12345abcde67890'
 548
 549 sharing_token_scopes = [
 550     'GET /arvados/v1/keep_services/accessible',
 551     f'GET /arvados/v1/collections/{collection_uuid}',
 552     f'GET /arvados/v1/collections/{collection_uuid}/',
 553 ]
 554 sharing_token = arv_client.api_client_authorizations().create(
 555     body={
 556         'api_client_authorization': {
 557             'scopes': sharing_token_scopes,
 558         },
 559     },
 560 ).execute()
 561 plain_token = sharing_token['api_token']
 562 token_parts = plain_token.split('/')
 563 if token_parts[0] == 'v2':
 564     plain_token = token_parts[2]
 565
 566 sharing_url_parts = (
 567     # The scheme your Keep web server uses. Change this to 'http' if necessary.
 568     'https',
 569     # The hostname, and optionally port, your Keep web server uses
 570     'collections.zzzzz.example.com',
 571     # You shouldn't need to change any other items
 572     f'/c={collection_uuid}/t={plain_token}/_/',
 573     None,
 574     None,
 575 )
 576 sharing_url = urllib.parse.urlunsplit(sharing_url_parts)
 577 print(sharing_url)
 578 {% endcodeblock %}
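
The token handling and URL assembly above do not depend on the API client, so you can factor them into small functions and check them on their own. This is a sketch under that assumption. The helper names @token_secret@ and @collection_sharing_url@ are hypothetical, and the token string below is a made-up placeholder, not a real token.

```python
import urllib.parse

def token_secret(api_token):
    """Return the secret part of a v2 token, or the token itself otherwise.

    Mirrors the example above: a v2 token has the form 'v2/<uuid>/<secret>'.
    """
    parts = api_token.split('/')
    if parts[0] == 'v2' and len(parts) >= 3:
        return parts[2]
    return api_token

def collection_sharing_url(collection_uuid, api_token, keep_web_host, scheme='https'):
    """Build a sharing URL using the same path layout as the example above."""
    path = f'/c={collection_uuid}/t={token_secret(api_token)}/_/'
    # Empty strings mean "no query" and "no fragment".
    return urllib.parse.urlunsplit((scheme, keep_web_host, path, '', ''))

print(collection_sharing_url(
    'zzzzz-4zz18-12345abcde67890',
    'v2/zzzzz-gj3su-12345abcde67890/examplesecret',  # placeholder token
    'collections.zzzzz.example.com',
))
```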
 579
 580 h2(#working-with-containers). Working with containers
 581
 582 If you haven't already, start by reading the "Computing with Crunch":{{ site.baseurl }}/api/execution.html guide. It provides a high-level overview of how users submit work to Arvados as container requests; how Arvados dispatches that work to containers; and how Arvados records the association and results back on the original container request record.
 583
 584 If you have run CWL workflows from Workbench 2, you have already used this API. When you start a workflow run, Workbench 2 creates a small container request to run a "CWL runner" tool with the specific inputs you gave it. Once Crunch dispatches a container for it, the CWL runner creates additional container requests to run each step of the workflow, and oversees the process until the workflow runs to completion. The UUID of the CWL runner container is recorded in the @container_uuid@ field of the container request you submitted.
 585
 586 The UUID of the CWL runner container is recorded in the @requesting_container_uuid@ field of each container request it creates. You can list container requests with a filter on this field to inspect each step of the workflow individually, as shown below.
 587
 588 The next few examples show how to perform a task with a container request generally, and then provide a more specific example of working with a CWL runner container.
 589
 590 h3(#get-input-of-a-container). Get input of a container
 591
 592 A container request's most varied inputs are recorded in the @mounts@ field, which can include data from Keep, specific collections, Git checkouts, and static files. You might also be interested in the @environment@, @command@, @container_image@, and @secret_mounts@ fields. Refer to the "container requests API documentation":{{ site.baseurl }}/api/methods/container_requests.html for details.
 593
 594 {% codeblock as python %}
 595 container_request = arv_client.container_requests().get(
 596     uuid='zzzzz-xvhdp-12345abcde67890',
 597 ).execute()
 598 # From here, you can process any of the container request's input fields.
 599 # Below is an example of listing all the mounts.
 600 import pprint
 601 for mount_name, mount_source in container_request['mounts'].items():
 602     mount_summary = []
 603     # These are the fields that define different types of mounts.
 604     # Try to collect them all. Just skip any that aren't set.
 605     for key in ['kind', 'uuid', 'portable_data_hash', 'commit', 'path']:
 606         try:
 607             mount_summary.append(mount_source[key])
 608         except KeyError:
 609             pass
 610     print(f"{mount_name}: {' '.join(mount_summary)}")
 611     if mount_source.get('kind') == 'json':
 612         pprint.pprint(mount_source.get('content'))
 613 {% endcodeblock %}
 614
 615 h3(#get-input-of-a-cwl-workflow). Get input of a CWL workflow run
 616
 617 When you run a CWL workflow, the CWL inputs are stored in the container request's @mounts@ field as a JSON mount named @/var/lib/cwl/cwl.input.json@.
 618
 619 {% codeblock as python %}
 620 container_request = arv_client.container_requests().get(
 621     uuid='zzzzz-xvhdp-12345abcde67890',
 622 ).execute()
 623 cwl_input = container_request['mounts']['/var/lib/cwl/cwl.input.json']['content']
 624 ...  # Work with the cwl_input dictionary
 625 {% endcodeblock %}
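
As one example of working with the @cwl_input@ dictionary, the sketch below collects the @location@ of every file and directory input. It assumes the input follows the standard CWL input object layout, where file and directory values are objects whose @class@ is @File@ or @Directory@. The function name @iter_cwl_locations@ and the sample data, including the placeholder portable data hash, are made up for illustration.

```python
def iter_cwl_locations(value):
    """Yield the location of each File or Directory found in a CWL value."""
    if isinstance(value, dict):
        if value.get('class') in ('File', 'Directory') and 'location' in value:
            yield value['location']
        # Recurse so secondaryFiles, listings, and record fields are included.
        for item in value.values():
            yield from iter_cwl_locations(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_cwl_locations(item)

# Sample data standing in for the 'content' of the cwl.input.json mount.
# The portable data hash is a placeholder, not a real collection.
PDH = '0123456789abcdef0123456789abcdef+123'
sample_cwl_input = {
    'threads': 4,
    'reference': {
        'class': 'File',
        'location': f'keep:{PDH}/ref.fa',
        'secondaryFiles': [
            {'class': 'File', 'location': f'keep:{PDH}/ref.fa.fai'},
        ],
    },
    'reads': [
        {'class': 'File', 'location': f'keep:{PDH}/r1.fastq'},
        {'class': 'File', 'location': f'keep:{PDH}/r2.fastq'},
    ],
}

for location in iter_cwl_locations(sample_cwl_input):
    print(location)
```

With a real container request, you would pass the @cwl_input@ dictionary from the example above in place of @sample_cwl_input@.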
 626
 627 h3(#get-output-of-a-container). Get output of a container
 628
 629 A container's output files are saved in a collection. The UUID of that collection is recorded in the @output_uuid@ field of the container request, which you can load as you like.
 630
 631 {% codeblock as python %}
 632 import arvados.collection
 633 container_request = arv_client.container_requests().get(
 634     uuid='zzzzz-xvhdp-12345abcde67890',
 635 ).execute()
 636 container_output = arvados.collection.Collection(
 637     container_request.get('output_uuid'),
 638 )
 639 ...  # Work with the container_output collection object
 640 {% endcodeblock %}
 641
 642 h3(#get-output-of-a-cwl-workflow). Get output of a CWL workflow run
 643
 644 When you run a CWL workflow, the container request's output collection includes a file named @cwl.output.json@ that provides additional information about other files in the output.
 645
 646 {% codeblock as python %}
 647 import arvados.collection
 648 import json
 649 cwl_container_request = arv_client.container_requests().get(
 650     uuid='zzzzz-xvhdp-12345abcde67890',
 651 ).execute()
 652 cwl_output_collection = arvados.collection.Collection(
 653     cwl_container_request['output_uuid'],
 654 )
 655 with cwl_output_collection.open('cwl.output.json') as cwl_output_file:
 656     cwl_output = json.load(cwl_output_file)
 657 ...  # Work with the cwl_output dictionary
 658 {% endcodeblock %}
 659
 660 h3(#get-log-of-a-child-request). Get logs of a container or CWL workflow run
 661
 662 A container's log files are saved in a collection. The UUID of that collection is recorded in the @log_uuid@ field of the container request, which you can load as you like.
 663
 664 {% codeblock as python %}
 665 import arvados.collection
 666 container_request = arv_client.container_requests().get(
 667     uuid='zzzzz-xvhdp-12345abcde67890',
 668 ).execute()
 669 log_collection = arvados.collection.Collection(
 670     container_request['log_uuid'],
 671 )
 672 # From here, you can process the container's log collection any way you like.
 673 # Below is an example that writes the container's stderr to this process' stderr.
 674 import shutil
 675 import sys
 676 with log_collection.open('stderr.txt') as container_stderr:
 677     shutil.copyfileobj(container_stderr, sys.stderr)
 678 {% endcodeblock %}
 679
 680 h3(#get-state-of-a-cwl-workflow). Get status of a container or CWL workflow run
 681
 682 Workbench shows users a single status badge for container requests. This status is synthesized from different fields on the container request and associated container. This code shows how to do analogous reporting using the Python SDK.
 683
 684 {% codeblock as python %}
 685 container_request = arv_client.container_requests().get(
 686     uuid='zzzzz-xvhdp-12345abcde67890',
 687 ).execute()
 688 if container_request['container_uuid'] is None:
 689     status = container_request['state']
 690 else:
 691     container = arv_client.containers().get(
 692         uuid=container_request['container_uuid'],
 693     ).execute()
 694     container_state = container['state']
 695     if container_state == 'Queued' or container_state == 'Locked':
 696         status = "On hold" if container['priority'] == 0 else "Queued"
 697     elif container_state == 'Running':
 698         if container['runtime_status'].get('error'):
 699             status = "Failing"
 700         elif container['runtime_status'].get('warning'):
 701             status = "Warning"
 702         else:
 703             status = container_state
 704     elif container_state == 'Cancelled':
 705         status = container_state
 706     elif container_state == 'Complete':
 707         status = "Completed" if container['exit_code'] == 0 else "Failed"
 708 ...  # Report status as desired
 709 {% endcodeblock %}
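
If you need this status in several places, for example to report on many requests, one option is to wrap the logic above in a function that takes records you have already fetched. This is a sketch, not an SDK feature: the name @summarize_status@ is hypothetical, and returning the raw state for any container state not handled above is an added assumption.

```python
def summarize_status(container_request, container=None):
    """Return a Workbench-style status string for a container request.

    `container` is the container record for the request's container_uuid,
    or None if no container has been assigned yet.
    """
    if container is None:
        return container_request['state']
    state = container['state']
    if state in ('Queued', 'Locked'):
        return "On hold" if container['priority'] == 0 else "Queued"
    if state == 'Running':
        runtime_status = container.get('runtime_status') or {}
        if runtime_status.get('error'):
            return "Failing"
        if runtime_status.get('warning'):
            return "Warning"
        return state
    if state == 'Complete':
        return "Completed" if container['exit_code'] == 0 else "Failed"
    # 'Cancelled' and, as an assumption, any state not handled above.
    return state

# Plain dictionaries stand in for records fetched from the API.
print(summarize_status({'state': 'Committed'}))
print(summarize_status({'state': 'Final'}, {'state': 'Complete', 'exit_code': 1}))
```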
 710
 711 h3(#list-failed-child-requests). List child requests of a container or CWL workflow run
 712
 713 When a running container creates a container request to do additional work, the UUID of the source container is recorded in the @requesting_container_uuid@ field of the new container request. You can list container requests with this filter to find requests created by a specific container.
 714
 715 {% codeblock as python %}
 716 import arvados.util
 717 for child_container_request in arvados.util.keyset_list_all(
 718     # Do *not* call the method here, just pass it.
 719     arv_client.container_requests().list,
 720     filters=[
 721         # Note this is a container UUID, *not* a container request UUID
 722         ['requesting_container_uuid', '=', 'zzzzz-dz642-12345abcde67890'],
 723         # You may add other filters for your listing.
 724         # For example, you could filter by 'name' to find specific kinds
 725         # of steps of a CWL workflow.
 726         ...,
 727     ],
 728 ):
 729     ...  # Work with each child container request
 730 {% endcodeblock %}
 731
 732 h3(#list-child-requests-of-container-request). List child requests of a container request
 733
 734 When a running container creates a container request to do additional work, the UUID of the source container is recorded in the @requesting_container_uuid@ field of the new container request. If all you have is the UUID of a container request, you can get that request, then list container requests with a filter where @requesting_container_uuid@ matches the @container_uuid@ of your request to find all its children.
 735
 736 {% codeblock as python %}
 737 import arvados.util
 738 parent_container_request = arv_client.container_requests().get(
 739     uuid='zzzzz-xvhdp-12345abcde67890',
 740 ).execute()
 741 parent_container_uuid = parent_container_request['container_uuid']
 742 if parent_container_uuid is None:
 743     # No container has run for this request yet, so there cannot be child requests.
 744     child_container_requests = ()
 745 else:
 746     child_container_requests = arvados.util.keyset_list_all(
 747         # Do *not* call the method here, just pass it.
 748         arv_client.container_requests().list,
 749         filters=[
 750             ['requesting_container_uuid', '=', parent_container_uuid],
 751             # You may add other filters for your listing.
 752             # For example, you could filter by 'name' to find specific kinds
 753             # of steps of a CWL workflow.
 754             ...,
 755         ],
 756     )
 757 for child_container_request in child_container_requests:
 758     ...  # Work with each child container request
 759 {% endcodeblock %}
 760
 761 With each child container request, you could repeat any of the recipes listed earlier in this section: examine their status, inputs, outputs, logs, and so on.
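
For a quick overview before examining each child individually, you could tally the child requests by their @state@ field. The sketch below uses plain dictionaries in place of API results so it can run without a cluster. The function name @count_request_states@ and the sample UUIDs are hypothetical.

```python
import collections

def count_request_states(container_requests):
    """Return a Counter mapping each request state to the number of requests in it."""
    return collections.Counter(
        container_request['state']
        for container_request in container_requests
    )

# Sample records standing in for results from the container requests API.
sample_requests = [
    {'uuid': 'zzzzz-xvhdp-000000000000001', 'state': 'Final'},
    {'uuid': 'zzzzz-xvhdp-000000000000002', 'state': 'Committed'},
    {'uuid': 'zzzzz-xvhdp-000000000000003', 'state': 'Final'},
]
for state, count in sorted(count_request_states(sample_requests).items()):
    print(f"{state}: {count}")
```

In practice you would pass the @child_container_requests@ iterable from the example above. Note that @arvados.util.keyset_list_all@ returns an iterator, so convert it to a list first if you also want to loop over the same results afterwards.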
 762
 763 h2(#working-with-container-request-queue). Working with the container request queue
 764
 765 h3(#list-completed-container-requests). List completed container requests
 766
 767 Completed container requests have their @state@ field set to @"Final"@. You can list container requests with this filter to find completed requests.
 768
 769 {% codeblock as python %}
 770 import arvados.util
 771 import datetime
 772 time_filter = datetime.datetime.utcnow()
 773 time_filter -= datetime.timedelta(days=7)
 774
 775 for container_request in arvados.util.keyset_list_all(
 776     # Do *not* call the method here, just pass it.
 777     arv_client.container_requests().list,
 778     filters=[
 779         # This is the filter you need to find completed container requests.
 780         ['state', '=', 'Final'],
 781         # There could be many completed container requests, so you should
 782         # provide additional filters. This example limits the listing to
 783         # container requests from the past week.
 784         ['created_at', '>=', f'{time_filter.isoformat()}Z'],
 785         ...,
 786     ],
 787 ):
 788     # Work with each container_request as desired.
 789     # This example provides a basic status table with the container request
 790     # UUID, time the request was created, and time it was last modified,
 791     # which for a final request is usually when it finished
 792     print(
 793         container_request['uuid'],
 794         container_request['created_at'],
 795         container_request['modified_at'],
 796     )
 797 {% endcodeblock %}
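
@datetime.datetime.utcnow()@ is deprecated as of Python 3.12. If you want to avoid it, a timezone-aware way to build the time filter could look like the sketch below. The function name @utc_timestamp_days_ago@ and its @now@ parameter are hypothetical, and the sketch assumes the API accepts an ISO 8601 timestamp without fractional seconds.

```python
import datetime

def utc_timestamp_days_ago(days, now=None):
    """Return an ISO 8601 UTC timestamp for `days` days before `now`.

    `now` must be a timezone-aware datetime. It defaults to the current time.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    then = now.astimezone(datetime.timezone.utc) - datetime.timedelta(days=days)
    return then.strftime('%Y-%m-%dT%H:%M:%SZ')

# A fixed time makes the example reproducible.
fixed_now = datetime.datetime(2024, 1, 8, 12, 30, tzinfo=datetime.timezone.utc)
print(utc_timestamp_days_ago(7, now=fixed_now))
```

The returned string could be used as the value in the @created_at@ filter above, in place of the f-string.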
 798
 799 h3(#cancel-a-container-request). Cancel a container request
 800
 801 To cancel a container request, update it to set its @priority@ field to 0. See the "containers API reference":{{ site.baseurl }}/api/methods/containers.html for details.
 802
 803 {% codeblock as python %}
 804 cancelled_container_request = arv_client.container_requests().update(
 805     uuid='zzzzz-xvhdp-12345abcde67890',
 806     body={
 807         'container_request': {
 808             'priority': 0,
 809         },
 810     },
 811 ).execute()
 812 {% endcodeblock %}
 813
 814 h3(#cancel-all-container-requests). Cancel multiple pending container requests
 815
 816 If you want to cancel multiple pending container requests, you can list container requests with the @state@ field set to @"Committed"@, a @priority@ greater than zero, and any other filters you like. Then update each container request to set its @priority@ field to 0. See the "containers API reference":{{ site.baseurl }}/api/methods/containers.html for details.
 817
 818 {% codeblock as python %}
 819 import arvados.util
 820 for container_request in arvados.util.keyset_list_all(
 821     # Do *not* call the method here, just pass it.
 822     arv_client.container_requests().list,
 823     filters=[
 824         # These are the filters you need to find cancellable container requests.
 825         ['state', '=', 'Committed'],
 826         ['priority', '>', 0],
 827         # You can add other filters as desired.
 828         # For example, you might filter on `requesting_container_uuid` to
 829         # cancel only steps of one specific workflow.
 830         ...,
 831     ],
 832 ):
 833     cancelled_container_request = arv_client.container_requests().update(
 834         uuid=container_request['uuid'],
 835         body={
 836             'container_request': {
 837                 'priority': 0,
 838             },
 839         },
 840     ).execute()
 841 {% endcodeblock %}