h3(#download-a-file-from-a-collection). Download a file from a collection
-Once you have a @Collection@ object, the "@Collection.open@ method":{{ site.baseurl }}/sdk/python/arvados/collection.html#arvados.collection.RichCollectionBase.open lets you open files from a collection the same way you would open files from disk using Python's built-in @open@ function. It returns a file-like object that you can use in many of the same ways you would use any other file object. You can pass it as a source to Python's standard "@shutil.copyfileobj@ function":https://docs.python.org/3/library/shutil.html#shutil.copyfileobj to download it. This code downloads @ExampleFile@ from your collection and saves it to the current working directory as @ExampleDownload@:
+Once you have a @Collection@ object, the "@Collection.open@ method":{{ site.baseurl }}/sdk/python/arvados/collection.html#arvados.collection.RichCollectionBase.open lets you open files from a collection the same way you would open files from disk using Python's built-in @open@ function. Pass a mode such as @'rb'@ as the second argument to open the file in binary mode. The method returns a file-like object that you can use in many of the same ways you would use any other file object. You can pass it as a source to Python's standard "@shutil.copyfileobj@ function":https://docs.python.org/3/library/shutil.html#shutil.copyfileobj to download it. This code downloads @ExampleFile@ from your collection and saves it to the current working directory as @ExampleDownload@:
{% codeblock as python %}
import arvados.collection
import shutil
collection = arvados.collection.Collection(...)
with (
-    collection.open('ExampleFile') as src_file,
-    open('ExampleDownload', 'w') as dst_file,
+    collection.open('ExampleFile', 'rb') as src_file,
+    open('ExampleDownload', 'wb') as dst_file,
):
    shutil.copyfileobj(src_file, dst_file)
{% endcodeblock %}
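The reason both ends of the copy are opened in binary mode can be checked without a cluster. The following is a minimal sketch, not Arvados code: @io.BytesIO@ is an assumed stand-in for the object returned by @Collection.open(..., 'rb')@, and the payload and temporary path are made up for illustration. It shows that @shutil.copyfileobj@ preserves bytes exactly when the destination is opened with @'wb'@, and that a text-mode destination rejects bytes.

```python
import io
import shutil
import tempfile
from pathlib import Path

# Hypothetical payload containing bytes that are not valid UTF-8 text.
payload = b"\x89PNG\r\n\x1a\n\x00\xff binary payload"

# Stand-in (assumption) for a collection file opened in binary mode.
src_file = io.BytesIO(payload)

with tempfile.TemporaryDirectory() as tmp_dir:
    dst_path = Path(tmp_dir, "ExampleDownload")

    # Binary source to binary destination: bytes are copied unchanged.
    with open(dst_path, "wb") as dst_file:
        shutil.copyfileobj(src_file, dst_file)
    round_trip = dst_path.read_bytes()

    # Binary source to text-mode destination: the write is rejected.
    src_file.seek(0)
    text_mode_error = None
    try:
        with open(dst_path, "w") as text_dst:
            shutil.copyfileobj(src_file, text_dst)
    except TypeError as error:
        text_mode_error = error

print(round_trip == payload)
print(type(text_mode_error).__name__)
```

Keeping the source and destination modes matched avoids both this error and any newline or encoding translation during the copy.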
import shutil
collection = arvados.collection.Collection(...)
with (
-    open('ExampleFile') as src_file,
-    collection.open('ExampleUpload', 'w') as dst_file,
+    open('ExampleFile', 'rb') as src_file,
+    collection.open('ExampleUpload', 'wb') as dst_file,
):
    shutil.copyfileobj(src_file, dst_file)
collection.save_new(...) # or collection.save() to update an existing collection
import pathlib
root_collection = arvados.collection.Collection(...)
# Start work from the base stream.
-stream_queue = collections.deque(['.'])
+stream_queue = collections.deque([pathlib.PurePosixPath('.')])
while stream_queue:
-    stream_name = stream_queue.popleft()
-    collection = root_collection.find(stream_name)
+    stream_path = stream_queue.popleft()
+    collection = root_collection.find(str(stream_path))
    for item_name in collection:
        try:
            my_file = collection.open(item_name)
        except IsADirectoryError:
            # item_name refers to a stream. Queue it to walk later.
-            stream_path = pathlib.Path(stream_name, item_name)
-            stream_queue.append(stream_path.as_posix())
+            stream_queue.append(stream_path / item_name)
            continue
        with my_file:
            ... # Work with my_file as desired
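The queue-based walk with @pathlib.PurePosixPath@ can be tried offline. The following is a rough sketch under stated assumptions: a nested @dict@ stands in for a collection (dicts play the role of streams, bytes the role of files), and @find@ and @walk_files@ are hypothetical helpers written for this illustration, not part of the Arvados SDK. It only demonstrates how the @/@ operator builds each stream path and how the deque yields a breadth-first traversal.

```python
import collections
import pathlib

# Hypothetical stand-in for a collection: dicts are streams, bytes are files.
fake_collection = {
    "a.txt": b"alpha",
    "sub": {
        "b.txt": b"beta",
        "deeper": {"c.txt": b"gamma"},
    },
}

def find(root, stream_path):
    """Return the nested dict at stream_path (illustrative helper only)."""
    node = root
    # PurePosixPath('.').parts is empty, so the base stream maps to root.
    for part in stream_path.parts:
        node = node[part]
    return node

def walk_files(root):
    """Collect every file path, visiting streams breadth-first."""
    found = []
    # Start work from the base stream.
    stream_queue = collections.deque([pathlib.PurePosixPath(".")])
    while stream_queue:
        stream_path = stream_queue.popleft()
        stream = find(root, stream_path)
        for item_name, item in stream.items():
            if isinstance(item, dict):
                # item_name refers to a stream. Queue it to walk later.
                stream_queue.append(stream_path / item_name)
                continue
            found.append(str(stream_path / item_name))
    return found

file_paths = walk_files(fake_collection)
print(file_paths)
```

@PurePosixPath@ always joins with forward slashes regardless of the host operating system, which is why it suits collection paths better than the platform-dependent @pathlib.Path@.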