9 Copyright (C) The Arvados Authors. All rights reserved.
11 SPDX-License-Identifier: CC-BY-SA-3.0
14 "Web Distributed Authoring and Versioning (WebDAV)":https://tools.ietf.org/html/rfc4918 is an IETF standard set of extensions to HTTP to manipulate and retrieve hierarchical web resources, similar to directories in a file system. Arvados supports accessing files in Keep using WebDAV.
Most major operating systems include built-in support for mounting WebDAV resources as network file systems; see the user guide sections for "Windows":{{site.baseurl}}/user/tutorials/tutorial-keep-mount-windows.html , "macOS":{{site.baseurl}}/user/tutorials/tutorial-keep-mount-os-x.html , and "Linux (Gnome)":{{site.baseurl}}/user/tutorials/tutorial-keep-mount-gnu-linux.html#gnome . WebDAV is also supported by various standalone storage browser applications such as "Cyberduck":https://cyberduck.io/ and client libraries exist in many languages for programmatic access.
18 Keep-web provides read/write HTTP (WebDAV) access to files stored in Keep. It serves public data to anonymous and unauthenticated clients, and serves private data to clients that supply Arvados API tokens.
20 h3. Supported Operations
Keep-web supports the WebDAV HTTP methods @GET@, @PUT@, @DELETE@, @PROPFIND@, @COPY@, and @MOVE@.
Keep-web does not implement @LOCK@ or @UNLOCK@. These methods will be accepted, but are no-ops.
Requests can be authenticated in a variety of ways, as described below in "Authentication mechanisms":#auth . An unauthenticated request will return a 401 Unauthorized response with a @WWW-Authenticate@ header indicating "support for RFC 7617 Basic Authentication":https://tools.ietf.org/html/rfc7617 .
30 Getting a listing from keep-web starting at the root path @/@ will return two folders, @by_id@ and @users@.
The @by_id@ folder will return an empty listing. However, a path that starts with @/by_id/@ followed by a collection UUID, portable data hash, or project UUID will return the listing of that object.
The @users@ folder will return a listing of the users for whom the client has permission to read the "home" project of that user. Browsing an individual user will return the collections and projects directly owned by that user. Browsing those collections and projects returns listings of the files, directories, collections, and subprojects they contain, and so forth.
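A WebDAV directory listing comes back as a multistatus XML document. The following is a minimal Python sketch of requesting and reading one, assuming the standard RFC 4918 response shape. The helper names, the host name, the token placeholder, and the sample response body are hypothetical, and real Keep-web responses will carry more properties than shown here.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_propfind(url, token):
    """Build (but do not send) a Depth: 1 PROPFIND request for a directory."""
    return urllib.request.Request(
        url,
        method="PROPFIND",
        headers={"Depth": "1", "Authorization": "Bearer " + token},
    )


def list_hrefs(multistatus_xml):
    """Return the href of every entry in a WebDAV multistatus body."""
    root = ET.fromstring(multistatus_xml)
    return [el.text for el in root.iter("{DAV:}href")]


# Hypothetical response body, shaped per RFC 4918; real output will differ.
SAMPLE = """<D:multistatus xmlns:D="DAV:">
  <D:response><D:href>/users/example/</D:href></D:response>
  <D:response><D:href>/users/example/p/</D:href></D:response>
</D:multistatus>"""

# Hypothetical host and token placeholder.
request = build_propfind("https://collections.example.com/users/example/", "TOKEN")
print(request.get_method(), list_hrefs(SAMPLE))
```

To send the request, pass it to @urllib.request.urlopen@ and feed the response body to @list_hrefs@.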
In addition to the @/by_id/@ path prefix, the collection or project can be specified using a path prefix of @/c=<uuid or pdh>/@ or (if the cluster is properly configured) as a virtual host. This is described on "Keep-web URLs":keep-web-urls.html .
38 It is possible for a project or a "filter group":methods/groups.html#filter to appear as its own descendant in the @by_id@ and @users@ tree (a filter group may match itself, its own ancestor, another filter group that matches its ancestor, etc). When this happens, the descendant appears as an empty read-only directory. For example, if filter group @f@ matches its own parent @p@:
39 * @/users/example/p/f@ will show the filter group's contents (matched projects and collections).
40 * @/users/example/p/f/p@ will appear as an empty directory.
41 * @/by_id/uuid_of_f/p@ will show the parent project's contents, including @f@.
42 * @/by_id/uuid_of_f/p/f@ will appear as an empty directory.
44 h3(#zip). Downloading ZIP archives
46 Keep-web can produce an uncompressed ZIP archive of a collection, or a subset of a collection.
48 To request a ZIP archive:
49 * The request must include an @Accept: application/zip@ header _or_ @?accept=application/zip&disposition=attachment@ in the query.
50 * The request URI must specify the root directory of a collection, e.g., @/by_id/<uuid>/@. See "Keep-web URLs":keep-web-urls.html for more examples.
52 To download a subset of a collection, the request can specify one or more pathnames relative to the collection directory:
53 * A @files@ parameter in the query of a @GET@ request, e.g., @https://<uuid>.collections.example.com/?files=file1&files=file2@,
54 * A @files@ parameter in the body of a @POST@ request with a @Content-Type: application/x-www-form-urlencoded@ header, or
55 * The value of a @files@ key in a JSON object in the body of a @POST@ request with a @Content-Type: application/json@ header, e.g., @{"files":["file1","file2"]}@.
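The request forms listed above can be sketched in Python as follows. This is an illustration only: the function names and the token placeholder are hypothetical, and the requests are built but not sent.

```python
import json
import urllib.request
from urllib.parse import urlencode


def zip_get_url(collection_url, files):
    """GET form: repeat the files parameter once per requested path."""
    query = urlencode([("files", name) for name in files])
    return collection_url + "?" + query


def zip_post_request(collection_url, files, token):
    """POST form: send the file list as a JSON object in the request body."""
    body = json.dumps({"files": files}).encode("utf-8")
    return urllib.request.Request(
        collection_url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/zip",
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
    )


url = "https://collections.example.com/by_id/zzzzz-4zz18-0pg114rezrbz46u/"
print(zip_get_url(url, ["file1", "dir/file 2"]))
request = zip_post_request(url, ["file1", "file2"], "TOKEN")
```

The @GET@ form still needs an @Accept: application/zip@ header, or the equivalent @accept@ and @disposition@ query parameters, as described above.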
57 Keep-web returns an error if one of the specified paths does not exist in the requested collection.
59 The ZIP archive comment will include a download URL with the collection UUID or portable data hash, e.g., "Downloaded from https://collections.example.com/by_id/zzzzz-4zz18-0pg114rezrbz46u/".
The ZIP archive will also include collection metadata if the request sets an @include_collection_metadata@ parameter, e.g., @https://<uuid>.collections.example.com/?include_collection_metadata=true@. In that case, the archive contains an additional file named @collection.json@ with the collection's metadata (UUID, name, description, portable data hash, properties, creation time, modification time) and information about the user who last modified it (UUID, full name, username, and email). If the collection is specified by portable data hash rather than name or UUID, @collection.json@ will contain only the portable data hash.
63 Example @collection.json@ content:
67 "created_at":"2025-04-28T19:50:49.046969000Z",
68 "description":"Description of test collection\n",
69 "modified_at":"2025-04-28T19:50:49.093166000Z",
71 "email":"example@example.com",
72 "full_name":"Example Name",
74 "uuid":"zzzzz-tpzed-xurymjxw79nv3jz"
76 "name":"collection name",
77 "portable_data_hash":"6acf043b102afcf04e3be2443e7ea2ba+223",
81 "uuid":"zzzzz-4zz18-0pg114rezrbz46u"
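On the client side, the archive comment and @collection.json@ can be read with any ZIP library. The following Python sketch uses the standard @zipfile@ module. Because it does not contact a server, it builds a small stand-in archive in memory; @make_sample_zip@ and its contents are hypothetical and only mimic the layout described above.

```python
import io
import json
import zipfile


def read_zip_metadata(zip_bytes):
    """Return (archive comment, parsed collection.json or None)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        comment = zf.comment.decode("utf-8")
        metadata = None
        if "collection.json" in zf.namelist():
            metadata = json.loads(zf.read("collection.json"))
    return comment, metadata


def make_sample_zip():
    """Build a hypothetical stand-in for a Keep-web ZIP download."""
    buf = io.BytesIO()
    # ZIP_STORED means the entries are stored without compression.
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("collection.json",
                    json.dumps({"uuid": "zzzzz-4zz18-0pg114rezrbz46u"}))
        zf.writestr("file1.txt", "hello\n")
        zf.comment = (b"Downloaded from https://collections.example.com"
                      b"/by_id/zzzzz-4zz18-0pg114rezrbz46u/")
    return buf.getvalue()


comment, metadata = read_zip_metadata(make_sample_zip())
print(comment)
print(metadata)
```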
85 The request can also include a @download_filename@ parameter with a desired name for the downloaded zip file. This filename will be included in the @Content-Disposition@ response header. If this parameter is not provided, the filename suggested in the response header will be based on the collection name or portable data hash:
86 * @{collection name}.zip@ if downloading an entire collection
87 * @{collection name} - {file name}.zip@ if a single file was specified in the request
88 * @{collection name} - 3 files.zip@ if a directory or multiple files were specified in the request
89 * @{portable data hash}.zip@, @{portable data hash} - {file name}.zip@, etc., if the source collection was specified by portable data hash rather than name or UUID
94 GET /by_id/zzzzz-4zz18-0pg114rezrbz46u
95 Accept: application/zip
96 Content-Type: application/json
99 "download_filename": "odd-numbered files and directories.zip",
105 "include_collection_metadata": true
109 h3(#auth). Authentication mechanisms
111 A token can be provided in an Authorization header as a @Bearer@ token:
114 Authorization: Bearer o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK
A token can also be provided with "RFC 7617 Basic Authentication":https://tools.ietf.org/html/rfc7617 . In this case, the payload is formatted as @username:token@ and encoded with base64. The username must be non-empty, but its value is ignored. In this example, the username is "user":
Authorization: Basic dXNlcjpvMDdqNHB4N1JsSks0Q3VNWXA3QzBMRFQ0Q3pSMUoxcUJFNUF2bzdlQ2NVak9UaWt4Sw==
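As an illustration, a Basic header value can be computed in Python as sketched below. The function name is hypothetical. The payload is exactly @username:token@ with no trailing newline, which is easy to add by accident when encoding with shell tools.

```python
import base64


def basic_auth_header(token, username="user"):
    """Return an Authorization header value for RFC 7617 Basic auth."""
    if not username:
        # The username is ignored by the server but must be non-empty.
        raise ValueError("username must be non-empty")
    payload = f"{username}:{token}".encode("utf-8")
    return "Basic " + base64.b64encode(payload).decode("ascii")


TOKEN = "o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK"
print("Authorization:", basic_auth_header(TOKEN))
```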
123 A base64-encoded token can be provided in a cookie named "api_token":
126 Cookie: api_token=bzA3ajRweDdSbEpLNEN1TVlwN0MwTERUNEN6UjFKMXFCRTVBdm83ZUNjVWpPVGlreEs=
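The cookie value is the base64 encoding of the token alone, with no username. A minimal sketch in Python, with a hypothetical function name and assuming the standard base64 alphabet:

```python
import base64


def api_token_cookie(token):
    """Return a Cookie header value carrying the base64-encoded token."""
    encoded = base64.b64encode(token.encode("utf-8")).decode("ascii")
    return "api_token=" + encoded


TOKEN = "o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK"
print("Cookie:", api_token_cookie(TOKEN))
```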
A token can be provided in a URL-encoded query string:
132 GET /foo/bar.txt?api_token=o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK
135 A token can be provided in a URL-encoded path (as described in the previous section):
138 GET /t=o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK/_/foo/bar.txt
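URL encoding matters when a token contains characters that are reserved in URLs, such as a slash. The following Python sketch builds both the query-string form and the path form. The function names and the slash-containing token are hypothetical, chosen only to show the escaping.

```python
from urllib.parse import quote


def query_token_url(path, token):
    """Put the percent-encoded token in the api_token query parameter."""
    return f"{path}?api_token={quote(token, safe='')}"


def path_token_url(path, token):
    """Put the percent-encoded token in a /t=<token>/_/ path prefix."""
    return f"/t={quote(token, safe='')}/_/{path.lstrip('/')}"


TOKEN = "o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK"
print(query_token_url("/foo/bar.txt", TOKEN))
print(path_token_url("/foo/bar.txt", TOKEN))

# Hypothetical token containing slashes, to show the escaping.
print(path_token_url("/foo/bar.txt", "v2/zzzzz-gj3su-000000000000000/secret"))
```

Passing @safe=''@ makes @quote@ escape slashes as well, so the token stays within a single path segment.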
141 A suitably encoded token can be provided in a POST body if the request has a content type of application/x-www-form-urlencoded or multipart/form-data:
145 Content-Type: application/x-www-form-urlencoded
147 api_token=o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK
150 If a token is provided in a query string or in a POST request, the response is an HTTP 303 redirect to an equivalent GET request, with the token stripped from the query string and added to a cookie instead.
154 Keep-web returns a generic HTML index listing when a directory is requested with the GET method. It does not serve a default file like "index.html". Directory listings are also returned for WebDAV PROPFIND requests.
158 Keep-web supports partial resource reads using the HTTP @Range@ header as specified in "RFC 7233":https://tools.ietf.org/html/rfc7233 .
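A partial read can be sketched in Python as follows. The helper names and the token placeholder are hypothetical. The request is built but not sent, and the @Content-Range@ parser handles only the satisfied-range form @bytes first-last/total@ defined in RFC 7233.

```python
import urllib.request


def range_request(url, first_byte, last_byte, token=None):
    """Build a GET request for an inclusive byte range of a file."""
    headers = {"Range": f"bytes={first_byte}-{last_byte}"}
    if token is not None:
        headers["Authorization"] = "Bearer " + token
    return urllib.request.Request(url, headers=headers)


def parse_content_range(value):
    """Parse 'bytes first-last/total'; total is None when given as '*'."""
    _unit, _, rest = value.partition(" ")
    span, _, total = rest.partition("/")
    first, _, last = span.partition("-")
    return int(first), int(last), (None if total == "*" else int(total))


request = range_request("https://collections.example.com/foo/bar.txt", 0, 1023)
print(request.get_header("Range"))
print(parse_content_range("bytes 0-1023/4096"))
```

A server that honors the range replies with status 206 and a @Content-Range@ header. A client should also be prepared for a plain 200 response carrying the whole file.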
162 Client-provided authorization tokens are ignored if the client does not provide a @Host@ header.
In order to use the query string or POST form authorization mechanisms, the client must follow 303 redirects; the client must accept cookies with a 303 response and send those cookies when performing the redirect; and either the client or an intervening proxy must resolve a relative URL ("//host/path") if given in a response Location header.
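As an illustration of a client that meets these requirements, the Python standard library's @urllib@ follows 303 redirects by default, resolves relative @Location@ values against the request URL, and stores and resends cookies once a cookie processor is installed. The sketch below only sets up such a client and builds a form request. The function names and the token placeholder are hypothetical, and nothing is sent.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return (opener, jar): an opener that keeps cookies across redirects."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def token_form_request(url, token):
    """Build a POST request carrying the token as a URL-encoded form field."""
    body = urllib.parse.urlencode({"api_token": token}).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


opener, jar = make_session()
request = token_form_request("https://collections.example.com/foo/bar.txt", "TOKEN")
# opener.open(request) would send the POST, store the cookie set by the 303
# response, and then issue the redirected GET with that cookie attached.
```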
168 Normally, Keep-web accepts requests for multiple collections using the same host name, provided the client's credentials are not being used. This provides insufficient XSS protection in an installation where the "anonymously accessible" data is not truly public, but merely protected by network topology.
In such cases -- for example, a site which is not reachable from the internet, where some data is world-readable from Arvados's perspective but is intended to be available only to users within the local network -- the downstream proxy should be configured to return 401 for all paths beginning with "/c=".
Without the same-origin protection outlined above, a web page stored in collection X could execute JavaScript code that uses the current viewer's credentials to download additional data from collection Y -- data which is accessible to the current viewer, but not to the author of collection X -- from the same origin (@https://collections.example.com/@) and upload it to some other site chosen by the author of collection X.