// Keep-web provides read-only HTTP access to files stored in Keep. It
// serves public data to anonymous and unauthenticated clients, and
// serves private data to clients that supply Arvados API tokens. It
// can be installed anywhere with access to Keep services, typically
// behind a web proxy that supports TLS.
//
// See http://doc.arvados.org/install/install-keep-web.html.
//
// Serve HTTP requests at port 1234 on all interfaces:
//
//   keep-web -address=:1234
//
// Serve HTTP requests at port 1234 on the interface with IP address 1.2.3.4:
//
//   keep-web -address=1.2.3.4:1234
//
// Proxy configuration
//
// Keep-web does not support TLS natively. Typically, it is installed
// behind a proxy like nginx.
//
// Here is an example nginx configuration.
//
//   http {
//     upstream keep-web {
//       server localhost:1234;
//     }
//     server {
//       listen 443 ssl;
//       server_name dl.example.com *.dl.example.com ~.*--dl.example.com;
//       ssl_certificate /root/wildcard.example.com.crt;
//       ssl_certificate_key /root/wildcard.example.com.key;
//       location / {
//         proxy_pass http://keep-web;
//         proxy_set_header Host $host;
//         proxy_set_header X-Forwarded-For $remote_addr;
//       }
//     }
//   }
//
// It is not necessary to run keep-web on the same host as the nginx
// proxy. However, TLS is not used between nginx and keep-web, so
// intervening networks must be secured by other means.
//
// The following "same origin" URL patterns are supported for public
// collections (i.e., collections which can be served by keep-web
// without making use of any credentials supplied by the client). See
// "Same-origin URLs" below.
//
//   http://dl.example.com/c=uuid_or_pdh/path/file.txt
//   http://dl.example.com/c=uuid_or_pdh/t=TOKEN/path/file.txt
//
// The following "multiple origin" URL patterns are supported for all
// collections:
//
//   http://uuid_or_pdh--dl.example.com/path/file.txt
//   http://uuid_or_pdh--dl.example.com/t=TOKEN/path/file.txt
//
// In the "multiple origin" form, the string "--" can be replaced with
// "." with identical results (assuming the upstream proxy is
// configured accordingly). These two are equivalent:
//
//   http://uuid_or_pdh--dl.example.com/path/file.txt
//   http://uuid_or_pdh.dl.example.com/path/file.txt
//
// The first form minimizes the cost and effort of deploying a
// wildcard TLS certificate for *.dl.example.com. The second form is
// likely to be easier to configure, and more efficient to run, on an
// upstream proxy.
//
// In all of the above forms, the "dl.example.com" part can be
// anything at all: keep-web ignores everything after the first "." or
// "--".
//
// In all of the above forms, the "uuid_or_pdh" part can be either a
// collection UUID or a portable data hash with the "+" character
// replaced by "-".
//
// In all of the above forms, a top level directory called "_" is
// skipped. In cases where the "path/file.txt" part might start with
// "t=" or "c=" or "_/", links should be constructed with a leading
// "_/" to ensure the top level directory is not interpreted as a
// token or collection ID.
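//
// The link-construction rule above can be sketched in Go. This is a
// minimal illustration, not keep-web code: the helper name escapePath
// and the sample paths are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// escapePath is a hypothetical helper (not part of keep-web). It prefixes
// a collection-relative path with "_/" when the path begins with "t=",
// "c=", or "_/", so the first segment cannot be read as a token, a
// collection ID, or the skipped "_" directory.
func escapePath(p string) string {
	p = strings.TrimPrefix(p, "/")
	if strings.HasPrefix(p, "t=") || strings.HasPrefix(p, "c=") || strings.HasPrefix(p, "_/") {
		return "_/" + p
	}
	return p
}

func main() {
	paths := []string{"path/file.txt", "t=notes/readme.txt", "c=old/data.csv", "_/underscore.txt"}
	for _, p := range paths {
		fmt.Println(escapePath(p))
	}
}
```

// Always adding the "_/" prefix would also be valid, since a top level
// "_" directory is skipped in every form; the sketch only adds it when
// the path is ambiguous.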
//
// Assuming there is a collection with UUID
// zzzzz-4zz18-znfnqtbbv4spc3w and portable data hash
// 1f4b0bc7583c2a7f9102c395f4ffc5e3+45, the following URLs are
// interchangeable:
//
//   http://zzzzz-4zz18-znfnqtbbv4spc3w.dl.example.com/foo
//   http://zzzzz-4zz18-znfnqtbbv4spc3w.dl.example.com/_/foo
//   http://zzzzz-4zz18-znfnqtbbv4spc3w--dl.example.com/_/foo
//   http://1f4b0bc7583c2a7f9102c395f4ffc5e3-45--foo.example.com/foo
//   http://1f4b0bc7583c2a7f9102c395f4ffc5e3-45--.invalid/foo
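//
// The host name rules for the "multiple origin" form can be sketched
// as follows. This is an illustration under stated assumptions, not
// keep-web's implementation: collectionIDFromHost and isHex are
// hypothetical names, and a portable data hash is assumed to be 32
// lowercase hex digits followed by "-" and a size.

```go
package main

import (
	"fmt"
	"strings"
)

// isHex reports whether s consists only of lowercase hex digits.
func isHex(s string) bool {
	for _, c := range s {
		if !(c >= '0' && c <= '9') && !(c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// collectionIDFromHost is a hypothetical helper. It keeps the part of the
// host name before the first "." or "--". If that part looks like a
// portable data hash written with "-" in place of "+" (assumption: 32 hex
// digits, then "-"), the "+" is restored.
func collectionIDFromHost(host string) string {
	id := host
	if i := strings.Index(id, "--"); i >= 0 {
		id = id[:i]
	}
	if i := strings.Index(id, "."); i >= 0 {
		id = id[:i]
	}
	if len(id) > 33 && id[32] == '-' && isHex(id[:32]) {
		return id[:32] + "+" + id[33:]
	}
	return id
}

func main() {
	hosts := []string{
		"zzzzz-4zz18-znfnqtbbv4spc3w.dl.example.com",
		"zzzzz-4zz18-znfnqtbbv4spc3w--dl.example.com",
		"1f4b0bc7583c2a7f9102c395f4ffc5e3-45--foo.example.com",
		"1f4b0bc7583c2a7f9102c395f4ffc5e3-45--.invalid",
	}
	for _, h := range hosts {
		fmt.Println(collectionIDFromHost(h))
	}
}
```

// The first two hosts yield the UUID and the last two yield the
// portable data hash, which is why the example URLs above all refer
// to the same collection.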
//
// An additional form is supported specifically to make it more
// convenient to maintain support for existing Workbench download
// links:
//
//   http://dl.example.com/collections/download/uuid_or_pdh/TOKEN/path/file.txt
//
// A regular Workbench "download" link is also accepted, but
// credentials passed via cookie, header, etc. are ignored. Only
// public data can be served this way:
//
//   http://dl.example.com/collections/uuid_or_pdh/path/file.txt
//
// Authorization mechanisms
//
// A token can be provided in an Authorization header:
//
//   Authorization: OAuth2 o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK
//
// A base64-encoded token can be provided in a cookie named "api_token":
//
//   Cookie: api_token=bzA3ajRweDdSbEpLNEN1TVlwN0MwTERUNEN6UjFKMXFCRTVBdm83ZUNjVWpPVGlreEs=
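//
// A client could produce the cookie value with Go's standard
// library, as in this sketch. The helper name encodeTokenCookie is
// hypothetical, and the padded standard base64 alphabet is an
// assumption; for the example token the standard and URL-safe
// alphabets give the same result.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeTokenCookie is a hypothetical helper. It returns the base64
// encoding of an API token, suitable as the value of the "api_token"
// cookie. Assumption: padded standard base64.
func encodeTokenCookie(token string) string {
	return base64.StdEncoding.EncodeToString([]byte(token))
}

func main() {
	token := "o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK"
	// Prints the same Cookie header shown in the example above.
	fmt.Println("Cookie: api_token=" + encodeTokenCookie(token))
}
```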
//
// A token can be provided in a URL-encoded query string:
//
//   GET /foo.txt?api_token=o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK
//
// A suitably encoded token can be provided in a POST body if the
// request has a content type of application/x-www-form-urlencoded or
// multipart/form-data:
//
//   Content-Type: application/x-www-form-urlencoded
//
//   api_token=o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK
//
// If a token is provided in a query string or in a POST request, the
// response is an HTTP 303 redirect to an equivalent GET request, with
// the token stripped from the query string and added to a cookie
// instead.
//
// Client-provided authorization tokens are ignored if the client does
// not provide a Host header.
//
// In order to use the query string or POST form authorization
// mechanisms, the client must follow 303 redirects; the client must
// accept cookies with a 303 response and send those cookies when
// performing the redirect; and either the client or an intervening
// proxy must resolve a relative URL ("//host/path") if given in a
// response Location header.
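//
// The client requirements above (follow 303 redirects, keep cookies
// across the redirect) can be met in Go with an http.Client that has
// a cookie jar. The sketch below is self-contained: newStandIn is a
// hypothetical test server that only imitates the documented flow and
// is not keep-web itself, and the token EXAMPLETOKEN is a placeholder.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"net/http"
	"net/http/cookiejar"
	"net/http/httptest"
)

// newStandIn returns a stand-in server (illustration only). A request
// carrying ?api_token=... gets a 303 redirect to the same path with the
// token moved into an "api_token" cookie; a request carrying the cookie
// gets the file body; anything else gets 401.
func newStandIn() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if tok := r.URL.Query().Get("api_token"); tok != "" {
			http.SetCookie(w, &http.Cookie{
				Name:  "api_token",
				Value: base64.StdEncoding.EncodeToString([]byte(tok)),
				Path:  "/",
			})
			http.Redirect(w, r, r.URL.Path, http.StatusSeeOther)
			return
		}
		if c, err := r.Cookie("api_token"); err == nil && c.Value != "" {
			fmt.Fprint(w, "file contents")
			return
		}
		http.Error(w, "unauthorized", http.StatusUnauthorized)
	}))
}

// fetch performs a GET with a client that stores cookies. http.Client
// follows 303 redirects by default and resolves relative Location
// values against the request URL.
func fetch(url string) (int, string, error) {
	jar, err := cookiejar.New(nil)
	if err != nil {
		return 0, "", err
	}
	client := &http.Client{Jar: jar}
	resp, err := client.Get(url)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, string(body), err
}

func main() {
	srv := newStandIn()
	defer srv.Close()
	status, body, err := fetch(srv.URL + "/foo.txt?api_token=EXAMPLETOKEN")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(status, body)
}
```

// Without the Jar field the client would still follow the redirect,
// but it would arrive without the cookie and the stand-in would
// answer 401.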
//
// Normally, Keep-web accepts requests for multiple collections using
// the same host name, provided the client's credentials are not being
// used. This provides insufficient XSS protection in an installation
// where the "anonymously accessible" data is not truly public, but
// merely protected by network topology.
//
// In such cases -- for example, a site which is not reachable from
// the internet, where some data is world-readable from Arvados's
// perspective but is intended to be available only to users within
// the local network -- the upstream proxy should be configured to
// return 401 for all paths beginning with "/c=".
//
// Without the same-origin protection outlined above, a web page
// stored in collection X could execute JavaScript code that uses the
// current viewer's credentials to download additional data from
// collection Y -- data which is accessible to the current viewer, but
// not to the author of collection X -- from the same origin
// ("https://dl.example.com/") and upload it to some other site
// chosen by the author of collection X.
//
// Attachment-Only host
//
// It is possible to serve untrusted content and accept user
// credentials at the same origin as long as the content is only
// downloaded, never executed by browsers. A single origin (hostname
// and port) can be designated as an "attachment-only" origin: cookies
// will be accepted and all responses will have a
// "Content-Disposition: attachment" header. This behavior is invoked
// only when the designated origin matches exactly the Host header
// provided by the client or upstream proxy.
//
//   keep-web -attachment-only-host domain.example:9999
//
// Trust All Content mode
//
// In "trust all content" mode, Keep-web will accept credentials (API
// tokens) and serve any collection X at
// "https://dl.example.com/collections/X/path/file.ext". This is
// UNSAFE except in the special case where everyone who is able to
// write ANY data to Keep, and every JavaScript and HTML file written
// to Keep, is also trusted to read ALL of the data in Keep.
//
// In such cases you can enable trust-all-content mode.
//
//   keep-web -trust-all-content [...]