Brett Smith [Mon, 19 Aug 2024 21:34:25 +0000 (17:34 -0400)]
21901: Log all keep-web GET requests that request the first byte
We use this as a heuristic for "starting a new download." This heuristic
does mean that a GET request can avoid being logged if:
* it comes from the same source address (from the server's perspective)
* it uses the same API token
* it requests Range: bytes=1-
Those conditions seem acceptably narrow for now.
The http_range module used here is effectively a public export of the
same logic used by Go's HTTP FS server. I felt like it was important to
use the same logic to avoid different outcomes in the logging logic
vs. serving logic.
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>
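The "requests the first byte" heuristic described above can be sketched
roughly as follows. This is an illustration only, not keep-web's code:
the function name `requests_first_byte` is hypothetical, and the choice
to ignore malformed or unrecognized ranges is an assumption of the
sketch rather than the behavior of Go's range parser.

```python
import re

_DIGITS = re.compile(r"[0-9]+")


def requests_first_byte(range_header, size):
    """Return True if a GET with this Range header value would include byte 0.

    Hypothetical helper for illustration; not keep-web's actual code.
    """
    if not range_header:
        return True  # no Range header: the whole file is requested
    unit, sep, specs = range_header.partition("=")
    if not sep or unit.strip().lower() != "bytes":
        return False  # assumption: unrecognized range units are not counted
    for spec in specs.split(","):
        start, dash, end = (part.strip() for part in spec.partition("-"))
        if not dash:
            continue  # assumption: a malformed spec is ignored in this sketch
        if start == "":
            # Suffix range "-N" means the last N bytes; it reaches byte 0
            # only when N covers the whole file.
            if _DIGITS.fullmatch(end) and int(end) > 0 and int(end) >= size:
                return True
        elif _DIGITS.fullmatch(start) and int(start) == 0:
            return True
    return False
```

Under this sketch, `Range: bytes=1-` is the kind of request that would
not count as starting a new download, which matches the gap described
in the commit message.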
Brett Smith [Tue, 30 Jul 2024 20:03:32 +0000 (16:03 -0400)]
21901: Introduce fileEventLog to keep-web
Rather than storing all log metadata twice (once as a logger field and
once as an Arvados log property), this lets us store it all in one place
and then export it as needed. This restructure will make it easier to
add smarts to the logging infrastructure.
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>
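As a rough illustration of the "store once, export as needed" idea, a
minimal sketch might look like the following. The class name
`FileEvent`, its fields, and both export shapes are hypothetical and are
not taken from keep-web's fileEventLog.

```python
from dataclasses import asdict, dataclass


@dataclass
class FileEvent:
    """One file access event. All field names here are hypothetical."""
    event_type: str
    collection_uuid: str
    file_path: str
    client_addr: str

    def as_logger_fields(self):
        """Export as a flat mapping, e.g. for a structured logger."""
        return asdict(self)

    def as_log_record(self):
        """Export as a nested mapping, e.g. for a log record with properties."""
        return {
            "event_type": self.event_type,
            "object_uuid": self.collection_uuid,
            "properties": {
                "file_path": self.file_path,
                "client_addr": self.client_addr,
            },
        }
```

Both exports read from the same stored values, so the two views cannot
drift apart.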
Tom Clegg [Tue, 30 Jul 2024 16:52:04 +0000 (12:52 -0400)]
21927: Fix race condition in test.
Test was occasionally failing because the "wait for pending reqs, then
count reqs" step ran before the background refresh had progressed far
enough to be counted as a pending req.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Peter Amstutz [Thu, 25 Jul 2024 21:25:49 +0000 (17:25 -0400)]
21943: Fix bug and add integration test
The bug arose because the path mapper de-duplicated file references
before doing the mapping, but did not traverse directory listings. If
de-duplication kept the entry from a directory listing (when a given
file reference appeared in two or more places in the output), and the
directory listings were then skipped later on, those files were never
included in the path map at all.
Now trims the directory listings while constructing the list of files
to path map, so if the file is referenced in directory listings in the
output, it isn't lost to the dedup behavior.
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Peter Amstutz [Thu, 25 Jul 2024 14:44:13 +0000 (10:44 -0400)]
21993: Committed request with priority == 0 is treated as cancel
Now handles the case where a user cancels a container request before
it has a chance to run. Such a request currently remains in the
"Committed" state rather than moving to "Final". This fixes the bug
where the workflow runner would be stuck forever.
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
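The check described above can be sketched as follows. The function
names are hypothetical and this is not the workflow runner's actual
code; it assumes the container request is available as a dict with
"state" and "priority" keys.

```python
def treated_as_cancelled(container_request):
    """True for a committed request whose priority has been set to 0."""
    return (
        container_request.get("state") == "Committed"
        and container_request.get("priority") == 0
    )


def is_done(container_request):
    """True when a caller waiting on this request can stop waiting."""
    return (
        container_request.get("state") == "Final"
        or treated_as_cancelled(container_request)
    )
```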
Zoë Ma [Mon, 22 Jul 2024 13:10:43 +0000 (21:10 +0800)]
22003: prevent redirect loop in Wb2 client-side redirect handler
Check that the target URL is not empty before setting
window.location.href (browsing to new URL), because setting it to empty
effectively reloads the page and triggers the redirect handler again,
causing an endless loop.
Arvados-DCO-1.1-Signed-off-by: Zoë Ma <zoe.ma@curii.com>
Zoë Ma [Fri, 12 Jul 2024 13:27:09 +0000 (21:27 +0800)]
22003: Workbench2 and keep-web: better interoperability with redirect
keep-web: When sending an unauthenticated browser client to a redirect
to Wb2, encode the target URL path in the query part of the redirection
URL in the `Location` header. This avoids a possibly corrupted header
and a confused client.
Workbench2:
- In the redirection handler, handle the target path passed in the URL
query part more robustly.
- In the "copy link to clipboard" action in the files panel of a
collection view, better emulate the server-generated redirect URL (see
above) when creating the URL for the clipboard.
Overall, when working with redirects (either generating redirect URLs or
handling them on the client side), we're better prepared for paths that
may contain special characters.
Arvados-DCO-1.1-Signed-off-by: Zoë Ma <zoe.ma@curii.com>
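A minimal sketch of carrying a target path in the query part of a
redirect URL, and reading it back, is shown below. The parameter name
"target" and both function names are hypothetical; the sketch also
assumes the base URL has no query string of its own.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_redirect_url(base_url, target_path, param="target"):
    """Append target_path to base_url as a percent-encoded query value."""
    return base_url + "?" + urlencode({param: target_path})


def read_target_path(url, param="target"):
    """Return the decoded target path, or None if it is missing or empty."""
    values = parse_qs(urlsplit(url).query).get(param, [])
    return values[0] if values and values[0] else None
```

Because the path is encoded as a query value, characters such as
spaces, "#", "?", and non-ASCII letters never appear raw in the header,
and a missing or empty value is reported as None rather than as an
empty string.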
Zoë Ma [Tue, 23 Jul 2024 10:34:04 +0000 (18:34 +0800)]
21998: Try to find original request's scheme in URL generated for wget.
In the "wget" example generated on the directory listing page, try to
make the scheme part in the URL argument closer to the original
request's, by using the value of "X-Forwarded-Proto" header if it's
valid.
Arvados-DCO-1.1-Signed-off-by: Zoë Ma <zoe.ma@curii.com>
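One way to picture the scheme selection above is sketched below. What
counts as a "valid" header value is an assumption of this sketch (only
"http" and "https" are accepted), and the function name and the plain
dict of headers are hypothetical.

```python
def request_scheme(headers, default_scheme):
    """Pick the URL scheme for a generated link.

    headers is a plain dict keyed by the exact header name; a real
    server would look headers up case-insensitively.
    """
    proto = headers.get("X-Forwarded-Proto", "").strip().lower()
    if proto in ("http", "https"):
        return proto
    return default_scheme  # header absent or not usable: keep the default
```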
Zoë Ma [Thu, 11 Jul 2024 17:05:06 +0000 (01:05 +0800)]
21998: properly percent-encode paths in keep-web directory listing page
Use the percent-encoded form of the relative URLs as the value of the
"href" attribute in the directory listing page generated by keep-web.
The "wget" command example shown on the page now has single-quotes
around the URL argument.
For testing involving complex URL or path patterns, use the HTML parser
provided by golang.org/x/net/html to scrape the directory listing page,
instead of using regular expressions that may get unwieldy.
Minor edits to the HTML template for compliance and ease of testing.
Arvados-DCO-1.1-Signed-off-by: Zoë Ma <zoe.ma@curii.com>
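The two output changes above can be sketched as follows. The function
names are hypothetical, and the real "wget" example may carry
additional options that are not shown here.

```python
from urllib.parse import quote


def listing_href(relative_path):
    """Percent-encode a relative path for use as an href value.

    "/" is kept so that subdirectory paths stay relative paths.
    """
    return quote(relative_path, safe="/")


def wget_example(url):
    """Wrap the URL in single quotes for a POSIX shell command line.

    An embedded single quote is closed, emitted inside double quotes,
    and reopened, which is the usual shell idiom.
    """
    return "wget '" + url.replace("'", "'\"'\"'") + "'"
```

For example, `listing_href("my file's #1.txt")` gives
`my%20file%27s%20%231.txt`, and `wget_example("https://example.com/a&b")`
gives `wget 'https://example.com/a&b'`, so the shell does not interpret
the "&".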
Brett Smith [Mon, 15 Jul 2024 19:12:18 +0000 (15:12 -0400)]
22001: Simplify virtualenv management in run-tests
A lot of the current virtualenv management code is parameterized,
probably to be able to support virtualenvs for Python 2 and 3 in
parallel. Since we don't need that anymore, remove all this code to
simplify the script.
* initialize() calls setup_virtualenv(), which ensures the virtualenv
exists and then activates it. Everything else in the script can assume
this is done.
* install_env() installs the Python build tools and support libraries
we need.
* install_deps() installs our Python modules that are used by other
components in Arvados.
With all this, there should be no need for regular virtualenv
de/reactivation.
Various old incantations meant to work around old setuptools/pip bugs
got cleaned up as part of this too.
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>
Brett Smith [Fri, 12 Jul 2024 19:02:02 +0000 (15:02 -0400)]
21935: Merge _normalize_stream and _ranges into arvados._internal.streams
All these classes and functions are related to parsing and manipulating
streams, so it makes sense to collect them in one module with a name
reflecting that.
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>
Brett Smith [Fri, 12 Jul 2024 18:37:12 +0000 (14:37 -0400)]
21935: Move arvados.http_to_keep under arvados._internal
This functionality could be useful to provide to people writing their
own tools. However, because of the way it was extracted out of
arvados-cwl-runner, the API wasn't really designed to be public, and has
already had one breaking change since it was introduced.
Make the current module internal so the API doesn't ossify. We can
always develop a public API for it in the future.
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>
Zoë Ma [Mon, 8 Jul 2024 07:15:26 +0000 (15:15 +0800)]
21989: Keepstore: Use comma separator for X-Keep-Storage-Classes-Confirmed header
When generating the X-Keep-Storage-Classes-Confirmed header, use comma
(",") instead of semicolon (";") as the separator for multi-entry values.
This is to be consistent with keep-proxy's behavior and the arv-put
client's expectation. Plus, it follows RFC 9110 (see sec. 5.2).
Without the fix, arv-put wrongly believes that storage classes are not
supported on the cluster it is putting to. The spurious warning
"X-Keep-Storage-Classes header not supported by the cluster" can be seen
in arv-put messages and in Workbench2.
Arvados-DCO-1.1-Signed-off-by: Zoë Ma <zoe.ma@curii.com>
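To illustrate why the separator matters, here is a minimal sketch of a
client that splits the header value on commas. The "class=count" entry
shape and both function names are assumptions for illustration, not
arv-put's actual parser.

```python
def format_confirmed(confirmed):
    """Join class=count entries with commas."""
    return ", ".join(f"{name}={count}" for name, count in confirmed.items())


def parse_confirmed(value):
    """Parse a comma-separated list of class=count entries.

    Entries that do not look like name=<integer> are skipped.
    """
    result = {}
    for item in value.split(","):
        name, sep, count = item.strip().partition("=")
        try:
            replicas = int(count)
        except ValueError:
            continue
        if sep and name:
            result[name] = replicas
    return result
```

With this parser, a semicolon-separated value such as
"default=2; archive=1" yields no entries at all, which is the kind of
outcome that would lead a client to conclude that storage classes are
unsupported.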
Brett Smith [Tue, 2 Jul 2024 17:45:08 +0000 (13:45 -0400)]
21931: Revert net-imap dependency
This was updated as part of #21933 in 9167b2053ef2a39a4e8a7c13e4127e12470b14ed.
However, net-imap 0.4 requires Ruby > 2.7.3, while Ubuntu 20.04 only
ships with Ruby 2.7.0. Upgrading this gem doesn't seem to be required
for the new Rails release, so don't.
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>