Peter Amstutz [Mon, 25 Mar 2024 18:19:32 +0000 (14:19 -0400)]
21541: Code cleanup and additional memory usage improvements
* Add __slots__ to major Directory classes
* Disconnect FuseArvadosFile from ArvadosFile to reduce cyclic
references.
* Clean up _remove_inode loop and use dataclasses for the inode
operations.
* Now calls del_entry on collection_record_file and project_object_file.
It looks like collection_record_file was holding a reference to the
Collection object (and was remaining in the inodes table) even when
CollectionDirectory was cleared. I believe this is the memory leak I
have been looking for.
* Remove the "dead" flag and set parent_inode to None instead. This
clarifies that directory entries keep their (numeric) inodes until
they are detached from the directory. The previous, less explicit
behavior may have contributed to infrequent "file not found" errors.
* Adjust cache behavior to only hold objects that are cache-eligible
and have non-zero cache_size. This avoids filling the cache with
entries that are just going to be skipped over.
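The cache admission rule in the last bullet can be illustrated with a minimal LRU sketch. This is not the arv-mount implementation: `Entry`, `InodeCache`, `touch`, and `cap_bytes` are hypothetical names, and only the two attributes the rule depends on (`cache_eligible`, `cache_size`) are modeled. The point is that entries eviction would skip over are never added, so the eviction loop only walks entries it can actually drop.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical stand-in for an inode object: only the two
    # attributes the admission rule looks at are modeled.
    name: str
    cache_size: int
    cache_eligible: bool = True


class InodeCache:
    """LRU cache that only admits cache-eligible, non-zero-size entries."""

    def __init__(self, cap_bytes):
        self.cap_bytes = cap_bytes
        self.total = 0
        # name -> (entry, size recorded at insertion); oldest first.
        self._entries = OrderedDict()

    def touch(self, entry):
        # Drop any stale record first so self.total stays accurate.
        self._remove(entry.name)
        # Entries that eviction would just skip over are never admitted.
        if not entry.cache_eligible or entry.cache_size == 0:
            return
        self._entries[entry.name] = (entry, entry.cache_size)
        self.total += entry.cache_size
        self._evict()

    def _remove(self, name):
        record = self._entries.pop(name, None)
        if record is not None:
            self.total -= record[1]

    def _evict(self):
        # Drop least-recently-used entries until the total fits the cap.
        while self.total > self.cap_bytes and self._entries:
            _, (_, size) = self._entries.popitem(last=False)
            self.total -= size

    def names(self):
        return list(self._entries)


cache = InodeCache(cap_bytes=100)
cache.touch(Entry("a", 60))
cache.touch(Entry("empty", 0))            # skipped: zero cache_size
cache.touch(Entry("pinned", 10, False))   # skipped: not cache-eligible
cache.touch(Entry("b", 60))               # total 120 > 100, evicts "a"
print(cache.names(), cache.total)         # ['b'] 60
```

Recording the size at insertion time keeps `total` consistent even if an object's `cache_size` changes while it is cached.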
Overall: Memory usage is mostly stable but does tend to creep up over
time. My best guess is that this is unavoidable: we need to keep
inodes in RAM as long as the kernel holds a reference to them, so
with multiple processes accessing different filesystem locations, this
is simply the RAM required for the working set.
I'm also cautiously optimistic that the performance slowdowns I
observed in long-lived processes are improved (e.g. fixing memory
leaks means no more unbounded growth of cache_entries, which means no
more time wasted iterating over huge lists).
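For context on the first bullet of this commit, here is a minimal sketch of what declaring `__slots__` does in Python. The class names are hypothetical, not the real Directory classes. A slotted instance stores its attributes in fixed slots rather than a per-instance `__dict__`, which saves memory when there is one object per inode. Subclasses must declare their own `__slots__` as well, otherwise they regain a `__dict__`.

```python
class PlainDirectory:
    # Hypothetical class: each instance carries a per-instance __dict__.
    def __init__(self, inode, parent_inode):
        self.inode = inode
        self.parent_inode = parent_inode


class SlottedDirectory:
    # Attributes live in fixed slots instead of a per-instance dict.
    __slots__ = ("inode", "parent_inode")

    def __init__(self, inode, parent_inode):
        self.inode = inode
        self.parent_inode = parent_inode


class SlottedSubclass(SlottedDirectory):
    # Subclasses need their own __slots__ too; without it they would
    # get a __dict__ again and lose most of the savings.
    __slots__ = ("collection",)


plain = PlainDirectory(2, 1)
slotted = SlottedDirectory(2, 1)
child = SlottedSubclass(3, 1)

print(hasattr(plain, "__dict__"))    # True
print(hasattr(slotted, "__dict__"))  # False
print(hasattr(child, "__dict__"))    # False

try:
    slotted.dead = True  # not declared in __slots__
except AttributeError:
    print("slotted instances reject undeclared attributes")
```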
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Peter Amstutz [Wed, 6 Mar 2024 20:03:38 +0000 (15:03 -0500)]
21541: Fix KeyError, segfaults, and memory use issues
* Fixes a segfault on startup due to multiple threads fetching the
cluster config using the same http2 object, which is not thread-safe.
Now fetches the relevant configuration item
(ForwardSlashNameSubstitution) once and stores it where all the
threads can access it. (bug #21568)
* Fixes KeyError thrown where a parent inode is removed from the
inodes table before its children.
* In the process of testing, re-discovered a bug where, if the llfuse
_notify_queue fills up, the entire mount process deadlocks.
The previous fix worked by monkey-patching llfuse to replace a
limited-length queue with an unlimited length queue, however changes
in subsequent llfuse versions caused that variable to be hidden from
Python (so the monkey patch didn't fail but it no longer had any
effect either). The solution is to introduce an additional
unlimited-size queue in between the operation handlers and the
limited-size kernel notification queue.
* Because cache management and inode cleanup interact with kernel
notifications (which were moved into a separate thread), I decided
they should also be asynchronous from the operation handlers, so they
are now part of the same thread that processes kernel notifications.
* Attempting to remove an inode that is in use will now at minimum
send a kernel invalidation, which will sometimes nudge the kernel to
forget the inode, enabling us to remove it.
* Filter groups now check whether they contain themselves, so they
don't create an infinite path loop that breaks filesystem traversal
tools.
* In the process of testing, found that llfuse didn't wait for the
_notify_queue to drain before closing the FUSE channel, resulting in a
segfault if the _notify_loop thread tried to process any events after
shutdown started. This bug cannot be worked around on the Arvados
side, so I have prepared an arvados-llfuse fork with a bug fix.
* Testing with arv-mount-stress-test (which creates 8 subprocesses that all
traverse the filesystem at the same time) now passes with no
filesystem errors, no deadlocks, no segfaults, and modest memory
usage.
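The deadlock fix in the third bullet (an unlimited-size queue between the operation handlers and the limited-size kernel notification queue) can be sketched with the standard `queue` module. This is a generic illustration, not the arv-mount code: `kernel_queue`, `pending`, `notify`, `forwarder`, and `kernel_consumer` are hypothetical names, and the bounded `kernel_queue` only stands in for llfuse's internal `_notify_queue`. Handlers call `notify()`, which never blocks; a single forwarding thread is the only place that can block on the bounded queue.

```python
import queue
import threading

# Hypothetical stand-in for llfuse's limited-size notification queue.
kernel_queue = queue.Queue(maxsize=2)

# Unbounded intermediate queue (maxsize=0): put() never blocks, so
# operation handlers cannot deadlock waiting for the kernel side.
pending = queue.Queue()


def notify(event):
    # Called from operation handlers; returns immediately.
    pending.put(event)


def forwarder():
    # Runs in its own thread and is the only code that may block on
    # the bounded kernel queue.
    while True:
        event = pending.get()
        kernel_queue.put(event)
        if event is None:  # shutdown sentinel, passed downstream
            return


def kernel_consumer(out):
    # Simulates the (slow) kernel side draining notifications.
    while True:
        event = kernel_queue.get()
        if event is None:
            return
        out.append(event)


received = []
threads = [threading.Thread(target=forwarder),
           threading.Thread(target=kernel_consumer, args=(received,))]
for t in threads:
    t.start()

for i in range(10):  # more events than kernel_queue can hold at once
    notify(("invalidate_inode", i))
notify(None)

for t in threads:
    t.join()
print(len(received))  # 10
```

Because there is one forwarder and one consumer, events reach the kernel side in the order the handlers queued them.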
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Lucas Di Pentima [Thu, 14 Mar 2024 19:33:33 +0000 (16:33 -0300)]
21165: Adds extra state to uninstall wb1's package from workbench node.
I opted for specifically uninstalling the package instead of adding the
arvados.workbench.package.clean state, because that state also removes
packages that might be needed for other services.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Tom Clegg [Thu, 14 Mar 2024 19:25:46 +0000 (15:25 -0400)]
21449: Fix ordering in "install deps".
"install sdk/cli" is meant to use the current version of the arvados
gem from this checkout, but that can only happen if "install sdk/ruby"
runs first.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Brett Smith [Mon, 4 Mar 2024 21:19:23 +0000 (16:19 -0500)]
Clean up default tested package list
I was referring to this list while filing a bug and happened to notice
the redundant `keep-block-check` and `keep-rsync` entries. These caused
duplicate work during the package-build Jenkins jobs. No issue #.
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>
Peter Amstutz [Mon, 4 Mar 2024 16:32:45 +0000 (11:32 -0500)]
20455: Use noopener everywhere on links and window.open
I removed "noreferrer" as this does something different (it prevents
passing the "Referer" header when opening the new URL). It's not
clear that users benefit from suppressing the information that they
navigated to a link from workbench.
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Brett Smith [Sun, 25 Feb 2024 22:00:48 +0000 (17:00 -0500)]
21504: Rewrite arv-mount help for more consistent style
Some key points:
* Prefer describing the effect with a phrase over a sentence
* Only use periods when there are multiple sentences
* Defaults in parentheses
* Consistent text where needed (particularly across --by-* and
--mount-by-* options and different --unmount options)
We don't really have a style guide for documentation like this. I'm not
trying to establish one by fiat. I'd be open to discussing basically all
these points. But until that discussion happens, consistency is
valuable, so I'm using the rules that I tend to follow naturally.
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>