Peter Amstutz [Mon, 25 Mar 2024 18:19:32 +0000 (14:19 -0400)]
21541: Code cleanup and additional memory usage improvements
* Add slots to major Directory classes
* Disconnect FuseArvadosFile from ArvadosFile to reduce cyclic
references.
* Clean up _remove_inode loop and use dataclasses for the inode
operations.
* Now calls del_entry on collection_record_file and project_object_file.
It looks like collection_record_file was holding a reference to the
Collection object (and was remaining in the inodes table) even when
CollectionDirectory was cleared. I believe this is the memory leak I
have been looking for.
* Remove the "dead" flag and set parent_inode to None instead. This
makes it explicit that directory entries keep their (numeric) inodes
until they are detached from the directory. The old, less clear
behavior may have contributed to infrequent "file not found" errors.
* Adjust cache behavior to only hold objects that are cache-eligible
and have non-zero cache_size. This avoids filling the cache with
entries that are just going to be skipped over.
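As a rough illustration of the "Add slots" item above, here is a minimal sketch of how __slots__ removes the per-instance __dict__. The class and field names (PlainEntry, SlottedEntry, inode, parent_inode) are hypothetical stand-ins, not the actual Directory classes.

```python
class PlainEntry:
    """Ordinary class: every instance carries its own __dict__."""

    def __init__(self, inode, parent_inode):
        self.inode = inode
        self.parent_inode = parent_inode


class SlottedEntry:
    """Same fields declared in __slots__, so no per-instance __dict__."""

    __slots__ = ("inode", "parent_inode")

    def __init__(self, inode, parent_inode):
        self.inode = inode
        self.parent_inode = parent_inode


plain = PlainEntry(2, 1)
slotted = SlottedEntry(2, 1)

print(hasattr(plain, "__dict__"))    # True
print(hasattr(slotted, "__dict__"))  # False
```

The trade-off is that a slotted instance rejects attributes that were not declared, so any code that attaches ad-hoc attributes has to be updated.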
Overall: Memory usage is mostly stable but does tend to creep up over
time. My best guess is that this is unavoidable, because we need to
keep inodes in RAM as long as the kernel maintains a reference to them.
With multiple processes accessing different filesystem locations, this
is simply the RAM required for the working set.
I'm also cautiously optimistic that issues I observed with performance
slowing down with long-lived processes are improved (e.g. fixing
memory leaks means no more unbounded growth of cache_entries, which
means no more time wasted iterating over huge lists).
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Brett Smith [Sat, 23 Mar 2024 18:16:35 +0000 (14:16 -0400)]
21601: Build Python distro packages from wheels
The immediate problem this solves is that, by building and installing
from a repository of wheels, Python packages can find their
interdependencies without any special logic in the build process.
Other benefits:
* Eliminates some redundant work. We don't have to build the Python SDK
from source multiple times. We can use the published cwltest wheel
instead of building our own.
* Prepares the code for PEP 517 compliance. We only invoke setup.py to
build packages that have not been updated yet. We introspect packages
from their wheels, so we no longer have to introspect the source to
build distro packages.
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>
Brett Smith [Sat, 16 Mar 2024 23:23:44 +0000 (19:23 -0400)]
21601: Specify Python interdependencies with ~=
This has the same rationale as using <= before, but it's stricter. It
should prevent pip from using release versions to satisfy development
dependencies in the future, and help root out bugs in our build
processes.
DRY up this logic in arvados_version.py.
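For readers unfamiliar with ~=, this is a simplified sketch of the PEP 440 "compatible release" rule, where ~=X.Y.Z means >=X.Y.Z and ==X.Y.*. It only handles plain numeric versions (no dev, pre, or post segments), the function name and version numbers are made up for illustration, and it is not the logic in arvados_version.py.

```python
def satisfies_compatible_release(candidate, spec):
    """Return True if `candidate` satisfies `~=spec`.

    Simplified: both arguments must be dotted numeric versions such as
    "2.7.1". Real tools apply the full PEP 440 rules instead.
    """
    cand_parts = [int(p) for p in candidate.split(".")]
    spec_parts = [int(p) for p in spec.split(".")]
    if len(spec_parts) < 2:
        raise ValueError("~= needs at least two version components")

    # Pad with zeros so that "2.7" and "2.7.0" compare as equal.
    width = max(len(cand_parts), len(spec_parts))
    cand = tuple(cand_parts + [0] * (width - len(cand_parts)))
    low = tuple(spec_parts + [0] * (width - len(spec_parts)))

    # Everything except the last component of the spec must match exactly.
    prefix_len = len(spec_parts) - 1
    return cand[:prefix_len] == low[:prefix_len] and cand >= low


print(satisfies_compatible_release("2.7.3", "2.7.1"))  # True
print(satisfies_compatible_release("2.8.0", "2.7.1"))  # False
print(satisfies_compatible_release("2.7.0", "2.7.1"))  # False
```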
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>
Brett Smith [Sat, 16 Mar 2024 23:02:59 +0000 (19:02 -0400)]
21601: Make arvados_version.py more declarative
The main goal of this change is to introduce the metadata maps near the
top of the file, which we will use to build additional
functionality. The rest of the changes are just modernization or
clean-up based on that.
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>
Peter Amstutz [Wed, 6 Mar 2024 20:03:38 +0000 (15:03 -0500)]
21541: Fix KeyError, segfaults, and memory use issues
* Fixes a segfault on startup due to multiple threads fetching the
cluster config using the same http2 object, which is not thread-safe.
Now fetches the relevant configuration item
(ForwardSlashNameSubstitution) once and stores it where all the
threads can access it. (bug #21568)
* Fixes KeyError thrown where a parent inode is removed from the
inodes table before its children.
* In the process of testing, re-discovered a bug where, if the llfuse
_notify_queue fills up, the entire mount process deadlocks.
The previous fix worked by monkey-patching llfuse to replace a
limited-length queue with an unlimited-length queue. However, changes
in subsequent llfuse versions caused that variable to be hidden from
Python (so the monkey patch didn't fail, but it no longer had any
effect either). The solution is to introduce an additional
unlimited-size queue in between the operation handlers and the
limited-size kernel notification queue.
* Because cache management and inode cleanup interact with kernel
notifications (which were moved into a separate thread), I decided
they should also be asynchronous from the operation handlers, so they
are now part of the same thread that processes kernel notifications.
* Attempting to remove an inode that is in use will now at minimum
send a kernel invalidation, which will sometimes nudge the kernel to
forget the inode, enabling us to remove it.
* Filter groups now check if the filter group contains itself so it
doesn't create an infinite path loop that breaks filesystem traversal
tools.
* In the process of testing, found that llfuse didn't wait for the
_notify_queue to drain before closing the FUSE channel, resulting in a
segfault if the _notify_loop thread tried to process any events after
shutdown started. This bug cannot be worked around on the Arvados
side, so I have prepared an arvados-llfuse fork with a bug fix.
* Testing with arv-mount-stress-test (which creates 8 subprocesses that all
traverse the filesystem at the same time) now passes with no
filesystem errors, no deadlocks, no segfaults, and modest memory
usage.
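The queue arrangement described above (an unlimited-size queue between the operation handlers and the limited-size notification queue) can be sketched roughly as follows. This is not the arv-mount or llfuse code: NotifyForwarder and its methods are hypothetical names, and a plain queue.Queue stands in for the kernel notification queue.

```python
import queue
import threading

_STOP = object()  # sentinel telling the forwarder thread to exit


class NotifyForwarder:
    """Accept items without blocking and feed them to a bounded queue."""

    def __init__(self, bounded):
        self._pending = queue.Queue()  # maxsize=0 means unbounded
        self._bounded = bounded
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, item):
        # Called by operation handlers; never blocks on a full queue.
        self._pending.put(item)

    def _run(self):
        while True:
            item = self._pending.get()
            if item is _STOP:
                break
            # Only this thread waits when the bounded queue is full.
            self._bounded.put(item)

    def close(self):
        self._pending.put(_STOP)
        self._thread.join()


bounded = queue.Queue(maxsize=2)
forwarder = NotifyForwarder(bounded)
for n in range(10):
    forwarder.submit(n)  # returns immediately even though maxsize is 2

received = [bounded.get(timeout=5) for _ in range(10)]
forwarder.close()
print(received)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The design choice is that back-pressure from the bounded queue is absorbed by one dedicated thread rather than by the handlers, at the cost of memory that can grow if the consumer stalls.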
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Lucas Di Pentima [Thu, 14 Mar 2024 19:33:33 +0000 (16:33 -0300)]
21165: Adds extra state to uninstall wb1's package from workbench node.
I opted for specifically uninstalling the package instead of adding the
arvados.workbench.package.clean state, because that state also removes
packages that might be needed for other services.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Tom Clegg [Thu, 14 Mar 2024 19:25:46 +0000 (15:25 -0400)]
21449: Fix ordering in "install deps".
"install sdk/cli" is meant to use the current version of the arvados
gem from this checkout, but that can only happen if "install sdk/ruby"
runs first.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>