Tom Clegg [Mon, 29 May 2023 19:11:27 +0000 (15:11 -0400)]
20540: Use arvados.Client for arvadosclient.ArvadosClient reqs.
Caller-specified Retries is used as a timeout in minutes.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 26 May 2023 19:22:01 +0000 (15:22 -0400)]
20540: Remove arvadosclient usage in copier and git_tree.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 26 May 2023 14:29:16 +0000 (10:29 -0400)]
Merge branch '20511-aborted-boot'
refs #20511
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Lucas Di Pentima [Fri, 26 May 2023 13:42:20 +0000 (10:42 -0300)]
Merge branch '20474-installer-api-reqs-limit'. Closes #20474
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Tom Clegg [Thu, 25 May 2023 21:45:50 +0000 (17:45 -0400)]
20511: Fix allowing too many supervisor processes.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 25 May 2023 21:10:48 +0000 (17:10 -0400)]
20511: Don't shut down excess instances just because of MaxSupervisors.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Lucas Di Pentima [Thu, 25 May 2023 18:58:53 +0000 (15:58 -0300)]
20474: Updates documentation stating that RailsAPI supports /metrics endpoint.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Thu, 25 May 2023 18:52:32 +0000 (15:52 -0300)]
20474: Changes RailsAPI queue size to be 10% more than controller's.
Also, adds comment on config file clarifying why we're setting up this
increased value.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
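A minimal sketch of the sizing rule in the 20474 commit above, where the RailsAPI queue is made 10% larger than the controller's. The function name and the choice to round up are assumptions for illustration; the installer's actual template may compute this differently.

```python
def railsapi_queue_size(controller_queue_size: int) -> int:
    """Return a queue size 10% larger than the controller's.

    Assumption: fractional results are rounded up, so the RailsAPI queue
    is always strictly larger than a non-empty controller queue.
    """
    # Integer ceiling of controller_queue_size * 1.1, avoiding float error.
    return (controller_queue_size * 11 + 9) // 10
```

The extra headroom is what lets requests such as /metrics get through even when the controller has filled its own queue.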
Tom Clegg [Thu, 25 May 2023 17:57:58 +0000 (13:57 -0400)]
20511: Fix unclosed response body in test.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 25 May 2023 17:43:34 +0000 (13:43 -0400)]
20511: Don't shut down/unlock based on dynamic maxConcurrency.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 25 May 2023 15:25:02 +0000 (11:25 -0400)]
20511: Limit server-to-server client to 1/4 of API req capacity.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 25 May 2023 14:33:53 +0000 (10:33 -0400)]
20511: Fix slow-expansion logic.
Limit was always being raised to 2x known-working, instead of
min(+10%, 2x known-working) as intended.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
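The intended slow-expansion rule from the commit above, min(+10%, 2x known-working), could be sketched as follows. This is not the dispatcher's code: the function name is hypothetical, and it assumes "+10%" is measured against the current limit and rounded up.

```python
def next_concurrency_limit(current_limit: int, known_working: int) -> int:
    """Return the next limit to try while slowly expanding concurrency.

    Assumptions (not from the source): "+10%" is relative to the current
    limit, and the increment is rounded up to a whole number.
    """
    # Current limit plus the integer ceiling of 10% of it.
    plus_ten_percent = current_limit + (current_limit + 9) // 10
    # Never go beyond twice the level that is known to work.
    ceiling = 2 * known_working
    return min(plus_ten_percent, ceiling)
```

The bug described above corresponds to always returning `ceiling`, which skips the gradual 10% steps.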
Tom Clegg [Thu, 25 May 2023 14:12:40 +0000 (10:12 -0400)]
20511: Start with 8 concurrent outgoing API calls, not unlimited.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Peter Amstutz [Mon, 22 May 2023 21:16:12 +0000 (17:16 -0400)]
Add 2.6.2 to the upgrading notes refs #20393
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Lucas Di Pentima [Mon, 22 May 2023 21:13:53 +0000 (18:13 -0300)]
20474: Adds 5 to controller's request queue size for RailsAPI.
This size difference would allow extra requests (like metrics) to happen
even on heavily loaded clusters.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Peter Amstutz [Mon, 22 May 2023 15:33:46 +0000 (11:33 -0400)]
Merge branch '20529-container-deadlocks' refs #20529
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Brett Smith [Mon, 22 May 2023 14:47:29 +0000 (10:47 -0400)]
Merge branch '20527-group-contents-select-doc'
Closes #20527.
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>
Peter Amstutz [Fri, 19 May 2023 19:01:34 +0000 (15:01 -0400)]
20529: Lock direct parent containers on updates
This is because cost is propagated up to the parent container on
container completion.
Handle case of resource attrs at top level when deciding whether to
lock or not (resource attrs appear as strings, not symbols)
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Lucas Di Pentima [Fri, 19 May 2023 18:11:57 +0000 (15:11 -0300)]
Merge branch '20482-installer-improvements'. Closes #20482
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Fri, 19 May 2023 16:16:56 +0000 (13:16 -0300)]
20482: For some reason, symlinks don't seem to be ignored for copyright checks.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Fri, 19 May 2023 15:06:33 +0000 (12:06 -0300)]
20482: Updates the upgrade notice to include the DOMAIN envvar at local.params.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Thu, 18 May 2023 20:36:10 +0000 (17:36 -0300)]
20482: Updates installer's documentation to reflect latest changes.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Thu, 18 May 2023 20:35:17 +0000 (17:35 -0300)]
20482: Code cleanup for readability.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Thu, 18 May 2023 20:31:37 +0000 (17:31 -0300)]
20482: Re-exports VPC's CIDR.
Previously exported as 'vpc_cidr' and removed when preexisting vpc usage
was added. This config data is used on local.params and was mentioned on the
documentation page.
Now, it's exported as 'cluster_int_cidr' and its value is requested from AWS
so that we get the correct one whether the VPC was just created or a
previously existing one is in use.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Fri, 19 May 2023 14:57:03 +0000 (11:57 -0300)]
20482: Overrides shellinabox config templates to fix the domain name usage.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Thu, 18 May 2023 14:22:10 +0000 (11:22 -0300)]
20482: Allows the cluster operator to use an arbitrary domain.
Instead of making domains like cluster_prefix.domain mandatory, let the site
admin select whichever domain they need for the deployment.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Brett Smith [Fri, 19 May 2023 14:39:03 +0000 (10:39 -0400)]
20527: Document the select argument of the groups contents API method
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>
Brett Smith [Thu, 18 May 2023 15:40:55 +0000 (11:40 -0400)]
Merge branch '12684-pysdk-auto-retry'
Closes #12684.
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>
Brett Smith [Thu, 18 May 2023 12:48:13 +0000 (08:48 -0400)]
12684: Test that plain 403 responses are not retried
Requested in review.
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>
Brett Smith [Thu, 18 May 2023 12:36:56 +0000 (08:36 -0400)]
12684: Stop retrying 422 responses in PySDK
The original motivation for this was to retry when the API server was
having database connectivity problems. The feeling eight years later is
that things have changed enough that, on balance, this isn't worth
retrying anymore.
I don't think this will have any real impact on current Arvados
software. In the main branch as I write this,
`check_http_response_status` only gets called in five places. Three of
those are in the main `arvados` module for job and task utilities, which
presumably nobody is using anymore. The other two talk to Keep, which
only returns 422 for hash mismatches, where a retry will definitely
never succeed.
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>
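The retry decisions in the two 12684 commits above (plain 403 and 422 responses are not retried) can be pictured with a small predicate. The set of retryable codes below is an illustrative assumption, not copied from PySDK, and `should_retry` is a hypothetical name.

```python
# Illustrative assumption: these codes are NOT taken from the PySDK source.
RETRYABLE_4XX = frozenset({408, 409, 423})


def should_retry(status: int) -> bool:
    """Decide whether an HTTP status is worth retrying in this sketch."""
    # Assumption: every 5xx response is treated as transient.
    if 500 <= status < 600:
        return True
    # Only a short allowlist of 4xx codes is retried; 403 and 422 are not.
    return status in RETRYABLE_4XX
```

Under this sketch a 422 caused by a hash mismatch fails immediately, since repeating the request cannot change the outcome.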
Tom Clegg [Wed, 17 May 2023 15:19:35 +0000 (11:19 -0400)]
Merge branch '20457-queue-churn'
refs #20457
refs #20511
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Lucas Di Pentima [Wed, 17 May 2023 13:28:16 +0000 (10:28 -0300)]
Merge branch '20482-further-terraform-improvements'. Refs #20482
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Wed, 17 May 2023 13:20:54 +0000 (10:20 -0300)]
Merge branch '20325-go-docker-distribution-upgrade2'. Refs #20325
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Tue, 16 May 2023 18:43:48 +0000 (15:43 -0300)]
20325: Upgrades github.com/docker/distribution module.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Tom Clegg [Tue, 16 May 2023 13:45:44 +0000 (09:45 -0400)]
20457: Don't displace locked containers with queued containers.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Lucas Di Pentima [Mon, 15 May 2023 13:24:00 +0000 (10:24 -0300)]
20482: Makes installer.sh compatible with older (Ubuntu 18.04) git versions.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Mon, 15 May 2023 13:21:29 +0000 (10:21 -0300)]
20482: Don't create S3 endpoint if using a preexisting VPC.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Mon, 15 May 2023 13:20:52 +0000 (10:20 -0300)]
20482: Fixes formatting.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Brett Smith [Thu, 11 May 2023 15:53:41 +0000 (11:53 -0400)]
Refine PySDK collection walk recipe
Use PurePosixPath to clarify that we're strictly doing path manipulation.
(It will also behave better on Windows, although I'm not sure if the SDK
itself is Windows-ready yet.)
Keep Path objects in the queue to reduce local state.
No issue #
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>
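A rough sketch of the approach described in the commit above: keep only PurePosixPath objects in the queue and look up each directory from its path. To stay self-contained it walks a nested dict that stands in for a collection, which is an assumption; the real recipe operates on PySDK collection objects, and `walk_tree` is a hypothetical name.

```python
from collections import deque
from pathlib import PurePosixPath


def walk_tree(tree: dict) -> list:
    """Breadth-first walk returning every file path as a string.

    `tree` is a nested dict standing in for a collection: dict values are
    subdirectories, any other value is a file.
    """
    found = []
    # The queue holds only paths; no directory objects are kept around.
    queue = deque([PurePosixPath(".")])
    while queue:
        path = queue.popleft()
        # Resolve the directory for this path from the root each time.
        node = tree
        for part in path.parts:
            node = node[part]
        for name, child in node.items():
            child_path = path / name
            if isinstance(child, dict):
                queue.append(child_path)
            else:
                found.append(str(child_path))
    return found
```

PurePosixPath performs no filesystem access, so the same code produces "/"-separated paths on any platform.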
Lucas Di Pentima [Thu, 11 May 2023 15:48:33 +0000 (12:48 -0300)]
Merge branch '20482-terraform-custom-configs'. Closes #20482
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Thu, 11 May 2023 15:47:28 +0000 (12:47 -0300)]
20482: Improves upgrade note's title.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Thu, 11 May 2023 14:52:36 +0000 (11:52 -0300)]
20482: Improves file/dir creation on user_data script.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Thu, 11 May 2023 14:49:46 +0000 (11:49 -0300)]
20482: Adds upgrade notes on domain_name variable changes.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Thu, 11 May 2023 14:15:11 +0000 (11:15 -0300)]
20482: Pins terraform and AWS provider versions.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Thu, 11 May 2023 13:03:20 +0000 (10:03 -0300)]
20482: Allows the site admin to specify instance volume sizes per node.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Thu, 11 May 2023 12:48:07 +0000 (09:48 -0300)]
20482: Improves readability of instance profile assignment code.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Thu, 11 May 2023 12:40:05 +0000 (09:40 -0300)]
20482: Allows setting instance types per service node.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Wed, 10 May 2023 22:09:39 +0000 (19:09 -0300)]
20482: Fixes PassRole's target to point to the newly created compute node role.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Wed, 10 May 2023 21:42:08 +0000 (18:42 -0300)]
20482: Sets output as sensitive to make Terraform happy.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Wed, 10 May 2023 20:38:48 +0000 (17:38 -0300)]
20482: Adds proper compute node instance profile instead of using keepstore's.
We first used keepstore's instance profile because compute nodes run a local
keepstore now.
We also need to give compute nodes permission to change resources related to
the EBS Autoscaler.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Wed, 10 May 2023 20:05:55 +0000 (17:05 -0300)]
20482: Extracts DNS aliases map as configurable variables.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Wed, 10 May 2023 19:51:55 +0000 (16:51 -0300)]
20482: Extracts the private IP addr map as configurable variables.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Wed, 10 May 2023 19:29:10 +0000 (16:29 -0300)]
20482: Allows the site admin to customize tags applied to every resource.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Wed, 10 May 2023 18:25:47 +0000 (15:25 -0300)]
20482: Allows the site admin to specify a custom AMI for the nodes.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Wed, 10 May 2023 17:45:40 +0000 (14:45 -0300)]
20482: Allows the admin to specify the user for deployment.
Also, removes the need to use AWS key pairs, by directly storing the SSH
pubkey in the user's ~/.ssh/ directory via the user-data script.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Tue, 9 May 2023 19:49:09 +0000 (16:49 -0300)]
Merge branch '20325-jquery-rails-upgrade'. Refs #20325
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Tue, 9 May 2023 18:47:23 +0000 (15:47 -0300)]
20325: Upgrades jquery-rails and dependencies on RailsAPI & Workbench1.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Tue, 9 May 2023 00:10:09 +0000 (21:10 -0300)]
20482: Allows deploying on known VPC & subnets.
Instead of creating everything new, the admin now has the option to deploy
the resources on preexisting networks.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Mon, 8 May 2023 15:11:49 +0000 (12:11 -0300)]
20482: Fixes use of var domain_name; it's now used for the Route53 zone.
Also, updates documentation including the new private_only var.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Sat, 6 May 2023 18:18:54 +0000 (15:18 -0300)]
20482: Allow the site admin to create a non-public Arvados cluster.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Sat, 6 May 2023 18:14:43 +0000 (15:14 -0300)]
20482: Fixes S3 bucket creation for Keep blocks due to changes in AWS defaults.
ACLs are no longer accepted on newly created S3 buckets, which are private by
default, so there's no need for us to explicitly ask for that.
See: https://aws.amazon.com/about-aws/whats-new/2022/12/amazon-s3-automatically-enable-block-public-access-disable-access-control-lists-buckets-april-2023/
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Lucas Di Pentima [Tue, 9 May 2023 15:17:11 +0000 (12:17 -0300)]
Merge branch '20489-iam-policy-fix'. Closes #20489
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Brett Smith [Tue, 9 May 2023 15:16:22 +0000 (11:16 -0400)]
12684: Use mock services in arvfile sparse write tests
Without these mocks, Jenkins seems to spend a lot of time retrying
requests—although weirdly, I don't see that in my own development
environment.
I believe the mocks were always intended to be used, since they're
instantiated and already used in other sparse write tests. To me this
looks like an oversight when the previous tests were adapted to write
new collections.
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>
Lucas Di Pentima [Tue, 9 May 2023 15:03:24 +0000 (12:03 -0300)]
20489: Fixes privilege escalation issue in installer's terraform code.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Brett Smith [Tue, 9 May 2023 14:04:41 +0000 (10:04 -0400)]
12684/20425: Skip TestContainerInputOnDifferentCluster
See comments for rationale.
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>
Tom Clegg [Fri, 5 May 2023 17:57:11 +0000 (13:57 -0400)]
Merge branch '20457-max-supervisors-overquota'
refs #20457
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Lucas Di Pentima [Fri, 5 May 2023 17:43:51 +0000 (14:43 -0300)]
Merge branch '20468-installer-perf-knobs'. Closes #20468
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Peter Amstutz [Fri, 5 May 2023 17:33:57 +0000 (13:33 -0400)]
Merge branch '20470-contents-select' refs #20470
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Peter Amstutz [Fri, 5 May 2023 17:12:25 +0000 (13:12 -0400)]
Merge branch '20457-max-supervisors-overquota' refs #20457
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Tom Clegg [Fri, 5 May 2023 14:03:00 +0000 (10:03 -0400)]
Check /metrics & /_inspect/requests are available during busy times.
refs #20474
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Brett Smith [Fri, 5 May 2023 13:49:04 +0000 (09:49 -0400)]
12684: Check for no log case in controller integration tests
Without this guard, tests fail with a message "API endpoint not found,"
which sounds scary and makes you think you broke all of Arvados until
you see the test code is just looking up a collection with an empty
UUID.
And by "you," I mean me.
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>
Peter Amstutz [Thu, 4 May 2023 23:13:51 +0000 (19:13 -0400)]
Merge branch '20472-priority-update' refs #20472
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Peter Amstutz [Thu, 4 May 2023 23:09:05 +0000 (19:09 -0400)]
20470: select_for_klass checks for bogus prefixed fields
Update comments
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Brett Smith [Thu, 4 May 2023 21:41:23 +0000 (17:41 -0400)]
12684: PySDK client retries specific 4xx errors
The rationale for retrying these codes is the same as for retrying them
in the retry module.
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>
Brett Smith [Thu, 4 May 2023 20:21:08 +0000 (16:21 -0400)]
12684: Support num_retries in PySDK client constructors
This lets users set their preferred retry strategy once, rather than in
every call to execute(), which is error-prone. The default num_retries
is 10 because we expect most users to care more about eventual success
than responsiveness. See the added release notes for further discussion
and rationale.
Changes to the rest of the code are mostly about supporting this
consistently. Tests that relied on the old no-default-num_retries
behavior now specify that explicitly.
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>
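To illustrate the design in the commit above, here is a toy client that takes num_retries once in its constructor and still allows a per-call override. `SketchClient` is hypothetical and is not the PySDK API; it retries only `ConnectionError` and omits any backoff delay to stay short.

```python
class SketchClient:
    """Toy client: retry policy is set once, at construction time."""

    def __init__(self, num_retries: int = 10):
        # Default of 10 mirrors the commit's stated preference for
        # eventual success over responsiveness.
        self.num_retries = num_retries

    def execute(self, request, num_retries=None):
        """Call `request()` and retry on ConnectionError.

        A per-call num_retries overrides the constructor default.
        """
        retries = self.num_retries if num_retries is None else num_retries
        last_error = None
        # One initial attempt plus `retries` further attempts.
        for _ in range(1 + retries):
            try:
                return request()
            except ConnectionError as error:
                last_error = error
        raise last_error
```

Callers that want the old behavior in this sketch would construct `SketchClient(num_retries=0)`, much as the updated tests specify their retry count explicitly.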
Peter Amstutz [Thu, 4 May 2023 22:34:01 +0000 (18:34 -0400)]
20470: Fix discovery document generation to drop unpublished fields
Now uses the list of API published fields (selectable_attributes) to
generate the discovery doc. This causes some obsolete and nonpublic fields
to disappear from the discovery doc (but they were never actually part
of the public API in the first place).
The immediate reason to do this is that Workbench 1 was using the
discovery document to craft a list of fields to select, but the
changes to the way select works in this branch mean that asking for
unpublished fields now throws an error.
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Brett Smith [Wed, 3 May 2023 18:19:06 +0000 (14:19 -0400)]
12684: Remove custom retry logic from PySDK
This logic traces its roots back to
`5722c604c6f5dc1553674d179ec016ec12e2b090`. The goal of that commit was to
work around a bug in httplib, which we no longer use as a client
library. `31eb1bdc31e1d030844a6fdc7f4ba4286ec79d4f` made an analogous
change for httplib2.
`8a0eb69984a93852ec888cd3e02b778b0be758ed` made three major changes:
1. Proactively close sockets if they seem likely to be stale
2. Wrap the retry logic in a loop
3. Generalize catching `httplib.BadStatusLine` to `httplib.HTTPException`
(which covers all kinds of malformed HTTP responses)
However, #1 functionally obsoletes the exception handlers added in the
earlier commits. Preemptively closing the sockets prevents httplib/2
from trying to reuse stale ones. So these exception handlers, along with
their retry loops, no longer serve their original purpose.
Remove this logic in favor of using the retry logic built into
googleapiclient. That logic is easier to configure and more refined.
Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>
Tom Clegg [Thu, 4 May 2023 20:17:08 +0000 (16:17 -0400)]
Merge branch '20475-dump-busy-queue'
closes #20475
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Peter Amstutz [Thu, 4 May 2023 20:11:49 +0000 (16:11 -0400)]
20472: Always do "select for update" before priority update
Code cleanup.
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Peter Amstutz [Thu, 4 May 2023 19:38:59 +0000 (15:38 -0400)]
20470: Restore error selecting on invalid fields
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Lucas Di Pentima [Thu, 4 May 2023 18:40:12 +0000 (15:40 -0300)]
20468: Adds config knobs for RailsAPI & controller performance tuning.
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
Peter Amstutz [Thu, 4 May 2023 17:34:27 +0000 (13:34 -0400)]
20472: Need to make sure :id is selected in update_priority for reload
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Peter Amstutz [Thu, 4 May 2023 15:51:32 +0000 (11:51 -0400)]
20472: Add a couple more cancellation tests
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Peter Amstutz [Thu, 4 May 2023 03:11:14 +0000 (23:11 -0400)]
20472: Add a few comments and add container_tree function
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Peter Amstutz [Thu, 4 May 2023 02:58:15 +0000 (22:58 -0400)]
20472: Inherit priority being propagated down
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Peter Amstutz [Thu, 4 May 2023 01:31:06 +0000 (21:31 -0400)]
20472: Remove special handling of update_priority
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Peter Amstutz [Thu, 4 May 2023 01:17:27 +0000 (21:17 -0400)]
20470: Remove locks on containers table
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Peter Amstutz [Thu, 4 May 2023 01:13:06 +0000 (21:13 -0400)]
20470: Update priorities with a single stored query
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Peter Amstutz [Thu, 4 May 2023 14:34:48 +0000 (10:34 -0400)]
20470: Disallow selecting manifest_text on group contents
Because it currently won't be signed.
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Tom Clegg [Wed, 3 May 2023 20:40:59 +0000 (16:40 -0400)]
20475: Option to dump active requests when queue is >=90% full.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
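The ">=90% full" trigger from the commit above amounts to a simple threshold check. The sketch below is only an illustration: the function name and arguments are hypothetical, and the real option lives in the Go services rather than in Python.

```python
def should_dump_active_requests(queued: int, max_queued: int) -> bool:
    """Return True when the request queue is at least 90% full."""
    if max_queued <= 0:
        # No meaningful capacity configured; never trigger a dump.
        return False
    # Integer comparison equivalent to queued / max_queued >= 0.9.
    return queued * 10 >= max_queued * 9
```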
Peter Amstutz [Wed, 3 May 2023 17:50:31 +0000 (13:50 -0400)]
20470: Fix tests
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Tom Clegg [Wed, 3 May 2023 13:51:57 +0000 (09:51 -0400)]
20457: Add dispatchcloud_probe_age_seconds_max and _median metrics.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 3 May 2023 13:41:58 +0000 (09:41 -0400)]
Merge branch '18790-log-client'
closes #18790
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Peter Amstutz [Wed, 3 May 2023 02:09:46 +0000 (22:09 -0400)]
20470: Handle nil selection, selecting on writable_by
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Peter Amstutz [Tue, 2 May 2023 22:08:41 +0000 (18:08 -0400)]
20470: Implement select parameter for 'contents' API calls
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Tom Clegg [Tue, 2 May 2023 21:16:05 +0000 (17:16 -0400)]
20457: Include delayed supervisor containers in overquota metric.
Previously, supervisor containers that had high enough priority to
run, but weren't scheduled because of SupervisorFraction, were not
counted in the containers_over_quota metric. This caused the
"overquota" metric to show a misleading time series as non-supervisor
containers made their way through the queue and the delayed supervisor
containers flapped between "not allocated because quota" (counted) and
"not allocated because SupervisorFraction" (not counted).
With this change, un-mappable supervisors always count toward the
containers_not_allocated_over_quota metric.
This also applies the "unlock if previously locked but now delayed due
to SupervisorFraction" logic to supervisor processes, which was
previously overlooked. This prevents supervisors from staying in
Locked state after being bumped by higher-priority containers.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 1 May 2023 20:11:07 +0000 (16:11 -0400)]
Merge branch '20457-logs-and-mem-usage'
refs #20457
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 1 May 2023 19:40:12 +0000 (15:40 -0400)]
20457: Don't keep non-"tmp" mounts in memory at all.
Only "tmp" mounts are relevant for dispatch.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Peter Amstutz [Mon, 1 May 2023 19:17:37 +0000 (15:17 -0400)]
Merge branch '20432-getting-containers' refs #20432
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
Peter Amstutz [Mon, 1 May 2023 19:17:05 +0000 (15:17 -0400)]
20432: Tweak "error checking states on API server" message
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>