arvados.git
Branch head: 20594-scaling-nginx-settings
Lucas Di Pentima [Fri, 26 May 2023 18:45:31 +0000 (15:45 -0300)]
20595: Scales nginx settings depending on max concurrent requests config.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Tom Clegg [Fri, 26 May 2023 14:29:16 +0000 (10:29 -0400)]
Merge branch '20511-aborted-boot'

refs #20511

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>

Lucas Di Pentima [Fri, 26 May 2023 13:42:20 +0000 (10:42 -0300)]
Merge branch '20474-installer-api-reqs-limit'. Closes #20474

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Branch head: 20511-aborted-boot
Tom Clegg [Thu, 25 May 2023 21:45:50 +0000 (17:45 -0400)]
20511: Fix allowing too many supervisor processes.

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>

Tom Clegg [Thu, 25 May 2023 21:10:48 +0000 (17:10 -0400)]
20511: Don't shutdown excess instances just because MaxSupervisors.

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>

Branch head: 20474-installer-api-reqs-limit
Lucas Di Pentima [Thu, 25 May 2023 18:58:53 +0000 (15:58 -0300)]
20474: Updates documentation stating that RailsAPI supports /metrics endpoint.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Thu, 25 May 2023 18:52:32 +0000 (15:52 -0300)]
20474: Changes RailsAPI queue size to be 10% more than controller's.

Also, adds comment on config file clarifying why we're setting up this
increased value.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>
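
A minimal Go sketch of the sizing rule in this commit: the RailsAPI queue is 10% larger than the controller's, leaving headroom for extra requests such as metrics. Rounding up is an assumption, and `railsQueueSize` is a hypothetical helper, not taken from the installer.

```go
package main

import "fmt"

// railsQueueSize (hypothetical) sizes the RailsAPI request queue 10% above
// the controller's queue, rounding up so small queues still get headroom.
func railsQueueSize(controllerQueue int) int {
	// Integer ceiling of controllerQueue * 1.1, avoiding float rounding.
	return (controllerQueue*11 + 9) / 10
}

func main() {
	for _, q := range []int{10, 64, 128} {
		fmt.Printf("controller queue %d -> RailsAPI queue %d\n", q, railsQueueSize(q))
	}
}
```

Integer arithmetic keeps the result deterministic; a float multiply followed by a cast would silently round down.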

Tom Clegg [Thu, 25 May 2023 17:57:58 +0000 (13:57 -0400)]
20511: Fix unclosed response body in test.

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>

Tom Clegg [Thu, 25 May 2023 17:43:34 +0000 (13:43 -0400)]
20511: Don't shutdown/unlock based on dynamic maxConcurrency.

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>

Tom Clegg [Thu, 25 May 2023 15:25:02 +0000 (11:25 -0400)]
20511: Limit server-to-server client to 1/4 of API req capacity.

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
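
The 1/4 cap can be sketched as a buffered-channel semaphore sized from the API server's request capacity. `serverToServerLimit`, the capacity value of 64, and the floor of 1 are hypothetical. This only illustrates the arithmetic and the limiting pattern, not the actual Arvados client code.

```go
package main

import (
	"fmt"
	"sync"
)

// serverToServerLimit (hypothetical) caps concurrent server-to-server calls
// at a quarter of the API server's request capacity, with a floor of 1 so
// the client can always make progress (the floor is an assumption).
func serverToServerLimit(apiCapacity int) int {
	if lim := apiCapacity / 4; lim > 0 {
		return lim
	}
	return 1
}

func main() {
	const apiCapacity = 64 // hypothetical capacity value
	sem := make(chan struct{}, serverToServerLimit(apiCapacity))

	var (
		mu             sync.Mutex
		inFlight, peak int
		wg             sync.WaitGroup
	)
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot (blocks when full)
			defer func() { <-sem }() // release it when done
			mu.Lock()
			inFlight++
			if inFlight > peak {
				peak = inFlight
			}
			mu.Unlock()
			// ... the outgoing API request would be made here ...
			mu.Lock()
			inFlight--
			mu.Unlock()
		}()
	}
	wg.Wait()
	fmt.Printf("limit %d, peak concurrency %d\n", cap(sem), peak)
}
```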

Tom Clegg [Thu, 25 May 2023 14:33:53 +0000 (10:33 -0400)]
20511: Fix slow-expansion logic.

Limit was always being raised to 2x known-working, instead of
min(+10%, 2x known-working) as intended.

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
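
The intended rule, min(+10%, 2x known-working), as a small Go sketch. `nextLimit` is a hypothetical name. The minimum step of 1 (so small limits such as 8 can still grow) and the guard that never lowers the limit are assumptions added for the sketch.

```go
package main

import "fmt"

// nextLimit raises a concurrency limit slowly: by 10% of the current limit
// (at least 1), but never above twice the highest concurrency known to
// work. This sketch only ever raises the limit; lowering it after errors
// is out of scope.
func nextLimit(current, knownWorking int) int {
	inc := current / 10
	if inc < 1 {
		inc = 1
	}
	next := current + inc
	if ceiling := 2 * knownWorking; next > ceiling {
		next = ceiling
	}
	if next < current {
		return current
	}
	return next
}

func main() {
	limit := 8
	for i := 0; i < 5; i++ {
		// Pretend every call at the current limit succeeded.
		limit = nextLimit(limit, limit)
		fmt.Println("limit now", limit)
	}
}
```

The bug described above corresponds to returning `2 * knownWorking` unconditionally, which doubles the limit on every step instead of growing it by about 10%.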

Tom Clegg [Thu, 25 May 2023 14:12:40 +0000 (10:12 -0400)]
20511: Start with 8 concurrent outgoing API calls, not unlimited.

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
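
Starting from a finite limit means the limiter must be adjustable at runtime if the limit is raised later. Below is a hypothetical Go sketch of such a limiter, built on `sync.Cond` and starting at 8 concurrent calls. `dynLimiter` and its methods are illustrative names, not an Arvados API.

```go
package main

import (
	"fmt"
	"sync"
)

// initialOutgoingLimit is the starting concurrency for outgoing API calls,
// per the commit subject.
const initialOutgoingLimit = 8

// dynLimiter is a counting semaphore whose limit can be changed while it is
// in use (for example, when a slow-expansion rule raises it).
type dynLimiter struct {
	mu       sync.Mutex
	cond     *sync.Cond
	limit    int
	inFlight int
}

func newDynLimiter(limit int) *dynLimiter {
	l := &dynLimiter{limit: limit}
	l.cond = sync.NewCond(&l.mu)
	return l
}

// Acquire blocks until fewer than limit calls are in flight.
func (l *dynLimiter) Acquire() {
	l.mu.Lock()
	for l.inFlight >= l.limit {
		l.cond.Wait()
	}
	l.inFlight++
	l.mu.Unlock()
}

// Release frees one slot and wakes one waiter.
func (l *dynLimiter) Release() {
	l.mu.Lock()
	l.inFlight--
	l.mu.Unlock()
	l.cond.Signal()
}

// SetLimit changes the limit and wakes all waiters so they re-check it.
func (l *dynLimiter) SetLimit(n int) {
	l.mu.Lock()
	l.limit = n
	l.mu.Unlock()
	l.cond.Broadcast()
}

// Limit returns the current limit.
func (l *dynLimiter) Limit() int {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.limit
}

func main() {
	l := newDynLimiter(initialOutgoingLimit)
	var wg sync.WaitGroup
	for i := 0; i < 20; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.Acquire()
			defer l.Release()
			// ... an outgoing API call would be made here ...
		}()
	}
	l.SetLimit(16) // e.g., after the limit has been raised
	wg.Wait()
	fmt.Println("final limit:", l.Limit())
}
```

A buffered channel cannot be resized, which is why this sketch uses a mutex and condition variable instead.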

Peter Amstutz [Mon, 22 May 2023 21:16:12 +0000 (17:16 -0400)]
Add 2.6.2 to the upgrading notes refs #20393

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Lucas Di Pentima [Mon, 22 May 2023 21:13:53 +0000 (18:13 -0300)]
20474: Adds 5 to controller's request queue size for RailsAPI.

This size difference would allow extra requests (like metrics) to happen
even on heavily loaded clusters.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Peter Amstutz [Mon, 22 May 2023 15:33:46 +0000 (11:33 -0400)]
Merge branch '20529-container-deadlocks' refs #20529

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Brett Smith [Mon, 22 May 2023 14:47:29 +0000 (10:47 -0400)]
Merge branch '20527-group-contents-select-doc'

Closes #20527.

Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>

Branch head: 20529-container-deadlocks
Peter Amstutz [Fri, 19 May 2023 19:01:34 +0000 (15:01 -0400)]
20529: Lock direct parent containers on updates

This is because cost is propagated up to the parent container on
container completion.

Handle case of resource attrs at top level when deciding whether to
lock or not (resource attrs appear as strings, not symbols)

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Lucas Di Pentima [Fri, 19 May 2023 18:11:57 +0000 (15:11 -0300)]
Merge branch '20482-installer-improvements'. Closes #20482

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Branch head: 20482-installer-improvements
Lucas Di Pentima [Fri, 19 May 2023 16:16:56 +0000 (13:16 -0300)]
20482: For some reason, symlinks don't seem to be ignored for copyright checks.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Fri, 19 May 2023 15:06:33 +0000 (12:06 -0300)]
20482: Updates the upgrade notice to include the DOMAIN envvar in local.params.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Thu, 18 May 2023 20:36:10 +0000 (17:36 -0300)]
20482: Updates installer's documentation to reflect latest changes.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Thu, 18 May 2023 20:35:17 +0000 (17:35 -0300)]
20482: Code cleanup for readability.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Thu, 18 May 2023 20:31:37 +0000 (17:31 -0300)]
20482: Re-exports VPC's CIDR.

Previously exported as 'vpc_cidr' and removed when preexisting VPC usage
was added. This config data is used in local.params and was mentioned on the
documentation page.
Now, it's exported as 'cluster_int_cidr' and its value is requested from AWS
so that we get the correct one whether the VPC was just created or a
previously existing one is in use.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Fri, 19 May 2023 14:57:03 +0000 (11:57 -0300)]
20482: Overrides shellinabox config templates to fix the domain name usage.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Thu, 18 May 2023 14:22:10 +0000 (11:22 -0300)]
20482: Allows the cluster operator to use an arbitrary domain.

Instead of making domains like cluster_prefix.domain mandatory, let the site
admin select whichever domain they need for the deployment.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Branch head: 20527-group-contents-select-doc
Brett Smith [Fri, 19 May 2023 14:39:03 +0000 (10:39 -0400)]
20527: Document the select argument of the groups contents API method

Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>

Brett Smith [Thu, 18 May 2023 15:40:55 +0000 (11:40 -0400)]
Merge branch '12684-pysdk-auto-retry'

Closes #12684.

Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>

Branch head: 12684-pysdk-auto-retry
Brett Smith [Thu, 18 May 2023 12:48:13 +0000 (08:48 -0400)]
12684: Test that plain 403 responses are not retried

Requested in review.

Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>

Brett Smith [Thu, 18 May 2023 12:36:56 +0000 (08:36 -0400)]
12684: Stop retrying 422 responses in PySDK

The original motivation for this was to retry when the API server was
having database connectivity problems. The feeling eight years later is
that things have changed enough that, on balance, this isn't worth
retrying anymore.

I don't think this will have any real impact on current Arvados
software. In the main branch as I write this,
`check_http_response_status` only gets called in five places. Three of
those are in the main `arvados` module for job and task utilities, which
presumably nobody is using anymore. The other two talk to Keep, which
only returns 422 for hash mismatches, where a retry will definitely
never succeed.

Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>

Tom Clegg [Wed, 17 May 2023 15:19:35 +0000 (11:19 -0400)]
Merge branch '20457-queue-churn'

refs #20457
refs #20511

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>

Lucas Di Pentima [Wed, 17 May 2023 13:28:16 +0000 (10:28 -0300)]
Merge branch '20482-further-terraform-improvements'. Refs #20482

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Wed, 17 May 2023 13:20:54 +0000 (10:20 -0300)]
Merge branch '20325-go-docker-distribution-upgrade2'. Refs #20325

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Branch head: 20325-go-docker-distribution-upgrade2
Lucas Di Pentima [Tue, 16 May 2023 18:43:48 +0000 (15:43 -0300)]
20325: Upgrades github.com/docker/distribution module.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Branch head: 20457-queue-churn
Tom Clegg [Tue, 16 May 2023 13:45:44 +0000 (09:45 -0400)]
20457: Don't displace locked containers with queued containers.

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>

Branch head: 20482-further-terraform-improvements
Lucas Di Pentima [Mon, 15 May 2023 13:24:00 +0000 (10:24 -0300)]
20482: Makes installer.sh compatible with older (Ubuntu 18.04) git versions.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Mon, 15 May 2023 13:21:29 +0000 (10:21 -0300)]
20482: Don't create S3 endpoint if using a preexisting VPC.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Mon, 15 May 2023 13:20:52 +0000 (10:20 -0300)]
20482: Fixes formatting.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Brett Smith [Thu, 11 May 2023 15:53:41 +0000 (11:53 -0400)]
Refine PySDK collection walk recipe

Use PurePosixPath to clarify that we're strictly doing path manipulation.
(It will also behave better on Windows, although I'm not sure if the SDK
itself is Windows-ready yet.)

Keep Path objects in the queue to reduce local state.

No issue #

Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>

Lucas Di Pentima [Thu, 11 May 2023 15:48:33 +0000 (12:48 -0300)]
Merge branch '20482-terraform-custom-configs'. Closes #20482

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Branch head: 20482-terraform-custom-configs
Lucas Di Pentima [Thu, 11 May 2023 15:47:28 +0000 (12:47 -0300)]
20482: Improves upgrade note's title.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Thu, 11 May 2023 14:52:36 +0000 (11:52 -0300)]
20482: Improves file/dir creation on user_data script.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Thu, 11 May 2023 14:49:46 +0000 (11:49 -0300)]
20482: Adds upgrade notes on domain_name variable changes.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Thu, 11 May 2023 14:15:11 +0000 (11:15 -0300)]
20482: Pins terraform and AWS provider versions.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Thu, 11 May 2023 13:03:20 +0000 (10:03 -0300)]
20482: Allows the site admin to specify instance volume sizes per node.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Thu, 11 May 2023 12:48:07 +0000 (09:48 -0300)]
20482: Improves readability of instance profile assignment code.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Thu, 11 May 2023 12:40:05 +0000 (09:40 -0300)]
20482: Allows setting instance types per service node.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Wed, 10 May 2023 22:09:39 +0000 (19:09 -0300)]
20482: Fixes PassRole's target to point to the newly created compute node role.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Wed, 10 May 2023 21:42:08 +0000 (18:42 -0300)]
20482: Sets output as sensitive to make Terraform happy.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Wed, 10 May 2023 20:38:48 +0000 (17:38 -0300)]
20482: Adds proper compute node instance profile instead of using keepstore's.

We first used keepstore's instance profile because compute nodes run a local
keepstore now.
We also need to give compute nodes permission to change resources related to
the EBS Autoscaler.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Branch head: 20482-terraform-private-only-infra
Lucas Di Pentima [Wed, 10 May 2023 20:05:55 +0000 (17:05 -0300)]
20482: Extracts DNS aliases map as configurable variables.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Wed, 10 May 2023 19:51:55 +0000 (16:51 -0300)]
20482: Extracts the private IP addr map as configurable variables.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Wed, 10 May 2023 19:29:10 +0000 (16:29 -0300)]
20482: Allows the site admin to customize tags applied to every resource.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Wed, 10 May 2023 18:25:47 +0000 (15:25 -0300)]
20482: Allows the site admin to specify a custom AMI for the nodes.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Wed, 10 May 2023 17:45:40 +0000 (14:45 -0300)]
20482: Allows the admin to specify the user for deployment.

Also, removes the need to use AWS key pairs, by directly storing the SSH
pubkey in the user's ~/.ssh/ directory via the user-data script.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Tue, 9 May 2023 19:49:09 +0000 (16:49 -0300)]
Merge branch '20325-jquery-rails-upgrade'. Refs #20325

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Branch head: 20325-jquery-rails-upgrade
Lucas Di Pentima [Tue, 9 May 2023 18:47:23 +0000 (15:47 -0300)]
20325: Upgrades jquery-rails and dependencies on RailsAPI & Workbench1.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Tue, 9 May 2023 00:10:09 +0000 (21:10 -0300)]
20482: Allows deploying on known VPC & subnets.

Instead of creating everything new, the admin now has the option to deploy
the resources on preexisting networks.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Mon, 8 May 2023 15:11:49 +0000 (12:11 -0300)]
20482: Fixes use of var domain_name, it's now used for the Route53 zone.

Also, updates documentation including the new private_only var.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Sat, 6 May 2023 18:18:54 +0000 (15:18 -0300)]
20482: Allow the site admin to create a non-public Arvados cluster.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Sat, 6 May 2023 18:14:43 +0000 (15:14 -0300)]
20482: Fixes S3 bucket creation for Keep blocks due to changes in AWS defaults.

ACLs are no longer accepted on newly created S3 buckets, and buckets are
private by default, so there's no need for us to explicitly ask for that.

See: https://aws.amazon.com/about-aws/whats-new/2022/12/amazon-s3-automatically-enable-block-public-access-disable-access-control-lists-buckets-april-2023/

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Lucas Di Pentima [Tue, 9 May 2023 15:17:11 +0000 (12:17 -0300)]
Merge branch '20489-iam-policy-fix'. Closes #20489

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Brett Smith [Tue, 9 May 2023 15:16:22 +0000 (11:16 -0400)]
12684: Use mock services in arvfile sparse write tests

Without these mocks, Jenkins seems to spend a lot of time retrying
requests—although weirdly, I don't see that in my own development
environment.

I believe the mocks were always intended to be used, since they're
instantiated and already used in other sparse write tests. To me this
looks like an oversight when the previous tests were adapted to write
new collections.

Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>

Branch head: 20489-iam-policy-fix
Lucas Di Pentima [Tue, 9 May 2023 15:03:24 +0000 (12:03 -0300)]
20489: Fixes privilege escalation issue in installer's terraform code.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Brett Smith [Tue, 9 May 2023 14:04:41 +0000 (10:04 -0400)]
12684/20425: Skip TestContainerInputOnDifferentCluster

See comments for rationale.

Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>

Tom Clegg [Fri, 5 May 2023 17:57:11 +0000 (13:57 -0400)]
Merge branch '20457-max-supervisors-overquota'

refs #20457

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>

Lucas Di Pentima [Fri, 5 May 2023 17:43:51 +0000 (14:43 -0300)]
Merge branch '20468-installer-perf-knobs'. Closes #20468

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Peter Amstutz [Fri, 5 May 2023 17:33:57 +0000 (13:33 -0400)]
Merge branch '20470-contents-select' refs #20470

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Peter Amstutz [Fri, 5 May 2023 17:12:25 +0000 (13:12 -0400)]
Merge branch '20457-max-supervisors-overquota' refs #20457

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Tom Clegg [Fri, 5 May 2023 14:03:00 +0000 (10:03 -0400)]
Check /metrics & /_inspect/requests are available during busy times.

refs #20474

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>

Brett Smith [Fri, 5 May 2023 13:49:04 +0000 (09:49 -0400)]
12684: Check for no log case in controller integration tests

Without this guard, tests fail with a message "API endpoint not found,"
which sounds scary and makes you think you broke all of Arvados until
you see the test code is just looking up a collection with an empty
UUID.

And by "you," I mean me.

Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>

Peter Amstutz [Thu, 4 May 2023 23:13:51 +0000 (19:13 -0400)]
Merge branch '20472-priority-update' refs #20472

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Branch head: 20470-contents-select
Peter Amstutz [Thu, 4 May 2023 23:09:05 +0000 (19:09 -0400)]
20470: select_for_klass checks for bogus prefixed fields

Update comments

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Brett Smith [Thu, 4 May 2023 21:41:23 +0000 (17:41 -0400)]
12684: PySDK client retries specific 4xx errors

The rationale for retrying these codes is the same as for retrying them
in the retry module.

Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>

Brett Smith [Thu, 4 May 2023 20:21:08 +0000 (16:21 -0400)]
12684: Support num_retries in PySDK client constructors

This lets users set their preferred retry strategy once, rather than in
every call to execute(), which is error-prone. The default num_retries
is 10 because we expect most users to care more about eventual success
than responsiveness. See the added release notes for further discussion
and rationale.

Changes to the rest of the code are mostly about supporting this
consistently. Tests that relied on the old no-default-num_retries
behavior now specify that explicitly.

Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>

Peter Amstutz [Thu, 4 May 2023 22:34:01 +0000 (18:34 -0400)]
20470: Fix discovery document generation to drop unpublished fields

Now uses the list of API published fields (selectable_attributes) to
generate the discovery doc. This causes some obsolete and nonpublic fields
to disappear from the discovery doc (they were never actually part
of the public API in the first place).

The immediate reason to do this is that Workbench 1 was using the
discovery document to craft a list of fields to select, but the
changes to the way select works in this branch mean that asking for
unpublished fields now throws an error.

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Brett Smith [Wed, 3 May 2023 18:19:06 +0000 (14:19 -0400)]
12684: Remove custom retry logic from PySDK

This logic traces its roots back to
`5722c604c6f5dc1553674d179ec016ec12e2b090`. The goal of that commit was to
work around a bug in httplib, which we no longer use as a client
library. `31eb1bdc31e1d030844a6fdc7f4ba4286ec79d4f` made an analogous
change for httplib2.

`8a0eb69984a93852ec888cd3e02b778b0be758ed` made three major changes:

1. Proactively close sockets if they seem likely to be stale
2. Wrap the retry logic in a loop
3. Generalize catching `httplib.BadStatusLine` to `httplib.HTTPException`
   (which covers all kinds of malformed HTTP responses)

However, #1 functionally obsoletes the exception handlers added in the
earlier commits. Preemptively closing the sockets prevents httplib/2
from trying to reuse stale ones. So these exception handlers, along with
their retry loops, no longer serve their original purpose.

Remove this logic in favor of using the retry logic built into
googleapiclient. That logic is easier to configure and more refined.

Arvados-DCO-1.1-Signed-off-by: Brett Smith <brett.smith@curii.com>

Tom Clegg [Thu, 4 May 2023 20:17:08 +0000 (16:17 -0400)]
Merge branch '20475-dump-busy-queue'

closes #20475

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>

Branch head: 20472-priority-update
Peter Amstutz [Thu, 4 May 2023 20:11:49 +0000 (16:11 -0400)]
20472: Always do "select for update" before priority update

Code cleanup.

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Peter Amstutz [Thu, 4 May 2023 19:38:59 +0000 (15:38 -0400)]
20470: Restore error selecting on invalid fields

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Branch head: 20468-installer-perf-knobs
Lucas Di Pentima [Thu, 4 May 2023 18:40:12 +0000 (15:40 -0300)]
20468: Adds config knobs for RailsAPI & controller performance tuning.

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

Peter Amstutz [Thu, 4 May 2023 17:34:27 +0000 (13:34 -0400)]
20472: Need to make sure :id is selected in update_priority for reload

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Peter Amstutz [Thu, 4 May 2023 15:51:32 +0000 (11:51 -0400)]
20472: Add a couple more cancellation tests

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Peter Amstutz [Thu, 4 May 2023 03:11:14 +0000 (23:11 -0400)]
20472: Add a few comments and add container_tree function

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Peter Amstutz [Thu, 4 May 2023 02:58:15 +0000 (22:58 -0400)]
20472: Inherit priority being propagated down

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Peter Amstutz [Thu, 4 May 2023 01:31:06 +0000 (21:31 -0400)]
20472: Remove special handling of update_priority

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Peter Amstutz [Thu, 4 May 2023 01:17:27 +0000 (21:17 -0400)]
20470: Remove locks on containers table

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Peter Amstutz [Thu, 4 May 2023 01:13:06 +0000 (21:13 -0400)]
20470: Update priorities with a single stored query

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Peter Amstutz [Thu, 4 May 2023 14:34:48 +0000 (10:34 -0400)]
20470: Disallow selecting manifest_text on group contents

Because it currently won't be signed.

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Branch head: 20475-dump-busy-queue
Tom Clegg [Wed, 3 May 2023 20:40:59 +0000 (16:40 -0400)]
20475: Option to dump active requests when queue is >=90% full.

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>

Peter Amstutz [Wed, 3 May 2023 17:50:31 +0000 (13:50 -0400)]
20470: Fix tests

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Tom Clegg [Wed, 3 May 2023 13:51:57 +0000 (09:51 -0400)]
20457: Add dispatchcloud_probe_age_seconds_max and _median metrics.

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>

Tom Clegg [Wed, 3 May 2023 13:41:58 +0000 (09:41 -0400)]
Merge branch '18790-log-client'

closes #18790

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>

Peter Amstutz [Wed, 3 May 2023 02:09:46 +0000 (22:09 -0400)]
20470: Handle nil selection, selecting on writable_by

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Peter Amstutz [Tue, 2 May 2023 22:08:41 +0000 (18:08 -0400)]
20470: Implement select parameter for 'contents' API calls

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Branch head: 20457-max-supervisors-overquota
Tom Clegg [Tue, 2 May 2023 21:16:05 +0000 (17:16 -0400)]
20457: Include delayed supervisor containers in overquota metric.

Previously, supervisor containers that had high enough priority to
run, but weren't scheduled because of SupervisorFraction, were not
counted in the containers_over_quota metric. This caused the
"overquota" metric to show a misleading time series as non-supervisor
containers made their way through the queue and the delayed supervisor
containers flapped between "not allocated because quota" (counted) and
"not allocated because SupervisorFraction" (not counted).

With this change, un-mappable supervisors always count toward the
containers_not_allocated_over_quota metric.

This also applies the "unlock if previously locked but now delayed due
to SupervisorFraction" logic to supervisor processes, which was
previously overlooked. This prevents supervisors from staying in
Locked state after being bumped by higher-priority containers.

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
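
The counting change can be illustrated with a toy scheduler that walks the queue in priority order. `schedule`, `container`, and the fixed `maxSupervisors` (standing in for the limit derived from SupervisorFraction) are hypothetical. The point is that a supervisor held back by the supervisor limit is counted as over quota, just like a container held back by capacity, so the count no longer changes just because the reason changes.

```go
package main

import "fmt"

type container struct {
	uuid       string
	supervisor bool
}

// schedule walks containers in priority order and starts as many as quota
// allows, with at most maxSupervisors supervisor containers. Every
// container left unstarted counts as over quota, including supervisors
// held back only by the supervisor limit.
func schedule(queue []container, quota, maxSupervisors int) (started []string, overQuota int) {
	supervisors := 0
	for _, c := range queue {
		switch {
		case len(started) >= quota:
			overQuota++ // no capacity left
		case c.supervisor && supervisors >= maxSupervisors:
			overQuota++ // delayed by the supervisor limit; still counted
		default:
			started = append(started, c.uuid)
			if c.supervisor {
				supervisors++
			}
		}
	}
	return started, overQuota
}

func main() {
	queue := []container{{"sup1", true}, {"sup2", true}, {"ctr1", false}, {"ctr2", false}}
	started, over := schedule(queue, 2, 1)
	fmt.Println("started:", started, "over quota:", over)
}
```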

Tom Clegg [Mon, 1 May 2023 20:11:07 +0000 (16:11 -0400)]
Merge branch '20457-logs-and-mem-usage'

refs #20457

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>

Branch head: 20457-logs-and-mem-usage
Tom Clegg [Mon, 1 May 2023 19:40:12 +0000 (15:40 -0400)]
20457: Don't keep non-"tmp" mounts in memory at all.

Only "tmp" mounts are relevant for dispatch.

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
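
A minimal sketch of the filtering, assuming a pared-down `Mount` record with `Kind` and `Capacity` fields (hypothetical here, not the SDK type). Summing the capacity of the "tmp" mounts shows one plausible reason only those mounts matter for dispatch, namely sizing an instance's scratch space. That use is an assumption.

```go
package main

import "fmt"

// Mount is a pared-down, hypothetical mount record with only the fields
// this sketch needs.
type Mount struct {
	Kind     string
	Capacity int64 // bytes
}

// tmpMounts keeps only the "tmp" mounts, so nothing else needs to be
// held in memory.
func tmpMounts(all map[string]Mount) map[string]Mount {
	out := make(map[string]Mount)
	for path, m := range all {
		if m.Kind == "tmp" {
			out[path] = m
		}
	}
	return out
}

// scratchBytes sums the capacity of the given mounts (assumed use: sizing
// an instance's scratch space).
func scratchBytes(tmp map[string]Mount) int64 {
	var total int64
	for _, m := range tmp {
		total += m.Capacity
	}
	return total
}

func main() {
	mounts := map[string]Mount{
		"/tmp":           {Kind: "tmp", Capacity: 1 << 30},
		"/keep/in":       {Kind: "collection"},
		"/var/spool/cwl": {Kind: "tmp", Capacity: 4 << 30},
	}
	tmp := tmpMounts(mounts)
	fmt.Println(len(tmp), "tmp mounts,", scratchBytes(tmp), "bytes of scratch")
}
```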

Peter Amstutz [Mon, 1 May 2023 19:17:37 +0000 (15:17 -0400)]
Merge branch '20432-getting-containers' refs #20432

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Peter Amstutz [Mon, 1 May 2023 19:17:05 +0000 (15:17 -0400)]
20432: Tweak "error checking states on API server" message

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

Tom Clegg [Mon, 1 May 2023 16:07:45 +0000 (12:07 -0400)]
Merge branch '20447-less-table-locking'

fixes #20447

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>