Merge branch '17754-federated-acct-merge'. Refs #17754.
author    Lucas Di Pentima <lucas.dipentima@curii.com>
          Tue, 1 Mar 2022 22:29:47 +0000 (19:29 -0300)
committer Lucas Di Pentima <lucas.dipentima@curii.com>
          Tue, 1 Mar 2022 22:29:47 +0000 (19:29 -0300)
Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <lucas.dipentima@curii.com>

40 files changed:
doc/_config.yml
doc/admin/config-migration.html.textile.liquid [deleted file]
doc/admin/config.html.textile.liquid
doc/admin/maintenance-and-upgrading.html.textile.liquid [new file with mode: 0644]
doc/admin/upgrading.html.textile.liquid
doc/install/crunch2-cloud/install-compute-node.html.textile.liquid
doc/install/crunch2-cloud/install-dispatch-cloud.html.textile.liquid
doc/install/salt-multi-host.html.textile.liquid
doc/install/salt-single-host.html.textile.liquid
doc/user/cwl/cwl-run-options.html.textile.liquid
doc/user/cwl/cwl-runner.html.textile.liquid
lib/cloud/ec2/ec2.go
lib/config/config.default.yml
lib/config/load.go
lib/controller/localdb/login_oidc.go
lib/controller/localdb/login_oidc_test.go
lib/controller/localdb/login_pam.go
lib/controller/localdb/login_pam_static.go [new file with mode: 0644]
lib/crunchstat/crunchstat.go
sdk/cwl/arvados_cwl/__init__.py
sdk/cwl/arvados_cwl/arvcontainer.py
sdk/cwl/arvados_cwl/arvdocker.py
sdk/cwl/arvados_cwl/context.py
sdk/cwl/arvados_cwl/runner.py
sdk/cwl/setup.py
sdk/cwl/tests/test_container.py
sdk/cwl/tests/test_submit.py
sdk/go/arvadostest/oidc_provider.go
services/api/app/models/api_client_authorization.rb
services/api/app/models/user.rb
services/api/test/functional/arvados/v1/api_client_authorizations_controller_test.rb
services/api/test/unit/user_test.rb
tools/arvbox/bin/arvbox
tools/arvbox/lib/arvbox/docker/common.sh
tools/arvbox/lib/arvbox/docker/service/ready/run-service
tools/compute-images/arvados-images-aws.json
tools/compute-images/build.sh
tools/compute-images/scripts/base.sh
tools/compute-images/scripts/create-ebs-volume-nvme.patch [new file with mode: 0644]
tools/compute-images/scripts/usr-local-bin-ensure-encrypted-partitions-aws-ebs-autoscale.sh [new file with mode: 0644]

index eb0f173dfbcefae5c4c609026363ec1c93127b38..f2ddd7f58c11a246accc2fec6c01c72d86d9a633 100644 (file)
@@ -218,12 +218,13 @@ navbar:
     - Manual installation:
       - install/install-manual-prerequisites.html.textile.liquid
       - install/packages.html.textile.liquid
-      - admin/upgrading.html.textile.liquid
     - Configuration:
       - install/config.html.textile.liquid
-      - admin/config-migration.html.textile.liquid
-      - admin/config.html.textile.liquid
       - admin/config-urls.html.textile.liquid
+      - admin/config.html.textile.liquid
+    - Maintenance and upgrading:
+      - admin/upgrading.html.textile.liquid
+      - admin/maintenance-and-upgrading.html.textile.liquid
     - Core:
       - install/install-api-server.html.textile.liquid
     - Keep:
diff --git a/doc/admin/config-migration.html.textile.liquid b/doc/admin/config-migration.html.textile.liquid
deleted file mode 100644 (file)
index e36655a..0000000
+++ /dev/null
@@ -1,106 +0,0 @@
----
-layout: default
-navsection: installguide
-title: Migrating Configuration from v1.4 to v2.0
-...
-
-{% comment %}
-Copyright (C) The Arvados Authors. All rights reserved.
-
-SPDX-License-Identifier: CC-BY-SA-3.0
-{% endcomment %}
-
-{% include 'notebox_begin_warning' %}
-_New installations of Arvados 2.0+ can skip this section_
-{% include 'notebox_end' %}
-
-Arvados 2.0 migrates to a centralized configuration file for all components.  The centralized Arvados configuration is @/etc/arvados/config.yml@.  Components that support the new centralized configuration are listed below.  During the migration period, legacy configuration files are still loaded and take precedence over the centralized configuration file.
-
-h2. API server
-
-The legacy API server configuration is stored in @config/application.yml@ and @config/database.yml@.  After migration to @/etc/arvados/config.yml@, both of these files should be moved out of the way and/or deleted.
-
-Change to the API server directory and use the following commands:
-
-<pre>
-$ RAILS_ENV=production bin/rake config:migrate > config.yml
-$ cp config.yml /etc/arvados/config.yml
-</pre>
-
-This will print the contents of @config.yml@ after merging the legacy @application.yml@ and @database.yml@ into the existing systemwide @config.yml@.  It may be redirected to a file and copied to @/etc/arvados/config.yml@ (it is safe to copy over, all configuration items from the existing @/etc/arvados/config.yml@ will be included in the migrated output).
-
-If you wish to update @config.yml@ configuration by hand, or check that everything has been migrated, use @config:diff@ to print configuration items that differ between @application.yml@ and the system @config.yml@.
-
-<pre>
-$ RAILS_ENV=production bin/rake config:diff
-</pre>
-
-This command will also report if no migrations are required.
-
-h2. Workbench
-
-The legacy workbench configuration is stored in @config/application.yml@.  After migration to @/etc/arvados/config.yml@, this file should be moved out of the way and/or deleted.
-
-Change to the workbench server directory and use the following commands:
-
-<pre>
-$ RAILS_ENV=production bin/rake config:migrate > config.yml
-$ cp config.yml /etc/arvados/config.yml
-</pre>
-
-This will print the contents of @config.yml@ after merging the legacy @application.yml@ into the existing systemwide @config.yml@.  It may be redirected to a file and copied to @/etc/arvados/config.yml@ (it is safe to copy over, all configuration items from the existing @/etc/arvados/config.yml@ will be included in the migrated output).
-
-If you wish to update @config.yml@ configuration by hand, or check that everything has been migrated, use @config:diff@ to print configuration items that differ between @application.yml@ and the system @config.yml@.
-
-<pre>
-$ RAILS_ENV=production bin/rake config:diff
-</pre>
-
-This command will also report if no migrations are required.
-
-h2. keepstore, keep-web, crunch-dispatch-slurm, arvados-ws, keepproxy, arv-git-httpd, keep-balance
-
-The legacy config for each component (loaded from @/etc/arvados/component/component.yml@ or a different location specified via the -legacy-component-config command line argument) takes precedence over the centralized config. After you migrate everything from the legacy config to the centralized config, you should delete @/etc/arvados/component/component.yml@ and/or stop using the corresponding -legacy-component-config argument.
-
-To migrate a component configuration, do this on each node that runs an Arvados service:
-
-# Ensure that the latest @config.yml@ is installed on the current node
-# Install @arvados-server@ using @apt-get@ or @yum@.
-# Run @arvados-server config-check@, review and apply the recommended changes to @/etc/arvados/config.yml@
-# After applying changes, re-run @arvados-server config-check@ again to check for additional warnings and recommendations.
-# When you are satisfied, delete the legacy config file, restart the service, and check its startup logs.
-# Copy the updated @config.yml@ file to your next node, and repeat the process there.
-# When you have a @config.yml@ file that includes all volumes on all keepstores, it is important to add a 'Rendezvous' parameter to the InternalURLs entries to make sure the old volume identifiers line up with the new config. If you don't do this, @keep-balance@ will want to shuffle all the existing data around to match the new volume order. The 'Rendezvous' value should be the last 15 characters of the keepstore's UUID in the old configuration. Here's an example:
-
-<notextile>
-<pre><code>Clusters:
-  xxxxx:
-    Services:
-      Keepstore:
-        InternalURLs:
-          "http://keep1.xxxxx.arvadosapi.com:25107": {Rendezvous: "eim6eefaibesh3i"}
-          "http://keep2.xxxxx.arvadosapi.com:25107": {Rendezvous: "yequoodalai7ahg"}
-          "http://keep3.xxxxx.arvadosapi.com:25107": {Rendezvous: "eipheho6re1shou"}
-          "http://keep4.xxxxx.arvadosapi.com:25107": {Rendezvous: "ahk7chahthae3oo"}
-</code></pre>
-</notextile>
-
-In this example, the keepstore with the name `keep1` had the uuid `xxxxx-bi6l4-eim6eefaibesh3i` in the old configuration.
-
-After migrating and removing all legacy config files, make sure the @/etc/arvados/config.yml@ file is identical across all system nodes -- API server, keepstore, etc. -- and restart all services to make sure they are using the latest configuration.
-
-h2. Cloud installations only: node manager
-
-Node manager is deprecated and replaced by @arvados-dispatch-cloud@.  No automated config migration is available.  Follow the instructions to "install the cloud dispatcher":../install/crunch2-cloud/install-dispatch-cloud.html
-
-*Only one dispatch process should be running at a time.* If you are migrating a system that currently runs Node manager and @crunch-dispatch-slurm@, it is safest to remove the @crunch-dispatch-slurm@ service entirely before installing @arvados-dispatch-cloud@.
-
-<notextile>
-<pre><code>~$ <span class="userinput">sudo systemctl --now disable crunch-dispatch-slurm</span>
-~$ <span class="userinput">sudo apt-get remove crunch-dispatch-slurm</span>
-</code></pre>
-</notextile>
-
-h2. arvados-controller, arvados-dispatch-cloud
-
-Already uses centralized config exclusively.  No migration needed.
index 745cd2853265a096ad99b55bd6c53124feb22871..a043d6d1c09be1326f7a65b5e8f131db40b08ae9 100644 (file)
@@ -12,8 +12,6 @@ SPDX-License-Identifier: CC-BY-SA-3.0
 
 The Arvados configuration is stored at @/etc/arvados/config.yml@
 
-See "Migrating Configuration":config-migration.html for information about migrating from legacy component-specific configuration files.
-
 {% codeblock as yaml %}
 {% include 'config_default_yml' %}
 {% endcodeblock %}
diff --git a/doc/admin/maintenance-and-upgrading.html.textile.liquid b/doc/admin/maintenance-and-upgrading.html.textile.liquid
new file mode 100644 (file)
index 0000000..0d517fe
--- /dev/null
@@ -0,0 +1,64 @@
+---
+layout: default
+navsection: installguide
+title: Maintenance and upgrading
+...
+{% comment %}
+Copyright (C) The Arvados Authors. All rights reserved.
+
+SPDX-License-Identifier: CC-BY-SA-3.0
+{% endcomment %}
+
+# "Commercial support":#commercial_support
+# "Maintaining Arvados":#maintaining
+## "Modification of the config.yml file":#configuration
+## "Distributing the configuration file":#distribution
+## "Restart the services affected by the change":#restart
+# "Upgrading Arvados":#upgrading
+
+h2(#commercial_support). Commercial support
+
+Arvados is "100% open source software":{{site.baseurl}}/user/copying/copying.html. Anyone can download, install, maintain and upgrade it. However, if this is not something you want to spend your time and energy doing, "Curii Corporation":https://curii.com provides managed Arvados installations as well as commercial support for Arvados. Please contact "info@curii.com":mailto:info@curii.com for more information.
+
+If you'd prefer to do things yourself, a few starting points for maintaining and upgrading Arvados can be found below.
+
+h2(#maintaining). Maintaining Arvados
+
+After Arvados is installed, periodic configuration changes may be required to adapt the software to your needs. Arvados uses a unified configuration file, which is normally found at @/etc/arvados/config.yml@.
+
+Making a configuration change to Arvados typically involves three steps:
+
+* modification of the @config.yml@ file
+* distribution of the modified file to the machines in the cluster
+* restarting of the services affected by the change
+
+h3(#configchange). Modification of the @config.yml@ file
+
+Consult the "configuration reference":{{site.baseurl}}/admin/config.html or another part of the documentation to identify the change to be made.
+
+Preserve a copy of your existing configuration file as a backup, and make the desired modification.
+
+Run @arvados-server config-check@ to make sure the configuration file has no errors and no warnings.
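+
+For example (output and exit status depend on your configuration; the command reports any warnings and errors it finds):
+
+<notextile><pre><code>~$ <span class="userinput">arvados-server config-check</span>
+</code></pre></notextile>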
+
+h3(#distribution). Distribute the configuration file
+
+We recommend keeping the @config.yml@ file in sync across all Arvados system nodes, to avoid issues caused by services running with different versions of the configuration.
+
+Distribution of the configuration file can be done in many ways, e.g. scp, configuration management software, etc.
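+
+For example, a minimal sketch using @scp@ (the hostname is hypothetical):
+
+<notextile><pre><code>~$ <span class="userinput">scp /etc/arvados/config.yml root@keep0.ClusterID.example.com:/etc/arvados/config.yml</span>
+</code></pre></notextile>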
+
+h3(#restart). Restart the services affected by the change
+
+If you know which Arvados services use the configuration that was modified, restart those services. When in doubt, restart all Arvados system services.
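+
+For example, on a node running the controller and websocket services (a sketch; the list of unit names varies with the node's role):
+
+<notextile><pre><code>~$ <span class="userinput">sudo systemctl restart arvados-controller arvados-ws</span>
+</code></pre></notextile>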
+
+h2(#upgrading). Upgrading Arvados
+
+Upgrading Arvados typically involves the following steps:
+
+# consult the "upgrade notes":{{site.baseurl}}/admin/upgrading.html and the "release notes":https://arvados.org/releases/ for the release you want to upgrade to
+# Wait for the cluster to be idle and stop Arvados services.
+# Make a backup of your database, as a precaution.
+# update the configuration file for the new release, if necessary (see "Maintaining Arvados":#maintaining above)
+# rebuild and deploy the "compute node image":{{site.baseurl}}/install/crunch2-cloud/install-compute-node.html (cloud only)
+# Install new packages using @apt-get upgrade@ or @yum upgrade@.
+# Wait for package installation scripts as they perform any necessary data migrations.
+# Verify that the Arvados services were restarted as part of the package upgrades.
index 05b932a116bea3578dbf3e21563f6745eef80208..943bc3e0ee7f9a420c1dfac438f755373ac52aae 100644 (file)
@@ -1,7 +1,7 @@
 ---
 layout: default
 navsection: installguide
-title: "Upgrading Arvados and Release notes"
+title: "Arvados upgrade notes"
 ...
 
 {% comment %}
@@ -10,20 +10,13 @@ Copyright (C) The Arvados Authors. All rights reserved.
 SPDX-License-Identifier: CC-BY-SA-3.0
 {% endcomment %}
 
-For Arvados administrators, this page will cover what you need to know and do in order to ensure a smooth upgrade of your Arvados installation.  For general release notes covering features added and bugs fixed, see "Arvados releases":https://arvados.org/releases .
+For Arvados administrators, this page will cover what you need to know and do in order to ensure a smooth upgrade of your Arvados installation.  For general release notes covering features added and bugs fixed, see "Arvados releases":https://arvados.org/releases.
 
-h2. General process
-
-# Consult upgrade notes below to see if any manual configuration updates are necessary.
-# Wait for the cluster to be idle and stop Arvados services.
-# Make a backup of your database, as a precaution.
-# Install new packages using @apt-get upgrade@ or @yum upgrade@.
-# Wait for package installation scripts as they perform any necessary data migrations.
-# Restart Arvados services.
+Upgrade instructions can be found at "Maintenance and upgrading":{{site.baseurl}}/admin/maintenance-and-upgrading.html#upgrading.
 
 h2. Upgrade notes
 
-Some versions introduce changes that require special attention when upgrading: e.g., there is a new service to install, or there is a change to the default configuration that you might need to override in order to preserve the old behavior.
+Some versions introduce changes that require special attention when upgrading: e.g., there is a new service to install, or there is a change to the default configuration that you might need to override in order to preserve the old behavior. These notes are listed below, organized by release version. Scroll down to the version number you are upgrading to.
 
 {% comment %}
 Note to developers: Add new items at the top. Include the date, issue number, commit, and considerations/instructions for those about to upgrade.
@@ -37,11 +30,11 @@ TODO: extract this information based on git commit messages and generate changel
 
 h2(#main). development main (as of 2022-02-10)
 
-"previous: Upgrading from 2.3.0":#v2_3_0
+"previous: Upgrading to 2.3.0":#v2_3_0
 
 h3. Anonymous token changes
 
-The anonymous token configured in @Users.AnonymousUserToken@ must now be 32 characters or longer. This was already the suggestion in the documentation, now it is enforced. The @script/get_anonymous_user_token.rb@ script that was needed to register the anonymous user token in the database has been removed. Registration of the anonymous token is no longer necessary.
+The anonymous token configured in @Users.AnonymousUserToken@ must now be 32 characters or longer. This was already the suggestion in the documentation; now it is enforced. The @script/get_anonymous_user_token.rb@ script that was needed to register the anonymous user token in the database has been removed. Registration of the anonymous token is no longer necessary. If the anonymous token in @config.yml@ is specified as a full V2 token, that will now generate a warning; it should be updated to list just the secret (i.e. the part after the last forward slash).
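+
+For example (hypothetical token values), an entry like this:
+
+<notextile><pre><code>    Users:
+      AnonymousUserToken: "v2/zzzzz-gj3su-0123456789abcde/34lnoqmesmxe3xg0dgdzbcaj3kuuy8fmmi6s7y7g"
+</code></pre></notextile>
+
+should be updated to list only the secret:
+
+<notextile><pre><code>    Users:
+      AnonymousUserToken: "34lnoqmesmxe3xg0dgdzbcaj3kuuy8fmmi6s7y7g"
+</code></pre></notextile>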
 
 h3. Preemptible instance types are used automatically, if any are configured
 
@@ -238,7 +231,7 @@ Arvados 2.0 is a major upgrade, with many changes.  Please read these upgrade no
 
 h3. Migrating to centralized config.yml
 
-See "Migrating Configuration":config-migration.html for notes on migrating legacy per-component configuration files to the new centralized @/etc/arvados/config.yml@.
+See "Migrating Configuration":https://doc.arvados.org/v2.1/admin/config-migration.html for notes on migrating legacy per-component configuration files to the new centralized @/etc/arvados/config.yml@.
 
 To ensure a smooth transition, the per-component config files continue to be read, and take precedence over the centralized configuration.  Your cluster should continue to function after upgrade but before doing the full configuration migration.  However, several services (keepstore, keep-web, keepproxy) require a minimal `/etc/arvados/config.yml` to start:
 
@@ -258,11 +251,11 @@ You can no longer specify types of keep services to balance via the @KeepService
 
 You can no longer specify individual keep services to balance via the @config.KeepServiceList@ command line option or @KeepServiceList@ legacy config option. Instead, keep-balance will operate on all keepstore servers with @service_type:disk@ as reported by the @arv keep_service list@ command. If you are still using the legacy config, @KeepServiceList@ should be removed or keep-balance will produce an error.
 
-Please see the "config migration guide":{{site.baseurl}}/admin/config-migration.html and "keep-balance install guide":{{site.baseurl}}/install/install-keep-balance.html for more details.
+Please see the "config migration guide":https://doc.arvados.org/v2.1/admin/config-migration.html and "keep-balance install guide":{{site.baseurl}}/install/install-keep-balance.html for more details.
 
 h3. Arv-git-httpd configuration migration
 
-(feature "#14712":https://dev.arvados.org/issues/14712 ) The arv-git-httpd package can now be configured using the centralized configuration file at @/etc/arvados/config.yml@. Configuration via individual command line arguments is no longer available. Please see "arv-git-httpd's config migration guide":{{site.baseurl}}/admin/config-migration.html#arv-git-httpd for more details.
+(feature "#14712":https://dev.arvados.org/issues/14712 ) The arv-git-httpd package can now be configured using the centralized configuration file at @/etc/arvados/config.yml@. Configuration via individual command line arguments is no longer available. Please see "arv-git-httpd's config migration guide":https://doc.arvados.org/v2.1/admin/config-migration.html#arv-git-httpd for more details.
 
 h3. Keepstore and keep-web configuration migration
 
@@ -270,11 +263,11 @@ keepstore and keep-web no longer support configuration via (previously deprecate
 
 keep-web now supports the legacy @keep-web.yml@ config format (used by Arvados 1.4) and the new cluster config file format. Please check "keep-web's install guide":{{site.baseurl}}/install/install-keep-web.html for more details.
 
-keepstore now supports the legacy @keepstore.yml@ config format (used by Arvados 1.4) and the new cluster config file format. Please check the "keepstore config migration notes":{{site.baseurl}}/admin/config-migration.html#keepstore and "keepstore install guide":{{site.baseurl}}/install/install-keepstore.html for more details.
+keepstore now supports the legacy @keepstore.yml@ config format (used by Arvados 1.4) and the new cluster config file format. Please check the "keepstore config migration notes":https://doc.arvados.org/v2.1/admin/config-migration.html#keepstore and "keepstore install guide":{{site.baseurl}}/install/install-keepstore.html for more details.
 
 h3. Keepproxy configuration migration
 
-(feature "#14715":https://dev.arvados.org/issues/14715 ) Keepproxy can now be configured using the centralized config at @/etc/arvados/config.yml@. Configuration via individual command line arguments is no longer available and the @DisableGet@, @DisablePut@, and @PIDFile@ configuration options are no longer supported. If you are still using the legacy config and @DisableGet@ or @DisablePut@ are set to true or @PIDFile@ has a value, keepproxy will produce an error and fail to start. Please see "keepproxy's config migration guide":{{site.baseurl}}/admin/config-migration.html#keepproxy for more details.
+(feature "#14715":https://dev.arvados.org/issues/14715 ) Keepproxy can now be configured using the centralized config at @/etc/arvados/config.yml@. Configuration via individual command line arguments is no longer available and the @DisableGet@, @DisablePut@, and @PIDFile@ configuration options are no longer supported. If you are still using the legacy config and @DisableGet@ or @DisablePut@ are set to true or @PIDFile@ has a value, keepproxy will produce an error and fail to start. Please see "keepproxy's config migration guide":https://doc.arvados.org/v2.1/admin/config-migration.html#keepproxy for more details.
 
 h3. Delete "keep_services" records
 
@@ -471,7 +464,7 @@ As part of story "#9945":https://dev.arvados.org/issues/9945, it was discovered
 
 h3. New configuration
 
-Arvados is migrating to a centralized configuration file for all components.  During the migration, legacy configuration files will continue to be loaded.  See "Migrating Configuration":config-migration.html for details.
+Arvados is migrating to a centralized configuration file for all components.  During the migration, legacy configuration files will continue to be loaded.  See "Migrating Configuration":https://doc.arvados.org/v2.1/admin/config-migration.html for details.
 
 h2(#v1_3_3). v1.3.3 (2019-05-14)
 
index 89771514e9fdf380c141ae3e474b78ecdb3ef832..e75be0881e8b03df617f4644f44693067fceb626 100644 (file)
@@ -17,8 +17,11 @@ SPDX-License-Identifier: CC-BY-SA-3.0
 # "Create an SSH keypair":#sshkeypair
 # "Compute image requirements":#requirements
 # "The build script":#building
+# "DNS resolution":#dns-resolution
+# "NVIDIA GPU support":#nvidia
 # "Singularity mksquashfs configuration":#singularity_mksquashfs_configuration
 # "Build an AWS image":#aws
+## "Autoscaling compute node scratch space":#aws-ebs-autoscaler
 # "Build an Azure image":#azure
 
 h2(#introduction). Introduction
@@ -56,12 +59,6 @@ foktmqOY8MyctzFgXBpGTxPliGjqo8OkrOyQP2g+FL7v+Km31Xs61P8=
 </code></pre>
 </notextile>
 
-{% assign show_docker_warning = true %}
-
-{% include 'singularity_mksquashfs_configuration' %}
-
-The desired amount of memory to make available for @mksquashfs@ can be configured in an argument to "the build script":#building. It defaults to @256M@.
-
 h2(#requirements). Compute image requirements
 
 Arvados comes with a build script to automate the creation of a suitable compute node image (see "The build script":#building below). It is provided as a convenience. It is also possible to create a compute node image via other means. These are the requirements:
@@ -101,6 +98,8 @@ Options:
       VPC id for AWS, otherwise packer will pick the default one
   --aws-subnet-id
       Subnet id for AWS otherwise packer will pick the default one for the VPC
+  --aws-ebs-autoscale (default: false)
+      Install the AWS EBS autoscaler daemon.
   --gcp-project-id (default: false, required if building for GCP)
       GCP project id
   --gcp-account-file (default: false, required if building for GCP)
@@ -131,10 +130,29 @@ Options:
       Output debug information
 </code></pre></notextile>
 
-h2(#building). NVIDIA GPU support
+h2(#dns-resolution). DNS resolution
+
+Compute nodes must be able to resolve the hostnames of the API server and any keepstore servers to your internal IP addresses. You can do this by running an internal DNS resolver. The IP address of the resolver should be passed as the value for the @--resolver@ argument to "the build script":#building.
+
+Alternatively, the services could be hardcoded into an @/etc/hosts@ file. For example:
+
+<notextile><pre><code>10.20.30.40     <span class="userinput">ClusterID.example.com</span>
+10.20.30.41     <span class="userinput">keep1.ClusterID.example.com</span>
+10.20.30.42     <span class="userinput">keep2.ClusterID.example.com</span>
+</code></pre></notextile>
+
+Adding these lines to the @/etc/hosts@ file in the compute node image could be done with a small change to the Packer template and the @scripts/base.sh@ script, which will be left as an exercise for the reader.
+
+h2(#nvidia). NVIDIA GPU support
 
 If you plan on using instance types with NVIDIA GPUs, add @--nvidia-gpu-support@ to the build command line.  Arvados uses the same compute image for both GPU and non-GPU instance types.  The GPU tooling is ignored when using the image with a non-GPU instance type.
 
+{% assign show_docker_warning = true %}
+
+{% include 'singularity_mksquashfs_configuration' %}
+
+The desired amount of memory to make available for @mksquashfs@ can be configured in an argument to "the build script":#building. It defaults to @256M@.
+
 h2(#aws). Build an AWS image
 
 <notextile><pre><code>~$ <span class="userinput">./build.sh --json-file arvados-images-aws.json \
@@ -155,16 +173,84 @@ For @ClusterID@, fill in your cluster ID. The @VPC@ and @Subnet@ should be confi
 
 @ArvadosDispatchCloudPublicKeyPath@ should be replaced with the path to the ssh *public* key file generated in "Create an SSH keypair":#sshkeypair, above.
 
-Compute nodes must be able to resolve the hostnames of the API server and any keepstore servers to your internal IP addresses. You can do this by running an internal DNS resolver. The IP address of the resolver should replace the string @ResolverIP@ in the command above.
+h3(#aws-ebs-autoscaler). Autoscaling compute node scratch space
+
+If you want to add the "AWS EBS autoscaler":https://github.com/awslabs/amazon-ebs-autoscale daemon to your images, add the @--aws-ebs-autoscale@ flag to "the build script":#building. Doing so will make the compute image scratch space scale automatically as needed.
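+
+For example (a sketch; the other options described in "The build script":#building are elided):
+
+<notextile><pre><code>~$ <span class="userinput">./build.sh --json-file arvados-images-aws.json --aws-ebs-autoscale ...</span>
+</code></pre></notextile>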
+
+The AWS EBS autoscaler daemon will be installed with this configuration:
+
+<notextile><pre><code>{
+    "mountpoint": "/tmp",
+    "filesystem": "lvm.ext4",
+    "lvm": {
+      "volume_group": "autoscale_vg",
+      "logical_volume": "autoscale_lv"
+    },
+    "volume": {
+        "type": "gp3",
+        "iops": 3000,
+        "encrypted": 1
+    },
+    "detection_interval": 2,
+    "limits": {
+        "max_ebs_volume_size": 1500,
+        "max_logical_volume_size": 8000,
+        "max_ebs_volume_count": 16
+    },
+    "logging": {
+        "log_file": "/var/log/ebs-autoscale.log",
+        "log_interval": 300
+    }
+}
+</code></pre></notextile>
 
-Alternatively, the services could be hardcoded into an @/etc/hosts@ file. For example:
+Changing the configuration is left as an exercise for the reader.
+
+Using this feature also requires a few Arvados configuration changes in @config.yml@:
+
+* The @Containers/InstanceTypes@ list should be modified so that all @AddedScratch@ lines are removed, and the @IncludedScratch@ value should be set to a (fictional) high number. This way, the scratch space requirements will be met by all the defined instance types. For example:
+
+<notextile><pre><code>    InstanceTypes:
+      c5large:
+        ProviderType: c5.large
+        VCPUs: 2
+        RAM: 4GiB
+        IncludedScratch: 16TB
+        Price: 0.085
+      m5large:
+        ProviderType: m5.large
+        VCPUs: 2
+        RAM: 8GiB
+        IncludedScratch: 16TB
+        Price: 0.096
+...
+</code></pre></notextile>
 
-<notextile><pre><code>10.20.30.40     <span class="userinput">ClusterID.example.com</span>
-10.20.30.41     <span class="userinput">keep1.ClusterID.example.com</span>
-10.20.30.42     <span class="userinput">keep2.ClusterID.example.com</span>
+* You will also need to create an IAM role in AWS with these permissions:
+
+<notextile><pre><code>{
+    "Version": "2012-10-17",
+    "Statement": [
+        {
+            "Effect": "Allow",
+            "Action": [
+                "ec2:AttachVolume",
+                "ec2:DescribeVolumeStatus",
+                "ec2:DescribeVolumes",
+                "ec2:DescribeTags",
+                "ec2:ModifyInstanceAttribute",
+                "ec2:DescribeVolumeAttribute",
+                "ec2:CreateVolume",
+                "ec2:DeleteVolume",
+                "ec2:CreateTags"
+            ],
+            "Resource": "*"
+        }
+    ]
+}
 </code></pre></notextile>
 
-Adding these lines to the @/etc/hosts@ file in the compute node image could be done with a small change to the Packer template and the @scripts/base.sh@ script, which will be left as an exercise for the reader.
+Then, in @config.yml@ set @Containers/CloudVMs/DriverParameters/IAMInstanceProfile@ to the name of the IAM role. This will make @arvados-dispatch-cloud@ pass an IAMInstanceProfile to the compute nodes as they start up, giving them sufficient permissions to attach and grow EBS volumes.
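+
+For example (the role name is hypothetical):
+
+<notextile><pre><code>    Containers:
+      CloudVMs:
+        DriverParameters:
+          IAMInstanceProfile: ebs-autoscale-instance-role
+</code></pre></notextile>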
 
 h2(#azure). Build an Azure image
 
@@ -195,14 +281,3 @@ These secrets can be generated from the Azure portal, or with the cli using a co
 </code></pre></notextile>
 
 @ArvadosDispatchCloudPublicKeyPath@ should be replaced with the path to the ssh *public* key file generated in "Create an SSH keypair":#sshkeypair, above.
-
-Compute nodes must be able to resolve the hostnames of the API server and any keepstore servers to your internal IP addresses. You can do this by running an internal DNS resolver. The IP address of the resolver should replace the string @ResolverIP@ in the command above.
-
-Alternatively, the services could be hardcoded into an @/etc/hosts@ file. For example:
-
-<notextile><pre><code>10.20.30.40     <span class="userinput">ClusterID.example.com</span>
-10.20.30.41     <span class="userinput">keep1.ClusterID.example.com</span>
-10.20.30.42     <span class="userinput">keep2.ClusterID.example.com</span>
-</code></pre></notextile>
-
-Adding these lines to the @/etc/hosts@ file in the compute node image could be done with a small change to the Packer template and the @scripts/base.sh@ script, which will be left as an exercise for the reader.
index 06a918dd37bcfab0df9a050dccbc1e4e0de68170..ee71d7a3f61b6b96577b34ff9d060509f0ee0b93 100644 (file)
@@ -120,6 +120,32 @@ The <span class="userinput">ImageID</span> value is the compute node image that
 </code></pre>
 </notextile>
 
+Example policy for the IAM role used by the cloud dispatcher:
+
+<notextile>
+<pre>
+{
+    "Version": "2012-10-17",
+    "Id": "arvados-dispatch-cloud policy",
+    "Statement": [
+        {
+            "Effect": "Allow",
+            "Action": [
+                "iam:PassRole",
+                "ec2:DescribeKeyPairs",
+                "ec2:ImportKeyPair",
+                "ec2:RunInstances",
+                "ec2:DescribeInstances",
+                "ec2:CreateTags",
+                "ec2:TerminateInstances"
+            ],
+            "Resource": "*"
+        }
+    ]
+}
+</pre>
+</notextile>
+
 h4. Minimal configuration example for Azure
 
 Using managed disks:
index c3d6a92b5d9f019e28c26b622acd52c7ac2a9b89..10f2e32ef134569591940ffa8f44bdbdae593d70 100644 (file)
@@ -20,8 +20,7 @@ SPDX-License-Identifier: CC-BY-SA-3.0
 # "Run the provision.sh script":#run_provision_script
 # "Initial user and login":#initial_user
 # "Test the installed cluster running a simple workflow":#test_install
-
-
+# "After the installation":#post_install
 
 h2(#introduction). Introduction
 
@@ -290,3 +289,9 @@ INFO Final output collection d6c69a88147dde9d52a418d50ef788df+123
 INFO Final process status is success
 </code></pre>
 </notextile>
+
+h2(#post_install). After the installation
+
+Once the installation is complete, we recommend keeping a copy of your local configuration files, ideally under version control.
+
+Re-running the Salt-based installer is not recommended for maintaining and upgrading Arvados. Please see "Maintenance and upgrading":{{site.baseurl}}/admin/maintenance-and-upgrading.html for more information.
index ce70a30d467bc7f38bb75e1f4803e04df699cb86..0f06412f9103c12fb3c46d8fa9f2c8a826e7eb89 100644 (file)
@@ -20,6 +20,7 @@ SPDX-License-Identifier: CC-BY-SA-3.0
 ## "DNS configuration (single host / multiple hostnames)":#single_host_multiple_hostnames_dns_configuration
 # "Initial user and login":#initial_user
 # "Test the installed cluster running a simple workflow":#test_install
+# "After the installation":#post_install
 
 h2(#single_host). Single host install using the provision.sh script
 
@@ -266,3 +267,9 @@ INFO Final output collection d6c69a88147dde9d52a418d50ef788df+123
 INFO Final process status is success
 </code></pre>
 </notextile>
+
+h2(#post_install). After the installation
+
+Once the installation is complete, we recommend keeping a copy of your local configuration files, ideally under version control.
+
+Re-running the Salt-based installer is not recommended for maintaining and upgrading Arvados. Please see "Maintenance and upgrading":{{site.baseurl}}/admin/maintenance-and-upgrading.html for more information.
index a1c102593a4f1d790f1d1921627e6f5ec8bc07a3..d331dad871570a7f03a6604a4bfcbe7598f1bacc 100644 (file)
@@ -51,6 +51,7 @@ table(table table-bordered table-condensed).
 |==--submit-runner-ram== SUBMIT_RUNNER_RAM|RAM (in MiB) required for the workflow runner (default 1024)|
 |==--submit-runner-image== SUBMIT_RUNNER_IMAGE|Docker image for workflow runner|
 |==--always-submit-runner==|When invoked with --submit --wait, always submit a runner to manage the workflow, even when only running a single CommandLineTool|
+|==--match-submitter-images==|Where Arvados has more than one Docker image of the same name, use the image from the Docker instance on the submitting node.|
 |==--submit-request-uuid== UUID|Update and commit to supplied container request instead of creating a new one.|
 |==--submit-runner-cluster== CLUSTER_ID|Submit workflow runner to a remote cluster|
 |==--name NAME==|Name to use for workflow execution instance.|
index b108de551a077d4c24626386eb3a86b745457a09..07663849ad6ff633b18c0f39463623df7b0689cc 100644 (file)
@@ -113,6 +113,14 @@ h3. Work reuse
 
 Workflows submitted with @arvados-cwl-runner@ will take advantage of Arvados job reuse.  If you submit a workflow which is identical to one that has run before, it will short cut the execution and return the result of the previous run.  This also applies to individual workflow steps.  For example, a two step workflow where the first step has run before will reuse results for first step and only execute the new second step.  You can disable this behavior with @--disable-reuse@.
 
+h3(#docker). Docker images
+
+Docker images referenced by the workflow must be uploaded to Arvados.  This requires @docker@ to be installed and usable by the user running @arvados-cwl-runner@.  If the image is not present in the local Docker instance, @arvados-cwl-runner@ will first attempt to pull the image using @docker pull@, then upload it.
+
+If a Docker image with the same name is already present in Arvados, the existing image is used and the submitting node does not need Docker.
+
+The @--match-submitter-images@ option compares the id of the image in the local Docker instance with the id of the image already in Arvados with the same name and tag.  If they differ, it chooses the image matching the local image id, uploading it if necessary.  This is helpful for development: if you locally rebuild the image with the @latest@ tag, @--match-submitter-images@ ensures that the newer version is used.
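+
+For example (workflow and input file names are hypothetical):
+
+<notextile><pre><code>~$ <span class="userinput">arvados-cwl-runner --match-submitter-images my-workflow.cwl my-inputs.yml</span>
+</code></pre></notextile>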
+
 h3. Command line options
 
 See "arvados-cwl-runner options":{{site.baseurl}}/user/cwl/cwl-run-options.html
index 269a7d8def59a1e38603633691d657aef29d8e81..52b73f781c6bc63c2e4e2d3242ddc33c642157dc 100644 (file)
@@ -40,13 +40,14 @@ const (
 )
 
 type ec2InstanceSetConfig struct {
-       AccessKeyID      string
-       SecretAccessKey  string
-       Region           string
-       SecurityGroupIDs arvados.StringSet
-       SubnetID         string
-       AdminUsername    string
-       EBSVolumeType    string
+       AccessKeyID        string
+       SecretAccessKey    string
+       Region             string
+       SecurityGroupIDs   arvados.StringSet
+       SubnetID           string
+       AdminUsername      string
+       EBSVolumeType      string
+       IAMInstanceProfile string
 }
 
 type ec2Interface interface {
@@ -230,6 +231,12 @@ func (instanceSet *ec2InstanceSet) Create(
                        }}
        }
 
+       if instanceSet.ec2config.IAMInstanceProfile != "" {
+               rii.IamInstanceProfile = &ec2.IamInstanceProfileSpecification{
+                       Name: aws.String(instanceSet.ec2config.IAMInstanceProfile),
+               }
+       }
+
        rsv, err := instanceSet.client.RunInstances(&rii)
        err = wrapError(err, &instanceSet.throttleDelayCreate)
        if err != nil {
index a7ce9828573fb4b5832f49837de19413af4aa6d4..9800be70473fc8bf20f56ae5013bac099c80d9a5 100644 (file)
@@ -1269,6 +1269,9 @@ Clusters:
           Region: ""
           EBSVolumeType: gp2
           AdminUsername: debian
+          # (ec2) name of the IAMInstanceProfile for instances started by
+          # the cloud dispatcher. Leave blank when not needed.
+          IAMInstanceProfile: ""
 
           # (azure) Credentials.
           SubscriptionID: ""
index aa7520ca29e7d6fe2d4040e50ae11f3416b98c77..8d498af170f2180881fac496c82900b1bd764d7f 100644 (file)
@@ -299,10 +299,10 @@ func (ldr *Loader) Load() (*arvados.Config, error) {
                for _, err = range []error{
                        ldr.checkClusterID(fmt.Sprintf("Clusters.%s", id), id, false),
                        ldr.checkClusterID(fmt.Sprintf("Clusters.%s.Login.LoginCluster", id), cc.Login.LoginCluster, true),
-                       ldr.checkToken(fmt.Sprintf("Clusters.%s.ManagementToken", id), cc.ManagementToken, true),
-                       ldr.checkToken(fmt.Sprintf("Clusters.%s.SystemRootToken", id), cc.SystemRootToken, true),
-                       ldr.checkToken(fmt.Sprintf("Clusters.%s.Users.AnonymousUserToken", id), cc.Users.AnonymousUserToken, false),
-                       ldr.checkToken(fmt.Sprintf("Clusters.%s.Collections.BlobSigningKey", id), cc.Collections.BlobSigningKey, true),
+                       ldr.checkToken(fmt.Sprintf("Clusters.%s.ManagementToken", id), cc.ManagementToken, true, false),
+                       ldr.checkToken(fmt.Sprintf("Clusters.%s.SystemRootToken", id), cc.SystemRootToken, true, false),
+                       ldr.checkToken(fmt.Sprintf("Clusters.%s.Users.AnonymousUserToken", id), cc.Users.AnonymousUserToken, false, true),
+                       ldr.checkToken(fmt.Sprintf("Clusters.%s.Collections.BlobSigningKey", id), cc.Collections.BlobSigningKey, true, false),
                        checkKeyConflict(fmt.Sprintf("Clusters.%s.PostgreSQL.Connection", id), cc.PostgreSQL.Connection),
                        ldr.checkEnum("Containers.LocalKeepLogsToContainerLog", cc.Containers.LocalKeepLogsToContainerLog, "none", "all", "errors"),
                        ldr.checkEmptyKeepstores(cc),
@@ -316,6 +316,11 @@ func (ldr *Loader) Load() (*arvados.Config, error) {
                                return nil, err
                        }
                }
+               if strings.Count(cc.Users.AnonymousUserToken, "/") == 2 {
+                       // Full V2 token ("v2/<uuid>/<secret>"): keep
+                       // just the secret part after the last slash.
+                       tmp := strings.Split(cc.Users.AnonymousUserToken, "/")
+                       cc.Users.AnonymousUserToken = tmp[2]
+               }
        }
        return &cfg, nil
 }
@@ -334,7 +339,7 @@ func (ldr *Loader) checkClusterID(label, clusterID string, emptyStringOk bool) e
 var acceptableTokenRe = regexp.MustCompile(`^[a-zA-Z0-9]+$`)
 var acceptableTokenLength = 32
 
-func (ldr *Loader) checkToken(label, token string, mandatory bool) error {
+func (ldr *Loader) checkToken(label, token string, mandatory bool, acceptV2 bool) error {
        if len(token) == 0 {
                if !mandatory {
                        // when a token is not mandatory, the acceptable length and content is only checked if its length is non-zero
@@ -345,7 +350,21 @@ func (ldr *Loader) checkToken(label, token string, mandatory bool) error {
                        }
                }
        } else if !acceptableTokenRe.MatchString(token) {
-               return fmt.Errorf("%s: unacceptable characters in token (only a-z, A-Z, 0-9 are acceptable)", label)
+               if !acceptV2 {
+                       return fmt.Errorf("%s: unacceptable characters in token (only a-z, A-Z, 0-9 are acceptable)", label)
+               }
+               // Test for a proper V2 token
+               tmp := strings.SplitN(token, "/", 3)
+               if len(tmp) != 3 {
+                       return fmt.Errorf("%s: unacceptable characters in token (only a-z, A-Z, 0-9 are acceptable)", label)
+               }
+               if !strings.HasPrefix(token, "v2/") {
+                       return fmt.Errorf("%s: unacceptable characters in token (only a-z, A-Z, 0-9 are acceptable)", label)
+               }
+               ldr.Logger.Warnf("%s: token is a full V2 token, should just be a secret (remove everything up to and including the last forward slash)", label)
+               if !acceptableTokenRe.MatchString(tmp[2]) {
+                       return fmt.Errorf("%s: unacceptable characters in V2 token secret (only a-z, A-Z, 0-9 are acceptable)", label)
+               }
        } else if len(token) < acceptableTokenLength {
                if ldr.Logger != nil {
                        ldr.Logger.Warnf("%s: token is too short (should be at least %d characters)", label, acceptableTokenLength)
index e076f7e1289c2b7ad48c6b7fb7e8782fd85ff1ce..6d6f80f39c70ac5427578ddd6ed5eb3e78b6a136 100644 (file)
@@ -31,6 +31,7 @@ import (
        "github.com/coreos/go-oidc"
        lru "github.com/hashicorp/golang-lru"
        "github.com/jmoiron/sqlx"
+       "github.com/lib/pq"
        "github.com/sirupsen/logrus"
        "golang.org/x/oauth2"
        "google.golang.org/api/option"
@@ -43,6 +44,7 @@ var (
        tokenCacheNegativeTTL = time.Minute * 5
        tokenCacheTTL         = time.Minute * 10
        tokenCacheRaceWindow  = time.Minute
+       pqCodeUniqueViolation = pq.ErrorCode("23505")
 )
 
 type oidcLoginController struct {
@@ -479,7 +481,6 @@ func (ta *oidcTokenAuthorizer) registerToken(ctx context.Context, tok string) er
        // it's expiring.
        exp := time.Now().UTC().Add(tokenCacheTTL + tokenCacheRaceWindow)
 
-       var aca arvados.APIClientAuthorization
        if updating {
                _, err = tx.ExecContext(ctx, `update api_client_authorizations set expires_at=$1 where api_token=$2`, exp, hmac)
                if err != nil {
@@ -487,23 +488,44 @@ func (ta *oidcTokenAuthorizer) registerToken(ctx context.Context, tok string) er
                }
                ctxlog.FromContext(ctx).WithField("HMAC", hmac).Debug("(*oidcTokenAuthorizer)registerToken: updated api_client_authorizations row")
        } else {
-               aca, err = ta.ctrl.Parent.CreateAPIClientAuthorization(ctx, ta.ctrl.Cluster.SystemRootToken, *authinfo)
+               aca, err := ta.ctrl.Parent.CreateAPIClientAuthorization(ctx, ta.ctrl.Cluster.SystemRootToken, *authinfo)
                if err != nil {
                        return err
                }
-               _, err = tx.ExecContext(ctx, `update api_client_authorizations set api_token=$1, expires_at=$2 where uuid=$3`, hmac, exp, aca.UUID)
+               _, err = tx.ExecContext(ctx, `savepoint upd`)
                if err != nil {
+                       return err
+               }
+               _, err = tx.ExecContext(ctx, `update api_client_authorizations set api_token=$1, expires_at=$2 where uuid=$3`, hmac, exp, aca.UUID)
+               if e, ok := err.(*pq.Error); ok && e.Code == pqCodeUniqueViolation {
+                       // unique_violation, given that the above
+                       // query did not find a row with matching
+                       // api_token, means another thread/process
+                       // also received this same token and won the
+                       // race to insert it -- in which case this
+                       // thread doesn't need to update the database.
+                       // Discard the redundant row.
+                       _, err = tx.ExecContext(ctx, `rollback to savepoint upd`)
+                       if err != nil {
+                               return err
+                       }
+                       _, err = tx.ExecContext(ctx, `delete from api_client_authorizations where uuid=$1`, aca.UUID)
+                       if err != nil {
+                               return err
+                       }
+                       ctxlog.FromContext(ctx).WithField("HMAC", hmac).Debug("(*oidcTokenAuthorizer)registerToken: api_client_authorizations row inserted by another thread")
+               } else if err != nil {
+                       ctxlog.FromContext(ctx).Errorf("%#v", err)
                        return fmt.Errorf("error adding OIDC access token to database: %w", err)
+               } else {
+                       ctxlog.FromContext(ctx).WithFields(logrus.Fields{"UUID": aca.UUID, "HMAC": hmac}).Debug("(*oidcTokenAuthorizer)registerToken: inserted api_client_authorizations row")
                }
-               aca.APIToken = hmac
-               ctxlog.FromContext(ctx).WithFields(logrus.Fields{"UUID": aca.UUID, "HMAC": hmac}).Debug("(*oidcTokenAuthorizer)registerToken: inserted api_client_authorizations row")
        }
        err = tx.Commit()
        if err != nil {
                return err
        }
-       aca.ExpiresAt = exp
-       ta.cache.Add(tok, aca)
+       ta.cache.Add(tok, arvados.APIClientAuthorization{ExpiresAt: exp})
        return nil
 }
 
index 4778e45f5fe48f3a8edeb7e9afa295524d7af5e4..b9f0f56e058482eb74eb527b038136e56979feff 100644 (file)
@@ -17,6 +17,7 @@ import (
        "net/url"
        "sort"
        "strings"
+       "sync"
        "testing"
        "time"
 
@@ -236,18 +237,49 @@ func (s *OIDCLoginSuite) TestOIDCAuthorizer(c *check.C) {
 
        ctx := auth.NewContext(context.Background(), &auth.Credentials{Tokens: []string{accessToken}})
        var exp1 time.Time
-       oidcAuthorizer.WrapCalls(func(ctx context.Context, opts interface{}) (interface{}, error) {
-               creds, ok := auth.FromContext(ctx)
-               c.Assert(ok, check.Equals, true)
-               c.Assert(creds.Tokens, check.HasLen, 1)
-               c.Check(creds.Tokens[0], check.Equals, accessToken)
 
-               err := db.QueryRowContext(ctx, `select expires_at at time zone 'UTC' from api_client_authorizations where api_token=$1`, apiToken).Scan(&exp1)
-               c.Check(err, check.IsNil)
-               c.Check(exp1.Sub(time.Now()) > -time.Second, check.Equals, true)
-               c.Check(exp1.Sub(time.Now()) < time.Second, check.Equals, true)
-               return nil, nil
-       })(ctx, nil)
+       concurrent := 4
+       s.fakeProvider.HoldUserInfo = make(chan *http.Request)
+       s.fakeProvider.ReleaseUserInfo = make(chan struct{})
+       go func() {
+               for i := 0; ; i++ {
+                       if i == concurrent {
+                               close(s.fakeProvider.ReleaseUserInfo)
+                       }
+                       <-s.fakeProvider.HoldUserInfo
+               }
+       }()
+       var wg sync.WaitGroup
+       for i := 0; i < concurrent; i++ {
+               i := i
+               wg.Add(1)
+               go func() {
+                       defer wg.Done()
+                       _, err := oidcAuthorizer.WrapCalls(func(ctx context.Context, opts interface{}) (interface{}, error) {
+                               c.Logf("concurrent req %d/%d", i, concurrent)
+                               var exp time.Time
+
+                               creds, ok := auth.FromContext(ctx)
+                               c.Assert(ok, check.Equals, true)
+                               c.Assert(creds.Tokens, check.HasLen, 1)
+                               c.Check(creds.Tokens[0], check.Equals, accessToken)
+
+                               err := db.QueryRowContext(ctx, `select expires_at at time zone 'UTC' from api_client_authorizations where api_token=$1`, apiToken).Scan(&exp)
+                               c.Check(err, check.IsNil)
+                               c.Check(exp.Sub(time.Now()) > -time.Second, check.Equals, true)
+                               c.Check(exp.Sub(time.Now()) < time.Second, check.Equals, true)
+                               if i == 0 {
+                                       exp1 = exp
+                               }
+                               return nil, nil
+                       })(ctx, nil)
+                       c.Check(err, check.IsNil)
+               }()
+       }
+       wg.Wait()
+       if c.Failed() {
+               c.Fatal("giving up")
+       }
 
        // If the token is used again after the in-memory cache
        // expires, oidcAuthorizer must re-check the token and update
@@ -257,8 +289,8 @@ func (s *OIDCLoginSuite) TestOIDCAuthorizer(c *check.C) {
                var exp time.Time
                err := db.QueryRowContext(ctx, `select expires_at at time zone 'UTC' from api_client_authorizations where api_token=$1`, apiToken).Scan(&exp)
                c.Check(err, check.IsNil)
-               c.Check(exp.Sub(exp1) > 0, check.Equals, true)
-               c.Check(exp.Sub(exp1) < time.Second, check.Equals, true)
+               c.Check(exp.Sub(exp1) > 0, check.Equals, true, check.Commentf("expect %v > 0", exp.Sub(exp1)))
+               c.Check(exp.Sub(exp1) < time.Second, check.Equals, true, check.Commentf("expect %v < 1s", exp.Sub(exp1)))
                return nil, nil
        })(ctx, nil)
 
index 237f900a83458b61e695ffe7e6808820419c2cea..14e0a582c13eda01680ccf35b8f239f10bf0892d 100644 (file)
@@ -2,6 +2,8 @@
 //
 // SPDX-License-Identifier: AGPL-3.0
 
+//go:build !static
+
 package localdb
 
 import (
diff --git a/lib/controller/localdb/login_pam_static.go b/lib/controller/localdb/login_pam_static.go
new file mode 100644 (file)
index 0000000..420a256
--- /dev/null
@@ -0,0 +1,31 @@
+// Copyright (C) The Arvados Authors. All rights reserved.
+//
+// SPDX-License-Identifier: AGPL-3.0
+
+//go:build static
+
+package localdb
+
+import (
+       "context"
+       "errors"
+
+       "git.arvados.org/arvados.git/sdk/go/arvados"
+)
+
+type pamLoginController struct {
+       Cluster *arvados.Cluster
+       Parent  *Conn
+}
+
+func (ctrl *pamLoginController) Logout(ctx context.Context, opts arvados.LogoutOptions) (arvados.LogoutResponse, error) {
+       return logout(ctx, ctrl.Cluster, opts)
+}
+
+func (ctrl *pamLoginController) Login(ctx context.Context, opts arvados.LoginOptions) (arvados.LoginResponse, error) {
+       return arvados.LoginResponse{}, errors.New("interactive login is not available")
+}
+
+func (ctrl *pamLoginController) UserAuthenticate(ctx context.Context, opts arvados.UserAuthenticateOptions) (arvados.APIClientAuthorization, error) {
+       return arvados.APIClientAuthorization{}, errors.New("support not available due to static compilation")
+}
index 028083fa0d1a23442f527b24f8ce95aacff660f4..10cd7cfce43a03472e2e942b68512efcdd7d0c61 100644 (file)
@@ -21,16 +21,6 @@ import (
        "time"
 )
 
-// This magically allows us to look up userHz via _SC_CLK_TCK:
-
-/*
-#include <unistd.h>
-#include <sys/types.h>
-#include <pwd.h>
-#include <stdlib.h>
-*/
-import "C"
-
 // A Reporter gathers statistics for a cgroup and writes them to a
 // log.Logger.
 type Reporter struct {
@@ -395,7 +385,7 @@ func (r *Reporter) doCPUStats() {
 
        var userTicks, sysTicks int64
        fmt.Sscanf(string(b), "user %d\nsystem %d", &userTicks, &sysTicks)
-       userHz := float64(C.sysconf(C._SC_CLK_TCK))
+       // cgroup CPU stats are reported in ticks of 1/USER_HZ seconds;
+       // USER_HZ (sysconf(_SC_CLK_TCK)) is 100 on essentially all Linux
+       // systems, so hardcode it rather than depending on cgo.
+       userHz := float64(100)
        nextSample := cpuSample{
                hasData:    true,
                sampleTime: time.Now(),
index 734945c0f7ccb707258ab60be52107195afff0e0..826467cc09397342c8d0fa32bfe3b4ed8dd10124 100644 (file)
@@ -152,6 +152,10 @@ def arg_parser():  # type: () -> argparse.ArgumentParser
                         help="When invoked with --submit --wait, always submit a runner to manage the workflow, even when only running a single CommandLineTool",
                         default=False)
 
+    parser.add_argument("--match-submitter-images", action="store_true",
+                        default=False, dest="match_local_docker",
+                        help="Where Arvados has more than one Docker image of the same name, use image from the Docker instance on the submitting node.")
+
     exgroup = parser.add_mutually_exclusive_group()
     exgroup.add_argument("--submit-request-uuid",
                          default=None,
index 3c7e9cfaa6d5e91eed03b03222522db9a96c83f9..753c2c25024a385b068abdbef4c78829e9b102ef 100644 (file)
@@ -246,7 +246,8 @@ class ArvadosContainer(JobBase):
                                                                     runtimeContext.pull_image,
                                                                     runtimeContext.project_uuid,
                                                                     runtimeContext.force_docker_pull,
-                                                                    runtimeContext.tmp_outdir_prefix)
+                                                                    runtimeContext.tmp_outdir_prefix,
+                                                                    runtimeContext.match_local_docker)
 
         network_req, _ = self.get_requirement("NetworkAccess")
         if network_req:
index 26408317cbe6d0cdb5382f38c1455b0bd7b5db2a..04e2a4cffcfb5e674aa471555aa5e0c7fac2033e 100644 (file)
@@ -6,6 +6,8 @@ import logging
 import sys
 import threading
 import copy
+import re
+import subprocess
 
 from schema_salad.sourceline import SourceLine
 
@@ -18,8 +20,44 @@ logger = logging.getLogger('arvados.cwl-runner')
 cached_lookups = {}
 cached_lookups_lock = threading.Lock()
 
+def determine_image_id(dockerImageId):
+    """Return the full image id of the matching local Docker image, or None.
+
+    Scans `docker images --no-trunc --all` output for a repository:tag or
+    image id match against dockerImageId.
+    """
+    for line in (
+        subprocess.check_output(  # nosec
+            ["docker", "images", "--no-trunc", "--all"]
+        )
+        .decode("utf-8")
+        .splitlines()
+    ):
+        try:
+            match = re.match(r"^([^ ]+)\s+([^ ]+)\s+([^ ]+)", line)
+            split = dockerImageId.split(":")
+            if len(split) == 1:
+                split.append("latest")
+            elif len(split) == 2:
+                # if split[1] doesn't match valid tag names, it is part of the repository
+                if not re.match(r"[\w][\w.-]{0,127}", split[1]):
+                    split[0] = split[0] + ":" + split[1]
+                    split[1] = "latest"
+            elif len(split) == 3:
+                if re.match(r"[\w][\w.-]{0,127}", split[2]):
+                    split[0] = split[0] + ":" + split[1]
+                    split[1] = split[2]
+                    del split[2]
+
+            # check for repository:tag match or image id match
+            if match and (
+                (split[0] == match.group(1) and split[1] == match.group(2))
+                or dockerImageId == match.group(3)
+            ):
+                return match.group(3)
+        except ValueError:
+            pass
+
+    return None
+
+
 def arv_docker_get_image(api_client, dockerRequirement, pull_image, project_uuid,
-                         force_pull, tmp_outdir_prefix):
+                         force_pull, tmp_outdir_prefix, match_local_docker):
     """Check if a Docker image is available in Keep, if not, upload it using arv-keepdocker."""
 
     if "http://arvados.org/cwl#dockerCollectionPDH" in dockerRequirement:
@@ -46,6 +84,20 @@ def arv_docker_get_image(api_client, dockerRequirement, pull_image, project_uuid
                                                                 image_name=image_name,
                                                                 image_tag=image_tag)
 
+        if images and match_local_docker:
+            local_image_id = determine_image_id(dockerRequirement["dockerImageId"])
+            if local_image_id:
+                # find it in the list
+                found = False
+                for i in images:
+                    if i[1]["dockerhash"] == local_image_id:
+                        found = True
+                        images = [i]
+                        break
+                if not found:
+                    # force re-upload.
+                    images = []
+
         if not images:
             # Fetch Docker image if necessary.
             try:
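
determine_image_id() above has to cope with Docker's ambiguous reference syntax, where a colon can introduce either a registry port or a tag, before it can compare references against `docker images` output. A standalone re-statement of the same splitting rules, with illustrative names and an explicit fullmatch for the tag check:

```python
# Standalone sketch (illustrative) of the repository:tag splitting rules
# used by determine_image_id() above.
import re

TAG_RE = re.compile(r"[\w][\w.-]{0,127}")

def split_image_ref(ref):
    """Return (repository, tag) for a Docker image reference."""
    parts = ref.split(":")
    if len(parts) == 1:                      # "debian"
        return parts[0], "latest"
    if len(parts) == 2:
        # A colon whose right side is not a valid tag (e.g. a registry
        # port such as "5000/debian") belongs to the repository.
        if TAG_RE.fullmatch(parts[1]):
            return parts[0], parts[1]        # "debian:buster"
        return ref, "latest"                 # "myreg:5000/debian"
    return ":".join(parts[:2]), parts[2]     # "myreg:5000/debian:buster"

assert split_image_ref("debian") == ("debian", "latest")
assert split_image_ref("debian:buster") == ("debian", "buster")
assert split_image_ref("myreg:5000/debian") == ("myreg:5000/debian", "latest")
assert split_image_ref("myreg:5000/debian:buster") == ("myreg:5000/debian", "buster")
```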
index 1e04dd5774ebb8bbc45ebdd417c35138f2d13a4d..4239dd3b51fea8e51d3cdaf33116595cc2748f34 100644 (file)
@@ -36,6 +36,7 @@ class ArvRuntimeContext(RuntimeContext):
         self.cluster_target_id = 0
         self.always_submit_runner = False
         self.collection_cache_size = 256
+        self.match_local_docker = False
 
         super(ArvRuntimeContext, self).__init__(kwargs)
 
index 7d6d287a207455d04e2a1d5469e053ea57f7cf6e..ad17950a2fb6bab90376ecb0fe688af7a80f2b94 100644 (file)
@@ -461,12 +461,14 @@ def upload_docker(arvrunner, tool):
                     "Option 'dockerOutputDirectory' of DockerRequirement not supported.")
             arvados_cwl.arvdocker.arv_docker_get_image(arvrunner.api, docker_req, True, arvrunner.project_uuid,
                                                        arvrunner.runtimeContext.force_docker_pull,
-                                                       arvrunner.runtimeContext.tmp_outdir_prefix)
+                                                       arvrunner.runtimeContext.tmp_outdir_prefix,
+                                                       arvrunner.runtimeContext.match_local_docker)
         else:
             arvados_cwl.arvdocker.arv_docker_get_image(arvrunner.api, {"dockerPull": "arvados/jobs:"+__version__},
                                                        True, arvrunner.project_uuid,
                                                        arvrunner.runtimeContext.force_docker_pull,
-                                                       arvrunner.runtimeContext.tmp_outdir_prefix)
+                                                       arvrunner.runtimeContext.tmp_outdir_prefix,
+                                                       arvrunner.runtimeContext.match_local_docker)
     elif isinstance(tool, cwltool.workflow.Workflow):
         for s in tool.steps:
             upload_docker(arvrunner, s.embedded_tool)
@@ -503,7 +505,8 @@ def packed_workflow(arvrunner, tool, merged_map):
                 v["http://arvados.org/cwl#dockerCollectionPDH"] = arvados_cwl.arvdocker.arv_docker_get_image(arvrunner.api, v, True,
                                                                                                              arvrunner.project_uuid,
                                                                                                              arvrunner.runtimeContext.force_docker_pull,
-                                                                                                             arvrunner.runtimeContext.tmp_outdir_prefix)
+                                                                                                             arvrunner.runtimeContext.tmp_outdir_prefix,
+                                                                                                             arvrunner.runtimeContext.match_local_docker)
             for l in v:
                 visit(v[l], cur_id)
         if isinstance(v, list):
@@ -610,7 +613,8 @@ def arvados_jobs_image(arvrunner, img):
     try:
         return arvados_cwl.arvdocker.arv_docker_get_image(arvrunner.api, {"dockerPull": img}, True, arvrunner.project_uuid,
                                                           arvrunner.runtimeContext.force_docker_pull,
-                                                          arvrunner.runtimeContext.tmp_outdir_prefix)
+                                                          arvrunner.runtimeContext.tmp_outdir_prefix,
+                                                          arvrunner.runtimeContext.match_local_docker)
     except Exception as e:
         raise Exception("Docker image %s is not available\n%s" % (img, e) )
 
index e739235953cf799e07e9f13440ef7e05963241c9..e126d170b7618cf8cae94c64cd70696923963786 100644 (file)
@@ -36,7 +36,7 @@ setup(name='arvados-cwl-runner',
       # file to determine what version of cwltool and schema-salad to
       # build.
       install_requires=[
-          'cwltool==3.1.20220210171524',
+          'cwltool==3.1.20220217222804',
           'schema-salad==8.2.20211116214159',
           'arvados-python-client{}'.format(pysdk_dep),
           'setuptools',
index 21ae4a7af0372aada304472b7da5e629e0a7956b..72774daba37070050b9bb4f983523d365e3882af 100644 (file)
@@ -1117,6 +1117,97 @@ class TestContainer(unittest.TestCase):
                 }))
 
 
+    # The test passes no builder.resources
+    # Hence the default resources will apply: {'cores': 1, 'ram': 1024, 'outdirSize': 1024, 'tmpdirSize': 1024}
+    @mock.patch("arvados_cwl.arvdocker.determine_image_id")
+    @mock.patch("arvados.commands.keepdocker.list_images_in_arv")
+    def test_match_local_docker(self, keepdocker, determine_image_id):
+        arvados_cwl.add_arv_hints()
+        arv_docker_clear_cache()
+
+        runner = mock.MagicMock()
+        runner.ignore_docker_for_reuse = False
+        runner.intermediate_output_ttl = 0
+        runner.secret_store = cwltool.secrets.SecretStore()
+        runner.api._rootDesc = {"revision": "20210628"}
+
+        keepdocker.return_value = [("zzzzz-4zz18-zzzzzzzzzzzzzz4", {"dockerhash": "456"}),
+                                   ("zzzzz-4zz18-zzzzzzzzzzzzzz3", {"dockerhash": "123"})]
+        determine_image_id.side_effect = lambda x: "123"
+        def execute(uuid):
+            ex = mock.MagicMock()
+            lookup = {"zzzzz-4zz18-zzzzzzzzzzzzzz4": {"portable_data_hash": "99999999999999999999999999999994+99"},
+                      "zzzzz-4zz18-zzzzzzzzzzzzzz3": {"portable_data_hash": "99999999999999999999999999999993+99"}}
+            ex.execute.return_value = lookup[uuid]
+            return ex
+        runner.api.collections().get.side_effect = execute
+
+        tool = cmap({
+            "inputs": [],
+            "outputs": [],
+            "baseCommand": "echo",
+            "arguments": [],
+            "id": "",
+            "cwlVersion": "v1.2",
+            "class": "CommandLineTool"
+        })
+
+        loadingContext, runtimeContext = self.helper(runner, True)
+
+        arvtool = cwltool.load_tool.load_tool(tool, loadingContext)
+        arvtool.formatgraph = None
+
+        container_request = {
+            'environment': {
+                'HOME': '/var/spool/cwl',
+                'TMPDIR': '/tmp'
+            },
+            'name': 'test_run_True',
+            'runtime_constraints': {
+                'vcpus': 1,
+                'ram': 268435456
+            },
+            'use_existing': True,
+            'priority': 500,
+            'mounts': {
+                '/tmp': {'kind': 'tmp',
+                         "capacity": 1073741824
+                         },
+                '/var/spool/cwl': {'kind': 'tmp',
+                                   "capacity": 1073741824 }
+            },
+            'state': 'Committed',
+            'output_name': 'Output for step test_run_True',
+            'owner_uuid': 'zzzzz-8i9sb-zzzzzzzzzzzzzzz',
+            'output_path': '/var/spool/cwl',
+            'output_ttl': 0,
+            'container_image': '99999999999999999999999999999994+99',
+            'command': ['echo'],
+            'cwd': '/var/spool/cwl',
+            'scheduling_parameters': {},
+            'properties': {},
+            'secret_mounts': {},
+            'output_storage_classes': ["default"]
+        }
+
+        runtimeContext.match_local_docker = False
+        for j in arvtool.job({}, mock.MagicMock(), runtimeContext):
+            j.run(runtimeContext)
+            runner.api.container_requests().create.assert_called_with(
+                body=JsonDiffMatcher(container_request))
+
+        arv_docker_clear_cache()
+        runtimeContext.match_local_docker = True
+        container_request['container_image'] = '99999999999999999999999999999993+99'
+        container_request['name'] = 'test_run_True_2'
+        container_request['output_name'] = 'Output for step test_run_True_2'
+        for j in arvtool.job({}, mock.MagicMock(), runtimeContext):
+            j.run(runtimeContext)
+            runner.api.container_requests().create.assert_called_with(
+                body=JsonDiffMatcher(container_request))
+
+
+
 class TestWorkflow(unittest.TestCase):
     def setUp(self):
         cwltool.process._names = set()
index 77f70851e8dd86d1682f209e1eb27cb08d17f12d..10443359b99bf02a51163d8eb38924579d14154a 100644 (file)
@@ -67,10 +67,10 @@ def stubs(func):
 
         stubs.keep_client = keep_client2
         stubs.docker_images = {
-            "arvados/jobs:"+arvados_cwl.__version__: [("zzzzz-4zz18-zzzzzzzzzzzzzd3", "")],
-            "debian:buster-slim": [("zzzzz-4zz18-zzzzzzzzzzzzzd4", "")],
-            "arvados/jobs:123": [("zzzzz-4zz18-zzzzzzzzzzzzzd5", "")],
-            "arvados/jobs:latest": [("zzzzz-4zz18-zzzzzzzzzzzzzd6", "")],
+            "arvados/jobs:"+arvados_cwl.__version__: [("zzzzz-4zz18-zzzzzzzzzzzzzd3", {})],
+            "debian:buster-slim": [("zzzzz-4zz18-zzzzzzzzzzzzzd4", {})],
+            "arvados/jobs:123": [("zzzzz-4zz18-zzzzzzzzzzzzzd5", {})],
+            "arvados/jobs:latest": [("zzzzz-4zz18-zzzzzzzzzzzzzd6", {})],
         }
         def kd(a, b, image_name=None, image_tag=None):
             return stubs.docker_images.get("%s:%s" % (image_name, image_tag), [])
@@ -1062,6 +1062,7 @@ class TestSubmit(unittest.TestCase):
         arvrunner.project_uuid = ""
         api.return_value = mock.MagicMock()
         arvrunner.api = api.return_value
+        arvrunner.runtimeContext.match_local_docker = False
         arvrunner.api.links().list().execute.side_effect = ({"items": [{"created_at": "",
                                                                         "head_uuid": "zzzzz-4zz18-zzzzzzzzzzzzzzb",
                                                                         "link_class": "docker_image_repo+tag",
index fa5e55c42e10af410d86d0e16fc23a637dbaeff2..087adc4b2441648111c0857b93c84eeb48d58cca 100644 (file)
@@ -35,6 +35,12 @@ type OIDCProvider struct {
 
        PeopleAPIResponse map[string]interface{}
 
+       // If HoldUserInfo is not nil, each incoming /userinfo request is
+       // sent to it; if ReleaseUserInfo is not nil, a value is received
+       // from it before responding. Tests use these hooks to set up races.
+       HoldUserInfo    chan *http.Request
+       ReleaseUserInfo chan struct{}
+
        key       *rsa.PrivateKey
        Issuer    *httptest.Server
        PeopleAPI *httptest.Server
@@ -126,6 +132,12 @@ func (p *OIDCProvider) serveOIDC(w http.ResponseWriter, req *http.Request) {
        case "/auth":
                w.WriteHeader(http.StatusInternalServerError)
        case "/userinfo":
+               if p.HoldUserInfo != nil {
+                       p.HoldUserInfo <- req
+               }
+               if p.ReleaseUserInfo != nil {
+                       <-p.ReleaseUserInfo
+               }
                authhdr := req.Header.Get("Authorization")
                if _, err := jwt.ParseSigned(strings.TrimPrefix(authhdr, "Bearer ")); err != nil {
                        p.c.Logf("OIDCProvider: bad auth %q", authhdr)
index f8454029d6b8cf2561505080ac5b74b8d57b8c70..993a49e5b75e7ecfb782a306df16c74b37fbed4a 100644 (file)
@@ -35,7 +35,12 @@ class ApiClientAuthorization < ArvadosModel
   UNLOGGED_CHANGES = ['last_used_at', 'last_used_by_ip_address', 'updated_at']
 
   def assign_random_api_token
-    self.api_token ||= rand(2**256).to_s(36)
+    begin
+      self.api_token ||= rand(2**256).to_s(36)
+    rescue ActiveModel::MissingAttributeError
+      # Ignore the case where self.api_token doesn't exist, which happens
+      # when the select=[...] query parameter is used.
+    end
   end
 
   def owner_uuid
@@ -130,7 +135,8 @@ class ApiClientAuthorization < ArvadosModel
       return ApiClientAuthorization.new(user: User.find_by_uuid(anonymous_user_uuid),
                                         uuid: Rails.configuration.ClusterID+"-gj3su-anonymouspublic",
                                         api_token: token,
-                                        api_client: anonymous_user_token_api_client)
+                                        api_client: anonymous_user_token_api_client,
+                                        scopes: ['GET /'])
     else
       return nil
     end
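
For reference, Ruby's rand(2**256).to_s(36) draws a 256-bit random number and renders it in base 36. A hedged Python equivalent, with an illustrative helper name:

```python
# Illustrative Python equivalent of Ruby's rand(2**256).to_s(36).
import secrets

def random_api_token():
    n = secrets.randbits(256)              # 256 random bits
    digits = "0123456789abcdefghijklmnopqrstuvwxyz"
    out = ""
    while n:
        n, r = divmod(n, 36)
        out = digits[r] + out              # base-36 rendering
    return out or "0"
```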
index febb8ea51611eb5a21549b8133770fe059a2ca5c..811cd89758d9ef8642be5382c9bac54b4f649134 100644 (file)
@@ -21,6 +21,7 @@ class User < ArvadosModel
             uniqueness: true,
             allow_nil: true)
   validate :must_unsetup_to_deactivate
+  validate :identity_url_nil_if_empty
   before_update :prevent_privilege_escalation
   before_update :prevent_inactive_admin
   before_update :verify_repositories_empty, :if => Proc.new {
@@ -810,4 +811,10 @@ SELECT target_uuid, perm_level
       repo.save!
     end
   end
+
+  def identity_url_nil_if_empty
+    if identity_url == ""
+      self.identity_url = nil
+    end
+  end
 end
index bf407afcd7e06ed8cd55438e6f85883ae198aa49..9c70f6f417b6a654710b2d79e70bfa88b91ecd0b 100644 (file)
@@ -203,4 +203,20 @@ class Arvados::V1::ApiClientAuthorizationsControllerTest < ActionController::Tes
     get :current
     assert_response 401
   end
+
+  # Regression test for #18801
+  test "select param is respected in 'show' response" do
+    authorize_with :active
+    get :show, params: {
+          id: api_client_authorizations(:active).uuid,
+          select: ["uuid"],
+        }
+    assert_response :success
+    assert_raises ActiveModel::MissingAttributeError do
+      assigns(:object).api_token
+    end
+    assert_nil json_response["expires_at"]
+    assert_nil json_response["api_token"]
+    assert_equal api_client_authorizations(:active).uuid, json_response["uuid"]
+  end
 end
index 7368d893745658fd20de42de6d2410f421a1098b..9a0e1dbf9cc2427be71613b1ed939bc125f07351 100644 (file)
@@ -797,4 +797,12 @@ class UserTest < ActiveSupport::TestCase
     assert user.save
   end
 
+  test "empty identity_url saves as null" do
+    set_user_from_auth :admin
+    user = users(:active)
+    assert user.update_attributes(identity_url: '')
+    user.reload
+    assert_nil user.identity_url
+  end
+
 end
index e7d03677eac190a3b4f6cb10a056973811765366..e021b442f1d7a0db09848c4364c0dcca45c7d0f0 100755 (executable)
@@ -143,7 +143,7 @@ docker_run_dev() {
            "--volume=$PG_DATA:/var/lib/postgresql:rw" \
            "--volume=$VAR_DATA:$ARVADOS_CONTAINER_PATH:rw" \
            "--volume=$PASSENGER:/var/lib/passenger:rw" \
-          "--volume=$GEMS:/var/lib/arvados/lib/ruby/gems:rw" \
+          "--volume=$GEMS:/var/lib/arvados-arvbox/.gem:rw" \
            "--volume=$PIPCACHE:/var/lib/pip:rw" \
            "--volume=$NPMCACHE:/var/lib/npm:rw" \
            "--volume=$GOSTUFF:/var/lib/gopath:rw" \
index c44e7c410151d772291e25b267b368439f6867a3..cb41227c9da86319e93a6260cc2d934e15ae12d6 100644 (file)
@@ -12,7 +12,8 @@ export npm_config_cache_min=Infinity
 export R_LIBS=/var/lib/Rlibs
 export HOME=$(getent passwd arvbox | cut -d: -f6)
 export ARVADOS_CONTAINER_PATH=/var/lib/arvados-arvbox
-GEMLOCK=/var/lib/arvados/lib/ruby/gems/gems.lock
+export GEM_HOME=$HOME/.gem
+GEMLOCK=$HOME/gems.lock
 
 defaultdev=$(/sbin/ip route|awk '/default/ { print $5 }')
 dockerip=$(/sbin/ip route | grep default | awk '{ print $3 }')
@@ -63,7 +64,7 @@ else
 fi
 
 run_bundler() {
-    flock $GEMLOCK /var/lib/arvados/bin/gem install --no-document bundler:$BUNDLER_VERSION
+    flock $GEMLOCK /var/lib/arvados/bin/gem install --no-document --user bundler:$BUNDLER_VERSION
     if test -f Gemfile.lock ; then
         frozen=--frozen
     else
index 6ec788589f59d624a3ab2e1b9c47bc72d76e84dc..5007fe0be3e8e459fdd107246b5987b324051bfd 100755 (executable)
@@ -63,7 +63,7 @@ fi
 
 if ! [[ -z "$waiting" ]] ; then
     if ps x | grep -v grep | grep "bundle install" > /dev/null; then
-        gemcount=$(ls /var/lib/arvados/lib/ruby/gems/*/gems 2>/dev/null | wc -l)
+        gemcount=$(ls /var/lib/arvados/lib/ruby/gems/*/gems /var/lib/arvados-arvbox/.gem/ruby/*/gems 2>/dev/null | wc -l)
 
         gemlockcount=0
         for l in /usr/src/arvados/services/api/Gemfile.lock \
index 23b7832fcba455b4e9ed3fe13a504eb4644fd4d1..131aa8a8786375543202294df52f8f80a52cfcdd 100644 (file)
@@ -6,6 +6,7 @@
     "aws_profile": "",
     "aws_secret_key": "",
     "aws_source_ami": "ami-031283ff8a43b021c",
+    "aws_ebs_autoscale": "",
     "build_environment": "aws",
     "public_key_file": "",
     "mksquashfs_mem": "",
     "type": "file",
     "source": "scripts/usr-local-bin-ensure-encrypted-partitions.sh",
     "destination": "/tmp/usr-local-bin-ensure-encrypted-partitions.sh"
+  },{
+    "type": "file",
+    "source": "scripts/usr-local-bin-ensure-encrypted-partitions-aws-ebs-autoscale.sh",
+    "destination": "/tmp/usr-local-bin-ensure-encrypted-partitions-aws-ebs-autoscale.sh"
+  },{
+    "type": "file",
+    "source": "scripts/create-ebs-volume-nvme.patch",
+    "destination": "/tmp/create-ebs-volume-nvme.patch"
   },{
     "type": "file",
     "source": "{{user `public_key_file`}}",
@@ -84,6 +93,6 @@
     "type": "shell",
     "execute_command": "sudo -S env {{ .Vars }} /bin/bash '{{ .Path }}'",
     "script": "scripts/base.sh",
-    "environment_vars": ["RESOLVER={{user `resolver`}}","REPOSUFFIX={{user `reposuffix`}}","MKSQUASHFS_MEM={{user `mksquashfs_mem`}}","NVIDIA_GPU_SUPPORT={{user `nvidia_gpu_support`}}","CLOUD=aws"]
+    "environment_vars": ["RESOLVER={{user `resolver`}}","REPOSUFFIX={{user `reposuffix`}}","MKSQUASHFS_MEM={{user `mksquashfs_mem`}}","NVIDIA_GPU_SUPPORT={{user `nvidia_gpu_support`}}","CLOUD=aws","AWS_EBS_AUTOSCALE={{user `aws_ebs_autoscale`}}"]
   }]
 }
index fce8b1918b0c7927dee58724c97cca2142fd8bf1..769b9a5b5aaca004b6f7624369ab0ffcafc98de2 100755 (executable)
@@ -33,6 +33,8 @@ Options:
       VPC id for AWS, otherwise packer will pick the default one
   --aws-subnet-id
       Subnet id for AWS otherwise packer will pick the default one for the VPC
+  --aws-ebs-autoscale (default: false)
+      Install the AWS EBS autoscaler daemon.
   --gcp-project-id (default: false, required if building for GCP)
       GCP project id
   --gcp-account-file (default: false, required if building for GCP)
@@ -62,6 +64,8 @@ Options:
   --debug (default: false)
       Output debug information
 
+For more information, see the Arvados documentation at https://doc.arvados.org/install/crunch2-cloud/install-compute-node.html
+
 EOF
 
 JSON_FILE=
@@ -71,6 +75,7 @@ AWS_SECRETS_FILE=
 AWS_SOURCE_AMI=
 AWS_VPC_ID=
 AWS_SUBNET_ID=
+AWS_EBS_AUTOSCALE=
 GCP_PROJECT_ID=
 GCP_ACCOUNT_FILE=
 GCP_ZONE=
@@ -86,7 +91,7 @@ MKSQUASHFS_MEM=256M
 NVIDIA_GPU_SUPPORT=
 
 PARSEDOPTS=$(getopt --name "$0" --longoptions \
-    help,json-file:,arvados-cluster-id:,aws-source-ami:,aws-profile:,aws-secrets-file:,aws-region:,aws-vpc-id:,aws-subnet-id:,gcp-project-id:,gcp-account-file:,gcp-zone:,azure-secrets-file:,azure-resource-group:,azure-location:,azure-sku:,azure-cloud-environment:,ssh_user:,resolver:,reposuffix:,public-key-file:,mksquashfs-mem:,nvidia-gpu-support,debug \
+    help,json-file:,arvados-cluster-id:,aws-source-ami:,aws-profile:,aws-secrets-file:,aws-region:,aws-vpc-id:,aws-subnet-id:,aws-ebs-autoscale,gcp-project-id:,gcp-account-file:,gcp-zone:,azure-secrets-file:,azure-resource-group:,azure-location:,azure-sku:,azure-cloud-environment:,ssh_user:,resolver:,reposuffix:,public-key-file:,mksquashfs-mem:,nvidia-gpu-support,debug \
     -- "" "$@")
 if [ $? -ne 0 ]; then
     exit 1
@@ -124,6 +129,9 @@ while [ $# -gt 0 ]; do
         --aws-subnet-id)
             AWS_SUBNET_ID="$2"; shift
             ;;
+        --aws-ebs-autoscale)
+            AWS_EBS_AUTOSCALE=1
+            ;;
         --gcp-project-id)
             GCP_PROJECT_ID="$2"; shift
             ;;
@@ -185,7 +193,7 @@ while [ $# -gt 0 ]; do
 done
 
 
-if [[ "$JSON_FILE" == "" ]] || [[ ! -f "$JSON_FILE" ]]; then
+if [[ -z "$JSON_FILE" ]] || [[ ! -f "$JSON_FILE" ]]; then
   echo >&2 "$helpmessage"
   echo >&2
   echo >&2 "ERROR: packer json file not found"
@@ -201,7 +209,7 @@ if [[ -z "$ARVADOS_CLUSTER_ID" ]]; then
   exit 1
 fi
 
-if [[ "$PUBLIC_KEY_FILE" == "" ]] || [[ ! -f "$PUBLIC_KEY_FILE" ]]; then
+if [[ -z "$PUBLIC_KEY_FILE" ]] || [[ ! -f "$PUBLIC_KEY_FILE" ]]; then
   echo >&2 "$helpmessage"
   echo >&2
   echo >&2 "ERROR: public key file file not found"
@@ -220,63 +228,64 @@ fi
 
 EXTRA2=""
 
-if [[ "$AWS_SOURCE_AMI" != "" ]]; then
+if [[ -n "$AWS_SOURCE_AMI" ]]; then
   EXTRA2+=" -var aws_source_ami=$AWS_SOURCE_AMI"
 fi
-if [[ "$AWS_PROFILE" != "" ]]; then
+if [[ -n "$AWS_PROFILE" ]]; then
   EXTRA2+=" -var aws_profile=$AWS_PROFILE"
 fi
-if [[ "$AWS_VPC_ID" != "" ]]; then
+if [[ -n "$AWS_VPC_ID" ]]; then
   EXTRA2+=" -var vpc_id=$AWS_VPC_ID -var associate_public_ip_address=true "
 fi
-if [[ "$AWS_SUBNET_ID" != "" ]]; then
+if [[ -n "$AWS_SUBNET_ID" ]]; then
   EXTRA2+=" -var subnet_id=$AWS_SUBNET_ID -var associate_public_ip_address=true "
 fi
-if [[ "$AWS_DEFAULT_REGION" != "" ]]; then
+if [[ -n "$AWS_DEFAULT_REGION" ]]; then
   EXTRA2+=" -var aws_default_region=$AWS_DEFAULT_REGION"
 fi
-if [[ "$GCP_PROJECT_ID" != "" ]]; then
+if [[ -n "$AWS_EBS_AUTOSCALE" ]]; then
+  EXTRA2+=" -var aws_ebs_autoscale=$AWS_EBS_AUTOSCALE"
+fi
+if [[ -n "$GCP_PROJECT_ID" ]]; then
   EXTRA2+=" -var project_id=$GCP_PROJECT_ID"
 fi
-if [[ "$GCP_ACCOUNT_FILE" != "" ]]; then
+if [[ -n "$GCP_ACCOUNT_FILE" ]]; then
   EXTRA2+=" -var account_file=$GCP_ACCOUNT_FILE"
 fi
-if [[ "$GCP_ZONE" != "" ]]; then
+if [[ -n "$GCP_ZONE" ]]; then
   EXTRA2+=" -var zone=$GCP_ZONE"
 fi
-if [[ "$AZURE_RESOURCE_GROUP" != "" ]]; then
+if [[ -n "$AZURE_RESOURCE_GROUP" ]]; then
   EXTRA2+=" -var resource_group=$AZURE_RESOURCE_GROUP"
 fi
-if [[ "$AZURE_LOCATION" != "" ]]; then
+if [[ -n "$AZURE_LOCATION" ]]; then
   EXTRA2+=" -var location=$AZURE_LOCATION"
 fi
-if [[ "$AZURE_SKU" != "" ]]; then
+if [[ -n "$AZURE_SKU" ]]; then
   EXTRA2+=" -var image_sku=$AZURE_SKU"
 fi
-if [[ "$AZURE_CLOUD_ENVIRONMENT" != "" ]]; then
+if [[ -n "$AZURE_CLOUD_ENVIRONMENT" ]]; then
   EXTRA2+=" -var cloud_environment_name=$AZURE_CLOUD_ENVIRONMENT"
 fi
-if [[ "$SSH_USER" != "" ]]; then
+if [[ -n "$SSH_USER" ]]; then
   EXTRA2+=" -var ssh_user=$SSH_USER"
 fi
-if [[ "$RESOLVER" != "" ]]; then
+if [[ -n "$RESOLVER" ]]; then
   EXTRA2+=" -var resolver=$RESOLVER"
 fi
-if [[ "$REPOSUFFIX" != "" ]]; then
+if [[ -n "$REPOSUFFIX" ]]; then
   EXTRA2+=" -var reposuffix=$REPOSUFFIX"
 fi
-if [[ "$PUBLIC_KEY_FILE" != "" ]]; then
+if [[ -n "$PUBLIC_KEY_FILE" ]]; then
   EXTRA2+=" -var public_key_file=$PUBLIC_KEY_FILE"
 fi
-if [[ "$MKSQUASHFS_MEM" != "" ]]; then
+if [[ -n "$MKSQUASHFS_MEM" ]]; then
   EXTRA2+=" -var mksquashfs_mem=$MKSQUASHFS_MEM"
 fi
-if [[ "$NVIDIA_GPU_SUPPORT" != "" ]]; then
+if [[ -n "$NVIDIA_GPU_SUPPORT" ]]; then
   EXTRA2+=" -var nvidia_gpu_support=$NVIDIA_GPU_SUPPORT"
 fi
 
-
-
 echo
 packer version
 echo
index 450a8b3c549bd124950931a644967526a147eb27..90b845f1ac8a30c0a0b30edb6d5361557085a374 100644 (file)
@@ -4,6 +4,8 @@
 #
 # SPDX-License-Identifier: Apache-2.0
 
+set -eu -o pipefail
+
 SUDO=sudo
 
 wait_for_apt_locks() {
@@ -142,8 +144,29 @@ $SUDO chmod 700 /home/crunch/.ssh/
 if [ "x$RESOLVER" != "x" ]; then
   $SUDO sed -i "s/#prepend domain-name-servers 127.0.0.1;/prepend domain-name-servers ${RESOLVER};/" /etc/dhcp/dhclient.conf
 fi
-# Set up the cloud-init script that will ensure encrypted disks
-$SUDO mv /tmp/usr-local-bin-ensure-encrypted-partitions.sh /usr/local/bin/ensure-encrypted-partitions.sh
+
+if [ "$AWS_EBS_AUTOSCALE" != "1" ]; then
+  # Set up the cloud-init script that will ensure encrypted disks
+  $SUDO mv /tmp/usr-local-bin-ensure-encrypted-partitions.sh /usr/local/bin/ensure-encrypted-partitions.sh
+else
+  wait_for_apt_locks && $SUDO DEBIAN_FRONTEND=noninteractive apt-get -qq --yes install jq unzip
+
+  curl -s "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "/tmp/awscliv2.zip"
+  unzip -q /tmp/awscliv2.zip -d /tmp && $SUDO /tmp/aws/install
+  # Pinned to v2.4.5 because we apply a patch below
+  #export EBS_AUTOSCALE_VERSION=$(curl --silent "https://api.github.com/repos/awslabs/amazon-ebs-autoscale/releases/latest" | jq -r .tag_name)
+  export EBS_AUTOSCALE_VERSION="v2.4.5"
+  cd /opt && $SUDO git clone https://github.com/awslabs/amazon-ebs-autoscale.git
+  cd /opt/amazon-ebs-autoscale && $SUDO git checkout $EBS_AUTOSCALE_VERSION
+  $SUDO patch -p1 < /tmp/create-ebs-volume-nvme.patch
+
+  # This script requires bash, but it ships with a /bin/sh shebang line; fix it
+  $SUDO sed -i 's|^#!/bin/sh|#!/bin/bash|' /opt/amazon-ebs-autoscale/bin/ebs-autoscale
+
+  # Set up the cloud-init script that makes use of the AWS EBS autoscaler
+  $SUDO mv /tmp/usr-local-bin-ensure-encrypted-partitions-aws-ebs-autoscale.sh /usr/local/bin/ensure-encrypted-partitions.sh
+fi
+
 $SUDO chmod 755 /usr/local/bin/ensure-encrypted-partitions.sh
 $SUDO chown root:root /usr/local/bin/ensure-encrypted-partitions.sh
 $SUDO mv /tmp/etc-cloud-cloud.cfg.d-07_compute_arvados_dispatch_cloud.cfg /etc/cloud/cloud.cfg.d/07_compute_arvados_dispatch_cloud.cfg
diff --git a/tools/compute-images/scripts/create-ebs-volume-nvme.patch b/tools/compute-images/scripts/create-ebs-volume-nvme.patch
new file mode 100644 (file)
index 0000000..79ce487
--- /dev/null
@@ -0,0 +1,49 @@
+# Copyright (C) The Arvados Authors. All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+Make the create-ebs-volume script work with NVMe devices.
+
+diff --git a/bin/create-ebs-volume b/bin/create-ebs-volume
+index 6857564..e3122fa 100755
+--- a/bin/create-ebs-volume
++++ b/bin/create-ebs-volume
+@@ -149,10 +149,11 @@ function get_next_logical_device() {
+     for letter in ${alphabet[@]}; do
+         # use /dev/xvdb* device names to avoid contention for /dev/sd* and /dev/xvda names
+         # only supported by HVM instances
+-        if [ ! -b "/dev/xvdb${letter}" ]; then
++        if [[ $created_volumes =~ .*/dev/xvdb${letter}.* ]]; then
++            continue
++        fi
+             echo "/dev/xvdb${letter}"
+             break
+-        fi
+     done
+ }
+@@ -323,8 +324,13 @@ function create_and_attach_volume() {
+     logthis "waiting for volume $volume_id on filesystem"
+     while true; do
+-        if [ -e "$device" ]; then
+-            logthis "volume $volume_id on filesystem as $device"
++        # AWS returns e.g. vol-00338247831716a7b4, the kernel changes that to vol00338247831716a7b4
++        valid_volume_id=`echo $volume_id |sed -e 's/[^a-zA-Z0-9]//'`
++        # example lsblk output:
++        # nvme4n1                     259:7    0  150G  0 disk            vol00338247831716a7b4
++        if LSBLK=`lsblk -o NAME,SERIAL |grep $valid_volume_id`; then
++            nvme_device=/dev/`echo $LSBLK|cut -f1 -d' '`
++            logthis "volume $volume_id on filesystem as $nvme_device (aws device $device)"
+             break
+         fi
+         sleep 1
+@@ -338,7 +344,7 @@ function create_and_attach_volume() {
+     > /dev/null
+     logthis "volume $volume_id DeleteOnTermination ENABLED"
+-    echo $device
++    echo "$nvme_device"
+ }
+ create_and_attach_volume
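
The patch matches an AWS volume id to its NVMe device by stripping punctuation from the id and scanning lsblk's SERIAL column. A rough Python rendering of the same lookup (assumes lsblk is available; the helper name is illustrative):

```python
# Rough Python rendering (illustrative) of the lsblk serial lookup that the
# patch above adds to create-ebs-volume.
import re
import subprocess

def nvme_device_for_volume(volume_id):
    serial = re.sub(r"[^a-zA-Z0-9]", "", volume_id)   # vol-0abc... -> vol0abc...
    out = subprocess.check_output(["lsblk", "-o", "NAME,SERIAL"]).decode()
    for line in out.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == serial:
            return "/dev/" + fields[0]
    return None   # not attached yet; the shell loop above retries every second
```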
diff --git a/tools/compute-images/scripts/usr-local-bin-ensure-encrypted-partitions-aws-ebs-autoscale.sh b/tools/compute-images/scripts/usr-local-bin-ensure-encrypted-partitions-aws-ebs-autoscale.sh
new file mode 100644 (file)
index 0000000..4b73c8b
--- /dev/null
@@ -0,0 +1,60 @@
+#!/bin/bash
+
+# Copyright (C) The Arvados Authors. All rights reserved.
+#
+# SPDX-License-Identifier: Apache-2.0
+
+set -e
+set -x
+
+MOUNTPATH=/tmp
+
+findmntq() {
+    findmnt "$@" >/dev/null
+}
+
+ensure_umount() {
+    if findmntq "$1"; then
+        umount "$1"
+    fi
+}
+
+# First make sure docker is not using /tmp, then unmount everything under it.
+if [ -d /etc/sv/docker.io ]
+then
+  sv stop docker.io || service docker.io stop || true
+else
+  service docker stop || true
+fi
+
+ensure_umount "$MOUNTPATH/docker/aufs"
+
+/bin/bash /opt/amazon-ebs-autoscale/install.sh -f lvm.ext4 -m $MOUNTPATH > /var/log/ebs-autoscale-install.log 2>&1
+
+# Make sure docker uses the big partition
+cat <<EOF > /etc/docker/daemon.json
+{
+    "data-root": "$MOUNTPATH/docker-data"
+}
+EOF
+
+# restart docker
+if [ -d /etc/sv/docker.io ]
+then
+  ## runit
+  sv up docker.io
+else
+  service docker start
+fi
+
+end=$((SECONDS+60))
+
+while [ $SECONDS -lt $end ]; do
+  if /usr/bin/docker ps -q >/dev/null; then
+    exit 0
+  fi
+  sleep 1
+done
+
+# Docker didn't start within a minute; abort.
+exit 1