Dante Tsang <dante@dantetsang.com>
Codex Genetics Ltd <info@codexgenetics.com>
Bruno P. Kinoshita <brunodepaulak@yahoo.com.br>
+George Chlipala <gchlip2@uic.edu>
- api/methods.html.textile.liquid
- api/resources.html.textile.liquid
- Permission and authentication:
+ - api/methods/users.html.textile.liquid
+ - api/methods/groups.html.textile.liquid
- api/methods/api_client_authorizations.html.textile.liquid
- - api/methods/api_clients.html.textile.liquid
+ - api/methods/links.html.textile.liquid
- api/methods/authorized_keys.html.textile.liquid
- - api/methods/groups.html.textile.liquid
- - api/methods/users.html.textile.liquid
+ - api/methods/api_clients.html.textile.liquid
- api/methods/user_agreements.html.textile.liquid
- - System resources:
- - api/methods/keep_services.html.textile.liquid
- - api/methods/links.html.textile.liquid
- - api/methods/logs.html.textile.liquid
- - api/methods/nodes.html.textile.liquid
- api/methods/virtual_machines.html.textile.liquid
- - api/methods/keep_disks.html.textile.liquid
- Data management:
- api/keep-webdav.html.textile.liquid
- api/keep-s3.html.textile.liquid
- api/projects.html.textile.liquid
- api/properties.html.textile.liquid
- api/methods/collections.html.textile.liquid
- - api/methods/repositories.html.textile.liquid
+ - api/methods/logs.html.textile.liquid
+ - api/methods/keep_services.html.textile.liquid
- Container engine:
- api/methods/container_requests.html.textile.liquid
- api/methods/containers.html.textile.liquid
- api/methods/workflows.html.textile.liquid
- - Management (admin/system):
- api/dispatch.html.textile.liquid
- Jobs engine (legacy):
- api/crunch-scripts.html.textile.liquid
- api/methods/job_tasks.html.textile.liquid
- api/methods/pipeline_instances.html.textile.liquid
- api/methods/pipeline_templates.html.textile.liquid
- - Metadata for bioinformatics (deprecated):
+ - api/methods/nodes.html.textile.liquid
+ - api/methods/repositories.html.textile.liquid
+ - api/methods/keep_disks.html.textile.liquid
+ - Metadata for bioinformatics (legacy):
- api/methods/humans.html.textile.liquid
- api/methods/specimens.html.textile.liquid
- api/methods/traits.html.textile.liquid
The script expects cert/key files with these basenames (matching the role except for <i>keepweb</i>, which is split in both <i>download / collections</i>):
+# @balancer@ -- Optional on multi-node installations
+# @collections@ -- Part of keepweb, must be a wildcard for @*.collections.${DOMAIN}@
# @controller@
-# @websocket@ -- note: corresponds to default domain @ws.${DOMAIN}@
-# @keepproxy@ -- note: corresponds to default domain @keep.${DOMAIN}@
# @download@ -- Part of keepweb
-# @collections@ -- Part of keepweb, must be a wildcard for @*.collections.${DOMAIN}@
+# @grafana@ -- Service available by default on multi-node installations
+# @keepproxy@ -- Corresponds to default domain @keep.${DOMAIN}@
+# @prometheus@ -- Service available by default on multi-node installations
+# @webshell@
+# @websocket@ -- Corresponds to default domain @ws.${DOMAIN}@
# @workbench@
# @workbench2@
-# @webshell@
For example, for the @keepproxy@ service the script will expect to find this certificate:
{% include 'multi_host_install_custom_certificates' %}
All certificate files will be used by nginx. You may need to include intermediate certificates in your certificate files. See "the nginx documentation":http://nginx.org/en/docs/http/configuring_https_servers.html#chains for more details.
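For example, to produce a combined certificate file that includes an intermediate certificate, you can concatenate them with the server certificate first (filenames here are illustrative, not part of the installer's expected names):

```shell
# Server certificate first, then intermediate(s), per nginx's chain ordering.
# "your.crt" and "intermediate.crt" are example filenames.
cat your.crt intermediate.crt > controller.crt
```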
+
+h4(#secure-tls-keys). Securing your TLS certificate keys (AWS specific) (optional)
+
+When using @SSL_MODE=bring-your-own@, you can keep your TLS certificate keys encrypted on the server nodes. This reduces the risk of certificate leaks from node disk volumes snapshots or backups.
+
+This feature is currently implemented on AWS by storing the certificate keys’ password in Amazon’s "Secrets Manager":https://aws.amazon.com/secrets-manager/ service, and installing services on the nodes that provide this password to nginx via a file that only lives in system RAM.
+
+If you use the installer's Terraform code, the secret and its related permission resources are created automatically. You can customize the secret's name by editing @terraform/services/terraform.tfvars@ and setting the @ssl_password_secret_name_suffix@ variable.
+
+In @local.params@ you need to set @SSL_KEY_ENCRYPTED@ to @yes@ and change the default values for @SSL_KEY_AWS_SECRET_NAME@ and @SSL_KEY_AWS_REGION@ if necessary.
+
+Then, if your certificate key file is not yet encrypted, you can generate an encrypted version of it by running the @openssl@ command as follows:
+
+<notextile>
+<pre><code>openssl rsa -aes256 -in your.key -out your.encrypted.key
+</code></pre>
+</notextile>
+(this command will prompt you for the encryption password)
+
+Copy this encrypted key file, instead of the plain key file, to the @${CUSTOM_CERTS_DIR}@ directory.
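For example, for the @keepproxy@ service (the filename and paths are illustrative):

```shell
# Place the encrypted key where the installer expects the plain key to be.
cp your.encrypted.key ${CUSTOM_CERTS_DIR}/keepproxy.key
```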
+
+To allow the appropriate nodes to decrypt the key file, you should set the password on Amazon Secrets Manager. There are a couple of ways this can be done:
+
+# Through the AWS web console, which may be the easiest way; just make sure to set the secret as "plain text" instead of JSON.
+# By using the AWS CLI tools, for example:
+<notextile>
+<pre><code>aws secretsmanager put-secret-value --secret-id pkey-pwd --secret-string "p455w0rd" --region us-east-1
+</code></pre>
+</notextile>Here @pkey-pwd@ should match the value of @SSL_KEY_AWS_SECRET_NAME@, and @us-east-1@ the value of @SSL_KEY_AWS_REGION@.
+
+Note that the AWS secret must be set before running @installer.sh deploy@ to avoid failures when starting the @nginx@ servers.
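You can verify that the secret is readable before deploying, for example with the AWS CLI (using the same illustrative secret name and region as above):

```shell
# Prints the stored password if the secret exists and is accessible.
aws secretsmanager get-secret-value --secret-id pkey-pwd \
  --region us-east-1 --query SecretString --output text
```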
+
+If you ever need to change the encryption password on a running cluster, you should first change the secret's value on AWS, and only then copy the newly encrypted key file to @${CUSTOM_CERTS_DIR}@ and re-run the deploy command.
\ No newline at end of file
ERROR 60: checking internal/external client detection (11 ms): expecting internal=true external=false, but found internal=false external=true
</pre></notextile>
+h2(#container-options). Container-running options
+
+By default, the @diagnostics@ command builds a custom Docker image containing a copy of its own binary, and uses that image to run diagnostic checks from inside an Arvados container. This can help detect problems like lack of network connectivity between containers and Arvados cluster services.
+
+The default approach works well if the client host (i.e., the host where you invoke @arvados-client diagnostics@) meets certain conditions:
+* Docker is installed and working (so the diagnostics command can run @docker build@ and @docker save@).
+* Its hardware and kernel are similar to the cluster's compute instances (so the @arvados-client@ binary and the custom-built Docker image are compatible with the compute instances).
+* Network bandwidth supports uploading the Docker image (about 100 megabytes) in less than a minute.
+
+The following options provide flexibility in case the default approach is not suitable.
+* @-priority=0@ skips the container-running part of the diagnostics suite.
+* @-docker-image="hello-world"@ uses a tiny "hello world" image that is already embedded in the @arvados-client@ binary. This works even if the client host does not have any docker tools installed, and it minimizes the data transferred during the diagnostics suite. It provides less test coverage than the default option, but it will at least check that it is possible to run a container on the cluster.
+* @-docker-image=X@ (where @X@ is a Docker image name or a portable data hash) uses a Docker image that has already been uploaded to your Arvados cluster using @arv keep docker@. In this case the diagnostics tool will run a container with the command @echo {timestamp}@.
+* @-docker-image-from=NAME@ builds a custom Docker image on the fly as described above, but using the specified image as a base instead of the default @debian:stable-slim@ image. Note that the build recipe runs commands like @apt-get install [...] libfuse2 ca-certificates@, so only Debian-based base images are supported. For more flexibility, use one of the above @-docker-image=...@ options.
+* @-timeout=2m@ extends the time limit for each HTTP request made by the diagnostics suite, including the process of uploading a custom-built Docker image, to 2 minutes (the default HTTP request timeout is 10 seconds, and the default upload time limit is either the HTTP timeout or 1 minute, whichever is longer).
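For example, a typical invocation on a client host without Docker installed, using the embedded image and a longer request timeout (a sketch combining the options above):

```shell
# Skip the custom image build; use the embedded hello-world image instead.
arvados-client diagnostics -docker-image=hello-world -timeout=2m
```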
+
h2. Example output
<notextile><pre>
Similarly, you can use a scope of @["PATCH", "/arvados/v1/collections/zzzzz-4zz18-0123456789abcde"]@ to restrict updates to a single collection.
+There is one special exception to the scope rules: a valid token is always allowed to issue a request to "@GET /arvados/v1/api_client_authorizations/current@":{{ site.baseurl }}/api/methods/api_client_authorizations.html#current regardless of its scopes. This allows clients to reliably determine whether a request failed because a token is invalid, or because the token is not permitted to perform a particular request. The API server itself needs to be able to do this to validate tokens issued by other clusters in a federation.
+
h2. Creating a scoped token
A scoped token can be created at the command line:
</notextile>
-h2(#main). development main (as of 2023-08-15)
+h2(#main). development main (as of 2023-09-03)
"previous: Upgrading to 2.6.3":#v2_6_3
The defaults for all of these options match the previous behavior of @arvados-login-sync@ _except_ for @SyncIgnoredGroups@. This list names groups that @arvados-login-sync@ will never modify by adding or removing members. As a security precaution, the default list names security-sensitive system groups on Debian- and Red Hat-based distributions. If you are using Arvados to manage system group membership on shell nodes, especially @sudo@ or @wheel@, you may want to provide your own list. Set @SyncIgnoredGroups: []@ to restore the original behavior of ignoring no groups.
+h3. API clients can always retrieve their current token, regardless of scopes
+
+We have introduced a small exception to the previous behavior of "Arvados API token scopes":{{ site.baseurl }}/admin/scoped-tokens.html in this release. A valid token is now always allowed to issue a request to "@GET /arvados/v1/api_client_authorizations/current@":{{ site.baseurl }}/api/methods/api_client_authorizations.html#current regardless of its scopes. This allows clients to reliably determine whether a request failed because a token is invalid, or because the token is not permitted to perform a particular request. The API server itself needs to be able to do this to validate tokens issued by other clusters in a federation.
+
h3. UseAWSS3v2Driver option removed
The old "v1" S3 driver for keepstore has been removed. The new "v2" implementation, which has been the default since Arvados 2.5.0, is always used. The @Volumes.*.DriverParameters.UseAWSS3v2Driver@ configuration key is no longer recognized. If your config file uses it, remove it to avoid warning messages at startup.
+h3. Deprecated/legacy APIs slated for removal
+
+The legacy APIs "humans":../api/methods/humans.html, "specimens":../api/methods/specimens.html, "traits":../api/methods/traits.html, "jobs":../api/methods/jobs.html, "job_tasks":../api/methods/job_tasks.html, "pipeline_instances":../api/methods/pipeline_instances.html, "pipeline_templates":../api/methods/pipeline_templates.html, "nodes":../api/methods/nodes.html, "repositories":../api/methods/repositories.html, and "keep_disks":../api/methods/keep_disks.html are deprecated and will be removed in a future major version of Arvados.
+
+h3. Workbench 1 deprecated
+
+The original Arvados Workbench application (referred to as "Workbench 1") is deprecated and will be removed in a future major version of Arvados. Users are advised to migrate to "Workbench 2". Starting with this release, new installations of Arvados will only set up Workbench 2 and no longer include Workbench 1 by default.
+
h2(#v2_6_3). v2.6.3 (2023-06-06)
h3. Python SDK automatically retries failed requests much more
SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}
-p=. *Legacy. The job APIs are read-only and disabled by default in new installations. Use "container requests":methods/container_requests.html .*
+{% include 'notebox_begin_warning' %}
+This is a legacy API. This endpoint is deprecated, disabled by default in new installations, and slated to be removed entirely in a future major release of Arvados. It is replaced by "container requests.":methods/container_requests.html
+{% include 'notebox_end' %}
h2. Crunch scripts
|_. Attribute|_. Type|_. Description|_. Example|
|uuid|string|An identifier used to refer to the token without exposing the actual token.||
|api_token|string|The actual token string that is expected in the Authorization header.||
-|api_client_id|integer|-||
-|user_id|integer|-||
|created_by_ip_address|string|-||
|last_used_by_ip_address|string|The network address of the most recent client using this token.||
|last_used_at|datetime|Timestamp of the most recent request using this token.||
|api_client_id|integer||query||
|scopes|array||query||
+h3(#current). current
+
+Return the full record associated with the provided API token. This endpoint is often used to check the validity of a given token.
+
+Arguments:
+
+table(table table-bordered table-condensed).
+|_. Argument |_. Type |_. Description |_. Location |_. Example |
+
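As an illustrative example, a client can check whether its token is valid by requesting this endpoint directly; @$ARVADOS_API_TOKEN@ is a placeholder for a real token:

```shell
# A 200 response with the token's record means the token is valid;
# a 401 response means it is not.
curl -sS -H "Authorization: Bearer $ARVADOS_API_TOKEN" \
  "https://{{ site.arvados_api_host }}/arvados/v1/api_client_authorizations/current"
```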
h3. delete
Delete an existing ApiClientAuthorization.
h2. Resource
-Collections describe sets of files in terms of data blocks stored in Keep. See "Keep - Content-Addressable Storage":{{site.baseurl}}/architecture/storage.html for details.
+Collections describe sets of files in terms of data blocks stored in Keep. See "Keep - Content-Addressable Storage":{{site.baseurl}}/architecture/storage.html and "using collection versioning":../../user/topics/collection-versioning.html for details.
Each collection has, in addition to the "Common resource fields":{{site.baseurl}}/api/resources.html:
SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}
-p=. *Deprecated, likely to be removed in a future version. The recommended way to store metadata is "collection properties":collections.html*
+{% include 'notebox_begin_warning' %}
+This is a legacy API. This endpoint is deprecated, disabled by default in new installations, and is slated to be removed entirely in a future major release of Arvados. The recommended way to store metadata is with the "'properties' field on collections and projects.":../properties.html
+{% include 'notebox_end' %}
API endpoint base: @https://{{ site.arvados_api_host }}/arvados/v1/humans@
SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}
-p=. *Legacy. This endpoint is read-only and disabled by default in new installations.*
+{% include 'notebox_begin_warning' %}
+This is a legacy API. This endpoint is deprecated, disabled by default in new installations, and slated to be removed entirely in a future major release of Arvados. It is replaced by "container requests.":container_requests.html
+{% include 'notebox_end' %}
API endpoint base: @https://{{ site.arvados_api_host }}/arvados/v1/job_tasks@
SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}
-p=. *Legacy. This endpoint is read-only and disabled by default in new installations.*
+{% include 'notebox_begin_warning' %}
+This is a legacy API. This endpoint is deprecated, disabled by default in new installations, and slated to be removed entirely in a future major release of Arvados. It is replaced by "container requests.":container_requests.html
+{% include 'notebox_end' %}
API endpoint base: @https://{{ site.arvados_api_host }}/arvados/v1/jobs@
layout: default
navsection: api
navmenu: API Methods
-title: "keep_disks (deprecated)"
+title: "keep_disks"
...
{% comment %}
SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}
+{% include 'notebox_begin_warning' %}
+This is a legacy API. This endpoint is deprecated, disabled by default in new installations, and slated to be removed entirely in a future major release of Arvados. It is replaced by "keep services.":keep_services.html
+{% include 'notebox_end' %}
+
API endpoint base: @https://{{ site.arvados_api_host }}/arvados/v1/keep_disks@
Object type: @penuu@
SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}
+{% include 'notebox_begin_warning' %}
+This is a legacy API. This endpoint is deprecated, disabled by default in new installations, and slated to be removed entirely in a future major release of Arvados. It is replaced by "cloud dispatcher API.":../dispatch.html
+{% include 'notebox_end' %}
+
API endpoint base: @https://{{ site.arvados_api_host }}/arvados/v1/nodes@
Object type: @7ekkf@
SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}
-p=. *Legacy. This endpoint is read-only and disabled by default in new installations.*
+{% include 'notebox_begin_warning' %}
+This is a legacy API. This endpoint is deprecated, disabled by default in new installations, and slated to be removed entirely in a future major release of Arvados. It is replaced by "container requests.":container_requests.html
+{% include 'notebox_end' %}
API endpoint base: @https://{{ site.arvados_api_host }}/arvados/v1/pipeline_instances@
SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}
-p=. *Legacy. This endpoint is read-only and disabled by default in new installations.*
+{% include 'notebox_begin_warning' %}
+This is a legacy API. This endpoint is deprecated, disabled by default in new installations, and slated to be removed entirely in a future major release of Arvados. It is replaced by "registered workflows.":workflows.html
+{% include 'notebox_end' %}
API endpoint base: @https://{{ site.arvados_api_host }}/arvados/v1/pipeline_templates@
SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}
+{% include 'notebox_begin_warning' %}
+This is a legacy API. This endpoint is deprecated, disabled by default in new installations, and slated to be removed entirely in a future major release of Arvados. It is replaced by "collection versioning.":collections.html
+{% include 'notebox_end' %}
+
API endpoint base: @https://{{ site.arvados_api_host }}/arvados/v1/repositories@
Object type: @s0uqq@
SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}
-p=. *Deprecated, likely to be removed in a future version. The recommended way to store metadata is "collection properties":collections.html*
+{% include 'notebox_begin_warning' %}
+This is a legacy API. This endpoint is deprecated, disabled by default in new installations, and is slated to be removed entirely in a future major release of Arvados. The recommended way to store metadata is with the "'properties' field on collections and projects.":../properties.html
+{% include 'notebox_end' %}
API endpoint base: @https://{{ site.arvados_api_host }}/arvados/v1/specimens@
SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}
-p=. *Deprecated, likely to be removed in a future version. The recommended way to store metadata is "collection properties":collections.html*
+{% include 'notebox_begin_warning' %}
+This is a legacy API. This endpoint is deprecated, disabled by default in new installations, and is slated to be removed entirely in a future major release of Arvados. The recommended way to store metadata is with the "'properties' field on collections and projects.":../properties.html
+{% include 'notebox_end' %}
API endpoint base: @https://{{ site.arvados_api_host }}/arvados/v1/traits@
As a special case, a scope of @["all"]@ allows all resources. This is the default if no scope is given.
+A valid token is always allowed to issue a request to "@GET /arvados/v1/api_client_authorizations/current@":{{ site.baseurl }}/api/methods/api_client_authorizations.html#current regardless of its scopes.
+
Using scopes is also described on the "Securing API access with scoped tokens":{{site.baseurl}}/admin/scoped-tokens.html page of the admin documentation.
h3. Scope examples
A scope of @GET /arvados/v1/collections@ permits listing collections.
* Requests with different methods, such as creating a new collection using @POST /arvados/v1/collections@, will be rejected.
-* Requests to access other resources, such as @GET /arvados/v1/groups@, will be rejected.
+* Requests to access other resources, such as @GET /arvados/v1/groups@, will be rejected (except "@GET /arvados/v1/api_client_authorizations/current@":{{ site.baseurl }}/api/methods/api_client_authorizations.html#current, which is always allowed).
* Be aware that requests for specific records, such as @GET /arvados/v1/collections/962eh-4zz18-xi32mpz2621o8km@ will also be rejected. This is because the scope @GET /arvados/v1/collections@ does not end in @/@
A scope of @GET /arvados/v1/collections/@ (with @/@ suffix) will permit access to individual collections.
# "Choose the SSL configuration":#certificates
## "Using a Let's Encrypt certificates":#lets-encrypt
## "Bring your own certificates":#bring-your-own
+### "Securing your TLS certificate keys":#secure-tls-keys
# "Create a compute image":#create_a_compute_image
# "Begin installation":#installation
# "Further customization of the installation":#further_customization
# "Initial user and login":#initial_user
# "Monitoring and Metrics":#monitoring
# "Load balancing controllers":#load_balancing
-## "Rolling upgrades procedure":#rolling-upgrades
# "After the installation":#post_install
h2(#introduction). Introduction
h3. Parameters from @local.params@:
-# Set @CLUSTER@ to the 5-character cluster identifier (e.g "xarv1")
-# Set @DOMAIN@ to the base DNS domain of the environment, e.g. "xarv1.example.com"
+# Set @CLUSTER@ to the 5-character cluster identifier (e.g. "xarv1").
+# Set @DOMAIN@ to the base DNS domain of the environment (e.g. "xarv1.example.com").
# Set the @*_INT_IP@ variables with the internal (private) IP addresses of each host. Since services share hosts, some hosts are the same. See "note about /etc/hosts":#etchosts
# Edit @CLUSTER_INT_CIDR@, this should be the CIDR of the private network that Arvados is running on, e.g. the VPC. If you used terraform, this is emitted as @cluster_int_cidr@.
_CIDR stands for "Classless Inter-Domain Routing" and describes which portion of the IP address that refers to the network. For example 192.168.3.0/24 means that the first 24 bits are the network (192.168.3) and the last 8 bits are a specific host on that network._
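A fragment of @local.params@ might then look like this (all values are illustrative and must be replaced with your own):

```shell
CLUSTER="xarv1"
DOMAIN="xarv1.example.com"
# Internal (private) IP of the controller host; illustrative value.
CONTROLLER_INT_IP="10.1.1.11"
# CIDR of the private network (e.g. the VPC) that Arvados runs on.
CLUSTER_INT_CIDR="10.1.0.0/16"
```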
h3. Object storage in S3 (AWS Specific)
-Open @local_config_dir/pillars/arvados.sls@ and edit as follows:
+If you "followed the recommendend naming scheme":#keep-bucket for both the bucket and role (or used the provided Terraform script), you're done.
-# In the @arvados.cluster.Volumes.DriverParameters@ section, set @Region@ to the appropriate AWS region (e.g. 'us-east-1')
+If you did not follow the recommended naming scheme for either the bucket or role, you'll need to update these parameters in @local.params@:
-If "followed the recommendend naming scheme":#keep-bucket for both the bucket and role (or used the provided Terraform script), you're done.
+# Set @KEEP_AWS_S3_BUCKET@ to the name of the "keepstore bucket you created earlier":#keep-bucket
+# Set @KEEP_AWS_IAM_ROLE@ to the name of the "keepstore role you created earlier":#keep-bucket
-If you did not follow the recommendend naming scheme for either the bucket or role, you'll need to update these parameters as well:
-
-# Set @Bucket@ to the value of "keepstore bucket you created earlier":#keep-bucket
-# Set @IAMRole@ to "keepstore role you created earlier":#keep-bucket
+You can also configure a specific AWS Region for the S3 bucket by setting @KEEP_AWS_REGION@.
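For example, the relevant @local.params@ entries might look like this (the bucket and role names are illustrative; use the ones you actually created):

```shell
KEEP_AWS_S3_BUCKET="xarv1-keep-volume"
KEEP_AWS_IAM_ROLE="xarv1-keepstore-role"
KEEP_AWS_REGION="us-east-1"
```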
{% include 'ssl_config_multi' %}
...
)
</code></pre>
-# In @local.params@, set @DATABASE_INT_IP@ to the database endpoint (can be a hostname, does not have to be an IP address).
-<pre><code>DATABASE_INT_IP=...
+# In @local.params@, set @DATABASE_INT_IP@ to an empty string and @DATABASE_EXTERNAL_SERVICE_HOST_OR_IP@ to the database endpoint (can be a hostname, does not have to be an IP address).
+<pre><code>DATABASE_INT_IP=""
+...
+DATABASE_EXTERNAL_SERVICE_HOST_OR_IP="arvados.xxxxxxx.eu-east-1.rds.amazonaws.com"
</code></pre>
-# In @local.params@, set @DATABASE_PASSWORD@ to the correct value. "See the previous section describing correct quoting":#localparams
-# In @local_config_dir/pillars/arvados.sls@ you may need to adjust the database name and user. This can be found in the section @arvados.cluster.database@.
+# In @local.params.secrets@, set @DATABASE_PASSWORD@ to the correct value. "See the previous section describing correct quoting":#localparams
+# In @local.params@ you may need to adjust the database name and user.
h2(#further_customization). Further customization of the installation (optional)
)
</code></pre>
-Note that we also set the @database@ role to its own node.
-
-h3(#rolling-upgrades). Rolling upgrades procedure
-
-Once you have more than one controller backend node, it's easy to take one at a time from the backend pool to upgrade it to a newer version of Arvados (which might involve applying database migrations) by adding its name to the @DISABLED_CONTROLLER@ variable in @local.params@. For example:
-
-<pre><code>...
-DISABLED_CONTROLLER="controller1"
-...</code></pre>
-
-Then, apply the configuration change to just the load-balancer:
-
-<pre><code class="userinput">./installer.sh deploy controller.xarv1.example.com</code></pre>
-
-This will allow you to do the necessary changes to the @controller1@ node without service disruption, as it will not be receiving any traffic until you remove it from the @DISABLED_CONTROLLER@ variable.
-
-Next step is applying the @deploy@ command to @controller1@:
-
-<pre><code class="userinput">./installer.sh deploy controller1.xarv1.example.com</code></pre>
-
-After that, disable the other controller node by editing @local.params@:
-
-<pre><code>...
-DISABLED_CONTROLLER="controller2"
-...</code></pre>
-
-...applying the changes on the balancer node:
-
-<pre><code class="userinput">./installer.sh deploy controller.xarv1.example.com</code></pre>
-
-Then, deploy the changes to the recently disabled @controller2@ node:
-
-<pre><code class="userinput">./installer.sh deploy controller2.xarv1.example.com</code></pre>
-
-This won't cause a service interruption because the load balancer is already routing all traffic to the othe @controller1@ node.
-
-And the last step is enabling both controller nodes by making the following change to @local.params@:
-
-<pre><code>...
-DISABLED_CONTROLLER=""
-...</code></pre>
-
-...and running:
-
-<pre><code class="userinput">./installer.sh deploy controller.xarv1.example.com</code></pre>
-
-This should get all your @controller@ nodes correctly upgraded, and you can continue executing the @deploy@ command with the rest of the nodes individually, or just run:
-
-<pre><code class="userinput">./installer.sh deploy</code></pre>
+Note that we also set the @database@ role to its own node instead of leaving it on a shared controller node.
-Only the nodes with pending changes might require certain services to be restarted. In this example, the @workbench@ node will have the remaining Arvados services upgraded and restarted. However, these services are not as critical as the ones on the @controller@ nodes.
+Each time you run @installer.sh deploy@, the system will automatically perform a rolling upgrade: it updates one controller node at a time, removing each from the load balancer first, so that there is no downtime.
h2(#post_install). After the installation
if insecure {
client = h.insecureClient
}
+ // Clearing the Host field here causes the Go http client to
+ // use the host part of urlOut as the Host header in the
+ // outgoing request, instead of the Host value from the
+ // original request we received.
+ req.Host = ""
return h.proxy.Do(req, urlOut, client)
}
ctx, cancel := context.WithDeadline(context.Background(), time.Now().Add(time.Minute))
defer cancel()
- // 0.0.0.0:0 is just a placeholder here -- do(), which is
+ // "http://localhost" is just a placeholder here -- we'll fill
+ // in req.URL.Path below, and then do(), which is
// localClusterRequest(), will replace the scheme and host
// parts with the real proxy destination.
- req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://0.0.0.0:0/"+path, nil)
+ req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://localhost", nil)
if err != nil {
return nil, nil, err
}
+ req.URL.Path = path
resp, err := do(req)
if err != nil {
return nil, nil, err
"archive/tar"
"bytes"
"context"
+ "crypto/sha256"
_ "embed"
"flag"
"fmt"
"net/http"
"net/url"
"os"
+ "os/exec"
"strings"
"time"
func (Command) RunCommand(prog string, args []string, stdin io.Reader, stdout, stderr io.Writer) int {
var diag diagnoser
f := flag.NewFlagSet(prog, flag.ContinueOnError)
- f.StringVar(&diag.projectName, "project-name", "scratch area for diagnostics", "name of project to find/create in home project and use for temporary/test objects")
- f.StringVar(&diag.logLevel, "log-level", "info", "logging level (debug, info, warning, error)")
- f.StringVar(&diag.dockerImage, "docker-image", "", "image to use when running a test container (default: use embedded hello-world image)")
+ f.StringVar(&diag.projectName, "project-name", "scratch area for diagnostics", "`name` of project to find/create in home project and use for temporary/test objects")
+ f.StringVar(&diag.logLevel, "log-level", "info", "logging `level` (debug, info, warning, error)")
+ f.StringVar(&diag.dockerImage, "docker-image", "", "`image` (tag or portable data hash) to use when running a test container, or \"hello-world\" to use embedded hello-world image (default: build a custom image containing this executable, and run diagnostics inside the container too)")
+ f.StringVar(&diag.dockerImageFrom, "docker-image-from", "debian:stable-slim", "`base` image to use when building a custom image (see https://doc.arvados.org/main/admin/diagnostics.html#container-options)")
f.BoolVar(&diag.checkInternal, "internal-client", false, "check that this host is considered an \"internal\" client")
f.BoolVar(&diag.checkExternal, "external-client", false, "check that this host is considered an \"external\" client")
f.BoolVar(&diag.verbose, "v", false, "verbose: include more information in report")
if ok, code := cmd.ParseFlags(f, prog, args, "", stderr); !ok {
return code
}
+ diag.stdout = stdout
+ diag.stderr = stderr
diag.logger = ctxlog.New(stdout, "text", diag.logLevel)
diag.logger.SetFormatter(&logrus.TextFormatter{DisableTimestamp: true, DisableLevelTruncation: true, PadLevelText: true})
diag.runtests()
var HelloWorldDockerImage []byte
type diagnoser struct {
- stdout io.Writer
- stderr io.Writer
- logLevel string
- priority int
- projectName string
- dockerImage string
- checkInternal bool
- checkExternal bool
- verbose bool
- timeout time.Duration
- logger *logrus.Logger
- errors []string
- done map[int]bool
+ stdout io.Writer
+ stderr io.Writer
+ logLevel string
+ priority int
+ projectName string
+ dockerImage string
+ dockerImageFrom string
+ checkInternal bool
+ checkExternal bool
+ verbose bool
+ timeout time.Duration
+ logger *logrus.Logger
+ errors []string
+ done map[int]bool
}
func (diag *diagnoser) debugf(f string, args ...interface{}) {
}()
}
- // Read hello-world.tar to find image ID, so we can upload it
- // as "sha256:{...}.tar"
+ tempdir, err := ioutil.TempDir("", "arvados-diagnostics")
+ if err != nil {
+ diag.errorf("error creating temp dir: %s", err)
+ return
+ }
+ defer os.RemoveAll(tempdir)
+
+ var dockerImageData []byte
+ if diag.dockerImage != "" || diag.priority < 1 {
+ // We won't be using the self-built docker image, so
+ // don't build it. But we will write the embedded
+ // "hello-world" image to our test collection to test
+ // upload/download, whether or not we're using it as a
+ // docker image.
+ dockerImageData = HelloWorldDockerImage
+ } else if selfbin, err := os.Readlink("/proc/self/exe"); err != nil {
+ diag.errorf("readlink /proc/self/exe: %s", err)
+ return
+ } else if selfbindata, err := os.ReadFile(selfbin); err != nil {
+ diag.errorf("error reading %s: %s", selfbin, err)
+ return
+ } else {
+ selfbinSha := fmt.Sprintf("%x", sha256.Sum256(selfbindata))
+ tag := "arvados-client-diagnostics:" + selfbinSha[:9]
+ err := os.WriteFile(tempdir+"/arvados-client", selfbindata, 0777)
+ if err != nil {
+ diag.errorf("error writing %s: %s", tempdir+"/arvados-client", err)
+ return
+ }
+
+ dockerfile := "FROM " + diag.dockerImageFrom + "\n"
+ dockerfile += "RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install --yes --no-install-recommends libfuse2 ca-certificates && apt-get clean\n"
+ dockerfile += "COPY /arvados-client /arvados-client\n"
+ cmd := exec.Command("docker", "build", "--tag", tag, "-f", "-", tempdir)
+ cmd.Stdin = strings.NewReader(dockerfile)
+ cmd.Stdout = diag.stderr
+ cmd.Stderr = diag.stderr
+ err = cmd.Run()
+ if err != nil {
+ diag.errorf("error building docker image: %s", err)
+ return
+ }
+ checkversion, err := exec.Command("docker", "run", tag, "/arvados-client", "version").CombinedOutput()
+ if err != nil {
+ diag.errorf("docker image does not seem to work: %s", err)
+ return
+ }
+ diag.infof("arvados-client version: %s", checkversion)
+
+ buf, err := exec.Command("docker", "save", tag).Output()
+ if err != nil {
+ diag.errorf("docker save %s: %s", tag, err)
+ return
+ }
+ diag.infof("docker image size is %d bytes", len(buf))
+ dockerImageData = buf
+ }
+
+ // Read image tarball to find image ID, so we can upload it as
+ // "sha256:{...}.tar"
var imageSHA2 string
{
- tr := tar.NewReader(bytes.NewReader(HelloWorldDockerImage))
+ tr := tar.NewReader(bytes.NewReader(dockerImageData))
for {
hdr, err := tr.Next()
if err == io.EOF {
break
}
if err != nil {
- diag.errorf("internal error/bug: cannot read embedded docker image tar file: %s", err)
+ diag.errorf("internal error/bug: cannot read docker image tar file: %s", err)
return
}
if s := strings.TrimSuffix(hdr.Name, ".json"); len(s) == 64 && s != hdr.Name {
}
}
if imageSHA2 == "" {
- diag.errorf("internal error/bug: cannot find {sha256}.json file in embedded docker image tar file")
+ diag.errorf("internal error/bug: cannot find {sha256}.json file in docker image tar file")
return
}
}
tarfilename := "sha256:" + imageSHA2 + ".tar"
diag.dotest(100, "uploading file via webdav", func() error {
- ctx, cancel := context.WithDeadline(context.Background(), time.Now().Add(diag.timeout))
+ timeout := diag.timeout
+ if len(dockerImageData) > 10<<20 && timeout < time.Minute {
+ // Extend the normal http timeout if we're
+ // uploading a substantial docker image.
+ timeout = time.Minute
+ }
+ ctx, cancel := context.WithDeadline(context.Background(), time.Now().Add(timeout))
defer cancel()
if collection.UUID == "" {
return fmt.Errorf("skipping, no test collection")
}
- req, err := http.NewRequestWithContext(ctx, "PUT", cluster.Services.WebDAVDownload.ExternalURL.String()+"c="+collection.UUID+"/"+tarfilename, bytes.NewReader(HelloWorldDockerImage))
+ t0 := time.Now()
+ req, err := http.NewRequestWithContext(ctx, "PUT", cluster.Services.WebDAVDownload.ExternalURL.String()+"c="+collection.UUID+"/"+tarfilename, bytes.NewReader(dockerImageData))
if err != nil {
return fmt.Errorf("BUG? http.NewRequest: %s", err)
}
if resp.StatusCode != http.StatusCreated {
return fmt.Errorf("status %s", resp.Status)
}
- diag.debugf("ok, status %s", resp.Status)
+ diag.verbosef("upload ok, status %s, %f MB/s", resp.Status, float64(len(dockerImageData))/time.Since(t0).Seconds()/1000000)
err = client.RequestAndDecodeContext(ctx, &collection, "GET", "arvados/v1/collections/"+collection.UUID, nil, nil)
if err != nil {
return fmt.Errorf("get updated collection: %s", err)
}
- diag.debugf("ok, pdh %s", collection.PortableDataHash)
+ diag.verbosef("upload pdh %s", collection.PortableDataHash)
return nil
})
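The upload test above stretches the normal HTTP deadline when the payload is a full self-built image rather than the small hello-world tarball. The rule is easy to isolate (a sketch; the 10 MiB threshold and one-minute floor are taken from the code above):

```python
def upload_timeout(payload_len: int, base_timeout_s: float) -> float:
    """Extend the deadline to at least 60s for payloads over 10 MiB,
    but never shorten a deadline that is already longer."""
    if payload_len > 10 << 20 and base_timeout_s < 60:
        return 60.0
    return base_timeout_s
```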
if resp.StatusCode != trial.status {
return fmt.Errorf("unexpected response status: %s", resp.Status)
}
- if trial.status == http.StatusOK && !bytes.Equal(body, HelloWorldDockerImage) {
+ if trial.status == http.StatusOK && !bytes.Equal(body, dockerImageData) {
excerpt := body
if len(excerpt) > 128 {
excerpt = append([]byte(nil), body[:128]...)
}
timestamp := time.Now().Format(time.RFC3339)
- ctrCommand := []string{"echo", timestamp}
- if diag.dockerImage == "" {
+
+ var ctrCommand []string
+ switch diag.dockerImage {
+ case "":
+ if collection.UUID == "" {
+ return fmt.Errorf("skipping, no test collection to use as docker image")
+ }
+ diag.dockerImage = collection.PortableDataHash
+ ctrCommand = []string{"/arvados-client", "diagnostics",
+ "-priority=0", // don't run a container
+ "-log-level=" + diag.logLevel,
+ "-internal-client=true"}
+ case "hello-world":
if collection.UUID == "" {
return fmt.Errorf("skipping, no test collection to use as docker image")
}
diag.dockerImage = collection.PortableDataHash
ctrCommand = []string{"/hello"}
+ default:
+ ctrCommand = []string{"echo", timestamp}
}
var cr arvados.ContainerRequest
},
},
"runtime_constraints": arvados.RuntimeConstraints{
+ API: true,
VCPUs: 1,
- RAM: 1 << 26,
- KeepCacheRAM: 1 << 26,
+ RAM: 128 << 20,
+ KeepCacheRAM: 64 << 20,
},
}})
if err != nil {
return err
}
- diag.verbosef("container request uuid = %s", cr.UUID)
+ diag.infof("container request uuid = %s", cr.UUID)
diag.verbosef("container uuid = %s", cr.ContainerUUID)
timeout := 10 * time.Minute
stubvm.ExecuteContainer = executeContainer
stubvm.CrashRunningContainer = finishContainer
stubvm.ExtraCrunchRunArgs = "'--runtime-engine=stub' '--foo' '--extra='\\''args'\\'''"
- switch n % 7 {
- case 0:
+ switch {
+ case n%7 == 0:
+ // some instances start out OK but then stop
+ // running any commands
stubvm.Broken = time.Now().Add(time.Duration(rand.Int63n(90)) * time.Millisecond)
- case 1:
+ case n%7 == 1:
+ // some instances never pass a run-probe
stubvm.CrunchRunMissing = true
- case 2:
+ case n%7 == 2:
+ // some instances start out OK but then start
+ // reporting themselves as broken
stubvm.ReportBroken = time.Now().Add(time.Duration(rand.Int63n(200)) * time.Millisecond)
+ case n == 3:
+ // 1 instance is completely broken, ensuring
+ // the boot_outcomes{outcome="failure"} metric
+ // is not zero
+ stubvm.CrunchRunCrashRate = 1
default:
stubvm.CrunchRunCrashRate = 0.1
stubvm.ArvMountDeadlockRate = 0.1
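The stub-VM setup above seeds a mix of failure modes keyed off the instance index `n`; note that `n == 3` is checked after the `n%7` cases, so exactly one instance is guaranteed to crash every time. The selection logic, restated as a sketch (mode names are illustrative, not from the source):

```python
def failure_mode(n: int) -> str:
    """Map a stub instance index to its injected failure mode."""
    if n % 7 == 0:
        return "breaks-after-start"   # starts OK, then stops running commands
    if n % 7 == 1:
        return "crunch-run-missing"   # never passes a run-probe
    if n % 7 == 2:
        return "reports-broken"       # starts OK, then reports itself broken
    if n == 3:
        return "always-crashes"       # guarantees a nonzero boot-failure metric
    return "mostly-works"             # low random crash/deadlock rates
```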
'future',
'google-api-core <2.11.0', # 2.11.0rc1 is incompatible with google-auth<2
'google-api-python-client >=2.1.0',
- 'google-auth<2',
+ 'google-auth <2',
'httplib2 >=0.9.2, <0.20.2',
'pycurl >=7.19.5.1, <7.45.0',
'ruamel.yaml >=0.15.54, <0.17.22',
- 'setuptools>=40.3.0',
- 'typing_extensions>=3.7.4; python_version<"3.8"',
+ 'setuptools >=40.3.0',
+ # As of 4.8.0rc1, typing_extensions does not parse in Python 3.7
+ 'typing_extensions >=3.7.4, <4.8; python_version<"3.8"',
'ws4py >=0.4.2',
- 'protobuf<4.0.0dev',
- 'pyparsing<3',
- 'setuptools>=40.3.0',
- "dataclasses ;python_version<'3.7'",
+ 'protobuf <4.0.0dev',
+ 'pyparsing <3',
+ 'setuptools >=40.3.0',
+ 'dataclasses; python_version<"3.7"',
],
classifiers=[
'Programming Language :: Python :: 3',
# reader_tokens.
accepted = false
auth = nil
+ remote_errcodes = []
+ remote_errmsgs = []
[params["api_token"],
params["oauth_token"],
env["HTTP_AUTHORIZATION"].andand.match(/(OAuth2|Bearer) ([!-~]+)/).andand[2],
*reader_tokens,
].each do |supplied|
next if !supplied
- try_auth = ApiClientAuthorization.
- validate(token: supplied, remote: remote)
- if try_auth.andand.user
- auth = try_auth
- accepted = supplied
- break
+ begin
+ try_auth = ApiClientAuthorization.validate(token: supplied, remote: remote)
+ rescue => e
+ begin
+ remote_errcodes.append(e.http_status)
+ rescue NoMethodError
+ # The exception is an internal validation problem, not a remote error.
+ next
+ end
+ begin
+ errors = SafeJSON.load(e.res.content)["errors"]
+ rescue
+ errors = nil
+ end
+ remote_errmsgs += errors if errors.is_a?(Array)
+ else
+ if try_auth.andand.user
+ auth = try_auth
+ accepted = supplied
+ break
+ end
end
end
Thread.current[:token] = accepted
Thread.current[:user] = auth.andand.user
- @app.call env if @app
+ if auth.nil? and not remote_errcodes.empty?
+ # If we failed to validate any tokens because of remote validation
+ # errors, pass those on to the client. This code is functionally very
+ # similar to ApplicationController#render_error, but the implementation
+ # is very different because we're a Rack middleware, not in
+ # ActionDispatch land yet.
+ remote_errmsgs.prepend("failed to validate remote token")
+ error_content = {
+ error_token: "%d+%08x" % [Time.now.utc.to_i, rand(16 ** 8)],
+ errors: remote_errmsgs,
+ }
+ [
+ remote_errcodes.max,
+ {"Content-Type": "application/json"},
+ SafeJSON.dump(error_content).html_safe,
+ ]
+ else
+ @app.call env if @app
+ end
end
end
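When every supplied token fails remote validation, the middleware above answers with the highest collected status code and a timestamped error token instead of calling the rest of the Rack stack. A Python sketch of the same response assembly (hypothetical function name; the `error_token` format is copied from the Ruby):

```python
import json
import random
import time

def remote_error_response(errcodes, errmsgs):
    """Build a Rack-style (status, headers, body) triple for remote
    token-validation failures, mirroring the middleware above."""
    messages = ["failed to validate remote token"] + list(errmsgs)
    token = "%d+%08x" % (int(time.time()), random.randrange(16 ** 8))
    body = json.dumps({"error_token": token, "errors": messages})
    # Report the worst (highest) status seen from any remote cluster.
    return max(errcodes), {"Content-Type": "application/json"}, body
```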
include HasUuid
include KindAndEtag
include CommonApiTemplate
+ include Rails.application.routes.url_helpers
extend CurrentApiClient
extend DbCurrentTime
def scopes_allow_request?(request)
method = request.request_method
- if method == 'HEAD'
+ if method == 'GET' and request.path == url_for(controller: 'arvados/v1/api_client_authorizations', action: 'current', only_path: true)
+ true
+ elsif method == 'HEAD'
(scopes_allow?(['HEAD', request.path].join(' ')) ||
scopes_allow?(['GET', request.path].join(' ')))
else
Rails.logger.warn "remote authentication rejected: no host for #{upstream_cluster_id.inspect}"
return nil
end
+ remote_url = URI::parse("https://#{host}/")
+ remote_query = {"remote" => Rails.configuration.ClusterID}
+ remote_headers = {"Authorization" => "Bearer #{token}"}
- begin
- remote_user = SafeJSON.load(
- clnt.get_content('https://' + host + '/arvados/v1/users/current',
- {'remote' => Rails.configuration.ClusterID},
- {'Authorization' => 'Bearer ' + token}))
- rescue => e
- Rails.logger.warn "remote authentication with token #{token.inspect} failed: #{e}"
- return nil
- end
-
- # Check the response is well formed.
- if !remote_user.is_a?(Hash) || !remote_user['uuid'].is_a?(String)
- Rails.logger.warn "remote authentication rejected: remote_user=#{remote_user.inspect}"
- return nil
- end
-
- remote_user_prefix = remote_user['uuid'][0..4]
-
- # Get token scope, and make sure we use the same UUID as the
- # remote when caching the token.
+ # First get the current token. This query is not limited by token scopes,
+ # and tells us the user's UUID via owner_uuid, so this gives us enough
+ # information to load a local user record from the database if one exists.
remote_token = nil
begin
remote_token = SafeJSON.load(
- clnt.get_content('https://' + host + '/arvados/v1/api_client_authorizations/current',
- {'remote' => Rails.configuration.ClusterID},
- {'Authorization' => 'Bearer ' + token}))
+ clnt.get_content(
+ remote_url.merge("arvados/v1/api_client_authorizations/current"),
+ remote_query, remote_headers,
+ ))
Rails.logger.debug "retrieved remote token #{remote_token.inspect}"
token_uuid = remote_token['uuid']
if !token_uuid.match(HasUuid::UUID_REGEX) || token_uuid[0..4] != upstream_cluster_id
raise "remote cluster #{upstream_cluster_id} returned invalid token uuid #{token_uuid.inspect}"
end
rescue HTTPClient::BadResponseError => e
- if e.res.status != 401
- raise
- end
- rev = SafeJSON.load(clnt.get_content('https://' + host + '/discovery/v1/apis/arvados/v1/rest'))['revision']
- if rev >= '20010101' && rev < '20210503'
- Rails.logger.warn "remote cluster #{upstream_cluster_id} at #{host} with api rev #{rev} does not provide token expiry and scopes; using scopes=['all']"
- else
- # remote server is new enough that it should have accepted
- # this request if the token was valid
- raise
+ # CurrentApiToken#call and ApplicationController#render_error will
+ # propagate the status code from the #http_status method, so define
+ # that here.
+ def e.http_status
+ self.res.status_code
end
+ raise
+ # TODO #20927: Catch network exceptions and assign a 5xx status to them so
+ # the client knows they're a temporary problem.
rescue => e
Rails.logger.warn "error getting remote token details for #{token.inspect}: #{e}"
return nil
end
- # Clusters can only authenticate for their own users.
- if remote_user_prefix != upstream_cluster_id
- Rails.logger.warn "remote authentication rejected: claimed remote user #{remote_user_prefix} but token was issued by #{upstream_cluster_id}"
- return nil
+ # Next, load the token's user record from the database (might be nil).
+ remote_user_prefix, remote_user_suffix = remote_token['owner_uuid'].split('-', 2)
+ if anonymous_user_uuid.end_with?(remote_user_suffix)
+ # Special case: map the remote anonymous user to local anonymous user
+ remote_user_uuid = anonymous_user_uuid
+ else
+ remote_user_uuid = remote_token['owner_uuid']
end
+ user = User.find_by_uuid(remote_user_uuid)
+ # Next, try to load the remote user. If this succeeds, we'll use this
+ # information to update/create the local database record as necessary.
+ # If this fails for any reason, but we successfully loaded a user record
+ # from the database, we'll just rely on that information.
+ remote_user = nil
+ begin
+ remote_user = SafeJSON.load(
+ clnt.get_content(
+ remote_url.merge("arvados/v1/users/current"),
+ remote_query, remote_headers,
+ ))
+ rescue HTTPClient::BadResponseError => e
+ # If user is defined, we will use that alone for auth, see below.
+ if user.nil?
+ # See rationale in the previous BadResponseError rescue.
+ def e.http_status
+ self.res.status_code
+ end
+ raise
+ end
+ # TODO #20927: Catch network exceptions and assign a 5xx status to them so
+ # the client knows they're a temporary problem.
+ rescue => e
+ Rails.logger.warn "getting remote user with token #{token.inspect} failed: #{e}"
+ else
+ # Check the response is well formed.
+ if !remote_user.is_a?(Hash) || !remote_user['uuid'].is_a?(String)
+ Rails.logger.warn "malformed remote user=#{remote_user.inspect}"
+ remote_user = nil
+ # Clusters can only authenticate for their own users.
+ elsif remote_user_prefix != upstream_cluster_id
+ Rails.logger.warn "remote user rejected: claimed remote user #{remote_user_prefix} but token was issued by #{upstream_cluster_id}"
+ remote_user = nil
+ # Force our local copy of a remote root to have a static name
+ elsif system_user_uuid.end_with?(remote_user_suffix)
+ remote_user.update(
+ "first_name" => "root",
+ "last_name" => "from cluster #{remote_user_prefix}",
+ )
+ end
+ end
+
+ if user.nil? and remote_user.nil?
+ Rails.logger.warn "remote token #{token.inspect} rejected: cannot get owner #{remote_user_uuid} from database or remote cluster"
+ return nil
# Invariant: remote_user_prefix == upstream_cluster_id
# therefore: remote_user_prefix != Rails.configuration.ClusterID
-
# Add or update user and token in local database so we can
# validate subsequent requests faster.
-
- if remote_user['uuid'][-22..-1] == '-tpzed-anonymouspublic'
- # Special case: map the remote anonymous user to local anonymous user
- remote_user['uuid'] = anonymous_user_uuid
- end
-
- user = User.find_by_uuid(remote_user['uuid'])
-
- if !user
+ elsif user.nil?
# Create a new record for this user.
user = User.new(uuid: remote_user['uuid'],
is_active: false,
user.set_initial_username(requested: remote_user['username'])
end
- # Sync user record.
+ # Sync user record if we loaded a remote user.
act_as_system_user do
- %w[first_name last_name email prefs].each do |attr|
- user.send(attr+'=', remote_user[attr])
- end
-
- if remote_user['uuid'][-22..-1] == '-tpzed-000000000000000'
- user.first_name = "root"
- user.last_name = "from cluster #{remote_user_prefix}"
- end
-
- begin
- user.save!
- rescue ActiveRecord::RecordInvalid, ActiveRecord::RecordNotUnique
- Rails.logger.debug("remote user #{remote_user['uuid']} already exists, retrying...")
- # Some other request won the race: retry fetching the user record.
- user = User.find_by_uuid(remote_user['uuid'])
- if !user
- Rails.logger.warn("cannot find or create remote user #{remote_user['uuid']}")
- return nil
+ if remote_user
+ %w[first_name last_name email prefs].each do |attr|
+ user.send(attr+'=', remote_user[attr])
end
- end
- if user.is_invited && !remote_user['is_invited']
- # Remote user is not "invited" state, they should be unsetup, which
- # also makes them inactive.
- user.unsetup
- else
- if !user.is_invited && remote_user['is_invited'] and
- (remote_user_prefix == Rails.configuration.Login.LoginCluster or
- Rails.configuration.Users.AutoSetupNewUsers or
- Rails.configuration.Users.NewUsersAreActive or
- Rails.configuration.RemoteClusters[remote_user_prefix].andand["ActivateUsers"])
- user.setup
+ begin
+ user.save!
+ rescue ActiveRecord::RecordInvalid, ActiveRecord::RecordNotUnique
+ Rails.logger.debug("remote user #{remote_user['uuid']} already exists, retrying...")
+ # Some other request won the race: retry fetching the user record.
+ user = User.find_by_uuid(remote_user['uuid'])
+ if !user
+ Rails.logger.warn("cannot find or create remote user #{remote_user['uuid']}")
+ return nil
+ end
end
- if !user.is_active && remote_user['is_active'] && user.is_invited and
- (remote_user_prefix == Rails.configuration.Login.LoginCluster or
- Rails.configuration.Users.NewUsersAreActive or
- Rails.configuration.RemoteClusters[remote_user_prefix].andand["ActivateUsers"])
- user.update_attributes!(is_active: true)
- elsif user.is_active && !remote_user['is_active']
- user.update_attributes!(is_active: false)
- end
+ if user.is_invited && !remote_user['is_invited']
+ # Remote user is not in "invited" state; they should be unsetup, which
+ # also makes them inactive.
+ user.unsetup
+ else
+ if !user.is_invited && remote_user['is_invited'] and
+ (remote_user_prefix == Rails.configuration.Login.LoginCluster or
+ Rails.configuration.Users.AutoSetupNewUsers or
+ Rails.configuration.Users.NewUsersAreActive or
+ Rails.configuration.RemoteClusters[remote_user_prefix].andand["ActivateUsers"])
+ user.setup
+ end
- if remote_user_prefix == Rails.configuration.Login.LoginCluster and
- user.is_active and
- user.is_admin != remote_user['is_admin']
- # Remote cluster controls our user database, including the
- # admin flag.
- user.update_attributes!(is_admin: remote_user['is_admin'])
+ if !user.is_active && remote_user['is_active'] && user.is_invited and
+ (remote_user_prefix == Rails.configuration.Login.LoginCluster or
+ Rails.configuration.Users.NewUsersAreActive or
+ Rails.configuration.RemoteClusters[remote_user_prefix].andand["ActivateUsers"])
+ user.update_attributes!(is_active: true)
+ elsif user.is_active && !remote_user['is_active']
+ user.update_attributes!(is_active: false)
+ end
+
+ if remote_user_prefix == Rails.configuration.Login.LoginCluster and
+ user.is_active and
+ user.is_admin != remote_user['is_admin']
+ # Remote cluster controls our user database, including the
+ # admin flag.
+ user.update_attributes!(is_admin: remote_user['is_admin'])
+ end
end
end
assert_not_empty(json_response['uuid'])
end
+ [
+ :active_noscope,
+ :active_all_collections,
+ :active_userlist,
+ :foo_collection_sharing_token,
+ ].each do |auth|
+ test "#{auth} can get current token without the appropriate scope" do
+ authorize_with auth
+ get :current
+ assert_response :success
+ end
+ end
+
test "get current token, no auth" do
get :current
assert_response 401
end
res.status = @stub_token_status
if res.status == 200
- res.body = {
+ body = {
uuid: api_client_authorizations(:active).uuid.sub('zzzzz', clusterid),
+ owner_uuid: "#{clusterid}-tpzed-00000000000000z",
scopes: @stub_token_scopes,
- }.to_json
+ }
+ if @stub_content.is_a?(Hash) and owner_uuid = @stub_content[:uuid]
+ body[:owner_uuid] = owner_uuid
+ end
+ res.body = body.to_json
end
end
Thread.new do
end
end
+ def uncache_token(src)
+ if match = src.match(/\b(?:[a-z0-9]{5}-){2}[a-z0-9]{15}\b/)
+ tokens = ApiClientAuthorization.where(uuid: match[0])
+ else
+ tokens = ApiClientAuthorization.where("uuid like ?", "#{src}-%")
+ end
+ tokens.update_all(expires_at: "1995-05-15T01:02:03Z")
+ end
+
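The `uncache_token` helper above accepts either a full token UUID embedded in a string or a bare cluster prefix. The UUID pattern it matches can be checked in isolation (a sketch; the regex mirrors the Ruby one):

```python
import re

# Arvados UUIDs: two 5-char groups and one 15-char group, dash-separated.
TOKEN_UUID_RE = re.compile(r"\b(?:[a-z0-9]{5}-){2}[a-z0-9]{15}\b")

def extract_token_uuid(src: str):
    """Return the first token UUID embedded in src, or None."""
    match = TOKEN_UUID_RE.search(src)
    return match.group(0) if match else None
```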
test 'authenticate with remote token that has limited scope' do
get '/arvados/v1/collections',
params: {format: 'json'},
headers: auth(remote: 'zbbbb')
assert_response :success
- # simulate cache expiry
- ApiClientAuthorization.where('uuid like ?', 'zbbbb-%').
- update_all(expires_at: db_current_time - 1.minute)
-
+ uncache_token('zbbbb')
# re-authorize after cache expires
get '/arvados/v1/collections',
params: {format: 'json'},
assert_response 403
end
+ test "authenticate with remote token with limited initial scope" do
+ @stub_token_scopes = ["GET /arvados/v1/users/"]
+ get "/arvados/v1/users/#{@stub_content[:uuid]}",
+ params: {format: "json"},
+ headers: auth(remote: "zbbbb")
+ assert_response :success
+ end
+
test 'authenticate with remote token' do
get '/arvados/v1/users/current',
params: {format: 'json'},
assert_equal 'barney', json_response['username']
# revoke original token
- @stub_status = 401
+ @stub_token_status = 401
# re-authorize before cache expires
get '/arvados/v1/users/current',
headers: auth(remote: 'zbbbb')
assert_response :success
- # simulate cache expiry
- ApiClientAuthorization.where('uuid like ?', 'zbbbb-%').
- update_all(expires_at: db_current_time - 1.minute)
-
+ uncache_token('zbbbb')
# re-authorize after cache expires
get '/arvados/v1/users/current',
params: {format: 'json'},
update_all(user_id: users(:active).id)
# revive original token and re-authorize
- @stub_status = 200
+ @stub_token_status = 200
@stub_content[:username] = 'blarney'
@stub_content[:email] = 'blarney@example.com'
get '/arvados/v1/users/current',
@stub_content[:is_active] = false
@stub_content[:is_invited] = false
- # simulate cache expiry
- ApiClientAuthorization.where(
- uuid: salted_active_token(remote: 'zbbbb').split('/')[1]).
- update_all(expires_at: db_current_time - 1.minute)
-
+ uncache_token('zbbbb')
# re-authorize after cache expires
get '/arvados/v1/users/current',
params: {format: 'json'},
assert_equal 'foo@example.com', json_response['email']
assert_equal 'barney', json_response['username']
- # Delete cached value. User should be inactive now.
- act_as_system_user do
- ApiClientAuthorization.delete_all
- end
-
+ uncache_token('zbbbb')
+ # User should be inactive now.
get '/arvados/v1/users/current',
params: {format: 'json'},
headers: auth(remote: 'zbbbb')
assert_equal 'zzzzz-tpzed-anonymouspublic', json_response['uuid']
end
+ [401, 403, 422, 500, 502, 503].each do |status|
+ test "propagate #{status} response from getting remote token" do
+ @stub_token_status = status
+ get "/arvados/v1/users/#{@stub_content[:uuid]}",
+ params: {format: "json"},
+ headers: auth(remote: "zbbbb")
+ assert_response status
+ end
+
+ test "propagate #{status} response from getting uncached user" do
+ @stub_status = status
+ get "/arvados/v1/users/#{@stub_content[:uuid]}",
+ params: {format: "json"},
+ headers: auth(remote: "zbbbb")
+ assert_response status
+ end
+
+ test "use cached user after getting #{status} response" do
+ url_path = "/arvados/v1/users/#{@stub_content[:uuid]}"
+ params = {format: "json"}
+ headers = auth(remote: "zbbbb")
+
+ get url_path, params: params, headers: headers
+ assert_response :success
+
+ uncache_token(headers["HTTP_AUTHORIZATION"])
+ expect_email = @stub_content[:email]
+ @stub_content[:email] = "new#{expect_email}"
+ @stub_status = status
+ get url_path, params: params, headers: headers
+ assert_response :success
+ user = User.find_by_uuid(@stub_content[:uuid])
+ assert_not_nil user
+ assert_equal expect_email, user.email
+ end
+ end
end
exit 1
fi
+USE_SSH_JUMPHOST=${USE_SSH_JUMPHOST:-}
+DISABLED_CONTROLLER=""
+
# Comma-separated list of nodes. This is used to dynamically adjust
# salt pillars.
NODELIST=""
{%- set _workers = ("__CONTROLLER_MAX_WORKERS__" or grains['num_cpus']*2)|int %}
{%- set max_workers = [_workers, 8]|max %}
{%- set max_reqs = ("__CONTROLLER_MAX_QUEUED_REQUESTS__" or 128)|int %}
+{%- set database_host = ("__DATABASE_EXTERNAL_SERVICE_HOST_OR_IP__" or "__DATABASE_INT_IP__") %}
+{%- set database_name = "__DATABASE_NAME__" %}
+{%- set database_user = "__DATABASE_USER__" %}
+{%- set database_password = "__DATABASE_PASSWORD__" %}
# The variables commented out are the default values that the formula uses.
# The uncommented values are REQUIRED values. If you don't set them, running
# this formula will fail.
database:
# max concurrent connections per arvados server daemon
# connection_pool_max: 32
- name: __CLUSTER___arvados
- host: __DATABASE_INT_IP__
- password: "__DATABASE_PASSWORD__"
- user: __CLUSTER___arvados
+ name: {{ database_name }}
+ host: {{ database_host }}
+ password: {{ database_password }}
+ user: {{ database_user }}
encoding: en_US.utf8
client_encoding: UTF8
Replication: 2
Driver: S3
DriverParameters:
- Bucket: __CLUSTER__-nyw5e-000000000000000-volume
- IAMRole: __CLUSTER__-keepstore-00-iam-role
+ Bucket: __KEEP_AWS_S3_BUCKET__
+ IAMRole: __KEEP_AWS_IAM_ROLE__
Region: __KEEP_AWS_REGION__
Users:
{%- set controller_nodes = "__CONTROLLER_NODES__".split(',') %}
{%- set enable_balancer = ("__ENABLE_BALANCER__"|to_bool) %}
+{%- set data_retention_time = "__PROMETHEUS_DATA_RETENTION_TIME__" %}
### PROMETHEUS
prometheus:
- alertmanager
- node_exporter
pkg:
- use_upstream_repo: true
+ use_upstream_repo: false
+ use_upstream_archive: true
component:
prometheus:
+ service:
+ args:
+ storage.tsdb.retention.time: {{ data_retention_time }}
config:
global:
scrape_interval: 15s
instance: arvados-dispatch-cloud.__CLUSTER__
cluster: __CLUSTER__
+ {%- if "__DATABASE_INT_IP__" != "" %}
# Database
- job_name: postgresql
static_configs:
labels:
instance: database.__CLUSTER__
cluster: __CLUSTER__
+ {%- endif %}
# Nodes
{%- set node_list = "__NODELIST__".split(',') %}
{%- set tpldir = curr_tpldir %}
#CRUDE, but functional
+
+{%- if "__DATABASE_INT_IP__" != "" %}
extra_extra_hosts_entries_etc_hosts_database_host_present:
host.present:
- ip: __DATABASE_INT_IP__
- names:
- db.{{ arvados.cluster.name }}.{{ arvados.cluster.domain }}
- database.{{ arvados.cluster.name }}.{{ arvados.cluster.domain }}
+{%- endif %}
extra_extra_hosts_entries_etc_hosts_api_host_present:
host.present:
#
# SPDX-License-Identifier: AGPL-3.0
+{%- set database_host = ("__DATABASE_EXTERNAL_SERVICE_HOST_OR_IP__" or "127.0.0.1") %}
+{%- set database_name = "__DATABASE_NAME__" %}
+{%- set database_user = "__DATABASE_USER__" %}
+{%- set database_password = "__DATABASE_PASSWORD__" %}
+
# The variables commented out are the default values that the formula uses.
# The uncommented values are REQUIRED values. If you don't set them, running
# this formula will fail.
database:
# max concurrent connections per arvados server daemon
# connection_pool_max: 32
- name: __CLUSTER___arvados
- host: 127.0.0.1
- password: "__DATABASE_PASSWORD__"
- user: __CLUSTER___arvados
+ name: {{ database_name }}
+ host: {{ database_host }}
+ password: {{ database_password }}
+ user: {{ database_user }}
extra_conn_params:
client_encoding: UTF8
# Centos7 does not enable SSL by default, so we disable
#
# SPDX-License-Identifier: AGPL-3.0
+{%- set database_host = ("__DATABASE_EXTERNAL_SERVICE_HOST_OR_IP__" or "127.0.0.1") %}
+{%- set database_name = "__DATABASE_NAME__" %}
+{%- set database_user = "__DATABASE_USER__" %}
+{%- set database_password = "__DATABASE_PASSWORD__" %}
+
# The variables commented out are the default values that the formula uses.
# The uncommented values are REQUIRED values. If you don't set them, running
# this formula will fail.
database:
# max concurrent connections per arvados server daemon
# connection_pool_max: 32
- name: __CLUSTER___arvados
- host: 127.0.0.1
- password: "__DATABASE_PASSWORD__"
- user: __CLUSTER___arvados
+ name: {{ database_name }}
+ host: {{ database_host }}
+ password: {{ database_password }}
+ user: {{ database_user }}
extra_conn_params:
client_encoding: UTF8
# Centos7 does not enable SSL by default, so we disable
declare SSH_CONFFILE
checktools() {
- local MISSING=''
- for a in git ip ; do
- if ! which $a ; then
- MISSING="$MISSING $a"
- fi
- done
- if [[ -n "$MISSING" ]] ; then
- echo "Some tools are missing, please make sure you have the 'git' and 'iproute2' packages installed"
- exit 1
+ local MISSING=''
+ for a in git ip; do
+ if ! which $a; then
+ MISSING="$MISSING $a"
fi
+ done
+ if [[ -n "$MISSING" ]]; then
+ echo "Some tools are missing; please make sure you have the 'git' and 'iproute2' packages installed"
+ exit 1
+ fi
}
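The `checktools` function above shells out to `which` for each required command. The same availability check can be sketched portably with Python's `shutil.which` (tool names here are examples):

```python
import shutil

def missing_tools(required):
    """Return the subset of required commands not found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]
```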
cleanup() {
- local NODE=$1
- local SSH=`ssh_cmd "$NODE"`
- # Delete the old repository
- $SSH $DEPLOY_USER@$NODE rm -rf ${GITTARGET}.git ${GITTARGET}
+ local NODE=$1
+ local SSH=$(ssh_cmd "$NODE")
+ # Delete the old repository
+ $SSH $DEPLOY_USER@$NODE rm -rf ${GITTARGET}.git ${GITTARGET}
}
sync() {
- local NODE=$1
- local BRANCH=$2
-
- # Synchronizes the configuration by creating a git repository on
- # each node, pushing our branch, and updating the checkout.
-
- if [[ "$NODE" != localhost ]] ; then
- SSH=`ssh_cmd "$NODE"`
- GIT="eval `git_cmd $NODE`"
+ local NODE=$1
+ local BRANCH=$2
- cleanup $NODE
+ # Synchronizes the configuration by creating a git repository on
+ # each node, pushing our branch, and updating the checkout.
- # Update the git remote for the remote repository.
- if ! $GIT remote add $NODE $DEPLOY_USER@$NODE:${GITTARGET}.git ; then
- $GIT remote set-url $NODE $DEPLOY_USER@$NODE:${GITTARGET}.git
- fi
+ if [[ "$NODE" != localhost ]]; then
+ SSH=$(ssh_cmd "$NODE")
+ GIT="eval $(git_cmd $NODE)"
- # Initialize the git repository. We're
- # actually going to make two repositories here because git
- # will complain if you try to push to a repository with a
- # checkout. So we're going to create a "bare" repository
- # and then clone a regular repository (with a checkout)
- # from that.
+ cleanup $NODE
- $SSH $DEPLOY_USER@$NODE git init --bare --shared=0600 ${GITTARGET}.git
- $GIT push $NODE $BRANCH
- $SSH $DEPLOY_USER@$NODE "umask 0077 && git clone -s ${GITTARGET}.git ${GITTARGET} && git -C ${GITTARGET} checkout ${BRANCH}"
+ # Update the git remote for the remote repository.
+ if ! $GIT remote add $NODE $DEPLOY_USER@$NODE:${GITTARGET}.git; then
+ $GIT remote set-url $NODE $DEPLOY_USER@$NODE:${GITTARGET}.git
fi
+
+ # Initialize the git repository. We're
+ # actually going to make two repositories here because git
+ # will complain if you try to push to a repository with a
+ # checkout. So we're going to create a "bare" repository
+ # and then clone a regular repository (with a checkout)
+ # from that.
+
+ $SSH $DEPLOY_USER@$NODE git init --bare --shared=0600 ${GITTARGET}.git
+ $GIT push $NODE $BRANCH
+ $SSH $DEPLOY_USER@$NODE "umask 0077 && git clone -s ${GITTARGET}.git ${GITTARGET} && git -C ${GITTARGET} checkout ${BRANCH}"
+ fi
}
deploynode() {
- local NODE=$1
- local ROLES=$2
- local BRANCH=$3
+ local NODE=$1
+ local ROLES=$2
+ local BRANCH=$3
- # Deploy a node. This runs the provision script on the node, with
- # the appropriate roles.
+ # Deploy a node. This runs the provision script on the node, with
+ # the appropriate roles.
- sync $NODE $BRANCH
+ sync $NODE $BRANCH
- if [[ -z "$ROLES" ]] ; then
- echo "No roles specified for $NODE, will deploy all roles"
- else
- ROLES="--roles ${ROLES}"
- fi
+ if [[ -z "$ROLES" ]]; then
+ echo "No roles specified for $NODE, will deploy all roles"
+ else
+ ROLES="--roles ${ROLES}"
+ fi
- logfile=deploy-${NODE}-$(date -Iseconds).log
- SSH=`ssh_cmd "$NODE"`
+ logfile=deploy-${NODE}-$(date -Iseconds).log
+ SSH=$(ssh_cmd "$NODE")
- if [[ "$NODE" = localhost ]] ; then
- SUDO=''
- if [[ $(whoami) != 'root' ]] ; then
- SUDO=sudo
- fi
- $SUDO ./provision.sh --config ${CONFIG_FILE} ${ROLES} 2>&1 | tee $logfile
- else
- $SSH $DEPLOY_USER@$NODE "cd ${GITTARGET} && git log -n1 HEAD && DISABLED_CONTROLLER=\"$DISABLED_CONTROLLER\" sudo --preserve-env=DISABLED_CONTROLLER ./provision.sh --config ${CONFIG_FILE} ${ROLES}" 2>&1 | tee $logfile
- cleanup $NODE
+ if [[ "$NODE" = localhost ]]; then
+ SUDO=''
+ if [[ $(whoami) != 'root' ]]; then
+ SUDO=sudo
fi
+ $SUDO ./provision.sh --config ${CONFIG_FILE} ${ROLES} 2>&1 | tee $logfile
+ else
+ $SSH $DEPLOY_USER@$NODE "cd ${GITTARGET} && git log -n1 HEAD && DISABLED_CONTROLLER=\"$DISABLED_CONTROLLER\" sudo --preserve-env=DISABLED_CONTROLLER ./provision.sh --config ${CONFIG_FILE} ${ROLES}" 2>&1 | tee $logfile
+ cleanup $NODE
+ fi
+}
+
+checkcert() {
+ local CERTNAME=$1
+ local CERTPATH="${CONFIG_DIR}/certs/${CERTNAME}"
+ if [[ ! -f "${CERTPATH}.crt" || ! -e "${CERTPATH}.key" ]]; then
+ echo "Missing ${CERTPATH}.crt or ${CERTPATH}.key files"
+ exit 1
+ fi
}
loadconfig() {
- if ! [[ -s ${CONFIG_FILE} && -s ${CONFIG_FILE}.secrets ]]; then
- echo "Must be run from initialized setup dir, maybe you need to 'initialize' first?"
- fi
- source common.sh
- GITTARGET=arvados-deploy-config-${CLUSTER}
-
- # Set up SSH so that it doesn't forward any environment variable. This is to avoid
- # getting "setlocale" errors on the first run, depending on the distro being used
- # to run the installer (like Debian).
- SSH_CONFFILE=$(mktemp)
- echo "Include config SendEnv -*" > ${SSH_CONFFILE}
+  if ! [[ -s ${CONFIG_FILE} && -s ${CONFIG_FILE}.secrets ]]; then
+    echo "Must be run from an initialized setup dir; maybe you need to run 'initialize' first?"
+    exit 1
+  fi
+ source common.sh
+ GITTARGET=arvados-deploy-config-${CLUSTER}
+
+ # Set up SSH so that it doesn't forward any environment variable. This is to avoid
+ # getting "setlocale" errors on the first run, depending on the distro being used
+ # to run the installer (like Debian).
+ SSH_CONFFILE=$(mktemp)
+ echo "Include config SendEnv -*" >${SSH_CONFFILE}
}
ssh_cmd() {
- local NODE=$1
- if [ -z "${USE_SSH_JUMPHOST}" -o "${NODE}" == "${USE_SSH_JUMPHOST}" -o "${NODE}" == "localhost" ]; then
- echo "ssh -F ${SSH_CONFFILE}"
- else
- echo "ssh -F ${SSH_CONFFILE} -J ${DEPLOY_USER}@${USE_SSH_JUMPHOST}"
- fi
+ local NODE=$1
+  if [[ -z "${USE_SSH_JUMPHOST}" || "${NODE}" == "${USE_SSH_JUMPHOST}" || "${NODE}" == "localhost" ]]; then
+ echo "ssh -F ${SSH_CONFFILE}"
+ else
+ echo "ssh -F ${SSH_CONFFILE} -J ${DEPLOY_USER}@${USE_SSH_JUMPHOST}"
+ fi
}
git_cmd() {
- local NODE=$1
- echo "GIT_SSH_COMMAND=\"`ssh_cmd ${NODE}`\" git"
+ local NODE=$1
+ echo "GIT_SSH_COMMAND=\"$(ssh_cmd ${NODE})\" git"
}
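To see what `ssh_cmd()` produces for each case, here is a hypothetical standalone run of the same selection logic (the config path, user, and host names below are made up for illustration):

```shell
#!/bin/bash
# Same selection logic as ssh_cmd() above, runnable standalone.
# SSH_CONFFILE, DEPLOY_USER and the host names are illustrative.
SSH_CONFFILE=/tmp/sshconf
DEPLOY_USER=admin
USE_SSH_JUMPHOST=jump.example.com

ssh_cmd() {
  local NODE=$1
  if [[ -z "${USE_SSH_JUMPHOST:-}" || "$NODE" == "$USE_SSH_JUMPHOST" || "$NODE" == localhost ]]; then
    echo "ssh -F ${SSH_CONFFILE}"
  else
    echo "ssh -F ${SSH_CONFFILE} -J ${DEPLOY_USER}@${USE_SSH_JUMPHOST}"
  fi
}

# localhost and the jumphost itself connect directly:
ssh_cmd localhost         # -> ssh -F /tmp/sshconf
ssh_cmd jump.example.com  # -> ssh -F /tmp/sshconf
# every other node is proxied through the jumphost with -J:
ssh_cmd keep0.internal    # -> ssh -F /tmp/sshconf -J admin@jump.example.com
```

Emitting the command as a string (rather than running ssh directly) is what lets `git_cmd()` reuse it verbatim inside `GIT_SSH_COMMAND`.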
set +u
subcmd="$1"
set -u
-if [[ -n "$subcmd" ]] ; then
- shift
+if [[ -n "$subcmd" ]]; then
+ shift
fi
case "$subcmd" in
- initialize)
- if [[ ! -f provision.sh ]] ; then
- echo "Must be run from arvados/tools/salt-install"
- exit
- fi
-
- checktools
-
- set +u
- SETUPDIR=$1
- PARAMS=$2
- SLS=$3
- TERRAFORM=$4
- set -u
-
- err=
- if [[ -z "$PARAMS" || ! -f local.params.example.$PARAMS ]] ; then
- echo "Not found: local.params.example.$PARAMS"
- echo "Expected one of multiple_hosts, single_host_multiple_hostnames, single_host_single_hostname"
- err=1
- fi
-
- if [[ -z "$SLS" || ! -d config_examples/$SLS ]] ; then
- echo "Not found: config_examples/$SLS"
- echo "Expected one of multi_host/aws, single_host/multiple_hostnames, single_host/single_hostname"
- err=1
- fi
-
- if [[ -z "$SETUPDIR" || -z "$PARAMS" || -z "$SLS" ]]; then
- echo "installer.sh <setup dir to initialize> <params template> <config template>"
- err=1
- fi
-
- if [[ -n "$err" ]] ; then
- exit 1
- fi
-
- echo "Initializing $SETUPDIR"
- git init --shared=0600 $SETUPDIR
- cp -r *.sh tests $SETUPDIR
-
- cp local.params.example.$PARAMS $SETUPDIR/${CONFIG_FILE}
- cp local.params.secrets.example $SETUPDIR/${CONFIG_FILE}.secrets
- cp -r config_examples/$SLS $SETUPDIR/${CONFIG_DIR}
-
- if [[ -n "$TERRAFORM" ]] ; then
- mkdir $SETUPDIR/terraform
- cp -r $TERRAFORM/* $SETUPDIR/terraform/
- fi
-
- cd $SETUPDIR
- echo '*.log' > .gitignore
- echo '**/.terraform' >> .gitignore
- echo '**/.infracost' >> .gitignore
-
- if [[ -n "$TERRAFORM" ]] ; then
- git add terraform
- fi
-
- git add *.sh ${CONFIG_FILE} ${CONFIG_FILE}.secrets ${CONFIG_DIR} tests .gitignore
- git commit -m"initial commit"
-
- echo
- echo "Setup directory $SETUPDIR initialized."
- if [[ -n "$TERRAFORM" ]] ; then
- (cd $SETUPDIR/terraform/vpc && terraform init)
- (cd $SETUPDIR/terraform/data-storage && terraform init)
- (cd $SETUPDIR/terraform/services && terraform init)
- echo "Now go to $SETUPDIR, customize 'terraform/vpc/terraform.tfvars' as needed, then run 'installer.sh terraform'"
- else
- echo "Now go to $SETUPDIR, customize '${CONFIG_FILE}', '${CONFIG_FILE}.secrets' and '${CONFIG_DIR}' as needed, then run 'installer.sh deploy'"
- fi
- ;;
-
- terraform)
- logfile=terraform-$(date -Iseconds).log
- (cd terraform/vpc && terraform apply -auto-approve) 2>&1 | tee -a $logfile
- (cd terraform/data-storage && terraform apply -auto-approve) 2>&1 | tee -a $logfile
- (cd terraform/services && terraform apply -auto-approve) 2>&1 | grep -v letsencrypt_iam_secret_access_key | tee -a $logfile
- (cd terraform/services && echo -n 'letsencrypt_iam_secret_access_key = ' && terraform output letsencrypt_iam_secret_access_key) 2>&1 | tee -a $logfile
- ;;
-
- terraform-destroy)
- logfile=terraform-$(date -Iseconds).log
- (cd terraform/services && terraform destroy) 2>&1 | tee -a $logfile
- (cd terraform/data-storage && terraform destroy) 2>&1 | tee -a $logfile
- (cd terraform/vpc && terraform destroy) 2>&1 | tee -a $logfile
- ;;
-
- generate-tokens)
- for i in BLOB_SIGNING_KEY MANAGEMENT_TOKEN SYSTEM_ROOT_TOKEN ANONYMOUS_USER_TOKEN WORKBENCH_SECRET_KEY DATABASE_PASSWORD; do
- echo ${i}=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 32 ; echo '')
- done
- ;;
-
- deploy)
- set +u
- NODE=$1
- set -u
-
- checktools
-
- loadconfig
-
- if grep -rni 'fixme' ${CONFIG_FILE} ${CONFIG_FILE}.secrets ${CONFIG_DIR} ; then
- echo
- echo "Some parameters still need to be updated. Please fix them and then re-run deploy."
- exit 1
- fi
-
- BRANCH=$(git rev-parse --abbrev-ref HEAD)
-
- set -x
-
- git add -A
- if ! git diff --cached --exit-code --quiet ; then
- git commit -m"prepare for deploy"
- fi
-
- # Used for rolling updates to disable individual nodes at the
- # load balancer.
- export DISABLED_CONTROLLER=""
- if [[ -z "$NODE" ]]; then
- for NODE in "${!NODES[@]}"
- do
- # First, just confirm we can ssh to each node.
- `ssh_cmd "$NODE"` $DEPLOY_USER@$NODE true
- done
-
- for NODE in "${!NODES[@]}"
- do
- # Do 'database' role first,
- if [[ "${NODES[$NODE]}" =~ database ]] ; then
- deploynode $NODE "${NODES[$NODE]}" $BRANCH
- unset NODES[$NODE]
- fi
- done
-
- BALANCER=${ROLE2NODES['balancer']:-}
-
- # Check if there are multiple controllers, they'll be comma-separated
- # in ROLE2NODES
- if [[ ${ROLE2NODES['controller']} =~ , ]] ;
- then
- # If we have multiple controllers then there must be
- # load balancer. We want to do a rolling update, take
- # down each node at the load balancer before updating
- # it.
-
- for NODE in "${!NODES[@]}"
- do
- if [[ "${NODES[$NODE]}" =~ controller ]] ; then
- export DISABLED_CONTROLLER=$NODE
-
- # Update balancer that the node is disabled
- deploynode $BALANCER "${NODES[$BALANCER]}" $BRANCH
-
- # Now update the node itself
- deploynode $NODE "${NODES[$NODE]}" $BRANCH
- unset NODES[$NODE]
- fi
- done
- else
- # Only one controller
- NODE=${ROLE2NODES['controller']}
- deploynode $NODE "${NODES[$NODE]}" $BRANCH
- unset NODES[$NODE]
- fi
-
- if [[ -n "$BALANCER" ]] ; then
- # Deploy balancer. In the rolling update case, this
- # will re-enable all the controllers at the balancer.
- export DISABLED_CONTROLLER=""
- deploynode $BALANCER "${NODES[$BALANCER]}" $BRANCH
- unset NODES[$BALANCER]
- fi
-
- for NODE in "${!NODES[@]}"
- do
- # Everything else (we removed the nodes that we
- # already deployed from the list)
- deploynode $NODE "${NODES[$NODE]}" $BRANCH
- done
- else
- # Just deploy the node that was supplied on the command line.
- deploynode $NODE "${NODES[$NODE]}" $BRANCH
- fi
-
- set +x
- echo
- echo "Completed deploy, run 'installer.sh diagnostics' to verify the install"
-
- ;;
-
- diagnostics)
- loadconfig
-
- set +u
- declare LOCATION=$1
- set -u
-
- if ! which arvados-client ; then
- echo "arvados-client not found, install 'arvados-client' package with 'apt-get' or 'yum'"
- exit 1
- fi
-
- if [[ -z "$LOCATION" ]] ; then
- echo "Need to provide '-internal-client' or '-external-client'"
- echo
- echo "-internal-client You are running this on the same private network as the Arvados cluster (e.g. on one of the Arvados nodes)"
- echo "-external-client You are running this outside the private network of the Arvados cluster (e.g. your workstation)"
- exit 1
- fi
-
- export ARVADOS_API_HOST="${DOMAIN}:${CONTROLLER_EXT_SSL_PORT}"
- export ARVADOS_API_TOKEN="$SYSTEM_ROOT_TOKEN"
-
- arvados-client diagnostics $LOCATION
- ;;
-
- *)
- echo "Arvados installer"
- echo ""
- echo "initialize initialize the setup directory for configuration"
- echo "terraform create cloud resources using terraform"
- echo "terraform-destroy destroy cloud resources created by terraform"
- echo "generate-tokens generate random values for tokens"
- echo "deploy deploy the configuration from the setup directory"
- echo "diagnostics check your install using diagnostics"
- ;;
+initialize)
+  if [[ ! -f provision.sh ]]; then
+    echo "Must be run from arvados/tools/salt-install"
+    exit 1
+  fi
+
+ checktools
+
+ set +u
+ SETUPDIR=$1
+ PARAMS=$2
+ SLS=$3
+ TERRAFORM=$4
+ set -u
+
+ err=
+ if [[ -z "$PARAMS" || ! -f local.params.example.$PARAMS ]]; then
+ echo "Not found: local.params.example.$PARAMS"
+ echo "Expected one of multiple_hosts, single_host_multiple_hostnames, single_host_single_hostname"
+ err=1
+ fi
+
+ if [[ -z "$SLS" || ! -d config_examples/$SLS ]]; then
+ echo "Not found: config_examples/$SLS"
+ echo "Expected one of multi_host/aws, single_host/multiple_hostnames, single_host/single_hostname"
+ err=1
+ fi
+
+ if [[ -z "$SETUPDIR" || -z "$PARAMS" || -z "$SLS" ]]; then
+ echo "installer.sh <setup dir to initialize> <params template> <config template>"
+ err=1
+ fi
+
+ if [[ -n "$err" ]]; then
+ exit 1
+ fi
+
+ echo "Initializing $SETUPDIR"
+ git init --shared=0600 $SETUPDIR
+ cp -r *.sh tests $SETUPDIR
+
+ cp local.params.example.$PARAMS $SETUPDIR/${CONFIG_FILE}
+ cp local.params.secrets.example $SETUPDIR/${CONFIG_FILE}.secrets
+ cp -r config_examples/$SLS $SETUPDIR/${CONFIG_DIR}
+
+ if [[ -n "$TERRAFORM" ]]; then
+ mkdir $SETUPDIR/terraform
+ cp -r $TERRAFORM/* $SETUPDIR/terraform/
+ fi
+
+ cd $SETUPDIR
+ echo '*.log' >.gitignore
+ echo '**/.terraform' >>.gitignore
+ echo '**/.infracost' >>.gitignore
+
+ if [[ -n "$TERRAFORM" ]]; then
+ git add terraform
+ fi
+
+ git add *.sh ${CONFIG_FILE} ${CONFIG_FILE}.secrets ${CONFIG_DIR} tests .gitignore
+ git commit -m"initial commit"
+
+ echo
+ echo "Setup directory $SETUPDIR initialized."
+ if [[ -n "$TERRAFORM" ]]; then
+ (cd $SETUPDIR/terraform/vpc && terraform init)
+ (cd $SETUPDIR/terraform/data-storage && terraform init)
+ (cd $SETUPDIR/terraform/services && terraform init)
+ echo "Now go to $SETUPDIR, customize 'terraform/vpc/terraform.tfvars' as needed, then run 'installer.sh terraform'"
+ else
+ echo "Now go to $SETUPDIR, customize '${CONFIG_FILE}', '${CONFIG_FILE}.secrets' and '${CONFIG_DIR}' as needed, then run 'installer.sh deploy'"
+ fi
+ ;;
+
+terraform)
+ logfile=terraform-$(date -Iseconds).log
+ (cd terraform/vpc && terraform apply -auto-approve) 2>&1 | tee -a $logfile
+ (cd terraform/data-storage && terraform apply -auto-approve) 2>&1 | tee -a $logfile
+ (cd terraform/services && terraform apply -auto-approve) 2>&1 | grep -v letsencrypt_iam_secret_access_key | tee -a $logfile
+ (cd terraform/services && echo -n 'letsencrypt_iam_secret_access_key = ' && terraform output letsencrypt_iam_secret_access_key) 2>&1 | tee -a $logfile
+ ;;
+
+terraform-destroy)
+ logfile=terraform-$(date -Iseconds).log
+ (cd terraform/services && terraform destroy) 2>&1 | tee -a $logfile
+ (cd terraform/data-storage && terraform destroy) 2>&1 | tee -a $logfile
+ (cd terraform/vpc && terraform destroy) 2>&1 | tee -a $logfile
+ ;;
+
+generate-tokens)
+ for i in BLOB_SIGNING_KEY MANAGEMENT_TOKEN SYSTEM_ROOT_TOKEN ANONYMOUS_USER_TOKEN WORKBENCH_SECRET_KEY DATABASE_PASSWORD; do
+ echo ${i}=$(
+ tr -dc A-Za-z0-9 </dev/urandom | head -c 32
+ echo ''
+ )
+ done
+ ;;
+
+deploy)
+ set +u
+ NODE=$1
+ set -u
+
+ checktools
+
+ loadconfig
+
+ if grep -rni 'fixme' ${CONFIG_FILE} ${CONFIG_FILE}.secrets ${CONFIG_DIR}; then
+ echo
+ echo "Some parameters still need to be updated. Please fix them and then re-run deploy."
+ exit 1
+ fi
+
+  if [[ ${SSL_MODE} == "bring-your-own" ]]; then
+    if [[ -n "${ROLE2NODES['balancer']:-}" ]]; then
+      checkcert balancer
+    fi
+    if [[ -n "${ROLE2NODES['controller']:-}" ]]; then
+      checkcert controller
+    fi
+    if [[ -n "${ROLE2NODES['keepproxy']:-}" ]]; then
+      checkcert keepproxy
+    fi
+    if [[ -n "${ROLE2NODES['keepweb']:-}" ]]; then
+      checkcert collections
+      checkcert download
+    fi
+    if [[ -n "${ROLE2NODES['monitoring']:-}" ]]; then
+      checkcert grafana
+      checkcert prometheus
+    fi
+    if [[ -n "${ROLE2NODES['webshell']:-}" ]]; then
+      checkcert webshell
+    fi
+    if [[ -n "${ROLE2NODES['websocket']:-}" ]]; then
+      checkcert websocket
+    fi
+    if [[ -n "${ROLE2NODES['workbench']:-}" ]]; then
+      checkcert workbench
+    fi
+    if [[ -n "${ROLE2NODES['workbench2']:-}" ]]; then
+      checkcert workbench2
+    fi
+  fi
+
+ BRANCH=$(git rev-parse --abbrev-ref HEAD)
+
+ set -x
+
+ git add -A
+ if ! git diff --cached --exit-code --quiet; then
+ git commit -m"prepare for deploy"
+ fi
+
+ # Used for rolling updates to disable individual nodes at the
+ # load balancer.
+ export DISABLED_CONTROLLER=""
+ if [[ -z "$NODE" ]]; then
+ for NODE in "${!NODES[@]}"; do
+ # First, just confirm we can ssh to each node.
+ $(ssh_cmd "$NODE") $DEPLOY_USER@$NODE true
+ done
+
+ for NODE in "${!NODES[@]}"; do
+      # Do the 'database' role first.
+ if [[ "${NODES[$NODE]}" =~ database ]]; then
+ deploynode $NODE "${NODES[$NODE]}" $BRANCH
+ unset NODES[$NODE]
+ fi
+ done
+
+ BALANCER=${ROLE2NODES['balancer']:-}
+
+    # Check whether there are multiple controllers; if so, they'll be
+    # comma-separated in ROLE2NODES.
+ if [[ ${ROLE2NODES['controller']} =~ , ]]; then
+      # If we have multiple controllers, there must be a load
+      # balancer. We want to do a rolling update: take each node
+      # out of the load balancer pool before updating it.
+
+ for NODE in "${!NODES[@]}"; do
+ if [[ "${NODES[$NODE]}" =~ controller ]]; then
+ export DISABLED_CONTROLLER=$NODE
+
+ # Update balancer that the node is disabled
+ deploynode $BALANCER "${NODES[$BALANCER]}" $BRANCH
+
+ # Now update the node itself
+ deploynode $NODE "${NODES[$NODE]}" $BRANCH
+ unset NODES[$NODE]
+ fi
+ done
+ else
+      # Only one controller; deploy it unless it was already handled above.
+ NODE=${ROLE2NODES['controller']}
+      if [[ -n "${NODES[$NODE]:-}" ]]; then
+ deploynode $NODE "${NODES[$NODE]}" $BRANCH
+ unset NODES[$NODE]
+ fi
+ fi
+
+ if [[ -n "$BALANCER" ]]; then
+ # Deploy balancer. In the rolling update case, this
+ # will re-enable all the controllers at the balancer.
+ export DISABLED_CONTROLLER=""
+ deploynode $BALANCER "${NODES[$BALANCER]}" $BRANCH
+ unset NODES[$BALANCER]
+ fi
+
+ for NODE in "${!NODES[@]}"; do
+ # Everything else (we removed the nodes that we
+ # already deployed from the list)
+ deploynode $NODE "${NODES[$NODE]}" $BRANCH
+ done
+ else
+ # Just deploy the node that was supplied on the command line.
+ deploynode $NODE "${NODES[$NODE]}" $BRANCH
+ fi
+
+ set +x
+ echo
+ echo "Completed deploy, run 'installer.sh diagnostics' to verify the install"
+
+ ;;
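The ordering the `deploy` arm enforces can be sketched as a dry run, with `deploynode()` replaced by a stub that just records the order (node names below are hypothetical):

```shell
#!/bin/bash
# Dry run of the deploy ordering above; deploynode() is a stub that
# records the order, and the node names are made up.
declare -A NODES=([db0]=database [ctl0]=controller [ctl1]=controller
                  [lb0]=balancer [keep0]=keepstore)
BALANCER=lb0
order=()
deploynode() { order+=("$1"); }

# 'database' role goes first.
deploynode db0; unset "NODES[db0]"

# Rolling update: for each controller, redeploy the balancer with the
# controller marked disabled, then the controller itself.
for NODE in ctl0 ctl1; do
  DISABLED_CONTROLLER=$NODE
  deploynode $BALANCER
  deploynode $NODE
  unset "NODES[$NODE]"
done

# Re-enable everything at the balancer, then deploy what's left.
DISABLED_CONTROLLER=""
deploynode $BALANCER; unset "NODES[$BALANCER]"
for NODE in "${!NODES[@]}"; do deploynode $NODE; done

echo "${order[@]}"  # db0 lb0 ctl0 lb0 ctl1 lb0 keep0
```

Deleting each node from `NODES` as it is handled is what lets the final catch-all loop deploy "everything else" without repeats.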
+
+diagnostics)
+ loadconfig
+
+ set +u
+ declare LOCATION=$1
+ set -u
+
+ if ! which arvados-client; then
+ echo "arvados-client not found, install 'arvados-client' package with 'apt-get' or 'yum'"
+ exit 1
+ fi
+
+ if [[ -z "$LOCATION" ]]; then
+ echo "Need to provide '-internal-client' or '-external-client'"
+ echo
+ echo "-internal-client You are running this on the same private network as the Arvados cluster (e.g. on one of the Arvados nodes)"
+ echo "-external-client You are running this outside the private network of the Arvados cluster (e.g. your workstation)"
+ exit 1
+ fi
+
+ export ARVADOS_API_HOST="${DOMAIN}:${CONTROLLER_EXT_SSL_PORT}"
+ export ARVADOS_API_TOKEN="$SYSTEM_ROOT_TOKEN"
+
+ arvados-client diagnostics $LOCATION
+ ;;
+
+*)
+ echo "Arvados installer"
+ echo ""
+ echo "initialize initialize the setup directory for configuration"
+ echo "terraform create cloud resources using terraform"
+ echo "terraform-destroy destroy cloud resources created by terraform"
+ echo "generate-tokens generate random values for tokens"
+ echo "deploy deploy the configuration from the setup directory"
+ echo "diagnostics check your install using diagnostics"
+ ;;
esac
COMPUTE_AWS_REGION="${AWS_REGION}"
COMPUTE_USER="${DEPLOY_USER}"
-# Keep S3 backend region
+# Keep S3 backend settings
KEEP_AWS_REGION="${AWS_REGION}"
+KEEP_AWS_S3_BUCKET="${CLUSTER}-nyw5e-000000000000000-volume"
+KEEP_AWS_IAM_ROLE="${CLUSTER}-keepstore-00-iam-role"
# If you're going to provide your own certificates for Arvados, the provision script can
# help you deploy them. In order to do that, you need to set `SSL_MODE=bring-your-own` above,
# Customize Prometheus & Grafana web UI access credentials
MONITORING_USERNAME=${INITIAL_USER}
MONITORING_EMAIL=${INITIAL_USER_EMAIL}
+
# Sets the directory for Grafana dashboards
# GRAFANA_DASHBOARDS_DIR="${SCRIPT_DIR}/local_config_dir/dashboards"
+# Sets the amount of data (expressed in time) Prometheus keeps on its
+# time-series database. Default is 15 days.
+# PROMETHEUS_DATA_RETENTION_TIME="180d"
+
# The mapping of nodes to roles
# installer.sh will log in to each of these nodes and then provision
# it for the specified roles.
KEEPSTORE0_INT_IP=10.1.2.13
SHELL_INT_IP=10.1.2.17
-# In a load balanced deployment, you can do rolling upgrades by specifying one
-# controller node name at a time, so that it gets removed from the pool and can
-# be upgraded.
-DISABLED_CONTROLLER=""
+DATABASE_NAME="${CLUSTER}_arvados"
+DATABASE_USER="${CLUSTER}_arvados"
+# Set this if using an external PostgreSQL service.
+#DATABASE_EXTERNAL_SERVICE_HOST_OR_IP=
# Performance tuning parameters. If these are not set, workers
# defaults on the number of cpus and queued requests defaults to 128.
KEEPSTORE0_INT_IP=""
SHELL_INT_IP=""
-DISABLED_CONTROLLER=""
+DATABASE_NAME="${CLUSTER}_arvados"
+DATABASE_USER="${CLUSTER}_arvados"
+# Set this if using an external PostgreSQL service.
+#DATABASE_EXTERNAL_SERVICE_HOST_OR_IP=
# The directory to check for the config files (pillars, states) you want to use.
# There are a few examples under 'config_examples'.
KEEPSTORE0_INT_IP=""
SHELL_INT_IP=""
-DISABLED_CONTROLLER=""
+DATABASE_NAME="${CLUSTER}_arvados"
+DATABASE_USER="${CLUSTER}_arvados"
+# Set this if using an external PostgreSQL service.
+#DATABASE_EXTERNAL_SERVICE_HOST_OR_IP=
# The directory to check for the config files (pillars, states) you want to use.
# There are a few examples under 'config_examples'.
s#__INITIAL_USER_PASSWORD__#${INITIAL_USER_PASSWORD}#g;
s#__INITIAL_USER__#${INITIAL_USER}#g;
s#__LE_AWS_REGION__#${LE_AWS_REGION:-}#g;
- s#__LE_AWS_SECRET_ACCESS_KEY__#${LE_AWS_SECRET_ACCESS_KEY}#g;
- s#__LE_AWS_ACCESS_KEY_ID__#${LE_AWS_ACCESS_KEY_ID}#g;
+ s#__LE_AWS_SECRET_ACCESS_KEY__#${LE_AWS_SECRET_ACCESS_KEY:-}#g;
+ s#__LE_AWS_ACCESS_KEY_ID__#${LE_AWS_ACCESS_KEY_ID:-}#g;
+ s#__DATABASE_NAME__#${DATABASE_NAME}#g;
+ s#__DATABASE_USER__#${DATABASE_USER}#g;
s#__DATABASE_PASSWORD__#${DATABASE_PASSWORD}#g;
+ s#__DATABASE_INT_IP__#${DATABASE_INT_IP:-}#g;
+ s#__DATABASE_EXTERNAL_SERVICE_HOST_OR_IP__#${DATABASE_EXTERNAL_SERVICE_HOST_OR_IP:-}#g;
s#__KEEPWEB_EXT_SSL_PORT__#${KEEPWEB_EXT_SSL_PORT}#g;
s#__KEEP_EXT_SSL_PORT__#${KEEP_EXT_SSL_PORT}#g;
s#__MANAGEMENT_TOKEN__#${MANAGEMENT_TOKEN}#g;
s#__SHELL_INT_IP__#${SHELL_INT_IP}#g;
s#__WORKBENCH1_INT_IP__#${WORKBENCH1_INT_IP}#g;
s#__WORKBENCH2_INT_IP__#${WORKBENCH2_INT_IP}#g;
- s#__DATABASE_INT_IP__#${DATABASE_INT_IP}#g;
s#__WORKBENCH_SECRET_KEY__#${WORKBENCH_SECRET_KEY}#g;
s#__SSL_KEY_ENCRYPTED__#${SSL_KEY_ENCRYPTED}#g;
s#__SSL_KEY_AWS_REGION__#${SSL_KEY_AWS_REGION:-}#g;
s#__DISABLED_CONTROLLER__#${DISABLED_CONTROLLER}#g;
s#__BALANCER_NODENAME__#${ROLE2NODES['balancer']:-}#g;
s#__PROMETHEUS_NODENAME__#${ROLE2NODES['monitoring']:-}#g;
+ s#__PROMETHEUS_DATA_RETENTION_TIME__#${PROMETHEUS_DATA_RETENTION_TIME:-15d}#g;
s#__CONTROLLER_NODES__#${ROLE2NODES['controller']:-}#g;
s#__NODELIST__#${NODELIST}#g;
s#__DISPATCHER_INT_IP__#${DISPATCHER_INT_IP}#g;
s#__COMPUTE_SUBNET__#${COMPUTE_SUBNET:-}#g;
s#__COMPUTE_AWS_REGION__#${COMPUTE_AWS_REGION:-}#g;
s#__COMPUTE_USER__#${COMPUTE_USER:-}#g;
+ s#__KEEP_AWS_S3_BUCKET__#${KEEP_AWS_S3_BUCKET:-}#g;
+ s#__KEEP_AWS_IAM_ROLE__#${KEEP_AWS_IAM_ROLE:-}#g;
s#__KEEP_AWS_REGION__#${KEEP_AWS_REGION:-}#g" \
"${SRCFILE}" > "${DSTFILE}"
}
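The `s#__NAME__#value#g` lines above are plain sed templating applied per config file. A hypothetical two-placeholder miniature (variable values and template content are made up):

```shell
#!/bin/bash
set -e
# Miniature of the placeholder substitution above; the values and the
# template content are illustrative only.
DATABASE_NAME=xarv1_arvados
DATABASE_USER=xarv1_user
SRCFILE=$(mktemp)
DSTFILE=$(mktemp)
printf 'dbname: __DATABASE_NAME__\ndbuser: __DATABASE_USER__\n' >"$SRCFILE"
# '#' as the s/// delimiter keeps values containing '/' safe.
sed "s#__DATABASE_NAME__#${DATABASE_NAME}#g;
     s#__DATABASE_USER__#${DATABASE_USER}#g" "$SRCFILE" >"$DSTFILE"
cat "$DSTFILE"
# dbname: xarv1_arvados
# dbuser: xarv1_user
```

Note the `${VAR:-}` defaults used in the real sed program: they keep `set -u` from aborting when an optional setting (e.g. `DATABASE_EXTERNAL_SERVICE_HOST_OR_IP`) is unset, substituting an empty string instead.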
test -d arvados || git clone --quiet https://git.arvados.org/arvados-formula.git ${F_DIR}/arvados
# If we want to try a specific branch of the formula
-if [ "x${BRANCH:-}" != "xmain" ]; then
+if [[ -n "${BRANCH:-}" && "${BRANCH}" != "main" ]]; then
( cd ${F_DIR}/arvados && git checkout --quiet -t origin/"${BRANCH}" -b "${BRANCH}" )
elif [ "x${ARVADOS_TAG:-}" != "x" ]; then
( cd ${F_DIR}/arvados && git checkout --quiet tags/"${ARVADOS_TAG}" -b "${ARVADOS_TAG}" )