navsection: installguide
title: Multi-Host Arvados

Copyright (C) The Arvados Authors. All rights reserved.

SPDX-License-Identifier: CC-BY-SA-3.0

# "Introduction":#introduction
# "Prerequisites and planning":#prerequisites
## "Create AWS infrastructure with Terraform":#terraform
## "Create required infrastructure manually":#inframanual
# "Download the installer":#download
# "Initialize the installer":#copy_config
# "Edit local.params":#localparams
# "Configure Keep storage":#keep
# "Choose the SSL configuration":#certificates
## "Using a Let's Encrypt certificate":#lets-encrypt
## "Bring your own certificates":#bring-your-own
# "Create a compute image":#create_a_compute_image
# "Begin installation":#installation
# "Further customization of the installation":#further_customization
# "Confirm the cluster is working":#test-install
## "Debugging issues":#debugging
## "Iterating on config changes":#iterating
## "Common problems and solutions":#common-problems
# "Initial user and login":#initial_user
# "After the installation":#post_install

h2(#introduction). Introduction

This multi-host installer is the recommended way to set up a production Arvados cluster. These instructions include specific details for installing on Amazon Web Services (AWS), which are marked as "AWS specific". However, with additional customization the installer can be used as a template for deployment on other cloud providers or HPC systems.

h2(#prerequisites). Prerequisites and planning

h3. Cluster ID and base domain

Choose a 5-character cluster identifier that will represent the cluster. Here are "guidelines on choosing a cluster identifier":../architecture/federation.html#cluster_id . Only lowercase letters and digits 0-9 are allowed. Examples will use @xarv1@ or @${CLUSTER}@; you should substitute the cluster ID you have selected.

Determine the base domain for the cluster. This will be referred to as @${DOMAIN}@.

For example, if CLUSTER is @xarv1@ and DOMAIN is @example.com@, then @controller.${CLUSTER}.${DOMAIN}@ means @controller.xarv1.example.com@.

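The naming rules above can be checked before you go any further. The following is a minimal sketch, not part of the installer: @valid_cluster_id@ and @service_fqdn@ are hypothetical helper names, and the values of @CLUSTER@ and @DOMAIN@ are the illustrative ones used throughout this page.

```shell
#!/bin/sh
# Force the C locale so that [a-z] matches only ASCII lowercase letters.
LC_ALL=C
export LC_ALL

# Illustrative values; substitute your own cluster ID and base domain.
CLUSTER="xarv1"
DOMAIN="example.com"

# Hypothetical helper: succeeds only for exactly five characters,
# each of which is a lowercase letter or a digit.
valid_cluster_id() {
    case "$1" in
        [a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: expands SERVICE.${CLUSTER}.${DOMAIN}.
service_fqdn() {
    printf '%s.%s.%s\n' "$1" "$CLUSTER" "$DOMAIN"
}

if valid_cluster_id "$CLUSTER"; then
    service_fqdn controller
else
    echo "invalid cluster ID: $CLUSTER" >&2
fi
```

With the illustrative values, the script prints the controller hostname shown in the example above.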
h3(#terraform). Create AWS infrastructure with Terraform

To simplify the tedious and error-prone process of building a working cloud infrastructure for your Arvados cluster, we provide a set of Terraform code files that you can run against Amazon Web Services.

These files are located in the @tools/salt-install/terraform/aws/@ directory and are divided into three sections:

# The @vpc/@ subdirectory controls the network-related infrastructure of your cluster, including firewall rules and split-horizon DNS resolution.
# The @data-storage/@ subdirectory controls the stateful part of your cluster. Currently it only sets up the S3 bucket that holds the Keep blocks; in the future it will also manage the database service.
# The @services/@ subdirectory controls the hosts that will run the different services on your cluster, and makes sure that they have the required software for the installer to do its job.

h4. Software requirements & considerations

In addition to having the "Terraform CLI":https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli tool installed on your computer, you'll also need the "AWS CLI":https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html tool and proper credentials already configured.

Once all the required tools are present, as a first step you should run @terraform init@ inside each subdirectory, so that all the required modules get downloaded. If you miss this step, running @terraform apply@ will exit with a message asking you to run it.

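As a sketch of the initialization step, the script below prints one @terraform init@ command per section, in the order the sections are listed above. It assumes you run it from the root of the Arvados source tree, and @print_init_commands@ is a hypothetical helper name. Nothing is executed; the commands are only printed so you can review them first.

```shell
#!/bin/sh
# Directory named in this guide, relative to the Arvados source tree
# (assumption: the script is run from the root of that tree).
TERRAFORM_DIR="tools/salt-install/terraform/aws"

# Hypothetical helper: print one "terraform init" command per section.
print_init_commands() {
    for stage in vpc data-storage services; do
        printf '(cd %s/%s && terraform init)\n' "$TERRAFORM_DIR" "$stage"
    done
}

print_init_commands
```

Once you are happy with the printed commands, you can run them by hand, or pipe the output to @sh@.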
{% include 'notebox_begin' %}
The Terraform state files (which keep crucial infrastructure information from the cloud) will be saved inside each subdirectory, under the @terraform.tfstate@ name. You should keep these files secure as they contain unencrypted secrets. Researching best practices for state file management is left as an exercise to the reader.
{% include 'notebox_end' %}

h4. Terraform code configuration

Each section described above contains a @terraform.tfvars@ file with some configuration values that you should set before applying each configuration. At a minimum, you'll need to set the cluster prefix and domain name in @vpc/terraform.tfvars@:

<pre><code>region_name = "us-east-1"
# cluster_name = "xarv1"
# domain_name = "example.com"</code></pre>

The other @data-storage/terraform.tfvars@ and @services/terraform.tfvars@ files already have sensible defaults so you may not need to modify them.

h4. Create the infrastructure

The whole infrastructure needs to be built in stages by running @terraform apply@ inside each subdirectory in the order they are listed above. Each stage will output information that is needed by a following stage. The last stage @services/@ will output the information needed to set up the cluster's domain and continue with the installer, for example:

<pre><code>$ terraform apply
Apply complete! Resources: 16 added, 0 changed, 0 destroyed.
arvados_sg_id = "sg-02fa04a2c273166d7"
cluster_name = "xarv1"
domain_name = "example.com"
letsencrypt_iam_access_key_id = "AKIA43MU4DW7K57DBVSD"
letsencrypt_iam_secret_access_key = <sensitive>
"controller" = "10.1.1.1"
"keepproxy" = "10.1.1.2"
"workbench" = "10.1.1.5"
"controller" = "18.235.116.23"
"keep0" = "34.202.85.86"
"keep1" = "38.22.123.98"
"keepproxy" = "34.231.9.201"
"shell" = "44.208.155.240"
"workbench" = "52.204.134.136"
route53_dns_ns = tolist([
"ns-1119.awsdns-11.org",
"ns-1812.awsdns-34.co.uk",
"ns-437.awsdns-54.com",
"ns-809.awsdns-37.net",
subnet_id = "subnet-072a139f03938b710"
vpc_cidr = "10.1.0.0/16"
vpc_id = "vpc-0934aa4738300423a"
</code></pre>

You'll see that the @letsencrypt_iam_secret_access_key@ data is obscured; to retrieve it you'll need to run the following command inside the @services/@ subdirectory:

<pre><code>$ terraform output letsencrypt_iam_secret_access_key
"FQ3+3lnBOtWUu+Nw+qb3RiAGqE7DxV9jFC+XTARl"</code></pre>

At this stage, the infrastructure for your Arvados cluster is up and running, ready for the installer to connect to the instances and do the final setup.

The domain name for your cluster (e.g., @xarv1.example.com@) is managed via Route53 and the SSL certificates will be issued using Let's Encrypt.

Take note of the domain servers listed in @route53_dns_ns@ so you can delegate the zone to them.

You'll need to take note of @letsencrypt_iam_access_key_id@ and @letsencrypt_iam_secret_access_key@ for setting up @LE_AWS_*@ variables in @local.params@.

You'll also need @subnet_id@ and @arvados_sg_id@ to set up @DriverParameters.SubnetID@ and @DriverParameters.SecurityGroupIDs@ in @local_config_dir/pillars/arvados.sls@ as "described below":#create_a_compute_image.

h3(#inframanual). Create required infrastructure manually

If you would prefer to create and set up your infrastructure manually, consider the following recommendations.

h4. Virtual Private Cloud (AWS specific)

We recommend setting Arvados up in a "Virtual Private Cloud (VPC)":https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html

When you do so, you need to configure a couple of additional things:

# "Create a subnet for the compute nodes":https://docs.aws.amazon.com/vpc/latest/userguide/configure-subnets.html
# You should set up a "security group which allows SSH access (port 22)":https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html
# Make sure to add a "VPC S3 endpoint":https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-s3.html

h4(#keep-bucket). S3 Bucket (AWS specific)

We recommend "creating an S3 bucket":https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html for data storage named @${CLUSTER}-nyw5e-000000000000000-volume@. We recommend creating an IAM role called @${CLUSTER}-keepstore-00-iam-role@ with a "policy that can read, write, list and delete objects in the bucket":configure-s3-object-storage.html#IAM . With the example cluster id @xarv1@ the bucket would be called @xarv1-nyw5e-000000000000000-volume@ and the role would be called @xarv1-keepstore-00-iam-role@.

These names are recommended because they are default names used in the configuration template. If you use different names, you will need to edit the configuration template later.

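To avoid typing the long default names by hand, you can derive them from the cluster ID. This is a small sketch of the naming scheme described above; @keep_bucket_name@ and @keepstore_role_name@ are hypothetical helper names, not installer commands.

```shell
#!/bin/sh
CLUSTER="xarv1"   # illustrative cluster ID; substitute your own

# Hypothetical helpers that expand the recommended naming scheme.
keep_bucket_name() {
    printf '%s-nyw5e-000000000000000-volume\n' "$1"
}

keepstore_role_name() {
    printf '%s-keepstore-00-iam-role\n' "$1"
}

echo "bucket: $(keep_bucket_name "$CLUSTER")"
echo "role:   $(keepstore_role_name "$CLUSTER")"
```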
h4(#hosts). Required hosts

You will need to allocate several hosts (physical or virtual machines) for the fixed infrastructure of the Arvados cluster. These machines should have at least 2 cores and 8 GiB of RAM, running a supported Linux distribution.

{% include 'supportedlinux' %}

Allocate the following hosts as appropriate for your site. On AWS you may choose to do it manually with the AWS console, or using a DevOps tool such as CloudFormation or Terraform. With the exception of "keep0" and "keep1", all of these hosts should have external (public) IP addresses if you intend for them to be accessible outside of the private network or VPC.

The installer will set up the Arvados services on your machines. Here is the default assignment of services to machines:

# CONTROLLER node
## arvados api server
## arvados controller (recommended hostname @controller.${CLUSTER}.${DOMAIN}@)
## arvados websocket (recommended hostname @ws.${CLUSTER}.${DOMAIN}@)
## arvados cloud dispatcher
## arvados keepbalance
# KEEPSTORE nodes (at least 2)
## arvados keepstore (recommended hostnames @keep0.${CLUSTER}.${DOMAIN}@ and @keep1.${CLUSTER}.${DOMAIN}@)
# KEEPPROXY node
## arvados keepproxy (recommended hostname @keep.${CLUSTER}.${DOMAIN}@)
## arvados keepweb (recommended hostnames @download.${CLUSTER}.${DOMAIN}@ and @*.collections.${CLUSTER}.${DOMAIN}@)
# WORKBENCH node
## arvados workbench (recommended hostname @workbench.${CLUSTER}.${DOMAIN}@)
## arvados workbench2 (recommended hostname @workbench2.${CLUSTER}.${DOMAIN}@)
## arvados webshell (recommended hostname @webshell.${CLUSTER}.${DOMAIN}@)
# SHELL node (optional)
## arvados shell (recommended hostname @shell.${CLUSTER}.${DOMAIN}@)

When using the database installed by Arvados (and not an "external database":#ext-database), the database is stored under @/var/lib/postgresql@. Arvados logs are kept in @/var/log@ and @/var/www/arvados-api/shared/log@. Accordingly, you should ensure that the disk partition containing @/var@ has adequate storage for your planned usage. We suggest starting with 50 GiB of free space on the database host.

h4(#DNS). DNS hostnames for each service

You will need a DNS entry for each service. In the default configuration these are:

# @controller.${CLUSTER}.${DOMAIN}@
# @ws.${CLUSTER}.${DOMAIN}@
# @keep0.${CLUSTER}.${DOMAIN}@
# @keep1.${CLUSTER}.${DOMAIN}@
# @keep.${CLUSTER}.${DOMAIN}@
# @download.${CLUSTER}.${DOMAIN}@
# @*.collections.${CLUSTER}.${DOMAIN}@ (important: this must be a wildcard DNS entry that resolves to the @keepweb@ service)
# @workbench.${CLUSTER}.${DOMAIN}@
# @workbench2.${CLUSTER}.${DOMAIN}@
# @webshell.${CLUSTER}.${DOMAIN}@
# @shell.${CLUSTER}.${DOMAIN}@

This is described in more detail in "DNS entries and TLS certificates":install-manual-prerequisites.html#dnstls.

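Once the DNS entries exist, it can be useful to confirm that each one resolves. The following sketch lists the default hostnames above and defines a resolution check. It assumes a Linux system where @getent@ is available; @service_hostnames@ and @check_dns@ are hypothetical helper names, and @wildcard-check@ is an arbitrary label chosen here to probe the wildcard entry.

```shell
#!/bin/sh
CLUSTER="xarv1"        # illustrative values; substitute your own
DOMAIN="example.com"

# Hypothetical helper: print the default service hostnames listed above.
# Usage: service_hostnames CLUSTER DOMAIN
service_hostnames() {
    for name in controller ws keep0 keep1 keep download '*.collections' \
                workbench workbench2 webshell shell; do
        printf '%s.%s.%s\n' "$name" "$1" "$2"
    done
}

# Hypothetical helper: report whether each hostname resolves.
# For the wildcard entry, "*" is replaced with an arbitrary label.
check_dns() {
    service_hostnames "$1" "$2" | while read -r host; do
        probe=$(printf '%s' "$host" | sed 's/^\*/wildcard-check/')
        if getent hosts "$probe" >/dev/null 2>&1; then
            echo "ok       $host"
        else
            echo "MISSING  $host"
        fi
    done
}

# Print the list of names; the lookup itself is a separate step.
service_hostnames "$CLUSTER" "$DOMAIN"
```

After reviewing the list, run @check_dns "$CLUSTER" "$DOMAIN"@ to perform the lookups. Any line marked @MISSING@ points at a DNS entry that still needs to be created.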
h4. Additional prerequisites when preparing machines to run the installer

# From the account where you are performing the install, passwordless @ssh@ to each machine
This means the client's public key should be added to @~/.ssh/authorized_keys@ on each node.
# Passwordless @sudo@ access on the account on each machine you will @ssh@ in to
This usually means adding the account to the @sudo@ group and having a rule like this in @/etc/sudoers.d/arvados_passwordless@ that allows members of group @sudo@ to execute any command without entering a password.
<pre>%sudo ALL=(ALL:ALL) NOPASSWD:ALL</pre>
# @git@ installed on each machine
# Port 443 reachable by clients

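The @sudo@ and @git@ prerequisites above can be verified with a short script run on each target machine. This is only a sketch: @check_prereqs@ is a hypothetical helper name, and the check relies on @sudo -n@, which fails instead of prompting when a password would be required.

```shell
#!/bin/sh
# Run on the target machine, as the account used for the install.
# Hypothetical helper: report passwordless sudo and git availability.
check_prereqs() {
    # "sudo -n" never prompts; it fails if a password would be needed.
    if sudo -n true 2>/dev/null; then
        echo "ok       passwordless sudo"
    else
        echo "MISSING  passwordless sudo"
    fi
    if command -v git >/dev/null 2>&1; then
        echo "ok       git"
    else
        echo "MISSING  git"
    fi
}

check_prereqs
```

If you save this as, say, @check_prereqs.sh@ (a file name chosen for illustration), you can also exercise the passwordless @ssh@ requirement by running it remotely with @ssh -o BatchMode=yes keep0.xarv1.example.com 'sh -s' < check_prereqs.sh@. With @BatchMode=yes@, @ssh@ fails rather than asking for a password.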
(AWS specific) The machine that runs the arvados cloud dispatcher will need an "IAM role that allows it to manage EC2 instances.":{{site.baseurl}}/install/crunch2-cloud/install-dispatch-cloud.html#IAM

If your infrastructure differs from the setup proposed above (i.e., different hostnames), you can still use the installer, but "additional customization may be necessary":#further_customization .

h2(#download). Download the installer

{% assign local_params_src = 'multiple_hosts' %}
{% assign config_examples_src = 'multi_host/aws'%}
{% include 'download_installer' %}

h2(#localparams). Edit @local.params@

This can be found wherever you choose to initialize the install files (@~/setup-arvados-xarv1@ in these examples).

# Set @CLUSTER@ to the 5-character cluster identifier (e.g. "xarv1")
# Set @DOMAIN@ to the base DNS domain of the environment, e.g. "example.com"
# Edit the internal IP settings. Because services share hosts, some of these settings have the same value. See "note about /etc/hosts":#etchosts
# Edit @CLUSTER_INT_CIDR@; this should be the CIDR of the private network that Arvados is running on, e.g. the VPC.
CIDR stands for "Classless Inter-Domain Routing" and describes which portion of the IP address refers to the network. For example 192.168.3.0/24 means that the first 24 bits are the network (192.168.3) and the last 8 bits are a specific host on that network.
_AWS specific: In the AWS console, open the VPC service; the table of VPCs has an "IPv4 CIDR" column that gives the CIDR for each VPC._
# Set @INITIAL_USER_EMAIL@ to your email address, as you will be the first admin user of the system.
# Set each @KEY@ / @TOKEN@ to a random string
Here's an easy way to create five random tokens:
<pre><code>for i in 1 2 3 4 5; do
  tr -dc A-Za-z0-9 </dev/urandom | head -c 32 ; echo ''
done
</code></pre>
# Set @DATABASE_PASSWORD@ to a random string (unless you "already have a database":#ext-database, in which case you should set it to that database's password)
Important! If this contains any non-alphanumeric characters, in particular ampersand ('&'), it is necessary to add backslash quoting.
For example, if the password is @Lq&MZ<V']d?j@
With backslash quoting the special characters it should appear like this in local.params:
<pre><code>DATABASE_PASSWORD="Lq\&MZ\<V\'\]d\?j"</code></pre>

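Adding the backslashes by hand is easy to get wrong. As a sketch, the following helper puts a backslash in front of every character that is not an ASCII letter or digit, which reproduces the quoted example above. @backslash_quote@ is a hypothetical name, and the approach assumes an ASCII-only password.

```shell
#!/bin/sh
# Hypothetical helper: prefix every character that is not an ASCII
# letter or digit with a backslash.
backslash_quote() {
    printf '%s' "$1" | LC_ALL=C sed 's/[^A-Za-z0-9]/\\&/g'
}

# The example password from this section.
PASSWORD="Lq&MZ<V']d?j"

# Print a line suitable for pasting into local.params.
printf 'DATABASE_PASSWORD="%s"\n' "$(backslash_quote "$PASSWORD")"
```

A purely alphanumeric password passes through unchanged, so the helper is safe to apply to any generated value.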
h3(#etchosts). Note on @/etc/hosts@

Because Arvados services are typically accessed by external clients, they are likely to have both a public IP address and an internal IP address.

On cloud providers such as AWS, sending internal traffic to a service's public IP address can incur egress costs and throttling. Thus it is very important for internal traffic to stay on the internal network. The installer implements this by updating @/etc/hosts@ on each node to associate each service's hostname with the internal IP address, so that when Arvados services communicate with one another, they always use the internal network address. This is NOT a substitute for DNS; you still need to set up DNS names for all of the services that have public IP addresses (it does, however, avoid a complex "split-horizon" DNS configuration).

It is important to be aware of this because if you mistype the IP address for any of the @*_INT_IP@ variables, hosts may unexpectedly be unable to communicate with one another. If this happens, check the file @/etc/hosts@ on the host that is failing to make an outgoing connection, and edit it as necessary.

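One way to catch a mistyped internal address early is to confirm that each one falls inside @CLUSTER_INT_CIDR@. The sketch below does the CIDR arithmetic in plain shell for IPv4 addresses. @ip_to_int@ and @in_cidr@ are hypothetical helper names, and the addresses are the illustrative ones from the Terraform output earlier on this page.

```shell
#!/bin/sh
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    _old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$_old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Hypothetical helper: succeed if IP lies inside CIDR.
# Usage: in_cidr IP NETWORK/PREFIXLEN
in_cidr() {
    _addr_int=$(ip_to_int "$1")
    _net_int=$(ip_to_int "${2%/*}")
    _bits=${2#*/}
    # Network mask: the top $_bits bits set, limited to 32 bits.
    _mask=$(( (4294967295 << (32 - $_bits)) & 4294967295 ))
    [ $(( _addr_int & _mask )) -eq $(( _net_int & _mask )) ]
}

CLUSTER_INT_CIDR="10.1.0.0/16"   # illustrative value

for addr in 10.1.1.1 10.1.1.2 10.1.1.5; do
    if in_cidr "$addr" "$CLUSTER_INT_CIDR"; then
        echo "inside   $addr"
    else
        echo "OUTSIDE  $addr"
    fi
done
```

An address reported as @OUTSIDE@ is not necessarily wrong, but it is worth a second look before running the installer.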
h2(#keep). Configure Keep storage

The @multi_host/aws@ template uses S3 for storage. Arvados also supports "filesystem storage":configure-fs-storage.html and "Azure blob storage":configure-azure-blob-storage.html . Keep storage configuration can be found in the @arvados.cluster.Volumes@ section of @local_config_dir/pillars/arvados.sls@.

h3. Object storage in S3 (AWS specific)

Open @local_config_dir/pillars/arvados.sls@ and edit as follows:

# In the @arvados.cluster.Volumes.DriverParameters@ section, set @Region@ to the appropriate AWS region (e.g. 'us-east-1')

If you did not "follow the recommended naming scheme":#keep-bucket for either the bucket or role, you'll need to update these parameters as well:

# Set @Bucket@ to the name of the "keepstore bucket you created earlier":#keep-bucket
# Set @IAMRole@ to the name of the "keepstore role you created earlier":#keep-bucket

{% include 'ssl_config_multi' %}

h2(#authentication). Configure your authentication provider (optional, recommended)

By default, the installer will use the "Test" provider, which is a list of usernames and cleartext passwords stored in the Arvados config file. *This is a low-security configuration and you are strongly advised to configure one of the other "supported authentication methods":setup-login.html* .

h2(#ext-database). Using an external database (optional)

The standard behavior of the installer is to install and configure PostgreSQL for use by Arvados. You can optionally configure it to use a separately managed database instead.

Arvados requires a database that is compatible with PostgreSQL 9.5 or later. For example, Arvados is known to work with Amazon Aurora (note: even idle, Arvados services will periodically poll the database, so we strongly advise using "provisioned" mode).

# In @local.params@, remove 'database' from the list of roles assigned to the controller node:
<pre><code>[controller.${CLUSTER}.${DOMAIN}]=api,controller,websocket,dispatcher,keepbalance
</code></pre>
# In @local.params@, set @DATABASE_INT_IP@ to the database endpoint (can be a hostname, does not have to be an IP address).
<pre><code>DATABASE_INT_IP=...
</code></pre>
# In @local.params@, set @DATABASE_PASSWORD@ to the correct value. "See the previous section describing correct quoting":#localparams
# In @local_config_dir/pillars/arvados.sls@ you may need to adjust the database name and user. This can be found in the section @arvados.cluster.database@.

h2(#further_customization). Further customization of the installation (optional)

If you are installing on AWS and have followed all of the naming conventions recommended in this guide, you probably don't need to do any further customization.

If you are installing on a different cloud provider or on HPC, other changes may require editing the SaltStack pillar and state files found in @local_config_dir@. In particular, @local_config_dir/pillars/arvados.sls@ contains the template (in the @arvados.cluster@ section) used to produce the Arvados configuration file that is distributed to all the nodes. Consult the "Configuration reference":config.html for a comprehensive list of configuration keys.

Any extra Salt "state" files you add under @local_config_dir/states@ will be added to the Salt run and applied to the hosts.

h2(#create_a_compute_image). Create a compute image

{% include 'branchname' %}

On cloud installations, containers are dispatched to Docker daemons running in the _compute instances_, which need some additional setup. If you will use an HPC scheduler such as SLURM you can skip this section.

*Start by following "the instructions to build a cloud compute node image":{{site.baseurl}}/install/crunch2-cloud/install-compute-node.html using the "compute image builder script":https://github.com/arvados/arvados/tree/{{ branchname }}/tools/compute-images* .

Once you have that image created, open @local_config_dir/pillars/arvados.sls@ and edit as follows (AWS specific settings are described here; other cloud providers will have similar settings in their respective configuration section):

# In the @arvados.cluster.Containers.CloudVMs@ section:
## Set @ImageID@ to the AMI produced by Packer
## Set @DriverParameters.Region@ to the appropriate AWS region
## Set @DriverParameters.AdminUsername@ to the admin user account on the image
## Set the @DriverParameters.SecurityGroupIDs@ list to the VPC security group which you set up to allow SSH connections to these nodes
## Set @DriverParameters.SubnetID@ to the value of SubnetId of your VPC
# Update @arvados.cluster.Containers.DispatchPrivateKey@ and paste the contents of the @~/.ssh/id_dispatcher@ file you generated in an earlier step.
# Update @arvados.cluster.InstanceTypes@ as necessary. The example instance types are for AWS; other cloud providers will have different instance types with different names and specifications.
(AWS specific) If m5/c5 node types are not available, replace them with m4/c4. You'll need to double check the values for Price and IncludedScratch/AddedScratch for each type that is changed.

h2(#installation). Begin installation

At this point, you are ready to run the installer script in deploy mode, which performs the entire Arvados installation.

Run this in the @~/setup-arvados-xarv1@ directory:

<pre>
./installer.sh deploy
</pre>

This will install and configure Arvados on all the nodes. It will take a while and produce a lot of logging. If it runs into an error, it will stop.

h2(#test-install). Confirm the cluster is working

When everything has finished, you can run the diagnostics.

Depending on where you are running the installer, you need to provide @-internal-client@ or @-external-client@.

If you are running the diagnostics from one of the Arvados machines inside the private network, you want @-internal-client@ .

You are an "external client" if you are running the diagnostics from your workstation outside of the private network.

<pre>
./installer.sh diagnostics (-internal-client|-external-client)
</pre>

h3(#debugging). Debugging issues

The installer records log files for each deployment.

Most service logs go to @/var/log/syslog@.

The logs for the Rails API server and for Workbench can be found in

@/var/www/arvados-api/current/log/production.log@

and

@/var/www/arvados-workbench/current/log/production.log@

on the appropriate instances.

Workbench 2 is a client-side JavaScript application. If you are having trouble loading Workbench 2, check the browser's developer console (this can be found in "Tools → Developer Tools").

h3(#iterating). Iterating on config changes

You can iterate on the config and maintain the cluster by making changes to @local.params@ and @local_config_dir@ and running @installer.sh deploy@ again.

If you are debugging a configuration issue on a specific node, you can speed up the cycle a bit by deploying just one node:

<pre>
./installer.sh deploy keep0.xarv1.example.com
</pre>

However, once you have a final configuration, you should run a full deploy to ensure that the configuration has been synchronized on all the nodes.

h3(#common-problems). Common problems and solutions

h4. PG::UndefinedTable: ERROR: relation \"api_clients\" does not exist

The arvados-api-server package sets up the database as a post-install script. If the database host or password wasn't set correctly (or quoted correctly) at the time that package is installed, it won't be able to set up the database.

This will manifest as an error like this:

<pre>
#<ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR: relation \"api_clients\" does not exist
</pre>

If this happens, you need to:

# Correct the database information.
# Run @./installer.sh deploy controller.xarv1.example.com@ to update the configuration on the API/controller node.
# Log in to the API/controller server node, then run this command to re-run the post-install script, which will set up the database:
<pre>dpkg-reconfigure arvados-api-server</pre>
# Run @./installer.sh deploy@ again to synchronize everything, and so that the install steps that need to contact the API server are run successfully.

h4. Missing ENA support (AWS specific)

If the AMI wasn't built with ENA (extended networking) support and the instance type requires it, the instance will fail to start. You'll see an error in syslog on the node that runs @arvados-dispatch-cloud@. The solution is to build a new AMI with @--aws-ena-support true@.

h2(#initial_user). Initial user and login

At this point you should be able to log into the Arvados cluster. The initial URL will be

@https://workbench.${CLUSTER}.${DOMAIN}@

If you did *not* "configure a different authentication provider":#authentication you will be using the "Test" provider, and the provision script creates an initial user for testing purposes. This user is configured as administrator of the newly created cluster. It uses the values of @INITIAL_USER@ and @INITIAL_USER_PASSWORD@ from the @local.params@ file.

If you *did* configure a different authentication provider, the first user to log in will automatically be given Arvados admin privileges.

h2(#post_install). After the installation

As part of its operation, @installer.sh@ automatically creates a @git@ repository with your configuration templates. You should retain this repository but be aware that it contains sensitive information (passwords and tokens used by the Arvados services).

As described in "Iterating on config changes":#iterating you may use @installer.sh deploy@ to re-run Salt to deploy configuration changes and upgrades. However, be aware that the configuration templates created for you by @installer.sh@ are a snapshot that is not automatically kept up to date.

When deploying upgrades, consult the "Arvados upgrade notes":{{site.baseurl}}/admin/upgrading.html to see if changes need to be made to the configuration file template in @local_config_dir/pillars/arvados.sls@. To specify the version to upgrade to, set the @VERSION@ parameter in @local.params@.

See also "Maintenance and upgrading":{{site.baseurl}}/admin/maintenance-and-upgrading.html for more information.