3 navsection: installguide
4 title: Build a cloud compute node image
7 Copyright (C) The Arvados Authors. All rights reserved.
9 SPDX-License-Identifier: CC-BY-SA-3.0
12 {% include 'notebox_begin_warning' %}
13 @arvados-dispatch-cloud@ is only relevant for cloud installations. Skip this section if you are installing an on premises cluster that will spool jobs to Slurm or LSF.
14 {% include 'notebox_end' %}
16 p(#introduction). This page describes how to build a compute node image that can be used to run containers dispatched by Arvados in the cloud.
18 # "Prerequisites":#prerequisites
19 ## "Create and configure an SSH keypair":#sshkeypair
20 ## "Get the Arvados source":#git-clone
21 ## "Install Ansible":#install-ansible
22 ## "Install Packer and the Ansible plugin":#install-packer
23 # "Fully automated build with Packer and Ansible":#building
24 ## "Write Ansible settings for the compute node":#ansible-variables
25 ## "Set up Packer for your cloud":#packer-variables
26 ### "AWS":#aws-variables
27 ### "Azure":#azure-variables
28 ## "Run Packer":#run-packer
29 # "Partially automated build with Ansible":#ansible-build
30 ## "Write Ansible settings for the compute node":#ansible-variables-standalone
31 ## "Write an Ansible inventory":#ansible-inventory
32 ## "Run Ansible":#run-ansible
33 # "Manual build":#requirements
35 h2(#prerequisites). Prerequisites
37 h3(#sshkeypair). Create and configure an SSH keypair
39 @arvados-dispatch-cloud@ communicates with the compute nodes via SSH. To do this securely, a SSH keypair is needed. Generate a SSH keypair with no passphrase:
42 <pre><code>~$ <span class="userinput">ssh-keygen -N '' -f ~/.ssh/id_dispatcher</span>
43 Generating public/private rsa key pair.
44 Your identification has been saved in /home/user/.ssh/id_dispatcher.
45 Your public key has been saved in /home/user/.ssh/id_dispatcher.pub.
46 The key fingerprint is:
51 After you do this, the contents of the private key in @~/.ssh/id_dispatcher@ need to be stored in your "cluster configuration file":{{ site.baseurl }}/admin/config.html under @Containers.DispatchPrivateKey@.
53 The public key at @~/.ssh/id_dispatcher.pub@ will need to be authorized to access instances booted from the image. Keep this file; our Ansible playbook will read it to set this up for you.
55 h3(#git-clone). Get the Arvados source
57 Compute node templates are only available in the Arvados source tree. Clone a copy of the Arvados source for the version of Arvados you're using in a directory convenient for you:
59 {% include 'branchname' %}
61 <pre><code>~$ <span class="userinput">git clone --depth=1 --branch=<strong>{{ branchname }}</strong> git://git.arvados.org/arvados.git ~/<strong>arvados</strong></span>
65 h3(#install-ansible). Install Ansible
67 We provide an Ansible playbook that can run on a Debian or Ubuntu system to set up a node with all the software and configuration necessary to do Arvados compute work. "Install Ansible following their instructions.":https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html It is okay to install Ansible to a nonstandard location; you can configure other parts of the automation with that location.
69 h3(#install-packer). Install Packer and the Ansible plugin
71 We provide Packer templates that can automatically create a compute instance, configure it with Ansible, shut it down, and create a cloud image from the result. "Install Packer following their instructions.":https://developer.hashicorp.com/packer/docs/install After you do, install Packer's Ansible provisioner by running:
74 <pre><code>~$ <span class="userinput">packer plugins install github.com/hashicorp/ansible</span>
78 h2(#building). Fully automated build with Packer and Ansible
80 After you have both tools installed, you can configure both with information about your Arvados cluster and cloud environment and then run a fully automated build.
82 h3(#ansible-variables). Write Ansible settings for the compute node
84 In the @tools/compute-images@ directory of your Arvados source checkout, copy @host_config.example.yml@ to @host_config.yml@. Edit @host_config.yml@ with information about how your compute nodes should be set up following the instructions in the comments.
86 h3(#packer-variables). Set up Packer for your cloud
88 You need to provide different configuration to Packer depending on which cloud you're deploying Arvados in.
90 h4(#aws-variables). AWS
92 Install Packer's AWS builder by running:
95 <pre><code>~$ <span class="userinput">packer plugins install github.com/hashicorp/amazon</span>
99 In the @tools/compute-images@ directory of your Arvados source checkout, copy @aws_config.example.json@ to @aws_config.json@. Fill in values for the configuration settings as follows:
101 * If you already have AWS credentials configured that Packer can use to create and manage an EC2 instance, set @aws_profile@ to the name of those credentials in your configuration. Otherwise, set @aws_access_key@ and @aws_secret_key@ with information from an API token with those permissions.
102 * Set @aws_region@, @vpc_id@, and @subnet_id@ with identifiers for the network where Packer should create the EC2 instance.
103 * Set @aws_source_ami@ to the AMI of the base image that should be booted and used as the base for your compute node image. Set @ssh_user@ to the name of administrator account that is used on that image.
104 * Set @arvados_cluster@ to the same five-alphanumeric identifier used under @Clusters@ in your Arvados cluster configuration.
105 * If you installed Ansible to a nonstandard location, set @ansible_command@ to the absolute path of @ansible-playbook@. For example, if you installed Ansible in a virtualenv at @~/ansible@, set @ansible_command@ to {% raw %}<notextile><code class="userinput">"{{env `HOME`}}<strong>/ansible</strong>/bin/ansible-playbook"</code></notextile>{% endraw %}.
107 When you finish writing your configuration, "run Packer":#run-packer.
109 h4(#azure-variables). Azure
115 Install Packer's Azure builder by running:
118 <pre><code>~$ <span class="userinput">packer plugins install github.com/hashicorp/azure</span>
122 In the @tools/compute-images@ directory of your Arvados source checkout, copy @azure_config.example.json@ to @azure_config.json@. Fill in values for the configuration settings as follows:
124 * The settings load credentials from Azure's standard environment variables. As long as you have these environment variables set in the shell before you run Packer, they will be loaded as normal. Alternatively, you can set them directly in the configuration file. These secrets can be generated from the Azure portal, or with the CLI using a command like:<notextile><pre><code>~$ <span class="userinput">az ad sp create-for-rbac --name Packer --password ...</span>
125 </code></pre></notextile>
126 * Set @location@ and @resource_group@ with identifiers for where Packer should create the cloud instance.
127 * Set @image_sku@ to the identifier of the base image that should be booted and used as the base for your compute node image. Set @ssh_user@ to the name of administrator account you want to use on that image.
128 * Set @ssh_private_key_file@ to the path with the private key you generated earlier for the dispatcher to use. For example, {% raw %}<notextile><code class="userinput">"{{env `HOME`}}/.ssh/<strong>id_dispatcher</strong>"</code></notextile>{% endraw %}.
129 * Set @arvados_cluster@ to the same five-alphanumeric identifier used under @Clusters@ in your Arvados cluster configuration.
130 * If you installed Ansible to a nonstandard location, set @ansible_command@ to the absolute path of @ansible-playbook@. For example, if you installed Ansible in a virtualenv at @~/ansible@, set @ansible_command@ to {% raw %}<notextile><code class="userinput">"{{env `HOME`}}<strong>/ansible</strong>/bin/ansible-playbook"</code></notextile>{% endraw %}.
132 When you finish writing your configuration, "run Packer":#run-packer.
134 h3(#run-packer). Run Packer
136 In the @tools/compute-images@ directory of your Arvados source checkout, run Packer with your configuration and the template appropriate for your cloud. For example, to build an image on AWS, run:
139 <pre><code>arvados/tools/compute-images$ <span class="userinput">packer build -var-file=<strong>aws</strong>_config.json <strong>aws</strong>_template.json</span>
143 To build an image on Azure, replace both instances of *@aws@* with *@azure@*, and run that command.
145 {% include 'notebox_begin_warning' %}
146 If @packer build@ fails early with @ok=0@, @changed=0@, @failed=1@, and a message like this:
149 <pre><code>TASK [Gathering Facts] *********************************************************
150 fatal: [default]: FAILED! => {"msg": "failed to transfer file to /home/you/.ansible/tmp/ansible-local-1821271ym6nh1cw/tmp2kyfkhy4 /home/admin/.ansible/tmp/ansible-tmp-1732380360.0917368-1821275-172216075852170/AnsiballZ_setup.py:\n\n"}
152 PLAY RECAP *********************************************************************
153 default : ok=0 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
157 This might mean the version of @scp@ on your computer is trying to use new protocol features that doesn't work with the older SSH server on the cloud image. You can work around this by running:
160 <pre><code>$ <span class="userinput">export ANSIBLE_SCP_EXTRA_ARGS="'-O'"</span>
164 Then rerun your full @packer build@ command from the same shell.
165 {% include 'notebox_end' %}
167 If the build succeeds, it will report the identifier of your image at the end of the process. For example, when you build an AWS image, it will look like this:
170 <pre><code>==> Builds finished. The artifacts of successful builds are:
171 --> amazon-ebs: AMIs were created:
172 us-east-1: <strong>ami-012345abcdef56789</strong>
176 That identifier can now be set as @CloudVMs.ImageID@ in your cluster configuration. You do not need to run any other compute node build process on this page; continue to "installing the cloud dispatcher":install-dispatch-cloud.html.
178 h2(#ansible-build). Partially automated build with Ansible
180 If Arvados does not include a template for your cloud, or you do not have permission to run Packer, you can run the Ansible playbook by itself. This can set up a base Debian or Ubuntu system with all the software and configuration necessary to do Arvados compute work. After it's done, you can manually snapshot the node and create a cloud image from it.
182 h3(#ansible-variables-standalone). Write Ansible settings for the compute node
184 In the @tools/compute-images@ directory of your Arvados source checkout, copy @host_config.example.yml@ to @host_config.yml@. Edit @host_config.yml@ with information about how your compute nodes should be set up following the instructions in the comments. Note that you *must set* @arvados_cluster_id@ in this file since you are not running Packer.
186 h3(#ansible-inventory). Write an Ansible inventory
188 The compute node playbook runs on a host named @default@. In the @tools/compute-images@ directory of your Arvados source checkout, write a file named @inventory.ini@ with information about how to connect to this node via SSH. It should be one line like this:
191 <pre><code># Example inventory.ini for an Arvados compute node
192 <span class="userinput">default ansible_host=<strong>192.0.2.9</strong> ansible_user=<strong>admin</strong></span>
196 * @ansible_host@ can be the running node's hostname or IP address. You need to be able to reach this host from the system where you're running Ansible.
197 * @ansible_user@ names the user account that Ansible should use for the SSH connection. It needs to have permission to use @sudo@ on the running node.
199 You can add other Ansible configuration options like @ansible_port@ to your inventory if needed. Refer to the "Ansible inventory documentation":https://docs.ansible.com/ansible/latest/inventory_guide/intro_inventory.html for details.
201 h3(#run-ansible). Run Ansible
203 If you installed Ansible inside a virtualenv, activate that virtualenv now. Then, in the @tools/compute-images@ directory of your Arvados source checkout, run @ansible-playbook@ with your inventory and configuration:
206 <pre><code>arvados/tools/compute-images$ <span class="userinput">ansible-playbook --ask-become-pass --inventory=inventory.ini --extra-vars=@host_config.yml</span>
210 You'll be prompted with @BECOME password:@. Enter the password for the @ansible_user@ you defined in the inventory to use sudo on the running node.
212 {% include 'notebox_begin_warning' %}
213 If @ansible-playbook@ fails early with @ok=0@, @changed=0@, @failed=1@, and a message like this:
216 <pre><code>TASK [Gathering Facts] *********************************************************
217 fatal: [default]: FAILED! => {"msg": "failed to transfer file to /home/you/.ansible/tmp/ansible-local-1821271ym6nh1cw/tmp2kyfkhy4 /home/admin/.ansible/tmp/ansible-tmp-1732380360.0917368-1821275-172216075852170/AnsiballZ_setup.py:\n\n"}
219 PLAY RECAP *********************************************************************
220 default : ok=0 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
224 This might mean the version of @scp@ on your computer is trying to use new protocol features that doesn't work with the older SSH server on the cloud image. You can work around this by running:
227 <pre><code>$ <span class="userinput">export ANSIBLE_SCP_EXTRA_ARGS="'-O'"</span>
231 Then rerun your full @ansible-playbook@ command from the same shell.
232 {% include 'notebox_end' %}
234 If it succeeds, Ansible should report a "PLAY RECAP" with @failed=0@:
237 <pre><code>PLAY RECAP *********************************************************************
238 default : ok=41 changed=37 unreachable=0 <strong>failed=0</strong> skipped=5 rescued=0 ignored=0
242 Your node is now ready to run Arvados compute work. You can snapshot the node, create an image from it, and set that image as @CloudVMs.ImageID@ in your Arvados cluster configuration. The details of that process are cloud-specific and out of scope for this documentation. You do not need to run any other compute node build process on this page; continue to "installing the cloud dispatcher":install-dispatch-cloud.html.
244 h2(#requirements). Manual build
246 If you cannot run Ansible, you can create a cloud instance, manually set it up to be a compute node, and then create an image from it. The details of this process depend on which distribution you use on the cloud instance and which cloud you use; all these variations are out of scope for this documentation. These are the requirements:
248 * Except on Azure, the SSH public key you generated previously must be an authorized key for the user that Crunch is configured to use. For example, if your cluster's @CloudVMs.DriverParameters.AdminUsername@ setting is *@crunch@*, then the dispatcher's public key should be listed in <notextile><code class="userinput">~<strong>crunch</strong>/.ssh/authorized_keys</code></notextile> in the image. This user must also be allowed to use sudo without a password unless the user is @root@.
249 (On Azure, the dispatcher makes additional calls to automatically set up and authorize the user, making these steps unnecessary.)
250 * SSH needs to be running and reachable by @arvados-dispatch-cloud@ on the port named by @CloudVMs.SSHPort@ in your cluster's configuration file (default 22).
251 * Install the @python3-arvados-fuse@ package. Enable the @user_allow_other@ option in @/etc/fuse.conf@.
252 * Install either "Docker":https://docs.docker.com/engine/install/ or "Singularity":https://docs.sylabs.io/guides/3.0/user-guide/installation.html as appropriate based on the @Containers.RuntimeEngine@ setting in your cluster's configuration file. If you install Docker, you may also want to install and set up the @arvados-docker-cleaner@ package to conserve space on long-running instances, but it's not strictly required.
253 * All available scratch space should be made available under @/tmp@.