---
layout: default
navsection: installguide
title: Build a cloud compute node image
...

{% comment %}
Copyright (C) The Arvados Authors. All rights reserved.

SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}

{% include 'notebox_begin_warning' %}
@arvados-dispatch-cloud@ is only relevant for cloud installations. Skip this section if you are installing an on-premises cluster that will spool jobs to Slurm or LSF.
{% include 'notebox_end' %}

p(#introduction). This page describes how to build a compute node image that can be used to run containers dispatched by Arvados in the cloud.

# "Prerequisites":#prerequisites
## "Create and configure an SSH keypair":#sshkeypair
## "Get the Arvados source":#git-clone
## "Install Ansible":#install-ansible
## "Install Packer and the Ansible plugin":#install-packer
# "Fully automated build with Packer and Ansible":#building
## "Write Ansible settings for the compute node":#ansible-variables
## "Set up Packer for your cloud":#packer-variables
### "AWS":#aws-variables
### "Azure":#azure-variables
## "Run Packer":#run-packer
# "Partially automated build with Ansible":#ansible-build
## "Write Ansible settings for the compute node":#ansible-variables-standalone
## "Write an Ansible inventory":#ansible-inventory
## "Run Ansible":#run-ansible
# "Manual build":#requirements

h2(#prerequisites). Prerequisites

h3(#sshkeypair). Create and configure an SSH keypair

@arvados-dispatch-cloud@ communicates with the compute nodes via SSH. To do this securely, an SSH keypair is needed. The key type must be RSA or ED25519 to work with Amazon EC2.

Generate an ED25519 keypair with no passphrase:
~$ ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_dispatcher
Generating public/private ed25519 key pair.
Your identification has been saved in /home/user/.ssh/id_dispatcher.
Your public key has been saved in /home/user/.ssh/id_dispatcher.pub.
The key fingerprint is:
[...]
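The private key you generate here will go into your cluster configuration, as described below. A minimal sketch of where it lands (@xxxxx@ stands for your cluster ID, and the key body is elided):

Clusters:
  xxxxx:
    Containers:
      DispatchPrivateKey: |
        -----BEGIN OPENSSH PRIVATE KEY-----
        [...]
        -----END OPENSSH PRIVATE KEY-----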
After you do this, the contents of the private key in @~/.ssh/id_dispatcher@ need to be stored in your "cluster configuration file":{{ site.baseurl }}/admin/config.html under @Containers.DispatchPrivateKey@, as sketched above. The public key in @~/.ssh/id_dispatcher.pub@ will need to be authorized to access instances booted from the image. Keep this file; our Ansible playbook will read it and set this up for you.

h3(#git-clone). Get the Arvados source

Compute node templates are only available in the Arvados source tree. Clone a copy of the Arvados source for the version of Arvados you're using in a convenient directory:

{% include 'branchname' %}
~$ git clone --depth=1 --branch={{ branchname }} https://git.arvados.org/arvados.git ~/arvados
h3(#install-ansible). Install Ansible

We provide an Ansible playbook that can run on a Debian or Ubuntu system to set up a node with all the software and configuration necessary to do Arvados compute work.

"Install Ansible following their instructions":https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html. Our playbooks are tested with Ansible 8.7 and require at least Ansible 8. If you install Ansible with @pipx@ or @pip@, you can request this version by running one of:
~$ pipx install "ansible~=8.7"
# or, equivalently, with pip:
~$ pip install "ansible~=8.7"
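If you prefer not to touch your system environment, installing into a virtualenv also works; for example (the @~/ansible@ path is just an illustration):

~$ python3 -m venv ~/ansible
~$ ~/ansible/bin/pip install "ansible~=8.7"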
It is okay to install Ansible in a nonstandard location like this; later steps let you point the rest of the automation at it.

{% include 'notebox_begin_warning' %}
Ubuntu 20.04 "focal" includes Python 3.8, which is too old to run Ansible 8 and our Ansible playbooks. If your cluster runs Ubuntu 20.04, you will need to build the compute node image on a system with a newer version of Python. The system where you build the image only needs to be able to communicate with your cloud provider; it does not need to be part of the Arvados cluster or have any Arvados client tools installed. Your Arvados cluster, and the compute node image you build, can still all be based on Ubuntu 20.04.
{% include 'notebox_end' %}

h3(#install-packer). Install Packer and the Ansible plugin

We provide Packer templates that can automatically create a compute instance, configure it with Ansible, shut it down, and create a cloud image from the result. "Install Packer following their instructions":https://developer.hashicorp.com/packer/docs/install. After you do, install Packer's Ansible provisioner by running:
~$ packer plugins install github.com/hashicorp/ansible
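If you want to double-check the result, Packer can list the plugins it has installed; the Ansible plugin should appear in the output:

~$ packer plugins installed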
h2(#building). Fully automated build with Packer and Ansible

After you have both tools installed, you can configure them with information about your Arvados cluster and cloud environment, then run a fully automated build.

h3(#ansible-variables). Write Ansible settings for the compute node

In the @tools/compute-images@ directory of your Arvados source checkout, copy @host_config.example.yml@ to @host_config.yml@. Edit @host_config.yml@ with information about how your compute nodes should be set up, following the instructions in the comments.

h3(#packer-variables). Set up Packer for your cloud

The configuration you provide to Packer depends on which cloud you're deploying Arvados in.

h4(#aws-variables). AWS

Install Packer's AWS builder by running:
~$ packer plugins install github.com/hashicorp/amazon
In the @tools/compute-images@ directory of your Arvados source checkout, copy @aws_config.example.json@ to @aws_config.json@. Fill in values for the configuration settings as follows:

* If you already have AWS credentials configured that Packer can use to create and manage an EC2 instance, set @aws_profile@ to the name of those credentials in your configuration. Otherwise, set @aws_access_key@ and @aws_secret_key@ with information from an API token with those permissions.
* Set @aws_region@, @vpc_id@, and @subnet_id@ with identifiers for the network where Packer should create the EC2 instance.
* Set @aws_source_ami@ to the AMI of the base image that should be booted and used as the base for your compute node image. Set @ssh_user@ to the name of the administrator account on that image.
* Set @aws_volume_gb@ to the size of the image you want to create in GB. The default of 20 should be sufficient for most installs. You may increase this if you're using a custom source AMI with more software pre-installed.
* Set @arvados_cluster@ to the same five-character alphanumeric identifier used under @Clusters@ in your Arvados cluster configuration.
* If you installed Ansible in a nonstandard location, set @ansible_command@ to the absolute path of @ansible-playbook@. For example, if you installed Ansible in a virtualenv at @~/ansible@, set @ansible_command@ to {% raw %}"{{env `HOME`}}/ansible/bin/ansible-playbook"{% endraw %}.

When you finish writing your configuration, "run Packer":#run-packer.

h4(#azure-variables). Azure

{% comment %}
FIXME: Incomplete
{% endcomment %}

Install Packer's Azure builder by running:
~$ packer plugins install github.com/hashicorp/azure
In the @tools/compute-images@ directory of your Arvados source checkout, copy @azure_config.example.json@ to @azure_config.json@. Fill in values for the configuration settings as follows:

* The settings load credentials from Azure's standard environment variables. As long as you have these environment variables set in the shell before you run Packer, they will be loaded as normal. Alternatively, you can set them directly in the configuration file. These secrets can be generated from the Azure portal, or with the CLI using a command like:
~$ az ad sp create-for-rbac --name Packer --password ...
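The command prints the new service principal's credentials as JSON; the @appId@, @password@, and @tenant@ values are the secrets the Packer settings need. The output is similar to this (values elided):

{
  "appId": "[...]",
  "displayName": "Packer",
  "password": "[...]",
  "tenant": "[...]"
}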
* Set @location@ and @resource_group@ with identifiers for where Packer should create the cloud instance.
* Set @image_sku@ to the identifier of the base image that should be booted and used as the base for your compute node image. Set @ssh_user@ to the name of the administrator account you want to use on that image.
* Set @ssh_private_key_file@ to the path of the private key you generated earlier for the dispatcher to use, e.g., {% raw %}"{{env `HOME`}}/.ssh/id_dispatcher"{% endraw %}.
* Set @arvados_cluster@ to the same five-character alphanumeric identifier used under @Clusters@ in your Arvados cluster configuration.
* If you installed Ansible in a nonstandard location, set @ansible_command@ to the absolute path of @ansible-playbook@. For example, if you installed Ansible in a virtualenv at @~/ansible@, set @ansible_command@ to {% raw %}"{{env `HOME`}}/ansible/bin/ansible-playbook"{% endraw %}.

When you finish writing your configuration, "run Packer":#run-packer.

h3(#run-packer). Run Packer

In the @tools/compute-images@ directory of your Arvados source checkout, run Packer with your configuration and the template appropriate for your cloud. For example, to build an image on AWS, run:
arvados/tools/compute-images$ packer build -var-file=aws_config.json aws_template.json
To build an image on Azure, replace both instances of *@aws@* with *@azure@* and run that command.

{% include 'notebox_begin_warning' %}
If @packer build@ fails early with @ok=0@, @changed=0@, @failed=1@, and a message like this:
TASK [Gathering Facts] *********************************************************
fatal: [default]: FAILED! => {"msg": "failed to transfer file to /home/you/.ansible/tmp/ansible-local-1821271ym6nh1cw/tmp2kyfkhy4 /home/admin/.ansible/tmp/ansible-tmp-1732380360.0917368-1821275-172216075852170/AnsiballZ_setup.py:\n\n"}

PLAY RECAP *********************************************************************
default : ok=0  changed=0  unreachable=0  failed=1  skipped=0  rescued=0  ignored=0
This might mean the version of @scp@ on your computer is trying to use new protocol features that don't work with the older SSH server on the cloud image. You can work around this by telling Ansible to pass the @-O@ flag to @scp@, which forces the legacy SCP protocol:
$ export ANSIBLE_SCP_EXTRA_ARGS="'-O'"
Then rerun your full @packer build@ command from the same shell.
{% include 'notebox_end' %}

If the build succeeds, it will report the identifier of your image at the end of the process. For example, when you build an AWS image, it will look like this:
==> Builds finished. The artifacts of successful builds are:
--> amazon-ebs: AMIs were created:
us-east-1: ami-012345abcdef56789
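For reference, a minimal sketch of where that identifier goes in the cluster configuration (@xxxxx@ stands for your cluster ID, and the AMI shown is the placeholder from the output above):

Clusters:
  xxxxx:
    Containers:
      CloudVMs:
        ImageID: ami-012345abcdef56789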
Set that identifier as @CloudVMs.ImageID@ in your cluster configuration, as sketched above. You do not need to run any other compute node build process on this page; continue to "installing the cloud dispatcher":install-dispatch-cloud.html.

h2(#ansible-build). Partially automated build with Ansible

If Arvados does not include a template for your cloud, or you do not have permission to run Packer, you can run the Ansible playbook by itself. This sets up a base Debian or Ubuntu system with all the software and configuration necessary to do Arvados compute work. After it's done, you can manually snapshot the node and create a cloud image from it.

h3(#ansible-variables-standalone). Write Ansible settings for the compute node

In the @tools/compute-images@ directory of your Arvados source checkout, copy @host_config.example.yml@ to @host_config.yml@. Edit @host_config.yml@ with information about how your compute nodes should be set up, following the instructions in the comments. Note that you *must set* @arvados_cluster_id@ in this file since you are not running Packer.

h3(#ansible-inventory). Write an Ansible inventory

The compute node playbook runs on a host named @default@. In the @tools/compute-images@ directory of your Arvados source checkout, write a file named @inventory.ini@ that tells Ansible how to connect to this node via SSH. It should be a single host entry like this:
# Example inventory.ini for an Arvados compute node
default ansible_host=192.0.2.9 ansible_user=admin
* @ansible_host@ can be the running node's hostname or IP address. You need to be able to reach this host from the system where you run Ansible.
* @ansible_user@ names the user account that Ansible should use for the SSH connection. It needs permission to use @sudo@ on the running node.

You can add other Ansible configuration options like @ansible_port@ to your inventory if needed. Refer to the "Ansible inventory documentation":https://docs.ansible.com/ansible/latest/inventory_guide/intro_inventory.html for details.

h3(#run-ansible). Run Ansible

If you installed Ansible inside a virtualenv, activate that virtualenv now. Then, in the @tools/compute-images@ directory of your Arvados source checkout, run @ansible-playbook@ with your inventory and configuration:
arvados/tools/compute-images$ ansible-playbook --ask-become-pass --inventory=inventory.ini --extra-vars=@host_config.yml ../ansible/build-compute-image.yml
You'll be prompted with @BECOME password:@. Enter the password for the @ansible_user@ you defined in the inventory; Ansible uses it to run @sudo@ on the node.

{% include 'notebox_begin_warning' %}
If @ansible-playbook@ fails early with @ok=0@, @changed=0@, @failed=1@, and a message like this:
TASK [Gathering Facts] *********************************************************
fatal: [default]: FAILED! => {"msg": "failed to transfer file to /home/you/.ansible/tmp/ansible-local-1821271ym6nh1cw/tmp2kyfkhy4 /home/admin/.ansible/tmp/ansible-tmp-1732380360.0917368-1821275-172216075852170/AnsiballZ_setup.py:\n\n"}

PLAY RECAP *********************************************************************
default : ok=0  changed=0  unreachable=0  failed=1  skipped=0  rescued=0  ignored=0
This might mean the version of @scp@ on your computer is trying to use new protocol features that don't work with the older SSH server on the cloud image. You can work around this by telling Ansible to pass the @-O@ flag to @scp@, which forces the legacy SCP protocol:
$ export ANSIBLE_SCP_EXTRA_ARGS="'-O'"
Then rerun your full @ansible-playbook@ command from the same shell.
{% include 'notebox_end' %}

If it succeeds, Ansible should report a "PLAY RECAP" with @failed=0@:
PLAY RECAP *********************************************************************
default : ok=41  changed=37  unreachable=0  failed=0  skipped=5  rescued=0  ignored=0
Your node is now ready to run Arvados compute work. You can snapshot the node, create an image from it, and set that image as @CloudVMs.ImageID@ in your Arvados cluster configuration. The details of that process are cloud-specific and out of scope for this documentation. You do not need to run any other compute node build process on this page; continue to "installing the cloud dispatcher":install-dispatch-cloud.html.

h2(#requirements). Manual build

If you cannot run Ansible, you can create a cloud instance, manually set it up to be a compute node, and then create an image from it. The details of this process depend on which distribution you use on the cloud instance and which cloud you use; all these variations are out of scope for this documentation. These are the requirements:

* Except on Azure, the SSH public key you generated previously must be an authorized key for the user that Crunch is configured to use. For example, if your cluster's @CloudVMs.DriverParameters.AdminUsername@ setting is *@crunch@*, then the dispatcher's public key should be listed in @~crunch/.ssh/authorized_keys@ in the image. This user must also be allowed to use @sudo@ without a password unless the user is @root@. (On Azure, the dispatcher makes additional calls to automatically set up and authorize the user, making these steps unnecessary.) A short sketch of this setup appears after this list.
* SSH must be running and reachable by @arvados-dispatch-cloud@ on the port named by @CloudVMs.SSHPort@ in your cluster's configuration file (default 22).
* Install the @python3-arvados-fuse@ package. Enable the @user_allow_other@ option in @/etc/fuse.conf@.
* Install either "Docker":https://docs.docker.com/engine/install/ or "Singularity":https://docs.sylabs.io/guides/3.0/user-guide/installation.html as appropriate, based on the @Containers.RuntimeEngine@ setting in your cluster's configuration file. If you install Docker, you may also want to install and set up the @arvados-docker-cleaner@ package to conserve space on long-running instances, but it's not strictly required.
* All available scratch space should be made available under @/tmp@.
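As an illustrative sketch of the key and FUSE requirements on a Debian-based instance (not the only way to do it; the @crunch@ user name is carried over from the example above, and these commands assume you have copied @id_dispatcher.pub@ to the instance and already configured the Arvados package repository):

# Authorize the dispatcher's SSH key for the crunch user:
$ sudo install -d -m 700 -o crunch -g crunch ~crunch/.ssh
$ sudo install -m 600 -o crunch -g crunch id_dispatcher.pub ~crunch/.ssh/authorized_keys
# Allow the crunch user to use sudo without a password:
$ echo 'crunch ALL=(ALL) NOPASSWD: ALL' | sudo tee /etc/sudoers.d/91-crunch
# Install the FUSE driver and allow FUSE mounts to be shared:
$ sudo apt-get install python3-arvados-fuse
$ echo user_allow_other | sudo tee -a /etc/fuse.conf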