- install/install-shell-server.html.textile.liquid
- install/install-webshell.html.textile.liquid
- install/install-arv-git-httpd.html.textile.liquid
- - Containers API (cloud):
+ - Containers API (all):
- install/install-jobs-image.html.textile.liquid
+ - Containers API (cloud):
- install/crunch2-cloud/install-compute-node.html.textile.liquid
- install/crunch2-cloud/install-dispatch-cloud.html.textile.liquid
- - Containers API (slurm):
+ - Compute nodes (Slurm or LSF):
+ - install/crunch2/install-compute-node-docker.html.textile.liquid
+ - install/crunch2/install-compute-node-singularity.html.textile.liquid
+ - Containers API (Slurm):
- install/crunch2-slurm/install-dispatch.html.textile.liquid
- install/crunch2-slurm/configure-slurm.html.textile.liquid
- - install/crunch2-slurm/install-compute-node.html.textile.liquid
- install/crunch2-slurm/install-test.html.textile.liquid
- - Containers API (lsf):
+ - Containers API (LSF):
- install/crunch2-lsf/install-dispatch.html.textile.liquid
- Additional configuration:
- install/singularity.html.textile.liquid
h2. Scheduling parameters
-Parameters to be passed to the container scheduler (e.g., SLURM) when running a container.
+Parameters to be passed to the container scheduler (e.g., Slurm) when running a container.
table(table table-bordered table-condensed).
|_. Key|_. Type|_. Description|_. Notes|
h2(#fuse). Update fuse.conf
-{% include 'notebox_begin_warning' %}
-This is only needed when Containers.RuntimeEngine is set to @docker@, skip this section when running @singularity@.
-{% include 'notebox_end' %}
-
FUSE must be configured with the @user_allow_other@ option enabled for Crunch to set up Keep mounts that are readable by containers. Install this file as @/etc/fuse.conf@:
<notextile>
h2(#docker-cleaner). Update docker-cleaner.json
-{% include 'notebox_begin_warning' %}
-This is only needed when Containers.RuntimeEngine is set to @docker@, skip this section when running @singularity@.
-{% include 'notebox_end' %}
-
The @arvados-docker-cleaner@ program removes least recently used Docker images as needed to keep disk usage below a configured limit.
Create a file @/etc/arvados/docker-cleaner/docker-cleaner.json@ in an editor, with the following contents.
# To submit work, create a "container request":{{site.baseurl}}/api/methods/container_requests.html in the @Committed@ state.
# The system will fufill the container request by creating or reusing a "Container object":{{site.baseurl}}/api/methods/containers.html and assigning it to the @container_uuid@ field. If the same request has been submitted in the past, it may reuse an existing container. The reuse behavior can be suppressed with @use_existing: false@ in the container request.
-# The dispatcher process will notice a new container in @Queued@ state and submit a container executor to the underlying work queuing system (such as SLURM).
+# The dispatcher process will notice a new container in @Queued@ state and submit a container executor to the underlying work queuing system (such as Slurm).
# The container executes. Upon termination the container goes into the @Complete@ state. If the container execution was interrupted or lost due to system failure, it will go into the @Cancelled@ state.
# When the container associated with the container request is completed, the container request will go into the @Final@ state.
# The @output_uuid@ field of the container request contains the uuid of output collection produced by container request.
Priority 1000 is the highest priority.
-The actual order that containers execute is determined by the underlying scheduling software (e.g. SLURM) and may be based on a combination of container priority, submission time, available resources, and other factors.
+The actual order that containers execute is determined by the underlying scheduling software (e.g. Slurm) and may be based on a combination of container priority, submission time, available resources, and other factors.
In the current implementation, the magnitude of difference in priority between two containers affects the weight of priority vs age in determining scheduling order. If two containers have only a small difference in priority (for example, 500 and 501) and the lower priority container has a longer queue time, the lower priority container may be scheduled before the higher priority container. Use a greater magnitude difference (for example, 500 and 600) to give higher weight to priority over queue time.
SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}
-Arvados can be configured to use "Singularity":https://sylabs.io/singularity/ instead of Docker to execute containers on cloud nodes or a SLURM/LSF cluster. Singularity may be preferable due to its simpler installation and lack of long-running daemon process and special system users/groups. See the "Singularity page in the installation guide":{{ site.baseurl }}/install/singularity.html for configuration details.
+Arvados can be configured to use "Singularity":https://sylabs.io/singularity/ instead of Docker to execute containers on cloud nodes or a Slurm/LSF cluster. Singularity may be preferable due to its simpler installation and lack of long-running daemon process and special system users/groups. See the "Singularity page in the installation guide":{{ site.baseurl }}/install/singularity.html for configuration details.
h2. Design overview
Arvados @Singularity@ support is a work in progress. These are the current limitations of the implementation:
* Even when using the Singularity runtime, users' container images are expected to be saved in Docker format. Specifying a @.sif@ file as an image when submitting a container request is not yet supported.
-* Arvados' Singularity implementation does not yet limit the amount of memory available in a container. Each container will have access to all memory on the host where it runs, unless memory use is restricted by SLURM/LSF.
+* Arvados' Singularity implementation does not yet limit the amount of memory available in a container. Each container will have access to all memory on the host where it runs, unless memory use is restricted by Slurm/LSF.
* The Docker ENTRYPOINT instruction is ignored.
* Arvados is tested with Singularity version 3.7.4. Other versions may not work.
{% endcomment %}
{% include 'notebox_begin_warning' %}
-arvados-dispatch-cloud is only relevant for cloud installations. Skip this section if you are installing an on premises cluster that will spool jobs to Slurm.
+@arvados-dispatch-cloud@ is only relevant for cloud installations. Skip this section if you are installing an on premises cluster that will spool jobs to Slurm or LSF.
{% include 'notebox_end' %}
# "Introduction":#introduction
{% endcomment %}
{% include 'notebox_begin_warning' %}
-arvados-dispatch-cloud is only relevant for cloud installations. Skip this section if you are installing an on premises cluster that will spool jobs to Slurm.
+@arvados-dispatch-cloud@ is only relevant for cloud installations. Skip this section if you are installing an on premises cluster that will spool jobs to Slurm or LSF.
{% include 'notebox_end' %}
# "Introduction":#introduction
{% endcomment %}
{% include 'notebox_begin_warning' %}
-arvados-dispatch-lsf is only relevant for on premises clusters that will spool jobs to LSF. Skip this section if you are installing a cloud cluster.
+@arvados-dispatch-lsf@ is only relevant for on premises clusters that will spool jobs to LSF. Skip this section if you use Slurm or if you are installing a cloud cluster.
{% include 'notebox_end' %}
h2(#overview). Overview
In order to run containers, you must choose a user that has permission to set up FUSE mounts and run Singularity/Docker containers on each compute node. This install guide refers to this user as the @crunch@ user. We recommend you create this user on each compute node with the same UID and GID, and add it to the @fuse@ and @docker@ system groups to grant it the necessary permissions. However, you can run the dispatcher under any account with sufficient permissions across the cluster.
-Set up all of your compute nodes "as you would for a SLURM cluster":../crunch2-slurm/install-compute-node.html.
+Set up all of your compute nodes with "Docker":../crunch2/install-compute-node-singularity.html or "Singularity":../crunch2/install-compute-node-docker.html.
*Current limitations*:
* Arvados container priority is not propagated to LSF job priority. This can cause inefficient use of compute resources, and even deadlock if there are fewer compute nodes than concurrent Arvados workflows.
{% endcomment %}
{% include 'notebox_begin_warning' %}
-crunch-dispatch-slurm is only relevant for on premises clusters that will spool jobs to Slurm. Skip this section if you are installing a cloud cluster.
+@crunch-dispatch-slurm@ is only relevant for on premises clusters that will spool jobs to Slurm. Skip this section if you use LSF or if you are installing a cloud cluster.
{% include 'notebox_end' %}
-Containers can be dispatched to a Slurm cluster. The dispatcher sends work to the cluster using Slurm's @sbatch@ command, so it works in a variety of SLURM configurations.
+Containers can be dispatched to a Slurm cluster. The dispatcher sends work to the cluster using Slurm's @sbatch@ command, so it works in a variety of Slurm configurations.
In order to run containers, you must run the dispatcher as a user that has permission to set up FUSE mounts and run Docker containers on each compute node. This install guide refers to this user as the @crunch@ user. We recommend you create this user on each compute node with the same UID and GID, and add it to the @fuse@ and @docker@ system groups to grant it the necessary permissions. However, you can run the dispatcher under any account with sufficient permissions across the cluster.
Whenever you change this file, you will need to update the copy _on every compute node_ as well as the controller node, and then run @sudo scontrol reconfigure@.
-*@ControlMachine@* should be a DNS name that resolves to the Slurm controller (dispatch/API server). This must resolve correctly on all Slurm worker nodes as well as the controller itself. In general SLURM is very sensitive about all of the nodes being able to communicate with the controller _and one another_, all using the same DNS names.
+*@ControlMachine@* should be a DNS name that resolves to the Slurm controller (dispatch/API server). This must resolve correctly on all Slurm worker nodes as well as the controller itself. In general Slurm is very sensitive about all of the nodes being able to communicate with the controller _and one another_, all using the same DNS names.
*@SelectType=select/linear@* is needed on cloud-based installations that update node sizes dynamically, but it can only schedule one container at a time on each node. On a static or homogeneous cluster, use @SelectType=select/cons_res@ with @SelectTypeParameters=CR_CPU_Memory@ instead to enable node sharing.
* In @application.yml@: <code>assign_node_hostname: worker1-%<slot_number>04d</code>
* In @slurm.conf@: <code>NodeName=worker1-[0000-0255]</code>
-If your worker hostnames are already assigned by other means, and the full set of names is known in advance, have your worker node bootstrapping script (see "Installing a compute node":install-compute-node.html) send its current hostname, rather than expect Arvados to assign one.
+If your worker hostnames are already assigned by other means, and the full set of names is known in advance, have your worker node bootstrapping script send its current hostname, rather than expect Arvados to assign one.
* In @application.yml@: <code>assign_node_hostname: false</code>
* In @slurm.conf@: <code>NodeName=alice,bob,clay,darlene</code>
{% endcomment %}
{% include 'notebox_begin_warning' %}
-crunch-dispatch-slurm is only relevant for on premises clusters that will spool jobs to Slurm. Skip this section if you are installing a cloud cluster.
+@crunch-dispatch-slurm@ is only relevant for on premises clusters that will spool jobs to Slurm. Skip this section if you use LSF or if you are installing a cloud cluster.
{% include 'notebox_end' %}
# "Introduction":#introduction
h2(#introduction). Introduction
-This assumes you already have a Slurm cluster, and have "set up all of your compute nodes":install-compute-node.html. Slurm packages are available for CentOS, Debian and Ubuntu. Please see your distribution package repositories. For information on installing Slurm from source, see "this install guide":https://slurm.schedmd.com/quickstart_admin.html
+This assumes you already have a Slurm cluster, and have set up all of your compute nodes with "Docker":../crunch2/install-compute-node-docker.html or "Singularity":../crunch2/install-compute-node-singularity.html. Slurm packages are available for CentOS, Debian and Ubuntu. Please see your distribution package repositories. For information on installing Slurm from source, see "this install guide":https://slurm.schedmd.com/quickstart_admin.html
The Arvados Slurm dispatcher can run on any node that can submit requests to both the Arvados API server and the Slurm controller (via @sbatch@). It is not resource-intensive, so you can run it on the API server node.
h3(#PrioritySpread). Containers.Slurm.PrioritySpread
crunch-dispatch-slurm adjusts the "nice" values of its Slurm jobs to ensure containers are prioritized correctly relative to one another. This option tunes the adjustment mechanism.
-* If non-Arvados jobs run on your Slurm cluster, and your Arvados containers are waiting too long in the Slurm queue because their "nice" values are too high for them to compete with other SLURM jobs, you should use a smaller PrioritySpread value.
+* If non-Arvados jobs run on your Slurm cluster, and your Arvados containers are waiting too long in the Slurm queue because their "nice" values are too high for them to compete with other Slurm jobs, you should use a smaller PrioritySpread value.
* If you have an older Slurm system that limits nice values to 10000, a smaller @PrioritySpread@ can help avoid reaching that limit.
* In other cases, a larger value is beneficial because it reduces the total number of adjustments made by executing @scontrol@.
Some versions of Docker (at least 1.9), when run under systemd, require the cgroup parent to be specified as a systemd slice. This causes an error when specifying a cgroup parent created outside systemd, such as those created by Slurm.
-You can work around this issue by disabling the Docker daemon's systemd integration. This makes it more difficult to manage Docker services with systemd, but Crunch does not require that functionality, and it will be able to use Slurm's cgroups as container parents. To do this, "configure the Docker daemon on all compute nodes":install-compute-node.html#configure_docker_daemon to run with the option @--exec-opt native.cgroupdriver=cgroupfs@.
+You can work around this issue by disabling the Docker daemon's systemd integration. This makes it more difficult to manage Docker services with systemd, but Crunch does not require that functionality, and it will be able to use Slurm's cgroups as container parents. To do this, configure the Docker daemon on all compute nodes to run with the option @--exec-opt native.cgroupdriver=cgroupfs@.
{% include 'notebox_end' %}
{% endcomment %}
{% include 'notebox_begin_warning' %}
-crunch-dispatch-slurm is only relevant for on premises clusters that will spool jobs to Slurm. Skip this section if you are installing a cloud cluster.
+@crunch-dispatch-slurm@ is only relevant for on premises clusters that will spool jobs to Slurm. Skip this section if you use LSF or if you are installing a cloud cluster.
{% include 'notebox_end' %}
h2. Test compute node setup
h2. Test the dispatcher
+Make sure all of your compute nodes are set up with "Docker":../crunch2/install-compute-node-singularity.html or "Singularity":../crunch2/install-compute-node-docker.html.
+
On the dispatch node, start monitoring the crunch-dispatch-slurm logs:
<notextile>
---
layout: default
navsection: installguide
-title: Set up a Slurm compute node
+title: Set up a compute node with Docker
...
{% comment %}
Copyright (C) The Arvados Authors. All rights reserved.
{% endcomment %}
{% include 'notebox_begin_warning' %}
-crunch-dispatch-slurm is only relevant for on premises clusters that will spool jobs to Slurm. Skip this section if you are installing a cloud cluster.
+This page describes the requirements for a compute node in a Slurm or LSF cluster that will run containers dispatched by @crunch-dispatch-slurm@ or @arvados-dispatch-lsf@. If you are installing a cloud cluster, refer to "Build a cloud compute node image":/install/crunch2-cloud/install-compute-node.html.
+{% include 'notebox_end' %}
+
+{% include 'notebox_begin_warning' %}
+These instructions apply when Containers.RuntimeEngine is set to @docker@, refer to "Set up a compute node with Singularity":install-compute-node-singularity.html when running @singularity@.
{% include 'notebox_end' %}
# "Introduction":#introduction
# "Set up Docker":#docker
# "Update fuse.conf":#fuse
# "Update docker-cleaner.json":#docker-cleaner
-# "Configure Linux cgroups accounting":#cgroups
-# "Install Docker":#install_docker
-# "Configure the Docker daemon":#configure_docker_daemon
# "Install'python-arvados-fuse and crunch-run and arvados-docker-cleaner":#install-packages
h2(#introduction). Introduction
--- /dev/null
+---
+layout: default
+navsection: installguide
+title: Set up a compute node with Singularity
+...
+{% comment %}
+Copyright (C) The Arvados Authors. All rights reserved.
+
+SPDX-License-Identifier: CC-BY-SA-3.0
+{% endcomment %}
+
+{% include 'notebox_begin_warning' %}
+This page describes the requirements for a compute node in a Slurm or LSF cluster that will run containers dispatched by @crunch-dispatch-slurm@ or @arvados-dispatch-lsf@. If you are installing a cloud cluster, refer to "Build a cloud compute node image":/install/crunch2-cloud/install-compute-node.html.
+{% include 'notebox_end' %}
+
+{% include 'notebox_begin_warning' %}
+These instructions apply when Containers.RuntimeEngine is set to @singularity@, refer to "Set up a Slurm compute node with Docker":install-compute-node-docker.html when running @docker@.
+{% include 'notebox_end' %}
+
+# "Introduction":#introduction
+# "Set up Singularity":#singularity
+# "Update fuse.conf":#fuse
+# "Install'python-arvados-fuse and crunch-run":#install-packages
+
+h2(#introduction). Introduction
+
+This page describes how to configure a compute node so that it can be used to run containers dispatched by Arvados, with Slurm on a static cluster. These steps must be performed on every compute node.
+
+h2(#singularity). Set up Singularity
+
+See "Singularity container runtime":../singularity.html
+
+{% assign arvados_component = 'python-arvados-fuse crunch-run' %}
+
+{% include 'install_packages' %}
h2(#configuration). Configuration
-To use singularity, first make sure "Singularity is installed":https://sylabs.io/guides/3.7/user-guide/quick_start.html on your cloud worker image or SLURM/LSF compute nodes as applicable. Note @squashfs-tools@ is required.
+To use singularity, first make sure "Singularity is installed":https://sylabs.io/guides/3.7/user-guide/quick_start.html on your cloud worker image or Slurm/LSF compute nodes as applicable. Note @squashfs-tools@ is required.
<notextile>
<pre><code>$ <span class="userinput">singularity version</span>
SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}
-This section documents language bindings for the "Arvados API":{{site.baseurl}}/api and Keep that are available for various programming languages. Not all features are available in every SDK. The most complete SDK is the Python SDK. Note that this section only gives a high level overview of each SDK. Consult the "Arvados API":{{site.baseurl}}/api section for detailed documentation about Arvados API calls available on each resource.
+This section documents language bindings for the "Arvados API":{{site.baseurl}}/api/index.html and Keep that are available for various programming languages. Not all features are available in every SDK. The most complete SDK is the Python SDK. Note that this section only gives a high level overview of each SDK. Consult the "Arvados API":{{site.baseurl}}/api/index.html section for detailed documentation about Arvados API calls available on each resource.
* "Python SDK":{{site.baseurl}}/sdk/python/sdk-python.html (also includes essential command line tools such as "arv-put" and "arv-get")
* "Command line SDK":{{site.baseurl}}/sdk/cli/install.html ("arv")