From: Tom Clegg
Date: Thu, 23 Jun 2022 04:56:31 +0000 (-0400)
Subject: 19166: Explain HPC container shell in architecture docs.
X-Git-Tag: 2.5.0~106^2~13
X-Git-Url: https://git.arvados.org/arvados.git/commitdiff_plain/9587429b4ee56fe9a1ca3555ecebd04e0dae929d?hp=c4bae86d39f237df8ac6a5505323f6a93011514a

19166: Explain HPC container shell in architecture docs.

Arvados-DCO-1.1-Signed-off-by: Tom Clegg
---

diff --git a/doc/_config.yml b/doc/_config.yml
index 7c5e6d986e..2f31336185 100644
--- a/doc/_config.yml
+++ b/doc/_config.yml
@@ -161,6 +161,7 @@ navbar:
     - Computation with Crunch:
       - api/execution.html.textile.liquid
       - architecture/dispatchcloud.html.textile.liquid
+      - architecture/hpc.html.textile.liquid
       - architecture/singularity.html.textile.liquid
     - Other:
       - api/permission-model.html.textile.liquid
diff --git a/doc/architecture/hpc.html.textile.liquid b/doc/architecture/hpc.html.textile.liquid
new file mode 100644
index 0000000000..03a464971e
--- /dev/null
+++ b/doc/architecture/hpc.html.textile.liquid
@@ -0,0 +1,29 @@
+---
+layout: default
+navsection: architecture
+title: Dispatching containers to HPC
+...
+{% comment %}
+Copyright (C) The Arvados Authors. All rights reserved.
+
+SPDX-License-Identifier: CC-BY-SA-3.0
+{% endcomment %}
+
+Arvados can be configured to run containers on an HPC cluster using Slurm or LSF, as an alternative to "dispatching to cloud VMs":dispatchcloud.html.
+
+In this configuration, the appropriate Arvados dispatcher service -- @crunch-dispatch-slurm@ or @arvados-dispatch-lsf@ -- picks up each container as it appears in the Arvados queue and submits a short shell script as a batch job to the HPC job queue. The shell script executes the @crunch-run@ container supervisor which retrieves the container specification from the Arvados controller, starts an arv-mount process, runs the container using @docker exec@ or @singularity exec@, and sends updates (logs, outputs, exit code, etc.) back to the Arvados controller.
+
+h2. Container communication channel (reverse https tunnel)
+
+The crunch-run program runs a gateway server to facilitate the “container shell” feature. However, depending on the site's network topology, the Arvados controller may not be able to connect directly to the compute node where a given crunch-run process is running.
+
+Instead, in the HPC configuration, crunch-run connects to the Arvados controller at startup and sets up a multiplexed tunnel, allowing the controller process to connect to crunch-run's gateway server without initiating a connection to the compute node, or even knowing the compute node's IP address.
+
+This means that when a client requests a container shell connection, the traffic goes through two or three servers:
+# The client connects to a controller host C1.
+# If the multiplexed tunnel is connected to a different controller host C2, then C1 proxies the incoming request to C2, using C2's InternalURL.
+# The controller host (C1 or C2) uses the multiplexed tunnel to connect to crunch-run's container gateway.
+
+h2. Scaling
+
+The @API.MaxConcurrentRequests@ configuration should not be set too low, or the long-lived tunnel connections can starve other clients.