doc/architecture/hpc.html.textile.liquid

---
layout: default
navsection: architecture
title: Dispatching containers to HPC
...
{% comment %}
Copyright (C) The Arvados Authors. All rights reserved.

SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}

Arvados can be configured to run containers on an HPC cluster using Slurm or LSF, as an alternative to "dispatching to cloud VMs":dispatchcloud.html.

In this configuration, the appropriate Arvados dispatcher service -- @crunch-dispatch-slurm@ or @arvados-dispatch-lsf@ -- picks up each container as it appears in the Arvados queue and submits a short shell script as a batch job to the HPC job queue. The shell script executes the @crunch-run@ container supervisor, which retrieves the container specification from the Arvados controller, starts an @arv-mount@ process, runs the container using @docker exec@ or @singularity exec@, and sends updates (logs, outputs, exit code, etc.) back to the Arvados controller.
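
As a rough illustration of the dispatch step described above, the following Python sketch builds the kind of short shell script a dispatcher might submit for each queued container. It is not the actual dispatcher code: the function names, the queue representation, and the exact @crunch-run@ command line are assumptions made for illustration, and the submit step is a caller-supplied callable rather than a real call to Slurm or LSF.

```python
import shlex
from typing import Callable, Iterable, List


def build_batch_script(container_uuid: str) -> str:
    """Return a short shell script that starts crunch-run for one container.

    Assumption: crunch-run is on PATH on the compute node and accepts the
    container UUID as its argument. The real dispatchers may pass more options.
    """
    return "#!/bin/sh\nexec crunch-run {}\n".format(shlex.quote(container_uuid))


def dispatch_queued(
    queued_uuids: Iterable[str],
    submit: Callable[[str], None],
) -> List[str]:
    """Submit one batch script per queued container; return the UUIDs submitted.

    `submit` is a hypothetical hook standing in for the HPC scheduler's
    job-submission command.
    """
    submitted = []
    for uuid in queued_uuids:
        submit(build_batch_script(uuid))
        submitted.append(uuid)
    return submitted
```

Passing a function that appends to a list as @submit@ makes it possible to inspect the generated scripts without a scheduler.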

h2. Container communication channel (reverse HTTPS tunnel)

The @crunch-run@ program runs a gateway server to facilitate the “container shell” feature. However, depending on the site's network topology, the Arvados controller may not be able to connect directly to the compute node where a given @crunch-run@ process is running.

Instead, in the HPC configuration, @crunch-run@ connects to the Arvados controller at startup and sets up a multiplexed tunnel, allowing the controller process to connect to the @crunch-run@ gateway server without initiating a connection to the compute node, or even knowing the compute node's IP address.
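
The reverse-connection idea can be sketched in a few lines of Python. This is only a loopback demonstration under simplifying assumptions: it uses a single plain TCP connection rather than a multiplexed HTTPS tunnel, and the function names and message format are invented for illustration. What it shows is that the listening side (standing in for the controller) never dials the other side (standing in for @crunch-run@), yet can still send it a request.

```python
import socket
import threading


def _read_all(sock: socket.socket) -> bytes:
    """Read from the socket until the peer closes its sending side."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def run_reverse_tunnel_demo(request: bytes = b"ping") -> bytes:
    """Return the reply the controller side gets over a worker-initiated connection."""
    # The "controller" only listens; it never connects to the compute node.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    controller_addr = listener.getsockname()

    def worker() -> None:
        # The "crunch-run" side dials out to the controller at startup,
        # then answers a request that arrives over that same connection.
        with socket.create_connection(controller_addr) as tunnel:
            data = _read_all(tunnel)
            tunnel.sendall(b"gateway reply to " + data)

    thread = threading.Thread(target=worker)
    thread.start()

    # The controller reuses the inbound connection to reach the gateway.
    tunnel, _peer = listener.accept()
    with tunnel:
        tunnel.sendall(request)
        tunnel.shutdown(socket.SHUT_WR)  # signal end of request
        reply = _read_all(tunnel)
    listener.close()
    thread.join()
    return reply
```

A multiplexed tunnel extends this idea by carrying many independent streams over the one outbound connection, so several shell sessions can share it.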

This means that when a client requests a container shell connection, the traffic goes through two or three servers:
# The client connects to a controller host C1.
# If the multiplexed tunnel is connected to a different controller host C2, then C1 proxies the incoming request to C2, using C2's InternalURL.
# The controller host holding the tunnel (C1 or C2) uses the multiplexed tunnel to connect to the @crunch-run@ container gateway.
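
The routing steps above can be restated as a small Python function. This is a sketch for illustration only; the function name and the string labels for hosts are hypothetical and do not correspond to anything in the Arvados code base.

```python
from typing import List


def shell_route(client_controller: str, tunnel_controller: str) -> List[str]:
    """List the servers a container shell connection passes through, in order.

    `client_controller` is the controller host the client connected to (C1);
    `tunnel_controller` is the controller host holding the container's tunnel.
    """
    hops = [client_controller]
    if tunnel_controller != client_controller:
        # C1 does not hold the tunnel, so it proxies the request to C2.
        hops.append(tunnel_controller)
    # The host holding the tunnel reaches the gateway through it.
    hops.append("crunch-run gateway")
    return hops
```

When the client happens to reach the controller host that holds the tunnel, the route has two hops; otherwise it has three.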
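
h2. Scaling

Each running container keeps a long-lived tunnel connection open to the controller. The @API.MaxConcurrentRequests@ configuration should therefore be set comfortably above the number of containers expected to run concurrently; if it is set too low, the tunnel connections can starve other clients.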
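
A back-of-the-envelope check can help when choosing a value. The Python sketch below assumes that each running container holds exactly one tunnel connection and that each tunnel counts as one concurrent request; both are simplifying assumptions, and the function name is hypothetical.

```python
def slots_left_for_clients(max_concurrent_requests: int, running_containers: int) -> int:
    """Estimate request slots not occupied by container tunnels.

    Assumption: one tunnel per running container, and each tunnel
    occupies one slot of the concurrent request limit.
    """
    if max_concurrent_requests < 0 or running_containers < 0:
        raise ValueError("both arguments must be non-negative")
    return max(0, max_concurrent_requests - running_containers)
```

With a hypothetical limit of 64 and 40 running containers, the estimate is 24 slots left for other clients; with a limit of 8, it is 0.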