---
layout: default
navsection: installguide
title: Planning and prerequisites
...
{% comment %}
Copyright (C) The Arvados Authors. All rights reserved.

SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}

Before attempting installation, you should begin by reviewing supported platforms, choosing backends for identity, storage, and scheduling, and deciding how you will distribute Arvados services onto machines. You should also choose an Arvados Cluster ID, choose your hostnames, and acquire TLS certificates. It may be helpful to make notes as you go along using one of these worksheets: "New cluster checklist for AWS":new_cluster_checklist_AWS.xlsx - "New cluster checklist for Azure":new_cluster_checklist_Azure.xlsx - "New cluster checklist for on premises Slurm":new_cluster_checklist_slurm.xlsx

The installation guide describes how to set up a basic standalone Arvados instance. Additional configuration for features including "federation,":{{site.baseurl}}/admin/federation.html "collection versioning,":{{site.baseurl}}/admin/collection-versioning.html "managed properties,":{{site.baseurl}}/admin/collection-managed-properties.html and "storage classes":{{site.baseurl}}/admin/collection-managed-properties.html are described in the "Admin guide.":{{site.baseurl}}/admin

The Arvados storage subsystem is called "keep". The compute subsystem is called "crunch".

# "Supported GNU/Linux distributions":#supportedlinux
# "Choosing which components to install":#components
# "Identity provider":#identity
# "Storage backend (Keep)":#storage
# "Container compute scheduler (Crunch)":#scheduler
# "Hardware or virtual machines":#machines
# "Arvados Cluster ID":#clusterid
# "DNS and TLS":#dnstls

h2(#supportedlinux). Supported GNU/Linux distributions

{% include 'supportedlinux' %}

h2(#components). Choosing which components to install

Arvados consists of many components, some of which may be omitted (at the cost of reduced functionality).
It may also be helpful to review the "Arvados Architecture":{{site.baseurl}}/architecture to understand how these components interact.

table(table table-bordered table-condensed).
|\3=. *Core*|
|"PostgreSQL database":install-postgresql.html |Stores data for the API server.|Required.|
|"API server + Controller":install-api-server.html |Core Arvados logic for managing users, groups, collections, containers, and enforcing permissions.|Required.|
|\3=. *Keep (storage)*|
|"Keepstore":install-keepstore.html |Stores content-addressed blocks in a variety of backends (local filesystem, cloud object storage).|Required.|
|"Keepproxy":install-keepproxy.html |Gateway service to access keep servers from external networks.|Required to be able to use arv-put, arv-get, or arv-mount outside the private Arvados network.|
|"Keep-web":install-keep-web.html |Gateway service providing read/write HTTP and WebDAV support on top of Keep.|Required to access files from Workbench.|
|"Keep-balance":install-keep-balance.html |Storage cluster maintenance daemon responsible for moving blocks to their optimal server location, adjusting block replication levels, and trashing unreferenced blocks.|Required to free deleted data from underlying storage, and to ensure proper replication and block distribution (including support for storage classes).|
|\3=. *User interface*|
|"Workbench":install-workbench-app.html, "Workbench2":install-workbench2-app.html |Primary graphical user interface for working with file collections and running containers.|Optional. Depends on API server, keep-web, websockets server.|
|"Workflow Composer":install-composer.html |Graphical user interface for editing Common Workflow Language workflows.|Optional. Depends on git server (arvados-git-httpd).|
|\3=. *Additional services*|
|"Websockets server":install-ws.html |Event distribution server.|Required to view streaming container logs in Workbench.|
|"Shell server":install-shell-server.html |Synchronizes (creates/deletes/configures) Unix shell accounts with Arvados users.|Optional.|
|"Git server":install-arv-git-httpd.html |Arvados-hosted git repositories, with Arvados-token based authentication.|Optional, but required by Workflow Composer.|
|\3=. *Crunch (running containers)*|
|"arvados-dispatch-cloud":crunch2-cloud/install-dispatch-cloud.html |Allocates and frees cloud VM instances on demand based on workload.|Optional, not needed for a static Slurm cluster such as on-premises HPC.|
|"crunch-dispatch-slurm":crunch2-slurm/install-dispatch.html |Runs analysis workflows using Docker or Singularity containers distributed across a Slurm cluster.|Optional, not needed for a Cloud installation, or if you wish to use Arvados for data management only.|
|"crunch-dispatch-lsf":crunch2-lsf/install-dispatch.html |Runs analysis workflows using Docker or Singularity containers distributed across an LSF cluster.|Optional, not needed for a Cloud installation, or if you wish to use Arvados for data management only.|

h2(#identity). Identity provider

Choose which backend you will use to authenticate users.

* Google login to authenticate users with a Google account.
* OpenID Connect (OIDC) if you have a Single-Sign-On (SSO) service that supports the OpenID Connect standard.
* LDAP login to authenticate users by username/password using the LDAP protocol, supported by many services such as OpenLDAP and Active Directory.
* PAM login to authenticate users by username/password according to the PAM configuration on the controller node.

h2(#postgresql). PostgreSQL

Arvados works well with a standalone PostgreSQL installation. When deploying on AWS, Aurora RDS also works, but Aurora Serverless is not recommended.

h2(#storage). Storage backend

Choose which backend you will use for storing and retrieving content-addressed Keep blocks.

* File system storage, such as ext4 or xfs, or network file systems such as GPFS or Lustre.
* Amazon S3, or other object storage that supports the S3 API, including Google Cloud Storage and Ceph.
* Azure blob storage.

You should also determine the desired replication factor for your data. A replication factor of 1 means only a single copy of a given data block is kept. With a conventional file system backend and a replication factor of 1, a single hard drive failure is likely to cause data loss. For this reason the default replication factor is 2 (two copies are kept).

A backend may have its own replication factor (such as durability guarantees of cloud buckets) and Arvados will take this into account when writing a new data block.

h2(#scheduler). Container compute scheduler

Choose which backend you will use to schedule computation.

* On AWS EC2 and Azure, you probably want to use @arvados-dispatch-cloud@ to manage the full lifecycle of cloud compute nodes: starting up nodes sized to the container request, executing containers on those nodes, and shutting nodes down when no longer needed.
* For on-premises HPC clusters using "Slurm":https://slurm.schedmd.com/ use @crunch-dispatch-slurm@ to execute containers with Slurm job submissions.
* For on-premises HPC clusters using "LSF":https://www.ibm.com/products/hpc-workload-management/ use @crunch-dispatch-lsf@ to execute containers with LSF job submissions.
* For single node demos, use @crunch-dispatch-local@ to execute containers directly.

h2(#machines). Hardware (or virtual machines)

Choose how to allocate Arvados services to machines. We recommend that each machine start with a clean installation of a supported GNU/Linux distribution.

For a production installation, this is a reasonable starting point:
~$ tr -dc 0-9a-z </dev/urandom | head -c5; echo
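
The command above draws five random characters from the set @0-9a-z@. As a minimal sketch, it can be wrapped in a function and paired with a format check. This assumes a cluster ID is exactly five lowercase alphanumeric characters, which is inferred from the command itself. The function names @gen_cluster_id@ and @is_valid_cluster_id@ are hypothetical and are not part of Arvados.

```shell
# Hypothetical helper (not part of Arvados): print one candidate ID using the
# same pipeline as the command above. LC_ALL=C makes tr treat the random
# input as raw bytes, which some tr implementations require.
gen_cluster_id() {
  LC_ALL=C tr -dc 0-9a-z </dev/urandom | head -c5
  echo
}

# Hypothetical check, assuming an ID is exactly five characters from 0-9a-z.
# The body runs in a subshell so the locale change and exit stay local.
is_valid_cluster_id() (
  LC_ALL=C
  [ "${#1}" -eq 5 ] || exit 1
  case "$1" in
    *[!0-9a-z]*) exit 1 ;;   # any character outside 0-9a-z is rejected
  esac
  exit 0
)
```

One possible use is to validate a hand-picked ID before writing it into your notes, for example @is_valid_cluster_id "x1a2b" && echo "format ok"@. The check only covers the format; it cannot tell whether an ID is already used by another cluster.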