title: Balancing Keep servers
Copyright (C) The Arvados Authors. All rights reserved.
SPDX-License-Identifier: CC-BY-SA-3.0
This page describes how to balance keepstore servers using keep-balance. Keep-balance creates new copies of under-replicated blocks, deletes excess copies of over-replicated and unreferenced blocks, and moves blocks to better positions (e.g. after adding new keepstore servers) so clients find them faster.
See "the Keep-balance install docs":{{site.baseurl}}/install/install-keep-balance.html for installation instructions.
The keep-balance service determines which blocks are candidates for deletion and instructs keepstore to move those blocks to the trash. A newly written block is protected from deletion for the duration specified by @BlobSigningTTL@. During this time, it cannot be trashed or deleted.
If keep-balance instructs keepstore to trash a block which is older than @BlobSigningTTL@, and @BlobTrashLifetime@ is non-zero, the block will be moved to "trash". A block which is in the trash is no longer accessible by read requests, but has not yet been permanently deleted. Blocks which are in the trash may be recovered using the "untrash" API endpoint. Blocks are permanently deleted after they have been in the trash for the duration in @BlobTrashLifetime@.
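The lifecycle described above can be summarized as a small state function. This is only an illustrative sketch, not keepstore's implementation: the function name @block_state@, its parameters, and the state labels are hypothetical, and it assumes a zero @BlobTrashLifetime@ means the trash stage is skipped.

```python
from datetime import datetime, timedelta
from typing import Optional


def block_state(
    now: datetime,
    written_at: datetime,
    trashed_at: Optional[datetime],
    blob_signing_ttl: timedelta,
    blob_trash_lifetime: timedelta,
) -> str:
    """Classify one block copy (hypothetical helper, for illustration only)."""
    if trashed_at is not None:
        # The block has been trashed: unreadable, but recoverable via
        # "untrash" until it has spent BlobTrashLifetime in the trash.
        # Assumption: a zero lifetime makes this branch report "deleted"
        # immediately.
        if now - trashed_at >= blob_trash_lifetime:
            return "deleted"
        return "in trash"
    # Not trashed: a block younger than BlobSigningTTL cannot be trashed.
    if now - written_at < blob_signing_ttl:
        return "protected"
    return "trashable"
```

For example, with both durations set to 14 days, a block written one day ago is "protected", and a block trashed 15 days ago is "deleted".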
Keep-balance is also responsible for balancing the distribution of blocks across keepstore servers by asking servers to pull blocks from other servers (as determined by their "storage class":{{site.baseurl}}/admin/storage-classes.html and "rendezvous hashing order":{{site.baseurl}}/architecture/keep-clients.html#rendezvous). Pulling a block makes a copy. If a block is over-replicated (i.e. there are excess copies) after pulling, it will subsequently be trashed and deleted on the original server, subject to the @BlobTrash@ and @BlobTrashLifetime@ settings.
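To give a feel for why rendezvous hashing yields a stable "best position" for each block, here is a generic sketch of the technique. It is not taken from the Keep source: the function @rendezvous_order@, the server identifiers, and the exact weight formula (MD5 of block hash plus server identifier) are assumptions made for illustration, and the real weighting may differ.

```python
import hashlib
from typing import List


def rendezvous_order(block_hash: str, server_ids: List[str]) -> List[str]:
    """Return servers in preference order for one block (generic sketch)."""

    def weight(server_id: str) -> str:
        # Assumed weight: a digest that depends on both the block and the
        # server, so every block gets its own pseudo-random server ranking.
        return hashlib.md5((block_hash + server_id).encode("utf-8")).hexdigest()

    # Highest weight first. The ranking depends only on the block hash and
    # the server identifiers, not on the order the servers are listed in.
    return sorted(server_ids, key=weight, reverse=True)
```

A useful property of this scheme is that adding a server inserts it somewhere in each block's ranking without reordering the existing servers, so only blocks whose top positions now include the new server need to move.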
By default, keep-balance operates periodically: it performs a scan/balance operation, sleeps, and repeats.
The @Collections.BalancePeriod@ value in @/etc/arvados/config.yml@ determines the interval between start times of successive scan/balance operations. If an operation takes longer than @Collections.BalancePeriod@, the next operation will follow it immediately. If SIGUSR1 is received during an idle period between operations, the next operation will start immediately.
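The timing rule above amounts to taking the later of two instants. The following sketch shows that arithmetic under stated assumptions: @next_start@ is a hypothetical helper, times are plain seconds, and the SIGUSR1 shortcut is not modelled.

```python
def next_start(prev_start: float, prev_end: float, balance_period: float) -> float:
    """Start time of the next scan/balance operation, in seconds (sketch)."""
    # Normally the next operation starts one period after the previous
    # operation *started*, not after it finished.
    scheduled = prev_start + balance_period
    # If the previous operation overran the period, the next one follows
    # it immediately.
    return max(scheduled, prev_end)
```

With a 600-second period, an operation that starts at 0 and finishes at 120 is followed by one starting at 600, while an operation that finishes at 900 is followed by one starting at 900.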
Keep-balance can also be run with the @-once@ flag to do a single scan/balance operation and then exit. The exit code will be zero if the operation was successful.
h3. Additional configuration
To tune resource usage and configure lost block reporting, see the @Collections.BlobMissingReport@, @Collections.BalanceCollectionBatch@, and @Collections.BalanceCollectionBuffers@ options in the "default config.yml file":{{site.baseurl}}/admin/config.html.
The @Collections.BalancePullLimit@ and @Collections.BalanceTrashLimit@ configuration entries determine the maximum number of pull and trash operations keep-balance will attempt to apply on each keepstore server. If both values are zero, keep-balance will operate in "dry run" mode, where all changes are computed but none are committed.
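The effect of the two limits on a single keepstore server can be sketched as follows. This is an illustration of the rule as described here, not keep-balance's code: @plan_commit@ and its return shape are hypothetical, and it assumes both limits are non-negative integers.

```python
from typing import Dict, List


def plan_commit(
    pulls: List[str],
    trashes: List[str],
    pull_limit: int,
    trash_limit: int,
) -> Dict[str, object]:
    """Decide which computed changes to send to one server (sketch)."""
    # Both limits zero: "dry run" mode. Changes were computed, but none
    # are committed.
    if pull_limit == 0 and trash_limit == 0:
        return {"dry_run": True, "pulls": [], "trashes": []}
    # Otherwise send at most the configured number of each operation type.
    return {
        "dry_run": False,
        "pulls": pulls[:pull_limit],
        "trashes": trashes[:trash_limit],
    }
```

In this sketch, setting only one limit to zero suppresses that operation type while the other type is still committed up to its limit.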
Keep-balance does not attempt to discover whether committed pull and trash requests are ever carried out; it only checks that they are accepted by the Keep services. If some services are full, new copies of under-replicated blocks might never get made, only repeatedly requested.