---
layout: default
navsection: admin
title: Metrics
...

{% comment %}
Copyright (C) The Arvados Authors. All rights reserved.

SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}

Some Arvados services publish Prometheus/OpenMetrics-compatible metrics at @/metrics@, and some provide additional runtime status at @/status.json@. Metrics can help you understand how components perform under load, find performance bottlenecks, and detect and diagnose problems.

To access metrics endpoints, services must be configured with a "management token":management-token.html. When accessing a metrics endpoint, prefix the management token with @"Bearer "@ and supply it in the @Authorization@ request header.
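For scripted monitoring, here is a minimal Python 3 sketch that uses only the standard library. It builds the authenticated request described above and sums Prometheus text-format samples by metric name. The equivalent @curl@ command follows the sketch. The helper names (@build_request@, @fetch@, @fetch_status@, @parse_metric_totals@) are hypothetical and are not part of Arvados. The parser is deliberately simplified: it ignores labels and is not a full OpenMetrics parser.

```python
import json
import urllib.request


def build_request(base_url, token, path):
    """Return a GET request for a management endpoint such as /metrics or /status.json.

    The management token is sent as "Bearer <token>" in the Authorization header.
    """
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": "Bearer " + token},
    )


def fetch(base_url, token, path, timeout=10.0):
    """Fetch an endpoint and return the response body as text."""
    with urllib.request.urlopen(build_request(base_url, token, path), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def fetch_status(base_url, token):
    """Fetch and decode a service's /status.json document."""
    return json.loads(fetch(base_url, token, "/status.json"))


def parse_metric_totals(text):
    """Sum Prometheus text-format samples by metric name, ignoring labels.

    Only meaningful for counters, gauges, and the _sum/_count series of a
    summary; summing a summary's quantile samples gives a meaningless number.
    """
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or a HELP/TYPE comment
        if "{" in line:
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1 :]
        else:
            name, _, rest = line.partition(" ")
        value = float(rest.split()[0])  # an optional timestamp may follow the value
        totals[name] = totals.get(name, 0.0) + value
    return totals
```

As an example, the average keep-web request duration across all methods and status codes is the summary's @_sum@ total divided by its @_count@ total. Check the exact exported metric names, including any prefix, against the endpoint's actual output before relying on them.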
<pre>curl -sfH "Authorization: Bearer your_management_token_goes_here" "https://0.0.0.0:25107/status.json"
</pre>

h2. Keep-web

Keep-web exports metrics at @/metrics@ (for example, @https://collections.zzzzz.arvadosapi.com/metrics@).

table(table table-bordered table-condensed).
|_. Name|_. Type|_. Description|
|request_duration_seconds|summary|elapsed time between receiving a request and sending the last byte of the response body (segmented by HTTP request method and response status code)|
|time_to_status_seconds|summary|elapsed time between receiving a request and sending the HTTP response status code (segmented by HTTP request method and response status code)|

Metrics in the @arvados_keepweb_collectioncache@ namespace report keep-web's internal cache of Arvados collection metadata.

table(table table-bordered table-condensed).
|_. Name|_. Type|_. Description|
|arvados_keepweb_collectioncache_requests|counter|cache lookups|
|arvados_keepweb_collectioncache_api_calls|counter|outgoing API calls|
|arvados_keepweb_collectioncache_permission_hits|counter|collection-to-permission cache hits|
|arvados_keepweb_collectioncache_pdh_hits|counter|UUID-to-PDH cache hits|
|arvados_keepweb_collectioncache_hits|counter|PDH-to-manifest cache hits|
|arvados_keepweb_collectioncache_cached_manifests|gauge|number of collections in the cache|
|arvados_keepweb_collectioncache_cached_manifest_bytes|gauge|memory consumed by cached collection manifests|

h2. Keepstore

Keepstore reports its runtime status at @/status.json@ (for example, @http://keep0.zzzzz.arvadosapi.com:25107/status.json@).

h3. Root

table(table table-bordered table-condensed).
|_. Attribute|_. Type|_. Description|
|Volumes| array of "volumeStatusEnt":#volumeStatusEnt ||
|BufferPool| "PoolStatus":#PoolStatus ||
|PullQueue| "WorkQueueStatus":#WorkQueueStatus ||
|TrashQueue| "WorkQueueStatus":#WorkQueueStatus ||
|RequestsCurrent| int ||
|RequestsMax| int ||
|Version| string ||

h3(#volumeStatusEnt). volumeStatusEnt

table(table table-bordered table-condensed).
|_. Attribute|_. Type|_. Description|
|Label| string||
|Status| "VolumeStatus":#VolumeStatus ||
|VolumeStats| "ioStats":#ioStats ||

h3(#VolumeStatus). VolumeStatus

table(table table-bordered table-condensed).
|_. Attribute|_. Type|_. Description|
|MountPoint| string||
|DeviceNum| uint64||
|BytesFree| uint64||
|BytesUsed| uint64||

h3(#ioStats). ioStats

table(table table-bordered table-condensed).
|_. Attribute|_. Type|_. Description|
|Errors| uint64||
|Ops| uint64||
|CompareOps| uint64||
|GetOps| uint64||
|PutOps| uint64||
|TouchOps| uint64||
|InBytes| uint64||
|OutBytes| uint64||

h3(#PoolStatus). PoolStatus

table(table table-bordered table-condensed).
|_. Attribute|_. Type|_. Description|
|BytesAllocatedCumulative| uint64||
|BuffersMax| int||
|BuffersInUse| int||

h3(#WorkQueueStatus). WorkQueueStatus

table(table table-bordered table-condensed).
|_. Attribute|_. Type|_. Description|
|InProgress| int||
|Queued| int||

h3. Example response
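Here is a minimal Python 3 sketch, using only the standard library, of consuming a response like the one below. It reports per-volume disk usage, buffer pool occupancy, and queue backlogs. The function @summarize_keepstore@ and the keys of the dictionary it returns are hypothetical and are not part of Arvados. The sketch treats @BytesUsed + BytesFree@ as a volume's capacity, which is an assumption. It also skips volumes that omit @Status@.

```python
import json


def summarize_keepstore(raw_json):
    """Summarize a keepstore /status.json document.

    Assumption: BytesUsed + BytesFree is treated as a volume's capacity.
    Volumes without a Status object are skipped.
    """
    status = json.loads(raw_json)
    usage = {}
    for vol in status.get("Volumes") or []:
        st = vol.get("Status")
        if not st:
            continue  # no disk usage reported for this volume
        capacity = st["BytesUsed"] + st["BytesFree"]
        usage[vol["Label"]] = st["BytesUsed"] / capacity if capacity else 0.0

    pool = status.get("BufferPool", {})
    buffers_max = pool.get("BuffersMax", 0)
    return {
        "volume_used_fraction": usage,
        # Fraction of the buffer pool currently handed out to requests.
        "buffers_in_use_fraction": pool.get("BuffersInUse", 0) / buffers_max if buffers_max else 0.0,
        # Work waiting in the pull and trash queues.
        "pull_queued": status.get("PullQueue", {}).get("Queued", 0),
        "trash_queued": status.get("TrashQueue", {}).get("Queued", 0),
        "requests": (status.get("RequestsCurrent", 0), status.get("RequestsMax", 0)),
    }
```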
<pre>{
  "Volumes": [
    {
      "Label": "[UnixVolume /var/lib/arvados/keep0]",
      "Status": {
        "MountPoint": "/var/lib/arvados/keep0",
        "DeviceNum": 65029,
        "BytesFree": 222532972544,
        "BytesUsed": 435456679936
      },
      "InternalStats": {
        "Errors": 0,
        "InBytes": 1111,
        "OutBytes": 0,
        "OpenOps": 1,
        "StatOps": 4,
        "FlockOps": 0,
        "UtimesOps": 0,
        "CreateOps": 0,
        "RenameOps": 0,
        "UnlinkOps": 0,
        "ReaddirOps": 0
      }
    }
  ],
  "BufferPool": {
    "BytesAllocatedCumulative": 67108864,
    "BuffersMax": 20,
    "BuffersInUse": 0
  },
  "PullQueue": {
    "InProgress": 0,
    "Queued": 0
  },
  "TrashQueue": {
    "InProgress": 0,
    "Queued": 0
  },
  "RequestsCurrent": 1,
  "RequestsMax": 40,
  "Version": "dev"
}
</pre>

h2. Keep-balance

Keep-balance exports metrics at @/metrics@ (for example, @http://keep.zzzzz.arvadosapi.com:9005/metrics@).

table(table table-bordered table-condensed).
|_. Name|_. Type|_. Description|
|arvados_keep_total_{replicas,blocks,bytes}|gauge|stored data (stored in backend volumes, whether referenced or not)|
|arvados_keep_garbage_{replicas,blocks,bytes}|gauge|garbage data (unreferenced, and old enough to trash)|
|arvados_keep_transient_{replicas,blocks,bytes}|gauge|transient data (unreferenced, but too new to trash)|
|arvados_keep_overreplicated_{replicas,blocks,bytes}|gauge|overreplicated data (more replicas exist than are needed)|
|arvados_keep_underreplicated_{replicas,blocks,bytes}|gauge|underreplicated data (fewer replicas exist than are needed)|
|arvados_keep_lost_{replicas,blocks,bytes}|gauge|lost data (referenced by collections, but not found on any backend volume)|
|arvados_keep_dedup_block_ratio|gauge|deduplication ratio (block references in collections ÷ distinct blocks referenced)|
|arvados_keep_dedup_byte_ratio|gauge|deduplication ratio (block references in collections ÷ distinct blocks referenced, weighted by block size)|
|arvados_keepbalance_get_state_seconds|summary|time to get all collections and keepstore volume indexes for one iteration|
|arvados_keepbalance_changeset_compute_seconds|summary|time to compute changesets for one iteration|
|arvados_keepbalance_send_pull_list_seconds|summary|time to send pull lists to all keepstore servers for one iteration|
|arvados_keepbalance_send_trash_list_seconds|summary|time to send trash lists to all keepstore servers for one iteration|
|arvados_keepbalance_sweep_seconds|summary|time to complete one iteration|

Each @arvados_keep_@ storage state statistic above is presented as a set of three metrics:

table(table table-bordered table-condensed).
|_. Name|_. Description|
|*_blocks|distinct block hashes|
|*_bytes|bytes stored on backend volumes|
|*_replicas|objects/files stored on backend volumes|

h2. Node manager

The node manager status endpoint provides a snapshot of internal status at the time of the most recent wishlist update.

table(table table-bordered table-condensed).
|_. Attribute|_. Type|_. Description|
|nodes_booting|int|Number of nodes in booting state|
|nodes_unpaired|int|Number of nodes in unpaired state|
|nodes_busy|int|Number of nodes in busy state|
|nodes_idle|int|Number of nodes in idle state|
|nodes_fail|int|Number of nodes in fail state|
|nodes_down|int|Number of nodes in down state|
|nodes_shutdown|int|Number of nodes in shutdown state|
|nodes_wish|int|Number of nodes in the current wishlist|
|node_quota|int|Current node count ceiling due to cloud quota limits|
|config_max_nodes|int|Configured max node count|

h3. Example
<pre>{
  "actor_exceptions": 0,
  "idle_times": {
    "compute1": 0,
    "compute3": 0,
    "compute2": 0,
    "compute4": 0
  },
  "create_node_errors": 0,
  "destroy_node_errors": 0,
  "nodes_idle": 0,
  "config_max_nodes": 8,
  "list_nodes_errors": 0,
  "node_quota": 8,
  "Version": "1.1.4.20180719160944",
  "nodes_wish": 0,
  "nodes_unpaired": 0,
  "nodes_busy": 4,
  "boot_failures": 0
}
</pre>
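As the example above shows, a status snapshot may omit some @nodes_*@ fields. Here is a minimal Python 3 sketch that summarizes such a snapshot and treats missing counts as zero. The names @NODE_STATES@ and @summarize_node_manager@ are hypothetical. Two points are assumptions, not documented behavior: that each node is reported in exactly one state, and that the effective ceiling is the lower of @node_quota@ and @config_max_nodes@.

```python
import json

# States taken from the nodes_<state> attributes in the table above.
NODE_STATES = ("booting", "unpaired", "busy", "idle", "fail", "down", "shutdown")


def node_counts(status):
    """Per-state node counts; a missing nodes_<state> field is treated as 0."""
    return {state: int(status.get("nodes_" + state, 0)) for state in NODE_STATES}


def summarize_node_manager(raw_json):
    """Summarize a node manager status snapshot.

    Assumes each node appears in exactly one state and that the effective
    ceiling is min(node_quota, config_max_nodes).
    """
    status = json.loads(raw_json)
    counts = node_counts(status)
    total = sum(counts.values())
    ceiling = min(int(status["node_quota"]), int(status["config_max_nodes"]))
    return {
        "counts": counts,
        "total": total,
        "wish": int(status.get("nodes_wish", 0)),
        "ceiling": ceiling,
        "at_ceiling": total >= ceiling,
    }
```

For the example snapshot above, this sketch reports four busy nodes against a ceiling of eight.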