X-Git-Url: https://git.arvados.org/arvados.git/blobdiff_plain/0a3d7a02236cbec448203a1b2218b5e0630d1c00..231a86fd3f7e30e9f66d71d92ad7c26578637e37:/doc/admin/metrics.html.textile.liquid diff --git a/doc/admin/metrics.html.textile.liquid b/doc/admin/metrics.html.textile.liquid index 107431267e..893eac1c83 100644 --- a/doc/admin/metrics.html.textile.liquid +++ b/doc/admin/metrics.html.textile.liquid @@ -10,12 +10,38 @@ Copyright (C) The Arvados Authors. All rights reserved. SPDX-License-Identifier: CC-BY-SA-3.0 {% endcomment %} -Metrics endpoints are found at @/status.json@ on many Arvados services. The purpose of metrics are to provide statistics about the operation of a service, suitable for diagnosing how well a service is performing under load. +Some Arvados services publish Prometheus/OpenMetrics-compatible metrics at @/metrics@, and some provide additional runtime status at @/status.json@. Metrics can help you understand how components perform under load, find performance bottlenecks, and detect and diagnose problems. -Metrics endpoints must be configured with a "management token":management-token.html . +To access metrics endpoints, services must be configured with a "management token":management-token.html. When accessing a metrics endpoint, prefix the management token with @"Bearer "@ and supply it in the @Authorization@ request header. + +
curl -sfH "Authorization: Bearer your_management_token_goes_here" "https://0.0.0.0:25107/status.json"
+
+ +h2. Keep-web + +Keep-web exports metrics at @/metrics@ -- e.g., @https://collections.zzzzz.arvadosapi.com/metrics@. + +table(table table-bordered table-condensed). +|_. Name|_. Type|_. Description| +|request_duration_seconds|summary|elapsed time between receiving a request and sending the last byte of the response body (segmented by HTTP request method and response status code)| +|time_to_status_seconds|summary|elapsed time between receiving a request and sending the HTTP response status code (segmented by HTTP request method and response status code)| + +Metrics in the @arvados_keepweb_collectioncache@ namespace report keep-web's internal cache of Arvados collection metadata. + +table(table table-bordered table-condensed). +|_. Name|_. Type|_. Description| +|arvados_keepweb_collectioncache_requests|counter|cache lookups| +|arvados_keepweb_collectioncache_api_calls|counter|outgoing API calls| +|arvados_keepweb_collectioncache_permission_hits|counter|collection-to-permission cache hits| +|arvados_keepweb_collectioncache_pdh_hits|counter|UUID-to-PDH cache hits| +|arvados_keepweb_collectioncache_hits|counter|PDH-to-manifest cache hits| +|arvados_keepweb_collectioncache_cached_manifests|gauge|number of collections in the cache| +|arvados_keepweb_collectioncache_cached_manifest_bytes|gauge|memory consumed by cached collection manifests| h2. Keepstore +Keepstore exports metrics at @/status.json@ -- e.g., @http://keep0.zzzzz.arvadosapi.com:25107/status.json@. + h3. Root table(table table-bordered table-condensed). @@ -73,6 +99,80 @@ table(table table-bordered table-condensed). |InProgress| int|| |Queued| int|| +h3. Example response + +
+{
+  "Volumes": [
+    {
+      "Label": "[UnixVolume /var/lib/arvados/keep0]",
+      "Status": {
+        "MountPoint": "/var/lib/arvados/keep0",
+        "DeviceNum": 65029,
+        "BytesFree": 222532972544,
+        "BytesUsed": 435456679936
+      },
+      "InternalStats": {
+        "Errors": 0,
+        "InBytes": 1111,
+        "OutBytes": 0,
+        "OpenOps": 1,
+        "StatOps": 4,
+        "FlockOps": 0,
+        "UtimesOps": 0,
+        "CreateOps": 0,
+        "RenameOps": 0,
+        "UnlinkOps": 0,
+        "ReaddirOps": 0
+      }
+    }
+  ],
+  "BufferPool": {
+    "BytesAllocatedCumulative": 67108864,
+    "BuffersMax": 20,
+    "BuffersInUse": 0
+  },
+  "PullQueue": {
+    "InProgress": 0,
+    "Queued": 0
+  },
+  "TrashQueue": {
+    "InProgress": 0,
+    "Queued": 0
+  },
+  "RequestsCurrent": 1,
+  "RequestsMax": 40,
+  "Version": "dev"
+}
+
+ +h2. Keep-balance + +Keep-balance exports metrics at @/metrics@ -- e.g., @http://keep.zzzzz.arvadosapi.com:9005/metrics@. + +table(table table-bordered table-condensed). +|_. Name|_. Type|_. Description| +|arvados_keep_total_{replicas,blocks,bytes}|gauge|stored data (stored in backend volumes, whether referenced or not)| +|arvados_keep_garbage_{replicas,blocks,bytes}|gauge|garbage data (unreferenced, and old enough to trash)| +|arvados_keep_transient_{replicas,blocks,bytes}|gauge|transient data (unreferenced, but too new to trash)| +|arvados_keep_overreplicated_{replicas,blocks,bytes}|gauge|overreplicated data (more replicas exist than are needed)| +|arvados_keep_underreplicated_{replicas,blocks,bytes}|gauge|underreplicated data (fewer replicas exist than are needed)| +|arvados_keep_lost_{replicas,blocks,bytes}|gauge|lost data (referenced by collections, but not found on any backend volume)| +|arvados_keep_dedup_block_ratio|gauge|deduplication ratio (block references in collections ÷ distinct blocks referenced)| +|arvados_keep_dedup_byte_ratio|gauge|deduplication ratio (block references in collections ÷ distinct blocks referenced, weighted by block size)| +|arvados_keepbalance_get_state_seconds|summary|time to get all collections and keepstore volume indexes for one iteration| +|arvados_keepbalance_changeset_compute_seconds|summary|time to compute changesets for one iteration| +|arvados_keepbalance_send_pull_list_seconds|summary|time to send pull lists to all keepstore servers for one iteration| +|arvados_keepbalance_send_trash_list_seconds|summary|time to send trash lists to all keepstore servers for one iteration| +|arvados_keepbalance_sweep_seconds|summary|time to complete one iteration| + +Each @arvados_keep_@ storage state statistic above is presented as a set of three metrics: + +table(table table-bordered table-condensed). +|*_blocks|distinct block hashes| +|*_bytes|bytes stored on backend volumes| +|*_replicas|objects/files stored on backend volumes| + h2. Node manager The node manager status end point provides a snapshot of internal status at the time of the most recent wishlist update. @@ -89,3 +189,28 @@ table(table table-bordered table-condensed). |nodes_wish|int|Number of nodes in the current wishlist| |node_quota|int|Current node count ceiling due to cloud quota limits| |config_max_nodes|int|Configured max node count| + +h3. Example + +
+{
+  "actor_exceptions": 0,
+  "idle_times": {
+    "compute1": 0,
+    "compute3": 0,
+    "compute2": 0,
+    "compute4": 0
+  },
+  "create_node_errors": 0,
+  "destroy_node_errors": 0,
+  "nodes_idle": 0,
+  "config_max_nodes": 8,
+  "list_nodes_errors": 0,
+  "node_quota": 8,
+  "Version": "1.1.4.20180719160944",
+  "nodes_wish": 0,
+  "nodes_unpaired": 0,
+  "nodes_busy": 4,
+  "boot_failures": 0
+}
+