minio/docs/metrics/v3.md
Bala FA 7edc352d23
Add ILM metrics in metrics-v3 (#19539)
Signed-off-by: Bala.FA <bala@minio.io>
2024-06-06 02:36:25 -07:00

Metrics Version 3

In metrics version 3, all metrics are available under the endpoint:

/minio/metrics/v3

However, a scrape request must name a specific path under this endpoint.

Metrics are organized into groups at paths relative to the top-level endpoint above.

Metrics Request Handling

Each endpoint below can be queried at different intervals as needed via a scrape configuration in Prometheus or a compatible metrics collection tool.

For ease of configuration, each (non-empty) parent of a path serves all metric endpoints at its descendant paths. For example, to query all system metrics, it is enough to scrape /minio/metrics/v3/system/.
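The parent-path rule can be illustrated with a small Python sketch that decides which metric groups a given scrape path would serve. The helper names `serves` and `groups_served` and the constant `GROUP_PATHS` are hypothetical and exist only for this illustration; the group paths are a non-exhaustive selection from the tables in this document.

```python
# A few metric group paths from the tables in this document (not exhaustive).
GROUP_PATHS = [
    "/api/requests",
    "/system/drive",
    "/system/memory",
    "/system/network/internode",
    "/system/process",
    "/cluster/health",
]


def serves(scrape_path: str, group_path: str) -> bool:
    """Return True if scraping `scrape_path` would include `group_path`.

    A non-empty parent path serves every group at a descendant path.
    Paths are relative to the top-level endpoint /minio/metrics/v3.
    """
    parent = scrape_path.strip("/").split("/")
    group = group_path.strip("/").split("/")
    if parent == [""]:
        return False  # the bare endpoint is not a valid scrape path
    # Compare whole path components so that "/sys" does not match "/system".
    return group[: len(parent)] == parent


def groups_served(scrape_path: str) -> list[str]:
    """List the known groups that a scrape of `scrape_path` would cover."""
    return [g for g in GROUP_PATHS if serves(scrape_path, g)]
```

For instance, `groups_served("/system/")` returns the four `/system/...` groups listed above, while `groups_served("/system/network")` returns only `/system/network/internode`.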

Some metrics are bucket specific. These have a /bucket component in their path. As the number of buckets can be large, the scrape request must provide a specific list of buckets via the buckets query parameter. Only metrics for the given buckets are returned (with the bucket label set). For example, to query API metrics for buckets test1 and test2, make a scrape request to /minio/metrics/v3/bucket/api?buckets=test1,test2.
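As a minimal sketch of how such a scrape URL could be assembled, the following Python helper joins the bucket names with commas and appends them as the `buckets` query parameter. The function name `bucket_scrape_url` and the base URL `http://localhost:9000` are assumptions made for illustration; the `/bucket/api` path is the one listed in the request metrics table of this document.

```python
from urllib.parse import urlencode

# Top-level endpoint, as given in this document.
ENDPOINT = "/minio/metrics/v3"


def bucket_scrape_url(base_url: str, path: str, buckets: list[str]) -> str:
    """Build a scrape URL for a bucket-specific metrics path.

    `base_url` is the address of a MinIO server (hypothetical here).
    """
    if not buckets:
        raise ValueError("at least one bucket name is required")
    # Keep the comma unescaped so the list reads as in the example above.
    query = urlencode({"buckets": ",".join(buckets)}, safe=",")
    return f"{base_url.rstrip('/')}{ENDPOINT}{path}?{query}"


url = bucket_scrape_url("http://localhost:9000", "/bucket/api", ["test1", "test2"])
```

The resulting `url` can be used as the target of a scrape job; authentication, if the deployment requires it, is outside the scope of this sketch.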

Instead of scraping metrics, it is also possible to list the metrics that a path would return. This is done by adding a ?list query parameter. The MinIO server then lists all metrics that the path could possibly return, whereas an actual scrape returns only the metrics that are currently available. With the list query parameter, the output format can be selected: set the request Content-Type to application/json for JSON output, or to text/plain for a simple Markdown-formatted table. The latter is the default.
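A listing request might be prepared as in the Python sketch below, which only constructs the request object and does not send it. The function name `list_request` and the base URL are hypothetical, and any authentication a deployment may require is omitted.

```python
from urllib.request import Request

# Top-level endpoint, as given in this document.
ENDPOINT = "/minio/metrics/v3"


def list_request(base_url: str, path: str, as_json: bool = False) -> Request:
    """Prepare a request that lists the metrics a path could return.

    The output format is selected through the request Content-Type:
    application/json for JSON, text/plain (the default) for a table.
    """
    url = f"{base_url.rstrip('/')}{ENDPOINT}{path}?list"
    content_type = "application/json" if as_json else "text/plain"
    return Request(url, headers={"Content-Type": content_type})


# Hypothetical server address; the request is built but not sent here.
req = list_request("http://localhost:9000", "/system/drive", as_json=True)
```

Passing `req` to `urllib.request.urlopen` would perform the actual call against a running server.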

Request, System and Cluster Metrics

At a high level, metrics are grouped into the categories listed in the following sub-sections. The path in each of the tables is relative to the top-level endpoint.

Request metrics

These are metrics about requests served by the (current) node.

Path Description
/api/requests Metrics over all requests
/bucket/api Metrics over all requests for a given bucket

Audit metrics

These are metrics about the MinIO audit functionality.

Path Description
/audit Metrics related to audit functionality

ILM metrics

These are metrics about the MinIO ILM functionality.

Path Description
/ilm Metrics related to ILM functionality

Logger webhook metrics

These are metrics about the MinIO logger webhooks.

Path Description
/logger/webhook Metrics related to logger webhooks

Notification metrics

These are metrics about the MinIO notification functionality.

Path Description
/notification Metrics related to notification functionality

Scanner metrics

These are metrics about the MinIO scanner.

Path Description
/scanner Metrics related to the MinIO scanner

System metrics

These are metrics about the MinIO process and the node.

Path Description
/system/drive Metrics about drives on the system
/system/memory Metrics about memory on the system
/system/network/internode Metrics about internode requests made by the node
/system/process Standard process metrics

Debug metrics

These are metrics for debugging.

Path Description
/debug/go Standard Go metrics

Cluster metrics

These present metrics about the whole MinIO cluster.

Path Description
/cluster/health Cluster health metrics
/cluster/usage/objects Object statistics
/cluster/usage/buckets Object statistics by bucket
/cluster/erasure-set Erasure set metrics

Metrics Listing

Each of the following sub-sections lists the metrics returned by one endpoint.

The standard metrics group for GoCollector is not shown below.

/api/requests

Name Type Help Labels
minio_api_requests_rejected_auth_total counter Total number of requests rejected for auth failure type,pool_index,server
minio_api_requests_rejected_header_total counter Total number of requests rejected for invalid header type,pool_index,server
minio_api_requests_rejected_timestamp_total counter Total number of requests rejected for invalid timestamp type,pool_index,server
minio_api_requests_rejected_invalid_total counter Total number of invalid requests type,pool_index,server
minio_api_requests_waiting_total gauge Total number of requests in the waiting queue type,pool_index,server
minio_api_requests_incoming_total gauge Total number of incoming requests type,pool_index,server
minio_api_requests_inflight_total gauge Total number of requests currently in flight name,type,pool_index,server
minio_api_requests_total counter Total number of requests name,type,pool_index,server
minio_api_requests_errors_total counter Total number of requests with (4xx and 5xx) errors name,type,pool_index,server
minio_api_requests_5xx_errors_total counter Total number of requests with 5xx errors name,type,pool_index,server
minio_api_requests_4xx_errors_total counter Total number of requests with 4xx errors name,type,pool_index,server
minio_api_requests_canceled_total counter Total number of requests canceled by the client name,type,pool_index,server
minio_api_requests_ttfb_seconds_distribution counter Distribution of time to first byte across API calls name,type,le,pool_index,server
minio_api_requests_traffic_sent_bytes counter Total number of bytes sent type,pool_index,server
minio_api_requests_traffic_received_bytes counter Total number of bytes received type,pool_index,server
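
As one way to use the counters above, the following Python sketch parses scrape output and computes, per API name, the share of requests that ended in an error. It assumes the scrape returns the usual Prometheus text exposition format, and the parser is deliberately minimal rather than a full implementation of that format. The helper names and all sample values, including the `server` label value, are made up for illustration.

```python
import re
from collections import defaultdict

# Made-up scrape output in Prometheus text format (illustrative values only).
SAMPLE = """\
# TYPE minio_api_requests_total counter
minio_api_requests_total{name="GetObject",type="s3",pool_index="0",server="node1:9000"} 200
minio_api_requests_total{name="PutObject",type="s3",pool_index="0",server="node1:9000"} 50
minio_api_requests_errors_total{name="GetObject",type="s3",pool_index="0",server="node1:9000"} 10
"""

SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return (metric name, labels dict, value) for each sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if m is None:
            continue
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples


def error_ratio_by_api(samples):
    """Map each API name to errors / total, summed over all other labels."""
    totals = defaultdict(float)
    errors = defaultdict(float)
    for name, labels, value in samples:
        api = labels.get("name", "")
        if name == "minio_api_requests_total":
            totals[api] += value
        elif name == "minio_api_requests_errors_total":
            errors[api] += value
    return {api: errors[api] / total for api, total in totals.items() if total > 0}


ratios = error_ratio_by_api(parse_samples(SAMPLE))
```

Because both metrics are counters, the ratio covers everything they have counted so far; a monitoring system would normally compute it over a time window from successive scrapes instead.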

/bucket/api

Name Type Help Labels
minio_bucket_api_traffic_received_bytes counter Total number of bytes received for a bucket bucket,type,server,pool_index
minio_bucket_api_traffic_sent_bytes counter Total number of bytes sent for a bucket bucket,type,server,pool_index
minio_bucket_api_inflight_total gauge Total number of requests currently in flight for a bucket bucket,name,type,server,pool_index
minio_bucket_api_total counter Total number of requests for a bucket bucket,name,type,server,pool_index
minio_bucket_api_canceled_total counter Total number of requests canceled by the client for a bucket bucket,name,type,server,pool_index
minio_bucket_api_4xx_errors_total counter Total number of requests with 4xx errors for a bucket bucket,name,type,server,pool_index
minio_bucket_api_5xx_errors_total counter Total number of requests with 5xx errors for a bucket bucket,name,type,server,pool_index
minio_bucket_api_ttfb_seconds_distribution counter Distribution of time to first byte across API calls for a bucket bucket,name,le,type,server,pool_index

/bucket/replication

Name Type Help Labels
minio_bucket_replication_last_hour_failed_bytes gauge Total number of bytes failed at least once to replicate in the last hour on a bucket bucket,server
minio_bucket_replication_last_hour_failed_count gauge Total number of objects which failed replication in the last hour on a bucket bucket,server
minio_bucket_replication_last_minute_failed_bytes gauge Total number of bytes failed at least once to replicate in the last full minute on a bucket bucket,server
minio_bucket_replication_last_minute_failed_count gauge Total number of objects which failed replication in the last full minute on a bucket bucket,server
minio_bucket_replication_latency_ms gauge Replication latency on a bucket in milliseconds bucket,operation,range,targetArn,server
minio_bucket_replication_proxied_delete_tagging_requests_total counter Number of DELETE tagging requests proxied to replication target bucket,server
minio_bucket_replication_proxied_get_requests_failures counter Number of failures in GET requests proxied to replication target bucket,server
minio_bucket_replication_proxied_get_requests_total counter Number of GET requests proxied to replication target bucket,server
minio_bucket_replication_proxied_get_tagging_requests_failures counter Number of failures in GET tagging requests proxied to replication target bucket,server
minio_bucket_replication_proxied_get_tagging_requests_total counter Number of GET tagging requests proxied to replication target bucket,server
minio_bucket_replication_proxied_head_requests_failures counter Number of failures in HEAD requests proxied to replication target bucket,server
minio_bucket_replication_proxied_head_requests_total counter Number of HEAD requests proxied to replication target bucket,server
minio_bucket_replication_proxied_put_tagging_requests_failures counter Number of failures in PUT tagging requests proxied to replication target bucket,server
minio_bucket_replication_proxied_put_tagging_requests_total counter Number of PUT tagging requests proxied to replication target bucket,server
minio_bucket_replication_sent_bytes counter Total number of bytes replicated to the target bucket,server
minio_bucket_replication_sent_count counter Total number of objects replicated to the target bucket,server
minio_bucket_replication_total_failed_bytes counter Total number of bytes failed at least once to replicate since server start bucket,server
minio_bucket_replication_total_failed_count counter Total number of objects which failed replication since server start bucket,server
minio_bucket_replication_proxied_delete_tagging_requests_failures counter Number of failures in DELETE tagging requests proxied to replication target bucket,server

/audit

Name Type Help Labels
minio_audit_failed_messages counter Total number of messages that failed to send since start target_id,server
minio_audit_target_queue_length gauge Number of unsent messages in queue for target target_id,server
minio_audit_total_messages counter Total number of messages sent since start target_id,server

/system/drive

Name Type Help Labels
minio_system_drive_used_bytes gauge Total storage used on a drive in bytes drive,set_index,drive_index,pool_index,server
minio_system_drive_free_bytes gauge Total storage free on a drive in bytes drive,set_index,drive_index,pool_index,server
minio_system_drive_total_bytes gauge Total storage available on a drive in bytes drive,set_index,drive_index,pool_index,server
minio_system_drive_used_inodes gauge Total used inodes on a drive drive,set_index,drive_index,pool_index,server
minio_system_drive_free_inodes gauge Total free inodes on a drive drive,set_index,drive_index,pool_index,server
minio_system_drive_total_inodes gauge Total inodes available on a drive drive,set_index,drive_index,pool_index,server
minio_system_drive_timeout_errors_total counter Total timeout errors on a drive drive,set_index,drive_index,pool_index,server
minio_system_drive_io_errors_total counter Total I/O errors on a drive drive,set_index,drive_index,pool_index,server
minio_system_drive_availability_errors_total counter Total availability errors (I/O errors, timeouts) on a drive drive,set_index,drive_index,pool_index,server
minio_system_drive_waiting_io gauge Total waiting I/O operations on a drive drive,set_index,drive_index,pool_index,server
minio_system_drive_api_latency_micros gauge Average last minute latency in µs for drive API storage operations drive,api,set_index,drive_index,pool_index,server
minio_system_drive_offline_count gauge Count of offline drives pool_index,server
minio_system_drive_online_count gauge Count of online drives pool_index,server
minio_system_drive_count gauge Count of all drives pool_index,server
minio_system_drive_health gauge Drive health (0 = offline, 1 = healthy, 2 = healing) drive,set_index,drive_index,pool_index,server
minio_system_drive_reads_per_sec gauge Reads per second on a drive drive,set_index,drive_index,pool_index,server
minio_system_drive_reads_kb_per_sec gauge Kilobytes read per second on a drive drive,set_index,drive_index,pool_index,server
minio_system_drive_reads_await gauge Average time for read requests served on a drive drive,set_index,drive_index,pool_index,server
minio_system_drive_writes_per_sec gauge Writes per second on a drive drive,set_index,drive_index,pool_index,server
minio_system_drive_writes_kb_per_sec gauge Kilobytes written per second on a drive drive,set_index,drive_index,pool_index,server
minio_system_drive_writes_await gauge Average time for write requests served on a drive drive,set_index,drive_index,pool_index,server
minio_system_drive_perc_util gauge Percentage of time the disk was busy drive,set_index,drive_index,pool_index,server
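
The drive gauges above can be combined into a short per-drive summary, as in this Python sketch. The function name `drive_report` and the input values are hypothetical; the health codes are those given in the help text of minio_system_drive_health.

```python
# Health codes as documented for minio_system_drive_health.
DRIVE_HEALTH = {0: "offline", 1: "healthy", 2: "healing"}


def drive_report(used_bytes: float, total_bytes: float, health: float) -> dict:
    """Summarize one drive from its used/total byte gauges and health gauge."""
    # Guard against a zero total, e.g. for a drive that reports no capacity.
    used_perc = 100.0 * used_bytes / total_bytes if total_bytes > 0 else 0.0
    state = DRIVE_HEALTH.get(int(health), "unknown")
    return {"used_perc": used_perc, "state": state}


# Illustrative values only, not taken from a real server.
report = drive_report(used_bytes=250, total_bytes=1000, health=1)
```

Here `used_bytes` and `total_bytes` stand for samples of minio_system_drive_used_bytes and minio_system_drive_total_bytes that share the same drive labels.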

/system/memory

Name Type Help Labels
minio_system_memory_used gauge Used memory on the node server
minio_system_memory_used_perc gauge Used memory percentage on the node server
minio_system_memory_free gauge Free memory on the node server
minio_system_memory_total gauge Total memory on the node server
minio_system_memory_buffers gauge Buffers memory on the node server
minio_system_memory_cache gauge Cache memory on the node server
minio_system_memory_shared gauge Shared memory on the node server
minio_system_memory_available gauge Available memory on the node server

/system/cpu

Name Type Help Labels
minio_system_cpu_avg_idle gauge Average CPU idle time server
minio_system_cpu_avg_iowait gauge Average CPU IOWait time server
minio_system_cpu_load gauge CPU load average 1min server
minio_system_cpu_load_perc gauge CPU load average 1min (percentage) server
minio_system_cpu_nice gauge CPU nice time server
minio_system_cpu_steal gauge CPU steal time server
minio_system_cpu_system gauge CPU system time server
minio_system_cpu_user gauge CPU user time server

/system/network/internode

Name Type Help Labels
minio_system_network_internode_errors_total counter Total number of failed internode calls server,pool_index
minio_system_network_internode_dial_errors_total counter Total number of internode TCP dial timeouts and errors server,pool_index
minio_system_network_internode_dial_avg_time_nanos gauge Average dial time of internode TCP calls in nanoseconds server,pool_index
minio_system_network_internode_sent_bytes_total counter Total number of bytes sent to other peer nodes server,pool_index
minio_system_network_internode_recv_bytes_total counter Total number of bytes received from other peer nodes server,pool_index

/system/process

Name Type Help Labels
minio_system_process_locks_read_total gauge Number of current READ locks on this peer server
minio_system_process_locks_write_total gauge Number of current WRITE locks on this peer server
minio_system_process_cpu_total_seconds counter Total user and system CPU time spent in seconds server
minio_system_process_go_routine_total gauge Total number of go routines running server
minio_system_process_io_rchar_bytes counter Total bytes read by the process from the underlying storage system including cache, /proc/[pid]/io rchar server
minio_system_process_io_read_bytes counter Total bytes read by the process from the underlying storage system, /proc/[pid]/io read_bytes server
minio_system_process_io_wchar_bytes counter Total bytes written by the process to the underlying storage system including page cache, /proc/[pid]/io wchar server
minio_system_process_io_write_bytes counter Total bytes written by the process to the underlying storage system, /proc/[pid]/io write_bytes server
minio_system_process_start_time_seconds gauge Start time for MinIO process in seconds since Unix epoch server
minio_system_process_uptime_seconds gauge Uptime for MinIO process in seconds server
minio_system_process_file_descriptor_limit_total gauge Limit on total number of open file descriptors for the MinIO Server process server
minio_system_process_file_descriptor_open_total gauge Total number of open file descriptors by the MinIO Server process server
minio_system_process_syscall_read_total counter Total read SysCalls to the kernel. /proc/[pid]/io syscr server
minio_system_process_syscall_write_total counter Total write SysCalls to the kernel. /proc/[pid]/io syscw server
minio_system_process_resident_memory_bytes gauge Resident memory size in bytes server
minio_system_process_virtual_memory_bytes gauge Virtual memory size in bytes server
minio_system_process_virtual_memory_max_bytes gauge Maximum virtual memory size in bytes server

/cluster/health

Name Type Help Labels
minio_cluster_health_drives_offline_count gauge Count of offline drives in the cluster
minio_cluster_health_drives_online_count gauge Count of online drives in the cluster
minio_cluster_health_drives_count gauge Count of all drives in the cluster
minio_cluster_health_nodes_offline_count gauge Count of offline nodes in the cluster
minio_cluster_health_nodes_online_count gauge Count of online nodes in the cluster
minio_cluster_health_capacity_raw_total_bytes gauge Total cluster raw storage capacity in bytes
minio_cluster_health_capacity_raw_free_bytes gauge Total cluster raw storage free in bytes
minio_cluster_health_capacity_usable_total_bytes gauge Total cluster usable storage capacity in bytes
minio_cluster_health_capacity_usable_free_bytes gauge Total cluster usable storage free in bytes

/cluster/config

Name Type Help Labels
minio_cluster_config_rrs_parity gauge Reduced redundancy storage class parity
minio_cluster_config_standard_parity gauge Standard storage class parity

/cluster/usage/objects

Name Type Help Labels
minio_cluster_usage_objects_since_last_update_seconds gauge Time since last update of usage metrics in seconds
minio_cluster_usage_objects_total_bytes gauge Total cluster usage in bytes
minio_cluster_usage_objects_count gauge Total cluster objects count
minio_cluster_usage_objects_versions_count gauge Total cluster object versions (including delete markers) count
minio_cluster_usage_objects_delete_markers_count gauge Total cluster delete markers count
minio_cluster_usage_objects_buckets_count gauge Total cluster buckets count
minio_cluster_usage_objects_size_distribution gauge Cluster object size distribution range
minio_cluster_usage_objects_version_count_distribution gauge Cluster object version count distribution range

/cluster/usage/buckets

Name Type Help Labels
minio_cluster_usage_buckets_since_last_update_seconds gauge Time since last update of usage metrics in seconds
minio_cluster_usage_buckets_total_bytes gauge Total bucket size in bytes bucket
minio_cluster_usage_buckets_objects_count gauge Total objects count in bucket bucket
minio_cluster_usage_buckets_versions_count gauge Total object versions (including delete markers) count in bucket bucket
minio_cluster_usage_buckets_delete_markers_count gauge Total delete markers count in bucket bucket
minio_cluster_usage_buckets_quota_total_bytes gauge Total bucket quota in bytes bucket
minio_cluster_usage_buckets_object_size_distribution gauge Bucket object size distribution range,bucket
minio_cluster_usage_buckets_object_version_count_distribution gauge Bucket object version count distribution range,bucket

/cluster/erasure-set

Name Type Help Labels
minio_cluster_erasure_set_overall_write_quorum gauge Overall write quorum across pools and sets
minio_cluster_erasure_set_overall_health gauge Overall health across pools and sets (1=healthy, 0=unhealthy)
minio_cluster_erasure_set_read_quorum gauge Read quorum for the erasure set in a pool pool_id,set_id
minio_cluster_erasure_set_write_quorum gauge Write quorum for the erasure set in a pool pool_id,set_id
minio_cluster_erasure_set_online_drives_count gauge Count of online drives in the erasure set in a pool pool_id,set_id
minio_cluster_erasure_set_healing_drives_count gauge Count of healing drives in the erasure set in a pool pool_id,set_id
minio_cluster_erasure_set_health gauge Health of the erasure set in a pool (1=healthy, 0=unhealthy) pool_id,set_id
minio_cluster_erasure_set_read_tolerance gauge Number of drive failures that can be tolerated without disrupting read operations pool_id,set_id
minio_cluster_erasure_set_write_tolerance gauge Number of drive failures that can be tolerated without disrupting write operations pool_id,set_id
minio_cluster_erasure_set_read_health gauge Health of the erasure set in a pool for read operations (1=healthy, 0=unhealthy) pool_id,set_id
minio_cluster_erasure_set_write_health gauge Health of the erasure set in a pool for write operations (1=healthy, 0=unhealthy) pool_id,set_id

/cluster/iam

Name Type Help Labels
minio_cluster_iam_last_sync_duration_millis counter Last successful IAM data sync duration in milliseconds
minio_cluster_iam_plugin_authn_service_failed_requests_minute counter When plugin authentication is configured, returns failed requests count in the last full minute
minio_cluster_iam_plugin_authn_service_last_fail_seconds counter When plugin authentication is configured, returns time (in seconds) since the last failed request to the service
minio_cluster_iam_plugin_authn_service_last_succ_seconds counter When plugin authentication is configured, returns time (in seconds) since the last successful request to the service
minio_cluster_iam_plugin_authn_service_succ_avg_rtt_ms_minute counter When plugin authentication is configured, returns average round-trip-time of successful requests in the last full minute
minio_cluster_iam_plugin_authn_service_succ_max_rtt_ms_minute counter When plugin authentication is configured, returns maximum round-trip-time of successful requests in the last full minute
minio_cluster_iam_plugin_authn_service_total_requests_minute counter When plugin authentication is configured, returns total requests count in the last full minute
minio_cluster_iam_since_last_sync_millis counter Time (in milliseconds) since last successful IAM data sync
minio_cluster_iam_sync_failures counter Number of failed IAM data syncs since server start
minio_cluster_iam_sync_successes counter Number of successful IAM data syncs since server start

/logger/webhook

Name Type Help Labels
minio_logger_webhook_failed_messages counter Number of messages that failed to send server,name,endpoint
minio_logger_webhook_queue_length gauge Webhook queue length server,name,endpoint
minio_logger_webhook_total_message counter Total number of messages sent to this target server,name,endpoint

/replication

Name Type Help Labels
minio_replication_average_active_workers gauge Average number of active replication workers server
minio_replication_average_queued_bytes gauge Average number of bytes queued for replication since server start server
minio_replication_average_queued_count gauge Average number of objects queued for replication since server start server
minio_replication_average_data_transfer_rate gauge Average replication data transfer rate in bytes/sec server
minio_replication_current_active_workers gauge Total number of active replication workers server
minio_replication_current_data_transfer_rate gauge Current replication data transfer rate in bytes/sec server
minio_replication_last_minute_queued_bytes gauge Number of bytes queued for replication in the last full minute server
minio_replication_last_minute_queued_count gauge Number of objects queued for replication in the last full minute server
minio_replication_max_active_workers gauge Maximum number of active replication workers seen since server start server
minio_replication_max_queued_bytes gauge Maximum number of bytes queued for replication since server start server
minio_replication_max_queued_count gauge Maximum number of objects queued for replication since server start server
minio_replication_max_data_transfer_rate gauge Maximum replication data transfer rate in bytes/sec seen since server start server

/notification

Name Type Help Labels
minio_notification_current_send_in_progress counter Number of concurrent async Send calls active to all targets server
minio_notification_events_errors_total counter Events that failed to be sent to the targets server
minio_notification_events_sent_total counter Total number of events sent to the targets server
minio_notification_events_skipped_total counter Events that were not sent to the targets because the in-memory queue was full server

/scanner

Name Type Help Labels
minio_scanner_bucket_scans_finished counter Total number of bucket scans finished since server start server
minio_scanner_bucket_scans_started counter Total number of bucket scans started since server start server
minio_scanner_directories_scanned counter Total number of directories scanned since server start server
minio_scanner_last_activity_seconds gauge Time elapsed (in seconds) since last scan activity server
minio_scanner_objects_scanned counter Total number of unique objects scanned since server start server
minio_scanner_versions_scanned counter Total number of object versions scanned since server start server

/ilm

Name Type Help Labels
minio_cluster_ilm_expiry_pending_tasks gauge Number of pending ILM expiry tasks in the queue server
minio_cluster_ilm_transition_active_tasks gauge Number of active ILM transition tasks server
minio_cluster_ilm_transition_pending_tasks gauge Number of pending ILM transition tasks in the queue server
minio_cluster_ilm_transition_missed_immediate_tasks counter Number of missed immediate ILM transition tasks server
minio_cluster_ilm_versions_scanned counter Total number of object versions checked for ILM actions since server start server