self-hosted/minio

mirror of https://github.com/minio/minio synced 2024-09-30 05:04:22 +00:00

Author	SHA1	Message	Date
Anis Eleuch	b7f319b62a	properly reload a fresh drive when found in a failed state during startup (#20145 ) All checks were successful VulnCheck / Analysis (push) Successful in 1m47s Details When a drive is in a failed state when a single node multiple drives deployment is started, a replacement of a fresh disk will not be properly healed unless the user restarts the node. Fix this by always adding the new fresh disk to globalLocalDrivesMap. Also remove globalLocalDrives for simplification, a map to store local node drives can still be used since the order of local drives of a node is not defined.	2024-07-24 16:30:33 -07:00
Harshavardhana	e432e79324	avoid calling 'admin info' for disk, cpu, net metrics collection (#19762 ) resource metrics collection was incorrectly making fan-out liveness peer calls where it's not needed.	2024-05-17 08:15:13 -07:00
Shireesh Anjal	f7b665347e	Add system CPU metrics to metrics-v3 (#19560 ) endpoint: /minio/metrics/v3/system/cpu metrics: - minio_system_cpu_avg_idle - minio_system_cpu_avg_iowait - minio_system_cpu_load - minio_system_cpu_load_perc - minio_system_cpu_nice - minio_system_cpu_steal - minio_system_cpu_system - minio_system_cpu_user	2024-04-23 16:56:12 -07:00
Shireesh Anjal	6df76ca73c	Add system memory metrics in v3 (#19486 ) Following memory metrics will be added under /system/memory - available - buffers - cache - free - shared - total - used - used_perc	2024-04-16 22:10:25 -07:00
Shireesh Anjal	08d3d06a06	Add drive metrics in metrics-v3 (#19452 ) Add following metrics: - used_inodes - total_inodes - healing - online - reads_per_sec - reads_kb_per_sec - reads_await - writes_per_sec - writes_kb_per_sec - writes_await - perc_util To be able to calculate the `per_sec` values, we capture the IOStats-related data in the beginning (along with the time at which they were captured), and compare them against the current values subsequently. This is because dividing by "time since server uptime." doesn't work in k8s environments.	2024-04-11 10:46:34 -07:00
Aditya Manthramurthy	b2c5b75efa	feat: Add Metrics V3 API (#19068 ) Metrics v3 is mainly a reorganization of metrics into smaller groups of metrics and the removal of internal aggregation of metrics received from peer nodes in a MinIO cluster. This change adds the endpoint `/minio/metrics/v3` as the top-level metrics endpoint and under this, various sub-endpoints are implemented. These are currently documented in `docs/metrics/v3.md` The handler will serve metrics at any path `/minio/metrics/v3/PATH`, as follows: when PATH is a sub-endpoint listed above => serves the group of metrics under that path; or when PATH is a (non-empty) parent directory of the sub-endpoints listed above => serves metrics from each child sub-endpoint of PATH. otherwise, returns a no resource found error All available metrics are listed in the `docs/metrics/v3.md`. More will be added subsequently.	2024-03-10 01:15:15 -08:00
Praveen raj Mani	30c2596512	Read drive IO stats from sysfs instead of procfs (#19131 ) Currently, we read from `/proc/diskstats` which is found to be un-reliable in k8s environments. We can read from `sysfs` instead. Also, cache the latest drive io stats to find the diff and update the metrics.	2024-02-26 11:34:50 -08:00
Harshavardhana	b6e98aed01	fix: found races in accessing globalLocalDrives (#19069 ) make a copy before accessing globalLocalDrives Bonus: update console v0.46.0 Signed-off-by: Harshavardhana <harsha@minio.io>	2024-02-16 17:15:57 -08:00
Shireesh Anjal	7b9f9e0628	fix incorrect disk io stats in k8s environment (#19016 ) The previous logic of calculating per second values for disk io stats divides the stats by the host uptime. This doesn't work in k8s environment as the uptime is of the pod, but the stats (from /proc/diskstats) are from the host. Fix this by storing the initial values of uptime and the stats at the timme of server startup, and using the difference between current and initial values when calculating the per second values.	2024-02-13 07:35:11 -08:00
Harshavardhana	74851834c0	further bootstrap/startup optimization for reading 'format.json' (#18868 ) - Move RenameFile to websockets - Move ReadAll that is primarily is used for reading 'format.json' to to websockets - Optimize DiskInfo calls, and provide a way to make a NoOp DiskInfo call.	2024-01-25 12:45:46 -08:00
Harshavardhana	b3314e97a6	re-use the same local drive used by remote-peer (#18645 ) historically, we have always kept storage-rest-server and a local storage API separate without much trouble, since they both can independently operate due to no special state() between them. however, over some time, we have added state() such as - drive monitoring threads now there will be "2" of them per drive instead of just 1. - concurrent tokens available per drive are now twice instead of just single shared, allowing unexpectedly high amount of I/O to go through. - applying serialization by using walkMutexes can now be adequately honored for both remote callers and local callers.	2023-12-13 19:27:55 -08:00
Shireesh Anjal	7350a29fec	Capture percentage of cpu load and memory used (#18596 ) By default the cpu load is the cumulative of all cores. Capture the percentage load (load * 100 / cpu-count) Also capture the percentage memory used (used * 100 / total)	2023-12-06 13:19:59 -08:00
Klaus Post	ba6218b354	fix: resource metrics "concurrent map iteration and map write" (#18273 ) `resourceMetricsMap` has no protection against concurrent reads and writes. Add a mutex and don't use maps from the last iteration. Bug introduced in #18057 Fixes #18271	2023-10-18 13:28:50 -07:00
Shireesh Anjal	6d20ec3bea	Add support for resource metrics (#18057 ) Add a new endpoint for "resource" metrics `/v2/metrics/resource` This should return system metrics related to drives, network, CPU and memory. Except for drives, other metrics should have corresponding "avg" and "max" values also. Reuse the real-time feature to capture the required data, introducing CPU and memory metrics in it. Collect the data every minute and keep updating the average and max values accordingly, returning the latest values when the API is called.	2023-09-30 13:40:20 -07:00

14 commits