self-hosted/minio

mirror of https://github.com/minio/minio synced 2024-09-06 08:44:03 +00:00

Author	SHA1	Message	Date
Harshavardhana	0c31e61343	allow protection from invalid config values (#19460 ) we have had numerous reports of some config values not having default values, causing features to misbehave. This PR tries to address all these concerns once and for all. Each new sub-system that gets added - must check for invalid keys - must have default values set - must not "return err" when being saved into a global state(); instead, it must collate its error with other subsystem errors so that other sub-systems can initialize independently.	2024-04-10 18:10:30 -07:00
Anis Eleuch	c6f8dc431e	Add a warning when the total size of an object's versions exceeds 1 TiB (#19435 )	2024-04-08 10:45:03 -07:00
Harshavardhana	c957e0d426	fix: increase the tiering part size to 128MiB (#19424 ) also introduce an 8MiB read buffer for bigger parts	2024-04-08 02:22:27 -07:00
Aditya Manthramurthy	c9e9a8e2b9	fix: ldap: use validated base DNs (#19406 ) This fixes a regression from #19358 that prevents policy mappings created in the latest release from being displayed in policy entity listing APIs. This is because the base DNs in the LDAP config may not be in a normalized form, while #19358 introduced normalization of mapping keys (user DNs and group DNs). When listing, we check whether the policy mappings are on entities that parse as valid DNs that are descendants of the base DNs in the config. A test was added that demonstrates the failure without this fix.	2024-04-04 11:36:18 -07:00
Anis Eleuch	95bf4a57b6	logging: Add subsystem to log API (#19002 ) Create new code paths for multiple subsystems in the code. This will make the code easier to maintain later. Also introduce bugLogIf() for errors that should not happen in the first place.	2024-04-04 05:04:40 -07:00
Harshavardhana	2228eb61cb	Add more tests for ARN and its format (#19408 ) Original work from #17566 modified to fit the new requirements	2024-04-04 01:31:34 -07:00
jiuker	3d86ae12bc	feat: support EdDSA/Ed25519 for oss (#19397 )	2024-04-02 16:02:35 -07:00
Sveinn	ba46ee5dfa	Adding console targets back into systemtarget log slice (#19398 )	2024-04-02 15:56:14 -07:00
Klaus Post	912bbb2f1d	Always return slice with cap (#19395 ) Documentation promised this - so we should do it as well. Try to get a buffer and stash if it isn't big enough.	2024-04-02 08:56:18 -07:00
Klaus Post	b435806d91	Reduce big message RPC allocations (#19390 ) Use `ODirectPoolSmall` buffers for inline data in PutObject. Add a separate call for inline data that will fetch a buffer for the inline data before unmarshal.	2024-04-01 16:42:09 -07:00
Harshavardhana	1c99597a06	update() inlineBlock settings properly in storageClass config (#19382 )	2024-03-29 08:07:06 -07:00
Shubhendu	468a9fae83	Enable replication of SSE-C objects (#19107 ) If site replication is enabled across sites, replicate the SSE-C objects as well. These objects can be read from target sites using the same client encryption keys. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-03-28 10:44:56 -07:00
Aditya Manthramurthy	7e45d84ace	ldap: improve normalization of DN values (#19358 ) Instead of relying on user input values, we use the DN value returned by the LDAP server. This handles cases like when a mapping is set on a DN value `uid=svc.algorithm,OU=swengg,DC=min,DC=io` with a user input value (with unicode variation) of `uid=svc﹒algorithm,OU=swengg,DC=min,DC=io`. The LDAP server on lookup of this DN returns the normalized value where the unicode dot character `SMALL FULL STOP` (in the user input), gets replaced with regular full stop.	2024-03-27 23:45:26 -07:00
Harshavardhana	3e38fa54a5	set max versions to be IntMax to avoid premature failures (#19360 ) let users/customers set relevant values, and make the default value non-applicable.	2024-03-27 18:08:07 -07:00
Harshavardhana	364d3a0ac9	fix: new staticcheck and linter issues reported (#19340 )	2024-03-27 08:10:40 -07:00
Harshavardhana	0a56dbde2f	allow configuring inline shard size value (#19336 )	2024-03-26 15:06:19 -07:00
Klaus Post	7ff4164d65	Fix races in IAM cache lazy loading (#19346 ) Fix races in IAM cache Fixes #19344 On the top level we only grab a read lock, but we write to the cache if we manage to fetch it. `a03dac41eb/cmd/iam-store.go (L446)` is also flipped to what it should be AFAICT. Change the internal cache structure to a concurrency safe implementation. Bonus: Also switch grid implementation.	2024-03-26 11:12:57 -07:00
Sveinn	1fc4203c19	Webhook targets refactor and bug fixes (#19275 ) - old version was unable to retain messages during config reload - old version could not go from memory to disk during reload - new version can batch disk queue entries into a single file to reduce I/O load - error logging has been improved; the previous version would miss certain errors. - logic for spawning/despawning additional workers has been adjusted to trigger when half capacity is reached, instead of when the log queue becomes full. - old version would JSON-marshal twice and unmarshal once for every log item. Now we marshal only once, then GetRaw from the store and send it without having to re-marshal.	2024-03-25 09:44:20 -07:00
Krishnan Parthasarathi	da81c6cc27	Encode dir obj names before expiration (#19305 ) Object names of directory objects qualified for ExpiredObjectAllVersions must be encoded appropriately before calling deletePrefix on their erasure set. Otherwise, a directory object and regular objects with overlapping prefixes could lead to the expiration of regular objects, which is not the intention of ILM, e.g., ``` bucket/dir/ ---> directory object bucket/dir/obj-1 ``` When `bucket/dir/` qualifies for expiration, the current implementation would remove regular objects under the prefix `bucket/dir/`, in this case, `bucket/dir/obj-1`.	2024-03-21 10:21:35 -07:00
Anis Eleuch	b657ffa496	fix: Fix crash when logging events and anonymous is enabled (#19313 ) Events log does not have a stacktrace. So Trace is nil. Fix a crash in this case when an event is printed while anonymous logging is enabled.	2024-03-21 10:19:36 -07:00
Andreas Auernhammer	999bbd3a14	crypto: generate OEK using HMAC-SHA256 instead of SHA256 (#19297 ) This commit changes how MinIO generates the object encryption key (OEK) when encrypting an object using server-side encryption. This change is fully backwards compatible. Now, MinIO generates the OEK as follows: ``` Nonce = RANDOM(32) // generate 256-bit random value OEK = HMAC-SHA256(EK, Context || Nonce) ``` Previously, the OEK was computed as follows: ``` Nonce = RANDOM(32) // generate 256-bit random value OEK = SHA256(EK || Nonce) ``` The new scheme does not technically fix a security issue but uses a more familiar construction. The only requirement for the OEK generation function is that it produces a (pseudo)random value for every pair (`EK`,`Nonce`) as long as no `EK`-`Nonce` combination is repeated. This prevents a faulty PRNG from repeating or generating a "bad" key. The previous scheme guarantees that the `OEK` is a (pseudo)random value given that no pair (`EK`,`Nonce`) repeats, under the assumption that SHA256 is indistinguishable from a random oracle. The new scheme guarantees that the `OEK` is a (pseudo)random value given that no pair (`EK`,`Nonce`) repeats, under the assumption that SHA256's underlying compression function is a PRF/PRP. While the latter is a weaker assumption, and therefore less likely to be false, both are considered true: SHA256 is believed to be indistinguishable from a random oracle AND its compression function is assumed to be a PRF/PRP. As far as OEK generation is concerned, the OS random number generator is not required to be pseudo-random, just non-repeating. Apart from being more compatible with standard definitions and descriptions of how to generate crypto keys, this change does not have any impact on the actual security of the OEK key generation. Signed-off-by: Andreas Auernhammer <github@aead.dev>	2024-03-19 13:28:10 -07:00
Harshavardhana	4d7068931a	change the notification queue full message (#19293 )	2024-03-19 00:30:10 -07:00
jiuker	d7fb6fddf6	feat: add user specific redis auth (#19285 )	2024-03-18 21:37:54 -07:00
Harshavardhana	d4aac7cd72	add deprecated expiry_workers to be ignored (#19289 ) avoids error during upgrades such as ``` API: SYSTEM() Time: 19:19:22 UTC 03/18/2024 DeploymentID: 24e4b574-b28d-4e94-9bfa-03c363a600c2 Error: Invalid api configuration: found invalid keys (expiry_workers=100 ) for 'api' sub-system, use 'mc admin config reset myminio api' to fix invalid keys (*fmt.wrapError) 11: internal/logger/logger.go:260:logger.LogIf() ... ```	2024-03-18 15:25:32 -07:00
Harshavardhana	c201d8bda9	write anything beyond 4k to be written in 4k pages (#19269 ) we were prematurely falling back from 4k page writes when we could have used them: most buffers are a multiple of 4k up to some size, followed by a remainder. Only the remainder needs to be written without O_DIRECT.	2024-03-15 12:27:59 -07:00
Harshavardhana	93fb7d62d8	allow dynamically changing max_object_versions per object (#19265 )	2024-03-14 18:07:19 -07:00
Harshavardhana	ce1c640ce0	feat: allow retaining parity SLA to be configurable (#19260 ) at scale, customers might start with failed drives, causing skew in the overall usage ratio per EC set. Make this configurable so that customers can turn it off as needed, depending on how comfortable they are.	2024-03-14 03:38:33 -07:00
Klaus Post	5c32058ff3	cosmetic: Move request goroutines to methods (#19241 ) Cosmetic change, but it breaks up a big code block and will make goroutine dumps of streams more readable, so it is clearer what each goroutine is doing.	2024-03-13 11:43:58 -07:00
huajin tong	a25a8312d8	fix: some flyby typos in the code (#19212 ) Signed-off-by: thirdkeyword <fliterdashen@gmail.com>	2024-03-10 14:09:36 -07:00
Krishnan Parthasarathi	2007dd26ae	ilm: Expire if object past expected expiry date (#19230 ) When an object qualifies for both tiering and expiration rules and is past its expiration date, it should be expired without first being tiered, even when the tiering event occurs before expiration.	2024-03-08 22:41:22 -08:00
Klaus Post	51f62a8da3	Port ListBuckets to websockets layer & some cleanup (#19199 )	2024-03-08 11:08:18 -08:00
Harshavardhana	233cc3905a	add batchSize support for webhook endpoints (#19214 ) configure batch size to send audit/logger events in batches instead of sending one event per connection. this is mainly to optimize the number of requests we make to webhook endpoint.	2024-03-07 12:17:46 -08:00
Harshavardhana	e91a4a414c	merge startHTTPLogger() many callers into a simpler pattern (#19211 ) simplify the audit webhook worker model; fixes a couple of bugs: - ping(ctx) was creating a logger without updating the number of workers, leading to incorrect nWorkers scaling and an additional worker that is not tracked properly. - h.logCh <- entry could hang when the queue is full on heavily loaded systems.	2024-03-06 08:09:46 -08:00
Harshavardhana	74ccee6619	avoid too much auditing during decom/rebalance make it more robust (#19174 ) there can be a sudden spike in tiny allocations due to too much auditing being done; also, don't hang on ``` h.logCh <- entry ``` after initializing workers if there is no way to dequeue for some reason.	2024-03-06 03:43:16 -08:00
Krishnan Parthasarathi	c26b8d4eb8	Set expected expiry date for ExpiredObjectAllVersions (#19210 )	2024-03-05 22:28:57 -08:00
Krishnan Parthasarathi	b69bcdcdc4	Fix ilm config at startup (#19189 ) Remove api.expiration_workers config setting which was inadvertently left behind. Per review comment https://github.com/minio/minio/pull/18926, expiration_workers can be configured via ilm.expiration_workers.	2024-03-04 18:50:24 -08:00
Krishnan Parthasarathi	a7577da768	Improve expiration of tiered objects (#18926 ) - Use a shared worker pool for all ILM expiry tasks - Free version cleanup executes in a separate goroutine - Add a free version only if removing the remote object fails - Add ILM expiry metrics to the node namespace - Move tier journal tasks to expiryState - Remove unused on-disk journal for tiered objects pending deletion - Distribute expiry tasks across workers such that the expiry of versions of the same object is serialized - Ability to resize worker pool without server restart - Make scaling down of expiryState workers' concurrency safe; Thanks @klauspost - Add error logs when expiryState and transition state are not initialized (yet) * metrics: Add missed tier journal entry tasks * Initialize the ILM worker pool after the object layer	2024-03-01 21:11:03 -08:00
Andreas Auernhammer	09626d78ff	automatically generate root credentials with KMS (#19025 ) With this commit, MinIO generates root credentials automatically and deterministically if: - No root credentials have been set. - A KMS (KES) is configured. - API access for the root credentials is disabled (lockdown mode). Previously, MinIO defaulted to `minioadmin` for both the access and secret keys. Now, MinIO generates unique root credentials automatically on startup using the KMS. To do so, it uses the KMS HMAC function to generate pseudo-random values. These values never change as long as the KMS key remains the same, and the KMS key must continue to exist since all IAM data is encrypted with it. Backward compatibility: This commit should not cause existing deployments to break. It only changes the root credentials of deployments that have a KMS configured (KES, not a static key) but have not set any admin credentials. Such deployments should be rare or not exist at all. Even then, the worst case would be updating root credentials in mc or other clients used to administer the cluster. Root credentials are not intended for regular S3 operations anyway. Signed-off-by: Andreas Auernhammer <github@aead.dev>	2024-03-01 13:09:42 -08:00
Harshavardhana	2c2f5d871c	debug: introduce support for configuring client connect WRITE deadline (#19170 ) just like client-conn-read-deadline, add a new flag for client-conn-write-deadline as well. Neither is configured by default, since we do not yet know the right value. Allow this to be configured if needed.	2024-03-01 08:00:42 -08:00
Harshavardhana	c599c11e70	fix: relax metadata checks for healing (#19165 ) we should do this to ensure that data healing is the primary focus; fixing metadata as part of healing must be done, but making data available is the main goal. The main reason is that metadata inconsistencies can cause data availability issues, which must be avoided at all costs. We will be bringing in an additional "metadata-only" healing mechanism; for now, we do not expect to have these checks. continuation of #19154 Bonus: add a pro-active healthcheck to perform a connection	2024-02-29 22:49:01 -08:00
Klaus Post	40fb3371fa	Mux: Send async mux ack and fix stream error responses (#19149 ) Streams can return errors if the cancelation is picked up before the response stream close is picked up. Under extreme load, this could lead to missing responses. Send server mux ack async so a blocked send cannot block newMuxStream call. Stream will not progress until mux has been acked.	2024-02-28 10:05:18 -08:00
Harshavardhana	51874a5776	fix: allow DNS disconnection events to happen in k8s (#19145 ) in k8s, things really do come online very asynchronously, so we need to use an implementation that allows for this randomness. To facilitate this, move WriteAll() to the websocket layer instead. Bonus: avoid instances of dnscache usage on k8s	2024-02-28 09:54:52 -08:00
Aditya Manthramurthy	62ce52c8fd	cachevalue: simplify exported interface (#19137 ) - Also add cache options type	2024-02-28 09:09:09 -08:00
jiuker	0aae0180fb	feat: add userCredentials for nats (#19139 )	2024-02-27 10:11:55 -08:00
Anis Eleuch	95032e4710	ilm: Select an object when all AND tags are satisfied (#19134 ) Currently, if one object tag matches with one lifecycle tag filter, ILM will select it, however, this is wrong. All the Tag filters in the lifecycle document should be satisfied.	2024-02-26 16:01:20 -08:00
Praveen raj Mani	30c2596512	Read drive IO stats from sysfs instead of procfs (#19131 ) Currently, we read from `/proc/diskstats`, which is found to be unreliable in k8s environments. We can read from `sysfs` instead. Also, cache the latest drive IO stats to compute the diff and update the metrics.	2024-02-26 11:34:50 -08:00
Klaus Post	2b5e4b853c	Improve caching (#19130 ) * Remove lock for cached operations. * Rename "Relax" to `ReturnLastGood`. * Add `CacheError` to allow caching values even on errors. * Add NoWait that will return current value with async fetching if within 2xTTL. * Make benchmark somewhat representative. ``` Before: BenchmarkCache-12 16408370 63.12 ns/op 0 B/op After: BenchmarkCache-12 428282187 2.789 ns/op 0 B/op ``` * Remove `storageRESTClient.scanning`. Nonsensical - RPC clients will not have any idea about scanning. * Always fetch remote diskinfo metrics and cache them. Seems most calls are requesting metrics. * Do async fetching of usage caches.	2024-02-26 10:49:19 -08:00
Harshavardhana	a3ac62596c	move timedValue -> cachevalue package (#19114 )	2024-02-23 13:28:14 -08:00
Harshavardhana	53aa8f5650	use typos instead of codespell (#19088 )	2024-02-21 22:26:06 -08:00
Shubhendu	56887f3208	Add DeleteAll with expiry days non zero value only (#19095 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-02-21 12:28:34 -08:00
Klaus Post	92180bc793	Add array recycling safety (#19103 ) Nil entries when recycling arrays.	2024-02-21 12:27:35 -08:00
Klaus Post	22aa16ab12	Fix grid reconnection deadlock (#19101 ) If network conditions have filled the output queue before a reconnect happens, blocked sends could stop reconnects from happening. In short, `respMu` would be held for a mux client while sending; if the queue is full, it will never get released and closing the mux client will hang. A) Use the mux client context instead of the connection context for sends, so sends are unblocked when the mux client is canceled. B) Use a `TryLock` on "close" and cancel the request if we cannot get the lock at once. This will unblock any attempts to send.	2024-02-21 07:49:34 -08:00
Harshavardhana	cd419a35fe	simplify broker healthcheck by following kafka guidelines (#19082 ) fixes #19081	2024-02-20 00:16:35 -08:00
Klaus Post	e06168596f	Convert more peer <--> peer REST calls (#19004 ) * Convert more peer <--> peer REST calls * Clean up in general. * Add JSON wrapper. * Add slice wrapper. * Add option to make handler return nil error if no connection is given, `IgnoreNilConn`. Converts the following: ``` + HandlerGetMetrics + HandlerGetResourceMetrics + HandlerGetMemInfo + HandlerGetProcInfo + HandlerGetOSInfo + HandlerGetPartitions + HandlerGetNetInfo + HandlerGetCPUs + HandlerServerInfo + HandlerGetSysConfig + HandlerGetSysServices + HandlerGetSysErrors + HandlerGetAllBucketStats + HandlerGetBucketStats + HandlerGetSRMetrics + HandlerGetPeerMetrics + HandlerGetMetacacheListing + HandlerUpdateMetacacheListing + HandlerGetPeerBucketMetrics + HandlerStorageInfo + HandlerGetLocks + HandlerBackgroundHealStatus + HandlerGetLastDayTierStats + HandlerSignalService + HandlerGetBandwidth ```	2024-02-19 14:54:46 -08:00
Harshavardhana	607cafadbc	converge clusterRead health into cluster health (#19063 )	2024-02-15 16:48:36 -08:00
Anis Eleuch	68dde2359f	log: Add logger.Event to send to console and other logger targets (#19060 ) Add a new function logger.Event() to send the log to Console and http/kafka log webhooks. This will include some internal events such as disk healing and rebalance/decommissioning	2024-02-15 15:13:30 -08:00
Praveen raj Mani	ac8e9ce04f	Send a bucket notification event on DeleteObject() for non-existing object (#19037 ) Send a bucket notification event on DeleteObject for non-existing objects	2024-02-13 07:34:17 -08:00
Taran Pelkey	4d94609c44	Fix unexpected behavior when creating service account (#19036 )	2024-02-13 02:31:43 -08:00
Harshavardhana	afd19de5a9	fix: allow configuring excess versions alerting (#19028 ) Bonus: enable audit alerts for object versions beyond the configured value; the default is '100' versions per object, beyond which the scanner will alert for each such object.	2024-02-11 23:41:53 -08:00
Harshavardhana	997ba3a574	introduce reader deadlines for net.Conn (#19023 ) Bonus: set "retry-after" header for AWS SDKs if possible to honor them.	2024-02-09 13:25:16 -08:00
Klaus Post	8e68ff9321	Add extra disconnect safety (#19022 ) Fix reported races that are actually synchronized by network calls. But this should add some extra safety for untimely disconnects. Race reported: ``` WARNING: DATA RACE Read at 0x00c00171c9c0 by goroutine 214: github.com/minio/minio/internal/grid.(muxClient).addResponse() e:/gopath/src/github.com/minio/minio/internal/grid/muxclient.go:519 +0x111 github.com/minio/minio/internal/grid.(muxClient).error() e:/gopath/src/github.com/minio/minio/internal/grid/muxclient.go:470 +0x21d github.com/minio/minio/internal/grid.(Connection).handleDisconnectClientMux() e:/gopath/src/github.com/minio/minio/internal/grid/connection.go:1391 +0x15b github.com/minio/minio/internal/grid.(Connection).handleMsg() e:/gopath/src/github.com/minio/minio/internal/grid/connection.go:1190 +0x1ab github.com/minio/minio/internal/grid.(Connection).handleMessages.func1() e:/gopath/src/github.com/minio/minio/internal/grid/connection.go:981 +0x610 Previous write at 0x00c00171c9c0 by goroutine 1081: github.com/minio/minio/internal/grid.(muxClient).roundtrip() e:/gopath/src/github.com/minio/minio/internal/grid/muxclient.go:94 +0x324 github.com/minio/minio/internal/grid.(muxClient).traceRoundtrip() e:/gopath/src/github.com/minio/minio/internal/grid/trace.go:74 +0x10e4 github.com/minio/minio/internal/grid.(Subroute).Request() e:/gopath/src/github.com/minio/minio/internal/grid/connection.go:366 +0x230 github.com/minio/minio/internal/grid.(SingleHandler[go.shape.github.com/minio/minio/cmd.DiskInfoOptions,go.shape.github.com/minio/minio/cmd.DiskInfo]).Call() e:/gopath/src/github.com/minio/minio/internal/grid/handlers.go:554 +0x3fd github.com/minio/minio/cmd.(storageRESTClient).DiskInfo() e:/gopath/src/github.com/minio/minio/cmd/storage-rest-client.go:314 +0x270 github.com/minio/minio/cmd.erasureObjects.getOnlineDisksWithHealingAndInfo.func1() e:/gopath/src/github.com/minio/minio/cmd/erasure.go:293 +0x171 ``` This read will always happen after the write, since there is a network call in between. However, a disconnect could come in while we are setting up the call, so we protect against that with extra checks.	2024-02-09 08:43:38 -08:00
Harshavardhana	035a3ea4ae	optimize startup sequence performance (#19009 ) - bucket metadata does not need to look for legacy things anymore if b.Created is non-zero - stagger bucket metadata loads across lots of nodes to avoid the current thundering herd problem. - Remove deadlines for RenameData, RenameFile - these calls should never time out and should wait until completion or until the client times out. Do not choose timeouts for applications during the WRITE phase. - increase R/W buffer size, increase maxMergeMessages to 30	2024-02-08 11:21:21 -08:00
Klaus Post	7ec43bd177	Fix blocked streams blocking reconnects (#19017 ) We have observed cases where a blocked stream will block for cancellations. This happens when the response channel is blocked and we want to push an error. This will have the response mutex locked, which will prevent all other operations until upstream is unblocked. Make this behavior non-blocking and, if blocked, spawn a goroutine that will send the response and close the output. Still a lot of "dancing". Added a test for this and reviewed.	2024-02-08 10:15:27 -08:00
Shubhendu	980fb5e2ab	Enable expired-object-all-versions (#18954 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-02-06 13:36:22 -08:00
Klaus Post	9bcc46d93d	Fix second muxclient context leak (#18987 ) Subrouted requests were also leaking contexts in mux clients. Similar to #18956	2024-02-06 13:35:16 -08:00
Klaus Post	22687c1f50	Add websocket TCP write timeouts (#18988 ) Add a 3-second timeout to writes. This will make dead TCP connections terminate in a reasonable time. Fixes writes blocking for reconnection.	2024-02-06 13:34:46 -08:00
Klaus Post	ebc6c9b498	Fix tracing send on closed channel (#18982 ) Depending on when the context cancelation is picked up, the handler may return and close the channel before `SubscribeJSON` returns, causing: ``` Feb 05 17:12:00 s3-us-node11 minio[3973657]: panic: send on closed channel Feb 05 17:12:00 s3-us-node11 minio[3973657]: goroutine 378007076 [running]: Feb 05 17:12:00 s3-us-node11 minio[3973657]: github.com/minio/minio/internal/pubsub.(PubSub[...]).SubscribeJSON.func1() Feb 05 17:12:00 s3-us-node11 minio[3973657]: github.com/minio/minio/internal/pubsub/pubsub.go:139 +0x12d Feb 05 17:12:00 s3-us-node11 minio[3973657]: created by github.com/minio/minio/internal/pubsub.(PubSub[...]).SubscribeJSON in goroutine 378010884 Feb 05 17:12:00 s3-us-node11 minio[3973657]: github.com/minio/minio/internal/pubsub/pubsub.go:124 +0x352 ``` Wait explicitly for the goroutine to exit. Bonus: Listen for doneCh when sending, so we do not risk getting blocked there if the channel isn't being emptied.	2024-02-06 08:57:30 -08:00
Harshavardhana	100c35c281	avoid excessive logs when peer is down (#18969 )	2024-02-04 23:25:42 -08:00
Harshavardhana	960d604013	disconnected returns an unexpected error to List(), returning 500s (#18959 ) provide the error string appropriately so that matching of error types works. Also add a string-based fallback for the said error.	2024-02-03 01:04:33 -08:00
Klaus Post	63bf5f42a1	Fix mux client memory leak (#18956 ) Add missing client cancellation, resulting in memory buildup tracing back to context.WithCancelCause/context.WithCancelDeadlineCause	2024-02-02 15:31:06 -08:00
Harshavardhana	ff80cfd83d	move Make,Delete,Head,Heal bucket calls to websockets (#18951 )	2024-02-02 14:54:54 -08:00
Harshavardhana	99fde2ba85	deprecate disk tokens, instead rely on deadlines and active monitoring (#18947 ) disk token usage is no longer necessary with the implementation of deadlines for storage calls and active monitoring of the drive for I/O timeouts. Functionality for kicking off a bad drive is still supported; we just no longer need to serialize I/O the way tokens did.	2024-02-02 10:10:54 -08:00
Klaus Post	ce0cb913bc	Fix ineffective recycling (#18952 ) Recycle would always be called on the dummy value `any(newRT())` instead of the actual value given to the recycle function. Caught by race tests, but mostly harmless, except for reduced perf. Other minor cleanups. Introduced in #18940 (unreleased)	2024-02-02 08:48:12 -08:00
Harshavardhana	d99d16e8c3	simplify deadlineWriter, re-use WithDeadline (#18948 )	2024-02-02 03:02:31 -08:00
Anis Eleuch	6fd63e920a	log: Use error log type instead of Application/MinIO type (#18930 ) * log: Use error log type instead of Application/MinIO type Also bump github.com/shirou/gopsutil version to address cross compilation issues. * Apply suggestions from code review Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com> --------- Co-authored-by: Anis Eleuch <anis@min.io> Co-authored-by: Harshavardhana <harsha@minio.io> Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>	2024-02-01 16:13:57 -08:00
Klaus Post	b192bc348c	Improve object reuse for grid messages (#18940 ) Allow internal types to support a `Recycler` interface, which will allow for sharing of common types across handlers. This means that all `grid.MSS` (and similar) objects are shared in a common pool instead of a per-handler pool. Add internal request reuse of internal types. Add for safe (pointerless) types explicitly. Only log params for internal types. Doing Sprint(obj) is just a bit too messy.	2024-02-01 12:41:20 -08:00
Harshavardhana	6440d0fbf3	move a collection of peer APIs to websockets (#18936 )	2024-02-01 10:47:20 -08:00
Frank Wessels	4cd777a5e0	Correct small typo in pubsub (#18923 )	2024-01-31 01:01:53 -08:00
Klaus Post	6da4a9c7bb	Improve tracing & notification scalability (#18903 ) * Perform JSON encoding on remote machines and only forward byte slices. * Migrate tracing & notification to WebSockets.	2024-01-30 12:49:02 -08:00
Anis Eleuch	a669946357	Add cgroup v2 support for memory limit (#18905 )	2024-01-30 11:13:27 -08:00
Harshavardhana	2ddf2ca934	allow configuring maximum idle connections per host (#18908 )	2024-01-29 16:50:37 -08:00
Harshavardhana	9987ff570b	avoid calling close for nil inbound/outbound channels	2024-01-28 19:56:32 -08:00
Harshavardhana	9ef132c33b	remove excessive logging due to runtime.debugStack	2024-01-28 18:10:42 -08:00
Harshavardhana	7743d952dc	fix: incomingBytes() to update via handleMessages() (#18891 ) previous change #18880 was incomplete	2024-01-28 14:35:53 -08:00
Harshavardhana	1d3bd02089	avoid close 'nil' panics if any (#18890 ) brings a generic implementation that prints a stack trace for close() calls on 'nil' channels; otherwise, it closes the channel safely.	2024-01-28 10:04:17 -08:00
Klaus Post	38de8e6936	grid: Simpler reconnect logic (#18889 ) Do not rely on `connChange` to do reconnects. Instead, you can block while the connection is running and reconnect when handleMessages returns. Add fully async monitoring instead of monitoring on the main goroutine and keep this to avoid full network lockup.	2024-01-28 08:46:15 -08:00
Harshavardhana	c51f9ef940	fix: regression in internode bytes counting (#18880 ) wire up missing metrics since #18461 Bonus: fix trace output inconsistency	2024-01-27 00:25:49 -08:00
Harshavardhana	74851834c0	further bootstrap/startup optimization for reading 'format.json' (#18868 ) - Move RenameFile to websockets - Move ReadAll, which is primarily used for reading 'format.json', to websockets - Optimize DiskInfo calls, and provide a way to make a NoOp DiskInfo call.	2024-01-25 12:45:46 -08:00
Harshavardhana	e377bb949a	migrate bootstrap logic directly to websockets (#18855 ) improve performance for startup sequences by 2x for 300+ nodes.	2024-01-24 13:36:44 -08:00
Praveen raj Mani	c905d3fe21	fix: Re-use TCP connections for Kafka dials (#18860 ) Fixes #18857	2024-01-24 13:10:52 -08:00
Klaus Post	6968f7237a	Add separate grid reconnection mutex (#18862 ) Add separate reconnection mutex Give more safety around reconnects and make sure a state change isn't missed. Tested with several runs of `λ go test -race -v -count=500` Adds separate mutex and doesn't mix in the testing mutex.	2024-01-24 11:49:39 -08:00
Klaus Post	4a6c97463f	Fix all racy use of NewDeadlineWorker (#18861 ) Almost all uses of NewDeadlineWorker that relied on secondary values were racy, which could lead to inconsistent errors/data being returned. It also propagates the deadline downstream. Rewrite all these to use a generic WithDeadline caller that can return an error alongside a value. Remove the stateful aspect of DeadlineWorker - it was racy if used - but it wasn't AFAICT. Fixes races like: ``` WARNING: DATA RACE Read at 0x00c130b29d10 by goroutine 470237: github.com/minio/minio/cmd.(xlStorageDiskIDCheck).ReadVersion() github.com/minio/minio/cmd/xl-storage-disk-id-check.go:702 +0x611 github.com/minio/minio/cmd.readFileInfo() github.com/minio/minio/cmd/erasure-metadata-utils.go:160 +0x122 github.com/minio/minio/cmd.erasureObjects.getObjectFileInfo.func1.1() github.com/minio/minio/cmd/erasure-object.go:809 +0x27a github.com/minio/minio/cmd.erasureObjects.getObjectFileInfo.func1.2() github.com/minio/minio/cmd/erasure-object.go:828 +0x61 Previous write at 0x00c130b29d10 by goroutine 470298: github.com/minio/minio/cmd.(xlStorageDiskIDCheck).ReadVersion.func1() github.com/minio/minio/cmd/xl-storage-disk-id-check.go:698 +0x244 github.com/minio/minio/internal/ioutil.(DeadlineWorker).Run.func1() github.com/minio/minio/internal/ioutil/ioutil.go:141 +0x33 WARNING: DATA RACE Write at 0x00c0ba6e6c00 by goroutine 94507: github.com/minio/minio/cmd.(xlStorageDiskIDCheck).StatVol.func1() github.com/minio/minio/cmd/xl-storage-disk-id-check.go:419 +0x104 github.com/minio/minio/internal/ioutil.(DeadlineWorker).Run.func1() github.com/minio/minio/internal/ioutil/ioutil.go:141 +0x33 Previous read at 0x00c0ba6e6c00 by goroutine 94463: github.com/minio/minio/cmd.(xlStorageDiskIDCheck).StatVol() github.com/minio/minio/cmd/xl-storage-disk-id-check.go:422 +0x47e github.com/minio/minio/cmd.getBucketInfoLocal.func1() github.com/minio/minio/cmd/peer-s3-server.go:275 +0x122 github.com/minio/pkg/v2/sync/errgroup.(*Group).Go.func1() ``` Probably dates back to #17701	2024-01-24 10:08:31 -08:00
Klaus Post	feeeef71f1	Add extra protection for grid reconnects (#18840 ) Race checks would occasionally show race on handleMsgWg WaitGroup by debug messages (used in test only). Use the `connMu` mutex to protect this against concurrent Wait/Add. Fixes #18827	2024-01-22 09:39:06 -08:00
Klaus Post	83bf15a703	grid: Return rejection reason (#18834 ) When rejecting incoming grid requests fill out the rejection reason and log it once. This will give more context when startup is failing. Already logged after a retry on caller.	2024-01-19 10:35:24 -08:00
Harshavardhana	dd2542e96c	add codespell action (#18818 ) Original work here, #18474, refixed and updated.	2024-01-17 23:03:17 -08:00
Frank Wessels	4d2320ba8b	fix: a small typo in dsync (#18816 )	2024-01-17 20:34:26 -08:00
Klaus Post	479940b7d0	Deallocate huge read buffers (#18813 ) If a message buffer is excessively huge, release it back so it isn't kept around forever.	2024-01-17 11:47:42 -08:00
Poorna	b2b26d9c95	support proxying of tagging requests in replication (#18649 ) support proxying of tagging requests in active-active replication Note: even if proxying is successful, PutObjectTagging/DeleteObjectTagging will continue to report a 404 since the object is not present locally.	2024-01-12 23:51:33 -08:00
jiuker	a89e0bab7d	fix: s3 sql parse error for columns as with quotes (#18765 )	2024-01-09 09:19:11 -08:00
Sveinn	9b8ba97f9f	feat: add support for GetObjectAttributes API (#18732 )	2024-01-05 10:43:06 -08:00
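The split described in #19269 writes the 4k-aligned bulk of a buffer with O_DIRECT and only the remainder without it. That split reduces to simple arithmetic. The sketch below assumes a 4 KiB alignment unit and uses a hypothetical helper, `splitForDirectIO`. It only computes sizes and performs no I/O.

```go
package main

import "fmt"

// pageSize is the assumed O_DIRECT alignment unit (4 KiB).
const pageSize = 4 << 10

// splitForDirectIO splits a write of n bytes into the largest 4 KiB-aligned
// prefix (eligible for O_DIRECT) and the trailing remainder (written without
// O_DIRECT). Hypothetical helper for illustration only.
func splitForDirectIO(n int) (aligned, remainder int) {
	aligned = n &^ (pageSize - 1) // round down to a multiple of pageSize
	return aligned, n - aligned
}

func main() {
	for _, n := range []int{100, 4096, 10000, 1 << 20} {
		a, r := splitForDirectIO(n)
		fmt.Printf("n=%-7d aligned=%-7d remainder=%d\n", n, a, r)
	}
}
```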
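The AND semantics from #19134 mean that an object is selected only when every tag in the lifecycle filter is present on it with the same value. Below is a minimal sketch, assuming tags are plain key/value maps. `matchesAllTags` is a hypothetical name, not MinIO's lifecycle API.

```go
package main

import "fmt"

// matchesAllTags reports whether objTags satisfies every key/value pair in
// filterTags (AND semantics). An empty filter matches any object.
// Hypothetical helper, not MinIO's lifecycle package.
func matchesAllTags(filterTags, objTags map[string]string) bool {
	for k, want := range filterTags {
		if got, ok := objTags[k]; !ok || got != want {
			return false // a single unmet tag filter disqualifies the object
		}
	}
	return true
}

func main() {
	filter := map[string]string{"env": "dev", "team": "storage"}
	fmt.Println(matchesAllTags(filter, map[string]string{"env": "dev"}))                                   // false
	fmt.Println(matchesAllTags(filter, map[string]string{"env": "dev", "team": "storage", "owner": "x"})) // true
}
```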
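The nil-safe close from #18890 can be sketched as a small generic helper. The shape shown here is an assumption, not MinIO's implementation. `safeClose` is a hypothetical name, and unlike a fully defensive version, it does not protect against closing the same channel twice.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
)

// safeClose closes ch unless it is nil. Closing a nil channel panics in Go,
// so for nil it prints a stack trace to locate the caller instead.
// Hypothetical helper; it does not guard against closing a channel twice.
func safeClose[T any](ch chan T) {
	if ch == nil {
		fmt.Fprintf(os.Stderr, "close of nil channel:\n%s", debug.Stack())
		return
	}
	close(ch)
}

func main() {
	var inbound chan []byte // nil: never initialized
	safeClose(inbound)      // logs a stack trace instead of panicking
	outbound := make(chan []byte)
	safeClose(outbound)
	_, ok := <-outbound
	fmt.Println("outbound open:", ok)
}
```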


765 commits