Commit graph

6140 commits

Author SHA1 Message Date
Klaus Post 3b7747b42b
Tweak multipart uploads (#19756)
* Store ModTime in the upload ID; return it when listing instead of the current time.
* Use this ModTime to expire and skip reading the file info.
* Consistent upload sorting in listing (since it now has the ModTime).
* Exclude healing disks to avoid returning an empty list.
2024-05-17 09:40:09 -07:00
Harshavardhana e432e79324
avoid calling 'admin info' for disk, cpu, net metrics collection (#19762)
resource metrics collection was incorrectly making fan-out
liveness peer calls where it's not needed.
2024-05-17 08:15:13 -07:00
Harshavardhana 08d74819b6
handle racy updates to globalSite config (#19750)
```
==================
WARNING: DATA RACE
Read at 0x0000082be990 by goroutine 205:
  github.com/minio/minio/cmd.setCommonHeaders()

Previous write at 0x0000082be990 by main goroutine:
  github.com/minio/minio/cmd.lookupConfigs()
```
2024-05-16 16:13:47 -07:00
Poorna aa3fde1784
Add ListObjectsV2 unit test (#19753)
for PR: #19725
2024-05-15 20:40:51 -07:00
Harshavardhana 0b3eb7f218
add more deadlines and pass around context under most situations (#19752) 2024-05-15 15:19:00 -07:00
Klaus Post b792b36495
Add Veeam storage class override (#19748)
Recent Veeam is very picky about storage class names. Add `_MINIO_VEEAM_FORCE_SC` env var.

It will override the storage class returned by the storage backend if it is non-standard
and we detect a Veeam client by checking the User Agent.

Applies to HeadObject/GetObject/ListObject*
2024-05-15 11:04:16 -07:00
Harshavardhana d3db7d31a3
fix: add deadlines for all synchronous REST callers (#19741)
add deadlines that can be dynamically changed via
the drive max timeout values.

Bonus: optimize "file not found" case and hung drives/network - circuit break the check and return right
away instead of waiting.
2024-05-15 09:52:29 -07:00
Shireesh Anjal c05ca63158
Fix crash on /minio/metrics/v3?list (#19745)
An unchecked map access was causing panic.
2024-05-15 09:06:35 -07:00
Shireesh Anjal 0e59e50b39
Capture ttfb api metrics only for GetObject (#19733)
as that is the only API where the TTFB metric is beneficial, and
capturing this for all APIs exponentially increases the response size in
large clusters.
2024-05-14 23:25:13 -07:00
Klaus Post d4b391de1b
Add PutObject Ring Buffer (#19605)
Replace the `io.Pipe` from streamingBitrotWriter -> CreateFile with a fixed size ring buffer.

This will add an output buffer for encoded shards to be written to disk - potentially via RPC.

This will remove blocking when `(*streamingBitrotWriter).Write` is called, and it writes hashes and data.

With current settings, the write looks like this:

```
Outbound
┌───────────────────┐             ┌────────────────┐               ┌───────────────┐                      ┌────────────────┐
│                   │   Parr.     │                │  (http body)  │               │                      │                │
│ Bitrot Hash       │     Write   │      Pipe      │      Read     │  HTTP buffer  │    Write (syscall)   │  TCP Buffer    │
│ Erasure Shard     │ ──────────► │  (unbuffered)  │ ────────────► │   (64K Max)   │ ───────────────────► │    (4MB)       │
│                   │             │                │               │  (io.Copy)    │                      │                │
└───────────────────┘             └────────────────┘               └───────────────┘                      └────────────────┘
```

We write a Hash (32 bytes). Since the pipe is unbuffered, it will block until the 32 bytes have 
been delivered to the TCP buffer, and the next Read hits the Pipe.

Then we write the shard data. This will typically be bigger than 64KB, so it will block until two blocks 
have been read from the pipe.

When we insert a ring buffer:

```
Outbound
┌───────────────────┐             ┌────────────────┐               ┌───────────────┐                      ┌────────────────┐
│                   │             │                │  (http body)  │               │                      │                │
│ Bitrot Hash       │     Write   │  Ring Buffer   │      Read     │  HTTP buffer  │    Write (syscall)   │  TCP Buffer    │
│ Erasure Shard     │ ──────────► │    (2MB)       │ ────────────► │   (64K Max)   │ ───────────────────► │    (4MB)       │
│                   │             │                │               │  (io.Copy)    │                      │                │
└───────────────────┘             └────────────────┘               └───────────────┘                      └────────────────┘
```

The hash+shard will fit within the ring buffer, so writes will not block - but will complete after a 
memcopy. Reads can fill the 64KB buffer if there is data for it.

If the network is congested, the ring buffer will become filled, and all syscalls will be on full buffers.
Only when the ring buffer is filled will erasure coding start blocking.

Since there is always "space" to write output data, we remove the parallel writing since we are 
always writing to memory now, and the goroutine synchronization overhead probably not worth taking. 

If the output were blocked in the existing, we would still wait for it to unblock in parallel write, so it would 
make no difference there - except now the ring buffer smoothes out the load.

There are some micro-optimizations we could look at later. The biggest is that, in most cases, 
we could encode directly to the ring buffer - if we are not at a boundary. Also, "force filling" the 
Read requests (i.e., blocking until a full read can be completed) could be investigated and maybe 
allow concurrent memory on read and write.
2024-05-14 17:11:04 -07:00
Olli Janatuinen 534e7161df
SFTP: Correctly inform client about unsupported commands (#19735) 2024-05-14 03:29:30 -07:00
Harshavardhana 9b219cd646
fix: return quorum based error, temporary failures must be ignored (#19732) 2024-05-14 03:29:17 -07:00
Shireesh Anjal 3bab4822f3
Add logger webhook metrics in metrics-v3 (#19515)
endpoint: /minio/metrics/v3/cluster/webhook
metrics:
- failed_messages (counter)
- online (gauge)
- queue_length (gauge)
- total_messages (counter)
2024-05-14 00:27:33 -07:00
coderwander 3c5f2d8916
fix some typo in struct name comments (#19513)
Signed-off-by: coderwander <770732124@qq.com>
2024-05-14 00:26:50 -07:00
Shireesh Anjal 5808190398
Add more metrics to v3/cluster/erasure-set (#19714)
Metrics being added:

- read_tolerance: No of drive failures that can be tolerated without
  disrupting read operations
- write_tolerance: No of drive failures that can be tolerated without
  disrupting write operations
- read_health: Health of the erasure set in a pool for read operations
  (1=healthy, 0=unhealthy)
- write_health: Health of the erasure set in a pool for write operations
  (1=healthy, 0=unhealthy)
2024-05-14 00:25:56 -07:00
Shireesh Anjal b2a82248b1
Move /system/go to /debug/go (#19707) 2024-05-14 00:25:37 -07:00
Klaus Post c36eaedb93
Re-add "Fix incorrect merging of slash-suffixed objects (#19729)
Adds regression test for #19699

Failures are a bit luck based, since it requires objects to be placed on different sets.

However this generates a failure prior to #19699

* Revert "Revert "Fix incorrect merging of slash-suffixed objects (#19699)""

This reverts commit f30417d9a8.

* Don't override when suffix doesn't match. Instead rely on quorum for each.
2024-05-13 09:30:24 -07:00
Poorna 7752b03add
optimize max-keys=2 listing for spark workloads (#19725)
to return results appropriately for versioned buckets, especially
when underlying prefixes have been deleted
2024-05-13 07:57:42 -07:00
Shireesh Anjal 074d70112d
Consolidate drive health related metrics into single metric (#19706)
Instead of having "online" and "healing" as two metrics, replace with a
single metric "health" which can have following values:

0 = offline
1 = healthy
2 = healing
2024-05-12 10:23:50 -07:00
Harshavardhana e8d14c0d90
verify preconditions during CompleteMultipart (#19713)
Bonus: hold the write lock properly to apply
optimistic concurrency during NewMultipartUpload()
2024-05-10 17:31:22 -07:00
Shireesh Anjal 60d7e8143a
Move /cluster/audit to /audit (#19708)
As the audit metrics are server level and not 
overall cluster level.
2024-05-10 07:50:39 -07:00
Klaus Post 9667a170de
Add usage cache cleanup and lower forced top compaction (#19719)
Lower forced compaction to 250K entries.

If there is more than 250K entries on the top level force compact it and log an error.
2024-05-10 07:49:50 -07:00
Harshavardhana b598402738 fix: unexpected credentials missing while passing 2024-05-09 18:41:38 -07:00
Harshavardhana 72ff69d9bb
add log-prefix name for specifying custom log-name (#19712) 2024-05-09 14:29:37 -07:00
Harshavardhana f30417d9a8 Revert "Fix incorrect merging of slash-suffixed objects (#19699)"
This reverts commit 2f7a10ab31.
2024-05-09 12:32:05 -07:00
jiuker 47a4ad3cd7
fix: truncate Expiration to second when Add ServiceAccount (#19674)
Truncate Expiration at the second when Add ServiceAccount
2024-05-09 11:08:04 -07:00
Klaus Post 2f7a10ab31
Fix incorrect merging of slash-suffixed objects (#19699)
If two objects share everything but one object has a slash prefix, those would be merged in listings, 
with secondary properties used for a tiebreak.

Example: An object with the key `prefix/obj` would be merged with an object named `prefix/obj/`. 
While this violates the [no object can be a prefix of another](https://min.io/docs/minio/linux/operations/concepts/thresholds.html#conflicting-objects), let's resolve these.

If we have an object with 'name' and a directory named 'name/' discard the directory only - but allow objects 
of 'name' and 'name/' (xldir) to be uniquely returned.

Regression from #15772
2024-05-09 11:05:45 -07:00
Harshavardhana b534dc69ab
deprecate unexpected healing failed counters (#19705)
simplify this to avoid verbose metrics, and make
room for valid metrics to be reported for alerting
etc.
2024-05-09 11:04:41 -07:00
Harshavardhana 7b7d2ea7d4
pass around correct endpoint while registering remote storage (#19710) 2024-05-09 11:03:54 -07:00
Aditya Manthramurthy e00de1c302
ldap-import: Add additional logs (#19691)
These logs are being added to provide better debugging of LDAP
normalization on IAM import.
2024-05-09 10:52:53 -07:00
Harshavardhana 3549e583a6
results must be a single channel to avoid overwriting healing.bin (#19702) 2024-05-09 10:15:03 -07:00
Andi f5e3eedf34
chore: use errors.New to replace fmt.Errorf with no parameters (#19568)
Signed-off-by: ChengenH <hce19970702@gmail.com>
2024-05-09 01:44:07 -07:00
Harshavardhana 9a267f9270
allow caller context during reloads() to cancel (#19687)
canceled callers might linger around longer,
can potentially overwhelm the system. Instead
provider a caller context and canceled callers
don't hold on to them.

Bonus: we have no reason to cache errors, we should
never cache errors otherwise we can potentially have
quorum errors creeping in unexpectedly. We should
let the cache when invalidating hit the actual resources
instead.
2024-05-08 17:51:34 -07:00
Klaus Post ec49fff583
Accept multipart checksums with part count (#19680)
Accept multipart uploads where the combined checksum provides the expected part count.

It seems this was added by AWS to make the API more consistent, even if the 
data is entirely superfluous on multiple levels.

Improves AWS S3 compatibility.
2024-05-08 09:18:34 -07:00
Andreas Auernhammer 8b660e18f2
kms: add support for MinKMS and remove some unused/broken code (#19368)
This commit adds support for MinKMS. Now, there are three KMS
implementations in `internal/kms`: Builtin, MinIO KES and MinIO KMS.

Adding another KMS integration required some cleanup. In particular:
 - Various KMS APIs that haven't been and are not used have been
   removed. A lot of the code was broken anyway.
 - Metrics are now monitored by the `kms.KMS` itself. For basic
   metrics this is simpler than collecting metrics for external
   servers. In particular, each KES server returns its own metrics
   and no cluster-level view.
 - The builtin KMS now uses the same en/decryption implemented by
   MinKMS and KES. It still supports decryption of the previous
   ciphertext format. It's backwards compatible.
 - Data encryption keys now include a master key version since MinKMS
   supports multiple versions (~4 billion in total and 10000 concurrent)
   per key name.

Signed-off-by: Andreas Auernhammer <github@aead.dev>
2024-05-07 16:55:37 -07:00
Harshavardhana 981497799a
return appropriate error upon reaching maxClients() (#19669) 2024-05-07 13:41:56 -07:00
Olli Janatuinen b413ff9fdb
Support user certificate based authentication on SFTP (#19650) 2024-05-06 23:41:25 -07:00
Harshavardhana 6a15580817
fix: collect quorum errors for deletePrefix() (#19685)
do not return error for single drive being offline.
2024-05-06 22:44:46 -07:00
Cesar N 39633a5581
Set Console Redirect URL env variable (#19683) 2024-05-06 19:47:59 -07:00
Harshavardhana 888d2bb1d8
support ETag value to be '*' (#19682)
This supports '*' as per behavior to
comply with AWS S3 behavior for

- 'If-Match: *'
- 'If-None-Match: *'
2024-05-06 17:08:42 -07:00
Klaus Post 847ee5ac45
Make WalkDir return errors (#19677)
If used, 'opts.Marker` will cause many missed entries since results are returned 
unsorted, and pools are serialized.

Switch to fully concurrent listing and merging across pools to return sorted entries.
2024-05-06 13:27:52 -07:00
jiuker 9a9a49aa84
fix: Ignore AWSAccessKeyId check for SignV2 policy condition (#19673) 2024-05-06 03:52:41 -07:00
Harshavardhana a03ca80269
support 'mc support perf object' with root login disabled (#19672)
It is expected that whoever is using the credentials which has
the proper set of permissions must be able to run.

`mc support perf object`

While the root login is disabled.
2024-05-06 02:45:10 -07:00
Harshavardhana 523bd769f1
add support for specific error response for InvalidRange (#19668)
fixes #19648

AWS S3 returns the actual object size as part of XML
response for InvalidRange error, this is used apparently
by SDKs to retry the request without the range.
2024-05-05 09:56:21 -07:00
Harshavardhana 8ff70ea5a9
turn-off coloring if we have std{err,out} dumb terminals (#19667) 2024-05-03 17:17:57 -07:00
Harshavardhana da3e7747ca
avoid using 10MiB EC buffers in maxAPI calculations (#19665)
max requests per node is more conservative in its value
causing premature serialization of the calls, avoid it
for newer deployments.
2024-05-03 13:08:20 -07:00
Klaus Post 4afb59e63f
fix: walk missing entries with opts.Marker set (#19661)
'opts.Marker` is causing many missed entries if used since results are returned unsorted. Also since pools are serialized.

Switch to do fully concurrent listing and merging across pools to return sorted entries.

Returning errors on listings is impossible with the current API, so document that.

Return an error at once if no drives are found instead of just returning an empty listing and no error.
2024-05-03 10:26:51 -07:00
Harshavardhana 1526e7ece3
extend server config.yaml to support per pool set drive count (#19663)
This is to support deployments migrating from a multi-pooled
wider stripe to lower stripe. MINIO_STORAGE_CLASS_STANDARD
is still expected to be same for all pools. So you can satisfy
adding custom drive count based pools by adjusting the storage
class value.

```
version: v2
address: ':9000'
rootUser: 'minioadmin'
rootPassword: 'minioadmin'
console-address: ':9001'
pools: # Specify the nodes and drives with pools
  -
    args:
        - 'node{11...14}.example.net/data{1...4}'
  -
    args:
        - 'node{15...18}.example.net/data{1...4}'
  -
    args:
        - 'node{19...22}.example.net/data{1...4}'
  -
    args:
        - 'node{23...34}.example.net/data{1...10}'
    set-drive-count: 6
```
2024-05-03 08:54:03 -07:00
Krishnan Parthasarathi 6c07bfee8a
With retention, skip actions expiring all versions (#19657)
ILM actions due to ExpiredObjectDeleteAllVersions and
DelMarkerExpiration are ignored when object locking is enabled on a
bucket.
Note: This applies to object versions which may not have retention
configured on them. This applies to all object versions in this bucket,
including those created before the retention config was applied.
2024-05-03 04:18:58 -07:00
Poorna 446c760820
replication: Avoid proxying if requested object is a deletemarker (#19656)
Fixes: #19654
2024-05-02 13:15:54 -07:00
Shireesh Anjal 04f92f1291
Change endpoint format for per-bucket metrics (#19655)
Per-bucket metrics endpoints always start with /bucket and the bucket
name is appended to the path. e.g. if the collector path is /bucket/api,
the endpoint for the bucket "mybucket" would be
/minio/metrics/v3/bucket/api/mybucket

Change the existing bucket api endpoint accordingly from /api/bucket to
/bucket/api
2024-05-02 10:37:57 -07:00
Bala FA e5b16adb1c
Add cluster IAM metrics in metrics-v3 (#19595)
Signed-off-by: Bala.FA <bala@minio.io>
2024-05-02 01:20:42 -07:00
Harshavardhana 402a3ac719
support compression after rotation of logs (#19647) 2024-05-01 15:38:07 -07:00
Aditya Manthramurthy f3d61c51fc
fix: Filter out cust. AssumeRole Token for audit (#19646)
The `Token` parameter is a sensitive value that should not be output in the Audit log for STS AssumeRoleWithCustomToken API.

Bonus: Add a simple tool that echoes audit logs to the console.
2024-05-01 14:31:13 -07:00
Klaus Post 0cde17ae5d
Return listing when exceeding min disk errors (#19644)
When listing, with drives returning `errFileNotFound,` `errVolumeNotFound`, or `errUnformattedDisk,`, 
we could get below `minDisks` drives being left.

This would result in a quorum never being reachable for any object. Therefore, the listing 
would continue, but no results would ever be produced.

Include `fnf` in the mindisk check since it is incremented on these errors. This will stop 
listing when minDisks are left.

Allow `opts.minDisks` to not return errVolumeNotFound or errFileNotFound and return that. 
That will allow for good results even if disks return something else.

We switch `errUnformattedDisk` to a regular error. If we have enough of those, we should just fail.
2024-05-01 10:59:08 -07:00
Harshavardhana 8c1bba681b
add logrotate support for MinIO logs (#19641) 2024-05-01 10:57:52 -07:00
Klaus Post dbfb5e797b
Wait one minute after startup to restart decommissioning (#19645)
Typically not all drives are connected, so we delay 3 minutes before resuming.
This greatly reduces risk of starting to list unconnected drives, or drives we risk being disconnected soon.

This delay is not applied when starting with an admin call.
2024-05-01 08:18:21 -07:00
Harshavardhana 08ff702434
enhance ListSVCs() API to return more info to avoid InfoSvc() (#19642)
ConsoleUI like applications rely on combination of

ListServiceAccounts() and InfoServiceAccount() to populate
UI elements, however individually these calls can be slow
causing the entire UI to load sluggishly.
2024-05-01 05:41:13 -07:00
Klaus Post 0e2148264a
Fix --stfp "mac-algos=..." overwrites cipher algorithms (#19643)
Setting MAC algorithms overwrites cipher algorithms.

Followup to #19636
2024-05-01 04:07:40 -07:00
Krishnan Parthasarathi 7926401cbd
ilm: Handle DeleteAllVersions action differently for DEL markers (#19481)
i.e., this rule element doesn't apply to DEL markers.

This is a breaking change to how ExpiredObejctDeleteAllVersions
functions today. This is necessary to avoid the following highly probable
footgun scenario in the future.

Scenario:
The user uses tags-based filtering to select an object's time to live(TTL). 
The application sometimes deletes objects, too, making its latest
version a DEL marker. The previous implementation skipped tag-based filters
if the newest version was DEL marker, voiding the tag-based TTL. The user is
surprised to find objects that have expired sooner than expected.

* Add DelMarkerExpiration action

This ILM action removes all versions of an object if its
the latest version is a DEL marker.

```xml
<DelMarkerObjectExpiration>
    <Days> 10 </Days>
</DelMarkerObjectExpiration>
```

1. Applies only to objects whose,
  • The latest version is a DEL marker.
  • satisfies the number of days criteria
2. Deletes all versions of this object
3. Associated rule can't have tag-based filtering

Includes,
- New bucket event type for deletion due to DelMarkerExpiration
2024-04-30 18:11:10 -07:00
Harshavardhana 8161411c5d
fix: a crash in RemoveReplication target (#19640)
calling a remote target remove with a perfectly
well constructed ARN can lead to a crash for a bucket
with no replication configured.

This PR fixes, and adds a crash check for ImportMetadata
as well.
2024-04-30 18:09:56 -07:00
Klaus Post f64dea2aac
Allow custom SFTP algorithm selection (#19636)
Algorithms are comma separated.
Note that valid values does not in all cases represent default values.

`--sftp=pub-key-algos=...` specifies the supported client public key
authentication algorithms. Note that this doesn't include certificate types
since those use the underlying algorithm. This list is sent to the client if
it supports the server-sig-algs extension. Order is irrelevant.

Valid values
```
ssh-ed25519
sk-ssh-ed25519@openssh.com
sk-ecdsa-sha2-nistp256@openssh.com
ecdsa-sha2-nistp256
ecdsa-sha2-nistp384
ecdsa-sha2-nistp521
rsa-sha2-256
rsa-sha2-512
ssh-rsa
ssh-dss
```

`--sftp=kex-algos=...` specifies the supported key-exchange algorithms in preference order.

Valid values:

```
curve25519-sha256
curve25519-sha256@libssh.org
ecdh-sha2-nistp256
ecdh-sha2-nistp384
ecdh-sha2-nistp521
diffie-hellman-group14-sha256
diffie-hellman-group16-sha512
diffie-hellman-group14-sha1
diffie-hellman-group1-sha1
```

`--sftp=cipher-algos=...` specifies the allowed cipher algorithms.
If unspecified then a sensible default is used.

Valid values:
```
aes128-ctr
aes192-ctr
aes256-ctr
aes128-gcm@openssh.com
aes256-gcm@openssh.com
chacha20-poly1305@openssh.com
arcfour256
arcfour128
arcfour
aes128-cbc
3des-cbc
```

`--sftp=mac-algos=...` specifies a default set of MAC algorithms in preference order.
This is based on RFC 4253, section 6.4, but with hmac-md5 variants removed because they have
reached the end of their useful life.

Valid values:

```
hmac-sha2-256-etm@openssh.com
hmac-sha2-512-etm@openssh.com
hmac-sha2-256
hmac-sha2-512
hmac-sha1
hmac-sha1-96
```
2024-04-30 08:15:45 -07:00
Shubhendu 6579304d8c
Suppress metrics with zero values (#19638)
This would reduce the size of data in response of metrics
listing. While graphing we can default these metrics with
a zero value if not found.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2024-04-30 08:05:22 -07:00
Klaus Post 3cf8a7c888
Always unfreeze when connection dies (#19634)
Unfreeze as soon as the incoming connection is terminated and don't wait for everything to complete.

We don't want to keep the services frozen if something becomes stuck.
2024-04-29 10:39:04 -07:00
Harshavardhana a372c6a377
a bunch of fixes for error handling (#19627)
- handle errFileCorrupt properly
- micro-optimization of sending done() response quicker
  to close the goroutine.
- fix logger.Event() usage in a couple of places
- handle the rest of the client to return a different error other than
  lastErr() when the client is closed.
2024-04-28 10:53:50 -07:00
Poorna 9e95703efc
iam reload policy mapping of STS users properly (#19626) 2024-04-27 03:04:10 -07:00
Anis Eleuch d8e05aca81
heal/list: Fix rare incomplete listing with flaky internode connections (#19625)
listPathRaw() counts errDiskNotFound as a valid error to indicate a
listing stream end. However, storage.WalkDir() is allowed to return
errDiskNotFound anytime since grid.ErrDisconnected is converted to
errDiskNotFound.

This affects fresh disk healing and should affect S3 listing as well.
2024-04-26 12:52:52 -07:00
Praveen raj Mani 410a1ac040
Handle failures in pool rebalancing (#19623) 2024-04-26 12:29:28 -07:00
Shireesh Anjal 4caa3422bd
Add process metrics in metrics-v3 (#19612)
endpoint: /minio/metrics/v3/system/process
metrics:
- locks_read_total
- locks_write_total
- cpu_total_seconds
- go_routine_total
- io_rchar_bytes
- io_read_bytes
- io_wchar_bytes
- io_write_bytes
- start_time_seconds
- uptime_seconds
- file_descriptor_limit_total
- file_descriptor_open_total
- syscall_read_total
- syscall_write_total
- resident_memory_bytes
- virtual_memory_bytes
- virtual_memory_max_bytes

Since the standard process collector implements only a subset of these
metrics, remove it and implement our own custom process collector that
captures all the process metrics we need.
2024-04-26 09:07:23 -07:00
Anis Eleuch 135874ebdc
heal: Avoid marking a bucket as done when remote drives are offline (#19587) 2024-04-25 23:32:14 -07:00
Poorna e7aa26dc29
fix: allow DeleteObject unversioned objects with insufficient read quorum (#19581)
Since the object is being permanently deleted, the lack of read quorum should not
matter as long as sufficient disks are online to complete the deletion with parity
requirements.

If several pools have the same object with insufficient read quorum, attempt to
delete object from all the pools where it exists
2024-04-25 17:31:12 -07:00
Harshavardhana c54ffde568
add metrics ioerror counter for alerts on I/O errors (#19618) 2024-04-25 15:01:31 -07:00
Anis Eleuch 9a3c992d7a
heal: Fix regression in healing a new fresh drive (#19615) 2024-04-25 14:55:41 -07:00
Aditya Manthramurthy 0c855638de
fix: LDAP init. issue when LDAP server is down (#19619)
At server startup, LDAP configuration is validated against the LDAP
server. If the LDAP server is down at that point, we need to cleanly
disable LDAP configuration. Previously, LDAP would remain configured but
error out in strange ways because initialization did not complete
without errors.
2024-04-25 14:28:16 -07:00
Ramon de Klein 4c0acba62d
Fixes an internal error while force-deleting a bucket (#19614) 2024-04-25 09:27:27 -07:00
Aditya Manthramurthy 62c3cdee75
fix: IAM LDAP access key import bug (#19608)
When importing access keys (i.e. service accounts) for LDAP accounts,
we are requiring groups to exist under one of the configured group base
DNs. This is not correct. This change fixes this by only checking for
existence and storing the normalized form of the group DN - we do not
return an error if the group is not under a base DN.

Test is updated to illustrate an import failure that would happen
without this change.
2024-04-25 08:50:16 -07:00
Aditya Manthramurthy 3212d0c8cd
fix: IAM import for LDAP should replace mappings (#19607)
Existing IAM import logic for LDAP creates new mappings when the
normalized form of the mapping key differs from the existing mapping key
in storage. This change effectively replaces the existing mapping key by
first deleting it and then recreating with the normalized form of the
mapping key.

For e.g. if an older deployment had a policy mapped to a user DN -

`UID=alice1,OU=people,OU=hwengg,DC=min,DC=io`

instead of adding a mapping for the normalized form -

`uid=alice1,ou=people,ou=hwengg,dc=min,dc=io`

we should replace the existing mapping.

This ensures that duplicates mappings won't remain after the import.

Some additional cleanup cases are also covered. If there are multiple
mappings for the name normalized key such as:

`UID=alice1,OU=people,OU=hwengg,DC=min,DC=io`
`uid=alice1,ou=people,ou=hwengg,DC=min,DC=io`
`uid=alice1,ou=people,ou=hwengg,dc=min,dc=io`

we check if the list of policies mapped to all these keys are exactly
the same, and if so remove all of them and create a single mapping with
the normalized key. However, if the policies mapped to such keys differ,
the import operation returns an error as the server cannot automatically
pick the "right" list of policies to map.
2024-04-25 08:49:53 -07:00
Harshavardhana 1d03bea965
support preserving renameData() on inlined content during overwrites (#19609)
extending #19548 to inlined-data as well.
2024-04-24 18:14:08 -07:00
jiuker df93ff92ba
fix: site-replication will reset group status when add user (#19594) 2024-04-24 08:54:24 -07:00
Shireesh Anjal 77d5331e85
Fix few wrongly defined metric types (#19586)
`minio_cluster_webhook_queue_length` was wrongly defined as `counter`
where-as it should be `gauge`

Following were wrongly defined as `gauge` when they should actually be
`counter`:

- minio_bucket_replication_sent_bytes
- minio_bucket_replication_received_bytes
- minio_bucket_replication_total_failed_bytes
- minio_bucket_replication_total_failed_count
2024-04-23 23:19:40 -07:00
Bala FA 14cdadfb56
Add cluster notification metrics in metrics-v3 (#19533)
Signed-off-by: Bala.FA <bala@minio.io>
2024-04-23 21:10:35 -07:00
Harshavardhana f3a52cc195
simplify listener implementation setup customizations in right place (#19589) 2024-04-23 21:08:47 -07:00
Aditya Manthramurthy 7640cd24c9
fix: avoid some IAM import errors if LDAP enabled (#19591)
When LDAP is enabled, previously we were:

- rejecting creation of users and groups via the IAM import functionality

- throwing a `not a valid DN` error when non-LDAP group mappings are present

This change allows for these cases as we need to support situations
where the MinIO server contains users, groups and policy mappings
created before LDAP was enabled.
2024-04-23 18:23:08 -07:00
Shireesh Anjal f7b665347e
Add system CPU metrics to metrics-v3 (#19560)
endpoint: /minio/metrics/v3/system/cpu

metrics:
- minio_system_cpu_avg_idle
- minio_system_cpu_avg_iowait
- minio_system_cpu_load
- minio_system_cpu_load_perc
- minio_system_cpu_nice
- minio_system_cpu_steal
- minio_system_cpu_system
- minio_system_cpu_user
2024-04-23 16:56:12 -07:00
Harshavardhana 9693c382a8
make renameData() more defensive during overwrites (#19548)
instead upon any error in renameData(), we still
preserve the existing dataDir in some form for
recoverability in strange situations such as out
of disk space type errors.

Bonus: avoid running list and heal() instead allow
versions disparity to return the actual versions,
uuid to heal. Currently limit this to 100 versions
and lesser disparate objects.

an undo now reverts back the xl.meta from xl.meta.bkp
during overwrites on such flaky setups.

Bonus: Save N depth syscalls via skipping the parents
upon overwrites and versioned updates.

Flaky setup examples are stretch clusters with regular
packet drops etc, we need to add some defensive code
around to avoid dangling objects.
2024-04-23 10:15:52 -07:00
jiuker ee1047bd52
fix: can't get total disksize for decom status (#19585) 2024-04-23 04:33:28 -07:00
Seiya 5ea5ab162b
Remove leading zero strings in return value of (*xlMetaV2)getDataDirs() (#19567)
remove leading zero strings in return value of getDataDirs()
2024-04-22 22:07:37 -07:00
Klaus Post b5a09ff96b
Fix RenameData data race (#19579)
RenameData could start operating on inline data after timing out 
and the call returned due to WithDeadline.

This could cause a buffer to write to the inline data being written.

Since no writes are in `RenameData` and the call is canceled, 
this doesn't present a corruption issue. But a race is a race and 
should be fixed.

Copy inline data to a fresh buffer.
2024-04-22 22:07:19 -07:00
Harshavardhana 95c65f4e8f
do not panic on rebalance during server restarts (#19563)
This PR makes a feasible approach to handle all the scenarios
that we must face to avoid returning "panic."

Instead, we must return "errServerNotInitialized" when a
bucketMetadataSys.Get() is called, allowing the caller to
retry their operation and wait.

Bonus fix the way data-usage-cache stores the object.
Instead of storing usage-cache.bin with the bucket as
`.minio.sys/buckets`, the `buckets` must be relative
to the bucket `.minio.sys` as part of the object name.

Otherwise, there is no way to decommission entries at
`.minio.sys/buckets` and their final erasure set positions.

A bucket must never have a `/` in it. Adds code to read()
from existing data-usage.bin upon upgrade.
2024-04-22 10:49:30 -07:00
Harshavardhana 6bfff7532e
re-use transport and set stronger backwards compatible Ciphers (#19565)
This PR fixes a few things

- FIPS support for missing for remote transports, causing
  MinIO could end up using non-FIPS Ciphers in FIPS mode

- Avoids too many transports, they all do the same thing
  to make connection pooling work properly re-use them.

- globalTCPOptions must be set before setting transport
  to make sure the client conn deadlines are honored properly.

- GCS warm tier must re-use our transport

- Re-enable trailing headers support.
2024-04-21 04:43:18 -07:00
Harshavardhana 1aa8896ad6 Revert "cleanup: Simplify usage of MinIOSourceProxyRequest (#19553)"
This reverts commit 928c0181bf.

This change was not correct, reverting.

We track 3 states with the ProxyRequest header - if replication process wants
to know if object is already replicated with a HEAD, it shouldn't proxy back
   - Poorna
2024-04-20 02:05:54 -07:00
Krishnan Parthasarathi 3e32ceb39f
Disable trailing header support for MinIO tiers (#19561)
AWS S3 trailing header support was recently enabled on the warm tier
client connection to MinIO type remote tiers. With this enabled, we are
seeing the following error message at http transport layer.

> Unsolicited response received on idle HTTP channel starting with "HTTP/1.1 400 Bad Request\r\nContent-Type: text/plain; charset=utf-8\r\nConnection: close\r\n\r\n400 Bad Request"; err=<nil>

This is an interim fix until we identify the root cause for this behaviour in the
minio-go client package.
2024-04-19 19:32:25 -07:00
jiuker 9205434ed3
fix: ignore signaturev2 for policy header check (#19551) 2024-04-19 09:45:54 -07:00
Harshavardhana cd50e9b4bc
make LRU cache global for internode tokens (#19555) 2024-04-19 09:45:14 -07:00
Klaus Post ec816f3840
Reduce parallelReader allocs (#19558) 2024-04-19 09:44:59 -07:00
Klaus Post 5f774951b1
Store object EC in metadata header (#19534)
Keep the EC in header, so it can be retrieved easily for dynamic quorum calculations.

To not force a full metadata decode on every read the value will be 0/0 for data written in previous versions.

Size is expected to increase by 2 bytes per version, since all valid values can be represented with 1 byte each.

Example:
```
λ xl-meta xl.meta
{
  "Versions": [
    {
      "Header": {
        "EcM": 4,
        "EcN": 8,
        "Flags": 6,
        "ModTime": "2024-04-17T11:46:25.325613+02:00",
        "Signature": "0a409875",
        "Type": 1,
        "VersionID": "8e03504e11234957b2727bc53eda0d55"
      },
...
```

Not used for operations yet.
2024-04-19 09:43:43 -07:00
Harshavardhana 72f5cb577e
optimize ftp/sftp upload() implementations to avoid CPU load (#19552) 2024-04-19 05:23:42 -07:00
Robert Lützner 928c0181bf
cleanup: Simplify usage of MinIOSourceProxyRequest (#19553)
This replaces a convoluted condition that ultimately evaluated to

"is this HTTP header present in the request or not?"
2024-04-19 05:23:31 -07:00
Harshavardhana 03767d26da
fix: get rid of large buffers (#19549)
these lead to run-away usage of memory
beyond which the Go's GC can handle, we
have to re-visit this differently, remove
this for now.
2024-04-19 04:26:59 -07:00
Sveinn 108e6f92d4
updating tests to use new mc --enc flags (#19508) 2024-04-19 01:43:09 -07:00