Commit graph

38 commits

Author SHA1 Message Date
Harshavardhana caac9d216e
remove all the frivolous logs, that may or may not be actionable (#18922)
for actionable, inspections we have `mc support inspect`

we do not need double logging, healing will report relevant
errors if any, in terms of quorum lost etc.
2024-01-30 18:11:45 -08:00
Harshavardhana 80ca120088
remove checkBucketExist check entirely to avoid fan-out calls (#18917)
Each Put, List, Multipart operations heavily rely on making
GetBucketInfo() call to verify if bucket exists or not on
a regular basis. This has a large performance cost when there
are tons of servers involved.

We did optimize this part by vectorizing the bucket calls,
however its not enough, beyond 100 nodes and this becomes
fairly visible in terms of performance.
2024-01-30 12:43:25 -08:00
Harshavardhana dd2542e96c
add codespell action (#18818)
Original work here, #18474,  refixed and updated.
2024-01-17 23:03:17 -08:00
jiuker be02333529
feat: drive sub-sys to max timeout reload (#18501) 2023-11-27 09:15:06 -08:00
Harshavardhana e7b60c4d65
Add slow drive timeouts to match with active disk monitoring (#17701)
allow active disk-monitoring to be configurable, and use
these add deadlines in various call layers for various
syscalls.
2023-07-25 16:58:31 -07:00
Kaan Kabalak f64d62b01d
Fix style of logOnceIf calls w/unique identifiers (#17631) 2023-07-11 13:17:45 -07:00
Kaan Kabalak 21fbe88e1f
Print certain log messages once per error (#17484) 2023-06-24 20:29:13 -07:00
Harshavardhana 41e1654f9a
remove spurious logging for object not found (#15842) 2022-10-12 04:28:21 -07:00
ebozduman b57e7321e7
Replaces 'disk'=>'drive' visible to end user (#15464) 2022-08-04 16:10:08 -07:00
Anis Elleuch 57d1f31054
Do not log erasure read failure when disk goes offline (#15277)
Avoid printing the following log

```
API: SYSTEM
Time: Fri Jul 08 2022 11:48:40 GMT+0100
Error: Error(disk not found) reading erasure shards at...

Backtrace:
0: internal/logger/logger.go:278:logger.LogIf()
1: cmd/bitrot-streaming.go:156:cmd.(*streamingBitrotReader).ReadAt()
2: cmd/erasure-decode.go:165:cmd.(*parallelReader).Read.func1()
```
2022-07-12 09:56:56 -07:00
Klaus Post 2451b9a75a
fix: hanging operations on PUT with slow IO (#13087)
#11878 added "keepHTTPResponseAlive" to CreateFile requests. 
The problem is that it will begin writing to the response before the 
body is read after 10 seconds. This will abort the writes on the 
client-side, since it assumes the server has received what it wants.

The proposed solution here is to monitor the completion of the body 
before beginning to send keepalive pings.

Fixes observed high number of goroutines stuck in `io.Copy` in 
`github.com/minio/minio/cmd.(*xlStorage).CreateFile` and 
`(*storageRESTClient).CreateFile` stuck in `http.DrainBody`.
2021-08-27 09:16:36 -07:00
Harshavardhana 1f262daf6f
rename all remaining packages to internal/ (#12418)
This is to ensure that there are no projects
that try to import `minio/minio/pkg` into
their own repo. Any such common packages should
go to `https://github.com/minio/pkg`
2021-06-01 14:59:40 -07:00
Harshavardhana ebf75ef10d
fix: remove all unused code (#12360) 2021-05-24 09:28:19 -07:00
Klaus Post cde6469b88
Fix hanging erasure writes (#12253)
However, this slice is also used for closing the writers, so close is never called on these.

Furthermore when an error is returned from a write it is now reported to the reader.

bonus: remove unused heal param from `newBitrotWriter`.

* Remove copy, now that we don't mutate.
2021-05-17 08:32:28 -07:00
Harshavardhana d84261aa6d
fix: ensure proper usage of DataDir (#12300)
- GetObject() should always use a common dataDir to
  read from when it starts reading, this allows the
  code in erasure decoding to have sane expectations.

- Healing should always heal on the common dataDir, this
  allows the code in dangling object detection to purge
  dangling content.

These both situations can happen under certain types of
retries during PUT when server is restarting etc, some
namespace entries might be left over.
2021-05-14 16:50:47 -07:00
Harshavardhana e84f533c6c
add missing wait groups for certain io.Pipe() usage (#12264)
wait groups are necessary with io.Pipes() to avoid
races when a blocking function may not be expected
and a Write() -> Close() before Read() races on each
other. We should avoid such situations..

Co-authored-by: Klaus Post <klauspost@gmail.com>
2021-05-11 09:18:37 -07:00
Harshavardhana ff36baeaa7
fix: attempt to drain the ReadFileStream for connection pooling (#12208)
avoid time_wait build up with getObject requests if there are
pending callers and they timeout, can lead to time_wait states

Bonus share the same buffer pool with erasure healing logic,
additionally also fixes a race where parallel readers were
never cleanup during Encode() phase, because pipe.Reader end
was never closed().

Added closer right away upon an error during Encode to make
sure to avoid racy Close() while stream was still being
Read().
2021-05-04 10:12:08 -07:00
Harshavardhana 069432566f update license change for MinIO
Signed-off-by: Harshavardhana <harsha@minio.io>
2021-04-23 11:58:53 -07:00
Klaus Post 2623338dc5
Inline small file data in xl.meta file (#11758) 2021-03-29 17:00:55 -07:00
Harshavardhana 9a6487319a
remove MINIO_IO_DEADLINE support (#11841)
this feature in actual deployment was found
to be not that useful, remove support for this
for now.
2021-03-20 02:47:04 -07:00
Harshavardhana 51a8619a79
[feat] Add configurable deadline for writers (#11822)
This PR adds deadlines per Write() calls, such
that slow drives are timed-out appropriately and
the overall responsiveness for Writes() is always
up to a predefined threshold providing applications
sustained latency even if one of the drives is slow
to respond.
2021-03-18 14:09:55 -07:00
Harshavardhana e019f21bda
fix: trigger heal if one of the parts are not found (#11358)
Previously we added heal trigger when bit-rot checks
failed, now extend that to support heal when parts
are not found either. This healing gets only triggered
if we can successfully decode the object i.e read
quorum is still satisfied for the object.
2021-01-27 10:21:14 -08:00
Harshavardhana f21d650ed4
fix: readData in bulk call using messagepack byte wrappers (#11228)
This PR refactors the way we use buffers for O_DIRECT and
to re-use those buffers for messagepack reader writer.

After some extensive benchmarking found that not all objects
have this benefit, and only objects smaller than 64KiB see
this benefit overall.

Benefits are seen from almost all objects from

1KiB - 32KiB

Beyond this no objects see benefit with bulk call approach
as the latency of bytes sent over the wire v/s streaming
content directly from disk negate each other with no
remarkable benefits.

All other optimizations include reuse of msgp.Reader,
msgp.Writer using sync.Pool's for all internode calls.
2021-01-07 19:27:31 -08:00
Harshavardhana d0027c3c41
do not use large buffers if not necessary (#11220)
without this change, there is a performance
regression for small objects GETs, this makes
the overall speed to go back to pre '59d363'
commit days.
2021-01-04 18:51:52 -08:00
Harshavardhana c4131c2798
feat: Small object optimization read data in single bulk call (#11207) 2021-01-03 11:27:57 -08:00
Harshavardhana 7f9498f43f
fix: ignore faulty drives and continue (#10511)
drives might return different types of errors
handle them individually, and for some errors
just log an error and continue
2020-09-18 12:09:05 -07:00
Klaus Post 2d58a8d861
Add storage layer contexts (#10321)
Add context to all (non-trivial) calls to the storage layer. 

Contexts are propagated through the REST client.

- `context.TODO()` is left in place for the places where it needs to be added to the caller.
- `endWalkCh` could probably be removed from the walkers, but no changes so far.

The "dangerous" part is that now a caller disconnecting *will* propagate down,  so a 
"delete" operation will now be interrupted. In some cases we might want to disconnect 
this functionality so the operation completes if it has started, leaving the system in a cleaner state.
2020-09-04 09:45:06 -07:00
Harshavardhana 75d44b3bae
add disk for more context in bitrot errors (#10296) 2020-08-20 09:41:15 -07:00
Harshavardhana f44cfb2863
use GlobalContext whenever possible (#9280)
This change is throughout the codebase to
ensure that all codepaths honor GlobalContext
2020-04-09 09:30:02 -07:00
Bala FA 95e89f1712
proactive deep heal object when a bitrot is detected (#9192) 2020-04-01 12:14:00 -07:00
Harshavardhana ff5bf51952 admin/heal: Fix deep healing to heal objects under more conditions (#8321)
- Heal if the part.1 is truncated from its original size
- Heal if the part.1 fails while being verified in between
- Heal if the part.1 fails while being at a certain offset

Other cleanups include make sure to flush the HTTP responses
properly from storage-rest-server, avoid using 'defer' to
improve call latency. 'defer' incurs latency avoid them
in our hot-paths such as storage-rest handlers.

Fixes #8319
2019-10-02 01:42:15 +05:30
Krishna Srinivas 58d90ed73c Avoid network transfer for bitrot verification during healing (#7375) 2019-07-08 13:51:18 -07:00
Krishna Srinivas 43be00ea1b Remove logs from bitrot-streaming.go as erasure layer is already logging (#7603) 2019-05-01 21:46:00 -07:00
Praveen raj Mani c113d4e49c Posix CreateFile should work for compressed lengths (#7584) 2019-04-30 16:27:31 -07:00
kannappanr 5ecac91a55
Replace Minio refs in docs with MinIO and links (#7494) 2019-04-09 11:39:42 -07:00
Harshavardhana 9629de8230 Add proper context based logging when bitrot stream calls fail (#7415) 2019-03-26 13:59:33 -07:00
Harshavardhana df35d7db9d Introduce staticcheck for stricter builds (#7035) 2019-02-13 18:29:36 +05:30
Krishna Srinivas 98c950aacd Streaming bitrot verification support (#7004) 2019-01-17 18:28:18 +05:30