`Bytes` is a useful tool for managing multiple slices into the same region
of memory, and the other things it used to do have been removed to reduce
complexity. The exact strategy for managing the multiple references is
no longer hard-coded, but is instead backed by a customizable vtable.
- Removed ability to mutate the underlying memory from the `Bytes` type.
- Removed the "inline" (SBO) mechanism in `Bytes`. This reduces a large
amount of complexity, and improves performance when accessing the
slice of bytes, since a branch is no longer needed to check if the
data is inline.
- Removed `Bytes`'s knowledge of `BytesMut` (`BytesMut` may grow that
knowledge back at a future point).
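The vtable idea can be sketched with the standard library alone. This is a minimal illustration, not the crate's actual layout: `Handle`, `Vtable`, `STATIC_VTABLE` and `SHARED_VTABLE` are hypothetical names, and the shared strategy is assumed to be an `Arc<Vec<u8>>`. Each handle carries a pointer to a table of functions that decide how cloning and dropping behave for its backing storage.

```rust
use std::sync::Arc;

// All names here are hypothetical; this is not the crate's real layout.
struct Vtable {
    // Produce another handle to the same bytes.
    clone: unsafe fn(data: *const (), ptr: *const u8, len: usize) -> Handle,
    // Release this handle's claim on the bytes.
    drop: unsafe fn(data: *const ()),
}

struct Handle {
    ptr: *const u8,
    len: usize,
    data: *const (), // strategy-specific state, opaque to Handle
    vtable: &'static Vtable,
}

// Strategy 1: static data, nothing to count and nothing to free.
static STATIC_VTABLE: Vtable = Vtable { clone: static_clone, drop: static_drop };

unsafe fn static_clone(_data: *const (), ptr: *const u8, len: usize) -> Handle {
    Handle { ptr, len, data: std::ptr::null(), vtable: &STATIC_VTABLE }
}

unsafe fn static_drop(_data: *const ()) {}

// Strategy 2: heap buffer shared through an Arc<Vec<u8>> (an assumption of this sketch).
static SHARED_VTABLE: Vtable = Vtable { clone: shared_clone, drop: shared_drop };

unsafe fn shared_clone(data: *const (), ptr: *const u8, len: usize) -> Handle {
    // Bump the reference count without taking ownership of the Arc.
    unsafe { Arc::increment_strong_count(data as *const Vec<u8>) };
    Handle { ptr, len, data, vtable: &SHARED_VTABLE }
}

unsafe fn shared_drop(data: *const ()) {
    // Rebuild the Arc so its destructor decrements the count.
    drop(unsafe { Arc::from_raw(data as *const Vec<u8>) });
}

impl Handle {
    fn from_static(s: &'static [u8]) -> Handle {
        Handle { ptr: s.as_ptr(), len: s.len(), data: std::ptr::null(), vtable: &STATIC_VTABLE }
    }

    fn from_vec(v: Vec<u8>) -> Handle {
        let arc = Arc::new(v);
        let (ptr, len) = (arc.as_slice().as_ptr(), arc.len());
        Handle { ptr, len, data: Arc::into_raw(arc) as *const (), vtable: &SHARED_VTABLE }
    }

    fn as_slice(&self) -> &[u8] {
        // No branch on the storage kind: every strategy exposes ptr + len.
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl Clone for Handle {
    fn clone(&self) -> Handle {
        unsafe { (self.vtable.clone)(self.data, self.ptr, self.len) }
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        unsafe { (self.vtable.drop)(self.data) }
    }
}
```

Reading the bytes never consults the vtable, which matches the point above that dropping the inline case removes a branch from slice access.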
If `shallow_clone` is called with `&mut self`, and `Bytes` contains a
`Vec`, then the expensive CAS can be avoided, because no other thread
has a reference to this `Bytes` object.
Bench `split_off_and_drop` difference:
Before the diff:
```
test split_off_and_drop ... bench: 91,858 ns/iter (+/- 17,401)
```
With the diff:
```
test split_off_and_drop ... bench: 81,162 ns/iter (+/- 17,603)
```
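The reasoning behind the `&mut self` fast path can be sketched as follows. This is a simplified model, not the crate's code: `Inner`, `KIND_VEC`, `KIND_ARC` and both method names are hypothetical. It only shows that exclusive access allows a plain write where shared access needs a compare-and-swap.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical storage kinds for this sketch.
const KIND_VEC: usize = 0;
const KIND_ARC: usize = 1;

struct Inner {
    kind: AtomicUsize,
}

impl Inner {
    // Shared access: another thread may be promoting at the same time,
    // so the transition must be a compare-and-swap.
    fn promote_shared(&self) -> bool {
        self.kind
            .compare_exchange(KIND_VEC, KIND_ARC, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    // Exclusive access: `&mut self` proves no other thread holds a
    // reference, so a plain non-atomic write through `get_mut` is enough.
    fn promote_exclusive(&mut self) -> bool {
        let kind = self.kind.get_mut();
        if *kind == KIND_VEC {
            *kind = KIND_ARC;
            true
        } else {
            false
        }
    }
}
```

Both methods return `true` only for the call that performs the promotion, so callers can tell whether they were the one that changed the representation.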
The slice operation should return an inline `Bytes` when possible.
Copying into inline storage is cheaper than an atomic increment/decrement.
Before this patch:
```
test slice_avg_le_inline_from_arc ... bench: 28,582 ns/iter (+/- 3,880)
test slice_empty ... bench: 8,797 ns/iter (+/- 1,325)
test slice_large_le_inline_from_arc ... bench: 27,684 ns/iter (+/- 5,920)
test slice_short_from_arc ... bench: 27,439 ns/iter (+/- 5,783)
```
After this patch:
```
test slice_avg_le_inline_from_arc ... bench: 18,872 ns/iter (+/- 2,937)
test slice_empty ... bench: 9,136 ns/iter (+/- 1,908)
test slice_large_le_inline_from_arc ... bench: 18,052 ns/iter (+/- 2,981)
test slice_short_from_arc ... bench: 18,200 ns/iter (+/- 2,534)
```
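A rough model of this change, as it applied while the inline representation still existed: `Repr`, `INLINE_CAP` and the method names below are hypothetical, and the capacity of 31 bytes is an assumption made for the sketch. When the requested range fits inline, the bytes are copied and the shared reference count is left untouched.

```rust
use std::sync::Arc;

// Hypothetical inline capacity for this sketch.
const INLINE_CAP: usize = 31;

enum Repr {
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    Shared { data: Arc<[u8]>, start: usize, end: usize },
}

impl Repr {
    fn from_slice_shared(src: &[u8]) -> Repr {
        Repr::Shared { data: Arc::from(src), start: 0, end: src.len() }
    }

    fn as_slice(&self) -> &[u8] {
        match self {
            Repr::Inline { len, buf } => &buf[..*len as usize],
            Repr::Shared { data, start, end } => &data[*start..*end],
        }
    }

    fn slice(&self, begin: usize, end: usize) -> Repr {
        let src = &self.as_slice()[begin..end];
        if src.len() <= INLINE_CAP {
            // Small result: copy the bytes, no atomic operation needed.
            let mut buf = [0u8; INLINE_CAP];
            buf[..src.len()].copy_from_slice(src);
            return Repr::Inline { len: src.len() as u8, buf };
        }
        match self {
            // Large result: share the buffer, which costs an atomic increment.
            Repr::Shared { data, start, .. } => Repr::Shared {
                data: Arc::clone(data),
                start: start + begin,
                end: start + end,
            },
            Repr::Inline { .. } => unreachable!("inline data always fits inline"),
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, Repr::Inline { .. })
    }
}
```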
Return empty `Bytes` object
Bench difference for `slice_empty`:
```
55 ns/iter (+/- 1) # before this patch
17 ns/iter (+/- 5) # with this patch
```
Bench difference for `slice_not_empty`:
```
25,058 ns/iter (+/- 1,099) # before this patch
25,072 ns/iter (+/- 1,593) # with this patch
```
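One plausible reading of this change, going by the `slice_empty` bench name, is that slicing an empty range returns a fresh empty handle instead of sharing the buffer. The sketch below assumes that reading. `View` and its methods are hypothetical names, and `Arc<[u8]>` stands in for the real storage.

```rust
use std::sync::Arc;

struct View {
    data: Option<Arc<[u8]>>, // None means "empty, owns nothing"
    start: usize,
    end: usize,
}

impl View {
    fn new(src: &[u8]) -> View {
        View { data: Some(Arc::from(src)), start: 0, end: src.len() }
    }

    fn empty() -> View {
        View { data: None, start: 0, end: 0 }
    }

    fn as_slice(&self) -> &[u8] {
        match &self.data {
            Some(d) => &d[self.start..self.end],
            None => &[],
        }
    }

    fn slice(&self, begin: usize, end: usize) -> View {
        assert!(begin <= end && end <= self.end - self.start);
        if begin == end {
            // Empty range: skip the reference count entirely.
            return View::empty();
        }
        View {
            data: self.data.clone(),
            start: self.start + begin,
            end: self.start + end,
        }
    }

    // Number of handles sharing the buffer, or 0 for an empty view.
    fn ref_count(&self) -> usize {
        self.data.as_ref().map_or(0, |d| Arc::strong_count(d))
    }
}
```

The non-empty path is unchanged, which is consistent with `slice_not_empty` showing no measurable difference.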
The previous implementation didn't factor in a single `Bytes` handle
being stored in an `Arc`. This new implementation correctly implements
both `Bytes` and `BytesMut` such that both are `Sync`.
The rewrite also increases the number of bytes that can be stored
inline.
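The `Sync` requirement above matters because a handle placed in an `Arc` is accessed by shared reference from several threads at once. A small sketch of that situation follows. `Handle`, `assert_send_sync` and `sum_on_threads` are hypothetical, and `Arc<[u8]>` stands in for the real storage.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical stand-in for a byte handle.
struct Handle {
    data: Arc<[u8]>,
}

// Compiles only if T may be sent to and shared between threads.
fn assert_send_sync<T: Send + Sync>() {}

// Share one handle behind an Arc and read it from `n` threads.
fn sum_on_threads(h: Handle, n: usize) -> Vec<u64> {
    assert_send_sync::<Handle>();
    let shared = Arc::new(h);
    let workers: Vec<_> = (0..n)
        .map(|_| {
            let s = Arc::clone(&shared);
            // Each thread only gets `&Handle` through the Arc.
            thread::spawn(move || s.data.iter().map(|&b| b as u64).sum::<u64>())
        })
        .collect();
    workers.into_iter().map(|w| w.join().unwrap()).collect()
}
```

If `Handle` were not `Sync`, `Arc<Handle>` would not be `Send` and the `thread::spawn` call would fail to compile.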