`swizzle` had the wrong operands, and the vector masking boolean logic
was incorrect in the internal `shuffle_or_0` implementation. `shuffle`
was previously implemented as a dynamic swizzle, when it uses an
immediate operand for lane indices in the spec.
(cherry picked from commit 9cc3e7d32d150dd30d683c1a8cf0bd59676f14ab)
This necessitates marking bit_cast as ALWAYS_INLINE since emitting it as
a function call there will create an unnecessary potential SSE
registers -> plain registers/memory round-trip.
This warning is triggered when one accepts or returns vectors from a
function (that is not marked with [[gnu::target(...)]]) which would have
been otherwise passed in register if the current translation unit had
been compiled with more permissive flags wrt instruction selection (i.
e. if one adds -mavx2 to cmdline). This will never be a problem for us
since we (a) never use different instruction selection options across
ABI boundaries; (b) most of the affected functions are actually
TU-local.
Moreover, even if we somehow properly annotated all of the SIMD helpers,
calling them across ABI (or target) boundaries would still be very
dangerous because of inconsistent and bogus handling of
[[gnu::target(...)]] across compilers. See
https://github.com/llvm/llvm-project/issues/64706 and
https://www.reddit.com/r/cpp/comments/17qowl2/comment/k8j2odi .
Is it another great upgrade to our PNG encoder like in 9aafaec259?
Well, not really - it's not a 2x or 55x improvement like you saw there,
but still it saves something:
- a screenshot of a blank Serenity desktop dropped from about 45 KiB
to 40 KiB.
- re-encoding NASA photo of the Earth to PNG again saves about 25%
(16.5 MiB -> 12.3 MiB), compared to not using filters.
[1]: https://commons.wikimedia.org/wiki/File:The_Blue_Marble_(remastered).jpg
This implements an 8-bit front stencil buffer. Stencil operations are
SIMD optimized. LibGL changes include:
* New `glStencilMask` and `glStencilMaskSeparate` functions
* New context parameter `GL_STENCIL_CLEAR_VALUE`