Commit graph

325 commits

Author SHA1 Message Date
Kyle Evans e55512504d Prepare the system for _FORTIFY_SOURCE
Notably:
- libc needs to #undef some of the macros from ssp/* for underlying
  implementations
- ssp/* wants a __RENAME() macro (snatched more or less from NetBSD)

There's some extra hinkiness included for read(), since libc spells it
as "_read" while the rest of the world spells it "read."

Reviewed by:	imp, ngie
Sponsored by:	Stormshield
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D32307
2024-05-13 00:23:50 -05:00
Brooks Davis d7847a8d35 lib{c,sys}: return wrapped syscall APIs to libc
These provide standard APIs, but are implemented using another system
call (e.g., pipe implemented in terms of pipe2) or are interposed by the
threading library to support cancelation.

After discussion with kib (see D44111), I've concluded that it is
better to keep most public interfaces in libc with as little
as possible in libsys.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D44241
2024-03-13 18:36:02 +00:00
Brooks Davis 1e2502bfca libc: move MD sys related symbols to libsys
This is a mix genuine MD interfaces and compat symbols like _getlogin.

Reviewed by:	kib, emaste, imp
Pull Request:	https://github.com/freebsd/freebsd-src/pull/908
2024-02-05 20:34:56 +00:00
Brooks Davis cdecda8da3 libc: move rfork_thread(3) to libsys
rfork_thread(3) is assembly that makes syscalls directly and uses
cerror so it belongs in libsys.

Reviewed by:	kib, emaste, imp
Pull Request:	https://github.com/freebsd/freebsd-src/pull/908
2024-02-05 20:34:56 +00:00
Brooks Davis 31a46e2cc8 libc: Move per-arch sys/Makefile.inc to libsys
libc/<arch>/sys/Makefile.inc -> libsys/<arch>/Makefile.sys.

Require that libsys/<arch>/Makefile.sys exist.  At least for current
archtiectures, it's not possible for an architecture to not have and MD
syscall bits.

powerpcspe/Makefile.sys's structure means it had to be modified when moved
so rename detection won't work, but it has trivial contents so the
history is unimportant.

Reviewed by:	kib, emaste, imp
Pull Request:	https://github.com/freebsd/freebsd-src/pull/908
2024-02-05 20:34:55 +00:00
Brooks Davis 4bc66c0f9f libc: remove remaining x86 sys bits to libsys
Reviewed by:	kib, emaste, imp
Pull Request:	https://github.com/freebsd/freebsd-src/pull/908
2024-02-05 20:34:55 +00:00
Brooks Davis 8269e7673c libsys: relocate implementations and manpages
Remove core system call implementations and documentation to lib/libsys
and lib/libsys/<arch> from lib/libc/sys and lib/libc/<arch>/<sys>.
Update paths to allow libc to find them in their new home.

Reviewed by:	kib, emaste, imp
Pull Request:	https://github.com/freebsd/freebsd-src/pull/908
2024-02-05 20:34:55 +00:00
Mark Johnston 4dedcb1bb5 libc/amd64: Disable ASAN for amd64_archlevel.c
The code in this file runs before the sanitizer can initialize its
shadow map.

Fixes:	ad2fac552c ("lib/libc/amd64: add archlevel-based simd dispatch framework")
2024-01-27 22:12:01 -05:00
Robert Clausecker fb197a4f77 lib/libc/amd64/string: add memrchr() scalar, baseline implementation
The scalar implementation is fairly simplistic and only performs
slightly better than the generic C implementation. It could be
improved by using the same algorithm as for memchr, but it would
have been a lot more complicated.

The baseline implementation is similar to timingsafe_memcmp.  It's
slightly slower than memchr() due to the more complicated main
loop, but I don't think that can be significantly improved.

Tested by:	developers@, exp-run
Approved by:	mjg
MFC after:	1 month
MFC to:		stable/14
PR:		275785
Differential Revision:	https://reviews.freebsd.org/D42925
2023-12-25 15:00:05 +01:00
Robert Clausecker ea7b13771c lib/libc/amd64/string: implement strncat() by calling strlen(), memccpy()
This picks up the accelerated implementation of memccpy().

Tested by:	developers@, exp-run
Approved by:	mjg
MFC after:	1 month
MFC to:		stable/14
PR:		275785
Differential Revision: https://reviews.freebsd.org/D42902
2023-12-25 14:59:48 +01:00
Robert Clausecker fc0e38a7a6 lib/libc/amd64/string: add memccpy scalar, baseline implementation
Based on the strlcpy code from D42863, this patch adds a SIMD-enhanced
implementation of memccpy for amd64. A scalar implementation calling
into memchr and memcpy to do the job is provided, too.

Please note that this code does not behave exactly the same as the C
implementation of memccpy for overlapping inputs. However, overlapping
inputs are not allowed for this function by ISO/IEC 9899:1999 and neither
has the C implementation any code to deal with the possibility. It just
proceeds byte-by-byte, which may or may not do the expected thing for
some overlaps. We do not document whether overlapping inputs are
supported in memccpy(3).

Tested by:	developers@, exp-run
Approved by:	mjg
MFC after:	1 month
MFC to:		stable/14
PR:		275785
Differential Revision:	https://reviews.freebsd.org/D42902
2023-12-25 14:59:42 +01:00
Robert Clausecker 2b7b03b7ae lib/libc/amd64/string: implement strlcat() through strlcpy()
This should pick up our optimised memchr(), strlen(), and strlcpy()
when strlcat() is called.

Tested by:	developers@, exp-run
Approved by:	mjg
MFC after:	1 month
MFC to:		stable/14
PR:		275785
Differential Revision:	https://reviews.freebsd.org/D42863
2023-12-25 14:59:31 +01:00
Robert Clausecker 74d6cfad54 lib/libc/amd64/string: add strlcpy scalar, baseline implementation
Somewhat similar to stpncpy, but different in that we need to compute
the full source length even if the buffer is shorter than the source.

strlcat is implemented as a simple wrapper around strlcpy.  The scalar
implementation of strlcpy just calls into strlen() and memcpy() to do
the job.

Perf-wise we're very close to stpncpy.  The code is slightly slower as
it needs to carry on with finding the source string length even if the
buffer ends before the string.

Sponsored by:	The FreeBSD Foundation
Tested by:	developers@, exp-run
Approved by:	mjg
MFC after:	1 month
MFC to:		stable/14
PR:		275785
Differential Revision: https://reviews.freebsd.org/D42863
2023-12-25 14:56:05 +01:00
Robert Clausecker aff9143a24 lib/libc/amd64/string/strcat.S: enable use of SIMD
strcat has a bespoke scalar assembly implementation we
inherited from NetBSD.  While it performs well, it is
better to call into our SIMD implementations if any SIMD
features are available at all.  So do that and implement
strcat() by calling into strlen() and strcpy() if these
are available.

Sponsored by:	The FreeBSD Foundation
Tested by:	developers@, exp-run
Approved by:	mjg
MFC after:	1 month
MFC to:		stable/14
PR:		275785
Differential Reviison: https://reviews.freebsd.org/D42600
2023-12-25 14:55:53 +01:00
Robert Clausecker e19d46c808 lib/libc/amd64/string: implement strncpy() by calling stpncpy()
Sponsored by:	The FreeBSD Foundation
Tested by:	developers@, exp-run
Approved by:	mjg
MFC after:	1 month
MFC to:		stable/14
PR:		275785
Differential Revision:	https://reviews.freebsd.org/D42519
2023-12-25 14:55:48 +01:00
Robert Clausecker 90253d49db lib/libc/amd64/string: add stpncpy scalar, baseline implementation
This was surprisingly annoying to get right, despite being such a simple
function.  A scalar implementation is also provided, it just calls into
our optimised memchr(), memcpy(), and memset() routines to carry out its
job.

I'm quite happy with the performance.  glibc only beats us for very long
strings, likely due to the use of AVX-512.  The scalar implementation
just calls into our optimised memchr(), memcpy(), and memset() routines,
so it has a high overhead to begin with but then performs ok for the
amount of effort that went into it.  Still beats the old C code, except
for very short strings.

Sponsored by:	The FreeBSD Foundation
Tested by:	developers@, exp-run
Approved by:	mjg
MFC after:	1 month
MFC to:		stable/14
PR:		275785
Differential Revision: https://reviews.freebsd.org/D42519
2023-12-25 14:55:42 +01:00
Robert Clausecker fd2ecd91ae lib/libc/amd64/string: implement strsep() through strcspn()
The strsep() function is basically strcspn() with extra steps.
On amd64, we now have an optimised implementation of strcspn(),
so instead of implementing the inner loop manually, just call
into the optimised routine.

Sponsored by:	The FreeBSD Foundation
Tested by:	developers@, exp-run
Approved by:	mjg
MFC after:	1 month
MFC to:		stable/14
PR:		275785
Differential Revision:	https://reviews.freebsd.org/D42346
2023-12-25 14:55:30 +01:00
Robert Clausecker 2ed514a220 lib/libc/amd64/string: add strrchr scalar, baseline implementation
The baseline implementation is very straightforward, while the scalar
implementation suffers from register pressure and the need to use SWAR
techniques similar to those used for strchr().

Sponsored by:	The FreeBSD Foundation
Tested by:	developers@, exp-run
Approved by:	mjg
MFC after:	1 month
MFC to:		stable/14
PR:		275785
Differential Revision:	https://reviews.freebsd.org/D42217
2023-12-25 14:55:22 +01:00
Robert Clausecker 14289e973f lib/libc/amd64/string: add strncmp scalar, baseline implementation
The scalar implementation is fairly straightforward and merely unrolled
four times.  The baseline implementation closely follows D41971 with
appropriate extensions and extra code paths to pay attention to string
length.

Performance is quite good.  We beat both glibc (except for very long
strings, but they likely use AVX which we don't) and Bionic (except for
medium-sized aligned strings, where we are still in the same ballpark).

Sponsored by:	The FreeBSD Foundation
Tested by:	developers@, exp-run
Approved by:	mjg
MFC after:	1 month
MFC to:		stable/14
PR:		275785
Differential Revision: https://reviews.freebsd.org/D42122
2023-12-25 14:55:13 +01:00
Robert Clausecker f4fc317c36 lib/libc/amd64/string: implement strpbrk() through strcspn()
This lets us use our optimised strcspn() routine for strpbrk() calls.

Sponsored by:	The FreeBSD Foundation
Tested by:	developers@, exp-run
Approved by:	mjg
MFC after:	1 month
MFC to:		stable/14
PR:		275785
Differential Revision:	https://reviews.freebsd.org/D41980
2023-12-25 14:54:58 +01:00
Robert Clausecker bca25680b9 lib/libc/amd64/string/strcmp.S: add baseline implementation
This is the most complicated one so far.  The basic idea is to process
the bulk of the string in aligned blocks of 16 bytes such that one
string runs ahead and the other runs behind.  The string that runs ahead
is checked for NUL bytes, the one that runs behind is compared with the
corresponding chunk of the string that runs ahead.  This trades an extra
load per iteration for the very complicated block-reassembly needed in
the other implementations (bionic, glibc).  On the flip side, we need
two code paths depending on the relative alignment of the two buffers.

The initial part of the string is compared directly if it is known not
to cross a page boundary.  Otherwise, a complex slow path to avoid
crossing into unmapped memory commences.

Performance-wise we beat bionic for misaligned strings (i.e. the strings
do not share an alignment offset) and reach comparable performance for
aligned strings.  glibc is a bit better as it has a special kernel for
AVX-512, where this stuff is a bit easier to do.

Sponsored by:	The FreeBSD Foundation
Tested by:	developers@, exp-run
Approved by:	mjg
MFC after:	1 month
MFC to:		stable/14
PR:		275785
Differential Revision:	https://reviews.freebsd.org/D41971
2023-12-25 14:54:33 +01:00
Robert Clausecker c91cd7d03a lib/libc/amd64/string/strcspn.S: always return earliest match in 17--32 char case
When matching against a set of 17--32 characters, strcspn() uses two
invocations of PCMPISTRI to match against the first 16 characters
of the set and then the remaining characters.  If a match was found in
the first half of the set, the code originally immediately returned
that match.  However, it is possible for a match in the second half of
the set to occur earlier in the vector, leading to that match being
overlooked.

Fix the code by checking if there is a match in the second half of the
set and taking the earlier of the two matches.

The correctness of the function has been verified with extended unit
tests and test runs against the glibc test suite.

Approved by:	mjg (implicit, via IRC)
MFC after:	1 week
MFC to:		stable/14
2023-12-21 03:17:17 +01:00
Brooks Davis 8f465f509b {amd64,i386}/SYS.h: add _SYSCALL and _SYSCALL_BODY
Add a _SYSCALL(name) which calls the SYS_name syscall.  Use it to add a
_SYSCALL_BODY() macro which invokes the syscall and calls cerror as
required.  Use the latter to implement PSEUDO() and RSYSCALL().

Reviewed by:	imp, markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D43059
2023-12-18 22:28:42 +00:00
Brooks Davis ec27c0bb3e libc: don't needlessly add vfork.o to NOASM
For architectures where vfork.S was named Ovfork.S this was needed, but
it was always pointless here as an entry in either MDASM or NOASM is
equivalent.

Reviewed by:	kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D42914
2023-12-06 20:49:08 +00:00
Brooks Davis 7893419d49 Remove never implemented sbrk and sstk syscalls
Both system calls were stubs returning EOPNOTSUPP and libc did not
provide _ or __sys_ prefixed symbols.  The actual implementation of
sbrk(2) is on top of the undocumented break(2) system call.

Technically this is a change in ABI, but no non-contrived program ever
called these syscalls.

Reviewed by:	kib, emaste
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D42872
2023-12-04 20:36:08 +00:00
Warner Losh dc36d6f9bb lib: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by:		Netflix
2023-11-26 22:23:28 -07:00
Brooks Davis c704518681 libc: centralize a few numeric symbols
fabs, __infinity, and __nan are universally implemented so declare them
in gen/Symbol.map.

We would also include __flt_rounds, but  it's under FBSD_1.3 on arm so
until that's gone we're stuck with it.  Likewise, everyone but i386
implements fp[gs]etmask.

Reviewed by:	imp, kib, emaste
Differential Revision:	https://reviews.freebsd.org/D42618
2023-11-15 23:42:37 +00:00
Brooks Davis 5d79b5445e libc: centralize makecontext symbols
Declare makecontext() and __makecontext() symbols centrally as they are
always implemented.

Reviewed by:	imp, kib
Differential Revision:	https://reviews.freebsd.org/D42617
2023-11-15 23:42:18 +00:00
Brooks Davis 1c656143be libc: centralize {_,sig,}{set,long}jmp symbols
These symbols are universally exposed and documented so declare them
centrally.  Double- and triple-underscore versions exist on some
platforms, but leave those alone for now.

Reviewed by:	imp, kib
Differential Revision:	https://reviews.freebsd.org/D42616
2023-11-15 23:41:35 +00:00
Brooks Davis ff3a9d8e29 libc: centralize ntoh symbols
These are implemented by net/ntoh.c via headers and compiler intrinsics
so declare them in net/Symbol.map.

Reviewed by:	imp, kib, emaste
Differential Revision:	https://reviews.freebsd.org/D42615
2023-11-15 23:40:54 +00:00
Brooks Davis e4a1800f06 libc: further centralize syscall symbols
All architectures necessarily implement _exit(2) and vfork(2) so
declare them in sys/Symbol.map.

Reviewed by:	imp, kib, emaste
Differential Revision:	https://reviews.freebsd.org/D42614
2023-11-15 23:40:33 +00:00
Brooks Davis 1ca63a8219 libc: Remove empty comments in Symbol.map
These were left over from $FreeBSD$ removal.

Reviewed by:	emaste
Differential Revision:	https://reviews.freebsd.org/D42612
2023-11-15 17:51:03 +00:00
Brooks Davis b73eace889 libc/<arch>/sys/Makefile.inc: remove cruft
Remove stray blank lines left over from $FreeBSD$ removal as well as
some CVS-era (perhaps pre-repocopy) version comments.

Reviewed by:	emaste
Differential Revision:	https://reviews.freebsd.org/D42611
2023-11-15 17:50:53 +00:00
Warner Losh 559a218c9b libc: Purge unneeded cdefs.h
These sys/cdefs.h are not needed. Purge them. They are mostly left-over
from the $FreeBSD$ removal. A few in libc are still required for macros
that cdefs.h defines. Keep those.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D42385
2023-11-01 16:44:30 -06:00
Robert Clausecker 5048c1b855 lib/libc/amd64/string: add timingsafe_memcmp() assembly implementation
Conceptually very similar to timingsafe_bcmp(), but with comparison
logic inspired by Elijah Stone's fancy memcmp. A baseline (SSE)
implementation was omitted this time as I was not able to get it to
perform adequately.  Best I got was 8% over the scalar version for
long inputs, but slower for short inputs.

Sponsored by:	The FreeBSD Foundation
Approved by:	security (cperciva)
Inspired by:	https://github.com/moon-chilled/fancy-memcmp
Differential Revision:	https://reviews.freebsd.org/D41696
2023-10-15 15:25:53 -04:00
Robert Clausecker 76c2b331bc lib/libc/amd64/string: add timingsafe_bcmp(3) scalar, baseline implementations
Very straightforward and similar to memcmp(3). The code has
been written to use only instructions specified as having
data operand independent timing by Intel.

Sponsored by:	The FreeBSD Foundation
Approved by:	security (cperciva)
Differential Revision:	https://reviews.freebsd.org/D41673
2023-10-15 15:19:04 -04:00
Robert Clausecker 953b93cf24 lib/libc/amd64/string/memcmp.S: harden against phony buffer lengths
When memcmp(a, b, len) (or equally, bcmp) is called with a phony length
such that a + len < a, the code would malfunction and not compare the
two buffers correctly.  While such arguments are illegal (buffers do not
wrap around the end of the address space), it is neverthless conceivable
that people try things like memcmp(a, b, SIZE_MAX) to compare a and b
until the first mismatch, in the knowledge that such a mismatch exists,
expecting memcmp() to stop comparing somewhere around the mismatch.
While memcmp() is usually written to confirm to this assumption, no
version of ISO/IEC 9899 guarantees this behaviour (in contrast to
memchr() for which it is).

Neverthless it appears sensible to at least not grossly misbehave on
phony lengths.  This change hardens memcmp() against this case by
comparing at least until the end of the address space if a + len
overflows a 64 bit integer.

Sponsored by:	The FreeBSD Foundation
Approved by:	mjg (blanket, via IRC)
See also:	b2618b651b
MFC after:	1 week
2023-09-16 00:20:32 -04:00
Robert Clausecker 52d4a4d4e0 lib/libc/amd64/string/strcspn.S: fix behaviour with sets of 17--32
When a string is matched against a set of 17--32 characters, each chunk
of the string is matched first against the first 16 characters of the
set and then against the remaining characters.  We also check at the
same time if the string has a nul byte in the current chunk, terminating
the search if it does.

Due to misconceived logic, the order of checks was "first half of set,
nul byte, second half of set", meaning that a match with the second half
of the set was ignored when the string ended in the same 16 bytes.
Reverse the order of checks to fix this problem.

Sponsored by:	The FreeBSD Foundation
Approved by:	mjg (blanket, via IRC)
MFC after:	1 week
MFC to:		stable/14
2023-09-11 22:58:43 -04:00
Robert Clausecker b2618b651b lib/libc/amd64/string/memchr.S: fix behaviour with overly long buffers
When memchr(buf, c, len) is called with a phony len (say, SIZE_MAX),
buf + len overflows and we have buf + len < buf.  This confuses the
implementation and makes it return incorrect results.  Neverthless we
must support this case as memchr() is guaranteed to work even with
phony buffer lengths, as long as a match is found before the buffer
actually ends.

Sponsored by:	The FreeBSD Foundation
Reported by:	yuri, des
Tested by:	des
Approved by:	mjg (blanket, via IRC)
MFC after:	1 week
MFC to:		stable/14
PR:		273652
2023-09-10 08:52:59 -04:00
Robert Clausecker 331737281c lib/libc/amd64/string: implement strnlen(3) trough memchr(3)
Now that we have an optimised memchr(3), we can use it to implement
strnlen(3) with better perofrmance.

Sponsored by:	The FreeBSD Foundation
Approved by:	mjg
MFC after:	1 week
MFC to:		stable/14
Differential Revision:	https://reviews.freebsd.org/D41598
2023-09-08 17:22:31 -04:00
Robert Clausecker de12a689fa lib/libc/amd64/string: add memchr(3) scalar, baseline implementation
This is conceptually similar to strchr(3), but there are
slight changes to account for the buffer having an explicit
buffer length.

Sponsored by:	The FreeBSD Foundation
Approved by:	mjg
MFC after:	1 week
MFC to:		stable/14
Differential Revision:	https://reviews.freebsd.org/D41598
2023-09-08 17:22:20 -04:00
Robert Clausecker 7084133cde lib/libc/amd64/string: add strspn(3) scalar, x86-64-v2 implementation
This is conceptually very similar to the strcspn(3) implementations
from D41557, but we can't do the fast paths the same way.

Sponsored by:	The FreeBSD Foundation
Approved by:	mjg
MFC after:	1 week
MFC to:		stable/14
Differential Revision:	https://reviews.freebsd.org/D41567
2023-09-08 17:21:59 -04:00
Robert Clausecker 474408bb79 lib/libc/amd64/string: add strcspn(3) scalar, x86-64-v2 implementation
This changeset adds both a scalar and an x86-64-v2 implementation
of the strcspn(3) function to libc. A baseline implementation does not
appear to be feasible given the requirements of the function.

The scalar implementation is similar to the generic libc implementation,
but expands the bit set into a byte set to reduce latency, improving
performance. This approach could probably be backported to the generic
C version to benefit other platforms.

The x86-64-v2 implementation is built around the infamous pcmpistri
instruction. An alternative implementation based on the Muła/Langdale
algorithm [1] was prototyped, but performed worse than the pcmpistri
approach except for sets of more than 16 characters with long input
strings.

All implementations provide special cases for the empty set (reduces to
strlen as well as single-character sets (reduces to strchr). The
x86-64-v2 kernel falls back to the scalar implementation for sets of
more than 32 characters. This limit could be raised by additional
multiples of 16 through the use of additional pcmpistri code paths, but
I consider this case to be too rare to be of importance.

[1]: http://0x80.pl/articles/simd-byte-lookup.html

Sponsored by:	The FreeBSD Foundation
Approved by:	mjg
MFC after:	1 week
MFC to:		stable/14
Differential Revision:	https://reviews.freebsd.org/D41557
2023-09-08 17:20:19 -04:00
Robert Clausecker 3d8ef251aa lib/libc/amd64/string/strchrnul.S: fix edge case in scalar code
When the buffer is immediately preceeded by the character we
are looking for and begins with one higher than that character,
and the buffer is misaligned, a match was errorneously detected
in the first character.  Fix this by changing the way we prevent
matches before the buffer from being detected: instead of
removing the corresponding bit from the 0x80..80 mask, set the
LSB of bytes before the buffer after xoring with the character we
look for.

The bug only affects amd64 with ARCHLEVEL=scalar (cf. simd(7)).
The change comes at a 2% performance impact for short strings
if ARCHLEVEL is set to scalar.  The default configuration is not
affected.

os: FreeBSD
arch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
        │ strchrnul.scalar.0.out │       strchrnul.scalar.2.out       │
        │         sec/op         │   sec/op     vs base               │
Short                57.89µ ± 2%   59.08µ ± 1%  +2.07% (p=0.030 n=20)
Mid                  19.24µ ± 0%   19.73µ ± 0%  +2.53% (p=0.000 n=20)
Long                 11.03µ ± 0%   11.03µ ± 0%       ~ (p=0.547 n=20)
geomean              23.07µ        23.43µ       +1.53%

        │ strchrnul.scalar.0.out │       strchrnul.scalar.2.out        │
        │          B/s           │     B/s       vs base               │
Short               2.011Gi ± 2%   1.970Gi ± 1%  -2.02% (p=0.030 n=20)
Mid                 6.049Gi ± 0%   5.900Gi ± 0%  -2.47% (p=0.000 n=20)
Long                10.56Gi ± 0%   10.56Gi ± 0%       ~ (p=0.547 n=20)
geomean             5.045Gi        4.969Gi       -1.50%

MFC to:		stable/14
MFC after:	3 days
Approved by:	mjg (blanket, via IRC)
Sponsored by:	The FreeBSD Foundation
2023-08-25 21:21:54 +02:00
Robert Clausecker 8803f01e93 lib/libc/amd64/string/memcmp.S: add baseline implementation
This changeset adds a baseline implementation of memcmp and bcmp
for amd64. The same code is used for both functions with conditional
code were the behaviour differs (we need more precise output for the
memcmp case).

FreeBSD documents that memcmp returns the difference between the
mismatching characters. Slightly faster code would be possible could
we relax this requirement to the ISO/IEC 9899:1999 requirement of
merely returning a negative/positive integer or zero.

Performance is better than bionic and glibc, except for long strings
were the two are 13% faster. This could be because they use SSE4
ptest which we cannot use in a baseline kernel.

Sponsored by:	The FreeBSD Foundation
Approved by:	mjg
Differential Revision:	https://reviews.freebsd.org/D41442
2023-08-21 21:19:46 +02:00
Robert Clausecker 9fbea87028 lib/libc/amd64/string/stpcpy.S: add baseline implementation
This commit adds a baseline implementation of stpcpy(3) for amd64.
It performs quite well in comparison to the previous scalar implementation
as well as agains bionic and glibc (though glibc is faster for very long
strings).  Fiddle with the Makefile to also have strcpy(3) call into the
optimised stpcpy(3) code, fixing an oversight from D9841.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	imp ngie emaste
Approved by:	mjg kib
Fixes:		D9841
Differential Revision:	https://reviews.freebsd.org/D41349
2023-08-21 20:59:38 +02:00
Warner Losh d0b2dbfa0e Remove $FreeBSD$: one-line sh pattern
Remove /^\s*#[#!]?\s*\$FreeBSD\$.*$\n/
2023-08-16 11:55:03 -06:00
Warner Losh 1d386b48a5 Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:42 -06:00
Warner Losh 2a63c3be15 Remove $FreeBSD$: one-line .c comment pattern
Remove /^/[*/]\s*\$FreeBSD\$.*\n/
2023-08-16 11:54:29 -06:00
Warner Losh 42b388439b Remove $FreeBSD$: one-line .h pattern
Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
2023-08-16 11:54:23 -06:00