Commit graph

141 commits

Author SHA1 Message Date
Warner Losh bd234c0d4c sort: Only build FreeBSD-specific ALTMON_x stuff when ATLMON_1 is defined
On MacOS, we bootstrap sort. Since ALTMON_* are not defined there, the
build blows up. Since we don't need this feature for the FreeBSD build
process, and since we won't use it unless we actually install the NL
files that have this data in it, just #ifdef it out for now. In the
extremely unlikely event that the FreeBSD bootstrap/build process grows
this dependency, we can evaluate the best solution then (which most
likely is going to be not depend on the local's month names).

Fixes:			3d44dce90a (MacOS builds and github CI)
Sponsored by:		Netflix
Reviewed by:		jrtc27, jlduran@gmail.com, markj
Differential Revision:	https://reviews.freebsd.org/D42868
2023-12-07 13:42:52 -07:00
Christos Margiolis 3d44dce90a sort: test against all month formats in month-sort
The CLDR specification [1] defines three possible month formats:

- Abbreviation (e.g Jan, Ιαν)
- Full (e.g January, Ιανουαρίου)
- Standalone (e.g January, Ιανουάριος)

Many languages use different case endings depending on whether the month
is referenced as a standalone word (nominative case), or in date context
(genitive, partitive, etc.). sort(1)'s -M option currently sorts months
by testing input against only the abbrevation format, which is
essentially a substring of the full format. While this works fine for
languages like English, where there are no cases, for languages where
there is a different case ending between the abbreviation/full and
standalone formats, it is not sufficient.

For example, in Greek, "May" can take the following forms:

Abbreviation: Μαΐ (genitive case)
Full: Μαΐου (genitive case)
Standalone: Μάιος (nominative case)

If we use the standalone format in Greek, sort(1) will not able to match
"Μαΐ" to "Μάιος" and the sort will fail.

This change makes sort(1) test against all three formats. It also works
when the input contains mixed formats.

[1] https://cldr.unicode.org/translation/date-time/date-time-patterns

Reviewed by:	markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D42847
2023-12-01 02:30:10 +02:00
Warner Losh 5e3934b15a usr.bin: Automated cleanup of cdefs and other formatting
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by:		Netflix
2023-11-26 22:24:01 -07:00
Warner Losh bdcbfde31e usr.bin: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by:		Netflix
2023-11-26 22:23:30 -07:00
Warner Losh b2c76c41be Remove $FreeBSD$: one-line nroff pattern
Remove /^\.\\"\s*\$FreeBSD\$$\n/
2023-08-16 11:55:15 -06:00
Warner Losh d0b2dbfa0e Remove $FreeBSD$: one-line sh pattern
Remove /^\s*#[#!]?\s*\$FreeBSD\$.*$\n/
2023-08-16 11:55:03 -06:00
Warner Losh 1d386b48a5 Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:42 -06:00
Warner Losh 2a63c3be15 Remove $FreeBSD$: one-line .c comment pattern
Remove /^/[*/]\s*\$FreeBSD\$.*\n/
2023-08-16 11:54:29 -06:00
Warner Losh 4d846d260e spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with:		pfg
MFC After:		3 days
Sponsored by:		Netflix
2023-05-12 10:44:03 -06:00
Baptiste Daroussin 226e41467e sort: deindent file_reader_free and cleanup its usage 2022-10-13 10:51:17 +02:00
Baptiste Daroussin ffd41d39c6 sort: simplify file_reader_clean
Deindent the function, remove useless tests:
 - free already test if argument is NULL
 - closefile already test if the input is stdin or null
2022-10-13 10:42:23 +02:00
Baptiste Daroussin f9d9a7cc4f sort: deindent closefile 2022-10-13 10:38:12 +02:00
Baptiste Daroussin 48a53cc484 sort: use asprintf(3) instead of malloc + snprintf(3) 2022-10-13 10:34:57 +02:00
Baptiste Daroussin 958b0d4642 sort: deindent openfile 2022-10-13 10:31:34 +02:00
Baptiste Daroussin f079ef8aa4 sort: simplify the code to handle -z flag 2022-10-13 10:24:11 +02:00
Baptiste Daroussin 4d4fcf619e sort: cleanup now unused structutre and prototypes 2022-10-13 10:24:11 +02:00
Baptiste Daroussin 8b9071360a sort: unify the code to read from FILE *
Previously the code to read from a local file or stdin was sperarated
After the change to remove the home made line reader used for stdin
(replaced by getdelim) it apprears that the rest of the code which is
used to read from any FILE * but stdin can benefit from the exact same
change.
2022-10-13 10:24:11 +02:00
Baptiste Daroussin e8815fb30b sort: remove unused function 2022-10-13 10:24:11 +02:00
Baptiste Daroussin f02c783757 sort: use memset to initialize structure when possible 2022-10-13 10:24:11 +02:00
Baptiste Daroussin 3f9e5e59bd sort: use mkstemp(3) instead of reinventing it
MFC After:	1 week
2022-10-12 18:01:57 +02:00
Baptiste Daroussin b58094c0d9 sort: replace home made line reader by getdelim(3)
The previous code had bug when reading lines with an unexpected
encoding, returning without the full line being captured.
This result in sort complaining with "sort: Illegal byte sequence"

Using getdelim(3) instead of the home made code, fixes the situation.

PR:		241679
Reported by:	Ronald F. Guilmette <rfg-freebsd@tristatelogic.com>
MFC After:	1 week
Reviewed by:	markj, imp
Differential Revision:	https://reviews.freebsd.org/D36948
2022-10-12 17:37:33 +02:00
Baptiste Daroussin ed990a7a2f sort: remove NLS support
NLS support for sort(1) is:
1/ incomplete: many error string are not using nls
2/ only covers hu_HU.ISO8859-2
2022-10-12 16:24:29 +02:00
Baptiste Daroussin ecc3c29167 sort: replace malloc+memset with calloc 2022-10-12 16:12:04 +02:00
Baptiste Daroussin a312f3e742 sort: add wrapper around calloc 2022-10-12 16:12:04 +02:00
Doug Rabson 0c19c4db74 Move sort to runtime
Allows pkg bootstrap without having to install FreeBSD-utilities
2022-07-29 11:27:25 +01:00
Mark Johnston 8d8b9b560a sort: Fix message catalogue usage
- Check that catopen() succeeded before calling catclose().  musl will
  crash in the latter if the catalogue descriptor is -1.
- Keep the message catalogue open for most of sort(1)'s actual
  operation.
- Don't use catgets(3) to print error messages if catopen(3) had failed.

Reviewed by:	arichardson, emaste
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34081
2022-01-28 16:52:29 -05:00
Mark Johnston e9bfb50d5e sort: Fix random sort
bwsrawdata() is supposed to return the string buffer.

PR:		259451
Reported by:	sigsys@gmail.com
Fixes:		d053fb22f6 ("usr.bin/sort: Avoid UBSan errors")
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2021-10-29 14:29:50 -04:00
Alex Richardson d053fb22f6 usr.bin/sort: Avoid UBSan errors
UBSan complains about out-of-bounds accesses for zero-length arrays. To
avoid this we can use flexible array members. However, the C standard does
not allow for structures that only contain flexible array members, so we
move the length parameters into that structure too.

Split out from D28233.

Reviewed By:	markj
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D31009
2021-07-06 10:51:05 +01:00
Cyril Zhang 68d3790ba0 sort: Change default algorithm to mergesort
This results in a significant improvement in the runtime of sort(1) when
radix sort cannot be used.  This comes at the expense of increased
memory usage, but this is small relative to sort's overall memory usage.

PR:		255551
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30319
2021-06-17 13:53:03 -04:00
Mark Johnston 186ba88a7c sort: Hook NetBSD tests up to the build
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-05-13 09:34:01 -04:00
Cyril Zhang 71ec05a212 sort: Cache value of MB_CUR_MAX
Every usage of MB_CUR_MAX results in a call to __mb_cur_max.  This is
inefficient and redundant.  Caching the value of MB_CUR_MAX in a global
variable removes these calls and speeds up the runtime of sort.  For
numeric sorting, runtime is almost halved in some tests.

PR:		255551
PR:		255840
Reviewed by:	markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30170
2021-05-13 09:33:19 -04:00
Cyril Zhang fa43162c63 sort: Stop "fixing" obsolete key syntax after -- flag
PR:		255798
Reviewed by:	markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30234
2021-05-13 09:33:19 -04:00
Alex Richardson 266b51ac6e Fix -Wpointer-sign warnings in bwstring.c 2020-09-10 15:37:19 +00:00
Gordon Bergling 2d955e4199 sort(1): Remove duplicate option check
Reviewed by:	lwhsu, emaste
Approved by:	emaste
Obtained from:	DragonFlyBSD
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23892
2020-09-08 15:01:49 +00:00
Conrad Meyer 9f7e5bdad1 sort(1): Fix two wchar-related bugs in radixsort
Sort(1)'s radixsort implementation was broken for multibyte LC_CTYPEs in at
least two ways:

  * In actual radix sort, it would only bucket the least significant
    byte from each wchar, ignoring the 24 most-significant bits of each
    unicode character.

  * In degenerate cases / "fast paths," it would fall back to another
    sorting algorithm (default: mergesort) with a bogus comparator
    offset.  The string comparison functions in sort(1) take an offset
    in units of the operating character size.  However, radixsort was
    passing an offset in units of bytes.  The byte offset must be
    divided by sizeof(wchar_t).

This revision addresses both discovered issues.

Some example testcases:

  $ (echo 耳 ; echo 脳 ; echo 耳) | \
  LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C sort --radixsort --debug

  $ (echo 耳 ; echo 脳 ; echo 耳) | \
  LC_CTYPE=C LC_COLLATE=C LANG=C           sort --radixsort --debug

  $ (for i in $(jot 34); do echo 耳耳耳耳耳; echo 耳耳耳耳脳; echo 耳耳耳耳脴; done) | \
  LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C sort --radixsort --debug

PR:		247494
Reported by:	knu
MFC after:	I do not intend to, but parties interested in stable might want to
2020-06-23 16:43:48 +00:00
Simon J. Gerraty 2c9a9dfc18 Update Makefile.depend files
Update a bunch of Makefile.depend files as
a result of adding Makefile.depend.options files

Reviewed by:	 bdrewery
MFC after:	1 week
Sponsored by:   Juniper Networks
Differential Revision:  https://reviews.freebsd.org/D22494
2019-12-11 17:37:53 +00:00
Simon J. Gerraty 5ab1c5846f Add Makefile.depend.options
Leaf directories that have dependencies impacted
by options need a Makefile.depend.options file
to avoid churn in Makefile.depend

DIRDEPS for cases such as OPENSSL, TCP_WRAPPERS etc
can be set in local.dirdeps-options.mk
which can add to those set in Makefile.depend.options

See share/mk/dirdeps-options.mk

Reviewed by:	 bdrewery
MFC after:	1 week
Sponsored by:   Juniper Networks
Differential Revision:  https://reviews.freebsd.org/D22469
2019-12-11 17:37:37 +00:00
Sevan Janiyan 08509077b3 Adjust history, info source from v1's manuals
https://www.bell-labs.com/usr/dmr/www/1stEdman.html

MFC after:	5 days
2019-09-04 13:44:46 +00:00
Conrad Meyer f20b149b45 sort(1): Memoize MD5 computation to reduce repeated computation
Experimentally, reduces sort -R time of a 148160 line corpus from about
3.15s to about 0.93s on this particular system.

There's probably room for improvement using some digest other than md5, but
I don't want to look at sort(1) anymore.  Some discussion of other possible
improvements in the Test Plan section of the Differential.

PR:		230792
Reviewed by:	jhb (earlier version)
Differential Revision:	https://reviews.freebsd.org/D19885
2019-04-13 04:42:17 +00:00
Conrad Meyer 7a590a370a sort(1): Simplify and bound random seeding
Bound input file processing length to avoid the issue reported in [1].  For
simplicity, only allow regular file and character device inputs.  For
character devices, only allow /dev/random (and /dev/urandom symblink).

32 bytes of random is perfectly sufficient to seed MD5; we don't need any
more.  Users that want to use large files as seeds are encouraged to truncate
those files down to an appropriate input file via tools like sha256(1).

(This does not change the sort algorithm of sort -R.)

[1]: https://lists.freebsd.org/pipermail/freebsd-hackers/2018-August/053152.html

PR:		230792
Reported by:	Ali Abdallah <aliovx AT gmail.com>
Relnotes:	yes
2019-04-11 05:08:49 +00:00
Conrad Meyer 74504eefa1 sort(1): Whitespace and style cleanup
No functional change.

Sponsored by:	Dell EMC Isilon
2019-04-11 00:39:06 +00:00
Conrad Meyer fff4eaebbf sort(1): randomcoll: Skip the memory allocation entirely
There's no reason to order based on strcmp of ASCII digests instead of
memcmp of the raw digests.

While here, remove collision fallback.  If you collide two MD5s, they're
probably the same string anyway.  If robustness against MD5 collisions is
desired, maybe we shouldn't use MD5.

None of the behavior of sort -R is specified by POSIX, so we're free to
implement this however we like.  E.g., using a 128-bit counter and block cipher
to generate unique indices for each line of input.

PR:		230792 (2/many)
Relnotes:	This will change the sort order for a given dataset with a
		given seed.  Other similarly breaking changes are planned.
Sponsored by:	Dell EMC Isilon
2019-04-04 23:32:27 +00:00
Conrad Meyer e667e2a480 sort(1): randomcoll: Don't sort on ENOMEM
PR:		230792 (1/many)
Sponsored by:	Dell EMC Isilon
2019-04-04 20:27:13 +00:00
Alex Richardson 101db63b42 Don't use absolute path to sed when building usr.bin/join
This is required to build sort on Linux hosts since sed is in /bin there.

Approved By:	jhb (mentor)
2018-08-23 18:18:43 +00:00
Kyle Evans 7137597e15 sort(1): Fix -m when only implicit stdin is used for input
Observe:

printf "a\nb\nc\n" > /tmp/foo
# Next command results in no output
cat /tmp/foo | sort -m
# Next command results in proper output
cat /tmp/foo | sort -m -
# Also works:
sort -m /tmp/foo

Some const'ification was done to simplify the actual solution of adding "-"
explicitly to the file list if we didn't have any file arguments left over.

PR:		190099
MFC after:	1 week
2018-06-20 03:31:19 +00:00
Kyle Evans 36180cd53d sort(1): Add bits to allow easy checking against NetBSD tests
I'm looking at sort(1) failures, for better or worse.
2018-06-20 03:10:49 +00:00
Mark Johnston 7f180c0f80 Fix the WITH_SORT_THREADS build.
PR:		201664
MFC after:	1 week
2018-02-07 20:36:37 +00:00
Pedro F. Giffuni 1de7b4b805 various: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:37:16 +00:00
Bryan Drewery ea825d0274 DIRDEPS_BUILD: Update dependencies.
Sponsored by:	Dell EMC Isilon
2017-10-31 00:07:04 +00:00
Pedro F. Giffuni 759a9a9d24 sort(1): Remove unneeded initializations.
Found by:	Clang static analyzer
2017-02-17 19:53:20 +00:00