Commit graph

90 commits

Author SHA1 Message Date
Warner Losh eb690a0576 awk: Merge in bsd-feature branch of OTA from 20240422 (31bb33a32f71)
In the last 2nd edition import, I mistakenly grabbed from the 'main'
branch of upstream rather than the bsd-feature branch. This means that
we have a regression in awk from that point forward: all the
BSD-specific bit functions (and a few others) were dropped. This
restores them at the same level.

MFC After:		1 day
Sponsored by:		Netflix
2024-05-14 12:17:55 -06:00
Warner Losh 1023317ac4 ota: Merge one true awk 20240422 (a3b68e649d2d)
Apr 22, 2024:
	fixed regex engine gototab reallocation issue that was
	introduced during the Nov 24 rewrite. Thanks to Arnold Robbins.
	Fixed a scan bug in split in the case the separator is a single
	character. thanks to Oguz Ismail for spotting the issue.

Mar 10, 2024:
	fixed use-after-free bug in fnematch due to adjbuf invalidating
	the pointers to buf. thanks to github user caffe3 for spotting
	the issue and providing a fix, and to Miguel Pineiro Jr.
	for the alternative fix.
	MAX_UTF_BYTES in fnematch has been replaced with awk_mb_cur_max.
	thanks to Miguel Pineiro Jr.

Sponsored by:		Netflix
2024-05-04 15:50:33 -06:00
Warner Losh ba7b7f94c2 awk: Fix the tests
I'd forgotten that we have to adjust the stderr tests from
upstream. Remove the OK files. Also remove system-status.*.  These
restore the fixes I made in 517e52b6c2 which were lost when I imported
the last version of awk.

Also, force LANG to be C.UTF-8 when testing to ensure that stray lang
settings don't fail tests.

Sponsored by:		Netflix
2024-03-07 22:52:56 -07:00
Warner Losh f32a6403d3 Merge one true awk from 2024-01-22 for the Awk Second Edition support
This brings in Unicode support, CSV support and a number of bug fixes.
They are described in _The AWK Programming Language_, Second Edition, by
Al Aho, Brian Kernighan, and Peter Weinberger (Addison-Wesley, 2024,
ISBN-13 978-0138269722, ISBN-10 0138269726).

Sponsored by:		Netflix
2024-02-29 10:42:06 -07:00
Warner Losh b2376a5f1e Revert "awk: Merge upstream 2nd Edition Awk Book"
The pre-push testing I did turned out to be testing the old version with
the old testsuite (for reasons I don't understand). There are issues with
the new version, the new test in the suite or (likely) both. Revert
until they can be chased down.

This should also fix the github CI that's gone red since this commit.

This reverts commit 3fd60a6b73, reversing
changes made to 194df014fe.

Sponsored by:		Netflix
2023-11-15 15:28:05 -07:00
Warner Losh 3fd60a6b73 awk: Merge upstream 2nd Edition Awk Book
Merge in the November 2nd, 2023 version of one true awk.

This brings in Unicode support, CSV support and a number of bug fixes.

Sponsored by:		Netflix
Reviewed by:		delphij
Differential Revision:	https://reviews.freebsd.org/D42447
2023-11-13 21:49:34 -07:00
Ed Maste 5dbd073b04 awk: error on printf format strings lacking conversion specifier
Reported by:	phk
Reviewed by:	imp, markj
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39573
2023-04-14 13:31:02 -04:00
Warner Losh 3fe0a5d2f6 Awk: Add error file
Add the expected output on stderr file.

Sponsored by:		Netflix
2021-11-06 16:24:36 -06:00
Warner Losh 517e52b6c2 awk: Move to using two sets of tests
Upstream one-true-awk has two sets of tests. These are in addition to
NetBSD's tests we're using. The 'bugs-fixed' tests from upstream are
ready to use as-is (more or less). However, the 'tests' from upstream
are not, so for now we'll just use the netbsd and bugs-fixed tests.
They provide an OK workout and are better than nothing, though the tests
themselves are for specific esoteric things.

The upstream bugs-fixed tests are *ALMOST* a drop in. However, 3 of them
test for errors, and the upstream test jig mashes stdout and stderr together,
which atf doesn't do, so make a tiny tweak to the upstream tests that I
hope to upstream. Plus upstream has ../a.out: instead of awk: in the
output. Not sure how to deal with this yet, so I've not proposed
anything upstream and have changed the test locally.

In addition, the system-status.awk test is not suitable to run in ATF.
It wants to force sh to dump core, but kyua doesn't seem to allow that
sometimes so the test will fail or pass based on whether or not a core
dump can be created. Since it's unstable, remove it.

This required moving the netbsd tests to a new directory, so update
mtree files as well. The change is useless for 'make check' without it.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D31376
2021-11-05 08:53:36 -06:00
Warner Losh fd2a4a31d9 awk: document updating
Fill in all the details to the standard process so they are handy in one
place and don't need to be re-remembered or rediscovered for the next
import.

Sponsored by:		Netflix
2021-08-01 11:31:50 -06:00
Warner Losh 23f24377b1 awk: Merge 20210729 from One True Awk upstream (0592de4a)
July 27, 2021:
	As per IEEE Std 1003.1-2008, -F "str" is now consistent with
	-v FS="str" when str is null. Thanks to Warner Losh.

July 24, 2021:
	Fix readrec's definition of a record. This fixes an issue
	with NetBSD's RS regular expression support that can cause
	an infinite read loop. Thanks to Miguel Pineiro Jr.

	Fix regular expression RS ^-anchoring. RS ^-anchoring needs to
	know if it is reading the first record of a file. This change
	restores a missing line that was overlooked when porting NetBSD's
	RS regex functionality. Thanks to Miguel Pineiro Jr.

	Fix size computation in replace_repeat() for special case
	REPEAT_WITH_Q. Thanks to Todd C. Miller.

Also, included the tests from upstream, though they aren't yet connected
to the tree.

Sponsored by:		Netflix
2021-08-01 10:22:39 -06:00
Warner Losh 4e52f5db35 awk: Flag -Ft as deprecated behavior
Upstream is poised to deprecate the -Ft wart in one true awk. None of
the other awks do this, and the gawk maintainer says that he's had no
requests for it in gawk in 30 years maintaining it. github can find a
few instances of it in the wild. As such, warn that it's deprecated and
will go away in the future.

MFC After:		3 days
Sponsored by:		Netflix
2021-07-30 23:33:37 -06:00
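The -Ft wart flagged in 4e52f5db35 treats a lone "t" after -F as shorthand for a tab. A minimal sketch of that mapping follows; the function name, the out-parameter and the warning text are illustrative assumptions, not the code in the tree.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not the upstream function): map the argument of -F
 * to the field separator to use. A lone "t" is the historical shorthand
 * for a tab, so report it as deprecated. The warning text is made up.
 */
const char *
fs_from_option(const char *arg, int *deprecated)
{
    *deprecated = 0;
    if (arg[0] == 't' && arg[1] == '\0') {
        *deprecated = 1;
        fprintf(stderr, "awk: -Ft is deprecated\n");
        return "\t";
    }
    /* Any other argument is used as the separator unchanged. */
    return arg;
}
```

Calling fs_from_option("t", &flag) yields a tab and sets flag, while fs_from_option(",", &flag) returns the comma untouched.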
Warner Losh a2e3e11873 awk: Make -F '' and -v FS="" behave the same
IEEE Std 1003.1-2008 mandates that -F str be treated the same as -v
FS=str. For a null string, this was not the case. Since awk(1) documents
that a null string for FS has a specific behavior, make -F '' behave
consistently with -v FS="".

PR:			241441
Upstream issue:		https://github.com/onetrueawk/awk/issues/127
Upstream pull request:	https://github.com/onetrueawk/awk/pull/128
MFC After:		2 weeks
Sponsored by:		Netflix
2021-07-24 09:08:16 -06:00
Warner Losh 5ab82b00cc awk: Remove last markings we have on awk
We normally don't add $FreeBSD$ to contrib software. However, these
changes date back to the CVS era of source code management and have been
overlooked. Now that all these files are back to the same as the
upstream bsd-features branch, remove the FreeBSD specific changes, which
are now just $FreeBSD$ and the (FreeBSD) in the version string.

MFC After:		2 weeks
Sponsored by:		Netflix
2021-07-21 20:24:57 -06:00
Warner Losh 628bd30ab5 awk: revert to upstream behavior for ranges for gawk compatibility
In 2005, FreeBSD changed one-true-awk to honor the locale's collating
order. This was billed as a temporary patch. It was also compatible with
the then-current behavior of gawk. That temporary patch has lasted 16
years now.

However, IEEE Std 1003.1-2008 changed the behavior of ranges in regular
expressions outside of the "C" and "POSIX" locales to be undefined.

Starting in 2011, gawk 4.0 stopped using the locale for the range
regular expressions and used the traditional behavior only. The
maintainer had grown weary of answering why '[A-Z]' would sometimes
match lower-case expressions. The details are explained here:
https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html

To restore compatibility with other implementations of awk, revert this
patch. FreeBSD is the odd-system out. It also has the nice side effect
of eliminating the last of our differences with upstream one-true-awk.

Reviewed by:		cy, rgrimes
MFC After:		2 weeks
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D31114
2021-07-21 20:22:43 -06:00
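The difference between the two range behaviors discussed in 628bd30ab5 can be sketched as below. Both function names are hypothetical and neither is the code that was reverted; the sketch only contrasts a byte-value comparison with a comparison by collating order through strcoll(3).

```c
#include <string.h>

/* Traditional behavior: a range test on raw byte values. */
int
in_range_bytes(unsigned char c, unsigned char lo, unsigned char hi)
{
    return lo <= c && c <= hi;
}

/* Locale-aware behavior: a range test on collating order via strcoll(3). */
int
in_range_collate(unsigned char c, unsigned char lo, unsigned char hi)
{
    char cs[2] = { (char)c, '\0' };
    char los[2] = { (char)lo, '\0' };
    char his[2] = { (char)hi, '\0' };

    return strcoll(los, cs) <= 0 && strcoll(cs, his) <= 0;
}
```

In a program that never calls setlocale(3) the "C" locale is in effect and the two functions agree. Under a locale whose collating order interleaves upper and lower case, in_range_collate('b', 'A', 'Z') may return 1, which is the surprise the gawk manual describes.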
Warner Losh 2929813c4f Revert "awk: Issue a warning for old hex behavior."
This reverts commit acf9cf323f. It warns
about too many false positive cases.

Sponsored by:		Netflix
2021-07-21 20:17:50 -06:00
Warner Losh acf9cf323f awk: Issue a warning for old hex behavior.
Since FreeBSD has allowed "0x" hex strings to be converted to integers
for a long time, and since upstream has killed that behavior, warn about
this issue. This will allow us to deprecate this behavior for 14.0 while
giving our users of 12.x and 13.x fair warning.

Sponsored by:		Netflix
2021-07-21 20:03:35 -06:00
Warner Losh 0c92d88c91 awk: remove proctab.c
proctab.c is a generated file and never should have been committed to
the tree. This file has been added and removed a couple of times, most
recently added by me in my 2019 updates.

Sponsored by:		Netflix
2021-07-19 22:34:37 -06:00
Warner Losh d4d252c499 awk: revert upstream's attempt to disallow hex strings
Upstream one-true-awk decided to disallow hex strings as numbers. This
is in line with awk's behavior prior to C99, and allowed by the POSIX
standard. The standard, however, allows them to be treated as numbers
because that's what the standard said in the 2001 through 2004 editions.
Since 2001, the nawk in FreeBSD has treated them as numbers, so restore
that behavior, allowed by the standard.

A number of scripts in the FreeBSD tree depend on this interpretation,
including scripts to build the kernel which had mysteriously started
failing for some people and not others. By re-allowing 0x hex numbers,
this fixes those scripts and restores POLA.

Upstream issue:		https://github.com/onetrueawk/awk/issues/126
Sponsored by:		Netflix
Reviewed by:		kevans
MFC After:		asap due to regression already merged to stable
Differential Revision:	https://reviews.freebsd.org/D31199
2021-07-15 17:08:03 -06:00
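As background for d4d252c499: since C99, strtod(3) accepts hexadecimal input, which is one way a string such as "0x1A" comes to be treated as a number. The helper below is a hypothetical sketch of that conversion, not awk's own number-detection code.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return 1 and store the value in *out if the whole
 * of s (ignoring trailing whitespace) is a number as parsed by strtod(3),
 * otherwise return 0. With a C99 library, "0x1A" parses as 26.
 */
int
string_to_number(const char *s, double *out)
{
    char *end;
    double v;

    v = strtod(s, &end);
    if (end == s)
        return 0;               /* nothing was consumed */
    while (isspace((unsigned char)*end))
        end++;
    if (*end != '\0')
        return 0;               /* trailing junk: not a number */
    *out = v;
    return 1;
}
```

A script that compares or adds a field containing 0x1A relies on this kind of conversion succeeding, which is why disallowing it broke existing scripts.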
Warner Losh f68a53dba9 awk: Reduce diffs with upstream to almost nothing.
In the merge of 20210215, I left two merge conflicts #if 0'd by mistake
to check later rather than resolve them as part of the merge.  This code
turns out to be from the original one-true-awk import and not FreeBSD
specific, so remove them.

Remove an extra definition of HAT.

Remove a stylistic change that also appears to be a mismerge along the
way.

Remove FREEBSD-upgrade. Nobody has updated it since the original 2007
cvs import. It talks about old CVS branches that never made it into svn,
let alone git. New imports will follow the standard practices now, so
there's nothing left to document.

Move README to README.md and copy the README.md from upstream over.

This leaves just the $FreeBSD$ lines (which remain for the stable/12
merge) and the strcoll part of ru@'s r201989/d98dd8e5f94c as the only
diffs with upstream. FreeBSD also still has its own man page, which I
don't plan on changing. Once this commit is merged to stable/12, I plan
no further merges to stable/12. Sometime after that I'll remove the
$FreeBSD$ lines to reduce the diffs even more (though I want to make
sure plans won't change first). I also plan to talk to upstream about
this change...

MFC After:		2 weeks
Sponsored by:		Netflix
2021-07-08 23:05:13 -06:00
Warner Losh f39dd6a978 one-true-awk: import 20210221 (1e4bc42c53a1) which fixes a number of bugs
Import the latest bsd-features branch of the one-true-awk upstream:

o Move to bison for $YACC
o Set close-on-exec flag for file and pipe redirects that aren't std*
o lots of little fixes to modernize code base
o free sval member before setting it
o fix a bug where a{0,3} could match aaaa
o pull in systime and strftime from NetBSD awk
o pull in fixes from {Net,Free,Open}BSD (normalized our code with them)
o add BSD extensions and, or, xor, compl, lshift, rshift (mostly a nop)

Also revert a few of the trivial FreeBSD changes that were done slightly
differently in the upstreaming process. Also, our PR database may have
been mined by upstream for these fixes, and Mikolaj Golub may deserve
credit for some of the fixes in this update.

Suggested by:		Mikolaj Golub <to.my.trociny@gmail.com>
PR:			143363,143365,143368,143369,143373,143375,214782
Sponsored by:		Netflix
2021-07-07 19:25:43 -06:00
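One item in the f39dd6a978 import sets the close-on-exec flag on file and pipe redirects other than the standard streams. A minimal sketch of that idea using fcntl(2) follows; the helper name is an assumption and this is not the imported code.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>

/*
 * Hypothetical helper: mark the descriptor behind fp close-on-exec so that
 * commands started later do not inherit it. The standard streams are left
 * alone. Returns 0 on success and -1 on failure.
 */
int
set_cloexec(FILE *fp)
{
    int fd, flags;

    if (fp == stdin || fp == stdout || fp == stderr)
        return 0;
    fd = fileno(fp);
    if (fd < 0)
        return -1;
    flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}
```

Reading the flags first and OR-ing in FD_CLOEXEC keeps any other descriptor flags that are already set.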
Alex Richardson 1116946093 Fix another UBSan error in awk
This applies my upstreamed fix: ad9bd2f40a
Found By:	UBSan
2020-09-21 19:03:12 +00:00
Alex Richardson ae692c42cb awk: Fix subobject out-of-bounds access
When matching a regex with ^, it would attempt to access
gototab[NSTATES][NCHARS+2], and therefore access the state for the \002
character instead. This change is required to run awk under CHERI (with
sub-object bounds) and when running with UBSan instrumentation.

This was committed upstream as cbf924342b

Found by:	CHERI (with subobject bounds enabled)
Obtained from:	CheriBSD
Reviewed By:	imp
Differential Revision: https://reviews.freebsd.org/D26509
2020-09-21 19:03:07 +00:00
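The out-of-bounds access described in ae692c42cb comes down to index arithmetic on a two-dimensional array. The sketch below only computes flat offsets; NCHARS is given an illustrative value and the helper is hypothetical, so treat it as an explanation of the aliasing rather than the upstream fix.

```c
#include <stddef.h>

/* Illustrative values only: HAT is the pseudo-column used for '^'. */
enum { NCHARS = 256, HAT = NCHARS + 2 };

/* Flat element offset of table[state][col] when each row has row_len columns. */
size_t
flat_offset(size_t state, size_t col, size_t row_len)
{
    return state * row_len + col;
}
```

With rows of NCHARS columns, flat_offset(s, HAT, NCHARS) equals flat_offset(s + 1, 2, NCHARS), so the '^' column of one state aliases the \002 entry of the next. Declaring rows with at least HAT + 1 columns keeps the index inside its own row, which is what sub-object bounds checking requires.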
Warner Losh d9e8cf281b Another partial revert of r301289.
In this case, a change was made in one-true-awk from *FS to
getsval(fsloc) in a line just after one of the lines that had the 0 ->
NULL change. It works both ways as far as I can tell.  It looks like a
bug fix, but I've not tried to track down which ancient version of
one-true-awk it was in (github starts too late for tracking this
down). Before and after the changes the regression suite passes
100% relative to the un-modified one-true-awk.
2019-06-03 05:25:22 +00:00
Warner Losh 31d232c2a3 Fix mismerge that crept into r301289.
The conversion of 0 -> NULL required a rebase at some point, as noted
in r301289 when pfg committed it. In that rebase, three lines remained
that had been removed in a prior version of awk, and one of them had a
0 -> NULL change causing a conflict. The conflict should have been
resolved by removing the three lines, but wasn't. This introduces a
regression into f.split3 test which prior to this commit we were
failing, but a pure onetrueawk wasn't. Remove the offending 3 lines.
2019-06-03 05:25:16 +00:00
Warner Losh adb46ac4c0 Revert r348518
It should not have happened. The change is actually in upstream and I misread the diffs.
2019-06-02 20:52:21 +00:00
Warner Losh 2675e1b91d Reapply r301691:
Revert r301689 - one-true-awk: Avoid a NULL dereference.

I got this wrong and the coverity report doesn't match the NetBSD change,
which was meant for a different version.

The change wouldn't hurt but let's wait until upstream figures this out.
2019-06-02 20:47:15 +00:00
Warner Losh 06d1e65393 Reapply r315426 by pfg:
|    MFV r315425: one-true-awk: have calloc(3) do the multiplication.
2019-06-02 16:30:53 +00:00
Warner Losh 10ce5b990f Reapply r301289 by pfg:
|    MFV r300961: one-true-awk: replace 0 with NULL for pointers
|    Also remove a redundant semicolon.
|    Also had to rebase on upstream pull.
2019-06-02 16:28:20 +00:00
Warner Losh b525355729 Merge from upstream at 4189ef5d from https://github.com/onetrueawk/awk.git
Note: this backs out a number of changes we've made to awk because
they aren't upstream, but are on the vendor branch. Those will be
reapplied. svn makes it needlessly difficult to know which ones, but
at least r315426, r301289, and maybe r301691, though there may be
others too. None of these are critical, so bisecting through this
point is safe for all but awk regression tests :).
2019-06-02 16:25:07 +00:00
Devin Teske e0ff4751f0 Update awk(1) manual to state an exception to egrep(1)-like RE syntax
Reviewed by:	imp, jmg
MFC after:	3 days
Sponsored by:	Smule, Inc.
Differential Revision:	https://reviews.freebsd.org/D17739
2018-11-02 23:03:40 +00:00
Warner Losh d12420d872 Don't display empty error context.
Context extraction didn't handle this case and showed uninitialized memory.

Obtained from: OpenBSD lib.c 1.21
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D12379
2017-09-24 05:04:06 +00:00
Warner Losh 8e537f8ae0 Fix %c for floating values that become 0 when coerced to int.
Obtained from: OpenBSD run.c 1.36 (From Jeremy Devenport)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D12379
2017-09-24 05:04:02 +00:00
Warner Losh 547f34cace Fix uninitialized variable
echo | awk 'BEGIN {i=$1; print i}' prints a boatload of stack
garbage. NUL terminate the memory returned from malloc to prevent it.

Obtained from: OpenBSD run.c 1.40
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D12379
2017-09-24 05:03:57 +00:00
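The fix in 547f34cace relies on the fact that malloc(3) returns uninitialized memory. A minimal sketch of the idea, with a hypothetical helper name, is to terminate the buffer immediately so that it reads as an empty string until something is stored in it.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: allocate a string buffer and NUL terminate it at
 * once, so a reader that runs before any data is stored sees "" rather
 * than whatever bytes malloc(3) happened to return.
 */
char *
new_string_buffer(size_t size)
{
    char *buf;

    if (size == 0)
        return NULL;
    buf = malloc(size);
    if (buf != NULL)
        buf[0] = '\0';
    return buf;
}
```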
Warner Losh 73f7ff91b2 Implement gawk multiple-arg extension to and, or, and xor.
gawk allows multiple arguments to bit-wise and, or and xor
functions. Implement an arbitrary number of arguments for these
functions. Also, use NULL in preference to 0 to match rest of file.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D12361
2017-09-14 05:48:23 +00:00
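Accepting an arbitrary number of arguments, as 73f7ff91b2 does for and, or and xor, amounts to folding the operator over the argument list. The sketch below assumes the arguments have already been evaluated into an array of non-negative doubles; the type and function names are hypothetical and this is not the code in run.c.

```c
#include <stddef.h>

typedef enum { BIT_AND, BIT_OR, BIT_XOR } bit_op;

/*
 * Hypothetical helper: fold one bitwise operator over nargs values.
 * Values are assumed to be non-negative and to fit in unsigned long.
 */
unsigned long
bit_fold(bit_op op, const double *args, size_t nargs)
{
    unsigned long acc;
    size_t i;

    if (nargs == 0)
        return 0;
    acc = (unsigned long)args[0];
    for (i = 1; i < nargs; i++) {
        unsigned long v = (unsigned long)args[i];

        switch (op) {
        case BIT_AND:
            acc &= v;
            break;
        case BIT_OR:
            acc |= v;
            break;
        case BIT_XOR:
            acc ^= v;
            break;
        }
    }
    return acc;
}
```

Seeding the accumulator with the first argument avoids needing a different identity value for each operator.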
Warner Losh 69679fc10f Bring in bit operation functions, ala gawk.
These are from OpenBSD:
>>> Extend awk with bitwise operations. This is an extension to the awk
>>> spec and documented as such, but comes in handy from time to time.
>>> The prototypes make it compatible with a similar GNU awk extension.
>>>
>>> ok millert@, enthusiasm from deraadt@

Edited to fix cut and paste in error messages, as well as
using tabs instead of spaces after #defines added.

Obtained From: OpenBSD awk.h 1.12, lex.c 1.10, run.c 1.29
Differential Revision: https://reviews.freebsd.org/D12361
Sponsored by: Netflix
2017-09-14 05:47:55 +00:00
Pedro F. Giffuni 6c10e0ba0b MFV r315425:
one-true-awk: have calloc(3) do the multiplication.

MFC after:	3 days
2017-03-16 21:32:05 +00:00
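Letting calloc(3) do the multiplication, as in 6c10e0ba0b, matters because calloc takes the count and the element size separately and can fail when their product does not fit in size_t, whereas malloc(count * size) would silently wrap. A minimal sketch with a hypothetical wrapper name:

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: allocate count zeroed elements of size bytes each.
 * calloc(3) performs the multiplication itself, so an overflowing request
 * fails instead of wrapping to a small allocation.
 */
void *
alloc_array(size_t count, size_t size)
{
    return calloc(count, size);
}
```

A request such as alloc_array(SIZE_MAX, 2) returns NULL on the common libc implementations, and a successful allocation is zero-filled.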
Andrey A. Chernov cd3912b6be The bug:
$ echo x | awk '/[[:cntrl:]]/'
x

The NUL character in cntrl class truncates the pattern, and an empty
pattern matches anything. The patch skips NUL as a quick fix.

PR:     195792
Submitted by:   kdrakehp@zoho.com
Approved by:    bwk@cs.princeton.edu (the author)
MFC after:      3 days
2016-09-03 23:04:56 +00:00
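The truncation described in cd3912b6be can be reproduced with a small sketch that expands the cntrl class into a C string. The function and its skip_nul switch are hypothetical; the point is only that NUL is itself a control character, so storing it first yields an empty string.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: write every character for which iscntrl(3) is true
 * into buf as a NUL-terminated string and return how many were stored.
 * With skip_nul == 0 the first stored byte is '\0', so the result reads
 * as an empty string; with skip_nul != 0 the scan starts at 1 instead.
 */
size_t
expand_cntrl(char *buf, size_t buflen, int skip_nul)
{
    size_t n = 0;
    int c;

    if (buflen == 0)
        return 0;
    for (c = skip_nul ? 1 : 0; c < 256 && n + 1 < buflen; c++) {
        if (iscntrl(c))
            buf[n++] = (char)c;
    }
    buf[n] = '\0';
    return n;
}
```

Any consumer that treats the expansion as a C string sees nothing at all in the first case, which matches the "empty pattern matches anything" symptom in the report.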
Andrey A. Chernov 6c2a17d0b5 Back out non-collating [a-z] ranges.
Instead of changing the whole course to another POSIX-permitted way
for consistency and uniformity, I have decided to ignore the missing
regex functionality completely and focus on fixing bugs in what we have
now; choosing the other way has too many small obstacles, counting ports.
Corresponding libc changes are backed out in r302824.
2016-07-14 09:31:52 +00:00
Andrey A. Chernov 1d148a7c3f After removing collation for [a-z] ranges in r302512, do it here too.
I'll try to keep the change very minimal to not touch contribed code much.
I'll send it upstream when it will be merged to main branches,
but we need the change right now here.
2016-07-13 10:01:31 +00:00
Pedro F. Giffuni c8b6d1e472 Revert r301689 - one-true-awk: Avoid a NULL dereference.
I got this wrong and the coverity report doesn't match the NetBSD change,
which was meant for a different version.

The change wouldn't hurt but let's wait until upstream figures this out.
2016-06-08 19:39:44 +00:00
Pedro F. Giffuni 17ce5a9b90 one-true-awk: Avoid a NULL dereference.
CID:		270862
Obtained from:	NetBSD (CVS Rev. 1.8)
MFC after:	2 weeks
2016-06-08 19:24:48 +00:00
Pedro F. Giffuni 9051825205 MFV r300961:
one-true-awk: replace 0 with NULL for pointers

Also remove a redundant semicolon.
2016-06-03 21:23:11 +00:00
Pedro F. Giffuni a4b2ac79e4 awk: Use random(3) instead of rand(3)
While none of them is considered even near to cryptographic
level, random(3) is a better random generator than rand(3).

Use random(3) for awk as is done in other systems.

Thanks to Chenguang Li for discussing this in the lists
and submitting the patch upstream.

PR:		193147
MFC after:	5 weeks
2014-09-19 18:24:02 +00:00
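A minimal sketch of drawing a value in [0, 1) from random(3), the generator a4b2ac79e4 switches to, is shown below. The function name and the choice of divisor are assumptions made for illustration; random(3) is documented to return values from 0 to 2^31 - 1, so dividing by 2^31 keeps the result below 1.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>

/*
 * Hypothetical helper: scale the result of random(3), which lies in
 * [0, 2^31 - 1], into the half-open interval [0, 1).
 */
double
unit_random(void)
{
    return (double)random() / 2147483648.0;
}
```

Seeding is done with srandom(3) rather than srand(3), since the two generators keep separate state.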
Xin LI 0840e960f9 MFV: one-true-awk 20121220.
MFC after:	1 month
2013-01-03 07:25:30 +00:00
Ruslan Ermilov aa0da2e494 - Merged awk upstream that includes a fix for a bug exposed by kmod_syms.mk.
- Provide a build aid for those who already have a buggy awk(1) installed.

Approved by:	re (kib)
2011-08-11 10:29:10 +00:00
Ruslan Ermilov d86a0988d2 Update to a 7-Aug-2011 release.
Approved by:	re (kib)
2011-08-09 12:54:43 +00:00
Ruslan Ermilov b40501fb67 Update to a 6-May-2011 release (upstreamed some of our changes).
2011-05-06 14:21:46 +00:00
Ruslan Ermilov 1b11b78377 Update to a 1-May-2011 release (except for the isblank change).
2011-05-03 11:47:19 +00:00
Ruslan Ermilov d98dd8e5f9 Apply patches directly to sources. Their effect is as follows:
- Make one-true-awk respect locale's collating order in [a-z]
  bracket expressions, until a more complete fix (like handing
  BREs) is ready.

- Don't require a space between -[fv] and its argument.
2010-01-10 08:02:07 +00:00