freebsd-src/FIXES

/****************************************************************
Copyright (C) Lucent Technologies 1997
All Rights Reserved

Permission to use, copy, modify, and distribute this software and
its documentation for any purpose and without fee is hereby
granted, provided that the above copyright notice appear in all
copies and that both that the copyright notice and this
permission notice and warranty disclaimer appear in supporting
documentation, and that the name Lucent Technologies or any of
its entities not be used in advertising or publicity pertaining
to distribution of the software without specific, written prior
permission.

LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
THIS SOFTWARE.
****************************************************************/

This file lists all bug fixes, changes, etc., made since the
second edition of the AWK book was published in September 2023.

Oct 30, 2023:
	multiple fixes and a minor code cleanup.
	disabled utf-8 for non-multibyte locales, such as C or POSIX.
	fixed a bad char * cast that causes incorrect results on big-endian
	systems. also fixed an out-of-bounds read for empty CCL.
	fixed a buffer overflow in substr with utf-8 strings.
	many thanks to Todd C Miller.


Sep 24, 2023:
	fnematch and getrune have been overhauled to solve issues around
	unicode FS and RS. also fixed gsub null match issue with unicode.
	big thanks to Arnold Robbins.

Sep 12, 2023:
	Fixed a length error in u8_byte2char that set RSTART to an
	incorrect (cannot happen) value for an end-of-line match
	such as match(str, /$/).


-----------------------------------------------------------------

[This entry is a summary, not a precise list of changes.]

	Added --csv option to enable processing of comma-separated
	values input.  When --csv is enabled, fields are separated
	by commas, fields may be quoted with double quotes ("), and
	fields may contain embedded newlines.
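
	The field rules above can be illustrated with a short Python
	sketch.  It is not awk's parser: the function name is
	hypothetical, and the handling of a doubled quote ("") inside
	a quoted field is an assumption borrowed from common CSV
	practice, not something this entry states.

```python
def split_csv_record(record):
    """Split one CSV record into a list of fields.

    Assumed rules (illustration only): commas separate fields, a field
    may be wrapped in double quotes, a doubled quote inside a quoted
    field stands for one literal quote, and a quoted field may contain
    commas and newlines.
    """
    fields = []
    current = []
    in_quotes = False
    i = 0
    while i < len(record):
        ch = record[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(record) and record[i + 1] == '"':
                    current.append('"')   # doubled quote -> literal quote
                    i += 1
                else:
                    in_quotes = False     # closing quote
            else:
                current.append(ch)        # commas and newlines kept as data
        elif ch == '"':
            in_quotes = True              # opening quote
        elif ch == ',':
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))       # last field has no trailing comma
    return fields


print(split_csv_record('a,"b,c","line1\nline2"'))
# prints ['a', 'b,c', 'line1\nline2']
```

	The second field keeps its comma and the third keeps its
	newline because both sit inside quotes.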

	If no explicit separator argument is provided, split() uses
	the setting of --csv to determine how fields are split.

	Strings may now contain UTF-8 code points (not necessarily
	characters).  Functions that operate on characters, like
	length, substr, index, match, etc., use UTF-8, so the length
	of a string of 3 emojis is 3, not 12 as it would be if bytes
	were counted.
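
	The difference between counting code points and counting bytes
	can be shown in Python, whose str type also counts code points.
	This is an analogy for the behavior described above, not awk
	code; the variable names are illustrative.

```python
# Three emoji code points: U+1F600, U+1F601, U+1F602.
s = "\U0001F600\U0001F601\U0001F602"

codepoints = len(s)                    # Python str length is in code points
utf8_bytes = len(s.encode("utf-8"))    # each of these needs 4 bytes in UTF-8

print(codepoints, utf8_bytes)          # prints 3 12

# A code point is not necessarily a character: "e" followed by
# U+0301 (combining acute accent) displays as one character but
# is two code points.
combining = "e\u0301"
print(len(combining))                  # prints 2
```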

	Regular expressions are processed as UTF-8.

	Unicode literals can be written as \u followed by one
	to eight hexadecimal digits.  These may appear in strings and
	regular expressions.
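
	As a rough illustration of the \u form, the Python sketch below
	expands such escapes in a string.  The helper name is
	hypothetical, and taking as many hex digits as are available
	(up to eight) is an assumption of this sketch; awk's own lexer
	may differ in detail.

```python
import re

# Matches a backslash, "u", then one to eight hexadecimal digits.
_U_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{1,8})")


def expand_u_escapes(text):
    """Replace each \\u escape in text with the code point it names.

    chr() raises ValueError for values above U+10FFFF, so out-of-range
    escapes are rejected rather than silently accepted.
    """
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


print(expand_u_escapes(r"caf\u00e9!"))     # prints café!
print(expand_u_escapes(r"\u1F600 ok"))     # U+1F600 followed by " ok"
```

	Because the digit run is variable-length, a hex-looking letter
	right after the escape is absorbed into it under this sketch's
	rule: r"\u00e9cole" would be read as U+0E9C followed by "ole".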