Commit graph

7 commits

Author SHA1 Message Date
Nguyễn Thái Ngọc Duy 9038531f1b t/helper: merge test-regex into test-tool
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-27 08:45:47 -07:00
Ævar Arnfjörð Bjarmason e01b4dab01 grep: change non-ASCII -i test to stop using --debug
Change a non-ASCII case-insensitive test case to stop using --debug,
and instead simply test for the expected results.

The test coverage remains the same with this change, but the test
won't break due to internal refactoring.

This test was added in commit 793dc676e0 ("grep/icase: avoid kwsset
when -F is specified", 2016-06-25). It was asserting that the regex
must be compiled with compile_fixed_regexp(), instead test for the
expected results, allowing the underlying implementation to change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-21 08:25:37 +09:00
Ævar Arnfjörð Bjarmason 3eb585c112 test-lib: rename the LIBPCRE prerequisite to PCRE
Rename the LIBPCRE prerequisite to PCRE. This is for preparation for
libpcre2 support, where having just "LIBPCRE" would be confusing as it
implies v1 of the library.

None of these tests are incompatible between versions 1 & 2 of
libpcre, it's less confusing to give them a more general name to make
it clear that they work on both library versions.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-21 08:25:37 +09:00
Nguyễn Thái Ngọc Duy b51a9c1479 diffcore-pickaxe: support case insensitive match on non-ascii
Similar to the "grep -F -i" case, we can't use kws on icase search
outside ascii range, so we quote the string and pass it to regcomp as
a basic regexp and let regex engine deal with case sensitivity.

The new test is put in t7812 instead of t4209-log-pickaxe because
lib-gettext.sh might cause problems elsewhere, probably.

Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 12:44:57 -07:00
Nguyễn Thái Ngọc Duy 18547aacf5 grep/pcre: support utf-8
In the previous change in this function, we add locale support for
single-byte encodings only. It looks like pcre only supports utf-* as
multibyte encodings, the others are left in the cold (which is
fine).

We need to enable PCRE_UTF8 so pcre can find character boundary
correctly. It's needed for case folding (when --ignore-case is used)
or '*', '+' or similar syntax is used.

The "has_non_ascii()" check is to be on the conservative side. If
there's non-ascii in the pattern, the searched content could still be
in utf-8, but we can treat it just like a byte stream and everything
should work. If we force utf-8 based on locale only and pcre validates
utf-8 and the file content is in non-utf8 encoding, things break.

Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Helped-by: Plamen Totev <plamen.totev@abv.bg>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 12:44:57 -07:00
Nguyễn Thái Ngọc Duy 793dc676e0 grep/icase: avoid kwsset when -F is specified
Similar to the previous commit, we can't use kws on icase search
outside ascii range. But we can't simply pass the pattern to
regcomp/pcre like the previous commit because it may contain regex
special characters, so we need to quote the regex first.

To avoid misquote traps that could lead to undefined behavior, we
always stick to basic regex engine in this case. We don't need fancy
features for grepping a literal string anyway.

basic_regex_quote_buf() assumes that if the pattern is in a multibyte
encoding, ascii chars must be unambiguously encoded as single
bytes. This is true at least for UTF-8. For others, let's wait until
people yell up. Chances are nobody uses multibyte, non utf-8 charsets
anymore.

Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Helped-by: René Scharfe <l.s.r@web.de>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 12:44:30 -07:00
Nguyễn Thái Ngọc Duy 5c1ebcca4d grep/icase: avoid kwsset on literal non-ascii strings
When we detect the pattern is just a literal string, we avoid heavy
regex engine and use fast substring search implemented in kwsset.c.
But kws uses git-ctype which is locale-independent so it does not know
how to fold case properly outside ascii range. Let regcomp or pcre
take care of this case instead. Slower, but accurate.

Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Helped-by: René Scharfe <l.s.r@web.de>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-27 07:31:35 -07:00