development/git - HydraGit

mirror of https://github.com/git/git synced 2024-10-30 04:01:21 +00:00

Author	SHA1	Message	Date
Ævar Arnfjörð Bjarmason	e01b4dab01	grep: change non-ASCII -i test to stop using --debug Change a non-ASCII case-insensitive test case to stop using --debug, and instead simply test for the expected results. The test coverage remains the same with this change, but the test won't break due to internal refactoring. This test was added in commit `793dc676e0` ("grep/icase: avoid kwsset when -F is specified", 2016-06-25). It was asserting that the regex must be compiled with compile_fixed_regexp(), instead test for the expected results, allowing the underlying implementation to change. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-21 08:25:37 +09:00
Ævar Arnfjörð Bjarmason	3eb585c112	test-lib: rename the LIBPCRE prerequisite to PCRE Rename the LIBPCRE prerequisite to PCRE. This is for preparation for libpcre2 support, where having just "LIBPCRE" would be confusing as it implies v1 of the library. None of these tests are incompatible between versions 1 & 2 of libpcre, it's less confusing to give them a more general name to make it clear that they work on both library versions. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-21 08:25:37 +09:00
Nguyễn Thái Ngọc Duy	b51a9c1479	diffcore-pickaxe: support case insensitive match on non-ascii Similar to the "grep -F -i" case, we can't use kws on icase search outside ascii range, so we quote the string and pass it to regcomp as a basic regexp and let regex engine deal with case sensitivity. The new test is put in t7812 instead of t4209-log-pickaxe because lib-gettext.sh might cause problems elsewhere, probably. Noticed-by: Plamen Totev <plamen.totev@abv.bg> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-01 12:44:57 -07:00
Nguyễn Thái Ngọc Duy	18547aacf5	grep/pcre: support utf-8 In the previous change in this function, we add locale support for single-byte encodings only. It looks like pcre only supports utf-* as multibyte encodings, the others are left in the cold (which is fine). We need to enable PCRE_UTF8 so pcre can find character boundary correctly. It's needed for case folding (when --ignore-case is used) or '*', '+' or similar syntax is used. The "has_non_ascii()" check is to be on the conservative side. If there's non-ascii in the pattern, the searched content could still be in utf-8, but we can treat it just like a byte stream and everything should work. If we force utf-8 based on locale only and pcre validates utf-8 and the file content is in non-utf8 encoding, things break. Noticed-by: Plamen Totev <plamen.totev@abv.bg> Helped-by: Plamen Totev <plamen.totev@abv.bg> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-01 12:44:57 -07:00
Nguyễn Thái Ngọc Duy	793dc676e0	grep/icase: avoid kwsset when -F is specified Similar to the previous commit, we can't use kws on icase search outside ascii range. But we can't simply pass the pattern to regcomp/pcre like the previous commit because it may contain regex special characters, so we need to quote the regex first. To avoid misquote traps that could lead to undefined behavior, we always stick to basic regex engine in this case. We don't need fancy features for grepping a literal string anyway. basic_regex_quote_buf() assumes that if the pattern is in a multibyte encoding, ascii chars must be unambiguously encoded as single bytes. This is true at least for UTF-8. For others, let's wait until people yell up. Chances are nobody uses multibyte, non utf-8 charsets anymore. Noticed-by: Plamen Totev <plamen.totev@abv.bg> Helped-by: René Scharfe <l.s.r@web.de> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-01 12:44:30 -07:00
Nguyễn Thái Ngọc Duy	5c1ebcca4d	grep/icase: avoid kwsset on literal non-ascii strings When we detect the pattern is just a literal string, we avoid heavy regex engine and use fast substring search implemented in kwsset.c. But kws uses git-ctype which is locale-independent so it does not know how to fold case properly outside ascii range. Let regcomp or pcre take care of this case instead. Slower, but accurate. Noticed-by: Plamen Totev <plamen.totev@abv.bg> Helped-by: René Scharfe <l.s.r@web.de> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-06-27 07:31:35 -07:00

6 commits