git/t/t4210-log-i18n.sh
Ævar Arnfjörð Bjarmason 44570188a0 grep: don't use PCRE2?_UTF8 with "log --encoding=<non-utf8>"
Fix a bug introduced in 18547aacf5 ("grep/pcre: support utf-8",
2016-06-25) that was missed due to a blindspot in our tests, as
discussed in the previous commit. I then blindly copied the same bug
in 94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) when
adding the PCRE v2 code.

We should not tell PCRE that we're processing UTF-8 just because we're
dealing with non-ASCII. In the case of e.g. "log --encoding=<...>"
under is_utf8_locale() the haystack might be in ISO-8859-1, and the
needle might be in a non-UTF-8 encoding.

Maybe we should be more strict here and die earlier? Should we also be
converting the needle to the encoding in question, and failing if it's
not a string that's valid in that encoding? Maybe.

But for now matching this as non-UTF8 at least has some hope of
producing sensible results, since we know that our default heuristic
of assuming the text to be matched is in the user locale encoding
isn't true when we've explicitly encoded it to be in a different
encoding.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-28 09:11:09 -07:00

94 lines
2.5 KiB
Bash
Executable file

#!/bin/sh
test_description='test log with i18n features'
. ./lib-gettext.sh
# two forms of é
utf8_e=$(printf '\303\251')
latin1_e=$(printf '\351')
# invalid UTF-8
invalid_e=$(printf '\303\50)') # ")" at end to close opening "("
test_expect_success 'create commits in different encodings' '
test_tick &&
cat >msg <<-EOF &&
utf8
t${utf8_e}st
EOF
git add msg &&
git -c i18n.commitencoding=utf8 commit -F msg &&
cat >msg <<-EOF &&
latin1
t${latin1_e}st
EOF
git add msg &&
git -c i18n.commitencoding=ISO-8859-1 commit -F msg
'
test_expect_success 'log --grep searches in log output encoding (utf8)' '
cat >expect <<-\EOF &&
latin1
utf8
EOF
git log --encoding=utf8 --format=%s --grep=$utf8_e >actual &&
test_cmp expect actual
'
test_expect_success !MINGW 'log --grep searches in log output encoding (latin1)' '
cat >expect <<-\EOF &&
latin1
utf8
EOF
git log --encoding=ISO-8859-1 --format=%s --grep=$latin1_e >actual &&
test_cmp expect actual
'
test_expect_success !MINGW 'log --grep does not find non-reencoded values (utf8)' '
git log --encoding=utf8 --format=%s --grep=$latin1_e >actual &&
test_must_be_empty actual
'
test_expect_success 'log --grep does not find non-reencoded values (latin1)' '
git log --encoding=ISO-8859-1 --format=%s --grep=$utf8_e >actual &&
test_must_be_empty actual
'
for engine in fixed basic extended perl
do
prereq=
if test $engine = "perl"
then
prereq="PCRE"
else
prereq=""
fi
force_regex=
if test $engine != "fixed"
then
force_regex=.*
fi
test_expect_success GETTEXT_LOCALE,$prereq "-c grep.patternType=$engine log --grep does not find non-reencoded values (latin1 + locale)" "
cat >expect <<-\EOF &&
latin1
utf8
EOF
LC_ALL=\"$is_IS_locale\" git -c grep.patternType=$engine log --encoding=ISO-8859-1 --format=%s --grep=\"$force_regex$latin1_e\" >actual &&
test_cmp expect actual
"
test_expect_success GETTEXT_LOCALE,$prereq "-c grep.patternType=$engine log --grep does not find non-reencoded values (latin1 + locale)" "
LC_ALL=\"$is_IS_locale\" git -c grep.patternType=$engine log --encoding=ISO-8859-1 --format=%s --grep=\"$force_regex$utf8_e\" >actual &&
test_must_be_empty actual
"
test_expect_success GETTEXT_LOCALE,$prereq "-c grep.patternType=$engine log --grep does not die on invalid UTF-8 value (latin1 + locale + invalid needle)" "
LC_ALL=\"$is_IS_locale\" git -c grep.patternType=$engine log --encoding=ISO-8859-1 --format=%s --grep=\"$force_regex$invalid_e\" >actual &&
test_must_be_empty actual
"
done
test_done