This removes the ability to inject "poison" gettext() messages via the
GIT_TEST_GETTEXT_POISON special test setup.
I initially added this as a compile-time option in bb946bba76 (i18n:
add GETTEXT_POISON to simulate unfriendly translator, 2011-02-22), and
most recently modified to be toggleable at runtime in
6cdccfce1e (i18n: make GETTEXT_POISON a runtime option, 2018-11-08)..
The reason for its removal is that the trade-off of maintaining it
v.s. what it's getting us has long since flipped. When gettext was
integrated in 5e9637c629 (i18n: add infrastructure for translating
Git with gettext, 2011-11-18) there was understandable concern on the
Git ML that in marking messages for translation en-masse we'd
inadvertently mark plumbing messages. The GETTEXT_POISON facility was
a way to smoke those out via our test suite.
Nowadays however we're done (or almost entirely done) with any marking
of messages for translation. New messages are usually marked by their
authors, who'll know whether it makes sense to translate them or
not. If not any errors in marking the messages are much more likely to
be spotted in review than in the the initial deluge of i18n patches in
the 2011-2012 era.
So let's just remove this. This leaves the test suite in a state where
we still have a lot of test_i18n, C_LOCALE_OUTPUT
etc. uses. Subsequent commits will remove those too.
The change to t/lib-rebase.sh is a selective revert of the relevant
part of f2d17068fd (i18n: rebase-interactive: mark comments of squash
for translation, 2016-06-17), and the comment in
t/t3406-rebase-message.sh is from c7108bf9ed (i18n: rebase: mark
messages for translation, 2012-07-25).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There has been a push to remove extern from function declarations.
Remove some instances of "extern" for function declarations which are
caught by Coccinelle. Note that Coccinelle has some difficulty with
processing functions with `__attribute__` or varargs so some `extern`
declarations are left behind to be dealt with in a future patch.
This was the Coccinelle patch used:
@@
type T;
identifier f;
@@
- extern
T f(...);
and it was run with:
$ git ls-files \*.{c,h} |
grep -v ^compat/ |
xargs spatch --sp-file contrib/coccinelle/noextern.cocci --in-place
Files under `compat/` are intentionally excluded as some are directly
copied from external sources and we should avoid churning them as much
as possible.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the GETTEXT_POISON compile-time + runtime GIT_GETTEXT_POISON
test parameter to only be a GIT_TEST_GETTEXT_POISON=<non-empty?>
runtime parameter, to be consistent with other parameters documented
in "Running tests with special setups" in t/README.
When I added GETTEXT_POISON in bb946bba76 ("i18n: add GETTEXT_POISON
to simulate unfriendly translator", 2011-02-22) I was concerned with
ensuring that the _() function would get constant folded if NO_GETTEXT
was defined, and likewise that GETTEXT_POISON would be compiled out
unless it was defined.
But as the benchmark in my [1] shows doing a one-off runtime
getenv("GIT_TEST_[...]") is trivial, and since GETTEXT_POISON was
originally added the GIT_TEST_* env variables have become the common
idiom for turning on special test setups.
So change GETTEXT_POISON to work the same way. Now the
GETTEXT_POISON=YesPlease compile-time option is gone, and running the
tests with GIT_TEST_GETTEXT_POISON=[YesPlease|] can be toggled on/off
without recompiling.
This allows for conditionally amending tests to test with/without
poison, similar to what 859fdc0c3c ("commit-graph: define
GIT_TEST_COMMIT_GRAPH", 2018-08-29) did for GIT_TEST_COMMIT_GRAPH. Do
some of that, now we e.g. always run the t0205-gettext-poison.sh test.
I did enough there to remove the GETTEXT_POISON prerequisite, but its
inverse C_LOCALE_OUTPUT is still around, and surely some tests using
it can be converted to e.g. always set GIT_TEST_GETTEXT_POISON=.
Notes on the implementation:
* We still compile a dedicated GETTEXT_POISON build in Travis
CI. Perhaps this should be revisited and integrated into the
"linux-gcc" build, see ae59a4e44f ("travis: run tests with
GIT_TEST_SPLIT_INDEX", 2018-01-07) for prior art in that area. Then
again maybe not, see [2].
* We now skip a test in t0000-basic.sh under
GIT_TEST_GETTEXT_POISON=YesPlease that wasn't skipped before. This
test relies on C locale output, but due to an edge case in how the
previous implementation of GETTEXT_POISON worked (reading it from
GIT-BUILD-OPTIONS) wasn't enabling poison correctly. Now it does,
and needs to be skipped.
* The getenv() function is not reentrant, so out of paranoia about
code of the form:
printf(_("%s"), getenv("some-env"));
call use_gettext_poison() in our early setup in git_setup_gettext()
so we populate the "poison_requested" variable in a codepath that's
won't suffer from that race condition.
* We error out in the Makefile if you're still saying
GETTEXT_POISON=YesPlease to prompt users to change their
invocation.
* We should not print out poisoned messages during the test
initialization itself to keep it more readable, so the test library
hides the variable if set in $GIT_TEST_GETTEXT_POISON_ORIG during
setup. See [3].
See also [4] for more on the motivation behind this patch, and the
history of the GETTEXT_POISON facility.
1. https://public-inbox.org/git/871s8gd32p.fsf@evledraar.gmail.com/
2. https://public-inbox.org/git/20181102163725.GY30222@szeder.dev/
3. https://public-inbox.org/git/20181022202241.18629-2-szeder.dev@gmail.com/
4. https://public-inbox.org/git/878t2pd6yu.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function returns true if git is running under an UTF-8
locale. pcre in the next patch will need this.
is_encoding_utf8() is used instead of strcmp() to catch both "utf-8"
and "utf8" suffixes.
When built with no gettext support, we peek in several env variables
to detect UTF-8. pcre library might support utf-8 even if libc is
built without locale support.. The peeking code is a copy from
compat/regex/regcomp.c
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Calling setlocale(LC_MESSAGES, ...) directly from http.c, without
including <locale.h>, was causing compilation warnings. Move the
helper function to gettext.c that already includes the header and
where locale-related issues are handled.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The gettext N_ macro is used to mark strings for translation
without actually translating them. At runtime the string is
expected to be passed to the gettext API for translation.
If two N_ macro invocations appear next to each other with only
whitespace (or nothing at all) between them, the two separate
strings will be marked for translation, but the preprocessor
will then silently combine the strings into one and at runtime
the string passed to gettext will not match the strings that
were translated so no translation will actually occur.
Avoid this by adding parentheses around the expansion of the
N_ macro so that instead of ending up with two adjacent strings
that are then combined by the preprocessor, two adjacent strings
surrounded by parentheses result instead which causes a compile
error so the mistake can be quickly found and corrected.
However, since these string literals are typically assigned to
static variables and not all compilers support parenthesized
string literal assignments, allow this to be controlled by the
Makefile with the default only enabled when the compiler is
known to support the syntax.
For now only __GNUC__ enables this by default which covers both
gcc and clang which should result in early detection of any
adjacent N_ macros.
Although the necessary tests make the affected files a bit less
elegant, the benefit of avoiding propagation of a translation-
marking error to all the translation teams thus creating extra
work for them when the error is eventually detected and fixed
would seem to outweigh the minor inelegance the additional
configuration tests introduce.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The status report from "git fetch", when messages like 'up-to-date'
are translated, did not align the branch names well.
* nd/fetch-status-alignment:
fetch: align per-ref summary report in UTF-8 locales
fetch does printf("%-*s", width, "foo") where "foo" can be a utf-8
string, but width is in bytes, not columns. For ASCII it's fine as one
byte takes one column. For utf-8, this may result in misaligned ref
summary table.
Introduce gettext_width() function that returns the string length in
columns (currently only supports utf-8 locales). Make the code use
TRANSPORT_SUMMARY(x) where the length is compensated properly in
non-English locales.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The gettext .po files have a header, but it looks like the
translation specification for an empty string. This results in
_("") actually returning that header.
Check the input to _() and do not call gettext() on an empty string;
in some places, we run _(opts->help) where opts->help may be empty.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the skeleton implementation of i18n in Git to one that can show
localized strings to users for our C, Shell and Perl programs using
either GNU libintl or the Solaris gettext implementation.
This new internationalization support is enabled by default. If
gettext isn't available, or if Git is compiled with
NO_GETTEXT=YesPlease, Git falls back on its current behavior of
showing interface messages in English. When using the autoconf script
we'll auto-detect if the gettext libraries are installed and act
appropriately.
This change is somewhat large because as well as adding a C, Shell and
Perl i18n interface we're adding a lot of tests for them, and for
those tests to work we need a skeleton PO file to actually test
translations. A minimal Icelandic translation is included for this
purpose. Icelandic includes multi-byte characters which makes it easy
to test various edge cases, and it's a language I happen to
understand.
The rest of the commit message goes into detail about various
sub-parts of this commit.
= Installation
Gettext .mo files will be installed and looked for in the standard
$(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to
override that, but that's only intended to be used to test Git itself.
= Perl
Perl code that's to be localized should use the new Git::I18n
module. It imports a __ function into the caller's package by default.
Instead of using the high level Locale::TextDomain interface I've
opted to use the low-level (equivalent to the C interface)
Locale::Messages module, which Locale::TextDomain itself uses.
Locale::TextDomain does a lot of redundant work we don't need, and
some of it would potentially introduce bugs. It tries to set the
$TEXTDOMAIN based on package of the caller, and has its own
hardcoded paths where it'll search for messages.
I found it easier just to completely avoid it rather than try to
circumvent its behavior. In any case, this is an issue wholly
internal Git::I18N. Its guts can be changed later if that's deemed
necessary.
See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for
a further elaboration on this topic.
= Shell
Shell code that's to be localized should use the git-sh-i18n
library. It's basically just a wrapper for the system's gettext.sh.
If gettext.sh isn't available we'll fall back on gettext(1) if it's
available. The latter is available without the former on Solaris,
which has its own non-GNU gettext implementation. We also need to
emulate eval_gettext() there.
If neither are present we'll use a dumb printf(1) fall-through
wrapper.
= About libcharset.h and langinfo.h
We use libcharset to query the character set of the current locale if
it's available. I.e. we'll use it instead of nl_langinfo if
HAVE_LIBCHARSET_H is set.
The GNU gettext manual recommends using langinfo.h's
nl_langinfo(CODESET) to acquire the current character set, but on
systems that have libcharset.h's locale_charset() using the latter is
either saner, or the only option on those systems.
GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either,
but MinGW and some others need to use libcharset.h's locale_charset()
instead.
=Credits
This patch is based on work by Jeff Epler <jepler@unpythonic.net> who
did the initial Makefile / C work, and a lot of comments from the Git
mailing list, including Jonathan Nieder, Jakub Narebski, Johannes
Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and
others.
[jc: squashed a small Makefile fix from Ramsay]
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The syntax
static const char ignore_error[] = ("something");
is invalid C. A parenthesized string is not allowed as an array
initializer.
Some compilers, for example GCC and MSVC, allow this syntax as an
extension, but it is not a portable construct. tcc does not parse it, for
example.
Remove the parenthesis from the definition of the N_() macro to
fix this.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Q_ function translates a string representing some pharse with an
alternative plural form and uses the 'count' argument to choose which
form to return. Use of Q_ solves the "%d noun(s)" problem in a way
that is portable to languages outside the Germanic and Romance
families.
In English, the semantics of Q_(sing, plur, count) are roughly
equivalent to
count == 1 ? _(sing) : _(plur)
while in other languages there can be more variants (count == 0; more
random-looking rules based on the historical pronunciation of the
number). Behind the scenes, the singular form is used to look up a
family of translations and the plural form is ignored unless no
translation is available.
Define such a Q_ in gettext.h with the English semantics so C code can
start using it to mark phrases with a count for translation.
The name "Q_" is taken from subversion and stands for "quantity".
Many projects just use ngettext directly without a wrapper analogous
to _; we should not do so because git's gettext.h is meant not to
conflict with system headers that might include libintl.h.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tweak the GETTEXT_POISON facility so it is activated at run time
instead of compile time. If the GIT_GETTEXT_POISON environment
variable is set, _(msg) will result in gibberish as before; but if the
GIT_GETTEXT_POISON variable is not set, it will return the message for
human-readable output. So the behavior of mistranslated and
untranslated git can be compared without rebuilding git in between.
For simplicity we always set the GIT_GETTEXT_POISON variable in tests.
This does not affect builds without the GETTEXT_POISON compile-time
option set, so non-i18n git will not be slowed down.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new GETTEXT_POISON compile-time parameter to make _(msg) always
return gibberish. So now you can run
make GETTEXT_POISON=YesPlease
to get a copy of git that functions correctly (one hopes) but produces
output that is in nobody's native language at all.
This is a debugging aid for people who are working on the i18n part of
the system, to make sure that they are not marking plumbing messages
that should never be translated with _().
As new strings get marked for translation, naturally a number of tests
will be broken in this mode. Tests that depend on output from
Porcelain will need to be marked with the new C_LOCALE_OUTPUT test
prerequisite. Newly failing tests that do not depend on output from
Porcelain would be bugs due to messages that should not have been
marked for translation.
Note that the string we're using ("# GETTEXT POISON #") intentionally
starts the pound sign. Some of Git's tests such as
t3404-rebase-interactive.sh rely on interactive editing with a fake
editor, and will needlessly break if the message doesn't start with
something the interactive editor considers a comment.
A future patch will fix fix the underlying cause of that issue by
adding "#" characters to the commit advice automatically.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The _ function is for translating strings into the user's chosen
language. The N_ macro just marks translatable strings for the
xgettext(1) tool without translating them; it is intended for use in
contexts where a function call cannot be used. So, for example:
fprintf(stderr, _("Expansion of alias '%s' failed; "
"'%s' is not a git command\n"),
cmd, argv[0]);
and
const char *unpack_plumbing_errors[NB_UNPACK_TREES_ERROR_TYPES] = {
/* ERROR_WOULD_OVERWRITE */
N_("Entry '%s' would be overwritten by merge. Cannot merge."),
[...]
Define such _ and N_ in a new gettext.h and include it in cache.h, so
they can be used everywhere. Each just returns its argument for now.
_ is a function rather than a macro like N_ to avoid the temptation to
use _("foo") as a string literal (which would be a compile-time error
once _(s) expands to an expression for the translation of s).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>