i18n: add infrastructure for translating Git with gettext
Change the skeleton implementation of i18n in Git to one that can show
localized strings to users for our C, Shell and Perl programs using
either GNU libintl or the Solaris gettext implementation.
This new internationalization support is enabled by default. If
gettext isn't available, or if Git is compiled with
NO_GETTEXT=YesPlease, Git falls back on its current behavior of
showing interface messages in English. When using the autoconf script
we'll auto-detect if the gettext libraries are installed and act
appropriately.
This change is somewhat large because as well as adding a C, Shell and
Perl i18n interface we're adding a lot of tests for them, and for
those tests to work we need a skeleton PO file to actually test
translations. A minimal Icelandic translation is included for this
purpose. Icelandic includes multi-byte characters which makes it easy
to test various edge cases, and it's a language I happen to
understand.
The rest of the commit message goes into detail about various
sub-parts of this commit.
= Installation
Gettext .mo files will be installed and looked for in the standard
$(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to
override that, but that's only intended to be used to test Git itself.
= Perl
Perl code that's to be localized should use the new Git::I18n
module. It imports a __ function into the caller's package by default.
Instead of using the high level Locale::TextDomain interface I've
opted to use the low-level (equivalent to the C interface)
Locale::Messages module, which Locale::TextDomain itself uses.
Locale::TextDomain does a lot of redundant work we don't need, and
some of it would potentially introduce bugs. It tries to set the
$TEXTDOMAIN based on package of the caller, and has its own
hardcoded paths where it'll search for messages.
I found it easier just to completely avoid it rather than try to
circumvent its behavior. In any case, this is an issue wholly
internal Git::I18N. Its guts can be changed later if that's deemed
necessary.
See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for
a further elaboration on this topic.
= Shell
Shell code that's to be localized should use the git-sh-i18n
library. It's basically just a wrapper for the system's gettext.sh.
If gettext.sh isn't available we'll fall back on gettext(1) if it's
available. The latter is available without the former on Solaris,
which has its own non-GNU gettext implementation. We also need to
emulate eval_gettext() there.
If neither are present we'll use a dumb printf(1) fall-through
wrapper.
= About libcharset.h and langinfo.h
We use libcharset to query the character set of the current locale if
it's available. I.e. we'll use it instead of nl_langinfo if
HAVE_LIBCHARSET_H is set.
The GNU gettext manual recommends using langinfo.h's
nl_langinfo(CODESET) to acquire the current character set, but on
systems that have libcharset.h's locale_charset() using the latter is
either saner, or the only option on those systems.
GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either,
but MinGW and some others need to use libcharset.h's locale_charset()
instead.
=Credits
This patch is based on work by Jeff Epler <jepler@unpythonic.net> who
did the initial Makefile / C work, and a lot of comments from the Git
mailing list, including Jonathan Nieder, Jakub Narebski, Johannes
Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and
others.
[jc: squashed a small Makefile fix from Ramsay]
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-17 23:14:42 +00:00
|
|
|
|
#!/bin/sh
|
|
|
|
|
#
|
|
|
|
|
# Copyright (c) 2010 Ævar Arnfjörð Bjarmason
|
|
|
|
|
#
|
|
|
|
|
|
|
|
|
|
test_description="Gettext reencoding of our *.po/*.mo files works"
|
|
|
|
|
|
|
|
|
|
. ./lib-gettext.sh
|
|
|
|
|
|
t0204: clarify the "observe undefined behaviour" test
This test asks for an impossible conversion to the system by
preparing an UTF-8 translation with characters that cannot be
expressed in ISO-8859-1, and then asking the message shown in
ISO-8859-1. Even though the behaviour against such a request is
undefined, it may be interesting to see what the system does, and
the purpose of this test is to see if there are platforms that
exhibit behaviour that we haven't seen.
The original recognized two known modes of behaviour:
- the key used to query the message catalog ("TEST: Old English
Runes"), saying "I cannot do that i18n".
- impossible characters replaced with ASCII "?", saying "I punt".
but they were treated totally differently. The test simply issued
an informational message "Your system punts on this one" for the
first error mode, while it diagnosed the latter as "Your system is
good; you pass!".
It turns out that Mac OS X exhibits a third mode of error behaviour,
to spew out the raw value stored in the message catalog. The test
diagnosed this behaviour as "broken", but it is merely trying to do
its best to respond to an impossible request by saying "I punt" in a
way that is slightly different from the second one.
Update the offending test to make it clear what is (and is not)
being tested, update the code structure so that newly discovered
error mode can easily be added to it later, and reword the message
that comes from a failing case to clarify that it is not the system
that is broken when it fails, but merely that the behaviour is not
something we have seen.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-07 23:36:59 +00:00
|
|
|
|
# The constants used in a tricky observation for undefined behaviour
|
|
|
|
|
RUNES="TILRAUN: ᚻᛖ ᚳᚹᚫᚦ ᚦᚫᛏ ᚻᛖ ᛒᚢᛞᛖ ᚩᚾ ᚦᚫᛗ ᛚᚪᚾᛞᛖ ᚾᚩᚱᚦᚹᛖᚪᚱᛞᚢᛗ ᚹᛁᚦ ᚦᚪ ᚹᛖᛥᚫ"
|
|
|
|
|
PUNTS="TILRAUN: ?? ???? ??? ?? ???? ?? ??? ????? ??????????? ??? ?? ????"
|
|
|
|
|
MSGKEY="TEST: Old English Runes"
|
i18n: add infrastructure for translating Git with gettext
Change the skeleton implementation of i18n in Git to one that can show
localized strings to users for our C, Shell and Perl programs using
either GNU libintl or the Solaris gettext implementation.
This new internationalization support is enabled by default. If
gettext isn't available, or if Git is compiled with
NO_GETTEXT=YesPlease, Git falls back on its current behavior of
showing interface messages in English. When using the autoconf script
we'll auto-detect if the gettext libraries are installed and act
appropriately.
This change is somewhat large because as well as adding a C, Shell and
Perl i18n interface we're adding a lot of tests for them, and for
those tests to work we need a skeleton PO file to actually test
translations. A minimal Icelandic translation is included for this
purpose. Icelandic includes multi-byte characters which makes it easy
to test various edge cases, and it's a language I happen to
understand.
The rest of the commit message goes into detail about various
sub-parts of this commit.
= Installation
Gettext .mo files will be installed and looked for in the standard
$(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to
override that, but that's only intended to be used to test Git itself.
= Perl
Perl code that's to be localized should use the new Git::I18n
module. It imports a __ function into the caller's package by default.
Instead of using the high level Locale::TextDomain interface I've
opted to use the low-level (equivalent to the C interface)
Locale::Messages module, which Locale::TextDomain itself uses.
Locale::TextDomain does a lot of redundant work we don't need, and
some of it would potentially introduce bugs. It tries to set the
$TEXTDOMAIN based on package of the caller, and has its own
hardcoded paths where it'll search for messages.
I found it easier just to completely avoid it rather than try to
circumvent its behavior. In any case, this is an issue wholly
internal Git::I18N. Its guts can be changed later if that's deemed
necessary.
See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for
a further elaboration on this topic.
= Shell
Shell code that's to be localized should use the git-sh-i18n
library. It's basically just a wrapper for the system's gettext.sh.
If gettext.sh isn't available we'll fall back on gettext(1) if it's
available. The latter is available without the former on Solaris,
which has its own non-GNU gettext implementation. We also need to
emulate eval_gettext() there.
If neither are present we'll use a dumb printf(1) fall-through
wrapper.
= About libcharset.h and langinfo.h
We use libcharset to query the character set of the current locale if
it's available. I.e. we'll use it instead of nl_langinfo if
HAVE_LIBCHARSET_H is set.
The GNU gettext manual recommends using langinfo.h's
nl_langinfo(CODESET) to acquire the current character set, but on
systems that have libcharset.h's locale_charset() using the latter is
either saner, or the only option on those systems.
GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either,
but MinGW and some others need to use libcharset.h's locale_charset()
instead.
=Credits
This patch is based on work by Jeff Epler <jepler@unpythonic.net> who
did the initial Makefile / C work, and a lot of comments from the Git
mailing list, including Jonathan Nieder, Jakub Narebski, Johannes
Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and
others.
[jc: squashed a small Makefile fix from Ramsay]
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-17 23:14:42 +00:00
|
|
|
|
|
|
|
|
|
test_expect_success GETTEXT_LOCALE 'gettext: Emitting UTF-8 from our UTF-8 *.mo files / Icelandic' '
|
|
|
|
|
printf "TILRAUN: Halló Heimur!" >expect &&
|
|
|
|
|
LANGUAGE=is LC_ALL="$is_IS_locale" gettext "TEST: Hello World!" >actual &&
|
|
|
|
|
test_cmp expect actual
|
|
|
|
|
'
|
|
|
|
|
|
|
|
|
|
test_expect_success GETTEXT_LOCALE 'gettext: Emitting UTF-8 from our UTF-8 *.mo files / Runes' '
|
t0204: clarify the "observe undefined behaviour" test
This test asks for an impossible conversion to the system by
preparing an UTF-8 translation with characters that cannot be
expressed in ISO-8859-1, and then asking the message shown in
ISO-8859-1. Even though the behaviour against such a request is
undefined, it may be interesting to see what the system does, and
the purpose of this test is to see if there are platforms that
exhibit behaviour that we haven't seen.
The original recognized two known modes of behaviour:
- the key used to query the message catalog ("TEST: Old English
Runes"), saying "I cannot do that i18n".
- impossible characters replaced with ASCII "?", saying "I punt".
but they were treated totally differently. The test simply issued
an informational message "Your system punts on this one" for the
first error mode, while it diagnosed the latter as "Your system is
good; you pass!".
It turns out that Mac OS X exhibits a third mode of error behaviour,
to spew out the raw value stored in the message catalog. The test
diagnosed this behaviour as "broken", but it is merely trying to do
its best to respond to an impossible request by saying "I punt" in a
way that is slightly different from the second one.
Update the offending test to make it clear what is (and is not)
being tested, update the code structure so that newly discovered
error mode can easily be added to it later, and reword the message
that comes from a failing case to clarify that it is not the system
that is broken when it fails, but merely that the behaviour is not
something we have seen.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-07 23:36:59 +00:00
|
|
|
|
printf "%s" "$RUNES" >expect &&
|
|
|
|
|
LANGUAGE=is LC_ALL="$is_IS_locale" gettext "$MSGKEY" >actual &&
|
i18n: add infrastructure for translating Git with gettext
Change the skeleton implementation of i18n in Git to one that can show
localized strings to users for our C, Shell and Perl programs using
either GNU libintl or the Solaris gettext implementation.
This new internationalization support is enabled by default. If
gettext isn't available, or if Git is compiled with
NO_GETTEXT=YesPlease, Git falls back on its current behavior of
showing interface messages in English. When using the autoconf script
we'll auto-detect if the gettext libraries are installed and act
appropriately.
This change is somewhat large because as well as adding a C, Shell and
Perl i18n interface we're adding a lot of tests for them, and for
those tests to work we need a skeleton PO file to actually test
translations. A minimal Icelandic translation is included for this
purpose. Icelandic includes multi-byte characters which makes it easy
to test various edge cases, and it's a language I happen to
understand.
The rest of the commit message goes into detail about various
sub-parts of this commit.
= Installation
Gettext .mo files will be installed and looked for in the standard
$(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to
override that, but that's only intended to be used to test Git itself.
= Perl
Perl code that's to be localized should use the new Git::I18n
module. It imports a __ function into the caller's package by default.
Instead of using the high level Locale::TextDomain interface I've
opted to use the low-level (equivalent to the C interface)
Locale::Messages module, which Locale::TextDomain itself uses.
Locale::TextDomain does a lot of redundant work we don't need, and
some of it would potentially introduce bugs. It tries to set the
$TEXTDOMAIN based on package of the caller, and has its own
hardcoded paths where it'll search for messages.
I found it easier just to completely avoid it rather than try to
circumvent its behavior. In any case, this is an issue wholly
internal Git::I18N. Its guts can be changed later if that's deemed
necessary.
See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for
a further elaboration on this topic.
= Shell
Shell code that's to be localized should use the git-sh-i18n
library. It's basically just a wrapper for the system's gettext.sh.
If gettext.sh isn't available we'll fall back on gettext(1) if it's
available. The latter is available without the former on Solaris,
which has its own non-GNU gettext implementation. We also need to
emulate eval_gettext() there.
If neither are present we'll use a dumb printf(1) fall-through
wrapper.
= About libcharset.h and langinfo.h
We use libcharset to query the character set of the current locale if
it's available. I.e. we'll use it instead of nl_langinfo if
HAVE_LIBCHARSET_H is set.
The GNU gettext manual recommends using langinfo.h's
nl_langinfo(CODESET) to acquire the current character set, but on
systems that have libcharset.h's locale_charset() using the latter is
either saner, or the only option on those systems.
GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either,
but MinGW and some others need to use libcharset.h's locale_charset()
instead.
=Credits
This patch is based on work by Jeff Epler <jepler@unpythonic.net> who
did the initial Makefile / C work, and a lot of comments from the Git
mailing list, including Jonathan Nieder, Jakub Narebski, Johannes
Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and
others.
[jc: squashed a small Makefile fix from Ramsay]
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-17 23:14:42 +00:00
|
|
|
|
test_cmp expect actual
|
|
|
|
|
'
|
|
|
|
|
|
|
|
|
|
test_expect_success GETTEXT_ISO_LOCALE 'gettext: Emitting ISO-8859-1 from our UTF-8 *.mo files / Icelandic' '
|
|
|
|
|
printf "TILRAUN: Halló Heimur!" | iconv -f UTF-8 -t ISO8859-1 >expect &&
|
|
|
|
|
LANGUAGE=is LC_ALL="$is_IS_iso_locale" gettext "TEST: Hello World!" >actual &&
|
|
|
|
|
test_cmp expect actual
|
|
|
|
|
'
|
|
|
|
|
|
t0204: clarify the "observe undefined behaviour" test
This test asks for an impossible conversion to the system by
preparing an UTF-8 translation with characters that cannot be
expressed in ISO-8859-1, and then asking the message shown in
ISO-8859-1. Even though the behaviour against such a request is
undefined, it may be interesting to see what the system does, and
the purpose of this test is to see if there are platforms that
exhibit behaviour that we haven't seen.
The original recognized two known modes of behaviour:
- the key used to query the message catalog ("TEST: Old English
Runes"), saying "I cannot do that i18n".
- impossible characters replaced with ASCII "?", saying "I punt".
but they were treated totally differently. The test simply issued
an informational message "Your system punts on this one" for the
first error mode, while it diagnosed the latter as "Your system is
good; you pass!".
It turns out that Mac OS X exhibits a third mode of error behaviour,
to spew out the raw value stored in the message catalog. The test
diagnosed this behaviour as "broken", but it is merely trying to do
its best to respond to an impossible request by saying "I punt" in a
way that is slightly different from the second one.
Update the offending test to make it clear what is (and is not)
being tested, update the code structure so that newly discovered
error mode can easily be added to it later, and reword the message
that comes from a failing case to clarify that it is not the system
that is broken when it fails, but merely that the behaviour is not
something we have seen.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-07 23:36:59 +00:00
|
|
|
|
test_expect_success GETTEXT_ISO_LOCALE 'gettext: impossible ISO-8859-1 output' '
|
|
|
|
|
LANGUAGE=is LC_ALL="$is_IS_iso_locale" gettext "$MSGKEY" >runes &&
|
|
|
|
|
case "$(cat runes)" in
|
|
|
|
|
"$MSGKEY")
|
|
|
|
|
say "Your system gives back the key to message catalog"
|
|
|
|
|
;;
|
|
|
|
|
"$PUNTS")
|
|
|
|
|
say "Your system replaces an impossible character with ?"
|
|
|
|
|
;;
|
|
|
|
|
"$RUNES")
|
|
|
|
|
say "Your system gives back the raw message for an impossible request"
|
|
|
|
|
;;
|
|
|
|
|
*)
|
|
|
|
|
say "We never saw the error behaviour your system exhibits"
|
|
|
|
|
false
|
|
|
|
|
;;
|
|
|
|
|
esac
|
i18n: add infrastructure for translating Git with gettext
Change the skeleton implementation of i18n in Git to one that can show
localized strings to users for our C, Shell and Perl programs using
either GNU libintl or the Solaris gettext implementation.
This new internationalization support is enabled by default. If
gettext isn't available, or if Git is compiled with
NO_GETTEXT=YesPlease, Git falls back on its current behavior of
showing interface messages in English. When using the autoconf script
we'll auto-detect if the gettext libraries are installed and act
appropriately.
This change is somewhat large because as well as adding a C, Shell and
Perl i18n interface we're adding a lot of tests for them, and for
those tests to work we need a skeleton PO file to actually test
translations. A minimal Icelandic translation is included for this
purpose. Icelandic includes multi-byte characters which makes it easy
to test various edge cases, and it's a language I happen to
understand.
The rest of the commit message goes into detail about various
sub-parts of this commit.
= Installation
Gettext .mo files will be installed and looked for in the standard
$(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to
override that, but that's only intended to be used to test Git itself.
= Perl
Perl code that's to be localized should use the new Git::I18n
module. It imports a __ function into the caller's package by default.
Instead of using the high level Locale::TextDomain interface I've
opted to use the low-level (equivalent to the C interface)
Locale::Messages module, which Locale::TextDomain itself uses.
Locale::TextDomain does a lot of redundant work we don't need, and
some of it would potentially introduce bugs. It tries to set the
$TEXTDOMAIN based on package of the caller, and has its own
hardcoded paths where it'll search for messages.
I found it easier just to completely avoid it rather than try to
circumvent its behavior. In any case, this is an issue wholly
internal Git::I18N. Its guts can be changed later if that's deemed
necessary.
See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for
a further elaboration on this topic.
= Shell
Shell code that's to be localized should use the git-sh-i18n
library. It's basically just a wrapper for the system's gettext.sh.
If gettext.sh isn't available we'll fall back on gettext(1) if it's
available. The latter is available without the former on Solaris,
which has its own non-GNU gettext implementation. We also need to
emulate eval_gettext() there.
If neither are present we'll use a dumb printf(1) fall-through
wrapper.
= About libcharset.h and langinfo.h
We use libcharset to query the character set of the current locale if
it's available. I.e. we'll use it instead of nl_langinfo if
HAVE_LIBCHARSET_H is set.
The GNU gettext manual recommends using langinfo.h's
nl_langinfo(CODESET) to acquire the current character set, but on
systems that have libcharset.h's locale_charset() using the latter is
either saner, or the only option on those systems.
GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either,
but MinGW and some others need to use libcharset.h's locale_charset()
instead.
=Credits
This patch is based on work by Jeff Epler <jepler@unpythonic.net> who
did the initial Makefile / C work, and a lot of comments from the Git
mailing list, including Jonathan Nieder, Jakub Narebski, Johannes
Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and
others.
[jc: squashed a small Makefile fix from Ramsay]
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-17 23:14:42 +00:00
|
|
|
|
'
|
|
|
|
|
|
|
|
|
|
test_expect_success GETTEXT_LOCALE 'gettext: Fetching a UTF-8 msgid -> UTF-8' '
|
|
|
|
|
printf "TILRAUN: ‚einfaldar‘ og „tvöfaldar“ gæsalappir" >expect &&
|
|
|
|
|
LANGUAGE=is LC_ALL="$is_IS_locale" gettext "TEST: ‘single’ and “double” quotes" >actual &&
|
|
|
|
|
test_cmp expect actual
|
|
|
|
|
'
|
|
|
|
|
|
|
|
|
|
# How these quotes get transliterated depends on the gettext implementation:
|
|
|
|
|
#
|
|
|
|
|
# Debian: ,einfaldar' og ,,tvöfaldar" [GNU libintl]
|
|
|
|
|
# FreeBSD: `einfaldar` og "tvöfaldar" [GNU libintl]
|
|
|
|
|
# Solaris: ?einfaldar? og ?tvöfaldar? [Solaris libintl]
|
|
|
|
|
#
|
|
|
|
|
# Just make sure the contents are transliterated, and don't use grep -q
|
|
|
|
|
# so that these differences are emitted under --verbose for curious
|
|
|
|
|
# eyes.
|
|
|
|
|
test_expect_success GETTEXT_ISO_LOCALE 'gettext: Fetching a UTF-8 msgid -> ISO-8859-1' '
|
|
|
|
|
LANGUAGE=is LC_ALL="$is_IS_iso_locale" gettext "TEST: ‘single’ and “double” quotes" >actual &&
|
|
|
|
|
grep "einfaldar" actual &&
|
|
|
|
|
grep "$(echo tvöfaldar | iconv -f UTF-8 -t ISO8859-1)" actual
|
|
|
|
|
'
|
|
|
|
|
|
|
|
|
|
test_expect_success GETTEXT_LOCALE 'gettext.c: git init UTF-8 -> UTF-8' '
|
|
|
|
|
printf "Bjó til tóma Git lind" >expect &&
|
|
|
|
|
LANGUAGE=is LC_ALL="$is_IS_locale" git init repo >actual &&
|
|
|
|
|
test_when_finished "rm -rf repo" &&
|
|
|
|
|
grep "^$(cat expect) " actual
|
|
|
|
|
'
|
|
|
|
|
|
|
|
|
|
test_expect_success GETTEXT_ISO_LOCALE 'gettext.c: git init UTF-8 -> ISO-8859-1' '
|
|
|
|
|
printf "Bjó til tóma Git lind" >expect &&
|
|
|
|
|
LANGUAGE=is LC_ALL="$is_IS_iso_locale" git init repo >actual &&
|
|
|
|
|
test_when_finished "rm -rf repo" &&
|
|
|
|
|
grep "^$(cat expect | iconv -f UTF-8 -t ISO8859-1) " actual
|
|
|
|
|
'
|
|
|
|
|
|
|
|
|
|
test_done
|