gettext - Multilingual internationalization services
Source code: Lib/gettext.py
The gettext module provides internationalization (I18N) and localization (L10N) services for your Python modules and applications. It supports both the GNU gettext message catalog API and a higher level, class-based API that may be more appropriate for Python files. The interface described below allows you to write your module and application messages in one natural language, and provide a catalog of translated messages for running under different natural languages.
Some hints on localizing your Python modules and applications are also given.
GNU gettext API
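The gettext module defines the following API, which is very similar to the GNU gettext API. If you use this API you will affect the translation of your entire application globally. Often, this is what you want if your application uses one language at a time, with the choice of language dependent on the locale of your user. If you are localizing a Python module, or if your application needs to switch languages on the fly, you probably want to use the class-based API instead.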
gettext.bindtextdomain(domain, localedir=None)
Bind the domain to the locale directory localedir. More concretely, gettext will look for binary .mo files for the given domain using the path (on Unix): localedir/language/LC_MESSAGES/domain.mo, where language is searched for in the environment variables LANGUAGE, LC_ALL, LC_MESSAGES, and LANG respectively.
If localedir is omitted or None, then the current binding for domain is returned. [1]
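The set-versus-query behaviour described above can be sketched as follows. The domain name "demo_domain" and the temporary directory are assumptions made only for this illustration; no catalog needs to exist for the binding to be recorded.

```python
import gettext
import tempfile

# "demo_domain" is a hypothetical domain name used only for this sketch.
with tempfile.TemporaryDirectory() as localedir:
    # Passing localedir records the binding and returns it.
    bound = gettext.bindtextdomain("demo_domain", localedir)
    # Omitting localedir only queries the current binding.
    queried = gettext.bindtextdomain("demo_domain")
    same = (bound == queried == localedir)

print(same)
```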
gettext.bind_textdomain_codeset(domain, codeset=None)
Bind the domain to codeset, changing the encoding of byte strings returned by the lgettext(), ldgettext(), lngettext() and ldngettext() functions. If codeset is omitted, then the current binding is returned.
gettext.textdomain(domain=None)
Change or query the current global domain. If domain is None, then the current global domain is returned, otherwise the global domain is set to domain, which is returned.
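A minimal sketch of querying and changing the global domain. The name "demo_domain" is hypothetical, and the previous domain is restored at the end so the sketch leaves the global state as it found it.

```python
import gettext

previous = gettext.textdomain()               # query only: domain is None
returned = gettext.textdomain("demo_domain")  # set; the new domain is returned
current = gettext.textdomain()                # query again

gettext.textdomain(previous)                  # restore the earlier global domain
print(returned, current)
```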
gettext.gettext(message)
Return the localized translation of message, based on the current global domain, language, and locale directory. This function is usually aliased as _() in the local namespace (see examples below).
gettext.dgettext(domain, message)
Like gettext(), but look the message up in the specified domain.
gettext.ngettext(singular, plural, n)
Like gettext(), but consider plural forms. If a translation is found, apply the plural formula to n, and return the resulting message (some languages have more than two plural forms). If no translation is found, return singular if n is 1; return plural otherwise.
The plural formula is taken from the catalog header. It is a C or Python expression that has a free variable n; the expression evaluates to the index of the plural in the catalog. See the GNU gettext documentation for the precise syntax to be used in .po files and the formulas for a variety of languages.
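To make the idea of a plural formula concrete, here is a sketch of a three-form rule written out as a plain Python function. It mirrors the formula commonly listed for Polish (nplurals=3; plural=n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2); the function name and the word list are assumptions for illustration, not part of the gettext module.

```python
def polish_plural_index(n):
    """Return the index of the plural form to use for the count n."""
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


# Illustrative forms of the word "file"; a real catalog stores these per message.
forms = ["plik", "pliki", "plików"]

for count in (1, 2, 5, 12, 22):
    print(count, forms[polish_plural_index(count)])
```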
gettext.dngettext(domain, singular, plural, n)
Like ngettext(), but look the message up in the specified domain.
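The fallback rule (singular when n is 1, plural otherwise) can be observed by looking a message up in a domain that has no catalog. This sketch assumes that no catalog named "no_such_domain_for_demo" is installed in the default locale directory; the helper name describe() is hypothetical.

```python
import gettext

# Hypothetical domain; the sketch assumes no .mo file exists for it.
DOMAIN = "no_such_domain_for_demo"


def describe(n):
    # With no translation available, dngettext() returns the singular
    # string when n == 1 and the plural string otherwise.
    return gettext.dngettext(DOMAIN, "%d file", "%d files", n) % n


for count in (0, 1, 3):
    print(describe(count))
```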
gettext.lgettext(message)
gettext.ldgettext(domain, message)
gettext.lngettext(singular, plural, n)
gettext.ldngettext(domain, singular, plural, n)
Equivalent to the corresponding functions without the l prefix (gettext(), dgettext(), ngettext() and dngettext()), but the translation is returned as a byte string encoded in the preferred system encoding if no other encoding was explicitly set with bind_textdomain_codeset().
Warning
These functions should be avoided in Python 3, because they return encoded bytes. It's much better to use alternatives which return Unicode strings instead, since most Python applications will want to manipulate human readable text as strings instead of bytes. Further, it's possible that you may get unexpected Unicode-related exceptions if there are encoding problems with the translated strings. It is possible that the l*() functions will be deprecated in future Python versions due to their inherent problems and limitations.
Note that GNU gettext also defines a dcgettext() method, but this was deemed not useful and so it is currently unimplemented.
Here's an example of typical usage for this API:
    import gettext
    gettext.bindtextdomain('myapplication', '/path/to/my/language/directory')
    gettext.textdomain('myapplication')
    _ = gettext.gettext
    # ...
    print(_('This is a translatable string.'))
Class-based API
The class-based API of the gettext module gives you more flexibility and greater convenience than the GNU gettext API. It is the recommended way of localizing your Python applications and modules. gettext defines a GNUTranslations class which implements the parsing of GNU .mo format files, and has methods for returning strings. Instances of this class can also install themselves in the built-in namespace as the function _().
gettext.find(domain, localedir=None, languages=None, all=False)
This function implements the standard .mo file search algorithm. It takes a domain, identical to what textdomain() takes. Optional localedir is as in bindtextdomain(). Optional languages is a list of strings, where each string is a language code.
If localedir is not given, then the default system locale directory is used. [2] If languages is not given, then the following environment variables are searched: LANGUAGE, LC_ALL, LC_MESSAGES, and LANG. The first one returning a non-empty value is used for the languages variable. The environment variables should contain a colon separated list of languages, which will be split on the colon to produce the expected list of language code strings.
find() then expands and normalizes the languages, and then iterates through them, searching for an existing file built of these components:
localedir/language/LC_MESSAGES/domain.mo
The first such file name that exists is returned by find(). If no such file is found, then None is returned. If all is given, it returns a list of all file names, in the order in which they appear in the languages list or the environment variables.
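The search can be tried out against a throwaway directory tree. In this sketch the domain "demo", the language codes and the empty placeholder file are all assumptions; find() only checks that a file exists at the expected path, so the placeholder does not need to be a valid catalog.

```python
import gettext
import os
import tempfile
from unittest import mock

with tempfile.TemporaryDirectory() as localedir:
    # Lay out localedir/fr/LC_MESSAGES/demo.mo as an empty placeholder file.
    mo_dir = os.path.join(localedir, "fr", "LC_MESSAGES")
    os.makedirs(mo_dir)
    expected = os.path.join(mo_dir, "demo.mo")
    open(expected, "wb").close()

    # "de" has no catalog here, so the search moves on to "fr".
    found = gettext.find("demo", localedir, languages=["de", "fr"])
    missing = gettext.find("demo", localedir, languages=["de"])
    all_found = gettext.find("demo", localedir, languages=["de", "fr"], all=True)

    # Without languages, the colon separated LANGUAGE variable is consulted first.
    with mock.patch.dict(os.environ, {"LANGUAGE": "de:fr"}):
        from_env = gettext.find("demo", localedir)

print(found == expected, missing, all_found == [expected], from_env == expected)
```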
gettext.translation(domain, localedir=None, languages=None, class_=None, fallback=False, codeset=None)
Return a *Translations instance based on the domain, localedir, and languages, which are first passed to find() to get a list of the associated .mo file paths. Instances with identical .mo file names are cached. The actual class instantiated is class_ if provided, otherwise GNUTranslations. The class's constructor must take a single file object argument. If provided, codeset will change the charset used to encode translated strings in the lgettext() and lngettext() methods.
If multiple files are found, later files are used as fallbacks for earlier ones. To allow setting the fallback, copy.copy() is used to clone each translation object from the cache; the actual instance data is still shared with the cache.
If no .mo file is found, this function raises OSError if fallback is false (which is the default), and returns a NullTranslations instance if fallback is true.
Changed in version 3.3: IOError used to be raised instead of OSError.
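The effect of the fallback argument can be sketched with an empty locale directory, so that no .mo file can be found. The domain "demo" and the language code are assumptions for illustration.

```python
import gettext
import tempfile

with tempfile.TemporaryDirectory() as empty_localedir:
    # fallback is false by default, so a missing catalog raises OSError.
    try:
        gettext.translation("demo", empty_localedir, languages=["fr"])
    except OSError:
        raised = True
    else:
        raised = False

    # With fallback=True a NullTranslations instance is returned instead.
    fallback = gettext.translation(
        "demo", empty_localedir, languages=["fr"], fallback=True)

print(raised, type(fallback).__name__, fallback.gettext("Hello"))
```

Passing fallback=True is a convenient way to keep a program usable when its catalogs are not installed, since the untranslated message ids are returned unchanged.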
gettext.install(domain, localedir=None, codeset=None, names=None)
This installs the function _() in Python's builtins namespace, based on domain, localedir, and codeset which are passed to the function translation().
For the names parameter, please see the description of the translation object's install() method.
As seen below, you usually mark the strings in your application that are candidates for translation, by wrapping them in a call to the _() function, like this:
    print(_('This string will be translated.'))
For convenience, you want the _() function to be installed in Python's builtins namespace, so it is easily accessible in all modules of your application.
The NullTranslations class
Translation classes are what actually implement the translation of original source file message strings to translated message strings. The base class used by all translation classes is NullTranslations; this provides the basic interface you can use to write your own specialized translation classes. Here are the methods of NullTranslations:
class gettext.NullTranslations(fp=None)
Takes an optional file object fp, which is ignored by the base class. Initializes "protected" instance variables _info and _charset which are set by derived classes, as well as _fallback, which is set through add_fallback(). It then calls self._parse(fp) if fp is not None.
_parse(fp)
No-op in the base class, this method takes file object fp, and reads the data from the file, initializing its message catalog. If you have an unsupported message catalog file format, you should override this method to parse your format.
add_fallback(fallback)
Add fallback as the fallback object for the current translation object. A translation object should consult the fallback if it cannot provide a translation for a given message.
gettext(message)
If a fallback has been set, forward gettext() to the fallback. Otherwise, return message. Overridden in derived classes.
ngettext(singular, plural, n)
If a fallback has been set, forward ngettext() to the fallback. Otherwise, return singular if n is 1; return plural otherwise. Overridden in derived classes.
lgettext(message)
lngettext(singular, plural, n)
Equivalent to gettext() and ngettext(), but the translation is returned as a byte string encoded in the preferred system encoding if no encoding was explicitly set with set_output_charset(). Overridden in derived classes.
Warning
These methods should be avoided in Python 3. See the warning for the lgettext() function.
info()
Return the "protected" _info variable, a dictionary containing the metadata found in the message catalog file.
charset()
Return the encoding of the message catalog file.
output_charset()
Return the encoding used to return translated messages in lgettext() and lngettext().
set_output_charset(charset)
Change the encoding used to return translated messages.
install(names=None)
This method installs gettext() into the built-in namespace, binding it to _.
If the names parameter is given, it must be a sequence containing the names of functions you want to install in the builtins namespace in addition to _(). Supported names are 'gettext', 'ngettext', 'lgettext' and 'lngettext'.
Note that this is only one way, albeit the most convenient way, to make the _() function available to your application. Because it affects the entire application globally, and specifically the built-in namespace, localized modules should never install _(). Instead, they should use this code to make _() available to their module:
    import gettext
    t = gettext.translation('mymodule', ...)
    _ = t.gettext
This puts _() only in the module's global namespace and so only affects calls within this module.
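As a sketch of how the base-class interface described in this section can be extended, the following hypothetical subclass keeps its catalog in a dictionary and relies on NullTranslations for the fallback handling. The class name DictTranslations and its constructor argument are assumptions for illustration, not part of the gettext module.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical translation class backed by an in-memory dictionary."""

    def __init__(self, catalog):
        super().__init__()  # fp is None, so _parse() is not called
        self._catalog = dict(catalog)

    def gettext(self, message):
        try:
            return self._catalog[message]
        except KeyError:
            # The base class forwards to the fallback if one has been set,
            # and otherwise returns the message unchanged.
            return super().gettext(message)


primary = DictTranslations({"Hello": "Bonjour"})
secondary = DictTranslations({"Goodbye": "Au revoir"})
primary.add_fallback(secondary)

print(primary.gettext("Hello"))    # found in the primary catalog
print(primary.gettext("Goodbye"))  # found through the fallback
print(primary.gettext("Thanks"))   # not found anywhere, returned unchanged
```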
The GNUTranslations class
The gettext module provides one additional class derived from NullTranslations: GNUTranslations. This class overrides _parse() to enable reading GNU gettext format .mo files in both big-endian and little-endian format.
GNUTranslations parses optional metadata out of the translation catalog. It is a convention with GNU gettext to include metadata as the translation for the empty string. This metadata is in RFC 822-style key: value pairs, and should contain the Project-Id-Version key. If the key Content-Type is found, then the charset property is used to initialize the "protected" _charset instance variable, defaulting to None if not found. If the charset encoding is specified, then all message ids and message strings read from the catalog are converted to Unicode using this encoding, else ASCII is assumed.
Since message ids are read as Unicode strings too, all *gettext() methods will assume message ids as Unicode strings, not byte strings.
The entire set of key/value pairs are placed into a dictionary and set as the "protected" _info instance variable.
If the .mo file's magic number is invalid, the major version number is unexpected, or if other problems occur while reading the file, instantiating a GNUTranslations class can raise OSError.
class gettext.GNUTranslations
The following methods are overridden from the base class implementation:
gettext(message)
Look up the message id in the catalog and return the corresponding message string, as a Unicode string. If there is no entry in the catalog for the message id, and a fallback has been set, the look up is forwarded to the fallback's gettext() method. Otherwise, the message id is returned.
ngettext(singular, plural, n)
Do a plural-forms lookup of a message id. singular is used as the message id for purposes of lookup in the catalog, while n is used to determine which plural form to use. The returned message string is a Unicode string.
If the message id is not found in the catalog, and a fallback is specified, the request is forwarded to the fallback's ngettext() method. Otherwise, when n is 1 singular is returned, and plural is returned in all other cases.
Here is an example:
    import os
    from gettext import GNUTranslations
    n = len(os.listdir('.'))
    # somefile is an open binary file object for a .mo catalog
    cat = GNUTranslations(somefile)
    message = cat.ngettext(
        'There is %(num)d file in this directory',
        'There are %(num)d files in this directory',
        n) % {'num': n}
lgettext(message)
lngettext(singular, plural, n)
Equivalent to gettext() and ngettext(), but the translation is returned as a byte string encoded in the preferred system encoding if no encoding was explicitly set with set_output_charset().
Warning
These methods should be avoided in Python 3. See the warning for the lgettext() function.
Solaris message catalog support
The Solaris operating system defines its own binary .mo file format, but since no documentation can be found on this format, it is not supported at this time.
The Catalog constructor
GNOME uses a version of the gettext module by James Henstridge, but this version has a slightly different API. Its documented usage was:
    import gettext
    cat = gettext.Catalog(domain, localedir)
    _ = cat.gettext
    print(_('hello world'))
For compatibility with this older module, the function Catalog() is an alias for the translation() function described above.
One difference between this module and Henstridge's: his catalog objects supported access through a mapping API, but this appears to be unused and so is not currently supported.
Internationalizing your programs and modules
Internationalization (I18N) refers to the operation by which a program is made aware of multiple languages. Localization (L10N) refers to the adaptation of your program, once internationalized, to the local language and cultural habits. In order to provide multilingual messages for your Python programs, you need to take the following steps:
1. prepare your program or module by specially marking translatable strings
2. run a suite of tools over your marked files to generate raw message catalogs
3. create language-specific translations of the message catalogs
4. use the gettext module so that message strings are properly translated
In order to prepare your code for I18N, you need to look at all the strings in your files. Any string that needs to be translated should be marked by wrapping it in _('…'), that is, a call to the function _(). For example:
    filename = 'mylog.txt'
    message = _('writing a log message')
    with open(filename, 'w') as fp:
        fp.write(message)
In this example, the string 'writing a log message' is marked as a candidate for translation, while the strings 'mylog.txt' and 'w' are not.
There are a few tools to extract the strings meant for translation. The original GNU gettext only supported C or C++ source code but its extended version xgettext scans code written in a number of languages, including Python, to find strings marked as translatable. Babel is a Python internationalization library that includes a pybabel script to extract and compile message catalogs. François Pinard's program called xpot does a similar job and is available as part of his po-utils package.
(Python also includes pure-Python versions of these programs, called pygettext.py and msgfmt.py; some Python distributions will install them for you. pygettext.py is similar to xgettext, but only understands Python source code and cannot handle other programming languages such as C or C++. pygettext.py supports a command-line interface similar to xgettext; for details on its use, run pygettext.py --help. msgfmt.py is binary compatible with GNU msgfmt. With these two programs, you may not need the GNU gettext package to internationalize your Python applications.)
xgettext, pygettext, and similar tools generate .po files that are message catalogs. They are structured human-readable files that contain every marked string in the source code, along with a placeholder for the translated versions of these strings.
Copies of these .po files are then handed over to the individual human translators who write translations for every supported natural language. They send back the completed language-specific versions as a <language-name>.po file that's compiled into a machine-readable .mo binary catalog file using the msgfmt program. The .mo files are used by the gettext module for the actual translation processing at run-time.
How you use the gettext module in your code depends on whether you are internationalizing a single module or your entire application. The next two sections will discuss each case.
Localizing your module
If you are localizing your module, you must take care not to make global changes, e.g. to the built-in namespace. You should not use the GNU gettext API but instead the class-based API.
Let's say your module is called "spam" and the module's various natural language translation .mo files reside in /usr/share/locale in GNU gettext format. Here's what you would put at the top of your module:
    import gettext
    t = gettext.translation('spam', '/usr/share/locale')
    _ = t.gettext
Localizing your application
If you are localizing your application, you can install the _() function globally into the built-in namespace, usually in the main driver file of your application. This will let all your application-specific files just use _('…') without having to explicitly install it in each file.
In the simple case then, you need only add the following bit of code to the main driver file of your application:
    import gettext
    gettext.install('myapplication')
If you need to set the locale directory, you can pass it into the install() function:
    import gettext
    gettext.install('myapplication', '/usr/share/locale')
Changing languages on the fly
If your program needs to support many languages at the same time, you may want to create multiple translation instances and then switch between them explicitly, like so:
    import gettext
    lang1 = gettext.translation('myapplication', languages=['en'])
    lang2 = gettext.translation('myapplication', languages=['fr'])
    lang3 = gettext.translation('myapplication', languages=['de'])
    # start by using language1
    lang1.install()
    # ... time goes by, user selects language 2
    lang2.install()
    # ... more time goes by, user selects language 3
    lang3.install()
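The switching mechanism can be observed without any catalog files. In this sketch, UpperTranslations is a hypothetical stand-in for a second language (it simply upper-cases messages), and builtins._ is accessed explicitly only to make visible that each install() call rebinds the same built-in name.

```python
import builtins
import gettext


class UpperTranslations(gettext.NullTranslations):
    """Hypothetical stand-in for a real catalog: upper-cases every message."""

    def gettext(self, message):
        return message.upper()


plain = gettext.NullTranslations()
shouting = UpperTranslations()

plain.install()                  # _() now returns messages unchanged
first = builtins._("hello")

shouting.install()               # rebinding _() switches the "language"
second = builtins._("hello")

print(first, second)
```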
Deferred translations
In most coding situations, strings are translated where they are coded. Occasionally however, you need to mark strings for translation, but defer actual translation until later. A classic example is:
    animals = ['mollusk',
               'albatross',
               'rat',
               'penguin',
               'python', ]
    # ...
    for a in animals:
        print(a)
Here, you want to mark the strings in the animals list as being translatable, but you don't actually want to translate them until they are printed.
Here is one way you can handle this situation:
    def _(message): return message
    animals = [_('mollusk'),
               _('albatross'),
               _('rat'),
               _('penguin'),
               _('python'), ]
    del _
    # ...
    for a in animals:
        print(_(a))
This works because the dummy definition of _() simply returns the string unchanged. And this dummy definition will temporarily override any definition of _() in the built-in namespace (until the del command). Take care, though, if you have a previous definition of _() in the local namespace.
Note that the second use of _() will not identify "a" as being translatable to the gettext program, because the parameter is not a string literal.
Another way to handle this is with the following example:
    def N_(message): return message
    animals = [N_('mollusk'),
               N_('albatross'),
               N_('rat'),
               N_('penguin'),
               N_('python'), ]
    # ...
    for a in animals:
        print(_(a))
In this case, you are marking translatable strings with the function N_(), which won't conflict with any definition of _(). However, you will need to teach your message extraction program to look for translatable strings marked with N_(). xgettext, pygettext, pybabel extract, and xpot all support this through the use of the -k command-line switch. The choice of N_() here is totally arbitrary; it could have just as easily been MarkThisStringForTranslation().
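Acknowledgements
The following people contributed code, feedback, design suggestions, previous implementations, and valuable experience to the creation of this module: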
Peter Funk
James Henstridge
Juan David Ibáñez Palomar
Marc-André Lemburg
Martin von Löwis
François Pinard
Barry Warsaw
Gustavo Niemeyer
Footnotes
[1] The default locale directory is system dependent; for example, on RedHat Linux it is /usr/share/locale, but on Solaris it is /usr/lib/locale. The gettext module does not try to support these system dependent defaults; instead its default is sys.prefix/share/locale (see sys.prefix). For this reason, it is always best to call bindtextdomain() with an explicit absolute path at the start of your application.
[2] See the footnote for bindtextdomain() above.