gettext: Multilingual internationalization services

Source code: Lib/gettext.py


The gettext module provides internationalization (I18N) and localization (L10N) services for your Python modules and applications. It supports both the GNU gettext message catalog API and a higher level, class-based API that may be more appropriate for Python files. The interface described below allows you to write your module and application messages in one natural language, and provide a catalog of translated messages for running under different natural languages.

Some hints on localizing your Python modules and applications are also given.

GNU gettext API

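The gettext module defines the following API, which is very similar to the GNU gettext API. If you use this API, you will affect the translation of your entire application globally. Often this is what you want if your application is monolingual, with the choice of language dependent on the locale of your user. If you are localizing a Python module, or if your application needs to switch languages at run time, you probably want to use the class-based API instead.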

  • gettext.bindtextdomain(domain, localedir=None)
  • Bind the domain to the locale directory localedir. More concretely, gettext will look for binary .mo files for the given domain using the path (on Unix): localedir/language/LC_MESSAGES/domain.mo, where language is searched for in the environment variables LANGUAGE, LC_ALL, LC_MESSAGES, and LANG respectively.

If localedir is omitted or None, then the current binding for domain is returned (see footnote 1).
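A minimal sketch of both calling forms. The domain name demo_bind and the throwaway temporary directory are placeholders chosen only for this illustration:

```python
import gettext
import tempfile

# A temporary directory stands in for a real locale tree (illustration only).
localedir = tempfile.mkdtemp()

# Setting a binding returns the directory now associated with the domain.
bound = gettext.bindtextdomain('demo_bind', localedir)

# Omitting localedir only queries the current binding; nothing is changed.
queried = gettext.bindtextdomain('demo_bind')

print(bound == localedir, queried == localedir)
```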

  • gettext.bind_textdomain_codeset(domain, codeset=None)
  • Bind the domain to codeset, changing the encoding of byte strings returned by the lgettext(), ldgettext(), lngettext() and ldngettext() functions. If codeset is omitted, then the current binding is returned.

Deprecated since version 3.8, will be removed in version 3.10.

  • gettext.textdomain(domain=None)
  • Change or query the current global domain. If domain is None, then the current global domain is returned, otherwise the global domain is set to domain, which is returned.
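The same set-or-query pattern, sketched with the domain name used in the examples further down. Because textdomain() changes process-wide state, the sketch restores whatever domain was active before:

```python
import gettext

previous = gettext.textdomain()                 # query only
current = gettext.textdomain('myapplication')   # set, and get the new domain back
print(current, gettext.textdomain())            # both report 'myapplication'

gettext.textdomain(previous)                    # put the earlier global domain back
```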

  • gettext.gettext(message)

  • Return the localized translation of message, based on the current global domain, language, and locale directory. This function is usually aliased as _() in the local namespace (see examples below).

  • gettext.dgettext(domain, message)

  • Like gettext(), but look the message up in the specified domain.

  • gettext.ngettext(singular, plural, n)

  • Like gettext(), but consider plural forms. If a translation is found, apply the plural formula to n, and return the resulting message (some languages have more than two plural forms). If no translation is found, return singular if n is 1; return plural otherwise.

The plural formula is taken from the catalog header. It is a C or Python expression that has a free variable n; the expression evaluates to the index of the plural in the catalog. See the GNU gettext documentation for the precise syntax to be used in .po files and the formulas for a variety of languages.
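To make the idea of a plural formula concrete, here is a hand translation into Python of two formulas as they commonly appear in .po headers: English (plural=(n != 1)) and Polish, which has three forms. This is only an illustration; the gettext module compiles the header expression itself, and the function names below are hypothetical.

```python
def english_plural(n: int) -> int:
    # Header expression: plural=(n != 1);
    return int(n != 1)


def polish_plural(n: int) -> int:
    # Header expression:
    # plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


# Each result is an index into msgstr[0], msgstr[1], ... of a plural entry.
for n in (1, 2, 5, 12, 22):
    print(n, english_plural(n), polish_plural(n))
```

For Polish this selects form 1 for 2, 3, 4, 22, 23, 24 and so on, but form 2 for 5 through 21, which is why a simple singular/plural split is not enough.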

  • gettext.dngettext(domain, singular, plural, n)
  • Like ngettext(), but look the message up in the specified domain.

  • gettext.pgettext(context, message)
  • gettext.dpgettext(domain, context, message)
  • gettext.npgettext(context, singular, plural, n)
  • gettext.dnpgettext(domain, context, singular, plural, n)
  • Similar to the corresponding functions without the p in the prefix (that is, gettext(), dgettext(), ngettext(), dngettext()), but the translation is restricted to the given message context.

New in version 3.8.
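A small sketch of the calling conventions (Python 3.8 or later), assuming a made-up domain demo_ctx bound to an empty temporary directory so that no catalog can be found. It therefore shows the fallback behaviour: with no catalog entry, the context plays no part and the message, or the singular/plural, comes back unchanged.

```python
import gettext
import tempfile

# Hypothetical domain bound to an empty directory: no .mo file can be found.
gettext.bindtextdomain('demo_ctx', tempfile.mkdtemp())

open_menu = gettext.dpgettext('demo_ctx', 'menu', 'Open')
open_door = gettext.dpgettext('demo_ctx', 'door state', 'Open')
files = gettext.dnpgettext('demo_ctx', 'listing', '%d file', '%d files', 3) % 3

print(open_menu, open_door, files)
```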

  • gettext.lgettext(message)
  • gettext.ldgettext(domain, message)
  • gettext.lngettext(singular, plural, n)
  • gettext.ldngettext(domain, singular, plural, n)
  • Equivalent to the corresponding functions without the l prefix (gettext(), dgettext(), ngettext() and dngettext()), but the translation is returned as a byte string encoded in the preferred system encoding if no other encoding was explicitly set with bind_textdomain_codeset().

Warning

These functions should be avoided in Python 3, because they return encoded bytes. It's much better to use alternatives which return Unicode strings instead, since most Python applications will want to manipulate human readable text as strings instead of bytes. Further, it's possible that you may get unexpected Unicode-related exceptions if there are encoding problems with the translated strings.

Deprecated since version 3.8, will be removed in version 3.10.

Note that GNU gettext also defines a dcgettext() method, but this was deemed not useful and so it is currently unimplemented.

Here's an example of typical usage for this API:

    import gettext
    gettext.bindtextdomain('myapplication', '/path/to/my/language/directory')
    gettext.textdomain('myapplication')
    _ = gettext.gettext
    # ...
    print(_('This is a translatable string.'))

Class-based API

The class-based API of the gettext module gives you more flexibility and greater convenience than the GNU gettext API. It is the recommended way of localizing your Python applications and modules. gettext defines a GNUTranslations class which implements the parsing of GNU .mo format files, and has methods for returning strings. Instances of this class can also install themselves in the built-in namespace as the function _().

  • gettext.find(domain, localedir=None, languages=None, all=False)
  • This function implements the standard .mo file search algorithm. It takes a domain, identical to what textdomain() takes. Optional localedir is as in bindtextdomain(). Optional languages is a list of strings, where each string is a language code.

If localedir is not given, then the default system locale directory is used (see footnote 2). If languages is not given, then the following environment variables are searched: LANGUAGE, LC_ALL, LC_MESSAGES, and LANG. The first one returning a non-empty value is used for the languages variable. The environment variables should contain a colon separated list of languages, which will be split on the colon to produce the expected list of language code strings.

find() then expands and normalizes the languages, and then iterates through them, searching for an existing file built of these components:

localedir/language/LC_MESSAGES/domain.mo

The first such file name that exists is returned by find(). If no such file is found, then None is returned. If all is given, it returns a list of all file names, in the order in which they appear in the languages list or the environment variables.
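The search can be sketched against a tiny, hypothetical locale tree built in a temporary directory. Since find() only checks whether a file exists, an empty placeholder stands in for a real compiled catalog here:

```python
import gettext
import os
import tempfile

# Hypothetical locale tree containing only a German catalog for domain "demo".
localedir = tempfile.mkdtemp()
de_dir = os.path.join(localedir, 'de', 'LC_MESSAGES')
os.makedirs(de_dir)
open(os.path.join(de_dir, 'demo.mo'), 'wb').close()  # empty placeholder file

# 'fr' has no catalog, so the search falls through to 'de'.
first = gettext.find('demo', localedir, languages=['fr', 'de'])
every = gettext.find('demo', localedir, languages=['fr', 'de'], all=True)
missing = gettext.find('demo', localedir, languages=['fr'])

print(first)
print(every)
print(missing)
```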

  • gettext.translation(domain, localedir=None, languages=None, class_=None, fallback=False, codeset=None)
  • Return a *Translations instance based on the domain, localedir, and languages, which are first passed to find() to get a list of the associated .mo file paths. Instances with identical .mo file names are cached. The actual class instantiated is class_ if provided, otherwise GNUTranslations. The class's constructor must take a single file object argument. If provided, codeset will change the charset used to encode translated strings in the lgettext() and lngettext() methods.

If multiple files are found, later files are used as fallbacks for earlier ones. To allow setting the fallback, copy.copy() is used to clone each translation object from the cache; the actual instance data is still shared with the cache.

If no .mo file is found, this function raises OSError if fallback is false (which is the default), and returns a NullTranslations instance if fallback is true.

Changed in version 3.3: IOError used to be raised instead of OSError.

Deprecated since version 3.8, will be removed in version 3.10: The codeset parameter.
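The effect of fallback can be sketched with a hypothetical domain and an empty temporary directory, so that no catalog exists for any language:

```python
import gettext
import tempfile

empty_dir = tempfile.mkdtemp()  # hypothetical locale directory with no catalogs

# fallback=False (the default): a missing catalog is an error.
error = None
try:
    gettext.translation('demo', empty_dir, languages=['fr'])
except OSError as exc:  # the exception raised is a subclass of OSError
    error = exc

# fallback=True: a NullTranslations instance comes back and echoes its input.
t = gettext.translation('demo', empty_dir, languages=['fr'], fallback=True)
print(type(t).__name__, t.gettext('Hello'))
```

Passing fallback=True is a convenient way to let an application start even before any translations have been shipped.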

  • gettext.install(domain, localedir=None, codeset=None, names=None)
  • This installs the function _() in Python's builtins namespace, based on domain, localedir, and codeset which are passed to the function translation().

For the names parameter, please see the description of the translation object's install() method.

As seen below, you usually mark the strings in your application that are candidates for translation, by wrapping them in a call to the _() function, like this:

    print(_('This string will be translated.'))

For convenience, you want the _() function to be installed in Python's builtins namespace, so it is easily accessible in all modules of your application.

Deprecated since version 3.8, will be removed in version 3.10: The codeset parameter.
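A sketch of install() in a setting where no catalog exists yet, again using a hypothetical domain and an empty temporary directory. Because install() asks translation() for a fallback, the call still succeeds, and the installed functions simply echo their input:

```python
import gettext
import tempfile

# No catalog is present, so a NullTranslations object's methods get installed.
gettext.install('demo', tempfile.mkdtemp(), names=['ngettext'])

# Both names now resolve through the builtins namespace in every module.
print(_('Hello'))
print(ngettext('%d file', '%d files', 2) % 2)
```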

The NullTranslations class

Translation classes are what actually implement the translation of original source file message strings to translated message strings. The base class used by all translation classes is NullTranslations; this provides the basic interface you can use to write your own specialized translation classes. Here are the methods of NullTranslations:

  • class gettext.NullTranslations(fp=None)
  • Takes an optional file object fp, which is ignored by the base class. Initializes "protected" instance variables _info and _charset which are set by derived classes, as well as _fallback, which is set through add_fallback(). It then calls self._parse(fp) if fp is not None.

    • _parse(fp)
    • No-op in the base class, this method takes file object fp, and reads the data from the file, initializing its message catalog. If you have an unsupported message catalog file format, you should override this method to parse your format.

    • add_fallback(fallback)

    • Add fallback as the fallback object for the current translation object. A translation object should consult the fallback if it cannot provide a translation for a given message.

    • gettext(message)

    • If a fallback has been set, forward gettext() to the fallback. Otherwise, return message. Overridden in derived classes.

    • ngettext(singular, plural, n)

    • If a fallback has been set, forward ngettext() to the fallback. Otherwise, return singular if n is 1; return plural otherwise. Overridden in derived classes.

    • pgettext(context, message)

    • If a fallback has been set, forward pgettext() to the fallback. Otherwise, return message unchanged. Overridden in derived classes.

New in version 3.8.

    • npgettext(context, singular, plural, n)
    • If a fallback has been set, forward npgettext() to the fallback. Otherwise, return singular if n is 1; return plural otherwise. Overridden in derived classes.

New in version 3.8.

    • lgettext(message)
    • lngettext(singular, plural, n)
    • Equivalent to gettext() and ngettext(), but the translation is returned as a byte string encoded in the preferred system encoding if no encoding was explicitly set with set_output_charset(). Overridden in derived classes.

Warning

These methods should be avoided in Python 3. See the warning for the lgettext() function.

Deprecated since version 3.8, will be removed in version 3.10.

    • info()
    • Return the "protected" _info variable, a dictionary containing the metadata found in the message catalog file.

    • charset()

    • Return the encoding of the message catalog file.

    • output_charset()

    • Return the encoding used to return translated messages in lgettext() and lngettext().

Deprecated since version 3.8, will be removed in version 3.10.

    • set_output_charset(charset)
    • Change the encoding used to return translated messages.

Deprecated since version 3.8, will be removed in version 3.10.

    • install(names=None)
    • This method installs gettext() into the built-in namespace, binding it to _.

If the names parameter is given, it must be a sequence containing the names of functions you want to install in the builtins namespace in addition to _(). Supported names are 'gettext', 'ngettext', 'pgettext', 'npgettext', 'lgettext', and 'lngettext'.

Note that this is only one way, albeit the most convenient way, to make the _() function available to your application. Because it affects the entire application globally, and specifically the built-in namespace, localized modules should never install _(). Instead, they should use this code to make _() available to their module:

    import gettext
    t = gettext.translation('mymodule', ...)
    _ = t.gettext

This puts _() only in the module's global namespace and so only affects calls within this module.

Changed in version 3.8: Added 'pgettext' and 'npgettext'.
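As a sketch of how the base class is meant to be extended, here is a hypothetical JSONTranslations class (not part of gettext) that reads a flat JSON object of msgid/msgstr pairs by overriding _parse(), and consults its fallback chain before giving up:

```python
import gettext
import io
import json


class JSONTranslations(gettext.NullTranslations):
    """Hypothetical translation class backed by a flat JSON object {msgid: msgstr}."""

    def __init__(self, fp=None):
        self._catalog = {}
        super().__init__(fp)  # calls self._parse(fp) when fp is not None

    def _parse(self, fp):
        self._catalog = json.load(fp)

    def gettext(self, message):
        if message in self._catalog:
            return self._catalog[message]
        if self._fallback:
            return self._fallback.gettext(message)  # try the next object in the chain
        return message


primary = JSONTranslations(io.StringIO('{"Hello": "Bonjour"}'))
secondary = JSONTranslations(io.StringIO('{"Goodbye": "Au revoir"}'))
primary.add_fallback(secondary)

print(primary.gettext('Hello'))    # found in primary
print(primary.gettext('Goodbye'))  # found via the fallback
print(primary.gettext('Thanks'))   # found nowhere, returned unchanged
```

Methods the subclass does not override, such as ngettext(), keep the base-class behaviour of forwarding to the fallback and otherwise choosing between singular and plural.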

The GNUTranslations class

The gettext module provides one additional class derived from NullTranslations: GNUTranslations. This class overrides _parse() to enable reading GNU gettext format .mo files in both big-endian and little-endian format.

GNUTranslations parses optional metadata out of the translation catalog. It is convention with GNU gettext to include metadata as the translation for the empty string. This metadata is in RFC 822-style key: value pairs, and should contain the Project-Id-Version key. If the key Content-Type is found, then the charset property is used to initialize the "protected" _charset instance variable, defaulting to None if not found. If the charset encoding is specified, then all message ids and message strings read from the catalog are converted to Unicode using this encoding, else ASCII is assumed.

Since message ids are read as Unicode strings too, all *gettext() methods will assume message ids as Unicode strings, not byte strings.

The entire set of key/value pairs are placed into a dictionary and set as the "protected" _info instance variable.

If the .mo file's magic number is invalid, the major version number is unexpected, or if other problems occur while reading the file, instantiating a GNUTranslations class can raise OSError.
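To see the metadata handling without shipping a real catalog, the sketch below packs a header-only catalog into the .mo layout in memory. build_mo() is a hypothetical helper written for this illustration, a simplified take on the layout msgfmt writes (little-endian, no hash table); it is not part of the gettext module. Note that the parser lower-cases the keys it stores in info().

```python
import gettext
import io
import struct


def build_mo(messages):
    """Hypothetical helper: pack {msgid: msgstr} into little-endian .mo bytes."""
    keys = sorted(messages)  # the header entry (empty msgid) sorts first
    ids = strs = b''
    entries = []
    for key in keys:
        k, v = key.encode('utf-8'), messages[key].encode('utf-8')
        entries.append((len(k), len(ids), len(v), len(strs)))
        ids += k + b'\0'
        strs += v + b'\0'
    n = len(keys)
    key_start = 7 * 4 + 16 * n  # file header plus the two index tables
    value_start = key_start + len(ids)
    key_index, value_index = [], []
    for klen, koff, vlen, voff in entries:
        key_index += [klen, key_start + koff]
        value_index += [vlen, value_start + voff]
    header = struct.pack('<7I', 0x950412de, 0, n, 7 * 4, 7 * 4 + 8 * n, 0, 0)
    table = struct.pack('<%dI' % (4 * n), *(key_index + value_index))
    return header + table + ids + strs


# Metadata lives in the translation of the empty msgid.
t = gettext.GNUTranslations(io.BytesIO(build_mo({
    '': 'Project-Id-Version: demo 1.0\n'
        'Content-Type: text/plain; charset=UTF-8\n',
})))
print(t.info())
print(t.charset())

# Data that does not start with the .mo magic number is rejected.
rejected = False
try:
    gettext.GNUTranslations(io.BytesIO(b'definitely not a catalog'))
except OSError:
    rejected = True
print(rejected)
```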

  • class gettext.GNUTranslations
  • The following methods are overridden from the base class implementation:

    • gettext(message)
    • Look up the message id in the catalog and return the corresponding message string, as a Unicode string. If there is no entry in the catalog for the message id, and a fallback has been set, the look up is forwarded to the fallback's gettext() method. Otherwise, the message id is returned.

    • ngettext(singular, plural, n)

    • Do a plural-forms lookup of a message id. singular is used as the message id for purposes of lookup in the catalog, while n is used to determine which plural form to use. The returned message string is a Unicode string.

If the message id is not found in the catalog, and a fallback is specified, the request is forwarded to the fallback's ngettext() method. Otherwise, when n is 1 singular is returned, and plural is returned in all other cases.

Here is an example:

    import os
    from gettext import GNUTranslations

    n = len(os.listdir('.'))
    cat = GNUTranslations(somefile)  # somefile: an open binary .mo file object
    message = cat.ngettext(
        'There is %(num)d file in this directory',
        'There are %(num)d files in this directory',
        n) % {'num': n}

    • pgettext(context, message)
    • Look up the context and message id in the catalog and return the corresponding message string, as a Unicode string. If there is no entry in the catalog for the message id and context, and a fallback has been set, the look up is forwarded to the fallback's pgettext() method. Otherwise, the message id is returned.

New in version 3.8.

    • npgettext(context, singular, plural, n)
    • Do a plural-forms lookup of a message id. singular is used as the message id for purposes of lookup in the catalog, while n is used to determine which plural form to use.

If the message id for context is not found in the catalog, and a fallback is specified, the request is forwarded to the fallback's npgettext() method. Otherwise, when n is 1 singular is returned, and plural is returned in all other cases.

New in version 3.8.

    • lgettext(message)
    • lngettext(singular, plural, n)
    • Equivalent to gettext() and ngettext(), but the translation is returned as a byte string encoded in the preferred system encoding if no encoding was explicitly set with set_output_charset().

Warning

These methods should be avoided in Python 3. See the warning for the lgettext() function.

Deprecated since version 3.8, will be removed in version 3.10.
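The lookups can be exercised end to end with an in-memory catalog (Python 3.8 or later for pgettext()). As in the sketch after the metadata description, build_mo() is a hypothetical helper that writes a simplified .mo layout; it is repeated here so the block stands alone. The French strings are illustrative. In .mo files a context is joined to its msgid with '\x04', and the singular/plural msgids (and the plural msgstrs) are separated by '\x00'.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Hypothetical helper: pack {msgid: msgstr} into little-endian .mo bytes."""
    keys = sorted(messages)  # the header entry (empty msgid) sorts first
    ids = strs = b''
    entries = []
    for key in keys:
        k, v = key.encode('utf-8'), messages[key].encode('utf-8')
        entries.append((len(k), len(ids), len(v), len(strs)))
        ids += k + b'\0'
        strs += v + b'\0'
    n = len(keys)
    key_start = 7 * 4 + 16 * n  # file header plus the two index tables
    value_start = key_start + len(ids)
    key_index, value_index = [], []
    for klen, koff, vlen, voff in entries:
        key_index += [klen, key_start + koff]
        value_index += [vlen, value_start + voff]
    header = struct.pack('<7I', 0x950412de, 0, n, 7 * 4, 7 * 4 + 8 * n, 0, 0)
    table = struct.pack('<%dI' % (4 * n), *(key_index + value_index))
    return header + table + ids + strs


t = gettext.GNUTranslations(io.BytesIO(build_mo({
    '': 'Content-Type: text/plain; charset=UTF-8\n'
        'Plural-Forms: nplurals=2; plural=(n > 1);\n',
    'Save': 'Enregistrer',
    'menu\x04Open': 'Ouvrir',
    'door\x04Open': 'Ouverte',
    '%d file\x00%d files': '%d fichier\x00%d fichiers',
})))

print(t.gettext('Save'))
print(t.gettext('Open'))  # unchanged: only context-qualified entries exist
print(t.pgettext('menu', 'Open'))
print(t.pgettext('door', 'Open'))
print(t.ngettext('%d file', '%d files', 0) % 0)  # French formula n > 1 picks form 0
print(t.ngettext('%d file', '%d files', 3) % 3)
```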

Solaris message catalog support

The Solaris operating system defines its own binary .mo file format, but since no documentation can be found on this format, it is not supported at this time.

The Catalog constructor

GNOME uses a version of the gettext module by James Henstridge, but this version has a slightly different API. Its documented usage was:

    import gettext
    cat = gettext.Catalog(domain, localedir)
    _ = cat.gettext
    print(_('hello world'))

For compatibility with this older module, the function Catalog() is an alias for the translation() function described above.

One difference between this module and Henstridge's: his catalog objects supported access through a mapping API, but this appears to be unused and so is not currently supported.

Internationalizing your programs and modules

Internationalization (I18N) refers to the operation by which a program is made aware of multiple languages. Localization (L10N) refers to the adaptation of your program, once internationalized, to the local language and cultural habits. In order to provide multilingual messages for your Python programs, you need to take the following steps:

  • prepare your program or module by specially marking translatable strings

  • run a suite of tools over your marked files to generate raw messages catalogs

  • create language-specific translations of the message catalogs

  • use the gettext module so that message strings are properly translated

In order to prepare your code for I18N, you need to look at all the strings in your files. Any string that needs to be translated should be marked by wrapping it in _('…'), that is, a call to the function _(). For example:

    filename = 'mylog.txt'
    message = _('writing a log message')
    with open(filename, 'w') as fp:
        fp.write(message)

In this example, the string 'writing a log message' is marked as a candidate for translation, while the strings 'mylog.txt' and 'w' are not.

There are a few tools to extract the strings meant for translation. The original GNU gettext only supported C or C++ source code but its extended version xgettext scans code written in a number of languages, including Python, to find strings marked as translatable. Babel is a Python internationalization library that includes a pybabel script to extract and compile message catalogs. François Pinard's program called xpot does a similar job and is available as part of his po-utils package.

(Python also includes pure-Python versions of these programs, called pygettext.py and msgfmt.py; some Python distributions will install them for you. pygettext.py is similar to xgettext, but only understands Python source code and cannot handle other programming languages such as C or C++. pygettext.py supports a command-line interface similar to xgettext; for details on its use, run pygettext.py --help. msgfmt.py is binary compatible with GNU msgfmt. With these two programs, you may not need the GNU gettext package to internationalize your Python applications.)

xgettext, pygettext, and similar tools generate .po files that are message catalogs. They are structured human-readable files that contain every marked string in the source code, along with a placeholder for the translated versions of these strings.

Copies of these .po files are then handed over to the individual human translators who write translations for every supported natural language. They send back the completed language-specific versions as a <language-name>.po file that's compiled into a machine-readable .mo binary catalog file using the msgfmt program. The .mo files are used by the gettext module for the actual translation processing at run-time.

How you use the gettext module in your code depends on whether you are internationalizing a single module or your entire application. The next two sections will discuss each case.

Localizing your module

If you are localizing your module, you must take care not to make global changes, e.g. to the built-in namespace. You should not use the GNU gettext API but instead the class-based API.

Let's say your module is called "spam" and the module's various natural language translation .mo files reside in /usr/share/locale in GNU gettext format. Here's what you would put at the top of your module:

    import gettext
    t = gettext.translation('spam', '/usr/share/locale')
    _ = t.gettext

Localizing your application

If you are localizing your application, you can install the _() function globally into the built-in namespace, usually in the main driver file of your application. This will let all your application-specific files just use _('…') without having to explicitly install it in each file.

In the simple case then, you need only add the following bit of code to the main driver file of your application:

    import gettext
    gettext.install('myapplication')

If you need to set the locale directory, you can pass it into the install() function:

    import gettext
    gettext.install('myapplication', '/usr/share/locale')

Changing languages on the fly

If your program needs to support many languages at the same time, you may want to create multiple translation instances and then switch between them explicitly, like so:

    import gettext

    lang1 = gettext.translation('myapplication', languages=['en'])
    lang2 = gettext.translation('myapplication', languages=['fr'])
    lang3 = gettext.translation('myapplication', languages=['de'])

    # start by using language1
    lang1.install()

    # ... time goes by, user selects language 2
    lang2.install()

    # ... more time goes by, user selects language 3
    lang3.install()
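The switching mechanism can be checked without any catalogs on disk. In this sketch a hypothetical DictTranslations class (a NullTranslations subclass backed by a plain dict, not part of gettext) stands in for the objects translation() returns; each install() call rebinds the built-in _ so that later calls see the newly selected language:

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical stand-in for the objects gettext.translation() returns."""

    def __init__(self, catalog):
        super().__init__()
        self._catalog = catalog

    def gettext(self, message):
        return self._catalog.get(message, message)


lang1 = DictTranslations({})                    # English source strings
lang2 = DictTranslations({'Hello': 'Bonjour'})  # French
lang3 = DictTranslations({'Hello': 'Hallo'})    # German

greetings = []
for lang in (lang1, lang2, lang3):
    lang.install()                 # rebinds builtins._ to lang.gettext
    greetings.append(_('Hello'))   # resolved through the builtins namespace
print(greetings)
```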

Deferred translations

In most coding situations, strings are translated where they are coded. Occasionally however, you need to mark strings for translation, but defer actual translation until later. A classic example is:

    animals = ['mollusk',
               'albatross',
               'rat',
               'penguin',
               'python', ]
    # ...
    for a in animals:
        print(a)

Here, you want to mark the strings in the animals list as being translatable, but you don't actually want to translate them until they are printed.

Here is one way you can handle this situation:

    def _(message): return message

    animals = [_('mollusk'),
               _('albatross'),
               _('rat'),
               _('penguin'),
               _('python'), ]

    del _

    # ...
    for a in animals:
        print(_(a))

This works because the dummy definition of _() simply returns the string unchanged. And this dummy definition will temporarily override any definition of _() in the built-in namespace (until the del command). Take care, though, if you have a previous definition of _() in the local namespace.

Note that the second use of _() will not identify "a" as being translatable to the gettext program, because the parameter is not a string literal.

Another way to handle this is with the following example:

    def N_(message): return message

    animals = [N_('mollusk'),
               N_('albatross'),
               N_('rat'),
               N_('penguin'),
               N_('python'), ]

    # ...
    for a in animals:
        print(_(a))

In this case, you are marking translatable strings with the function N_(), which won't conflict with any definition of _(). However, you will need to teach your message extraction program to look for translatable strings marked with N_(). xgettext, pygettext, pybabel extract, and xpot all support this through the use of the -k command-line switch. The choice of N_() here is totally arbitrary; it could have just as easily been MarkThisStringForTranslation().

Acknowledgements

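The following people contributed code, feedback, design suggestions, previous implementations, and valuable experience to the creation of this module: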

  • Peter Funk

  • James Henstridge

  • Juan David Ibáñez Palomar

  • Marc-André Lemburg

  • Martin von Löwis

  • François Pinard

  • Barry Warsaw

  • Gustavo Niemeyer

Footnotes

  • 1
  • The default locale directory is system dependent; for example, on Red Hat Linux it is /usr/share/locale, but on Solaris it is /usr/lib/locale. The gettext module does not try to support these system dependent defaults; instead its default is sys.prefix/share/locale (see sys.prefix). For this reason, it is always best to call bindtextdomain() with an explicit absolute path at the start of your application.

  • 2

  • See the footnote for bindtextdomain() above.