Unicode
Jinja2 uses Unicode internally, which means that you have to pass Unicode objects to the render function, or bytestrings that consist only of ASCII characters. Additionally, newlines are normalized to a single end-of-line sequence, which defaults to UNIX style (\n).
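The two rules above can be illustrated with a minimal sketch in plain Python 3, where str is the Unicode type and bytes plays the role of the bytestring. The helper names to_text and normalize_newlines are hypothetical and are not part of Jinja2's API, and the regular expression is only an assumption about how such a normalization could be written.

```python
import re

# Matches any of the three common end-of-line sequences.
_NEWLINE_RE = re.compile(r"\r\n|\r|\n")


def to_text(value):
    """Return text for str input or ASCII-only bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        # Raises UnicodeDecodeError if any byte is outside ASCII.
        return value.decode("ascii")
    return value


def normalize_newlines(text, newline="\n"):
    """Replace every end-of-line sequence with a single chosen one."""
    return _NEWLINE_RE.sub(newline, text)


# Mixed Windows, old Mac and UNIX line endings collapse to UNIX style.
normalized = normalize_newlines(to_text(b"Hello\r\nWorld\rfoo\n"))
```

In Jinja2 itself the target sequence is configured through the newline_sequence argument of Environment.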
Python 2.x supports two ways of representing string objects. One is the str type and the other is the unicode type, both of which extend a type called basestring. Unfortunately the default is str, which should not be used to store text-based information unless only ASCII characters are used. With Python 2.6 it is possible to make unicode the default on a per-module level (with from __future__ import unicode_literals), and with Python 3 it will be the default.
To explicitly use a Unicode string you have to prefix the string literal with a u: u'Hänsel und Gretel sagen Hallo'. That way Python will store the string as Unicode by decoding the string with the character encoding from the current Python module. If no encoding is specified this defaults to ‘ASCII’, which means that you can’t use any non-ASCII characters in the module.
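The effect of the module encoding can be sketched outside of a source file by decoding the raw bytes by hand. This is an illustration only, written for Python 3, where bytes.decode performs the step that Python 2 applies to a u'' literal. The variable names are made up for the example.

```python
# The bytes a utf-8 encoded source file would contain for the literal.
raw = "Hänsel und Gretel sagen Hallo".encode("utf-8")

# Decoding with the declared encoding recovers the text.
text = raw.decode("utf-8")

# Decoding with ASCII, the default described above, fails on the "ä".
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```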
To set a better module encoding, add the following comment to the first or second line of the Python module using the Unicode literal:
# -*- coding: utf-8 -*-
We recommend utf-8 as the encoding for Python modules and templates, as it’s possible to represent every Unicode character in utf-8 and because it’s backwards compatible with ASCII. For Jinja2 the default encoding of templates is assumed to be utf-8.
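Both properties of utf-8 mentioned here can be checked with a short sketch. It is written for Python 3, and the sample characters are chosen arbitrarily for the illustration.

```python
# ASCII text has the same byte representation in ASCII and utf-8.
ascii_text = "Hello Jinja2"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# Non-ASCII characters take more than one byte but still round-trip:
# a-umlaut, non-breaking space, euro sign, and an emoji outside the BMP.
samples = ["\u00e4", "\u00a0", "\u20ac", "\U0001F600"]
byte_lengths = [len(ch.encode("utf-8")) for ch in samples]
round_trips = all(ch.encode("utf-8").decode("utf-8") == ch for ch in samples)
```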
It is not possible to use Jinja2 to process non-Unicode data. The reason for this is that Jinja2 uses Unicode already on the language level. For example, Jinja2 treats the non-breaking space as valid whitespace inside expressions, which requires knowledge of the encoding or operating on a Unicode string.
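The non-breaking space example can be made concrete with a small sketch. It is not Jinja2's lexer. It only assumes a Unicode-aware whitespace pattern and compares it with the same pattern applied to utf-8 encoded bytes, using Python 3.

```python
import re

# A Unicode-aware whitespace pattern; an assumption for this sketch.
whitespace_re = re.compile(r"\s+")

nbsp = "\u00a0"
expression = "1" + nbsp + "+" + nbsp + "2"

# On text, the non-breaking space counts as whitespace.
tokens = whitespace_re.split(expression)

# In utf-8 the same character is the two bytes 0xC2 0xA0, which a
# byte-level whitespace pattern does not recognize.
byte_tokens = re.compile(rb"\s+").split(expression.encode("utf-8"))
```

Splitting the text yields three tokens, while the byte string stays in one piece because nothing in it is ASCII whitespace.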
For more details about Unicode in Python, have a look at the excellent Unicode documentation.
Another important thing is how Jinja2 handles string literals in templates. A naive implementation would use Unicode strings for all string literals, but it turned out in the past that this is problematic, as some libraries typecheck against str explicitly. For example, datetime.strftime does not accept Unicode arguments. To not break it completely, Jinja2 returns str for strings that fit into ASCII and unicode for everything else:
>>> from jinja2 import Template
>>> m = Template(u"{% set a, b = 'foo', 'föö' %}").module
>>> m.a
'foo'
>>> m.b
u'f\xf6\xf6'
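The rule behind this output can be sketched as a small function. This is not Jinja2's implementation. The helper narrow_literal is hypothetical, and because the sketch is written for Python 3, bytes stands in for Python 2's str and str stands in for unicode.

```python
def narrow_literal(value):
    """Return an ASCII bytestring if possible, else the text unchanged.

    Hypothetical helper for illustration only.
    """
    try:
        return value.encode("ascii")
    except UnicodeEncodeError:
        # The literal contains non-ASCII characters, so keep it as text.
        return value


a = narrow_literal("foo")
b = narrow_literal("f\xf6\xf6")
```

Here a ends up as a bytestring and b stays text, mirroring the str and unicode results shown above.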