3.5 Character Data Types
For working with character data, Firebird has the fixed-length CHAR
and the variable-length VARCHAR
data types. The maximum size of text data stored in these data types is 32,767 bytes for CHAR
and 32,765 bytes for VARCHAR
. The maximum number of characters that will fit within these limits depends on the CHARACTER SET
being used for the data under consideration. The collation sequence does not affect this maximum, although it may affect the maximum size of any index that involves the column.
If no character set is explicitly specified when defining a character object, the default character set specified when the database was created will be used. If the database does not have a default character set defined, the field gets the character set NONE
.
3.5.1 Unicode
Most current development tools support Unicode, implemented in Firebird with the character sets UTF8
and UNICODE_FSS
. UTF8
comes with collations for many languages. UNICODE_FSS
is more limited and is used mainly by Firebird internally for storing metadata. Keep in mind that one UTF8
character occupies up to 4 bytes, thus limiting the size of CHAR
fields to 8,191 characters (32,767/4).
Note
The actual bytes per character value depends on the range the character belongs to. Non-accented Latin letters occupy 1 byte, Cyrillic letters from the WIN1251
encoding occupy 2 bytes in UTF8
, characters from other encodings may occupy up to 4 bytes.
The UTF8
character set implemented in Firebird supports the latest version of the Unicode standard, thus recommending its use for international databases.
3.5.2 Client Character Set
While working with strings, it is essential to keep the character set of the client connection in mind. If there is a mismatch between the character sets of the stored data and that of the client connection, the output results for string columns are automatically re-encoded, both when data are sent from the client to the server and when they are sent back from the server to the client. For example, if the database was created in the WIN1251
encoding but KOI8R
or UTF8
is specified in the client’s connection parameters, the mismatch will be transparent.
3.5.3 Special Character Sets
Character set NONEThe character set NONE
is a special character set in Firebird. It can be characterized such that each byte is a part of a string, but the string is stored in the system without any clues about what constitutes any character: character encoding, collation, case, etc. are simply unknown. It is the responsibility of the client application to deal with the data and provide the means to interpret the string of bytes in some way that is meaningful to the application and the human user.
Character set OCTETSData in OCTETS
encoding are treated as bytes that may not actually be interpreted as characters. OCTETS
provides a way to store binary data, which could be the results of some Firebird functions. The database engine has no concept of what it is meant to do with a string of bits in OCTETS
, other than just store it and retrieve it. Again, the client side is responsible for validating the data, presenting them in formats that are meaningful to the application and its users and handling any exceptions arising from decoding and encoding them. Since Firebird 4.0 CHAR
and VARCHAR
with character set OCTETS
have synonyms BINARY
and VARBINARY
.
3.5.4 Collation Sequence
Each character set has a default collation sequence (COLLATE
) that specifies the collation order. Usually, it provides nothing more than ordering based on the numeric code of the characters and a basic mapping of upper- and lower-case characters. If some behaviour is needed for strings that is not provided by the default collation sequence and a suitable alternative collation is supported for that character set, a COLLATE `collation`
clause can be specified in the column definition.
A COLLATE *collation*
clause can be applied in other contexts besides the column definition. For greater-than/less-than comparison operations, it can be added in the WHERE
clause of a SELECT
statement. If output needs to be sorted in a special alphabetic sequence, or case-insensitively, and the appropriate collation exists, then a COLLATE
clause can be included with the ORDER BY
clause when rows are being sorted on a character field and with the GROUP BY
clause in case of grouping operations.
3.5.4.1 Case-Insensitive Searching
For a case-insensitive search, the UPPER
function could be used to convert both the search argument and the searched strings to upper-case before attempting a match:
…
where upper(name) = upper(:flt_name)
For strings in a character set that has a case-insensitive collation available, you can simply apply the collation, to compare the search argument and the searched strings directly. For example, using the WIN1251
character set, the collation PXW_CYRL
is case-insensitive for this purpose:
…
WHERE FIRST_NAME COLLATE PXW_CYRL >= :FLT_NAME
…
ORDER BY NAME COLLATE PXW_CYRL
See alsoCONTAINING
3.5.4.2 UTF8
Collation Sequences
The following table shows the possible collation sequences for the UTF8
character set.
Table 3.5.4.2.1 Collation Sequences for Character Set UTF8
Collation | Characteristics |
---|---|
| Collation works according to the position of the character in the table (binary). Added in Firebird 2.0 |
| Collation works according to the UCA algorithm (Unicode Collation Algorithm) (alphabetical). Added in Firebird 2.0 |
| The default, binary collation, identical to |
| Case-insensitive collation, works without taking character case into account. Added in Firebird 2.1 |
| Case-insensitive, accent-insensitive collation, works alphabetically without taking character case or accents into account. Added in Firebird 2.5 |
ExampleAn example of collation for the UTF8 character set without taking into account the case or accentuation of characters (similar to COLLATE PXW_CYRL
).
...
ORDER BY NAME COLLATE UNICODE_CI_AI
3.5.5 Character Indexes
In Firebird earlier than version 2.0, a problem can occur with building an index for character columns that use a non-standard collation sequence: the length of an indexed field is limited to 252 bytes with no COLLATE
specified or 84 bytes if COLLATE
is specified. Multi-byte character sets and compound indexes limit the size even further.
Starting from Firebird 2.0, the maximum length for an index equals one quarter of the page size, i.e. from 1,024 — for page size 4,096 — to 8,192 bytes — for page size 32,768. The maximum length of an indexed string is 9 bytes less than that quarter-page limit.
Calculating Maximum Length of an Indexed String Field
The following formula calculates the maximum length of an indexed string (in characters):
max_char_length = FLOOR((page_size / 4 - 9) / N)
where N is the number of bytes per character in the character set.
The table below shows the maximum length of an indexed string (in characters), according to page size and character set, calculated using this formula.
Table 3.5.5.1 Maximum Index Lengths by Page Size and Character Size
Page Size | Bytes per character | ||||
1 | 2 | 3 | 4 | 6 | |
4,096 | 1,015 | 507 | 338 | 253 | 169 |
8,192 | 2,039 | 1,019 | 679 | 509 | 339 |
16,384 | 4,087 | 2,043 | 1,362 | 1,021 | 681 |
32,768 | 8,183 | 4,091 | 2,727 | 2,045 | 1,363 |
Note
With case-insensitive collations (_CI), one character in the index will occupy not 4, but 6 (six) bytes, so the maximum key length for a page of — for example — 4,096 bytes, will be 169 characters.
See alsoCREATE DATABASE, Collation sequence, SELECT, WHERE, GROUP BY, ORDER BY
3.5.6 Character Types in Detail
3.5.6.1 BINARY
Data Type Declaration Format
BINARY [(length)]
Table 3.5.6.1.1 BINARY
Type Parameters
Parameter | Description |
---|---|
length | Length in bytes between 1 and 32,767; defaults to |
BINARY
is a fixed-length data type, and is an SQL standard-compliant alias for CHAR(*length*) CHARACTER SET OCTETS
. Values shorter than the declared length are padded with NUL up to the declared length.
Note
Some tools may report the type as CHAR CHARACTER SET OCTETS
instead of BINARY
.
See alsoSection 3.5.6.2, CHAR, Section 3.5.6.3, VARBINARY
3.5.6.2 CHAR
Data Type Declaration Format
{CHAR | CHARACTER} [(length)]
[CHARACTER SET <set>] [COLLATE <name>]
Table 3.5.6.2.1 CHAR
Type Parameters
Parameter | Description |
---|---|
length | Length in characters, defaults to |
set | Character set name |
name | Collation name |
CHAR
is a fixed-length data type. Values shorter than the declared length are padded with spaces up to the declared length. Generally, the pad character does not have to be a space: it depends on the character set. For example, the pad character for the OCTETS
character set is NUL.
The full name of this data type is CHARACTER
, but there is no requirement to use full names and people rarely do so.
Fixed-length character data can be used to store codes whose length is standard and has a definite width in directories. An example of such a code is an EAN13 barcode — 13 characters, all filled.
Note
Since Firebird 4.0,
CHAR CHARACTER SET OCTETS
has the aliasBINARY
Formally, the
COLLATE
clause is not part of the data type declaration, and its position depends on the syntax of the statement.
See alsoSection 3.5.6.1, BINARY, Section 3.5.6.4, VARCHAR
3.5.6.3 VARBINARY
Data Type Declaration Format
{VARBINARY | BINARY VARYING} (length)
Table 3.5.6.3.1 VARBINARY
Type Parameters
Parameter | Description |
---|---|
length | Length in bytes between 1 and 32,765 |
VARBINARY
is variable length binary type, and is an SQL standard-compliant alias for VARCHAR(*length*) CHARACTER SET OCTETS
.
Note
Some tools may report the type as VARCHAR CHARACTER SET OCTETS
instead of VARBINARY
.
See alsoSection 3.5.6.4, VARCHAR, Section 3.5.6.1, BINARY
3.5.6.4 VARCHAR
Data Type Declaration Format
{VARCHAR | {CHAR | CHARACTER} VARYING} (length)
[CHARACTER SET <set>] [COLLATE <name>]
Table 3.5.6.4.1 VARCHAR
Type Parameters
Parameter | Description |
---|---|
length | Length in characters. A valid length is from 1 to the maximum number of characters that can be accommodated within 32,765 bytes. |
set | Character set name |
name | Collation name |
VARCHAR
is the basic string type for storing texts of variable length, up to a maximum of 32,765 bytes. The stored structure is equal to the actual size of the data plus 2 bytes where the length of the data is recorded.
All characters that are sent from the client application to the database are considered meaningful, including the leading and trailing spaces.
The full name of this type is CHARACTER VARYING
. Another variant of the name is written as CHAR VARYING
.
Note
Since Firebird 4.0,
VARCHAR CHARACTER SET OCTETS
has the aliasVARBINARY
Formally, the
COLLATE
clause is not part of the data type declaration, and its position depends on the syntax of the statement.
See alsoSection 3.5.6.3, VARBINARY, Section 3.5.6.2, CHAR
3.5.6.5 NCHAR
Data Type Declaration Format
{NCHAR | NATIONAL {CHAR | CHARACTER}} [(length)]
NCHAR
is a fixed-length character data type with the ISO8859_1
character set predefined. In all other respects it is the same as CHAR
.
Note
If no length is specified, it is taken to be 1.
A similar data type is available for the variable-length string type: NATIONAL {CHAR | CHARACTER} VARYING
.