citext Data Type
The citext module provides a case-insensitive character string type, citext
. Essentially, it internally calls the lower
function when comparing values. Otherwise, it behaves almost exactly like the text
data type.
The standard method to perform case-insensitive matches on text values is to use the lower
function when comparing values, for example
SELECT * FROM tab WHERE lower(col) = LOWER(?);
This method works well, but has drawbacks:
- It makes your SQL statements verbose, and you must remember to use
lower
on both the column and the query value. - It does not work with an index, unless you create a functional index using
lower
.
The citext
data type allows you to eliminate calls to lower
in SQL queries and you can create case-insensitive indexes on columns of type citext
. citext
is locale-aware, like the text
type, which means comparing uppercase and lowercase characters depends on the rules of the LC_CTYPE locale setting. This behavior is the same as using lower
in queries, but it is done transparently by the data type, so you do not have to do anything special in your queries.
Parent topic: Additional Supplied Modules
Installing citext
Before you can use the citext
data type, run the installation script $GPHOME/share/postgresql/contrib/citext.sql
in each database where you want to use the type:
$ psql -d testdb -f $GPHOME/share/postgresql/contrib/citext.sql
Using the citext Type
Here is a simple example defining a citext
table column:
CREATE TABLE users (
id bigint PRIMARY KEY,
nick CITEXT NOT NULL,
pass TEXT NOT NULL
) DISTRIBUTED BY (id);
INSERT INTO users VALUES (1, 'larry', md5(random()::text) );
INSERT INTO users VALUES (2, 'Tom', md5(random()::text) );
INSERT INTO users VALUES (3, 'Damian', md5(random()::text) );
INSERT INTO users VALUES (4, 'NEAL', md5(random()::text) );
INSERT INTO users VALUES (5, 'Bjørn', md5(random()::text) );
SELECT * FROM users WHERE nick = 'Larry';
The SELECT
statement returns one tuple, even though the nick
column is set to larry
and the query specified Larry
.
String Comparison Behavior
citext
performs comparisons by converting each string to lower case (as though the lower
function were called) and then comparing the results normally. Two strings are considered equal if lower
would produce identical results for them.
In order to emulate a case-insensitive collation as closely as possible, there are citext
-specific versions of a number of string-processing operators and functions. So, for example, the regular expression operators ~
and ~*
exhibit the same behavior when applied to citext
: they both match case-insensitively. The same is true for !~
and !~*
, as well as for the LIKE
operators ~~
and ~~*
, and !~~
and !~~*
. If you want to match case-sensitively, you can cast the operator’s arguments to text
.
The following functions perform matching case-insensitively if their arguments are citext
:
regexp_match()
regexp_matches()
regexp_replace()
regexp_split_to_array()
regexp_split_to_table()
replace()
split_part()
strpos()
translate()
For the regexp functions, if you want to match case-sensitively, you can specify the “c” flag to force a case-sensitive match. If you want case-sensitive behavior, you must cast to text
before using one of these functions.
Limitations
A column of type
citext
cannot be part of a primary key or distribution key in aCREATE TABLE
statement.The
citext
type’s case-folding behavior depends on theLC_CTYPE
setting of your database. How it compares values is therefore determined when the database is created. It is not truly case-insensitive in the terms defined by the Unicode standard. Effectively, what this means is that, as long as you’re happy with your collation, you should be happy withcitext
‘s comparisons. But if you have data in different languages stored in your database, users of one language may find their query results are not as expected if the collation is for another language.citext
is not as efficient astext
because the operator functions and the B-tree comparison functions must make copies of the data and convert them to lower case for comparisons. It is, however, slightly more efficient than usinglower
to perform case-insensitive matching.citext
may not be the best option if you need data to compare case-sensitively in some contexts and case-insensitively in other contexts. The standard recommendation is to use thetext
type and manually apply thelower
function when you need to compare case-insensitively. This works if case-insensitive comparison is needed only infrequently. If you need case-insensitive behavior most of the time and case-sensitive infrequently, consider storing the data ascitext
and explicitly casting the column totext
when you want case-sensitive comparison. In either situation, you will need two indexes if you want both types of searches to be fast.The schema containing the
citext
operators must be in the currentsearch_path
(typicallypublic
); if it is not, the normal case-sensitivetext
operators will be invoked instead.