Text Search Functions and Operators
Text Search Operators
@@
Description: Specifies whether a tsvector matches a tsquery.
Example:
postgres=# SELECT to_tsvector('fat cats ate rats') @@ to_tsquery('cat & rat') AS RESULT;
result
--------
t
(1 row)
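In practice @@ usually appears in a WHERE clause. The following is a minimal sketch, assuming the 'english' configuration is available; the inline VALUES rows and the column name doc are hypothetical stand-ins for a real table.

```sql
-- Hypothetical inline rows stand in for a real table column.
SELECT doc
FROM (VALUES ('fat cats ate rats'),
             ('the dog barked')) AS t(doc)
WHERE to_tsvector('english', doc) @@ to_tsquery('english', 'cat & rat');
```

Only the first row contains both the lexemes cat and rat, so only that row should satisfy the condition.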
@@@
Description: Synonym for @@.
Example:
postgres=# SELECT to_tsvector('fat cats ate rats') @@@ to_tsquery('cat & rat') AS RESULT;
result
--------
t
(1 row)
||
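Description: Concatenates two tsvector values. Positions in the right-hand operand are shifted by the largest position in the left-hand operand, which is why b ends up at positions 2 and 5 in the example.
Example: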
postgres=# SELECT 'a:1 b:2'::tsvector || 'c:1 d:2 b:3'::tsvector AS RESULT;
result
---------------------------
'a':1 'b':2,5 'c':3 'd':4
(1 row)
&&
Description: Combines two tsquery values with AND.
Example:
postgres=# SELECT 'fat | rat'::tsquery && 'cat'::tsquery AS RESULT;
result
---------------------------
( 'fat' | 'rat' ) & 'cat'
(1 row)
||
Description: Combines two tsquery values with OR.
Example:
postgres=# SELECT 'fat | rat'::tsquery || 'cat'::tsquery AS RESULT;
result
---------------------------
( 'fat' | 'rat' ) | 'cat'
(1 row)
!!
Description: Negates a tsquery.
Example:
postgres=# SELECT !! 'cat'::tsquery AS RESULT;
result
--------
!'cat'
(1 row)
@>
Description: Specifies whether a tsquery contains another tsquery.
Example:
postgres=# SELECT 'cat'::tsquery @> 'cat & rat'::tsquery AS RESULT;
result
--------
f
(1 row)
<@
Description: Specifies whether a tsquery is contained in another tsquery.
Example:
postgres=# SELECT 'cat'::tsquery <@ 'cat & rat'::tsquery AS RESULT;
result
--------
t
(1 row)
In addition to the operators above, the ordinary B-tree comparison operators (=, <, and so on) are defined for the tsvector and tsquery types.
Text Search Functions
get_current_ts_config()
Description: Returns the default text search configuration.
Return type: regconfig
Example:
postgres=# SELECT get_current_ts_config();
get_current_ts_config
-----------------------
english
(1 row)
length(tsvector)
Description: Returns the number of lexemes in a tsvector.
Return type: integer
Example:
postgres=# SELECT length('fat:2,4 cat:3 rat:5A'::tsvector);
length
--------
3
(1 row)
numnode(tsquery)
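Description: Returns the number of lexemes plus operators in a tsquery. The example counts three lexemes and two operators.
Return type: integer
Example: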
postgres=# SELECT numnode('(fat & rat) | cat'::tsquery);
numnode
---------
5
(1 row)
plainto_tsquery([ config regconfig , ] query text)
Description: Produces a tsquery from plain text, ignoring punctuation.
Return type: tsquery
Example:
postgres=# SELECT plainto_tsquery('english', 'The Fat Rats');
plainto_tsquery
-----------------
'fat' & 'rat'
(1 row)
querytree(query tsquery)
Description: Returns the portion of a tsquery that can be used for index searches.
Return type: text
Example:
postgres=# SELECT querytree('foo & ! bar'::tsquery);
querytree
-----------
'foo'
(1 row)
setweight(tsvector, "char")
Description: Assigns the given weight to every element of a tsvector.
Return type: tsvector
Example:
postgres=# SELECT setweight('fat:2,4 cat:3 rat:5B'::tsvector, 'A');
setweight
-------------------------------
'cat':3A 'fat':2A,4A 'rat':5A
(1 row)
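A common use of setweight is to label different parts of a document before concatenating them with ||. The sketch below assumes the 'english' configuration; the two text literals are hypothetical stand-ins for a title and a body.

```sql
-- Weight the hypothetical title as A and the hypothetical body as B,
-- then concatenate the two tsvector values into one document vector.
SELECT setweight(to_tsvector('english', 'Fat Rats'), 'A') ||
       setweight(to_tsvector('english', 'ate the cheese'), 'B') AS doc;
```

Ranking functions can then treat matches in the A-weighted part differently from matches in the B-weighted part.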
strip(tsvector)
Description: Removes positions and weights from a tsvector.
Return type: tsvector
Example:
postgres=# SELECT strip('fat:2,4 cat:3 rat:5A'::tsvector);
strip
-------------------
'cat' 'fat' 'rat'
(1 row)
to_tsquery([ config regconfig , ] query text)
Description: Normalizes words and converts them to a tsquery.
Return type: tsquery
Example:
postgres=# SELECT to_tsquery('english', 'The & Fat & Rats');
to_tsquery
---------------
'fat' & 'rat'
(1 row)
to_tsvector([ config regconfig , ] document text)
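Description: Normalizes the document text (dropping stop words and reducing words to lexemes) and converts it to a tsvector.
Return type: tsvector
Example: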
postgres=# SELECT to_tsvector('english', 'The Fat Rats');
to_tsvector
-----------------
'fat':2 'rat':3
(1 row)
ts_headline([ config regconfig, ] document text, query tsquery [, options text ])
Description: Highlights the parts of the document that match the query.
Return type: text
Example:
postgres=# SELECT ts_headline('x y z', 'z'::tsquery);
ts_headline
--------------
x y <b>z</b>
(1 row)
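The optional options argument controls how matches are marked. The following sketch assumes the 'english' configuration and replaces the default <b> and </b> markers through the StartSel and StopSel options; the document text is a hypothetical sample.

```sql
-- Wrap matched words in < and > instead of the default <b> and </b>.
SELECT ts_headline('english',
                   'The fat cats ate the rats',
                   to_tsquery('english', 'cat & rat'),
                   'StartSel = <, StopSel = >');
```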
ts_rank([ weights float4[], ] vector tsvector, query tsquery [, normalization integer ])
Description: Ranks a document for a query.
Return type: float4
Example:
postgres=# SELECT ts_rank('hello world'::tsvector, 'world'::tsquery);
ts_rank
----------
.0607927
(1 row)
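The optional weights and normalization arguments can be sketched as follows, assuming the 'english' configuration. The weights array is ordered {D-weight, C-weight, B-weight, A-weight}, and normalization value 2 divides the rank by the document length. The sample text and the chosen values are illustrative only.

```sql
-- Rank an A-weighted sample document with explicit weights
-- and length normalization (option 2).
SELECT ts_rank('{0.1, 0.2, 0.4, 1.0}'::float4[],
               setweight(to_tsvector('english', 'fat cats ate rats'), 'A'),
               to_tsquery('english', 'cat & rat'),
               2) AS rank;
```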
ts_rank_cd([ weights float4[], ] vector tsvector, query tsquery [, normalization integer ])
Description: Ranks a document for a query using the cover density method.
Return type: float4
Example:
postgres=# SELECT ts_rank_cd('hello world'::tsvector, 'world'::tsquery);
ts_rank_cd
------------
.1
(1 row)
ts_rewrite(query tsquery, target tsquery, substitute tsquery)
Description: Replaces occurrences of target with substitute within the query.
Return type: tsquery
Example:
postgres=# SELECT ts_rewrite('a & b'::tsquery, 'a'::tsquery, 'foo|bar'::tsquery);
ts_rewrite
-------------------------
'b' & ( 'foo' | 'bar' )
(1 row)
ts_rewrite(query tsquery, select text)
Description: Replaces parts of the query using the target and substitute pairs returned by a SELECT command.
Return type: tsquery
Example:
postgres=# SELECT ts_rewrite('world'::tsquery, 'select ''world''::tsquery, ''hello''::tsquery');
ts_rewrite
------------
'hello'
(1 row)
Text Search Debugging Functions
ts_debug([ config regconfig, ] document text, OUT alias text, OUT description text, OUT token text, OUT dictionaries regdictionary[], OUT dictionary regdictionary, OUT lexemes text[])
Description: Tests a text search configuration.
Return type: setof record
Example:
postgres=# SELECT ts_debug('english', 'The Brightest supernovaes');
ts_debug
-----------------------------------------------------------------------------------
(asciiword,"Word, all ASCII",The,{english_stem},english_stem,{})
(blank,"Space symbols"," ",{},,)
(asciiword,"Word, all ASCII",Brightest,{english_stem},english_stem,{brightest})
(blank,"Space symbols"," ",{},,)
(asciiword,"Word, all ASCII",supernovaes,{english_stem},english_stem,{supernova})
(5 rows)
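Calling ts_debug in the FROM clause exposes its OUT parameters as ordinary columns, which is often easier to read than the composite records above. A minimal sketch, assuming the same 'english' configuration:

```sql
-- Show only the token type, the token, and the resulting lexemes,
-- skipping whitespace tokens.
SELECT alias, token, lexemes
FROM ts_debug('english', 'The Brightest supernovaes')
WHERE alias <> 'blank';
```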
ts_lexize(dict regdictionary, token text)
Description: Tests a dictionary.
Return type: text[]
Example:
postgres=# SELECT ts_lexize('english_stem', 'stars');
ts_lexize
-----------
{star}
(1 row)
ts_parse(parser_name text, document text, OUT tokid integer, OUT token text)
Description: Tests a parser, identified by name.
Return type: setof record
Example:
postgres=# SELECT ts_parse('default', 'foo - bar');
ts_parse
-----------
(1,foo)
(12," ")
(12,"- ")
(1,bar)
(4 rows)
ts_parse(parser_oid oid, document text, OUT tokid integer, OUT token text)
Description: Tests a parser, identified by OID.
Return type: setof record
Example:
postgres=# SELECT ts_parse(3722, 'foo - bar');
ts_parse
-----------
(1,foo)
(12," ")
(12,"- ")
(1,bar)
(4 rows)
ts_token_type(parser_name text, OUT tokid integer, OUT alias text, OUT description text)
Description: Returns the token types defined by the parser, identified by name.
Return type: setof record
Example:
postgres=# SELECT ts_token_type('default');
ts_token_type
--------------------------------------------------------------
(1,asciiword,"Word, all ASCII")
(2,word,"Word, all letters")
(3,numword,"Word, letters and digits")
(4,email,"Email address")
(5,url,URL)
(6,host,Host)
(7,sfloat,"Scientific notation")
(8,version,"Version number")
(9,hword_numpart,"Hyphenated word part, letters and digits")
(10,hword_part,"Hyphenated word part, all letters")
(11,hword_asciipart,"Hyphenated word part, all ASCII")
(12,blank,"Space symbols")
(13,tag,"XML tag")
(14,protocol,"Protocol head")
(15,numhword,"Hyphenated word, letters and digits")
(16,asciihword,"Hyphenated word, all ASCII")
(17,hword,"Hyphenated word, all letters")
(18,url_path,"URL path")
(19,file,"File or path name")
(20,float,"Decimal notation")
(21,int,"Signed integer")
(22,uint,"Unsigned integer")
(23,entity,"XML entity")
(23 rows)
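The numeric tokid values returned by ts_parse can be made readable by joining them to ts_token_type. The following is a sketch that assumes the 'default' parser; the aliases p and t are arbitrary.

```sql
-- Label each parsed token with the alias of its token type.
SELECT p.tokid, t.alias, p.token
FROM ts_parse('default', 'foo - bar') AS p
JOIN ts_token_type('default') AS t ON t.tokid = p.tokid;
```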
ts_token_type(parser_oid oid, OUT tokid integer, OUT alias text, OUT description text)
Description: Returns the token types defined by the parser, identified by OID.
Return type: setof record
Example:
postgres=# SELECT ts_token_type(3722);
ts_token_type
--------------------------------------------------------------
(1,asciiword,"Word, all ASCII")
(2,word,"Word, all letters")
(3,numword,"Word, letters and digits")
(4,email,"Email address")
(5,url,URL)
(6,host,Host)
(7,sfloat,"Scientific notation")
(8,version,"Version number")
(9,hword_numpart,"Hyphenated word part, letters and digits")
(10,hword_part,"Hyphenated word part, all letters")
(11,hword_asciipart,"Hyphenated word part, all ASCII")
(12,blank,"Space symbols")
(13,tag,"XML tag")
(14,protocol,"Protocol head")
(15,numhword,"Hyphenated word, letters and digits")
(16,asciihword,"Hyphenated word, all ASCII")
(17,hword,"Hyphenated word, all letters")
(18,url_path,"URL path")
(19,file,"File or path name")
(20,float,"Decimal notation")
(21,int,"Signed integer")
(22,uint,"Unsigned integer")
(23,entity,"XML entity")
(23 rows)
ts_stat(sqlquery text, [ weights text, ] OUT word text, OUT ndoc integer, OUT nentry integer)
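Description: Returns statistics for a tsvector column. For each word, ndoc is the number of documents it occurs in and nentry is its total number of occurrences.
Return type: setof record
Example: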
postgres=# SELECT ts_stat('select ''hello world''::tsvector');
ts_stat
-------------
(world,1,1)
(hello,1,1)
(2 rows)
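To find the most frequent words, ts_stat can be called in the FROM clause and its output sorted. A minimal sketch that reuses the single-row query from the example above; in real use the inner query would select a tsvector column from a table.

```sql
-- List words by total occurrences, then by document count.
SELECT word, ndoc, nentry
FROM ts_stat('select ''hello world''::tsvector')
ORDER BY nentry DESC, ndoc DESC, word;
```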