Text Indexes
Atlas Full-Text Search
MongoDB Atlas Full-Text Search Indexes leverage Apache Lucene to power rich text search with features like language analysis and scoring.
Visit Atlas Full-Text Search to learn more. You can use the Atlas promotional code MONGODB4DOT2 for $200 of Atlas credit. For information on redeeming Atlas credit, see Atlas Billing.
Overview
MongoDB provides text indexes to support text search queries on string content. text indexes can include any field whose value is a string or an array of string elements.
Versions
text Index Version | Description |
---|---|
Version 3 | MongoDB 3.2 introduces version 3 of the text index. Version 3 is the default version of text indexes created in MongoDB 3.2 and later. |
Version 2 | MongoDB 2.6 introduces version 2 of the text index. Version 2 is the default version of text indexes created in the MongoDB 2.6 and 3.0 series. |
Version 1 | MongoDB 2.4 introduces version 1 of the text index. MongoDB 2.4 supports only version 1. |
To override the default version and specify a different version, include the option { "textIndexVersion": <version> } when creating the index.
Create Text Index
Important
A collection can have at most one text index.
To create a text index, use the db.collection.createIndex() method. To index a field that contains a string or an array of string elements, include the field and specify the string literal "text" in the index document, as in the following example:
    db.reviews.createIndex( { comments: "text" } )
You can index multiple fields for the text index. The following example creates a text index on the fields subject and comments:
    db.reviews.createIndex(
       {
         subject: "text",
         comments: "text"
       }
    )
A compound index can include text index keys in combination with ascending/descending index keys. For more information, see Compound Index.
To drop a text index, use the index name. See Use the Index Name to Drop a text Index for more information.
Specify Weights
For a text index, the weight of an indexed field denotes the significance of the field relative to the other indexed fields in terms of the text search score.
For each indexed field in the document, MongoDB multiplies the number of matches by the weight and sums the results. Using this sum, MongoDB then calculates the score for the document. See the $meta operator for details on returning and sorting by text scores.
The default weight is 1 for the indexed fields. To adjust the weights for the indexed fields, include the weights option in the db.collection.createIndex() method.
For more information on using weights to control the results of a text search, see Control Search Results with Weights.
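The weighted-sum step described above can be sketched in plain JavaScript. This is a simplified model under stated assumptions: the `weights` object, the field names, and the whitespace-only word matching are hypothetical, and MongoDB applies further calculation to the sum before it returns a score, so these numbers are not what `$meta` reports.

```javascript
// Hypothetical weights, as they might be passed in the `weights` option of
// db.collection.createIndex(); indexed fields not listed default to 1.
const weights = { subject: 10, comments: 2 };

// Count words in `text` equal to a search term. Naive model: lowercase and
// split on whitespace only; no stemming, stop words, or punctuation handling.
function countMatches(text, terms) {
  const wanted = new Set(terms.map((t) => t.toLowerCase()));
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => wanted.has(word)).length;
}

// Weighted sum: for each indexed field, matches * weight, then add them up.
function weightedSum(doc, indexedFields, terms) {
  let sum = 0;
  for (const field of indexedFields) {
    const value = doc[field];
    if (typeof value !== "string") continue; // skip missing or non-string fields
    const weight = weights[field] ?? 1; // default weight is 1
    sum += countMatches(value, terms) * weight;
  }
  return sum;
}

const review = { subject: "coffee review", comments: "great coffee and strong coffee" };
console.log(weightedSum(review, ["subject", "comments"], ["coffee"])); // 14
```

With a weight of 10 on subject and 2 on comments, one match in subject and two in comments give 10 + 4 = 14, so a match in subject counts five times as much as a match in comments.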
Wildcard Text Indexes
Note
Wildcard Text Indexes are distinct from Wildcard Indexes. Wildcard indexes cannot support queries using the $text operator.
While Wildcard Text Indexes and Wildcard Indexes share the wildcard $** field pattern, they are distinct index types. Only Wildcard Text Indexes support the $text operator.
When creating a text index on multiple fields, you can also use the wildcard specifier ($**). With a wildcard text index, MongoDB indexes every field that contains string data for each document in the collection. The following example creates a text index using the wildcard specifier:
    db.collection.createIndex( { "$**": "text" } )
This index allows for text search on all fields with string content. Such an index can be useful with highly unstructured data if it is unclear which fields to include in the text index, or for ad-hoc querying.
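To picture which fields a wildcard text index draws from, the sketch below walks a document and collects the dotted path of every string value, including strings inside arrays and embedded documents. The function name and sample document are hypothetical, and this only approximates the "every field that contains string data" rule; it is not how MongoDB builds the index.

```javascript
// Illustrative only: collect the dotted paths of all string values in a
// document, roughly the fields a { "$**": "text" } index would draw terms from.
function stringFieldPaths(value, prefix = "") {
  const paths = [];
  if (typeof value === "string") {
    paths.push(prefix);
  } else if (Array.isArray(value)) {
    // Array elements share the array field's path (no positional index).
    for (const item of value) paths.push(...stringFieldPaths(item, prefix));
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      const path = prefix ? `${prefix}.${key}` : key;
      paths.push(...stringFieldPaths(child, path));
    }
  }
  return [...new Set(paths)]; // drop duplicate paths from arrays
}

// Hypothetical document with strings at several levels.
const doc = {
  _id: 1,
  title: "Quarterly report",
  tags: ["finance", "internal"],
  author: { name: "Ana", age: 41 },
  notes: null,
};

console.log(stringFieldPaths(doc)); // [ 'title', 'tags', 'author.name' ]
```

Numeric and null fields (_id, author.age, notes) contribute nothing, which is why such an index suits loosely structured data without requiring a field list up front.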
Wildcard text indexes are text indexes on multiple fields. As such, you can assign weights to specific fields during index creation to control the ranking of the results. For more information on using weights to control the results of a text search, see Control Search Results with Weights.
Wildcard text indexes, as with all text indexes, can be part of a compound index. For example, the following creates a compound index on the field a as well as the wildcard specifier:
    db.collection.createIndex( { a: 1, "$**": "text" } )
As with all compound text indexes, since a precedes the text index key, the query predicate must include an equality match condition on a in order to perform a $text search with this index. For information on compound text indexes, see Compound Text Indexes.
Case Insensitivity
Changed in version 3.2.
The version 3 text index supports the common C, simple S, and, for Turkish languages, the special T case foldings as specified in Unicode 8.0 Character Database Case Folding.
The case foldings expand the case insensitivity of the text index to include characters with diacritics, such as é and É, and characters from non-Latin alphabets, such as “И” and “и” in the Cyrillic alphabet.
Version 3 of the text index is also diacritic insensitive. As such, the index also does not distinguish between é, É, e, and E.
Previous versions of the text index are case insensitive only for the Latin letters A to Z without diacritics ([A-Za-z]). For all other characters, earlier versions of the text index treat uppercase and lowercase forms as distinct.
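The difference between the two behaviors can be sketched as follows. `legacyFold` folds only ASCII letters, as earlier versions do; `v3Fold` uses JavaScript's `toLowerCase()` as an approximation of Unicode case folding, which JavaScript does not expose directly. Both function names are hypothetical, and the sketch ignores the Turkish T folding.

```javascript
// Earlier text index versions (modeled): only ASCII letters A-Z are folded.
function legacyFold(s) {
  return s.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

// Version 3 (approximated): toLowerCase() is close to, but not identical to,
// Unicode common (C) + simple (S) case folding.
function v3Fold(s) {
  return s.toLowerCase();
}

for (const [a, b] of [["Coffee", "coffee"], ["É", "é"], ["И", "и"]]) {
  console.log(a, b, "legacy:", legacyFold(a) === legacyFold(b), "v3:", v3Fold(a) === v3Fold(b));
}
// Coffee coffee legacy: true v3: true
// É é legacy: false v3: true
// И и legacy: false v3: true
```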
Diacritic Insensitivity
Changed in version 3.2.
With version 3, the text index is diacritic insensitive. That is, the index does not distinguish between characters that contain diacritical marks and their unmarked counterparts, such as é, ê, and e. More specifically, the text index strips the characters categorized as diacritics in Unicode 8.0 Character Database Prop List.
Version 3 of the text index is also case insensitive to characters with diacritics. As such, the index also does not distinguish between é, É, e, and E.
Previous versions of the text index treat characters with diacritics as distinct.
Tokenization Delimiters
Changed in version 3.2.
For tokenization, the version 3 text index uses the delimiters categorized under Dash, Hyphen, Pattern_Syntax, Quotation_Mark, Terminal_Punctuation, and White_Space in Unicode 8.0 Character Database Prop List.
For example, if given the string "Il a dit qu'il «était le meilleur joueur du monde»", the text index treats «, », and spaces as delimiters.
Previous versions of the index treat « as part of the term "«était" and » as part of the term "monde»".
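The delimiter rule can be approximated with a Unicode property regular expression. This sketch is a model under assumptions, not MongoDB's tokenizer: it omits `Hyphen`, which JavaScript regular expressions do not expose as a property (many hyphen characters are also classified as `Dash`), and JavaScript's Unicode tables may differ slightly from Unicode 8.0.

```javascript
// Split on runs of code points in the delimiter categories listed above,
// minus Hyphen (not available as a JavaScript regex property).
const DELIMITERS =
  /[\p{Dash}\p{Pattern_Syntax}\p{Quotation_Mark}\p{Terminal_Punctuation}\p{White_Space}]+/u;

function tokenize(text) {
  // split() can yield empty strings at the edges; drop them.
  return text.split(DELIMITERS).filter((token) => token.length > 0);
}

console.log(tokenize("Il a dit qu'il «était le meilleur joueur du monde»"));
```

For the sample sentence, « and » are split off, so the tokens include était and monde rather than «était and monde». Because the ASCII apostrophe is classified as a Quotation_Mark, this sketch also splits qu'il into qu and il.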
Index Entries
The text index tokenizes and stems the terms in the indexed fields for the index entries. The text index stores one index entry for each unique stemmed term in each indexed field for each document in the collection. The index uses simple language-specific suffix stemming.
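The "one entry per unique stemmed term, per field, per document" rule can be sketched as follows. The `toyStem` function is a hypothetical placeholder that only strips a trailing s; it is not MongoDB's stemmer. The sketch also skips stop-word removal and handles only string fields, not arrays of strings.

```javascript
// Hypothetical stand-in for language-specific suffix stemming: strips a
// trailing "s" from longer words. Far cruder than a real stemmer.
function toyStem(word) {
  return word.length > 3 && word.endsWith("s") ? word.slice(0, -1) : word;
}

// Build one { field, term, docId } entry per unique stemmed term per field
// per document.
function buildEntries(docs, fields) {
  const entries = [];
  for (const doc of docs) {
    for (const field of fields) {
      if (typeof doc[field] !== "string") continue;
      const words = doc[field].toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
      const terms = new Set(words.map(toyStem)); // repeated terms collapse here
      for (const term of terms) entries.push({ field, term, docId: doc._id });
    }
  }
  return entries;
}

const docs = [{ _id: 1, subject: "Cats", comments: "cats chase mice, cats nap" }];
console.log(buildEntries(docs, ["subject", "comments"]).length); // 5
```

Here cats appears twice in comments but yields a single cat entry for that field, while the cat in subject gets its own entry, for five entries in total. The same counting explains why text indexes grow with the vocabulary of each field rather than with the raw word count.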
Supported Languages and Stop Words
MongoDB supports text search for various languages. text indexes drop language-specific stop words (e.g. in English, the, an, a, and, etc.) and use simple language-specific suffix stemming. For a list of the supported languages, see Text Search Languages.
If you specify a language value of "none", then the text index uses simple tokenization with no list of stop words and no stemming.
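The effect of the language setting can be sketched with a hypothetical `analyze` function. The four-word English stop-word list and the trailing-s stemming are placeholders for illustration only; MongoDB's actual stop-word lists and stemmers are far more complete. With "none", the sketch keeps every token as-is, mirroring the rule above.

```javascript
// Tiny, hypothetical stop-word list; real lists are much longer.
const STOP_WORDS = { english: new Set(["the", "an", "a", "and"]) };

function analyze(text, language) {
  const words = text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
  if (language === "none") return words; // simple tokenization only
  const stops = STOP_WORDS[language] ?? new Set(); // unknown language: no stop words in this sketch
  return words
    .filter((w) => !stops.has(w)) // drop stop words
    .map((w) => (w.length > 3 && w.endsWith("s") ? w.slice(0, -1) : w)); // toy suffix stemming
}

console.log(analyze("The cats and the dogs", "english")); // [ 'cat', 'dog' ]
console.log(analyze("The cats and the dogs", "none")); // [ 'the', 'cats', 'and', 'the', 'dogs' ]
```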
To specify a language for the text index, see Specify a Language for Text Index.
sparse Property
text indexes are always sparse and ignore the sparse option. If a document lacks a text index field (or the field is null or an empty array), MongoDB does not add an entry for the document to the text index. For inserts, MongoDB inserts the document but does not add it to the text index.
For a compound index that includes a text index key along with keys of other types, only the text index field determines whether the index references a document. The other keys have no effect on whether the index references the document.
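The rule above can be modeled with a hypothetical `hasTextIndexEntry` helper. It encodes only the three cases named here (field missing, null, or an empty array) and treats any other value as producing an entry, which is a simplification. The non-text key `category` plays no part in the decision, as in a compound index.

```javascript
// Hypothetical model of the sparse rule: a document is referenced only if
// some text-indexed field is present, non-null, and not an empty array.
// Any other value is assumed to produce an entry (simplification).
function hasTextIndexEntry(doc, textFields) {
  return textFields.some((field) => {
    const value = doc[field];
    if (value === undefined || value === null) return false;
    if (Array.isArray(value)) return value.length > 0;
    return true;
  });
}

// Modeled compound index { category: 1, comments: "text" }:
// only "comments" decides; "category" is ignored.
const docs = [
  { _id: 1, category: "books", comments: "great read" },
  { _id: 2, category: "books" }, // comments missing
  { _id: 3, category: "books", comments: null }, // comments is null
  { _id: 4, category: "books", comments: [] }, // comments is an empty array
];

const referenced = docs.filter((d) => hasTextIndexEntry(d, ["comments"])).map((d) => d._id);
console.log(referenced); // [ 1 ]
```

All four documents would still be stored in the collection; only document 1 would be reachable through the text index.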
Restrictions
One Text Index Per Collection
A collection can have at most one text index.
Text Search and Hints
You cannot use hint() if the query includes a $text query expression.
Text Index and Sort
Sort operations cannot obtain sort order from a text index, even from a compound text index; i.e. sort operations cannot use the ordering in the text index.
Compound Index
A compound index can include a text index key in combination with ascending/descending index keys. However, these compound indexes have the following restrictions:
- A compound text index cannot include any other special index types, such as multi-key or geospatial index fields.
- If the compound text index includes keys preceding the text index key, to perform a $text search, the query predicate must include equality match conditions on the preceding keys.
- When creating a compound text index, all text index keys must be listed adjacently in the index specification document.
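Some of these restrictions can be checked mechanically against a key pattern. The helper names below are hypothetical and the checks are a sketch, not a replica of the server's validation: multikey status depends on the stored data and is not checked, only `2d` and `2dsphere` are treated as geospatial, and only plain values (not operator documents such as `{ $gt: 5 }`) count as equality matches. Key order follows the insertion order of the key pattern object.

```javascript
// Hypothetical spec checker for a compound text index key pattern.
function checkCompoundTextSpec(keys) {
  const entries = Object.entries(keys);
  const errors = [];
  // Geospatial keys (only "2d"/"2dsphere" modeled) cannot join text keys.
  if (entries.some(([, type]) => type === "2d" || type === "2dsphere")) {
    errors.push("geospatial keys cannot be combined with text keys");
  }
  // All "text" keys must form one contiguous run.
  const textPositions = entries
    .map(([, type], i) => (type === "text" ? i : -1))
    .filter((i) => i >= 0);
  const first = textPositions[0];
  const last = textPositions[textPositions.length - 1];
  if (textPositions.length > 0 && last - first + 1 !== textPositions.length) {
    errors.push("text keys must be listed adjacently");
  }
  return errors;
}

// A plain value (or an embedded document without $-operators) counts as an
// equality match; an operator document such as { $gt: 5 } does not.
function isEquality(value) {
  if (value === undefined) return false;
  if (value === null || typeof value !== "object" || Array.isArray(value)) return true;
  return !Object.keys(value).some((k) => k.startsWith("$"));
}

// Return the keys preceding the first text key that the query filter
// does not match by equality.
function missingEqualityPrefix(keys, filter) {
  const prefix = [];
  for (const [field, type] of Object.entries(keys)) {
    if (type === "text") break;
    prefix.push(field);
  }
  return prefix.filter((field) => !isEquality(filter[field]));
}

console.log(checkCompoundTextSpec({ a: 1, subject: "text", comments: "text" })); // []
console.log(checkCompoundTextSpec({ subject: "text", a: 1, comments: "text" })); // [ 'text keys must be listed adjacently' ]
console.log(missingEqualityPrefix({ a: 1, "$**": "text" }, { a: { $gt: 5 } })); // [ 'a' ]
```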
See also Text Index and Sort for additional limitations.
For an example of a compound text index, see Limit the Number of Entries Scanned.
Drop a Text Index
To drop a text index, pass the name of the index to the db.collection.dropIndex() method. To get the name of the index, run the db.collection.getIndexes() method.
For information on the default naming scheme for text indexes as well as overriding the default name, see Specify Name for text Index.
Collation Option
text indexes only support simple binary comparison and do not support collation.
To create a text index on a collection that has a non-simple collation, you must explicitly specify {collation: {locale: "simple"}} when creating the index.
Storage Requirements and Performance Costs
text indexes have the following storage requirements and performance costs:
- text indexes can be large. They contain one index entry for each unique post-stemmed word in each indexed field for each document inserted.
- Building a text index is very similar to building a large multi-key index and will take longer than building a simple ordered (scalar) index on the same data.
- When building a large text index on an existing collection, ensure that you have a sufficiently high limit on open file descriptors. See the recommended settings.
- text indexes will impact insertion throughput because MongoDB must add an index entry for each unique post-stemmed word in each indexed field of each new source document.
- Additionally, text indexes do not store phrases or information about the proximity of words in the documents. As a result, phrase queries will run much more effectively when the entire collection fits in RAM.
Text Search Support
The text index supports $text query operations. For examples of text search, see the $text reference page. For examples of $text operations in aggregation pipelines, see Text Search in the Aggregation Pipeline.