Text Indexes

Atlas Full-Text Search

MongoDB Atlas Full-Text Search Indexes leverage Apache Lucene to power rich text search with features like language analysis and scoring.

Visit Atlas Full-Text Search to learn more. You can use the Atlas promotional code MONGODB4DOT2 for $200 of Atlas credit. For information on redeeming Atlas credit, see Atlas Billing.

Overview

MongoDB provides text indexes to support text search queries on string content. text indexes can include any field whose value is a string or an array of string elements.

Versions

`text` Index Version	Description
Version 3	MongoDB 3.2 introduces version 3 of the `text` index. Version 3 is the default version of `text` indexes created in MongoDB 3.2 and later.
Version 2	MongoDB 2.6 introduces version 2 of the `text` index. Version 2 is the default version of `text` indexes created in the MongoDB 2.6 and 3.0 series.
Version 1	MongoDB 2.4 introduces version 1 of the `text` index. MongoDB 2.4 supports only version `1`.

To override the default version and specify a different version, include the option { "textIndexVersion": <version> } when creating the index.
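For example, the following sketch pins a new index on the `reviews` collection (used in the examples on this page) to version 2. The call is wrapped in a function that receives the mongo shell's `db` handle only so the snippet can be exercised outside a live shell; in an interactive session you would run the inner `createIndex` call directly.

```javascript
// Create a text index on `comments`, overriding the default
// textIndexVersion (3 in MongoDB 3.2 and later) with version 2.
// `db` is assumed to be a mongo shell database handle.
function createVersion2TextIndex(db) {
  return db.reviews.createIndex(
    { comments: "text" },
    { textIndexVersion: 2 }
  );
}
```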

Create Text Index

Important

A collection can have at most one text index.

To create a text index, use the db.collection.createIndex() method. To index a field that contains a string or an array of string elements, include the field and specify the string literal "text" in the index document, as in the following example:

db.reviews.createIndex( { comments: "text" } )

You can index multiple fields for the text index. The following example creates a text index on the fields subject and comments:

db.reviews.createIndex(
   {
     subject: "text",
     comments: "text"
   }
)

A compound index can include text index keys in combination with ascending/descending index keys. For more information, see Compound Index.

To drop a text index, use the index name. See Use the Index Name to Drop a text Index for more information.

Specify Weights

For a text index, the weight of an indexed field denotes the significance of the field relative to the other indexed fields in terms of the text search score.

For each indexed field in the document, MongoDB multiplies the number of matches by the weight and sums the results. Using this sum, MongoDB then calculates the score for the document. See the $meta operator for details on returning and sorting by text scores.
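The weighted sum described above can be sketched in plain JavaScript. This models only that first step (matches multiplied by weight, summed over fields); how MongoDB then turns the sum into the final score is not reproduced here. The per-field match counts are supplied by hand, and `weightedMatchSum` is a hypothetical helper.

```javascript
// Sum of (match count * field weight) over the indexed fields.
// Fields without an explicit weight default to 1, as for text indexes.
// matchCounts: { fieldName: numberOfMatches } (hypothetical input).
function weightedMatchSum(matchCounts, weights = {}) {
  let sum = 0;
  for (const [field, count] of Object.entries(matchCounts)) {
    const weight = weights[field] ?? 1;
    sum += count * weight;
  }
  return sum;
}

// 2 matches in `content` (weight 10), 1 in `keywords` (weight 5),
// 3 in `about` (default weight 1): 2*10 + 1*5 + 3*1 = 28.
const sum = weightedMatchSum(
  { content: 2, keywords: 1, about: 3 },
  { content: 10, keywords: 5 }
);
```

Raising a field's weight therefore scales its contribution linearly, which is why a single match in a heavily weighted field can outrank several matches in a default-weight field.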

The default weight is 1 for the indexed fields. To adjust the weights for the indexed fields, include the weights option in the db.collection.createIndex() method.
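As a sketch of the weights option, the following creates a text index over three fields where matches in `content` count ten times and matches in `keywords` five times as much as matches in `about`. The `blog` collection, the field names, and the index name are illustrative; `db` is assumed to be a mongo shell database handle passed in so the call can be tested with a stand-in object.

```javascript
// Text index over three fields with explicit weights. `about` keeps
// the default weight of 1; `content` and `keywords` are boosted.
function createWeightedTextIndex(db) {
  return db.blog.createIndex(
    { content: "text", keywords: "text", about: "text" },
    { weights: { content: 10, keywords: 5 }, name: "TextIndex" }
  );
}
```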

For more information on using weights to control the results of a text search, see Control Search Results with Weights.

Wildcard Text Indexes

Note

Wildcard Text Indexes are distinct from Wildcard Indexes. Although both share the wildcard $** field pattern, they are different index types: only Wildcard Text Indexes support the $text operator, and Wildcard Indexes cannot support queries that use $text.

When creating a text index on multiple fields, you can also use the wildcard specifier ($**). With a wildcard text index, MongoDB indexes every field that contains string data for each document in the collection. The following example creates a text index using the wildcard specifier:

db.collection.createIndex( { "$**": "text" } )

This index allows for text search on all fields with string content. Such an index can be useful with highly unstructured data if it is unclear which fields to include in the text index or for ad-hoc querying.

Wildcard text indexes are text indexes on multiple fields. As such, you can assign weights to specific fields during index creation to control the ranking of the results. For more information on using weights to control the results of a text search, see Control Search Results with Weights.
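A minimal sketch of combining the wildcard specifier with weights: every string field is indexed, but matches in one field (here a hypothetical `subject` field with weight 5) contribute more to the score. `db.collection` follows the placeholder used in this section, and `db` is assumed to be a mongo shell handle passed in for testability.

```javascript
// Wildcard text index over all string fields, with `subject`
// weighted above the default of 1 used for every other field.
function createWeightedWildcardTextIndex(db) {
  return db.collection.createIndex(
    { "$**": "text" },
    { weights: { subject: 5 } }
  );
}
```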

Wildcard text indexes, as with all text indexes, can be part of a compound index. For example, the following creates a compound index on the field a as well as the wildcard specifier:

db.collection.createIndex( { a: 1, "$**": "text" } )

As with all compound text indexes, since the key a precedes the text index key, the query predicate must include an equality match condition on a in order to perform a $text search with this index. For information on compound text indexes, see Compound Text Indexes.
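For instance, a query that can use the { a: 1, "$**": "text" } index pairs an equality match on a with the $text condition. In this sketch the value of a and the search string are caller-supplied placeholders, and `db` is assumed to be a mongo shell handle.

```javascript
// Equality on the prefix key `a` plus a $text search, which is the
// predicate shape the compound text index requires.
function searchWithinA(db, aValue, terms) {
  return db.collection.find({ a: aValue, $text: { $search: terms } });
}
```

A query with only the $text condition, or with a range condition such as { a: { $gt: 1 } }, does not meet the equality requirement described above.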

Case Insensitivity

Changed in version 3.2.

The version 3 text index supports the common C, simple S, and for Turkish languages, the special T case foldings as specified in Unicode 8.0 Character Database Case Folding.

This case folding expands the case insensitivity of the text index to include characters with diacritics, such as é and É, and characters from non-Latin alphabets, such as “И” and “и” in the Cyrillic alphabet.

Version 3 of the text index is also diacritic insensitive. As such, the index also does not distinguish between é, É, e, and E.

Previous versions of the text index are case insensitive for [A-Za-z] only; that is, they are case insensitive only for Latin characters without diacritics. For all other characters, earlier versions of the text index treat uppercase and lowercase forms as distinct.
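The difference between the two behaviors can be approximated in JavaScript. This is an illustration under stated assumptions, not MongoDB's implementation: `toLowerCase()` stands in for Unicode C and S case folding (the two differ for some characters, and the Turkish T folding is not modeled), diacritic insensitivity is ignored, and `foldV3` and `foldLegacy` are hypothetical names.

```javascript
// Approximates version 3: Unicode-aware lowercasing also covers
// accented Latin letters and other alphabets such as Cyrillic.
function foldV3(s) {
  return s.toLowerCase();
}

// Approximates earlier versions: only the ASCII letters A-Z are
// folded; every other character keeps its original case.
function foldLegacy(s) {
  return s.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

console.log(foldV3("É") === foldV3("é"));         // true
console.log(foldLegacy("É") === foldLegacy("é")); // false
console.log(foldV3("И") === foldV3("и"));         // true
console.log(foldLegacy("И") === foldLegacy("и")); // false
```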

Diacritic Insensitivity

Changed in version 3.2.

With version 3, the text index is diacritic insensitive. That is, the index does not distinguish between characters that contain diacritical marks and their unmarked counterparts, such as é, ê, and e. More specifically, the text index strips the characters categorized as diacritics in Unicode 8.0 Character Database Prop List.

Version 3 of the text index is also case insensitive to characters with diacritics. As such, the index also does not distinguish between é, É, e, and E.

Previous versions of the text index treat characters with diacritics as distinct.
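Diacritic stripping can be approximated with JavaScript's Unicode property escapes. The sketch assumes an NFD decomposition step so that precomposed letters such as é split into a base letter plus a combining mark; that step, the final lowercasing, and the name `normalizeV3` are assumptions for illustration, and MongoDB's own implementation may differ in details.

```javascript
// Decompose to NFD so accents become separate code points, drop every
// code point with the Unicode Diacritic property, then lowercase.
function normalizeV3(s) {
  return s.normalize("NFD").replace(/\p{Diacritic}/gu, "").toLowerCase();
}

const forms = ["é", "ê", "e", "É", "E"].map(normalizeV3);
console.log(forms); // → [ 'e', 'e', 'e', 'e', 'e' ]
```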

Tokenization Delimiters

Changed in version 3.2.

For tokenization, the version 3 text index uses the delimiters categorized under Dash, Hyphen, Pattern_Syntax, Quotation_Mark, Terminal_Punctuation, and White_Space in Unicode 8.0 Character Database Prop List.

For example, given the string "Il a dit qu'il «était le meilleur joueur du monde»", the text index treats «, », and spaces as delimiters.

Previous versions of the index treat « as part of the term "«était" and » as part of the term "monde»".
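The delimiter rule can be sketched with a regular expression built from the same Unicode properties. Two assumptions apply: JavaScript has no property escape for the deprecated Hyphen property, so it is left out (most hyphen characters are also in Dash), and `tokenizeV3` is a hypothetical helper, not MongoDB's tokenizer.

```javascript
// Split on runs of Dash, Pattern_Syntax, Quotation_Mark,
// Terminal_Punctuation, and White_Space characters.
const DELIMITERS =
  /[\p{Dash}\p{Pattern_Syntax}\p{Quotation_Mark}\p{Terminal_Punctuation}\p{White_Space}]+/u;

function tokenizeV3(text) {
  // Splitting leaves an empty string after a trailing delimiter; drop it.
  return text.split(DELIMITERS).filter((t) => t.length > 0);
}

const tokens = tokenizeV3("Il a dit qu'il «était le meilleur joueur du monde»");
```

With this rule « and » are delimiters, so "était" and "monde" come out without the guillemets. The apostrophe (U+0027) is also listed under Quotation_Mark, so the sketch splits "qu'il" into "qu" and "il" as well.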

Index Entries

The text index tokenizes and stems the terms in the indexed fields to produce the index entries. It stores one index entry for each unique stemmed term in each indexed field for each document in the collection. The index uses simple language-specific suffix stemming.
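To make the "one entry per unique stemmed term per field per document" rule concrete, here is a toy model. The whitespace tokenizer and the strip-a-trailing-s stemmer are deliberately simplistic stand-ins, not MongoDB's language-specific stemming, and `toyStem` and `indexEntries` are hypothetical names.

```javascript
// Toy stemmer: strips a trailing "s" from words longer than 3 letters.
function toyStem(term) {
  return term.length > 3 && term.endsWith("s") ? term.slice(0, -1) : term;
}

// Collects one { field, term } entry per unique stemmed term in each
// indexed field. String values and arrays of strings are accepted.
function indexEntries(doc, fields) {
  const entries = [];
  for (const field of fields) {
    const raw = doc[field];
    const values = (Array.isArray(raw) ? raw : [raw])
      .filter((v) => typeof v === "string");
    const terms = values
      .flatMap((v) => v.toLowerCase().split(/\s+/))
      .filter(Boolean)
      .map(toyStem);
    for (const term of new Set(terms)) {
      entries.push({ field, term });
    }
  }
  return entries;
}

// subject -> coffee, shop; comments -> coffee, shop: 4 entries in total,
// even though "coffee" appears twice in `comments`.
const entries = indexEntries(
  { subject: "coffee shops", comments: "coffee shop coffee" },
  ["subject", "comments"]
);
```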

Supported Languages and Stop Words

MongoDB supports text search for various languages. text indexes drop language-specific stop words (e.g. in English, the, an, a, and, etc.) and use simple language-specific suffix stemming. For a list of the supported languages, see Text Search Languages.

If you specify a language value of "none", then the text index uses simple tokenization with no list of stop words and no stemming.
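One way to request that behavior for a whole index is the default_language option of createIndex. In this sketch the `quotes` collection and `content` field are illustrative, and `db` is assumed to be a mongo shell database handle.

```javascript
// Text index with default_language "none": simple tokenization,
// no stop-word list, and no stemming.
function createNoLanguageTextIndex(db) {
  return db.quotes.createIndex(
    { content: "text" },
    { default_language: "none" }
  );
}
```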

To specify a language for the text index, see Specify a Language for Text Index.

sparse Property

text indexes are always sparse and ignore the sparse option. If a document lacks a text index field (or the field is null or an empty array), MongoDB does not add an entry for the document to the text index. For inserts, MongoDB inserts the document but does not add an entry for it to the text index.

For a compound index that includes a text index key along with keys of other types, only the text index field determines whether the index references a document. The other keys do not affect whether the index references the document.
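The sparse rule can be expressed as a small predicate. This is a sketch of the rule as stated above for a single text-indexed field; `hasTextIndexEntry` and its `textField` parameter are hypothetical, and non-text keys of a compound index are intentionally ignored because they do not affect the outcome.

```javascript
// Returns false when the text-indexed field is missing, null, or an
// empty array, the cases in which no text index entry is created.
function hasTextIndexEntry(doc, textField) {
  const value = doc[textField];
  if (value === undefined || value === null) return false;
  if (Array.isArray(value) && value.length === 0) return false;
  return true;
}
```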

Restrictions

One Text Index Per Collection

A collection can have at most one text index.

Text Search and Hints

You cannot use hint() if the query includes a $text query expression.

Text Index and Sort

Sort operations cannot obtain sort order from a text index, even from a compound text index; i.e. sort operations cannot use the ordering in the text index.

Compound Index

A compound index can include a text index key in combination with ascending/descending index keys. However, these compound indexes have the following restrictions:

A compound text index cannot include any other special index types, such as multi-key or geospatial index fields.
If the compound text index includes keys preceding the text index key, to perform a $text search, the query predicate must include equality match conditions on the preceding keys.
When creating a compound text index, all text index keys must be listed adjacently in the index specification document.
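The last two restrictions can be checked mechanically from a key pattern. The helper below is hypothetical and only a sketch: it reports whether the "text" keys are adjacent and which preceding keys a $text query must match by equality, and it does not check the multi-key or geospatial restriction.

```javascript
// Inspects a compound index key pattern such as
// { a: 1, subject: "text", comments: "text" }.
function analyzeCompoundTextKeys(keyPattern) {
  const keys = Object.keys(keyPattern);
  const textPositions = keys
    .map((k, i) => (keyPattern[k] === "text" ? i : -1))
    .filter((i) => i >= 0);
  if (textPositions.length === 0) {
    throw new Error("no text key in index specification");
  }
  const first = textPositions[0];
  const last = textPositions[textPositions.length - 1];
  return {
    // All text keys are adjacent if they fill a contiguous range.
    adjacent: last - first + 1 === textPositions.length,
    // Keys before the first text key need equality matches in $text queries.
    equalityPrefix: keys.slice(0, first),
  };
}
```

For example, { a: 1, subject: "text", comments: "text" } passes with equality prefix ["a"], while { subject: "text", a: 1, comments: "text" } fails the adjacency rule.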

See also Text Index and Sort for additional limitations.

For an example of a compound text index, see Limit the Number of Entries Scanned.

Drop a Text Index

To drop a text index, pass the name of the index to the db.collection.dropIndex() method. To get the name of the index, run the db.collection.getIndexes() method.

For information on the default naming scheme for text indexes as well as overriding the default name, see Specify Name for text Index.
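When no name is given, MongoDB builds the index name by joining each key with its value using underscores, so { comments: "text" } becomes comments_text. The sketch below reproduces that convention and drops the two-field index from the earlier `reviews` example. `defaultIndexName` and `dropReviewsTextIndex` are hypothetical helpers, `db` is assumed to be a mongo shell handle, and db.collection.getIndexes() remains the reliable way to confirm the name, especially if a custom name was used.

```javascript
// Default index name: each "field_value" pair joined with "_".
function defaultIndexName(keyPattern) {
  return Object.entries(keyPattern)
    .map(([field, value]) => `${field}_${value}`)
    .join("_");
}

// Drops the { subject: "text", comments: "text" } index on `reviews`
// by its default name, "subject_text_comments_text".
function dropReviewsTextIndex(db) {
  const name = defaultIndexName({ subject: "text", comments: "text" });
  return db.reviews.dropIndex(name);
}
```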

Collation Option

text indexes only support simple binary comparison and do not support collation.

To create a text index on a collection that has a non-simple collation, you must explicitly specify {collation: {locale: "simple"}} when creating the index.
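A minimal sketch of that call, assuming the `reviews` collection was created with a non-simple default collation and that `db` is a mongo shell database handle passed in for testability:

```javascript
// Request the simple (binary) collation explicitly so the text index
// can be built on a collection whose default collation is non-simple.
function createTextIndexOnCollatedCollection(db) {
  return db.reviews.createIndex(
    { comments: "text" },
    { collation: { locale: "simple" } }
  );
}
```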

Storage Requirements and Performance Costs

text indexes have the following storage requirements and performance costs:

text indexes can be large. They contain one index entry for each unique post-stemmed word in each indexed field for each document inserted.
Building a text index is very similar to building a large multi-key index and will take longer than building a simple ordered (scalar) index on the same data.
When building a large text index on an existing collection, ensure that you have a sufficiently high limit on open file descriptors. See the recommended settings.
text indexes will impact insertion throughput because MongoDB must add an index entry for each unique post-stemmed word in each indexed field of each new source document.
Additionally, text indexes do not store phrases or information about the proximity of words in the documents. As a result, phrase queries will run much more effectively when the entire collection fits in RAM.

Text Search Support

The text index supports $text query operations. For examples of text search, see the $text reference page. For examples of $text operations in aggregation pipelines, see Text Search in the Aggregation Pipeline.
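As a brief illustration, the following sketch runs a $text query against the `reviews` collection from the earlier examples, projects each document's text score with $meta, and sorts by that score. The search terms are supplied by the caller, and `db` is assumed to be a mongo shell database handle passed in so the call can be tested with a stand-in object.

```javascript
// $text search that returns each document's relevance as `score`
// and orders results by that score (highest first).
function searchReviewsByScore(db, terms) {
  return db.reviews
    .find({ $text: { $search: terms } }, { score: { $meta: "textScore" } })
    .sort({ score: { $meta: "textScore" } });
}
```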