Specify a Language for Text Index
This tutorial describes how to specify the default languageassociated with the text indexand also how to create text indexes for collections that containdocuments in different languages.
Specify the Default Language for a text Index
The default language associated with the indexed data determines therules to parse word roots (i.e. stemming) and ignore stop words. Thedefault language for the indexed data is english
.
To specify a different language, use the default_language
optionwhen creating the text
index. See Text Search Languages forthe languages available for default_language
.
The following example creates for the quotes
collection a text
index on the content
field and sets the default_language
tospanish
:
- db.quotes.createIndex(
- { content : "text" },
- { default_language: "spanish" }
- )
Create a text Index for a Collection in Multiple Languages
Specify the Index Language within the Document
If a collection contains documents or embedded documents that are indifferent languages, include a field named language
in thedocuments or embedded documents and specify as its value the language forthat document or embedded document.
MongoDB will use the specified language for that document orembedded document when building the text
index:
- The specified language in the document overrides the default languagefor the
text
index. - The specified language in an embedded document override the languagespecified in an enclosing document or the default language for theindex.
See Text Search Languages for a list of supported languages.
For example, a collection quotes
contains multi-language documentsthat include the language
field in the document and/or theembedded document as needed:
- {
- _id: 1,
- language: "portuguese",
- original: "A sorte protege os audazes.",
- translation:
- [
- {
- language: "english",
- quote: "Fortune favors the bold."
- },
- {
- language: "spanish",
- quote: "La suerte protege a los audaces."
- }
- ]
- }
- {
- _id: 2,
- language: "spanish",
- original: "Nada hay más surrealista que la realidad.",
- translation:
- [
- {
- language: "english",
- quote: "There is nothing more surreal than reality."
- },
- {
- language: "french",
- quote: "Il n'y a rien de plus surréaliste que la réalité."
- }
- ]
- }
- {
- _id: 3,
- original: "is this a dagger which I see before me.",
- translation:
- {
- language: "spanish",
- quote: "Es este un puñal que veo delante de mí."
- }
- }
If you create a text
index on the quote
field with the defaultlanguage of English.
- db.quotes.createIndex( { original: "text", "translation.quote": "text" } )
Then, for the documents and embedded documents that contain the language
field, the text
index uses that language to parse word stems andother linguistic characteristics.
For embedded documents that do not contain the language
field,
- If the enclosing document contains the
language
field, then theindex uses the document’s language for the embedded document. - Otherwise, the index uses the default language for the embedded documents.
For documents that do not contain the language
field, the indexuses the default language, which is English.
Use any Field to Specify the Language for a Document
To use a field with a name other than language
, includethe language_override
option when creating the index.
For example, give the following command to use idioma
as the fieldname instead of language
:
- db.quotes.createIndex( { quote : "text" },
- { language_override: "idioma" } )
The documents of the quotes
collection may specify a language withthe idioma
field:
- { _id: 1, idioma: "portuguese", quote: "A sorte protege os audazes" }
- { _id: 2, idioma: "spanish", quote: "Nada hay más surrealista que la realidad." }
- { _id: 3, idioma: "english", quote: "is this a dagger which I see before me" }