- Neural sparse search
- Using neural sparse search
- Step 1: Create an ingest pipeline
- Step 2: Create an index for ingestion
- Step 3: Ingest documents into the index
- Step 4: Search the index using neural sparse search
- Step 5: Create and enable the two-phase processor (Optional)
- Setting a default model on an index or field
- Next steps
- FAQ
- Next steps
Neural sparse search
Introduced 2.11
Semantic search relies on dense retrieval that is based on text embedding models. However, dense methods use k-NN search, which consumes a large amount of memory and CPU resources. An alternative to semantic search, neural sparse search is implemented using an inverted index and is thus as efficient as BM25. Neural sparse search is facilitated by sparse embedding models. When you perform a neural sparse search, it creates a sparse vector (a list of token: weight
key-value pairs representing an entry and its weight) and ingests data into a rank features index.
When selecting a model, choose one of the following options:
- Use a sparse encoding model at both ingestion time and search time for better search relevance at the expense of relatively high latency.
- Use a sparse encoding model at ingestion time and a tokenizer at search time for lower search latency at the expense of relatively lower search relevance. Tokenization doesn’t involve model inference, so you can deploy and invoke a tokenizer using the ML Commons Model API for a more streamlined experience.
PREREQUISITE
Before using neural sparse search, make sure to set up a pretrained sparse embedding model or your own sparse embedding model. For more information, see Choosing a model.
Using neural sparse search
To use neural sparse search, follow these steps:
- Create an ingest pipeline.
- Create an index for ingestion.
- Ingest documents into the index.
- Search the index using neural search.
- Optional Create and enable the two-phase processor.
Step 1: Create an ingest pipeline
To generate vector embeddings, you need to create an ingest pipeline that contains a sparse_encoding processor, which will convert the text in a document field to vector embeddings. The processor’s field_map
determines the input fields from which to generate vector embeddings and the output fields in which to store the embeddings.
The following example request creates an ingest pipeline where the text from passage_text
will be converted into text embeddings and the embeddings will be stored in passage_embedding
:
PUT /_ingest/pipeline/nlp-ingest-pipeline-sparse
{
"description": "An sparse encoding ingest pipeline",
"processors": [
{
"sparse_encoding": {
"model_id": "aP2Q8ooBpBj3wT4HVS8a",
"field_map": {
"passage_text": "passage_embedding"
}
}
}
]
}
copy
To split long text into passages, use the text_chunking
ingest processor before the sparse_encoding
processor. For more information, see Text chunking.
Step 2: Create an index for ingestion
In order to use the text embedding processor defined in your pipeline, create a rank features index, adding the pipeline created in the previous step as the default pipeline. Ensure that the fields defined in the field_map
are mapped as correct types. Continuing with the example, the passage_embedding
field must be mapped as rank_features. Similarly, the passage_text
field should be mapped as text
.
The following example request creates a rank features index that is set up with a default ingest pipeline:
PUT /my-nlp-index
{
"settings": {
"default_pipeline": "nlp-ingest-pipeline-sparse"
},
"mappings": {
"properties": {
"id": {
"type": "text"
},
"passage_embedding": {
"type": "rank_features"
},
"passage_text": {
"type": "text"
}
}
}
}
copy
To save disk space, you can exclude the embedding vector from the source as follows:
PUT /my-nlp-index
{
"settings": {
"default_pipeline": "nlp-ingest-pipeline-sparse"
},
"mappings": {
"_source": {
"excludes": [
"passage_embedding"
]
},
"properties": {
"id": {
"type": "text"
},
"passage_embedding": {
"type": "rank_features"
},
"passage_text": {
"type": "text"
}
}
}
}
copy
Once the <token, weight>
pairs are excluded from the source, they cannot be recovered. Before applying this optimization, make sure you don’t need the <token, weight>
pairs for your application.
Step 3: Ingest documents into the index
To ingest documents into the index created in the previous step, send the following requests:
PUT /my-nlp-index/_doc/1
{
"passage_text": "Hello world",
"id": "s1"
}
copy
PUT /my-nlp-index/_doc/2
{
"passage_text": "Hi planet",
"id": "s2"
}
copy
Before the document is ingested into the index, the ingest pipeline runs the sparse_encoding
processor on the document, generating vector embeddings for the passage_text
field. The indexed document includes the passage_text
field, which contains the original text, and the passage_embedding
field, which contains the vector embeddings.
Step 4: Search the index using neural sparse search
To perform a neural sparse search on your index, use the neural_sparse
query clause in Query DSL queries.
The following example request uses a neural_sparse
query to search for relevant documents using a raw text query:
GET my-nlp-index/_search
{
"query": {
"neural_sparse": {
"passage_embedding": {
"query_text": "Hi world",
"model_id": "aP2Q8ooBpBj3wT4HVS8a"
}
}
}
}
copy
The response contains the matching documents:
{
"took" : 688,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 30.0029,
"hits" : [
{
"_index" : "my-nlp-index",
"_id" : "1",
"_score" : 30.0029,
"_source" : {
"passage_text" : "Hello world",
"passage_embedding" : {
"!" : 0.8708904,
"door" : 0.8587369,
"hi" : 2.3929274,
"worlds" : 2.7839446,
"yes" : 0.75845814,
"##world" : 2.5432441,
"born" : 0.2682308,
"nothing" : 0.8625516,
"goodbye" : 0.17146169,
"greeting" : 0.96817183,
"birth" : 1.2788506,
"come" : 0.1623208,
"global" : 0.4371151,
"it" : 0.42951578,
"life" : 1.5750692,
"thanks" : 0.26481047,
"world" : 4.7300377,
"tiny" : 0.5462298,
"earth" : 2.6555297,
"universe" : 2.0308156,
"worldwide" : 1.3903781,
"hello" : 6.696973,
"so" : 0.20279501,
"?" : 0.67785245
},
"id" : "s1"
}
},
{
"_index" : "my-nlp-index",
"_id" : "2",
"_score" : 16.480486,
"_source" : {
"passage_text" : "Hi planet",
"passage_embedding" : {
"hi" : 4.338913,
"planets" : 2.7755864,
"planet" : 5.0969057,
"mars" : 1.7405145,
"earth" : 2.6087382,
"hello" : 3.3210192
},
"id" : "s2"
}
}
]
}
}
You can also use the neural_sparse
query with sparse vector embeddings:
GET my-nlp-index/_search
{
"query": {
"neural_sparse": {
"passage_embedding": {
"query_tokens": {
"hi" : 4.338913,
"planets" : 2.7755864,
"planet" : 5.0969057,
"mars" : 1.7405145,
"earth" : 2.6087382,
"hello" : 3.3210192
}
}
}
}
}
Step 5: Create and enable the two-phase processor (Optional)
The neural_sparse_two_phase_processor
is a new feature introduced in OpenSearch 2.15. Using the two-phase processor can significantly improve the performance of neural sparse queries.
To quickly launch a search pipeline with neural sparse search, use the following example pipeline:
PUT /_search/pipeline/two_phase_search_pipeline
{
"request_processors": [
{
"neural_sparse_two_phase_processor": {
"tag": "neural-sparse",
"description": "This processor is making two-phase processor."
}
}
]
}
copy
Then choose the index you want to configure with the search pipeline and set the index.search.default_pipeline
to the pipeline name, as shown in the following example:
PUT /index-name/_settings
{
"index.search.default_pipeline" : "two_phase_search_pipeline"
}
copy
Setting a default model on an index or field
A neural_sparse query requires a model ID for generating sparse embeddings. To eliminate passing the model ID with each neural_sparse query request, you can set a default model on index-level or field-level.
First, create a search pipeline with a neural_query_enricher request processor. To set a default model for an index, provide the model ID in the default_model_id
parameter. To set a default model for a specific field, provide the field name and the corresponding model ID in the neural_field_default_id
map. If you provide both default_model_id
and neural_field_default_id
, neural_field_default_id
takes precedence:
PUT /_search/pipeline/default_model_pipeline
{
"request_processors": [
{
"neural_query_enricher" : {
"default_model_id": "bQ1J8ooBpBj3wT4HVUsb",
"neural_field_default_id": {
"my_field_1": "uZj0qYoBMtvQlfhaYeud",
"my_field_2": "upj0qYoBMtvQlfhaZOuM"
}
}
}
]
}
copy
Then set the default model for your index:
PUT /my-nlp-index/_settings
{
"index.search.default_pipeline" : "default_model_pipeline"
}
copy
You can now omit the model ID when searching:
GET /my-nlp-index/_search
{
"query": {
"neural_sparse": {
"passage_embedding": {
"query_text": "Hi world"
}
}
}
}
copy
The response contains both documents:
{
"took" : 688,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 30.0029,
"hits" : [
{
"_index" : "my-nlp-index",
"_id" : "1",
"_score" : 30.0029,
"_source" : {
"passage_text" : "Hello world",
"passage_embedding" : {
"!" : 0.8708904,
"door" : 0.8587369,
"hi" : 2.3929274,
"worlds" : 2.7839446,
"yes" : 0.75845814,
"##world" : 2.5432441,
"born" : 0.2682308,
"nothing" : 0.8625516,
"goodbye" : 0.17146169,
"greeting" : 0.96817183,
"birth" : 1.2788506,
"come" : 0.1623208,
"global" : 0.4371151,
"it" : 0.42951578,
"life" : 1.5750692,
"thanks" : 0.26481047,
"world" : 4.7300377,
"tiny" : 0.5462298,
"earth" : 2.6555297,
"universe" : 2.0308156,
"worldwide" : 1.3903781,
"hello" : 6.696973,
"so" : 0.20279501,
"?" : 0.67785245
},
"id" : "s1"
}
},
{
"_index" : "my-nlp-index",
"_id" : "2",
"_score" : 16.480486,
"_source" : {
"passage_text" : "Hi planet",
"passage_embedding" : {
"hi" : 4.338913,
"planets" : 2.7755864,
"planet" : 5.0969057,
"mars" : 1.7405145,
"earth" : 2.6087382,
"hello" : 3.3210192
},
"id" : "s2"
}
}
]
}
}
Next steps
- To learn more about splitting long text into passages for neural search, see Text chunking.
FAQ
Refer to the following frequently asked questions for more information about neural sparse search.
How do I mitigate remote connector throttling exceptions?
When using connectors to call a remote service like SageMaker, ingestion and search calls sometimes fail due to remote connector throttling exceptions.
To mitigate throttling exceptions, modify the connector’s client_config parameter to decrease the number of maximum connections, using the max_connection
setting to prevent the maximum number of concurrent connections from exceeding the threshold of the remote service. You can also modify the retry settings to flatten the request spike during ingestion.
For versions earlier than OpenSearch 2.15, the SageMaker throttling exception will be thrown as the following “error”:
{
"type": "status_exception",
"reason": "Error from remote service: {\"message\":null}"
}
Next steps
- To learn more about splitting long text into passages for neural search, see Text chunking.