Semantic search using byte-quantized vectors

This tutorial shows you how to build semantic search using the Cohere Embed model and byte-quantized vectors. For more information about byte-quantized vectors, see Byte vectors.

The Cohere Embed v3 model supports several embedding_types. For this tutorial, you’ll use the INT8 type to encode byte-quantized vectors.
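Byte quantization maps each float component of an embedding to an 8-bit integer in the range [-128, 127], so each vector takes a quarter of its float32 storage. Cohere performs this quantization server-side when you request the int8 embedding type; the following Python sketch (with a hypothetical `quantize_to_int8` helper) only illustrates the idea:

```python
def quantize_to_int8(vec):
    """Illustrative symmetric quantization of a float embedding to int8.
    Cohere performs the real quantization server-side when you request
    the "int8" embedding type; this is only a conceptual sketch."""
    scale = max(abs(v) for v in vec) / 127.0
    if scale == 0:
        scale = 1.0
    # Round each component and clamp it to the signed 8-bit range.
    return [max(-128, min(127, round(v / scale))) for v in vec], scale

quantized, scale = quantize_to_int8([0.5, -1.0, 0.25])
# Each component now fits in one byte instead of four.
```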

The Cohere Embed v3 model supports several input types. This tutorial uses the following input types:

  • search_document: Use this input type when you have text (in the form of documents) that you want to store in a vector database.
  • search_query: Use this input type when structuring search queries to find the most relevant documents in your vector database.

For more information about input types, see the Cohere documentation.

In this tutorial, you will create two models:

  • A model used for ingestion, whose input_type is search_document
  • A model used for search, whose input_type is search_query
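Both models send the same request to the Cohere /v1/embed endpoint; only the input_type parameter differs. As a sketch, the following hypothetical Python helper mirrors the request_body template used in the connectors below:

```python
def embed_request_body(texts, input_type):
    """Build the JSON body that the connector's request_body template
    produces for the Cohere /v1/embed endpoint. input_type is the only
    field that differs between the ingestion and search connectors."""
    return {
        "model": "embed-english-v3.0",
        "texts": texts,
        "input_type": input_type,
        "embedding_types": ["int8"],
    }

ingest_body = embed_request_body(["some document"], "search_document")
search_body = embed_request_body(["some query"], "search_query")
```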

Replace the placeholders beginning with the prefix your_ with your own values.

Step 1: Create an embedding model for ingestion

Create a connector for the Cohere model, specifying the search_document input type:

POST /_plugins/_ml/connectors/_create
{
  "name": "Cohere embedding connector with int8 embedding type for ingestion",
  "description": "Test connector for Cohere embedding model",
  "version": 1,
  "protocol": "http",
  "credential": {
    "cohere_key": "your_cohere_api_key"
  },
  "parameters": {
    "model": "embed-english-v3.0",
    "embedding_types": ["int8"],
    "input_type": "search_document"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "headers": {
        "Authorization": "Bearer ${credential.cohere_key}",
        "Request-Source": "unspecified:opensearch"
      },
      "url": "https://api.cohere.ai/v1/embed",
      "request_body": "{ \"model\": \"${parameters.model}\", \"texts\": ${parameters.texts}, \"input_type\":\"${parameters.input_type}\", \"embedding_types\": ${parameters.embedding_types} }",
      "pre_process_function": "connector.pre_process.cohere.embedding",
      "post_process_function": "\n def name = \"sentence_embedding\";\n def data_type = \"FLOAT32\";\n def result;\n if (params.embeddings.int8 != null) {\n data_type = \"INT8\";\n result = params.embeddings.int8;\n } else if (params.embeddings.uint8 != null) {\n data_type = \"UINT8\";\n result = params.embeddings.uint8;\n } else if (params.embeddings.float != null) {\n data_type = \"FLOAT32\";\n result = params.embeddings.float;\n }\n \n if (result == null) {\n return \"Invalid embedding result\";\n }\n \n def embedding_list = new StringBuilder(\"[\");\n \n for (int m=0; m<result.length; m++) {\n def embedding_size = result[m].length;\n def embedding = new StringBuilder(\"[\");\n def shape = [embedding_size];\n for (int i=0; i<embedding_size; i++) {\n def val;\n if (\"FLOAT32\".equals(data_type)) {\n val = result[m][i].floatValue();\n } else if (\"INT8\".equals(data_type) || \"UINT8\".equals(data_type)) {\n val = result[m][i].intValue();\n }\n embedding.append(val);\n if (i < embedding_size - 1) {\n embedding.append(\",\"); \n }\n }\n embedding.append(\"]\"); \n \n // workaround for compatible with neural-search\n def dummy_data_type = 'FLOAT32';\n \n def json = '{' +\n '\"name\":\"' + name + '\",' +\n '\"data_type\":\"' + dummy_data_type + '\",' +\n '\"shape\":' + shape + ',' +\n '\"data\":' + embedding +\n '}';\n embedding_list.append(json);\n if (m < result.length - 1) {\n embedding_list.append(\",\"); \n }\n }\n embedding_list.append(\"]\"); \n return embedding_list.toString();\n "
    }
  ]
}


To ensure compatibility with the Neural Search plugin, the post-processing function must set the data_type (returned in the inference_results.output.data_type field of the response) to FLOAT32, even though the actual embedding type is INT8.
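As a rough Python analogue (illustrative only; the real logic runs as the Painless post_process_function in the connector), the post-processing step selects the first available embedding type and emits one sentence_embedding object per input text, always reporting FLOAT32 as the data_type:

```python
def post_process(embeddings):
    """Mimic the connector's post-processing: pick int8, uint8, or float
    results (in that order) and wrap each vector in the output format
    expected by the Neural Search plugin."""
    result = None
    for key in ("int8", "uint8", "float"):
        if embeddings.get(key) is not None:
            result = embeddings[key]
            break
    if result is None:
        return "Invalid embedding result"
    return [
        {
            "name": "sentence_embedding",
            # Reported as FLOAT32 even for int8 results (compatibility workaround).
            "data_type": "FLOAT32",
            "shape": [len(vec)],
            "data": list(vec),
        }
        for vec in result
    ]

outputs = post_process({"int8": [[20, -11, -60], [58, -30, 9]]})
```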

Note the connector ID in the response; you’ll use it to register the model.

Register the model, providing its connector ID:

POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "Cohere embedding model for INT8 with search_document input type",
  "function_name": "remote",
  "description": "test model",
  "connector_id": "your_connector_id"
}


Note the model ID in the response; you’ll use it in the following steps.

Test the model, providing the model ID:

POST /_plugins/_ml/models/your_embedding_model_id/_predict
{
  "parameters": {
    "texts": ["hello", "goodbye"]
  }
}


The response contains inference results:

{
  "inference_results": [
    {
      "output": [
        {
          "name": "sentence_embedding",
          "data_type": "FLOAT32",
          "shape": [
            1024
          ],
          "data": [
            20,
            -11,
            -60,
            -91,
            ...
          ]
        },
        {
          "name": "sentence_embedding",
          "data_type": "FLOAT32",
          "shape": [
            1024
          ],
          "data": [
            58,
            -30,
            9,
            -51,
            ...
          ]
        }
      ],
      "status_code": 200
    }
  ]
}

Step 2: Ingest data

First, create an ingest pipeline:

PUT /_ingest/pipeline/pipeline-cohere
{
  "description": "Cohere embedding ingest pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "your_embedding_model_id_created_in_step1",
        "field_map": {
          "passage_text": "passage_embedding"
        }
      }
    }
  ]
}

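Conceptually, the text_embedding processor reads each source field listed in field_map, calls the embedding model, and writes the resulting vector to the mapped target field. A minimal Python sketch of that behavior, with a stand-in embed function in place of the remote model:

```python
def apply_text_embedding(doc, field_map, embed):
    """Sketch of the text_embedding ingest processor: for every source
    field present in the document, store the model's embedding under
    the mapped target field. `embed` stands in for the remote model."""
    for source_field, target_field in field_map.items():
        if source_field in doc:
            doc[target_field] = embed(doc[source_field])
    return doc

doc = apply_text_embedding(
    {"passage_text": "OpenSearch is ..."},
    {"passage_text": "passage_embedding"},
    embed=lambda text: [20, -11, -60],  # stand-in for the model call
)
```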

Next, create a k-NN index and set the data_type for the passage_embedding field to byte so that it can hold byte-quantized vectors:

PUT my_test_data
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100,
      "default_pipeline": "pipeline-cohere"
    }
  },
  "mappings": {
    "properties": {
      "passage_text": {
        "type": "text"
      },
      "passage_embedding": {
        "type": "knn_vector",
        "dimension": 1024,
        "data_type": "byte",
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "lucene",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      }
    }
  }
}

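The l2 space type configured above ranks neighbors by squared Euclidean distance, which OpenSearch converts to a relevance score of 1 / (1 + distance). Because byte vector components span [-128, 127], distances can be large, which is why the scores in the search response later in this tutorial are so small. A quick sketch:

```python
def knn_l2_score(a, b):
    """OpenSearch k-NN relevance score for the l2 space type:
    1 / (1 + squared L2 distance between the two vectors)."""
    squared_l2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + squared_l2)

identical = knn_l2_score([20, -11], [20, -11])    # zero distance: score 1.0
far_apart = knn_l2_score([127, 127], [-128, -128])  # tiny score
```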

Finally, ingest test data:

POST _bulk
{ "index" : { "_index" : "my_test_data" } }
{ "passage_text" : "OpenSearch is the flexible, scalable, open-source way to build solutions for data-intensive applications. Explore, enrich, and visualize your data with built-in performance, developer-friendly tools, and powerful integrations for machine learning, data processing, and more." }
{ "index" : { "_index" : "my_test_data"} }
{ "passage_text" : "BM25 is a keyword-based algorithm that performs well on queries containing keywords but fails to capture the semantic meaning of the query terms. Semantic search, unlike keyword-based search, takes into account the meaning of the query in the search context. Thus, semantic search performs well when a query requires natural language understanding." }


Step 3: Search

Create a connector to an embedding model with the search_query input type:

POST /_plugins/_ml/connectors/_create
{
  "name": "Cohere embedding connector with int8 embedding type for search",
  "description": "Test connector for Cohere embedding model. Use this connector for search.",
  "version": 1,
  "protocol": "http",
  "credential": {
    "cohere_key": "your_cohere_api_key"
  },
  "parameters": {
    "model": "embed-english-v3.0",
    "embedding_types": ["int8"],
    "input_type": "search_query"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "headers": {
        "Authorization": "Bearer ${credential.cohere_key}",
        "Request-Source": "unspecified:opensearch"
      },
      "url": "https://api.cohere.ai/v1/embed",
      "request_body": "{ \"model\": \"${parameters.model}\", \"texts\": ${parameters.texts}, \"input_type\":\"${parameters.input_type}\", \"embedding_types\": ${parameters.embedding_types} }",
      "pre_process_function": "connector.pre_process.cohere.embedding",
      "post_process_function": "\n def name = \"sentence_embedding\";\n def data_type = \"FLOAT32\";\n def result;\n if (params.embeddings.int8 != null) {\n data_type = \"INT8\";\n result = params.embeddings.int8;\n } else if (params.embeddings.uint8 != null) {\n data_type = \"UINT8\";\n result = params.embeddings.uint8;\n } else if (params.embeddings.float != null) {\n data_type = \"FLOAT32\";\n result = params.embeddings.float;\n }\n \n if (result == null) {\n return \"Invalid embedding result\";\n }\n \n def embedding_list = new StringBuilder(\"[\");\n \n for (int m=0; m<result.length; m++) {\n def embedding_size = result[m].length;\n def embedding = new StringBuilder(\"[\");\n def shape = [embedding_size];\n for (int i=0; i<embedding_size; i++) {\n def val;\n if (\"FLOAT32\".equals(data_type)) {\n val = result[m][i].floatValue();\n } else if (\"INT8\".equals(data_type) || \"UINT8\".equals(data_type)) {\n val = result[m][i].intValue();\n }\n embedding.append(val);\n if (i < embedding_size - 1) {\n embedding.append(\",\"); \n }\n }\n embedding.append(\"]\"); \n \n // workaround for compatible with neural-search\n def dummy_data_type = 'FLOAT32';\n \n def json = '{' +\n '\"name\":\"' + name + '\",' +\n '\"data_type\":\"' + dummy_data_type + '\",' +\n '\"shape\":' + shape + ',' +\n '\"data\":' + embedding +\n '}';\n embedding_list.append(json);\n if (m < result.length - 1) {\n embedding_list.append(\",\"); \n }\n }\n embedding_list.append(\"]\"); \n return embedding_list.toString();\n "
    }
  ]
}


Note the connector ID in the response; you’ll use it to register the model.

Register the model, providing its connector ID:

POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "Cohere embedding model for INT8 with search_query input type",
  "function_name": "remote",
  "description": "test model",
  "connector_id": "your_connector_id"
}


Note the model ID in the response; you’ll use it to run queries.

Run a neural search query, providing the model ID:

POST /my_test_data/_search
{
  "query": {
    "neural": {
      "passage_embedding": {
        "query_text": "semantic search",
        "model_id": "your_embedding_model_id",
        "k": 100
      }
    }
  },
  "size": "1",
  "_source": ["passage_text"]
}


The response contains the query results:

{
  "took": 143,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 9.345969e-7,
    "hits": [
      {
        "_index": "my_test_data",
        "_id": "_IXCuY0BJr_OiKWden7i",
        "_score": 9.345969e-7,
        "_source": {
          "passage_text": "BM25 is a keyword-based algorithm that performs well on queries containing keywords but fails to capture the semantic meaning of the query terms. Semantic search, unlike keyword-based search, takes into account the meaning of the query in the search context. Thus, semantic search performs well when a query requires natural language understanding."
        }
      }
    ]
  }
}