Disk-based vector search

Introduced 2.17

For low-memory environments, OpenSearch provides disk-based vector search, which significantly reduces the operational costs of vector workloads. Disk-based vector search uses binary quantization to compress vectors, reducing their memory requirements. The trade-off for these memory savings is slightly increased search latency, while recall remains strong.

To use disk-based vector search, set the mode parameter to on_disk in the mapping of your vector field. This setting configures the index to use secondary storage.

Creating an index for disk-based vector search

To create an index for disk-based vector search, send the following request:

PUT my-vector-index
{
  "settings" : {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector_field": {
        "type": "knn_vector",
        "dimension": 8,
        "space_type": "innerproduct",
        "data_type": "float",
        "mode": "on_disk"
      }
    }
  }
}

By default, the on_disk mode configures the index to use the faiss engine and the hnsw method. The default compression_level of 32x reduces the amount of memory the vectors require by a factor of 32. To preserve search recall, rescoring is enabled by default. A search on a disk-optimized index runs in two phases: The compressed index is searched first, and then the results are rescored using full-precision vectors loaded from disk.
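
The two-phase flow described above can be illustrated with a minimal Python sketch. It is not the OpenSearch implementation: binarize, hamming, and two_phase_search are hypothetical helpers, thresholding each component at zero is a simplifying assumption that stands in for the real quantizer, and a brute-force scan stands in for the HNSW graph. The oversample_factor argument mirrors the rescore parameter described in the Search section.

```python
from typing import List, Sequence, Tuple


def binarize(vector: Sequence[float]) -> int:
    """Pack a vector into an int: bit i is set when component i is positive.

    Assumption: a zero threshold; a real quantizer may choose thresholds differently.
    """
    bits = 0
    for i, value in enumerate(vector):
        if value > 0.0:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two packed codes."""
    return bin(a ^ b).count("1")


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def two_phase_search(
    query: Sequence[float],
    vectors: List[Sequence[float]],
    k: int,
    oversample_factor: float = 3.0,
) -> List[Tuple[int, float]]:
    """Return the top-k (index, score) pairs using a compressed pass plus rescoring."""
    # Phase 1: rank every vector by Hamming distance between 1-bit codes and
    # keep an oversampled candidate set (stand-in for the in-memory index).
    codes = [binarize(v) for v in vectors]
    query_code = binarize(query)
    n_candidates = min(len(vectors), max(k, int(k * oversample_factor)))
    candidates = sorted(
        range(len(vectors)), key=lambda i: hamming(query_code, codes[i])
    )[:n_candidates]

    # Phase 2: rescore only the candidates with full-precision vectors
    # (stand-in for the vectors loaded from disk).
    rescored = sorted(
        ((i, inner_product(query, vectors[i])) for i in candidates),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return rescored[:k]


if __name__ == "__main__":
    data = [
        [1.0, 1.0, 1.0, 1.0],
        [2.0, 2.0, 2.0, -0.1],
        [-1.0, -1.0, -1.0, -1.0],
        [0.5, 0.5, 0.5, 0.5],
    ]
    q = [1.0, 1.0, 1.0, 1.0]
    print([i for i, _ in two_phase_search(q, data, k=2, oversample_factor=1.0)])  # [0, 3]
    print([i for i, _ in two_phase_search(q, data, k=2, oversample_factor=1.5)])  # [1, 0]
```

In this toy data set, the vector with the highest inner product differs from the query by one bit, so it is only recovered when the first phase oversamples. This is the intuition behind enabling rescoring by default.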

To reduce the compression level, provide the compression_level parameter when creating the index mapping:

PUT my-vector-index
{
  "settings" : {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector_field": {
        "type": "knn_vector",
        "dimension": 8,
        "space_type": "innerproduct",
        "data_type": "float",
        "mode": "on_disk",
        "compression_level": "16x"
      }
    }
  }
}

For more information about the compression_level parameter, see Compression levels. Note that for 4x compression, the lucene engine will be used.
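
To get a rough sense of what a compression level means in practice, the following Python sketch estimates the memory needed for the vector data alone. The function name vector_memory_bytes is hypothetical, and the estimate assumes 4 bytes per dimension for uncompressed float vectors. It ignores the HNSW graph and any other per-index overhead, so treat the result as a lower bound rather than a sizing formula.

```python
def vector_memory_bytes(
    num_vectors: int, dimension: int, compression_level: str = "32x"
) -> float:
    """Approximate bytes needed for vector data at a given compression level.

    Assumption: uncompressed vectors use 4 bytes per dimension (float), and
    a level such as "16x" divides that footprint by 16. Graph overhead is ignored.
    """
    factor = int(compression_level.rstrip("x"))
    if factor <= 0:
        raise ValueError(f"Invalid compression level: {compression_level!r}")
    full_precision_bytes = num_vectors * dimension * 4
    return full_precision_bytes / factor


if __name__ == "__main__":
    # One million 768-dimensional vectors at different compression levels.
    for level in ("1x", "16x", "32x"):
        megabytes = vector_memory_bytes(1_000_000, 768, level) / 1_000_000
        print(f"{level}: {megabytes:.0f} MB")
```

Under these assumptions, one million 768-dimensional vectors need about 3,072 MB uncompressed, 192 MB at 16x, and 96 MB at 32x.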

If you need more granular fine-tuning, you can override additional k-NN parameters in the method definition. For example, to improve recall, increase the ef_construction parameter value:

PUT my-vector-index
{
  "settings" : {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector_field": {
        "type": "knn_vector",
        "dimension": 8,
        "space_type": "innerproduct",
        "data_type": "float",
        "mode": "on_disk",
        "method": {
          "params": {
            "ef_construction": 512
          }
        }
      }
    }
  }
}

The on_disk mode only works with the float data type.

Ingestion

You can perform document ingestion for a disk-optimized vector index in the same way as for a regular vector index. To index several documents in bulk, send the following request:

POST _bulk
{ "index": { "_index": "my-vector-index", "_id": "1" } }
{ "my_vector_field": [1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5], "price": 12.2 }
{ "index": { "_index": "my-vector-index", "_id": "2" } }
{ "my_vector_field": [2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5], "price": 7.1 }
{ "index": { "_index": "my-vector-index", "_id": "3" } }
{ "my_vector_field": [3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5], "price": 12.9 }
{ "index": { "_index": "my-vector-index", "_id": "4" } }
{ "my_vector_field": [4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5], "price": 1.2 }
{ "index": { "_index": "my-vector-index", "_id": "5" } }
{ "my_vector_field": [5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5], "price": 3.7 }
{ "index": { "_index": "my-vector-index", "_id": "6" } }
{ "my_vector_field": [6.5, 6.5, 6.5, 6.5, 6.5, 6.5, 6.5, 6.5], "price": 10.3 }
{ "index": { "_index": "my-vector-index", "_id": "7" } }
{ "my_vector_field": [7.5, 7.5, 7.5, 7.5, 7.5, 7.5, 7.5, 7.5], "price": 5.5 }
{ "index": { "_index": "my-vector-index", "_id": "8" } }
{ "my_vector_field": [8.5, 8.5, 8.5, 8.5, 8.5, 8.5, 8.5, 8.5], "price": 4.4 }
{ "index": { "_index": "my-vector-index", "_id": "9" } }
{ "my_vector_field": [9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5], "price": 8.9 }
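
If you generate bulk requests from application code, a small helper can produce the newline-delimited body shown above. The following Python sketch uses only the standard library; build_bulk_body is a hypothetical helper, and the dimension check is an added safeguard that assumes the value matches the dimension in your index mapping. Sending the body to the cluster is left to whichever client you use.

```python
import json
from typing import Iterable, List, Sequence, Tuple


def build_bulk_body(
    index_name: str,
    field_name: str,
    documents: Iterable[Tuple[int, Sequence[float], float]],
    dimension: int,
) -> str:
    """Build a newline-delimited _bulk body from (id, vector, price) tuples."""
    lines: List[str] = []
    for doc_id, vector, price in documents:
        # Catch dimension mismatches before the request is sent.
        if len(vector) != dimension:
            raise ValueError(
                f"Document {doc_id} has {len(vector)} dimensions, expected {dimension}"
            )
        # Each document is an action line followed by a source line.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(doc_id)}}))
        lines.append(json.dumps({field_name: list(vector), "price": price}))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = [(1, [1.5] * 8, 12.2), (2, [2.5] * 8, 7.1)]
    print(build_bulk_body("my-vector-index", "my_vector_field", docs, dimension=8), end="")
```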

Search

Search is also performed in the same way as in other index configurations. The key difference is that, by default, the oversample_factor of the rescore parameter is set to 3.0 (unless you override the compression_level). For more information, see Rescoring quantized results using full precision. To perform vector search on a disk-optimized index, provide the search vector:

GET my-vector-index/_search
{
  "query": {
    "knn": {
      "my_vector_field": {
        "vector": [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5],
        "k": 5
      }
    }
  }
}

Similarly to other index configurations, you can override k-NN parameters in the search request:

GET my-vector-index/_search
{
  "query": {
    "knn": {
      "my_vector_field": {
        "vector": [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5],
        "k": 5,
        "method_parameters": {
            "ef_search": 512
        },
        "rescore": {
            "oversample_factor": 10.0
        }
      }
    }
  }
}

Disk-based vector search does not support radial search.
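
The search requests in this section can also be assembled programmatically. The following Python sketch builds the request body as a dictionary and adds the method_parameters and rescore objects only when an override is provided, so the defaults apply otherwise. The helper name build_knn_query is hypothetical, and the sketch only constructs the body; it does not contact a cluster.

```python
import json
from typing import Any, Dict, Optional, Sequence


def build_knn_query(
    field: str,
    vector: Sequence[float],
    k: int,
    ef_search: Optional[int] = None,
    oversample_factor: Optional[float] = None,
) -> Dict[str, Any]:
    """Build a k-NN search body, including overrides only when they are set."""
    knn_params: Dict[str, Any] = {"vector": list(vector), "k": k}
    if ef_search is not None:
        knn_params["method_parameters"] = {"ef_search": ef_search}
    if oversample_factor is not None:
        knn_params["rescore"] = {"oversample_factor": oversample_factor}
    return {"query": {"knn": {field: knn_params}}}


if __name__ == "__main__":
    query_vector = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5]
    body = build_knn_query(
        "my_vector_field", query_vector, k=5, ef_search=512, oversample_factor=10.0
    )
    print(json.dumps(body, indent=2))
```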

Model-based indexes

For model-based indexes, you can set the mode parameter to on_disk in the training request in the same way that you would set it during index creation. By default, on_disk mode will use the Faiss IVF method and a compression level of 32x. To run the training API, send the following request:

POST /_plugins/_knn/models/test-model/_train
{
    "training_index": "train-index-name",
    "training_field": "train-field-name",
    "dimension": 8,
    "max_training_vector_count": 1200,
    "search_size": 100,
    "description": "My model",
    "space_type": "innerproduct",
    "mode": "on_disk"
}

This command assumes that training data has been ingested into the train-index-name index. For more information, see Building a k-NN index from a model.

You can override the compression_level for disk-optimized indexes in the same way as for regular k-NN indexes.

Next steps

For more information about binary quantization, see Binary quantization.
For more information about k-NN vector workload modes, see Vector workload modes.