k-NN search with nested fields

Using nested fields in a k-nearest neighbors (k-NN) index, you can store multiple vectors in a single document. For example, if your document consists of various components, you can generate a vector value for each component and store each vector in a nested field.

A k-NN document search operates at the field level. For a document with nested fields, OpenSearch examines only the vector nearest to the query vector to decide whether to include the document in the results. For example, consider an index containing documents A and B. Document A is represented by vectors A1 and A2, and document B is represented by vector B1. Suppose that, ordered from most to least similar to a query Q, the vectors are A1, A2, and B1. If you search using query Q with a k value of 2, the search returns both documents A and B instead of only document A, because k counts documents, not individual vectors.
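
The behavior described above can be modeled in a few lines of Python. This is only a conceptual sketch, not how OpenSearch implements the search: the `nested_knn` function and the one-dimensional vectors are hypothetical and are chosen so that the similarity order is A1, A2, B1.

```python
from math import dist

def nested_knn(query, docs, k):
    """Return up to k (doc_id, distance) pairs, one per parent document.

    docs maps a document ID to the list of vectors in its nested field.
    """
    # Each document is represented by its single closest nested vector.
    best = {
        doc_id: min(dist(query, vector) for vector in vectors)
        for doc_id, vectors in docs.items()
    }
    # Rank documents by that best distance and keep the first k.
    return sorted(best.items(), key=lambda item: item[1])[:k]

# Hypothetical data: A1 = [1.0], A2 = [2.0], B1 = [3.0], query Q = [0.0].
docs = {"A": [[1.0], [2.0]], "B": [[3.0]]}
results = nested_knn([0.0], docs, k=2)
print([doc_id for doc_id, _ in results])
```

With k set to 2, the sketch returns documents A and B: A2 is closer to the query than B1, but it does not take up a second result slot because document A is already represented by A1.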

Note that an approximate search returns approximate nearest neighbors, so the results may differ from the exact nearest neighbors.

k-NN search with nested fields is supported by the HNSW algorithm for the Lucene and Faiss engines.

Indexing and searching nested fields

To use k-NN search with nested fields, you must create a k-NN index by setting index.knn to true. Create a nested field by setting its type to nested and specify one or more fields of the knn_vector data type within the nested field. In this example, the knn_vector field my_vector is nested inside the nested_field field:

    PUT my-knn-index-1
    {
      "settings": {
        "index": {
          "knn": true
        }
      },
      "mappings": {
        "properties": {
          "nested_field": {
            "type": "nested",
            "properties": {
              "my_vector": {
                "type": "knn_vector",
                "dimension": 3,
                "space_type": "l2",
                "method": {
                  "name": "hnsw",
                  "engine": "lucene",
                  "parameters": {
                    "ef_construction": 100,
                    "m": 16
                  }
                }
              },
              "color": {
                "type": "text",
                "index": false
              }
            }
          }
        }
      }
    }


After you create the index, add some data to it:

    PUT _bulk?refresh=true
    { "index": { "_index": "my-knn-index-1", "_id": "1" } }
    {"nested_field":[{"my_vector":[1,1,1], "color": "blue"},{"my_vector":[2,2,2], "color": "yellow"},{"my_vector":[3,3,3], "color": "white"}]}
    { "index": { "_index": "my-knn-index-1", "_id": "2" } }
    {"nested_field":[{"my_vector":[10,10,10], "color": "red"},{"my_vector":[20,20,20], "color": "green"},{"my_vector":[30,30,30], "color": "black"}]}
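
The bulk request body is newline-delimited JSON: each document contributes one action line followed by one source line, and the body must end with a newline. If you generate the body programmatically, a minimal Python sketch might look like the following. The `bulk_body` helper is hypothetical and is not part of any OpenSearch client.

```python
import json

def bulk_body(index, docs):
    """Build a newline-delimited JSON body for the _bulk endpoint.

    docs maps a document ID to the document source.
    """
    lines = []
    for doc_id, source in docs.items():
        # Action line: index the document under the given ID.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(source))
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"

docs = {
    "1": {"nested_field": [{"my_vector": [1, 1, 1], "color": "blue"},
                           {"my_vector": [2, 2, 2], "color": "yellow"},
                           {"my_vector": [3, 3, 3], "color": "white"}]},
    "2": {"nested_field": [{"my_vector": [10, 10, 10], "color": "red"},
                           {"my_vector": [20, 20, 20], "color": "green"},
                           {"my_vector": [30, 30, 30], "color": "black"}]},
}
body = bulk_body("my-knn-index-1", docs)
print(body, end="")
```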


Then run a k-NN search on the data by using the knn query type:

    GET my-knn-index-1/_search
    {
      "query": {
        "nested": {
          "path": "nested_field",
          "query": {
            "knn": {
              "nested_field.my_vector": {
                "vector": [1,1,1],
                "k": 2
              }
            }
          }
        }
      }
    }


Even though all three vectors nearest to the query vector are in document 1, the query returns both documents 1 and 2 because k is set to 2:

    {
      "took": 5,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 2,
          "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
          {
            "_index": "my-knn-index-1",
            "_id": "1",
            "_score": 1.0,
            "_source": {
              "nested_field": [
                {
                  "my_vector": [1, 1, 1],
                  "color": "blue"
                },
                {
                  "my_vector": [2, 2, 2],
                  "color": "yellow"
                },
                {
                  "my_vector": [3, 3, 3],
                  "color": "white"
                }
              ]
            }
          },
          {
            "_index": "my-knn-index-1",
            "_id": "2",
            "_score": 0.0040983604,
            "_source": {
              "nested_field": [
                {
                  "my_vector": [10, 10, 10],
                  "color": "red"
                },
                {
                  "my_vector": [20, 20, 20],
                  "color": "green"
                },
                {
                  "my_vector": [30, 30, 30],
                  "color": "black"
                }
              ]
            }
          }
        ]
      }
    }
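
The scores in this response can be reproduced by hand. The following Python sketch assumes that, for the l2 space type, the score is 1 / (1 + d²), where d is the Euclidean distance between the query vector and the nearest nested vector of the document. That assumption is consistent with the values shown above. The `l2_score` function name is hypothetical.

```python
def l2_score(query, vector):
    """Score for the l2 space type, assumed here to be 1 / (1 + d^2)."""
    squared_distance = sum((q - v) ** 2 for q, v in zip(query, vector))
    return 1.0 / (1.0 + squared_distance)

query = [1, 1, 1]
doc1_vectors = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
doc2_vectors = [[10, 10, 10], [20, 20, 20], [30, 30, 30]]

# Each document is scored by its nearest (highest-scoring) nested vector.
score1 = max(l2_score(query, v) for v in doc1_vectors)
score2 = max(l2_score(query, v) for v in doc2_vectors)
print(score1, score2)
```

For document 1, the nearest vector is identical to the query, so d² is 0 and the score is 1.0. For document 2, the nearest vector is [10, 10, 10], so d² is 3 × 9² = 243 and the score is 1/244, which matches the 0.0040983604 in the response up to single-precision rounding.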

Inner hits

When you retrieve documents based on matches in nested fields, by default, the response does not contain information about which inner objects matched the query. Thus, it is not apparent why the document is a match. To include information about the matching nested fields in the response, you can provide the inner_hits object in your query. To return only certain fields of the matching documents within inner_hits, specify the document fields in the fields array. Generally, you should also exclude _source from the results to avoid returning the whole document. The following example returns only the color inner field of the nested_field:

    GET my-knn-index-1/_search
    {
      "_source": false,
      "query": {
        "nested": {
          "path": "nested_field",
          "query": {
            "knn": {
              "nested_field.my_vector": {
                "vector": [1,1,1],
                "k": 2
              }
            }
          },
          "inner_hits": {
            "_source": false,
            "fields": ["nested_field.color"]
          }
        }
      }
    }


The response contains the matching documents. For each matching document, the inner_hits object identifies the nested object that matched (by its offset within nested_field) and returns only that object's nested_field.color value in the fields array:

    {
      "took": 4,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 2,
          "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
          {
            "_index": "my-knn-index-1",
            "_id": "1",
            "_score": 1.0,
            "inner_hits": {
              "nested_field": {
                "hits": {
                  "total": {
                    "value": 1,
                    "relation": "eq"
                  },
                  "max_score": 1.0,
                  "hits": [
                    {
                      "_index": "my-knn-index-1",
                      "_id": "1",
                      "_nested": {
                        "field": "nested_field",
                        "offset": 0
                      },
                      "_score": 1.0,
                      "fields": {
                        "nested_field.color": [
                          "blue"
                        ]
                      }
                    }
                  ]
                }
              }
            }
          },
          {
            "_index": "my-knn-index-1",
            "_id": "2",
            "_score": 0.0040983604,
            "inner_hits": {
              "nested_field": {
                "hits": {
                  "total": {
                    "value": 1,
                    "relation": "eq"
                  },
                  "max_score": 0.0040983604,
                  "hits": [
                    {
                      "_index": "my-knn-index-1",
                      "_id": "2",
                      "_nested": {
                        "field": "nested_field",
                        "offset": 0
                      },
                      "_score": 0.0040983604,
                      "fields": {
                        "nested_field.color": [
                          "red"
                        ]
                      }
                    }
                  ]
                }
              }
            }
          }
        ]
      }
    }
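
On the client side, the inner_hits structure can be walked to find which nested object matched in each document. The following Python sketch operates on a response that has already been parsed into a dictionary. The `matched_nested_fields` helper is hypothetical, and the sample response is trimmed to the keys the helper reads.

```python
def matched_nested_fields(response, path, field):
    """Map each document ID to (offset, value) pairs for its matching nested objects."""
    matches = {}
    for hit in response["hits"]["hits"]:
        inner = hit["inner_hits"][path]["hits"]["hits"]
        matches[hit["_id"]] = [
            # _nested.offset is the position of the object within the nested field.
            (item["_nested"]["offset"], item["fields"][field][0])
            for item in inner
        ]
    return matches

# Trimmed copy of the response shown above.
response = {"hits": {"hits": [
    {"_id": "1", "inner_hits": {"nested_field": {"hits": {"hits": [
        {"_nested": {"field": "nested_field", "offset": 0},
         "fields": {"nested_field.color": ["blue"]}}]}}}},
    {"_id": "2", "inner_hits": {"nested_field": {"hits": {"hits": [
        {"_nested": {"field": "nested_field", "offset": 0},
         "fields": {"nested_field.color": ["red"]}}]}}}},
]}}
matches = matched_nested_fields(response, "nested_field", "nested_field.color")
print(matches)
```

For the response above, the sketch reports offset 0 for both documents, meaning that the first nested object (blue in document 1, red in document 2) is the one that matched.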

k-NN search with filtering on nested fields

You can apply a filter to a k-NN search with nested fields. A filter can be applied to either a top-level field or a field inside a nested field.

The following example applies a filter to a top-level field.

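First, create a k-NN index with a nested field. If my-knn-index-1 already exists from the preceding example, delete it first (for example, by sending DELETE my-knn-index-1) because a request to create an index that already exists fails: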

    PUT my-knn-index-1
    {
      "settings": {
        "index": {
          "knn": true
        }
      },
      "mappings": {
        "properties": {
          "nested_field": {
            "type": "nested",
            "properties": {
              "my_vector": {
                "type": "knn_vector",
                "dimension": 3,
                "space_type": "l2",
                "method": {
                  "name": "hnsw",
                  "engine": "lucene",
                  "parameters": {
                    "ef_construction": 100,
                    "m": 16
                  }
                }
              }
            }
          }
        }
      }
    }


After you create the index, add some data to it:

    PUT _bulk?refresh=true
    { "index": { "_index": "my-knn-index-1", "_id": "1" } }
    {"parking": false, "nested_field":[{"my_vector":[1,1,1]},{"my_vector":[2,2,2]},{"my_vector":[3,3,3]}]}
    { "index": { "_index": "my-knn-index-1", "_id": "2" } }
    {"parking": true, "nested_field":[{"my_vector":[10,10,10]},{"my_vector":[20,20,20]},{"my_vector":[30,30,30]}]}
    { "index": { "_index": "my-knn-index-1", "_id": "3" } }
    {"parking": true, "nested_field":[{"my_vector":[100,100,100]},{"my_vector":[200,200,200]},{"my_vector":[300,300,300]}]}


Then run a k-NN search on the data using the knn query type with a filter. The following query returns documents whose parking field is set to true:

    GET my-knn-index-1/_search
    {
      "query": {
        "nested": {
          "path": "nested_field",
          "query": {
            "knn": {
              "nested_field.my_vector": {
                "vector": [1, 1, 1],
                "k": 3,
                "filter": {
                  "term": {
                    "parking": true
                  }
                }
              }
            }
          }
        }
      }
    }


Even though all three vectors nearest to the query vector are in document 1, the query returns documents 2 and 3 because document 1 is filtered out:

    {
      "took": 10,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 2,
          "relation": "eq"
        },
        "max_score": 0.0040983604,
        "hits": [
          {
            "_index": "my-knn-index-1",
            "_id": "2",
            "_score": 0.0040983604,
            "_source": {
              "parking": true,
              "nested_field": [
                {
                  "my_vector": [10, 10, 10]
                },
                {
                  "my_vector": [20, 20, 20]
                },
                {
                  "my_vector": [30, 30, 30]
                }
              ]
            }
          },
          {
            "_index": "my-knn-index-1",
            "_id": "3",
            "_score": 3.400898E-5,
            "_source": {
              "parking": true,
              "nested_field": [
                {
                  "my_vector": [100, 100, 100]
                },
                {
                  "my_vector": [200, 200, 200]
                },
                {
                  "my_vector": [300, 300, 300]
                }
              ]
            }
          }
        ]
      }
    }
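
The filtered result can be modeled with a short Python sketch. It is a conceptual illustration only and does not reflect how OpenSearch applies filters internally. The `filtered_nested_knn` and `l2_score` names are hypothetical, and the score is assumed to be 1 / (1 + d²) for the l2 space type, which is consistent with the scores in the response above.

```python
def l2_score(query, vector):
    """Score for the l2 space type, assumed here to be 1 / (1 + d^2)."""
    squared_distance = sum((q - v) ** 2 for q, v in zip(query, vector))
    return 1.0 / (1.0 + squared_distance)

def filtered_nested_knn(query, docs, k, keep):
    """Return up to k (doc_id, score) pairs among documents accepted by keep."""
    scores = {
        # A document is scored by its nearest nested vector.
        doc_id: max(l2_score(query, v) for v in doc["vectors"])
        for doc_id, doc in docs.items()
        if keep(doc)  # documents rejected by the filter never compete
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]

docs = {
    "1": {"parking": False, "vectors": [[1, 1, 1], [2, 2, 2], [3, 3, 3]]},
    "2": {"parking": True, "vectors": [[10, 10, 10], [20, 20, 20], [30, 30, 30]]},
    "3": {"parking": True, "vectors": [[100, 100, 100], [200, 200, 200], [300, 300, 300]]},
}
results = filtered_nested_knn([1, 1, 1], docs, k=3, keep=lambda doc: doc["parking"])
for doc_id, score in results:
    print(doc_id, score)
```

Although k is 3, only two documents pass the filter, so the sketch returns documents 2 and 3, with scores of 1/244 and 1/29404, respectively.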