Doc Operation

http://router_server is the address of the router service, $db_name is the name of the created database, $space_name is the name of the created space, and $id is the unique ID of the data record.

Single Insertion

Insert without a unique ID

  curl -XPOST -H "content-type: application/json" -d'
  {
    "field1": "value1",
    "field2": "value2",
    "field3": {
      "feature": [0.1, 0.2]
    }
  }
  ' http://router_server/$db_name/$space_name

field1 and field2 are scalar fields and field3 is a feature field. All field names and value types must be consistent with the table structure.

The return value format is as follows:

  {
    "_index": "db1",
    "_type": "space1",
    "_id": "AW5J1lNmJG6WbbCkHrFW",
    "status": 201,
    "_version": 1,
    "_shards": {
      "total": 0,
      "successful": 1,
      "failed": 0
    },
    "result": "created",
    "_seq_no": 1,
    "_primary_term": 1
  }

Here, _index is the name of the database and _type is the name of the space. _id is the unique identifier of the record. In this example it is generated by the server, but it can also be specified by the user. The unique identifier is required to modify or delete the record.

Specify a unique ID when inserting

  curl -XPOST -H "content-type: application/json" -d'
  {
    "field1": "value1",
    "field2": "value2",
    "field3": {
      "feature": [0.1, 0.2]
    }
  }
  ' http://router_server/$db_name/$space_name/$id

$id is the value to use as the record's unique ID in place of a server-generated one. The $id value cannot contain special characters, such as those that have a meaning in a URL path. If a record with the same unique ID already exists, it is overwritten.
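As a minimal sketch of the two insertion forms above, the following Python builds (but does not send) the request using only the standard library. The helper name build_insert_request, the example values db1, space1 and doc42, and the exact set of rejected characters are assumptions made for illustration; http://router_server is the placeholder address used throughout this document.

```python
import json
import urllib.request


def build_insert_request(router, db_name, space_name, doc, doc_id=None):
    """Build a POST request that inserts `doc` (hypothetical helper)."""
    url = f"{router}/{db_name}/{space_name}"
    if doc_id is not None:
        # Assumption: reject characters that have a special meaning in a URL path.
        if any(ch in doc_id for ch in "/?#%"):
            raise ValueError(f"unsupported character in id: {doc_id!r}")
        url = f"{url}/{doc_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


doc = {"field1": "value1", "field2": "value2", "field3": {"feature": [0.1, 0.2]}}

# Without an ID the server generates one; with an ID the record is stored under it.
req_auto = build_insert_request("http://router_server", "db1", "space1", doc)
req_with_id = build_insert_request("http://router_server", "db1", "space1", doc, "doc42")

print(req_auto.full_url)
print(req_with_id.full_url)
```

Passing either request to urllib.request.urlopen would perform the call against a running router.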

Batch insertion

  curl -H "content-type: application/json" -XPOST -d'
  {"index": {"_id": "v1"}}\n
  {"field1": "value", "field2": {"feature": []}}\n
  {"index": {"_id": "v2"}}\n
  {"field1": "value", "field2": {"feature": []}}\n
  ' http://router_server/$db_name/$space_name/_bulk

The request body is a sequence of JSON lines: {"index": {"_id": "v1"}} specifies the record ID, and {"field1": "value", "field2": {"feature": []}} specifies the data to insert. Every line is a JSON string.
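The line-oriented body can be generated rather than written by hand. The sketch below assumes a hypothetical helper, build_bulk_insert_body, and made-up feature values; it only produces the text that would be posted to the _bulk address.

```python
import json


def build_bulk_insert_body(records):
    """Return a _bulk body for a mapping of record ID -> document (hypothetical helper)."""
    lines = []
    for doc_id, doc in records.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # line naming the record ID
        lines.append(json.dumps(doc))                          # line carrying the data
    # One JSON string per line, each terminated by a newline.
    return "\n".join(lines) + "\n"


records = {
    "v1": {"field1": "value", "field2": {"feature": [0.1, 0.2]}},
    "v2": {"field1": "value", "field2": {"feature": [0.3, 0.4]}},
}
body = build_bulk_insert_body(records)
print(body, end="")
```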

Update

A unique ID must be specified when updating.

  curl -H "content-type: application/json" -XPOST -d'
  {
    "doc": {
      "field1": 32
    }
  }
  ' http://router_server/$db_name/$space_name/$id/_update

The unique $id is specified in the request path, and field1 is the field to be modified. To modify a vector field, insert a record with the same $id instead, which overwrites the existing data.

Delete

Delete data according to unique ID

  curl -XDELETE http://router_server/$db_name/$space_name/$id

Delete data according to query filtering results

  curl -H "content-type: application/json" -XPOST -d'
  {
    "query": {
      "filter": [{}]
    }
  }
  ' http://router_server/$db_name/$space_name/_delete_by_query

See the search section for detailed information on the filter syntax.

Batch delete according to ID

  curl -H "content-type: application/json" -XPOST -d'
  {"delete": {"_id": "v1"}}
  {"delete": {"_id": "v2"}}
  {"delete": {"_id": "v3"}}
  ' http://router_server/$db_name/$space_name/_bulk

See the following for query syntax

Query example

  curl -H "content-type: application/json" -XPOST -d'
  {
    "query": {
      "sum": [{
        "field": "field_name",
        "feature": [0.1, 0.2, 0.3, 0.4, 0.5],
        "min_score": 0.9,
        "boost": 0.5
      }],
      "filter": [{
        "range": {
          "field_name": {
            "gte": 160,
            "lte": 180
          }
        }
      },
      {
        "term": {
          "field_name": ["100", "200", "300"],
          "operator": "or"
        }
      }]
    },
    "direct_search_type": 0,
    "quick": false,
    "vector_value": false,
    "online_log_level": "debug",
    "size": 10
  }
  ' http://router_server/$db_name/$space_name/_search

The overall JSON structure of query parameters is as follows:

  {
    "query": {
      "sum": [],
      "filter": []
    },
    "direct_search_type": 0,
    "quick": false,
    "vector_value": false,
    "online_log_level": "debug",
    "size": 10
  }

Parameter Description:

| field name | field type | required | remarks |
| --- | --- | --- | --- |
| sum | json array | false | Query features. One of vector or document_ids must be provided. |
| filter | json array | false | Query criteria filtering: numeric filtering + label filtering. |
| fields | json array | false | Specifies which fields to return. By default, only the unique id and score are returned. |
| is_brute_search | int | false | Default 0. |
| online_log_level | string | false | Set to debug to turn on printing of debugging logs. |
| quick | bool | false | Default false. |
| vector_value | bool | false | Default false. |
| load_balance | string | false | Load balancing algorithm, random by default. |
| l2_sqrt | bool | false | Default false. Whether to take the square root of the L2 distance calculation result. |
| sort | json array | false | Specifies field sorting (applies only to the matching results, not to the whole data set). |
| size | int | false | Specifies the number of returned results. Default 50. |
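To make the parameter description concrete, here is a rough Python sketch that assembles a search body and fills in the documented defaults (size 50, quick and vector_value false, is_brute_search 0). The function name build_search_body is hypothetical, and only a subset of the parameters is covered.

```python
import json


def build_search_body(sum_clauses=None, filters=None, fields=None, size=50,
                      is_brute_search=0, quick=False, vector_value=False):
    """Assemble a _search request body (hypothetical helper, partial parameter set)."""
    body = {
        "query": {
            "sum": list(sum_clauses or []),   # query features
            "filter": list(filters or []),    # numeric and label filters
        },
        "is_brute_search": is_brute_search,
        "quick": quick,
        "vector_value": vector_value,
        "size": size,
    }
    if fields:
        # Without "fields", only the unique id and score are returned.
        body["fields"] = list(fields)
    return body


body = build_search_body(
    sum_clauses=[{"field": "field_name", "feature": [0.1, 0.2, 0.3, 0.4, 0.5], "min_score": 0.9}],
    filters=[{"range": {"field_name": {"gte": 160, "lte": 180}}}],
    size=10,
)
print(json.dumps(body, indent=2))
```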

The retrieval_param parameter specifies the parameters for model calculation. Different models support different parameters, as shown in the following example:

  • metric_type: Calculation type. Supports InnerProduct and L2; the default is L2.

  • nprobe: Number of buckets to search.

  • recall_num: The number of candidates to recall from the index, from which the size closest results are then calculated. Defaults to the value of size in the query parameters.

  • parallel_on_queries: Default 1, meaning parallelism across queries; 0 means parallelism across buckets.

  • efSearch: Distance of graph traversal.

IVFPQ:

  "retrieval_param": {
    "parallel_on_queries": 1,
    "recall_num": 100,
    "nprobe": 80,
    "metric_type": "L2"
  }

GPU:

  "retrieval_param": {
    "recall_num": 100,
    "nprobe": 80,
    "metric_type": "L2"
  }

HNSW:

  "retrieval_param": {
    "efSearch": 64,
    "metric_type": "L2"
  }

IVFFLAT:

  "retrieval_param": {
    "parallel_on_queries": 1,
    "nprobe": 80,
    "metric_type": "L2"
  }

FLAT:

  "retrieval_param": {
    "metric_type": "L2"
  }
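One way to keep the per-model parameter sets straight on the client side is a small lookup, sketched below in Python. The key sets are copied from the examples above and may not be exhaustive; the names RETRIEVAL_PARAM_KEYS and make_retrieval_param are hypothetical.

```python
# Parameter names per model, as they appear in the examples above (assumed, not exhaustive).
RETRIEVAL_PARAM_KEYS = {
    "IVFPQ": {"parallel_on_queries", "recall_num", "nprobe", "metric_type"},
    "GPU": {"recall_num", "nprobe", "metric_type"},
    "HNSW": {"efSearch", "metric_type"},
    "IVFFLAT": {"parallel_on_queries", "nprobe", "metric_type"},
    "FLAT": {"metric_type"},
}


def make_retrieval_param(model, **params):
    """Return a retrieval_param dict, rejecting keys the model's example does not show."""
    unknown = set(params) - RETRIEVAL_PARAM_KEYS[model]
    if unknown:
        raise ValueError(f"{model} example does not list: {sorted(unknown)}")
    if params.get("metric_type", "L2") not in ("InnerProduct", "L2"):
        raise ValueError("metric_type must be InnerProduct or L2")
    return dict(params)


hnsw = make_retrieval_param("HNSW", efSearch=64, metric_type="L2")
print(hnsw)
```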

  • sum JSON structure:

  "sum": [{
    "field": "field_name",
    "feature": [0.1, 0.2, 0.3, 0.4, 0.5],
    "min_score": 0.9,
    "boost": 0.5
  }]

  1. sum: Supports multiple entries, corresponding to the multiple feature fields defined in the table structure.

  2. field: Specifies the name of the feature field defined when the table was created.

  3. feature: The feature vector to search with. Its dimension must match the one defined in the table structure.

  4. min_score: Specifies the minimum score of the returned results; max_score specifies the maximum score. For example, set "min_score": 0.8, "max_score": 0.95 to keep results with 0.8 <= score <= 0.95. Another way of filtering by score is the combination "symbol": ">=", "value": 0.9. symbol supports four operators (>, >=, < and <=), and value is the score to compare against.

  5. boost: Specifies the weight of the similarity score. For example, if the similarity score of two vectors is 0.7 and boost is set to 0.5, the returned score is 0.7 * 0.5 = 0.35. It does not take effect when searching with a single vector.
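The score rules above amount to simple arithmetic. The following Python sketch reproduces them on the client side purely for illustration; the server applies them itself, and the function names apply_boost and keep_score and the sample scores are hypothetical.

```python
import math


def apply_boost(score, boost):
    """Weight a similarity score by boost, as in the 0.7 * 0.5 example."""
    return score * boost


def keep_score(score, min_score=None, max_score=None):
    """Return True when min_score <= score <= max_score (either bound may be omitted)."""
    if min_score is not None and score < min_score:
        return False
    if max_score is not None and score > max_score:
        return False
    return True


scores = [0.75, 0.8, 0.9, 0.95, 0.99]
kept = [s for s in scores if keep_score(s, min_score=0.8, max_score=0.95)]
print(kept)  # [0.8, 0.9, 0.95]

boosted = apply_boost(0.7, 0.5)
print(math.isclose(boosted, 0.35))  # True
```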

  • filter JSON structure:

  "filter": [
    {
      "range": {
        "field_name": {
          "gte": 160,
          "lte": 180
        }
      }
    },
    {
      "term": {
        "field_name": ["100", "200", "300"],
        "operator": "or"
      }
    }
  ]

  1. filter: Multiple conditions are supported. A result must satisfy all of them (the conditions are intersected).

  2. range: Filters on a numeric (integer / float) field. field_name is the name of the numeric field, and gte and lte specify the range: gte is greater than or equal to, lte is less than or equal to. For equality filtering, set lte and gte to the same value. The above example selects records whose field_name is greater than or equal to 160 and less than or equal to 180.

  3. term: Label filtering. field_name is a defined label field, and multiple values can be given. Use "operator": "or" to match any of the values (union) and "operator": "and" to require all of them (intersection). The above example selects records whose field_name value is "100", "200" or "300".
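The filter semantics can be illustrated with a small client-side evaluator in Python. This is only a sketch of the described behaviour, not how the server implements it: it reads "or" as match-any (as in the example above) and "and" as match-all on a multi-valued label, which is an assumption. The field names height and label and the sample records are made up.

```python
def matches_range(record, field, gte=None, lte=None):
    """Numeric filter: gte <= record[field] <= lte (either bound may be omitted)."""
    value = record[field]
    if gte is not None and value < gte:
        return False
    if lte is not None and value > lte:
        return False
    return True


def matches_term(record, field, values, operator="or"):
    """Label filter: 'or' matches any listed value, 'and' requires all of them."""
    raw = record[field]
    have = set(raw) if isinstance(raw, (list, tuple, set)) else {raw}
    wanted = set(values)
    if operator == "or":
        return bool(have & wanted)
    if operator == "and":
        return wanted <= have
    raise ValueError(f"unknown operator: {operator!r}")


def matches_all(record, conditions):
    """Conditions are intersected: a record passes only if every condition passes."""
    return all(cond(record) for cond in conditions)


records = [
    {"_id": "a", "height": 170, "label": "100"},
    {"_id": "b", "height": 150, "label": "200"},
    {"_id": "c", "height": 165, "label": "900"},
]
conditions = [
    lambda r: matches_range(r, "height", gte=160, lte=180),
    lambda r: matches_term(r, "label", ["100", "200", "300"], "or"),
]
hits = [r["_id"] for r in records if matches_all(r, conditions)]
print(hits)  # ['a']
```

Record b fails the range condition and record c fails the term condition, so only record a remains.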

  • is_brute_search: Specifies the query type. 0 means to use the index if it has been created for the feature, and brute-force search if it has not; -1 means to search using the index only; 1 means to use brute-force search only, without the index. The default value is 0.

  • quick: By default, the vectors recalled by PQ are recalculated and refined in the search results. Set to true to speed up processing on the server: only the recall step is performed, without calculation and refinement.

  • vector_value: To reduce network overhead, the search results by default contain only scalar fields, without feature data. Set to true to include the original feature data in the returned results.

  • online_log_level: Set to "debug" to print more detailed logs on the server, which is convenient for troubleshooting in the development and test phases.

  • size: Specifies the maximum number of results to return. If a size value is also specified in the URL, the URL value takes precedence.

ID query

  curl -XGET http://router_server/$db_name/$space_name/$id

Batch query

  curl -H "content-type: application/json" -XPOST -d'
  {
    "query": {
      "sum": [{
        "field": "vector_field_name",
        "feature": [0.1, 0.2]
      }]
    }
  }
  ' http://router_server/$db_name/$space_name/_msearch

Batch query differs from single query in that the batch of features is concatenated, in order, into a single feature array, and the backend service splits it according to the feature dimension defined in the space structure. For example, with a 10-dimensional feature field and a batch of 50 query features, concatenate the features in order into a 500-dimensional array and assign it to the feature parameter. The request path suffix is "_msearch".
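The concatenation step can be sketched in a few lines of Python. The helper names flatten_features and split_features and the dummy vectors are hypothetical; the split function only mirrors what the backend is described as doing, so that the round trip can be checked locally.

```python
def flatten_features(features, dimension):
    """Concatenate equal-length feature vectors, in order, into one flat list."""
    flat = []
    for i, vec in enumerate(features):
        if len(vec) != dimension:
            raise ValueError(f"feature {i} has {len(vec)} values, expected {dimension}")
        flat.extend(vec)
    return flat


def split_features(flat, dimension):
    """Split a flat list back into vectors of `dimension` values each."""
    if len(flat) % dimension != 0:
        raise ValueError("flat length is not a multiple of the dimension")
    return [flat[i:i + dimension] for i in range(0, len(flat), dimension)]


dimension = 10
features = [[float(q)] * dimension for q in range(50)]  # 50 dummy query vectors
flat = flatten_features(features, dimension)
print(len(flat))  # 500

# The flat array is what goes into the feature parameter of the _msearch body.
body = {"query": {"sum": [{"field": "vector_field_name", "feature": flat}]}}
```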

Multi vector query

The space definition supports multiple feature fields, so a query can search on several features of the same record. Taking two vectors per record as an example, define the table structure fields as follows:

  {
    "field1": {
      "type": "vector",
      "dimension": 128
    },
    "field2": {
      "type": "vector",
      "dimension": 256
    }
  }

field1 and field2 are vector fields, and both vectors can be specified as search criteria in a query:

  {
    "query": {
      "sum": [{
        "field": "field1",
        "feature": [0.1, 0.2, 0.3, 0.4, 0.5],
        "min_score": 0.9
      },
      {
        "field": "field2",
        "feature": [0.8, 0.9],
        "min_score": 0.8
      }]
    }
  }

The results of field1 and field2 are intersected, and other parameters and request addresses are consistent with those of ordinary queries.
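Because each feature must have the dimension defined for its field, a multi-vector query is easy to get wrong. The Python sketch below checks a query body against the table structure before sending it; the helper name dimension_mismatches and the dummy vectors are hypothetical, and the check runs on the client only.

```python
def dimension_mismatches(schema, query_body):
    """Return (field, expected, actual) for each sum clause whose feature length is wrong."""
    problems = []
    for clause in query_body["query"]["sum"]:
        name = clause["field"]
        actual = len(clause["feature"])
        spec = schema.get(name)
        if spec is None or spec.get("type") != "vector":
            problems.append((name, None, actual))  # not a known vector field
        elif actual != spec["dimension"]:
            problems.append((name, spec["dimension"], actual))
    return problems


schema = {
    "field1": {"type": "vector", "dimension": 128},
    "field2": {"type": "vector", "dimension": 256},
}
query_body = {
    "query": {
        "sum": [
            {"field": "field1", "feature": [0.1] * 128, "min_score": 0.9},
            {"field": "field2", "feature": [0.2] * 256, "min_score": 0.8},
        ]
    }
}
print(dimension_mismatches(schema, query_body))  # []
```

A shortened feature such as [0.1, 0.2, 0.3, 0.4, 0.5] for field1 would be reported as ("field1", 128, 5).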