Doc Operation

In the examples below, http://router_server is the address of the router service, $db_name is the name of the created database, $space_name is the name of the created space, and $id is the unique ID of the data record.

Single Insertion

Insert without a unique ID

curl -XPOST -H "content-type: application/json"  -d'
{
    "field1": "value1",
    "field2": "value2",
    "field3": {
        "feature": [0.1, 0.2]
    }
}
' http://router_server/$db_name/$space_name

field1 and field2 are scalar fields and field3 is a feature field. All field names and value types must be consistent with the table structure defined for the space.

The return value format is as follows:

{
    "_index": "db1",
    "_type": "space1",
    "_id": "AW5J1lNmJG6WbbCkHrFW",
    "status": 201,
    "_version": 1,
    "_shards": {
        "total": 0,
        "successful": 1,
        "failed": 0
    },
    "result": "created",
    "_seq_no": 1,
    "_primary_term": 1
}

Here, _index is the name of the database and _type is the name of the space. _id is the unique ID of the record; it is generated by the server unless the user specifies one. The unique ID is required for modifying and deleting the record.

Specify a unique ID when inserting

curl -XPOST -H "content-type: application/json"  -d'
{
    "field1": "value1",
    "field2": "value2",
    "field3": {
        "feature": [0.1, 0.2]
    }
}
' http://router_server/$db_name/$space_name/$id

$id is the value that the server uses as the record's unique ID when inserting the data. The $id value cannot contain characters that are special in a URL path. If a record with the same ID already exists, it is overwritten.
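
The two insertion variants differ only in the request path. The following is a minimal Python sketch that builds (but does not send) either request. The helper name build_insert_request is hypothetical, and the ID check is an assumption: it rejects any ID that URL-encoding would change, which may be stricter than what the server actually requires.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_insert_request(router, db_name, space_name, doc, doc_id=None):
    """Build (but do not send) the POST request for a single insertion."""
    url = f"{router}/{db_name}/{space_name}"
    if doc_id is not None:
        # Assumption: an ID is acceptable only if URL-encoding leaves it unchanged.
        if quote(doc_id, safe="") != doc_id:
            raise ValueError(f"ID contains URL-special characters: {doc_id!r}")
        url = f"{url}/{doc_id}"
    body = json.dumps(doc).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"content-type": "application/json"})


doc = {"field1": "value1", "field2": "value2", "field3": {"feature": [0.1, 0.2]}}
auto_id = build_insert_request("http://router_server", "db1", "space1", doc)
with_id = build_insert_request("http://router_server", "db1", "space1", doc, "v1")
print(auto_id.full_url)
print(with_id.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; that step is left out because it needs a running router service.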

Batch insertion

curl -H "content-type: application/json" -XPOST -d'
{"index": {"_id": "v1"}}\n
{"field1": "value", "field2": {"feature": []}}\n
{"index": {"_id": "v2"}}\n
{"field1": "value", "field2": {"feature": []}}\n
' http://router_server/$db_name/$space_name/_bulk

The request body consists of JSON lines: {"index": {"_id": "v1"}} specifies the record ID, and the line after it, {"field1": "value", "field2": {"feature": []}}, specifies the data to insert. Every line is a JSON string.
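
As an illustration of the line-oriented body, here is a small Python sketch that generates it from (ID, document) pairs. The function name build_bulk_insert_body is hypothetical, and the sketch assumes that the \n at the end of each line in the curl example stands for a newline separator.

```python
import json


def build_bulk_insert_body(records):
    """Return the _bulk request body for an iterable of (record_id, document) pairs."""
    lines = []
    for record_id, document in records:
        lines.append(json.dumps({"index": {"_id": record_id}}))  # action line
        lines.append(json.dumps(document))                       # data line
    # Assumption: lines are separated by newlines and the body ends with one.
    return "\n".join(lines) + "\n"


body = build_bulk_insert_body([
    ("v1", {"field1": "value", "field2": {"feature": [0.1, 0.2]}}),
    ("v2", {"field1": "value", "field2": {"feature": [0.3, 0.4]}}),
])
print(body)
```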

Update

A unique ID must be specified when updating.

curl -H "content-type: application/json" -XPOST -d'
{
    "doc": {
        "field1": 32
    }
}
' http://router_server/$db_name/$space_name/$id/_update

The unique $id is specified in the request path, and field1 is the field to be modified. To modify a vector field, insert the record again with the same $id so that the new data overwrites the old data.
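
The update rule can be sketched in Python as follows. The helper build_update and its vector_fields parameter are hypothetical; the caller is assumed to know which fields of the space are vector fields, and the sketch refuses to put those into an _update body because they are changed by re-inserting the record.

```python
import json


def build_update(router, db_name, space_name, record_id, changes, vector_fields=()):
    """Return (url, body) for an _update request that changes scalar fields."""
    if not changes:
        raise ValueError("no fields to update")
    blocked = sorted(set(changes) & set(vector_fields))
    if blocked:
        # Vector fields are changed by re-inserting the record with the same ID.
        raise ValueError(f"re-insert the record to change vector fields: {blocked}")
    url = f"{router}/{db_name}/{space_name}/{record_id}/_update"
    return url, json.dumps({"doc": changes})


url, body = build_update("http://router_server", "db1", "space1", "v1", {"field1": 32})
print(url)
print(body)
```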

Delete

Delete data according to unique ID

curl -XDELETE http://router_server/$db_name/$space_name/$id

Delete data according to query filtering results

curl -H "content-type: application/json" -XPOST -d'
{
    "query": {
        "filter": [{}]
    }
}
' http://router_server/$db_name/$space_name/_delete_by_query

See the Search section for details of the query syntax.

Batch delete according to ID

curl -H "content-type: application/json" -XPOST -d'
{"delete": {"_id": "v1"}}
{"delete": {"_id": "v2"}}
{"delete": {"_id": "v3"}}
' http://router_server/$db_name/$space_name/_bulk

See the Search section below for the query syntax.

Search

Query example

curl -H "content-type: application/json" -XPOST -d'
{
    "query": {
        "sum": [{
            "field": "field_name",
            "feature": [0.1, 0.2, 0.3, 0.4, 0.5],
            "min_score": 0.9,
            "boost": 0.5
        }],
        "filter": [{
            "range": {
                "field_name": {
                    "gte": 160,
                    "lte": 180
                }
            }
        },
        {
             "term": {
                 "field_name": ["100", "200", "300"],
                 "operator": "or"
             }
        }]
    },
    "direct_search_type": 0,
    "quick": false,
    "vector_value": false,
    "online_log_level": "debug",
    "size": 10
}
' http://router_server/$db_name/$space_name/_search

The overall JSON structure of query parameters is as follows:

{
    "query": {
        "sum": [],
        "filter": []
    },
    "direct_search_type": 0,
    "quick": false,
    "vector_value": false,
    "online_log_level": "debug",
    "size": 10
}

Parameter Description:

Field name	Field type	Required	Remarks
sum	json array	false	Query features; one of vector or document_ids must be provided
filter	json array	false	Query filter conditions: numeric filtering + label filtering
fields	json array	false	Specifies which fields to return. By default, only the unique ID and score are returned.
is_brute_search	int	false	Default 0
online_log_level	string	false	Set to debug to turn on printing of debugging logs.
quick	bool	false	Default false
vector_value	bool	false	Default false
load_balance	string	false	Load balancing algorithm, random by default
l2_sqrt	bool	false	Whether to take the square root of the L2 distance result; default false.
sort	json array	false	Sort by the specified field (applies only to the matched results, not the whole data set)
size	int	false	Number of results to return; default 50

The retrieval_param parameter specifies the parameters used for model computation. Different models support different parameters, as the examples further down show. The parameters are:

metric_type: The calculation type. InnerProduct and L2 are supported; the default is L2.
nprobe: The number of buckets to search.
recall_num: The number of candidates to recall from the index; the size nearest results are then computed from these candidates. Defaults to the value of size in the query parameters.
parallel_on_queries: Default 1, which means parallelism across queries; 0 means parallelism across buckets.
efSearch: The distance of graph traversal.

IVFPQ:

"retrieval_param": {
    "parallel_on_queries": 1,
    "recall_num" : 100,
    "nprobe": 80,
    "metric_type": "L2"
}

GPU:

"retrieval_param": {
    "recall_num" : 100,
    "nprobe": 80,
    "metric_type": "L2"
}

HNSW:

"retrieval_param": {
    "efSearch": 64,
    "metric_type": "L2"
}

IVFFLAT:

"retrieval_param": {
    "parallel_on_queries": 1,
    "nprobe": 80,
    "metric_type": "L2"
}

FLAT:

"retrieval_param": {
    "metric_type": "L2"
}
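
To make the per-model differences easier to check, the examples above can be condensed into a lookup table. This is only a sketch: the key sets are copied from the examples shown here and may not be exhaustive, and the helper name check_retrieval_param is hypothetical.

```python
# Keys taken from the retrieval_param examples above; the server may accept more.
RETRIEVAL_PARAM_KEYS = {
    "IVFPQ": {"parallel_on_queries", "recall_num", "nprobe", "metric_type"},
    "GPU": {"recall_num", "nprobe", "metric_type"},
    "HNSW": {"efSearch", "metric_type"},
    "IVFFLAT": {"parallel_on_queries", "nprobe", "metric_type"},
    "FLAT": {"metric_type"},
}
METRIC_TYPES = {"InnerProduct", "L2"}


def check_retrieval_param(model, retrieval_param):
    """Return a list of problems found in retrieval_param for the given model."""
    allowed = RETRIEVAL_PARAM_KEYS.get(model)
    if allowed is None:
        return [f"unknown model: {model}"]
    problems = []
    for key in sorted(set(retrieval_param) - allowed):
        problems.append(f"{key} is not shown for {model}")
    # metric_type defaults to L2 when it is omitted.
    metric = retrieval_param.get("metric_type", "L2")
    if metric not in METRIC_TYPES:
        problems.append(f"unsupported metric_type: {metric}")
    return problems


print(check_retrieval_param("HNSW", {"efSearch": 64, "metric_type": "L2"}))
print(check_retrieval_param("HNSW", {"nprobe": 80}))
```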

Description of the sum JSON structure:

"sum": [{
          "field": "field_name",
          "feature": [0.1, 0.2, 0.3, 0.4, 0.5],
          "min_score": 0.9,
          "boost": 0.5
       }]

sum: Supports multiple entries (corresponding to multiple feature fields defined in the table structure).
field: Specifies the name of the feature field defined when the table was created.
feature: The feature to pass in; its dimension must be the same as the one defined in the table structure.
min_score: Specifies the minimum score of the returned results; max_score specifies the maximum score. For example, set "min_score": 0.8, "max_score": 0.95 to keep results with 0.8 <= score <= 0.95. Another way to filter by score is the combination "symbol": ">=", "value": 0.9. symbol supports four operators: >, >=, < and <=; value is the score to compare against.
boost: Specifies the weight of the similarity. For example, if the similarity score of two vectors is 0.7 and boost is set to 0.5, the returned score is 0.7 * 0.5 = 0.35. It does not take effect when a single vector is used.
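
The score rules above amount to simple comparisons and one multiplication. The Python sketch below reproduces that arithmetic locally; the function names are hypothetical and the code only models the description given here, not the server's implementation.

```python
import operator

# The four comparison symbols listed above.
SYMBOLS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def passes_score_range(score, min_score=None, max_score=None):
    """True when min_score <= score <= max_score; unset bounds are ignored."""
    if min_score is not None and score < min_score:
        return False
    if max_score is not None and score > max_score:
        return False
    return True


def passes_symbol(score, symbol, value):
    """Apply the "symbol"/"value" form of score filtering."""
    return SYMBOLS[symbol](score, value)


def boosted_score(score, boost=1.0):
    """Weight a similarity score by boost."""
    return score * boost


print(passes_score_range(0.9, min_score=0.8, max_score=0.95))
print(passes_symbol(0.9, ">=", 0.9))
print(boosted_score(0.7, 0.5))
```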

Description of the filter JSON structure:

"filter": [
             {
                 "range": {
                     "field_name": {
                          "gte": 160,
                          "lte": 180
                     }
                 }
             },
             {
                 "term": {
                     "field_name": ["100", "200", "300"],
                     "operator": "or"
                 }
             }
          ]

filter: Multiple conditions are supported. The conditions are intersected, so a record must satisfy all of them.
range: Filters on a numeric field (integer / float). field_name is the name of the numeric field, and gte and lte specify the range: gte means greater than or equal to, lte means less than or equal to. For equality filtering, set gte and lte to the same value. The example above matches records whose field_name is greater than or equal to 160 and less than or equal to 180.
term: Filters on labels. field_name is a defined label field, and multiple values can be given. Use "operator": "or" for the union of the values and "operator": "and" for the intersection. The example above matches records whose field_name value is "100", "200" or "300".
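
The filter semantics can be illustrated with a small local evaluator in Python. This is a sketch of the rules as described above, not the server's code: the function name matches_filter and the field names height and tag are hypothetical, and treating a missing operator as "or" is an assumption.

```python
def matches_filter(record, filters):
    """Return True when the record satisfies every condition in filters."""
    for condition in filters:
        for field, bounds in condition.get("range", {}).items():
            value = record[field]
            if "gte" in bounds and value < bounds["gte"]:
                return False
            if "lte" in bounds and value > bounds["lte"]:
                return False
        if "term" in condition:
            term = dict(condition["term"])
            mode = term.pop("operator", "or")  # assumption: "or" when omitted
            for field, wanted in term.items():
                labels = record[field]
                if isinstance(labels, str):
                    labels = [labels]
                hits = [value in labels for value in wanted]
                # "or": any listed value matches; "and": all listed values match.
                if not (any(hits) if mode == "or" else all(hits)):
                    return False
    return True


filters = [
    {"range": {"height": {"gte": 160, "lte": 180}}},
    {"term": {"tag": ["100", "200", "300"], "operator": "or"}},
]
print(matches_filter({"height": 170, "tag": "200"}, filters))
print(matches_filter({"height": 190, "tag": "200"}, filters))
```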

is_brute_search: Specifies the query type. 0 means use the index if it has been created and fall back to brute-force search if it has not; -1 means search with the index only; 1 means do not use the index and use brute-force search only. The default value is 0.
quick: By default, the vectors recalled by PQ are recomputed and refined before the search results are returned. Set quick to true to speed up server processing by returning the recalled results only, without the computation and refinement step.
vector_value: To reduce network overhead, search results by default contain only scalar fields and no feature data. Set it to true to include the original feature data in the returned results.
online_log_level: Set to "debug" to make the server print more detailed logs, which helps with troubleshooting during development and testing.
size: Specifies the maximum number of results to return. If a size value is also specified in the URL, the URL value takes precedence.
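
Putting the parameters together, a request body for _search can be assembled as in the following Python sketch. The function name build_search_body is hypothetical; the default values are the ones listed in the parameter table, and the range check on is_brute_search follows the three values described above.

```python
import json


def build_search_body(field, feature, filters=None, size=50,
                      is_brute_search=0, quick=False, vector_value=False):
    """Assemble a _search request body for a single feature field."""
    if is_brute_search not in (-1, 0, 1):
        raise ValueError("is_brute_search must be -1, 0 or 1")
    query = {"sum": [{"field": field, "feature": list(feature)}]}
    if filters:
        query["filter"] = list(filters)
    return {
        "query": query,
        "is_brute_search": is_brute_search,
        "quick": quick,
        "vector_value": vector_value,
        "size": size,
    }


body = build_search_body(
    "field_name",
    [0.1, 0.2, 0.3, 0.4, 0.5],
    filters=[{"range": {"field_name": {"gte": 160, "lte": 180}}}],
    size=10,
)
print(json.dumps(body, indent=4))
```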

ID query

curl -XGET http://router_server/$db_name/$space_name/$id

Batch search

curl -H "content-type: application/json" -XPOST -d'
{
    "query": {
        "sum": [{
            "field": "vector_field_name",
            "feature": [0.1, 0.2]
        }]
    }
}
' http://router_server/$db_name/$space_name/_msearch

A batch query differs from a single query in that the features of the batch are concatenated, in order, into one feature array; the backend service splits the array according to the feature dimension defined in the space structure. For example, with a 10-dimensional feature field and a batch of 50 query features, the features are concatenated in order into a 500-element array and assigned to the feature parameter. The request path ends with "_msearch".
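
The concatenation described above, and the split that the backend is said to perform, can be sketched in Python as follows. The function names are hypothetical, and split_features only mirrors the described behavior locally.

```python
def splice_features(vectors, dimension):
    """Concatenate equally sized query vectors into one flat feature array."""
    flat = []
    for vector in vectors:
        if len(vector) != dimension:
            raise ValueError(f"expected {dimension} values, got {len(vector)}")
        flat.extend(vector)
    return flat


def split_features(flat, dimension):
    """Split a flat feature array back into vectors of the given dimension."""
    if len(flat) % dimension != 0:
        raise ValueError("array length is not a multiple of the dimension")
    return [flat[i:i + dimension] for i in range(0, len(flat), dimension)]


# 50 query features of 10 dimensions each become one 500-element array.
vectors = [[float(i)] * 10 for i in range(50)]
flat = splice_features(vectors, 10)
print(len(flat))
```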

Multi vector query

A space definition can contain multiple feature fields, so a query can search on several of them at once. Taking two vectors per record as an example, define the table structure fields as follows:

{
    "field1": {
        "type": "vector",
        "dimension": 128
    },
    "field2": {
        "type": "vector",
        "dimension": 256
    }
}

field1 and field2 are vector fields, and both vectors can be specified as search criteria in a query:

{
    "query": {
        "sum": [{
            "field": "field1",
            "feature": [0.1, 0.2, 0.3, 0.4, 0.5],
            "min_score": 0.9
        },
        {
            "field": "field2",
            "feature": [0.8, 0.9],
            "min_score": 0.8
        }]
    }
}

The results of field1 and field2 are intersected, and other parameters and request addresses are consistent with those of ordinary queries.
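
Because each feature must have the dimension defined in the table structure, it can help to check a multi-vector query against the field definitions before sending it. The Python sketch below does this for the two-field structure above; the function name build_multi_vector_query is hypothetical.

```python
def build_multi_vector_query(schema, vectors):
    """Build a query whose sum has one clause per vector field.

    schema maps field name -> {"type": ..., "dimension": ...};
    vectors maps field name -> (feature, min_score).
    """
    sum_clauses = []
    for field, (feature, min_score) in vectors.items():
        spec = schema.get(field)
        if spec is None or spec.get("type") != "vector":
            raise ValueError(f"{field} is not a vector field")
        if len(feature) != spec["dimension"]:
            raise ValueError(
                f"{field} expects {spec['dimension']} values, got {len(feature)}")
        sum_clauses.append(
            {"field": field, "feature": list(feature), "min_score": min_score})
    return {"query": {"sum": sum_clauses}}


schema = {
    "field1": {"type": "vector", "dimension": 128},
    "field2": {"type": "vector", "dimension": 256},
}
query = build_multi_vector_query(schema, {
    "field1": ([0.1] * 128, 0.9),
    "field2": ([0.8] * 256, 0.8),
})
print([clause["field"] for clause in query["query"]["sum"]])
```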