Doc Operation
http://router_server is the address of the router service, $db_name is the name of the created database, $space_name is the name of the created space, and $id is the unique ID of the data record.
Single Insertion
Insert without a unique ID
curl -XPOST -H "content-type: application/json" -d'
{
"field1": "value1",
"field2": "value2",
"field3": {
"feature": [0.1, 0.2]
}
}
' http://router_server/$db_name/$space_name
field1 and field2 are scalar fields and field3 is a feature field. All field names and value types must be consistent with the table structure defined for the space.
The return value format is as follows:
{
"_index": "db1",
"_type": "space1",
"_id": "AW5J1lNmJG6WbbCkHrFW",
"status": 201,
"_version": 1,
"_shards": {
"total": 0,
"successful": 1,
"failed": 0
},
"result": "created",
"_seq_no": 1,
"_primary_term": 1
}
Here, _index is the name of the database and _type is the name of the space. _id is the unique ID of the record; the server generates it unless the user specifies one. This unique ID is required to modify or delete the record.
Specify a unique ID when inserting
curl -XPOST -H "content-type: application/json" -d'
{
"field1": "value1",
"field2": "value2",
"field3": {
"feature": [0.1, 0.2]
}
}
' http://router_server/$db_name/$space_name/$id
$id is the unique ID that the server assigns to the record, using the value specified in the request. The $id value must not contain characters that have a special meaning in a URL path. If a record with this ID already exists in the space, it is overwritten.
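The restriction on $id can be checked on the client before a request is sent. The following is a minimal sketch, not part of the service API: the helper name document_url is hypothetical, and it assumes that any ID which would need percent-encoding as a URL path segment should be rejected, since the exact set of characters the server refuses is not listed here.

```python
from urllib.parse import quote


def document_url(router, db_name, space_name, doc_id):
    """Build the record URL, rejecting IDs that are unsafe in a URL path."""
    doc_id = str(doc_id)
    # Assumption: an ID is acceptable only if percent-encoding leaves it unchanged,
    # i.e. it consists of letters, digits and the characters "_", ".", "-", "~".
    if not doc_id or quote(doc_id, safe="") != doc_id:
        raise ValueError(f"ID {doc_id!r} contains characters that are special in a URL path")
    return f"{router}/{db_name}/{space_name}/{doc_id}"


print(document_url("http://router_server", "db1", "space1", "AW5J1lNmJG6WbbCkHrFW"))
```

An ID such as "a/b" raises ValueError instead of silently addressing a different path.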
Batch insertion
curl -H "content-type: application/json" -XPOST -d'
{"index": {"_id": "v1"}}\n
{"field1": "value", "field2": {"feature": []}}\n
{"index": {"_id": "v2"}}\n
{"field1": "value", "field2": {"feature": []}}\n
' http://router_server/$db_name/$space_name/_bulk
The request body is in a JSON-like format: {"index": {"_id": "v1"}} specifies the record ID and {"field1": "value", "field2": {"feature": []}} specifies the data to insert. Every line is a JSON string.
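Writing the line-oriented body by hand is error-prone, so it can help to generate it. The sketch below is an illustration only: the function name build_bulk_insert_body is hypothetical, and it assumes each JSON string is terminated by a real newline character.

```python
import json


def build_bulk_insert_body(records):
    """Return the bulk body: one action line, then one data line, per record."""
    lines = []
    for doc_id, fields in records:
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # record ID
        lines.append(json.dumps(fields))                      # data to insert
    # Every JSON string sits on its own line; the body ends with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_insert_body([
    ("v1", {"field1": "value", "field2": {"feature": [0.1, 0.2]}}),
    ("v2", {"field1": "value", "field2": {"feature": [0.3, 0.4]}}),
])
print(body)
```

The resulting string would be sent as the POST body to the _bulk address shown above.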
Update
Unique ID must be specified when updating
curl -H "content-type: application/json" -XPOST -d'
{
"doc": {
"field1": 32
}
}
' http://router_server/$db_name/$space_name/$id/_update
The unique $id is specified in the request path, and field1 is the field to be modified. To modify a vector field, insert a record with the same $id so that it overwrites the existing data.
Delete
Delete data according to unique ID
curl -XDELETE http://router_server/$db_name/$space_name/$id
Delete data according to query filtering results
curl -H "content-type: application/json" -XPOST -d'
{
"query": {
"filter": [{}]
}
}
' http://router_server/$db_name/$space_name/_delete_by_query
See the Search section for details of the query syntax.
Batch delete according to ID
curl -H "content-type: application/json" -XPOST -d'
{"delete": {"_id": "v1"}}
{"delete": {"_id": "v2"}}
{"delete": {"_id": "v3"}}
' http://router_server/$db_name/$space_name/_bulk
The query syntax is described in the following section.
Search
Query example
curl -H "content-type: application/json" -XPOST -d'
{
"query": {
"sum": [{
"field": "field_name",
"feature": [0.1, 0.2, 0.3, 0.4, 0.5],
"min_score": 0.9,
"boost": 0.5
}],
"filter": [{
"range": {
"field_name": {
"gte": 160,
"lte": 180
}
}
},
{
"term": {
"field_name": ["100", "200", "300"],
"operator": "or"
}
}]
},
"direct_search_type": 0,
"quick": false,
"vector_value": false,
"online_log_level": "debug",
"size": 10
}
' http://router_server/$db_name/$space_name/_search
The overall JSON structure of query parameters is as follows:
{
"query": {
"sum": [],
"filter": []
},
"direct_search_type": 0,
"quick": false,
"vector_value": false,
"online_log_level": "debug",
"size": 10
}
Parameter Description:
field name | field type | required | remarks |
---|---|---|---|
sum | json array | false | Query features; at least one of vector or document_ids must be provided |
filter | json array | false | Query filter criteria: numeric filtering + label filtering |
fields | json array | false | Fields to return. By default, only the unique id and score are returned. |
is_brute_search | int | false | Default 0 |
online_log_level | string | false | Set to debug to turn on printing of debug logs. |
quick | bool | false | Default false |
vector_value | bool | false | Default false |
load_balance | string | false | Load balancing algorithm; random by default |
l2_sqrt | bool | false | Default false. Whether to take the square root of the L2 distance result. |
sort | json array | false | Fields to sort by (applies only to the matched results, not to the whole data set) |
size | int | false | Number of results to return; the default is 50 |
The retrieval_param parameter specifies the parameters for model calculation. Different models support different parameters, as shown in the following example:
metric_type: Calculation type; supports InnerProduct and L2. The default is L2.
nprobe: Number of buckets to search.
recall_num: Number of candidates to recall from the index. The default equals the value of size in the query parameters. The size closest results are then computed from these candidates.
parallel_on_queries: Default 1, which parallelizes across queries; 0 parallelizes across buckets.
efSearch: Traversal distance in the graph.
IVFPQ:
"retrieval_param": {
"parallel_on_queries": 1,
"recall_num" : 100,
"nprobe": 80,
"metric_type": "L2"
}
GPU:
"retrieval_param": {
"recall_num" : 100,
"nprobe": 80,
"metric_type": "L2"
}
HNSW:
"retrieval_param": {
"efSearch": 64,
"metric_type": "L2"
}
IVFFLAT:
"retrieval_param": {
"parallel_on_queries": 1,
"nprobe": 80,
"metric_type": "L2"
}
FLAT:
"retrieval_param": {
"metric_type": "L2"
}
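Because each model accepts a different set of keys, a small client-side check can catch mistakes early. This is a sketch under stated assumptions: the key sets are copied from the example blocks above and may not be exhaustive, and the names RETRIEVAL_KEYS and build_retrieval_param are hypothetical.

```python
# Keys per model, taken only from the examples in this section (not exhaustive).
RETRIEVAL_KEYS = {
    "IVFPQ": {"parallel_on_queries", "recall_num", "nprobe", "metric_type"},
    "GPU": {"recall_num", "nprobe", "metric_type"},
    "HNSW": {"efSearch", "metric_type"},
    "IVFFLAT": {"parallel_on_queries", "nprobe", "metric_type"},
    "FLAT": {"metric_type"},
}


def build_retrieval_param(model, **params):
    """Return a retrieval_param dict, rejecting keys not shown for the model."""
    unknown = set(params) - RETRIEVAL_KEYS[model]
    if unknown:
        raise ValueError(f"{model} examples do not use: {sorted(unknown)}")
    # metric_type defaults to L2, as stated in the parameter list.
    result = {"metric_type": "L2", **params}
    if result["metric_type"] not in ("InnerProduct", "L2"):
        raise ValueError("metric_type must be InnerProduct or L2")
    return result


print(build_retrieval_param("IVFPQ", recall_num=100, nprobe=80))
```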
- Description of the sum JSON structure:
"sum": [{
"field": "field_name",
"feature": [0.1, 0.2, 0.3, 0.4, 0.5],
"min_score": 0.9,
"boost": 0.5
}]
sum: Supports multiple entries (corresponding to multiple feature fields defined in the table structure).
field: The name of the feature field specified when the table was created.
feature: The feature vector to query with; its dimension must match the one defined in the table structure.
min_score: Specifies the minimum score of returned results; max_score specifies the maximum score. For example, setting "min_score": 0.8, "max_score": 0.95 keeps results with 0.8 <= score <= 0.95. Alternatively, scores can be filtered with the combination "symbol": ">=", "value": 0.9. symbol supports four operators: >, >=, < and <=, and value is the score to compare against.
boost: Specifies the weight of the similarity score. For example, if the similarity score of two vectors is 0.7 and boost is set to 0.5, the returned score is 0.7 * 0.5 = 0.35. boost has no effect when a single vector is used.
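The score rules described above can be reproduced locally to check what a given setting would keep. The following sketch only mirrors the arithmetic and comparisons stated in the text; it is not the server's implementation, and all function names are hypothetical.

```python
import operator

# The four comparison symbols listed for the "symbol"/"value" form.
SYMBOLS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def weighted_score(score, boost=1.0):
    """Multiply a similarity score by its boost weight."""
    return score * boost


def in_score_range(score, min_score=None, max_score=None):
    """True when min_score <= score <= max_score (unset bounds are ignored)."""
    if min_score is not None and score < min_score:
        return False
    if max_score is not None and score > max_score:
        return False
    return True


def passes_symbol(score, symbol, value):
    """Apply the "symbol"/"value" style of score filtering."""
    return SYMBOLS[symbol](score, value)


print(weighted_score(0.7, 0.5))
print(in_score_range(0.9, min_score=0.8, max_score=0.95))
print(passes_symbol(0.9, ">=", 0.9))
```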
- Description of the filter JSON structure:
"filter": [
{
"range": {
"field_name": {
"gte": 160,
"lte": 180
}
}
},
{
"term": {
"field_name": ["100", "200", "300"],
"operator": "or"
}
}
]
filter: Multiple conditions are supported; a record must satisfy all of them (the conditions are intersected).
range: Filters on a numeric field (integer or float). field_name is the name of the numeric field, and gte and lte specify the range: lte means less than or equal to, gte means greater than or equal to. For an equality filter, set lte and gte to the same value. The above example matches records whose field_name value is greater than or equal to 160 and less than or equal to 180.
term: Filters on labels. field_name is a defined label field, and multiple values can be given. Use "operator": "or" for the union of the values and "operator": "and" for their intersection. The above example matches records whose field_name value is "100", "200" or "300".
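To make the filter semantics concrete, the sketch below evaluates a filter array against an in-memory record. It is a local illustration of the rules described above, not the server's code. The field names height and label are hypothetical, the default operator of "or" is an assumption, and treating a label field as possibly multi-valued is also an assumption.

```python
def matches_range(value, bounds):
    """Inclusive range check: gte <= value <= lte."""
    if "gte" in bounds and value < bounds["gte"]:
        return False
    if "lte" in bounds and value > bounds["lte"]:
        return False
    return True


def matches_term(value, wanted, op="or"):
    """'or': any wanted label is present; 'and': all wanted labels are present."""
    have = set(value) if isinstance(value, (list, tuple, set)) else {value}
    wanted = set(wanted)
    return wanted <= have if op == "and" else bool(wanted & have)


def matches_filter(record, conditions):
    """A record passes only if every condition in the filter array holds."""
    for cond in conditions:
        for field, bounds in cond.get("range", {}).items():
            if not matches_range(record[field], bounds):
                return False
        term = dict(cond.get("term", {}))
        op = term.pop("operator", "or")  # assumed default
        for field, wanted in term.items():
            if not matches_term(record[field], wanted, op):
                return False
    return True


conditions = [
    {"range": {"height": {"gte": 160, "lte": 180}}},
    {"term": {"label": ["100", "200", "300"], "operator": "or"}},
]
print(matches_filter({"height": 170, "label": "200"}, conditions))
print(matches_filter({"height": 150, "label": "200"}, conditions))
```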
is_brute_search: Specifies the search type. 0 means use the index if it has been created for the feature, and brute-force search otherwise; -1 means search using the index only; 1 means brute-force search only, without the index. The default value is 0.
quick: By default, the vectors recalled by PQ are recalculated and refined before the search results are returned. Set to true to speed up processing on the server by returning the recalled results only, without the calculation and refinement.
vector_value: To reduce network overhead, search results by default contain only the scalar fields and no feature data. Set to true to include the original feature data in the returned results.
online_log_level: Set to "debug" to print more detailed logs on the server, which is convenient for troubleshooting during development and testing.
size: Specifies the maximum number of results to return. If size is also specified in the URL, the URL value takes precedence.
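The top-level parameters can be assembled with their documented defaults in one place. This is a minimal sketch: the function name build_search_request is hypothetical, the defaults are the ones given in the parameter table (size 50, is_brute_search 0, quick and vector_value false), and only the is_brute_search values described above are accepted.

```python
def build_search_request(sum_clauses, filters=None, size=50,
                         is_brute_search=0, quick=False, vector_value=False):
    """Return a search request body using the defaults from the parameter table."""
    if is_brute_search not in (-1, 0, 1):
        raise ValueError("is_brute_search must be -1, 0 or 1")
    body = {
        "query": {"sum": list(sum_clauses)},
        "is_brute_search": is_brute_search,
        "quick": quick,
        "vector_value": vector_value,
        "size": size,
    }
    if filters:
        body["query"]["filter"] = list(filters)  # filter is optional
    return body


request = build_search_request(
    [{"field": "field_name", "feature": [0.1, 0.2, 0.3, 0.4, 0.5], "min_score": 0.9}],
    size=10,
)
print(request)
```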
ID query
curl -XGET http://router_server/$db_name/$space_name/$id
Batch search
curl -H "content-type: application/json" -XPOST -d'
{
"query": {
"sum": [{
"field": "vector_field_name",
"feature": [0.1, 0.2]
}]
}
}
' http://router_server/$db_name/$space_name/_msearch
A batch query differs from a single query in that the features of the batch are concatenated, in order, into one feature array, which the backend service splits according to the feature dimension defined in the space structure. For example, with a 10-dimensional feature field and a batch of 50 query features, concatenate the features in order into a 500-element array and assign it to the feature parameter. The request path suffix is "_msearch".
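The concatenation and the split it implies can be sketched as follows. The function names are hypothetical, and split_features only imitates locally what the text says the backend does; it is not the service's code.

```python
def flatten_features(vectors, dimension):
    """Concatenate query vectors, in order, into one flat feature array."""
    flat = []
    for i, vec in enumerate(vectors):
        if len(vec) != dimension:
            raise ValueError(f"vector {i} has {len(vec)} values, expected {dimension}")
        flat.extend(vec)
    return flat


def split_features(flat, dimension):
    """Split a flat feature array back into vectors of the given dimension."""
    if len(flat) % dimension != 0:
        raise ValueError("feature length is not a multiple of the dimension")
    return [flat[i:i + dimension] for i in range(0, len(flat), dimension)]


def build_msearch_body(field, vectors, dimension):
    """Request body for _msearch with all query vectors in one feature array."""
    return {"query": {"sum": [{"field": field,
                               "feature": flatten_features(vectors, dimension)}]}}


# 50 query vectors of dimension 10 become one 500-element feature array.
queries = [[float(q)] * 10 for q in range(50)]
body = build_msearch_body("vector_field_name", queries, 10)
print(len(body["query"]["sum"][0]["feature"]))
```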
Multi vector query
A space definition can contain multiple feature fields, so a query can search on several features of the same record. Taking two vectors per record as an example, define the table structure fields:
{
"field1": {
"type": "vector",
"dimension": 128
},
"field2": {
"type": "vector",
"dimension": 256
}
}
field1 and field2 are vector fields, and a vector can be specified for each of them as search criteria in a query:
{
"query": {
"sum": [{
"field": "field1",
"feature": [0.1, 0.2, 0.3, 0.4, 0.5],
"min_score": 0.9
},
{
"field": "field2",
"feature": [0.8, 0.9],
"min_score": 0.8
}]
}
}
The results of field1 and field2 are intersected, and other parameters and request addresses are consistent with those of ordinary queries.
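Since each feature must have the dimension declared for its field, a multi-vector query can be validated against the table structure before it is sent. The following is a sketch with a hypothetical helper name, build_multi_vector_query; it assumes the schema is available on the client in the same shape as the field definition shown above.

```python
def build_multi_vector_query(schema, vectors, min_scores=None):
    """Build a query with one sum clause per vector field, checking dimensions."""
    min_scores = min_scores or {}
    clauses = []
    for field, feature in vectors.items():
        expected = schema[field]["dimension"]
        if len(feature) != expected:
            raise ValueError(f"{field}: got {len(feature)} values, expected {expected}")
        clause = {"field": field, "feature": list(feature)}
        if field in min_scores:
            clause["min_score"] = min_scores[field]
        clauses.append(clause)
    return {"query": {"sum": clauses}}


schema = {
    "field1": {"type": "vector", "dimension": 128},
    "field2": {"type": "vector", "dimension": 256},
}
query = build_multi_vector_query(
    schema,
    {"field1": [0.1] * 128, "field2": [0.8] * 256},
    min_scores={"field1": 0.9, "field2": 0.8},
)
print([clause["field"] for clause in query["query"]["sum"]])
```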