Doc Operation

http://router_server represents the router service, $db_name is the name of the created database, $space_name is the name of the created space, and $id is the unique id of the data record.

_id is the unique identifier of a record. It is generated by the server unless the user specifies it, and it is required to modify or delete the record.

$id is the unique identifier that the server assigns from the user-specified value when data is inserted. The $id value must not contain special characters such as those used in URL paths. If a record with the same unique identifier already exists in the database, it is updated and overwritten.

document/upsert

If primary_id is set, the specified primary key is used. If it is not set, the primary key is generated by Vearch.

If the _id specified when inserting already exists, the existing data is updated; otherwise, it is inserted.

When documents contains multiple records, the request is a batch insertion. It is generally recommended that a single batch contain no more than 100 records.

Insertion and update now support passing in the values of only some fields. When inserting and only passing in some fields, the vector field must be included. There is no such restriction when updating.
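
The batch-size recommendation above can be sketched as a small helper that splits a list of records into upsert request bodies. This is only an illustration: build_upsert_batches and MAX_BATCH are hypothetical names, and the 4-element feature is a placeholder rather than a real vector dimension.

```python
import json

MAX_BATCH = 100  # recommended upper bound on records per upsert request


def build_upsert_batches(db_name, space_name, documents, batch_size=MAX_BATCH):
    """Split documents into upsert request bodies of at most batch_size records."""
    batches = []
    for start in range(0, len(documents), batch_size):
        batches.append({
            "db_name": db_name,
            "space_name": space_name,
            "documents": documents[start:start + batch_size],
        })
    return batches


# 250 placeholder records; every record carries the vector field,
# which is required when inserting with only some fields.
docs = [
    {"field_int": i, "field_vector": {"feature": [0.1, 0.2, 0.3, 0.4]}}
    for i in range(250)
]
batches = build_upsert_batches("ts_db", "ts_space", docs)
print([len(b["documents"]) for b in batches])  # [100, 100, 50]

# Each body can be serialized and sent as the POST payload of document/upsert.
body = json.dumps(batches[0])
```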

Insert without specifying a unique identifier:

curl -H "content-type: application/json" -XPOST -d'
{
    "db_name": "ts_db",
    "space_name": "ts_space",
    "documents": [{
        "field_int": 90399,
        "field_float": 90399,
        "field_double": 90399,
        "field_string": "111399",
        "field_vector": {
            "feature": [...]
        }
    }, {
        "field_int": 45085,
        "field_float": 45085,
        "field_double": 45085,
        "field_string": "106085",
        "field_vector": {
            "feature": [...]
        }
    }, {
        "field_int": 52968,
        "field_float": 52968,
        "field_double": 52968,
        "field_string": "113968",
        "field_vector": {
            "feature": [...]
        }
    }]
}
' http://router_server/document/upsert

field_vector is the feature field. All field names and value types must be consistent with the table structure.

Insert with a specified unique identifier:

curl -H "content-type: application/json" -XPOST -d'
{
    "db_name": "ts_db",
    "space_name": "ts_space",
    "documents": [{
        "_id": "1000000",
        "field_int": 90399,
        "field_float": 90399,
        "field_double": 90399,
        "field_string": "111399",
        "field_vector": {
            "feature": [...]
        }
    }, {
        "_id": "1000001",
        "field_int": 45085,
        "field_float": 45085,
        "field_double": 45085,
        "field_string": "106085",
        "field_vector": {
            "feature": [...]
        }
    }, {
        "_id": "1000002",
        "field_int": 52968,
        "field_float": 52968,
        "field_double": 52968,
        "field_string": "113968",
        "field_vector": {
            "feature": [...]
        }
    }]
}
' http://router_server/document/upsert

The return value of the upsert interface has the following format:

{
    "code": 0,
    "msg": "success",
    "total": 3,
    "document_ids": [{
        "_id": "-526059949411103803",
        "status": 200,
        "error": "success"
    }, {
        "_id": "1287805132970120733",
        "status": 200,
        "error": "success"
    }, {
        "_id": "-1948185285365684656",
        "status": 200,
        "error": "success"
    }]
}

total is the number of successful insertions, and document_ids returns each generated _id together with its insertion result.
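
A client might separate successful and failed entries in this response as sketched below. The helper name summarize_upsert is hypothetical, and treating code 0 and status 200 as the success markers is an assumption based only on the sample response above.

```python
import json

# Sample response copied from the format shown above.
response_text = """
{
    "code": 0,
    "msg": "success",
    "total": 3,
    "document_ids": [
        {"_id": "-526059949411103803", "status": 200, "error": "success"},
        {"_id": "1287805132970120733", "status": 200, "error": "success"},
        {"_id": "-1948185285365684656", "status": 200, "error": "success"}
    ]
}
"""


def summarize_upsert(response):
    """Return (ids that succeeded, entries that failed) from an upsert response."""
    # Assumption: code 0 marks an overall success, as in the sample.
    if response.get("code") != 0:
        raise RuntimeError(f"upsert failed: {response.get('msg')}")
    succeeded, failed = [], []
    for entry in response.get("document_ids", []):
        # Assumption: status 200 marks a successful record, as in the sample.
        if entry.get("status") == 200:
            succeeded.append(entry["_id"])
        else:
            failed.append(entry)
    return succeeded, failed


ok_ids, failed_entries = summarize_upsert(json.loads(response_text))
print(len(ok_ids), len(failed_entries))
```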

document/query

The /document/query interface retrieves data that exactly matches the query conditions. The query conditions do not include vector data.

Two methods are supported: one is to obtain documents directly through primary keys, and the other is to obtain corresponding documents based on filter conditions.

If partition_id is set, the document is fetched from the specified data partition. In that case, document_id is the document number within the partition and can be any value in [0, max_docid] for that partition. max_docid and the partition information can be obtained through the cluster/health interface. The complete data of the cluster can be retrieved this way.

Find data by unique identifier:

curl -H "content-type: application/json" -XPOST -d'
{
    "db_name": "ts_db",
    "space_name": "ts_space",
    "query": {
        "document_ids": ["6560995651113580768", "-5621139761924822824", "-104688682735192253"]
    },
    "vector_value": true
}
' http://router_server/document/query

Get the corresponding document on the specified data partition. At this time, document_id can be [0, max_docid] of the specified partition.

curl -H "content-type: application/json" -XPOST -d'
{
    "db_name": "ts_db",
    "space_name": "ts_space",
    "query": {
        "document_ids": [
        "10000",
        "10001",
        "10002"
        ],
        "partition_id": "1"
    },
    "vector_value": true
}
' http://router_server/document/query
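
Because document numbers on a partition run from 0 to max_docid, a full scan of one partition can be paged as in the sketch below. The function name iter_partition_queries and the page size of 100 are illustrative assumptions; max_docid is assumed to have been read from the cluster/health interface beforehand.

```python
def iter_partition_queries(db_name, space_name, partition_id, max_docid,
                           page_size=100):
    """Yield document/query bodies that together cover [0, max_docid]."""
    for start in range(0, max_docid + 1, page_size):
        stop = min(start + page_size, max_docid + 1)
        yield {
            "db_name": db_name,
            "space_name": space_name,
            "query": {
                # Document numbers are passed as strings, as in the example above.
                "document_ids": [str(i) for i in range(start, stop)],
                "partition_id": partition_id,
            },
            "vector_value": False,
        }


pages = list(iter_partition_queries("ts_db", "ts_space", "1", max_docid=250))
print(len(pages))  # 3
print(pages[-1]["query"]["document_ids"][-1])  # 250
```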

Find data by a filter expression on custom scalar fields:

curl -H "content-type: application/json" -XPOST -d'
{
    "db_name": "ts_db",
    "space_name": "ts_space",
    "query": {
        "filter": [
        {
            "range": {
            "field_int": {
                "gte": 1000,
                "lte": 100000
            }
            }
        },
        {
            "term": {
            "field_string": [
                "322"
            ]
            }
        }
        ]
    },
    "vector_value": false
}
' http://router_server/document/query

Query interface return format

{
    "code": 0,
    "msg": "success",
    "total": 3,
    "documents": [{
        "_id": "6560995651113580768",
        "_source": {
            "field_double": 202558,
            "field_float": 102558,
            "field_int": 1558,
            "field_string": "1558"
        }
    }, {
        "_id": "-5621139761924822824",
        "_source": {
            "field_double": 210887,
            "field_float": 110887,
            "field_int": 89887,
            "field_string": "89887"
        }
    }, {
        "_id": "-104688682735192253",
        "_source": {
            "field_double": 207588,
            "field_float": 107588,
            "field_int": 46588,
            "field_string": "46588"
        }
    }]
}

Parameter Description:

field name	field type	required	remarks
document_ids	json array	false	either filter or document_ids must be provided
partition_id	json array	false	specify get document on which partition
filter	json array	false	query criteria filtering: numeric filtering + label filtering
fields	json array	false	Specify which fields to return. By default, only the unique id and score are returned.
vector_value	bool	false	default false
size	int	false	Specify the number of returned results, the default is 50
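
The rule that either document_ids or filter must be present can be checked on the client before sending, as in this sketch. build_query is a hypothetical helper, and placing fields and size at the top level of the body (next to vector_value) is an assumption rather than something the table states.

```python
def build_query(db_name, space_name, document_ids=None, filters=None,
                fields=None, vector_value=False, size=50):
    """Build a document/query body, requiring document_ids or filter."""
    if not document_ids and not filters:
        raise ValueError("either document_ids or filter must be provided")
    query = {}
    if document_ids:
        query["document_ids"] = [str(doc_id) for doc_id in document_ids]
    if filters:
        query["filter"] = filters
    body = {
        "db_name": db_name,
        "space_name": space_name,
        "query": query,
        "vector_value": vector_value,
        "size": size,  # assumed top-level placement
    }
    if fields:
        body["fields"] = fields  # assumed top-level placement
    return body


body = build_query(
    "ts_db", "ts_space",
    filters=[{"range": {"field_int": {"gte": 1000, "lte": 100000}}}],
    fields=["field_int", "field_string"],
)
print(sorted(body))
```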

document/search

Supports similarity retrieval based on specified ID or vector value, and returns the specified Top K most similar Documents.

Supports similarity retrieval based on the primary key id (Document ID) or vector value, together with the Filter expression of a custom scalar field.

document_ids passes in unique record ids. The backend first looks up the feature of each record by its unique id, then uses that feature to run the similarity query and returns the matching results.

Search based on document_ids

curl -H "content-type: application/json" -XPOST -d'
{
    "query": {
        "document_ids": [
            "3646866681750952826"
        ],
        "filter": [
        {
            "range": {
                "field_int": {
                    "gte": 1000,
                    "lte": 100000
                }
            }
        }
        ]
    },
    "retrieval_param": {
        "metric_type": "L2"
    },
    "size": 3,
    "db_name": "ts_db",
    "space_name": "ts_space"
}
' http://router_server/document/search

Search based on vector

Supports single or multiple queries. For multiple queries, concatenate the features of all queries into one feature array. For example, with 128-dimensional features and a batch of 10 queries, the 10 features are concatenated in order into a 1280-element array and assigned to the feature field. After receiving the request, the backend splits the array according to the feature dimension defined in the table structure and returns the matching results in order.

curl -H "content-type: application/json" -XPOST -d'
{
    "query": {
        "vector": [
        {
            "field": "field_vector",
            "feature": [
                "..."
            ]
        }
        ],
        "filter": [
        {
            "range": {
                "field_int": {
                    "gte": 1000,
                    "lte": 100000
                }
            }
        }
        ]
    },
    "retrieval_param": {
        "metric_type": "L2"
    },
    "size": 3,
    "db_name": "ts_db",
    "space_name": "ts_space"
}
' http://router_server/document/search
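
The concatenation of several query features into one feature array, and the dimension-based split the backend is described as performing, can be sketched as follows. splice_features and split_features are hypothetical helper names, and the query values are placeholders.

```python
def splice_features(vectors, dimension):
    """Concatenate equally sized query vectors into one flat feature array."""
    flat = []
    for vector in vectors:
        if len(vector) != dimension:
            raise ValueError(f"expected {dimension} values, got {len(vector)}")
        flat.extend(vector)
    return flat


def split_features(flat, dimension):
    """Split a flat feature array back into vectors of the given dimension."""
    if len(flat) % dimension != 0:
        raise ValueError("feature length is not a multiple of the dimension")
    return [flat[i:i + dimension] for i in range(0, len(flat), dimension)]


# Ten placeholder 128-dimensional queries become one 1280-element array.
queries = [[float(q)] * 128 for q in range(10)]
flat = splice_features(queries, 128)
print(len(flat))  # 1280

search_body = {
    "query": {"vector": [{"field": "field_vector", "feature": flat}]},
    "size": 3,
    "db_name": "ts_db",
    "space_name": "ts_space",
}
```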

Multi-vector search

A table space can define multiple feature fields, so a query can supply a feature for each of the corresponding fields.

Take two vectors per record as an example. Define the table structure fields:

{
    "field1": {
        "type": "vector",
        "dimension": 128
    },
    "field2": {
        "type": "vector",
        "dimension": 256
    }
}

field1 and field2 are both vector fields. When querying, the search conditions can specify two vectors:

{
    "query": {
        "vector": [{
            "field": "field1",
            "feature": [0.1, 0.2, 0.3, 0.4, 0.5],
            "min_score": 0.9
        },
        {
            "field": "field2",
            "feature": [0.8, 0.9],
            "min_score": 0.8
        }]
    }
}

The intersection of field1 and field2 filtering results is obtained. Other parameters and request addresses are the same as ordinary queries.

Search interface return format

{
    "code": 0,
    "msg": "success",
    "documents": [
        [{
            "_id": "6979025510302030694",
            "_score": 16.55717658996582,
            "_source": {
                "field_double": 207598,
                "field_float": 107598,
                "field_int": 6598,
                "field_string": "6598"
            }
        }, {
            "_id": "-104688682735192253",
            "_score": 17.663991928100586,
            "_source": {
                "field_double": 207588,
                "field_float": 107588,
                "field_int": 46588,
                "field_string": "46588"
            }
        }, {
            "_id": "8549822044854277588",
            "_score": 17.88829803466797,
            "_source": {
                "field_double": 220413,
                "field_float": 120413,
                "field_int": 99413,
                "field_string": "99413"
            }
        }]
    ]
}
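
The documents value in this response is a list of lists. The sketch below reads it on the assumption that each inner list holds the hits for one query feature, in request order; that reading is inferred from the sample and the batch-query description, and top_hits is a hypothetical helper. The sample is abbreviated to two hits.

```python
import json

# Abbreviated copy of the sample response above (two of the three hits).
response_text = """
{
    "code": 0,
    "msg": "success",
    "documents": [
        [
            {"_id": "6979025510302030694", "_score": 16.55717658996582,
             "_source": {"field_int": 6598, "field_string": "6598"}},
            {"_id": "-104688682735192253", "_score": 17.663991928100586,
             "_source": {"field_int": 46588, "field_string": "46588"}}
        ]
    ]
}
"""


def top_hits(response):
    """Return one list of (_id, _score) pairs per inner result list."""
    return [
        [(doc["_id"], doc["_score"]) for doc in hits]
        for hits in response.get("documents", [])
    ]


hits_per_query = top_hits(json.loads(response_text))
for query_index, hits in enumerate(hits_per_query):
    for doc_id, score in hits:
        print(query_index, doc_id, score)
```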

The overall json structure of the query parameters is as follows:

{
    "query": {
        "vector": [],
        "filter": []
    },
    "retrieval_param": {"nprobe": 20},
    "fields": ["field1", "field2"],
    "is_brute_search": 0,
    "online_log_level": "debug",
    "quick": false,
    "vector_value": false,
    "load_balance": "leader",
    "l2_sqrt": false,
    "size": 10
}

Parameter Description:

field name	field type	required	remarks
vector	json array	false	query feature; either vector or document_ids must be provided
document_ids	json array	false	query feature; either vector or document_ids must be provided
filter	json array	false	query criteria filtering: numeric filtering + label filtering
fields	json array	false	Specify which fields to return. By default, only the unique id and score are returned.
is_brute_search	int	false	default 0
online_log_level	string	false	The value is debug, which turns on printing debugging logs.
quick	bool	false	default false
vector_value	bool	false	default false
load_balance	string	false	Load balancing algorithm, random by default
l2_sqrt	bool	false	Default false. When true, the square root is taken of the L2 distance result.
sort	json array	false	Sort by the specified fields (applies only to the matched results, not the whole data set)
size	int	false	Specify the number of returned results, the default is 50

The retrieval_param parameter specifies the parameters for model calculation. Different models support different parameters, as shown in the following example:

metric_type: distance metric; supports InnerProduct and L2, and the default is L2.
nprobe: number of buckets to search.
recall_num: number of candidates to recall from the index, from which the size closest results are then computed; defaults to the value of size in the query parameters.
parallel_on_queries: default 1, which parallelizes across queries; 0 parallelizes across buckets.
efSearch: distance of graph traversal.

IVFPQ:

"retrieval_param": {
    "parallel_on_queries": 1,
    "recall_num" : 100,
    "nprobe": 80,
    "metric_type": "L2"
}

GPU:

"retrieval_param": {
    "recall_num" : 100,
    "nprobe": 80,
    "metric_type": "L2"
}

HNSW:

"retrieval_param": {
    "efSearch": 64,
    "metric_type": "L2"
}

IVFFLAT:

"retrieval_param": {
    "parallel_on_queries": 1,
    "nprobe": 80,
    "metric_type": "L2"
}

FLAT:

"retrieval_param": {
    "metric_type": "L2"
}
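
One way to keep retrieval_param consistent with the index type is to check the keys against the examples above, as in this sketch. The ALLOWED_PARAMS table simply mirrors those examples and is an assumption; the server may accept parameters that are not listed here, and build_retrieval_param is a hypothetical helper.

```python
# Parameter names per index type, taken from the examples above (assumed complete).
ALLOWED_PARAMS = {
    "IVFPQ": {"parallel_on_queries", "recall_num", "nprobe", "metric_type"},
    "GPU": {"recall_num", "nprobe", "metric_type"},
    "HNSW": {"efSearch", "metric_type"},
    "IVFFLAT": {"parallel_on_queries", "nprobe", "metric_type"},
    "FLAT": {"metric_type"},
}
METRIC_TYPES = {"InnerProduct", "L2"}


def build_retrieval_param(index_type, **params):
    """Return params as a dict after checking them against the index type."""
    if index_type not in ALLOWED_PARAMS:
        raise ValueError(f"unknown index type: {index_type}")
    unexpected = set(params) - ALLOWED_PARAMS[index_type]
    if unexpected:
        raise ValueError(f"{index_type} does not list: {sorted(unexpected)}")
    if params.get("metric_type", "L2") not in METRIC_TYPES:
        raise ValueError("metric_type must be InnerProduct or L2")
    return dict(params)


hnsw_param = build_retrieval_param("HNSW", efSearch=64, metric_type="L2")
print(hnsw_param)  # {'efSearch': 64, 'metric_type': 'L2'}
```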

vector JSON structure description:

"vector": [{
          "field": "field_name",
          "feature": [0.1, 0.2, 0.3, 0.4, 0.5],
          "min_score": 0.9,
          "boost": 0.5
       }]

vector: Supports multiple entries (corresponding to multiple feature fields defined in the table structure).
field: Name of the feature field specified when the table was created.
feature: The feature to search with; its dimension must match the one defined in the table structure.
min_score: Minimum score of the returned results; max_score sets the maximum score. For example, setting "min_score": 0.8 and "max_score": 0.95 keeps results with 0.8 <= score <= 0.95. Scores can also be filtered with the combination "symbol": ">=", "value": 0.9. symbol supports four operators: >, >=, <, and <=; value is the score to compare against.
boost: Weight applied to the similarity score. For example, if the similarity score of two vectors is 0.7 and boost is set to 0.5, the returned score is 0.7 * 0.5 = 0.35. It does not take effect when a single vector is used.
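
The score rules above amount to simple comparisons and one multiplication. The sketch below reproduces them on the client side purely as an illustration of the arithmetic; the function names are hypothetical and nothing here describes how the server implements the checks.

```python
import operator

# The four comparison symbols listed for score filtering.
SYMBOLS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def passes_range(score, min_score=None, max_score=None):
    """True when min_score <= score <= max_score for whichever bounds are set."""
    if min_score is not None and score < min_score:
        return False
    if max_score is not None and score > max_score:
        return False
    return True


def passes_symbol(score, symbol, value):
    """True when score compares to value under the given symbol."""
    return SYMBOLS[symbol](score, value)


def apply_boost(score, boost):
    """Weight a similarity score by boost."""
    return score * boost


print(passes_range(0.9, min_score=0.8, max_score=0.95))  # True
print(passes_symbol(0.85, ">=", 0.9))  # False
print(round(apply_boost(0.7, 0.5), 2))  # 0.35
```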

filter JSON structure description:

"filter": [
    {
        "range": {
            "field_name": {
                "gte": 160,
                "lte": 180
            }
        }
    },
    {
        "term": {
            "field1": ["100", "200", "300"],
            "operator": "or"
        }
    },
    {
        "term": {
            "field2": ["a", "b", "c"],
            "operator": "and"
        }
    },
    {
        "term": {
            "field3": ["A1", "B2"],
            "operator": "not"
        }
    }
]

filter: Multiple conditions are supported, and the results of the conditions are intersected.
range: Filters on a numeric (integer or float) field. The key is the name of the numeric field, and gte and lte specify the range: gte means greater than or equal to, and lte means less than or equal to. For equality filtering, set gte and lte to the same value. In the example above, the field_name value must be greater than or equal to 160 and less than or equal to 180.
term: Filters on a label field. The key is the name of a defined label field, and multiple values may be listed. "operator": "or" takes the union of the listed values, and "operator": "and" takes the intersection. In the example above, the first term condition matches records whose field1 value is "100", "200", or "300".
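
Filter conditions like the ones above can be assembled with small helpers, sketched here under hypothetical names (range_filter, equals_filter, term_filter). The operator key is left out when it is not given, since some examples in this document omit it and its default is not stated.

```python
def range_filter(field, gte=None, lte=None):
    """Build a range condition on a numeric field."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    if not bounds:
        raise ValueError("at least one of gte or lte is required")
    return {"range": {field: bounds}}


def equals_filter(field, value):
    """Equality on a numeric field: gte and lte set to the same value."""
    return range_filter(field, gte=value, lte=value)


def term_filter(field, values, operator=None):
    """Build a term condition on a label field."""
    term = {field: list(values)}
    if operator is not None:
        term["operator"] = operator
    return {"term": term}


# All conditions in the list are intersected.
filters = [
    range_filter("field_name", gte=160, lte=180),
    term_filter("field1", ["100", "200", "300"], operator="or"),
]
print(filters)
```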

is_brute_search: Specifies the search type. 0 uses the index if it has been created and falls back to brute-force search if it has not; -1 uses only the index; 1 uses only brute-force search without the index. The default value is 0.
quick: By default, the vectors recalled by PQ are recomputed and refined in the search results. To speed up server processing, set quick to true so that only the recall step is performed, without the recomputation and refinement.
vector_value: To reduce network overhead, search results by default contain only scalar fields without feature data. Set it to true to include the original feature data in the returned results.
online_log_level: Set to "debug" to print more detailed logs on the server, which is convenient for troubleshooting during development and testing.
size: Specifies the maximum number of results to return. If size is also specified in the URL, the URL value takes precedence.
load_balance: leader, random, no_leader, or least_connection; the default is random.

document/delete

Deletion supports two methods: by document_ids and by filter conditions.

Delete specified document_ids

curl -H "content-type: application/json" -XPOST -d'
{
    "db_name": "ts_db",
    "space_name": "ts_space",
    "query": {
        "document_ids": ["4501743250723073467", "616335952940335471", "-2422965400649882823"]
    }
}
' http://router_server/document/delete

Delete documents that meet the filter conditions. size specifies the number of documents to delete on each data partition (shard).

curl -H "content-type: application/json" -XPOST -d'
{
    "db_name": "ts_db",
    "space_name": "ts_space",
    "query": {
        "filter": [
        {
            "range": {
            "field_int": {
                "gte": 1000,
                "lte": 100000
            }
            }
        },
        {
            "term": {
            "field_string": [
                "322"
            ]
            }
        }
        ]
    },
    "size": 3
}
' http://router_server/document/delete

Delete interface return format

{
    "code": 0,
    "msg": "success",
    "total": 3,
    "document_ids": ["4501743250723073467", "616335952940335471", "-2422965400649882823"]
}
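
After a delete by document_ids, a client may want to confirm which ids were actually removed. The sketch below compares the requested ids with the response, assuming that the returned document_ids list contains the ids that were deleted (an interpretation of the sample above, not a documented guarantee). build_delete and missing_ids are hypothetical helpers.

```python
def build_delete(db_name, space_name, document_ids=None, filters=None, size=None):
    """Build a document/delete body from document_ids or filter conditions."""
    if not document_ids and not filters:
        raise ValueError("either document_ids or filter must be provided")
    query = {}
    if document_ids:
        query["document_ids"] = [str(doc_id) for doc_id in document_ids]
    if filters:
        query["filter"] = filters
    body = {"db_name": db_name, "space_name": space_name, "query": query}
    if size is not None:
        body["size"] = size  # number of documents to delete per data partition
    return body


def missing_ids(requested, response):
    """Return requested ids that do not appear in the response's document_ids."""
    deleted = set(response.get("document_ids", []))
    return [doc_id for doc_id in requested if doc_id not in deleted]


requested = ["4501743250723073467", "616335952940335471", "-2422965400649882823"]
body = build_delete("ts_db", "ts_space", document_ids=requested)

# Sample response copied from the format shown above.
response = {"code": 0, "msg": "success", "total": 3, "document_ids": list(requested)}
print(missing_ids(requested, response))  # []
```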