Space Operation

In the examples below, http://${VEARCH_URL} is the address of the Vearch service, $db_name is the name of the created database, and $space_name is the name of the created space.

Create Space

  curl -XPOST -H "content-type: application/json" -d'
  {
    "name": "space1",
    "partition_num": 1,
    "replica_num": 3,
    "fields": [
      {
        "name": "field_string",
        "type": "string"
      },
      {
        "name": "field_int",
        "type": "integer"
      },
      {
        "name": "field_float",
        "type": "float",
        "index": {
          "name": "field_float",
          "type": "SCALAR"
        }
      },
      {
        "name": "field_string_array",
        "type": "stringArray",
        "index": {
          "name": "field_string_array",
          "type": "SCALAR"
        }
      },
      {
        "name": "field_int_index",
        "type": "integer",
        "index": {
          "name": "field_int_index",
          "type": "SCALAR"
        }
      },
      {
        "name": "field_vector",
        "type": "vector",
        "dimension": 128,
        "index": {
          "name": "gamma",
          "type": "IVFPQ",
          "params": {
            "metric_type": "InnerProduct",
            "ncentroids": 2048,
            "nlinks": 32,
            "efConstruction": 40
          }
        }
      }
    ]
  }
  ' http://${VEARCH_URL}/dbs/$db_name/spaces
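The same request can also be assembled programmatically, which avoids hand-written JSON mistakes such as trailing commas. The following is a minimal Python sketch using only the standard library. The helper name build_create_space_request, the address localhost:9001, and the database name ts_db are illustrative assumptions, and the field list is shortened; the request is built but not sent.

```python
import json
import urllib.request


def build_create_space_request(vearch_url, db_name, space):
    """Build (but do not send) the POST request that creates a space."""
    body = json.dumps(space).encode("utf-8")  # json.dumps always emits valid JSON
    return urllib.request.Request(
        url=f"http://{vearch_url}/dbs/{db_name}/spaces",
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )


# Shortened version of the space definition shown above.
space = {
    "name": "space1",
    "partition_num": 1,
    "replica_num": 3,
    "fields": [
        {"name": "field_string", "type": "string"},
        {
            "name": "field_vector",
            "type": "vector",
            "dimension": 128,
            "index": {
                "name": "gamma",
                "type": "IVFPQ",
                "params": {"metric_type": "InnerProduct", "ncentroids": 2048},
            },
        },
    ],
}

# "localhost:9001" and "ts_db" are placeholder values, not real defaults.
req = build_create_space_request("localhost:9001", "ts_db", space)
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would send the request to a running service.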

Parameter description:

| field name | field description | field type | must | remarks |
| --- | --- | --- | --- | --- |
| name | space name | string | true | |
| partition_num | partition number | int | true | |
| replica_num | replica number | int | true | |
| fields | schema config | json | true | define space field |

1. The space name must not be empty and must not start with a number or an underscore. Avoid special characters where possible.

2. partition_num: the number of data shards (partitions) of the space. Different shards can be placed on different machines to avoid the resource limits of a single machine.

3. replica_num: the number of replicas. The recommended value is 3, which means that each piece of data has two backup copies to ensure high availability.
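The naming rules and the replica arithmetic above can be checked before a request is sent. This is a minimal Python sketch; the function names are hypothetical, and the soft rule about special characters is deliberately not enforced.

```python
def is_valid_space_name(name: str) -> bool:
    """Apply the hard naming rules: non-empty, no leading digit or underscore."""
    if not name:
        return False
    return not (name[0].isdigit() or name[0] == "_")


def backup_copies(replica_num: int) -> int:
    """With replica_num copies in total, all but one of them are backups."""
    return replica_num - 1


print(is_valid_space_name("space1"))  # True
print(is_valid_space_name("1space"))  # False
print(is_valid_space_name("_space"))  # False
print(backup_copies(3))               # 2
```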

index config:

| field name | field description | field type | must | remarks |
| --- | --- | --- | --- | --- |
| name | index name | string | true | |
| type | index type | string | true | |
| params | index parameters | json | true | |

1. Index type: seven index types in two categories are currently supported. Scalar index: SCALAR. Vector indexes: IVFPQ, HNSW, GPU, IVFFLAT, BINARYIVF, FLAT. For details, see https://github.com/vearch/vearch/wiki/Vearch%E7%B4%A2%E5%BC%95%E4%BB%8B%E7%BB%8D%E5%92%8C%E5%8F%82%E6%95%B0%E9%80%89%E6%8B%A9

Scalar indexes only need to set name and type.

The parameter configurations and default values required for different vector index types are as follows:

IVFPQ:

| field name | field description | field type | must | remarks |
| --- | --- | --- | --- | --- |
| metric_type | distance calculation method | string | true | L2 or InnerProduct |
| ncentroids | number of buckets for indexing | int | true | default 2048 |
| nsubvector | number of sub-vectors each vector is split into for PQ | int | false | default 64 |
| bucket_init_size | initial size of each bucket | int | false | default 1000 |
| bucket_max_size | maximum size of each bucket | int | false | default 1280000 |
| training_threshold | training data size | int | false | The default training_threshold * 39 is the amount of data required for each shard training, not the amount of data in the space table. |
| nprobe | number of cluster centers searched during retrieval | int | false | default 80 |

  "index_type": "IVFPQ",
  "index_params": {
    "metric_type": "InnerProduct",
    "ncentroids": 2048,
    "nsubvector": 64
  }
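The defaults in the IVFPQ table can be captured in a small helper so that only the values that differ need to be written out. This is a hedged Python sketch: the helper name ivfpq_params is hypothetical, and spelling out every default explicitly is done here for illustration only, since omitted optional parameters are expected to fall back to the documented defaults.

```python
# Defaults as listed in the IVFPQ parameter table above.
IVFPQ_DEFAULTS = {
    "ncentroids": 2048,
    "nsubvector": 64,
    "bucket_init_size": 1000,
    "bucket_max_size": 1280000,
    "nprobe": 80,
}


def ivfpq_params(metric_type, **overrides):
    """Return IVFPQ index parameters: documented defaults plus any overrides."""
    if metric_type not in ("L2", "InnerProduct"):
        raise ValueError("metric_type must be 'L2' or 'InnerProduct'")
    params = {"metric_type": metric_type}
    params.update(IVFPQ_DEFAULTS)
    params.update(overrides)  # caller-supplied values win over the defaults
    return params


params = ivfpq_params("InnerProduct", ncentroids=4096)
print(params["ncentroids"], params["nprobe"])  # 4096 80
```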

Set IVFPQ with HNSW:

  "index_size": 2600000,
  "id_type": "string",
  "index_type": "IVFPQ",
  "index_params": {
    "metric_type": "InnerProduct",
    "ncentroids": 65536,
    "nsubvector": 64,
    "hnsw": {
      "nlinks": 32,
      "efConstruction": 200,
      "efSearch": 64
    }
  }

HNSW:

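| field name | field description | field type | must | remarks |
| --- | --- | --- | --- | --- |
| metric_type | distance calculation method | string | true | L2 or InnerProduct |
| nlinks | number of neighbors per node | int | false | default 32 |
| efConstruction | traversal depth during graph construction | int | false | default 40 |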

  "index_type": "HNSW",
  "index_params": {
    "metric_type": "L2",
    "nlinks": 32,
    "efConstruction": 40
  }

Note: Vector storage only supports MemoryOnly.

GPU (Compiled version for GPU):

| field name | field description | field type | must | remarks |
| --- | --- | --- | --- | --- |
| metric_type | distance calculation method | string | true | L2 or InnerProduct |
| ncentroids | number of buckets for indexing | int | true | default 2048 |
| nsubvector | number of sub-vectors each vector is split into for PQ | int | false | default 64, must be a multiple of 4 |
| training_threshold | training data size | int | false | The default training_threshold * 39 is the amount of data required for each shard training, not the amount of data in the space table. |
| nprobe | number of cluster centers searched during retrieval | int | false | default 80 |

  "index_type": "GPU",
  "index_params": {
    "metric_type": "InnerProduct",
    "ncentroids": 2048,
    "nsubvector": 64
  }
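The GPU table adds one constraint that is easy to miss: nsubvector must be a multiple of 4. A minimal Python sketch of a client-side check follows; the helper name gpu_index_params is hypothetical.

```python
def gpu_index_params(metric_type, ncentroids=2048, nsubvector=64):
    """Return GPU index parameters after checking the documented constraints."""
    if metric_type not in ("L2", "InnerProduct"):
        raise ValueError("metric_type must be 'L2' or 'InnerProduct'")
    if nsubvector % 4 != 0:
        raise ValueError("nsubvector must be a multiple of 4 for the GPU index")
    return {
        "metric_type": metric_type,
        "ncentroids": ncentroids,
        "nsubvector": nsubvector,
    }


print(gpu_index_params("InnerProduct"))
# {'metric_type': 'InnerProduct', 'ncentroids': 2048, 'nsubvector': 64}
```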

IVFFLAT:

| field name | field description | field type | must | remarks |
| --- | --- | --- | --- | --- |
| metric_type | distance calculation method | string | true | L2 or InnerProduct |
| ncentroids | number of buckets for indexing | int | true | default 256 |
| training_threshold | training data size | int | false | The default training_threshold * 39 is the amount of data required for each shard training, not the amount of data in the space table. |
| nprobe | number of cluster centers searched during retrieval | int | false | default 80 |

  "index_type": "IVFFLAT",
  "index_params": {
    "metric_type": "InnerProduct",
    "ncentroids": 256
  }

Note: Vector storage only supports RocksDB.

BINARYIVF:

  "index_type": "BINARYIVF",
  "index_params": {
    "ncentroids": 256
  }

Note: The vector length must be a multiple of 8.
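One plausible reading of the multiple-of-8 rule is that binary vectors are packed eight bits per byte, so the length in bits must divide evenly into bytes. The Python sketch below works under that assumption, which the text above does not state explicitly; the helper name is hypothetical.

```python
def binary_vector_num_bytes(dimension: int) -> int:
    """Return the packed size in bytes of a binary vector of `dimension` bits.

    Assumes eight bits are packed into each byte (an assumption, not a
    documented guarantee); rejects lengths that are not a multiple of 8.
    """
    if dimension <= 0 or dimension % 8 != 0:
        raise ValueError("binary vector length must be a positive multiple of 8")
    return dimension // 8


print(binary_vector_num_bytes(128))  # 16
```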

FLAT:

| field name | field description | field type | must | remarks |
| --- | --- | --- | --- | --- |
| metric_type | distance calculation method | string | true | L2 or InnerProduct |

  "index_type": "FLAT",
  "index_params": {
    "metric_type": "InnerProduct"
  }

Note: Vector storage only supports MemoryOnly.
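The notes in this section tie some index types to a single vector storage type: HNSW and FLAT to MemoryOnly, IVFFLAT to RocksDB. The Python sketch below collects those notes into one lookup. The function name is hypothetical, and treating index types without a note as accepting either storage type is an assumption.

```python
# Storage restrictions taken from the notes under each index type.
REQUIRED_STORE_TYPE = {
    "HNSW": "MemoryOnly",
    "FLAT": "MemoryOnly",
    "IVFFLAT": "RocksDB",
}


def store_type_allowed(index_type: str, store_type: str) -> bool:
    """Check a store_type against the restriction noted for the index type."""
    if store_type not in ("MemoryOnly", "RocksDB"):
        return False
    required = REQUIRED_STORE_TYPE.get(index_type)
    # Assumption: index types without a note accept both storage types.
    return required is None or required == store_type


print(store_type_allowed("HNSW", "RocksDB"))     # False
print(store_type_allowed("IVFFLAT", "RocksDB"))  # True
print(store_type_allowed("IVFPQ", "MemoryOnly")) # True
```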

fields config:

  1. The space schema supports seven field types (that is, values of type): string (keyword), stringArray, integer, long, float, double, and vector. keyword is equivalent to string.

  2. String fields (including stringArray) support indexes. The index attribute defines whether an index is created.

  3. Integer, long, float, and double fields support the index attribute, and fields with index set to true support numeric range filter queries.

  4. Vector fields are feature fields. A space can contain multiple feature fields. Vector fields support the following attributes:

| field name | field description | field type | must | remarks |
| --- | --- | --- | --- | --- |
| dimension | feature dimension | int | true | must be an integer multiple of the nsubvector value above |
| store_type | feature storage type | string | false | supports MemoryOnly and RocksDB |
| store_param | storage parameter settings | json | false | sets the memory size used for data |
| model_id | feature plug-in model | string | false | specify when using the feature plug-in service |

  1. dimension: for a field whose type is vector, specifies the dimension of the feature.

  2. store_type: the storage type of the raw vectors. The options are:

"MemoryOnly": vectors are stored in memory, so the number of stored vectors is limited by memory size. Suitable when the number of vectors on a single machine is not large (around ten million) and performance requirements are high.

"RocksDB": vectors are stored in RocksDB (on disk), so the number of stored vectors is limited by disk size. Suitable when the number of vectors on a single machine is huge (above 100 million) and performance requirements are not high.

  3. store_param: storage parameters for the chosen store_type. Its sub-parameters include:

cache_size: integer, in MB, default 1024. When store_type="RocksDB", it is the read buffer size of RocksDB; the larger the value, the better the vector read performance. Typical values are 1024, 2048, 4096, and 6144. When store_type="MemoryOnly", cache_size has no effect.
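The vector-field rules above (dimension a multiple of nsubvector, store_type one of two values, cache_size meaningful only for RocksDB) can be combined into one helper. This is a minimal Python sketch with a hypothetical function name; it builds only the field attributes discussed here and leaves out the index definition.

```python
def vector_field(name, dimension, nsubvector=64,
                 store_type="MemoryOnly", cache_size=1024):
    """Build a vector field definition after checking the documented rules."""
    if dimension % nsubvector != 0:
        raise ValueError("dimension must be an integer multiple of nsubvector")
    if store_type not in ("MemoryOnly", "RocksDB"):
        raise ValueError("store_type must be 'MemoryOnly' or 'RocksDB'")
    field = {
        "name": name,
        "type": "vector",
        "dimension": dimension,
        "store_type": store_type,
    }
    # cache_size (in MB) only takes effect for RocksDB, so it is omitted otherwise.
    if store_type == "RocksDB":
        field["store_param"] = {"cache_size": cache_size}
    return field


field = vector_field("field_vector", 128, nsubvector=64,
                     store_type="RocksDB", cache_size=2048)
print(field["store_param"])  # {'cache_size': 2048}
```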

Scalar Index

The Gamma engine supports scalar indexes, which provide filtering on scalar data. To enable one, see items 2 and 3 under "fields config"; for how to query with it, see "filter json structure elucidation" in "Search".

View Space

  curl -XGET http://${VEARCH_URL}/dbs/$db_name/spaces/$space_name

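Detailed format of the returned data:

| field name | field description | type | always returned | remarks |
| --- | --- | --- | --- | --- |
| code | return code | int | | |
| msg | return message | string | | |
| data | return data | json | | |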

return data:

| field name | field description | field type | must | remarks |
| --- | --- | --- | --- | --- |
| space_name | space name | string | yes | |
| db_name | database name | string | yes | |
| doc_num | number of documents in the space | uint64 | yes | |
| partition_num | partition number | int | yes | |
| replica_num | replica number | int | yes | |
| schema | space structure schema | json | yes | |
| status | space status | string | yes | red means there is a problem with the space |
| partitions | detailed information about the space's partitions | json | yes | |
| errors | space error information | string list | no | |

return format:

  {
    "code": 0,
    "data": {
      "space_name": "ts_space",
      "db_name": "ts_db",
      "doc_num": 0,
      "partition_num": 1,
      "replica_num": 3,
      "schema": {
        "fields": [
          {
            "name": "field_string",
            "type": "string"
          },
          {
            "name": "field_int",
            "type": "integer"
          },
          {
            "name": "field_float",
            "type": "float",
            "index": {
              "name": "field_float",
              "type": "SCALAR"
            }
          },
          {
            "name": "field_string_array",
            "type": "stringArray",
            "index": {
              "name": "field_string_array",
              "type": "SCALAR"
            }
          },
          {
            "name": "field_int_index",
            "type": "integer",
            "index": {
              "name": "field_int_index",
              "type": "SCALAR"
            }
          },
          {
            "name": "field_vector",
            "type": "vector",
            "dimension": 128,
            "index": {
              "name": "gamma",
              "type": "IVFPQ",
              "params": {
                "metric_type": "InnerProduct",
                "ncentroids": 2048,
                "nlinks": 32,
                "efConstruction": 40
              }
            }
          }
        ]
      },
      "status": "green",
      "partitions": [
        {
          "pid": 1,
          "replica_num": 1,
          "status": 4,
          "color": "green",
          "ip": "x.x.x.x",
          "node_id": 1,
          "index_status": 0,
          "index_num": 0,
          "max_docid": -1
        },
        {
          "pid": 2,
          "replica_num": 1,
          "status": 4,
          "color": "green",
          "ip": "x.x.x.x",
          "node_id": 2,
          "index_status": 0,
          "index_num": 0,
          "max_docid": -1
        },
        {
          "pid": 3,
          "replica_num": 1,
          "status": 4,
          "color": "green",
          "ip": "x.x.x.x",
          "node_id": 3,
          "index_status": 0,
          "index_num": 0,
          "max_docid": -1
        }
      ]
    }
  }
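A response of this shape can be scanned for partitions that need attention. The Python sketch below is illustrative only: the function name is hypothetical, the sample is a trimmed stand-in for a real response, and the partition color value "red" is an assumption modeled on the space-level status described above.

```python
import json


def partitions_not_green(response):
    """Return the pid of every partition whose color is not "green"."""
    partitions = response["data"].get("partitions", [])
    return [p["pid"] for p in partitions if p.get("color") != "green"]


# Trimmed, made-up sample; the "red" partition color is an assumed value.
sample = json.loads("""
{
  "code": 0,
  "data": {
    "space_name": "ts_space",
    "status": "red",
    "partitions": [
      {"pid": 1, "color": "green"},
      {"pid": 2, "color": "red"},
      {"pid": 3, "color": "green"}
    ]
  }
}
""")

print(partitions_not_green(sample))  # [2]
```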

To get more detailed information, add detail=true (the URL is quoted so that the shell does not interpret the ?):

  curl -XGET "http://${VEARCH_URL}/dbs/$db_name/spaces/$space_name?detail=true"

return format

  {
    "code": 0,
    "data": {
      "space_name": "ts_space",
      "db_name": "ts_db",
      "doc_num": 0,
      "partition_num": 1,
      "replica_num": 3,
      "schema": {
        "fields": [
          {
            "name": "field_string",
            "type": "string"
          },
          {
            "name": "field_int",
            "type": "integer"
          },
          {
            "name": "field_float",
            "type": "float",
            "index": {
              "name": "field_float",
              "type": "SCALAR"
            }
          },
          {
            "name": "field_string_array",
            "type": "stringArray",
            "index": {
              "name": "field_string_array",
              "type": "SCALAR"
            }
          },
          {
            "name": "field_int_index",
            "type": "integer",
            "index": {
              "name": "field_int_index",
              "type": "SCALAR"
            }
          },
          {
            "name": "field_vector",
            "type": "vector",
            "dimension": 128,
            "index": {
              "name": "gamma",
              "type": "IVFPQ",
              "params": {
                "metric_type": "InnerProduct",
                "ncentroids": 2048,
                "nlinks": 32,
                "efConstruction": 40
              }
            }
          }
        ]
      },
      "status": "green",
      "partitions": [
        {
          "pid": 1,
          "replica_num": 1,
          "path": "/export/Data/datas/",
          "status": 4,
          "color": "green",
          "ip": "x.x.x.x",
          "node_id": 1,
          "raft_status": {
            "ID": 1,
            "NodeID": 1,
            "Leader": 1,
            "Term": 1,
            "Index": 1,
            "Commit": 1,
            "Applied": 1,
            "Vote": 1,
            "PendQueue": 0,
            "RecvQueue": 0,
            "AppQueue": 0,
            "Stopped": false,
            "RestoringSnapshot": false,
            "State": "StateLeader",
            "Replicas": {
              "1": {
                "Match": 1,
                "Commit": 1,
                "Next": 2,
                "State": "ReplicaStateProbe",
                "Snapshoting": false,
                "Paused": false,
                "Active": true,
                "LastActive": "2024-03-18T09:59:17.095112556+08:00",
                "Inflight": 0
              }
            }
          },
          "index_status": 0,
          "index_num": 0,
          "max_docid": -1
        },
        {
          "pid": 2,
          "replica_num": 1,
          "path": "/export/Data/datas/",
          "status": 4,
          "color": "green",
          "ip": "x.x.x.x",
          "node_id": 2,
          "raft_status": {
            "ID": 2,
            "NodeID": 1,
            "Leader": 1,
            "Term": 1,
            "Index": 1,
            "Commit": 1,
            "Applied": 1,
            "Vote": 1,
            "PendQueue": 0,
            "RecvQueue": 0,
            "AppQueue": 0,
            "Stopped": false,
            "RestoringSnapshot": false,
            "State": "StateLeader",
            "Replicas": {
              "1": {
                "Match": 1,
                "Commit": 1,
                "Next": 2,
                "State": "ReplicaStateProbe",
                "Snapshoting": false,
                "Paused": false,
                "Active": true,
                "LastActive": "2024-03-18T09:59:17.095112556+08:00",
                "Inflight": 0
              }
            }
          },
          "index_status": 0,
          "index_num": 0,
          "max_docid": -1
        },
        {
          "pid": 3,
          "replica_num": 1,
          "path": "/export/Data/datas/",
          "status": 4,
          "color": "green",
          "ip": "x.x.x.x",
          "node_id": 3,
          "raft_status": {
            "ID": 3,
            "NodeID": 1,
            "Leader": 1,
            "Term": 1,
            "Index": 1,
            "Commit": 1,
            "Applied": 1,
            "Vote": 1,
            "PendQueue": 0,
            "RecvQueue": 0,
            "AppQueue": 0,
            "Stopped": false,
            "RestoringSnapshot": false,
            "State": "StateLeader",
            "Replicas": {
              "1": {
                "Match": 1,
                "Commit": 1,
                "Next": 2,
                "State": "ReplicaStateProbe",
                "Snapshoting": false,
                "Paused": false,
                "Active": true,
                "LastActive": "2024-03-18T09:59:17.095112556+08:00",
                "Inflight": 0
              }
            }
          },
          "index_status": 0,
          "index_num": 0,
          "max_docid": -1
        }
      ]
    }
  }
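The detailed response adds a raft_status object to each partition. A short Python sketch of how a client might summarize it follows; the function name is hypothetical and the sample is a trimmed stand-in containing only the keys the sketch reads.

```python
def raft_summary(response):
    """Map each partition id to its (Leader, State) pair from raft_status."""
    summary = {}
    for partition in response["data"]["partitions"]:
        raft = partition.get("raft_status")
        if raft is None:
            # The plain (non-detail) response shown earlier has no raft_status.
            continue
        summary[partition["pid"]] = (raft["Leader"], raft["State"])
    return summary


# Trimmed sample modeled on the detailed response above.
sample = {
    "code": 0,
    "data": {
        "partitions": [
            {"pid": 1, "raft_status": {"Leader": 1, "State": "StateLeader"}},
            {"pid": 2, "raft_status": {"Leader": 1, "State": "StateLeader"}},
            {"pid": 3},
        ]
    },
}

print(raft_summary(sample))
# {1: (1, 'StateLeader'), 2: (1, 'StateLeader')}
```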

Delete Space

  curl -XDELETE http://${VEARCH_URL}/dbs/$db_name/spaces/$space_name
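For completeness, the delete call can be built the same way from Python's standard library. This is a sketch under stated assumptions: the helper name, the address localhost:9001, and the names ts_db and ts_space are placeholders, and the request is built but not sent.

```python
import urllib.request


def build_delete_space_request(vearch_url, db_name, space_name):
    """Build (but do not send) the DELETE request that removes a space."""
    return urllib.request.Request(
        url=f"http://{vearch_url}/dbs/{db_name}/spaces/{space_name}",
        method="DELETE",
    )


# Placeholder address and names, chosen only for illustration.
req = build_delete_space_request("localhost:9001", "ts_db", "ts_space")
print(req.get_method(), req.full_url)
# DELETE http://localhost:9001/dbs/ts_db/spaces/ts_space
```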