Effect Evaluation

Benchmarks

This document shows the experiments we do and the results we get. Here we do two series of experiments. First, we experiment on a single node to show the recalls of the modified IVFPQ model which is based on faiss. Second, we do experiments with Vearch cluster.

We evaluate methods with the recall at k performance measure, which is the proportion of results that contain the ground truth nearest neighbor when returning the top k candidates (for k ∈{1,10,100}). And we use Euclidean neighbors as ground truth.

Note that the numbers (especially QPS) change slightly due to changes in the implementation, different machines, etc.

Getting data

We do experiments on two kind of features. One is 128-dimensional SIFT feature, the other is 512-dimensional VGG feature.

Getting SIFT1M

To run it, please download the ANN_SIFT1M dataset from

http://corpus-texmex.irisa.fr/

and unzip it to the subdirectory sift1M.

Getting VGG1M and VGG10M

We get 1 million and other 10 million data and then use deep-learning model vgg to get their features.

Getting VGG100M , VGG500M and VGG1B

We collect billions of data and use deep-learning model vgg to get their features for cluster experiments.

Nprobe experiments

We do experiments on SIFT1M, VGG1M and VGG10M. In this experiment, nprobe ∈{1,5,10,20,30,40,50,80,100,200}. At the same time, we set the ncentroids as 256 and the nbytes as 32.

We use recall at 1 to show the result.

Result

As we can see, when nprobe exceeds 25, there is no obvious change of recalls. Also, when nprobe get larger,only QPS of vgg10M get smaller, QPS of vgg1M and QPS of sift1M basically have no changes.

Ncentroids experiments

We do experiment on VGG10M. The number of centroid ∈{64,128,256,512,1024,2048,4096,8192} and we set nprobe as 50 considering the number of centroid becomes very large. Here we also set nbytes as 32. We use recall at 1 to show the result.

Result

As we can see, there is no obvious change of recalls when the number of centroid get larger. But the QPS become higher and higher as the number of centroid grows.

Nbytes experiments

We do experiment on VGG10M. The number of byte ∈{4,8,16,32,64}. We set ncentroids as 256 and nprobe as 50. We use recall at 1 to show the result.

Result

As we can see, when the number of byte grows, the recall get higher and higher, but the QPS drops obviously.

Experiments with faiss

We do experiments on SIFT1M, VGG1M and VGG10M to compare the recalls with faiss. We use some algorithm implemented with faiss and we use Vearch to represent our algorithm.

Models

Here we show the parameters we set for used models. When the parameters in the table are empty, there are no corresponding parameters in the models. And the parameters of links, efSearch and efConstruction are defined in faiss of hnsw.

model	ncentroids	nprobe	bytes of SIFT	bytes of VGG	links	efSearch	efConstruction
pq \|			32	64
ivfpq \|256		20	32	64
imipq	2^(2*10)	2048	32	64
opq+pq			32	64
hnsw					32	64	40
ivfhnsw	256	20			32	64	40
Vearch	256	20	32	64

Result

recalls of SIFT1M:

model	recall@1	recall@10	recall@100
pq	0.6274	0.9829	0.9999
ivfpq	0.6167	0.9797	0.9960
imipq	0.6595	0.9775	0.9841
opq+pq	0.6250	0.9821	1.0000
hnsw	0.9792	0.9867	0.9867
ivfhnsw	0.9888	0.9961	0.9961
Vearch	0.8649	0.9721	0.9722

recalls of VGG1M :

model	recall@1	recall@10	recall@100
pq	0.5079	0.8922	0.9930
ivfpq	0.4985	0.8792	0.9704
imipq	0.5077	0.8618	0.9248
opq+pq	0.5213	0.9105	0.9975
hnsw	0.9496	0.9550	0.9551
ivfhnsw	0.9690	0.9744	0.9745
Vearch	0.9536	0.9582	0.9585

recalls of VGG10M :

model	recall@1	recall@10	recall@100
pq	0.5842	0.8980	0.9888
ivfpq	0.5913	0.8896	0.9748
imipq	0.5925	0.8878	0.9570
opq+pq	0.6126	0.9160	0.9944
hnsw	0.8877	0.9069	0.9074
ivfhnsw	0.9638	0.9839	0.9843
Vearch	0.9272	0.9464	0.9468

Cluster experiments

First, we do experiments by searching on cluster only with vgg features. Then, we experiment with the vgg features and filter the search using an integer field to compare the time consumed and QPS with the vgg features only. In the following section, we use searching with filter or without filter to specify the experiment method mentioned earlier. For different size of experiment data, we use different Vearch cluster. We use 3 masters, 3 routers and 5 partition services for VGG100M. For VGG500M, we use the same size of master and router with VGG100M but 24 partition services. We use 3 masters, 6 routers and 48 partition services to deal with the VGG1B.

Result

The growth shape of QPS is more like inverted J-shaped curve which means the growth of QPS basically have no obvious change when average latency exceed one certain number.