
Commit e8312dc

VectorDBBench 1.0 (#543)
This commit marks the milestone release of VectorDBBench 1.0, introducing a wide range of new features, major enhancements, and updated benchmarks. Key changes include:

- UI: Introduce a brand-new homepage and navigation bar. The new design integrates powerful front-end pages for intuitive test result analysis and visualization.
- Cases: Add new label-filter test cases. These allow testing search performance with metadata filters using expressions like color == "red". Initial support includes Milvus, Zilliz Cloud, Elasticsearch Cloud, Qdrant Cloud, Pinecone, and OpenSearch (AWS).
- Cases: Implement new streaming test cases. These cases are designed to measure search performance while data is actively being inserted, simulating real-world "read-while-writing" scenarios.
- Dataset: Add the new BioASQ dataset. This dataset is 1024-dimensional and comes in 1M and 10M sizes, enriching the diversity of our test data.
- Custom Dataset: Enhance the custom dataset functionality. Users now have more flexible configuration options to better simulate their own data distributions and schemas.
- New Results: Re-run and update all benchmark results for `Milvus`, `ZillizCloud`, `ElasticCloud`, `QdrantCloud`, `Pinecone`, and `OpenSearch(AWS)` to reflect their latest performance on the new test cases.
1 parent be65129 commit e8312dc
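For context on the label-filter cases: the filter is an ordinary metadata expression evaluated alongside the vector search. Below is a minimal, hypothetical sketch of what such a filtered search looks like against Milvus via pymilvus; the collection name, field names, and index parameters are invented for illustration, and VectorDBBench issues the equivalent calls through its own client layer rather than this exact code.

```python
from pymilvus import Collection, connections

# Assumed setup: a local Milvus instance with a pre-built collection.
connections.connect(host="localhost", port="19530")
coll = Collection("vdbbench_demo")  # hypothetical collection name

# ANN search restricted to entities whose `color` label equals "red",
# mirroring the new label-filter test cases.
results = coll.search(
    data=[[0.1] * 1024],                                # one 1024-dim query vector (BioASQ-sized)
    anns_field="emb",                                   # assumed vector field name
    param={"metric_type": "L2", "params": {"ef": 64}},  # assumed HNSW search params
    limit=10,
    expr='color == "red"',                              # the metadata filter expression
)
```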

File tree

79 files changed (+34,583 / -3,546 lines)


.github/workflows/pull_request.yml

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@ on:
   pull_request:
     branches:
       - main
+      - vdbbench_*
 
 jobs:
   build:

README.md

Lines changed: 13 additions & 30 deletions
@@ -426,52 +426,35 @@ The standard benchmark results displayed here include all 15 cases that we curre
 
 All standard benchmark results are generated by a client running on an 8 core, 32 GB host, which is located in the same region as the server being tested. The client host is equipped with an `Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz` processor. Also all the servers for the open-source systems tested in our benchmarks run on hosts with the same type of processor.
 ### Run Test Page
-![image](https://github.com/zilliztech/VectorDBBench/assets/105927039/f3135a29-8f12-4aac-bbb3-f2f55e2a2ff0)
-This is the page to run a test:
 1. Initially, you select the systems to be tested - multiple selections are allowed. Once selected, corresponding forms will pop up to gather necessary information for using the chosen databases. The db_label is used to differentiate different instances of the same system. We recommend filling in the host size or instance type here (as we do in our standard results).
 2. The next step is to select the test cases you want to perform. You can select multiple cases at once, and a form to collect corresponding parameters will appear.
 3. Finally, you'll need to provide a task label to distinguish different test results. Using the same label for different tests will result in the previous results being overwritten.
 Now we can only run one task at the same time.
+![image](fig/run_test_select_db.png)
+![image](fig/run_test_select_case.png)
+![image](fig/run_test_submit.png)
+
 
 ## Module
 ### Code Structure
 ![image](https://github.com/zilliztech/VectorDBBench/assets/105927039/8c06512e-5419-4381-b084-9c93aed59639)
 ### Client
-Our client module is designed with flexibility and extensibility in mind, aiming to integrate APIs from different systems seamlessly. As of now, it supports Milvus, Zilliz Cloud, Elastic Search, Pinecone, Qdrant Cloud, Weaviate Cloud, PgVector, Redis, and Chroma. Stay tuned for more options, as we are consistently working on extending our reach to other systems.
+Our client module is designed with flexibility and extensibility in mind, aiming to integrate APIs from different systems seamlessly. As of now, it supports Milvus, Zilliz Cloud, Elastic Search, Pinecone, Qdrant Cloud, Weaviate Cloud, PgVector, Redis, Chroma, etc. Stay tuned for more options, as we are consistently working on extending our reach to other systems.
 ### Benchmark Cases
-We've developed an array of 15 comprehensive benchmark cases to test vector databases' various capabilities, each designed to give you a different piece of the puzzle. These cases are categorized into three main types:
+We've developed lots of comprehensive benchmark cases to test vector databases' various capabilities, each designed to give you a different piece of the puzzle. These cases are categorized into four main types:
 #### Capacity Case
 - **Large Dim:** Tests the database's loading capacity by inserting large-dimension vectors (GIST 100K vectors, 960 dimensions) until fully loaded. The final number of inserted vectors is reported.
 - **Small Dim:** Similar to the Large Dim case but uses small-dimension vectors (SIFT 500K vectors, 128 dimensions).
 #### Search Performance Case
 - **XLarge Dataset:** Measures search performance with a massive dataset (LAION 100M vectors, 768 dimensions) at varying parallel levels. The results include index building time, recall, latency, and maximum QPS.
-- **Large Dataset:** Similar to the XLarge Dataset case, but uses a slightly smaller dataset (10M-768dim, 5M-1536dim).
-- **Medium Dataset:** A case using a medium dataset (1M-768dim, 500K-1536dim).
+- **Large Dataset:** Similar to the XLarge Dataset case, but uses a slightly smaller dataset (10M-1024dim, 10M-768dim, 5M-1536dim).
+- **Medium Dataset:** A case using a medium dataset (1M-1024dim, 1M-768dim, 500K-1536dim).
+- **Small Dataset:** For development (100K-768dim, 50K-1536dim).
 #### Filtering Search Performance Case
-- **Large Dataset, Low Filtering Rate:** Evaluates search performance with a large dataset (10M-768dim, 5M-1536dim) under a low filtering rate (1% vectors) at different parallel levels.
-- **Medium Dataset, Low Filtering Rate:** This case uses a medium dataset (1M-768dim, 500K-1536dim) with a similar low filtering rate.
-- **Large Dataset, High Filtering Rate:** It tests with a large dataset (10M-768dim, 5M-1536dim) but under a high filtering rate (99% vectors).
-- **Medium Dataset, High Filtering Rate:** This case uses a medium dataset (1M-768dim, 500K-1536dim) with a high filtering rate.
-For a quick reference, here is a table summarizing the key aspects of each case:
-
-| Case No. | Case Type | Dataset Size | Filtering Rate | Results |
-|----------|-----------|--------------|----------------|---------|
-| 1 | Capacity Case | SIFT 500K vectors, 128 dimensions | N/A | Number of inserted vectors |
-| 2 | Capacity Case | GIST 100K vectors, 960 dimensions | N/A | Number of inserted vectors |
-| 3 | Search Performance Case | LAION 100M vectors, 768 dimensions | N/A | Index building time, recall, latency, maximum QPS |
-| 4 | Search Performance Case | Cohere 10M vectors, 768 dimensions | N/A | Index building time, recall, latency, maximum QPS |
-| 5 | Search Performance Case | Cohere 1M vectors, 768 dimensions | N/A | Index building time, recall, latency, maximum QPS |
-| 6 | Filtering Search Performance Case | Cohere 10M vectors, 768 dimensions | 1% vectors | Index building time, recall, latency, maximum QPS |
-| 7 | Filtering Search Performance Case | Cohere 1M vectors, 768 dimensions | 1% vectors | Index building time, recall, latency, maximum QPS |
-| 8 | Filtering Search Performance Case | Cohere 10M vectors, 768 dimensions | 99% vectors | Index building time, recall, latency, maximum QPS |
-| 9 | Filtering Search Performance Case | Cohere 1M vectors, 768 dimensions | 99% vectors | Index building time, recall, latency, maximum QPS |
-| 10 | Search Performance Case | OpenAI generated 500K vectors, 1536 dimensions | N/A | Index building time, recall, latency, maximum QPS |
-| 11 | Search Performance Case | OpenAI generated 5M vectors, 1536 dimensions | N/A | Index building time, recall, latency, maximum QPS |
-| 12 | Filtering Search Performance Case | OpenAI generated 500K vectors, 1536 dimensions | 1% vectors | Index building time, recall, latency, maximum QPS |
-| 13 | Filtering Search Performance Case | OpenAI generated 5M vectors, 1536 dimensions | 1% vectors | Index building time, recall, latency, maximum QPS |
-| 14 | Filtering Search Performance Case | OpenAI generated 500K vectors, 1536 dimensions | 99% vectors | Index building time, recall, latency, maximum QPS |
-| 15 | Filtering Search Performance Case | OpenAI generated 5M vectors, 1536 dimensions | 99% vectors | Index building time, recall, latency, maximum QPS |
-
+- **Int-Filter Cases:** Evaluates search performance with int-based filter expressions (e.g. "id >= 2,000").
+- **Label-Filter Cases:** Evaluates search performance with label-based filter expressions (e.g., "color == 'red'"). The test includes randomly generated labels to simulate real-world filtering scenarios.
+#### Streaming Cases
+- **Insertion-Under-Load Case:** Evaluates search performance while maintaining a constant insertion workload. VectorDBBench applies a steady stream of insert requests at a fixed rate to simulate real-world scenarios where search operations must perform reliably under continuous data ingestion.
 
 Each case provides an in-depth examination of a vector database's abilities, providing you a comprehensive view of the database's performance.
 
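The new streaming case above reduces to pacing inserts at a fixed rate while searches run. The sketch below shows one way such a fixed-rate insertion loop can be written; it reuses the NUM_PER_BATCH / TIME_PER_BATCH knobs from the updated config, and `insert_batch` is a placeholder callable rather than VectorDBBench's actual runner code.

```python
import time
from typing import Callable


def steady_insert_stream(
    insert_batch: Callable[[int, int], None],  # placeholder: writes one batch of rows
    total_batches: int,
    num_per_batch: int = 100,     # mirrors config.NUM_PER_BATCH
    time_per_batch: float = 1.0,  # mirrors config.TIME_PER_BATCH (seconds)
) -> None:
    """Hypothetical sketch: issue one insert batch per interval to hold a constant ingest rate."""
    for batch_id in range(total_batches):
        start = time.perf_counter()
        insert_batch(batch_id, num_per_batch)
        # Sleep off whatever remains of the interval so the rate stays fixed
        # even when an individual insert finishes quickly.
        elapsed = time.perf_counter() - start
        time.sleep(max(0.0, time_per_batch - elapsed))
```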

fig/homepage/bar-chart.png (79.3 KB)
fig/homepage/concurrent.png (202 KB)
fig/homepage/custom.png (73.8 KB)
fig/homepage/label_filter.png (120 KB)
fig/homepage/qps.png (72 KB)
fig/homepage/run_test.png (545 KB)
fig/homepage/streaming.png (42.7 KB)
fig/homepage/table.png (168 KB)
fig/run_test_select_case.png (250 KB)
fig/run_test_select_db.png (249 KB)
fig/run_test_submit.png (48.9 KB)

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ dependencies = [
     "psutil",
     "polars",
     "plotly",
-    "environs<14.1.0",
+    "environs",
     "pydantic<v2",
     "scikit-learn",
     "pymilvus", # with pandas, numpy, ujson

vectordb_bench/__init__.py

Lines changed: 14 additions & 27 deletions
@@ -18,37 +18,16 @@ class config:
     DEFAULT_DATASET_URL = env.str("DEFAULT_DATASET_URL", AWS_S3_URL)
     DATASET_LOCAL_DIR = env.path("DATASET_LOCAL_DIR", "/tmp/vectordb_bench/dataset")
     NUM_PER_BATCH = env.int("NUM_PER_BATCH", 100)
+    TIME_PER_BATCH = 1  # 1s. for streaming insertion.
+    MAX_INSERT_RETRY = 5
+    MAX_SEARCH_RETRY = 5
+
+    LOAD_MAX_TRY_COUNT = 10
 
     DROP_OLD = env.bool("DROP_OLD", True)
     USE_SHUFFLED_DATA = env.bool("USE_SHUFFLED_DATA", True)
 
-    NUM_CONCURRENCY = env.list(
-        "NUM_CONCURRENCY",
-        [
-            1,
-            5,
-            10,
-            15,
-            20,
-            25,
-            30,
-            35,
-            40,
-            45,
-            50,
-            55,
-            60,
-            65,
-            70,
-            75,
-            80,
-            85,
-            90,
-            95,
-            100,
-        ],
-        subcast=int,
-    )
+    NUM_CONCURRENCY = env.list("NUM_CONCURRENCY", [1, 5, 10, 20, 30, 40, 60, 80], subcast=int)
 
     CONCURRENCY_DURATION = 30
 
@@ -68,21 +47,29 @@ class config:
 
     CAPACITY_TIMEOUT_IN_SECONDS = 24 * 3600  # 24h
     LOAD_TIMEOUT_DEFAULT = 24 * 3600  # 24h
+    LOAD_TIMEOUT_768D_100K = 24 * 3600  # 24h
     LOAD_TIMEOUT_768D_1M = 24 * 3600  # 24h
     LOAD_TIMEOUT_768D_10M = 240 * 3600  # 10d
     LOAD_TIMEOUT_768D_100M = 2400 * 3600  # 100d
 
     LOAD_TIMEOUT_1536D_500K = 24 * 3600  # 24h
     LOAD_TIMEOUT_1536D_5M = 240 * 3600  # 10d
 
+    LOAD_TIMEOUT_1024D_1M = 24 * 3600  # 24h
+    LOAD_TIMEOUT_1024D_10M = 240 * 3600  # 10d
+
     OPTIMIZE_TIMEOUT_DEFAULT = 24 * 3600  # 24h
+    OPTIMIZE_TIMEOUT_768D_100K = 24 * 3600  # 24h
     OPTIMIZE_TIMEOUT_768D_1M = 24 * 3600  # 24h
     OPTIMIZE_TIMEOUT_768D_10M = 240 * 3600  # 10d
     OPTIMIZE_TIMEOUT_768D_100M = 2400 * 3600  # 100d
 
     OPTIMIZE_TIMEOUT_1536D_500K = 24 * 3600  # 24h
     OPTIMIZE_TIMEOUT_1536D_5M = 240 * 3600  # 10d
 
+    OPTIMIZE_TIMEOUT_1024D_1M = 24 * 3600  # 24h
+    OPTIMIZE_TIMEOUT_1024D_10M = 240 * 3600  # 10d
+
     def display(self) -> str:
         return [
             i
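Since these settings are read through `environs`, they can be overridden from the environment before the benchmark starts. A small sketch follows (values chosen purely for illustration); the overrides must be in place before `vectordb_bench` is imported, because `config` reads the environment at import time.

```python
import os

# Hypothetical overrides; environs parses them with env.list(..., subcast=int),
# env.int, and env.bool respectively.
os.environ["NUM_CONCURRENCY"] = "1,10,50"
os.environ["NUM_PER_BATCH"] = "200"
os.environ["DROP_OLD"] = "false"

import vectordb_bench  # noqa: E402  (imported after the overrides on purpose)

print(vectordb_bench.config.NUM_CONCURRENCY)  # -> [1, 10, 50]
```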

vectordb_bench/backend/assembler.py

Lines changed: 19 additions & 6 deletions
@@ -1,7 +1,8 @@
 import logging
 
-from vectordb_bench.backend.clients import EmptyDBCaseConfig
+from vectordb_bench.backend.clients import DB, EmptyDBCaseConfig
 from vectordb_bench.backend.data_source import DatasetSource
+from vectordb_bench.backend.filter import FilterOp
 from vectordb_bench.models import TaskConfig
 
 from .cases import CaseLabel
@@ -10,6 +11,13 @@
 log = logging.getLogger(__name__)
 
 
+class FilterNotSupportedError(ValueError):
+    """Raised when a filter type is not supported by a vector database."""
+
+    def __init__(self, db_name: str, filter_type: FilterOp):
+        super().__init__(f"{filter_type} Filter test is not supported by {db_name}.")
+
+
 class Assembler:
     @classmethod
     def assemble(cls, run_id: str, task: TaskConfig, source: DatasetSource) -> CaseRunner:
@@ -39,25 +47,30 @@ def assemble_all(
         runners = [cls.assemble(run_id, task, source) for task in tasks]
         load_runners = [r for r in runners if r.ca.label == CaseLabel.Load]
         perf_runners = [r for r in runners if r.ca.label == CaseLabel.Performance]
+        streaming_runners = [r for r in runners if r.ca.label == CaseLabel.Streaming]
 
         # group by db
-        db2runner = {}
+        db2runner: dict[DB, list[CaseRunner]] = {}
         for r in perf_runners:
             db = r.config.db
             if db not in db2runner:
                 db2runner[db] = []
             db2runner[db].append(r)
 
-        # check dbclient installed
-        for k in db2runner:
-            _ = k.init_cls
+        # check
+        for db, runners in db2runner.items():
+            db_instance = db.init_cls
+            for runner in runners:
+                if not db_instance.filter_supported(runner.ca.filters):
+                    raise FilterNotSupportedError(db.value, runner.ca.filters.type)
 
         # sort by dataset size
         for _, runner in db2runner.items():
-            runner.sort(key=lambda x: x.ca.dataset.data.size)
+            runner.sort(key=lambda x: (x.ca.dataset.data.size, 0 if x.ca.filters.type == FilterOp.StrEqual else 1))
 
         all_runners = []
         all_runners.extend(load_runners)
+        all_runners.extend(streaming_runners)
        for v in db2runner.values():
             all_runners.extend(v)
 
