Test/test recall commit0303 #196

Open
liututu12 wants to merge 16 commits into main from test/test_recall_commit0303

Conversation

@liututu12
Collaborator

Add recall-related test cases and optimize historical DML/DQL test cases.

@greptile-apps
greptile-apps bot commented Mar 4, 2026

Greptile Summary

This PR adds comprehensive recall testing capabilities and optimizes existing DML/DQL tests. The changes include enhanced distance calculations for low-precision types (FP16/INT8), new helper functions for generating recall-specific test data, and a new test file with extensive recall validation logic.
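
The recall validation mentioned above can be sketched roughly as follows. This is only an illustration of the general technique (exact brute-force search as ground truth, then set overlap with the approximate result), not the PR's code: `l2_distance`, `brute_force_top_k`, and `recall_at_k` are hypothetical names, and the "approximate" result is written by hand instead of coming from a real index.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_top_k(query, docs, k):
    """Exact ground truth: ids of the k documents closest to the query."""
    ranked = sorted(docs, key=lambda doc_id: l2_distance(query, docs[doc_id]))
    return ranked[:k]


def recall_at_k(approx_ids, truth_ids):
    """Fraction of ground-truth ids that the approximate search returned."""
    if not truth_ids:
        return 1.0
    return len(set(approx_ids) & set(truth_ids)) / len(truth_ids)


docs = {
    "0": [0.0, 0.0],
    "1": [1.0, 0.0],
    "2": [0.0, 2.0],
    "3": [5.0, 5.0],
}
query = [0.1, 0.0]

truth = brute_force_top_k(query, docs, k=3)
approx = ["0", "1", "3"]  # pretend an approximate index missed doc "2"
recall = recall_at_k(approx, truth)
```

A recall test would then assert that `recall` stays above a threshold chosen per index type, since an exact (flat) index should reach 1.0 while approximate indexes may not.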

Critical Issues:

  • support_helper.py: VECTOR_DIMENSION_1024 = 4 appears incorrect (should be 1024)
  • fixture_helper.py: Undefined variable DVECTOR_DIMENSION_1024 will cause runtime error
  • fixture_helper.py: Logic bug with tuples in list comparison that will never match
  • distance_helper.py: Debug print statements left in code

Positive Changes:

  • Fixed two bugs in test_collection_dml.py where doc.id == doc.id comparisons were always true
  • Added comprehensive recall testing framework with ground truth computation
  • Improved numerical stability for FP16/INT8 distance calculations
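
The self-comparison bug noted in the first bullet is easy to miss because the test still passes. A minimal sketch, using a hypothetical `FakeDoc` stand-in rather than the real `Doc` class:

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Hypothetical stand-in for the real Doc class; only `id` matters here."""
    id: str


def ids_match_buggy(doc, expected):
    # Bug: compares the fetched document with itself, so this is always True.
    return doc.id == doc.id


def ids_match_fixed(doc, expected):
    # Fix: compare the fetched document against the document that was written.
    return doc.id == expected.id


fetched = FakeDoc(id="7")
insert_doc = FakeDoc(id="8")

buggy_result = ids_match_buggy(fetched, insert_doc)
fixed_result = ids_match_fixed(fetched, insert_doc)
```

With mismatched ids the buggy check still reports a match, which is why the original assertions could never fail.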

Confidence Score: 2/5

  • This PR has several critical issues that need resolution before merging
  • Score reflects two definite defects (an undefined variable that raises NameError at runtime, and a constant whose value does not match its name), two logic bugs (tuple matching, commented-out assertion), duplicate declarations, and debug code
  • fixture_helper.py and support_helper.py require immediate attention for critical bugs

Important Files Changed

Filename | Overview
python/tests/detail/distance_helper.py | enhanced distance calculations for FP16/INT8 with better numerical stability, added distance_recall function; contains debug print statements that should be removed
python/tests/detail/doc_helper.py | added recall-specific vector generation functions and refactored update logic; has duplicate variable declarations in two functions
python/tests/detail/fixture_helper.py | added complex conditional logic for index types and new 1024-dimension fixture; contains undefined variable and logic bug with tuple matching
python/tests/detail/support_helper.py | added VECTOR_DIMENSION_1024 constant; value is 4 instead of 1024 which appears incorrect
python/tests/detail/test_collection_dml.py | fixed two bugs where doc.id was compared to itself instead of insert_doc.id and update_doc_partial.id
python/tests/detail/test_collection_dql.py | commented out assertion checking for non-negative scores; needs verification that this doesn't hide issues
python/tests/detail/test_collection_recall.py | new comprehensive recall testing with ground truth computation for various index types and metrics; well-structured with helper functions
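
The page does not show how distance_helper.py handles low-precision types, so the following is only a sketch of one common approach: quantize the inputs to FP16, accumulate in double precision, and compare against the expected value with a tolerance instead of exact equality. `to_fp16` and `l2_fp16_reference` are hypothetical names.

```python
import math
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def l2_fp16_reference(a, b):
    """Quantize inputs to FP16, then accumulate in double precision."""
    qa = [to_fp16(x) for x in a]
    qb = [to_fp16(x) for x in b]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(qa, qb)))


a = [0.1, 0.2, 0.3]
b = [0.3, 0.2, 0.1]

exact = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
reference = l2_fp16_reference(a, b)

# Compare with a tolerance sized for FP16 (about 3 decimal digits), not ==.
close_enough = math.isclose(exact, reference, rel_tol=1e-2)
```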

Last reviewed commit: ec8cde3

Comment on lines +169 to +170
print("dim,vec1,vec2:\n")
print(dim,vec1,vec2)

remove debug print statements before committing

Suggested change
- print("dim,vec1,vec2:\n")
- print(dim,vec1,vec2)
+ # Process dimension intersection for sparse vectors
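
If the intermediate values are still useful while debugging, one alternative to deleting the output entirely (an option the review does not ask for) is to log at DEBUG level, which stays silent unless explicitly enabled. A minimal sketch with a hypothetical `sparse_dot` helper and logger name, assuming sparse vectors are stored as `{dimension: value}` dicts:

```python
import logging

logger = logging.getLogger("distance_helper")  # hypothetical logger name


def sparse_dot(vec1, vec2):
    """Dot product of two sparse vectors stored as {dimension: value} dicts."""
    shared_dims = vec1.keys() & vec2.keys()
    # Emitted only when DEBUG logging is enabled, unlike a bare print().
    logger.debug("shared_dims=%s vec1=%s vec2=%s", sorted(shared_dims), vec1, vec2)
    return sum(vec1[d] * vec2[d] for d in shared_dims)


result = sparse_dot({1: 2.0, 3: 1.0}, {3: 4.0, 5: 1.0})
```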


DEFAULT_VECTOR_DIMENSION = 128

VECTOR_DIMENSION_1024 = 4

VECTOR_DIMENSION_1024 = 4 appears incorrect - should this be 1024?

Suggested change
- VECTOR_DIMENSION_1024 = 4
+ VECTOR_DIMENSION_1024 = 1024

VectorSchema(
v,
k,
dimension=DVECTOR_DIMENSION_1024,

DVECTOR_DIMENSION_1024 is undefined - should be VECTOR_DIMENSION_1024

Suggested change
- dimension=DVECTOR_DIMENSION_1024,
+ dimension=VECTOR_DIMENSION_1024,
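
A misspelled name like this is not caught when the module is imported, because Python resolves names only when the line executes. The sketch below illustrates that behavior; the plain dict is a hypothetical stand-in for `VectorSchema`, and the function names are invented for the example.

```python
VECTOR_DIMENSION_1024 = 1024


def build_schema_buggy():
    # The misspelled name is only looked up when this line runs.
    return {"dimension": DVECTOR_DIMENSION_1024}


def build_schema_fixed():
    return {"dimension": VECTOR_DIMENSION_1024}


error_message = ""
try:
    build_schema_buggy()
except NameError as exc:
    error_message = str(exc)

fixed = build_schema_fixed()
```

Defining `build_schema_buggy` succeeds; the `NameError` appears only on the call, so a fixture with this typo fails only in the tests that actually request it.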

Comment on lines +146 to +151
(True, True, IVFIndexParam(metric_type=MetricType.COSINE, n_list=150, n_iters=15, use_soar=False, )),

(True, True, HnswIndexParam(metric_type=MetricType.COSINE, m=24, ef_construction=150, )),
(True, True, HnswIndexParam(metric_type=MetricType.L2, m=32, ef_construction=200, )),
(True, True, FlatIndexParam(metric_type=MetricType.COSINE, )),
(True, True, FlatIndexParam(metric_type=MetricType.L2, )),

tuples in this list will never match vector_index_param (which is just an IndexParam object, not a tuple). remove the tuple wrappers or fix the comparison logic

Suggested change
- (True, True, IVFIndexParam(metric_type=MetricType.COSINE, n_list=150, n_iters=15, use_soar=False, )),
- (True, True, HnswIndexParam(metric_type=MetricType.COSINE, m=24, ef_construction=150, )),
- (True, True, HnswIndexParam(metric_type=MetricType.L2, m=32, ef_construction=200, )),
- (True, True, FlatIndexParam(metric_type=MetricType.COSINE, )),
- (True, True, FlatIndexParam(metric_type=MetricType.L2, )),
+ IVFIndexParam(metric_type=MetricType.COSINE, n_list=150, n_iters=15, use_soar=False, ),
+ HnswIndexParam(metric_type=MetricType.COSINE, m=24, ef_construction=150, ),
+ HnswIndexParam(metric_type=MetricType.L2, m=32, ef_construction=200, ),
+ FlatIndexParam(metric_type=MetricType.COSINE, ),
+ FlatIndexParam(metric_type=MetricType.L2, ),
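
The membership problem can be reproduced in isolation. In this sketch `FlatParam` is a hypothetical frozen dataclass standing in for the real index parameter classes; it assumes those classes compare by value, which the page does not confirm. If they compare by identity, even the unwrapped list would not match a freshly constructed object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlatParam:
    """Hypothetical stand-in for an index parameter object with value equality."""
    metric: str


vector_index_param = FlatParam(metric="COSINE")

wrapped = [(True, True, FlatParam(metric="COSINE"))]
unwrapped = [FlatParam(metric="COSINE")]

in_wrapped = vector_index_param in wrapped      # a param never equals a 3-tuple
in_unwrapped = vector_index_param in unwrapped  # equal by value

# Alternative fix if the flags are needed: compare against the tuple's last element.
in_wrapped_by_element = any(p == vector_index_param for _, _, p in wrapped)
```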

Comment on lines 108 to 111
doc_fields = {}
doc_vectors = {}
doc_fields, doc_vectors = generate_vectordict(i, schema)
doc = Doc(id=str(i), fields=doc_fields, vectors=doc_vectors)
return doc


def generate_update_doc(i: int, schema: CollectionSchema) -> Doc:
doc_fields = {}
doc_vectors = {}

duplicate variable declarations - remove lines 110-111

Suggested change
doc_fields = {}
doc_vectors = {}
doc_fields, doc_vectors = generate_vectordict(i, schema)
doc = Doc(id=str(i), fields=doc_fields, vectors=doc_vectors)
return doc
def generate_update_doc(i: int, schema: CollectionSchema) -> Doc:
doc_fields = {}
doc_vectors = {}
def generate_vectordict_recall(i: int, schema: CollectionSchema) -> Doc:
doc_fields = {}
doc_vectors = {}

Comment on lines +171 to +174
doc_fields = {}
doc_vectors = {}
doc_fields = {}
doc_vectors = {}

duplicate variable declarations - remove lines 173-174

Suggested change
doc_fields = {}
doc_vectors = {}
doc_fields = {}
doc_vectors = {}
def generate_vectordict_update(i: int, schema: CollectionSchema) -> Doc:
doc_fields = {}
doc_vectors = {}

  )
  assert hasattr(found_doc, "score")
- assert found_doc.score >= 0.0
+ #assert found_doc.score >= 0.0

verify this assertion should be removed or if it's hiding a real issue that needs fixing
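
One possible explanation, which this page does not confirm, is that the score's valid range depends on the metric: a distance is non-negative, while an inner product or cosine similarity can legitimately be negative. If that is the case here, a metric-aware check may be preferable to commenting the assertion out. The sketch below uses hypothetical metric labels and assumed ranges, not zvec's documented contract.

```python
import math


def score_is_plausible(metric, score):
    """Range check per metric; the ranges are assumptions for illustration."""
    if math.isnan(score):
        return False
    if metric == "L2":
        return score >= 0.0  # a distance is never negative
    if metric == "COSINE_SIMILARITY":
        return -1.0 - 1e-6 <= score <= 1.0 + 1e-6
    if metric == "IP":
        return math.isfinite(score)  # inner product may be negative
    raise ValueError(f"unknown metric: {metric}")
```

A test could then assert `score_is_plausible(metric, found_doc.score)` with the metric taken from the index parameters, keeping a meaningful check for every metric.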


from zvec.model.schema import FieldSchema, VectorSchema
from zvec.extension import RrfReRanker, WeightedReRanker, QwenReRanker
from distance_helper import *
from distance_helper import *

from zvec import StatusCode
from distance_helper import *

from zvec import StatusCode
from distance_helper import *
from fixture_helper import *
from zvec import StatusCode
from distance_helper import *
from fixture_helper import *
from doc_helper import *
from distance_helper import *
from fixture_helper import *
from doc_helper import *
from params_helper import *

import pytest

from zvec.typing import DataType, StatusCode, MetricType, QuantizeType
import pytest

from zvec.typing import DataType, StatusCode, MetricType, QuantizeType
from zvec.model import Collection, Doc, VectorQuery
Comment on lines +19 to +27
from zvec.model.param import (
CollectionOption,
InvertIndexParam,
HnswIndexParam,
FlatIndexParam,
IVFIndexParam,
HnswQueryParam,
IVFQueryParam,
)
IVFQueryParam,
)

from zvec.model.schema import FieldSchema, VectorSchema

from typing import Any, Generator

from zvec.typing import DataType, StatusCode, MetricType, QuantizeType