cpegeric
Contributor

@cpegeric cpegeric commented Aug 22, 2025

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #21835

What this PR does / why we need it:

Adds the CDC index update feature.


PR Type

Enhancement, Tests


Description

• Implements CDC (Change Data Capture) integration for fulltext, IVFFLAT, and HNSW indexes with asynchronous update capabilities
• Adds comprehensive SQL writers for different index types with CDC operations (Insert, Upsert, Delete)
• Introduces IndexConsumer for processing ISCP data and managing index synchronization operations
• Implements HNSW model unification with CDC synchronization functionality and parallel processing support
• Adds async index support in DDL operations with automatic CDC task lifecycle management
• Enhances data type support in ISCP utilities (JSON, arrays, date/time, UUID, etc.)
• Provides extensive test coverage for all new CDC and async index functionality
• Integrates CDC task management into table operations (CREATE, DROP, ALTER, TRUNCATE)
• Adds new HNSW_CDC_UPDATE built-in function for processing CDC operations
• Improves error handling and transaction support across index operations


Diagram Walkthrough

flowchart LR
  A["DDL Operations"] --> B["CDC Task Management"]
  B --> C["Index Consumer"]
  C --> D["Index SQL Writers"]
  D --> E["Vector Index Models"]
  E --> F["Async Index Updates"]
  G["ISCP Data"] --> C
  H["Built-in Functions"] --> E
  I["Test Coverage"] --> A
  I --> C
  I --> E

File Walkthrough

Relevant files
Feature
15 files
index_sqlwriter.go
Add index SQL writers for CDC operations                                 

pkg/iscp/index_sqlwriter.go

• Implements SQL writers for different index types (Fulltext, IVFFLAT, HNSW) with CDC operations
• Provides IndexSqlWriter interface with methods for Insert, Upsert, Delete, and SQL generation
• Includes base implementation BaseIndexSqlWriter and specialized writers for each index algorithm
• Handles row serialization and SQL generation for index update operations

+649/-0 
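
The writer pattern described for index_sqlwriter.go could look roughly like the sketch below. It is not the PR's code: `cdcSqlWriter`, `bufferedWriter`, the `pk`/`doc` column names, and the delete-then-insert treatment of Upsert are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// cdcSqlWriter is a hypothetical, simplified stand-in for the PR's
// IndexSqlWriter interface; the real method set and signatures may differ.
type cdcSqlWriter interface {
	Insert(pk, doc string)
	Upsert(pk, doc string)
	Delete(pk string)
	ToSql() string
}

// bufferedWriter collects CDC operations for one (assumed) index table.
type bufferedWriter struct {
	table   string
	inserts []string // rendered "(pk, doc)" tuples
	deletes []string // rendered primary-key literals
}

// quote renders a SQL string literal by doubling single quotes.
// Assumption: real code must follow the target dialect's escaping rules.
func quote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

func (w *bufferedWriter) Insert(pk, doc string) {
	w.inserts = append(w.inserts, fmt.Sprintf("(%s, %s)", quote(pk), quote(doc)))
}

func (w *bufferedWriter) Delete(pk string) {
	w.deletes = append(w.deletes, quote(pk))
}

// Upsert is modeled here as delete-then-insert of the same key.
func (w *bufferedWriter) Upsert(pk, doc string) {
	w.Delete(pk)
	w.Insert(pk, doc)
}

// ToSql emits deletes before inserts so an upserted key ends up present.
func (w *bufferedWriter) ToSql() string {
	var stmts []string
	if len(w.deletes) > 0 {
		stmts = append(stmts, fmt.Sprintf("DELETE FROM %s WHERE pk IN (%s)",
			w.table, strings.Join(w.deletes, ", ")))
	}
	if len(w.inserts) > 0 {
		stmts = append(stmts, fmt.Sprintf("INSERT INTO %s (pk, doc) VALUES %s",
			w.table, strings.Join(w.inserts, ", ")))
	}
	return strings.Join(stmts, "; ")
}

func main() {
	var w cdcSqlWriter = &bufferedWriter{table: "idx_tbl"}
	w.Upsert("1", "it's a row")
	w.Delete("2")
	fmt.Println(w.ToSql())
}
```

Because this sketch always runs deletes first, a batch that inserts a key and later deletes it would be replayed in the wrong order; a real writer would need to preserve or reconcile per-key ordering.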
sync.go
Add HNSW CDC synchronization implementation                           

pkg/vectorindex/hnsw/sync.go

• Implements HNSW index CDC synchronization functionality
• Provides CdcSync function to update HNSW indexes via CDC data
• Includes HnswSync struct for managing index updates with parallel processing
• Handles model loading, updating, and SQL generation for index persistence

+676/-0 
model.go
Add HNSW model implementation for index management             

pkg/vectorindex/hnsw/model.go

• Implements HnswModel struct for HNSW index management
• Provides methods for index operations (Add, Remove, Contains, Search)
• Includes file I/O operations for saving/loading index data
• Handles SQL generation for database persistence and cleanup operations

+524/-0 
build_dml_util.go
Add async index support in DML operations                               

pkg/sql/plan/build_dml_util.go

• Adds async index support by checking IndexAlgoParams for the async flag
• Skips synchronous index operations when async mode is enabled
• Updates MultiTableIndex struct to include IndexAlgoParams field
• Applies async checks to fulltext and IVF index operations

+53/-4   
create.go
Add async option to index creation syntax                               

pkg/sql/parsers/tree/create.go

• Adds Async boolean field to IndexOption struct
• Updates Format method to include async option in SQL output
• Enables parsing and formatting of async index creation syntax

+4/-0     
index_consumer.go
Add IndexConsumer for CDC index synchronization                   

pkg/iscp/index_consumer.go

• Implements a new IndexConsumer struct that processes ISCP data for index updates
• Handles both snapshot and tail data types with different processing strategies
• Manages SQL generation and execution for index synchronization operations
• Provides methods for insert, delete, and upsert operations on index data

+435/-0 
ddl.go
Integrate CDC tasks into DDL operations                                   

pkg/sql/compile/ddl.go

• Integrates CDC task management into DDL operations (CREATE, DROP, ALTER, TRUNCATE)
• Adds calls to create and drop CDC tasks for vector and fulltext indexes
• Implements automatic CDC task lifecycle management during table operations

+80/-3   
cdc_util.go
Add CDC task management utilities                                               

pkg/sql/compile/cdc_util.go

• Implements utility functions for CDC task management (create, delete, register)
• Provides PITR (Point-in-Time Recovery) creation and management for indexes
• Handles validation and lifecycle management of index CDC tasks

+273/-0 
secondary_index_utils.go
Add async parameter support for indexes                                   

pkg/catalog/secondary_index_utils.go

• Adds support for async parameter in index configurations
• Implements IsIndexAsync function to check if an index is asynchronous
• Updates parameter parsing to handle async flag for different index types

+38/-3   
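
As a rough illustration of the async check described above, the sketch below reads a flag out of a JSON-encoded parameter string. The function name `isIndexAsync`, the `"async"` key, and the string-valued `"true"` encoding are assumptions; the PR's IsIndexAsync may take different inputs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isIndexAsync reports whether a JSON-encoded index parameter string
// carries an async flag. The key name "async" and the string value
// "true" are assumptions made for this sketch.
func isIndexAsync(params string) (bool, error) {
	if params == "" {
		// No parameters at all: treat the index as synchronous.
		return false, nil
	}
	var m map[string]string
	if err := json.Unmarshal([]byte(params), &m); err != nil {
		return false, fmt.Errorf("invalid index params: %w", err)
	}
	return m["async"] == "true", nil
}

func main() {
	for _, p := range []string{`{"lists":"10","async":"true"}`, `{"lists":"10"}`, ""} {
		async, err := isIndexAsync(p)
		fmt.Println(async, err)
	}
}
```

A caller on the DML path could use the result to skip synchronous index maintenance and leave the update to the CDC task.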
types.go
Add CDC data structures and operations                                     

pkg/vectorindex/types.go

• Defines CDC-related data structures and constants
• Implements VectorIndexCdc for managing CDC operations (insert, delete, upsert)
• Adds JSON serialization support for CDC data structures

+87/-0   
func_hnsw.go
Add HNSW CDC update function implementation                           

pkg/sql/plan/function/func_hnsw.go

• Implements hnswCdcUpdate function for processing HNSW CDC operations
• Handles JSON deserialization of CDC data and calls synchronization logic
• Provides parameter validation and error handling for CDC updates

+77/-0   
ddl_index_algo.go
Integrate CDC tasks into index algorithm handling               

pkg/sql/compile/ddl_index_algo.go

• Integrates CDC task creation into fulltext and IVF-flat index handling
• Adds async parameter checking and CDC task registration
• Updates index creation workflow to support asynchronous updates

+29/-1   
sqlexec.go
Add transaction-based SQL execution support                           

pkg/vectorindex/sqlexec/sqlexec.go

• Adds RunTxn function for executing SQL operations within transactions
• Provides transaction-based SQL execution with proper context and options

+27/-0   
list_builtIn.go
Register HNSW CDC update function                                               

pkg/sql/plan/function/list_builtIn.go

• Registers the new HNSW_CDC_UPDATE function in the built-in function list
• Defines function signature and parameter types for CDC update operations

+21/-0   
function_id.go
Add HNSW CDC update function ID                                                   

pkg/sql/plan/function/function_id.go

• Adds HNSW_CDC_UPDATE function ID and registers it in the function registry
• Updates function end number to accommodate the new function

+7/-1     
Enhancement
3 files
util.go
Enable additional data type support in ISCP utilities       

pkg/iscp/util.go

• Uncomments and enables support for additional data types in row extraction and SQL conversion
• Adds support for JSON, bit, array types, date/time types, decimal types, UUID, and other specialized types
• Includes appendHex function for binary data formatting
• Enhances NULL value handling with proper type casting

+134/-127
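
The binary formatting mentioned for appendHex can be illustrated with a small helper that appends a hex literal to a byte buffer. The name `appendHexLiteral` and the `x'..'` literal form are assumptions; the PR's helper may produce a different representation.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// appendHexLiteral appends src to dst as a SQL hex literal of the
// form x'..'. Hypothetical helper: the output format is an assumption.
func appendHexLiteral(dst, src []byte) []byte {
	dst = append(dst, "x'"...)
	dst = append(dst, hex.EncodeToString(src)...)
	dst = append(dst, '\'')
	return dst
}

func main() {
	buf := []byte("INSERT INTO t VALUES (")
	buf = appendHexLiteral(buf, []byte{0xde, 0xad, 0xbe, 0xef})
	buf = append(buf, ')')
	fmt.Println(string(buf))
}
```

Encoding raw bytes as hex avoids any quoting or escaping concerns when arbitrary binary values are embedded in generated SQL text.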
fulltext.go
Enhance fulltext index tokenization support                           

pkg/sql/plan/fulltext.go

• Enhances fulltext index tokenization to support both table scan and values scan
• Adds support for composite primary keys in fulltext operations
• Improves parameter handling and type validation for fulltext functions

+54/-12 
func_cast.go
Enhance array dimension validation in casting                       

pkg/sql/plan/function/func_cast.go

• Improves array dimension validation in string-to-array casting
• Adds proper dimension checking and error reporting for array types
• Handles maximum dimension bypass for flexible array operations

+11/-2   
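
The dimension check on string-to-array casts could be sketched as below. The `parseVector` function, the `anyDimension` sentinel used for the bypass, and the error messages are hypothetical; the PR's cast logic and its bypass condition are not shown in this description.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// anyDimension is a hypothetical sentinel meaning "skip the dimension check".
const anyDimension = -1

// parseVector parses a literal such as "[1, 2, 3]" and verifies that the
// element count matches wantDim unless wantDim is anyDimension.
func parseVector(s string, wantDim int) ([]float32, error) {
	s = strings.TrimSpace(s)
	if len(s) < 2 || s[0] != '[' || s[len(s)-1] != ']' {
		return nil, fmt.Errorf("malformed vector literal %q", s)
	}
	body := strings.TrimSpace(s[1 : len(s)-1])
	if body == "" {
		return nil, fmt.Errorf("empty vector literal")
	}
	parts := strings.Split(body, ",")
	if wantDim != anyDimension && len(parts) != wantDim {
		return nil, fmt.Errorf("expected vector dimension %d, got %d", wantDim, len(parts))
	}
	out := make([]float32, 0, len(parts))
	for _, p := range parts {
		f, err := strconv.ParseFloat(strings.TrimSpace(p), 32)
		if err != nil {
			return nil, fmt.Errorf("bad element %q: %w", p, err)
		}
		out = append(out, float32(f))
	}
	return out, nil
}

func main() {
	fmt.Println(parseVector("[1, 2, 3]", 3))
	fmt.Println(parseVector("[1, 2]", 3))
	fmt.Println(parseVector("[1, 2]", anyDimension))
}
```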
Tests
15 files
index_consumer_test.go
Add test suite for index consumer functionality                   

pkg/iscp/index_consumer_test.go

• Adds comprehensive test suite for index consumer functionality
• Includes mock implementations for retriever, SQL executor, and transaction executor
• Tests HNSW snapshot and tail operations with various data scenarios
• Validates SQL generation and execution for index updates

+381/-0 
sync_test.go
Add comprehensive tests for HNSW sync operations                 

pkg/vectorindex/hnsw/sync_test.go

• Provides extensive test coverage for HNSW synchronization operations
• Tests various CDC operations including upsert, delete, insert scenarios
• Includes tests for multi-file operations and shuffled data handling
• Validates sync behavior with empty datasets and large data volumes

+370/-0 
index_sqlwriter_test.go
Add comprehensive tests for index SQL writers                       

pkg/iscp/index_sqlwriter_test.go

• Adds comprehensive test cases for index SQL writers (fulltext, HNSW, IVF-flat)
• Tests SQL generation for different index types and primary key configurations
• Validates handling of composite primary keys and multi-part indexes

+242/-0 
search_test.go
Update HNSW search tests for new model                                     

pkg/vectorindex/hnsw/search_test.go

• Updates test cases to work with new model structure
• Adds mock functions for testing multi-file scenarios
• Enhances test coverage for metadata and catalog operations

+106/-0 
model_test.go
Add comprehensive HnswModel tests                                               

pkg/vectorindex/hnsw/model_test.go

• Adds comprehensive tests for the new HnswModel functionality
• Tests model operations like load, unload, add, remove, and search
• Validates SQL generation and file handling capabilities

+206/-0 
func_hnsw_test.go
Add tests for HNSW CDC update function                                     

pkg/sql/plan/function/func_hnsw_test.go

• Adds test cases for the new hnswCdcUpdate function
• Tests various error conditions and parameter validation scenarios
• Validates function behavior with null and invalid inputs

+129/-0 
mysql_sql_test.go
Update parser tests for async index support                           

pkg/sql/parsers/dialect/mysql/mysql_sql_test.go

• Updates test cases to include async keyword in index creation statements
• Validates parsing of async parameter for different index types (HNSW, IVF-flat, fulltext)

+9/-1     
build_test.go
Update HNSW build tests for new model                                       

pkg/vectorindex/hnsw/build_test.go

• Updates test cases to use HnswModel instead of HnswSearchIndex
• Adjusts function calls and type references for the new model structure

+5/-5     
types_test.go
Add tests for CDC data structures                                               

pkg/vectorindex/types_test.go

• Adds tests for CDC data structures and operations
• Validates JSON serialization and CDC operation methods
• Tests insert, delete, upsert operations and state management

+63/-0   
vector_ivf_async.result
IVF vector index async functionality test results               

test/distributed/cases/vector/vector_ivf_async.result

• Added comprehensive test results for IVF vector index with ASYNC functionality
• Tests include creating tables with vector columns, inserting vector data, and creating async IVF indexes
• Validates vector similarity search using L2_DISTANCE function with various query vectors
• Tests both small datasets and large datasets (10k-20k records) with bulk data loading

+58/-0   
vector_ivf_async.sql
IVF vector index async functionality test cases                   

test/distributed/cases/vector/vector_ivf_async.sql

• Added test cases for IVF vector index with ASYNC support
• Tests table creation, index creation with ASYNC keyword, and vector similarity queries
• Includes tests for both small manual inserts and large bulk data loads
• Validates that async index building works correctly with concurrent data operations

+59/-0   
vector_hnsw_async.result
HNSW vector index async functionality test results             

test/distributed/cases/vector/vector_hnsw_async.result

• Added test results for HNSW vector index with ASYNC functionality
• Tests include CRUD operations (insert, update, delete) with async index updates
• Validates vector similarity search performance with large datasets
• Tests concurrent data loading while async index building is in progress

+66/-0   
vector_hnsw_async.sql
HNSW vector index async functionality test cases                 

test/distributed/cases/vector/vector_hnsw_async.sql

• Added comprehensive test cases for HNSW vector index with ASYNC support
• Tests CRUD operations with async index updates and vector similarity queries
• Includes scenarios with concurrent data loading and index building
• Validates proper handling of insert, update, and delete operations with async indexes

+96/-0   
fulltext_async.sql
Fulltext index async functionality test cases                       

test/distributed/cases/fulltext/fulltext_async.sql

• Added test cases for fulltext index with ASYNC functionality
• Tests fulltext search with MATCH...AGAINST queries on async indexes
• Includes multilingual content (English and Chinese) for comprehensive testing
• Tests handling of NULL values in fulltext indexed columns

+21/-0   
fulltext_async.result
Fulltext index async functionality test results                   

test/distributed/cases/fulltext/fulltext_async.result

• Added expected test results for fulltext index with ASYNC functionality
• Validates fulltext search results using TF-IDF relevancy algorithm
• Tests search functionality across multiple columns with async index building
• Confirms proper handling of multilingual content and NULL values

+19/-0   
Code refactoring
2 files
build.go
Refactor HNSW build to use unified model                                 

pkg/vectorindex/hnsw/build.go

• Refactors HnswBuildIndex to use HnswModel instead of the original struct
• Removes duplicate code by consolidating index functionality into shared model
• Updates function signatures and method calls to use the new model structure

+11/-182
search.go
Refactor HNSW search to use unified model                               

pkg/vectorindex/hnsw/search.go

• Refactors search functionality to use HnswModel instead of HnswSearchIndex
• Moves metadata loading logic to shared functions
• Simplifies search implementation by leveraging unified model structure

+18/-142
Error handling
1 file
util.go
Improve error handling in fulltext SQL generation               

pkg/sql/compile/util.go

• Updates genInsertIndexTableSqlForFullTextIndex to return error alongside SQL
• Improves error handling in fulltext index SQL generation

+2/-2     
Bug fix
2 files
watermark_updater.go
Add safety check for empty table ID list                                 

pkg/iscp/watermark_updater.go

• Adds safety check to prevent SQL execution with empty table ID list
• Improves error handling in database cleanup operations

+11/-7   
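
The empty-list guard matters because an `IN ()` clause with no values is not valid SQL. A minimal sketch of such a guard follows; the function name `buildCleanupSQL` and the table and column names are made up for illustration and are not the PR's.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// buildCleanupSQL returns a DELETE statement for the given table IDs.
// The boolean is false when there is nothing to delete, in which case
// the caller should skip execution entirely. Names are hypothetical.
func buildCleanupSQL(table string, tableIDs []uint64) (string, bool) {
	if len(tableIDs) == 0 {
		// Avoid generating "IN ()", which would fail to parse.
		return "", false
	}
	ids := make([]string, len(tableIDs))
	for i, id := range tableIDs {
		ids[i] = strconv.FormatUint(id, 10)
	}
	return fmt.Sprintf("DELETE FROM %s WHERE table_id IN (%s)",
		table, strings.Join(ids, ",")), true
}

func main() {
	if sql, ok := buildCleanupSQL("iscp_log", []uint64{7, 9}); ok {
		fmt.Println(sql)
	}
	if _, ok := buildCleanupSQL("iscp_log", nil); !ok {
		fmt.Println("nothing to clean up; skipping SQL execution")
	}
}
```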
iteration.go
Improve error handling and context in iteration                   

pkg/iscp/iteration.go

• Adds proper error handling for CollectChanges function
• Sets system account context for consumer operations

+5/-1     
Additional files
17 files
types.go +4/-0     
types.go +1/-0     
consumer.go +3/-0     
data_retriever.go +8/-0     
mock_consumer.go +1/-1     
types.go +2/-0     
keywords.go +1/-0     
mysql_sql.go +8607/-8632
mysql_sql.y +10/-1   
build_ddl.go +4/-4     
build_show_util.go +5/-0     
function_id_test.go +3/-1     
hnsw.go +6/-4     
types.go +3/-2     
array.result +2/-2     
vector_hnsw.result +1/-1     
vector_index.result +1/-1     

Contributor

mergify bot commented Sep 8, 2025

⚠️ The sha of the head commit of this PR conflicts with #22484. Mergify cannot evaluate rules on this PR. ⚠️
