
.Net: New Feature: VectorStore to provide supported key types #13141

@adamsitnik


I am currently working on a Data Ingestion project, which is more or less an ETL pipeline: it uses a cloud service to parse the file, chunkers to split it into chunks, LLMs to enrich the chunks with extra information (keywords, summary, etc.), and MEVD to store the chunks in the Vector Store (generating embeddings with MEAI on the fly).

Since the users can specify any number of custom metadata enrichers (add summary, classify, extract keywords), I am using the "dynamic" collections:

https://github.com/adamsitnik/dataingestion/blob/23e97338e0e491598e5b8dc2f7b1690f16309799/src/Microsoft.Extensions.DataIngestion/VectorStoreWriter.cs#L87-L88

And building the definition on the fly:

https://github.com/adamsitnik/dataingestion/blob/23e97338e0e491598e5b8dc2f7b1690f16309799/src/Microsoft.Extensions.DataIngestion/VectorStoreWriter.cs#L118-L152

It works really well and is easy to use, but I have to ask users to specify the TKey explicitly:

https://github.com/adamsitnik/dataingestion/blob/23e97338e0e491598e5b8dc2f7b1690f16309799/src/Samples/Program.cs#L48

I would like to avoid that (so it's super easy to use and get started with), and instead choose the following key generation strategy:

  • for all the vector stores that support Guid as a key, generate a new Guid when inserting chunk records
  • for all the vector stores that support string as a key, generate a Guid and use its string representation.
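
The strategy above could be sketched roughly as follows. This is only an illustration: `ChunkKeyGenerator` and `NewKey` are hypothetical names that do not exist in MEVD or in the linked repository, and the sketch assumes the key type is already known as `TKey`.

```csharp
using System;

// Hypothetical helper, not part of Microsoft.Extensions.VectorData.
static class ChunkKeyGenerator
{
    // Creates a fresh key for a chunk record, following the strategy above:
    // a new Guid for Guid keys, or the Guid's string form for string keys.
    public static TKey NewKey<TKey>() where TKey : notnull
    {
        Guid guid = Guid.NewGuid();

        if (typeof(TKey) == typeof(Guid))
        {
            return (TKey)(object)guid;
        }

        if (typeof(TKey) == typeof(string))
        {
            return (TKey)(object)guid.ToString();
        }

        // Any other key type is outside the strategy described in this issue.
        throw new NotSupportedException(
            $"Key type '{typeof(TKey)}' is not covered by this key generation strategy.");
    }
}
```

With such a helper, `ChunkKeyGenerator.NewKey<Guid>()` and `ChunkKeyGenerator.NewKey<string>()` would cover both cases, but the caller still has to know which `TKey` the store accepts, which is the gap this issue is about.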

It would be great if VectorStore was capable of exposing information about supported key types. Then I would not need to ask the users to specify that.

My API proposal is to extend VectorStoreMetadata with an IReadOnlyList<Type> SupportedKeyTypes property.

public class VectorStoreMetadata
{
    public string? VectorStoreSystemName { get; init; }
    public string? VectorStoreName { get; init; }
+   public IReadOnlyList<Type> SupportedKeyTypes { get; init; }
}

An IReadOnlyList<Type> backed by an array should be enough to quickly check whether a given type is supported. A HashSet would be better if there were more types; in this case it could just slow down startup by requiring another type to be compiled.
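
To illustrate how a consumer might use the proposed property, here is a minimal sketch. `SupportedKeyTypes` is only the proposal from this issue, not an existing API, and `KeyTypeSelector` and `ChooseKeyType` are hypothetical names. The sketch takes the list directly so it does not depend on any real MEVD type, and it assumes Guid is preferred over string when both are supported.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical consumer-side helper, not part of Microsoft.Extensions.VectorData.
static class KeyTypeSelector
{
    // Picks the key type to use, given the value of the proposed
    // SupportedKeyTypes property. Assumption: Guid wins over string.
    public static Type ChooseKeyType(IReadOnlyList<Type> supportedKeyTypes)
    {
        if (Supports(supportedKeyTypes, typeof(Guid)))
        {
            return typeof(Guid);
        }

        if (Supports(supportedKeyTypes, typeof(string)))
        {
            return typeof(string);
        }

        throw new NotSupportedException(
            "The vector store supports neither Guid nor string keys.");
    }

    // A linear scan is enough for the handful of key types a store exposes.
    private static bool Supports(IReadOnlyList<Type> supportedKeyTypes, Type candidate)
    {
        for (int i = 0; i < supportedKeyTypes.Count; i++)
        {
            if (supportedKeyTypes[i] == candidate)
            {
                return true;
            }
        }

        return false;
    }
}
```

A plain array such as `new[] { typeof(string), typeof(Guid) }` can be passed as the argument, since `Type[]` implements `IReadOnlyList<Type>`.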

cc @roji @westey-m please let me know what you think; I am more than happy to send a PR.

Labels: .NET (Issue or Pull requests regarding .NET code), msft.ext.vectordata (Related to Microsoft.Extensions.VectorData), needs_port_to_python (Indicate this item needs to also be done for Python)