Description
I am currently working on a Data Ingestion project, which is more or less an ETL pipeline that uses a cloud service to parse the file, chunkers to split it into chunks, LLMs to enrich the chunks with extra information (keywords, summary, etc.), and MEVD to store the chunks in the Vector Store (generating embeddings with MEAI on the fly).
Since users can specify any number of custom metadata enrichers (add summary, classify, extract keywords), I am using "dynamic" collections:
And building the definition on the fly:
It works really nicely and is easy to use, but I have to ask users to specify the TKey explicitly:
I would like to avoid that (so it's super easy to use and get started with) and instead choose the following key generation strategy:
- for all vector stores that support Guid as a key, generate a new Guid when inserting chunk records
- for all vector stores that support string as a key, generate a Guid and use its string representation.
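The strategy above could be sketched roughly as follows. This is only an illustration: `GenerateKey` is a hypothetical helper, not part of MEVD, and the `supportedKeyTypes` argument is an assumed input that the caller would somehow have to obtain from the store.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of MEVD): picks a key for a new chunk record
// based on the key types that a store is assumed to report.
static object GenerateKey(IReadOnlyList<Type> supportedKeyTypes)
{
    // Prefer a native Guid key when the store supports it.
    if (supportedKeyTypes.Contains(typeof(Guid)))
    {
        return Guid.NewGuid();
    }

    // Otherwise fall back to the string representation of a new Guid.
    if (supportedKeyTypes.Contains(typeof(string)))
    {
        return Guid.NewGuid().ToString();
    }

    throw new NotSupportedException("The store supports neither Guid nor string keys.");
}

// Illustrative inputs only; real stores would report their own key types.
object guidKey = GenerateKey(new[] { typeof(Guid), typeof(string) });
object stringKey = GenerateKey(new[] { typeof(string) });

Console.WriteLine(guidKey.GetType().Name);   // Guid
Console.WriteLine(stringKey.GetType().Name); // String
```

Preferring Guid over string here is a design choice of this sketch; a store that supports both would get a native Guid key.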
It would be great if VectorStore were capable of exposing information about its supported key types. Then I would not need to ask users to specify the key type.
My API proposal is to extend VectorStoreMetadata with an IReadOnlyList<Type> SupportedKeyTypes property.
public class VectorStoreMetadata
{
    public string? VectorStoreSystemName { get; init; }
    public string? VectorStoreName { get; init; }
+   public IReadOnlyList<Type> SupportedKeyTypes { get; init; }
}
An IReadOnlyList<Type> backed by an array should be enough to quickly check whether a given type is supported. A HashSet would be better if there were more types; in this case it could just slow down startup by compiling another type.
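As a rough illustration of why a short array-backed list is sufficient, the support check can be a plain linear scan. `IsKeyTypeSupported` is a hypothetical helper, and the list contents below are made-up example values, not what any particular store reports.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical check: linear scan over a small, array-backed IReadOnlyList<Type>.
static bool IsKeyTypeSupported(IReadOnlyList<Type> supportedKeyTypes, Type candidate)
{
    for (int i = 0; i < supportedKeyTypes.Count; i++)
    {
        if (supportedKeyTypes[i] == candidate)
        {
            return true;
        }
    }

    return false;
}

// Example values only (an assumption for this sketch).
IReadOnlyList<Type> supported = new[] { typeof(Guid), typeof(string) };

Console.WriteLine(IsKeyTypeSupported(supported, typeof(string))); // True
Console.WriteLine(IsKeyTypeSupported(supported, typeof(long)));   // False
```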
cc @roji @westey-m please let me know what you think; I am more than happy to send a PR.