fix: store multimodal_processed in separate KV namespace to prevent DocProcessingStatus errors #253
Open
peterCheng123321 wants to merge 1 commit into HKUDS:main from
LightRAG's Server API deserializes doc_status records into DocProcessingStatus dataclass objects. Because RAGAnything was injecting a 'multimodal_processed' key directly into those records, any version of the dataclass that did not declare that field raised:

DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed'

causing 500 errors on /documents/paginated and similar endpoints.

Fix: introduce a dedicated 'raganything_multimodal_status' KV namespace (same storage class as parse_cache) to hold per-document multimodal processing state. LightRAG's own doc_status records are no longer modified with extra fields, so DocProcessingStatus deserialization always succeeds. All read/write paths in processor.py are updated accordingly.

Fixes HKUDS#91
Fixes HKUDS#119

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
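To make the failure mode and the fix concrete, here is a minimal sketch. It assumes a trimmed-down, hypothetical `DocProcessingStatus` (the real dataclass declares more fields) and a hypothetical in-memory `InMemoryKV` with async `get_by_id`/`upsert` methods, loosely modeled on LightRAG's KV storages rather than taken from them. It also assumes the server builds the dataclass by keyword-unpacking the stored record, which is what the `__init__()` error message suggests.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Optional


# Hypothetical, trimmed-down stand-in for LightRAG's DocProcessingStatus;
# the real dataclass declares more fields than these.
@dataclass
class DocProcessingStatus:
    content_summary: str
    content_length: int
    status: str
    metadata: dict[str, Any] = field(default_factory=dict)


class InMemoryKV:
    """Hypothetical async KV namespace, loosely modeled on LightRAG's KV storages."""

    def __init__(self, namespace: str) -> None:
        self.namespace = namespace
        self._data: dict[str, dict[str, Any]] = {}

    async def get_by_id(self, key: str) -> Optional[dict[str, Any]]:
        return self._data.get(key)

    async def upsert(self, items: dict[str, dict[str, Any]]) -> None:
        # Replaces the whole record stored under each key.
        self._data.update(items)


async def demo() -> tuple[Optional[str], str, Optional[dict[str, Any]]]:
    doc_status = InMemoryKV("doc_status")
    multimodal_status = InMemoryKV("raganything_multimodal_status")
    base = {"content_summary": "intro", "content_length": 42, "status": "processed"}

    # Before the fix: RAGAnything's flag is written into LightRAG's own record.
    await doc_status.upsert({"doc-1": {**base, "multimodal_processed": True}})
    polluted = await doc_status.get_by_id("doc-1")
    error: Optional[str] = None
    try:
        DocProcessingStatus(**polluted)
    except TypeError as exc:  # "... unexpected keyword argument 'multimodal_processed'"
        error = str(exc)

    # After the fix: doc_status keeps only LightRAG's fields; the flag
    # lives in the separate namespace, keyed by the same doc_id.
    await doc_status.upsert({"doc-1": base})
    await multimodal_status.upsert({"doc-1": {"multimodal_processed": True}})
    clean = await doc_status.get_by_id("doc-1")
    loaded = DocProcessingStatus(**clean)
    return error, loaded.status, await multimodal_status.get_by_id("doc-1")


error, status, mm = asyncio.run(demo())
print(error is not None)  # True
print(status)             # processed
print(mm)                 # {'multimodal_processed': True}
```

Keeping the flag under the same `doc_id` in its own namespace means LightRAG's schema never has to know about it, at the cost of a second lookup whenever both states are needed.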
Collaborator
Thanks for working on this. I reviewed the separate […] I don't think this should be merged as an isolated small fix yet. Moving […] Before this can land, I think we need at least: […]
So I would hold this for a design pass rather than merging it directly.
Summary
Fixes #91, fixes #119.
RAGAnything was injecting a `multimodal_processed` field directly into LightRAG's `doc_status` KV records. The LightRAG Server API deserializes those records into `DocProcessingStatus` dataclass objects. Any LightRAG version whose `DocProcessingStatus` does not declare `multimodal_processed` raised `DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed'`.

Fix: introduce a dedicated `raganything_multimodal_status` KV namespace (same storage class as `parse_cache`) to hold per-document multimodal processing state. LightRAG's own `doc_status` records are no longer modified with RAGAnything-specific fields, so `DocProcessingStatus` deserialization always succeeds regardless of LightRAG version.

Changes:
- `raganything/raganything.py`: add a `multimodal_status` field, initialize it in both the pre-provided and newly created LightRAG paths, and finalize it in `finalize_storages`
- `raganything/processor.py`: all `multimodal_processed` reads/writes now go through `self.multimodal_status`; `doc_status` upserts no longer contain this field

Test plan
- `/documents/paginated` on LightRAG Server: confirm no 500 error
- `adelete_by_doc_id` still works after processing
- `is_document_fully_processed()` and `get_document_processing_status()` return correct values
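The first and third test-plan checks can be partly automated. A minimal sketch follows, under stated assumptions: `StatusReader`, `InMemoryKV`, `undeclared_keys`, the trimmed-down `DocProcessingStatus`, and the `"processed"` status value are hypothetical stand-ins, not RAGAnything's or LightRAG's actual code, and the real `get_document_processing_status()` may return a different shape. It treats a document as fully processed only when `doc_status` reports text processing done and the separate namespace carries the multimodal flag, which is one plausible reading of `is_document_fully_processed()`.

```python
import asyncio
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class DocProcessingStatus:  # hypothetical, trimmed-down stand-in
    content_summary: str
    content_length: int
    status: str
    metadata: dict[str, Any] = field(default_factory=dict)


class InMemoryKV:
    """Hypothetical async KV namespace used in place of real LightRAG storage."""

    def __init__(self) -> None:
        self.data: dict[str, dict[str, Any]] = {}

    async def get_by_id(self, key: str) -> Optional[dict[str, Any]]:
        return self.data.get(key)

    async def upsert(self, items: dict[str, dict[str, Any]]) -> None:
        self.data.update(items)


class StatusReader:
    """Hypothetical read path: text status from doc_status, flag from its own namespace."""

    def __init__(self, doc_status: InMemoryKV, multimodal_status: InMemoryKV) -> None:
        self.doc_status = doc_status
        self.multimodal_status = multimodal_status

    async def get_document_processing_status(self, doc_id: str) -> dict[str, Any]:
        text = await self.doc_status.get_by_id(doc_id) or {}
        mm = await self.multimodal_status.get_by_id(doc_id) or {}
        # No multimodal record yet means "not processed", not an error.
        return {
            "exists": bool(text),
            "text_processed": text.get("status") == "processed",
            "multimodal_processed": bool(mm.get("multimodal_processed", False)),
        }

    async def is_document_fully_processed(self, doc_id: str) -> bool:
        s = await self.get_document_processing_status(doc_id)
        return s["text_processed"] and s["multimodal_processed"]


def undeclared_keys(records: dict[str, dict[str, Any]]) -> dict[str, set[str]]:
    """Map doc_id -> keys that DocProcessingStatus(**record) would reject."""
    allowed = {f.name for f in fields(DocProcessingStatus) if f.init}
    return {k: set(r) - allowed for k, r in records.items() if set(r) - allowed}


async def run_checks() -> tuple[bool, bool, dict[str, set[str]]]:
    doc_status, mm_status = InMemoryKV(), InMemoryKV()
    reader = StatusReader(doc_status, mm_status)
    await doc_status.upsert(
        {"doc-1": {"content_summary": "intro", "content_length": 42, "status": "processed"}}
    )
    before = await reader.is_document_fully_processed("doc-1")
    await mm_status.upsert({"doc-1": {"multimodal_processed": True}})
    after = await reader.is_document_fully_processed("doc-1")
    return before, after, undeclared_keys(doc_status.data)


before, after, bad = asyncio.run(run_checks())
print(before, after, bad)  # False True {}
```

Reading the two namespaces independently means a missing multimodal record degrades to "not processed" instead of raising.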