[Fix](connector) Fix schema size mismatch caused by Doris internal columns#351
dingyufei615 wants to merge 1 commit into apache:master
Conversation
Force-pushed from 805389a to 16aa419
JNSimba reviewed on Feb 3, 2026
Comment on lines +148 to +150:
```java
if (fieldVectors.size() < schema.size()) {
    logger.error("Arrow field size '{}' is less than data schema size '{}'.",
            fieldVectors.size(), schema.size());
```
I remember there was a version of Doris where the schema would return delete_sign, but the data wasn't actually there. Could this change cause a problem?
Author
Thank you for the review. The fix handles both scenarios safely:

**Scenario 1 (Issue #349): Arrow has extra internal columns** (`fieldVectors.size() > schema.size()`)
- Behavior: log a warning and continue processing
- Fixes the reported issue

**Scenario 2 (your concern): schema has columns missing from the Arrow data** (`fieldVectors.size() < schema.size()`)
- Behavior: throw an exception immediately (lines 148-152)
- Maintains fail-fast behavior
Code logic:

```java
// Still throws an exception when Arrow data is missing expected columns
if (fieldVectors.size() < schema.size()) {
    throw new DorisException("Load Doris data failed, schema size of fetch data is wrong.");
}
// Only allows extra columns (Doris 2.0+ internal columns)
if (fieldVectors.size() > schema.size()) {
    logger.warn("This may be due to internal columns in Doris 2.0+...");
}
```

Could you share which Doris version had the issue you mentioned? I'd like to verify whether it still exists and add test coverage if needed.
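The two branches above can be sketched as a tiny standalone check. This is a minimal illustration only; the class, method, and exception used here are hypothetical stand-ins, not the connector's actual `RowBatch` code:

```java
// Illustrative sketch of the relaxed size check: fewer Arrow fields than
// schema columns is a hard error; extra fields are tolerated with a warning.
// SchemaSizeCheck and checkSizes are hypothetical names, not the real API.
public class SchemaSizeCheck {

    /** Throws on a field deficit (fail-fast); otherwise returns true,
     *  warning when extra fields are present. */
    static boolean checkSizes(int arrowFieldCount, int schemaSize) {
        if (arrowFieldCount < schemaSize) {
            throw new IllegalStateException(
                    "Load Doris data failed, schema size of fetch data is wrong.");
        }
        if (arrowFieldCount > schemaSize) {
            System.err.println("Warning: " + (arrowFieldCount - schemaSize)
                    + " extra Arrow field(s), possibly Doris 2.0+ internal columns; ignoring.");
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(checkSizes(5, 5)); // equal sizes: ok
        System.out.println(checkSizes(6, 5)); // extra internal column: ok, warns
        try {
            checkSizes(4, 5);                 // missing column: fail fast
        } catch (IllegalStateException e) {
            System.out.println("threw: " + e.getMessage());
        }
    }
}
```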
[Fix] Fix Arrow field count mismatch with schema causing read failures
Proposed changes
Issue Number: close #349
Problem Summary:
Fixed the `DorisException: Load Doris data failed, schema size of fetch data is wrong` error that occurs when reading data from Doris 2.0+ with the Spark Doris Connector, caused by Arrow returning more fields than the schema defines.

Root Cause
Doris 2.0+ includes internal system columns (such as `__DORIS_DELETE_SIGN__`) in the Arrow data stream for certain table types (e.g., Unique Key tables). These columns support the Merge-on-Read implementation and should not be visible to users. The original strict validation logic (`fieldVectors.size() > schema.size()`) would throw an exception immediately, preventing normal data reading.

Solution
- Relaxed the `>` check to only throw exceptions when `fieldVectors.size() < schema.size()` (the actual error scenario); `fieldVectors.size() > schema.size()` now only logs a warning
- In `readBatch()` and `convertArrowToRowBatch()`, only process the columns defined in the schema, ignoring extra internal columns

Changes Made
File: `spark-doris-connector-base/src/main/java/org/apache/doris/spark/client/read/RowBatch.java`

- `readBatch()` method: use `schema.size()` instead of `fieldVectors.size()` to initialize Row objects
- `convertArrowToRowBatch()` method: convert only `schema.size()` fields, ignoring extra internal columns

Checklist (Required)
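As a rough illustration of the "process only schema columns" change: the conversion loop is bounded by the schema, so trailing internal columns in the Arrow batch are never read. The list-of-strings model and `convertRow` helper below are hypothetical simplifications, not the real `convertArrowToRowBatch()` signature:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: rows are modeled as lists of strings instead of
// Arrow FieldVectors. Only the first schema.size() values are consumed,
// so extra columns such as __DORIS_DELETE_SIGN__ are silently skipped.
public class ConvertSketch {

    static List<String> convertRow(List<String> arrowValues, List<String> schemaColumns) {
        // Iterate over the schema width, not the Arrow width.
        return arrowValues.subList(0, schemaColumns.size());
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "name");
        // Arrow batch carries one extra internal column at the end.
        List<String> arrow = Arrays.asList("1", "alice", "0" /* delete sign */);
        System.out.println(convertRow(arrow, schema)); // prints [1, alice]
    }
}
```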
Further comments
Testing & Verification
This fix has been verified in the following environment:
(Unique Key table with the `__DORIS_DELETE_SIGN__` internal column)

Impact Scope
Related Issues
This issue has been reported in the community:
- Reading tables that expose `__DORIS_DELETE_SIGN__` fails with `org.apache.doris.spark.exception.DorisException: Load Doris data failed, schema size of fetch data is wrong`