Thank you for your contribution.
Hello, thanks for taking a look. I actually implemented the array-to-string conversion, along with the related fallback strategies, in the first commit of this PR. After further consideration, though, I believe that for large data volumes, letting Arrow infer the type can reduce some of the performance overhead. What are your thoughts on this? As you can see in the PR, I've added a number of integration tests to make sure type inference succeeds. Do you have any suggestions?
  case "TIME" => DataTypes.DoubleType
  case "STRING" => DataTypes.StringType
- case "ARRAY" => DataTypes.StringType
+ case "ARRAY" => ArrayType(DataTypes.StringType, containsNull = true)
Could this change the previous behavior?
Yes, it changes the previous behavior, but the data content remains compatible.
Core Changes
Previous:
Schema: tags: string
Value: "[\"Alice\",\"Bob\"]" (string)
Array operations were not supported.
Now:
Schema: tags: array
Value: WrappedArray(Alice, Bob) (array)
Array functions such as explode and array_contains are supported.
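To illustrate what the new schema enables, here is a minimal sketch that mimics the semantics of Spark's `array_contains` and `explode` using plain Scala collections instead of the Spark API. The `Record` type, its fields, and the `team-a`/`team-b` values are hypothetical stand-ins; in Spark itself you would apply the built-in functions directly to the `tags` column.

```scala
// Sketch only: models Spark's array_contains and explode with plain Scala
// collections. Record and its fields are hypothetical stand-ins for a row.
object ArrayOpsSketch {
  final case class Record(id: String, tags: Seq[String])

  // Like array_contains(tags, value): true if any element equals value.
  def arrayContains(tags: Seq[String], value: String): Boolean =
    tags.contains(value)

  // Like explode(tags): one output row per element. Records with an empty
  // array produce no rows, matching explode (explode_outer would keep them).
  def explode(records: Seq[Record]): Seq[(String, String)] =
    records.flatMap(r => r.tags.map(tag => (r.id, tag)))

  def main(args: Array[String]): Unit = {
    val records = Seq(
      Record("team-a", Seq("Alice", "Bob")),
      Record("team-b", Seq.empty)
    )
    // Filter on array membership, as array_contains would in a WHERE clause.
    println(records.filter(r => arrayContains(r.tags, "Alice")).map(_.id))
    // Flatten the array into (id, tag) pairs.
    println(explode(records))
  }
}
```

Under the old StringType schema, neither operation applies directly: the column first has to be parsed back into a list.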
Impact
Schema type change: StringType → ArrayType(StringType)
Value type change: String → ArrayData
Data content compatibility: Elements are still strings ["Alice", "Bob"]
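Because the elements are still plain strings, the old string form can be rebuilt from the new array value. This is a minimal sketch, assuming the legacy value was a JSON-style array of double-quoted strings like the `["Alice","Bob"]` example above. `TagsCompat` and `toLegacyString` are hypothetical names, and the escaping covers only backslashes and double quotes.

```scala
// Sketch, assuming the legacy StringType value was a JSON-style array of
// double-quoted strings, e.g. ["Alice","Bob"]. Names are hypothetical.
object TagsCompat {
  // Quote one element, escaping backslashes and quotes only; since the array
  // is declared with containsNull = true, null elements become JSON null.
  private def quote(s: String): String =
    if (s == null) "null"
    else "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Render the new array value in the legacy string form.
  def toLegacyString(tags: Seq[String]): String =
    tags.map(quote).mkString("[", ",", "]")

  def main(args: Array[String]): Unit =
    println(toLegacyString(Seq("Alice", "Bob")))
}
```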
I understand your concerns. User code that relies on StringType checks or uses row.getString() will need to be adapted.
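For callers that need to read the column under both the old and the new schema, one option is a small normalizing helper. This is a minimal sketch, assuming the raw value comes from Spark's `Row.get(i)`, which yields a `String` under the old schema and a `Seq` under the new one. `TagsReader` and `readTags` are hypothetical, and the legacy parser is deliberately naive: it does not handle commas or escaped quotes inside elements. Code that only targets the new schema can simply call `row.getSeq[String](i)`.

```scala
// Sketch: normalize a tags value read under either schema to Seq[String].
// TagsReader/readTags are hypothetical; the legacy parser is naive and does
// not handle commas or escaped quotes inside elements.
object TagsReader {
  def readTags(value: Any): Seq[String] = value match {
    case null => Nil
    // Old schema: StringType holding e.g. ["Alice","Bob"].
    case s: String => parseLegacy(s)
    // New schema: ArrayType(StringType), surfaced by Row as a Seq.
    case seq: scala.collection.Seq[_] =>
      seq.iterator.map(x => if (x == null) null else x.toString).toList
    case other =>
      throw new IllegalArgumentException(s"Unexpected tags value: $other")
  }

  // Strip the brackets, split on commas, and remove surrounding quotes.
  private def parseLegacy(s: String): Seq[String] = {
    val body = s.trim.stripPrefix("[").stripSuffix("]").trim
    if (body.isEmpty) Nil
    else body.split(",").toList.map(_.trim.stripPrefix("\"").stripSuffix("\""))
  }

  def main(args: Array[String]): Unit = {
    println(readTags("[\"Alice\",\"Bob\"]"))
    println(readTags(Seq("Alice", "Bob")))
  }
}
```

Matching on `scala.collection.Seq` rather than the default immutable `Seq` is meant to keep the helper working across Scala versions, where the concrete collection a `Row` returns for an array column may differ.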
Proposed changes
Issue Number: Related to issue-341
Problem Summary:
Checklist(Required)