feat: support array type #345

Open
kaori-seasons wants to merge 10 commits into apache:master from kaori-seasons:issue-314

@kaori-seasons kaori-seasons commented Nov 4, 2025

Proposed changes

Issue Number: Related to issue-341

Problem Summary:

Describe the overview of changes.

Checklist(Required)

  1. Does it affect the original behavior: (Yes/No/I Don't know)
  2. Has unit tests been added: (Yes/No/No Need)
  3. Has document been added or modified: (Yes/No/No Need)
  4. Does it need to update dependencies: (Yes/No)
  5. Are there any changes that cannot be rolled back: (Yes/No)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

JNSimba (Member) commented Nov 4, 2025

Thank you for your contribution.

  1. Is the inferred data type inaccurate?
  2. Currently, we are converting the Array to a String for reading. Have we encountered any problems with this method?

kaori-seasons (Author) commented Nov 4, 2025

> Thank you for your contribution.
>
>   1. Is the inferred data type inaccurate?
>   2. Currently, we are converting the Array to a String for reading. Have we encountered any problems with this method?

Hello, and thank you for taking a look. The first commit of this PR already implements the array-to-string conversion, along with fallback strategies. After further consideration, though, I believe that for large data volumes, inferring the array type directly from Arrow can reduce some of the performance overhead. What are your thoughts on this?

I've also added a number of integration tests to the PR to verify that type inference works. Do you have any suggestions here?

  case "TIME" => DataTypes.DoubleType
  case "STRING" => DataTypes.StringType
- case "ARRAY" => DataTypes.StringType
+ case "ARRAY" => ArrayType(DataTypes.StringType, containsNull = true)
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this change the previous behavior?

Author

Yes, this changes the previous behavior: the column's schema type and value type are different, but the data content remains compatible.

Core Changes

Previous:
Schema: tags: string
Value: "[\"Alice\",\"Bob\"]" (string)
Array operations were not supported.

Now:
Schema: tags: array<string>
Value: WrappedArray(Alice, Bob) (array)
Explode, array_contains, etc., are supported.
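
To illustrate the "Now" case, here is a minimal sketch of the operations that become available once `tags` is a real array column. It does not read from Doris: the in-memory DataFrame, the `ArrayColumnSketch` object, and its method names are hypothetical stand-ins. Only `explode` and `array_contains` are standard Spark SQL functions.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array_contains, col, explode}

object ArrayColumnSketch {
  // Hypothetical stand-in for a table read through the connector:
  // `tags` has Spark type array<string>.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((1, Seq("Alice", "Bob")), (2, Seq("Carol"))).toDF("id", "tags")
  }

  // Keep only rows whose `tags` array contains the given element.
  def rowsTagged(df: DataFrame, tag: String): DataFrame =
    df.filter(array_contains(col("tags"), tag))

  // Produce one output row per array element.
  def oneRowPerTag(df: DataFrame): DataFrame =
    df.select(col("id"), explode(col("tags")).as("tag"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("array-column-sketch")
      .getOrCreate()
    val df = sample(spark)
    rowsTagged(df, "Alice").show()
    oneRowPerTag(df).show()
    spark.stop()
  }
}
```

With the previous string representation, both calls would fail analysis, because `explode` and `array_contains` expect an array-typed input rather than a string.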

Impact

Schema type change: StringType → ArrayType(StringType)
Value type change: String → ArrayData
Data content compatibility: Elements are still strings ["Alice", "Bob"]

I understand your concerns. User code that relies on StringType checks or uses row.getString() will need to be adapted.
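
One possible way to adapt such code is sketched below, under stated assumptions. `TagsCompat.asStringArray` is a hypothetical helper, not part of this PR or the connector. It assumes the legacy string value is a JSON array such as `["Alice","Bob"]` and that the Spark version supports `from_json` with an array-of-strings schema (Spark 2.4 or later, as far as I know).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{ArrayType, StringType}

object TagsCompat {
  // Hypothetical helper: returns `df` with `column` typed as array<string>,
  // whether it arrived as the new ArrayType or as the legacy JSON string.
  def asStringArray(df: DataFrame, column: String): DataFrame =
    df.schema(column).dataType match {
      case ArrayType(StringType, _) =>
        df // new behavior: already an array, nothing to do
      case StringType =>
        // legacy behavior: parse the JSON text into array<string>
        df.withColumn(column, from_json(col(column), ArrayType(StringType)))
      case other =>
        throw new IllegalArgumentException(s"Unexpected type for $column: $other")
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("tags-compat-sketch")
      .getOrCreate()
    import spark.implicits._

    // Legacy shape: the array serialized as a JSON string.
    val legacy = Seq("""["Alice","Bob"]""").toDF("tags")
    val row = asStringArray(legacy, "tags").head()

    // Read the value as a sequence instead of calling row.getString(0).
    val tags: Seq[String] = row.getSeq[String](0)
    println(tags.mkString(", "))
    spark.stop()
  }
}
```

Downstream code can then call `row.getSeq[String](i)` (or `row.getAs[Seq[String]]("tags")`) uniformly instead of branching on `StringType`.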

2 participants