feat: support array type by kaori-seasons · Pull Request #345 · apache/doris-spark-connector

kaori-seasons · 2025-11-04T01:49:26Z

Proposed changes

Issue Number: Related to issue-341

Problem Summary:

Describe the overview of changes.

Checklist(Required)

Does it affect the original behavior: (Yes/No/I Don't know)
Has unit tests been added: (Yes/No/No Need)
Has document been added or modified: (Yes/No/No Need)
Does it need to update dependencies: (Yes/No)
Are there any changes that cannot be rolled back: (Yes/No)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

JNSimba · 2025-11-04T02:26:31Z

Thank you for your contribution.

Is the inferred data type inaccurate?
Currently, we are converting the Array to a String for reading. Have we encountered any problems with this method?

kaori-seasons · 2025-11-04T02:33:52Z

> Thank you for your contribution.
>
> Is the inferred data type inaccurate?
>
> Currently, we are converting the Array to a String for reading. Have we encountered any problems with this method?

Hello, thank you for your attention. The first commit of this PR already implements the array-to-string conversion, together with the relevant fallback strategies. However, after careful consideration, I believe that with large amounts of data, inferring the array type through Arrow can reduce some performance overhead. What are your thoughts on this?

As shown in the PR, I've added a lot of integration tests to ensure that type inference is successful. Do you have any suggestions for this?

JNSimba · 2025-11-13T11:12:13Z

...spark-doris-connector-base/src/main/scala/org/apache/doris/spark/util/SchemaConvertors.scala

      case "TIME" => DataTypes.DoubleType
      case "STRING" => DataTypes.StringType
-      case "ARRAY" => DataTypes.StringType
+      case "ARRAY" => ArrayType(DataTypes.StringType, containsNull = true)


Could this change the previous behavior?

kaori-seasons · 2025-11-17

Yes, this changes the previous behavior, but the data content remains compatible.

Core Changes

Previous:
Schema: tags: string
Value: "[\"Alice\",\"Bob\"]" (string)
Array operations were not supported.

Now:
Schema: tags: array<string>
Value: WrappedArray(Alice, Bob) (array)
Explode, array_contains, etc., are supported.
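
To make the "Now" case concrete, here is a minimal sketch of the array operations that become available. It does not read from Doris: it assumes a local SparkSession and a hand-built DataFrame with the same `tags: array<string>` shape. The object name `ArrayColumnSketch` and the sample rows are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array_contains, col, explode}

object ArrayColumnSketch {
  // Stand-in for a Doris read: a DataFrame whose `tags` column is array<string>.
  // (Assumption: the connector returns the same column shape after this PR.)
  def sampleFrame(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((1, Seq("Alice", "Bob")), (2, Seq("Carol"))).toDF("id", "tags")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("array-column-sketch")
      .getOrCreate()
    val df = sampleFrame(spark)

    // One output row per array element.
    df.select(col("id"), explode(col("tags")).as("tag")).show()

    // Keep only the rows whose array contains the given element.
    df.filter(array_contains(col("tags"), "Bob")).show()

    spark.stop()
  }
}
```

With the previous string schema, the same queries would first need the string to be parsed into an array on the Spark side.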

Impact

Schema type change: StringType → ArrayType(StringType)
Value type change: String → ArrayData
Data content compatibility: Elements are still strings ["Alice", "Bob"]

I understand your concerns. User code that relies on StringType checks or uses row.getString() will need to be adapted.
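
One possible adaptation for user code that has to work with both schemas is to branch on the column's data type instead of calling `row.getString()` directly. This is only a sketch: `tagsOf` is a hypothetical helper, not part of the connector, and the string branch is a naive parser that assumes the old value looks like `["Alice","Bob"]` with no commas or escaped quotes inside the elements.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{ArrayType, StringType}

object ReadTagsSketch {
  // Hypothetical helper: returns the elements of `field` whether the column
  // is ArrayType(StringType) (new schema) or StringType (previous schema).
  // Requires a Row that carries its schema, e.g. one collected from a DataFrame.
  def tagsOf(row: Row, field: String): Seq[String] = {
    val idx = row.fieldIndex(field)
    if (row.isNullAt(idx)) Seq.empty
    else row.schema(idx).dataType match {
      // New behavior: the value is already an array of strings.
      case ArrayType(StringType, _) =>
        row.getSeq[String](idx)
      // Previous behavior: the value is a bracketed, quoted string.
      // Naive parsing; breaks if an element contains a comma or escaped quote.
      case StringType =>
        row.getString(idx).trim
          .stripPrefix("[").stripSuffix("]")
          .split(",").toSeq
          .map(_.trim.stripPrefix("\"").stripSuffix("\""))
          .filter(_.nonEmpty)
      case other =>
        throw new IllegalArgumentException(s"Unexpected type for $field: $other")
    }
  }
}
```

Code that only needs the new schema can drop the `StringType` branch and call `row.getSeq[String](idx)` directly.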

kaori-seasons added 3 commits November 4, 2025 09:41

chore: support array type

8fcd4a4

chore: enhance infertype

ed5e3af

chore: type compatiable

efd8e62

kaori-seasons added 6 commits November 4, 2025 10:53

fix: compile error

35ba4e4

chore: add optional conf

a0d96f9

fix: testArrayTypeInferenceNested tests

b9799a8

enhance: performce optimize

da79ed5

bugfix: fix tests

5d4b632

bugfix : time type error

43e263e

JNSimba reviewed Nov 13, 2025


bugfix: convertListToArrayData function && temproy view clean

a0fbc82

kaori-seasons wants to merge 10 commits into apache:master from kaori-seasons:issue-314
