[SPARK-56257][PYTHON][CONNECT] Support DataFrame input for spark.read.json/csv/xml by Yicong-Huang · Pull Request #55057 · apache/spark

Yicong-Huang · 2026-03-27T10:12:20Z

What changes were proposed in this pull request?

Allow spark.read.json(), spark.read.csv(), and spark.read.xml() to accept a DataFrame with a single string column as input. On Spark Connect, only JSON and CSV are supported for now; XML will be added in a follow-up PR after extending the Parse proto.
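To make the expected input shape concrete, here is a minimal pure-Python sketch. It is not the PR's implementation and does not call PySpark: a single-string-column DataFrame is modeled as a list of one-element tuples, and the helper names (`check_single_string_column`, `parse_json_rows`, `parse_csv_rows`) are hypothetical, introduced only for illustration.

```python
import csv
import json


def check_single_string_column(rows):
    """Reject input that is not exactly one string value per row."""
    for row in rows:
        if len(row) != 1 or not isinstance(row[0], str):
            raise TypeError("expected rows with a single string column")


def parse_json_rows(rows):
    """Parse each row's string as one JSON document."""
    check_single_string_column(rows)
    return [json.loads(value) for (value,) in rows]


def parse_csv_rows(rows, header):
    """Parse each row's string as one CSV record with the given column names."""
    check_single_string_column(rows)
    lines = [value for (value,) in rows]
    return [dict(zip(header, fields)) for fields in csv.reader(lines)]


# Stand-ins for a DataFrame with one string column (an assumption of this sketch).
json_rows = [('{"name": "Alice", "age": 30}',), ('{"name": "Bob", "age": 25}',)]
csv_rows = [("Alice,30",), ("Bob,25",)]

parsed_json = parse_json_rows(json_rows)
parsed_csv = parse_csv_rows(csv_rows, header=["name", "age"])
```

In this sketch the CSV values stay strings because no schema is applied; how the actual readers infer or apply a schema is not modeled here.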

Why are the changes needed?

Parsing in-memory text data into a structured DataFrame currently requires sc.parallelize(), which is unavailable on Spark Connect. Accepting a DataFrame of strings makes spark.read.json() the inverse of DataFrame.toJSON().
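The inverse relationship with DataFrame.toJSON() can be illustrated with a small round trip. This is a plain-Python analogy under stated assumptions, not Spark code: `json.dumps` stands in for the row-to-string direction and `json.loads` for the proposed string-to-row direction.

```python
import json

records = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]

# Forward direction (analogous to DataFrame.toJSON()): one JSON string per row,
# held in a single "column" modeled as a one-element tuple.
as_strings = [(json.dumps(record),) for record in records]

# Reverse direction (analogous to reading JSON from a single-string-column
# DataFrame, as this PR proposes): parse each string back into a record.
round_tripped = [json.loads(value) for (value,) in as_strings]
```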

Does this PR introduce any user-facing change?

Yes. spark.read.json(), spark.read.csv(), and spark.read.xml() now accept a single-string-column DataFrame as input.

How was this patch tested?

Added 7 new tests: 4 for Spark Classic (JSON, JSON with schema, CSV, XML) and 3 for Spark Connect (JSON, CSV, and XML reported as unsupported).

Was this patch authored or co-authored using generative AI tooling?

No

Yicong-Huang · 2026-03-27T10:42:34Z

cc @HyukjinKwon

Yicong-Huang · 2026-03-30T16:52:12Z

Closing this POC PR. The work is split into individual PRs, one per JIRA sub-task under SPARK-55227.

feat: support DataFrame input for spark.read.json/csv/xml

6a3c12a

Yicong-Huang marked this pull request as draft March 27, 2026 10:15

fix: ruff format

f9751ce

Yicong-Huang marked this pull request as ready for review March 27, 2026 10:42

Yicong-Huang marked this pull request as draft March 28, 2026 06:36

Yicong-Huang closed this Mar 30, 2026

Yicong-Huang wants to merge 2 commits into apache:master from Yicong-Huang:SPARK-56257
