Commit 0502a7c
feat(sql): migrate to DataFusion-based streaming SQL (#219)
<!-- CURSOR_SUMMARY -->
> [!NOTE]
> **High Risk**
> Large dependency and planning/execution refactor: introduces
DataFusion/Arrow-based streaming SQL planning plus a persistent
stream table catalog and job submission. Risk comes from new DDL paths
(`CREATE STREAMING TABLE`, connector-backed `CREATE TABLE`, `DROP
TABLE`) and major crate/version bumps (Arrow 55/DataFusion git forks,
`bincode` v2) affecting runtime behavior and serialization.
>
> **Overview**
> Switches the SQL stack to a **DataFusion-based streaming planner**:
adds compilation support for `CREATE STREAMING TABLE ... AS SELECT`
(including connector options like `connector`/`partition_by`) and
connector-backed `CREATE TABLE ... WITH ('connector'=...)`, plus `DROP
TABLE` planning.
>
> Wires these new plan nodes through coordinator execution by
introducing a `CoordinatorRuntimeContext` (task manager + stream catalog
+ job manager), persisting source/sink definitions to the stream
catalog, and submitting streaming jobs when creating a streaming sink.
>
> Expands the `protocol` crate with new protobuf APIs (`fs_api.proto`,
`storage.proto`) and build output (serde-derived types + descriptor
set), adds a new Arrow-backed `FsSchema` type, and performs a **major
dependency refresh** (Arrow 55, git-pinned
DataFusion/Arrow/parquet/sqlparser/typify, `bincode` 2 + new supporting
crates) reflected in `Cargo.toml` and `Cargo.lock`.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
7842995. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
1 parent 6ab4638 commit 0502a7c
278 files changed
Lines changed: 36390 additions & 1577 deletions
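The coordinator wiring described in the summary (catalog persistence for source/sink definitions, plus job submission when a streaming sink is created) can be pictured with the rough sketch below. It is not code from this commit: apart from the name `CoordinatorRuntimeContext`, every type, field, and function here (`StreamCatalog`, `JobManager`, `PlanNode`, `TableDef`, `execute`) is a hypothetical stand-in, the task manager is left out, and treating a connector-backed `CREATE TABLE` as a source is an assumption made only for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical: whether a catalog entry reads from or writes to a connector.
#[derive(Debug, Clone, PartialEq)]
pub enum TableKind {
    Source,
    Sink,
}

/// Hypothetical table definition persisted in the stream catalog.
#[derive(Debug, Clone, PartialEq)]
pub struct TableDef {
    pub name: String,
    pub connector: String,
    pub kind: TableKind,
}

/// Hypothetical in-memory stand-in for the persistent stream catalog.
#[derive(Default)]
pub struct StreamCatalog {
    tables: HashMap<String, TableDef>,
}

impl StreamCatalog {
    /// Stores a definition, rejecting duplicate table names.
    pub fn insert(&mut self, def: TableDef) -> Result<(), String> {
        if self.tables.contains_key(&def.name) {
            return Err(format!("table already exists: {}", def.name));
        }
        self.tables.insert(def.name.clone(), def);
        Ok(())
    }

    /// Removes a definition, failing if the table is unknown.
    pub fn remove(&mut self, name: &str) -> Result<TableDef, String> {
        self.tables
            .remove(name)
            .ok_or_else(|| format!("unknown table: {name}"))
    }

    pub fn contains(&self, name: &str) -> bool {
        self.tables.contains_key(name)
    }
}

/// Hypothetical job manager that only records submitted job names.
#[derive(Default)]
pub struct JobManager {
    pub submitted: Vec<String>,
}

/// Name taken from the summary; the fields are assumptions.
#[derive(Default)]
pub struct CoordinatorRuntimeContext {
    pub catalog: StreamCatalog,
    pub jobs: JobManager,
}

/// Hypothetical plan nodes for the three DDL paths the summary mentions.
pub enum PlanNode {
    CreateTable { name: String, connector: String },
    CreateStreamingTable { name: String, connector: String },
    DropTable { name: String },
}

/// Executes one plan node against the coordinator context.
pub fn execute(ctx: &mut CoordinatorRuntimeContext, node: PlanNode) -> Result<(), String> {
    match node {
        // Connector-backed CREATE TABLE: persist the definition only.
        PlanNode::CreateTable { name, connector } => ctx.catalog.insert(TableDef {
            name,
            connector,
            kind: TableKind::Source,
        }),
        // CREATE STREAMING TABLE: persist the sink, then submit a job for it.
        PlanNode::CreateStreamingTable { name, connector } => {
            ctx.catalog.insert(TableDef {
                name: name.clone(),
                connector,
                kind: TableKind::Sink,
            })?;
            ctx.jobs.submitted.push(name);
            Ok(())
        }
        // DROP TABLE: remove the definition from the catalog.
        PlanNode::DropTable { name } => ctx.catalog.remove(&name).map(|_| ()),
    }
}
```

In this sketch the catalog write happens before job submission, so a failed insert (for example a duplicate table name) never leaves a job running without a catalog entry. Whether the commit orders these steps the same way is not stated in the summary.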
File tree
- .github/workflows
- cli/cli
- src
- conf
- docs
- protocol
- proto
- src
- src
- common
- config
- coordinator
- analyze
- dataset
- execution
- plan
- statement
- tool
- runtime
- streaming
- api
- execution
- tracker
- factory
- connector
- global
- format
- job
- memory
- network
- operators
- grouping
- joins
- sink
- kafka
- source
- kafka
- watermark
- windows
- protocol
- util
- wasm
- input
- protocol
- kafka
- output
- protocol
- kafka
- processor
- python
- wasm
- server
- sql
- analysis
- api
- common
- functions
- logical_node
- logical
- logical_planner
- optimizers
- parser
- physical
- cdc
- schema
- types
- storage
- stream_catalog
- task