perf: LogicalPlan enum is 320 bytes due to oversized DdlStatement variants #22732

@zhuqi-lucas

Describe the bug / opportunity

LogicalPlan is 320 bytes on the stack today, but the typical query-execution path never produces the variants that drive that size. The Ddl(DdlStatement) variant is the offender: it carries CreateExternalTable (312 bytes) and CreateFunction (288 bytes). Because an enum is as wide as its largest variant plus any tag (roughly max(variant) + tag), the whole LogicalPlan enum is forced to that width on every code path, including SELECT queries that will never instantiate a DDL node.
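The sizing rule can be seen in isolation with a minimal sketch. `SmallNode`, `LargeDdl`, and `Plan` are hypothetical stand-ins chosen to mimic the quoted widths on a 64-bit target; they are not DataFusion's real types.

```rust
use std::mem::{size_of, size_of_val};

// Hypothetical stand-ins; these are not DataFusion's real types.
#[allow(dead_code)]
struct SmallNode {
    input: u64,
    fetch: u64,
} // 16 bytes

#[allow(dead_code)]
struct LargeDdl {
    fields: [u64; 39],
} // 312 bytes, the width quoted above for CreateExternalTable

#[allow(dead_code)]
enum Plan {
    Small(SmallNode),
    Ddl(LargeDdl),
}

fn main() {
    let node = Plan::Small(SmallNode { input: 1, fetch: 2 });

    println!("SmallNode: {} bytes", size_of::<SmallNode>());
    println!("LargeDdl:  {} bytes", size_of::<LargeDdl>());
    println!("Plan:      {} bytes", size_of::<Plan>());
    // A value holding the small variant still occupies the full enum width.
    println!("Plan::Small value: {} bytes", size_of_val(&node));
}
```

The last line is the relevant one for this issue: the width is a property of the enum type, not of the variant a given value happens to hold.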

This shows up directly on the planning hot path. Profiling sql_planner (samply, logical_plan_tpch_all) on macOS aarch64:

55%  in sql_planner binary (DataFusion + Rust stdlib)
31%  libsystem_malloc.dylib  (malloc / free / realloc)
13%  libsystem_platform.dylib (memcpy / memmove)
 1%  other (kernel, dyld, pthread)

A non-trivial share of the 13% memcpy/memmove time is LogicalPlan moves: every std::mem::take in the optimizer's in-place rewriters, every owned-API LogicalPlan::map_*, and every Arc<LogicalPlan> write currently shuffles 320 bytes, even when the active variant is something small like Projection (40 bytes) or Filter (128 bytes).
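As a rough illustration of why an in-place rewrite moves the full enum width, here is a sketch of the take-and-replace pattern. `Plan` and `rewrite_in_place` are hypothetical and much simpler than the optimizer's real rewriters; whether the compiler elides any of the copies depends on optimization.

```rust
use std::mem;

// Hypothetical stand-in enum; the Large variant only exists to set the width.
#[allow(dead_code)]
#[derive(Default)]
enum Plan {
    #[default]
    Empty,
    Small(u64),
    Large([u64; 39]),
}

// Take ownership through a &mut, rewrite, and store the result back.
// Both the take and the store operate on size_of::<Plan>() bytes,
// regardless of which variant is active.
fn rewrite_in_place(plan: &mut Plan) {
    let owned = mem::take(plan); // leaves Plan::Empty behind
    *plan = match owned {
        Plan::Small(n) => Plan::Small(n + 1),
        other => other,
    };
}

fn main() {
    let mut plan = Plan::Small(1);
    rewrite_in_place(&mut plan);
    println!("width moved per take: {} bytes", mem::size_of::<Plan>());
    if let Plan::Small(n) = plan {
        println!("rewritten value: {n}");
    }
}
```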

Per-variant sizes

=== LogicalPlan enum total ===
   320 bytes  LogicalPlan
=== Per-variant inner struct ===
    40 bytes  Projection
   128 bytes  Filter
    40 bytes  Window
    64 bytes  Aggregate
    48 bytes  Sort
   176 bytes  Join
    40 bytes  Repartition
    32 bytes  Union
    56 bytes  Subquery
    72 bytes  SubqueryAlias
    24 bytes  Limit
    88 bytes  Distinct
    16 bytes  Extension
    56 bytes  RecursiveQuery
    48 bytes  Analyze
    48 bytes  Explain
   168 bytes  TableScan
    32 bytes  Values
   144 bytes  Unnest
    96 bytes  DmlStatement
   120 bytes  CreateMemoryTable
    96 bytes  CreateView
    88 bytes  DistinctOn
    56 bytes  Statement
   320 bytes  DdlStatement       <-- forces LogicalPlan to 320
    16 bytes  EmptyRelation
    16 bytes  DescribeTable

=== Inside DdlStatement ===
   312 bytes  CreateExternalTable  <-- dominates DdlStatement
   288 bytes  CreateFunction       <-- second-largest
   144 bytes  CreateIndex
    72 bytes  DropTable / DropView
    48 bytes  DropCatalogSchema
    40 bytes  CreateCatalog / CreateCatalogSchema / DropFunction

If CreateExternalTable and CreateFunction are Boxed inside DdlStatement, the largest DDL variant becomes CreateIndex at 144 bytes, the largest LogicalPlan variant becomes Join at 176 bytes, and LogicalPlan shrinks to 176 bytes (–45%). The enum discriminant fits into spare space inside the Join variant (most likely a niche, i.e. an unused bit pattern, rather than literal padding), so LogicalPlan ends up the same width as Join itself. The cost is one heap allocation per boxed DDL plan, which is negligible because DDL plans are not on the per-query hot path.
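The effect of boxing the outsized variants can be sketched with a two-level enum that mirrors the LogicalPlan / DdlStatement nesting. All types below are hypothetical stand-ins sized to resemble the figures above; the exact enum widths depend on the compiler's layout decisions, so the sketch prints them rather than assuming them.

```rust
use std::mem::size_of;

// Hypothetical stand-ins; widths chosen to resemble the table above.
#[allow(dead_code)]
struct External {
    fields: [u64; 39],
} // 312 bytes, like CreateExternalTable

#[allow(dead_code)]
struct Index {
    fields: [u64; 18],
} // 144 bytes, like CreateIndex

#[allow(dead_code)]
struct JoinLike {
    fields: [u64; 21],
    flag: bool, // a bool leaves unused bit patterns the compiler may use for a tag
} // 176 bytes on a 64-bit target

// Large variant stored inline.
#[allow(dead_code)]
enum DdlInline {
    External(External),
    Index(Index),
}

// Large variant moved behind a pointer.
#[allow(dead_code)]
enum DdlBoxed {
    External(Box<External>),
    Index(Index),
}

#[allow(dead_code)]
enum PlanInline {
    Join(JoinLike),
    Ddl(DdlInline),
}

#[allow(dead_code)]
enum PlanBoxed {
    Join(JoinLike),
    Ddl(DdlBoxed),
}

fn main() {
    println!("DdlInline:  {} bytes", size_of::<DdlInline>());
    println!("DdlBoxed:   {} bytes", size_of::<DdlBoxed>());
    println!("PlanInline: {} bytes", size_of::<PlanInline>());
    println!("PlanBoxed:  {} bytes", size_of::<PlanBoxed>());
}
```

In this sketch the boxed plan can never be narrower than `JoinLike`, which is the point of the proposal: once the DDL payloads are behind a pointer, a query-path variant sets the width.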

To Reproduce

// in datafusion/expr, with all relevant types in scope:
println!("{}", std::mem::size_of::<LogicalPlan>());         // 320
println!("{}", std::mem::size_of::<DdlStatement>());        // 320
println!("{}", std::mem::size_of::<CreateExternalTable>()); // 312
println!("{}", std::mem::size_of::<CreateFunction>());      // 288

Expected behavior

LogicalPlan should not be sized by variants that never appear on the query path. Moving the two outsized DDL variants behind a Box brings LogicalPlan down to the size of Join (176 bytes), so the width that every plan node pays on every query is set by a variant that queries actually use.
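One way to keep such a size from regressing silently is a compile-time assertion. The sketch below is only an illustration of that idea with hypothetical stand-in types and a hypothetical budget constant `MAX_PLAN_BYTES`; it is not something the issue proposes for DataFusion itself.

```rust
use std::mem::size_of;

// Hypothetical stand-ins; these are not DataFusion's real types.
#[allow(dead_code)]
struct JoinLike {
    fields: [u64; 21],
    flag: bool,
}

#[allow(dead_code)]
struct LargeDdl {
    fields: [u64; 39],
}

#[allow(dead_code)]
enum Plan {
    Join(JoinLike),
    Ddl(Box<LargeDdl>),
}

// Hypothetical budget: the largest hot-path variant plus one word for a tag.
const MAX_PLAN_BYTES: usize = size_of::<JoinLike>() + size_of::<usize>();

// Evaluated at compile time; the build fails if Plan outgrows the budget,
// for example if someone stores LargeDdl inline again.
const _: () = assert!(size_of::<Plan>() <= MAX_PLAN_BYTES);

fn main() {
    println!("Plan: {} bytes (budget {})", size_of::<Plan>(), MAX_PLAN_BYTES);
}
```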

Additional context

Local cargo bench -p datafusion --bench sql_planner --quick on macOS aarch64, comparing main vs. boxed DDL variants:

bench                 main       boxed      delta
optimizer_tpch_all    8.61 ms    8.18 ms    –5.0%
optimizer_tpcds_all   168.0 ms   163.5 ms   –2.7%

Smaller benches (sub-200 µs) are within --quick noise.

A CI bench on the GKE aarch64 runner should give a tighter signal; I am willing to open a draft PR so a maintainer can trigger it.

Labels

enhancement (New feature or request), performance (Make DataFusion faster)
