
RFC: Open Type Attrs with Tiered Dispatch #459

@junrushao

Description


Context

TVM-FFI currently provides two mechanisms for attaching behavior to types:

| Mechanism | Storage | Lookup | Inheritance | Designed for |
|---|---|---|---|---|
| Member methods | Row-major (per-type dict) | Name-based hash | Yes (base -> derived) | Per-type methods (`sum`, `append`, ...) |
| TypeAttrColumn | Column-major (`std::vector` indexed by `type_index`) | O(1) pointer offset (single array dereference) | No (per-type opt-in) | Builtin dunder dispatch (`__any_hash__`, `__ffi_repr__`, ...) |

TypeAttrColumn is currently a closed set — only a handful of builtin
attributes are registered as dense-array columns. The implicit rule is that type
attrs should be "rare and restrained by tvm-ffi package."

This document proposes making type attrs an open set backed by hash maps,
while preserving the dense-array fast paths for the few builtins that
genuinely need them — keeping the std::vector-indexed O(1) dispatch that
tvm-ffi's hot loops rely on.

Motivation: IR Trait Dispatch

Consider a compiler IR built on tvm-ffi. Every IR node can carry traits,
static properties that optimization passes query:

  • SideEffectFree: Can this node be eliminated if its result is unused?
  • Commutative: Can operands be reordered?
  • HasCanonicalizer: Does this node type provide a canonicalization rule?
  • Fusible: Can this node be fused into a neighboring kernel?

These traits are queried across all node types during compilation:

void DCE(const Object* node) {
  if (HasTrait(node, "side_effect_free")) {
    // safe to remove if unused
  }
}

Under the current closed-set policy, an IR framework cannot define new type
attrs — it must fall back to member method lookup or reinvent its own dispatch
table.

Problem with Dense Arrays at Scale

The current TypeAttrColumn uses a dense std::vector indexed by type_index.
This works well for a small fixed set of builtins where nearly every type
could have the attribute. But it does not scale to an open set:

  • With 500 registered types and 100 user-defined trait columns, each dense
    array allocates 500 entries (8 bytes each): 4 KB per column, 400 KB in
    total, mostly null pointers.
  • As the type registry grows (downstream packages registering types), every
    existing column must be resized.
  • The sparsity is wasteful: if Commutative only applies to 10 out of 500 op
    types, 98% of the column is null.

Dense arrays are the right choice for a small, fixed set of hot-path builtins.
They are the wrong choice for an open, extensible set of user-defined traits.

Proposed Design: Two-Tier Type Attr Dispatch

Tier 1: General type attrs via unordered_map

The default storage for type attrs is a hash map keyed by (attr_name,
type_index). Any framework can register arbitrary type attrs without
pre-declaring columns:

// IR framework registers traits (no EnsureTypeAttrColumn needed)
TypeAttrSet(AddOp::type_index, "ir_trait.commutative", true);
TypeAttrSet(AddOp::type_index, "ir_trait.side_effect_free", true);

// Query: hash map lookup (~20-50 cycles)
AnyView val = TypeAttrGet("ir_trait.commutative", node->type_index());

Performance: Hash map lookup is ~20-50 cycles. For IR trait queries that
run millions of times during compilation, this adds on the order of tens of
milliseconds per million queries, negligible compared to the IR
transformations themselves. This is fast enough for the vast majority of
cross-type dispatch use cases.

Memory: Only entries that are actually set are stored. No pre-allocation,
no sparsity waste.

Open set: Any downstream framework can define new attrs with any string
key. No allowlist, no pre-registration.

Tier 2: Dense-array fast paths for hot builtins

For the few builtins that sit on genuinely critical paths — where every
object must be dispatched and the overhead compounds — the system retains
the existing std::vector-indexed dense-array columns. These provide a
single pointer-arithmetic dereference (column.data[type_index]) with no
hashing, no branching, and no cache misses beyond the one data load — the
same cost model as a C function-pointer slot.

This matters in practice because these builtins run in tight inner loops:

| Builtin | Why it's hot | Access pattern | Specialization |
|---|---|---|---|
| `__any_hash__` | Called on every key in every Map insert/lookup | Tight loop over map entries; hash dispatch is on the critical path | Dense-array column (`std::vector`, O(1)) |
| `__structural_equal__` | Called on every node pair in IR comparison | Recursive DFS over entire IR trees; called millions of times per pass | Dense-array column (`std::vector`, O(1)) |
| `__ffi_repr__` | Called once per object in repr | Single traversal, not perf-critical | General tier is fine |
| `__ffi_shallow_copy__` | Called once per object in copy | Single invocation per clone | General tier is fine |

The criterion for tier-2 specialization is empirical: does hash-map overhead
show up in profiles? Most dunder methods (repr, copy, serialization) are
called infrequently enough that the general tier suffices. Only those on
inner-loop critical paths (hash, equality) need the dense-array std::vector
fast path.

This is analogous to how CPython works: most dunder methods go through
tp_dict (hash map), but the most critical ones (tp_hash, tp_richcompare,
tp_iter) have dedicated function pointer slots in the type struct — precisely
because they sit on hot loops where even a hash-map probe would be too costly.

Implementation Sketch

class TypeAttrRegistry {
 public:
  // Tier 1: general hash-map storage
  void Set(int32_t type_index, const std::string& attr_name, Any value) {
    general_[attr_name][type_index] = std::move(value);
  }

  AnyView Get(const std::string& attr_name, int32_t type_index) const {
    auto col_it = general_.find(attr_name);
    if (col_it == general_.end()) return AnyView();
    auto val_it = col_it->second.find(type_index);
    if (val_it == col_it->second.end()) return AnyView();
    return AnyView(val_it->second);
  }

  // Tier 2: dense-array fast path (opt-in per attr)
  // Existing TypeAttrColumn mechanism, unchanged.
  // Used only for __any_hash__, __structural_equal__, etc.

 private:
  // attr_name -> (type_index -> value)
  std::unordered_map<std::string, std::unordered_map<int32_t, Any>> general_;
};

For tier-2, the existing TVMFFITypeAttrColumn dense-array mechanism is
retained as-is. The two tiers coexist:

  • Hot-path builtins are registered in both the dense column (for fast C++
    dispatch) and the general map (for uniform Python-side access via
    getattr).
  • User-defined attrs go only in the general map.

Optimization: Per-Column Hash Maps

If even the two-level hash lookup (attr_name then type_index) is too
slow for certain user-defined attrs, a framework can cache the inner map:

// Cache the inner (type_index -> value) map once at startup.
// GetColumn is a hypothetical accessor exposing the inner map; the
// two-argument Get() above returns individual values, not the column.
static const auto* commutative_map = registry.GetColumn("ir_trait.commutative");

// Fast query: single hash lookup by type_index
bool is_commutative = commutative_map->count(node->type_index()) > 0;

This gives ~20-cycle lookup (single unordered_map probe on a small integer
key) without requiring a dense array allocation for every registered type.

What This Achieves

For tvm-ffi core

  • Hot-path builtins stay fast: __any_hash__ and __structural_equal__
    keep their std::vector-indexed O(1) dense-array lookup — a single pointer
    dereference with no hashing or branching. Zero performance regression on
    the inner loops that tvm-ffi cares about most.
  • No sparsity waste: User-defined attrs use hash maps, not dense arrays.
  • No ABI bloat: The general tier is a pure C++ implementation detail.
    Dense-array columns remain the ABI for builtins.
  • Clear boundary: Tier-2 specialization is driven by profiling data, not
    an arbitrary allowlist.

For downstream frameworks

  • Open set: Any framework can define new type attrs with any string key.
    No gatekeeping.
  • Uniform API: TypeAttrGet(name, type_index) works for both builtins and
    user-defined attrs. Python-side getattr works for everything.
  • No confusion: Users don't need to know about the two tiers. They define
    attrs, and the system routes them to the right storage.
  • IR traits work: SideEffectFree, Commutative, etc. can be defined as
    type attrs and queried efficiently across all node types.

Python API

@ffi.py_class
class MyIRNode(ffi.Object):
    # Trait: registered as general type attr
    ir_trait = SideEffectFree

    # Repr: registered as type attr (tier-1 general map is fine for repr)
    def __ffi_repr__(self, fn_print):
        return f"MyIRNode(...)"

    # Normal method: registered as member method (row store)
    def simplify(self):
        ...

The @ffi.py_class decorator routes:

  • __ffi_*__ methods -> type attrs (general tier; tier-2 for known hot
    builtins)
  • Class-level trait assignments -> type attrs (general tier)
  • Everything else -> member methods

Comparison with CPython

CPython uses the same two-tier approach:

| Aspect | CPython | Proposed TVM-FFI |
|---|---|---|
| General methods | `tp_dict` (hash map) | General type attr map |
| Hot slots | `tp_hash`, `tp_richcompare`, `tp_iter`, ... | Dense-array columns for `__any_hash__`, ... |
| Criterion | Fixed set of ~40 type slots | Empirically profiled hot builtins |
| Open/closed | Open (any method in `tp_dict`) | Open (any attr in general map) |

Summary

The current TypeAttrColumn system conflates two concerns:

  1. Open extensibility: Downstream frameworks need to define new type attrs.
  2. Hot-path performance: A few builtins need sub-10-cycle dispatch via
    direct std::vector indexing.

Dense arrays solve (2) but block (1). Hash maps solve (1) but are too slow for
(2). The solution is both:

  • General tier (unordered_map): Open set, no sparsity, ~20-50 cycle
    lookup. Good enough for IR traits, repr, copy, serialization, and the vast
    majority of cross-type dispatch.
  • Specialized tier (dense std::vector array): Closed set of profiled hot
    builtins. Single pointer-offset lookup (~4 cycles). Reserved for
    __any_hash__, __structural_equal__, and any future builtin that shows up
    in profiles.

This gives tvm-ffi's performance guarantee where it matters — the std::vector
O(1) dispatch that hot loops like Map operations and structural comparison
depend on — and open extensibility everywhere else.
