## Context
TVM-FFI currently provides two mechanisms for attaching behavior to types:
| Mechanism | Storage | Lookup | Inheritance | Designed For |
|---|---|---|---|---|
| Member methods | Row-major (per-type dict) | Name-based hash | Yes (base -> derived) | Per-type methods (`sum`, `append`, ...) |
| `TypeAttrColumn` | Column-major (`std::vector` indexed by `type_index`) | O(1) pointer offset (single array dereference) | No (per-type opt-in) | Builtin dunder dispatch (`__any_hash__`, `__ffi_repr__`, ...) |
`TypeAttrColumn` is currently a closed set — only a handful of builtin
attributes are registered as dense-array columns. The implicit rule is that type
attrs should be "rare and restrained by tvm-ffi package."

This document proposes making type attrs an open set backed by hash maps,
while preserving the dense-array fast paths for the few builtins that
genuinely need them — keeping the `std::vector`-indexed O(1) dispatch that
tvm-ffi's hot loops rely on.
## Motivation: IR Trait Dispatch
Consider a compiler IR built on tvm-ffi. Every IR node can have traits —
static properties that optimization passes query:
- `SideEffectFree`: Can this node be eliminated if its result is unused?
- `Commutative`: Can operands be reordered?
- `HasCanonicalizer`: Does this node type provide a canonicalization rule?
- `Fusible`: Can this node be fused into a neighboring kernel?
These traits are queried across all node types during compilation:
```cpp
void DCE(const Object* node) {
  if (HasTrait(node, "side_effect_free")) {
    // safe to remove if unused
  }
}
```

Under the current closed-set policy, an IR framework cannot define new type
attrs — it must fall back to member method lookup or reinvent its own dispatch
table.
## Problem with Dense Arrays at Scale

The current `TypeAttrColumn` uses a dense `std::vector` indexed by `type_index`.
This works well for a small fixed set of builtins where nearly every type
could have the attribute. But it does not scale to an open set:
- With 500 registered types and 100 user-defined trait columns, each dense
  array allocates 500 entries (8 bytes each) = 4 KB, for a total of 400 KB of
  mostly-null pointers across all columns.
- As the type registry grows (downstream packages registering types), every
  existing column must be resized.
- The sparsity is wasteful: if `Commutative` only applies to 10 out of 500 op
  types, 98% of the column is null.
Dense arrays are the right choice for a small, fixed set of hot-path builtins.
They are the wrong choice for an open, extensible set of user-defined traits.
## Proposed Design: Two-Tier Type Attr Dispatch

### Tier 1: General type attrs via `unordered_map`

The default storage for type attrs is a hash map keyed by
`(attr_name, type_index)`. Any framework can register arbitrary type attrs
without pre-declaring columns:
```cpp
// IR framework registers traits (no EnsureTypeAttrColumn needed)
TypeAttrSet(AddOp::type_index, "ir_trait.commutative", true);
TypeAttrSet(AddOp::type_index, "ir_trait.side_effect_free", true);

// Query: hash map lookup (~20-50 cycles)
AnyView val = TypeAttrGet("ir_trait.commutative", node->type_index());
```

**Performance:** Hash map lookup is ~20-50 cycles. For IR trait queries that
run millions of times during compilation, this adds on the order of tens of
milliseconds per 1M queries — negligible compared to the IR transformations
themselves. This is fast enough for the vast majority of cross-type dispatch
use cases.
**Memory:** Only entries that are actually set are stored. No pre-allocation,
no sparsity waste.

**Open set:** Any downstream framework can define new attrs with any string
key. No allowlist, no pre-registration.
### Tier 2: Dense-array fast paths for hot builtins

For the few builtins that sit on genuinely critical paths — where every
object must be dispatched and the overhead compounds — the system retains
the existing `std::vector`-indexed dense-array columns. These provide a
single pointer-arithmetic dereference (`column.data[type_index]`) with no
hashing, no branching, and no cache misses beyond the one data load — the
same cost model as a C function-pointer slot.
This matters in practice because these builtins run in tight inner loops:
| Builtin | Why it's hot | Access pattern | Specialization |
|---|---|---|---|
| `__any_hash__` | Called on every key in every Map insert/lookup | Tight loop over map entries; hash dispatch is on the critical path | Dense array column (`std::vector` O(1)) |
| `__structural_equal__` | Called on every node pair in IR comparison | Recursive DFS over entire IR trees; called millions of times per pass | Dense array column (`std::vector` O(1)) |
| `__ffi_repr__` | Called once per object in repr | Single traversal, not perf-critical | General tier is fine |
| `__ffi_shallow_copy__` | Called once per object in copy | Single invocation per clone | General tier is fine |
The criterion for tier-2 specialization is empirical: does hash-map overhead
show up in profiles? Most dunder methods (repr, copy, serialization) are
called infrequently enough that the general tier suffices. Only those on
inner-loop critical paths (hash, equality) need the dense-array `std::vector`
fast path.

This is analogous to how CPython works: most dunder methods go through
`tp_dict` (a hash map), but the most critical ones (`tp_hash`,
`tp_richcompare`, `tp_iter`) have dedicated function pointer slots in the
type struct — precisely because they sit on hot loops where even a hash-map
probe would be too costly.
## Implementation Sketch

```cpp
class TypeAttrRegistry {
 public:
  // Tier 1: general hash-map storage
  void Set(int32_t type_index, const std::string& attr_name, Any value) {
    general_[attr_name][type_index] = std::move(value);
  }

  AnyView Get(const std::string& attr_name, int32_t type_index) const {
    auto col_it = general_.find(attr_name);
    if (col_it == general_.end()) return AnyView();
    auto val_it = col_it->second.find(type_index);
    if (val_it == col_it->second.end()) return AnyView();
    return AnyView(val_it->second);
  }

  // Tier 2: dense-array fast path (opt-in per attr)
  // Existing TypeAttrColumn mechanism, unchanged.
  // Used only for __any_hash__, __structural_equal__, etc.

 private:
  // attr_name -> (type_index -> value)
  std::unordered_map<std::string, std::unordered_map<int32_t, Any>> general_;
};
```

For tier-2, the existing `TVMFFITypeAttrColumn` dense-array mechanism is
retained as-is. The two tiers coexist:
- Hot-path builtins are registered in both the dense column (for fast C++
  dispatch) and the general map (for uniform Python-side access via
  `getattr`).
- User-defined attrs go only in the general map.
## Optimization: Per-Column Hash Maps

If even the two-level hash lookup (`attr_name` then `type_index`) is too
slow for certain user-defined attrs, a framework can cache the inner map:

```cpp
// Cache the inner map once at startup
static auto* commutative_map = &TypeAttrRegistry::Get("ir_trait.commutative");

// Fast query: single hash lookup by type_index
bool is_commutative = commutative_map->count(node->type_index());
```

This gives ~20-cycle lookup (a single `unordered_map` probe on a small integer
key) without requiring a dense array allocation for every registered type.
## What This Achieves

### For tvm-ffi core

- **Hot-path builtins stay fast:** `__any_hash__` and `__structural_equal__`
  keep their `std::vector`-indexed O(1) dense-array lookup — a single pointer
  dereference with no hashing or branching. Zero performance regression on
  the inner loops that tvm-ffi cares about most.
- **No sparsity waste:** User-defined attrs use hash maps, not dense arrays.
- **No ABI bloat:** The general tier is a pure C++ implementation detail.
  Dense-array columns remain the ABI for builtins.
- **Clear boundary:** Tier-2 specialization is driven by profiling data, not
  an arbitrary allowlist.
### For downstream frameworks

- **Open set:** Any framework can define new type attrs with any string key.
  No gatekeeping.
- **Uniform API:** `TypeAttrGet(name, type_index)` works for both builtins and
  user-defined attrs. Python-side `getattr` works for everything.
- **No confusion:** Users don't need to know about the two tiers. They define
  attrs, and the system routes them to the right storage.
- **IR traits work:** `SideEffectFree`, `Commutative`, etc. can be defined as
  type attrs and queried efficiently across all node types.
## Python API

```python
@ffi.py_class
class MyIRNode(ffi.Object):
    # Trait: registered as general type attr
    ir_trait = SideEffectFree

    # Repr: registered as type attr (tier-1 general map is fine for repr)
    def __ffi_repr__(self, fn_print):
        return f"MyIRNode(...)"

    # Normal method: registered as member method (row store)
    def simplify(self):
        ...
```

The `@ffi.py_class` decorator routes:

- `__ffi_*__` methods -> type attrs (general tier; tier-2 for known hot
  builtins)
- Class-level trait assignments -> type attrs (general tier)
- Everything else -> member methods
## Comparison with CPython

CPython uses the same two-tier approach:

| Aspect | CPython | Proposed TVM-FFI |
|---|---|---|
| General methods | `tp_dict` (hash map) | General type attr map |
| Hot slots | `tp_hash`, `tp_richcompare`, `tp_iter`, ... | Dense-array columns for `__any_hash__`, ... |
| Criterion | Fixed set of ~40 type slots | Empirically profiled hot builtins |
| Open/closed | Open (any method in `tp_dict`) | Open (any attr in general map) |
## Summary

The current `TypeAttrColumn` system conflates two concerns:

1. **Open extensibility:** Downstream frameworks need to define new type attrs.
2. **Hot-path performance:** A few builtins need sub-10-cycle dispatch via
   direct `std::vector` indexing.

Dense arrays solve (2) but block (1). Hash maps solve (1) but are too slow for
(2). The solution is both:

- **General tier (`unordered_map`):** Open set, no sparsity, ~20-50 cycle
  lookup. Good enough for IR traits, repr, copy, serialization, and the vast
  majority of cross-type dispatch.
- **Specialized tier (dense `std::vector` array):** Closed set of profiled hot
  builtins. Single pointer-offset lookup (~4 cycles). Reserved for
  `__any_hash__`, `__structural_equal__`, and any future builtin that shows up
  in profiles.
This gives tvm-ffi's performance guarantee where it matters — the `std::vector`
O(1) dispatch that hot loops like Map operations and structural comparison
depend on — and open extensibility everywhere else.