Commit 8e2a957
btf: Add type deduplication
This commit introduces a deduplication mechanism for BTF types.
Most of the time BTF types originate from a single spec where a compiler
or other external tool has already ensured that types are unique. In
such cases we can simply rely on pointer equality to determine if two
types are the same.
When dealing with manually generated BTF or merging multiple BTF
specs, duplicate types are common: multiple distinct Go objects end up
representing the same underlying BTF type.
It is useful to be able to deduplicate these types, both to reduce
the size of the resulting BTF and to allow name-based lookups after
combining multiple specs with duplicate types.
The deduplication algorithm is loosely based on the one used in libbpf.
For example, this version does not do FWD type resolution, as that is
only needed when combining BTF from multiple compilation units, which
is something typically not seen in eBPF use cases (only pahole).
In the libbpf implementation the first step is string deduplication.
We do this step during marshaling instead, and thus we do not
deduplicate strings in the Go representation.
When a type is deduplicated, we try to deduplicate not just that root
type, but the full subtree of types reachable from that type. We start
by traversing all types in post-order, and any time there is an edge
we try to replace that child with an equal type we have already seen.
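The post-order traversal above can be sketched roughly as follows. This
is a minimal illustration, not the code added by this commit: `Node`,
`key` and `dedup` are hypothetical names, the graph is assumed to be
acyclic, and a string key built from the already canonical children
stands in for the hash and equality check.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical, simplified stand-in for a BTF type.
type Node struct {
	Kind     string
	Name     string
	Children []*Node
}

// key describes a node by its own properties plus the identity of its
// children. It is only meaningful once the children are canonical.
func key(n *Node) string {
	var b strings.Builder
	fmt.Fprintf(&b, "%s:%s", n.Kind, n.Name)
	for _, c := range n.Children {
		fmt.Fprintf(&b, ":%p", c)
	}
	return b.String()
}

// dedup visits children before their parent (post-order), rewrites each
// child edge to point at the canonical node, and then returns the
// canonical node for n itself. It assumes the graph has no cycles.
func dedup(n *Node, seen map[string]*Node) *Node {
	for i, c := range n.Children {
		n.Children[i] = dedup(c, seen)
	}
	k := key(n)
	if prev, ok := seen[k]; ok {
		return prev
	}
	seen[k] = n
	return n
}

func main() {
	// Two separately built copies of "pointer to int".
	a := &Node{Kind: "ptr", Children: []*Node{{Kind: "int", Name: "int"}}}
	b := &Node{Kind: "ptr", Children: []*Node{{Kind: "int", Name: "int"}}}

	seen := map[string]*Node{}
	ca, cb := dedup(a, seen), dedup(b, seen)

	// Both roots, and both "int" children, now share one object.
	fmt.Println(ca == cb, a.Children[0] == b.Children[0])
}
```

Because children are made canonical first, a parent only needs to look
at the identity of its direct children to decide whether it duplicates
a type seen earlier.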
Comparing every type with all those seen before would be very expensive.
Instead, we compute a hash of each type. The hash is an approximation
of all properties of the type, including recursively hashed child
types. Using this hash as the key in a map gives us a set of candidate
types which might be equal to the type we are currently deduplicating.
We still need to do a full equality check to be sure two types are
equal, both to rule out hash collisions and to compare properties
which are not included in the hash (because of the recursion limit).
In practically every case a hash narrows down to 0 or 1 candidate
types.
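As a rough sketch of the hash-then-compare idea, assuming a simplified
type model: `Node`, `Deduper`, `hashOf`, `maxDepth` and the choice of
FNV-1a are all illustrative assumptions and not taken from this commit.
The example also shows why the equality check stays necessary: two
types that differ only below the recursion limit land in the same
bucket.

```go
package main

import (
	"fmt"
	"hash"
	"hash/fnv"
)

// Node is a hypothetical, simplified stand-in for a BTF type.
type Node struct {
	Kind, Name string
	Children   []*Node
}

// maxDepth is an assumed recursion limit for hashing.
const maxDepth = 2

// writeHash mixes the node's properties into h and recurses into the
// children until the depth budget is used up.
func writeHash(h hash.Hash64, n *Node, depth int) {
	if depth == 0 {
		return
	}
	h.Write([]byte(n.Kind + "\x00" + n.Name + "\x00"))
	for _, c := range n.Children {
		writeHash(h, c, depth-1)
	}
}

func hashOf(n *Node) uint64 {
	h := fnv.New64a()
	writeHash(h, n, maxDepth)
	return h.Sum64()
}

// equal is a full structural comparison (acyclic graphs only here).
func equal(a, b *Node) bool {
	if a == b {
		return true
	}
	if a.Kind != b.Kind || a.Name != b.Name || len(a.Children) != len(b.Children) {
		return false
	}
	for i := range a.Children {
		if !equal(a.Children[i], b.Children[i]) {
			return false
		}
	}
	return true
}

// Deduper buckets previously seen types by hash.
type Deduper struct{ buckets map[uint64][]*Node }

// Canonical returns an equal, previously seen type if there is one,
// and otherwise records n as the canonical instance.
func (d *Deduper) Canonical(n *Node) *Node {
	h := hashOf(n)
	for _, cand := range d.buckets[h] {
		if equal(cand, n) {
			return cand
		}
	}
	d.buckets[h] = append(d.buckets[h], n)
	return n
}

func ptrTo(n *Node) *Node { return &Node{Kind: "ptr", Children: []*Node{n}} }

func main() {
	d := &Deduper{buckets: map[uint64][]*Node{}}
	a := ptrTo(ptrTo(&Node{Kind: "int", Name: "int"}))
	b := ptrTo(ptrTo(&Node{Kind: "int", Name: "int"}))
	c := ptrTo(ptrTo(&Node{Kind: "int", Name: "long"}))

	fmt.Println(hashOf(a) == hashOf(c))           // same bucket: leaf is below maxDepth
	fmt.Println(d.Canonical(a) == d.Canonical(b)) // equal types collapse
	fmt.Println(d.Canonical(a) == d.Canonical(c)) // equality check tells them apart
}
```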
Once we have narrowed down to candidate types, we do a full equality
check in which we walk the two types to be compared together in a
depth-first manner, and bail out as soon as we find a difference.
Since types can form cycles via pointers, we keep track of the types
already visited in the current equality check and, when we encounter
one again, assume the types are equal.
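A cycle-aware equality check along these lines could look like the
sketch below. `Node`, `pair`, `equal` and `listNode` are hypothetical
names, and tracking visited pairs (rather than single types) is an
assumption made for the illustration, not necessarily what this commit
does.

```go
package main

import "fmt"

// Node is a hypothetical, simplified stand-in for a BTF type.
type Node struct {
	Kind, Name string
	Children   []*Node
}

// pair identifies one (a, b) comparison that is already in progress.
type pair struct{ a, b *Node }

// equal walks a and b together depth-first and returns false at the
// first difference. A pair that is seen again is assumed to be equal,
// which is what stops the walk on cyclic types.
func equal(a, b *Node, visited map[pair]struct{}) bool {
	if a == b {
		return true
	}
	p := pair{a, b}
	if _, ok := visited[p]; ok {
		return true
	}
	visited[p] = struct{}{}

	if a.Kind != b.Kind || a.Name != b.Name || len(a.Children) != len(b.Children) {
		return false
	}
	for i := range a.Children {
		if !equal(a.Children[i], b.Children[i], visited) {
			return false
		}
	}
	return true
}

// listNode builds a self-referential type similar to
// "struct name { struct name *next; }".
func listNode(name string) *Node {
	s := &Node{Kind: "struct", Name: name}
	p := &Node{Kind: "ptr", Children: []*Node{s}}
	s.Children = []*Node{p}
	return s
}

func main() {
	fmt.Println(equal(listNode("node"), listNode("node"), map[pair]struct{}{}))
	fmt.Println(equal(listNode("node"), listNode("other"), map[pair]struct{}{}))
}
```

The visited set is created fresh for each top-level comparison, so an
assumption made during one check does not leak into the next.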
This deduplication mechanism can be used via a standalone function,
but is also integrated in the BTF spec builder via a new method to
add and deduplicate a type.
Signed-off-by: Dylan Reimerink <[email protected]>
1 parent 627c039 commit 8e2a957
4 files changed, +604 -1 lines changed