Description
Expected Behaviour
Per the comment here, memory fragmentation should be avoided when creating the rr graph for performance reasons. The new ClockRRGraphBuilder should avoid incremental allocation patterns and instead batch edge insertion. In addition, ClockRRGraphBuilder::create_and_append_clock_rr_graph should be invoked at a point in the rr graph construction flow where work is not duplicated.
Current Behaviour
ClockRRGraphBuilder and its classes use only incremental allocation patterns (e.g. they only call add_edge and never preallocate memory). In addition, ClockRRGraphBuilder::create_and_append_clock_rr_graph is called after partition_rr_graph_edges, init_fan_in, and alloc_and_load_rr_indexed_data have already been called, so it ends up calling all three of them again. See https://github.com/verilog-to-routing/vtr-verilog-to-routing/blob/master/vpr/src/route/rr_graph_clock.cpp#L33-L43
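A minimal sketch of the ordering being asked for, assuming the clock network can be appended before the finalisation passes run. Every name below (Graph, build_base_graph, append_clock_network, finalize_graph) is a hypothetical stand-in, not VPR code; finalize_graph stands for the three passes listed above.

```cpp
#include <vector>

// Hypothetical stand-in for the rr graph; not the real VPR data structure.
struct Graph {
    std::vector<int> nodes;
    int finalize_runs = 0; // how many times the finalisation passes ran
};

// Stand-in for the finalisation passes named in this issue
// (edge partitioning, fan-in initialisation, indexed-data loading).
void finalize_graph(Graph& g) { ++g.finalize_runs; }

// Stand-in for building the regular routing nodes.
void build_base_graph(Graph& g) { g.nodes.assign(4, 0); }

// Stand-in for appending the clock network; it does not finalise anything itself.
void append_clock_network(Graph& g) { g.nodes.insert(g.nodes.end(), 2, 1); }

// Proposed order: append the clock network first, then finalise exactly once.
Graph build_graph() {
    Graph g;
    build_base_graph(g);
    append_clock_network(g);
    finalize_graph(g);
    return g;
}
```

With this order the clock builder never needs to re-run the finalisation passes itself, because they run once after all nodes and edges exist.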
ClockRRGraphBuilder does 3 forms of incremental allocation:
t_rr_node::add_edge
rr_nodes.emplace_back();
segment_inf.emplace_back();
The rr node and edge incremental allocations are probably worth fixing. The segment emplace_back is possibly fine?
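One way the node and edge allocations could be batched is sketched below. This is an illustration only: Edge, Node, BatchedEdgeBuilder, queue_edge, flush, and append_nodes are hypothetical names, and the types are simplified stand-ins rather than the real t_rr_node or rr_nodes. The idea is to queue edges while the clock network is being described, then size each node's edge storage once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; not the real VPR t_rr_node / rr_nodes types.
struct Edge {
    int src_node;
    int sink_node;
    int switch_id;
};

struct Node {
    std::vector<Edge> edges; // outgoing edges of this node
};

// Append `count` default nodes with a single resize instead of repeated
// emplace_back calls. Returns the index of the first new node.
std::size_t append_nodes(std::vector<Node>& nodes, std::size_t count) {
    const std::size_t first = nodes.size();
    nodes.resize(first + count);
    return first;
}

// Collects edges first, then grows each node's edge storage at most once.
class BatchedEdgeBuilder {
  public:
    // Record an edge without touching the graph yet.
    void queue_edge(int src, int sink, int switch_id) {
        pending_.push_back({src, sink, switch_id});
    }

    // Write all queued edges into the graph.
    void flush(std::vector<Node>& nodes) {
        // Pass 1: count how many new edges each source node receives.
        std::vector<std::size_t> extra(nodes.size(), 0);
        for (const Edge& e : pending_) {
            assert(e.src_node >= 0 &&
                   static_cast<std::size_t>(e.src_node) < nodes.size());
            ++extra[e.src_node];
        }
        // Pass 2: reserve each edge list's final size up front.
        for (std::size_t i = 0; i < nodes.size(); ++i) {
            if (extra[i] > 0) {
                nodes[i].edges.reserve(nodes[i].edges.size() + extra[i]);
            }
        }
        // Pass 3: copy the edges in; capacity is already sufficient.
        for (const Edge& e : pending_) {
            nodes[e.src_node].edges.push_back(e);
        }
        pending_.clear();
    }

  private:
    std::vector<Edge> pending_; // edges waiting to be written
};
```

The pending list itself still grows incrementally, but it is a single temporary vector that is discarded after flush, so it should not leave many small long-lived allocations behind the way per-node growth can.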