Skip to content

Move call location of ClockRRGraphBuilder and use alloc_and_load_edges. #1081

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Feb 3, 2020

Conversation

litghost
Copy link
Collaborator

@litghost litghost commented Jan 24, 2020

Description

By moving ClockRRGraphBuilder earlier in the rr graph flow, several
parts of ClockRRGraphBuilder::create_and_append_clock_rr_graph can be
avoided, as they were duplicating work that the original build_rr_graph
flow was already doing (init_fan, mapping arch switch to rr switch,
partition_edges).

This new code should also fully preallocate the rr_node array, though
this is not required by the code.

Fixes #1080

Related Issue

#1079
#1080

Motivation and Context

When researching #1079, it was discovered that there was basically 3 ways edges were used (alloc_and_load_edges, rr graph load, and ClockRRGraphBuilder). However it is clear that ClockRRGraphBuilder::create_and_append_clock_rr_graph should just be using alloc_and_load_edges rather than a bespoke allocation methodology.

In addition, ClockRRGraphBuilder was duplicating work (two calls to init_fan_out for example), and it was possible for ClockRRGraphBuilder to emit a switch that still had an arch_switch_idx vs an rr_switch_idx.

This PR fixes all of these issues.

How Has This Been Tested?

  • CI is green

Types of changes

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

@litghost
Copy link
Collaborator Author

CI is now green, @mustafabbas and @kmurray please review.

@kmurray
Copy link
Contributor

kmurray commented Jan 31, 2020

If I understand correctly the main structural changes in this code are as follows:

  • Add support for estimating the number of RR nodes to be added by clock network construction
    • Use these to reserve appropriate additional space in rr_nodes (in addition to that used by the main RR graph)
  • Move clock network construction into build_rr_graph() such that we can avoid some redundant work (init_fanin(), edge paritioning etc.)
    • The clock network construction still uses push_back() and friends, but the estimated reservations above avoid unnecessary re-allocations.

Assuming my understanding of the above is correct this looks like a good change to me.

@litghost What's the best approach for handling this? Since this is a less used code path I'm pretty happy to merge this independently of #1096 (which would also simplifies #1096). Also this PR seems to have a merge conflict which would also need to be resolved.

@litghost
Copy link
Collaborator Author

@litghost What's the best approach for handling this? Since this is a less used code path I'm pretty happy to merge this independently of #1096 (which would also simplifies #1096). Also this PR seems to have a merge conflict which would also need to be resolved.

I'll rebase on master, and we should merge this.

@litghost
Copy link
Collaborator Author

If I understand correctly the main structural changes in this code are as follows:

Yes

By moving ClockRRGraphBuilder earlier in the rr graph flow, several
parts of ClockRRGraphBuilder::create_and_append_clock_rr_graph can be
avoided, as they were duplicating work that the original build_rr_graph
flow was already doing (init_fan, mapping arch switch to rr switch,
partition_edges).

This new code should also fully preallocate the rr_node array, though
this is not required by the code.

Signed-off-by: Keith Rothman <[email protected]>
Signed-off-by: Keith Rothman <[email protected]>
Copy link
Member

@mustafabbas mustafabbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes, very minor comments to look at while you rebase

if (device_ctx.virtual_clock_network_root_idx == inode) {
return;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move this outside the function. Perhaps before the call to check_rr_node in check_rr_graph with the other ignored nets here:

/* Ignore any uninitialized rr_graph nodes */

The reason is check_rr_node is called twice in check_rr_graph and in check_route. When we call this function in check_route we really don't want to see the virtual clock sink, as we decided its best to remove it from the route tree after we utilize it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -39,18 +39,24 @@ void RoutingToClockConnection::set_fc_val(float fc_val) {
* RoutingToClockConnection (member functions)
*/

void RoutingToClockConnection::create_switches(const ClockRRGraphBuilder& clock_graph) {
size_t RoutingToClockConnection::estimate_additional_nodes() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment to note that routing to clock connections create a virtual sink which is why we are returning one node to be allocated here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@kmurray
Copy link
Contributor

kmurray commented Jan 31, 2020

Hmm, looks like a QoR change. That should probably be investigated to figure out the cause (in particular since its on the clock regression test). That test is really an implicit check of the clock modelling functionality, so a QoR change could indicate that it broke somehow.

@mustafabbas How would you recommend verifying that that test is working correctly (i.e. generated clock structure is correct, and used correctly)?

@litghost
Copy link
Collaborator Author

litghost commented Jan 31, 2020

Hmm, looks like a QoR change.

I've noticed it too. Weirdly enough prior to the rebase onto master, the QoR check passed. I'm investigating now. It is possible (but unclear how) that something in this PR has resulted in the QoR change I was describing in #1084 (comment) . I haven't been able to track it down yet.

Data I have so far:

  • Previous master and new master has no change in router behavior
  • This PR on old master has a different router behavior, but QoR is still acceptable
  • This same PR on new master has a different router behavior, but QoR is out of tolarance.

Investigation continues.

@litghost
Copy link
Collaborator Author

One concrete that is worth pointing out is that the previous code failed check_rr_graph because of missing non-configurable nodes in the reverse direction. The code here: https://github.com/verilog-to-routing/vtr-verilog-to-routing/pull/1081/files#diff-502e2913e417644b10bae07da14b9dabR191-R195 fixes this bug. So it is possible that the QoR change is simply a result of fixing a bug.

I have tried backing out just that change, and the QoR change persists, so I'm still investigating.

@mustafabbas
Copy link
Member

Hmm, looks like a QoR change. That should probably be investigated to figure out the cause (in particular since its on the clock regression test). That test is really an implicit check of the clock modelling functionality, so a QoR change could indicate that it broke somehow.

@mustafabbas How would you recommend verifying that that test is working correctly (i.e. generated clock structure is correct, and used correctly)?

I used to trace the clock structure in the dumped rr_graph.xml. This is a qor in final setup time though. I am not sure off the top of my head where to look but I can replicate and take a look at it tonight if the bug still lingers.

@litghost
Copy link
Collaborator Author

litghost commented Feb 1, 2020

I used to trace the clock structure in the dumped rr_graph.xml. This is a qor in final setup time though. I am not sure off the top of my head where to look but I can replicate and take a look at it tonight if the bug still lingers.

I've begun to isolate the changes in a separate branch. So far no bugs have been found :/

@litghost
Copy link
Collaborator Author

litghost commented Feb 1, 2020

I believe I've isolated the change in QoR to fan / switch behavior, and I believe that the previous QoR was simply incorrect based on incorrect arch -> rr switch behavior. @mustafabbas if you can check that the code is still working as expect, I believe this PR should be merged, along with an update to the QoR for the dedicated networks.

@kmurray How do you want updates to the strong QoR delivered?

@litghost
Copy link
Collaborator Author

litghost commented Feb 1, 2020

Specifically there were two bugs in the previous code that are now fixed:

Both of the bugs above are avoided by virtue of doing 1 arch -> rr pass using the old rr graph generator logic.

@mustafabbas
Copy link
Member

mustafabbas commented Feb 3, 2020

Specifically there were two bugs in the previous code that are now fixed:

* Any switches remapped in the clock rr generator assumed no fan in adjustment.  In fact there was a TODO to that affect: https://github.com/verilog-to-routing/vtr-verilog-to-routing/blob/master/vpr/src/route/rr_graph_clock.cpp#L115-L116

Agreed accounting for fan in information should make a difference in QoR. I also tested it locally and the written rr_graph looks correct. Thanks for the fixes. I believe this pull request is good to go after updating the golden results

QoR change was do to fixing bugs related to switch fan-in delay fixes
related to the clock network (see discussion in verilog-to-routing#1081).
@probot-autolabeler probot-autolabeler bot added tests VTR Flow VTR Design Flow (scripts/benchmarks/architectures) labels Feb 3, 2020
@kmurray
Copy link
Contributor

kmurray commented Feb 3, 2020

@kmurray How do you want updates to the strong QoR delivered?

I've re-generated them and pushed the updates. Once CI passes this should be good to merge.

@litghost
Copy link
Collaborator Author

litghost commented Feb 3, 2020

@kmurray How do you want updates to the strong QoR delivered?

I've re-generated them and pushed the updates. Once CI passes this should be good to merge.

Awesome, thanks

@kmurray kmurray merged commit 4f72116 into verilog-to-routing:master Feb 3, 2020
@litghost litghost deleted the refactor_edges branch June 24, 2020 19:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lang-cpp C/C++ code tests VPR VPR FPGA Placement & Routing Tool VTR Flow VTR Design Flow (scripts/benchmarks/architectures)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ClockRRGraphBuilder uses bad allocation strategies and duplicates work
3 participants