-
Notifications
You must be signed in to change notification settings - Fork 1k
Parallelize opt_merge
#5530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Parallelize opt_merge
#5530
Conversation
widlarizer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oddly enough, I see a tsan failure at opt_merge_init.ys with a setup that yields Yosys 0.57+244 (git sha1 329ff2687, ccache clang++ 16.0.6 -Og -fPIC -O1 -fno-omit-frame-pointer -fno-optimize-sibling-calls -fsanitize=thread)
WARNING: ThreadSanitizer: data race (pid=...)
Write of size 4 at 0x... by thread T5:
#0 Yosys::RTLIL::Const::is_fully_def() const ...yosys/kernel/rtlil.cc:725:2 (yosys+0xa57c79)
#1 (anonymous namespace)::OptMergeThreadWorker::has_dont_care_initval(Yosys::TRTLIL::RCell) const ...yosys/passes/opt/opt_merge.cc:250:53 (yosys+0x...)
#2 (anonymous namespace)::OptMergeThreadWorker::compute_cell_hashes((anonymous namespace)::ComputeCellHashes const&) const ...yosys/passes/opt/opt_merge.cc:275:23 (yosys+0x...)
#3 (anonymous namespace)::OptMergeWorker::OptMergeWorker(Yosys::RTLIL::Module*, bool, bool, bool)::'lambda'(int)::operator()(int) const ...yosys/passes/opt/opt_merge.cc:386:56 (yosys+0x...)
...
Previous write of size 4 at 0x... by thread T1:
#0 Yosys::RTLIL::Const::is_fully_def() const ...yosys/kernel/rtlil.cc:725:2 (yosys+0xa57c79)
#1 (anonymous namespace)::OptMergeThreadWorker::has_dont_care_initval(Yosys::TRTLIL::RCell) const ...yosys/passes/opt/opt_merge.cc:250:53 (yosys+0x...)
#2 (anonymous namespace)::OptMergeThreadWorker::compute_cell_hashes((anonymous namespace)::ComputeCellHashes const&) const ...yosys/passes/opt/opt_merge.cc:275:23 (yosys+0x...)
#3 (anonymous namespace)::OptMergeWorker::OptMergeWorker(Yosys::RTLIL::Module*, bool, bool, bool)::'lambda'(int)::operator()(int) const ...yosys/passes/opt/opt_merge.cc:386:56 (yosys+0x...)
...
Location is global 'Yosys::RTLIL::Const::is_fully_def() const::__d' of size 32 at 0x... (yosys+0x...)
Haven't taken a close look at the code change yet
|
I guess this is missing some things from main? I can't easily rebase it since it doesn't build on single chunk sigspecs |
5eb4a30 to
d5f84d9
Compare
I'm not sure why but this is actually faster than existing `opt_merge` even with YOSYS_MAX_THREADS=1, for the jpeg synthesis test. 16.0s before, 15.5s after for end-to-end synthesis.
d5f84d9 to
755836c
Compare
Hmm, not sure what I did there. Should be fixed now.
When I run |
|
Yep, I no longer see the thread sanitizer issues Since the goal of this PR is to preserve the exact behavior against the existing implementation, I would like to see a testing setup that emits random designs with frequent duplicate cells and a pass that checks strict "RTLIL-level" structural equivalence (rather than the already deduplicating, functionally intended structural equivalence of Was it by hand or did you have a test case? The automated testing would likely come in handy for rewriting other opt passes as well. I'd be fairly confident merging this as-is, Added some variable renaming at rocallahan#1 |
If your work is part of a larger effort, please discuss your general plans on Discourse first to align your vision with maintainers.
https://yosyshq.discourse.group/t/parallel-optmergepass-implementation/
What are the reasons/motivation for this change?
Massive speedups on large modules. On a particular real-world flattened design with millions of cells, at 20 cores I get 14.5x speedup on an
opt_mergethat takes 15 iterations internally. On a 48-core machine speedup levels out at 25x.Another way to get massive speedups would be to make
opt_mergeincremental, i.e. at each step only consider merging cells whose state or connected cells have changed since the last iteration ofopt_merge. That would be more efficient, but also more invasive since information about what's changed would have to be kept up to date across passes. Anyway, see the discussion in the Discourse thread.Explain how this is achieved.
This is the same algorithm presented in the Discourse thread, but over the last few months we've upgraded RTLIL to be thread-safe for read-only access, so there is no extra TRTLIL layer anymore and a lot less code needs to change here.
I'm not sure why but this is actually faster than existing
opt_mergeeven with YOSYS_MAX_THREADS=1, for the jpeg synthesis test. 16.0s before, 15.5s after for end-to-end synthesis.