[APPack] Iterative Re-Packing #3171

Open
wants to merge 1 commit into master from feature-ap-iterative-repacking

Conversation

AlexandreSinger (Contributor)

Because it is based on the packer, APPack also uses iterative re-packing when a dense enough clustering cannot be found. APPack has some special options it can use to increase the density of the clustering further without hurting quality as much as the default flow does.

Updated the iterative re-packing algorithm to use these options if needed.

Having this safer fall-back has allowed me to tune some numbers that I knew would improve the quality of most circuits but were causing a few circuits to fail packing. Those few failing circuits should now hit this fall-back path, which resolves the issue.
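For readers unfamiliar with the mechanism, the sketch below shows the general shape of such a fallback loop. Everything in it (the option names, the thresholds, the `try_pack` stub) is hypothetical and only illustrates the idea of relaxing density-related options one step at a time instead of jumping straight to the default, quality-hurting fallback; it is not VPR's actual API.

```cpp
// Hypothetical sketch of an iterative re-packing fallback loop (not VPR's
// real API). Start with the tightest APPack settings and, if the packing does
// not fit the device, progressively enable the density-increasing options.
#include <cstdio>
#include <vector>

struct PackingOptions {
    float max_displacement;   // how far an atom may stray from its flat placement
    float target_density;     // how full each cluster is asked to be
    bool  allow_unrelated;    // allow unrelated atoms to fill clusters
};

// Stand-in for the packer; here the packing "fits" only when the clusters are
// packed densely enough, just to keep the example self-contained.
bool try_pack(const PackingOptions& opts) {
    return opts.target_density >= 0.95f || opts.allow_unrelated;
}

int main() {
    // Each successive attempt trades a little quality for a denser clustering.
    std::vector<PackingOptions> attempts = {
        {5.0f, 0.80f, false},   // tuned, tightest settings
        {10.0f, 0.80f, false},  // relax the displacement threshold
        {10.0f, 1.00f, true},   // pack to full density, allow unrelated clustering
    };

    for (size_t i = 0; i < attempts.size(); ++i) {
        if (try_pack(attempts[i])) {
            std::printf("Packing fit on attempt %zu\n", i + 1);
            return 0;
        }
    }
    std::printf("Fell back to the default flow\n");
    return 1;
}
```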

github-actions bot added the VPR (VPR FPGA Placement & Routing Tool) and lang-cpp (C/C++ code) labels on Jun 29, 2025
AlexandreSinger (Contributor, Author) commented on Jun 29, 2025

Results on VTR:

| Metric | Normalized to AP Baseline |
| --- | --- |
| post_gp_hpwl | 1.14 |
| post_fl_hpwl | 1.02 |
| post_dp_hpwl | 0.98 |
| total_wirelength | 0.99 |
| post_gp_cpd | 1.00 |
| post_fl_cpd | 0.98 |
| post_dp_cpd | 0.98 |
| crit_path_delay | 0.98 |

Since I was able to tighten the max displacement threshold (as well as other parameters that may increase density), we see roughly a 2% improvement in basically everything after global placement. Post-FL HPWL is the exception; however, only one circuit (MCML) got much worse, while everything else saw similar improvements.

Note: the post-GP HPWL got worse because I set the target density of CLBs/LABs to 0.8 in the partial legalizer. This is expected to make the post-GP HPWL worse, but I found that it made the overall solution quality better.
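As a rough illustration of what a target density does in a partial legalizer, the snippet below scales the usable capacity of each bin by the target. The names and numbers are assumptions for the example, not the actual VPR implementation.

```cpp
// Illustrative only: how a CLB/LAB target density might feed into a partial
// legalizer's bin capacities. A target below 1.0 shrinks the capacity the
// legalizer is allowed to fill, forcing it to spread blocks further (hurting
// post-GP HPWL) but leaving the packer slack to form legal clusters.
#include <cmath>
#include <cstdio>

int usable_bin_capacity(int physical_slots, float target_density) {
    return static_cast<int>(std::floor(physical_slots * target_density));
}

int main() {
    const int lab_slots = 10; // e.g. 10 ALMs per LAB on a Stratix-IV-like device
    std::printf("target 1.0 -> %d slots usable\n", usable_bin_capacity(lab_slots, 1.0f));
    std::printf("target 0.8 -> %d slots usable\n", usable_bin_capacity(lab_slots, 0.8f));
    return 0;
}
```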

AlexandreSinger force-pushed the feature-ap-iterative-repacking branch from c574bc5 to 695d205 on June 29, 2025 at 03:20
AlexandreSinger (Contributor, Author) commented on Jun 29, 2025

Results on Titan (timing driven, no fixed blocks):

| circuit | post_gp_hpwl | post_fl_hpwl | post_dp_hpwl | total_wirelength | post_gp_cpd | post_fl_cpd | post_dp_cpd | crit_path_delay |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LU230_stratixiv_arch_timing.blif | 1.143 | 0.923 | 0.928 | 0.929 | 1.018 | 1.022 | 1.011 | 1.022 |
| LU_Network_stratixiv_arch_timing.blif | 0.807 | 0.706 | 1.052 | 1.031 | 0.792 | 0.951 | 1.734 | 1.723 |
| SLAM_spheric_stratixiv_arch_timing.blif | 1.758 | 1.913 | 1.217 | 1.205 | 1.224 | 1.196 | 1.014 | 1.017 |
| bitcoin_miner_stratixiv_arch_timing.blif | 0.701 | 0.900 | 0.918 | 0.925 | 0.685 | 0.992 | 0.962 | 1.029 |
| bitonic_mesh_stratixiv_arch_timing.blif | 1.055 | 0.927 | 0.788 | 0.839 | 1.177 | 1.029 | 1.004 | 0.997 |
| cholesky_bdti_stratixiv_arch_timing.blif | 1.046 | 1.071 | 1.031 | 1.022 | 1.095 | 1.119 | 1.072 | 1.068 |
| cholesky_mc_stratixiv_arch_timing.blif | 1.021 | 0.929 | 0.983 | 0.975 | 0.731 | 0.894 | 0.980 | 0.999 |
| dart_stratixiv_arch_timing.blif | 1.325 | 1.236 | 1.003 | 0.994 | 1.278 | 1.197 | 1.080 | 1.104 |
| denoise_stratixiv_arch_timing.blif | 1.042 | 1.028 | 0.946 | 0.946 | 1.016 | 0.998 | 0.982 | 0.981 |
| des90_stratixiv_arch_timing.blif | 1.020 | 0.879 | 0.833 | 0.870 | 0.982 | 0.933 | 0.932 | 0.929 |
| directrf_stratixiv_arch_timing.blif | 1.048 | 0.704 | 0.891 | 0.898 | 0.947 | 0.940 | 1.012 | 0.964 |
| gsm_switch_stratixiv_arch_timing.blif | 1.109 | 0.809 | 1.007 | 0.992 | 0.956 | 0.991 | 0.905 | 0.878 |
| mes_noc_stratixiv_arch_timing.blif | 1.005 | 0.879 | 0.890 | 0.915 | 1.023 | 0.926 | 0.866 | 0.856 |
| minres_stratixiv_arch_timing.blif | 1.093 | 0.948 | 0.880 | 0.898 | 1.179 | 0.933 | 0.897 | 1.230 |
| neuron_stratixiv_arch_timing.blif | 0.971 | 1.044 | 1.084 | 1.055 | 0.988 | 1.029 | 1.136 | 1.142 |
| openCV_stratixiv_arch_timing.blif | 0.899 | 0.774 | 0.878 | 0.889 | 0.983 | 0.974 | 1.122 | 1.096 |
| segmentation_stratixiv_arch_timing.blif | 1.045 | 0.904 | 0.967 | 0.987 | 1.023 | 0.979 | 0.987 | 0.985 |
| sparcT1_chip2_stratixiv_arch_timing.blif | 1.249 | 1.029 | 0.914 | 0.926 | 1.152 | 1.099 | 1.037 | 1.047 |
| sparcT1_core_stratixiv_arch_timing.blif | 0.798 | 0.721 | 0.929 | 0.941 | 1.072 | 0.845 | 1.082 | 1.002 |
| sparcT2_core_stratixiv_arch_timing.blif | 1.117 | 0.931 | 0.894 | 0.923 | 1.095 | 1.104 | 1.012 | 0.956 |
| stap_qrd_stratixiv_arch_timing.blif | 0.931 | 0.671 | 0.900 | 0.913 | 0.777 | 0.698 | 1.022 | 1.040 |
| stereo_vision_stratixiv_arch_timing.blif | 1.090 | 1.080 | 0.987 | 0.987 | 1.092 | 1.104 | 1.004 | 1.001 |
| **Geomean** | 1.040 | 0.930 | 0.947 | 0.954 | 1.001 | 0.991 | 1.029 | 1.038 |

All numbers are normalized to AP baseline (the AP flow just prior to this PR).

We have a 5% improvement in post-routed wirelength. It looks as if we have a 4% degradation in CPD; however, this is caused by two outliers (LU_Network and minres) which had better post-FL CPD but worse post-DP CPD. This implies to me that more tuning is needed for the annealer, but I think we can recover everything with a bit more tuning of the overall flow.
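For clarity on how to read these tables: each per-circuit value is the metric for this PR divided by the AP-baseline value for the same circuit, and the Geomean row is the geometric mean of those ratios, i.e. exp of the mean of the logs. A minimal sketch with made-up numbers:

```cpp
// Geometric mean of per-circuit ratios (metric / baseline metric).
// The ratios below are illustrative values only, not results from the tables.
#include <cmath>
#include <cstdio>
#include <vector>

double geomean(const std::vector<double>& ratios) {
    double log_sum = 0.0;
    for (double r : ratios)
        log_sum += std::log(r);
    return std::exp(log_sum / ratios.size());
}

int main() {
    // e.g. total_wirelength of this PR / total_wirelength of the AP baseline,
    // one ratio per circuit (made-up numbers).
    std::vector<double> wl_ratios = {0.93, 1.03, 0.92, 0.84, 1.02};
    std::printf("geomean WL ratio: %.3f\n", geomean(wl_ratios));
    return 0;
}
```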

The runtime results are also interesting:

| circuit | ap_runtime | ap_gp_runtime | ap_fl_runtime | ap_dp_runtime | route_runtime | total_runtime |
| --- | --- | --- | --- | --- | --- | --- |
| LU230_stratixiv_arch_timing.blif | 0.996 | 1.090 | 0.974 | 0.983 | 0.835 | 0.990 |
| LU_Network_stratixiv_arch_timing.blif | 0.983 | 0.988 | 0.968 | 0.983 | 1.019 | 0.986 |
| **SLAM_spheric_stratixiv_arch_timing.blif** | 1.633 | 0.792 | 2.735 | 1.541 | 1.271 | 1.564 |
| bitcoin_miner_stratixiv_arch_timing.blif | 0.980 | 1.462 | 1.018 | 0.916 | 0.797 | 0.974 |
| bitonic_mesh_stratixiv_arch_timing.blif | 1.026 | 0.989 | 0.992 | 1.078 | 0.783 | 0.992 |
| cholesky_bdti_stratixiv_arch_timing.blif | 0.948 | 1.040 | 1.001 | 0.877 | 1.046 | 0.967 |
| cholesky_mc_stratixiv_arch_timing.blif | 1.025 | 1.152 | 1.017 | 0.892 | 0.983 | 1.020 |
| dart_stratixiv_arch_timing.blif | 0.967 | 0.958 | 0.923 | 1.027 | 1.008 | 0.972 |
| denoise_stratixiv_arch_timing.blif | 0.959 | 1.006 | 0.962 | 0.943 | 0.900 | 0.956 |
| des90_stratixiv_arch_timing.blif | 0.999 | 0.960 | 0.974 | 1.077 | 0.799 | 0.974 |
| directrf_stratixiv_arch_timing.blif | 0.994 | 1.020 | 0.950 | 0.992 | 0.948 | 0.993 |
| gsm_switch_stratixiv_arch_timing.blif | 0.971 | 0.905 | 0.936 | 1.009 | 0.923 | 0.973 |
| mes_noc_stratixiv_arch_timing.blif | 0.945 | 0.999 | 1.015 | 0.889 | 1.326 | 0.965 |
| minres_stratixiv_arch_timing.blif | 1.007 | 0.988 | 1.015 | 1.021 | 0.933 | 1.008 |
| **neuron_stratixiv_arch_timing.blif** | 1.525 | 0.929 | 2.694 | 1.116 | 1.054 | 1.401 |
| **openCV_stratixiv_arch_timing.blif** | 1.233 | 1.009 | 1.807 | 1.024 | 0.882 | 1.175 |
| segmentation_stratixiv_arch_timing.blif | 0.971 | 0.933 | 1.040 | 0.959 | 1.003 | 0.975 |
| **sparcT1_chip2_stratixiv_arch_timing.blif** | 1.204 | 0.973 | 1.815 | 1.024 | 0.991 | 1.186 |
| sparcT1_core_stratixiv_arch_timing.blif | 1.043 | 1.224 | 1.047 | 0.918 | 0.975 | 1.033 |
| sparcT2_core_stratixiv_arch_timing.blif | 0.983 | 0.965 | 0.918 | 1.021 | 0.884 | 0.982 |
| stap_qrd_stratixiv_arch_timing.blif | 1.011 | 1.097 | 1.040 | 0.978 | 0.834 | 1.004 |
| stereo_vision_stratixiv_arch_timing.blif | 1.017 | 0.994 | 0.954 | 1.006 | 0.969 | 1.029 |
| **Geomean** | 1.053 | 1.014 | 1.142 | 1.005 | 0.954 | 1.042 |

Four circuits actually hit this fallback path, which caused their full legalization (APPack) runtime to increase (shown in bold in the table above). This brought the overall geomean runtime up; however, for the rest of the circuits the runtime actually decreased by around 10%. I think with more tuning of the fall-back options I can reduce these as well.

@vaughnbetz We did hit a very big milestone. With this change, on Titan we are now 9.4% better in WL and 2% worse in CPD than without using AP (practically tied if we ignore LU_Network), at the cost of around 9% more run time. This puts us very close to the prior state of the art on Titan. I think with a bit more tuning I can get these numbers even better!

@amin1377 FYI
