Move forward update links out of Node struct #214

Merged
StackDoubleFlow merged 4 commits into MCHPR:master from KonaeAkira:refactor-forward-links on Dec 6, 2025
@KonaeAkira
Contributor

This PR moves all forward update links into a single Vec. Each Node will instead store the begin and end indices for its own links.
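
For illustration, the layout described above resembles a compressed adjacency (CSR-style) representation. The following is a minimal sketch under stated assumptions, not the code from this PR: the names `ForwardLink`, `Node`, `Graph`, `links_begin`, `links_end`, and `links_of`, as well as the link fields, are hypothetical.

```rust
/// Index of a node in the graph (hypothetical alias for this sketch).
type NodeId = u32;

/// One forward update link. The fields are illustrative only and are
/// not taken from the project's actual link encoding.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ForwardLink {
    target: NodeId,
    distance: u8,
}

/// A node no longer owns its links. It stores only the half-open range
/// [links_begin, links_end) into the shared `forward_links` Vec.
#[derive(Debug)]
struct Node {
    links_begin: u32,
    links_end: u32,
}

struct Graph {
    nodes: Vec<Node>,
    /// All forward links of all nodes, stored contiguously.
    forward_links: Vec<ForwardLink>,
}

impl Graph {
    /// Flattens per-node link lists into one Vec and records each
    /// node's begin and end indices.
    fn from_adjacency(adjacency: &[Vec<ForwardLink>]) -> Self {
        let mut nodes = Vec::with_capacity(adjacency.len());
        let mut forward_links = Vec::new();
        for links in adjacency {
            let links_begin = forward_links.len() as u32;
            forward_links.extend_from_slice(links);
            let links_end = forward_links.len() as u32;
            nodes.push(Node { links_begin, links_end });
        }
        Graph { nodes, forward_links }
    }

    /// Returns the forward links of node `id` as a slice of the shared Vec.
    fn links_of(&self, id: NodeId) -> &[ForwardLink] {
        let node = &self.nodes[id as usize];
        &self.forward_links[node.links_begin as usize..node.links_end as usize]
    }
}
```

In this sketch, `Graph::from_adjacency(&lists).links_of(0)` yields the links of node 0 as a slice, so iterating over one node's links touches a single contiguous region of memory.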

This has the following benefits:

@BramOtte
Contributor

BramOtte commented Nov 29, 2025

I can confirm this does improve performance.

The Iris Mandelbrot benchmark went from about 2'800'000 rtps to about 2'950'000 rtps, roughly a 5% performance increase.

This was run at 500'000 ticks at a time on a Ryzen r1200 on Fedora 43 with no other programs running.

(I think the few slow runs on master happened because I did not have the terminal in focus for the entire run; I made sure to keep it in focus for the rest.)

Raw runs on master:

DONE in 51.5569016s processing 135000000 ticks, effective rtps: 2618466.118219951
DONE in 52.117169247s processing 135000000 ticks, effective rtps: 2590317.2016152996
DONE in 48.945879831s processing 135000000 ticks, effective rtps: 2758148.397089338
DONE in 47.847170448s processing 135000000 ticks, effective rtps: 2821483.459439198
DONE in 47.789161943s processing 135000000 ticks, effective rtps: 2824908.295337335
DONE in 47.862234891s processing 135000000 ticks, effective rtps: 2820595.409041072

This pull request:

DONE in 45.64615921s processing 135000000 ticks, effective rtps: 2957532.5139387557
DONE in 45.321526445s processing 135000000 ticks, effective rtps: 2978716.971588091
DONE in 46.623534334s processing 135000000 ticks, effective rtps: 2895533.3809078448
DONE in 45.2200781s processing 135000000 ticks, effective rtps: 2985399.5320720156
DONE in 45.242627898s processing 135000000 ticks, effective rtps: 2983911.551388195

@KonaeAkira
Contributor Author

Aligning the Node struct to L1 cache lines further improved the repeater_grid benchmark over the previous commit by ~1.6%.

Current performance gain over master:

repeater_grid           time:   [240.26 µs 241.45 µs 242.92 µs]
                        change: [-6.3477% -5.6568% -4.8247%] (p = 0.00 < 0.05)
                        Performance has improved.
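
As a rough illustration of what cache-line alignment means in Rust, the sketch below pads a small struct to a 64-byte boundary with `#[repr(align(64))]`. It is written under assumptions: 64 bytes is taken as the L1 cache line size (common on x86-64 but not universal), and `AlignedNode`, its fields, `node_layout`, and `starts_on_cache_line` are hypothetical names, not the project's actual `Node` definition.

```rust
use std::mem::{align_of, size_of};

/// Assumed L1 cache line size in bytes.
const CACHE_LINE: usize = 64;

/// Hypothetical node layout. `repr(align(64))` raises the alignment to
/// 64 bytes, and the size is rounded up to a multiple of the alignment,
/// so each element of a `Vec<AlignedNode>` occupies exactly one line.
#[repr(align(64))]
#[derive(Clone, Copy, Debug, Default)]
struct AlignedNode {
    links_begin: u32,
    links_end: u32,
    output_power: u8,
    changed: bool,
}

/// Returns (size, alignment) of the node type in bytes.
fn node_layout() -> (usize, usize) {
    (size_of::<AlignedNode>(), align_of::<AlignedNode>())
}

/// Checks whether a node's address falls on a cache line boundary.
fn starts_on_cache_line(node: &AlignedNode) -> bool {
    (node as *const AlignedNode as usize) % CACHE_LINE == 0
}
```

The trade-off in such a layout is memory footprint: the ten bytes of fields here are padded to 64, in exchange for no node straddling two cache lines.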

@KonaeAkira
Contributor Author

Running the Mandelbrot program on the BatPU-2 shows a ~17% increase in TPS (no flushing, -io compile flags).

master branch:

Simulation completed in 168.642932388s (1539883 tps)
Simulation completed in 167.267331651s (1549104 tps)
Simulation completed in 167.046934586s (1549104 tps)

This PR:

Simulation completed in 142.849635035s (1821834 tps)
Simulation completed in 143.566302508s (1809094 tps)
Simulation completed in 145.828858343s (1784141 tps)
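
The ~17% figure can be checked against the runs above. This is a small sketch that compares the mean TPS of each set of three runs; the helper names `mean` and `relative_gain` are made up for the example, and using the mean (rather than, say, the median) is an assumption about how the figure was derived.

```rust
/// TPS figures copied from the three master runs above.
const MASTER_TPS: [f64; 3] = [1_539_883.0, 1_549_104.0, 1_549_104.0];
/// TPS figures copied from the three runs of this PR above.
const PR_TPS: [f64; 3] = [1_821_834.0, 1_809_094.0, 1_784_141.0];

/// Arithmetic mean of a non-empty sample.
fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// Relative gain of `candidate` over `baseline`, e.g. 0.10 for +10%.
fn relative_gain(baseline: &[f64], candidate: &[f64]) -> f64 {
    mean(candidate) / mean(baseline) - 1.0
}
```

With these inputs, `relative_gain(&MASTER_TPS, &PR_TPS)` comes out at roughly 0.168, which is consistent with the ~17% stated above.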

@StackDoubleFlow
Member

Thanks! Nice and simple change, and good results

@StackDoubleFlow StackDoubleFlow merged commit 1e59bae into MCHPR:master Dec 6, 2025
9 checks passed
3 participants