-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
GS/SW: Mask color gradients to prevent incorrect clamping. #13519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Haven't done a dump run yet so converting to draft. |
|
Didn't look super hard but I think you're missing save & restore of xmm15 on windows. Yay for calling conventions being fun. |
Co-authored-by: TellowKrinkle
1d3ce85 to
941e6a4
Compare
Good point, I had forgotten about this. Just fixed it. |
TellowKrinkle
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arm has non-saturating pack instructions, so no separate masking is needed
GH doesn't let me suggest edits on existing code, but the entire second section can be replaced with
// VectorI r = VectorI(dr * shift[1 + i]);
armAsm->Fmul(v2.V4S(), v0.V4S(), VRegister(4 + i, kFormat4S));
armAsm->Fcvtzs(v2.V4S(), v2.V4S());
// VectorI b = VectorI(db * shift[1 + i]);
armAsm->Fmul(v3.V4S(), v1.V4S(), VRegister(4 + i, kFormat4S));
armAsm->Fcvtzs(v3.V4S(), v3.V4S());
// m_local.d[i].rb = r.trn1_16(b); // Yeah I know this isn't in GSVector since that's mainly targeting x86 for now
armAsm->Trn1(v2.V8H(), v2.V8H(), v3.V8H());
armAsm->Str(v2, _local(d[i].rb));
Sounds good, I made the suggested changes in a new commit so it's clear where it diverged from x64. When you have a chance, let me know if it looks kosher (along with the amended comments). |
82fe3dd to
6025236
Compare
This is more efficient on ARM, though the equivalent instructions are not currently used in the x64 JIT and C++ versions of GSVector. Co-authored-by: TellowKrinkle
6025236 to
0cf9ea8
Compare
TellowKrinkle
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good assuming nothing breaks in the dump run
|
Dump run with SSE4 build came clean so removing draft status. Just to be safe I'll do an AVX2 run also. Edit: If anyone is able to do an ARM dump run that would be highly appreciated. |
JordanTheToaster
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested a variety of games and all seems fine on Windows with AVX2.
|
Also did a AVX2 dump run - all looks good. |
lightningterror
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did an sse4 dump and didn't notice any issues (700+ dumps showed diffs with no visual difference so maybe something could still slip).
Great, thanks for the testing. Yeah, many dumps will have small differences, but hopefully these are all slight improvements. |
Description of Changes
Mask color gradients before converting 32->16 bit in prim setup to prevent unwanted clamping.
Rationale behind Changes
When gradients are too large to fit in the fix point format, rolling them over will still preserve the correct colors in the scanline renderer. This makes sure the roll over happens correctly to prevent graphical bugs.
Fixes #6459
Fixes #10210.
Suggested Testing Steps
Testing any games with the SW renderer on both SSE and AVX2 builds.
Did you use AI to help find, test, or implement this issue or feature?
Looking up aarch64 instructions.
Credits
Co-authored-by: TellowKrinkle