@TJnotJT
Contributor

@TJnotJT TJnotJT commented Nov 11, 2025

Description of Changes

Mask color gradients before the 32-bit to 16-bit conversion in primitive setup to prevent unwanted clamping.

Rationale behind Changes

When gradients are too large to fit in the fixed-point format, letting them roll over (wrap around) still preserves the correct colors in the scanline renderer. This change makes sure the rollover happens correctly, which prevents graphical bugs.
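
To illustrate the rollover argument, here is a minimal scalar sketch. It assumes 16-bit color lanes whose additions wrap modulo 2^16; the renderer's actual fixed-point layout is not shown in this thread, and all names and values below are hypothetical. Because the arithmetic is modular, stepping with the wrapped gradient lands on the same 16-bit value as a full 32-bit computation, while stepping with a clamped gradient does not.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the low 16 bits of a 32-bit value (wraparound, no clamping).
inline uint16_t wrap16(int32_t v)
{
    return static_cast<uint16_t>(static_cast<uint32_t>(v) & 0xFFFFu);
}

// Clamp to the signed 16-bit range first, the way a saturating pack would.
inline uint16_t saturate16(int32_t v)
{
    const int32_t c = v < -32768 ? -32768 : (v > 32767 ? 32767 : v);
    return static_cast<uint16_t>(static_cast<uint32_t>(c) & 0xFFFFu);
}

// Step a 16-bit accumulator `steps` times; each add wraps modulo 2^16.
inline uint16_t walk(uint16_t start, uint16_t step, int steps)
{
    uint16_t c = start;
    for (int i = 0; i < steps; i++)
        c = static_cast<uint16_t>(c + step);
    return c;
}

// Hypothetical example values: the gradient does not fit in int16.
inline void check_gradient_wraparound()
{
    const int32_t start = 1000;
    const int32_t gradient = 40000;
    const int steps = 3;

    // Reference: compute in 32 bits, then keep the low 16 bits.
    const uint16_t expected = wrap16(start + gradient * steps);

    // A wrapped gradient reaches the reference value.
    assert(walk(wrap16(start), wrap16(gradient), steps) == expected);
    // A clamped gradient drifts away from it.
    assert(walk(wrap16(start), saturate16(gradient), steps) != expected);
}
```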

Fixes #6459
Fixes #10210.

Suggested Testing Steps

Test any games with the SW renderer on both SSE and AVX2 builds.

Did you use AI to help find, test, or implement this issue or feature?

Looking up aarch64 instructions.

Credits

Co-authored-by: TellowKrinkle

@TJnotJT
Contributor Author

TJnotJT commented Nov 11, 2025

Haven't done a dump run yet so converting to draft.

@TJnotJT TJnotJT marked this pull request as draft November 11, 2025 16:13
@TellowKrinkle
Member

Didn't look super hard but I think you're missing save & restore of xmm15 on windows. Yay for calling conventions being fun.

@TJnotJT TJnotJT force-pushed the gs-sw-gradient-mask branch from 1d3ce85 to 941e6a4 Compare November 14, 2025 22:01
@TJnotJT
Contributor Author

TJnotJT commented Nov 14, 2025

> Didn't look super hard but I think you're missing save & restore of xmm15 on windows. Yay for calling conventions being fun.

Good point, I had forgotten about this. Just fixed it.
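
For context on the calling-convention point: under the Windows x64 ABI, xmm6 through xmm15 are nonvolatile, so generated code that clobbers xmm15 has to save and restore it, whereas under the System V AMD64 ABI every xmm register is caller-saved. A small sketch of that rule follows; the `Abi` enum and the helper name are hypothetical and are not part of the PCSX2 code.

```cpp
#include <cassert>

// Hypothetical tag for the two x86-64 calling conventions discussed here.
enum class Abi { Win64, SysV };

// True if generated code must preserve XMM register `index` (0-15)
// across the call under the given ABI.
inline bool is_callee_saved_xmm(Abi abi, int index)
{
    // Windows x64: xmm6-xmm15 are nonvolatile.
    // System V AMD64: all xmm registers are caller-saved.
    return abi == Abi::Win64 && index >= 6 && index <= 15;
}
```

With this rule, `is_callee_saved_xmm(Abi::Win64, 15)` is true, which is why the save and restore is needed only on Windows.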

Member

@TellowKrinkle TellowKrinkle left a comment

arm has non-saturating pack instructions, so no separate masking is needed

GH doesn't let me suggest edits on existing code, but the entire second section can be replaced with

			// VectorI r = VectorI(dr * shift[1 + i]);

			armAsm->Fmul(v2.V4S(), v0.V4S(), VRegister(4 + i, kFormat4S));
			armAsm->Fcvtzs(v2.V4S(), v2.V4S());

			// VectorI b = VectorI(db * shift[1 + i]);

			armAsm->Fmul(v3.V4S(), v1.V4S(), VRegister(4 + i, kFormat4S));
			armAsm->Fcvtzs(v3.V4S(), v3.V4S());

			// m_local.d[i].rb = r.trn1_16(b); // Yeah I know this isn't in GSVector since that's mainly targeting x86 for now
			armAsm->Trn1(v2.V8H(), v2.V8H(), v3.V8H());
			armAsm->Str(v2, _local(d[i].rb));
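
A scalar model may help show why the ARM path needs no separate mask. `Trn1` on 16-bit elements takes the even-numbered halfwords of its two sources, which are the low halves of each 32-bit lane, so it truncates rather than saturates. The sketch below compares that with an unsigned-saturating 32->16 pack lane (modelled on x86 `PACKUSDW`). Whether the x64 path uses exactly that instruction is an assumption; the thread only says gradients are masked before the 32->16 conversion. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one lane of an unsigned-saturating 32->16 pack.
inline uint16_t pack_unsigned_saturate(int32_t v)
{
    if (v < 0)
        return 0;
    if (v > 0xFFFF)
        return 0xFFFF;
    return static_cast<uint16_t>(v);
}

// Scalar model of a truncating narrow: keep the low 16 bits.
inline uint16_t narrow_truncate(int32_t v)
{
    return static_cast<uint16_t>(static_cast<uint32_t>(v) & 0xFFFFu);
}

// Mask to the low 16 bits first so the saturating pack has nothing to clamp.
inline uint16_t mask_then_pack(int32_t v)
{
    const int32_t masked = static_cast<int32_t>(static_cast<uint32_t>(v) & 0xFFFFu);
    return pack_unsigned_saturate(masked);
}

// One 32-bit lane of trn1 on 16-bit elements applied to (r, b):
// low half of r in the low 16 bits, low half of b in the high 16 bits.
inline uint32_t trn1_rb_lane(int32_t r, int32_t b)
{
    return static_cast<uint32_t>(narrow_truncate(r)) |
           (static_cast<uint32_t>(narrow_truncate(b)) << 16);
}
```

In this model, masking followed by the saturating pack gives the same result as plain truncation, so an instruction that truncates directly makes the mask redundant.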

@TJnotJT
Contributor Author

TJnotJT commented Nov 15, 2025

> arm has non-saturating pack instructions, so no separate masking is needed
>
> GH doesn't let me suggest edits on existing code, but the entire second section can be replaced with

Sounds good. I made the suggested changes in a new commit so it's clear where the code diverges from x64. When you have a chance, let me know if it looks kosher (along with the amended comments).

@TJnotJT TJnotJT force-pushed the gs-sw-gradient-mask branch from 82fe3dd to 6025236 Compare November 15, 2025 12:49
This is more efficient on ARM, though the equivalent instructions are not currently used in the x64 JIT and C++ versions of GSVector.

Co-authored-by: TellowKrinkle
@TJnotJT TJnotJT force-pushed the gs-sw-gradient-mask branch from 6025236 to 0cf9ea8 Compare November 16, 2025 17:48
Member

@TellowKrinkle TellowKrinkle left a comment

Looks good assuming nothing breaks in the dump run

@TJnotJT
Contributor Author

TJnotJT commented Nov 18, 2025

Dump run with the SSE4 build came back clean, so I'm removing draft status. Just to be safe, I'll also do an AVX2 run.

Edit: If anyone is able to do an ARM dump run that would be highly appreciated.

@TJnotJT TJnotJT marked this pull request as ready for review November 18, 2025 20:15
Member

@JordanTheToaster JordanTheToaster left a comment

Tested a variety of games and all seems fine on Windows with AVX2.

@TJnotJT
Contributor Author

TJnotJT commented Nov 20, 2025

Also did an AVX2 dump run; all looks good.

@JordanTheToaster JordanTheToaster added this to the Release 2.6 milestone Nov 25, 2025
Contributor

@lightningterror lightningterror left a comment

Did an sse4 dump and didn't notice any issues (700+ dumps showed diffs with no visual difference so maybe something could still slip).

@lightningterror lightningterror merged commit f322dfb into PCSX2:master Nov 26, 2025
12 checks passed
@TJnotJT
Contributor Author

TJnotJT commented Nov 26, 2025

> Did an sse4 dump and didn't notice any issues (700+ dumps showed diffs with no visual difference so maybe something could still slip).

Great, thanks for the testing. Yeah, many dumps will have small differences, but hopefully these are all slight improvements.

Successfully merging this pull request may close these issues.

[BUG]: FFX-2 graphical artifacting in SW mode
[BUG]: Crash Twinsanity: Odd sprite rough edges in Software
