Skip to content

Use random_state in svd for fbpca#227

Merged
MaxHalford merged 2 commits into
MaxHalford:masterfrom
rtomek:rtomek-patch-fbpca_random_state
Jun 10, 2026
Merged

Use random_state in svd for fbpca#227
MaxHalford merged 2 commits into
MaxHalford:masterfrom
rtomek:rtomek-patch-fbpca_random_state

Conversation

@rtomek

@rtomek rtomek commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

If the user passes a random_state to the pca calculation, we should actually use it for fbpca. There was a recent upstream change that caused the PCA output to always be the exact same even if we set a random_state.

If the user passes a random_state to the pca calculation, we should actually use it for fbpca. There was a recent upstream change that caused the PCA output to always be the exact same even if we set a random_state.
@MaxHalford

MaxHalford commented Jun 3, 2026

Copy link
Copy Markdown
Owner

Hello @rtomek! When you say "upstream", what are you referring to?

My concern with your change is that it mutates the random state globally, which is not a good pattern. But we could use a context manager. But first, I'm not sure I understand what the issue is in the first place.

People shouldn't be using the old global rng state methods anymore, but let's keep the affected random state local to the svd method from fbpca.
@rtomek

rtomek commented Jun 3, 2026

Copy link
Copy Markdown
Contributor Author

I realized that it affected the global state after I pushed it. So yeah I had the same thought as you so I added a way to save the state.

There is an assumption that setting the random_state does something rather than nothing unless otherwise documented. I used it with the intent to make something reproducable (or deliberately change it) and I wasn't able to control the output of fbpca. Something else in my code was unknowingly controlling the the output of this function. This allows one to narrowly control the reproducability of this specific function if they choose to - which seems to be the intent of even having random_state as an argument.

For scipy, it uses LAPACK as a backend for calculating the svd so as far as I can tell we don't have control over that method.

@rtomek

rtomek commented Jun 3, 2026

Copy link
Copy Markdown
Contributor Author

Also, I said 'upstream' in that as far as I could tell, this issue came up after updating other 3rd party packages. I had two different virtual environments with the same version of prince exhibiting different behavior on the exact same code. However, after looking into this some more, I'm even more confused than before why setting random_state with fbpca ever worked, it must be something on my end that I didn't notice.

@MaxHalford

Copy link
Copy Markdown
Owner

Hey @rtomek, thank you for this additional info! For sure your change will bring more reproducibility.

I made a suggestion here to add some unit tests. I'll let you merge it before we merge your PR.

@MaxHalford MaxHalford merged commit 0fd0fe6 into MaxHalford:master Jun 10, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants