Use random_state in svd for fbpca #227
If the user passes a random_state to the PCA calculation, we should actually use it for fbpca. There was a recent upstream change that caused the PCA output to always be exactly the same even if we set a random_state.
Hello @rtomek! When you say "upstream", what are you referring to? My concern with your change is that it mutates the random state globally, which is not a good pattern. We could use a context manager instead. But first, I'm not sure I understand what the issue is.
People shouldn't be using the old global rng state methods anymore, but let's keep the affected random state local to the svd method from fbpca.
I realized that it affected the global state after I pushed it. So yes, I had the same thought as you and added a way to save the state. Users assume that setting random_state does something rather than nothing, unless otherwise documented. I used it with the intent to make something reproducible (or to deliberately change it), and I wasn't able to control the output of fbpca. Something else in my code was unknowingly controlling the output of this function. This change allows one to narrowly control the reproducibility of this specific function if they choose to, which seems to be the intent of even having random_state as an argument. SciPy uses LAPACK as its backend for calculating the SVD, so as far as I can tell we don't have control over that method.
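For reference, the save-and-restore idea discussed here could be sketched as a context manager along the following lines. This is only an illustration under assumptions, not the code in this PR: `temporary_numpy_seed` is a hypothetical helper name, and `randomized_draw` is a stand-in for a routine that, as fbpca is assumed to do, draws from NumPy's legacy global RNG.

```python
import contextlib

import numpy as np


@contextlib.contextmanager
def temporary_numpy_seed(seed):
    """Seed NumPy's global RNG inside the block, then restore the prior state.

    Hypothetical helper for illustration; not part of prince or fbpca.
    """
    if seed is None:
        # No seed requested: leave the global RNG untouched.
        yield
        return
    saved_state = np.random.get_state()
    np.random.seed(seed)
    try:
        yield
    finally:
        # Restore the caller's RNG state even if the body raises.
        np.random.set_state(saved_state)


def randomized_draw(n):
    """Stand-in for a routine assumed to sample from the global RNG."""
    return np.random.normal(size=n)


# Same seed gives the same draws; the surrounding global state is unaffected.
with temporary_numpy_seed(42):
    first = randomized_draw(3)
with temporary_numpy_seed(42):
    second = randomized_draw(3)
```

With this shape, the seed only applies for the duration of the `with` block, so code that runs afterwards sees the global RNG exactly as it was before the call.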
Also, I said "upstream" because, as far as I could tell, this issue came up after updating other third-party packages. I had two different virtual environments with the same version of prince exhibiting different behavior on the exact same code. However, after looking into this some more, I'm even more confused than before about why setting random_state with fbpca ever worked. It must be something on my end that I didn't notice.