Skip to content

Conversation

@luanamarinho
Copy link

Changes
  • Refactored MultiscaleMixture:
  • Vectorize perplexity validation using NumPy
  • Enhanced validation with edge case handling
Tests
  • Added test for MultiscaleMixture.check_perplexities

@pavlin-policar
Copy link
Owner

Thanks for the PR. However, I'm not entirely sure what problem this is solving. Could you please provide a paragraph or so explaining the need for these changes and what problems they solve.

From the code, it seems like there are a whole lot of minor changes. The biggest change seems to be that duplicates are now removed from perplexity lists and they are sorted, is that right? Some of these changes are certainly welcome, but I'm not entirely sure all are needed.

Could you also please take a look at the failing tests on Windows? The error Windows fatal exception: access violation is cryptic, but it appears to consistently fail at the added test, so there must be something there that Winsdows doesn't like.

@luanamarinho
Copy link
Author

Hi Pavlin. Thanks for reviewing the PR so promptly. You' right to note these changes focus on broader quality improvements, rather than targeting a specific bug. The commits mean to make the class more robust, while maintaining all existing functionality. Here's a summary of the key modifications:

  1. Validation centralization
    Before: Perplexity validation was repetitive and scattered across __init__, set_perplexities, and to_new, with redundant min(n_samples - 1, int(3 * max_perplexity), violating SRP and DRY principles.
    Now: check_perplexities is the single authoritative source for:
    Input type checking (np.ndarray conversion);
    Dimensionality handling (.ravel() for scalars);
    Value validation (positivity, max bounds) - including that of knn.
    Impact: Changes to validation logic now require edits in only one place.
  2. Robust perplexity management in set_perplexities
    Fixed flawed deduplication (silent bug);
    Moved bounds validation entirely into check_perplexities (previously done ad-hoc): SRP and DRY principles;
    Added early validation to fail fast - potential RuntimeErrors are now caught during initial checks instead of mid-execution.
  3. Optimized check_perplexities implementation
    Replaced manual loops with vectorized NumPy operations - improved efficiency;
    Simplified logic while adding proper edge case handling - fail fast.
  4. Defensive Programming
    Wrapped all critical operations in structured error handling for better debuggability
    Ensured validation occurs before expensive computation begins - fail fast

These changes eliminate subtle maintenance hazards while improving efficiency. I’m currently verifying the Windows-specific issue separately and will update the PR accordingly. Happy to discuss or adjust any aspect!

Kind regards,
Luana

@pavlin-policar pavlin-policar force-pushed the master branch 2 times, most recently from 107c846 to 80d4f3d Compare May 27, 2025 20:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants