-
Notifications
You must be signed in to change notification settings - Fork 5
Create xi_multipoles.py to compress original auto-cf to its multipoles #140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #140 +/- ##
==========================================
- Coverage 39.89% 39.15% -0.74%
==========================================
Files 30 31 +1
Lines 3645 3713 +68
Branches 678 687 +9
==========================================
Hits 1454 1454
- Misses 2047 2115 +68
Partials 144 144 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces a new module xi_multipoles.py that provides functionality to compress correlation function data into multipole representations using Legendre polynomial decomposition.
Key Changes
- Adds
XiMultipolesclass for fitting correlation function data to Legendre multipole basis - Implements automatic overfitting detection with chi-squared criterion
- Provides Monte Carlo error estimation capabilities
Comments suppressed due to low confidence (1)
vega/xi_multipoles.py:57
- Variable idx is not used.
idx = ((rr - self.rmin) / self.dr).astype(int)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| self.xi_knots_fit = np.zeros((nl, nr)) | ||
| self.std_xi_mc = np.zeros((nl, nr)) | ||
|
|
||
| def __call__(self, r, mu, xi_knots=None): |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing docstring for the __call__ method. The method should include documentation explaining parameters (r, mu, xi_knots) and return value.
| def __call__(self, r, mu, xi_knots=None): | |
| def __call__(self, r, mu, xi_knots=None): | |
| """ | |
| Evaluate the multipole expansion of the correlation function at given (r, mu) values. | |
| Parameters | |
| ---------- | |
| r : array_like | |
| Radial distances at which to evaluate the correlation function. | |
| mu : array_like | |
| Cosine of the angle to the line of sight for each r. | |
| xi_knots : array_like, optional | |
| Precomputed multipole coefficients to use for evaluation. If None, uses self.xi_knots_fit. | |
| Returns | |
| ------- | |
| results : ndarray | |
| The evaluated correlation function at the specified (r, mu) values. | |
| """ |
|
|
||
| return results | ||
|
|
||
| def fit(self, rdata, mudata, xi_data, cov_data, mumax=0.999, nmc=0, seed=0, return_mcs=False): |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing docstring for the fit method. The method should include documentation for all parameters (rdata, mudata, xi_data, cov_data, mumax, nmc, seed, return_mcs) and return value, following the NumPy/SciPy docstring convention used elsewhere in the codebase.
| def fit(self, rdata, mudata, xi_data, cov_data, mumax=0.999, nmc=0, seed=0, return_mcs=False): | |
| def fit(self, rdata, mudata, xi_data, cov_data, mumax=0.999, nmc=0, seed=0, return_mcs=False): | |
| """ | |
| Fit the multipole model to the input data using least squares and optionally Monte Carlo sampling. | |
| Parameters | |
| ---------- | |
| rdata : array_like | |
| Array of radial distances of the data points. | |
| mudata : array_like | |
| Array of cosine of angles (mu) for the data points. | |
| xi_data : array_like | |
| Array of measured correlation function values at (rdata, mudata). | |
| cov_data : array_like | |
| Covariance matrix of the measured correlation function values. | |
| mumax : float, optional | |
| Maximum value of mu to include in the fit (default is 0.999). | |
| nmc : int, optional | |
| Number of Monte Carlo samples to draw for error estimation (default is 0). | |
| seed : int, optional | |
| Random seed for Monte Carlo sampling (default is 0). | |
| return_mcs : bool, optional | |
| If True, return the Monte Carlo samples of the fitted multipoles (default is False). | |
| Returns | |
| ------- | |
| chi2 : float | |
| The chi-squared value of the fit. | |
| xi_mcs : ndarray, optional | |
| If `return_mcs` is True, returns the Monte Carlo samples of the fitted multipoles | |
| as an array of shape (nmc, nl, nr). | |
| """ |
| if d > 0: | ||
| n2 = min(self.nr - 1, n1 + 1) | ||
| else: | ||
| n2 = max(0, n1 - 1) | ||
|
|
||
| for ell in range(self.nl): | ||
| matrix_A[i, n1 + ell * self.nr] += (1.0 - d) * self.leg_polys[ell](mumu[i]) | ||
| matrix_A[i, n2 + ell * self.nr] += d * self.leg_polys[ell](mumu[i]) |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic error in interpolation weights: When d <= 0, the code sets n2 = max(0, n1 - 1) and then applies weight d to n2. However, when d = 0, the weight (1.0 - d) correctly gives full weight to n1, but the weight d (which is 0) is still applied to n2. When d < 0 (which shouldn't happen based on the calculation but isn't explicitly prevented), the logic becomes problematic. Consider handling the d == 0 case separately or ensuring d is always non-negative, or using abs(d) for the second weight.
| for ell in range(self.nl): | ||
| matrix_A[i, n1 + ell * self.nr] += (1.0 - d) * self.leg_polys[ell](mumu[i]) | ||
| matrix_A[i, n2 + ell * self.nr] += d * self.leg_polys[ell](mumu[i]) |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Performance concern: Legendre polynomials are evaluated inside a loop for each data point. Since self.leg_polys[ell] is a callable that computes the polynomial, calling it repeatedly for the same mu values (lines 69-70) can be inefficient. Consider pre-computing the Legendre polynomial values for all unique mumu values before the loop, especially if there are repeated values or if the dataset is large.
| class XiMultipoles(): | ||
| @classmethod | ||
| def getOverfit( | ||
| cls, rdata, mudata, xi_data, cov_data, mumax=0.999, nlmin=6, nlmax=20, chi_c=0.8 | ||
| ): | ||
| """ Find the nl that overfits the data such that chi2 < chi_c. | ||
| """ |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Magic number without explanation: The value 0.8 is used as a default for chi_c (chi-squared criterion) without documentation. Consider documenting what this threshold represents (e.g., reduced chi-squared target) or defining it as a named constant.
| class XiMultipoles(): | |
| @classmethod | |
| def getOverfit( | |
| cls, rdata, mudata, xi_data, cov_data, mumax=0.999, nlmin=6, nlmax=20, chi_c=0.8 | |
| ): | |
| """ Find the nl that overfits the data such that chi2 < chi_c. | |
| """ | |
| # Default reduced chi-squared target for overfitting criterion | |
| DEFAULT_CHI_SQUARED_CRITERION = 0.8 | |
| class XiMultipoles(): | |
| @classmethod | |
| def getOverfit( | |
| cls, rdata, mudata, xi_data, cov_data, mumax=0.999, nlmin=6, nlmax=20, chi_c=DEFAULT_CHI_SQUARED_CRITERION | |
| ): | |
| """ | |
| Find the nl that overfits the data such that chi2 < chi_c. | |
| Parameters | |
| ---------- | |
| chi_c : float | |
| Reduced chi-squared target for overfitting criterion (default: 0.8). | |
| """ |
|
|
||
| return xi_multipoles | ||
|
|
||
| def __init__(self, r1=42.0, r2=178.0, nr=35, nl=6): |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing docstring for the __init__ method. Following the pattern in other classes in this codebase, the __init__ method should include a docstring documenting all parameters (r1, r2, nr, nl).
| def __init__(self, r1=42.0, r2=178.0, nr=35, nl=6): | |
| def __init__(self, r1=42.0, r2=178.0, nr=35, nl=6): | |
| """ | |
| Initialize the XiMultipoles object. | |
| Parameters | |
| ---------- | |
| r1 : float, optional | |
| Minimum value of the separation grid (default: 42.0). | |
| r2 : float, optional | |
| Maximum value of the separation grid (default: 178.0). | |
| nr : int, optional | |
| Number of separation grid points (default: 35). | |
| nl : int, optional | |
| Number of Legendre multipoles (default: 6). | |
| """ |
| Nell = self.nr * self.nl | ||
| Nr = rr.size | ||
| matrix_A = np.zeros((Nr, Nell)) | ||
| idx = ((rr - self.rmin) / self.dr).astype(int) |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential bug: variable idx is computed but never used. Line 57 calculates idx = ((rr - self.rmin) / self.dr).astype(int) but this value is not referenced anywhere in the subsequent code. The actual indexing uses a different calculation starting at line 60 with variable d. Either this variable should be removed, or there's a logic error in how indices are calculated.
| idx = ((rr - self.rmin) / self.dr).astype(int) |
| class XiMultipoles(): | ||
| @classmethod | ||
| def getOverfit( | ||
| cls, rdata, mudata, xi_data, cov_data, mumax=0.999, nlmin=6, nlmax=20, chi_c=0.8 |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Magic number without explanation: The value 0.999 is used as a default for mumax without documentation. Consider either documenting why this specific value is chosen in the method docstring, or defining it as a named constant with an explanatory comment (e.g., DEFAULT_MU_MAX = 0.999 # Exclude extreme mu values close to 1).
| """ | ||
| for nl in range(nlmin, nlmax): | ||
| xi_multipoles = cls(nl=nl) | ||
| chi2 = xi_multipoles.fit(rdata, mudata, xi_data, cov_data, mumax=mumax) | ||
| if chi2 < chi_c: | ||
| print("Converged for nl=", nl) | ||
| break | ||
|
|
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing error handling: If the loop completes without finding an nl that satisfies chi2 < chi_c, the method returns xi_multipoles which would be undefined (from the last iteration where the condition wasn't met). This could lead to returning a poorly-fitted model without warning. Consider either raising an exception or logging a warning if convergence is not achieved within the range [nlmin, nlmax].
| """ | |
| for nl in range(nlmin, nlmax): | |
| xi_multipoles = cls(nl=nl) | |
| chi2 = xi_multipoles.fit(rdata, mudata, xi_data, cov_data, mumax=mumax) | |
| if chi2 < chi_c: | |
| print("Converged for nl=", nl) | |
| break | |
| """ | |
| converged = False | |
| for nl in range(nlmin, nlmax): | |
| xi_multipoles = cls(nl=nl) | |
| chi2 = xi_multipoles.fit(rdata, mudata, xi_data, cov_data, mumax=mumax) | |
| if chi2 < chi_c: | |
| print("Converged for nl=", nl) | |
| converged = True | |
| break | |
| if not converged: | |
| raise RuntimeError( | |
| f"getOverfit: No nl in range [{nlmin}, {nlmax}) found with chi2 < {chi_c}. " | |
| f"Last chi2={chi2} for nl={nl}." | |
| ) |
| np.zeros(rdata.size), cov=cov_data, size=nmc | ||
| )[:, w] |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential memory issue: The Monte Carlo simulation generates nmc samples of size rdata.size, which could be memory-intensive for large datasets. The random samples are generated with full size rdata.size and then filtered with [:, w], which is inefficient. Consider generating the random samples only for the filtered data points to reduce memory usage: rng.multivariate_normal(np.zeros(w.sum()), cov=cov_in, size=nmc).
| np.zeros(rdata.size), cov=cov_data, size=nmc | |
| )[:, w] | |
| np.zeros(w.sum()), cov=cov_in, size=nmc | |
| ) |
No description provided.