Skip to content

Conversation

@p-slash
Copy link
Collaborator

@p-slash p-slash commented Feb 20, 2025

No description provided.

@codecov
Copy link

codecov bot commented Feb 20, 2025

Codecov Report

Attention: Patch coverage is 0% with 68 lines in your changes missing coverage. Please review.

Project coverage is 39.15%. Comparing base (28f612f) to head (f0c3207).

Files with missing lines Patch % Lines
vega/xi_multipoles.py 0.00% 68 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #140      +/-   ##
==========================================
- Coverage   39.89%   39.15%   -0.74%     
==========================================
  Files          30       31       +1     
  Lines        3645     3713      +68     
  Branches      678      687       +9     
==========================================
  Hits         1454     1454              
- Misses       2047     2115      +68     
  Partials      144      144              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new module xi_multipoles.py that provides functionality to compress correlation function data into multipole representations using Legendre polynomial decomposition.

Key Changes

  • Adds XiMultipoles class for fitting correlation function data to Legendre multipole basis
  • Implements automatic overfitting detection with chi-squared criterion
  • Provides Monte Carlo error estimation capabilities
Comments suppressed due to low confidence (1)

vega/xi_multipoles.py:57

  • Variable idx is not used.
        idx = ((rr - self.rmin) / self.dr).astype(int)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

self.xi_knots_fit = np.zeros((nl, nr))
self.std_xi_mc = np.zeros((nl, nr))

def __call__(self, r, mu, xi_knots=None):
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing docstring for the __call__ method. The method should include documentation explaining parameters (r, mu, xi_knots) and return value.

Suggested change
def __call__(self, r, mu, xi_knots=None):
def __call__(self, r, mu, xi_knots=None):
"""
Evaluate the multipole expansion of the correlation function at given (r, mu) values.
Parameters
----------
r : array_like
Radial distances at which to evaluate the correlation function.
mu : array_like
Cosine of the angle to the line of sight for each r.
xi_knots : array_like, optional
Precomputed multipole coefficients to use for evaluation. If None, uses self.xi_knots_fit.
Returns
-------
results : ndarray
The evaluated correlation function at the specified (r, mu) values.
"""

Copilot uses AI. Check for mistakes.

return results

def fit(self, rdata, mudata, xi_data, cov_data, mumax=0.999, nmc=0, seed=0, return_mcs=False):
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing docstring for the fit method. The method should include documentation for all parameters (rdata, mudata, xi_data, cov_data, mumax, nmc, seed, return_mcs) and return value, following the NumPy/SciPy docstring convention used elsewhere in the codebase.

Suggested change
def fit(self, rdata, mudata, xi_data, cov_data, mumax=0.999, nmc=0, seed=0, return_mcs=False):
def fit(self, rdata, mudata, xi_data, cov_data, mumax=0.999, nmc=0, seed=0, return_mcs=False):
"""
Fit the multipole model to the input data using least squares and optionally Monte Carlo sampling.
Parameters
----------
rdata : array_like
Array of radial distances of the data points.
mudata : array_like
Array of cosine of angles (mu) for the data points.
xi_data : array_like
Array of measured correlation function values at (rdata, mudata).
cov_data : array_like
Covariance matrix of the measured correlation function values.
mumax : float, optional
Maximum value of mu to include in the fit (default is 0.999).
nmc : int, optional
Number of Monte Carlo samples to draw for error estimation (default is 0).
seed : int, optional
Random seed for Monte Carlo sampling (default is 0).
return_mcs : bool, optional
If True, return the Monte Carlo samples of the fitted multipoles (default is False).
Returns
-------
chi2 : float
The chi-squared value of the fit.
xi_mcs : ndarray, optional
If `return_mcs` is True, returns the Monte Carlo samples of the fitted multipoles
as an array of shape (nmc, nl, nr).
"""

Copilot uses AI. Check for mistakes.
Comment on lines +63 to +70
if d > 0:
n2 = min(self.nr - 1, n1 + 1)
else:
n2 = max(0, n1 - 1)

for ell in range(self.nl):
matrix_A[i, n1 + ell * self.nr] += (1.0 - d) * self.leg_polys[ell](mumu[i])
matrix_A[i, n2 + ell * self.nr] += d * self.leg_polys[ell](mumu[i])
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic error in interpolation weights: When d <= 0, the code sets n2 = max(0, n1 - 1) and then applies weight d to n2. However, when d = 0, the weight (1.0 - d) correctly gives full weight to n1, but the weight d (which is 0) is still applied to n2. When d < 0 (which shouldn't happen based on the calculation but isn't explicitly prevented), the logic becomes problematic. Consider handling the d == 0 case separately or ensuring d is always non-negative, or using abs(d) for the second weight.

Copilot uses AI. Check for mistakes.
Comment on lines +68 to +70
for ell in range(self.nl):
matrix_A[i, n1 + ell * self.nr] += (1.0 - d) * self.leg_polys[ell](mumu[i])
matrix_A[i, n2 + ell * self.nr] += d * self.leg_polys[ell](mumu[i])
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Performance concern: Legendre polynomials are evaluated inside a loop for each data point. Since self.leg_polys[ell] is a callable that computes the polynomial, calling it repeatedly for the same mu values (lines 69-70) can be inefficient. Consider pre-computing the Legendre polynomial values for all unique mumu values before the loop, especially if there are repeated values or if the dataset is large.

Copilot uses AI. Check for mistakes.
Comment on lines +5 to +11
class XiMultipoles():
@classmethod
def getOverfit(
cls, rdata, mudata, xi_data, cov_data, mumax=0.999, nlmin=6, nlmax=20, chi_c=0.8
):
""" Find the nl that overfits the data such that chi2 < chi_c.
"""
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Magic number without explanation: The value 0.8 is used as a default for chi_c (chi-squared criterion) without documentation. Consider documenting what this threshold represents (e.g., reduced chi-squared target) or defining it as a named constant.

Suggested change
class XiMultipoles():
@classmethod
def getOverfit(
cls, rdata, mudata, xi_data, cov_data, mumax=0.999, nlmin=6, nlmax=20, chi_c=0.8
):
""" Find the nl that overfits the data such that chi2 < chi_c.
"""
# Default reduced chi-squared target for overfitting criterion
DEFAULT_CHI_SQUARED_CRITERION = 0.8
class XiMultipoles():
@classmethod
def getOverfit(
cls, rdata, mudata, xi_data, cov_data, mumax=0.999, nlmin=6, nlmax=20, chi_c=DEFAULT_CHI_SQUARED_CRITERION
):
"""
Find the nl that overfits the data such that chi2 < chi_c.
Parameters
----------
chi_c : float
Reduced chi-squared target for overfitting criterion (default: 0.8).
"""

Copilot uses AI. Check for mistakes.

return xi_multipoles

def __init__(self, r1=42.0, r2=178.0, nr=35, nl=6):
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing docstring for the __init__ method. Following the pattern in other classes in this codebase, the __init__ method should include a docstring documenting all parameters (r1, r2, nr, nl).

Suggested change
def __init__(self, r1=42.0, r2=178.0, nr=35, nl=6):
def __init__(self, r1=42.0, r2=178.0, nr=35, nl=6):
"""
Initialize the XiMultipoles object.
Parameters
----------
r1 : float, optional
Minimum value of the separation grid (default: 42.0).
r2 : float, optional
Maximum value of the separation grid (default: 178.0).
nr : int, optional
Number of separation grid points (default: 35).
nl : int, optional
Number of Legendre multipoles (default: 6).
"""

Copilot uses AI. Check for mistakes.
Nell = self.nr * self.nl
Nr = rr.size
matrix_A = np.zeros((Nr, Nell))
idx = ((rr - self.rmin) / self.dr).astype(int)
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential bug: variable idx is computed but never used. Line 57 calculates idx = ((rr - self.rmin) / self.dr).astype(int) but this value is not referenced anywhere in the subsequent code. The actual indexing uses a different calculation starting at line 60 with variable d. Either this variable should be removed, or there's a logic error in how indices are calculated.

Suggested change
idx = ((rr - self.rmin) / self.dr).astype(int)

Copilot uses AI. Check for mistakes.
class XiMultipoles():
@classmethod
def getOverfit(
cls, rdata, mudata, xi_data, cov_data, mumax=0.999, nlmin=6, nlmax=20, chi_c=0.8
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Magic number without explanation: The value 0.999 is used as a default for mumax without documentation. Consider either documenting why this specific value is chosen in the method docstring, or defining it as a named constant with an explanatory comment (e.g., DEFAULT_MU_MAX = 0.999 # Exclude extreme mu values close to 1).

Copilot uses AI. Check for mistakes.
Comment on lines +11 to +18
"""
for nl in range(nlmin, nlmax):
xi_multipoles = cls(nl=nl)
chi2 = xi_multipoles.fit(rdata, mudata, xi_data, cov_data, mumax=mumax)
if chi2 < chi_c:
print("Converged for nl=", nl)
break

Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing error handling: If the loop completes without finding an nl that satisfies chi2 < chi_c, the method returns xi_multipoles which would be undefined (from the last iteration where the condition wasn't met). This could lead to returning a poorly-fitted model without warning. Consider either raising an exception or logging a warning if convergence is not achieved within the range [nlmin, nlmax].

Suggested change
"""
for nl in range(nlmin, nlmax):
xi_multipoles = cls(nl=nl)
chi2 = xi_multipoles.fit(rdata, mudata, xi_data, cov_data, mumax=mumax)
if chi2 < chi_c:
print("Converged for nl=", nl)
break
"""
converged = False
for nl in range(nlmin, nlmax):
xi_multipoles = cls(nl=nl)
chi2 = xi_multipoles.fit(rdata, mudata, xi_data, cov_data, mumax=mumax)
if chi2 < chi_c:
print("Converged for nl=", nl)
converged = True
break
if not converged:
raise RuntimeError(
f"getOverfit: No nl in range [{nlmin}, {nlmax}) found with chi2 < {chi_c}. "
f"Last chi2={chi2} for nl={nl}."
)

Copilot uses AI. Check for mistakes.
Comment on lines +87 to +88
np.zeros(rdata.size), cov=cov_data, size=nmc
)[:, w]
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential memory issue: The Monte Carlo simulation generates nmc samples of size rdata.size, which could be memory-intensive for large datasets. The random samples are generated with full size rdata.size and then filtered with [:, w], which is inefficient. Consider generating the random samples only for the filtered data points to reduce memory usage: rng.multivariate_normal(np.zeros(w.sum()), cov=cov_in, size=nmc).

Suggested change
np.zeros(rdata.size), cov=cov_data, size=nmc
)[:, w]
np.zeros(w.sum()), cov=cov_in, size=nmc
)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant