Conversation
Reviewer's Guide

This PR integrates a new experimental design planner into the project by adding utilities for dynamic experiment loading, instrument noise simulation, and a Bayesian optimization engine (ExperimentDesigner) for neutron reflectometry, along with example models and tests; it also updates project dependencies to support the new components.

Class diagram for ExperimentDesigner and related types

classDiagram
class ExperimentDesigner {
- experiment: Experiment
- problem: FitProblem
- parameters_of_interest: Optional[List[str]]
- parameters: Dict[str, SampleParameter]
- all_model_parameters: Dict[str, Parameter]
+ __init__(experiment, parameters_of_interest)
+ __repr__()
+ set_parameter_to_optimize(param_name, value)
+ get_parameters()
+ prior_entropy()
+ perform_mcmc(q_values, noisy_reflectivity, errors, mcmc_steps, burn_steps)
+ _extract_marginal_samples(mcmc_samples)
+ _calculate_posterior_entropy_mvn(mcmc_samples)
+ _calculate_posterior_entropy_kdn(mcmc_samples)
+ _calculate_posterior_entropy(mcmc_samples, method)
+ optimize(param_to_optimize, param_values, parameters_of_interest, realizations, mcmc_steps, entropy_method, counting_time)
}
class SampleParameter {
+ name: str
+ value: float
+ bounds: Tuple[float, float]
+ is_of_interest: bool = True
+ h_prior: float = 0
+ h_posterior: float = 0
}
ExperimentDesigner --> SampleParameter
ExperimentDesigner --> FitProblem
FitProblem --> Experiment
BaseModel <|-- SampleParameter
Class diagram for add_instrumental_noise utility

classDiagram
class add_instrumental_noise {
+ q_values: np.ndarray
+ reflectivity_curve: np.ndarray
+ counting_time: float = 1.0
+ returns: Tuple[np.ndarray, np.ndarray]
}
Class diagram for create_fit_experiment model loader

classDiagram
class create_fit_experiment {
+ q: np.ndarray
+ dq: np.ndarray
+ data: np.ndarray
+ errors: np.ndarray
+ returns: Experiment
}
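For orientation, the model file loaded by expt_from_model_file is expected to expose a create_fit_experiment(q, dq, data, errors) function returning a refl1d Experiment, as the diagram above indicates. Below is a minimal sketch of such a model file; the layer stack, material names, and SLD values are purely illustrative assumptions, while the QProbe and Experiment calls mirror the usage quoted later in this review.

```python
from refl1d.names import SLD, Experiment, QProbe


def create_fit_experiment(q, dq, data=None, errors=None):
    """Build a refl1d Experiment for the planner (illustrative sample only)."""
    probe = QProbe(q, dq, R=data, dR=errors)

    # Hypothetical Si / film / THF stack; rho values are placeholders
    Si = SLD(name="Si", rho=2.07)
    film = SLD(name="film", rho=4.0)
    THF = SLD(name="THF", rho=6.0)
    sample = Si(0, 5) | film(100, 10) | THF

    # Mark a couple of parameters as fittable so the planner has something to vary
    sample[1].thickness.range(50, 200)
    THF.rho.range(0.0, 7.0)

    return Experiment(sample=sample, probe=probe)
```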
File-Level Changes
Hey there - I've reviewed your changes - here's some feedback:
- In expt_from_model_file, avoid injecting into sys.path—use importlib.util.spec_from_file_location to load the module directly by path, which is safer and doesn’t pollute the import namespace.
- The ExperimentDesigner.optimize (and related methods) use print statements for progress and results—consider switching to the logger or exposing a verbosity flag so output can be captured or silenced consistently.
- test_planner.py has no assertions and relies on random noise—add proper assert-based checks and seed the RNG (or allow injecting a seed) to make tests deterministic and verifiable.
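As a sketch of the seeding idea (assuming the noise draw in add_instrumental_noise uses NumPy's global random state; the test name and Q grid are illustrative):

```python
import numpy as np

from analyzer_tools.planner.instrument import add_instrumental_noise


def test_add_instrumental_noise_is_deterministic():
    q = np.linspace(0.01, 0.2, 50)
    r = np.exp(-q * 20)  # smooth, noise-free stand-in for a reflectivity curve

    np.random.seed(42)  # assumes the noise draw uses NumPy's global RNG
    noisy_a, err_a = add_instrumental_noise(q, r, counting_time=1.0)
    np.random.seed(42)
    noisy_b, err_b = add_instrumental_noise(q, r, counting_time=1.0)

    np.testing.assert_allclose(noisy_a, noisy_b)
    np.testing.assert_allclose(err_a, err_b)
```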
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In expt_from_model_file, avoid injecting into sys.path—use importlib.util.spec_from_file_location to load the module directly by path, which is safer and doesn’t pollute the import namespace.
- The ExperimentDesigner.optimize (and related methods) use print statements for progress and results—consider switching to the logger or exposing a verbosity flag so output can be captured or silenced consistently.
- test_planner.py has no assertions and relies on random noise—add proper assert-based checks and seed the RNG (or allow injecting a seed) to make tests deterministic and verifiable.
## Individual Comments
### Comment 1
<location> `analyzer_tools/planner/experiment_design.py:204-205` </location>
<code_context>
+ """
+ # Prepare FitProblem
+
+ probe = QProbe(q_values, q_values * 0.025, R=noisy_reflectivity, dR=errors)
+ expt = Experiment(sample=self.experiment.sample, probe=probe)
+ problem = FitProblem(expt)
+ problem.model_update()
</code_context>
<issue_to_address>
**suggestion:** Hardcoded Q resolution scaling may not generalize to all instruments.
Consider making the Q resolution scaling factor configurable to accommodate different experimental setups.
Suggested implementation:
```python
"""
# Prepare FitProblem
probe = QProbe(q_values, q_values * q_resolution_scale, R=noisy_reflectivity, dR=errors)
expt = Experiment(sample=self.experiment.sample, probe=probe)
problem = FitProblem(expt)
problem.model_update()
```
```python
errors: Error bars
mcmc_steps: Number of MCMC steps
burn_steps: Number of burn-in steps
q_resolution_scale: Scaling factor for Q resolution (default: 0.025)
Returns:
Tuple of (2D numpy array of MCMC samples, list of parameter names)
"""
```
```python
def your_function_name(self, q_values, noisy_reflectivity, errors, mcmc_steps, burn_steps, q_resolution_scale=0.025):
```
- Replace `your_function_name` with the actual function name in your code.
- Update all calls to this function elsewhere in your codebase to pass the appropriate `q_resolution_scale` if a value other than the default is needed.
</issue_to_address>
### Comment 2
<location> `analyzer_tools/planner/experiment_design.py:297-306` </location>
<code_context>
+ def _calculate_posterior_entropy_kdn(self, mcmc_samples: np.ndarray) -> float:
</code_context>
<issue_to_address>
**suggestion (bug_risk):** KDE fallback to MVN may mask underlying issues.
Consider logging exception details or improving error handling to ensure issues with KDE or sample data are visible.
Suggested implementation:
```python
import logging
def _calculate_posterior_entropy_kdn(self, mcmc_samples: np.ndarray) -> float:
"""
Calculate posterior entropy using kernel density estimation.
Args:
mcmc_samples: 2D array of MCMC samples
Returns:
Posterior entropy in bits
"""
if mcmc_samples.ndim != 2 or mcmc_samples.shape[0] < 2:
```
```python
try:
kde = gaussian_kde(mcmc_samples.T)
log_probs = kde.logpdf(mcmc_samples.T)
entropy_nats = -np.mean(log_probs)
return entropy_nats / np.log(2)
except Exception as e:
# Fallback to multivariate normal entropy if KDE fails
logging.error("KDE failed in _calculate_posterior_entropy_kdn: %s", str(e), exc_info=True)
print("Warning: KDE failed, falling back to MVN entropy. Exception: {}".format(e))
cov_matrix = np.cov(mcmc_samples, rowvar=False)
cov_matrix += 1e-10 * np.eye(cov_matrix.shape[0]) # Regularize
entropy_nats = multivariate_normal.entropy(cov=cov_matrix)
return entropy_nats / np.log(2)
```
</issue_to_address>
### Comment 3
<location> `analyzer_tools/planner/instrument.py:23-24` </location>
<code_context>
+ Tuple of (noisy_reflectivity, errors)
+ """
+ # Simple noise model: relative error inversely proportional to sqrt(R * counting_time)
+ min_relative_error = 0.01 # 1% minimum
+ base_relative_error = 0.05 # 5% base error
+
+ # Calculate relative errors - more realistic model
</code_context>
<issue_to_address>
**suggestion:** Hardcoded error parameters may limit flexibility.
Consider making min_relative_error and base_relative_error configurable via function arguments to support varying instrument requirements.
Suggested implementation:
```python
Args:
q_values: Q values
reflectivity_curve: Noise-free reflectivity
counting_time: Relative counting time (affects noise level)
min_relative_error: Minimum relative error (default 0.01)
base_relative_error: Base relative error (default 0.05)
Returns:
Tuple of (noisy_reflectivity, errors)
"""
# Simple noise model: relative error inversely proportional to sqrt(R * counting_time)
# Calculate relative errors - more realistic model
relative_errors = np.maximum(
min_relative_error,
base_relative_error
/ np.sqrt(np.maximum(reflectivity_curve * counting_time, 1e-10)),
)
```
You will also need to update the function definition to include `min_relative_error=0.01` and `base_relative_error=0.05` as keyword arguments. For example:
```python
def your_function_name(q_values, reflectivity_curve, counting_time, min_relative_error=0.01, base_relative_error=0.05):
```
And update any calls to this function elsewhere in your codebase if you want to specify these parameters.
</issue_to_address>
### Comment 4
<location> `analyzer_tools/planner/instrument.py:34-35` </location>
<code_context>
+ )
+
+ # Add Q-dependent component (higher Q = higher relative error)
+ q_factor = 1 + 0.5 * (q_values / np.max(q_values)) ** 2
+ relative_errors *= q_factor
+
+ # Absolute errors
</code_context>
<issue_to_address>
**issue (bug_risk):** Division by zero risk if q_values contains only zeros.
Add a condition to handle cases where np.max(q_values) is zero to prevent division by zero errors.
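One possible guard, sketched as a small helper (the helper name is hypothetical; the scaling formula is the one from the snippet above):

```python
import numpy as np


def q_dependent_factor(q_values: np.ndarray) -> np.ndarray:
    """Q-dependent error inflation, guarded against an all-zero Q grid."""
    q_max = np.max(q_values)
    if q_max <= 0:
        # Degenerate grid: skip the Q-dependent term instead of dividing by zero
        return np.ones_like(q_values, dtype=float)
    return 1 + 0.5 * (q_values / q_max) ** 2


# In add_instrumental_noise:
# relative_errors *= q_dependent_factor(q_values)
```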
</issue_to_address>
### Comment 5
<location> `analyzer_tools/utils/model_utils.py:296-302` </location>
<code_context>
+ -------
+ Experiment
+ """
+ model_path = Path(model_file).absolute()
+ model_file = model_path.stem
+
+ # Add project root to path to allow importing from 'models'
+ model_dir = str(model_path.parent.resolve())
+ if model_dir not in sys.path:
+ sys.path.insert(0, model_dir)
+
</code_context>
<issue_to_address>
**suggestion:** Modifying sys.path at runtime can have unintended side effects.
This change may impact module imports elsewhere, particularly in persistent or multi-threaded contexts. Prefer importlib or context management to limit scope and avoid global effects.
```suggestion
import importlib.util
model_path = Path(model_file).absolute()
model_file = model_path.stem
# Dynamically import the model module from the given file path
spec = importlib.util.spec_from_file_location(model_file, str(model_path))
model_module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(model_module)
```
</issue_to_address>
### Comment 6
<location> `test_planner.py:23` </location>
<code_context>
+
+
+# Create base experiment
+# TODO: write a test for this.
+experiment = expt_from_model_file(model_name, q_values, dq_values)
+
</code_context>
<issue_to_address>
**issue (testing):** Missing actual test assertions and structure.
Please refactor this file to use a test framework and add assertions to validate expected outputs and error conditions.
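A minimal sketch of what an assert-based pytest rewrite could look like (the model path, Q grid, and the assertion on designer.parameters are illustrative assumptions, not the project's actual layout):

```python
import numpy as np
import pytest

from analyzer_tools.planner.experiment_design import ExperimentDesigner
from analyzer_tools.utils.model_utils import expt_from_model_file


@pytest.fixture
def designer():
    # Hypothetical model path and Q grid; adapt to the repository layout
    q_values = np.linspace(0.008, 0.2, 100)
    dq_values = q_values * 0.025
    experiment = expt_from_model_file("models/example_model.py", q_values, dq_values)
    return ExperimentDesigner(experiment)


def test_set_parameter_to_optimize_updates_value(designer):
    designer.set_parameter_to_optimize("THF rho", 5.0)
    # Assumes set_parameter_to_optimize is reflected in the parameters mapping
    assert designer.parameters["THF rho"].value == pytest.approx(5.0)


def test_set_parameter_to_optimize_rejects_unknown_name(designer):
    with pytest.raises((ValueError, KeyError)):
        designer.set_parameter_to_optimize("nonexistent_param", 1.0)
```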
</issue_to_address>
### Comment 7
<location> `test_planner.py:40-42` </location>
<code_context>
+# TODO: write a test for this.
+print(designer._model_parameters_to_dict())
+
+designer.set_parameter_to_optimize("THF rho", 5.0)
+
+results = designer.optimize(param_to_optimize="THF rho", param_values=[1.5, 2.5,3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5], realizations=5)
</code_context>
<issue_to_address>
**suggestion (testing):** No test for invalid parameter names in set_parameter_to_optimize.
Add a test that sets a non-existent parameter name and checks for the expected exception to verify error handling in ExperimentDesigner.
```suggestion
designer.set_parameter_to_optimize("THF rho", 5.0)
# Test: setting a non-existent parameter should raise an exception
import pytest
with pytest.raises((ValueError, KeyError)):
designer.set_parameter_to_optimize("nonexistent_param", 1.0)
results = designer.optimize(param_to_optimize="THF rho", param_values=[1.5, 2.5,3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5], realizations=5)
```
</issue_to_address>
### Comment 8
<location> `test_planner.py:42-44` </location>
<code_context>
+
+designer.set_parameter_to_optimize("THF rho", 5.0)
+
+results = designer.optimize(param_to_optimize="THF rho", param_values=[1.5, 2.5,3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5], realizations=5)
+
+# Print results
</code_context>
<issue_to_address>
**suggestion (testing):** No test for edge cases in optimize (e.g., empty param_values, invalid param_to_optimize).
Add tests for cases like empty param_values, invalid param_to_optimize, and realizations=0 to verify proper error handling and output.
```suggestion
results = designer.optimize(param_to_optimize="THF rho", param_values=[1.5, 2.5,3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5], realizations=5)
# Edge case tests for optimize
import pytest
# Test: empty param_values
try:
designer.optimize(param_to_optimize="THF rho", param_values=[], realizations=5)
print("FAILED: optimize did not raise error for empty param_values")
except Exception as e:
print(f"PASSED: optimize raised error for empty param_values: {e}")
# Test: invalid param_to_optimize
try:
designer.optimize(param_to_optimize="INVALID_PARAM", param_values=[1.0, 2.0], realizations=5)
print("FAILED: optimize did not raise error for invalid param_to_optimize")
except Exception as e:
print(f"PASSED: optimize raised error for invalid param_to_optimize: {e}")
# Test: realizations=0
try:
designer.optimize(param_to_optimize="THF rho", param_values=[1.0, 2.0], realizations=0)
print("FAILED: optimize did not raise error for realizations=0")
except Exception as e:
print(f"PASSED: optimize raised error for realizations=0: {e}")
# Print results
```
</issue_to_address>
### Comment 9
<location> `test_planner.py:24` </location>
<code_context>
+
+# Create base experiment
+# TODO: write a test for this.
+experiment = expt_from_model_file(model_name, q_values, dq_values)
+
+designer = ExperimentDesigner(experiment)
</code_context>
<issue_to_address>
**suggestion (testing):** No test for error handling in expt_from_model_file.
Add a test for error cases in expt_from_model_file, including non-existent model files and invalid q/dq arrays, to verify error handling.
Suggested implementation:
```python
# Create base experiment
# TODO: write a test for this.
experiment = expt_from_model_file(model_name, q_values, dq_values)
designer = ExperimentDesigner(experiment)
import pytest
def test_expt_from_model_file_errors():
# Test non-existent model file
with pytest.raises(FileNotFoundError):
expt_from_model_file("models/non_existent_model", q_values, dq_values)
# Test invalid q_values (empty array)
with pytest.raises(ValueError):
expt_from_model_file(model_name, np.array([]), dq_values)
# Test invalid dq_values (wrong shape)
with pytest.raises(ValueError):
expt_from_model_file(model_name, q_values, np.array([1, 2, 3]))
# Test q_values and dq_values of different lengths
with pytest.raises(ValueError):
expt_from_model_file(model_name, np.array([0.1, 0.2]), np.array([0.1]))
```
Make sure that `expt_from_model_file` raises appropriate exceptions (`FileNotFoundError`, `ValueError`) for these error cases. If it does not, you will need to update its implementation accordingly.
</issue_to_address>
### Comment 10
<location> `analyzer_tools/planner/experiment_design.py:145-152` </location>
<code_context>
def get_parameters(self):
"""
Get the variable parameters and their prior.
"""
# Extract parameters from the problem
parameters: Dict[str, SampleParameter] = {}
for param in self.problem.parameters:
is_of_interest = (
self.parameters_of_interest is None
or param.name in self.parameters_of_interest
)
p = SampleParameter(
name=param.name,
value=param.value,
bounds=param.bounds,
is_of_interest=is_of_interest,
)
parameters[param.name] = p
# Warn if parameters of interest not found
if self.parameters_of_interest:
missing_params = [
param
for param in self.parameters_of_interest
if param not in self.base_params
]
if len(missing_params) > 0:
logger.warning(f"Parameters {missing_params} not found in model")
logger.info(f"Available parameters: {list(self.base_params.keys())}")
return parameters
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Simplify sequence length comparison ([`simplify-len-comparison`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/simplify-len-comparison/))
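Applied together, the two refactorings replace the separate assignment and the len(...) > 0 check with a named expression and list truthiness; a small self-contained illustration (the helper name and arguments are invented for the example):

```python
import logging

logger = logging.getLogger(__name__)


def warn_if_missing(requested, available):
    # Named expression + truthiness replace:
    #   missing = [...]; if len(missing) > 0: ...
    if missing := [name for name in requested if name not in available]:
        logger.warning(f"Parameters {missing} not found in model")
        logger.info(f"Available parameters: {list(available)}")


warn_if_missing(["THF rho", "bogus"], {"THF rho": 6.0})
```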
</issue_to_address>
### Comment 11
<location> `analyzer_tools/utils/model_utils.py:308-309` </location>
<code_context>
def expt_from_model_file(
model_file: str,
q: np.ndarray,
dq: np.ndarray,
reflectivity: np.ndarray = None,
errors: np.ndarray = None,
) -> Experiment:
"""
Load an Experiment from a model file.
Parameters
----------
model_file : str
Python file containing a create_fit_experiment(q, dq, data, errors) function
q : np.ndarray
Q values
dq : np.ndarray
Q resolution values (1 sigma)
reflectivity : np.ndarray
Reflectivity values
errors : np.ndarray
Reflectivity error values
Returns
-------
Experiment
"""
model_path = Path(model_file).absolute()
model_file = model_path.stem
# Add project root to path to allow importing from 'models'
model_dir = str(model_path.parent.resolve())
if model_dir not in sys.path:
sys.path.insert(0, model_dir)
model_module = importlib.import_module(model_file)
create_experiment = model_module.create_fit_experiment
experiment = create_experiment(q, dq, reflectivity, errors)
return experiment
</code_context>
<issue_to_address>
**suggestion (code-quality):** Inline variable that is immediately returned ([`inline-immediately-returned-variable`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/inline-immediately-returned-variable/))
```suggestion
return create_experiment(q, dq, reflectivity, errors)
```
</issue_to_address>
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
## main #2 +/- ##
==========================================
+ Coverage 47.71% 55.96% +8.24%
==========================================
Files 12 17 +5
Lines 1203 1703 +500
==========================================
+ Hits 574 953 +379
- Misses 629 750 +121
Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
…error handling, importlib usage Co-authored-by: mdoucet <1108748+mdoucet@users.noreply.github.com>
…t structure with comprehensive assertions and edge case testing Co-authored-by: mdoucet <1108748+mdoucet@users.noreply.github.com>
…f-c821d45457a6 Address PR review comments: improve configurability, error handling, and testing
Summary by Sourcery
Add a new experiment planning feature set: dynamic model loading, a Bayesian optimizer with information‐gain calculations, noise simulation, an example model, and a basic test script
New Features:
Tests: