
Add experiment planner #2

Merged
mdoucet merged 16 commits into main from planner
Oct 10, 2025

Conversation


@mdoucet mdoucet commented Sep 28, 2025

Summary by Sourcery

Add a new experiment planning feature set: dynamic model loading, a Bayesian optimizer with information-gain calculations, noise simulation, an example model, and a basic test script.

New Features:

  • Add expt_from_model_file utility for dynamically loading experiments from model files
  • Introduce ExperimentDesigner class and supporting modules for Bayesian experimental design optimization
  • Add add_instrumental_noise helper to simulate realistic reflectivity noise
  • Provide example model (cu_thf_planner.py) and initial test script (test_planner.py) for the planner

Tests:

  • Add test_planner.py to exercise the new planner functionality
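
For orientation, here is a minimal end-to-end sketch of how these pieces fit together. The class and function names come from this PR; the model path, Q grid, and candidate values are illustrative assumptions.

```python
# Hypothetical usage of the planner added in this PR; grid and values are assumptions.
import numpy as np

from analyzer_tools.utils.model_utils import expt_from_model_file
from analyzer_tools.planner.experiment_design import ExperimentDesigner

q_values = np.linspace(0.01, 0.2, 100)  # illustrative Q grid
dq_values = q_values * 0.025            # 2.5% resolution, matching the factor used in the diff

# Load an Experiment from a model file defining create_fit_experiment(q, dq, data, errors)
experiment = expt_from_model_file("models/cu_thf_planner.py", q_values, dq_values)

designer = ExperimentDesigner(experiment)
designer.set_parameter_to_optimize("THF rho", 5.0)

# Rank candidate values by information gain, averaged over noise realizations
results = designer.optimize(
    param_to_optimize="THF rho",
    param_values=[4.0, 5.0, 6.0],
    realizations=5,
)
```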


sourcery-ai bot commented Sep 28, 2025

Reviewer's Guide

This PR integrates a new experimental design planner into the project: it adds utilities for dynamic experiment loading, instrument noise simulation, and a Bayesian optimization engine (ExperimentDesigner) for neutron reflectometry, along with example models and tests, and it updates project dependencies to support the new components.

Class diagram for ExperimentDesigner and related types

classDiagram
    class ExperimentDesigner {
        - experiment: Experiment
        - problem: FitProblem
        - parameters_of_interest: Optional[List[str]]
        - parameters: Dict[str, SampleParameter]
        - all_model_parameters: Dict[str, Parameter]
        + __init__(experiment, parameters_of_interest)
        + __repr__()
        + set_parameter_to_optimize(param_name, value)
        + get_parameters()
        + prior_entropy()
        + perform_mcmc(q_values, noisy_reflectivity, errors, mcmc_steps, burn_steps)
        + _extract_marginal_samples(mcmc_samples)
        + _calculate_posterior_entropy_mvn(mcmc_samples)
        + _calculate_posterior_entropy_kdn(mcmc_samples)
        + _calculate_posterior_entropy(mcmc_samples, method)
        + optimize(param_to_optimize, param_values, parameters_of_interest, realizations, mcmc_steps, entropy_method, counting_time)
    }
    class SampleParameter {
        + name: str
        + value: float
        + bounds: Tuple[float, float]
        + is_of_interest: bool = True
        + h_prior: float = 0
        + h_posterior: float = 0
    }
    ExperimentDesigner --> SampleParameter
    ExperimentDesigner --> FitProblem
    FitProblem --> Experiment
    BaseModel <|-- SampleParameter
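
A minimal Pydantic sketch consistent with the SampleParameter fields above (the diagram shows it extending Pydantic's BaseModel; the entropy fields are in bits per the docstrings quoted later in the review):

```python
from typing import Tuple
from pydantic import BaseModel

class SampleParameter(BaseModel):
    """One fittable sample parameter tracked by the designer."""
    name: str
    value: float
    bounds: Tuple[float, float]
    is_of_interest: bool = True
    h_prior: float = 0.0      # prior entropy, in bits
    h_posterior: float = 0.0  # posterior entropy after a simulated measurement
```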

Class diagram for add_instrumental_noise utility

classDiagram
    class add_instrumental_noise {
        + q_values: np.ndarray
        + reflectivity_curve: np.ndarray
        + counting_time: float = 1.0
        + returns: Tuple[np.ndarray, np.ndarray]
    }

Class diagram for create_fit_experiment model loader

classDiagram
    class create_fit_experiment {
        + q: np.ndarray
        + dq: np.ndarray
        + data: np.ndarray
        + errors: np.ndarray
        + returns: Experiment
    }
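
As a sketch of the contract this loader expects, a model file would look roughly like the following. The stack, rho values, and fit ranges are illustrative assumptions rather than the actual cu_thf_planner.py; the QProbe call mirrors the one quoted in the review comments below.

```python
# Hypothetical model file in the style of models/cu_thf_planner.py; values are assumptions.
from refl1d.names import SLD, QProbe, Experiment

def create_fit_experiment(q, dq, data, errors):
    # Si substrate / Cu film / THF ambient; all layer values are illustrative
    silicon = SLD(name="Si", rho=2.07)
    copper = SLD(name="Cu", rho=6.4)
    thf = SLD(name="THF", rho=6.0)

    sample = silicon(0, 5) | copper(500, 10) | thf
    sample["Cu"].thickness.range(400, 600)
    sample["THF"].material.rho.range(4.0, 7.0)  # the "THF rho" parameter exercised in test_planner.py

    probe = QProbe(q, dq, R=data, dR=errors)
    return Experiment(sample=sample, probe=probe)
```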

File-Level Changes

Changes, details, and files

Added a helper to load Experiment objects from user-provided model files
  • Resolve and insert model path into sys.path
  • Dynamically import module and retrieve create_fit_experiment
  • Wrap parameters (q, dq, data, errors) and return Experiment
  Files: analyzer_tools/utils/model_utils.py

Updated project dependencies to support new planner components
  • Added pydantic for data validation
  • Added tqdm for progress bars
  Files: pyproject.toml, requirements.txt

Introduced the experiment planning module for Bayesian design optimization
  • Defined the SampleParameter Pydantic model
  • Implemented ExperimentDesigner with parameter extraction, prior entropy, MCMC, and posterior entropy methods
  • Added an optimize method to iterate parameter values with noise realizations and compute information gain
  • Created the add_instrumental_noise utility for realistic data simulation
  Files: analyzer_tools/planner/experiment_design.py, analyzer_tools/planner/instrument.py

Provided example model and end-to-end test harness
  • Added the cu_thf_planner model defining create_fit_experiment
  • Created test_planner.py to exercise expt_from_model_file and ExperimentDesigner workflows
  Files: models/cu_thf_planner.py, test_planner.py
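
Reading the API (the h_prior/h_posterior fields on SampleParameter and the realizations argument to optimize), the score the optimizer presumably assigns each candidate value is the expected information gain, i.e. the prior-to-posterior entropy drop averaged over simulated noise realizations:

```latex
% Expected information gain for candidate value v over R noise realizations
% (inferred from the planner's API; not stated explicitly in the diff)
\mathrm{EIG}(v) = H_{\mathrm{prior}} - \frac{1}{R} \sum_{r=1}^{R} H_{\mathrm{posterior}}^{(r)}(v)
```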


@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • In expt_from_model_file, avoid injecting into sys.path—use importlib.util.spec_from_file_location to load the module directly by path, which is safer and doesn’t pollute the import namespace.
  • The ExperimentDesigner.optimize (and related methods) use print statements for progress and results—consider switching to the logger or exposing a verbosity flag so output can be captured or silenced consistently.
  • test_planner.py has no assertions and relies on random noise—add proper assert-based checks and seed the RNG (or allow injecting a seed) to make tests deterministic and verifiable.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In expt_from_model_file, avoid injecting into sys.path—use importlib.util.spec_from_file_location to load the module directly by path, which is safer and doesn’t pollute the import namespace.
- The ExperimentDesigner.optimize (and related methods) use print statements for progress and results—consider switching to the logger or exposing a verbosity flag so output can be captured or silenced consistently.
- test_planner.py has no assertions and relies on random noise—add proper assert-based checks and seed the RNG (or allow injecting a seed) to make tests deterministic and verifiable.

## Individual Comments

### Comment 1
<location> `analyzer_tools/planner/experiment_design.py:204-205` </location>
<code_context>
+        """
+        # Prepare FitProblem
+
+        probe = QProbe(q_values, q_values * 0.025, R=noisy_reflectivity, dR=errors)
+        expt = Experiment(sample=self.experiment.sample, probe=probe)
+        problem = FitProblem(expt)
+        problem.model_update()
</code_context>

<issue_to_address>
**suggestion:** Hardcoded Q resolution scaling may not generalize to all instruments.

Consider making the Q resolution scaling factor configurable to accommodate different experimental setups.

Suggested implementation:

```python
        """
        # Prepare FitProblem

        probe = QProbe(q_values, q_values * q_resolution_scale, R=noisy_reflectivity, dR=errors)
        expt = Experiment(sample=self.experiment.sample, probe=probe)
        problem = FitProblem(expt)
        problem.model_update()

```

```python
            errors: Error bars
            mcmc_steps: Number of MCMC steps
            burn_steps: Number of burn-in steps
            q_resolution_scale: Scaling factor for Q resolution (default: 0.025)

        Returns:
            Tuple of (2D numpy array of MCMC samples, list of parameter names)
        """

```

```python
def your_function_name(self, q_values, noisy_reflectivity, errors, mcmc_steps, burn_steps, q_resolution_scale=0.025):

```

- Replace `your_function_name` with the actual function name in your code.
- Update all calls to this function elsewhere in your codebase to pass the appropriate `q_resolution_scale` if a value other than the default is needed.
</issue_to_address>

### Comment 2
<location> `analyzer_tools/planner/experiment_design.py:297-306` </location>
<code_context>
+    def _calculate_posterior_entropy_kdn(self, mcmc_samples: np.ndarray) -> float:
</code_context>

<issue_to_address>
**suggestion (bug_risk):** KDE fallback to MVN may mask underlying issues.

Consider logging exception details or improving error handling to ensure issues with KDE or sample data are visible.

Suggested implementation:

```python
    import logging

    def _calculate_posterior_entropy_kdn(self, mcmc_samples: np.ndarray) -> float:
        """
        Calculate posterior entropy using kernel density estimation.

        Args:
            mcmc_samples: 2D array of MCMC samples

        Returns:
            Posterior entropy in bits
        """
        if mcmc_samples.ndim != 2 or mcmc_samples.shape[0] < 2:

```

```python
        try:
            kde = gaussian_kde(mcmc_samples.T)
            log_probs = kde.logpdf(mcmc_samples.T)
            entropy_nats = -np.mean(log_probs)
            return entropy_nats / np.log(2)
        except Exception as e:
            # Fallback to multivariate normal entropy if KDE fails
            logging.error("KDE failed in _calculate_posterior_entropy_kdn: %s", str(e), exc_info=True)
            print("Warning: KDE failed, falling back to MVN entropy. Exception: {}".format(e))
            cov_matrix = np.cov(mcmc_samples, rowvar=False)
            cov_matrix += 1e-10 * np.eye(cov_matrix.shape[0])  # Regularize
            entropy_nats = multivariate_normal.entropy(cov=cov_matrix)
            return entropy_nats / np.log(2)

```
</issue_to_address>

### Comment 3
<location> `analyzer_tools/planner/instrument.py:23-24` </location>
<code_context>
+        Tuple of (noisy_reflectivity, errors)
+    """
+    # Simple noise model: relative error inversely proportional to sqrt(R * counting_time)
+    min_relative_error = 0.01  # 1% minimum
+    base_relative_error = 0.05  # 5% base error
+
+    # Calculate relative errors - more realistic model
</code_context>

<issue_to_address>
**suggestion:** Hardcoded error parameters may limit flexibility.

Consider making min_relative_error and base_relative_error configurable via function arguments to support varying instrument requirements.

Suggested implementation:

```python
    Args:
        q_values: Q values
        reflectivity_curve: Noise-free reflectivity
        counting_time: Relative counting time (affects noise level)
        min_relative_error: Minimum relative error (default 0.01)
        base_relative_error: Base relative error (default 0.05)

    Returns:
        Tuple of (noisy_reflectivity, errors)
    """
    # Simple noise model: relative error inversely proportional to sqrt(R * counting_time)

    # Calculate relative errors - more realistic model
    relative_errors = np.maximum(
        min_relative_error,
        base_relative_error
        / np.sqrt(np.maximum(reflectivity_curve * counting_time, 1e-10)),
    )

```


You will also need to update the function definition to include `min_relative_error=0.01` and `base_relative_error=0.05` as keyword arguments. For example:
```python
def your_function_name(q_values, reflectivity_curve, counting_time, min_relative_error=0.01, base_relative_error=0.05):
```
And update any calls to this function elsewhere in your codebase if you want to specify these parameters.
</issue_to_address>

### Comment 4
<location> `analyzer_tools/planner/instrument.py:34-35` </location>
<code_context>
+    )
+
+    # Add Q-dependent component (higher Q = higher relative error)
+    q_factor = 1 + 0.5 * (q_values / np.max(q_values)) ** 2
+    relative_errors *= q_factor
+
+    # Absolute errors
</code_context>

<issue_to_address>
**issue (bug_risk):** Division by zero risk if q_values contains only zeros.

Add a condition to handle cases where np.max(q_values) is zero to prevent division by zero errors.
</issue_to_address>
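
A minimal guard for that case might look like this sketch (variable names follow the quoted snippet; falling back to a flat Q-dependence is an assumption about the desired behavior):

```python
# Guard the normalization so an all-zero q_values array does not divide by zero.
q_max = np.max(q_values)
if q_max > 0:
    q_factor = 1 + 0.5 * (q_values / q_max) ** 2
else:
    q_factor = np.ones_like(q_values)  # flat fallback for degenerate input
relative_errors *= q_factor
```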

### Comment 5
<location> `analyzer_tools/utils/model_utils.py:296-302` </location>
<code_context>
+    -------
+        Experiment
+    """
+    model_path = Path(model_file).absolute()
+    model_file = model_path.stem
+
+    # Add project root to path to allow importing from 'models'
+    model_dir = str(model_path.parent.resolve())
+    if model_dir not in sys.path:
+        sys.path.insert(0, model_dir)
+
</code_context>

<issue_to_address>
**suggestion:** Modifying sys.path at runtime can have unintended side effects.

This change may impact module imports elsewhere, particularly in persistent or multi-threaded contexts. Prefer importlib or context management to limit scope and avoid global effects.

```suggestion
    import importlib.util

    model_path = Path(model_file).absolute()
    model_file = model_path.stem

    # Dynamically import the model module from the given file path
    spec = importlib.util.spec_from_file_location(model_file, str(model_path))
    model_module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(model_module)
```
</issue_to_address>

### Comment 6
<location> `test_planner.py:23` </location>
<code_context>
+
+
+# Create base experiment
+# TODO: write a test for this.
+experiment = expt_from_model_file(model_name, q_values, dq_values)
+
</code_context>

<issue_to_address>
**issue (testing):** Missing actual test assertions and structure.

Please refactor this file to use a test framework and add assertions to validate expected outputs and error conditions.
</issue_to_address>
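
A pytest-style restructuring along the lines this comment asks for might look like the sketch below. The model path and parameter name come from the script itself; seeding NumPy's global RNG assumes the noise helper draws from it.

```python
# Sketch of a pytest rewrite of test_planner.py; the assertion is a
# placeholder until the return type of optimize() is pinned down.
import numpy as np
import pytest

from analyzer_tools.utils.model_utils import expt_from_model_file
from analyzer_tools.planner.experiment_design import ExperimentDesigner


@pytest.fixture
def designer():
    np.random.seed(42)  # deterministic simulated noise (assumes the global RNG is used)
    q = np.linspace(0.01, 0.2, 100)
    experiment = expt_from_model_file("models/cu_thf_planner.py", q, q * 0.025)
    return ExperimentDesigner(experiment)


def test_optimize_returns_a_result(designer):
    results = designer.optimize(
        param_to_optimize="THF rho", param_values=[4.0, 5.0, 6.0], realizations=2
    )
    assert results is not None
```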

### Comment 7
<location> `test_planner.py:40-42` </location>
<code_context>
+# TODO: write a test for this.
+print(designer._model_parameters_to_dict())
+
+designer.set_parameter_to_optimize("THF rho", 5.0)
+
+results = designer.optimize(param_to_optimize="THF rho", param_values=[1.5, 2.5,3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5], realizations=5)
</code_context>

<issue_to_address>
**suggestion (testing):** No test for invalid parameter names in set_parameter_to_optimize.

Add a test that sets a non-existent parameter name and checks for the expected exception to verify error handling in ExperimentDesigner.

```suggestion
designer.set_parameter_to_optimize("THF rho", 5.0)

# Test: setting a non-existent parameter should raise an exception
import pytest

with pytest.raises((ValueError, KeyError)):
    designer.set_parameter_to_optimize("nonexistent_param", 1.0)

results = designer.optimize(param_to_optimize="THF rho", param_values=[1.5, 2.5,3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5], realizations=5)
```
</issue_to_address>

### Comment 8
<location> `test_planner.py:42-44` </location>
<code_context>
+
+designer.set_parameter_to_optimize("THF rho", 5.0)
+
+results = designer.optimize(param_to_optimize="THF rho", param_values=[1.5, 2.5,3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5], realizations=5)
+
+# Print results
</code_context>

<issue_to_address>
**suggestion (testing):** No test for edge cases in optimize (e.g., empty param_values, invalid param_to_optimize).

Add tests for cases like empty param_values, invalid param_to_optimize, and realizations=0 to verify proper error handling and output.

```suggestion
results = designer.optimize(param_to_optimize="THF rho", param_values=[1.5, 2.5,3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5], realizations=5)

# Edge case tests for optimize
import pytest

# Test: empty param_values
try:
    designer.optimize(param_to_optimize="THF rho", param_values=[], realizations=5)
    print("FAILED: optimize did not raise error for empty param_values")
except Exception as e:
    print(f"PASSED: optimize raised error for empty param_values: {e}")

# Test: invalid param_to_optimize
try:
    designer.optimize(param_to_optimize="INVALID_PARAM", param_values=[1.0, 2.0], realizations=5)
    print("FAILED: optimize did not raise error for invalid param_to_optimize")
except Exception as e:
    print(f"PASSED: optimize raised error for invalid param_to_optimize: {e}")

# Test: realizations=0
try:
    designer.optimize(param_to_optimize="THF rho", param_values=[1.0, 2.0], realizations=0)
    print("FAILED: optimize did not raise error for realizations=0")
except Exception as e:
    print(f"PASSED: optimize raised error for realizations=0: {e}")

# Print results
```
</issue_to_address>

### Comment 9
<location> `test_planner.py:24` </location>
<code_context>
+
+# Create base experiment
+# TODO: write a test for this.
+experiment = expt_from_model_file(model_name, q_values, dq_values)
+
+designer = ExperimentDesigner(experiment)
</code_context>

<issue_to_address>
**suggestion (testing):** No test for error handling in expt_from_model_file.

Add a test for error cases in expt_from_model_file, including non-existent model files and invalid q/dq arrays, to verify error handling.

Suggested implementation:

```python
# Create base experiment
# TODO: write a test for this.
experiment = expt_from_model_file(model_name, q_values, dq_values)

designer = ExperimentDesigner(experiment)

import pytest

def test_expt_from_model_file_errors():
    # Test non-existent model file
    with pytest.raises(FileNotFoundError):
        expt_from_model_file("models/non_existent_model", q_values, dq_values)

    # Test invalid q_values (empty array)
    with pytest.raises(ValueError):
        expt_from_model_file(model_name, np.array([]), dq_values)

    # Test invalid dq_values (wrong shape)
    with pytest.raises(ValueError):
        expt_from_model_file(model_name, q_values, np.array([1, 2, 3]))

    # Test q_values and dq_values of different lengths
    with pytest.raises(ValueError):
        expt_from_model_file(model_name, np.array([0.1, 0.2]), np.array([0.1]))

```

Make sure that `expt_from_model_file` raises appropriate exceptions (`FileNotFoundError`, `ValueError`) for these error cases. If it does not, you will need to update its implementation accordingly.
</issue_to_address>

### Comment 10
<location> `analyzer_tools/planner/experiment_design.py:145-152` </location>
<code_context>
    def get_parameters(self):
        """
        Get the variable parameters and their prior.
        """
        # Extract parameters from the problem
        parameters: Dict[str, SampleParameter] = {}

        for param in self.problem.parameters:
            is_of_interest = (
                self.parameters_of_interest is None
                or param.name in self.parameters_of_interest
            )
            p = SampleParameter(
                name=param.name,
                value=param.value,
                bounds=param.bounds,
                is_of_interest=is_of_interest,
            )
            parameters[param.name] = p

        # Warn if parameters of interest not found
        if self.parameters_of_interest:
            missing_params = [
                param
                for param in self.parameters_of_interest
                if param not in self.base_params
            ]
            if len(missing_params) > 0:
                logger.warning(f"Parameters {missing_params} not found in model")
                logger.info(f"Available parameters: {list(self.base_params.keys())}")
        return parameters

</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Simplify sequence length comparison ([`simplify-len-comparison`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/simplify-len-comparison/))
</issue_to_address>
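
Applied to the quoted block, the two refactorings collapse into a single named-expression check (a sketch; logger and self.base_params are as in the quoted code):

```python
# Named expression replaces the separate assignment and len() comparison.
if self.parameters_of_interest:
    if missing_params := [
        param
        for param in self.parameters_of_interest
        if param not in self.base_params
    ]:
        logger.warning(f"Parameters {missing_params} not found in model")
        logger.info(f"Available parameters: {list(self.base_params.keys())}")
```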

### Comment 11
<location> `analyzer_tools/utils/model_utils.py:308-309` </location>
<code_context>
def expt_from_model_file(
    model_file: str,
    q: np.ndarray,
    dq: np.ndarray,
    reflectivity: np.ndarray = None,
    errors: np.ndarray = None,
) -> Experiment:
    """
    Load an Experiment from a model file.

    Parameters
    ----------
    model_file : str
        Python file containing a create_fit_experiment(q, dq, data, errors) function
    q : np.ndarray
        Q values
    dq : np.ndarray
        Q resolution values (1 sigma)
    reflectivity : np.ndarray
        Reflectivity values
    errors : np.ndarray
        Reflectivity error values

    Returns
    -------
        Experiment
    """
    model_path = Path(model_file).absolute()
    model_file = model_path.stem

    # Add project root to path to allow importing from 'models'
    model_dir = str(model_path.parent.resolve())
    if model_dir not in sys.path:
        sys.path.insert(0, model_dir)


    model_module = importlib.import_module(model_file)
    create_experiment = model_module.create_fit_experiment

    experiment = create_experiment(q, dq, reflectivity, errors)
    return experiment

</code_context>

<issue_to_address>
**suggestion (code-quality):** Inline variable that is immediately returned ([`inline-immediately-returned-variable`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/inline-immediately-returned-variable/))

```suggestion
    return create_experiment(q, dq, reflectivity, errors)
```
</issue_to_address>



codecov bot commented Sep 28, 2025

Codecov Report

❌ Patch coverage is 77.79923% with 115 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.96%. Comparing base (b79aec6) to head (5df15d1).
⚠️ Report is 17 commits behind head on main.

Files with missing lines                      Patch %   Lines missing
analyzer_tools/planner/report.py              57.42%    43 ⚠️
analyzer_tools/planner/experiment_design.py   70.67%    39 ⚠️
analyzer_tools/planner/optimizer.py           62.96%    30 ⚠️
analyzer_tools/planner/cli.py                 99.17%     1 ⚠️
analyzer_tools/utils/model_utils.py           96.77%     1 ⚠️
analyzer_tools/utils/summary_plots.py         50.00%     1 ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main       #2      +/-   ##
==========================================
+ Coverage   47.71%   55.96%   +8.24%     
==========================================
  Files          12       17       +5     
  Lines        1203     1703     +500     
==========================================
+ Hits          574      953     +379     
- Misses        629      750     +121     
Flag        Coverage Δ
unittests   55.96% <77.79%> (+8.24%) ⬆️


Copilot AI and others added 15 commits September 28, 2025 23:43

  • …error handling, importlib usage
    Co-authored-by: mdoucet <1108748+mdoucet@users.noreply.github.com>
  • …t structure with comprehensive assertions and edge case testing
    Co-authored-by: mdoucet <1108748+mdoucet@users.noreply.github.com>
  • …f-c821d45457a6
  • Address PR review comments: improve configurability, error handling, and testing
@mdoucet mdoucet merged commit c59b416 into main Oct 10, 2025
3 checks passed
@mdoucet mdoucet deleted the planner branch October 10, 2025 01:27
