Separation of data preparation and plotting for criterion_plot() #600

Open: wants to merge 4 commits into main


r3kste (Contributor) commented May 26, 2025

Summary of changes

Separated the data preparation logic from the plotting logic in criterion_plot(), so that additional plotting backends can be added easily.

  1. Dataclass LineData stores the data required for plotting a single line.
  2. Dataclass CriterionPlotData stores all backend-agnostic data needed for criterion_plot() (lines and, if applicable, multistart_lines).
  3. Dataclass PlotConfig stores the global settings of a plot.

Additionally, we could add a new parameter, such as return_data, that returns the plot_data instead of plotting. We can discuss whether this is worth looking into.

codecov bot commented May 27, 2025

Codecov Report

Attention: Patch coverage is 91.66667% with 9 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
src/optimagic/visualization/history_plots.py | 91.50% | 9 Missing ⚠️

Files with missing lines | Coverage Δ
src/optimagic/optimization/optimize_result.py | 94.16% <100.00%> (+0.04%) ⬆️
src/optimagic/visualization/history_plots.py | 92.82% <91.50%> (+1.97%) ⬆️

r3kste force-pushed the separate_data_from_plot branch from 7fb293a to 839ca0c on May 27, 2025 15:45

timmens (Member) left a comment

Looks very nice already. I have a few remarks.

Let me know if you have any questions; thank you!!

r3kste (Contributor, Author) commented Jun 3, 2025

I have made the suggested changes and I believe that the failing test is unrelated.

r3kste requested a review from timmens on June 3, 2025 13:47

r3kste (Contributor, Author) commented Jun 16, 2025

mypy is failing due to the addition of type hints in criterion_plot(). I am not entirely sure how to tackle this.

timmens (Member) left a comment

Thanks for the changes. Regarding the type-check errors, I had a look locally.

  1. You can ignore (on a per-line basis) the mypy errors that are introduced by passing "invalid" data to the History class
  2. There seem to be a few bugs caught by mypy for which we do not have tests. For example, mypy flags the line fun = res.multistart_info.exploration_results.tolist()[::-1] + stacked.fun, reporting that [...].exploration_results has no method tolist(). This seems to be correct. For these cases, can you check whether it is possible to write a simple test case that triggers the bug in the current version of the code, and then fix it?
  3. As mentioned elsewhere, for the problem where mypy thinks multistart_info is None, you can do if res.multistart_info directly, which should work. We will have to see if this is as readable as before.

I have a few more comments on _OptimizeData:

  1. I would like to see it defined before it is used in the return type hint of _retrieve_optimization_data. This also applies to the other dataclasses and functions.
  2. Is there no way to get the start_params in the case of _retrieve_optimization_data_from_results_object?
  3. Although I proposed it, I am not entirely happy with the class name _OptimizeData. It is too closely related to OptimizeResult. Can you propose some alternatives that make clear that it is simply a data container with intermediate results containing the histories?
  4. Given that it is a data container, it should be frozen. In the current setup this is problematic because of the name field. You could just pass the name argument to the retrieval functions. Alternatively, instead of returning a list of optimize data, you can return a dict[str, _OptimizeData].
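
Points 1 and 4 could look roughly like the sketch below. The class name _HistoryData, its fields, and the function _retrieve_data() are assumptions made for illustration. The sketch only shows a frozen container that is defined before the function using it in a return type hint, with names moved into the dict keys.

```python
from dataclasses import FrozenInstanceError, dataclass


# Defined before its first use in a type hint, so no string quotes are needed.
@dataclass(frozen=True)
class _HistoryData:
    """Hypothetical frozen container; name and fields are assumptions."""

    fun_values: tuple[float, ...]
    is_multistart: bool = False


def _retrieve_data(raw: dict[str, list[float]]) -> dict[str, _HistoryData]:
    # The name lives in the dict key, so the container needs no name field
    # that would have to be set after construction.
    return {name: _HistoryData(fun_values=tuple(vals)) for name, vals in raw.items()}


data = _retrieve_data({"bfgs": [3.0, 1.5, 1.0], "nelder_mead": [3.0, 2.0]})

# Frozen dataclasses reject attribute assignment after construction.
try:
    data["bfgs"].fun_values = ()  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    mutated = False
```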


    Returns:
        plotly.graph_objs._figure.Figure: The figure.

    """
    # ==================================================================================
    # Process inputs
    # ==================================================================================

    results = _harmonize_inputs_to_dict(results, names)

    if not isinstance(palette, list):
        palette = [palette]
    palette = itertools.cycle(palette)

Member suggested change:
-    palette = itertools.cycle(palette)
+    palette_cycle = itertools.cycle(palette)

def _extract_criterion_plot_data(
    data: list["_OptimizeData"],
    max_evaluations: int | None,
    palette: Iterator[str],

Member suggested change:
-    palette: Iterator[str],
+    palette: itertools.cycle[str],

Contributor (Author) replied:

This causes TypeError: 'type' object is not subscriptable, because itertools.cycle cannot be subscripted with [str] at runtime. So I think we could either:

  1. Use palette_cycle: itertools.cycle. However, this doesn't convey that the items are of type str.
  2. Keep palette_cycle: Iterator[str].

Personally, I believe Iterator[str] is the better option.
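
A small sketch of option 2, assuming a hypothetical helper take_colors() that is not part of the PR. An itertools.cycle object is an iterator, so annotating the parameter as Iterator[str] is accurate and still documents the item type.

```python
import itertools
from collections.abc import Iterator


def take_colors(palette_cycle: Iterator[str], n: int) -> list[str]:
    """Return the next n colors from an endless palette iterator."""
    return list(itertools.islice(palette_cycle, n))


palette_cycle = itertools.cycle(["red", "green", "blue"])
colors = take_colors(palette_cycle, 4)

# Subscripting itertools.cycle at runtime is what raised the reported error.
# The result is recorded rather than assumed, since it may differ between
# Python versions.
try:
    itertools.cycle[str]  # type: ignore[misc]
    subscript_works = True
except TypeError:
    subscript_works = False
```

Because the annotation only promises an iterator of strings, callers could also pass iter(["red", "green"]) or a generator, which keeps the helper easy to test.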

r3kste force-pushed the separate_data_from_plot branch from cb9f65b to baeae7a on June 18, 2025 10:15

r3kste (Contributor, Author) commented Jun 18, 2025

Thanks for the review. I have pushed the suggested changes and mypy seems to be passing.

> 3. Although I proposed it, I am not entirely happy with the class name _OptimizeData. It is too closely related to OptimizeResult. Can you propose some alternatives that make clear that it is simply a data container with intermediate results containing the histories?

How about _CriterionOptimizeData or _CriterionHistory?

> There seem to be a few bugs that are caught by mypy for which we do not have tests. For example this line is flagged fun = res.multistart_info.exploration_results.tolist()[::-1] + stacked.fun by mypy, saying the [...].exploration_results has no method tolist(). This seems to be correct. For these cases, can you check whether it is possible to write a simple test case that would be triggered in the current version of the code, and then fix it?

def run_explorations(
    internal_problem: InternalOptimizationProblem,
    sample: NDArray[np.float64],
    n_cores: int,
    step_id: int,
) -> dict[str, NDArray[np.float64]]:
    ...

According to the type hints of run_explorations(), res.multistart_info.exploration_results is actually of type NDArray[np.float64] and not list[float]. This mismatch causes the mypy error, which can be fixed by correcting the type hint of exploration_results.

I believe this is not a bug and the usage of tolist() seems to be correct, as I couldn't find a case where exploration_results is a list.

diff --git a/src/optimagic/optimization/optimize_result.py b/src/optimagic/optimization/optimize_result.py
index f2895cf..65860b2 100644
@@ -218,7 +219,7 @@ class MultistartInfo:
     start_parameters: list[PyTree]
     local_optima: list[OptimizeResult]
     exploration_sample: list[PyTree]
-    exploration_results: list[float]
+    exploration_results: NDArray[np.float64]
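
To make the argument concrete without requiring third-party packages, the sketch below uses the standard-library array.array as a stand-in for a NumPy array, since both provide a tolist() method. The variable names and values are made up for illustration. With a real NDArray[np.float64] the flagged expression has the same shape.

```python
from array import array

# Stand-in for NDArray[np.float64]: the stdlib array type also offers .tolist().
exploration_results = array("d", [5.0, 4.0, 3.0])
stacked_fun = [2.5, 2.0]

# Same shape as the flagged line: reverse the exploration values, then append
# the values from the stacked history.
fun = exploration_results.tolist()[::-1] + stacked_fun

# A plain list has no .tolist(), which is what mypy reports as long as the
# attribute is annotated as list[float].
list_has_tolist = hasattr([5.0, 4.0, 3.0], "tolist")
```

So the runtime code is consistent with an array-valued attribute, and only the list[float] annotation disagrees with it.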
