
[BREAKING CHANGE] API restructure, adds lazy loading#418

Open
billbrod wants to merge 52 commits into main from api_restructure

Conversation


@billbrod billbrod commented Mar 2, 2026

Describe the change in this PR at a high-level

This PR makes two major changes requested by the pyOpenSci review: simplifies the API and adds lazy loading, following SPEC0001.

The API restructure attempts to follow the lessons outlined in these two blog posts, especially: flat is better than nested, avoid a tools/utils namespace, and file structure is an implementation detail.

The new API structure largely follows the proposal in #246 and the structure implied by the new API documentation page added in #386 and updated in #413. To summarize:

  • All synthesis objects and five useful helper functions now live under the top-level module.
  • All models live under plenoptic.models
  • All metrics live under plenoptic.metric
  • All model components (functions/objects that can be used to make models, but are not themselves compatible with our synthesis methods) live under plenoptic.model_components.
  • All metric components (functions/objects that can be used to make metrics, but are not themselves compatible with our synthesis methods) live under plenoptic.metric, as before.
  • All plotting functions, including those that operate on both synthesis objects and tensors directly, live under plenoptic.plot
  • All example images and the function that can grab additional examples live under plenoptic.data
  • Validation functions (largely used internally, but useful for users to check their own models before trying to use the synthesis objects) live under plenoptic.validate
  • Optimization-related functions (which can serve as loss or regularization functions) live under plenoptic.optim.
  • Two additional top-level modules (plenoptic.io and plenoptic.external), which each include a single function, though it is possible additional ones will be added. Neither is particularly important for most users (the one under external is used to generate a figure in the docs, and the one under io is largely meant to help when debugging loading issues)

Additionally:

  • All redundancies have been removed: by setting __all__ and __dir__ correctly, there's now only one way to call every object.
  • OLD API HAS BEEN (ALMOST) COMPLETELY BROKEN. The only objects whose call paths are unchanged are those found in plenoptic.metric
  • Inspired by numpy's 2.0 migration, we try to raise an informative AttributeError at the top level, telling users where the object they tried to call (probably) now lives and pointing them to the migration guide. This happens if a user calls plenoptic.synth, plenoptic.synthesize, plenoptic.simul, plenoptic.simulate, plenoptic.tools, plenoptic.imshow, plenoptic.animshow, or plenoptic.pyrshow.

The new migration guide in the docs has a table showing all the old and new ways of calling objects.

This PR will not be merged until after #411 and a version 1.4.0 release.

Link any related issues, discussions, PRs

Closes: #246, #128, #101, #97. We've been wanting to do this for a long time :)

Outstanding questions / particular feedback

Because this PR changes how all plenoptic objects are called, it touches just about everything in the code base. For reviewing purposes, I think the best pages to look at are the API docs and migration guide. And maybe the quickstart to see a brief example of what it looks like in practice.

  • How should the docs site alert users to this? Right now, I have a warning admonition on the main docs page, which will get removed in later versions.
    • The other thing I was considering was an announcement banner, but that seems better for "time-limited" vs. "version-limited" info
    • The release notes for version 2.0 will include much of the info/description found at the top of this PR.
    • Anything else?
  • With @NickleDave on slack, I had discussed potentially doing a release with warnings to alert people to this coming. I think that would involve adding a FutureWarning to every object that moves, telling people its new location and then doing a new patch release (before merging this PR). I don't love that solution, since I'll pretty quickly be doing this release afterwards. I guess the responsible way to do this would be to do that patch release and then wait several months before making this change? I am inclined to just rip off the bandage and get this change over with (though I recognize this makes my life easier, rather than users'). Thoughts?
  • Is the migration guide clear enough? Is there anything else I should add / change there?
  • Because the old ways of calling objects could be pretty lengthy, formatting the table was a bit difficult. I ended up inserting a newline on a dot that's as close to 45 characters as possible (but not further). Does this look okay? Is there a better way to do it?
    • Any other feedback on this table? The goal is to allow users to search up the objects they call and quickly see how they should now be called.
  • Right now, the helper functions that live at the top-level are scattered across the API page, near other thematically relevant functions (e.g., set_seed is near the optimization-related functions). Is this reasonable or should I have a separate "Commonly used helper functions" section in the API docs?
  • Does my selection of top-level helper functions seem reasonable?
  • Should I set the __module__ attribute for my objects? Right now, if one looks at the string representation of an object, it reflects the file structure rather than the API structure (e.g., plenoptic._synthesize.metamer.Metamer vs. plenoptic.Metamer). Setting __module__ would fix this, but I don't know how common a practice this is. (I believe pynapple does this, but e.g., requests does not).
    • The additional wrinkle is that my synthesis object loading functionality checks the name of the object, including __module__, to ensure that users are not using the wrong object during load. This PR thus includes a work-around for this change, but plenoptic.Metamer will presumably be more stable than plenoptic._synthesize.metamer.Metamer.
    • So then, does it make sense to only set __module__ for the synthesis objects?
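As a point of reference, the `__module__` rewrite under discussion amounts to something like this sketch (a stand-in class, not plenoptic's real re-export machinery):

```python
# Illustrative only: a stand-in class playing the role of
# plenoptic._synthesize.metamer.Metamer, re-exported at the top level.
class Metamer:
    pass

# In the package __init__, rewriting __module__ on the public objects makes
# reprs follow the public API rather than the file layout.
for obj in (Metamer,):
    obj.__module__ = "plenoptic"

print(f"{Metamer.__module__}.{Metamer.__qualname__}")  # plenoptic.Metamer
```

The same rewrite is what makes `__module__`-based load checks track the (more stable) public name.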

Module names:

  • Metric is the only module where backwards compatibility has been preserved. By moving the synthesize/ directory to _synthesize/, I also broke a subtle partial compatibility: before this PR, if users had first called e.g., plenoptic.Metamer, they could subsequently call plenoptic.synthesize.metamer.Metamer, even though that path would not work directly after importing plenoptic. I removed it because that kind of order-dependent behavior could be a really tricky thing for folks to debug, if plenoptic.synthesize.metamer.Metamer worked in some places but not others. Is that reasonable?
  • The name model_components is awfully long, and makes the sidebar a bit difficult to parse. Is there a shorter version that has the same meaning? I could go with components, but I want it to be clear that it's components for models, not synthesis objects.
    • Also that module contains a lot of objects, but I think that's fine.
  • Right now, my metric components live under metric, which is inconsistent. I should probably move these to model_components, right? (They are currently used to construct metrics only, but could in principle be used in a model.)
  • data is a bit of a vague module name. It's not images because eventually I will add some example video and audio there. Should it have example in the name? Would that make sense for the fetch_data function?
  • Should I rename optim to optimization? That would get rid of the last abbreviation in my module names, but some of the optimization function names are quite long, e.g., portilla_simoncelli_loss_factory.

The plotting functions have had the most serious changes:

  • I made imshow, animshow, and pyrshow slightly more complicated to call (they now all live under plenoptic.plot rather than the top level.) I think that's okay (the consistency of having everything under plot is worth it), but just wanted to check in.
  • All of the synthesis object plotting functions have been moved from the synthesis object files (e.g., from plenoptic/synthesize/metamer.py to plenoptic/plot/metamer.py), have had their names changed, and some have had their accepted arguments changed (these are the only ones; every other object has simply moved). The synthesis object plotting functions are all named object_plotType (e.g., metamer_image), whereas the tensor plotting functions are all just named plotType (e.g., imshow).
    • Each synthesis object has an associated _image plotting function. Should they all be named imshow to be consistent with the imshow function?
    • Similarly, some have an _animate function, should they be named _animshow instead?
  • Metamer has a plotting function called metamer_representation_error, which uses plot.plot_representation. plot_representation plots the model output, while metamer_representation_error plots the difference in model outputs. Is this naming convention reasonably clear? (The metamer_representation_error docstring points out the relationship between the two functions)
  • plot.plot_representation is obviously redundant, but I did it because if a model has a plot_representation method, it will use that instead of our custom code (see the PortillaSimoncelli object for an example). For the plotting function plot.representation is clear enough, but for the model method model.plot_representation is pretty clear, whereas model.representation sounds like an attribute that returns a tensor rather than a plotting function. So is the plot.plot_representation redundancy to preserve naming symmetry okay, or should I give up on the symmetry to reduce redundancy? Or is there another solution I'm not seeing?
    • plot.update_plot has the same issue.
  • Is plot.clean_stem_plot a reasonable name? or should I call it plot.clean_stem? (In matplotlib, the function is just called stem)
  • While doing this, I realized that there's some duplication across synthesis object functions that I should remove. That will be a separate PR, with thoughts summarized in Refactor plotting functions #417. If you have feedback related to that process, put it in that issue.

Lazy loading:

  • I added lazy loading at every level of the hierarchy, not just the top, partly because I want to allow users to e.g., use only the synthesis objects and not the models, or vice versa. However, I believe the only import that takes a while is that of torch, which is imported in almost every file. There's no reason not to do this, right?

Describe changes

  • Restructures API, see descriptions above.
    • This did involve moving almost every file around, largely because of removing the tools namespace, splitting the simulate one, and hiding the synthesize one.
  • Updates all references to new API, standardizes them. This was all done using the migration script described on the new migration page in the docs.
  • Removes all from plenoptic... import ... statements from docs and tests (except for autodiff), since that's not the way we want people to interact with the library.
  • Adds lazy loading at every level using pyi stub files to allow for type-completion, etc.
  • Adds a new test (on CPU) that runs `EAGER_IMPORT=1 python -c "import plenoptic"` (i.e., with lazy loading disabled) to ensure that all the imports are properly structured (as recommended in SPEC-0001)
  • In migration guide, use a DataTable to summarize the changes, inspired by scikit-learn's api index page
  • API docs page:
    • Updated all generated pages to match the public API.
    • Small fixes to torch_module.rst.jinja template: reduce whitespace, bugfix to iterate through proper attributes variable.
    • Combine "Display" and "Synthesis_helper" sections into one "Display" section, with different subsections depending on whether the function accepts a synthesis object or tensor
    • Separate out remove_grad from other validation functions
  • Renames make_disk function to disk to match the other synthetic image functions (polar_radius, polar_angle)
  • Adds plenoptic/_api_change.py which contains dictionaries mapping between old and new API, used by the scripts in the migration guide. Users should not interact with this directly.
  • Small update to Synthesis.load to remap between old and new API for synthesis objects.
  • Moves example images out of data/__init__.py into data/images.py

Checklist

Affirm that you have done the following:

  • I have described the changes in this PR, following the template above.
  • I have added any necessary tests.
  • I have added any necessary documentation. This includes docstrings, updates to existing files found in docs/, or (for large changes) adding new files to the docs/ folder.
  • If a public new class or function was added: I have double-checked that it is present in the API docs, adding it to one of the rst files in docs/api/ or adding a new file as necessary.

@billbrod billbrod linked an issue Mar 2, 2026 that may be closed by this pull request
@billbrod billbrod requested a review from sjvenditto March 2, 2026 19:27

billbrod commented Mar 2, 2026

Documentation built by flatiron-jenkins at http://docs.plenoptic.org/docs//pulls/418


codecov bot commented Mar 2, 2026

Codecov Report

❌ Patch coverage is 84.87124% with 141 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/plenoptic/data/synthetic_images.py | 58.13% | 36 Missing ⚠️ |
| src/plenoptic/plot/mad_competition.py | 88.41% | 27 Missing ⚠️ |
| src/plenoptic/plot/metamer.py | 88.36% | 27 Missing ⚠️ |
| src/plenoptic/_synthesize/mad_competition.py | 94.79% | 9 Missing ⚠️ |
| src/plenoptic/_api_change.py | 0.00% | 5 Missing ⚠️ |
| src/plenoptic/__init__.py | 69.23% | 4 Missing ⚠️ |
| src/plenoptic/external.py | 0.00% | 4 Missing ⚠️ |
| src/plenoptic/metric/classes.py | 0.00% | 3 Missing ⚠️ |
| src/plenoptic/models/frontend.py | 60.00% | 2 Missing ⚠️ |
| src/plenoptic/_synthesize/autodiff.py | 66.66% | 1 Missing ⚠️ |

... and 23 more
| Files with missing lines | Coverage | Δ |
| --- | --- | --- |
| src/plenoptic/_synthesize/__init__.py | 100.00% <100.00%> | (ø) |
| src/plenoptic/data/__init__.py | 100.00% <100.00%> | (ø) |
| src/plenoptic/data/fetch.py | 96.00% <ø> | (ø) |
| src/plenoptic/metric/__init__.py | 100.00% <100.00%> | (ø) |
| src/plenoptic/model_components/__init__.py | 100.00% <100.00%> | (ø) |
| src/plenoptic/models/__init__.py | 100.00% <100.00%> | (ø) |
| src/plenoptic/plot/__init__.py | 100.00% <100.00%> | (ø) |
| src/plenoptic/_synthesize/autodiff.py | 90.69% <66.66%> | (ø) |
| src/plenoptic/_synthesize/eigendistortion.py | 98.78% <75.00%> | (ø) |
| src/plenoptic/_synthesize/metamer.py | 97.24% <85.71%> | (ø) |

... and 30 more

... and 1 file with indirect coverage changes


@BalzaniEdoardo
Contributor

Outstanding questions / particular feedback

Because this PR changes how all plenoptic objects are called, it touches just about everything in the code base. For reviewing purposes, I think the best pages to look at are the API docs and migration guide. And maybe the quickstart to see a brief example of what it looks like in practice.

* How should the docs site alert users to this? Right now, I have a warning admonition on the main docs page, which will get removed in later versions.
  
  * The other thing I was considering was an [announcement banner](https://pydata-sphinx-theme.readthedocs.io/en/stable/user_guide/announcements.html), but that seems better for "time-limited" vs. "version-limited" info
  * The release notes for version 2.0 will include much of the info/description found at the top of this PR.
  * Anything else?

I think the warning is the best you can do at the docs level. You could also open a GitHub Discussion about the migration and link to your guide from the project-level README.md.

* With @NickleDave on slack, I had discussed potentially doing a release with warnings to alert people to this coming. I think that would involve adding a `FutureWarning` to every object that moves, telling people its new location and then doing a new patch release (before merging this PR). I don't love that solution, since I'll pretty quickly be doing this release afterwards. I guess the responsible way to do this would be to do that patch release and then wait several months before making this change? I am inclined to just rip off the bandage and get this change over with (though I recognize this makes my life easier, rather than users'). Thoughts?

I honestly think it is not worth the effort, given that you'll need to delay the release.

* Is the migration guide clear enough? Is there any thing else I should add / change there?

Yes, but the migration script is a bit bare-bones. See my inline comment on the migration script for details on improving the user experience of the migration tooling.

* Because the old ways of calling objects could be pretty lengthy, formatting the table was a bit difficult. I ended up inserting a newline on a dot that's as close to 45 characters as possible (but not further). Does this look okay? Is there a better way to do it?

I think the newline looks fine.

  * Any other feedback on this table? The goal is to allow users to search up the objects they call and quickly see how they should now be called.

Swap the columns — the new path should come first. Even for users coming from 1.x, objects are recognizable by name, and the flatter new path is simply easier to parse at a quick glance.

* Right now, the helper functions that live at the top-level are scattered across the API page, near other thematically relevant functions (e.g., `set_seed` is near the optimization-related functions). Is this reasonable or should I have a separate "Commonly used helper functions" section in the API docs?

* Does my selection of top-level helper functions seem reasonable?

I would have a separate section for the top-level functions. For example, I would have assumed set_seed would be under optim, but it is not.

* Should I set the `__module__` attribute for my objects? Right now, if one looks at the string representation of an object, it reflects the file structure rather than the API structure (e.g., `plenoptic._synthesize.metamer.Metamer` vs. `plenoptic.Metamer`). Setting `__module__` would fix this, but I don't know common a practice this is. (I believe [pynapple](https://pynapple.org/) does this, but e.g., `requests` does not).
  
  * The additional wrinkle is that my synthesis object loading functionality checks the name of the object, including `__module__`, to ensure that users are not using the wrong object during load. This PR thus includes a work-around for this change, but `plenoptic.Metamer` will presumably be more stable than `plenoptic._synthesize.metamer.Metamer`.
  * So then, does it makes sense to only set `__module__` for the synthesis objects?

I think you should; I find it very confusing when one does not. Many packages do not show the actual path:

```python
In [4]: from numpy._core import float64

In [5]: float64
Out[5]: numpy.float64
```

@BalzaniEdoardo BalzaniEdoardo left a comment


I like the package organization; I think it makes sense and the guide is clear, but you could do more to help the transition. I left comments on that.

The other major point I had is that scattering the top-level functions across the API page is confusing. I think they should be grouped.

To use, copy the following code block into a python script called `plenoptic_rename_api.py` and run from within a virtual environment with `plenoptic>=2.0.0`, passing it whatever files you would like to change. For example: `python plenoptic_rename_api.py my_plenoptic_code.py` or `python plenoptic_rename_api.py my_project/*.py`.

```python
from plenoptic import _api_change
```


I think the script is too bare-bones and not especially user friendly (in particular the copy-paste aspect, and the instructions living in the docs with no comments/docstring in the actual script). It needs at least a script-level docstring with all the relevant info and examples. See below.

Suggested change (replace the bare import with a documented version):

```python
"""Migrate Python source files from the plenoptic 1.x API to plenoptic 2.0.

Rewrites all occurrences of old API names to their new equivalents, in-place,
for each file passed as a command-line argument. After rewriting, reports any
deprecated usages that could not be automatically resolved and must be updated
manually.

Usage
-----
python migrate_api.py file1.py file2.py ...

Module aliases
--------------
The script handles the standard module aliases used in plenoptic's tutorials
and examples::

    import plenoptic as po
    import plenoptic.synthesize as synth
    import plenoptic.simulate as simul

Non-standard aliases (e.g. ``import plenoptic as plen``) are not handled and
must be updated manually.

Exit behaviour
--------------
The script always rewrites files in-place. If deprecated usages are found
after rewriting, their names and the files containing them are printed to
stdout.
"""
from plenoptic import _api_change
```

A better solution, which would be super nice for users, is a small repo with a CLI tool that could live under the plenoptic.org organization. The README would provide the info, and the CLI could have a `--help` that documents parameters and usage:

```
uvx plenoptic-migrate my_script.py
```

The backup could be part of the options, or a required path to be extra safe:

```
plenoptic-migrate src/ --backup-dir .migration_backup
```
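Such a CLI could stay quite small. A rough sketch of one possible shape, with a hypothetical rename table (a real tool would read the old-to-new mapping that plenoptic ships, rather than hard-coding it):

```python
import argparse
import pathlib
import shutil

# Hypothetical old -> new rename table, for illustration only.
RENAMES = {
    "po.synthesize.Metamer": "po.Metamer",
    "po.simulate.PortillaSimoncelli": "po.models.PortillaSimoncelli",
}

def migrate_text(text, renames=RENAMES):
    # Apply longest keys first so one rename cannot clobber a prefix of another.
    for old in sorted(renames, key=len, reverse=True):
        text = text.replace(old, renames[old])
    return text

def main(argv=None):
    parser = argparse.ArgumentParser(prog="plenoptic-migrate")
    parser.add_argument("paths", nargs="+", type=pathlib.Path)
    parser.add_argument("--backup-dir", type=pathlib.Path, default=None)
    args = parser.parse_args(argv)
    for path in args.paths:
        if args.backup_dir is not None:
            args.backup_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, args.backup_dir / path.name)
        path.write_text(migrate_text(path.read_text()))
```

`main` would then be registered as the `plenoptic-migrate` console script in the tool's pyproject.toml.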


@sjvenditto sjvenditto left a comment


Comments on the Migration Guide:

  • Is it better to have the table at the top before "Plotting function changes"? This prioritizes visibility of the summary of changes.
  • I disagree with Edoardo and I think the column order should remain the same. I find it more natural for things to be ordered before --> after, and the goal is not to quickly search through available functions (which is what the API is for) but to quickly identify old vs. new ways of calling things.
  • Right now, the long names that have been broken into multiple lines are not searchable by their full path, since a space is inserted at the line break

Comments on the API:

  • I agree with Edoardo in having a separate section for top-level functions. I find it a little counterintuitive to have things lumped together when they're not being called in the same module. This applies as well to the "model and metric components": if you're going to group these metric functions with the model_components functions, then they should be in the same module.
  • Onto the naming of model_components: if you do group the metric components functions into this module, then the name will make even less sense. I think just components would be fine, since you have a disclaimer that they won't work with synthesis objects. However, is components a sufficient name to encompass the objects/methods in this module? Maybe it's more common in vision, but a lot of these functions I wouldn't consider "components" per se. E.g. while the stats methods autocorrelation, variance, skew, and kurtosis could be considered "components" or "attributes" of the input, it's not the first module I'd think of when looking for these functions. Would using a utils module be too generic?

Plot comments:

  • I think having consistency in the plot naming is a good idea, e.g. rename _image to _imshow and _animate to _animshow
  • I don't think it's too redundant to have plot in the function name if it's referring to a type of plot. matplotlib does it with plot_date, and has it in other plot types like eventplot (although I think representationplot is too long for a single word)
  • for clean_stem_plot, I would at least rename it to stem_plot, including the word "clean" makes me think that it clears the plot

Other comments:

  • I agree with Edoardo, I think using FutureWarning may be more trouble than it's worth
  • I also agree that you should set __module__. E.g. pandas does it for all their objects using a decorator
  • I think the data module name is fine. scipy uses optimize over optimization, but of course their module includes optimization functions, so I think the use case is different. I think having optim is ok for an abbreviation, but I don't feel strongly about this. (also, if you had a utils module, it wouldn't be the only abbreviation)



Development

Successfully merging this pull request may close these issues.

  • Improve API
  • Revisit project file tree structure

3 participants