[BREAKING CHANGE] API restructure, adds lazy loading#418
[BREAKING CHANGE] API restructure, adds lazy loading#418
Conversation
because I think that's clearer
had an extra trailing underscore for __all__ which meant that wasn't correct and then ruff was confused so it was deleting all the import statements as unused imports
to match new name of the function
to cleanly break old code. otherwise, `plenoptic.synthesize.Metamer` can be called after `plenoptic.Metamer` (but not before), and I cannot intercept it, making it likely a difficult problem for people to debug
to make it clearer that remove_grad is different from others
forgot to add autosummary directive
- torch_module.rst.jinja: remove unnecessary whitespace in attributes list in (which results in docutils complaining) - torch_module.rst.jinja: iterate through attributes, not doc_attributes (that was a holdover from previous version of file) - move if docobj_module.startswith check outside of the if obj_type statement to reduce duplication - change if docobj_module.startswith to use "plenoptic.model", reflecting the API restructure
|
Documentation built by flatiron-jenkins at http://docs.plenoptic.org/docs//pulls/418 |
I think the warning is the best you can do at the docs level. You can add a discussion in GitHub and mention to the migration linking at your guide in the project level README.md.
I honestly think it is not worth the effort, given that you'll need to delay the release.
Yes, but the migration script is a bit bare-bone. See my inline comment on the migration script for details on improving the user experience of the migration tooling.
I think the newline looks fine.
Swap the columns — the new path should come first. Even for users coming from 1.x, objects are recognizable by name, and the flatter new path is simply easier to parse at a quick glance.
I would have a separate level for the top-level function. For example, I would have assumed
I think you should, I find it very confusing when one does not. I think many packages do not show the actual path, In [4]: from numpy._core import float64
In [5]: float64
Out[5]: numpy.float64 |
BalzaniEdoardo
left a comment
There was a problem hiding this comment.
I like the package organization, I think it makes sense and the guide is clear but you can do more to help the transition. I left comments on that.
The other major point I had is that scattering first level functions is confusing. I think they should be grouped.
| To use, copy the following code block into a python script called `plenoptic_rename_api.py` and run from within a virtual environment with `plenoptic>=2.0.0`, passing it whatever files you would like to change. For example: `python plenoptic_rename_api.py my_plenoptic_code.py` or `python plenoptic_rename_api.py my_project/*.py`. | ||
|
|
||
| ``` | ||
| from plenoptic import _api_change |
There was a problem hiding this comment.
I think the script is too bare-bone and not super user friendly (especially the copy paste aspect of it and the instruction leaving in the docs but no comments/docstrings in the actual scripts). It needs at least a script level docstrings with all the relevant info and examples. See below.
| from plenoptic import _api_change | |
| """Migrate Python source files from the plenoptic 1.x API to plenoptic 2.0. | |
| Rewrites all occurrences of old API names to their new equivalents, in-place, | |
| for each file passed as a command-line argument. After rewriting, reports any | |
| deprecated usages that could not be automatically resolved and must be updated | |
| manually. | |
| Usage | |
| ----- | |
| python migrate_api.py file1.py file2.py ... | |
| Module aliases | |
| -------------- | |
| The script handles the standard module aliases used in plenoptic's tutorials | |
| and examples:: | |
| import plenoptic as po | |
| import plenoptic.synthesize as synth | |
| import plenoptic.simulate as simul | |
| Non-standard aliases (e.g. ``import plenoptic as plen``) are not handled and | |
| must be updated manually. | |
| Exit behaviour | |
| -------------- | |
| The script always rewrites files in-place. If deprecated usages are found | |
| after rewriting, their names and the files containing them are printed to | |
| stdout. | |
| """ | |
| from plenoptic import _api_change |
A better solution that would be super nice for the users is to have a small repo with a cli tool that can live under the organization plenoptic.org. The readme will provide the info, and the cli could have an help which documents parameters and usage.
uvx plenoptic-migrate my_script.py
The backup could be part of the options, or a required path to be extra safe.
plenoptic-migrate src/ --backup-dir .migration_backup
sjvenditto
left a comment
There was a problem hiding this comment.
Comments on the Migration Guide:
- Is it better to have the table at the top before "Plotting function changes"? This prioritizes visibility of the summary of changes.
- I disagree with Edoardo and I think the column order should remain the same. I find it more natural for things to be ordered before --> after, and the goal is not to quickly search through available functions (which is what the API is for) but to quickly identify old vs. new ways of calling things.
- Right now the long names that have been broken into multiple lines are not searchable by the entire path, as it puts a space at the line break
Comments on the API:
- I agree with Edoardo in having a separate section for top-level functions. I find it a little counterintuitive to have things lumped together when they're not being called in the same module. This applies as well to the "model and metric components": if you're going to group these
metricfunctions with themodel_componentsfunctions, then they should be in the same module. - Onto the naming of
model_components: if you do group the metric components functions into this module, then the name will make even less sense. I think justcomponentswould be fine, since you have a disclaimer that they won't work with synthesis objects. However, iscomponentsa sufficient name to encompass the objects/methods in this module? Maybe it's more common in vision, but a lot of these functions I wouldn't consider "components" per se. E.g. while the stats methodsautocorrelation,variance,skew, andkurtosiscould be considered "components" or "attributes" of the input, it's not the first module I'd think of when looking for these functions. Would using autilsmodule be too generic?
Plot comments:
- I think having consistency in the plot naming is a good idea, e.g. rename
_imageto_imshowand_animateto_animshow - I don't think it's too redundant to have
plotin the function name if it's referring to a type of plot.matplotlibdoes it withplot_date, and has it in other plot types likeeventplot(although I thinkrepresentationplotis too long for a single word) - for
clean_stem_plot, I would at least rename it tostem_plot, including the word "clean" makes me think that it clears the plot
Other comments:
- I agree with Edoardo, I think using
FutureWarningmay be more trouble than it's worth - I also agree that you should set
__module__. E.g.pandasdoes it for all their objects using a decorator - I think the
datamodule name is fine.scipyusesoptimizeoveroptimization, but of course their module includes optimization functions, so I think the use case is different. I think havingoptimis ok for an abbreviation, but I don't feel strongly about this. (also, if you had autilsmodule, it wouldn't be the only abbreviation)
Describe the change in this PR at a high-level
This PR makes two major changes requested by the pyOpenSci review: simplifies the API and adds lazy loading, following SPEC0001.
The API restructure attempts to follow the lessons outlined in these two blog posts, especially: flat is better than nested, avoid a tools/utils namespace, and file structure is an implementation detail.
The new API structure largely follows the proposal in #246 and the structure implied by the new API documentation page added in #386 and updated in #413. To summarize:
plenoptic.modelsplenoptic.metricplenoptic.model_components.plenoptic.metric, as before.plenoptic.plotplenoptic.dataplenoptic.validateplenoptic.optim.penalty_function#411, there will be a newregularizationmodule at this level which includes the regularization functions (currently, justpenalize_range, but others will be added)plenoptic.ioandplenoptic.external) which each include single function, though it is possible additional ones will be added. Neither are particularly important for most users (the one underexternalis used to generate a figure in the docs, and the one underiois largely meant to help when debugging loading issues)Additionally:
__all__and__dir__correctly, there's now only one way to call every object.plenoptic.metricAttributeErrorat the top level, providing more info about where the object they tried to call (probably) lives and pointing them to the migration guide. This happens if a user callsplenoptic.synth,plenoptic.synthesize,plenoptic.simul,plenoptic.simulate,plenoptic.tools,plenoptic.imshow,plenoptic.animshow, orplenoptic.pyrshow.The new migration guide in the docs has a table showing all the old and new ways of calling objects.
This PR will not merged until after #411 and a version 1.4.0 release.
Link any related issues, discussions, PRs
Closes: #246, #128, #101, #97. We've been wanting to do this for a long time :)
Outstanding questions / particular feedback
Because this PR changes how all plenoptic objects are called, it touches just about everything in the code base. For reviewing purposes, I think the best pages to look at are the API docs and migration guide. And maybe the quickstart to see a brief example of what it looks like in practice.
FutureWarningto every object that moves, telling people its new location and then doing a new patch release (before merging this PR). I don't love that solution, since I'll pretty quickly be doing this release afterwards. I guess the responsible way to do this would be to do that patch release and then wait several months before making this change? I am inclined to just rip off the bandage and get this change over with (though I recognize this makes my life easier, rather than users'). Thoughts?set_seedis near the optimization-related functions). Is this reasonable or should I have a separate "Commonly used helper functions" section in the API docs?__module__attribute for my objects? Right now, if one looks at the string representation of an object, it reflects the file structure rather than the API structure (e.g.,plenoptic._synthesize.metamer.Metamervs.plenoptic.Metamer). Setting__module__would fix this, but I don't know common a practice this is. (I believe pynapple does this, but e.g.,requestsdoes not).__module__, to ensure that users are not using the wrong object during load. This PR thus includes a work-around for this change, butplenoptic.Metamerwill presumably be more stable thanplenoptic._synthesize.metamer.Metamer.__module__for the synthesis objects?Module names:
synthesize/directory to_synthesize/, I broke a possible backwards compatibility: if users had first called e.g.,plenoptic.Metamer, they would then be able to subsequently callplenoptic.synthesize.metamer.Metamer, though they would not be able to call it directly after importing plenoptic (which was a possible way to callMetamerbefore this PR, even if it was not the recommended or preferred way.) I did that because I thought this could be a really tricky thing for folks to debug, ifplenoptic.synthesize.metamer.Metamerworked in some places but not others. Is that reasonable?model_componentsis awfully long, and makes the sidebar a bit difficult to parse. Is there a shorter version that has the same meaning? I could go withcomponents, but I want it to be clear that it's components for models, not synthesis objects.metric, which is inconsistent. I should probably move these tomodel_components, right? (They are currently used to construct metrics only, but could in principle be used in a model.)datais a bit of a vague module name. It's notimagesbecause eventually I will add example some video and audio there. Should it haveexamplein the name? Would that make sense for thefetch_datafunction?optimtooptimization? That would get rid of the last abbreviation in my module names, but some of the optimization function names are quite long, e.g.,portilla_simoncelli_loss_factory.The plotting functions have had the most serious changes:
imshow,animshow, andpyrshowslightly more complicated to call (they now all live underplenoptic.plotrather than the top level.) I think that's okay (the consistency of having everything underplotis worth it), but just wanted to check in.plenoptic/synthesize/metamer.pytoplenoptic/plot/metamer.py), have had their names change, and some have had acceptable arguments change (these are the only ones, every other object has simply moved). The synthesis object plotting functions are all namedobject_plotType(e.g.,metamer_image) whereas the tensor plotting functions are all just namedplotType(e.g.,imshow)._imageplotting function. Should they all be namedimshowto be consistent with theimshowfunction?_animatefunction, should they be named_animshowinstead?metamer_representation_error, which usesplot.plot_representation.plot_representationplots the model output, whilemetamer_representation_errorplots the difference in model outputs. Is this naming convention reasonably clear? (Themetamer_representation_errordocstring points out the relationship between the two functions)plot.plot_representationis obviously redundant, but I did it because if a model has aplot_representationmethod, it will use that instead of our custom code (see thePortillaSimoncelliobject for an example). For the plotting functionplot.representationis clear enough, but for the model methodmodel.plot_representationis pretty clear, whereasmodel.representationsounds like an attribute that returns a tensor rather than a plotting function. So is theplot.plot_representationredundancy to preserve naming symmetry okay, or should I give up on the symmetry to reduce redundancy? Or is there another solution I'm not seeing?plot.update_plothas the same issue.plot.clean_stem_plota reasonable name? or should I call itplot.clean_stem? (In matplotlib, the function is just calledstem)Lazy loading:
torch, which is imported in almost every file. There's no reason not to do this right?Describe changes
from plenoptic... import ...statements from docs and tests (except forautodiff), since that's not the way we want people to interact with the library.pyistub files to allow for type-completion, etc.EAGER IMPORT python -c "import plenoptic"to ensure that all the imports are properly structured (as recommended in SPEC-0001)torch_module.rst.jinjatemplate: reduce whitespace, bugfix to iterate through proper attributes variable.remove_gradfrom other validation functionsmake_diskfunction todiskto match the other synthetic image functions (polar_radius,polar_angle)plenoptic/_api_change.pywhich contains dictionaries mapping between old and new API, used by the scripts in the migration guide. Users should not interact with this directly.Synthesis.loadto remap between old and new API for synthesis objects.data/__init__.pyintodata/images.pyChecklist
Affirm that you have done the following:
docs/, or (for large changes) adding new files to thedocs/folder.rstfiles indocs/api/or adding a new file as necessary.