-
Notifications
You must be signed in to change notification settings - Fork 166
feat: introduce (experimental) plugin system #2978
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@ym-pett did you know about Marco's PR? Apologies if you've discussed this privately 😅 |
thanks for starting this! 😄 @dangotbanned yeah we'd discussed this My hope is that we can use Daft as the way to develop the plugin system, and it can serve as a reference implementation Here's what we're aiming for: If a user has narwhals-daft installed, then they should be able to run import narwhals as nw
import daft
df_native = daft.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]})
df = nw.from_native(df_compliant)
result = df.select("a", nw.col("b") * nw.col("a"))
print(result.collect()) This needs to be done in a way that won't be specific to Daft, so that anyone can register their own plugin without Narwhals having any knowledge about it. In this PR it's currently all Daft-specific The packaging docs around entry-points might be useful here:
I'll also cc @camriddell into the conversation, as IIRC he'd also thought about pluggable backends For prior art on plugins and entry-points, I think https://github.com/PyCQA/flake8 might also be good to look at |
thanks both, I'll revert the current changes - I feel like I had to go down the wrong route first to see what this actually consists of! :) I can now go through the materials armed with more background! 🦾 |
3c8b34b
to
11fe33f
Compare
oops, didn't mean to close this! will reopen when I push new content! |
based on flake8 example - will try to get something more sensible in next |
1ac87eb
to
cfd156f
Compare
b004d4e
to
e3a2f3b
Compare
narwhals/translate.py
Outdated
for plugin in discovered_plugins: | ||
obj = plugin.load() | ||
frame = obj.dataframe.DaftLazyFrame | ||
|
||
# from obj.dataframe import DaftLazyFrame | ||
try: | ||
df_compliant = frame(native_object, version=Version.MAIN) | ||
return df_compliant.to_narwhals() | ||
except: | ||
# try the next plugin | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooh, nice! does this work?
to make it not daft-specific, perhaps we could aim to have something like
try:
df_compliant = obj.from_native(native_object, version=Version.MAIN)
?
This would mean making a top-level function in narwhals-daft too, and then we document that plugin authors are expected to implement this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a feeling I'm doing something weird with how things are imported. Maybe it's that the top-level __init__
file needs altering in daft-narwhals.
with your suggestion, t.py no longer works and I get the error
TypeError: Expected pandas-like dataframe, Polars dataframe, or Polars lazyframe, got: <class 'daft.dataframe.dataframe.DataFrame'>
that dataframe.dataframe looks weird to me... would you expect that structure?
the type of plugin
is <class 'importlib.metadata.EntryPoint'>
, so I figured I had to load the module via that, for obj I then get the type <class 'module'>
the only way I could get access to the DaftLazyFrame (haven't tried simple LazyFrame yet) was by assigning it to a variable name, I couldn't do something like
from obj.dataframe import DaftLazyFrame
(the error then is ModuleNotFoundError: No module named 'obj'
)
I think at the moment this all leaves us too bound to daft, and I bet I'm breaking a million coding conventions, eek!
I suspect I need to do a better job at exposing the modules within daft-nw but I haven't found how yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps in narwhals_daft/__init__.py
you could make a from_native
function, and then here use obj.from_native
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, will try that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should plugin detection come before we check our own written backends? That way if someone really wanted to override our pandas
backend they would be empowered to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 yeah maybe
d72da25
to
ca01346
Compare
319b3d6
to
90ad973
Compare
5e3be1f
to
76232df
Compare
495d727
to
ebc1a8f
Compare
hmm about that failing ruff test in pre-commit, I left this comment on the offending files but it seems to have disappeared (at least I can't find it anymore): mypy was complaining about missing library stubs for these ("Skipping analyzing X: module is installed, but missing library stubs or py.typed marker"), it helpfully guided me to the documentation & I've chosen the simplest fix from there (https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports) happy to change for a better one if you think this should be done! |
now mypy is quiet, ruff doesn't like it (didn't get complaints locally though?) what do you think is the best course of action? |
Based on pre-commit run - I'd suggest removing the If those imports require an ignore comment - then I think there's either a config issue or something wrong elsewhere Update 1I removed all the ignores locally and got: Show logs
tests/test_plugin/test_plugin/namespace.py:5: error: Cannot find implementation or library stub for module named "test_plugin.dataframe" [import-not-found]
from test_plugin.dataframe import DictFrame, DictLazyFrame
^
tests/test_plugin/test_plugin/__init__.py:8: error: Cannot find implementation or library stub for module named "test_plugin.dataframe" [import-not-found]
from test_plugin.dataframe import DictFrame
^
tests/test_plugin/test_plugin/__init__.py:8: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
tests/test_plugin/test_plugin/__init__.py:15: error: Cannot find implementation or library stub for module named "test_plugin.namespace" [import-not-found]
from test_plugin.namespace import DictNamespace
^
tests/test_plugin/test_plugin/__init__.py:17: error: Returning Any from function declared to return "DictNamespace" [no-any-return]
return DictNamespace(version=version)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Found 4 errors in 2 files (checked 434 source files) I think the issue is config-related, gonna take a closer look now Update 2Should be fixed in these two (6c65fd4), (b26d7d4) I guess the tricky part is that this test package is defined within a subdirectory of |
@pytest.mark.skipif(sys.version_info < (3, 10), reason="3.10+ required for entrypoints") | ||
def test_plugin() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This skip might be the reason behind the KeyError
here (#2978 (comment))
Either way, we should be able to test < (3, 10)
now - so removing it would be the first step
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you still want this removing or is this obsolete?
from typing_extensions import TypeIs | ||
|
||
if TYPE_CHECKING: | ||
from narwhals.utils import Version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from typing_extensions import TypeIs | |
if TYPE_CHECKING: | |
from narwhals.utils import Version | |
if TYPE_CHECKING: | |
from typing_extensions import TypeIs | |
from narwhals.utils import Version |
@@ -0,0 +1,77 @@ | |||
from __future__ import annotations | |||
|
|||
from typing import TYPE_CHECKING, Any, TypeAlias |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcoGorelli I think ruff
isn't configured correctly 🤔
TypeAlias
wasn't added until 3.10
narwhals/_utils.py
Outdated
@cache | ||
def discover_plugins() -> EntryPoints: | ||
from importlib.metadata import entry_points | ||
import sys | ||
from importlib.metadata import entry_points as eps | ||
|
||
return entry_points(group="narwhals.plugins") | ||
group = "narwhals.plugins" | ||
plugins = eps(group=group) if sys.version_info >= (3, 10) else eps()[group] | ||
return cast("EntryPoints", plugins) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
big thanks for this!!
I can see it still returns the same type of entrypoints iterable. I'll be honest & say that I don't fully get the return cast("EntryPoints", eps().get(group, ()))
- I'll invoke clever typing-magic to appease the typing gods.
narwhals/translate.py
Outdated
return Version.V1.dataframe(InterchangeFrame(native_object), level="interchange") | ||
|
||
for plugin in discover_plugins(): | ||
obj = plugin.load() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree re obj
, but module
makes me think of narwhals modules. how about loaded_plugin
?
narwhals/translate.py
Outdated
return Version.V1.dataframe(InterchangeFrame(native_object), level="interchange") | ||
|
||
for plugin in discover_plugins(): | ||
obj = plugin.load() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the naming here to the suggestion.
@ym-pett it looks like you responded to me a few times over the past couple weeks and it only just came through? 😭 |
😭 😭 😭 I'm a dork, but I'm not the only one (is that a Radiohead song yet?): https://github.com/orgs/community/discussions/10369 So sorry Dan, I'll look out for that trap in future.. |
thanks so much for fixing the type problem @dangotbanned ( #2978 (comment) )!! I'll move to implementing the changes you discussed with Marco, will ping you if I get stuck :) |
return eps(group=group) | ||
|
||
|
||
# @mp: should the protocol be defined in namespace? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dangotbanned, pushed this as it's working, but still needs some cleaning up (e.g I've left your naming so you can see which solution we've implemented. I'm guessing the Plugin Protocol shouldn't be in here, I'll look back on your PR on this topic I reviewed and have a think about where it should go instead.
btw, are you happy with us using this one, or should we use the one you say is even simpler? (I worked with the one I could understand better, but that shouldn't be a barrier ;))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, re-read your #3086, until it can live in your _native
, should I move it into _namespace
? (forgot _
above)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing the Plugin Protocol shouldn't be in here, I'll look back on your PR on this topic I reviewed and have a think about where it should go instead.
Ok, re-read your #3086, until it can live in your
_native
, should I move it into_namespace
? (forgot_
above)
Have re-read this & thinking that the plugin Protocol that's currently in draft version in #2978 should also go into the lower parts of this file when it's merged? ('it' =
narwhals/_native.py
)
Background
In narwhals
, we've got three main conceptual layers:
Skip me if you don't need a refresher 😉
(1) Narwhals
In the top-level modules, we define things like:
nw.DataFrame
nw.LazyFrame
nw.Series
They're directly user-facing
(2) Compliant
Under the nw._compliant
sub-package, we define things like:
CompliantDataFrame
CompliantLazyFrame
CompliantSeries
They're user-adjacent, some appear in nw.typing
and annotations but not to the same degree as Narwhals.
(3) Native
Following #3035, our typing for this level will be in nw._native.py
:
NativeDataFrame
NativeLazyFrame
NativeSeries
But we are mostly talking about concrete external types like:
pandas.DataFrame
polars.LazyFrame
pyarrow.ChunkedArray
What's that got to do with plugins?
If somebody wants to extend narwhals
; they're going to be building a new set of Compliant classes.
We can use them from Narwhals to talk to this new Native stuff.
Soooooo, I would say that a plugin is most closely related to the Compliant-level.
Suggestion
Rather than adding to nw._native
, I think a better fit would be either:
- Adding a new module
nw._compliant.plugins.py
- or just
nw.plugins.py
- Keeping it in
nw._utils.py
for now- until we eventually split that module up better 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just left a comment
if _is_native_plugin(native_object, plugin): | ||
compliant_namespace = plugin.__narwhals_namespace__(version=version) | ||
compliant_object = compliant_namespace.from_native(native_object) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the issue is that here, if we have narwhals-daft
(for daft dataframes) and test-plugin
(for builtins.dict), then, narwhals will actually try using the test-plugin for a daft.dataframe, and it won't raise at these steps (from_native
doesn't raise), as from_native
typically doesn't do validation
So, steps would be:
- add
is_native
to thePlugin
protocol - make sure that
narwhals-daft
andtest-plugin
both haveis_native
functions at the top-level - change
_is_native_plugin
to do:
def _is_native_plugin(native_object: Any, plugin: Plugin2) -> bool:
pkg = plugin.NATIVE_PACKAGE
return sys.modules.get(pkg) is not None and _might_be(type(native_object), pkg) and pkg.is_native(native_object)
Then, I think we should be good to go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed those changes and the corresponding change in narwhals-daft is in a PR here : MarcoGorelli/narwhals-daft#15
oh dear, just seen test failures, will investigate (🤦 I only ran tests for the plugin locally..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
altair and shiny should be unrelated
the coverage one complains about missing coverage - we only check coverage with python3.11, so the check for sys.version < 3,10 should get a # pragma: no cover
on the line of the if-statement
the typing failure shows
narwhals/_utils.py:2104: error: Argument 1 to "__call__" of
"_lru_cache_wrapper" has incompatible type "type[Any]"; expected "Hashable"
[arg-type]
and _might_be(type(native_object), pkg)
, as @dangotbanned suggested the function let's see if he has a preferred solution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks both for the ping and trying those ideas out 🙂
btw, are you happy with us using this one, or should we use the one you say is even simpler?
Since the first one in (#2978 (comment)) didn't pan out - I'll add a PR today with the changes to the *Namespace
protocols to make the other one easier
The nice thing about it is that if an extension inherits from LazyNamespace
or EagerNamespace
- then they won't need to implement any of this in their plugin 😄
Update
Most of the new code is just tests 🥳
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great, thanks for that @dangotbanned ! Do I understand correctly that your changes in #3130 will fix the typing error once your PR has been merged? So I don't need to add a pragma: no cover
for the time being (referring to #2978 (comment))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the changes I've made will remove the need for that code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
@dangotbanned I did a small refactor of the current PR but I didn't want to ask to merge into main, instead just this branch. Didn't manage to do that on narwhals, only on my forked repo, which is probably wrong 🤦 hence I've left it as a draft PR here: ym-pett#1 |
I used it here (#2978 (comment)) ... but forgot by the next day 🤦♂️ (#2978 (comment))
What type of PR is this? (check all applicable)
Related issues
plugins
uno reverse card ym-pett/narwhals#2Checklist
If you have comments or can explain your changes, please do so below
started trying to adapt
nw
so that it can read daft dataframes, but I might be barking up the wrong tree: I installed daft in the repo so that it would be available to nw. I have a feeling we want to avoid that, and I'm working as if I were adding an actual_daft
module rather than prep for a plugin.note to self: if installing daft was correct, need to add this to an install file, think it's pyproject.toml