Skip to content

Conversation

ym-pett
Copy link
Contributor

@ym-pett ym-pett commented Aug 12, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

started trying to adapt nw so that it can read daft dataframes, but I might be barking up the wrong tree: I installed daft in the repo so that it would be available to nw. I have a feeling we want to avoid that, and I'm working as if I were adding an actual _daft module rather than prep for a plugin.

note to self: if installing daft was correct, need to add this to an install file, think it's pyproject.toml

@dangotbanned
Copy link
Member

@ym-pett did you know about Marco's PR?

Apologies if you've discussed this privately 😅

@MarcoGorelli
Copy link
Member

MarcoGorelli commented Aug 12, 2025

thanks for starting this!

😄 @dangotbanned yeah we'd discussed this

My hope is that we can use Daft as the way to develop the plugin system, and it can serve as a reference implementation

Here's what we're aiming for:

If a user has narwhals-daft installed, then they should be able to run

import narwhals as nw
import daft

df_native = daft.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]})

df = nw.from_native(df_compliant)
result = df.select("a", nw.col("b") * nw.col("a"))
print(result.collect())

This needs to be done in a way that won't be specific to Daft, so that anyone can register their own plugin without Narwhals having any knowledge about it. In this PR it's currently all Daft-specific

The packaging docs around entry-points might be useful here:

I'll also cc @camriddell into the conversation, as IIRC he'd also thought about pluggable backends


For prior art on plugins and entry-points, I think https://github.com/PyCQA/flake8 might also be good to look at

@ym-pett
Copy link
Contributor Author

ym-pett commented Aug 13, 2025

thanks both, I'll revert the current changes - I feel like I had to go down the wrong route first to see what this actually consists of! :)

I can now go through the materials armed with more background! 🦾

@ym-pett ym-pett closed this Aug 13, 2025
@ym-pett ym-pett force-pushed the create_fromnative_daft branch from 3c8b34b to 11fe33f Compare August 13, 2025 10:32
@ym-pett
Copy link
Contributor Author

ym-pett commented Aug 13, 2025

oops, didn't mean to close this! will reopen when I push new content!

@ym-pett ym-pett reopened this Aug 13, 2025
@ym-pett
Copy link
Contributor Author

ym-pett commented Aug 13, 2025

based on flake8 example - will try to get something more sensible in next

@ym-pett ym-pett force-pushed the create_fromnative_daft branch from 1ac87eb to cfd156f Compare August 13, 2025 16:28
@ym-pett ym-pett force-pushed the create_fromnative_daft branch from b004d4e to e3a2f3b Compare August 14, 2025 13:34
Comment on lines 541 to 553
for plugin in discovered_plugins:
obj = plugin.load()
frame = obj.dataframe.DaftLazyFrame

# from obj.dataframe import DaftLazyFrame
try:
df_compliant = frame(native_object, version=Version.MAIN)
return df_compliant.to_narwhals()
except:
# try the next plugin
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh, nice! does this work?

to make it not daft-specific, perhaps we could aim to have something like

try:
    df_compliant = obj.from_native(native_object, version=Version.MAIN)

?

This would mean making a top-level function in narwhals-daft too, and then we document that plugin authors are expected to implement this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a feeling I'm doing something weird with how things are imported. Maybe it's that the top-level __init__ file needs altering in daft-narwhals.

with your suggestion, t.py no longer works and I get the error

TypeError: Expected pandas-like dataframe, Polars dataframe, or Polars lazyframe, got: <class 'daft.dataframe.dataframe.DataFrame'>

that dataframe.dataframe looks weird to me... would you expect that structure?

the type of plugin is <class 'importlib.metadata.EntryPoint'>, so I figured I had to load the module via that, for obj I then get the type <class 'module'>

the only way I could get access to the DaftLazyFrame (haven't tried simple LazyFrame yet) was by assigning it to a variable name, I couldn't do something like

from obj.dataframe import DaftLazyFrame (the error then is ModuleNotFoundError: No module named 'obj')

I think at the moment this all leaves us too bound to daft, and I bet I'm breaking a million coding conventions, eek!

I suspect I need to do a better job at exposing the modules within daft-nw but I haven't found how yet

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps in narwhals_daft/__init__.py you could make a from_native function, and then here use obj.from_native?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, will try that!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should plugin detection come before we check our own written backends? That way if someone really wanted to override our pandas backend they would be empowered to?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 yeah maybe

@ym-pett ym-pett force-pushed the create_fromnative_daft branch from d72da25 to ca01346 Compare August 15, 2025 08:49
@ym-pett ym-pett force-pushed the create_fromnative_daft branch from 319b3d6 to 90ad973 Compare August 16, 2025 11:44
@ym-pett ym-pett force-pushed the create_fromnative_daft branch from 5e3be1f to 76232df Compare August 16, 2025 14:15
@ym-pett ym-pett force-pushed the create_fromnative_daft branch from 495d727 to ebc1a8f Compare August 16, 2025 14:26
@ym-pett
Copy link
Contributor Author

ym-pett commented Sep 9, 2025

hmm about that failing ruff test in pre-commit, I left this comment on the offending files but it seems to have disappeared (at least I can't find it anymore):

mypy was complaining about missing library stubs for these ("Skipping analyzing X: module is installed, but missing library stubs or py.typed marker"), it helpfully guided me to the documentation & I've chosen the simplest fix from there (https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports)

happy to change for a better one if you think this should be done!

@ym-pett
Copy link
Contributor Author

ym-pett commented Sep 9, 2025

now mypy is quiet, ruff doesn't like it (didn't get complaints locally though?)

what do you think is the best course of action?

@dangotbanned
Copy link
Member

dangotbanned commented Sep 9, 2025

now mypy is quiet, ruff doesn't like it (didn't get complaints locally though?)

what do you think is the best course of action?

Based on pre-commit run - I'd suggest removing the #type: ignores so we can see what the complaint is 🙂

If those imports require an ignore comment - then I think there's either a config issue or something wrong elsewhere

Update 1

I removed all the ignores locally and got:

Show logs

tests/test_plugin/test_plugin/namespace.py:5: error: Cannot find implementation or library stub for module named "test_plugin.dataframe"  [import-not-found]
    from test_plugin.dataframe import DictFrame, DictLazyFrame
    ^
tests/test_plugin/test_plugin/__init__.py:8: error: Cannot find implementation or library stub for module named "test_plugin.dataframe"  [import-not-found]
        from test_plugin.dataframe import DictFrame
    ^
tests/test_plugin/test_plugin/__init__.py:8: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
tests/test_plugin/test_plugin/__init__.py:15: error: Cannot find implementation or library stub for module named "test_plugin.namespace"  [import-not-found]
        from test_plugin.namespace import DictNamespace
    ^
tests/test_plugin/test_plugin/__init__.py:17: error: Returning Any from function declared to return "DictNamespace"  [no-any-return]
        return DictNamespace(version=version)
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Found 4 errors in 2 files (checked 434 source files)

I think the issue is config-related, gonna take a closer look now

Update 2

Should be fixed in these two (6c65fd4), (b26d7d4)

I guess the tricky part is that this test package is defined within a subdirectory of tests 😄

Comment on lines +10 to +11
@pytest.mark.skipif(sys.version_info < (3, 10), reason="3.10+ required for entrypoints")
def test_plugin() -> None:
Copy link
Member

@dangotbanned dangotbanned Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This skip might be the reason behind the KeyError here (#2978 (comment))

Either way, we should be able to test < (3, 10) now - so removing it would be the first step

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you still want this removing or is this obsolete?

Comment on lines 5 to 8
from typing_extensions import TypeIs

if TYPE_CHECKING:
from narwhals.utils import Version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from typing_extensions import TypeIs
if TYPE_CHECKING:
from narwhals.utils import Version
if TYPE_CHECKING:
from typing_extensions import TypeIs
from narwhals.utils import Version

@@ -0,0 +1,77 @@
from __future__ import annotations

from typing import TYPE_CHECKING, Any, TypeAlias
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcoGorelli I think ruff isn't configured correctly 🤔

TypeAlias wasn't added until 3.10

Comment on lines 2066 to 2073
@cache
def discover_plugins() -> EntryPoints:
from importlib.metadata import entry_points
import sys
from importlib.metadata import entry_points as eps

return entry_points(group="narwhals.plugins")
group = "narwhals.plugins"
plugins = eps(group=group) if sys.version_info >= (3, 10) else eps()[group]
return cast("EntryPoints", plugins)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

big thanks for this!!

I can see it still returns the same type of entrypoints iterable. I'll be honest & say that I don't fully get the return cast("EntryPoints", eps().get(group, ())) - I'll invoke clever typing-magic to appease the typing gods.

return Version.V1.dataframe(InterchangeFrame(native_object), level="interchange")

for plugin in discover_plugins():
obj = plugin.load()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree re obj, but module makes me think of narwhals modules. how about loaded_plugin?

return Version.V1.dataframe(InterchangeFrame(native_object), level="interchange")

for plugin in discover_plugins():
obj = plugin.load()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the naming here to the suggestion.

@dangotbanned
Copy link
Member

@ym-pett it looks like you responded to me a few times over the past couple weeks and it only just came through? 😭

#2978 (review)

@ym-pett
Copy link
Contributor Author

ym-pett commented Sep 10, 2025

@ym-pett it looks like you responded to me a few times over the past couple weeks and it only just came through? 😭

#2978 (review)

😭 😭 😭 I'm a dork, but I'm not the only one (is that a Radiohead song yet?): https://github.com/orgs/community/discussions/10369

So sorry Dan, I'll look out for that trap in future..

@ym-pett
Copy link
Contributor Author

ym-pett commented Sep 10, 2025

thanks so much for fixing the type problem @dangotbanned ( #2978 (comment) )!!

I'll move to implementing the changes you discussed with Marco, will ping you if I get stuck :)

return eps(group=group)


# @mp: should the protocol be defined in namespace?
Copy link
Contributor Author

@ym-pett ym-pett Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dangotbanned, pushed this as it's working, but still needs some cleaning up (e.g I've left your naming so you can see which solution we've implemented. I'm guessing the Plugin Protocol shouldn't be in here, I'll look back on your PR on this topic I reviewed and have a think about where it should go instead.

btw, are you happy with us using this one, or should we use the one you say is even simpler? (I worked with the one I could understand better, but that shouldn't be a barrier ;))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, re-read your #3086, until it can live in your _native, should I move it into _namespace? (forgot _ above)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing the Plugin Protocol shouldn't be in here, I'll look back on your PR on this topic I reviewed and have a think about where it should go instead.

Ok, re-read your #3086, until it can live in your _native, should I move it into _namespace? (forgot _ above)

#3086 (comment)

Have re-read this & thinking that the plugin Protocol that's currently in draft version in #2978 should also go into the lower parts of this file when it's merged? ('it' = narwhals/_native.py)

Background

In narwhals, we've got three main conceptual layers:

Skip me if you don't need a refresher 😉

(1) Narwhals

In the top-level modules, we define things like:

  • nw.DataFrame
  • nw.LazyFrame
  • nw.Series

They're directly user-facing

(2) Compliant

Under the nw._compliant sub-package, we define things like:

  • CompliantDataFrame
  • CompliantLazyFrame
  • CompliantSeries

They're user-adjacent, some appear in nw.typing and annotations but not to the same degree as Narwhals.

(3) Native

Following #3035, our typing for this level will be in nw._native.py:

  • NativeDataFrame
  • NativeLazyFrame
  • NativeSeries

But we are mostly talking about concrete external types like:

  • pandas.DataFrame
  • polars.LazyFrame
  • pyarrow.ChunkedArray

What's that got to do with plugins?

If somebody wants to extend narwhals; they're going to be building a new set of Compliant classes.
We can use them from Narwhals to talk to this new Native stuff.

Soooooo, I would say that a plugin is most closely related to the Compliant-level.

Suggestion

Rather than adding to nw._native, I think a better fit would be either:

  • Adding a new module
    • nw._compliant.plugins.py
    • or just nw.plugins.py
  • Keeping it in nw._utils.py for now
    • until we eventually split that module up better 😄

@ym-pett ym-pett mentioned this pull request Sep 11, 2025
14 tasks
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just left a comment

Comment on lines +570 to +572
if _is_native_plugin(native_object, plugin):
compliant_namespace = plugin.__narwhals_namespace__(version=version)
compliant_object = compliant_namespace.from_native(native_object)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the issue is that here, if we have narwhals-daft (for daft dataframes) and test-plugin (for builtins.dict), then, narwhals will actually try using the test-plugin for a daft.dataframe, and it won't raise at these steps (from_native doesn't raise), as from_native typically doesn't do validation

So, steps would be:

  1. add is_native to the Plugin protocol
  2. make sure that narwhals-daft and test-plugin both have is_native functions at the top-level
  3. change _is_native_plugin to do:
def _is_native_plugin(native_object: Any, plugin: Plugin2) -> bool:
    pkg = plugin.NATIVE_PACKAGE
    return sys.modules.get(pkg) is not None and _might_be(type(native_object), pkg) and pkg.is_native(native_object)

Then, I think we should be good to go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed those changes and the corresponding change in narwhals-daft is in a PR here : MarcoGorelli/narwhals-daft#15

oh dear, just seen test failures, will investigate (🤦 I only ran tests for the plugin locally..)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

altair and shiny should be unrelated

the coverage one complains about missing coverage - we only check coverage with python3.11, so the check for sys.version < 3,10 should get a # pragma: no cover on the line of the if-statement

the typing failure shows

narwhals/_utils.py:2104: error: Argument 1 to "__call__" of
"_lru_cache_wrapper" has incompatible type "type[Any]"; expected "Hashable" 
[arg-type]
            and _might_be(type(native_object), pkg)

, as @dangotbanned suggested the function let's see if he has a preferred solution

Copy link
Member

@dangotbanned dangotbanned Sep 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks both for the ping and trying those ideas out 🙂

btw, are you happy with us using this one, or should we use the one you say is even simpler?

Since the first one in (#2978 (comment)) didn't pan out - I'll add a PR today with the changes to the *Namespace protocols to make the other one easier

The nice thing about it is that if an extension inherits from LazyNamespace or EagerNamespace - then they won't need to implement any of this in their plugin 😄

Update

Most of the new code is just tests 🥳

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, thanks for that @dangotbanned ! Do I understand correctly that your changes in #3130 will fix the typing error once your PR has been merged? So I don't need to add a pragma: no cover for the time being (referring to #2978 (comment))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the changes I've made will remove the need for that code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

dangotbanned added a commit that referenced this pull request Sep 13, 2025
@ym-pett
Copy link
Contributor Author

ym-pett commented Sep 15, 2025

@dangotbanned I did a small refactor of the current PR but I didn't want to ask to merge into main, instead just this branch. Didn't manage to do that on narwhals, only on my forked repo, which is probably wrong 🤦

hence I've left it as a draft PR here: ym-pett#1

dangotbanned added a commit that referenced this pull request Sep 16, 2025
I used it here (#2978 (comment))

... but forgot by the next day 🤦‍♂️ (#2978 (comment))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request extensibility
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants