Commit 0a9ca68
docs: user guide + runnable examples for distributing expressions (#1547)
* docs: user guide page + runnable examples for distributing expressions
Wraps up the Expr-pickle work with the user-facing material:
* docs/source/user-guide/io/distributing_work.rst — new user guide
page covering the multiprocessing, Ray, and datafusion-distributed
patterns. Includes the Security section that is the canonical home
for the cloudpickle / pickle.loads threat model.
* docs/source/user-guide/io/index.rst — toctree entry.
* examples/multiprocessing_pickle_expr.py — runnable example: a
Pool.map of a closure-capturing UDF across processes, with worker
context registration in the initializer.
* examples/ray_pickle_expr.py — Ray actor analogue.
* examples/datafusion-ffi-example/python/tests/_test_pickle_strict_ffi.py
— exercises the strict-mode refusal end to end against an FFI
capsule scalar UDF (kept under the FFI example crate because the
test needs that crate's compiled artifacts).
* examples/README.md — index entries for the new files.
Also tightens three docstrings that previously duplicated the
security warning so they point at the canonical Security section
instead:
* PythonLogicalCodec::with_python_udf_inlining (rustdoc): one-line
summary plus a relative pointer to distributing_work.rst and the
upstream Python pickle module security warning.
* SessionContext.with_python_udf_inlining: one-sentence summary plus
:doc: link to the user guide.
* datafusion.ipc module docstring: cross-reference to the user guide
for the full pattern.
The crate-level codec.rs module rustdoc also updates "pure-Python
scalar UDFs" to "scalar / aggregate / window UDFs" now that all three
are covered.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: document Python-version and import portability caveats for inline UDFs
Reviewer feedback on the Expr-pickle PRs (#1544) asked that the
cloudpickle portability caveats be discoverable on the user-facing
page, not only in docstrings. The distributing_work.rst page is the
designated canonical home for the distribution story, so add them here:
* New 'Portability requirements for inline Python UDFs' subsection
covering the matching-Python-minor-version requirement and the
by-value vs by-reference import-capture rule (imported modules must
be importable on the worker).
* Qualify the 'fully portable' Python-UDF bullet to point at the new
requirements.
* Cross-reference the new subsection from the closure-capture note.
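
The by-reference half of the import-capture rule can be illustrated with the standard library alone. This is a minimal sketch, not code from the guide: it uses stdlib `pickle` rather than cloudpickle, and `same_python_minor` is a hypothetical helper introduced only to show what a minor-version check could look like.

```python
import math
import pickle
import sys

# A function that lives in an importable module is pickled by reference:
# the payload stores only the module and attribute names, not the code.
payload = pickle.dumps(math.sqrt)
assert b"math" in payload and b"sqrt" in payload

# Unpickling re-imports the module by name, so this succeeds only where
# the referenced module is importable (here, `math` always is).
restored = pickle.loads(payload)
assert restored is math.sqrt


def same_python_minor(sender: tuple[int, int]) -> bool:
    """Hypothetical guard: compare the sender's (major, minor) with ours."""
    return tuple(sys.version_info[:2]) == tuple(sender)


# A sender could ship this pair alongside the pickled expression.
sender_version = (sys.version_info.major, sys.version_info.minor)
print(same_python_minor(sender_version))  # True on the same interpreter
```

A module that exists only on the driver would fail at the `pickle.loads` step on the worker, which is the failure the portability subsection warns about.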
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: restore version-byte and cloudpickle-cache rustdoc wording
Two codec.rs docstrings were reworded in PR4 in ways that dropped
information:
* try_encode_python_scalar_udf: restore the `DFPYUDF` family prefix +
version byte description of the payload framing (PR4 had collapsed it
to `DFPYUDF1` prefix, dropping the version-byte mention).
* cloudpickle cached-handle comment: restore "The encode/decode helpers
above" wording.
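
To make the "family prefix + version byte" framing concrete, here is a small Python sketch of the idea. It is an illustration only: the real encoder is Rust code in codec.rs, the version value `1` and the raw-byte encoding are assumptions, and `frame` / `unframe` are hypothetical names.

```python
FAMILY_PREFIX = b"DFPYUDF"  # family prefix named in the commit message
FORMAT_VERSION = 1          # assumed value; the real byte is defined in codec.rs


def frame(body: bytes, version: int = FORMAT_VERSION) -> bytes:
    """Prepend the family prefix and a single version byte to a payload."""
    return FAMILY_PREFIX + bytes([version]) + body


def unframe(data: bytes) -> tuple[int, bytes]:
    """Split framed data into (version, body), rejecting foreign payloads."""
    if not data.startswith(FAMILY_PREFIX):
        raise ValueError("not a DFPYUDF-family payload")
    if len(data) <= len(FAMILY_PREFIX):
        raise ValueError("payload is missing its version byte")
    version = data[len(FAMILY_PREFIX)]
    return version, data[len(FAMILY_PREFIX) + 1:]


framed = frame(b"pickled-udf")
print(unframe(framed))  # (1, b'pickled-udf')
```

Keeping the version byte separate from the prefix lets a decoder recognize the payload family first and then decide whether it supports that particular version, which is the distinction the restored wording preserves.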
* docs: fix reversed tuple order in multiprocessing example docstring
The 'Worker layout' docstring described tasks as `(expr, label)` but
the code builds and unpacks them as `(label, expr)`. Correct the doc
to match.
* Respond to first batch of reviewer comments
* docs: relocate and restructure distributing-work guide
Move the page from user-guide/io/ to the top level of user-guide/. Distributing work is a runtime/operational concern, not a file-format topic, and the shorter "Distributing work" title fits the sidebar cleanly.
Restructure the body to lead with the practical worker-setup pattern instead of the four-slot SessionContext taxonomy. The taxonomy survives at the bottom as a reference subsection; the worker-init example and portability rules now reach the reader before the reader needs them. Also addresses a reviewer nit: wrap the `if __name__ == "__main__":` guidance in a `.. note::` admonition and link to the Python multiprocessing docs.
Add a header paragraph to each runnable example pointing to the user-guide page so a reader who jumps straight to the example gets the surrounding context.
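
For readers of this commit message, the worker-setup pattern and the `if __name__ == "__main__":` guard can be sketched with the standard library alone. This is a stand-in, not the shipped example: plain integers take the place of pickled expressions, no DataFusion API is used, and `init_worker`, `run_task`, and `_WORKER_STATE` are hypothetical names. Tasks follow the `(label, payload)` ordering that the docstring fix above settles on.

```python
import multiprocessing as mp

# Per-process state, filled in once by the pool initializer. In the real
# examples this slot would hold the worker's registered context.
_WORKER_STATE = {}


def init_worker(scale: int) -> None:
    """Runs once in each worker process before it receives any task."""
    _WORKER_STATE["scale"] = scale


def run_task(task: tuple[str, int]) -> tuple[str, int]:
    """Unpack a task in (label, payload) order, matching how it was built."""
    label, value = task
    return label, value * _WORKER_STATE["scale"]


def main() -> list[tuple[str, int]]:
    tasks = [("a", 1), ("b", 2), ("c", 3)]  # (label, payload) pairs
    with mp.Pool(processes=2, initializer=init_worker, initargs=(10,)) as pool:
        return pool.map(run_task, tasks)


if __name__ == "__main__":
    # The guard keeps spawn-based start methods from re-running the pool
    # setup when a worker re-imports this module.
    print(main())  # [('a', 10), ('b', 20), ('c', 30)]
```

The initializer runs in every worker, so anything a task depends on is set up there rather than captured at task-build time.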
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent baec559 commit 0a9ca68
7 files changed
Lines changed: 771 additions & 7 deletions
File tree
- crates/core/src
- docs/source
  - user-guide
- examples
  - datafusion-ffi-example/python/tests