Commit 833444f

Respond to first batch of reviewer comments
1 parent 67111d2 commit 833444f

3 files changed

Lines changed: 18 additions & 4 deletions

docs/source/user-guide/io/distributing_work.rst

Lines changed: 10 additions & 4 deletions
@@ -92,10 +92,16 @@ Then build the expression in the driver and fan it out:
     )
     print(results) # [[2, 4, 6], [20, 40, 60]]
 
-When saved to a ``.py`` file and executed with the ``spawn`` or
-``forkserver`` start method, wrap the driver block in
-``if __name__ == "__main__":`` so worker processes can re-import the
-module without re-running it.
+.. note::
+
+    When saved to a ``.py`` file and executed with the ``spawn`` or
+    ``forkserver`` start method, wrap the driver block in
+    ``if __name__ == "__main__":`` so worker processes can re-import
+    the module without re-running it. This is a standard Python
+    :py:mod:`multiprocessing` requirement, not DataFusion-specific —
+    see `Safe importing of main module
+    <https://docs.python.org/3/library/multiprocessing.html#the-spawn-and-forkserver-start-methods>`_
+    in the Python docs.
 
 
 What travels with the expression
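
For readers of this diff, the guard described in the note added by this hunk can be sketched with the standard library alone. This is a minimal illustration, not the code from the guide: ``scale_batch`` and ``main`` are hypothetical names, and plain integer scaling stands in for evaluating a shipped DataFusion expression. The ``spawn`` context is requested explicitly so the sketch behaves the same on every platform.

```python
import multiprocessing as mp


def scale_batch(task):
    """Worker function: multiply every value in the batch by the factor.

    Hypothetical stand-in for evaluating a shipped expression.
    """
    factor, values = task
    return [factor * v for v in values]


def main():
    # Ask for the "spawn" start method explicitly; workers then start a
    # fresh interpreter and re-import this module to find scale_batch.
    ctx = mp.get_context("spawn")
    tasks = [(2, [1, 2, 3]), (20, [1, 2, 3])]
    with ctx.Pool(processes=2) as pool:
        results = pool.map(scale_batch, tasks)
    print(results)  # [[2, 4, 6], [20, 40, 60]]


if __name__ == "__main__":
    # Workers import this module under a different __name__, so they
    # skip this block instead of starting pools of their own.
    main()
```

Keeping the worker function at module level and the driver inside the guard is what lets a worker import the module cleanly. Without the guard, each worker would try to run the driver again during import, which ``multiprocessing`` rejects with a ``RuntimeError``.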

examples/multiprocessing_pickle_expr.py

Lines changed: 4 additions & 0 deletions
@@ -17,6 +17,10 @@
 
 """Distribute different DataFusion expressions to worker processes.
 
+For background — the shipped-expression model, what travels inline vs
+by name, portability requirements, and the security threat model —
+see ``docs/source/user-guide/io/distributing_work.rst``.
+
 Builds a list of parametric expressions in the driver — each closing
 over a different threshold value — ships one per worker via
 ``multiprocessing.Pool``, and collects the results back. The closure

examples/ray_pickle_expr.py

Lines changed: 4 additions & 0 deletions
@@ -17,6 +17,10 @@
 
 """Distribute DataFusion expressions to Ray actors.
 
+For background — the shipped-expression model, what travels inline vs
+by name, portability requirements, and the security threat model —
+see ``docs/source/user-guide/io/distributing_work.rst``.
+
 Build an expression in the driver, ship it to a pool of Ray actors, and
 have each actor evaluate it against its own slice of data. Python UDFs
 travel with the shipped expression — no actor-side registration needed.
