-
Notifications
You must be signed in to change notification settings - Fork 4.4k
Description
What would you like to happen?
Summary
This feature request proposes adding asynchronous computation to Apache Beam's Interactive Beam API. This means allowing long-running tasks to execute in the background without blocking the user interface.
Motivation
Interactive Beam is a powerful tool for iterative pipeline development and debugging. However, long-running collect operations can block the interactive environment, hindering productivity and exploration. Introducing asynchronous computation would significantly improve the user experience by allowing developers to continue building the pipeline while computations are executed in the background.
Proposed Solution
Interactive Beam offers a compute API which runs asynchronously in the background and does not produce any result to be displayed on the interactive interface eg. Colab.
def compute(
*pcolls,
wait_for_inputs: bool = True,
blocking: bool = False
runner=None,
options=None,
force_compute=False) -> NoneThis API introduces two new options:
wait_for_inputs: Whether to wait until the asynchronous dependencies are
computed. Setting this to False allows to immediately schedule the
computation, but also potentially results in running the same pipeline
stages multiple times.blocking: If False, the computation will run in non-blocking fashion. In
Colab/IPython environment this mode will also provide the controls for the
running pipeline. If True, the computation will block until the pipeline
is done.
compute operations can subsequently be followed collect operations on the same PCollection for users to view the result.
Benefits
- The ability to compute time consuming
PCollectionsasynchronously - Sink operations which do not produce any meaningful output can use
computeinstead ofcollect
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner