Skip to content

[Feature Request]: Interactive Beam supports asynchronous computations. #33103

@ganesh4991

Description

@ganesh4991

What would you like to happen?

Summary

This feature request proposes adding asynchronous computation to Apache Beam's Interactive Beam API. This means allowing long-running tasks to execute in the background without blocking the user interface.

Motivation

Interactive Beam is a powerful tool for iterative pipeline development and debugging. However, long-running collect operations can block the interactive environment, hindering productivity and exploration. Introducing asynchronous computation would significantly improve the user experience by allowing developers to continue building the pipeline while computations are executed in the background.

Proposed Solution

Interactive Beam offers a compute API which runs asynchronously in the background and does not produce any result to be displayed on the interactive interface eg. Colab.

def compute(
*pcolls, 
wait_for_inputs: bool = True,
blocking: bool = False
runner=None,
options=None,
force_compute=False) -> None

This API introduces two new options:

  • wait_for_inputs: Whether to wait until the asynchronous dependencies are
    computed. Setting this to False allows to immediately schedule the
    computation, but also potentially results in running the same pipeline
    stages multiple times.
  • blocking: If False, the computation will run in non-blocking fashion. In
    Colab/IPython environment this mode will also provide the controls for the
    running pipeline. If True, the computation will block until the pipeline
    is done.

compute operations can subsequently be followed collect operations on the same PCollection for users to view the result.

Benefits

  • The ability to compute time consuming PCollections asynchronously
  • Sink operations which do not produce any meaningful output can use compute instead of collect

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions