
agupta01

References

Resolves #586

Code changes

This PR adds a completed_cells field to the Job model to track, in real time, the number of cells executed during notebook execution. The implementation includes:

Model Changes:

  • Added nullable completed_cells column to the Job table in ORM (jupyter_scheduler/orm.py)
  • Added completed_cells field to DescribeJob and UpdateJob models (jupyter_scheduler/models.py)
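Adding a nullable column is what keeps this schema change backwards-compatible: rows that already exist remain valid and read back as NULL. The minimal sketch below illustrates this with the standard-library sqlite3 module rather than the project's SQLAlchemy ORM. The `jobs` table layout (`job_id`, `name`) and the helper names are hypothetical; only `completed_cells` comes from this PR.

```python
import sqlite3
from typing import Optional


def add_completed_cells_column(conn: sqlite3.Connection) -> None:
    """Add completed_cells as a nullable INTEGER column."""
    # No NOT NULL constraint and no default: rows that already exist stay valid
    # and simply read back as NULL.
    conn.execute("ALTER TABLE jobs ADD COLUMN completed_cells INTEGER")


def get_completed_cells(conn: sqlite3.Connection, job_id: str) -> Optional[int]:
    """Return the stored count, or None if it is NULL (or the job is missing)."""
    row = conn.execute(
        "SELECT completed_cells FROM jobs WHERE job_id = ?", (job_id,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical table as it might look before the upgrade, with one existing job.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job_id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO jobs (job_id, name) VALUES ('old-job', 'pre-upgrade')")

add_completed_cells_column(conn)

# A job created after the upgrade can record progress as it runs.
conn.execute(
    "INSERT INTO jobs (job_id, name, completed_cells) VALUES ('new-job', 'post-upgrade', 0)"
)
conn.execute("UPDATE jobs SET completed_cells = 3 WHERE job_id = 'new-job'")

print(get_completed_cells(conn, "old-job"))  # None
print(get_completed_cells(conn, "new-job"))  # 3
```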

Execution Tracking:

  • Added JobFeature.track_cell_execution feature flag to enable/disable cell tracking
  • Modified DefaultExecutionManager to use the native on_cell_executed hook that nbconvert's ExecutePreprocessor inherits from nbclient. This proved cleaner than the subclassing approach proposed in #586 (Track completed cell progress during notebook execution).
  • The hook writes ep.code_cells_executed (the executor's running count of executed code cells) to the database after each cell finishes executing
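The hook mechanism can be sketched as follows, assuming the hook behaves as nbclient documents it: it is called with keyword arguments (cell, cell_index, execute_reply) after each code cell, and the running count lives on the client's code_cells_executed attribute. Everything else is hypothetical: `FakeClient` stands in for the real executor so the sketch runs without a kernel, `make_progress_hook` is an illustrative name, and an in-memory sqlite3 table replaces the scheduler's actual database session.

```python
import sqlite3
from typing import Any, Callable, Dict, List, Optional


def make_progress_hook(conn: sqlite3.Connection, job_id: str, client: Any) -> Callable[..., None]:
    """Build an on_cell_executed-style hook that persists the executed-cell count.

    The hook accepts the keyword arguments nbclient passes (cell, cell_index,
    execute_reply); the count itself is read from client.code_cells_executed.
    """

    def on_cell_executed(**kwargs: Any) -> None:
        conn.execute(
            "UPDATE jobs SET completed_cells = ? WHERE job_id = ?",
            (client.code_cells_executed, job_id),
        )
        conn.commit()  # one database write per executed cell

    return on_cell_executed


class FakeClient:
    """Hypothetical stand-in for nbclient's NotebookClient (no kernel needed).

    It mimics only the counter and the hook call; non-code cells and empty
    code cells are skipped without being counted.
    """

    def __init__(self, cells: List[Dict[str, str]]) -> None:
        self.cells = cells
        self.code_cells_executed = 0
        self.on_cell_executed: Optional[Callable[..., None]] = None

    def execute(self) -> None:
        for index, cell in enumerate(self.cells):
            if cell["cell_type"] != "code" or not cell["source"].strip():
                continue
            self.code_cells_executed += 1
            if self.on_cell_executed is not None:
                self.on_cell_executed(
                    cell=cell, cell_index=index, execute_reply={"content": {"status": "ok"}}
                )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job_id TEXT PRIMARY KEY, completed_cells INTEGER)")
conn.execute("INSERT INTO jobs VALUES ('job-1', 0)")

cells = [
    {"cell_type": "markdown", "source": "# Title"},
    {"cell_type": "code", "source": "x = 1"},
    {"cell_type": "code", "source": "   "},  # empty code cell: skipped
    {"cell_type": "code", "source": "print(x)"},
]
client = FakeClient(cells)
# With nbconvert, the analogous wiring would be roughly:
#   ep = ExecutePreprocessor(); ep.on_cell_executed = make_progress_hook(conn, job_id, ep)
client.on_cell_executed = make_progress_hook(conn, "job-1", client)
client.execute()

row = conn.execute("SELECT completed_cells FROM jobs WHERE job_id = 'job-1'").fetchone()
print(row[0])  # 2
```

Reading the count from the client, rather than keeping a separate counter inside the hook, keeps the stored value consistent with the executor's own view of how many cells it has run.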

User-facing changes

  • GET /jobs/{job_id} now returns the current count of completed cells
  • PATCH /jobs/{job_id} accepts completed_cells updates for manual corrections
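The PR does not say that the job response includes a total cell count, so a client that wants a progress fraction would presumably have to count code cells in the input notebook itself. The sketch below is a hypothetical client-side helper. It counts non-empty code cells, on the assumption that this matches what the executor counts. It also treats a null completed_cells value (for example, a job created before the column existed) as "progress unknown". The response body shown is a trimmed, made-up example.

```python
import json
from typing import Any, Dict, Optional


def count_code_cells(notebook: Dict[str, Any]) -> int:
    """Count non-empty code cells in an nbformat v4 notebook dict."""
    total = 0
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        if cell.get("cell_type") == "code" and source.strip():
            total += 1
    return total


def job_progress(job: Dict[str, Any], total_code_cells: int) -> Optional[float]:
    """Fraction of code cells completed, or None when progress is unknown."""
    completed = job.get("completed_cells")
    if completed is None or total_code_cells == 0:
        # null completed_cells: e.g. a job created before the column existed.
        return None
    return min(completed / total_code_cells, 1.0)


# Trimmed, hypothetical GET /jobs/{job_id} response body.
job = json.loads('{"job_id": "job-1", "completed_cells": 3}')
notebook = {
    "cells": [
        {"cell_type": "markdown", "source": "# Report"},
        {"cell_type": "code", "source": "import math"},
        {"cell_type": "code", "source": ["x = ", "1\n"]},
        {"cell_type": "code", "source": "y = math.sqrt(x)"},
        {"cell_type": "code", "source": "print(y)"},
    ]
}
print(job_progress(job, count_code_cells(notebook)))  # 0.75
```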

Backwards-incompatible changes

None

@agupta01 changed the title from "All cell execution tracking during notebook execution" to "Add cell execution tracking during notebook execution" on Jul 23, 2025
@andrii-i added the enhancement (New feature or request) label on Jul 23, 2025
@andrii-i (Collaborator) left a comment

Great work @agupta01, thank you. This PR adds the cell execution tracking feature cleanly and with high quality. The DB schema change is done properly (the new column is nullable) and is therefore backwards-compatible.

Refactoring validate() and supported_features() into classmethods makes sense to me and is a clear improvement in design, since neither method needs instance state.
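Both sides of this refactor can be shown in a small sketch. Classmethods can be called on the class without building a manager. However, a custom subclass that still overrides the method as a plain instance method fails when a caller uses the new class-level style, which is the compatibility concern raised below. `BaseManager`, `LegacyManager`, the placeholder validate() check, and the string feature keys (used instead of the project's JobFeature enum) are all hypothetical.

```python
from typing import Any, Dict


class BaseManager:
    """Hypothetical base class after the refactor; neither method uses instance state."""

    @classmethod
    def supported_features(cls) -> Dict[str, bool]:
        return {"track_cell_execution": True}

    @classmethod
    def validate(cls, input_path: str) -> bool:
        # Placeholder check for illustration only.
        return input_path.endswith(".ipynb")


class LegacyManager(BaseManager):
    """A custom subclass written against the old instance-method API."""

    def supported_features(self) -> Dict[str, bool]:
        return {"track_cell_execution": False}


def features_from_class(manager_cls: type) -> Any:
    """Call supported_features() on the class, as post-refactor callers may do."""
    try:
        return manager_cls.supported_features()
    except TypeError as err:
        # The legacy override is a plain function: calling it on the class
        # passes no `self`, so Python raises TypeError.
        return f"TypeError: {err}"


print(features_from_class(BaseManager))  # {'track_cell_execution': True}
print(features_from_class(LegacyManager))  # a TypeError message about the missing 'self'
```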

Two concerns prevent immediate merge:

  1. Enabling cell tracking by default means every cell execution triggers a database write. This could hurt production deployments, especially for users running large notebooks on limited compute or network resources. Making the feature opt-in (disabled by default) would remedy this.
  2. The classmethod refactor breaks backwards compatibility for any custom ExecutionManager implementations, so it is a breaking change. Moving it into a separate PR, to be merged after the final 2.x release, would remedy this.
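The opt-in default requested in the first point can be sketched as follows. When tracking is off, no hook is installed, so the run adds no writes. When it is on, every executed cell produces one write. `TrackingConfig`, `build_hook`, and `run_cells` are hypothetical names. The `write` callback stands in for a database update, and the hook keeps its own local counter so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional

Hook = Callable[..., None]


@dataclass
class TrackingConfig:
    # Hypothetical setting: opt-in, so tracking is off unless enabled explicitly.
    track_cell_execution: bool = False


def build_hook(config: TrackingConfig, write: Callable[[int], None]) -> Optional[Hook]:
    """Return an on_cell_executed-style hook only when tracking is enabled."""
    if not config.track_cell_execution:
        return None  # no hook installed, so no extra database writes

    executed = 0

    def on_cell_executed(**kwargs: Any) -> None:
        nonlocal executed
        executed += 1
        write(executed)  # stands in for one database write per cell

    return on_cell_executed


def run_cells(n_cells: int, hook: Optional[Hook]) -> None:
    """Simulate executing n_cells code cells, calling the hook after each."""
    for index in range(n_cells):
        if hook is not None:
            hook(cell={}, cell_index=index, execute_reply=None)


writes_default: List[int] = []
run_cells(500, build_hook(TrackingConfig(), writes_default.append))

writes_opted_in: List[int] = []
run_cells(500, build_hook(TrackingConfig(track_cell_execution=True), writes_opted_in.append))

print(len(writes_default), len(writes_opted_in))  # 0 500
```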

Happy to merge once these changes are made. The cell tracking feature will be a great addition to Jupyter Scheduler.
