Add sandbox observability and resource monitoring #2487

**Description**  
Sandboxes currently behave like a black box. When a process dies (e.g., from an OOM kill or a crash), users cannot see its resource usage or the reason it terminated. This makes it hard to tell application bugs apart from infrastructure issues.  

**Proposal**  
- Expose per-sandbox metrics (CPU, memory, disk, I/O) via API and dashboard.  
- Surface process termination reasons (e.g., OOM kill, manual stop, internal crash).  
- Enable integration with external observability tools (e.g., OTEL collector endpoint configuration).  
- (Optional) Provide a built-in lightweight supervisor to restart main processes when killed.  
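
To make the termination-reason item concrete, here is a minimal sketch of how a runner might map low-level signals to a user-facing reason. Everything in it is an assumption for illustration: the names `TerminationReason`, `TerminationEvent`, `classify_termination`, and `make_event` are hypothetical, as are the inputs `oom_killed` and `stop_requested` (which would have to come from the container runtime and the control plane) and the extra `EXITED` value for a clean exit. It is not an existing Daytona API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class TerminationReason(str, Enum):
    """Hypothetical reason codes; the first three mirror the proposal."""
    OOM_KILL = "oom_kill"
    MANUAL_STOP = "manual_stop"
    INTERNAL_CRASH = "internal_crash"
    EXITED = "exited"  # assumption: clean exit, not named in the proposal


@dataclass(frozen=True)
class TerminationEvent:
    """Hypothetical event record: reason plus timestamp."""
    sandbox_id: str
    reason: TerminationReason
    exit_code: int
    timestamp: datetime


def classify_termination(
    exit_code: int, oom_killed: bool, stop_requested: bool
) -> TerminationReason:
    """Pick a reason; the OOM flag wins because it is the most specific signal."""
    if oom_killed:
        return TerminationReason.OOM_KILL
    if stop_requested:
        return TerminationReason.MANUAL_STOP
    if exit_code != 0:
        return TerminationReason.INTERNAL_CRASH
    return TerminationReason.EXITED


def make_event(
    sandbox_id: str, exit_code: int, oom_killed: bool, stop_requested: bool
) -> TerminationEvent:
    """Build an event stamped with the current UTC time."""
    return TerminationEvent(
        sandbox_id=sandbox_id,
        reason=classify_termination(exit_code, oom_killed, stop_requested),
        exit_code=exit_code,
        timestamp=datetime.now(timezone.utc),
    )
```

The precedence order (OOM first, then a requested stop, then a non-zero exit) is a design choice in this sketch: an OOM kill and a manual stop can both surface as exit code 137, so the exit code alone cannot separate them.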

**Acceptance Criteria**  
- API returns real-time and historical CPU/memory/disk usage for each sandbox.  
- Sandbox events include termination reason with timestamp.  
- Dashboard displays metrics graphs and termination events.  
- A configurable OTEL collector endpoint is supported for streaming metrics and logs.  
- (Optional) Supervisor process can be toggled per sandbox for auto-restarts.  
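
The first criterion (real-time and historical usage) could be backed by something like the bounded in-memory history below. This is only a sketch under stated assumptions: `MetricSample`, `SandboxMetricsHistory`, and their fields are hypothetical names, samples are assumed to arrive in timestamp order, and a real implementation would likely persist data in a time-series store rather than a Python list.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MetricSample:
    """Hypothetical per-sandbox sample."""
    timestamp: float  # Unix seconds
    cpu_percent: float
    memory_bytes: int
    disk_bytes: int


class SandboxMetricsHistory:
    """Keeps the most recent `max_samples` samples for one sandbox."""

    def __init__(self, max_samples: int = 3600) -> None:
        self._max_samples = max_samples
        self._samples: List[MetricSample] = []

    def record(self, sample: MetricSample) -> None:
        """Append a sample; reject out-of-order timestamps."""
        if self._samples and sample.timestamp < self._samples[-1].timestamp:
            raise ValueError("samples must be recorded in timestamp order")
        self._samples.append(sample)
        if len(self._samples) > self._max_samples:
            del self._samples[0]  # drop the oldest sample

    def latest(self) -> Optional[MetricSample]:
        """Real-time view: the newest sample, or None if empty."""
        return self._samples[-1] if self._samples else None

    def between(self, start: float, end: float) -> List[MetricSample]:
        """Historical view: samples with start <= timestamp <= end."""
        timestamps = [s.timestamp for s in self._samples]
        lo = bisect_left(timestamps, start)
        hi = bisect_right(timestamps, end)
        return self._samples[lo:hi]
```

An API handler could serve `latest()` for the live value and `between(start, end)` for dashboard graphs. The fixed cap keeps memory bounded per sandbox at the cost of dropping the oldest data.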

**Impact**  
- Users can self-diagnose issues like OOM without needing Daytona support.  
- Faster debugging and reduced downtime for production-like workloads.  
- Enables external monitoring and alerting pipelines (Datadog, Grafana, etc.).  
- Improves the platform's reliability and users' trust in it.

