Description
Change Request Type
Feature Request
Description of the Change Request
Abstract
This feature proposes a vehicle‑specialized orchestration framework that adapts cloud‑native container technologies to the in‑vehicle environment. The feature extends a cloud‑native orchestrator with vehicle‑specific capabilities, enabling applications developed in the cloud to be deployed to the vehicle without modification. It also supports explicit resource management for mixed‑criticality workloads and provides execution guarantees and automatic recovery for safety-critical applications.
Currently, vehicle software integration is burdened by complex deployment processes, platform‑specific custom builds, and improper resource allocation in mixed‑criticality environments. This feature addresses these issues by optimizing proven cloud container orchestration technologies for the automotive domain.
The feature adds vehicle‑specific features on top of container orchestration. Through declarative Manifests, it defines ASIL‑level priorities and resource allocations, reserving dedicated CPU cores and memory for safety-critical workloads. It ensures portability between cloud and vehicle environments without requiring changes to application code.
By combining the limited resources and real‑time constraints of vehicles with the scalability and self‑healing capabilities of cloud‑native technologies, the feature enables stable integration of ADAS/AD software. It provides an environment where workloads of varying criticality can operate without interference, ultimately improving the efficiency and reliability of vehicle software development and operations.
The term Vehicle Service Orchestrator reflects the orchestrator’s role in managing services under the unique operational conditions of a vehicle, rather than simply adapting a cloud‑native orchestrator. Unlike traditional container orchestrators designed for scalable datacenter environments, a vehicle‑specific orchestrator must account for constrained compute budgets, strict isolation between workloads of differing criticality, and continuous operation under varying driving conditions. The name emphasizes that its primary purpose is to coordinate and safeguard in‑vehicle services in a manner appropriate for automotive constraints, bridging cloud‑native development models with the operational realities of the vehicle environment.
Motivation
1. Complexity and Platform Dependency in Vehicle Software Deployment
Current vehicle software deployment faces significant constraints due to complex multi‑stage validation processes, platform‑specific customized builds, and limited update mechanisms. In a typical deployment scenario, any modification to an application requires repeated validation across the development, testing, and vehicle environments, and each validation stage must faithfully reproduce the same execution environment. Integration issues frequently occur due to environment inconsistencies, and model‑specific builds are unavoidable because each vehicle ECU uses different hardware architectures (x86, ARM, RISC‑V) and different OS conditions (Linux distributions, kernel versions, library dependencies).
During OTA updates, a full system reboot is often required, rollback capabilities are limited, and recovering from update failures is challenging. Version management also becomes unnecessarily complex. To address these challenges, vehicle‑optimized orchestration technology is required—one that enables “build once, run anywhere” through container‑based deployment, defines deployment state with a declarative Manifest, and supports automated rollout and rollback. A lightweight solution that accounts for limited in‑vehicle resources and real‑time constraints is essential.
2. Application Behavior in Mixed‑Criticality Domains
Vehicle software is composed of applications with varying execution requirements depending on their safety level. For example, an ASIL‑D Automatic Emergency Braking (AEB) system requires strict timing guarantees, while a QM‑level infotainment system can tolerate delays. This mixed‑criticality structure also applies when distributing workloads across high‑performance and low‑performance ECUs.
Expected issues in this execution model include resource contention and inappropriate node placement between critical and non‑critical applications. For instance, an infotainment application consuming excessive CPU resources could delay object detection in AEB or increase braking response time—posing a significant safety risk.
To address these challenges, ASIL‑D applications must be allocated dedicated CPU cores and memory, while QM applications should share resources. Furthermore, dynamic resource reallocation is necessary to ensure the execution guarantees and timing requirements of safety‑critical functions when driving conditions change (e.g., urban → highway). Since standard cloud‑native orchestration does not inherently understand ASIL concepts or guarantee prioritization for safety‑critical workloads, a vehicle‑optimized orchestrator is required.
Rationale
1. Selection of a Cloud‑Native Orchestration Foundation
This feature is built upon cloud‑native orchestration technologies that have already been validated at scale in cloud environments. Container orchestration systems proven in the cloud provide core capabilities such as declarative deployment, automatic recovery, and rolling updates, all of which can be directly applied to address the complexity and platform dependency issues found in vehicle software deployment. By optimizing these proven cloud technologies for the in‑vehicle environment, development time can be reduced while improving overall system reliability.
2. Extension Architecture for Vehicle‑Specific Capabilities
The reason we adopted an extension architecture—adding vehicle‑specific capabilities instead of using the existing cloud‑native orchestrator as‑is—is the fundamental difference between vehicle and cloud environments. Cloud systems assume virtually infinite scalability, persistent network connectivity, and 99.9% availability, whereas vehicles operate under constrained resources, intermittent connectivity, 99.9999% availability requirements, and stringent real‑time constraints. In particular, mixed‑criticality management based on ASIL levels is a requirement unique to automotive systems and does not exist in cloud environments. Therefore, while the base orchestration features are reused, extensions such as mixed‑criticality awareness, real‑time scheduling, and vehicle‑specific health checks are added.
3. Declarative Manifest‑Based Configuration
Defining ASIL levels, resource allocations, and dependencies through a declarative Manifest separates the responsibilities of developers and integrators while reducing deployment complexity. In traditional workflows, developers must manually manage platform‑specific build scripts, environment variables, and resource settings, requiring repetitive adjustments whenever the vehicle model or ECU changes. With a declarative Manifest, developers specify what to deploy, while the orchestrator determines how to deploy it. This allows developers to focus on application logic while integrators adjust only the Manifest to support diverse vehicle environments.
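As an illustrative sketch only (this is not the actual schema of this feature), a declarative Manifest for a safety‑critical workload could express the ASIL level, resource reservation, and restart behavior along these lines; all field names, the registry URL, and the version are hypothetical:

```yaml
# Hypothetical manifest sketch: every field name here is illustrative,
# not the orchestrator's real schema.
apiVersion: v1
kind: Scenario
metadata:
  name: emergency-braking
spec:
  criticality: ASIL-D            # safety level declared by the integrator
  resources:
    dedicatedCpus: "2-3"         # cores reserved exclusively for this workload
    memory: 512Mi                # hard memory ceiling
  image: registry.example.com/adas/aeb:1.0.0   # placeholder image reference
  restartPolicy: always          # recover automatically on failure
```

The developer side of the manifest (image, logic) stays constant across vehicle models, while the integrator adjusts only the resource and criticality fields per target ECU.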
4. Portability Without Application Code Changes
The decision to allow cloud‑developed applications to be deployed to vehicles without modifying their code is driven by the need for development productivity and ecosystem utilization. Previously, porting a cloud application to a vehicle required manual adjustments to platform‑specific library dependencies, environment variables, and network configurations—introducing delays and increasing the likelihood of errors. Container‑based deployment encapsulates the application and all of its dependencies into an image, ensuring a consistent execution environment across cloud and vehicle platforms. The orchestrator abstracts environment‑specific differences such as networking, storage, and security. This enables immediate reuse of cloud‑validated applications and frameworks (e.g., AI inference engines, data processing pipelines) within the vehicle while significantly reducing integration overhead through consistent cloud‑to‑vehicle development environments.
5. Container‑Based Isolation and Resource Management
Managing all applications—including the Executor—within containers ensures explicit resource management and consistent runtime environments. Previously, the Executor existed as a Rust package with implicit and manually maintained resource allocation, which risked violating FEO guarantees when additional applications were introduced. Through container isolation, each Executor can be explicitly assigned dedicated CPU cores, memory, and GPU resources. cgroup and namespace isolation prevents interference from other workloads. Additionally, container images enable “build once, run anywhere,” providing platform independence across heterogeneous hardware architectures such as x86, ARM, and RISC‑V.
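The dedicated‑core and memory reservation described above ultimately maps to cgroup control files. As a minimal sketch (assuming cgroup v2 and a hypothetical cgroup directory; a real orchestrator would act through the container engine rather than writing these files itself):

```python
# Sketch: reserving dedicated CPU cores and a memory ceiling for a
# safety-critical container via the cgroup v2 filesystem. Paths and
# helper names are illustrative.

def cgroup_settings(cpus: str, memory_max_bytes: int) -> dict[str, str]:
    """Build cgroup v2 control-file contents for a dedicated allocation."""
    return {
        "cpuset.cpus": cpus,                  # e.g. "2-3": cores pinned to this workload
        "cpuset.cpus.partition": "root",      # request exclusive use of the listed cores
        "memory.max": str(memory_max_bytes),  # hard memory ceiling
    }

def apply_cgroup(cgroup_dir: str, settings: dict[str, str]) -> None:
    """Write the settings into a cgroup directory (requires root)."""
    for name, value in settings.items():
        with open(f"{cgroup_dir}/{name}", "w") as f:
            f.write(value)
```

QM workloads would simply omit the exclusive partition setting and share the remaining cores.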
6. Real‑Time Monitoring and Automatic Recovery Mechanisms
The design choice to support real‑time monitoring and automated recovery of resource usage, timing metrics, and health status is essential due to the safety requirements and operational complexity of vehicle environments. Previously, when the Executor crashed or timing constraints were violated, issues were only logged and required manual investigation and restart, with little visibility into root causes. In vehicular systems, interruptions to safety‑critical functions can directly affect human life, making millisecond‑level fault detection and recovery indispensable. Periodic health checks via a Liveness Probe, timing‑constraint validation using a Timing Probe, and immediate restart policies upon failure ensure continuity of critical functions without human intervention. Collecting metrics such as CPU and memory usage, per‑task execution time, and timing violation counts enables both post‑incident analysis and proactive prevention.
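The probe logic described above can be sketched as a single decision per monitoring cycle; probe names and thresholds here are assumptions for illustration, not the actual implementation:

```python
# Sketch of one monitoring cycle: a liveness probe checks that the
# workload responds, a timing probe checks per-cycle execution time,
# and either failure triggers an immediate restart.

def evaluate_probes(alive: bool, cycle_time_ms: float, deadline_ms: float) -> str:
    """Return the recovery action for one monitoring cycle."""
    if not alive:
        return "restart"   # liveness probe failed: process is down
    if cycle_time_ms > deadline_ms:
        return "restart"   # timing probe failed: deadline violated
    return "none"          # healthy: no action needed
```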
Specification
1. Overview
A Vehicle Service Orchestrator is a structured and declarative framework for managing the execution flow, timing constraints, and error handling of containers. Developers can define application control flows and resource‑management policies in a platform‑independent manner, while the orchestrator automates container deployment, execution guarantees, and dynamic resource allocation. This clearly separates application logic from infrastructure management, enabling stable and vehicle‑optimized operation.
2. System Architecture
The system follows a four‑layer architecture specialized for in‑vehicle environments.
In the API layer, the API Server allows the user to configure (add/remove) the Manifest.
In the Orchestration layer, the VehicleData FilterGateway, ActionController, and StateManager coordinate workloads.
In the Agent layer, the NodeAgent handles execution on each node.
In the Runtime layer, the container engine performs actual container operations.
2.1. Core Components
API Server: Interfaces with the user to add or remove manifests (scenarios)
VehicleData FilterGateway: Filters vehicle data and automatically controls services based on changes in vehicle state
ActionController: Scenario‑based workload control and real‑time scheduling
StateManager: Tracking container lifecycle and managing state transitions
NodeAgent: Container execution and resource management per node
3. Requirements
3.1. Workload Lifecycle Management
Standard Command Set
Seven essential workload commands are supported:
create, start, pause, resume, stop, restart, delete.
All commands are delivered via remote procedure calls and follow a standardized response format.
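As an illustrative sketch (the RPC transport itself is out of scope here, and the response fields are assumptions rather than the specified format), the command set and a standardized response could look like:

```python
# Sketch of the seven-command set with a uniform response shape.
# Field names in the response dict are illustrative.

COMMANDS = {"create", "start", "pause", "resume", "stop", "restart", "delete"}

def handle_command(name: str, workload: str) -> dict:
    """Dispatch a workload command and return a standardized response."""
    if name not in COMMANDS:
        return {"workload": workload, "command": name,
                "status": "error", "detail": "unknown command"}
    # Real handling would call into the NodeAgent / container engine.
    return {"workload": workload, "command": name, "status": "ok"}
```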
Container State Model
Containers are managed across five main states:
Created (image ready),
Running (active execution),
Paused (memory preserved),
Exited (normal/error termination),
Restarting (automatic recovery).
Transitions between these states follow strict rules.
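The five-state model and its strict transition rules can be sketched as a small state machine; the exact transition table below is an assumption inferred from the state descriptions, not the normative specification:

```python
# Sketch of the container lifecycle: five states with an explicit
# allowed-transition table. The table itself is illustrative.

TRANSITIONS = {
    "Created":    {"Running", "Exited"},               # started, or removed before start
    "Running":    {"Paused", "Exited", "Restarting"},  # pause, terminate, or recover
    "Paused":     {"Running", "Exited"},               # resume or stop
    "Exited":     {"Restarting"},                      # recovery policy may restart
    "Restarting": {"Running", "Exited"},               # recovery succeeded or gave up
}

def transition(state: str, target: str) -> str:
    """Move to `target` if the transition is legal; otherwise raise."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {target}")
    return target
```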
3.2. Scenario‑Based Automation
Conditional Execution Engine
Services are automatically controlled based on changes in vehicle state.
Scenario information is retrieved from a distributed key‑value store, and corresponding actions are executed automatically when conditions are met. Integration with real‑time data streams ensures immediate responsiveness.
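A minimal sketch of this conditional execution, assuming a hypothetical scenario format (the `when`/`action` fields are illustrative, not the key‑value store's actual schema):

```python
# Sketch: each scenario pairs a condition on vehicle state with an
# action; matching actions are executed when the state changes.

def evaluate_scenarios(vehicle_state: dict, scenarios: list[dict]) -> list[str]:
    """Return the actions whose conditions match the current vehicle state."""
    actions = []
    for s in scenarios:
        key, expected = s["when"]["key"], s["when"]["equals"]
        if vehicle_state.get(key) == expected:
            actions.append(s["action"])
    return actions

# Hypothetical usage: start lane keeping when the driving mode changes
# to "highway".
scenarios = [{"when": {"key": "driving_mode", "equals": "highway"},
              "action": "start-lane-keeping"}]
```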
3.3. Resource Management and Isolation
Container Security Isolation
User identifiers, group permissions, and Linux capabilities are strictly controlled according to the principle of least privilege. Restricting privileged mode and applying security contexts strengthens system-level protection.
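As a sketch of such a least‑privilege check (field names follow common container conventions but are assumptions here, as is the capability allow‑list):

```python
# Sketch: validate a container security context against least-privilege
# rules. The allow-list and field names are illustrative.

ALLOWED_CAPABILITIES = {"NET_BIND_SERVICE"}  # hypothetical allow-list

def violations(ctx: dict) -> list[str]:
    """Return the least-privilege rules this security context breaks."""
    problems = []
    if ctx.get("privileged", False):
        problems.append("privileged mode is not allowed")
    if ctx.get("runAsUser", 0) == 0:          # missing UID defaults to root: reject
        problems.append("must not run as root (UID 0)")
    extra = set(ctx.get("capabilities", [])) - ALLOWED_CAPABILITIES
    if extra:
        problems.append(f"disallowed capabilities: {sorted(extra)}")
    return problems
```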
Performance Optimization
Processor and memory usage are tracked in real time, allowing early detection of resource shortages. Parallel container creation, asynchronous processing, and automatic scaling optimize startup times and maximize efficiency.
3.4. Monitoring and Recovery
State Monitoring
Comprehensive health checks continuously monitor process status, port connectivity, and application‑level health. Changes in status are detected immediately, ensuring consistency across the entire system.
Automatic Recovery Mechanisms
Failure recovery is automated according to restart policies. Failed containers are automatically restarted, and state‑based corrective actions minimize operational downtime. Customized recovery logic is applied depending on the error type.
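The error‑type‑specific recovery described above can be sketched as a small policy function; the error classes, action names, and restart limit are illustrative assumptions:

```python
# Sketch: pick a corrective action based on how the container failed
# and how often it has already been restarted.

def recovery_action(exit_reason: str, restart_count: int, max_restarts: int = 3) -> str:
    """Return the recovery action for a terminated container."""
    if exit_reason == "clean_exit":
        return "none"                      # normal termination: nothing to do
    if restart_count >= max_restarts:
        return "escalate"                  # repeated failure: report, don't loop
    if exit_reason == "oom_killed":
        return "restart_with_more_memory"  # adjust resources, then restart
    return "restart"                       # crash/timeout: restart immediately
```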
Backwards Compatibility
This feature is designed as an optional extension module that does not modify the existing S‑Core architecture. No changes are required to the current Executors (FEO, Lifecycle, Orchestration) or to application logic. Container‑based deployment and resource‑management features are applied only to services that require them, while existing process‑based workloads remain fully preserved.
The Manifest operates as an additional configuration layer that can be used alongside the existing Launch mechanism rather than replacing it. The image‑delivery pipeline, data formats, and OS initialization procedures (Linux/QNX) remain unchanged. Furthermore, safety and security features complement, rather than replace, current mechanisms, ensuring full backward compatibility for the entire platform and all existing applications.
Estimates for realization
Will be provided based on the next course of action
Affects work products
- Requirements
- Architecture
- Safety/Security Analysis
- Detailed Design
Impact analysis
Security Impact
Downloading container images from unauthorized servers can introduce severe security risks. User authentication ensures that only verified and trusted container images can be downloaded.
Safety Impact
When operating cloud‑originated workloads inside the vehicle:
Compliance with ISO 26262 requirements during fault conditions
Vehicle‑state monitoring enables quick detection of failures, and predefined safety mechanisms, designed and validated according to ISO 26262 processes, ensure compliance with functional‑safety requirements.
Controlled workload behavior during critical vehicle functions
When critical applications are active, other workloads can be managed according to safety criteria to ensure they do not interfere with those functions.
Safety or Security relevance
- none
- Safety relevant
- Security relevant
ASIL classification
ASIL_B
Expected Implementation Version
1.0