178 changes: 178 additions & 0 deletions website/docs/main/home/calling/ai/guides/error-handling.mdx
@@ -0,0 +1,178 @@
---
title: Error handling
description: How to handle failure gracefully and avoid disruption to users
slug: /ai/guides/error-handling
---

# Error handling

The SignalWire AI platform includes error handling features that prioritize user experience and system reliability.
Human-like error messages, graceful degradation, and retry logic give users a professional experience even when the system runs into difficulties.

These error handling patterns are backed by comprehensive logging, event notification, and fallback mechanisms.

## Overview

SignalWire AI speaks user-facing error messages to the caller via `ais_say()`.
This guide lists those messages and the conditions that trigger them, including the conditions that terminate the session.
The system degrades gracefully, using polite, human-like error messages to maintain the user experience even during failures.

## Error types

This guide differentiates between fatal and recoverable errors.

### Fatal errors

This type of error causes the session to terminate.

| Error type | Message | Trigger conditions | Details |
| ------------------------------- | -------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| AI text generation/streaming | "I'm sorry I am having a migrane. I have to hangup now." | <ul><li>Fatal error flag is set during AI response processing</li><li>Occurs in `ai_send_text()` function during streaming operations</li><li>Typically indicates severe API communication failures or malformed responses</li></ul> | <ul><li>Sets `ais->offhook = 0` to terminate the call</li><li>Fires a `calling.ai.error` event with fatal flag</li><li>Logs error details to system logs</li><li>Used when the AI system cannot recover from the error</li></ul> |
| Maximum retry attempts exceeded | "I'm sorry I am having a problem. I have to hangup now." | <ul><li>Error count exceeds `settings->max_tries` (default: 3, post-processing: 10)</li><li>Only triggered during streaming operations (`if (stream)`)</li><li>Indicates persistent communication or processing failures</li></ul> | <ul><li>Only spoken during streaming mode</li><li>Retries with a linearly increasing delay (1 second, then 2 seconds, and so on) before this message</li><li>Sets `ais->offhook = 0` to terminate the call</li><li>Default `max_tries` is 3 for normal operations, 10 for post-processing</li></ul> |


### Recoverable errors

This type of error does not terminate the session.

#### "I'm sorry, can you hold on for a second?"

**Context:** First retry attempt

**Trigger Conditions:**
- `errors == 1` (first error encountered)
- The system retries the operation after this message
- The retry delay grows linearly: the system waits `errors` seconds before each retry

**Technical Details:**
- Polite way to indicate a temporary processing delay
- Linear backoff (1 second, then 2 seconds, and so on)
- Session continues after the retry delay
- Does not terminate the call

### Voice/TTS Configuration Errors (Non-Terminating)

#### "There was an error with your voice selection, please check your syntax. using a default voice instead."

**Context:** Voice configuration failure with fallback

**Trigger Conditions:**
- TTS engine initialization fails with the specified voice parameters
- The system falls back to the default Google Cloud voice (`gcloud`, `en-US-Neural2-J`)
- Occurs during session initialization and runtime voice changes

**Technical Details:**
- Graceful degradation to the default voice
- Session continues with the fallback configuration
- Provides user feedback about the configuration issue
- Suggests syntax checking for voice parameters

#### "There was an error with the current voice, sorry for the trouble."

**Context:** Runtime voice switching failure

**Trigger Conditions:**
- Voice switching fails during an active conversation
- Occurs in the output thread during TTS processing
- The system attempts to recover by switching to the default voice

**Technical Details:**
- Attempts fallback to the default voice
- If the fallback fails, sets `ais->running = 0`
- More apologetic tone for a runtime interruption
- Indicates temporary service disruption
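
The fallback sequence described above can be sketched in Python. This is an illustrative model only, not the platform's implementation: the `Session` class, the `switch_voice` function, and the `init_tts` and `say` callbacks are hypothetical names, and the choice to speak the message only after the fallback succeeds is an assumption.

```python
from dataclasses import dataclass
from typing import Callable

# Default engine and voice named in this guide.
DEFAULT_ENGINE = "gcloud"
DEFAULT_VOICE = "en-US-Neural2-J"
RUNTIME_VOICE_ERROR = "There was an error with the current voice, sorry for the trouble."


@dataclass
class Session:
    """Hypothetical stand-in for the session state (`ais`)."""
    engine: str
    voice: str
    running: bool = True


def switch_voice(
    session: Session,
    engine: str,
    voice: str,
    init_tts: Callable[[str, str], bool],  # hypothetical: True if the TTS engine starts
    say: Callable[[str], None],            # hypothetical: speaks a message to the caller
) -> bool:
    """Return True if the requested voice was applied, False otherwise."""
    if init_tts(engine, voice):
        session.engine, session.voice = engine, voice
        return True
    # Requested voice failed: try the default voice instead.
    if init_tts(DEFAULT_ENGINE, DEFAULT_VOICE):
        session.engine, session.voice = DEFAULT_ENGINE, DEFAULT_VOICE
        say(RUNTIME_VOICE_ERROR)  # assumption: spoken once a working voice exists
        return False
    # Fallback failed too: stop the session, mirroring `ais->running = 0`.
    session.running = False
    return False
```

Passing `init_tts` and `say` as callbacks keeps the sketch independent of any real TTS engine, so the fallback order can be exercised with simple stubs.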

## Session termination patterns

### Termination Triggers (`ais->offhook = 0`)

The system terminates sessions by setting `ais->offhook = 0` in the following scenarios:

1. **Fatal AI Processing Errors**
   - Unrecoverable API communication failures
   - Malformed response data
   - Critical system errors

2. **Maximum Retry Exceeded**
   - Persistent communication failures
   - Repeated processing errors
   - System reliability threshold exceeded

3. **Function-Triggered Hangup**
   - Explicit hangup command from AI functions
   - Controlled session termination
   - User or system-initiated disconnect

4. **Transfer Operations**
   - Call transfer scenarios
   - Session handoff to other systems
   - Controlled termination for routing

## Error Recovery Mechanisms

### Retry Logic
- **Linear Backoff:** Wait time increases by one second with each retry (1s, 2s, 3s...)
- **Maximum Attempts:** Configurable via `max_tries` (default: 3, post-processing: 10)
- **Graceful Messages:** User-friendly notifications during retry attempts
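
The retry behavior described in this guide can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the platform's code: `run_with_retries`, `operation`, `say`, and `sleep` are hypothetical names, and the loop assumes the session gives up once the error count exceeds `max_tries`, as the fatal errors table describes.

```python
import time
from typing import Callable, Optional

# Messages quoted from this guide.
HOLD_MESSAGE = "I'm sorry, can you hold on for a second?"
GIVE_UP_MESSAGE = "I'm sorry I am having a problem. I have to hangup now."


def run_with_retries(
    operation: Callable[[], str],   # hypothetical: the call that may fail
    say: Callable[[str], None],     # hypothetical: speaks a message to the caller
    max_tries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Run `operation`, waiting `errors` seconds after each failure.

    Returns the operation's result, or None once the error count
    exceeds `max_tries` (the point where the session would end).
    """
    errors = 0
    while True:
        try:
            return operation()
        except Exception:
            errors += 1
            if errors > max_tries:
                say(GIVE_UP_MESSAGE)
                return None
            if errors == 1:
                say(HOLD_MESSAGE)  # only the first error triggers the hold message
            sleep(errors)  # delay grows by one second per error: 1s, 2s, 3s, ...
```

The `sleep` parameter defaults to `time.sleep` but can be swapped for a stub, which makes the delay sequence easy to inspect without waiting.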

### Fallback Strategies
- **Default Voice:** Falls back to `gcloud` `en-US-Neural2-J` on voice failures
- **Service Degradation:** Continues with reduced functionality rather than termination
- **Error Logging:** Comprehensive logging for debugging while maintaining user experience

### Event System
- **Error Events:** Fires `calling.ai.error` events for external monitoring
- **Fatal Flag:** Distinguishes between recoverable and fatal errors
- **Structured Data:** JSON error objects with detailed information
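
A consumer of these events might separate fatal from recoverable errors as sketched below. The payload layout is an assumption made for illustration: this guide only states that the event type is `calling.ai.error` and that it carries a fatal flag, so the `event_type`, `params`, and `fatal` field names are hypothetical, as is the `classify_error_event` function.

```python
import json
from typing import Optional


def classify_error_event(raw: str) -> Optional[str]:
    """Classify a JSON event string as "fatal" or "recoverable".

    Returns None for events that are not `calling.ai.error`.
    The field names used here are assumed, not documented.
    """
    event = json.loads(raw)
    if event.get("event_type") != "calling.ai.error":
        return None
    params = event.get("params") or {}
    return "fatal" if params.get("fatal") else "recoverable"
```

For example, a monitoring service could page an operator on `"fatal"` results and only log `"recoverable"` ones.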

## Best Practices Implemented

### User Experience
1. **Polite Language:** All error messages use apologetic, human-like language
2. **Clear Communication:** Messages explain the situation without technical jargon
3. **Expectation Setting:** Users are informed about delays and recovery attempts
4. **Graceful Degradation:** System continues operation when possible

### Technical Robustness
1. **Comprehensive Logging:** All errors logged with appropriate severity levels
2. **Event Notification:** External systems notified of error conditions
3. **Resource Cleanup:** Proper memory and resource management during errors
4. **State Management:** Consistent session state during error conditions

### Error Classification
1. **Fatal vs Recoverable:** Clear distinction between error types
2. **Context Awareness:** Different handling for different operational contexts
3. **Retry Strategies:** Appropriate retry logic for different error types
4. **Fallback Mechanisms:** Multiple levels of service degradation

## Configuration

The following parameters configure error handling, fallback behavior, and reporting.

### Retry Settings
- `max_tries`: Maximum retry attempts (default: 3, post-processing: 10)
- Backoff timing (linear): `errors * 1 second`

### Voice Fallback
- Default engine: `gcloud`
- Default voice: `en-US-Neural2-J`
- Automatic fallback on configuration errors

### Error Reporting
- Event type: `calling.ai.error`
- Includes fatal flag and error details
- Structured JSON error objects

## Monitor and debug

### Log Levels
- **ERROR:** Fatal conditions and session terminations
- **WARNING:** Retry attempts and recoverable errors
- **INFO:** General error recovery information
- **DEBUG:** Detailed technical information

### Event Monitoring
- Monitor `calling.ai.error` events for system health
- Track fatal vs non-fatal error ratios
- Analyze retry patterns and success rates

### Performance Metrics
- Track error rates by error type
- Monitor retry success rates
- Measure recovery times and user impact
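
The metrics above can be computed from collected error records with a small aggregation. This is a sketch under assumptions: each record is taken to be an already-parsed dictionary with hypothetical `error_type` and `fatal` keys, and `summarize_errors` is an illustrative helper, not part of any SignalWire SDK.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Mapping


def summarize_errors(events: Iterable[Mapping[str, Any]]) -> Dict[str, Any]:
    """Count errors by type and compute the share of fatal errors."""
    by_type: Counter = Counter()
    fatal = 0
    total = 0
    for event in events:
        total += 1
        by_type[event.get("error_type", "unknown")] += 1
        if event.get("fatal"):
            fatal += 1
    return {
        "total": total,
        "by_type": dict(by_type),
        # Avoid dividing by zero when no errors were recorded.
        "fatal_ratio": fatal / total if total else 0.0,
    }
```

A rising `fatal_ratio` over time suggests that retries are no longer recovering from failures and the underlying service needs attention.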
55 changes: 37 additions & 18 deletions website/docs/main/home/calling/ai/index.mdx
@@ -12,35 +12,54 @@ import UseCases from "./_usecases/_useCases.mdx";

<Subtitle>Programmable, integrated, realtime voice AI</Subtitle>

<br/><br/>

SignalWire AI is built for **unlimited programmability and scale**.
Integrate AI and deploy an MVP with low-code/no-code drag-and-drop tools, then scale your application on SignalWire's cloud platform.

## Try it out
What makes SignalWire AI different?

<UseCases />
- **Telecom stack:** SignalWire is built by the creators and maintainers of FreeSWITCH®,
the leading open-source communication framework that powers some of the world's largest telephony infrastructures.
Other voice AI providers must outsource their telecom integrations, limiting the depth and efficiency of their implementations, and sometimes resulting in higher cost to the developer.

## SWML
- **Deep integration:** SignalWire's AI is built on our own telecom stack, giving SignalWire the ability to optimize further than others.
This means lower latency, better performance, and greater extensibility and programmability.

SWML (SignalWire Markup Language) is the most powerful and flexible way to use AI on the SignalWire platform.
## SignalWire Markup Language (SWML) {#SWML}

SWML is a structured language for configuring and orchestrating real-time communication applications
using lightweight and readable JSON or YAML files.
These SWML Scripts can be deployed serverlessly in SignalWire's cloud, or from your server.
SWML is the powerful, lightweight, and extensible markup language for orchestrating real-time call flows that enables SignalWire AI.

SWML's `ai` method integrates advanced AI Agents, which can interact with external APIs.
SWML Scripts are JSON or YAML files, which can be edited, hosted, and deployed in a number of ways:
serverlessly in the
<Tooltips tip="The Dashboard is your control panel for all things SignalWire, accessed at `{your-space-name}.signalwire.com`.<br/><br/>[Learn more](/platform/dashboard)">SignalWire Dashboard</Tooltips>,
hosted externally on your own server,
or generated and hosted by your own application using the Agents SDK.

<CardGroup cols="4">

<Card title="Agents SDK" href="href">
Build and deploy SWML applications using the powerful Agents SDK for Python.
</Card>

<Card title="SWML Scripts" href="href">
Host SWML Scripts in the Dashboard or your own server.
</Card>

<Card title="AI Agents UI" href="href">
A web creator for basic AI Agents in the Dashboard.
</Card>

<Card title="Call Flow Builder" href="href">
Call Flow Builder's streamlined feature set and drag-and-drop UI are best for simple demos and non-technical users.
</Card>

<CardGroup cols={3}>
<Card
title="Technical reference"
href="/swml/methods/ai"
icon={<MdCode/>}
>
SWML AI method
</Card>
</CardGroup>

## Try it out

<UseCases />

---

## AI Agents

Configure AI Agents right in your SignalWire Space with a streamlined, no-code user interface.