# AI-Driven Action Planner & Troubleshooter (ADAPT)
- Overview
- Setup
- Running the Application
- Project Structure
- Features
- Configuration
- Observability
- Supported AI Models
- Roadmap and Known Issues
- Contributing
- Acknowledgements
## Overview

ADAPT is an autonomous network troubleshooting system that uses AI-driven agents to diagnose and solve network issues. The system uses a multi-agent workflow powered by LangGraph and PydanticAI to provide intelligent, step-by-step troubleshooting of network problems.

The workflow consists of the following AI agents:

- Fault Summarizer: Analyzes network alerts and summarizes the issue
- Action Planner: Creates a detailed troubleshooting plan with specific commands
- Action Executor: Executes commands on network devices (real or simulated)
- Action Analyzer: Analyzes command outputs and determines next steps
- Result Summary: Provides a comprehensive troubleshooting report
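The agent pipeline above can be pictured as a simple sequence of stages. The sketch below is a minimal plain-Python illustration only; the real project wires these agents together with LangGraph and PydanticAI, and every function name and return value here is hypothetical:

```python
# Minimal sketch of the ADAPT agent pipeline. All names and outputs are
# hypothetical; the real implementation uses LangGraph and PydanticAI agents.

def fault_summarizer(alert: dict) -> str:
    """Summarize the incoming network alert."""
    return f"{alert['device']}: {alert['message']} (severity={alert['severity']})"

def action_planner(summary: str) -> list[str]:
    """Produce an ordered list of troubleshooting commands."""
    return ["show bgp summary", "show ip interface brief"]

def action_executor(commands: list[str]) -> dict[str, str]:
    """Execute each command (simulated here) and collect outputs."""
    return {cmd: f"<simulated output of '{cmd}'>" for cmd in commands}

def action_analyzer(outputs: dict[str, str]) -> str:
    """Decide whether the findings resolve the issue or need more steps."""
    return "BGP neighbor down while interface is up; check peer configuration"

def result_summary(analysis: str) -> str:
    """Produce the final troubleshooting report."""
    return f"Report: {analysis}"

alert = {"device": "NCS5508-1", "severity": "high",
         "message": "BGP neighbor 1.2.3.4 is Down"}
report = result_summary(
    action_analyzer(action_executor(action_planner(fault_summarizer(alert)))))
print(report)
```

In the actual system each stage is an LLM-backed agent and the graph can loop back (see Adaptive Mode below) rather than running strictly left to right.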
We've provided a recorded walkthrough of the setup process and execution with a live device to help you get started quickly.
## Setup

### Prerequisites

- Python 3.12 or later
- Docker (optional, for containerized deployment)
- OpenAI API key (for LLM access) - Sign up for OpenAI
- Logfire token (optional, for observability) - Sign up for Logfire
### Local Installation

1. Clone this repository
2. Create and activate a virtual environment:

   ```
   python -m venv venv
   venv\Scripts\activate     # Windows
   # OR
   source venv/bin/activate  # Linux/Mac
   ```

3. Install dependencies:

   ```
   pip install -r requirements.txt
   ```

4. Create a `.env` file with your configurations (copy from `.env.example`):

   ```
   copy .env.example .env  # Windows
   # OR
   cp .env.example .env    # Linux/Mac
   ```

5. Update the `.env` file with your settings:

   ```
   # API settings
   OPENAI_API_KEY=your_api_key_here
   LOGFIRE_TOKEN=your_logfire_token_here  # optional

   # Device credentials
   DEVICE_USERNAME=admin
   DEVICE_PASSWORD=password
   DEVICE_SECRET=enable_password

   # Configuration paths
   INVENTORY_PATH=configuration/inventory.yml
   ```
### Docker Deployment

1. Clone this repository
2. Prerequisites:
   - Docker installed on your system
3. Environment variables configuration:
   - Copy the example environment file to create your own:

     ```
     copy .env.example .env
     ```

   - Edit the `.env` file with your configuration values:

     ```
     OPENAI_API_KEY=your_api_key_here
     ```

   - Configure any device hostname, type, and port settings
   - Add any other environment variables needed by the application
4. Build and run with Docker Compose:

   ```
   docker compose up -d
   ```

   To rebuild after code changes:

   ```
   docker compose up --build -d
   ```

   To stop the container:

   ```
   docker compose down
   ```

5. Mounted volumes. The following directories are mounted from the host into the container:
   - `./workbench:/app/workbench`: For persistent data storage
   - `./configuration:/app/configuration`: For device inventories and settings files

   Any changes made to these directories on the host are immediately reflected in the container.
## Running the Application

Launch the Streamlit application:

```
streamlit run streamlit_app.py
```

When using Docker, the application runs automatically with the following services:

- Streamlit Application: Accessible at `http://localhost:8501`
- Alert Queue Service: Exposed on port 8001 (can be started through the Streamlit interface when needed)
ADAPT provides an API endpoint for receiving network alerts:

- Endpoint: `POST /alert`
- Port: 8001 (default)
- Input: Any valid JSON content
- Response: Success or error message

To send an alert manually:

```
curl -X POST http://localhost:8001/alert \
  -H "Content-Type: application/json" \
  -d '{"alert_id":"BGPDOWN-0001","device":"NCS5508-1","severity":"high","message":"BGP neighbor 1.2.3.4 is Down","raw_event":"%ROUTING-BGP-5-ADJCHANGE : neighbor 1.2.3.4 - Hold timer expired"}'
```
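The same alert can be posted from Python using only the standard library. This is a sketch mirroring the curl example above; port 8001 assumes the default configuration, and the `send_alert` helper is not part of the project:

```python
import json
from urllib import error, request

# Payload fields taken from the curl example in this README.
ALERT = {
    "alert_id": "BGPDOWN-0001",
    "device": "NCS5508-1",
    "severity": "high",
    "message": "BGP neighbor 1.2.3.4 is Down",
    "raw_event": "%ROUTING-BGP-5-ADJCHANGE : neighbor 1.2.3.4 - Hold timer expired",
}

def send_alert(alert: dict, url: str = "http://localhost:8001/alert") -> str:
    """POST an alert as JSON to the ADAPT alert queue and return the response body."""
    req = request.Request(
        url,
        data=json.dumps(alert).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8")

if __name__ == "__main__":
    try:
        print(send_alert(ALERT))
    except error.URLError as exc:  # alert queue service not running
        print(f"Could not reach alert queue: {exc}")
```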
## Project Structure

- `agents/`: Directory containing all agent implementations
  - `hello_world/`: Hello World agent implementation
  - `fault_summary/`: Fault Summary agent implementation
  - `action_planner/`: Action Planning agent implementation
  - `action_executor/`: Command execution agent implementation
  - `action_analyzer/`: Output analysis agent implementation
  - `result_summary/`: Result Summary agent implementation
- `configuration/`: Settings and network device inventory
  - `inventory.yml`: Network device inventory
  - `settings.yml`: Application settings
- `graph.py`: LangGraph implementation for the multi-agent workflow
- `static/`: Directory for files served via direct URLs
- `streamlit_app.py`: Main Streamlit application
- `tests/`: Test scenarios for simulation mode
- `utils/`: Utility functions and helpers
- `workbench/`: Storage for troubleshooting session logs
- `alert_queue.py`: API service for receiving network alerts
## Features

- Multiple Operation Modes:
  - Simulation Mode: Run commands without actual execution on devices
  - Test Mode: Use predefined test data from YAML files
  - Production Mode: Connect to and execute commands on real network devices
- Golden Rules: Configure safety rules that are always followed by agents
  - Edit the `golden_rules` section in `settings.yml` to add or modify rules
  - Rules are enforced during the generation of troubleshooting steps
- Direct Result Access: Troubleshooting results and response logs are saved as files with direct URL links for easy access and integration with other systems
- Multi-Agent Workflow: End-to-end troubleshooting using multiple specialized agents
- Approval System: Critical commands can be configured to require explicit user approval
- Individual Agent Testing: Each agent can be tested independently through the UI
- Step Mode: When enabled, requires approval before proceeding to each step in the workflow
- Custom Instructions: Add a specific set of custom instructions for a known issue. These instructions heavily influence the agent's behavior when generating troubleshooting steps.
- Adaptive Mode: When enabled, allows the system to modify the action plan based on findings from previous steps, dynamically adjusting its approach as results come in.
### Test Scenarios

Test scenarios are defined in YAML files in the `tests/` directory. Each file contains:

- `alert_payload`: The simulated alert that triggers the workflow
- `custom_instructions`: Specific remediation guidelines for this scenario
- `command_outputs`: Simulated outputs for various network commands

To create a new test scenario, copy an existing file and modify it, or use the `utils/generate_test.py` script to generate a test scenario using a Test Generation AI Agent.
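A minimal scenario file built from the three keys above might look like the following. This is an illustrative sketch only; copy a real file from `tests/` for the exact schema, since field names inside each section are assumptions here:

```yaml
# Hypothetical test scenario -- see an existing file in tests/ for the real schema.
alert_payload:
  alert_id: "BGPDOWN-0001"
  device: "NCS5508-1"
  severity: "high"
  message: "BGP neighbor 1.2.3.4 is Down"

custom_instructions: |
  If the BGP session is down but the interface is up, inspect the
  neighbor configuration before restarting any process.

command_outputs:
  "show bgp summary": |
    Neighbor   Spk  AS     MsgRcvd  MsgSent  Up/Down   St/PfxRcd
    1.2.3.4     0   65001  0        0        00:00:00  Idle
```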
## Configuration

### Application Settings

The `settings.yml` file in the `configuration/` directory contains various settings:

- `debug_mode`: Enable verbose logging for debugging (currently unsupported)
- `simulation_mode`: Use the LLM to simulate command outputs instead of executing them on devices
- `test_mode`: Use predefined test data from YAML files
- `test_name`: The name of the test file to use in test mode
- `step_mode`: Require approval between workflow steps
- `adaptive_mode`: Allow the system to adapt its troubleshooting action plan based on results
- `golden_rules`: Global rules that agents must follow
- `max_steps`: Maximum number of troubleshooting steps allowed in an action plan
- `custom_instructions`: Remediation guide for known issues
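Put together, a `settings.yml` might look like this. Only the keys come from the list above; every value (including the example golden rules) is illustrative:

```yaml
# Illustrative settings.yml -- adjust values to your environment.
debug_mode: false
simulation_mode: true
test_mode: false
test_name: "bgp_neighbor_down"
step_mode: true
adaptive_mode: true
max_steps: 10
golden_rules:
  - "Never run configuration or state-changing commands without approval"
  - "Only collect the output needed for the current troubleshooting step"
custom_instructions: ""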
### Device Inventory

The `inventory.yml` file in the `configuration/` directory defines network devices:

```yaml
devices:
  DEVICE-NAME:
    hostname: "device_ip_address"
    device_type: "cisco_xr"   # Netmiko driver type
    username: "username"      # Default from .env if not specified
    password: "password"      # Default from .env if not specified
    optional_args:
      port: 22
      transport: "ssh"
      timeout: 60
```
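The "default from `.env` if not specified" behavior can be pictured as a per-device entry merged over environment-derived defaults. This is a hypothetical sketch of that precedence rule, not the project's actual inventory loader:

```python
import os

# Hypothetical illustration of .env fallbacks for inventory fields;
# ADAPT's real loader may differ.
ENV_DEFAULTS = {
    "username": os.environ.get("DEVICE_USERNAME", "admin"),
    "password": os.environ.get("DEVICE_PASSWORD", "password"),
}

INVENTORY = {
    "NCS5508-1": {
        "hostname": "192.0.2.10",
        "device_type": "cisco_xr",
        # username/password omitted -> fall back to the .env defaults
        "optional_args": {"port": 22, "transport": "ssh", "timeout": 60},
    }
}

def resolve_device(name: str) -> dict:
    """Merge a device entry over the environment defaults (device wins)."""
    return {**ENV_DEFAULTS, **INVENTORY[name]}

params = resolve_device("NCS5508-1")
print(params["hostname"], params["username"])
```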
## Observability

ADAPT optionally integrates with Logfire for observability and API usage tracking. To enable this feature:

1. Get your token from https://logfire.pydantic.dev
2. Add it to your `.env` file:

   ```
   LOGFIRE_TOKEN=your_logfire_token_here
   ```
## Supported AI Models

By default, ADAPT uses OpenAI models for its AI agents. However, any model supported by PydanticAI can be used by updating the corresponding environment variables in your `.env` file:

```
# API keys for your chosen provider
OPENAI_API_KEY=your_api_key_here
# For providers other than OpenAI, set the corresponding environment variable:
# OPENROUTER_API_KEY=your_openrouter_api_key
# ANTHROPIC_API_KEY=your_anthropic_api_key
# OLLAMA_API_KEY=your_ollama_api_key

# Model configurations using <provider>:<model> format
REASONER_MODEL=openai:o4-mini     # For complex reasoning (action_planner, action_analyzer)
LARGE_MODEL=openai:gpt-4.1        # For standard operations (action_executor, results_summary)
SMALL_MODEL=openai:gpt-4.1-mini   # For simpler operations (fault_summary, hello_world)
```

Visit the PydanticAI Models documentation for a comprehensive list of supported models and providers, along with configuration details. Examples of supported model formats include `openai:gpt-4o`, `anthropic:claude-3-opus-20240229`, and `openrouter:google/gemini-2.5-pro-preview`.
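One detail of the `<provider>:<model>` convention worth noting: the string should be split on the first colon only, since model identifiers themselves can contain slashes and dots (as in the OpenRouter example above). A small illustration, with hypothetical variable names:

```python
import os

def parse_model_setting(value: str) -> tuple[str, str]:
    """Split a '<provider>:<model>' string on the first colon only."""
    provider, _, model = value.partition(":")
    if not model:
        raise ValueError(f"expected '<provider>:<model>', got {value!r}")
    return provider, model

# Default mirrors the .env example above.
reasoner = os.environ.get("REASONER_MODEL", "openai:o4-mini")
print(parse_model_setting(reasoner))
print(parse_model_setting("openrouter:google/gemini-2.5-pro-preview"))
```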
## Roadmap and Known Issues

See the CHANGELOG.md for a list of known issues and planned improvements for upcoming releases.
## Contributing

We welcome contributions to ADAPT! Here's how you can contribute:

1. Fork the repository on GitHub
2. Create a branch for your changes
3. Make your changes (new agents, bug fixes, documentation, etc.)
4. Submit a pull request back to the main repository

For questions or suggestions, please open an issue on GitHub.
## Acknowledgements

We'd like to thank the following people for their impact on this project:

- Cole Medin - Cole's YouTube videos were integral to helping us understand LangGraph and PydanticAI and come up with a strategy for tackling this project. We can't recommend his content enough for anyone looking to get into agentic AI development!
- Ralph Keyser - Ralph kindly volunteered to be the guinea pig for testing the initial version of ADAPT and provided valuable feedback.