
prashantmatlani/csa01


---
title: Customer Support OpenEnv Environment
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
tags:
  - openenv
  - reinforcement-learning
  - llm
  - customer-support
---

🤖 Customer Support Agent — OpenEnv Environment

🧠 Overview

This project implements a customer support simulation environment, modeled on real-world support workflows and built to the OpenEnv specification.

It is designed to evaluate and train intelligent agents capable of:

  • Understanding noisy and ambiguous user queries
  • Classifying issues correctly
  • Gathering missing information efficiently
  • Resolving tickets under uncertainty

Unlike toy environments, this system models the operational complexity of production customer support workflows: noisy and ambiguous messages, missing information, and tickets that mix several intents.


🎯 Objective

Build and evaluate an agent that can:

  1. Classify customer issues (billing / technical / delivery)
  2. Collect required information dynamically
  3. Resolve tickets efficiently within a limited step budget
  4. Adapt behavior mid-episode (self-correction)

🏗️ System Architecture

+------------------------+
|    Customer Ticket     |
|   (noisy, ambiguous)   |
+-----------+------------+
            |
            v
+------------------------+
|  Environment (env.py)  |
|  - State               |
|  - Reward              |
|  - Stochasticity       |
+-----------+------------+
            |
            v
+------------------------+
|   Observation Space    |
|  - message             |
|  - known_info          |
|  - required            |
+-----------+------------+
            |
            v
+------------------------+
|   Agent (LLM + Rule)   |
|  - Reasoning (LLM)     |
|  - Constraints         |
|  - Fallback            |
+-----------+------------+
            |
            v
+------------------------+
|         Action         |
|  - classify            |
|  - ask_info            |
|  - resolve             |
+-----------+------------+
            |
            v
+------------------------+
|    Environment Step    |
|  - reward              |
|  - next_state          |
+------------------------+

Interaction Loop

RESET → OBSERVE → ACT → STEP → REPEAT

Detailed Flow:

[RESET]
   ↓
[Observation]
   ↓
[Agent Decision]
   ↓
[Action]
   ↓
[Environment Step]
   ↓
[Reward + Next State]
   ↓
[Done?] ── No ──> loop back to [Observation]
   │ Yes
   ↓
[Episode End]
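The loop above can be sketched in a few lines of Python. `ToyEnv` is a hypothetical stand-in, not the project's env.py: it assumes `reset()` returns an observation dict and `step(action)` returns `(observation, reward, done)`, and it pays only the step penalty, so treat it as an illustration of the control flow rather than the real interface.

```python
from typing import Any, Callable, Dict, List, Tuple

Observation = Dict[str, Any]
Action = Dict[str, Any]


class ToyEnv:
    """Hypothetical stand-in for env.py: an episode ends on resolve or when the step budget runs out."""

    def __init__(self, max_steps: int = 10) -> None:
        self.max_steps = max_steps
        self.step_count = 0

    def reset(self) -> Observation:
        self.step_count = 0
        return {"status": "open", "step_count": 0, "remaining_steps": self.max_steps}

    def step(self, action: Action) -> Tuple[Observation, float, bool]:
        self.step_count += 1
        resolved = action["type"] == "resolve"
        obs = {
            "status": "resolved" if resolved else "open",
            "step_count": self.step_count,
            "remaining_steps": self.max_steps - self.step_count,
        }
        reward = -0.05  # placeholder: only the step penalty, no shaping
        done = resolved or self.step_count >= self.max_steps
        return obs, reward, done


def run_episode(env: ToyEnv, policy: Callable[[Observation], Action]) -> List[float]:
    """RESET, then OBSERVE -> ACT -> STEP until done; returns the per-step rewards."""
    obs = env.reset()
    rewards: List[float] = []
    done = False
    while not done:
        action = policy(obs)                  # agent decision from the current observation
        obs, reward, done = env.step(action)  # environment step: reward + next state
        rewards.append(reward)
    return rewards
```

A policy that asks twice and then resolves yields a three-step episode; a policy that never resolves runs until `max_steps` is exhausted.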

Self-Correction Loop

Initial Flow: classify → ask_info → resolve

With Self-Correction:

classify
   ↓
ask_info
   ↓
[New Information Arrives]
   ↓
re-evaluate decision
   ↓
re-classify (if needed)
   ↓
ask remaining info
   ↓
resolve

Agent Decision Logic

IF not classified: → classify

ELIF missing required fields: → ask_info

ELIF uncertain: → re-classify

ELSE: → resolve
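The IF/ELIF chain translates directly into a rule-based policy. In this sketch, `category` (the agent's current label, `None` before classification) and `confident` are hypothetical pieces of agent state, and filling in the classify payload (category, priority) is left to a separate classifier.

```python
from typing import Any, Dict, Optional


def decide(obs: Dict[str, Any], category: Optional[str], confident: bool) -> Dict[str, Any]:
    """Rule-based policy mirroring the decision logic, checked in the same order."""
    if category is None:
        return {"type": "classify"}                       # IF not classified
    missing = obs.get("missing_required", [])
    if missing:
        return {"type": "ask_info", "field": missing[0]}  # ELIF missing required fields
    if not confident:
        return {"type": "classify"}                       # ELIF uncertain: re-classify
    return {"type": "resolve"}                            # ELSE
```

Because missing fields are checked before uncertainty, an unsure agent still gathers information first and re-classifies only once nothing is missing.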

Stochastic Behavior

Customer Message = base_variant

  • noise injection
  • ambiguity

Required Info = full_schema

  • randomly masked fields

Difficulty Controls:

  • EASY → low noise, clear signals
  • MEDIUM → moderate noise
  • HARD → high ambiguity + missing info

Reward Flow

Action → Immediate Reward → Final Outcome

Examples:

  • ask_info (useful) → +0.3
  • repeat ask → -0.3
  • step penalty → -0.05
  • correct classify → +0.2
  • premature resolve → -1.0 (hard)
  • successful resolve → +0.2 to +1.0

Example Episode

Step 1: classify → reward -0.05
Step 2: ask_info → reward +0.20
Step 3: re-classify → reward -0.05
Step 4: resolve → reward +0.45

Outcome:

  ✔ success
  ✔ self-correction observed
  ✔ efficient resolution
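Summing the per-step rewards of the example episode gives its undiscounted return, -0.05 + 0.20 - 0.05 + 0.45 = 0.55:

```python
# Per-step rewards from the example episode (Steps 1 to 4).
step_rewards = [-0.05, 0.20, -0.05, 0.45]

# Undiscounted return; rounding hides floating-point noise in the sum.
episode_return = round(sum(step_rewards), 2)
print(episode_return)  # 0.55
```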

1. Environment (env.py)

A stateful, stochastic simulation of customer support operations.

Key Features

  • Multi-step interaction loop (step, reset, state)
  • Partial observability (missing information)
  • Stochastic noise injection
  • Difficulty-aware configuration
  • Multi-intent ticket handling
  • Reward shaping with penalties for poor decisions

2. Observation Space

{
  "ticket_id": "string",
  "customer_message": "string",
  "known_info": {},
  "required": ["fields"],
  "missing_required": ["fields"],
  "info_progress": 0.0,
  "status": "open | resolved",
  "step_count": 0,
  "remaining_steps": 10,
  "difficulty": "easy | medium | hard"
}
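The observation JSON maps naturally onto a typed model. This is a minimal sketch using a standard-library dataclass; the real project may use a different model class. The definitions of `missing_required` and `info_progress` used by the two helpers are assumptions (required fields absent from `known_info`, and the fraction already known).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Literal


@dataclass
class Observation:
    # Field names follow the JSON above; Literal types document the allowed
    # values but dataclasses do not enforce them at runtime.
    ticket_id: str
    customer_message: str
    known_info: Dict[str, Any] = field(default_factory=dict)
    required: List[str] = field(default_factory=list)
    missing_required: List[str] = field(default_factory=list)
    info_progress: float = 0.0
    status: Literal["open", "resolved"] = "open"
    step_count: int = 0
    remaining_steps: int = 10
    difficulty: Literal["easy", "medium", "hard"] = "easy"


def missing_fields(required: List[str], known_info: Dict[str, Any]) -> List[str]:
    """Assumed definition: required fields not yet present in known_info, in order."""
    return [name for name in required if name not in known_info]


def compute_progress(required: List[str], known_info: Dict[str, Any]) -> float:
    """Assumed definition: fraction of required fields already known (1.0 if none are required)."""
    if not required:
        return 1.0
    return (len(required) - len(missing_fields(required, known_info))) / len(required)
```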

3. Action Space

| Action | Description |
|---|---|
| classify | Assign category + priority |
| ask_info | Request a missing field |
| resolve | Attempt to close the ticket |

Example:

{
  "type": "ask_info",
  "field": "order_id"
}
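Actions arriving as JSON can be checked before they reach the environment. A minimal sketch: the `category` and `priority` keys for `classify` are hypothetical names (the README only says classify assigns both), and the category set comes from the Objective list.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

ACTION_TYPES = {"classify", "ask_info", "resolve"}
CATEGORIES = {"billing", "technical", "delivery"}


@dataclass
class Action:
    type: str
    field: Optional[str] = None     # ask_info: which field to request
    category: Optional[str] = None  # classify: hypothetical key name
    priority: Optional[str] = None  # classify: hypothetical key name


def parse_action(data: Dict[str, Any]) -> Action:
    """Validate a raw action dict and return a typed Action, or raise ValueError."""
    action_type = data.get("type")
    if action_type not in ACTION_TYPES:
        raise ValueError(f"unknown action type: {action_type!r}")
    if action_type == "ask_info" and not data.get("field"):
        raise ValueError("ask_info requires a 'field'")
    if action_type == "classify":
        if data.get("category") not in CATEGORIES:
            raise ValueError(f"unknown category: {data.get('category')!r}")
        if not data.get("priority"):
            raise ValueError("classify requires a 'priority'")
    return Action(
        type=action_type,
        field=data.get("field"),
        category=data.get("category"),
        priority=data.get("priority"),
    )
```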

🎲 Difficulty & Stochastic Control

The environment dynamically adjusts complexity:

| Difficulty | Max Steps | Noise | Missing Info |
|---|---|---|---|
| Easy | Low | None | Minimal |
| Medium | Medium | Moderate | Partial |
| Hard | High | High | Significant |

Stochastic Elements

  • Noise injection: adds irrelevant or emotional phrases

  • Information masking: required fields may be hidden

  • Ambiguity: messages may not clearly indicate the category
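The difficulty table and the three stochastic elements can be combined into one sampling function. The numeric settings below are hypothetical: the README states only qualitative levels, and "masking" is read here as each required field independently being hidden so the agent must ask for it.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical numbers standing in for the qualitative levels in the table
# (Low/Medium/High steps, None/Moderate/High noise, Minimal/Partial/Significant masking).
DIFFICULTY: Dict[str, Dict[str, float]] = {
    "easy":   {"max_steps": 6,  "noise_prob": 0.0, "mask_prob": 0.2},
    "medium": {"max_steps": 10, "noise_prob": 0.5, "mask_prob": 0.5},
    "hard":   {"max_steps": 14, "noise_prob": 0.9, "mask_prob": 0.8},
}


def sample_ticket(
    variants: List[str],
    noise: List[str],
    required_info: List[str],
    difficulty: str,
    rng: random.Random,
) -> Tuple[str, List[str]]:
    """Return (customer_message, masked_fields) for one episode."""
    cfg = DIFFICULTY[difficulty]
    # Ambiguity comes from the phrasing of the chosen variant.
    message = rng.choice(variants)
    # Noise injection: append an irrelevant or emotional phrase.
    if noise and rng.random() < cfg["noise_prob"]:
        message = f"{message} {rng.choice(noise)}"
    # Information masking: each required field is hidden independently.
    masked = [name for name in required_info if rng.random() < cfg["mask_prob"]]
    return message, masked
```

Passing a seeded `random.Random` keeps episodes reproducible, which matters when the same tickets feed a deterministic grader.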


🧾 Dataset (Production-Style Tickets)

Each ticket includes:

{
  "ticket_id": "...",
  "variants": [...],        # multiple phrasings
  "noise": [...],           # real-world clutter
  "ground_truth": {
      "category": "...",
      "priority": "...",
      "required_info": [...],
      "intents": [...]      # multi-intent support
  }
}
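Since every ticket must follow the schema above, a small validator can catch malformed records before they enter the dataset. A sketch, assuming the category values are the three listed in the Objective section.

```python
from typing import Any, Dict, List

CATEGORIES = {"billing", "technical", "delivery"}


def validate_ticket(ticket: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record matches the schema."""
    problems: List[str] = []
    for key in ("ticket_id", "variants", "noise", "ground_truth"):
        if key not in ticket:
            problems.append(f"missing key: {key}")
    if "variants" in ticket and not ticket["variants"]:
        problems.append("variants must be a non-empty list")
    truth = ticket.get("ground_truth", {})
    for key in ("category", "priority", "required_info", "intents"):
        if key not in truth:
            problems.append(f"missing ground_truth.{key}")
    if "category" in truth and truth["category"] not in CATEGORIES:
        problems.append(f"unknown category: {truth['category']!r}")
    return problems
```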

Key Properties

  • Multiple linguistic variations
  • Realistic phrasing (not templated)
  • Multi-intent issues (e.g., billing + technical)
  • No explicit hints (agent must infer)

🔁 Self-Correction Mechanism

The agent is designed to adapt within an episode.

What this means:

  • Can re-classify after new information
  • Can delay resolution under uncertainty
  • Can recover from suboptimal actions

Example behavior:

classify → ask_info → re-classify → resolve

This mimics real-world agent reasoning rather than fixed pipelines.
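One way to realize re-classification is to re-score all evidence seen so far whenever new information arrives and switch labels only when another category is strictly ahead. In this sketch a keyword count stands in for the LLM's reasoning; the keyword lists and the switching rule are assumptions, not the agent's actual method.

```python
from typing import Dict, Optional, Tuple

# Hypothetical keyword lists; the README does not say how categories are scored.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "billing": ("charge", "refund", "invoice", "payment"),
    "technical": ("error", "crash", "login", "bug"),
    "delivery": ("package", "shipping", "tracking", "arrive"),
}


def score(text: str) -> Dict[str, int]:
    """Count keyword hits per category (substring match, case-insensitive)."""
    lowered = text.lower()
    return {cat: sum(word in lowered for word in words) for cat, words in KEYWORDS.items()}


def maybe_reclassify(current: Optional[str], evidence: str) -> Tuple[str, bool]:
    """Re-score the accumulated evidence; return (category, changed)."""
    scores = score(evidence)
    best = max(scores, key=scores.get)
    if current is not None and scores[best] <= scores.get(current, 0):
        return current, False  # keep the label unless another category is strictly ahead
    return best, best != current
```

Requiring a strict lead avoids flip-flopping between categories on ties, which would otherwise cost extra steps.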


🧠 Agent Design (agent_llm.py)

Hybrid Intelligence

| Component | Role |
|---|---|
| LLM | High-level reasoning |
| Rules | Safety + constraints |
| Fallback | Deterministic recovery |

Key Capabilities

  • Structured JSON output
  • Retry + validation loop
  • Fallback policy (guarantees progress)
  • Partial autonomy (not over-constrained)
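The LLM, rule, and fallback layers can be wired together as a retry loop. A minimal sketch: `llm` is any hypothetical callable that returns a string, the two rules shown (ask only for missing fields, never resolve with fields missing) are illustrative, and the fallback here only asks or resolves.

```python
import json
from typing import Any, Callable, Dict, List

VALID_TYPES = {"classify", "ask_info", "resolve"}


def validate(action: Dict[str, Any], missing: List[str]) -> bool:
    """Rule layer: reject malformed actions and premature resolves."""
    if action.get("type") not in VALID_TYPES:
        return False
    if action["type"] == "ask_info" and action.get("field") not in missing:
        return False  # only ask for fields that are still missing
    if action["type"] == "resolve" and missing:
        return False  # never resolve while required info is missing
    return True


def fallback(missing: List[str]) -> Dict[str, Any]:
    """Deterministic recovery that always makes progress."""
    if missing:
        return {"type": "ask_info", "field": missing[0]}
    return {"type": "resolve"}


def choose_action(llm: Callable[[str], str], prompt: str,
                  missing: List[str], retries: int = 2) -> Dict[str, Any]:
    """Ask the LLM for a JSON action, retrying on bad output, then fall back."""
    for _ in range(retries + 1):
        raw = llm(prompt)
        try:
            action = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON: retry
        if isinstance(action, dict) and validate(action, missing):
            return action
    return fallback(missing)
```

The LLM keeps the first say on every step, so the agent is not over-constrained; the rules only veto, and the fallback only fires after all retries fail.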

🧮 Reward Design

Reward is dense and shaped, not binary.

| Behavior | Reward |
|---|---|
| Step penalty | -0.05 |
| Correct classification | +0.2 |
| Useful info collection | +0.3 |
| Redundant action | -0.3 |
| Premature resolve (hard) | -1.0 |
| Successful resolve | +0.2 to +1.0 |
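The table can be encoded as a shaped reward function. Two parts of this sketch are assumptions the README does not settle: the step penalty is added to every action, and the +0.2 to +1.0 resolve reward scales linearly with the unused step budget.

```python
# Event rewards from the table. Assumptions: the step penalty stacks with every
# event, and the successful-resolve reward scales with the unused step budget.
STEP_PENALTY = -0.05
EVENT_REWARDS = {
    "correct_classification": 0.2,
    "useful_info": 0.3,
    "redundant": -0.3,
    "premature_resolve_hard": -1.0,
}


def resolve_reward(remaining_steps: int, max_steps: int) -> float:
    """0.2 when the budget is exhausted, rising linearly to 1.0 when it is untouched."""
    unused = max(0.0, min(1.0, remaining_steps / max_steps))
    return 0.2 + 0.8 * unused


def shaped_reward(event: str, remaining_steps: int = 0, max_steps: int = 10) -> float:
    if event == "successful_resolve":
        return STEP_PENALTY + resolve_reward(remaining_steps, max_steps)
    # Unlisted events (e.g. a wrong classification) earn only the step penalty.
    return STEP_PENALTY + EVENT_REWARDS.get(event, 0.0)
```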

📊 Metrics

Aggregated over a batch of evaluation episodes:

{
  "success_rate": 0.0,
  "avg_steps": 0.0,
  "avg_reward": 0.0,
  "info_efficiency": 0.0
}
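A sketch of how these aggregates could be computed from per-episode records. The `EpisodeResult` record and the definition of `info_efficiency` (useful `ask_info` actions divided by all `ask_info` actions, 1.0 when none were asked) are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EpisodeResult:
    success: bool
    steps: int
    total_reward: float
    useful_asks: int  # ask_info actions that obtained a missing field
    total_asks: int   # all ask_info actions


def summarize(results: List[EpisodeResult]) -> Dict[str, float]:
    """Average per-episode records into the four reported metrics."""
    keys = ("success_rate", "avg_steps", "avg_reward", "info_efficiency")
    if not results:
        return {k: 0.0 for k in keys}
    n = len(results)

    def efficiency(r: EpisodeResult) -> float:
        return r.useful_asks / r.total_asks if r.total_asks else 1.0

    return {
        "success_rate": sum(r.success for r in results) / n,
        "avg_steps": sum(r.steps for r in results) / n,
        "avg_reward": sum(r.total_reward for r in results) / n,
        "info_efficiency": sum(efficiency(r) for r in results) / n,
    }
```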

Additional Behavioral Signals

  • Self-correction frequency (re-classification)
  • Resolution efficiency
  • Failure modes under uncertainty

🧪 Tasks & Graders

Three evaluation tasks:

| Task | Difficulty | Objective |
|---|---|---|
| easy-info-collection | Easy | Basic info gathering |
| medium-complete-info | Medium | Complete + accurate handling |
| hard-efficient-resolution | Hard | Efficient resolution under uncertainty |

Grader Properties

  • Deterministic

  • Score range: 0.0 – 1.0

  • Multi-factor scoring:

    • success
    • efficiency
    • completeness
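A deterministic multi-factor grader can be a clamped weighted sum. The weights and the efficiency and completeness formulas below are hypothetical; the README names the three factors and the 0.0 to 1.0 range but not how they are combined.

```python
# Hypothetical weights: the README lists the factors but not how they combine.
WEIGHTS = {"success": 0.5, "efficiency": 0.3, "completeness": 0.2}


def grade(success: bool, steps_used: int, max_steps: int,
          fields_collected: int, fields_required: int) -> float:
    """Deterministic score in [0.0, 1.0]: the same inputs always give the same score."""
    # Efficiency: share of the step budget left unused, credited only on success.
    efficiency = 1.0 - min(steps_used, max_steps) / max_steps if success else 0.0
    # Completeness: share of required fields collected (1.0 if none were required).
    completeness = min(1.0, fields_collected / fields_required) if fields_required else 1.0
    score = (WEIGHTS["success"] * (1.0 if success else 0.0)
             + WEIGHTS["efficiency"] * efficiency
             + WEIGHTS["completeness"] * completeness)
    return round(min(1.0, max(0.0, score)), 4)
```

The grader uses no randomness, so the same tickets and actions always reproduce the same score.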

▶️ Inference

Run baseline agent:

python inference.py

Outputs:

[START] task=easy-info-collection ...
[STEP] ...
[END] ...
{"task_id": "...", "score": 0.7}

🐳 Deployment (Hugging Face Spaces)

Build Docker

docker build -t openenv-customer-support-agent .

Run

docker run -p 7860:7860 openenv-customer-support-agent

🌐 API Endpoints

| Endpoint | Description |
|---|---|
| /reset | Initialize environment |
| /step | Execute action |
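A minimal standard-library client for the two endpoints, assuming the container runs locally on port 7860 as in the `docker run` command. The HTTP method (POST) and the JSON body shape, including the `"action"` wrapper in the usage line, are assumptions; check the server code for the exact request format.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:7860"  # port from the docker run command


def build_request(path: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build a JSON POST request (method and body shape are assumptions)."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(path, payload), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage against a running container would look like `call("/reset", {})` followed by `call("/step", {"action": {"type": "ask_info", "field": "order_id"}})`.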

⚙️ Environment Variables

Required:

API_BASE_URL
MODEL_NAME
HF_TOKEN
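Reading the three variables up front makes a missing setting fail fast with a clear message. A sketch; `load_config` is a hypothetical helper, and the optional mapping argument exists only so it can be exercised without touching the real environment.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def load_config(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the required settings, raising if any are unset or empty."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```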

✅ OpenEnv Compliance

  • Typed observation/action models
  • step/reset/state implemented
  • 3+ tasks with graders
  • Deterministic scoring
  • Dockerized deployment
  • HF Space compatible

🚀 Key Innovations

  • Real-world task simulation (not toy)
  • Stochastic difficulty scaling
  • Multi-intent ticket modeling
  • Self-correcting agent behavior
  • Hybrid LLM + rule-based architecture
  • Dense reward shaping

🔮 Future Improvements

  • Multi-stage resolution pipelines
  • Conversation memory (history utilization)
  • Active uncertainty estimation
  • Adaptive task generation
  • Multi-agent coordination

🧠 Big Picture

This environment models:

Decision-making under uncertainty with partial information

It is suitable for:

  • RL agent training
  • LLM agent evaluation
  • benchmarking reasoning systems

👤 Author

Built as part of an advanced OpenEnv submission focused on real-world agent intelligence and evaluation.

