---
title: Customer Support OpenEnv Environment
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
tags:
---
This project implements a real-world customer support simulation environment built using the OpenEnv specification.
It is designed to evaluate and train intelligent agents capable of:
- Understanding noisy and ambiguous user queries
- Classifying issues correctly
- Gathering missing information efficiently
- Resolving tickets under uncertainty
Unlike toy environments, this system models real operational complexity found in production customer support workflows.
Build and evaluate an agent that can:
- Classify customer issues (billing / technical / delivery)
- Collect required information dynamically
- Resolve efficiently under constraints
- Adapt behavior mid-episode (self-correction)
```
+----------------------+
|   Customer Ticket    |
|  (noisy, ambiguous)  |
+----------+-----------+
           |
           v
+----------------------+
| Environment (env.py) |
|  - State             |
|  - Reward            |
|  - Stochasticity     |
+----------+-----------+
           |
           v
+----------------------+
|  Observation Space   |
|  - message           |
|  - known_info        |
|  - required          |
+----------+-----------+
           |
           v
+----------------------+
|  Agent (LLM + Rule)  |
|  - Reasoning (LLM)   |
|  - Constraints       |
|  - Fallback          |
+----------+-----------+
           |
           v
+----------------------+
|        Action        |
|  - classify          |
|  - ask_info          |
|  - resolve           |
+----------+-----------+
           |
           v
+----------------------+
|   Environment Step   |
|  - reward            |
|  - next_state        |
+----------------------+
```
RESET → OBSERVE → ACT → STEP → REPEAT
Detailed Flow:
```
[RESET]
   ↓
[Observation]
   ↓
[Agent Decision]
   ↓
[Action]
   ↓
[Environment Step]
   ↓
[Reward + Next State]
   ↓
[Done?] ── No ──> loop back to [Observation]
   │ Yes
   ↓
[Episode End]
```
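The episode loop above can be sketched in plain Python. The `env` and `agent` objects and their method names (`reset`, `step`, `decide`) are stand-ins for illustration, not the actual OpenEnv API:

```python
# Minimal sketch of the RESET → OBSERVE → ACT → STEP → REPEAT loop.
# `env` and `agent` are hypothetical stand-ins; method names and the
# (obs, reward, done) return shape are assumptions.

def run_episode(env, agent, max_steps=10):
    obs = env.reset()                         # [RESET] → first observation
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.decide(obs)            # [Agent Decision]
        obs, reward, done = env.step(action)  # [Environment Step]
        total_reward += reward                # accumulate shaped reward
        if done:                              # [Done?] → episode end
            break
    return total_reward
```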
Initial Flow: classify → ask_info → resolve
With Self-Correction:
```
classify
   ↓
ask_info
   ↓
[New Information Arrives]
   ↓
re-evaluate decision
   ↓
re-classify (if needed)
   ↓
ask remaining info
   ↓
resolve
```
```
IF not classified:            → classify
ELIF missing required fields: → ask_info
ELIF uncertain:               → re-classify
ELSE:                         → resolve
```
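The IF/ELIF policy above can be written as a small function over the observation dict described later in this README. The field names follow the observation schema; the boolean `classified` flag and the confidence threshold are assumptions for illustration:

```python
# Sketch of the rule-based decision policy. `classified` and
# `confidence` are assumed inputs from the agent's own state;
# the 0.6 threshold is illustrative, not from the environment.

def fallback_policy(obs, classified, confidence, threshold=0.6):
    if not classified:
        return {"type": "classify"}
    if obs["missing_required"]:
        # ask for the first still-missing required field
        return {"type": "ask_info", "field": obs["missing_required"][0]}
    if confidence < threshold:
        return {"type": "classify"}  # re-classify under uncertainty
    return {"type": "resolve"}
```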
- Customer Message = base_variant + noise injection + ambiguity
- Required Info = full_schema minus randomly masked fields

Difficulty Controls:

- EASY → low noise, clear signals
- MEDIUM → moderate noise
- HARD → high ambiguity + missing info
Action → Immediate Reward → Final Outcome
Examples:
```
ask_info (useful)   → +0.3
repeat ask          → -0.3
step penalty        → -0.05
correct classify    → +0.2
premature resolve   → -1.0 (hard)
successful resolve  → +0.2 to +1.0
```
```
Step 1: classify    → reward -0.05
Step 2: ask_info    → reward +0.20
Step 3: re-classify → reward -0.05
Step 4: resolve     → reward +0.45
```

Outcome:
- ✔ success
- ✔ self-correction observed
- ✔ efficient resolution
A stateful, stochastic simulation of customer support operations.
- Multi-step interaction loop (`step`, `reset`, `state`)
- Partial observability (missing information)
- Stochastic noise injection
- Difficulty-aware configuration
- Multi-intent ticket handling
- Reward shaping with penalties for poor decisions
```json
{
  "ticket_id": "string",
  "customer_message": "string",
  "known_info": {},
  "required": ["fields"],
  "missing_required": ["fields"],
  "info_progress": 0.0,
  "status": "open | resolved",
  "step_count": 0,
  "remaining_steps": 10,
  "difficulty": "easy | medium | hard"
}
```

| Action | Description |
|---|---|
| classify | Assign category + priority |
| ask_info | Request missing field |
| resolve | Attempt to close ticket |
Example:
```json
{
  "type": "ask_info",
  "field": "order_id"
}
```

The environment dynamically adjusts complexity:
| Difficulty | Max Steps | Noise | Missing Info |
|---|---|---|---|
| Easy | Low | None | Minimal |
| Medium | Medium | Moderate | Partial |
| Hard | High | High | Significant |
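The table above could map onto a configuration dict like the following. The concrete step budgets and probabilities are illustrative assumptions, not the values shipped in `env.py`:

```python
# Illustrative difficulty configuration mirroring the table above.
# All numeric values are assumptions chosen only to preserve the
# ordering easy < medium < hard.
DIFFICULTY = {
    "easy":   {"max_steps": 6,  "noise_prob": 0.0, "mask_prob": 0.1},
    "medium": {"max_steps": 10, "noise_prob": 0.3, "mask_prob": 0.4},
    "hard":   {"max_steps": 14, "noise_prob": 0.6, "mask_prob": 0.7},
}
```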
- Noise Injection: adds irrelevant or emotional phrases
- Information Masking: required fields may be hidden
- Ambiguity: messages may not clearly indicate the category
Each ticket includes:
```
{
  "ticket_id": "...",
  "variants": [...],      # multiple phrasings
  "noise": [...],         # real-world clutter
  "ground_truth": {
    "category": "...",
    "priority": "...",
    "required_info": [...],
    "intents": [...]      # multi-intent support
  }
}
```

- Multiple linguistic variations
- Realistic phrasing (not templated)
- Multi-intent issues (e.g., billing + technical)
- No explicit hints (agent must infer)
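One way a ticket record like the above could be turned into an episode state is to sample a variant, optionally inject noise, and mask required fields. This is a sketch under assumed field names, not the actual generator:

```python
import random

def instantiate_ticket(ticket, noise_prob=0.5, mask_prob=0.5, rng=random):
    # Pick one phrasing and optionally append real-world clutter.
    message = rng.choice(ticket["variants"])
    if ticket["noise"] and rng.random() < noise_prob:
        message += " " + rng.choice(ticket["noise"])
    # Randomly hide required fields (partial observability).
    required = ticket["ground_truth"]["required_info"]
    known = {f: "<provided>" for f in required if rng.random() >= mask_prob}
    missing = [f for f in required if f not in known]
    return {"customer_message": message, "known_info": known,
            "required": required, "missing_required": missing}
```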
The agent is designed to adapt within an episode.
- Can re-classify after new information
- Can delay resolution under uncertainty
- Can recover from suboptimal actions
classify → ask_info → re-classify → resolve
This mimics real-world agent reasoning rather than fixed pipelines.
| Component | Role |
|---|---|
| LLM | High-level reasoning |
| Rules | Safety + constraints |
| Fallback | Deterministic recovery |
- Structured JSON output
- Retry + validation loop
- Fallback policy (guarantees progress)
- Partial autonomy (not over-constrained)
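The retry + validation loop with a deterministic fallback could look like the sketch below. `llm_call` is a hypothetical stand-in for the model client, and the fallback choice mirrors the rule policy described earlier:

```python
import json

def get_valid_action(llm_call, obs, max_retries=3):
    """Ask the LLM for a structured action, validate the JSON, and
    fall back to a deterministic action so the agent always makes
    progress. `llm_call` is an assumed callable returning raw text."""
    for _ in range(max_retries):
        raw = llm_call(obs)
        try:
            action = json.loads(raw)
            if action.get("type") in {"classify", "ask_info", "resolve"}:
                return action          # structurally valid → accept
        except (json.JSONDecodeError, AttributeError):
            pass                       # malformed output → retry
    # Deterministic fallback guarantees progress.
    if obs.get("missing_required"):
        return {"type": "ask_info", "field": obs["missing_required"][0]}
    return {"type": "classify"}
```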
Reward is dense and shaped, not binary.
| Behavior | Reward |
|---|---|
| Step penalty | -0.05 |
| Correct classification | +0.2 |
| Useful info collection | +0.3 |
| Redundant action | -0.3 |
| Premature resolve (hard) | -1.0 |
| Successful resolve | +0.2 to +1.0 |
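One plausible reading of the table is that every action pays the step penalty and event-specific shaping is added on top; whether the environment actually combines them this way is an assumption:

```python
# Sketch of the shaped reward using the values from the table above.
# The event labels and the additive step-penalty model are assumptions
# about how the environment tags each transition.
STEP_PENALTY = -0.05

EVENT_REWARDS = {
    "correct_classify": 0.2,
    "useful_info": 0.3,
    "redundant_action": -0.3,
    "premature_resolve_hard": -1.0,
    "none": 0.0,
}

def shaped_reward(event):
    # every step costs STEP_PENALTY; events add shaped bonuses/penalties
    return STEP_PENALTY + EVENT_REWARDS[event]
```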
Tracked per episode:
```json
{
  "success_rate": 0.0,
  "avg_steps": 0.0,
  "avg_reward": 0.0,
  "info_efficiency": 0.0
}
```

- Self-correction frequency (re-classification)
- Resolution efficiency
- Failure modes under uncertainty
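The per-episode metrics above could be aggregated as follows. The shape of each episode record (`success`, `steps`, `reward`, `useful_asks`, `asks`) is an assumption for illustration:

```python
def aggregate_metrics(episodes):
    """Aggregate assumed per-episode records into the metrics dict
    shown above. info_efficiency = useful ask_info actions / all asks."""
    n = len(episodes)
    total_asks = sum(e["asks"] for e in episodes)
    return {
        "success_rate": sum(e["success"] for e in episodes) / n,
        "avg_steps": sum(e["steps"] for e in episodes) / n,
        "avg_reward": sum(e["reward"] for e in episodes) / n,
        "info_efficiency": (sum(e["useful_asks"] for e in episodes) / total_asks
                            if total_asks else 1.0),
    }
```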
Three evaluation tasks:
| Task | Difficulty | Objective |
|---|---|---|
| easy-info-collection | Easy | Basic info gathering |
| medium-complete-info | Medium | Complete + accurate handling |
| hard-efficient-resolution | Hard | Efficient resolution under uncertainty |
- Deterministic
- Score range: 0.0 – 1.0
- Multi-factor scoring:
  - success
  - efficiency
  - completeness
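A deterministic multi-factor grader could combine the three criteria as a weighted sum clamped to [0, 1]. The weights here are illustrative assumptions, not the shipped graders:

```python
def grade(success, steps, max_steps, info_progress,
          w_success=0.5, w_eff=0.25, w_complete=0.25):
    """Deterministic score in [0.0, 1.0] combining success,
    step efficiency, and info completeness. Weights are assumed."""
    efficiency = 1.0 - (steps / max_steps)   # fewer steps → higher
    score = (w_success * float(success)
             + w_eff * max(efficiency, 0.0)
             + w_complete * info_progress)
    return round(min(max(score, 0.0), 1.0), 2)
```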
Run the baseline agent:

```bash
python inference.py
```

Outputs:

```
[START] task=easy-info-collection ...
[STEP] ...
[END] ...
{"task_id": "...", "score": 0.7}
```
```bash
docker build -t openenv-customer-support-agent .
docker run -p 7860:7860 openenv-customer-support-agent
```

| Endpoint | Description |
|---|---|
| /reset | Initialize environment |
| /step | Execute action |
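A minimal client for the two endpoints can be written with only the standard library. The payload shapes are assumptions based on the observation and action schemas in this README, and `localhost:7860` assumes the `docker run` mapping above:

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumes the port mapping above

def build_request(path, payload=None):
    # JSON-encode the payload and target the environment endpoint.
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload or {}).encode(),
        headers={"Content-Type": "application/json"},
    )

def post(path, payload=None):
    # Send the request and decode the JSON response body.
    with urllib.request.urlopen(build_request(path, payload)) as resp:
        return json.loads(resp.read())

# Example usage (requires a running container):
# obs = post("/reset")
# obs = post("/step", {"type": "ask_info", "field": "order_id"})
```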
Required environment variables:

```
API_BASE_URL
MODEL_NAME
HF_TOKEN
```
- Typed observation/action models
- step/reset/state implemented
- 3+ tasks with graders
- Deterministic scoring
- Dockerized deployment
- HF Space compatible
- Real-world task simulation (not toy)
- Stochastic difficulty scaling
- Multi-intent ticket modeling
- Self-correcting agent behavior
- Hybrid LLM + rule-based architecture
- Dense reward shaping
- Multi-stage resolution pipelines
- Conversation memory (history utilization)
- Active uncertainty estimation
- Adaptive task generation
- Multi-agent coordination
This environment models:
Decision-making under uncertainty with partial information
It is suitable for:
- RL agent training
- LLM agent evaluation
- Benchmarking reasoning systems
Built as part of an advanced OpenEnv submission focused on real-world agent intelligence and evaluation.