yessasvini23/Inbox_Ops

📬 InboxOps

A real-world OpenEnv environment for AI agents that manage a startup founder's operations inbox.

OpenEnv Compatible · HF Spaces · License: MIT


🚀 Why InboxOps?

Most AI agent benchmarks are unrealistic.

InboxOps simulates real operational chaos:

  • Investor pressure
  • Customer outages
  • Legal deadlines
  • Inbox overload

This is not a toy problem: it is decision-making under pressure.

What makes it strong:

  • Deterministic + heuristic grading
  • Partial credit scoring
  • SLA-driven urgency modeling
  • Multi-step agent reasoning
  • Deployable via Docker + HuggingFace Spaces

📂 Scenarios

🧪 Scenario 001 — Seed Round Week

Email	From	Stakes
email_001	Tier-1 VC	IC meeting Friday
email_002	Enterprise customer	🚨 Production outage
email_003	Newsletter	Noise
email_004	BigTech BD	Distribution deal
email_005	Mom	Personal
email_007	Paying customer	Compliance issue
email_010	Enterprise client	Contract renewal

🔥 Scenario 002 — Launch Day Chaos

  • Bugs
  • Refunds
  • Press deadlines
  • Internal conflicts

🎯 Task Definitions

Task	Difficulty	Goal
Email Classification	Easy	Categorize emails
Priority Management	Medium	Add urgency + routing
Full Ops Triage	Hard	End-to-end decision + reply

📌 Categories

investor · customer_support · partnership · personal · newsletter · notification · spam · press · internal · operational · customer_feedback · sales

⏱ Priority Levels

critical (≤30m) · high (≤2h) · medium (≤8h) · low
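
The thresholds above can be read as response-time ceilings. A minimal sketch, assuming each level means "respond within this many minutes" and that `low` has no deadline. `priority_for_deadline` is a hypothetical helper for illustration, not part of the repository:

```python
# Response-time ceilings in minutes, copied from the list above.
# "low" has no stated ceiling, so it acts as the fallback.
PRIORITY_CEILINGS_MIN = [
    ("critical", 30),
    ("high", 120),
    ("medium", 480),
]


def priority_for_deadline(minutes_until_due: float) -> str:
    """Return the tightest priority level whose ceiling covers the deadline."""
    for level, ceiling in PRIORITY_CEILINGS_MIN:
        if minutes_until_due <= ceiling:
            return level
    return "low"
```

Under this reading, an email that needs an answer within 90 minutes maps to `high`, and anything due later than 8 hours falls through to `low`.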


🧠 Reward System

total = 0.25 × classification
      + 0.15 × priority
      + 0.20 × routing
      + 0.10 × action
      + 0.20 × draft_quality
      + 0.10 × sla_compliance
      - penalties
Score	Meaning
0.0–0.3	Poor classification
0.3–0.6	Decent routing
0.6–1.0	Strong execution
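
The weighted sum above can be sketched in a few lines of Python. This is an illustration only: the component names mirror the formula, but `total_reward` and its dict-based input are assumptions, and the actual logic in graders.py may differ (for example, it may clamp the result to the 0.0–1.0 range).

```python
from typing import Mapping

# Weights copied from the reward formula; they sum to 1.0.
WEIGHTS = {
    "classification": 0.25,
    "priority": 0.15,
    "routing": 0.20,
    "action": 0.10,
    "draft_quality": 0.20,
    "sla_compliance": 0.10,
}


def total_reward(components: Mapping[str, float], penalties: float = 0.0) -> float:
    """Weighted sum of component scores (each assumed in [0, 1]) minus penalties.

    Missing components count as 0.0. No clamping is applied, because the
    formula above does not say whether the real grader clamps.
    """
    weighted = sum(
        weight * components.get(name, 0.0) for name, weight in WEIGHTS.items()
    )
    return weighted - penalties
```

With perfect classification, priority, and routing and zeros everywhere else, this returns roughly 0.60, which sits at the boundary between the "Decent routing" and "Strong execution" bands.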


⚙️ Action Format
{
  "action_type": "classify",
  "email_id": "email_001",
  "category": "investor",
  "priority": "critical",
  "escalation_team": "founder",
  "suggested_action": "reply_immediately",
  "draft_body": "Hi, I’ll send the deck by Thursday...",
  "reply_tone": "professional_warm"
}
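
Before sending an action like the one above to the environment, it can help to validate it locally. The sketch below is a hypothetical stand-in, not the schema in models.py: the `Action` dataclass and `parse_action` are illustrative names, and treating every field as required is an assumption (the easier tasks may accept fewer fields). The allowed values come from the Categories and Priority Levels lists.

```python
import json
from dataclasses import dataclass

# Allowed values, taken from the Categories and Priority Levels sections.
CATEGORIES = {
    "investor", "customer_support", "partnership", "personal",
    "newsletter", "notification", "spam", "press", "internal",
    "operational", "customer_feedback", "sales",
}
PRIORITIES = {"critical", "high", "medium", "low"}


@dataclass
class Action:
    """Hypothetical mirror of the JSON action format (all fields assumed required)."""
    action_type: str
    email_id: str
    category: str
    priority: str
    escalation_team: str
    suggested_action: str
    draft_body: str
    reply_tone: str


def parse_action(raw: str) -> Action:
    """Parse a JSON action and reject unknown categories or priorities."""
    action = Action(**json.loads(raw))  # TypeError on missing or extra keys
    if action.category not in CATEGORIES:
        raise ValueError(f"unknown category: {action.category!r}")
    if action.priority not in PRIORITIES:
        raise ValueError(f"unknown priority: {action.priority!r}")
    return action


SAMPLE = json.dumps({
    "action_type": "classify",
    "email_id": "email_001",
    "category": "investor",
    "priority": "critical",
    "escalation_team": "founder",
    "suggested_action": "reply_immediately",
    "draft_body": "Hi, I'll send the deck by Thursday...",
    "reply_tone": "professional_warm",
})
action = parse_action(SAMPLE)
```

Catching a misspelled category such as `nvestor` at this stage is cheaper than losing classification credit inside an episode.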

🏗️ Project Structure
inboxops/
├── models.py
├── env.py
├── graders.py
├── inference.py
├── app.py
├── openenv.yaml
├── Dockerfile
├── requirements.txt
├── README.md
└── data/

⚡ Quickstart
1. Clone Repo and Install Dependencies
git clone https://github.com/your-org/inboxops
cd inboxops
pip install -r requirements.txt
2. Run UI
python app.py
3. Run Agent
TASK=hard SCENARIO_ID=scenario_001 python inference.py

🐳 Docker
docker build -t inboxops .
docker run -p 7860:7860 inboxops

🧪 Python Usage
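from env import InboxOpsEnv

env = InboxOpsEnv()
obs = env.reset()

# Step through the inbox until the environment marks the episode as done.
while not obs.done:
    action = ...  # placeholder: build an action in the Action Format above
    obs, reward, done, info = env.step(action)

# Print the episode summary once the loop ends.
print(env.episode_summary())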
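
The `action = ...` placeholder in the loop is where an agent's policy goes. As a rough illustration for the Email Classification task, here is a keyword-based policy. Everything in it is an assumption: the keyword lists, the default category, and the `heuristic_action` signature are invented for this sketch, and it is not the Heuristic baseline reported in this README. It also assumes the email's subject and body are available as plain strings, which the observation format may or may not provide directly.

```python
# Hypothetical keyword rules: the first matching rule wins.
KEYWORD_CATEGORIES = [
    (("outage", "incident"), "customer_support"),
    (("investor", "term sheet", "pitch deck"), "investor"),
    (("unsubscribe", "weekly digest"), "newsletter"),
    (("partnership", "distribution deal"), "partnership"),
]
DEFAULT_CATEGORY = "operational"  # arbitrary fallback chosen for this sketch


def heuristic_action(email_id: str, subject: str, body: str) -> dict:
    """Build a classify action from simple substring matches."""
    text = f"{subject} {body}".lower()
    category = DEFAULT_CATEGORY
    for keywords, candidate in KEYWORD_CATEGORIES:
        if any(keyword in text for keyword in keywords):
            category = candidate
            break
    return {
        "action_type": "classify",
        "email_id": email_id,
        "category": category,
    }
```

A policy like this ignores priority, routing, and drafting, so under the reward weights it can earn classification credit at most.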

📊 Baseline Scores
Agent	Score	Grade
Random	0.08	F
Heuristic	0.51	C
Claude Sonnet	~0.74	B

📏 SLA Policies
Situation	Time	Team
Outage	15m	engineering
Contract	60m	legal
Investor	240m	founder
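
The table above pairs each situation with a response window and an owning team. A minimal sketch of a compliance check follows, assuming the window is measured from the time the email was received to the time of the response. The lowercase situation keys and the `sla_met` helper are hypothetical, and the real grader may award partial credit rather than a yes/no result.

```python
from datetime import datetime, timedelta

# (response window in minutes, owning team), copied from the SLA table.
SLA_POLICIES = {
    "outage": (15, "engineering"),
    "contract": (60, "legal"),
    "investor": (240, "founder"),
}


def sla_met(situation: str, received_at: datetime, responded_at: datetime) -> bool:
    """Return True if the response landed inside the situation's SLA window."""
    window_minutes, _team = SLA_POLICIES[situation]
    return responded_at - received_at <= timedelta(minutes=window_minutes)


# Example: an outage email answered 10 minutes after it arrived.
received = datetime(2024, 1, 1, 9, 0)
on_time = sla_met("outage", received, received + timedelta(minutes=10))
```

Here `on_time` is `True` because 10 minutes is inside the 15-minute outage window, while a 20-minute response would miss it.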

🤝 Contributing
pytest tests/

To add a scenario:

  • Update data/inbox_scenarios.json
  • Add ground truth
  • Update openenv.yaml

📜 License
MIT License © InboxOps

