📬 InboxOps

A real-world OpenEnv environment for AI agents that manage a startup founder's operations inbox.

🚀 Why InboxOps?

Most AI agent benchmarks are unrealistic.

InboxOps simulates real operational chaos:

Investor pressure
Customer outages
Legal deadlines
Inbox overload

This is not a toy problem: it’s decision-making under pressure.

What makes it strong:

Deterministic + heuristic grading
Partial credit scoring
SLA-driven urgency modeling
Multi-step agent reasoning
Deployable via Docker + HuggingFace Spaces

📂 Scenarios

🧪 Scenario 001 — Seed Round Week

Email	From	Stakes
email_001	Tier-1 VC	IC meeting Friday
email_002	Enterprise customer	🚨 Production outage
email_003	Newsletter	Noise
email_004	BigTech BD	Distribution deal
email_005	Mom	Personal
email_007	Paying customer	Compliance issue
email_010	Enterprise client	Contract renewal

🔥 Scenario 002 — Launch Day Chaos

Bugs
Refunds
Press deadlines
Internal conflicts

🎯 Task Definitions

Task	Difficulty	Goal
Email Classification	Easy	Categorize emails
Priority Management	Medium	Add urgency + routing
Full Ops Triage	Hard	End-to-end decision + reply

📌 Categories

investor · customer_support · partnership · personal · newsletter · notification · spam · press · internal · operational · customer_feedback · sales

⏱ Priority Levels

critical (≤30m) · high (≤2h) · medium (≤8h) · low
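
As a minimal sketch, the thresholds above can be read as a lookup from a required response window to a priority label. The helper name `priority_for_window` and the treatment of `low` as "no deadline" are assumptions for illustration, not part of the InboxOps code.

```python
# Hypothetical helper: map a response window (in minutes) to a priority label.
# The thresholds come from the list above; the function itself is an assumption.
PRIORITY_LIMITS_MINUTES = [
    ("critical", 30),   # respond within 30 minutes
    ("high", 120),      # respond within 2 hours
    ("medium", 480),    # respond within 8 hours
]


def priority_for_window(minutes: float) -> str:
    """Return the first (most urgent) label whose limit covers the window."""
    for label, limit in PRIORITY_LIMITS_MINUTES:
        if minutes <= limit:
            return label
    return "low"  # anything slower than 8 hours


print(priority_for_window(90))  # high
```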

🧠 Reward System

total = 0.25 × classification
      + 0.15 × priority
      + 0.20 × routing
      + 0.10 × action
      + 0.20 × draft_quality
      + 0.10 × sla_compliance
      - penalties

Score Meaning
Score	Meaning
0.0–0.3	Poor classification
0.3–0.6	Decent routing
0.6–1.0	Strong execution
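
The weighted sum above can be sketched in a few lines of Python. The weights are copied from the formula; the function name `total_reward`, the assumption that each component score lies in [0, 1], and the final clamp to [0, 1] are illustrative choices and may differ from what graders.py does.

```python
# Weights copied from the reward formula; they sum to 1.0.
WEIGHTS = {
    "classification": 0.25,
    "priority": 0.15,
    "routing": 0.20,
    "action": 0.10,
    "draft_quality": 0.20,
    "sla_compliance": 0.10,
}


def total_reward(components: dict[str, float], penalties: float = 0.0) -> float:
    """Weighted sum of component scores minus penalties."""
    weighted = sum(WEIGHTS[name] * components.get(name, 0.0) for name in WEIGHTS)
    # Clamping is an assumption; the source only gives the weighted sum.
    return max(0.0, min(1.0, weighted - penalties))


example = {
    "classification": 1.0,
    "priority": 1.0,
    "routing": 0.5,
    "action": 0.0,
    "draft_quality": 0.5,
    "sla_compliance": 1.0,
}
print(round(total_reward(example, penalties=0.05), 2))  # 0.65
```

In this example the agent classifies and prioritizes correctly but routes and drafts only partially, which lands it in the 0.6–1.0 band despite the missed action.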


⚙️ Action Format
{
  "action_type": "classify",
  "email_id": "email_001",
  "category": "investor",
  "priority": "critical",
  "escalation_team": "founder",
  "suggested_action": "reply_immediately",
  "draft_body": "Hi, I’ll send the deck by Thursday...",
  "reply_tone": "professional_warm"
}
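
Before sending an action to the environment, it can help to check it against the category and priority vocabularies. The sketch below is an assumption-laden illustration: `validate_action` is a hypothetical helper, the choice of required fields is a guess based on the example above, and the category list is read as twelve separate labels.

```python
import json

# Allowed values taken from the Categories and Priority Levels sections.
CATEGORIES = {
    "investor", "customer_support", "partnership", "personal", "newsletter",
    "notification", "spam", "press", "internal", "operational",
    "customer_feedback", "sales",
}
PRIORITIES = {"critical", "high", "medium", "low"}
# Assumed minimum set of fields; the real schema lives in models.py.
REQUIRED_FIELDS = ("action_type", "email_id", "category", "priority")


def validate_action(raw: str) -> list[str]:
    """Return a list of problems found in a JSON action; empty means it passed."""
    action = json.loads(raw)
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in action]
    if "category" in action and action["category"] not in CATEGORIES:
        problems.append(f"unknown category: {action['category']}")
    if "priority" in action and action["priority"] not in PRIORITIES:
        problems.append(f"unknown priority: {action['priority']}")
    return problems


raw_action = json.dumps({
    "action_type": "classify",
    "email_id": "email_001",
    "category": "investor",
    "priority": "critical",
})
print(validate_action(raw_action))  # []
```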

🏗️ Project Structure
inboxops/
├── models.py
├── env.py
├── graders.py
├── inference.py
├── app.py
├── openenv.yaml
├── Dockerfile
├── requirements.txt
├── README.md
└── data/

⚡ Quickstart
1. Clone Repo
git clone https://github.com/your-org/inboxops
cd inboxops
pip install -r requirements.txt
2. Run UI
python app.py
3. Run Agent
TASK=hard SCENARIO_ID=scenario_001 python inference.py

🐳 Docker
docker build -t inboxops .
docker run -p 7860:7860 inboxops

🧪 Python Usage
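from env import InboxOpsEnv

env = InboxOpsEnv()
obs = env.reset()  # first observation of the episode

while not obs.done:
    # Placeholder: build an action as shown in Action Format
    action = ...
    obs, reward, done, info = env.step(action)

# Print the end-of-episode report
print(env.episode_summary())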

📊 Baseline Scores
Agent	Score	Grade
Random	0.08	F
Heuristic	0.51	C
Claude Sonnet	~0.74	B

📏 SLA Policies
Situation	Time	Team
Outage	15m	engineering
Contract	60m	legal
Investor	240m	founder
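
One way to picture the table is as a lookup that returns the owning team and whether a response arrived in time. This is a sketch under stated assumptions: `check_sla` and the lowercase situation keys are hypothetical names, and the actual SLA logic in the graders may be more involved.

```python
# SLA table from above: situation -> (deadline in minutes, owning team).
SLA_POLICIES = {
    "outage": (15, "engineering"),
    "contract": (60, "legal"),
    "investor": (240, "founder"),
}


def check_sla(situation: str, elapsed_minutes: float) -> dict:
    """Look up the policy and report whether the response met its deadline."""
    deadline, team = SLA_POLICIES[situation]
    return {
        "team": team,
        "deadline_minutes": deadline,
        "within_sla": elapsed_minutes <= deadline,
    }


print(check_sla("outage", 20))
# {'team': 'engineering', 'deadline_minutes': 15, 'within_sla': False}
```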

🤝 Contributing
Run the tests:
pytest tests/

To add a scenario:
1. Update data/inbox_scenarios.json
2. Add ground truth
3. Update openenv.yaml

📜 License
MIT License © InboxOps
