
Ideas from competitive analysis #257

@lavr


I use ralphex and, based on a comparative analysis with competing tools, would like to propose a list of ideas to make it even better. I hope these are useful and worth considering.

This document is not a single Issue. It is a collection of candidate ideas for possible future Issues.


1. Per-task verification gate (orchestrator-side)

The current task prompt (task.txt) already requires the agent to run tests and lint as STEP 2 (VALIDATE) before completing a task, so the missing part is not prompt wording but independent verification. Right now validation is still effectively an honor system — the orchestrator has no way to confirm the agent actually ran the commands or that they passed. superpowers frames this as an "Iron Law": no completion claims without fresh verification evidence. gsd enforces the same idea at the orchestrator level: after each task completes, the harness itself runs a verification gate (lint, test, typecheck) independently of the agent, with 1-2 auto-fix retries.

ralphex could add a configurable verify_command (e.g., make test) that the orchestrator runs after each task iteration ends (after the agent's commit, before the next iteration starts in the task loop). If it fails, the same task is retried with the error output injected into the prompt. Note: the integration point is per-iteration in runTaskPhase, not tied to the ALL_TASKS_DONE signal (which fires only once at the end). This is complementary to the in-prompt instruction: the prompt tells the agent what to do, the gate verifies it was actually done.
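A rough sketch of how such a gate could slot into the task loop. Everything here is an assumption, not existing ralphex API: the function name, the retry budget, and running the command through sh -c.

```go
package main

import (
	"fmt"
	"os/exec"
)

// runVerifyGate executes the configured verify command (e.g. "make test")
// through the shell and returns its combined output. On failure the output
// is returned alongside the error so the caller can inject it into the
// retry prompt for the same task.
func runVerifyGate(verifyCommand string) (string, error) {
	out, err := exec.Command("sh", "-c", verifyCommand).CombinedOutput()
	if err != nil {
		return string(out), fmt.Errorf("verify command failed: %w", err)
	}
	return string(out), nil
}

func main() {
	const maxRetries = 2 // auto-fix retry budget before giving up on the task
	for attempt := 1; attempt <= maxRetries+1; attempt++ {
		out, err := runVerifyGate("make test")
		if err == nil {
			fmt.Println("verification passed")
			return
		}
		// a real orchestrator would re-run the same task here with `out`
		// appended to the prompt; this sketch only reports the failure
		fmt.Printf("attempt %d: %v\n%s\n", attempt, err, out)
	}
}
```

The key design point is that the gate runs in the orchestrator process, so it cannot be skipped or misreported by the agent.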

Expected benefit: high. This should have the biggest practical impact because it turns validation from "the agent said it passed" into an orchestrator-verified guarantee.

2. Hallucination guard

gsd rejects task completions where the agent made zero tool calls — a sign it "hallucinated" the work without actually doing anything. ralphex has a similar blind spot: a Claude session can complete a task iteration without making any real changes. A lightweight guard — check git diff after a task completes, and if there's no diff and no new commits, treat it as a failed attempt and retry with a reinforced prompt — would prevent wasted iterations. This is especially valuable in long autonomous runs where a single hallucinated task can cascade into review failures.
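A minimal sketch of the decision logic, kept as a pure function so the git plumbing (git status --porcelain, git rev-parse HEAD) stays at the call site. Function and variable names are illustrative, not ralphex internals.

```go
package main

import (
	"fmt"
	"strings"
)

// taskProducedWork decides whether a finished task iteration actually did
// anything: statusOut is the output of `git status --porcelain`, and
// headBefore/headAfter are the HEAD commit hashes recorded before and
// after the iteration.
func taskProducedWork(statusOut, headBefore, headAfter string) bool {
	if strings.TrimSpace(statusOut) != "" {
		return true // uncommitted changes in the working tree
	}
	// no dirty files: the iteration only counts if HEAD moved
	return strings.TrimSpace(headBefore) != strings.TrimSpace(headAfter)
}

func main() {
	// empty status and unchanged HEAD: the iteration produced nothing, so
	// the orchestrator should mark it failed and retry with a reinforced prompt
	if !taskProducedWork("", "abc123", "abc123") {
		fmt.Println("no diff and no new commits: retrying task")
	}
}
```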

Expected benefit: high. This is a cheap safety net against empty or fake task iterations and should noticeably improve reliability in unattended runs.

3. Anti-rationalization tables in task/review prompts

AI agents routinely generate plausible excuses to skip work: "this is too simple to test", "I'll fix it in a later task", "this is outside the scope". superpowers addresses this head-on by embedding tables of common rationalizations directly in skill prompts, with explicit rebuttals for each. Currently ralphex prompts don't address this — the agent is free to rationalize skipping steps. Adding a section like "Do NOT skip testing with any of these excuses: ..." to task.txt and review prompts would meaningfully reduce the rate of incomplete or sloppy task completions, with zero code changes.
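For illustration, such a section in task.txt might look like the following. The wording is a suggestion in the style of superpowers' rationalization tables, not an existing ralphex prompt:

```
Do NOT skip testing or validation with any of these excuses:
- "This change is too simple to test" -> simple code still breaks; write the test.
- "I'll fix it in a later task" -> later tasks have their own scope; fix it now.
- "This is outside the scope of the task" -> validating your own change is always in scope.
- "The existing tests probably cover this" -> run them and confirm; do not assume.
```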

Expected benefit: medium. This is a low-cost prompt improvement that should reduce avoidable corner-cutting, but unlike orchestrator checks it is still a soft guard rather than a guarantee.

4. TDD enforcement as optional prompt mode

superpowers treats TDD as non-negotiable: "NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST". This is too strict for every project, but valuable as an opt-in mode. A config option like tdd_mode = true could inject TDD requirements into the task prompt: write the failing test first, run it (must fail), write minimal code, run again (must pass), then refactor. Note: this requires a config flag and conditional prompt injection, so it's a small code change despite being prompt-focused in spirit.
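A possible shape for the opt-in flag, shown as a TOML fragment. The flag name and the injected wording are assumptions about how ralphex might expose this, not current configuration:

```toml
# hypothetical opt-in flag; when true, the orchestrator appends strict TDD
# requirements (failing test first, minimal code, re-run, refactor) to the
# task prompt
tdd_mode = true
```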

Expected benefit: medium-low overall, but potentially high for teams that already prefer TDD. As an opt-in workflow it can improve test discipline, but it is less universally useful than the safety mechanisms above.
