
Ideas from competitive analysis #257

@lavr


I use ralphex and, based on a comparative analysis with competing tools, would like to propose a list of ideas to make it even better. I hope these are useful and worth considering.

This document is not a single Issue. It is a collection of candidate ideas for possible future Issues.


1. Per-task verification gate (orchestrator-side)

The current task prompt (task.txt) already requires the agent to run tests and lint as STEP 2 (VALIDATE) before completing a task, so the missing part is not prompt wording but independent verification. Right now validation is still effectively an honor system — the orchestrator has no way to confirm the agent actually ran the commands or that they passed. superpowers frames this as an "Iron Law": no completion claims without fresh verification evidence. gsd enforces the same idea at the orchestrator level: after each task completes, the harness itself runs a verification gate (lint, test, typecheck) independently of the agent, with 1-2 auto-fix retries.

ralphex could add a configurable verify_command (e.g., make test) that the orchestrator runs after each task iteration ends (after the agent's commit, before the next iteration starts in the task loop). If it fails, the same task is retried with the error output injected into the prompt. Note: the integration point is per-iteration in runTaskPhase, not tied to the ALL_TASKS_DONE signal (which fires only once at the end). This is complementary to the in-prompt instruction: the prompt tells the agent what to do, the gate verifies it was actually done.
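A rough sketch of how such a gate could slot into the task loop. Everything here is an assumption, not existing ralphex API: the function name, the retry budget, and running the command through sh -c.

```go
package main

import (
	"fmt"
	"os/exec"
)

// runVerifyGate executes the configured verify command (e.g. "make test")
// through the shell and returns its combined output. On failure the output
// is returned alongside the error so the caller can inject it into the
// retry prompt for the same task.
func runVerifyGate(verifyCommand string) (string, error) {
	out, err := exec.Command("sh", "-c", verifyCommand).CombinedOutput()
	if err != nil {
		return string(out), fmt.Errorf("verify command failed: %w", err)
	}
	return string(out), nil
}

func main() {
	const maxRetries = 2 // auto-fix retry budget before giving up on the task
	for attempt := 1; attempt <= maxRetries+1; attempt++ {
		out, err := runVerifyGate("make test")
		if err == nil {
			fmt.Println("verification passed")
			return
		}
		// a real orchestrator would re-run the same task here with `out`
		// appended to the prompt; this sketch only reports the failure
		fmt.Printf("attempt %d: %v\n%s\n", attempt, err, out)
	}
}
```

The key design point is that the gate runs in the orchestrator process, so it cannot be skipped or misreported by the agent.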

Expected benefit: high. This should have the biggest practical impact because it turns validation from "the agent said it passed" into an orchestrator-verified guarantee.

2. Hallucination guard

gsd rejects task completions where the agent made zero tool calls — a sign it "hallucinated" the work without actually doing anything. ralphex has a similar blind spot: a Claude session can complete a task iteration without making any real changes. A lightweight guard — check git diff after a task completes, and if there's no diff and no new commits, treat it as a failed attempt and retry with a reinforced prompt — would prevent wasted iterations. This is especially valuable in long autonomous runs where a single hallucinated task can cascade into review failures.
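A minimal sketch of the decision logic, kept as a pure function so the git plumbing (git status --porcelain, git rev-parse HEAD) stays at the call site. Function and variable names are illustrative, not ralphex internals.

```go
package main

import (
	"fmt"
	"strings"
)

// taskProducedWork decides whether a finished task iteration actually did
// anything: statusOut is the output of `git status --porcelain`, and
// headBefore/headAfter are the HEAD commit hashes recorded before and
// after the iteration.
func taskProducedWork(statusOut, headBefore, headAfter string) bool {
	if strings.TrimSpace(statusOut) != "" {
		return true // uncommitted changes in the working tree
	}
	// no dirty files: the iteration only counts if HEAD moved
	return strings.TrimSpace(headBefore) != strings.TrimSpace(headAfter)
}

func main() {
	// empty status and unchanged HEAD: the iteration produced nothing, so
	// the orchestrator should mark it failed and retry with a reinforced prompt
	if !taskProducedWork("", "abc123", "abc123") {
		fmt.Println("no diff and no new commits: retrying task")
	}
}
```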

Expected benefit: high. This is a cheap safety net against empty or fake task iterations and should noticeably improve reliability in unattended runs.

3. Anti-rationalization tables in task/review prompts

AI agents routinely generate plausible excuses to skip work: "this is too simple to test", "I'll fix it in a later task", "this is outside the scope". superpowers addresses this head-on by embedding tables of common rationalizations directly in skill prompts, with explicit rebuttals for each. Currently ralphex prompts don't address this — the agent is free to rationalize skipping steps. Adding a section like "Do NOT skip testing with any of these excuses: ..." to task.txt and review prompts would meaningfully reduce the rate of incomplete or sloppy task completions, with zero code changes.
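For illustration, such a section in task.txt might look like the following. The wording is a suggestion in the style of superpowers' rationalization tables, not an existing ralphex prompt:

```
Do NOT skip testing or validation with any of these excuses:
- "This change is too simple to test" -> simple code still breaks; write the test.
- "I'll fix it in a later task" -> later tasks have their own scope; fix it now.
- "This is outside the scope of the task" -> validating your own change is always in scope.
- "The existing tests probably cover this" -> run them and confirm; do not assume.
```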

Expected benefit: medium. This is a low-cost prompt improvement that should reduce avoidable corner-cutting, but unlike orchestrator checks it is still a soft guard rather than a guarantee.

4. TDD enforcement as optional prompt mode

superpowers treats TDD as non-negotiable: "NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST". This is too strict for every project, but valuable as an opt-in mode. A config option like tdd_mode = true could inject TDD requirements into the task prompt: write the failing test first, run it (must fail), write minimal code, run again (must pass), then refactor. Note: this requires a config flag and conditional prompt injection, so it's a small code change despite being prompt-focused in spirit.
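A possible shape for the opt-in flag, shown as a TOML fragment. The flag name and the injected wording are assumptions about how ralphex might expose this, not current configuration:

```toml
# hypothetical opt-in flag; when true, the orchestrator appends strict TDD
# requirements (failing test first, minimal code, re-run, refactor) to the
# task prompt
tdd_mode = true
```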

Expected benefit: medium-low overall, but potentially high for teams that already prefer TDD. As an opt-in workflow it can improve test discipline, but it is less universally useful than the safety mechanisms above.
