
Add "20 Questions" game — LLM iteratively queries ES to find a secret log#2301

Open
Copilot wants to merge 22 commits into main from copilot/add-20-questions-game

Conversation

Contributor

Copilot AI commented Mar 10, 2026

New page where an LLM plays 20 Questions to guess something the user is thinking of from their Elasticsearch cluster. The LLM actively queries Elasticsearch using run_esql_query and other tools to discover data, ask narrowing yes/no questions, and iteratively refine its search.

Game flow: User thinks of anything in their cluster (a log entry, index, field value, service, host, etc.) → LLM uses ES tools to query the cluster → asks yes/no questions → runs refined queries based on answers → narrows down iteratively → makes a guess → user confirms correct/wrong. Final guess-only turn granted if 20 questions exhausted.
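The status transitions implied by this flow can be sketched as a small state machine (a hypothetical sketch; the status names and shapes here are illustrative stand-ins, not the hook's actual types):

```typescript
// Illustrative status names; the real hook's statuses may differ.
type GameStatus = "idle" | "playing" | "guessing" | "won" | "lost";

const MAX_QUESTIONS = 20;

interface GameState {
  status: GameStatus;
  questionCount: number;
}

// Advance state after an assistant turn: enter the confirmation phase on an
// explicit guess, or grant the final guess-only turn once the 20-question
// budget is exhausted.
function nextState(
  state: GameState,
  turn: { questionsAsked: number; madeGuess: boolean },
): GameState {
  if (state.status !== "playing") return state;
  const questionCount = Math.min(MAX_QUESTIONS, state.questionCount + turn.questionsAsked);
  if (turn.madeGuess || questionCount >= MAX_QUESTIONS) {
    return { status: "guessing", questionCount };
  }
  return { status: "playing", questionCount };
}

// Resolve the user's Correct/Wrong confirmation of a guess.
function resolveGuess(state: GameState, correct: boolean): GameState {
  if (state.status !== "guessing") return state;
  return { ...state, status: correct ? "won" : "lost" };
}
```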

Changes

  • peek/src/hooks/useTwentyQuestionsGame.ts — Game logic hook: LLM gets run_esql_query, get_index_info, get_cluster_health tools to actively search the cluster. System prompt instructs a query→ask→refine iterative strategy. 10 tool steps per turn for multi-query reasoning. Question counting (numbered patterns or line-ending ?), guess detection (/^\s*my guess:\s*/im).
  • peek/src/components/TwentyQuestionsBoard.tsx — Game board with decomposed sub-components: GameMessageBubble with inline ToolCallProgress (⏳/✓ status for each tool call), GameInput (Yes/No buttons plus free-text input for detailed answers), Correct/Wrong buttons during guess confirmation.
  • peek/src/components/TwentyQuestionsPage.tsx — Page shell with question counter chip, New Game button, LLM-not-configured fallback
  • peek/src/routes/paths.ts / manifest.ts — Route config under Workspace group (order 70, QuizIcon)
  • peek/src/components/AppSidebar.tsx — Added QuizIcon to NAV_ICON_COMPONENTS
  • peek/scripts/page-nav-buttons.mjs + tests/unit/pageNavButtons.test.ts — Screenshot script and route sync test updated
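The question-counting and guess-detection heuristics above might look roughly like this (illustrative only; countQuestions and hasGuess are sketches, not the hook's verbatim code):

```typescript
// Strict guess marker from the PR description: a line starting with "My guess:".
// Note: this strict form does not match markdown-wrapped variants like "**My guess:**".
const STRICT_GUESS_RE = /^\s*my guess:\s*/im;

// Heuristic question counter: prefer numbered "Question N:" / "N." patterns,
// fall back to counting lines ending with a question mark. A heuristic like
// this can overcount plain numbered lists, hence the later tightening commits.
function countQuestions(text: string): number {
  const numbered = text.match(/^\s*(?:question\s+)?\d+[.):]/gim);
  if (numbered && numbered.length > 0) return numbered.length;
  return text.split("\n").filter((line) => line.trim().endsWith("?")).length;
}

function hasGuess(text: string): boolean {
  return STRICT_GUESS_RE.test(text);
}
```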

Screenshot

20 Questions game screenshot


The body of this PR is automatically managed by the Update PR Body workflow.

Copilot AI and others added 2 commits March 10, 2026 05:15
…ions

Co-authored-by: strawgate <6384545+strawgate@users.noreply.github.com>
…logs check, extract sub-components

Co-authored-by: strawgate <6384545+strawgate@users.noreply.github.com>
Copilot AI changed the title [WIP] Add 20 questions game feature Add "20 Questions" game — LLM guesses a secret log via yes/no questions Mar 10, 2026
@strawgate
Contributor

@copilot so how I actually want this to work is it could be any log in the cluster and the agent asks questions in order to try to narrow it down. And then based on a query that it runs it gets another sampling of logs and then tries to ask me a question that it could use to either determine if the log is probably in that sample or probably not in that sample

Copilot AI and others added 2 commits March 10, 2026 16:11
…arch the cluster

Co-authored-by: strawgate <6384545+strawgate@users.noreply.github.com>
Co-authored-by: strawgate <6384545+strawgate@users.noreply.github.com>
Contributor Author

Copilot AI commented Mar 10, 2026

@copilot so how I actually want this to work is it could be any log in the cluster and the agent asks questions in order to try to narrow it down. And then based on a query that it run...

Redesigned in b3d610f. The LLM now gets run_esql_query, get_index_info, and get_cluster_health tools and actively queries Elasticsearch to narrow down the secret log. The flow is: query the cluster → ask a question → refine based on answer → query again → repeat. Tool call progress (⏳ Running query… / ✓ Found N rows) shows inline in chat bubbles. Also added a free-text input alongside Yes/No buttons so users can give more detailed answers.
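The query→ask→refine loop can be sketched with injected stand-ins (runEsqlQuery and askUser are hypothetical placeholders for the real tool calls and chat input, and the ES|QL string is only an example):

```typescript
// Illustrative round loop: sample logs, ask a yes/no question that splits the
// sample, refine, repeat. Stops early once the sample is narrow enough to guess.
async function playRound(
  runEsqlQuery: (esql: string) => Promise<unknown[]>,
  askUser: (question: string) => Promise<"yes" | "no">,
  maxQuestions = 20,
): Promise<number> {
  let questionsAsked = 0;
  for (let turn = 0; turn < maxQuestions; turn++) {
    // 1. Query: sample candidate logs under the current hypothesis.
    const sample = await runEsqlQuery("FROM logs-* | LIMIT 10");
    if (sample.length <= 1) break; // narrowed enough to attempt a guess
    // 2. Ask: is the secret log probably in (or out of) this sample?
    const answer = await askUser(`Question ${turn + 1}: is your log in this sample?`);
    questionsAsked++;
    // 3. Refine: the real hook folds the answer into the next ES|QL query;
    //    in this sketch the injected runEsqlQuery simply shrinks the sample.
    void answer;
  }
  return questionsAsked;
}
```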

Screenshot

Copilot AI changed the title Add "20 Questions" game — LLM guesses a secret log via yes/no questions Add "20 Questions" game — LLM iteratively queries ES to find a secret log Mar 10, 2026
@strawgate strawgate marked this pull request as ready for review March 10, 2026 16:55
@coderabbitai

coderabbitai Bot commented Mar 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a new "20 Questions" feature: a route at /twenty-questions (PAGE_PATHS entry), a manifest mapping (PAGE_MANIFEST), a sidebar nav label and Quiz icon registration, an entry in PAGE_NAV_BUTTONS and updated nav tests, a TwentyQuestionsPage and TwentyQuestionsBoard React UI, and a new useTwentyQuestionsGame hook exporting MAX_QUESTIONS and related game types implementing a streaming LLM-driven game loop and state management against Elasticsearch.

🚥 Pre-merge checks: ✅ 2 passed

  • Linked Issues check — ✅ Passed. PR implements all core objectives: LLM-powered 20-questions game, iterative querying of ES via tool calls, question counting, guess detection, and comprehensive UI with answer controls and progress tracking.
  • Out of Scope Changes check — ✅ Passed. All changes directly support the 20-questions feature: game logic hook, board and page components, routing/sidebar integration, and test/script updates. No unrelated changes detected.


Comment @coderabbitai help to get the list of available commands and usage tips.


@github-actions
Contributor

github-actions Bot commented Mar 10, 2026

PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://elastic.github.io/ai-github-actions-playground/pr-preview/pr-2301/

Built to branch gh-pages at 2026-03-10 20:50 UTC.
Preview will be ready when the GitHub Pages deployment is complete.


github-actions Bot and others added 2 commits March 10, 2026 17:06
- remove unreachable loading guards in GameInput
- tighten fallback question counting heuristic
- auto-scroll to latest message on updates

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Contributor

Addressed all currently unresolved review threads with targeted code changes and pushed commit 07e623d9e.

  • Resolved PRRT_kwDORVGiRc5zXbnp by removing unreachable loading plumbing and disabled={loading} guards from GameInput in TwentyQuestionsBoard.
  • Resolved PRRT_kwDORVGiRc5zXbnw by tightening fallback question-counting heuristics in countQuestions.
  • Resolved PRRT_kwDORVGiRc5zXbn3 by adding message auto-scroll behavior via useEffect on messages updates.

No review threads required a reply-only response.

Validation run
  • cd peek && npm ci --quiet
  • make lint
  • make build
  • Pre-commit hook related tests: vitest tests/component/App.test.tsx ✅ (19/19 passing)

What is this? | From workflow: Address PR Review Feedback

Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not.

…ocs and game rules

- Remove secret log pre-selection: the user now thinks of anything in
  their cluster and the AI tries to guess it via 20 yes/no questions
- Include full ES|QL syntax guide in the system prompt so the LLM can
  write correct queries against the cluster
- Rewrite system prompt with proper 20 Questions game rules, strategy
  guidance, question format guidelines, and response format instructions
- Remove SecretLogReveal accordion (no longer needed)
- Remove dead "loading" game status (startGame is now instant)
- Add error recovery with "Try Again" button
- Show question counter during guessing phase too
- Fix win message to show question count instead of "found the log"
- Tighten countQuestions to avoid overcounting casual question marks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Contributor

github-actions Bot commented Mar 10, 2026

✅ UI Smoke Test Review — 5e0ab59

🎭 Smoke Test Results

Status: ✅ All passed

Totals: 11 run · 11 passed · 0 failed · 0 flaky · 0 skipped
Individual test results (all passed):
  • onboarding user reaches the connect entrypoint from the welcome screen — 652ms
  • add data entrypoint exposes the technology picker and step 2 — 1687ms
  • metrics user connects, picks a metric, and gets a line chart-ready result — 2151ms
  • traces user opens a trace and pivots from service map context into Query Lab — 2401ms
  • security-focused user validates auth tab switching before submitting credentials — 874ms
  • ops user confirms connection guardrails and can reset back to the landing state — 2939ms
  • query lab preserves query text and results after navigating to Console and back — 2869ms
  • logs explorer route is available and runs a logs query — 2192ms
  • logs landing page dimension drill-down opens query lab with selected filter — 2276ms
  • task manager, ILM, and templates pages render with headings — 2206ms
  • pages have no axe accessibility violations — 16023ms

📸 Screenshot Preflight

Result: ✅ Clean — no errors detected

No console errors, page errors, or UI alerts detected.

💡 Recommendations

  • ✨ All smoke tests pass and no UI errors detected. Ready for review.

📎 Full diagnostics and screenshots: workflow run


@github-actions
Contributor

TL;DR

The failing run is a real smoke-test failure (not setup): both failed specs time out clicking the Services tile on Logs before either drill-down or axe checks can proceed. Immediate action: stabilize that click path in peek/tests/e2e/smoke.spec.ts around the Logs Services navigation.

Remediation

  • Update the Logs tile click in smoke tests to wait for a stable tile state before clicking (the current locator repeatedly detaches): in both failing locations (smoke.spec.ts:505 and smoke.spec.ts:596), gate the click with an explicit readiness wait and/or a retry-safe click helper for the Services card.
  • Re-run make test-e2e (or at minimum the two failing specs) to confirm both logs landing page dimension drill-down... and pages have no axe accessibility violations pass in Chromium.
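A retry-safe click helper of the kind suggested could look like this generic sketch (in the real spec it would wrap Playwright's locator.click(); attempts and delayMs are illustrative defaults, and the detach-specific error check is omitted):

```typescript
// Generic retry helper: re-attempts an action that may fail transiently,
// e.g. a Playwright click on a tile that detaches during re-render.
async function retrySafeClick(
  click: () => Promise<void>,
  { attempts = 5, delayMs = 200 }: { attempts?: number; delayMs?: number } = {},
): Promise<void> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      await click();
      return; // click landed; stop retrying
    } catch (err) {
      lastError = err;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError; // exhausted retries: surface the last failure
}
```

In the smoke spec this would be called as, say, `retrySafeClick(() => servicesTile.click())` at the two failing locations.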
Investigation details

Root Cause

UI Smoke Test PR Review failed because Playwright could not complete locator.click() on the Logs Services tile; the element was repeatedly detached during clickability checks until timeout. The final failure step (Fail if smoke tests or preflight failed) is a downstream gate, not the originating cause.

Evidence

  • Workflow: UI Smoke Test PR Review (run 22914816155)
  • Job/step: smoke-review → Run E2E smoke tests (failed specs), then Fail if smoke tests or preflight failed
  • Key log excerpt:
    • [chromium] › tests/e2e/smoke.spec.ts:490:3 ... logs landing page dimension drill-down opens query lab with selected filter
    • Error: locator.click: Test timeout of 60000ms exceeded
    • waiting for getByRole('main').getByRole('button', { name: /^Services/ })
    • element was detached from the DOM, retrying
    • [chromium] › tests/e2e/smoke.spec.ts:543:3 ... pages have no axe accessibility violations
    • Error: locator.click: Test timeout of 90000ms exceeded

Validation

  • Local reproduction/tests in this detective run: not run (read-only investigation based on workflow logs and posted smoke report comment).

Follow-up

  • I checked prior PR comments and found no earlier PR Actions Detective report on this PR, so this is a new diagnosis.

What is this? | From workflow: PR Actions Detective



github-actions Bot and others added 2 commits March 10, 2026 17:38
- prevent overlapping game turns with in-flight guards
- use strict guess marker handling and avoid forced guessing state
- cap question count at MAX_QUESTIONS and tighten one-question-left prompt
- restrict rendered markdown elements and add answer input accessible name
- align question chip height with header actions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Contributor

Implemented and pushed commit 647b7105f to address straightforward review feedback.

Addressed with code changes (and resolved):

  • PRRT_kwDORVGiRc5zXgVJ — prevented overlapping turns in handleAnswer using in-flight guards + latest-message ref.
  • PRRT_kwDORVGiRc5zXvpZ — restricted markdown render surface and added answer input accessible name.
  • PRRT_kwDORVGiRc5zXvpe — added accessible name to free-text answer input.
  • PRRT_kwDORVGiRc5zXvpi — tightened one-question-left instruction and capped counted questions at MAX_QUESTIONS.
  • PRRT_kwDORVGiRc5zXvpj — switched to strict My guess: detection and removed unconditional guessing transition on final-turn prompt.
  • PRRT_kwDORVGiRc5zXwsX — prevented overlapping startGame turns with in-flight guard.
  • PRRT_kwDORVGiRc5zX3Qs — normalized header control heights by removing small chip size.

Left open (not changed):

  • PRRT_kwDORVGiRc5zXvpg — suggestion references SECRET_POOL_SIZE/timestamp-sampled secret selection logic that is not present in the current hook implementation, so no surgical fix was applied.
Validation
  • cd peek && npm ci
  • make lint ✅ (note: changed-file detection against main was unavailable in this environment, so changed-file lint was skipped by the Make target)
  • make lint-full
  • cd peek && npx playwright install chromium
  • cd peek && npx playwright test tests/e2e/smoke.spec.ts --project=chromium --workers=1 --grep "logs landing page dimension drill-down|pages have no axe accessibility violations" ❌ (both still fail in this environment; no additional test-file changes were included in this push)

What is this? | From workflow: Address PR Review Feedback


@github-actions github-actions Bot mentioned this pull request Mar 10, 2026

strawgate and others added 3 commits March 10, 2026 10:52
- Rewrite system prompt to use information-theoretic binary splitting
  instead of naive one-by-one candidate enumeration
- Send a kickoff message on game start so the LLM immediately asks
  Question 1 without requiring user input first

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Contributor

Addressed the remaining unresolved review feedback with a focused follow-up commit.

  • Addressed with code changes:
    • PRRT_kwDORVGiRc5zYOnq — added transcript live-region semantics on the message container (role="log", aria-live="polite", aria-relevant="additions text", aria-busy={loading}) in TwentyQuestionsBoard.
    • PRRT_kwDORVGiRc5zYOn0 — enforced tool grounding in useTwentyQuestionsGame: if an assistant turn asks one or more questions without any Elasticsearch tool calls, the turn is rejected, the transient assistant message is removed, and a user-visible error is shown.
    • PRRT_kwDORVGiRc5zXvpg — resolved as outdated/already addressed in current code (the referenced timestamp-window secret-pool selection logic is not present in the hook anymore).

No unresolved thread required a reply-only response.

Validation
  • cd peek && npm ci --quiet
  • make lint
  • make build
  • Pre-commit staged-file unit tests (vitest tests/component/App.test.tsx) ✅ (19/19)

What is this? | From workflow: Address PR Review Feedback



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
peek/src/hooks/useTwentyQuestionsGame.ts (2)

214-218: ⚠️ Potential issue | 🟠 Major

Enforce the last-question limit before keeping the reply.

When only one question remains, a model response with two questions is still rendered and only the counter is clamped. That leaves the UI showing an illegal extra question. Reject or truncate the turn when remainingQuestions === 1 && newQuestions > 1.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@peek/src/hooks/useTwentyQuestionsGame.ts` around lines 214 - 218, Compute
remainingQuestions = Math.max(0, MAX_QUESTIONS - questionCountRef.current)
before accepting the model reply and enforce the limit: if newQuestions >
remainingQuestions then either reject the turn (do not update
questionCountRef.current or call setQuestionCount) or truncate the model reply
to only include remainingQuestions questions before counting; then update
questionCountRef.current += Math.min(newQuestions, remainingQuestions) and call
setQuestionCount(questionCountRef.current). Apply this logic where
countQuestions(text) is used so the UI never displays more questions than
MAX_QUESTIONS.
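As a pure function, the clamping described in this prompt might look like (a sketch, not the hook's actual code):

```typescript
const MAX_QUESTIONS = 20;

// Decide whether to accept an assistant turn given the remaining budget.
// Returns the updated question count, or null to reject the turn outright
// (the truncation alternative from the prompt is omitted in this sketch).
function acceptQuestions(currentCount: number, newQuestions: number): number | null {
  const remaining = Math.max(0, MAX_QUESTIONS - currentCount);
  if (newQuestions > remaining) return null; // over budget: reject the turn
  return currentCount + newQuestions;
}
```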

214-222: ⚠️ Potential issue | 🟠 Major

Reject question turns that skipped Elasticsearch tools.

This still accepts a question-only assistant reply even when toolCalls.length === 0. Once that happens, the game can drift into ungrounded chat instead of narrowing against cluster data. Fail the turn when newQuestions > 0 && toolCalls.length === 0.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@peek/src/hooks/useTwentyQuestionsGame.ts` around lines 214 - 222, The code
currently accepts assistant question turns based only on newQuestions; change
logic in useTwentyQuestionsGame so that if newQuestions > 0 and toolCalls.length
=== 0 the turn is rejected: do not increment questionCountRef, do not call
setQuestionCount, do not setStatus("guessing"), and instead return false (or
otherwise signal a failed/rejected turn) so question-only replies that skipped
Elasticsearch tools are not counted or progressed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@peek/src/hooks/useTwentyQuestionsGame.ts`:
- Around line 161-163: Before each new LLM request clear the stale error state
so old error banners don't persist: inside the turn-starting logic (the block
that checks connection and inFlightRef.current and sets inFlightRef.current =
true and setLoading(true)) call the error state setter used in this hook (the
same one set in startGame, e.g., setError(null) or setError(undefined)) before
invoking sendToLLM; ensure this happens every time you begin a request (not only
in startGame) so subsequent successful turns remove prior errors.

---

Duplicate comments:
In `@peek/src/hooks/useTwentyQuestionsGame.ts`:
- Around line 214-218: Compute remainingQuestions = Math.max(0, MAX_QUESTIONS -
questionCountRef.current) before accepting the model reply and enforce the
limit: if newQuestions > remainingQuestions then either reject the turn (do not
update questionCountRef.current or call setQuestionCount) or truncate the model
reply to only include remainingQuestions questions before counting; then update
questionCountRef.current += Math.min(newQuestions, remainingQuestions) and call
setQuestionCount(questionCountRef.current). Apply this logic where
countQuestions(text) is used so the UI never displays more questions than
MAX_QUESTIONS.
- Around line 214-222: The code currently accepts assistant question turns based
only on newQuestions; change logic in useTwentyQuestionsGame so that if
newQuestions > 0 and toolCalls.length === 0 the turn is rejected: do not
increment questionCountRef, do not call setQuestionCount, do not
setStatus("guessing"), and instead return false (or otherwise signal a
failed/rejected turn) so question-only replies that skipped Elasticsearch tools
are not counted or progressed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2a3dfb35-5397-4adb-ad3c-d93583f76cf2

📥 Commits

Reviewing files that changed from the base of the PR and between 3db5e2e and 0874b3d.

📒 Files selected for processing (1)
  • peek/src/hooks/useTwentyQuestionsGame.ts

github-actions Bot and others added 2 commits March 10, 2026 18:06
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Contributor

Addressed review feedback with a surgical fix and pushed to copilot/add-20-questions-game.

  • ✅ Resolved thread PRRT_kwDORVGiRc5zYWpP via code change in peek/src/hooks/useTwentyQuestionsGame.ts.
  • Change made: clear stale error state at the start of every sendToLLM turn (setError(null) right after acquiring the in-flight guard), so previous timeout/failure banners do not persist into successful turns.
  • No additional inline replies were needed.
Validation
  • cd peek && npm ci --silent && cd .. && make lint && make build
  • make check ✅ (full lint + unit tests + build)
  • Pre-commit staged-file checks also passed (oxfmt, oxlint, related unit tests)

What is this? | From workflow: Address PR Review Feedback


@github-actions
Contributor

🤖 AI Assistant 🤖

7 Enabled Automations • 0 Disabled Automations

Enabled automations

  • Investigate CI Failures
  • Run an extra, in-depth, PR Review
  • Auto-label pull requests
  • Keep the body of the PR updated
  • Address review feedback
  • Address merge conflicts
  • Enable PR Preview deployment
Troubleshooting

This control panel only adds or removes labels on the PR.
It does not directly start workflows.
If you toggle an option, related automation may not run until its next normal trigger event.

  • Re-sync control panel from labels

strawgate and others added 4 commits March 10, 2026 13:47
- Remove the guard that errored with "The AI must inspect cluster data"
  when the LLM asked a question without making a tool call. This was too
  strict — the LLM can reasonably ask follow-up questions based on data
  it already gathered.
- Tighten response format instructions to prevent verbose stat dumps.
  Responses should be 2-4 sentences max with a one-line summary before
  the question.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Give the LLM a "curious detective" personality and instruct it to weave
query findings into natural, fun commentary rather than dumping stats.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The LLM lost all memory of previous queries between turns because
sendToLLM stripped messages to {role, content}, dropping tool calls
and results. Now we use the AI SDK's response.messages to maintain
a full llmHistoryRef that includes tool call/result messages across
turns.

Also changed "ALWAYS run at least one query per turn" to only query
when new information is needed, preventing redundant re-queries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
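The fix described in this commit message can be sketched as an append-only history that keeps the provider's full response messages instead of flattening them to {role, content} each turn (types here are simplified stand-ins for the AI SDK's message shapes):

```typescript
// Simplified stand-in type; the real AI SDK messages carry tool-call and
// tool-result parts that must survive between turns.
type HistoryMessage = { role: string; content: unknown };

function makeHistory() {
  const history: HistoryMessage[] = [];
  return {
    // Append the user's turn before sending to the LLM.
    pushUser(content: string) {
      history.push({ role: "user", content });
    },
    // Append the model's full response messages, tool messages included,
    // rather than re-deriving plain-text pairs (which loses query memory).
    pushResponse(responseMessages: HistoryMessage[]) {
      history.push(...responseMessages);
    },
    snapshot: () => [...history],
  };
}
```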
Contributor

@github-actions github-actions Bot left a comment


Requesting changes: peek/src/hooks/useTwentyQuestionsGame.ts has a real guess-detection bug at line 16.

STRICT_GUESS_RE is ^\s*my guess:\s* (case-insensitive), but the system prompt explicitly asks the model to emit "My guess:". A common markdown response like **My guess:** ... does not match this regex, so hasGuess stays false and the game never enters guessing state (no Correct/Wrong confirmation flow).

Please make guess detection robust to markdown-formatted heading variants (or normalize markdown punctuation before applying the strict check).


What is this? | From workflow: PR Review


async (answer: string) => {
if (inFlightRef.current) return;
const userMsg: GameMessage = { id: crypto.randomUUID(), role: "user", content: answer };
setMessages((prev) => [...prev, userMsg]);
Contributor


handleAnswer appends the user message before checking whether a turn can actually run. If the Elasticsearch connection drops mid-game, sendToLLM returns immediately on !connection, so the UI records the user answer but never produces an assistant turn or an error state.

Concrete repro: start a game, disconnect from ES, then click Yes/No — message is added at line 274, but no follow-up appears.

Please guard before appending (or surface an explicit error) when !connection, so answers are only added when they can be processed.

const userMsg: GameMessage = { id: crypto.randomUUID(), role: "user", content: answer };
setMessages((prev) => [...prev, userMsg]);

if (status === "guessing") {
Contributor


handleAnswer's status === "guessing" path is not idempotent, so rapid double submission can produce conflicting terminal outcomes.

Concrete failure: while the guess-confirmation buttons are visible, a fast double click (or pressing Enter + click) can invoke handleAnswer twice before React applies setStatus(...). Both invocations still read the stale status === "guessing", append user/system messages, and the second call can overwrite the final state (won -> lost or vice versa).

Please make this branch one-shot (for example, a guessResolvedRef guard set at the start of the guessing branch and reset in startGame) so only the first confirmation is applied.

const GAME_TIMEOUT_MS = 60_000;
/** Allow enough steps for the LLM to run queries between questions. */
const GAME_MAX_STEPS = 10;
const STRICT_GUESS_RE = /^\s*my guess:\s*/im;
Contributor


STRICT_GUESS_RE is too strict and misses common Markdown-formatted guesses, which can break the game flow.

Concrete failure: if the model outputs **My guess:** (answer) or - My guess: (answer), this regex does not match, so status never transitions to "guessing". The UI then keeps showing normal Yes/No answer controls instead of the correct/wrong confirmation controls, and the user cannot resolve the round.

Please broaden detection to allow optional Markdown/bullet prefixes around My guess: (or parse assistant intent more structurally).
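A broadened pattern along the suggested lines (a sketch; the hook could instead normalize markdown punctuation before applying the strict check):

```typescript
// Allow optional list bullets ("-", "*", ">") and markdown emphasis
// ("*", "**", "_", "__") around the "My guess:" marker.
const GUESS_RE = /^\s*(?:[-*>]\s*)?(?:\*{1,2}|_{1,2})?my guess(?:\*{1,2}|_{1,2})?\s*:/im;

function hasGuess(text: string): boolean {
  return GUESS_RE.test(text);
}
```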



Development

Successfully merging this pull request may close these issues.

Add "20 questions game"
