Add "20 Questions" game — LLM iteratively queries ES to find a secret log#2301
Copilot wants to merge 22 commits.
Conversation
…ions Co-authored-by: strawgate <6384545+strawgate@users.noreply.github.com>
…logs check, extract sub-components Co-authored-by: strawgate <6384545+strawgate@users.noreply.github.com>
@copilot So here's how I actually want this to work: the secret could be any log in the cluster, and the agent asks questions to try to narrow it down. Then, based on a query it runs, it gets another sampling of logs and asks me a question it can use to determine whether the log is probably in that sample or probably not.
…arch the cluster Co-authored-by: strawgate <6384545+strawgate@users.noreply.github.com>
Co-authored-by: strawgate <6384545+strawgate@users.noreply.github.com>
Redesigned in b3d610f. The LLM now gets
|
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
📝 WalkthroughWalkthroughAdds a new "20 Questions" feature: a route at /twenty-questions (PAGE_PATHS entry), a manifest mapping (PAGE_MANIFEST), a sidebar nav label and Quiz icon registration, an entry in PAGE_NAV_BUTTONS and updated nav tests, a TwentyQuestionsPage and TwentyQuestionsBoard React UI, and a new useTwentyQuestionsGame hook exporting MAX_QUESTIONS and related game types implementing a streaming LLM-driven game loop and state management against Elasticsearch. Possibly related PRs
🚥 Pre-merge checks: ✅ 2 passed
- remove unreachable loading guards in GameInput
- tighten fallback question counting heuristic
- auto-scroll to latest message on updates

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addressed all currently unresolved review threads with targeted code changes and pushed commit
No review threads required a reply-only response. Validation run
From workflow: Address PR Review Feedback. Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not.
…ocs and game rules

- Remove secret log pre-selection: the user now thinks of anything in their cluster and the AI tries to guess it via 20 yes/no questions
- Include full ES|QL syntax guide in the system prompt so the LLM can write correct queries against the cluster
- Rewrite system prompt with proper 20 Questions game rules, strategy guidance, question format guidelines, and response format instructions
- Remove SecretLogReveal accordion (no longer needed)
- Remove dead "loading" game status (startGame is now instant)
- Add error recovery with "Try Again" button
- Show question counter during guessing phase too
- Fix win message to show question count instead of "found the log"
- Tighten countQuestions to avoid overcounting casual question marks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
✅ UI Smoke Test Review
| Total | Passed | Failed | Flaky | Skipped |
|---|---|---|---|---|
| 11 | 11 | 0 | 0 | 0 |
Individual test results
| Status | Test | Duration |
|---|---|---|
| ✅ | onboarding user reaches the connect entrypoint from the welcome screen | 652ms |
| ✅ | add data entrypoint exposes the technology picker and step 2 | 1687ms |
| ✅ | metrics user connects, picks a metric, and gets a line chart-ready result | 2151ms |
| ✅ | traces user opens a trace and pivots from service map context into Query Lab | 2401ms |
| ✅ | security-focused user validates auth tab switching before submitting credentials | 874ms |
| ✅ | ops user confirms connection guardrails and can reset back to the landing state | 2939ms |
| ✅ | query lab preserves query text and results after navigating to Console and back | 2869ms |
| ✅ | logs explorer route is available and runs a logs query | 2192ms |
| ✅ | logs landing page dimension drill-down opens query lab with selected filter | 2276ms |
| ✅ | task manager, ILM, and templates pages render with headings | 2206ms |
| ✅ | pages have no axe accessibility violations | 16023ms |
📸 Screenshot Preflight
Result: ✅ Clean — no errors detected
No console errors, page errors, or UI alerts detected.
💡 Recommendations
- ✨ All smoke tests pass and no UI errors detected. Ready for review.
📎 Full diagnostics and screenshots: workflow run
TL;DR: The failing run is a real smoke-test failure (not setup): both failed specs time out clicking the

Remediation
Investigation details

Root cause
Evidence
Validation
Follow-up
From workflow: PR Actions Detective
- prevent overlapping game turns with in-flight guards
- use strict guess marker handling and avoid forced guessing state
- cap question count at MAX_QUESTIONS and tighten one-question-left prompt
- restrict rendered markdown elements and add answer input accessible name
- align question chip height with header actions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implemented and pushed a commit. Addressed with code changes (and resolved):
Left open (not changed):
Validation
- Rewrite system prompt to use information-theoretic binary splitting instead of naive one-by-one candidate enumeration
- Send a kickoff message on game start so the LLM immediately asks Question 1 without requiring user input first

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addressed the remaining unresolved review feedback with a focused follow-up commit.
No unresolved thread required a reply-only response.

Validation
Actionable comments posted: 1
♻️ Duplicate comments (2)
peek/src/hooks/useTwentyQuestionsGame.ts (2)
Lines 214-218: ⚠️ Potential issue | 🟠 Major: Enforce the last-question limit before keeping the reply.
When only one question remains, a model response with two questions is still rendered and only the counter is clamped. That leaves the UI showing an illegal extra question. Reject or truncate the turn when `remainingQuestions === 1 && newQuestions > 1`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@peek/src/hooks/useTwentyQuestionsGame.ts` around lines 214 - 218, Compute remainingQuestions = Math.max(0, MAX_QUESTIONS - questionCountRef.current) before accepting the model reply and enforce the limit: if newQuestions > remainingQuestions then either reject the turn (do not update questionCountRef.current or call setQuestionCount) or truncate the model reply to only include remainingQuestions questions before counting; then update questionCountRef.current += Math.min(newQuestions, remainingQuestions) and call setQuestionCount(questionCountRef.current). Apply this logic where countQuestions(text) is used so the UI never displays more questions than MAX_QUESTIONS.
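The clamp suggested in this prompt can be sketched as follows. This is a minimal illustration: `MAX_QUESTIONS` is taken from the PR description, but the function shape and names here are assumptions, not the hook's verified code.

```typescript
// Illustrative sketch of the review's clamp; the function shape is an
// assumption, not the hook's actual implementation.
const MAX_QUESTIONS = 20;

interface TurnResult {
  accepted: boolean;        // whether the turn should be kept at all
  countedQuestions: number; // questions actually credited to the total
}

// Clamp the questions credited for a turn so the running total can
// never exceed MAX_QUESTIONS; overflowing turns are truncated.
function acceptQuestionTurn(currentCount: number, newQuestions: number): TurnResult {
  const remaining = Math.max(0, MAX_QUESTIONS - currentCount);
  if (newQuestions > remaining) {
    // Truncate to the remaining budget (the review's second option);
    // with no budget left, the turn is rejected outright.
    return { accepted: remaining > 0, countedQuestions: Math.min(newQuestions, remaining) };
  }
  return { accepted: true, countedQuestions: newQuestions };
}
```

With this in place, the UI never displays more than `MAX_QUESTIONS` questions even if the model over-asks on its final turn.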
Lines 214-222: ⚠️ Potential issue | 🟠 Major: Reject question turns that skipped Elasticsearch tools.
This still accepts a question-only assistant reply even when `toolCalls.length === 0`. Once that happens, the game can drift into ungrounded chat instead of narrowing against cluster data. Fail the turn when `newQuestions > 0 && toolCalls.length === 0`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@peek/src/hooks/useTwentyQuestionsGame.ts` around lines 214 - 222, The code currently accepts assistant question turns based only on newQuestions; change logic in useTwentyQuestionsGame so that if newQuestions > 0 and toolCalls.length === 0 the turn is rejected: do not increment questionCountRef, do not call setQuestionCount, do not setStatus("guessing"), and instead return false (or otherwise signal a failed/rejected turn) so question-only replies that skipped Elasticsearch tools are not counted or progressed.
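The grounding check this prompt describes amounts to a small predicate. A sketch, with field names borrowed from the review comment rather than the hook's real variables:

```typescript
// Illustrative guard only; the field names follow the review comment,
// not necessarily the hook's real code.
interface AssistantTurn {
  newQuestions: number; // questions detected in the reply text
  toolCalls: unknown[]; // Elasticsearch tool calls made this turn
}

// Reject a question-only reply that never consulted the cluster, so
// the game stays grounded in real data instead of drifting into chat.
function isGroundedTurn(turn: AssistantTurn): boolean {
  return !(turn.newQuestions > 0 && turn.toolCalls.length === 0);
}
```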
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@peek/src/hooks/useTwentyQuestionsGame.ts`:
- Around line 161-163: Before each new LLM request clear the stale error state
so old error banners don't persist: inside the turn-starting logic (the block
that checks connection and inFlightRef.current and sets inFlightRef.current =
true and setLoading(true)) call the error state setter used in this hook (the
same one set in startGame, e.g., setError(null) or setError(undefined)) before
invoking sendToLLM; ensure this happens every time you begin a request (not only
in startGame) so subsequent successful turns remove prior errors.
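The turn-starting sequence described above can be sketched as a small helper. The setter and ref names (`setError`, `setLoading`, `inFlight`) are assumptions taken from this prompt, not the hook's verified API:

```typescript
// Minimal sketch; setError/setLoading/inFlight are assumed names taken
// from the review prompt above, not the hook's actual code.
type Setter<T> = (value: T) => void;

// Begin a turn: clear any stale error BEFORE the request, so an old
// error banner never outlives a later successful turn.
function beginTurn(
  inFlight: { current: boolean },
  setError: Setter<string | null>,
  setLoading: Setter<boolean>,
): boolean {
  if (inFlight.current) return false; // a turn is already running
  inFlight.current = true;
  setError(null);   // reset the previous turn's error on every request
  setLoading(true);
  return true;
}
```

The key point is that the `setError(null)` call happens on every turn start, not only in `startGame`.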
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2a3dfb35-5397-4adb-ad3c-d93583f76cf2
📒 Files selected for processing (1)
peek/src/hooks/useTwentyQuestionsGame.ts
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Addressed review feedback with a surgical fix and pushed to
Validation
What is this? | From workflow: Address PR Review Feedback Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not. |
|
🤖 AI Assistant: 7 enabled automations • 0 disabled automations

Enabled automations
Troubleshooting: This control panel only adds or removes labels on the PR.
- Remove the guard that errored with "The AI must inspect cluster data" when the LLM asked a question without making a tool call. This was too strict: the LLM can reasonably ask follow-up questions based on data it already gathered.
- Tighten response format instructions to prevent verbose stat dumps. Responses should be 2-4 sentences max with a one-line summary before the question.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Give the LLM a "curious detective" personality and instruct it to weave query findings into natural, fun commentary rather than dumping stats. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The LLM lost all memory of previous queries between turns because
sendToLLM stripped messages to {role, content}, dropping tool calls
and results. Now we use the AI SDK's response.messages to maintain
a full llmHistoryRef that includes tool call/result messages across
turns.
Also changed "ALWAYS run at least one query per turn" to only query
when new information is needed, preventing redundant re-queries.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
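The history fix described in this commit can be sketched roughly as follows. `ChatMessage` and `llmHistoryRef` are assumed shapes reconstructed from the commit message, not the actual hook:

```typescript
// Rough sketch of keeping the full transcript; ChatMessage and
// llmHistoryRef are assumed shapes based on the commit message above.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: unknown; // may carry tool-call / tool-result payloads
}

const llmHistoryRef: { current: ChatMessage[] } = { current: [] };

// Append everything the model produced this turn, including tool-call
// and tool-result messages, instead of a lossy {role, content} copy.
function recordTurn(userMsg: ChatMessage, responseMessages: ChatMessage[]): void {
  llmHistoryRef.current.push(userMsg, ...responseMessages);
}
```

Because tool-result messages survive across turns, the model can refer back to earlier query output instead of re-running the same queries.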
Requesting changes: peek/src/hooks/useTwentyQuestionsGame.ts has a real guess-detection bug at line 16.
STRICT_GUESS_RE is `/^\s*my guess:\s*/` (case-insensitive), but the system prompt explicitly asks the model to emit "My guess:". A common markdown response like `**My guess:** ...` does not match this regex, so hasGuess stays false and the game never enters the guessing state (no Correct/Wrong confirmation flow).
Please make guess detection robust to markdown-formatted variants (or normalize markdown punctuation before applying the strict check).
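One way to apply the "normalize first" remediation, as a sketch (these exact normalization rules are an assumption, not the shipped fix):

```typescript
// Sketch of one remediation: strip markdown punctuation before the
// strict check. The regexes here are illustrative, not the hook's code.
const STRICT_GUESS_RE = /^\s*my guess:\s*/i;

function hasGuess(text: string): boolean {
  const normalized = text
    .replace(/^[\s>*\-•]+/, "") // drop leading bullets, blockquotes, emphasis markers
    .replace(/[*_`]/g, "");     // drop inline markdown punctuation
  return STRICT_GUESS_RE.test(normalized);
}
```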
What is this? | From workflow: PR Review
Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not.

    async (answer: string) => {
      if (inFlightRef.current) return;
      const userMsg: GameMessage = { id: crypto.randomUUID(), role: "user", content: answer };
      setMessages((prev) => [...prev, userMsg]);
handleAnswer appends the user message before checking whether a turn can actually run. If the Elasticsearch connection drops mid-game, sendToLLM returns immediately on !connection, so the UI records the user answer but never produces an assistant turn or an error state.
Concrete repro: start a game, disconnect from ES, then click Yes/No — message is added at line 274, but no follow-up appears.
Please guard before appending (or surface an explicit error) when !connection, so answers are only added when they can be processed.

    const userMsg: GameMessage = { id: crypto.randomUUID(), role: "user", content: answer };
    setMessages((prev) => [...prev, userMsg]);

    if (status === "guessing") {
handleAnswer's status === "guessing" path is not idempotent, so rapid double submission can produce conflicting terminal outcomes.
Concrete failure: while the guess-confirmation buttons are visible, a fast double click (or pressing Enter + click) can invoke handleAnswer twice before React applies setStatus(...). Both invocations still read the stale status === "guessing", append user/system messages, and the second call can overwrite the final state (won -> lost or vice versa).
Please make this branch one-shot (for example, a guessResolvedRef guard set at the start of the guessing branch and reset in startGame) so only the first confirmation is applied.

    const GAME_TIMEOUT_MS = 60_000;
    /** Allow enough steps for the LLM to run queries between questions. */
    const GAME_MAX_STEPS = 10;
    const STRICT_GUESS_RE = /^\s*my guess:\s*/im;
STRICT_GUESS_RE is too strict and misses common Markdown-formatted guesses, which can break the game flow.
Concrete failure: if the model outputs **My guess:** (answer) or - My guess: (answer), this regex does not match, so status never transitions to "guessing". The UI then keeps showing normal Yes/No answer controls instead of the correct/wrong confirmation controls, and the user cannot resolve the round.
Please broaden detection to allow optional Markdown/bullet prefixes around My guess: (or parse assistant intent more structurally).
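One possible broadened pattern, shown purely as an illustration (this exact regex is an assumption, not the merged change): tolerate bullet, blockquote, and emphasis characters around the marker.

```typescript
// Illustrative broadened detector: optional bullet/blockquote/emphasis
// characters may precede or wrap the "My guess:" marker. This exact
// pattern is an assumption, not the shipped fix.
const GUESS_RE = /^[\s>*\-•_]*my guess[\s*_]*:/im;

function detectGuess(reply: string): boolean {
  return GUESS_RE.test(reply);
}
```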

New page where an LLM plays 20 Questions to guess something the user is thinking of from their Elasticsearch cluster. The LLM actively queries Elasticsearch using `run_esql_query` and other tools to discover data, ask narrowing yes/no questions, and iteratively refine its search.

Game flow: the user thinks of anything in their cluster (a log entry, index, field value, service, host, etc.) → LLM uses ES tools to query the cluster → asks yes/no questions → runs refined queries based on answers → narrows down iteratively → makes a guess → user confirms correct/wrong. A final guess-only turn is granted if 20 questions are exhausted.
Changes

- `peek/src/hooks/useTwentyQuestionsGame.ts`: game logic hook. The LLM gets `run_esql_query`, `get_index_info`, and `get_cluster_health` tools to actively search the cluster. The system prompt instructs a query→ask→refine iterative strategy, with 10 tool steps per turn for multi-query reasoning. Question counting uses numbered patterns or a line-ending `?`; guess detection uses `/^\s*my guess:\s*/im`.
- `peek/src/components/TwentyQuestionsBoard.tsx`: game board with decomposed sub-components: `GameMessageBubble` with inline `ToolCallProgress` (⏳/✓ status for each tool call), `GameInput` (Yes/No buttons plus free-text input for detailed answers), and Correct/Wrong buttons during guess confirmation.
- `peek/src/components/TwentyQuestionsPage.tsx`: page shell with question counter chip, New Game button, and an LLM-not-configured fallback.
- `peek/src/routes/paths.ts` / `manifest.ts`: route config under the Workspace group (order 70, `QuizIcon`).
- `peek/src/components/AppSidebar.tsx`: added `QuizIcon` to `NAV_ICON_COMPONENTS`.
- `peek/scripts/page-nav-buttons.mjs` + `tests/unit/pageNavButtons.test.ts`: screenshot script and route sync test updated.

Screenshot
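The question-counting heuristic described in the hook bullet above might look roughly like this (a reconstruction from the description, not the hook's exact code):

```typescript
// Rough reconstruction of the counting idea: numbered question patterns,
// falling back to lines ending in "?". Not the hook's exact code.
function countQuestions(text: string): number {
  // Prefer explicit numbering like "Question 3:" or "1." / "2)".
  const numbered = text.match(/^\s*(?:question\s*)?\d+[.):]/gim);
  if (numbered && numbered.length > 0) return numbered.length;
  // Fallback: count lines that end with a question mark, which avoids
  // crediting casual mid-sentence question marks.
  return text.split("\n").filter((line) => line.trim().endsWith("?")).length;
}
```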
The body of this PR is automatically managed by the Update PR Body workflow.