docs(testing): scaffold Linux compatibility test plan (WIP) by aaddrick · Pull Request #540 · aaddrick/claude-desktop-debian

aaddrick · 2026-04-30T10:37:36Z

Summary

Staging for an eventual automated test harness, in markdown form first
67 functional tests (T01–T39 cross-env, S01–S28 env-specific) + 10 UI surface inventories under `docs/testing/`
Stable test IDs, standardized test bodies, per-element UI tables — shape designed so Playwright / xdotool / DBus assertions can be slotted in later without rewriting the corpus

What this is

A test-plan scaffold. The structure (dashboard separated from specs, runbook for sweep mechanics, UI inventory separate from functional tests, severity tiers, smoke set) reflects how mature OSS projects organize manual testing — and is the foundation for layering in automation incrementally.

What this is not

Done. Roughed in but far from ready:

Most cells are `?`. The matrix is a dashboard, not a record of what's been verified. Only a handful of statuses (KDE-W daily-driver use, Hypr-N per @typedrat, captured failures from prior sweeps) reflect real testing today.
T15–T39 are derived from upstream docs (code.claude.com/docs/en/desktop*) — features whose Linux behavior is officially undocumented (upstream explicitly says "Linux is not supported" for the Code tab). These tests describe intended Linux-side behavior, not anything verified yet.
UI checklists are starter inventories. Every surface has the structure, but element-level coverage needs real walkthroughs to flesh out selectors, state expectations, and per-row known-issue notes. Expect to add rows as edge cases surface.
No automation wired. The structure supports it. Nothing is plugged in.
Severity classifications are best-guess. Should refine once real failure data lands.

Layout

```
docs/testing/
├── README.md      orientation, severity tiers, smoke set, automation roadmap
├── matrix.md      dashboard: cross-env table + env-specific status snapshots
├── runbook.md     VM setup, diagnostic capture, sweep workflow
├── cases/         67 functional tests grouped by feature surface
│   ├── launch.md
│   ├── tray-and-window-chrome.md
│   ├── shortcuts-and-input.md
│   ├── code-tab-foundations.md
│   ├── code-tab-workflow.md
│   ├── code-tab-handoff.md
│   ├── routines.md
│   ├── extensibility.md
│   ├── distribution.md
│   └── platform-integration.md
└── ui/            per-surface UI checklists (every interactive element)
    ├── window-chrome-and-tabs.md
    ├── tray.md
    ├── sidebar.md
    ├── prompt-area.md
    ├── code-tab-panes.md
    ├── settings.md
    ├── routines-page.md
    ├── connectors-and-plugins.md
    ├── quick-entry.md
    └── notifications.md
```

What's covered

Historical project surfaces: app launch, doctor, tray, window decorations, hybrid topbar (#538, hybrid titlebar mode for clickable in-app topbar), Quick Entry, autostart, hide-to-tray, multi-instance.
Upstream Code-tab surface: Code tab load, sign-in browser handoff, folder picker (portal/native), drag-drop, integrated terminal, file pane, preview pane, PR monitoring (`gh`), scheduled tasks, connectors OAuth, plugin browser, MCP / hooks / CLAUDE.md memory, Dispatch handoff.
Env-specific failures: Ubuntu/DEB, Fedora/RPM, Wayland-native (wlroots), KDE, GNOME (mutter XWayland key-grab regression, #404: Quick Entry feature does not work properly), Omarchy, Niri (#BindShortcuts error 5), AppImage, `.desktop` env handling, idle-sleep / suspend, Computer Use (out of scope per upstream; graceful-unavailability check), auto-update vs apt/dnf, plugin/worktree storage.

Why this shape

Dashboard separated from specs. Status updates touch `matrix.md` only; spec authorship touches `cases/`. Two different workflows, two different files — reduces matrix-merge noise.
UI inventory separate from functional tests. Functional tests catch "the feature broke." UI checklists catch "the feature works but looks wrong." Both matter for Linux because Electron under different DEs / display servers / GTK theme combos produces visual artifacts that aren't behavioral failures.
Stable test IDs + standardized bodies. Sets up automation: `T01`–`T39` and `S01`–`S28` won't move. Each test has `Steps` + `Diagnostics on failure` blocks shaped for scripted runners.
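As a sketch of what standardized bodies buy a scripted runner, here is a hypothetical TypeScript shape for one parsed test spec plus a lint pass over it. The field names mirror the documented sections (Severity / Steps / Expected / Diagnostics on failure / References); `TestCase`, `classifyId`, and `validateTestCase` are illustrative names, not code in this PR.

```typescript
// Hypothetical parsed form of one test in docs/testing/cases/*.md.
// Field names mirror the documented sections; the TS shape is an assumption.
export interface TestCase {
  id: string;                   // "T01".."T39" (cross-env) or "S01".."S28" (env-specific)
  title: string;
  severity: string;             // tier names live in README.md, not hard-coded here
  steps: string[];
  expected: string[];
  diagnosticsOnFailure: string[];
  references: string[];
}

const ID_RE = /^([TS])(\d{2})$/;

// Classify an ID by prefix; returns null for anything that is not a stable ID.
export function classifyId(id: string): "cross-env" | "env-specific" | null {
  const m = ID_RE.exec(id);
  if (m === null) return null;
  return m[1] === "T" ? "cross-env" : "env-specific";
}

// Lint one body; an empty result means a scripted runner has what it needs.
export function validateTestCase(tc: TestCase): string[] {
  const problems: string[] = [];
  if (classifyId(tc.id) === null) problems.push(`malformed id: ${tc.id}`);
  if (tc.steps.length === 0) problems.push(`${tc.id}: missing Steps`);
  if (tc.expected.length === 0) problems.push(`${tc.id}: missing Expected`);
  if (tc.diagnosticsOnFailure.length === 0) {
    problems.push(`${tc.id}: missing Diagnostics on failure`);
  }
  return problems;
}
```

Because the ID is the only key, new tests can be appended at the end of a range without touching existing rows in `matrix.md`.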

What I'd do next (ordered)

Smoke-set sweep on KDE-W — flip the first 10 cells from `?` to real values, pressure-test the runbook in the process.
Walk one UI surface end-to-end (likely `window-chrome-and-tabs.md`) to validate the checklist format before scaling out.
Prototype the first automation runner — smoke set is the natural target; the standardized bodies should let one runner cover ~10 tests with shared diagnostic capture.
Refine severity once real failure data lands.

Test plan

This is a docs-only change. Suggested review:

Skim `docs/testing/README.md` — does the orientation make sense as a front door?
Skim `docs/testing/matrix.md` — is the dashboard scannable? Are status semantics clear?
Pick one case file (e.g. `cases/code-tab-foundations.md`) — does the standard test body have the right fields for both manual and future automation?
Pick one UI file (e.g. `ui/window-chrome-and-tabs.md`) — is element-level granularity right, or too fine / too coarse?
Spot-check anchor links from `matrix.md` to `cases/` — they should all resolve.
Sanity-check the severity scheme in `README.md` and `runbook.md` — are the tiers and the smoke set the right cuts?

Generated with Claude Code
Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com
85% AI / 15% Human
Claude: synthesized test items from existing project docs, upstream Claude Code Desktop docs, and Anthropic blog posts; designed the layout; wrote all test bodies, runbook, and UI inventories.
Human: provided the source files (`~/vms/distro-matrix*.md`); directed strategy (file naming, full restructure, add UI doc, eventual-automation framing); steered tradeoffs at each restructure decision point.

Establish a manual test plan for the Linux fork at docs/testing/, structured to support eventual automation.

Layout:
- README.md: orientation, severity tiers, smoke set (10 tests), automation roadmap
- matrix.md: cross-env dashboard (T01-T39) + env-specific status snapshots (S01-S28) + known-failures rollup
- runbook.md: VM setup, diagnostic-capture commands, sweep workflow, severity guidance, how to add tests
- cases/: 67 functional tests grouped by feature surface; every test has standardized Severity / Steps / Expected / Diagnostics on failure / References sections
- ui/: per-surface UI checklists (window chrome, tray, sidebar, prompt, code-tab panes, settings, routines, connectors/plugins, quick entry, notifications). Every row is an interactive element with selector + expected state.

Coverage:
- Historical project surfaces: app launch, doctor, tray, window decorations, hybrid topbar, Quick Entry, autostart, hide-to-tray, multi-instance.
- Upstream Claude Code Desktop surfaces (officially "Linux not supported" per code.claude.com/docs/en/desktop): Code tab, sign-in flow, folder picker, drag-drop, integrated terminal, file pane, preview pane, PR monitoring, scheduled tasks, connectors OAuth, plugin browser, MCP / hooks / CLAUDE.md memory, Dispatch handoff.
- Env-specific failure modes: Ubuntu/DEB, Fedora/RPM, Wayland-native (wlroots), KDE, GNOME (mutter XWayland key-grab), Omarchy, Niri, AppImage, .desktop env handling, idle-sleep / suspend, Computer Use (out-of-scope per upstream), auto-update vs apt/dnf, plugin/worktree storage.

Automation hooks:
- Stable T## / S## test IDs (won't move).
- Standardized test bodies — Steps and Diagnostics fields are scripted-runner-shaped.
- UI checklists are per-element tables — every row a candidate Playwright / xdotool / DBus assertion.
- Smoke set explicit in README — first 10 tests for automation.

Co-Authored-By: Claude <claude@anthropic.com>

Captures the brainstorm + research pass behind the eventual harness: three-layer model (renderer / native / manual), why in-VM Playwright beats orchestrator-driven CDP, toolchain choices per layer (playwright-electron, dogtail/AT-SPI, ydotool→libei), anti-patterns to design against from day one, and a suggested first vertical slice (KDE-W + T01). Includes an Open questions section listing eight decisions still owed before any of this becomes code — language split, harness location, image-build tooling, CI execution model, data-testid injection, severity for the Electron-Wayland-default tests, diagnostic retention, JUnit output destination. Sourced; not committed direction yet. Co-Authored-By: Claude <claude@anthropic.com>

Restructures automation.md from brainstorm-with-open-questions to direction-with-residual-decisions. Eight calls captured in a Decisions table near the top:

1. Single language (TypeScript). dbus-next replaces gdbus shell-outs; child_process wraps OS-tool invocations as typed TS helpers; portal mocking via dbus-next handles native-dialog tests. Python only as a last-resort escape hatch for AT-SPI cases that resist mocking.
2. Harness lives at tools/test-harness/.
3. Packer for imperative distro images + Nix flake for Hypr-N.
4. No CI infrastructure initially; harness invokable from CI but sweeps run from the dev box for the first ~20 tests.
5. Semantic locators only (getByRole/getByLabel/getByText). No proactive data-testid injection patch; escalate per-test if a selector proves unstable.
6. X11-default verification is Smoke; Wayland-native characterization is Should. Project keeps X11 default because portal coverage for GlobalShortcuts is uneven across compositors.
7. Last 10 greens + all reds, on main only. Capture --doctor / launcher log / screenshot every run.
8. JUnit lives as workflow-run artifacts. Matrix-regen reads the latest run's bundle and PRs the matrix update.

T17 (folder picker) moves out of "manual forever" — portal mocking covers the integration test cleanly. dogtail demoted to escape-hatch status, only invoked if a specific test forces it.

Co-Authored-By: Claude <claude@anthropic.com>
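Decision 7's retention rule is small enough to state as code. A minimal sketch, assuming each run is recorded with a branch, a finish time, and a single pass/fail flag; the `RunRecord` shape and `selectRetained` name are hypothetical, and whether "green/red" is judged per run or per test is also an assumption here (per run below).

```typescript
// Hypothetical run record; field names are assumptions, not the harness's schema.
export interface RunRecord {
  id: string;
  branch: string;
  finishedAt: number; // epoch ms
  passed: boolean;    // "green" = every test in the run passed
}

// Keep every red run plus the newest `keepGreens` green runs, on main only.
// Returns newest-first; the input array is not mutated.
export function selectRetained(runs: RunRecord[], keepGreens = 10): RunRecord[] {
  const onMain = runs
    .filter((r) => r.branch === "main")
    .sort((a, b) => b.finishedAt - a.finishedAt);
  const reds = onMain.filter((r) => !r.passed);
  const greens = onMain.filter((r) => r.passed).slice(0, keepGreens);
  return [...reds, ...greens].sort((a, b) => b.finishedAt - a.finishedAt);
}
```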

Adds the in-VM TS harness at tools/test-harness/ covering the four tests that exercise every distinct shape of harness code:
- T01 — app launch (playwright-electron)
- T03 — tray icon present (dbus-next + StatusNotifierWatcher)
- T04 — window decorations draw (xprop + xdotool shell-out helpers)
- T17 — folder picker opens (Electron-level dialog intercept; v1)

Layout:
tools/test-harness/
├── package.json / tsconfig / playwright.config
├── src/lib/ — electron, dbus, sni, wm, env, retry, diagnostics
├── src/runners/ — one .spec.ts per test ID
└── orchestrator/sweep.sh

Per Decision 1 (single-language TS): every runner is .ts; OS tools (xprop, xdotool, claude-desktop --doctor) are shelled out via child_process and wrapped as typed TS helpers. dbus-next handles all DBus introspection. No bash test scripts, no Python.

T17 is the shallow v1 — intercepts dialog.showOpenDialog at the Electron main process via Playwright's app.evaluate() rather than mocking the portal. Mocking org.freedesktop.portal.FileChooser via dbus-next requires displacing the running portal service or running under dbus-run-session, both intrusive enough to defer until signal warrants it. The test file documents this and the upgrade path.

T04 uses xprop / xdotool which work on X11 native and KDE Wayland (via XWayland — the project default per Decision 6). Native-Wayland window-state queries are deferred.

Wires `runner:` fields into the four cases/*.md test specs. Type-check passes; npx playwright test --list discovers all four.

Run with:
cd tools/test-harness
npm install
ROW=KDE-W ./orchestrator/sweep.sh

Co-Authored-By: Claude <claude@anthropic.com>
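The v1 T17 approach amounts to swapping `dialog.showOpenDialog` for a recorder inside the main process. The sketch below models that against a plain object so it runs outside Electron; in the harness the equivalent code would be shipped into the main process (here via `app.evaluate()`). The names echo the `installOpenDialogMock` / `getOpenDialogCalls` exports that appear later in this PR, but these signatures and the `__openDialogCalls` marker property are assumptions.

```typescript
// Minimal model of Electron's open-dialog return value ({ canceled, filePaths }).
interface OpenDialogResult {
  canceled: boolean;
  filePaths: string[];
}

// Anything with a showOpenDialog method; in Electron this would be `dialog`.
interface DialogLike {
  showOpenDialog: (...args: unknown[]) => Promise<OpenDialogResult>;
  __openDialogCalls?: unknown[][]; // hypothetical marker: present once mocked
}

// Replace showOpenDialog with a recorder that resolves to a canned result.
// A second call is a no-op, so specs can install it unconditionally.
export function installOpenDialogMock(dialog: DialogLike, result: OpenDialogResult): void {
  if (dialog.__openDialogCalls) return;
  const calls: unknown[][] = [];
  dialog.__openDialogCalls = calls;
  dialog.showOpenDialog = async (...args: unknown[]) => {
    calls.push(args); // record the (window?, options) arguments as passed
    return result;
  };
}

// Recorded calls, or an empty list if the mock was never installed.
export function getOpenDialogCalls(dialog: DialogLike): unknown[][] {
  return dialog.__openDialogCalls ?? [];
}
```

Asserting on the recorded options (for example, that `openDirectory` was requested) checks the app's side of the contract without touching the desktop portal at all, which is exactly the depth trade-off v1 accepts.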

Captures four real issues surfaced by trying to run T01 against the installed claude-desktop on Nobara KDE-W, plus the fixes that landed.

Fixes that stuck:
1. Bypass the launcher script (/usr/bin/claude-desktop). It redirects Electron's stdout/stderr to ~/.cache/claude-desktop-debian/launcher.log, which means Playwright can't read the CDP advertisement on stderr. launchClaude now resolves the Electron binary + app.asar directly and spawns through Playwright. Override paths via CLAUDE_DESKTOP_ELECTRON / CLAUDE_DESKTOP_APP_ASAR env vars.
2. Inject the launcher's flags. Decision 6 (X11 default) is enforced in production via --disable-features=CustomTitlebar --ozone-platform=x11. Without these, Electron 41 hits a fatal Wayland communication error ("Broken pipe") on this build. Added as LAUNCHER_INJECTED_FLAGS.
3. Inject the launcher's env. ELECTRON_FORCE_IS_PACKAGED=true and ELECTRON_USE_SYSTEM_TITLE_BAR=1 mirror setup_electron_env(). The former makes app.isPackaged return true so resource resolution uses process.resourcesPath; the latter matches hybrid/native titlebar modes.
4. Pre-launch cleanup. Mirrors cleanup_orphaned_cowork_daemon + cleanup_stale_lock + cleanup_stale_cowork_socket in launcher-common.sh. Without it, a previous failed run leaves an orphaned cowork daemon and a stale SingletonLock that poison the next launch.

Also: dropped the xdotool dependency. wm.ts now finds the X11 window by walking _NET_CLIENT_LIST + _NET_WM_PID via xprop only, which is universally installed where xdotool isn't.

Open finding documented in README "Known limitations": Playwright's _electron.launch() currently fails after Frame Fix completes — the Node-inspector ws disconnects (code 1006) before the renderer ever advertises its DevTools port. Standalone electron --inspect=0 ... app.asar runs cleanly with the same flags (Frame Fix → "Starting app" → window created), so the failure is specific to Playwright + Electron 41 + this build.

Likely workarounds: (a) chromium.connectOverCDP() against externally-spawned Electron with fixed --remote-debugging-port; (b) skip L1 entirely for T03/T04 (those don't need Playwright owning the process — just spawn via child_process and use dbus-next / xprop).

Type-check passes; orchestrator/sweep.sh runs cleanly. The four .spec.ts files all discover via npx playwright test --list. The blocker is the launch handshake, not the harness shape.

Co-Authored-By: Claude <claude@anthropic.com>
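The xprop-only window lookup can be sketched as follows. It assumes xprop's usual output formats for `_NET_CLIENT_LIST(WINDOW)` and `_NET_WM_PID(CARDINAL)`; `parseClientList`, `parseWmPid`, and `findWindowByPid` are hypothetical names, not the actual exports of wm.ts.

```typescript
import { execFileSync } from "node:child_process";

// Parse `xprop -root _NET_CLIENT_LIST` output, e.g.
//   _NET_CLIENT_LIST(WINDOW): window id # 0x1e00007, 0x2400003
export function parseClientList(out: string): string[] {
  return out.match(/0x[0-9a-fA-F]+/g) ?? [];
}

// Parse `xprop -id <win> _NET_WM_PID` output, e.g. "_NET_WM_PID(CARDINAL) = 12345".
// Returns null when the property is missing ("_NET_WM_PID:  not found.").
export function parseWmPid(out: string): number | null {
  const m = /_NET_WM_PID\(CARDINAL\) = (\d+)/.exec(out);
  return m ? Number(m[1]) : null;
}

// Walk the WM's managed-window list and return the first window owned by `pid`.
// Needs a reachable X server (native X11 or XWayland); shells out to xprop only.
export function findWindowByPid(pid: number): string | null {
  const list = execFileSync("xprop", ["-root", "_NET_CLIENT_LIST"], { encoding: "utf8" });
  for (const win of parseClientList(list)) {
    try {
      const out = execFileSync("xprop", ["-id", win, "_NET_WM_PID"], { encoding: "utf8" });
      if (parseWmPid(out) === pid) return win;
    } catch {
      // The window can vanish between the two xprop calls; keep walking.
    }
  }
  return null;
}
```

Keeping the two parsers pure means the output-format assumptions can be unit-tested without an X server; only `findWindowByPid` needs a live display.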

Discovered the real blocker behind every failed Playwright launch: the shipped index.pre.js has an authenticated-CDP gate:

uF(process.argv) && !qL() && process.exit(1);

uF matches --remote-debugging-port / --remote-debugging-pipe on argv; qL validates an ed25519-signed token in CLAUDE_CDP_AUTH (signed payload ${timestamp_ms}.${base64(userDataDir)}, 5-minute TTL) against a hardcoded public key. Without a valid signature the app exits with code 1 right after frame-fix-wrapper completes. Both _electron.launch() and chromium.connectOverCDP() inject --remote-debugging-port=0 and trigger the gate. The signing key is held upstream; we can't forge tokens.

CDP-driven L1 testing is blocked until one of: (a) upstream issues a test/CI token, (b) we carry an app-asar.sh patch that neutralizes the gate, or (c) we drive the renderer via accessibility (dogtail / AT-SPI). All three are real options; none belong in this commit.

What ships here, working today:
- T01 — App launch ✓ on KDE-W
- T03 — Tray icon present ✓ on KDE-W (already was)
- T04 — Window decorations draw ✓ on KDE-W (already was)
- T17 — Folder picker opens - (skipped, awaits portal mock v2)

The harness now spawns Electron without any debug-port flags and probes the running app externally — xprop for window state, dbus-next for tray. T01 verifies "an X11 window with our pid appears within 15s and its title matches /claude/i" rather than reading navigator.userAgent; T03/T04 were external-probe tests already.

Sweep output:
$ ROW=KDE-W ./orchestrator/sweep.sh
Running 4 tests using 1 worker
✓ 1 T01 — App launch (7.2s)
✓ 2 T03 — Tray icon present (7.2s)
✓ 3 T04 — Window decorations draw (7.1s)
- 4 T17 — Folder picker opens
1 skipped
3 passed (22.9s)
summary: tests=4 failures=0 errors=0 skipped=1

JUnit XML written, .tar.zst bundle created, exit 0.

The CDP auth gate finding is documented at docs/testing/automation.md "The CDP auth gate" with the three escape hatches enumerated. Decision 1 and Decision 5 reopen for L1 once the project picks a path.

Co-Authored-By: Claude <claude@anthropic.com>
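To make the gate's trigger condition concrete, here is an illustrative model of the argv half of the check (`uF` in the minified bundle), plus a hypothetical harness-side guard. Whether upstream matches flags by exact name, by `=` prefix, or by substring is an assumption; the token half (`qL`) is deliberately not modeled, since it needs the upstream signing key.

```typescript
// Illustrative model of the argv half of the upstream CDP gate.
// Exact-name-or-"=" matching is an assumption; upstream may differ in detail.
const GATED_FLAGS = ["--remote-debugging-port", "--remote-debugging-pipe"] as const;

export function argvTriggersCdpGate(argv: readonly string[]): boolean {
  return argv.some((arg) =>
    GATED_FLAGS.some((flag) => arg === flag || arg.startsWith(`${flag}=`)),
  );
}

// Hypothetical harness guard: fail fast with the offending flags named,
// instead of letting the app exit 1 with no explanation.
export function assertGateStaysAsleep(argv: readonly string[]): void {
  const hits = argv.filter((arg) => argvTriggersCdpGate([arg]));
  if (hits.length > 0) {
    throw new Error(`argv would trip the CDP auth gate: ${hits.join(" ")}`);
  }
}
```

Under this model, any launch path that adds either flag exits with code 1 unless CLAUDE_CDP_AUTH carries a valid token, which matches the behavior reported for both `_electron.launch()` and `chromium.connectOverCDP()`.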

The CDP gate (lib/electron.ts) only matches --remote-debugging-port / -pipe on argv. It doesn't check --inspect or runtime SIGUSR1, which is the same code path as the in-app Developer → Enable Main Process Debugger menu item. Spotted by aaddrick.

So we spawn Electron clean (gate stays asleep), wait for the X11 window, then send SIGUSR1 to attach the Node inspector at runtime. From there we get main-process JS evaluation, which reaches the renderer via webContents.executeJavaScript() and supports main-process mocks (dialog.showOpenDialog for T17).

What landed:
- src/lib/inspector.ts — new. WebSocket Node-inspector client with evalInMain<T>() and evalInRenderer<T>() wrappers. Node 22+ built-in WebSocket; no extra deps.
- src/lib/electron.ts — adds app.attachInspector(timeoutMs), which sends SIGUSR1 to the pid and waits for port 9229 to answer.
- src/runners/T17 — re-enabled. Inspector attaches, dialog mock installs, claude.ai webContents found, Code-tab navigation click succeeds. Skips with rich diagnostic if the folder-picker click chain doesn't land — selector tuning is iterate-as-needed work, not a blocker.

Two implementation gotchas captured in code comments:
- BrowserWindow.getAllWindows() returns no windows because frame-fix-wrapper substitutes the class and breaks the static registry. Use webContents.getAllWebContents() instead — works correctly.
- Runtime.evaluate's awaitPromise + returnByValue returns empty objects for awaited Promise resolutions. Workaround: the IIFE returns JSON.stringify(value) and the caller JSON.parses it.

Sweep output:
$ ./orchestrator/sweep.sh
✓ T01 — App launch (7.2s)
✓ T03 — Tray icon present (7.2s)
✓ T04 — Window decorations draw (7.1s)
- T17 — Folder picker opens
3 passed, 1 skipped (44s)

Decision 1's escape-hatch reasoning (dogtail / AT-SPI) is no longer the fallback for L1; it's only relevant for native dialogs the inspector pattern can't reach. The three documented escape hatches under "The CDP auth gate" can be retired: a fourth option, runtime attach, is what we actually use.

Co-Authored-By: Claude <claude@anthropic.com>
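The JSON round-trip workaround for `Runtime.evaluate` can be sketched like this. `wrapForEvaluate`, `buildEvaluateMessage`, and `decodeEvaluateResult` are hypothetical helpers, not the inspector.ts API; `awaitPromise` and `returnByValue` are standard CDP `Runtime.evaluate` parameters. The wrapped body must return a JSON-serializable value.

```typescript
// Wrap an async function body so the evaluated expression resolves to a string.
// Strings survive returnByValue intact; awaited objects reportedly came back empty.
export function wrapForEvaluate(body: string): string {
  return `(async () => JSON.stringify(await (async () => { ${body} })()))()`;
}

// Standard CDP Runtime.evaluate request, as sent over the inspector WebSocket.
export function buildEvaluateMessage(id: number, body: string) {
  return {
    id,
    method: "Runtime.evaluate",
    params: {
      expression: wrapForEvaluate(body),
      awaitPromise: true,
      returnByValue: true,
    },
  };
}

// `ret` is the `result` member of the CDP response: { result: RemoteObject, exceptionDetails? }.
export interface EvaluateReturn {
  result: { type: string; value?: unknown };
  exceptionDetails?: { text: string };
}

export function decodeEvaluateResult<T>(ret: EvaluateReturn): T {
  if (ret.exceptionDetails) {
    throw new Error(`evaluate threw: ${ret.exceptionDetails.text}`);
  }
  const raw = ret.result.value;
  if (typeof raw !== "string") {
    // JSON.stringify(undefined) yields undefined, so a body with no return lands here.
    throw new Error(`expected a JSON string, got ${typeof raw}`);
  }
  return JSON.parse(raw) as T;
}
```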

The README's "Automation roadmap" section was written when the harness didn't exist; it described automation in the future tense. Same for the runbook's "Eventual automation" section ("runner: fields are aspirational"). Both lied as of last week. README "Automation status" — points at tools/test-harness/, lists the four wired runners (T01/T03/T04/T17), links automation.md for architecture, links runbook for invocation. runbook "Automated runs" — sweep.sh invocation, output paths, JUnit-to-matrix mapping, coexistence with manual tests, brief on the SIGUSR1 / runtime-attach path through the CDP gate (with link to the long writeup in automation.md). Co-Authored-By: Claude <claude@anthropic.com>

Focused sweep plan for closing #393 / #404 / #370, anchored in upstream design intent rather than user expectation (validated against build-reference/.vite/build/index.js). Adds nine functional test specs (S29-S37) covering Quick Entry popup lifecycle, submit-flow reachability across main-window states, the fullscreen edge case, position memory across restart, multi-monitor fallback, and popup-survives-main-destroy behaviour. Each spec cites specific upstream file:line evidence. Refines ui/quick-entry.md rows with the same upstream evidence and adds rows for popup lifecycle and main-window-destroy persistence. Submit transition row now reflects "always a new chat session, never appended to current" per index.js:515546. Co-Authored-By: Claude <claude@anthropic.com>

Three prerequisites built before adding the closeout sweep runners:
- Per-test isolation default in launchClaude(). Fresh XDG_CONFIG_HOME / CLAUDE_CONFIG_DIR per launch via mkdtemp, cleaned up on close. Three modes: default (fresh), shared (pass an Isolation handle for restart-style tests like S35), null (host config — opt-in for tests that need real claude.ai auth via CLAUDE_TEST_USE_HOST_CONFIG).
- Row-skipping primitive (skipUnlessRow) so spec files declare applicability once and the orchestrator routes correctly. Maps to JUnit `<skipped>` → matrix `-`.
- Layered Critical/Should assertion pattern. Local signals stay local (popup-closed = isVisible() === false), network-coupled signals (chat URL nav) are tracked separately so a claude.ai hiccup doesn't fail a regression cell.

New libs:
- isolation.ts — per-test sandbox
- row.ts — skipUnlessRow / skipOnRow
- argv.ts — /proc/$pid/cmdline + flag-presence check (QE-6, S07, S12, future Wayland-default Smoke)
- asar.ts — in-place app.asar reads via @electron/asar (QE-19, future patch sanity for tray.sh / cowork.sh / etc.)
- quickentry.ts — domain wrapper. Single point of coupling to upstream's main-process structure for QE-* tests. Anchors on stable strings (loadFile path '.vite/renderer/quick_window/quick-window.html', IPC channel names, settings keys), not minified vars.

S31 — Quick Entry submit reaches new chat from any main-window state. Backs QE-7/8/9; passes on KDE-W in ~28s.

The interceptor pivot worth noting: scripts/frame-fix-wrapper.js returns the electron module wrapped in a Proxy whose `get` trap returns a closure-captured PatchedBrowserWindow. Constructor-level wraps (`electron.BrowserWindow = Wrapped`) are silently bypassed — writes succeed but reads ignore them. The reliable hook is at the prototype-method level (loadFile / loadURL); it captures every instance regardless of subclass identity. Documented in docs/learnings/test-harness-electron-hooks.md so the next contributor doesn't re-discover the trap.

ydotool is a hard prerequisite for QE-* shortcut injection. README's "Quick Entry runners" section walks through one-time host setup (install + ydotoold systemd override for a world-writable socket). sweep.sh fast-fails with a clear diagnostic when the daemon isn't reachable.

What's left: ten more runners (S29/S30/S32/S33/S34/S35/S36/S37, QE-6/19 patch sanity, QE-15/17/21 popup chrome). Each is a ~30-60-line recombination over the existing libs — see plan in the closing message of this PR thread.

---
Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <claude@anthropic.com>
40% AI / 60% Human
Claude: drafted libs + runner, debugged the frame-fix-wrapper Proxy trap, wrote the learnings entry, ran S31 on bare-metal KDE-W
Human: scoped the prerequisites split, ran ydotool/ydotoold setup, validated the output, drove design tradeoffs (per-test isolation default, layered Critical/Should assertion, prototype-hook over constructor wrap)
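A toy reproduction of the Proxy trap, runnable without Electron. `PatchedBrowserWindow` and the Proxy below are a simplified model of the behavior described for scripts/frame-fix-wrapper.js, not its source; the point is only to show why the constructor swap is invisible while a prototype-method hook still fires.

```typescript
// Stand-in for Electron's BrowserWindow with a single method we want to observe.
class BrowserWindow {
  loaded: string[] = [];
  loadFile(path: string): void {
    this.loaded.push(path);
  }
}
class PatchedBrowserWindow extends BrowserWindow {}

// Module object whose `get` trap always hands out the closure-captured class.
const realModule: Record<string, unknown> = { BrowserWindow };
const electronLike = new Proxy(realModule, {
  get(target, prop, receiver) {
    if (prop === "BrowserWindow") return PatchedBrowserWindow; // ignores later writes
    return Reflect.get(target, prop, receiver);
  },
}) as { BrowserWindow: typeof BrowserWindow };

// 1. Constructor-level wrap: the write lands on the target (no `set` trap),
//    but every later read still goes through `get` and returns the patched class.
class Wrapped extends BrowserWindow {}
electronLike.BrowserWindow = Wrapped;
const constructorWrapTookEffect = electronLike.BrowserWindow === Wrapped; // false

// 2. Prototype-method hook: patch loadFile on the base prototype,
//    which every subclass inherits, including PatchedBrowserWindow.
const seen: string[] = [];
const origLoadFile = BrowserWindow.prototype.loadFile;
BrowserWindow.prototype.loadFile = function (this: BrowserWindow, path: string): void {
  seen.push(path);
  origLoadFile.call(this, path);
};

const win = new electronLike.BrowserWindow();
win.loadFile(".vite/renderer/quick_window/quick-window.html");
```

One caveat the toy makes visible: the prototype hook only catches subclasses that inherit `loadFile` or delegate to `super.loadFile`; a subclass that overrides it without delegating would bypass the hook.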

Wires up the remaining QE-* sweep runners from docs/testing/quick-entry-closeout.md. Full sweep on KDE-W now runs 16 specs in ~2.2 min; 10 pass, 5 cleanly skip per spec intent (S12/S32 row-gated to GNOME-W, S36 single-monitor, S37 unreachable on Linux, T17 mid-air on selector tuning).

Specs landed:
- S09 — patch sanity (asar grep for the KDE-gate string). Pure file probe, no app launch, ~75ms.
- S12 — `--enable-features=GlobalShortcutsPortal` argv check. GNOME-W only. Currently a known-failing regression detector until the launcher patch lands; goes green once #404 is closed.
- S29 — popup lazy-create from closed-to-tray. Verifies the popup webContents is null before the first shortcut, then opens.
- S30 — shortcut becomes a no-op after full app exit. Switched from "no leftover process" to a pgrep-pid-delta assertion; the spec's regression target is "no NEW pid spawned by the shortcut," not "zero leftovers" (renderer/zygote teardown is asynchronous, not what S30 is testing).
- S31 — pre-existing; updated to use openAndWaitReady().
- S32 — GNOME-W/Ubu-W variant of S31 with a main-reappears assertion that S31 explicitly avoids. Skips on KDE rows; will fail on GNOME-W until the stale-isFocused() patch is widened beyond the current KDE-only #406 gate.
- S33 — bundled Electron version. Reads from `electron/package.json` rather than running `electron --version` (the bundled binary auto-loads `resources/app.asar` so `--version` gets passed through as argv to Claude Desktop instead of being intercepted by Electron's flag parser).
- S34 — fullscreen main suppresses popup. Inverse-shape test: popup must NOT be visible within 3s of the shortcut.
- S35 — position memory across app restart. Two-launch test using a shared isolation handle so XDG_CONFIG_HOME persists across the restart. Heaviest runner (~30s).
- S36 — multi-monitor fallback. Skips with `-` on single-monitor hosts per the closeout spec; uses test.fixme() on multi-monitor hosts to surface the missing libvirt-detach orchestration as `?` (untested) rather than a misleading green.
- S37 — main-window destroy. Documented skip — unreachable on Linux per the close-to-tray override. Marked `-` on every Linux row in the matrix.

Two race conditions surfaced and fixed during the bring-up:
1. **lHn() user-loaded race.** Upstream's shortcut handler (build-reference index.js:515604) checks `!user.isLoggedOut` AFTER ready-to-show and silently skips Ko.show() if the main-process user object hasn't populated yet. The URL changing past /login (visible in the renderer) precedes user-object population (in the main process). Mitigation: a new `openAndWaitReady()` helper that retries the shortcut up to 3 times with a per-attempt timeout. Used by S29-S32, S35.
2. **Main-visible-then-trigger race.** Triggering the shortcut immediately after the X11 window appears races the popup show() flow on first invocation. Mitigation: wait for `mainWin.getState().visible === true` before the first shortcut call. The same wait fixes the in-process case where lHn() was a non-issue.

New harness primitive:
- `waitForUserLoaded(inspector, timeoutMs)` in lib/quickentry.ts — polls the claude.ai webContents URL until it's no longer on a /login or /auth path. The signal is necessary but not sufficient for the lHn() race (auth state has its own timeline), so the retry loop in `openAndWaitReady()` does the actual heavy lifting.

README's Status table updated to list all 16 specs; the layout section adds the 10 new runner files.
--- Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <claude@anthropic.com> 35% AI / 65% Human Claude: drafted runners + helpers, traced lHn() race through build-reference, debugged race conditions iteratively against the local install Human: scoped batches, validated each runner outcome, drove the diagnostic-attachment + retry-vs-sleep tradeoff decisions
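The retry shape behind `openAndWaitReady()` (fire the shortcut, give each attempt its own timeout, retry up to three times) generalizes to a small helper. This is a sketch with hypothetical names (`triggerUntilReady`, `RetryOptions`); only the three-attempt count and the `retryDelayMs` knob come from this PR.

```typescript
// Options for the hypothetical trigger-and-wait loop.
export interface RetryOptions {
  attempts: number;         // openAndWaitReady() uses 3
  attemptTimeoutMs: number; // per-attempt budget for the readiness check
  pollMs: number;           // how often to re-check within an attempt
  retryDelayMs: number;     // pause between a failed attempt and the next trigger
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll `check` until it returns true or the timeout elapses.
async function pollUntil(check: () => Promise<boolean>, timeoutMs: number, pollMs: number): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) return true;
    await sleep(pollMs);
  }
  return false;
}

// Fire `trigger` (e.g. inject the Quick Entry shortcut), then wait for `isReady`
// (e.g. popup visible). A trigger swallowed by a race is simply re-fired.
export async function triggerUntilReady(
  trigger: () => Promise<void>,
  isReady: () => Promise<boolean>,
  opts: RetryOptions,
): Promise<{ ok: boolean; attemptsUsed: number }> {
  for (let i = 1; i <= opts.attempts; i++) {
    await trigger();
    if (await pollUntil(isReady, opts.attemptTimeoutMs, opts.pollMs)) {
      return { ok: true, attemptsUsed: i };
    }
    if (i < opts.attempts) await sleep(opts.retryDelayMs);
  }
  return { ok: false, attemptsUsed: opts.attempts };
}
```

Retrying the trigger rather than sleeping longer matches the race described above: a shortcut press that lands before the main-process user object is populated is silently dropped, so waiting longer after it never helps.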

Six QE specs (S29-S35) hand-rolled six different shapes of "wait until the app is ready" — some polled mainWin.getState().visible, some additionally polled for any claude.ai webContents, some chained waitForUserLoaded for the URL-past-/login signal. Each spec started with a 10-20 line block of polling boilerplate.

Replaces those with a tiered helper on the ClaudeApp interface, `app.waitForReady(level, opts?) → ReadyResultFor<level>`, with four levels:
- 'window' — X11 window mapped (no inspector)
- 'mainVisible' — main shell BrowserWindow.isVisible()
- 'claudeAi' — any claude.ai webContents reachable
- 'userLoaded' — claude.ai URL past /login (lHn() precondition)

Higher levels include all lower-level checks. Returns a conditionally-typed shape per level so the inspector handle is non-optional at 'mainVisible' or higher (no `inspector!` casts at call sites). Single overall timeout (default 90_000ms) flows across steps — slow startup eats from later steps' budget rather than tripping a per-step deadline.

Hard-fail vs soft-fail split mirrors what the specs already did:
- 'window' / 'mainVisible' throw on timeout — no spec today has a skip path for these, treat as hard regression.
- 'claudeAi' / 'userLoaded' return with claudeAiUrl / postLoginUrl absent on timeout. Caller checks the field and testInfo.skip()s — the existing not-signed-in skip pattern in S31, S32, S35.

Migrations:
- S29, S30, S34 → 'mainVisible'
- S31, S32 → 'claudeAi' (preserves the not-signed-in skip)
- S35 (×2 launch) → 'userLoaded' (preserves the skip on both)

Net -64 lines across the six specs (boilerplate gone) and +130 lines in lib/electron.ts (the helper + types). The trade is worth it for the next QE-* runner — readiness becomes a single named call instead of another bespoke poll.

Deliberately preserved:
- openAndWaitReady's retry loop in lib/quickentry.ts. The lHn() race (build-reference index.js:515604) lives on a different timeline than the renderer URL — main-process user state can lag the URL change past /login. 'userLoaded' is necessary but not sufficient; the retry-on-shortcut path is the cheapest mitigation and stays.
- S35's first-launch 3s sleep between userLoaded and the first openAndWaitReady. openAndWaitReady's retry would catch the race too, but eating one full attempt + retryDelayMs is slower than the upfront sleep on a test that already runs ~30s.

waitForUserLoaded stays exported from lib/quickentry.ts (lHn() race domain knowledge belongs there) and is consumed by electron.ts. No re-export, to keep one canonical import path.

Validated on KDE-W: 10 passed, 5 cleanly skipped (S12/S32 row, S36 single-monitor, S37 Linux-unreachable, T17 on /login), 2.1 minutes total. npm run typecheck clean.

---
Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <claude@anthropic.com>
60% AI / 40% Human
Claude: drafted the helper API, sorted out the conditional-type vs overload tradeoff, migrated the six specs, ran the validation sweep
Human: scoped which specs to migrate, defined the level semantics, called out openAndWaitReady's retry as untouchable, validated outcome
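A minimal model of the tiered `waitForReady` idea: one deadline shared by every step, hard failure for the two lower levels, soft (field-absent) results for the upper two, and a conditional result type so `inspector` is non-optional from `'mainVisible'` up. The level names and the `claudeAiUrl` / `postLoginUrl` fields come from this PR; the `Probes` interface, the `Inspector` stand-in, and the `level` field are placeholders, not the real ClaudeApp API.

```typescript
type Level = "window" | "mainVisible" | "claudeAi" | "userLoaded";
const ORDER: readonly Level[] = ["window", "mainVisible", "claudeAi", "userLoaded"];

interface Inspector { id: string } // placeholder for the real inspector client

// Result shape per level; inspector is non-optional from 'mainVisible' up.
export type ReadyResultFor<L extends Level> = L extends "window"
  ? { level: L }
  : L extends "mainVisible"
    ? { level: L; inspector: Inspector }
    : L extends "claudeAi"
      ? { level: L; inspector: Inspector; claudeAiUrl?: string }
      : { level: L; inspector: Inspector; claudeAiUrl?: string; postLoginUrl?: string };

// Hypothetical probes; each receives the shared absolute deadline (epoch ms).
export interface Probes {
  window(deadline: number): Promise<boolean>;
  mainVisible(deadline: number): Promise<Inspector | null>;
  claudeAi(deadline: number): Promise<string | null>;
  userLoaded(deadline: number): Promise<string | null>;
}

export async function waitForReady<L extends Level>(
  level: L,
  probes: Probes,
  timeoutMs = 90_000,
): Promise<ReadyResultFor<L>> {
  const deadline = Date.now() + timeoutMs; // one budget across all steps
  const want = ORDER.indexOf(level);
  // Hard-fail tiers: throw on timeout.
  if (!(await probes.window(deadline))) throw new Error("window: timeout");
  if (want === 0) return { level } as unknown as ReadyResultFor<L>;
  const inspector = await probes.mainVisible(deadline);
  if (!inspector) throw new Error("mainVisible: timeout");
  if (want === 1) return { level, inspector } as unknown as ReadyResultFor<L>;
  // Soft-fail tiers: leave the field absent and let the caller skip.
  const claudeAiUrl = (await probes.claudeAi(deadline)) ?? undefined;
  if (want === 2 || !claudeAiUrl) {
    return { level, inspector, claudeAiUrl } as unknown as ReadyResultFor<L>;
  }
  const postLoginUrl = (await probes.userLoaded(deadline)) ?? undefined;
  return { level, inspector, claudeAiUrl, postLoginUrl } as unknown as ReadyResultFor<L>;
}
```

The `as unknown as` casts are the usual price of returning a conditional type from a generic function body; one overload per level is the alternative trade-off.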

… lib

Adds eighteen pieces of work across the harness, partitioned by file so they don't conflict, dispatched in parallel and merged together.

== Negative validations on existing runners ==

T03 — assert exactly one SNI item is registered (not just presence), plus toggle nativeTheme.themeSource and re-assert. Catches the tray-rebuild-race regression where the destroy+recreate path would briefly register a duplicate item before deregistering the old one (see docs/learnings/tray-rebuild-race.md).

S29 — assert the popup BrowserWindow is reused across shortcut presses, not re-constructed. Counts entries in __qeWindows matching the popup selector after the first press AND after a second press — both must equal 1. Catches a regression where lazy-create runs every press instead of show()/hide() on a persisted Ko ref.

S30 — broadens the "no ghost respawn" delta into a full closeout-leak panel. Three additional checks BEFORE the post-exit shortcut press: no `cowork-vm-service` pids remain, the SNI item is deregistered (connection gone), no leftover `SingletonLock` symlink under the isolation's configDir. Existing post-shortcut delta assertion preserved.

S32 — replaces the silent `.catch(() => {})` on waitForPopupClosed with explicit popup-state-after-submit assertion. The stale-isFocused short-circuit can also leave the popup visible (since popup.hide() lives downstream of the skipped show()) — independent regression detector from the main-window-visibility check.

S34 — adds focus-side assertion to what was a suppression-only test. Upstream contract is `if (ut.isFullScreen()) { ut.focus(); ide(); }` — verify main is still fullscreen AND focused after the shortcut. KDE-W/KDE-X hard-fail (focus is reliable on Plasma); GNOME-W/Ubu-W soft-fixme (mutter routinely no-ops focus on fullscreen surfaces).
S35 — three-launch shape: the existing two-launch position-memory check plus an on-disk round-trip (read parsed config.json between launches to confirm the save handler reached disk) plus a clear-and-default check (delete the saved key, launch a third time, assert the popup lands somewhere other than the cleared TARGET — proves the test is reading the real store). Bumped per-test timeout from 180_000 to 240_000.

== New harness self-tests (H-prefix) ==

Introduces an H-prefix convention for runners that validate the harness's preconditions and the build pipeline's invariants — distinct from T-tests (upstream test cases) and S-tests (doc-spec entries). They are cheap and fast, and they ground-truth what the other tests assume.

H01 — CDP gate canary. Spawns bundled Electron with `--remote-debugging-port=0` and no CLAUDE_CDP_AUTH; asserts exit code 1 within 10s. If the gate is ever accidentally removed, this fires before the rest of the L1 strategy silently weakens.

H02 — frame-fix-wrapper presence. Asserts both `frame-fix-wrapper.js` and `frame-fix-entry.js` exist in app.asar, the wrapper contains `Proxy(`, and `package.json#main` references the entry. File probe — sub-second.

H03 — patch fingerprints. Manifest-based check for every build-pipeline patch (KDE gate, frame-fix inject, tray nativeTheme guard, cowork Linux daemon shutdown, claude-code linux-arm64 branch). Catches silent build-orchestrator drift.

H04 — cowork daemon lifecycle. Baseline pgrep, launchClaude, wait for daemon to spawn, app.close(), assert daemon is gone. Soft-skips on rows where the daemon isn't gated to spawn (most default builds today).

== claude.ai renderer UI domain wrapper ==

New `lib/claudeai.ts` centralizes renderer-DOM discovery for claude.ai UI patterns. Same shape as `lib/quickentry.ts` — domain class with discovery-by-shape, atom helpers, idempotent mocks.
Exports:
- activateTab(name) — clicks the Chat/Cowork/Code df-pill
- installOpenDialogMock + getOpenDialogCalls — idempotent dialog.showOpenDialog mock + recorded calls
- findCompactPills, openPill, clickMenuItem, pressEscape — atoms shared by future page objects
- class CodeTab — activate(), openEnvPill(), selectLocal(), openFolderPicker() (full chain)
Discovery is by structural fingerprint, not Tailwind classes (those rebuild). Probed against a live debugger to confirm: df-pill is exactly 3 instances (Chat/Cowork/Code), and compact-pill distinguishes the env pill (max-w-[200px]) from the Select-folder pill (max-w-[160px]) — same component shape, different label widths.
T17 refactored to use the new lib — went from ~470 lines of inline DOM walking to ~70 lines of intent. When claude.ai re-renders the Code tab, the fix is one file over, not per-spec.
== Library brittleness fixes ==
`lib/quickentry.ts`:
- getStoredPosition rewritten to read configDir/Claude/config.json directly via electron-store's known JSON shape. Replaces a fragile globalThis-walk that matched any object with .get/.set returning a quickWindowPosition value.
- LOGIN_URL_RE anchored: `^https?://[^/]+/(login|auth|sign[-_]?in)(?:[/?#]|$)`. The previous unanchored form would match /oauth/callback as still-on-login.
- Dropped the dead `skipTaskbar: false` field from the getPopupRuntimeProps return shape (no caller used it; the hardcoded false was misleading).
`lib/inspector.ts`:
- InspectorClient.close() is now idempotent — a second close is a no-op. Both runners and the electron.ts auto-close path can safely invoke it.
`lib/electron.ts`:
- ClaudeApp tracks the attached inspector internally; app.close() auto-closes it (existing inline inspector.close() calls in runners keep working idempotently).
- Module-level activeLaunches set + signal handlers ensure Ctrl-C during a sweep kills tracked Electron pids and rms isolation tmpdirs before re-emitting the signal.
- app.lastExitInfo: { code, signal } | null exposes non-zero exit info post-close. Runners can attach it when nonzero; nothing breaks when it is ignored.
== Config + orchestrator ==
`playwright.config.ts`:
- retries: process.env.CI ? 1 : 0 (one retry in CI to absorb compositor flake; local stays at 0 so flakes surface).
- forbidOnly: !!process.env.CI prevents a stray test.only from sneaking through CI.
- /// <reference types="node" /> for `process.env` access (the file isn't covered by tsconfig.json's `src/**/*` include).
`orchestrator/sweep.sh`:
- Replaces the four `grep -oP ... | head -1` lines (which read only the first <testsuite> element) with a Node-based summary that sums tests/failures/errors/skipped across every suite.
- Wrapped in a `command -v node` guard with the legacy grep fallback retained inline.
- Output line is byte-identical for downstream consumers.
== Cleanup + docs ==
- README.md status table updated: 20 specs, 13 pass on KDE-W, six skip cleanly per spec intent. T17 row reflects the new end-to-end click chain.
- lib/claudeai.ts and probe.ts added to the Layout section.
- Deleted _investigate_t17_urls.spec.ts (one-off diagnostic that confirmed T17's /login was a fresh-isolation auth miss, not a webContents race).
- Kept probe.ts as the seed for the explore CLI in the upcoming UI-mapping plan.
== UI mapping plan ==
`docs/testing/claudeai-ui-mapping-plan.md` — executable plan for systematically mapping claude.ai's renderer UI into reusable test-harness abstractions. Three layers: shape-based atoms, page objects per major surface, discovery tooling. Phase 1 (explore CLI with snapshot/diff) and Phase 2 (UI map markdown) are independent and can run in parallel; Phase 5 (drift detection H05) depends on Phase 1.
== Validation ==
KDE-W sweep: 13 pass, 6 cleanly skip, 0 fail. 2.7 min total. T17 verified end-to-end via the env-pill chain after refactor. npx tsc --noEmit clean across all changes.
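The sweep.sh change sums counters across every `<testsuite>` instead of reading only the first one. A minimal TypeScript sketch of that summation, assuming a JUnit report whose `<testsuite>` elements carry `tests`/`failures`/`errors`/`skipped` attributes. `sumJUnitSuites` is a hypothetical name; the actual Node snippet in sweep.sh (and its byte-identical output line) is not reproduced here:

```typescript
// Hypothetical helper: totals every <testsuite> element in a JUnit XML report.
interface SuiteTotals {
  tests: number;
  failures: number;
  errors: number;
  skipped: number;
}

function sumJUnitSuites(xml: string): SuiteTotals {
  const totals: SuiteTotals = { tests: 0, failures: 0, errors: 0, skipped: 0 };
  const keys = Object.keys(totals) as (keyof SuiteTotals)[];
  // `\b` after "testsuite" rejects the <testsuites> root, whose own totals
  // would otherwise be counted a second time.
  const suiteTag = /<testsuite\b([^>]*)>/g;
  let match: RegExpExecArray | null;
  while ((match = suiteTag.exec(xml)) !== null) {
    const attrs = match[1];
    for (const key of keys) {
      const attr = new RegExp(`\\b${key}="(\\d+)"`).exec(attrs);
      if (attr) totals[key] += Number(attr[1]);
    }
  }
  return totals;
}
```

Reading only the first suite (the old `head -1` behaviour) would report 3 tests for a two-suite report of 3 + 2; summing all of them gives 5.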
---
Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
70% AI / 30% Human
Claude: dispatched five parallel agents per file partition (libs / runners batch 1 / runners batch 2 / new H-tests / config), wrote the claudeai.ts extraction agent brief informed by live-debugger probe evidence, and drafted the UI mapping plan.
Human: scoped which improvements to make, called out skip vs fail edges (S34 KDE-strict / GNOME-fixme), shared live-renderer DOM dumps that ground-truthed T17's click chain (Code df-pill → env pill → Local → Select folder → Open folder), and validated each step.
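Why anchoring matters for the `LOGIN_URL_RE` change in `lib/quickentry.ts`: an unanchored alternation finds `auth` inside `/oauth/callback`. A small sketch, assuming the pattern is applied with a plain `RegExp.test` against the renderer URL. `isOnLoginPage` and `UNANCHORED_RE` are illustrative names, not the file's actual exports:

```typescript
// Anchored form: the first path segment itself must be the login route.
const LOGIN_URL_RE = /^https?:\/\/[^/]+\/(login|auth|sign[-_]?in)(?:[/?#]|$)/;

// Approximation of the old, unanchored form, kept only for comparison.
const UNANCHORED_RE = /(login|auth|sign[-_]?in)/;

function isOnLoginPage(url: string): boolean {
  return LOGIN_URL_RE.test(url);
}

// "/oauth/callback" contains "auth", so only the unanchored form matches it.
console.log(UNANCHORED_RE.test("https://claude.ai/oauth/callback")); // true
console.log(isOnLoginPage("https://claude.ai/oauth/callback")); // false
console.log(isOnLoginPage("https://claude.ai/login?returnTo=%2F")); // true
```

The trailing `(?:[/?#]|$)` also stops a longer segment such as `/loginx` from counting as the login page.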

Switches the inventory walker from a renderer-side document.querySelectorAll IIFE to Chromium's accessibility tree (Accessibility.getFullAXTree over CDP). Elements are identified account-portably via ariaPath + role + AX-computed name; the click path moves to backendDOMNodeId via DOM.resolveNode + Runtime.callFunctionOn.
Walker (explore/walker.ts):
- snapshotSurface consumes AX nodes via axTreeToSnapshot
- waitForAxTreeStable gates the seed snapshot, the post-navigation snapshot, and every snapshotSurface call (Accessibility.enable lag is async; the first read on a cold load returns 4 nodes vs 800+ when settled)
- redrivePath uses location.reload() instead of navigateTo to discard any state prior drills left in the SPA (open dialog, expanded sidebar, scrolled focus)
- captureFingerprint's isListRowChild extended: button + group ancestors, plus a sibling-count fallback (>=15 same-role siblings) for claude.ai's flat marketplace dialogs and complementary sidebar
- step 3 (positional) skipped for list-row children so they collapse via step 4's instance shape
- MAX_CONSECUTIVE_LOOKUP_FAILURES bumped 25 -> 75 for sidebar virtualization noise (the timeout counter still gates real wedges)
- RawElement / RawAncestor reshaped: tagName / role / ariaLabel / textContent / dataState / parentChainSignature / ancestorAriaLabel dropped; backendDOMNodeId added; accessibleName is the sole name source
Inspector (src/lib/inspector.ts):
- AxNode interface published
- clickByBackendNodeId: DOM.resolveNode + Runtime.callFunctionOn (replaces selector-based click reconstruction)
Name classifier (src/lib/name-classifier.ts):
- cowork-session shape regex (Idle|Ready|Awaiting input|...)
- row-more-options shape regex (^More options for )
Isolation (src/lib/isolation.ts):
- seedFromHost option: kill host Claude, copy the auth-relevant subset of ~/.config/Claude into a per-launch tmpdir for U01 / H05
Driver (explore/walk-isolated.ts):
- Replaces explore walk for safe walks: launches Claude inside the test-harness isolation rather than mutating the host profile
Runners:
- H05_ui_drift_check.spec.ts (claude.ai UI drift detection)
- U01_ui_visibility.spec.ts (placeholder stub; regenerated post-walk)
Self-test fixtures rewritten as synthetic AxNode trees fed through axTreeToSnapshot; the existing 7 plan-example traces produce identical idTailFromFingerprint outputs.
Co-Authored-By: Claude <claude@anthropic.com>
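The AX-stable gate exists because a cold `Accessibility.getFullAXTree` read can return 4 nodes before the tree settles at 800+. One way to sketch such a gate, assuming "stable" means the node count is unchanged across a few consecutive reads. The real `waitForAxTreeStable` in walker.ts may use a different criterion, and `readTree` here is a hypothetical stand-in for whatever wraps the CDP call:

```typescript
// Minimal node shape for the sketch; real CDP AXNodes carry much more.
type AxNodeLike = { nodeId: string };

interface StableOptions {
  intervalMs?: number; // delay between reads
  stableReads?: number; // consecutive identical counts required
  timeoutMs?: number; // give up after this long
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitForAxTreeStable(
  readTree: () => Promise<AxNodeLike[]>,
  { intervalMs = 100, stableReads = 3, timeoutMs = 5_000 }: StableOptions = {},
): Promise<AxNodeLike[]> {
  const deadline = Date.now() + timeoutMs;
  let last = await readTree();
  let streak = 1;
  while (streak < stableReads) {
    if (Date.now() > deadline) {
      throw new Error(`AX tree still changing after ${timeoutMs}ms (last count ${last.length})`);
    }
    await sleep(intervalMs);
    const next = await readTree();
    // Any change in size resets the streak; only a run of equal counts passes.
    streak = next.length === last.length ? streak + 1 : 1;
    last = next;
  }
  return last;
}
```

Requiring more than two equal reads is the design choice that matters here: a gate satisfied by two reads could accept a brief plateau (such as two early 4-node reads) before the real tree arrives.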

Plan (docs/testing/fingerprint-v7-plan.md):
- Adds a "Live-walk shakedown (post-Phase 2)" subsection enumerating the five real bugs the first end-to-end walks surfaced and their fixes (AX-stable gate, reload vs navigate, sibling-count list heuristic, two new instance shapes, threshold bump)
- Resolves three open questions with first-clean-walk data: CDP cost is not a bottleneck (817-node tree settles <1s), role overrides work as intended (Skip to content captured as link), no account-bound kind needed (existing pattern + heuristic + collapse cover the observed cases)
- Cross-references for walk-isolated.ts and clickByBackendNodeId
Learnings (docs/learnings/test-harness-ax-tree-walker.md):
- Five non-obvious AX-tree traps with symptoms + fixes: Accessibility.enable async lag, navigateTo no-op carrying state, claude.ai's flat dialog/complementary lists, per-row "More options for X" trigger needing its own shape, sidebar virtualization vs the lookup-failure threshold
- Closing note on driver choice (walk-isolated.ts over explore walk)
Prompts (docs/testing/fingerprint-v7-*-prompt.md):
- implementation-prompt: original v7 walker rewrite prompt
- ax-migration-prompt: DOM-walk -> AX-tree substrate migration prompt
- runners-prompt: NEW. Self-contained prompt for the next session to wire U01 against the fresh inventory and iterate autonomously to a clean pass/drift/fail baseline
CLAUDE.md: link the new learnings doc
Inventory artifacts:
- ui-inventory.json + ui-inventory.meta.json: 90-entry inventory captured against claude.ai/epitaxy on app 1.5354.0 via the walk-isolated.ts seedFromHost path. Marketplace dialog folded to a single button-instance+704; cowork sidebar to button-instance+72; search history to option-instance+25
- ui-vocabulary.json: stable/suspect name corpus derived from the prior walk
- ui-inventory-reconciliation.md: v6-era reconciliation notes
- ui-snapshots/{README.md,.gitkeep}: snapshots dir scaffold (JSON contents gitignored to avoid diff churn)
claudeai-ui-map.md: human-readable map of the inventory's reachable surfaces
Matrix (docs/testing/matrix.md): U01 row added; entry-count phrasing generalized so it doesn't go stale on each re-walk
Co-Authored-By: Claude <claude@anthropic.com>

U01 was a placeholder skipping with "v7 cutover — re-walk required"; the v7 walker has shipped a fresh inventory, so regenerate the spec and land two resolver fixes the live sweep surfaced.
`findByFingerprint`: the strictness gate only consulted `kind`, so entries with `kind: persistent` + `classification: instance` (the post-walk persistent-collapse promotes degenerate-shaped fingerprints when they appear on ≥3 surfaces) failed with "expected exactly one match, got N". The fingerprint's own degenerate-shape claim should win, so the gate now defers to `classification === 'instance'` too.
`redrivePath`: the dangling `startUrl` parameter was the smoking gun. After a prior test drilled into a deeper URL (e.g. /settings/customize), `location.reload()` reloaded the deep URL instead of returning to startUrl, and the next test's first `clickById` saw a contaminated surface. Navigate to startUrl when currentUrl has drifted; reload only when already at startUrl.
Sweep results across three runs: 73/17 → 89/1 → 89/1, with the single failure being non-deterministic (a different test each sweep, both consistent with focus-management transients and sidebar virtualization documented in docs/learnings/test-harness-ax-tree-walker.md).
Generator gate inverted so that the safe-by-default path (seedFromHost: true) triggers when the env var is unset, mirroring H05's pattern but with the seed lifted from the host config.
Co-Authored-By: Claude <claude@anthropic.com>
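Both resolver fixes reduce to small decisions. A sketch of them, assuming a trimmed-down inventory entry with string `kind`/`classification` fields and assuming (not stated above) that non-persistent entries are never held to exactly-one matching. `requiresUniqueMatch`, `checkMatchCount`, and `planRedrive` are hypothetical names, and `'unique'` in the usage is an invented classification value:

```typescript
// Hypothetical, trimmed-down inventory entry; real entries carry more fields.
interface InventoryEntry {
  id: string;
  kind: string; // e.g. 'persistent'
  classification: string; // e.g. 'instance'
}

// After the fix: a persistent entry is held to exactly-one matching unless its
// fingerprint already classifies it as a repeated ("instance") shape.
function requiresUniqueMatch(entry: InventoryEntry): boolean {
  return entry.kind === 'persistent' && entry.classification !== 'instance';
}

function checkMatchCount(entry: InventoryEntry, matches: number): void {
  if (matches === 0) throw new Error(`${entry.id}: no match`);
  if (requiresUniqueMatch(entry) && matches !== 1) {
    throw new Error(`${entry.id}: expected exactly one match, got ${matches}`);
  }
}

type RedriveAction = { type: 'reload' } | { type: 'navigate'; url: string };

// Reload only when already on startUrl; otherwise go back to it, so a deeper
// URL left behind by a previous test is not what gets reloaded.
function planRedrive(currentUrl: string, startUrl: string): RedriveAction {
  return currentUrl === startUrl
    ? { type: 'reload' }
    : { type: 'navigate', url: startUrl };
}
```

A strict string comparison is the simplest drift check. A real implementation might normalise query strings or hashes first.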

…migration
The three v7 handoff prompts (vocabulary scaffold, AX-tree substrate migration, U-prefix runner wire-up) have all been implemented and merged. Retire them — the design contract still lives in fingerprint-v7-plan.md; the per-iteration prompts were single-use scaffolding for fresh sessions.
Add claudeai-lib-ax-migration-prompt.md as the next-iteration handoff: tools/test-harness/src/lib/claudeai.ts is still on the old substrate (document.querySelector against minified-tailwind shapes) and is the highest-payoff target for the v7 plan's "design goal §2: Resilient to cosmetic drift". The prompt mirrors the prior handoffs' structure (authoritative refs, code anchors, phases, self-correction loop, termination conditions, final report format) and scopes the spike at openPill before fanning out to the rest of the file.
Co-Authored-By: Claude <claude@anthropic.com>

Replace every CSS-shape walk in lib/claudeai.ts with AX-tree queries sourced from Chromium's Accessibility.getFullAXTree. Discovery now reads role + accessibleName + hasPopup from the same substrate the v7 walker uses, dropping the brittle button[aria-haspopup=menu] + span.truncate.max-w-[Npx] coupling that was the recurring break point on every upstream tailwind regen.
Substrate change:
- inspector.ts: surface AxValue + AxProperty types; explicit properties? on AxNode so consumers can read state tokens.
- walker.ts: export RawElement, add a hasPopup field, populate it via readHasPopup() reading node.properties[].name === 'hasPopup'.
- selfTest Case 10 covers menu / 'false' / absent values.
Page-object migration (lib/claudeai.ts):
- snapshotAx() helper gates on waitForAxTreeStable by default (post-userLoaded the first AX read can return ~4 nodes — see docs/learnings/test-harness-ax-tree-walker.md §1).
- Polling loops in openPill (post-click) + clickMenuItem gate once upfront, then poll with { fast: true } so per-iteration stability re-checks don't fight the menuitem-appear poll.
- activateTab matches role:'button' + literal accessibleName.
- findCompactPills filters by role:'button' + hasPopup === 'menu', dropping the cowork sidebar via a /^More options for / exclusion. Drops the CompactPill.maxW field (tailwind artifact, only ever used in error messages).
- openPill / clickMenuItem use clickByBackendNodeId for the click path — the same backend-id flow the walker uses.
Live probe (explore/probe-claudeai-ax.ts) confirmed the discrimination shapes against the host renderer — found 49 buttons with hasPopup (48 menu, 1 dialog), env pill 'Local' resolves under main > region[Primary pane], 37 cowork sidebar triggers correctly excluded by the row-more-options filter.
Caught one bug along the way: CDP exposes the property as 'hasPopup' (camelCase), not 'haspopup' — the synthetic selfTest fixture used the wrong casing too, so both sides agreed on the wrong contract until the live probe surfaced it.
T17_folder_picker passes on KDE-W with CLAUDE_TEST_USE_HOST_CONFIG=1.
Co-Authored-By: Claude <claude@anthropic.com>
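The discrimination described in this commit (role `button`, `hasPopup === 'menu'`, minus the per-row cowork triggers) can be sketched over raw CDP `Accessibility.AXNode` objects. This assumes the CDP shape in which `properties` is a list of `{ name, value: { type, value } }`. `filterCompactPillNodes` is a hypothetical name; the lib's own `findCompactPills` works off a live snapshot rather than an array argument:

```typescript
// Minimal subset of the CDP Accessibility domain types.
interface AxValue { type: string; value?: unknown }
interface AxProperty { name: string; value: AxValue }
interface AxNode {
  nodeId: string;
  ignored: boolean;
  role?: AxValue;
  name?: AxValue;
  properties?: AxProperty[];
  backendDOMNodeId?: number;
}

function readHasPopup(node: AxNode): string | undefined {
  // CDP spells the property 'hasPopup' (camelCase); 'haspopup' never matches.
  const prop = node.properties?.find((p) => p.name === 'hasPopup');
  const v = prop?.value.value;
  return typeof v === 'string' ? v : undefined;
}

const ROW_MORE_OPTIONS = /^More options for /;

function filterCompactPillNodes(nodes: AxNode[]): AxNode[] {
  return nodes.filter((n) => {
    const raw = n.name?.value;
    const name = typeof raw === 'string' ? raw : '';
    return (
      !n.ignored &&
      n.role?.value === 'button' &&
      readHasPopup(n) === 'menu' &&
      !ROW_MORE_OPTIONS.test(name) // drop per-row cowork sidebar triggers
    );
  });
}
```

A fixture that spells the property `haspopup` is silently filtered out by this check. That is the same mismatch the live probe caught in the synthetic selfTest.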

The 90-test U01 sweep was wired against an account-specific v7 inventory snapshot; running it during routine sweeps fired noise against unrelated drift. The spec is auto-generated from the v7 inventory via npm run gen:render-specs, so this is a soft delete — regenerate it any time a fresh inventory walk lands.
Co-Authored-By: Claude <claude@anthropic.com>

Adds the implementation prompt for the next session: spawn one subagent per file in docs/testing/cases/, have each one cross-check its tests against the extracted Claude Desktop source under build-reference/app-extracted/, and edit in place to add code anchors / mark drift / flag missing features. Mirrors the structure of the already-retired claudeai-lib-ax-migration-prompt.md so the workflow is consistent.
Triggered by the AX migration validation surfacing how easily case docs drift from upstream — the test author's "click X menu" can silently diverge from upstream's actual labels two versions later, and the failure looks like a Linux compat issue when it's really doc-vs-source drift.
Co-Authored-By: Claude <claude@anthropic.com>

Static anchor sweep: each test in docs/testing/cases/*.md now points at the upstream code (or wrapper script) backing its load-bearing claim, so the next sweep can tell "Linux compat regression" apart from "case doc drifted while we weren't looking."
- 75 tests across 10 files reviewed
- 63 grounded with code anchors (index.js:N, scripts/*.sh:N)
- 9 drifted Steps/Expected corrected against actual upstream behavior
- 2 marked Missing in build (S12 Wayland portal flag, S26 auto-update)
- 1 flagged Ambiguous (T39 /desktop is a CLI surface, not Electron asar)
Notable corrections:
- T05: scheme is claude://, not https:// (the project never registers x-scheme-handler/https; the old spec was always going to fail on Linux)
- T15: sign-in is an in-app loadURL into mainView, not an xdg-open handoff
- T18: drag-attach uses webUtils.getPathForFile, not the text/uri-list MIME type
- T20: file conflict check is sha256-based, not mtime-based
- T22: the gh-install path is macOS/brew-only; there is none on Linux/Windows
- T30: PR-close auto-archive wait is ~5-6 min (5m setInterval + 30s startup + 1h non-terminal cooldown), not "~1 minute"
- T14: PR #536 is closed/docs-only — no in-tree multi-instance flag
Inventory anchors added for renderer-side surfaces present in the idle-state v7 capture (T16 Code tab, T17 select-folder, T26 Routines, T11/T33 plugin nav). Surfaces inside modals/popups (T22 toolbar, T25 Show-in-Files context menu, T31 side chat, T32 slash menu) are flagged for re-capture with the surface open.
S26 finding worth follow-up: the autoUpdater gate is structurally open on Linux when packaged (lii() at index.js:508761-508774 returns true with ELECTRON_FORCE_IS_PACKAGED=true from launcher-common.sh:249) — saved from real download attempts only by Electron's Linux autoUpdater being unimplemented.
T07/S13 reference WCO-shim files that exist on main (PR #538 merged 2026-05-01) but not on this branch (docs/compat-matrix forked earlier); anchors point at main: with explicit caveats.
Co-Authored-By: Claude <claude@anthropic.com>

Static greps against the 546k-line beautified bundle have known blind spots — lazy require()s, dynamic handler tables, conditional wiring. This probe connects to a running Claude Desktop via the existing InspectorClient (port 9229, opened by launchClaude's SIGUSR1 path) and dumps runtime state keyed by test ID into a JSON file the next grounding sweep can diff across upstream versions.
Captures:
- App metadata (version, isPackaged, ready state)
- Full IPC handler registry (invoke + on channels)
- WebContents inventory (URLs, types)
- globalShortcut.isRegistered() for known accelerators
- app.getLoginItemSettings() (autostart resolution)
- safeStorage availability + backend (libsecret on Linux)
- autoUpdater.getFeedURL() — empirical answer to the S26 structural-open claim that static analysis couldn't resolve
- Notification.isSupported()
Read-only / non-destructive; observes API state, never clicks UI or fires shortcuts. Records explicit gaps[] for surfaces it can't reach from idle (S20 powerSaveBlocker enumeration; T22/T31/T32 contextual renderer surfaces; T39 CLI binary).
Run: cd tools/test-harness && npm run grounding-probe
Output: /tmp/grounding-probe.json (override with --out PATH)
Co-Authored-By: Claude <claude@anthropic.com>

Two extensions to the grounding probe, each closing a gap I flagged on the first cut:
- --launch: spins up a fresh isolated instance via launchClaude(), waits for 'mainVisible' (the cheapest level that returns the inspector), captures, and tears down. The default still attaches to an already-running app on port 9229; --launch is the self-contained / CI-usable path.
- --include-synthetic + S20 powerSaveBlocker probe: starts a blocker, reads isStarted, stops immediately. Brief inhibit (~ms). Read-only by default — synthetic state changes are opt-in. Doesn't verify the case-doc claim that keepAwakeEnabled toggles trigger this; that needs correlating settings IO with the `PhA` Set at index.js:241897, which depends on minified-name stability. Left to the next sweep.
Argv parser rewritten to handle bare flags (--launch, --include-synthetic) alongside key/value pairs (--port 9229, --out PATH).
Co-Authored-By: Claude <claude@anthropic.com>
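The rewritten argv parser has to tell bare flags from key/value pairs. A sketch of one way to do that, using only the four options named in this commit and the defaults stated in the probe's run notes (port 9229, `/tmp/grounding-probe.json`). `ProbeArgs` and `parseProbeArgs` are hypothetical names, and the real parser in grounding-probe.ts may differ:

```typescript
// Hypothetical option shape; grounding-probe.ts's real field names may differ.
interface ProbeArgs {
  launch: boolean;
  includeSynthetic: boolean;
  port: number;
  out: string;
}

function parseProbeArgs(argv: string[]): ProbeArgs {
  // Defaults from the commit text: attach on 9229, write to /tmp.
  const args: ProbeArgs = {
    launch: false,
    includeSynthetic: false,
    port: 9229,
    out: '/tmp/grounding-probe.json',
  };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    switch (arg) {
      case '--launch': // bare flag
        args.launch = true;
        break;
      case '--include-synthetic': // bare flag
        args.includeSynthetic = true;
        break;
      case '--port':
      case '--out': {
        // key/value pair: consume the next token as the value
        const value = argv[++i];
        if (value === undefined || value.startsWith('--')) {
          throw new Error(`${arg} expects a value`);
        }
        if (arg === '--port') {
          const port = Number(value);
          if (!Number.isInteger(port) || port <= 0) throw new Error(`bad port: ${value}`);
          args.port = port;
        } else {
          args.out = value;
        }
        break;
      }
      default:
        throw new Error(`unknown argument: ${arg}`);
    }
  }
  return args;
}
```

Rejecting a value that starts with `--` catches the common slip of `--port --launch`. Without that check, the parser would swallow the next flag as the port.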

…els, SNI
Closes the bulk of the remaining gaps from the last cut:
- AX fingerprint of the current claude.ai webContents (role+name+hasPopup, reduced form). Stored once at the top level; per-test entries for T22/T26/T31/T32 reference it via { axFingerprintRef }. Captures whatever surface is on screen at probe time, so the user opens the slash menu / side chat / routines modal / PR toolbar before re-running to anchor those surfaces.
- Editor handoff IPC channels (T24/T38). The static anchor is `Mtt` at index.js:463902 — the variable name is minified, so we match handlers by the /external|editor|openIn/i name pattern instead. Sufficient to diff across upstream versions (renames will surface as removed channels with similar replacements).
- SNI / tray registration (T03). `findItemByPid()` from sni.ts attributes a registered StatusNotifierItem to our pid. dbus-next is loaded via dynamic import so non-DBus environments (CI containers without a session bus) still get a partial probe rather than a hard fail.
Reduced gaps[] to just T39 (CLI surface, out of scope) and the optional opt-outs (powerSaveBlocker without --include-synthetic; empty AX fingerprint when claude.ai isn't loaded yet).
Co-Authored-By: Claude <claude@anthropic.com>
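The editor-handoff capture above relies on a rename showing up as a removed channel next to a similar added one. A sketch of that diff, applying the commit's `/external|editor|openIn/i` pattern to two hypothetical lists of channel names taken from successive probe outputs. `diffEditorChannels` and the channel names in the usage below are invented for illustration:

```typescript
// Name pattern from the commit; applied to handler channel names.
const EDITOR_HANDOFF_RE = /external|editor|openIn/i;

interface ChannelDiff {
  added: string[];
  removed: string[];
}

function editorChannels(channels: string[]): Set<string> {
  return new Set(channels.filter((c) => EDITOR_HANDOFF_RE.test(c)));
}

// Compare two probe runs; a rename appears as one removed + one added entry.
function diffEditorChannels(before: string[], after: string[]): ChannelDiff {
  const prev = editorChannels(before);
  const next = editorChannels(after);
  return {
    added: Array.from(next).filter((c) => !prev.has(c)).sort(),
    removed: Array.from(prev).filter((c) => !next.has(c)).sort(),
  };
}
```

For example, going from `['openInEditor', 'openExternal', 'getPrChecks']` to `['openInExternalEditor', 'openExternal', 'getPrChecks']` reports `openInExternalEditor` as added and `openInEditor` as removed. `getPrChecks` never matches the pattern.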

Branch was rebased onto main; scripts/wco-shim.js + scripts/patches/wco-shim.sh are now on this branch via PR #538. The "lives on main, not yet on docs/compat-matrix" notes the grounding subagent added are no longer accurate — anchors point at files that exist locally.
Co-Authored-By: Claude <claude@anthropic.com>

Folds the conventions the grounding sweep landed into the README so future authors and sweeps work from the same shape. Adds:
- **Code anchors:** field — `<file>:<line>` pointers to where the load-bearing claim is implemented.
- **Inventory anchor:** field — optional, for surfaces present in the v7 walker's idle capture.
- "Anchor scope" section codifying the four buckets (upstream code, wrapper, server-rendered SPA, CLI binary) and where to anchor each.
- "Drift markers" section codifying the Drifted / Missing / Ambiguous classifications the sweep already uses.
No content changes to existing case files — they already follow these conventions in practice; the README now documents them.
Co-Authored-By: Claude <claude@anthropic.com>

…nd runs
Adds a top-level harness flag that flips every launchClaude() spawn from the default X11-via-XWayland backend to native Wayland, so the full suite can run under Wayland with a single env var instead of per-spec plumbing.
Implementation mirrors scripts/launcher-common.sh:132-139:
- Renames LAUNCHER_INJECTED_FLAGS to LAUNCHER_INJECTED_FLAGS_X11 and adds LAUNCHER_INJECTED_FLAGS_WAYLAND with the launcher's Wayland flag set (UseOzonePlatform, WaylandWindowDecorations, ozone-platform, wayland-ime, wayland-text-input-version=3).
- harnessUseWayland() reads CLAUDE_HARNESS_USE_WAYLAND.
- launchClaude() picks the flag set and adds CLAUDE_USE_WAYLAND=1 and GDK_BACKEND=wayland to the spawn env. Spread order keeps caller-supplied extraEnv winning, so a single test can still opt back to X11 inside a Wayland-mode sweep.
- sweep.sh advertises the mode on stderr.
- README documents the var + the npm-test recipe.
Default unchanged: every runner still gets X11. The flag opts in.
Verification (live): CLAUDE_HARNESS_USE_WAYLAND=1 npx playwright test src/runners/T17_folder_picker.spec.ts, then, while the app is up, confirm --ozone-platform=wayland is on argv via /proc/<pid>/cmdline. The harness spawns Electron directly (CDP-gate workaround at electron.ts:102), so launcher-common.sh isn't sourced and ~/.cache/claude-desktop-debian/launcher.log is not written by harness runs.
Co-Authored-By: Claude <claude@anthropic.com>
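The spread-order rule ("caller-supplied extraEnv wins") is easiest to see in a sketch. This one assumes that only `CLAUDE_HARNESS_USE_WAYLAND=1` opts in, spells the Wayland flags the way Chromium commonly accepts them (scripts/launcher-common.sh remains the authoritative list), and leaves the X11 set empty because the commit does not describe it. `buildSpawn` is a hypothetical name:

```typescript
type Env = Record<string, string | undefined>;

// Wayland set as named in the commit; exact spellings are an assumption.
const LAUNCHER_INJECTED_FLAGS_WAYLAND: readonly string[] = [
  '--enable-features=UseOzonePlatform,WaylandWindowDecorations',
  '--ozone-platform=wayland',
  '--enable-wayland-ime',
  '--wayland-text-input-version=3',
];
// The X11 set is not described in the commit; left empty in this sketch.
const LAUNCHER_INJECTED_FLAGS_X11: readonly string[] = [];

function harnessUseWayland(env: Env): boolean {
  return env.CLAUDE_HARNESS_USE_WAYLAND === '1'; // assumption: only "1" opts in
}

function buildSpawn(hostEnv: Env, extraEnv: Env = {}): { flags: string[]; env: Env } {
  const wayland = harnessUseWayland(hostEnv);
  const flags = wayland ? LAUNCHER_INJECTED_FLAGS_WAYLAND : LAUNCHER_INJECTED_FLAGS_X11;
  const modeEnv: Env = wayland ? { CLAUDE_USE_WAYLAND: '1', GDK_BACKEND: 'wayland' } : {};
  // extraEnv is spread last, so a single test can override the mode's values.
  return { flags: [...flags], env: { ...hostEnv, ...modeEnv, ...extraEnv } };
}
```

This models only env precedence. How flag selection interacts with a per-test opt-out is not spelled out in the commit, so the sketch picks flags from the host env alone.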

The action items from the last few sessions (case-doc grounding, runtime probe, autoUpdater issue, Wayland-mode runs) needed pointers across the testing docs so the next contributor isn't reverse-engineering them from git log.
- docs/testing/README.md — bump date, surface the grounding sweep + probe in the automation-status section, fix the test corpus snapshot (S-tests went from 28 to 37 since this was last counted).
- docs/testing/runbook.md — add a "Grounding sweep" section (static pass + runtime pass) alongside the existing test sweep, document the Wayland-mode sweep recipe, link the upstream-bump trigger to it.
- tools/test-harness/README.md — add grounding-probe.ts to the layout, a Run-section recipe, and a dedicated "Grounding probe" section explaining when to reach for it vs the static grep.
- docs/testing/cases/distribution.md — link S26 to issue #567 (autoUpdater no-op tracking), now that the bug is filed.
Co-Authored-By: Claude <claude@anthropic.com>

Counterpart to docs/testing/cases-grounding-prompt.md — a fan-out prompt for the workstream of wiring runners against the 61 of 76 tests that don't have one yet.
Structured the same way as the grounding prompt: Phase 0 calibration, Phase 1 triage subagent producing a tiered plan (docs/testing/runner-implementation-plan.md), Phase 2/3 fan-out per test in Tiers 1-2, Phase 4 synthesis. Tier 3 (renderer-heavy / login-required) deferred to follow-up sessions; Tier 4 (CLI binary, issue-gated, env-blocked) marked out of scope with reasons.
Constraints flag the known landmines: CDP gate workaround, the BrowserWindow Proxy gotcha, default isolation + escape hatches, ydotool prereqs, skipUnlessRow as the first line of every spec. "Don't ship stubs" is called out explicitly so a session that hits a blocker reports it instead of leaving placeholder runners that pass trivially.
Realistic next-session goal: 13-16 new runners (Tier 1 + as much Tier 2 as fits), bumping coverage from 15/76 (20%) to ~30/76 (40%). Future sessions handle the renderer-heavy Tier 3 once they have a session-time budget and host claude.ai login.
Co-Authored-By: Claude <claude@anthropic.com>

- runDoctor() now returns {output, exitCode} so T02/T13/S05 can assert against the doctor exit code (was string-only, swallowed the code).
- MainWindow.setState() accepts 'close' and calls win.close() so T08 exercises frame-fix-wrapper.js:178-185 (the close-to-tray interceptor) — distinct from 'hide', which would bypass the wrapper.
- Add docs/testing/runner-implementation-plan.md: tiered triage of the 61 missing runners with execution-time reclassifications (T05 → Tier 3 delivery, T07 → Tier 2 via seedFromHost, T14 split into a/b, S20 deferred via #569).
- Refresh T13/S05 case-doc anchors: scripts/doctor.sh:290-299 → :353-362 (file edited since the anchor was written).
- Update test-harness README status to reflect the post-batch spec inventory and link to the plan doc.
Co-Authored-By: Claude <claude@anthropic.com>

Each runner is independent of the others and matches one case-doc test ID. Pure file probes (asar fingerprints, source-tree grep) and short-lived spawn probes; no app launch needed.
Specs landed:
- T02 — claude-desktop --doctor exit code is 0
- T11 — plugin install code path fingerprints (installPlugin log, installed_plugins.json) present in bundled index.js
- T13 — --doctor does not false-flag rpm/deb installs as missing-dpkg AppImage
- T14a — requestSingleInstanceLock + 'second-instance' strings in bundle (T14b runtime probe lands separately)
- S01 — AppImage launches without libfuse.so.2 complaint (skips cleanly on non-AppImage rows)
- S02 — no strict == equality against XDG_CURRENT_DESKTOP in launcher / patches (regression detector)
- S03 — dpkg-query Depends: field non-empty (currently fails as an upstream-contract regression detector — deb.sh:185-197 emits no Depends: line)
- S04 — rpm -qR has at least one non-rpmlib(...) requirement (currently fails — rpm.sh:188 has AutoReqProv: no, no manual Requires:)
- S05 — doctor does not false-flag rpm-installed package
- S08 — KDE tray-rebuild fast-path (.setImage(...createFromPath...)) injected by tray.sh:212-217
- S15 — AppImage --appimage-extract fallback exits 0; squashfs-root/AppRun --version runs without FUSE error
- S16 — AppImage mount(8) entry appears post-launch and clears within ~10s of close
- S21 — no handle-lid-switch / HandleLidSwitch strings in bundle (lid policy deferred to OS)
- S22 — new Set(["darwin","win32"]) computer-use platform gate present, no 2-element Set pairing linux (file-probe form)
- S26 — setFeedURL present + project suppression marker absent (currently fails — gated on #567 auto-update suppression patch)
- S27 — installed_plugins.json + homedir resolver present, no */plugins system paths in bundle
Three specs are intentional regression detectors — they ship "red" today (S03, S04, S26) because the upstream contract isn't yet met. Each error message names the upstream defect or issue so matrix-regen surfaces them as actionable cells.
Co-Authored-By: Claude <claude@anthropic.com>
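S04's pass condition reduces to a filter over `rpm -qR` output. `rpmlib(...)` entries record package-format features that rpmbuild adds automatically, not runtime dependencies, so they don't count. A minimal sketch, assuming the output has already been captured as a string. Both function names are hypothetical, and the error text only echoes the defect named in the S04 bullet:

```typescript
// Requirements other than rpmlib(...) format capabilities.
function nonRpmlibRequirements(rpmQR: string): string[] {
  return rpmQR
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith('rpmlib('));
}

// Throws with an actionable message when only rpmlib(...) entries are present.
function assertHasRealRequires(rpmQR: string): void {
  if (nonRpmlibRequirements(rpmQR).length === 0) {
    throw new Error(
      'rpm -qR lists only rpmlib(...) entries (rpm.sh:188 sets AutoReqProv: no with no manual Requires:)',
    );
  }
}
```

A package built with `AutoReqProv: no` and no manual `Requires:` still lists its `rpmlib(...)` lines, which is why a plain "is the output non-empty" check would pass spuriously.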

Single launchClaude() + inspector + Electron-API or window-state assertion. Each runner asserts a contract that requires the app to actually be running.
Specs landed:
- T05 — claude:// URL delivers via app.on('second-instance') (Tier 3 delivery probe: xdg-open fires the URL, the running app's hook captures it). Uses isolation: null because the SingletonLock collision must route to the same user-data dir.
- T06 — globalShortcut.isRegistered('Ctrl+Alt+Space') returns true after waitForReady('mainVisible')
- T07 — five topbar buttons render with non-zero rects. First spec to exercise createIsolation({ seedFromHost: true }) — kills host Claude, copies the auth allowlist (Cookies, Local State, Local Storage, IndexedDB, etc.) into a per-test tmpdir, runs hermetically against the signed-in account, and destroys the tmpdir on close.
- T08 — MainWindow.setState('close') fires the wrapper's close interceptor; window hidden, proc still alive
- T09 — setLoginItemSettings({ openAtLogin }) writes/removes $XDG_CONFIG_HOME/autostart/claude-desktop.desktop
- T12 — app.getGPUFeatureStatus() returns a populated object; reaching mainVisible proves the renderer didn't crash
- T14b — second invocation under the same isolation exits cleanly via the requestSingleInstanceLock early-return; primary pid stays alive
- S07 — under CLAUDE_HARNESS_USE_WAYLAND=1, spawned Electron has --ozone-platform=wayland on argv (skips when env unset)
- S17 — shell-path-worker overlays the user's login-shell PATH onto a deliberately-scrubbed env. Re-forks shellPathWorker.js via utilityProcess.fork + MessageChannelMain to observe the worker output directly (the main-process FX() merger only fills undefined keys, so reading process.env.PATH after a non-undefined override wouldn't observe the effect).
T05 was originally planned as a Tier 2 isDefaultProtocolClient probe but was reshaped: that runtime call is a no-op in the harness because ELECTRON_FORCE_IS_PACKAGED=true makes app.getName() resolve to "Claude" (not "claude-desktop"), so the xdg-mime shellout fails silently. Real registration is install-time via the .desktop file MimeType= line. T05 ships as the delivery probe instead.
T07 was originally deferred to Tier 3 ("topbar is React-rendered SPA"), but the harness's seedFromHost primitive (isolation.ts:37-44, never exercised before this commit) lifts it back to Tier 2.
Co-Authored-By: Claude <claude@anthropic.com>
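T09 asserts against `$XDG_CONFIG_HOME/autostart/claude-desktop.desktop`. When a runner has to compute that path itself, the XDG Base Directory fallback applies: an unset or relative `XDG_CONFIG_HOME` means `$HOME/.config`. A hypothetical helper, assuming the filename above and that an isolated run exports an absolute `XDG_CONFIG_HOME`:

```typescript
type Env = Record<string, string | undefined>;

// Resolves the autostart .desktop path the T09 runner checks for.
function autostartDesktopPath(env: Env, fileName = 'claude-desktop.desktop'): string {
  const xdg = env.XDG_CONFIG_HOME;
  let configHome: string;
  if (xdg && xdg.startsWith('/')) {
    configHome = xdg;
  } else {
    // Unset, empty, or relative XDG_CONFIG_HOME falls back to $HOME/.config.
    if (!env.HOME) throw new Error('neither an absolute XDG_CONFIG_HOME nor HOME is set');
    configHome = `${env.HOME}/.config`;
  }
  return `${configHome.replace(/\/+$/, '')}/autostart/${fileName}`;
}
```

Passing the env explicitly keeps the helper usable for both the harness's isolation env and the host env, so a test can check that the file lands in the isolated tree rather than the real `~/.config`.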

Mirrors lib/claudeai.ts:installOpenDialogMock (used by T17). Replaces electron.shell.showItemInFolder with a recording mock so Tier 2 reframe specs can assert "the IPC layer reaches the egress with the right path" without firing the real DBus FileManager1 / xdg-open dispatch on the host.
Idempotent (guarded by globalThis.__claudeAiShowItemMockInstalled), matches the existing mock helper's call-recording shape, and exports a companion getShowItemInFolderCalls reader.
Used by the rewritten T25 runner in the next commit.
Co-Authored-By: Claude <claude@anthropic.com>
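The idempotence guard and call recording can be sketched without Electron by passing in structural stand-ins for `electron.shell` and the main process's `globalThis`. Only the `__claudeAiShowItemMockInstalled` guard name comes from the commit; `__claudeAiShowItemCalls` and the injected parameters are hypothetical, since the real helper runs inside the main process via the inspector:

```typescript
// Structural stand-ins for electron.shell and the main-process globalThis.
interface ShellLike {
  showItemInFolder(fullPath: string): void;
}
interface MockGlobals {
  __claudeAiShowItemMockInstalled?: boolean;
  __claudeAiShowItemCalls?: string[]; // hypothetical storage slot
}

function installShowItemInFolderMock(shell: ShellLike, g: MockGlobals): void {
  // Idempotent: a second install keeps the first mock and its recorded calls.
  if (g.__claudeAiShowItemMockInstalled) return;
  const calls: string[] = [];
  g.__claudeAiShowItemCalls = calls;
  g.__claudeAiShowItemMockInstalled = true;
  // Record the path instead of dispatching to DBus FileManager1 / xdg-open.
  shell.showItemInFolder = (fullPath: string) => {
    calls.push(fullPath);
  };
}

function getShowItemInFolderCalls(g: MockGlobals): string[] {
  return [...(g.__claudeAiShowItemCalls ?? [])]; // copy, so callers can't mutate
}
```

Returning early on re-install, rather than resetting the array, means two specs (or a spec and a shared fixture) can both call the installer without erasing each other's recorded calls.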

Categories landed:
- B (seedFromHost-unlocked): T16 (Code tab loads), T26 (Routines page renders) — both promote Tier 3 → Tier 2 via the seedFromHost primitive shipped in session 1.
- A (Tier 2 single-launch deferred from session 1): T10 (Cowork daemon respawn after SIGKILL), S10 (KDE-W Quick Entry popup transparent), S25 (safeStorage round-trip across two launches with a shared isolation handle).
- C (Tier 2 reframes): T23 (Notification reaches DBus via dbus-monitor subprocess), T25 (shell.showItemInFolder via mock-then-call — mirrors T17's installOpenDialogMock), T38 (openInEditor IPC handler registered probe via ipcMain._invokeHandlers), S19 (CLAUDE_CONFIG_DIR extraEnv reaches main process).
- Tier 1 reclass: S28 (worktree permission classifier asar fingerprint — Sbn() is closure-local, not inspector-reachable).
Mechanism notes — see the plan doc status section for full rationale:
- T23 uses dbus-monitor, not gdbus monitor (the latter only sees signals owned by a destination, not method calls to it).
- T38 inspects ipcMain._invokeHandlers for handler registration; the channel name takes the form $eipc_message$_<UUID>_$_claude.web_$_<name> with a build-stable UUID prefix, so the probe anchors on the suffix.
- T25 mock-then-call beats invoke-then-cleanup (no host file manager pop-up, stronger assertion).
- S25 compares decrypted plaintexts, not ciphertexts (safeStorage on Linux uses random IVs).
Co-Authored-By: Claude <claude@anthropic.com>
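Anchoring on the suffix of an eipc channel name, as T38 does, can be done with one regex over the `$eipc_message$_<UUID>_$_claude.web_$_<name>` layout described above. A sketch, assuming the UUID segment is hex with dashes. `eipcSuffix` and `hasEipcHandler` are hypothetical names, and the UUID in the usage below is made up:

```typescript
// Assumed channel layout: $eipc_message$_<UUID>_$_claude.web_$_<name>
const EIPC_CHANNEL_RE = /^\$eipc_message\$_([0-9a-f-]+)_\$_claude\.web_\$_(.+)$/i;

// Returns the <name> part, or null for channels outside the eipc scheme.
function eipcSuffix(channel: string): string | null {
  const m = EIPC_CHANNEL_RE.exec(channel);
  return m ? m[2] : null;
}

// Match on the suffix so the probe never hardcodes the UUID prefix.
function hasEipcHandler(channels: string[], suffix: string): boolean {
  return channels.some((c) => eipcSuffix(c) === suffix);
}
```

With a channel such as `$eipc_message$_3f2a9c1e-0b7d-4e8a-9c55-1a2b3c4d5e6f_$_claude.web_$_LocalSessions_$_getPrChecks`, the suffix is `LocalSessions_$_getPrChecks`. Plain channels like `list-mcp-servers` return null rather than a false match.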

- runner-implementation-plan.md: new "Status (post-execution)" sub-section for session 2 listing the 10 new specs and the four reclassification notes (S28 → Tier 1, T38 framing, T23 tool choice, S19 honest-stub note). The session 1 sub-section is preserved verbatim below for comparison.
- README.md: 50-spec inventory (was 40), new T-rows (T10, T16, T23, T25, T26, T38) and S-rows (S10, S19, S25, S28) interleaved into the existing tables. Substrate-primitives paragraph extended with dbus-monitor, mock-then-call, ipcMain registry introspection, safeStorage round-trip, extraEnv precedence.
- runner-implementation-followup-prompt.md: rewritten for session 3 — deferred items (T31, T32, S06, S11, S14), Tier 3 → Tier 2 reframes (T22, T35, T37), asar fingerprint cleanups (T24, T30, T33), the focus-shifter primitive build, and the mock-then-call extension for T24 as an alternative to its asar form. Includes the "known mechanism-recipe table" accumulated across sessions 1+2.
- runner-implementation-prompt.md: deleted (session 1's prompt, superseded by the followup that's been the rolling document since session 1 ended).
Co-Authored-By: Claude <claude@anthropic.com>

… helper

Session 3 brings the third mock-then-call helper online (installOpenExternalMock for shell.openExternal, mirroring installShowItemInFolderMock and installOpenDialogMock). The threshold from the session prompt was met, so the three install/get pairs move out of lib/claudeai.ts into a dedicated lib/electron-mocks.ts. The mocks are generic Electron module patches (dialog, shell), not claude.ai-domain code, so the new home keeps claudeai.ts focused on AX-tree page-objects. T17 and T25 imports now point at the new module; T24 (added in the follow-up commit) imports from electron-mocks.ts directly.

Co-Authored-By: Claude <claude@anthropic.com>
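A minimal sketch of the install/get shape these helpers share, assuming nothing about their real signatures: `installMethodMock`, `MockHandle`, and `getCalls` are hypothetical names, and a plain object stands in for Electron's `shell` so the example runs under Node instead of inside the main process. The part that carries over is the ordering: install the recorder, make the call, then assert on what was recorded, so no host file manager or editor ever opens.

```typescript
// Handle returned by the hypothetical installMethodMock below.
interface MockHandle {
  getCalls(): unknown[][]; // arguments of each recorded call, in order
  restore(): void; // put the original method back
}

// Replace target[method] with a recorder that never runs the real method and
// returns whatever makeResult produces (for openExternal, a Promise<boolean>).
function installMethodMock<T extends object, K extends keyof T>(
  target: T,
  method: K,
  makeResult: (...args: unknown[]) => unknown,
): MockHandle {
  const original = target[method];
  const calls: unknown[][] = [];
  (target as any)[method] = (...args: unknown[]) => {
    calls.push(args);
    return makeResult(...args);
  };
  return {
    getCalls: () => calls.map((args) => [...args]),
    restore: () => {
      target[method] = original;
    },
  };
}

// Stand-in for Electron's shell module: the "real" method must never run.
const fakeShell = {
  openExternal(url: string): Promise<boolean> {
    throw new Error(`real handler launched for ${url}`);
  },
};

// Install first, then call, then assert on the recording.
const mock = installMethodMock(fakeShell, "openExternal", () => Promise.resolve(true));
const url = "vscode://file/tmp/example.txt"; // illustrative path only
void fakeShell.openExternal(url);
console.log(JSON.stringify(mock.getCalls())); // [["vscode://file/tmp/example.txt"]]
mock.restore();
```

Restoring the original method afterwards is a choice made for this sketch; the commit does not say whether the real helpers do so.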

Coverage 50/76 → 57/76. Seven new specs land, plus one session-2 carryover (T38) reclassified after the eipc-registry finding below.

New specs:

- T22 (PR monitoring): Tier 1 fingerprint on the `LocalSessions_$_getPrChecks` eipc channel name plus the "gh CLI not found in PATH" Linux-fallthrough throw site (case-doc anchors :464281 / :464964 / :464368).
- T24 (Open in editor): Tier 2 mock-then-call. installOpenExternalMock patches shell.openExternal from main, evalInMain calls it with a `vscode://file/...` URL, and the assertion checks that the recorded call lists the URL verbatim. No real editor launches (the mock returns `Promise<boolean>`).
- T30 (Auto-archive cadence): Tier 1 fingerprint. A single regex anchors `300*1e3` ≤ `3600*1e3` ≤ `AutoArchiveEngine` in colocation (≤200 / ≤3000 char proximity windows tuned to the current bundle), plus a ccAutoArchiveOnPrClose `.includes()` check inside the captured window.
- T31 (Side chat): Tier 1 fingerprint on the side-chat eipc trio (startSideChat / sendSideChatMessage / stopSideChat).
- T32 (Slash menu): Tier 1 fingerprint on `LocalSessions_$_getSupportedCommands` plus the slashCommands schema.
- T33 (Plugin browser): Tier 1 fingerprint on `CustomPlugins_$_listMarketplaces` plus listAvailablePlugins.
- T37 (CLAUDE.md memory): Tier 1 fingerprint on the high-signal "[GlobalMemory] Copied CLAUDE.md" log line, the CLAUDE.md filename, and the CLAUDE_CONFIG_DIR env-var token. The fixture-readback form is deferred because parsed-memory state is closure-local.

eipc-registry finding (T38 reclassification): Session 2's T38 used `ipcMain._invokeHandlers` introspection. The KDE-W run revealed that this registry holds only three chat-tab MCP-bridge handlers (list-mcp-servers, connect-to-mcp-server, request-open-mcp-settings), regardless of ready level (mainVisible / claudeAi / userLoaded) and regardless of authentication state (default isolation vs. seedFromHost: true, verified via probe). The `$eipc_message$_<UUID>_$_claude.web_$_<name>` protocol uses a closure-local message-port registry that is not reachable from globalThis, the same gotcha as session 2's Sbn() (S28) and cE()/Tce() (S19). T38 is rewritten as a Tier 1 asar fingerprint anchoring on the `LocalSessions_$_openInEditor` channel-name string in the bundle. T22, T31 and T33 (originally drafted with the same broken pattern) ship as Tier 1 fingerprints from the start. T24 is unaffected: it patches the stdlib Electron shell module from main, not the eipc layer.

KDE-W: 9/9 pass in 18.2s (the 7 new specs, plus T25 verifying the lib import-extract didn't break it, plus the reclassified T38).

Co-Authored-By: Claude <claude@anthropic.com>
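Tier 1 fingerprints reduce to string and proximity checks over the bundle text. A minimal sketch, assuming the bundle has already been read into a string (how the harness extracts it from app.asar is not shown here). `missingTokens`, `autoArchiveFingerprint`, and the anchor ordering in the regex are assumptions read off the T30/T31 descriptions, not the specs' actual code.

```typescript
// Tokens that must all appear in the bundle for a T31-style fingerprint.
const SIDE_CHAT_TOKENS = ["startSideChat", "sendSideChatMessage", "stopSideChat"] as const;

// Return the expected tokens that are absent; an empty list means the fingerprint holds.
function missingTokens(bundle: string, tokens: readonly string[]): string[] {
  return tokens.filter((token) => !bundle.includes(token));
}

// T30-style colocation regex. Assumed ordering: 300*1e3, then 3600*1e3 within
// 200 chars, then AutoArchiveEngine within 3000 chars. The windows are tuned
// to one bundle and may need retuning when upstream reshuffles the code.
const AUTO_ARCHIVE_RE = /300\*1e3[\s\S]{0,200}?3600\*1e3[\s\S]{0,3000}?AutoArchiveEngine/;

// Passes only if the ccAutoArchiveOnPrClose token also sits inside the matched window.
function autoArchiveFingerprint(bundle: string): boolean {
  const match = AUTO_ARCHIVE_RE.exec(bundle);
  return match !== null && match[0].includes("ccAutoArchiveOnPrClose");
}

// Tiny synthetic bundle, for illustration only.
const fakeBundle =
  'ipc("startSideChat");ipc("sendSideChatMessage");ipc("stopSideChat");' +
  "const idle=300*1e3,max=3600*1e3;if(cfg.ccAutoArchiveOnPrClose)new AutoArchiveEngine(idle,max);";

console.log(missingTokens(fakeBundle, SIDE_CHAT_TOKENS)); // []
console.log(autoArchiveFingerprint(fakeBundle)); // true
```

Returning the missing tokens rather than a bare boolean makes a failed fingerprint say which anchor drifted after an upstream bundle update.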

Updates the post-execution status section with session 3's seven shipped specs, the eipc-registry finding (which corrects session 2's T38 assumption), and the four reclassifications (T22/T31/T33/T38 from Tier 2 IPC probes to Tier 1 fingerprints). It also captures the authentication-state lesson: launches that depend on authenticated renderer state need createIsolation({ seedFromHost: true }), even when the case-doc-shaped Tier 2 form looks hermetic on paper.

The README inventory grows from 50 to 57 specs and adds a note that `LocalSessions_$_*` / `CustomPlugins_$_*` channels use a custom eipc protocol, not Electron's standard ipcMain.handle(), so future runners should anchor on channel-name strings (Tier 1) rather than introspect `_invokeHandlers` (broken).

The followup prompt is rewritten for session 4: the focus-shifter primitive plus S11/S14, T35 MCP separation fingerprints (Phase 1) and an optional fixture-readback (Phase 2, may abort), and the eipc-registry exposer as a flagged primitive gap.

Co-Authored-By: Claude <claude@anthropic.com>

aaddrick · 2026-05-03T21:40:53Z

Closing this WIP — will redraft once the test-plan + harness work is finished. Branch stays for ongoing iteration.

aaddrick and others added 25 commits May 3, 2026 07:55

aaddrick force-pushed the docs/compat-matrix branch from 2f2134d to ade75d7 May 3, 2026 11:57

aaddrick and others added 4 commits May 3, 2026 08:00

aaddrick and others added 9 commits May 3, 2026 14:41

aaddrick closed this May 3, 2026


aaddrick wants to merge 38 commits into main from docs/compat-matrix

aaddrick commented Apr 30, 2026


aaddrick commented May 3, 2026

