Skip to content

fix(connect-evm): suppress switchChain _failed event on successful addEthereumChain recovery#294

Open
adonesky1 wants to merge 10 commits into
mainfrom
wapi-1253-chain-switch-recovery-dedupe
Open

fix(connect-evm): suppress switchChain _failed event on successful addEthereumChain recovery#294
adonesky1 wants to merge 10 commits into
mainfrom
wapi-1253-chain-switch-recovery-dedupe

Conversation

@adonesky1
Copy link
Copy Markdown
Contributor

Stacked on top of #290. Targets wapi-1253-failure-reason-and-tracking-fixes; once #290 lands GitHub will retarget the base to main automatically and the diff will narrow to just the three files in this PR.

Summary

On the switchChain → "Unrecognized chain ID" → addEthereumChain recovery path, the EVM shim was firing four Mixpanel events for what is logically one chain change:

mmconnect_wallet_action_requested  (switch)
mmconnect_wallet_action_failed     (switch — swallowed by the shim, never surfaces to the dapp)
mmconnect_wallet_action_requested  (add)
mmconnect_wallet_action_succeeded  (add)

The middle _failed is misleading. The dapp called switchChain(chainId, chainConfiguration); the wallet didn't know the chain; the shim's recovery added it and switched; the dapp's promise resolves successfully. The user sees one chain change. Mixpanel saw a failure that never reached the dapp.

This PR hoists the recovery-eligibility check above the _failed track call so the failed event only fires when there's no recovery to attempt. Post-merge, the same successful-recovery flow emits three events:

mmconnect_wallet_action_requested  (switch)
mmconnect_wallet_action_requested  (add)
mmconnect_wallet_action_succeeded  (add)

Behavioural matrix

Scenario Before (events fired) After (events fired)
Switch succeeds switch _requested, switch _succeeded same
Switch fails, no chainConfiguration switch _requested, switch _failed same
Switch fails for non-"Unrecognized chain ID" reason (user rejection, etc.), even with chainConfiguration switch _requested, switch _failed/_rejected same
Switch "Unrecognized chain ID" + chainConfiguration, recovery succeeds switch _requested, switch _failed, add _requested, add _succeeded switch _requested, add _requested, add _succeeded
Switch "Unrecognized chain ID" + chainConfiguration, recovery itself fails switch _requested, switch _failed, add _requested, add _failed switch _requested, add _requested, add _failed

The suppression preserves the "exactly one _failed per logical chain change" invariant in the unhappy path too — the add's own _failed represents the failure of the whole flow.

Why not just remove the shim tracking entirely?

That was the framing of the closed #291, which claimed the multichain RequestRouter was tracking these methods at the wallet boundary so the shim was duplicating it. Source review showed that's not true — MetamaskConnectEVM.switchChain calls this.#request(...)transport.sendEip1193Message(...), which does a raw postMessage (default transport) / dappClient.sendRequest({ name: 'metamask-provider', ... }) (MWP) directly. Those paths bypass RequestRouter.invokeMethod / wallet_invokeMethod and so bypass #withAnalyticsTracking entirely. The shim is the only analytics layer for the four EIP-1193 intercepted methods (wallet_switchEthereumChain, wallet_addEthereumChain, eth_requestAccounts / wallet_requestPermissions). #291 was closed for this reason; this PR is the correct, scoped fix for the one real intra-shim double-fire.

Recovering the lost signal

The pre-suppression switch _failed event was the only signal that told us "wallet X didn't know chain Y." Post-suppression this is still recoverable via a Mixpanel query: any wallet_addEthereumChain _requested event for which the same session has no preceding dapp-initiated addEthereumChain call is, by construction, a chain-switch-recovery attempt. Captured as backlog item (b) in the failure_reason follow-up plan.

Test plan

New test coverage in packages/connect-evm/src/connect.test.ts (describe('switchChain', ...)):

  • falls back to wallet_addEthereumChain when wallet_switchEthereumChain fails with "Unrecognized chain ID" and chainConfiguration is provided — extended to assert event names match [switch _requested, add _requested, add _succeeded] (no _failed).
  • still emits _failed for the switch when recovery is impossible (no chainConfiguration provided) — renamed from the old title; asserts [switch _requested, switch _failed].
  • rethrows non "Unrecognized chain ID" errors without falling back, even when chainConfiguration is provided — extended to assert one of _failed / _rejected still fires.
  • New: emits _failed for the add when recovery itself fails (suppression only applies on successful recovery) — asserts the unhappy-recovery path emits exactly one _failed (the add's), not two.

Test scaffolding:

  • mockCore extended with options.dapp, versions, transportType, and storage.getAnonId() so getWalletActionAnalyticsProperties runs end-to-end instead of being swallowed by #trackWalletAction*'s defensive try/catch. Confirmed by running the failing tests against the un-suppressed code — the first round of failures showed empty event arrays (analytics path silently throwing); after the mock fix, the failures show the actual switch _failed between switch and add, exactly the bug.
  • vi.spyOn(analytics, 'track') set up in beforeEach after the session-establishment events fire so test assertions only see calls made during the test body.

Validation:

  • yarn workspace @metamask/connect-evm test --run — 64 passed.
  • yarn workspace @metamask/connect-multichain test --run — 354 passed, 4 skipped (no regression from the mock change).
  • yarn build — all 11 tasks succeed.
  • yarn lint:eslint — clean.
  • yarn changelog:validate — clean.

Related

adonesky1 added 10 commits May 11, 2026 12:37
…sRejectionError`

Two bundled changes from the WAPI-1253 spike that share scaffolding
(both reuse `getUnwrappedErrorDetails`):

1. **Attach `failure_reason` to `_failed` events.** Both
   `mmconnect_wallet_action_failed` and `mmconnect_connection_failed`
   now carry a producer-side tag describing why they fired —
   `transport_timeout`, `transport_disconnect`,
   `wallet_method_unsupported`, `wallet_invalid_params`,
   `wallet_internal_error`, `wallet_custom_error`,
   `no_active_session`, `unrecognised_chain`, `rpc_node_http_error`,
   `rpc_node_request_error`, `rpc_node_response_error`, or `unknown`.
   The schema-side change lives in:
   consensys-vertical-apps/metamask-sdk-analytics-api#31

   Note: the `rpc_node_*` branches exist in the classifier but no
   producer currently emits them — read-only RPC failures still throw
   without tracking. A follow-up PR may wire them up.

2. **Unwrap `RPCInvokeMethodErr` in `isRejectionError`.** The router
   re-throws wallet errors wrapped in `RPCInvokeMethodErr`, whose
   outer `code` is the SDK's static `53`. Without unwrapping,
   wallet-side `code: 4001` rejections fell through to the brittle
   message-substring fallback, and any rejection whose message didn't
   happen to contain "reject"/"denied"/"cancel" was misclassified as
   a failure. Also tightened the "user" message heuristic so unrelated
   phrases like "user operation reverted" (Account Abstraction) no
   longer count as rejections.

Tests:
- New `analytics.test.ts` covering `isRejectionError` (including the
  `RPCInvokeMethodErr` unwrap and the tightened "user" heuristic) and
  every branch of `classifyFailureReason`.
- New `requestRouter.test.ts` assertions for `failure_reason` on
  wallet internal errors and transport timeouts, plus the regression
  test that wallet-side 4001 with a non-matching message now lands in
  `_rejected`.

Schema mirror: hand-edited `packages/analytics/src/schema.ts` to add
the new optional property. Regenerating via `openapi-typescript@7`
would have produced a 1k-line diff (the file was generated by an
older version that emitted `type` aliases instead of `interface`s);
hand-edit keeps the diff tight and matches the existing style.

Related: WAPI-1253. Sibling PRs from the same spike:
- #291 — dedupe EVM shim tracking.
- #292 — `isRejectionError` unwrap fix on its own (this PR carries
  the same fix as a dependency; whichever lands second is a no-op).
…nd bench

Follow-up to manual testing of the failure_reason classifier. Two
behaviour fixes in `classifyFailureReason` / `isRejectionError`, plus
moves the test-bench bits into a permanent home in the playground.

Classifier:

- Drop `code === 4100` from `isRejectionError`. EIP-1193 `4100
  Unauthorized` is what MetaMask returns when a method is not in the
  granted CAIP-25 scope — the multichain permission layer rejects it
  before any method handler runs (see core/packages/
  multichain-api-middleware/src/handlers/wallet-invokeMethod.ts).
  That's a permission/support signal, not a user-driven rejection.
  Misclassifying it inflated `_rejected` and hid genuine permission
  issues from `_failed`. After this change, dapp calls for methods
  outside the granted scope correctly surface as `_failed` with
  `failure_reason: wallet_unauthorized`.

- Add `wallet_unauthorized` to the `FailureReason` union and route
  `4100` to it. Distinct from `wallet_method_unsupported` because the
  wallet never gets to see whether the method exists.

- Add explicit branches for `4200 → wallet_method_unsupported` (wallet
  handler exists but explicitly refuses) and `4902 →
  unrecognised_chain` (`wallet_switchEthereumChain` to an unknown
  chain) so we don't lose signal to the catch-all
  `wallet_custom_error` range.

Playground:

- Rename `FailureReasonTestBench` → `AnalyticsTestBench` and wrap in
  a `<details>` so it's a permanent, collapsible section. No longer
  gated on `isConnected` — it always renders so testers can also
  exercise the `no_active_session` branch.

- Use the first connected EVM account from `session.sessionScopes`
  for the `personal_sign` rejection sanity-check. Calling
  `personal_sign(msg, 0x0000…)` made the wallet bail with `-32602`
  before showing the prompt, defeating the rejection check.

- Move the echo server from the temp `wapi-1253-test-rig/` dir into
  `playground/browser-playground/scripts/analytics-echo-server.mjs`
  and wire it as `yarn analytics:echo`.

- Document the full manual-testing workflow in the playground README
  (echo server + `METAMASK_ANALYTICS_ENDPOINT` env var + CSP gotcha
  + the multichain scope behaviour that explains why so many buckets
  funnel into `wallet_unauthorized`).
`wallet_custom_error` was the catch-all for any EIP-1193 / EIP-1474
provider-defined code in the 1000-4999 range we hadn't carved out into
its own bucket. With named branches now in place for 4001 (handled via
`isRejectionError`), 4100, 4200, and 4902, the remaining "we don't
know" cases are conceptually identical to `unknown` — having two
buckets for the same signal adds noise without insight.

After this:

- Unrecognised provider codes (e.g. 4900 Disconnected) fall through to
  `unknown` instead of `wallet_custom_error`.
- The catch-all `1000 <= code <= 4999` branch is removed.
- Future codes can be promoted from `unknown` into named buckets by
  adding new branches above the fallback — no schema migration needed
  since `failure_reason` is still an open string.

Updated:
- Test for code 4900 now asserts `unknown` instead of `wallet_custom_error`.
- Playground `ExpectedBucket` union drops the entry.
- CHANGELOG entry reworded to reflect the tighter taxonomy.
The `no_active_session` classifier branch matched the SDK-thrown
sentinel string `'No active session found'`, raised by `RequestRouter`
and `Multichain` when `getActiveSession()` returns `null`. Both
occurrences live inside `setTimeout` callbacks in the deeplink-open
path:

  setTimeout(async () => {
    const session = await this.transport.getActiveSession();
    if (!session) {
      throw new Error('No active session found');  // unhandled
    }
    ...
  }, 10);

That throw is unhandled — it doesn't reach `#withAnalyticsTracking`'s
try/catch and never produces a `mmconnect_wallet_action_failed`
event. The classifier branch was dead from day one.

The real pre-connect signal — `invokeMethod` called before
`Transport not initialized, establish connection first` — is thrown
from the `transport` getter, *outside* the `RequestRouter`'s analytics
wrapper, so it's also never tracked. Wiring up pre-flight tracking is
deliberately out of scope for this PR (see "Out of scope" in the PR
body).

Drop the bucket from the `FailureReason` union, the classifier branch,
the test, the CHANGELOG entry, and the corresponding button in the
playground analytics test bench. Pre-flight failures will still be
diagnosable when we wire them up; we'll add the bucket back at the
same time. Keeping it now just risks reviewers (or future us) trying
to trigger it and finding it doesn't fire.
…cket

- Inspect wallet JSON-RPC code BEFORE the transport_disconnect substring
  heuristic so a wallet `{ code: 4900, message: 'Disconnected' }` no
  longer leaks into transport_disconnect.
- Narrow the transport_disconnect substring set ('transport disconnect',
  'connection lost', 'socket closed', 'not connected') so it doesn't
  snag wallet errors that happen to mention "disconnect".
- Drop session_expired from FailureReason — no producer in the monorepo
  throws an error matching its message regex, the branch was
  defensive-only.
- Document connection-side error shapes in a new test block so the
  classifier's current behaviour on `transport.connect()` catch errors
  ('Transport not initialized', 'Existing connection is pending', etc.)
  is pinned and visible.
- analytics.test.ts: re-wrap two long lines per prettier (mechanical
  formatting only, no test behaviour changes).
- analytics-echo-server.mjs: auto-fix curly, prefer-destructuring,
  prefer-template, hashbang, and other surface-level lint errors;
  inline-disable n/no-process-env for the documented PORT override;
  rename catch binding err -> parseError (id-denylist).
- CHANGELOGs: append #290 PR links to every Unreleased bullet under
  packages/analytics and packages/connect-multichain (required by the
  check_changelog workflow) and add an Unreleased block to
  playground/browser-playground for the Analytics test bench + echo
  server additions.

No production code changes.
Collapse iterative-development bullets in the Unreleased sections of
@metamask/connect-multichain, @metamask/analytics, and
@metamask/browser-playground down to what an atomic reader needs:

- connect-multichain: 1 Added + 4 Fixed -> 1 Added + 3 Fixed. Dropped
  the session_expired-removal bullet (no producer ever emitted it, so
  it's not a user-facing change worth documenting) and the standalone
  4900 / transport_disconnect ordering bullet (covered by the
  classifier description in the Added entry).
- analytics: trim the schema-mirror bullet, drop the verbose taxonomy
  enumeration (it lives in the producer changelog and the schema PR).
- browser-playground: collapse the two test-bench bullets into one.

No code or behaviour change.
Narrow this PR to just the `failure_reason` taxonomy + classifier
work. The two `isRejectionError` correctness fixes (unwrap
RPCInvokeMethodErr, drop 4100 from the rejection set, tighten the
"user" message heuristic) move to #292 so the two pieces can be
reviewed independently.

isRejectionError is now byte-identical to main on this branch. The
new wallet_unauthorized classifier bucket is forward-compatible
dead code until #292 lands and stops routing 4100 to _rejected.

Drop the corresponding tests: the isRejectionError describe block in
analytics.test.ts and the "code 4001 wrapped in RPCInvokeMethodErr
lands in _rejected" regression test in requestRouter.test.ts (both
belong to #292's surface area).

CHANGELOG: drop the three isRejectionError Fixed bullets — they're
covered by #292's changelog.

No behavior change vs main for isRejectionError.
Same dead-bucket cleanup we did for no_active_session and
session_expired. The classifier had RPCHttpErr /
RPCReadonlyRequestErr / RPCReadonlyResponseErr branches mapping to
rpc_node_http_error / rpc_node_request_error / rpc_node_response_error,
but no producer emits any of them — handleWithRpcNode in
RequestRouter throws without ever calling analytics.track. They were
unreachable from day one.

Drop the three FailureReason union members, the three classifier
branches, the three matching tests, and the three error-class
imports. Also drop the "Not currently tracked" disclaimer block in
the playground test bench (used to call them out as supported-but-
not-wired) and update the analytics package schema description to
swap the misleading `rpc_node_error` example for `wallet_unauthorized`.

When a producer wires up handleWithRpcNode tracking, we will add
the buckets back at the same time. Keeping them now just risks
reviewers (or future us) thinking they fire and being confused when
they never do.
…addEthereumChain recovery

When `wallet_switchEthereumChain` throws "Unrecognized chain ID" and the
caller supplied a `chainConfiguration`, the shim catches the error and
transparently calls `#addEthereumChain` — from the dapp/user's perspective
this is still one logical chain change with one outcome. The previous
behaviour fired `mmconnect_wallet_action_failed` for the swallowed switch
attempt before the recovery ran, producing four Mixpanel events
(`switch _requested`, `switch _failed`, `add _requested`, `add _succeeded`)
for what should be three.

Hoist the recovery-eligibility check above the `_failed` track call so the
event only fires when there's no recovery to attempt:
- No `chainConfiguration` provided → switch `_failed` still fires (caller
  sees the rejection too).
- Error is not "Unrecognized chain ID" → switch `_failed`/`_rejected` still
  fires.
- Error is "Unrecognized chain ID" + `chainConfiguration` is set → fall
  through to `#addEthereumChain`, whose own `_succeeded` / `_failed`
  represents the whole flow. Net effect: exactly one `_failed` per logical
  chain change, regardless of which RPC method the wallet rejected.

Tests:
- New `analytics.track` spy on the switchChain test block (the existing
  tests only asserted network calls, not analytics surface).
- Asserts the suppression on the recovery-success path (3 events, not 4).
- Asserts `_failed` still fires when no chainConfiguration is provided.
- Asserts `_failed`/`_rejected` still fires for non-"Unrecognized chain"
  errors with chainConfiguration.
- New test: when recovery itself fails, suppression of the switch event
  is preserved and only the add's `_failed` fires (1 `_failed`, not 2).

Mock core extended with `options.dapp`, `versions`, `transportType`, and
`storage.getAnonId()` so `getWalletActionAnalyticsProperties` can run
end-to-end instead of being swallowed by `#trackWalletAction*`'s
defensive try/catch.

Related: WAPI-1253. The closed #291 conflated this fix with a
non-existent shim-vs-router duplication; this PR addresses only the real
intra-shim double-fire.
@adonesky1 adonesky1 requested a review from a team as a code owner May 11, 2026 21:43
wenfix
wenfix previously approved these changes May 13, 2026
Base automatically changed from wapi-1253-failure-reason-and-tracking-fixes to main May 14, 2026 20:30
@adonesky1 adonesky1 dismissed wenfix’s stale review May 14, 2026 20:30

The base branch was changed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants