fix(connect-evm): suppress switchChain _failed event on successful addEthereumChain recovery#294
Open
adonesky1 wants to merge 10 commits into
Open
fix(connect-evm): suppress switchChain _failed event on successful addEthereumChain recovery#294adonesky1 wants to merge 10 commits into
_failed event on successful addEthereumChain recovery#294adonesky1 wants to merge 10 commits into
Conversation
…sRejectionError` Two bundled changes from the WAPI-1253 spike that share scaffolding (both reuse `getUnwrappedErrorDetails`): 1. **Attach `failure_reason` to `_failed` events.** Both `mmconnect_wallet_action_failed` and `mmconnect_connection_failed` now carry a producer-side tag describing why they fired — `transport_timeout`, `transport_disconnect`, `wallet_method_unsupported`, `wallet_invalid_params`, `wallet_internal_error`, `wallet_custom_error`, `no_active_session`, `unrecognised_chain`, `rpc_node_http_error`, `rpc_node_request_error`, `rpc_node_response_error`, or `unknown`. The schema-side change lives in: consensys-vertical-apps/metamask-sdk-analytics-api#31 Note: the `rpc_node_*` branches exist in the classifier but no producer currently emits them — read-only RPC failures still throw without tracking. A follow-up PR may wire them up. 2. **Unwrap `RPCInvokeMethodErr` in `isRejectionError`.** The router re-throws wallet errors wrapped in `RPCInvokeMethodErr`, whose outer `code` is the SDK's static `53`. Without unwrapping, wallet-side `code: 4001` rejections fell through to the brittle message-substring fallback, and any rejection whose message didn't happen to contain "reject"/"denied"/"cancel" was misclassified as a failure. Also tightened the "user" message heuristic so unrelated phrases like "user operation reverted" (Account Abstraction) no longer count as rejections. Tests: - New `analytics.test.ts` covering `isRejectionError` (including the `RPCInvokeMethodErr` unwrap and the tightened "user" heuristic) and every branch of `classifyFailureReason`. - New `requestRouter.test.ts` assertions for `failure_reason` on wallet internal errors and transport timeouts, plus the regression test that wallet-side 4001 with a non-matching message now lands in `_rejected`. Schema mirror: hand-edited `packages/analytics/src/schema.ts` to add the new optional property. Regenerating via `openapi-typescript@7` would have produced a 1k-line diff (the file was generated by an older version that emitted `type` aliases instead of `interface`s); hand-edit keeps the diff tight and matches the existing style. Related: WAPI-1253. Sibling PRs from the same spike: - #291 — dedupe EVM shim tracking. - #292 — `isRejectionError` unwrap fix on its own (this PR carries the same fix as a dependency; whichever lands second is a no-op).
…nd bench Follow-up to manual testing of the failure_reason classifier. Two behaviour fixes in `classifyFailureReason` / `isRejectionError`, plus moves the test-bench bits into a permanent home in the playground. Classifier: - Drop `code === 4100` from `isRejectionError`. EIP-1193 `4100 Unauthorized` is what MetaMask returns when a method is not in the granted CAIP-25 scope — the multichain permission layer rejects it before any method handler runs (see core/packages/ multichain-api-middleware/src/handlers/wallet-invokeMethod.ts). That's a permission/support signal, not a user-driven rejection. Misclassifying it inflated `_rejected` and hid genuine permission issues from `_failed`. After this change, dapp calls for methods outside the granted scope correctly surface as `_failed` with `failure_reason: wallet_unauthorized`. - Add `wallet_unauthorized` to the `FailureReason` union and route `4100` to it. Distinct from `wallet_method_unsupported` because the wallet never gets to see whether the method exists. - Add explicit branches for `4200 → wallet_method_unsupported` (wallet handler exists but explicitly refuses) and `4902 → unrecognised_chain` (`wallet_switchEthereumChain` to an unknown chain) so we don't lose signal to the catch-all `wallet_custom_error` range. Playground: - Rename `FailureReasonTestBench` → `AnalyticsTestBench` and wrap in a `<details>` so it's a permanent, collapsible section. No longer gated on `isConnected` — it always renders so testers can also exercise the `no_active_session` branch. - Use the first connected EVM account from `session.sessionScopes` for the `personal_sign` rejection sanity-check. Calling `personal_sign(msg, 0x0000…)` made the wallet bail with `-32602` before showing the prompt, defeating the rejection check. - Move the echo server from the temp `wapi-1253-test-rig/` dir into `playground/browser-playground/scripts/analytics-echo-server.mjs` and wire it as `yarn analytics:echo`. - Document the full manual-testing workflow in the playground README (echo server + `METAMASK_ANALYTICS_ENDPOINT` env var + CSP gotcha + the multichain scope behaviour that explains why so many buckets funnel into `wallet_unauthorized`).
`wallet_custom_error` was the catch-all for any EIP-1193 / EIP-1474 provider-defined code in the 1000-4999 range we hadn't carved out into its own bucket. With named branches now in place for 4001 (handled via `isRejectionError`), 4100, 4200, and 4902, the remaining "we don't know" cases are conceptually identical to `unknown` — having two buckets for the same signal adds noise without insight. After this: - Unrecognised provider codes (e.g. 4900 Disconnected) fall through to `unknown` instead of `wallet_custom_error`. - The catch-all `1000 <= code <= 4999` branch is removed. - Future codes can be promoted from `unknown` into named buckets by adding new branches above the fallback — no schema migration needed since `failure_reason` is still an open string. Updated: - Test for code 4900 now asserts `unknown` instead of `wallet_custom_error`. - Playground `ExpectedBucket` union drops the entry. - CHANGELOG entry reworded to reflect the tighter taxonomy.
The `no_active_session` classifier branch matched the SDK-thrown
sentinel string `'No active session found'`, raised by `RequestRouter`
and `Multichain` when `getActiveSession()` returns `null`. Both
occurrences live inside `setTimeout` callbacks in the deeplink-open
path:
setTimeout(async () => {
const session = await this.transport.getActiveSession();
if (!session) {
throw new Error('No active session found'); // unhandled
}
...
}, 10);
That throw is unhandled — it doesn't reach `#withAnalyticsTracking`'s
try/catch and never produces a `mmconnect_wallet_action_failed`
event. The classifier branch was dead from day one.
The real pre-connect signal — `invokeMethod` called before
`Transport not initialized, establish connection first` — is thrown
from the `transport` getter, *outside* the `RequestRouter`'s analytics
wrapper, so it's also never tracked. Wiring up pre-flight tracking is
deliberately out of scope for this PR (see "Out of scope" in the PR
body).
Drop the bucket from the `FailureReason` union, the classifier branch,
the test, the CHANGELOG entry, and the corresponding button in the
playground analytics test bench. Pre-flight failures will still be
diagnosable when we wire them up; we'll add the bucket back at the
same time. Keeping it now just risks reviewers (or future us) trying
to trigger it and finding it doesn't fire.
…cket
- Inspect wallet JSON-RPC code BEFORE the transport_disconnect substring
heuristic so a wallet `{ code: 4900, message: 'Disconnected' }` no
longer leaks into transport_disconnect.
- Narrow the transport_disconnect substring set ('transport disconnect',
'connection lost', 'socket closed', 'not connected') so it doesn't
snag wallet errors that happen to mention "disconnect".
- Drop session_expired from FailureReason — no producer in the monorepo
throws an error matching its message regex, the branch was
defensive-only.
- Document connection-side error shapes in a new test block so the
classifier's current behaviour on `transport.connect()` catch errors
('Transport not initialized', 'Existing connection is pending', etc.)
is pinned and visible.
- analytics.test.ts: re-wrap two long lines per prettier (mechanical formatting only, no test behaviour changes). - analytics-echo-server.mjs: auto-fix curly, prefer-destructuring, prefer-template, hashbang, and other surface-level lint errors; inline-disable n/no-process-env for the documented PORT override; rename catch binding err -> parseError (id-denylist). - CHANGELOGs: append #290 PR links to every Unreleased bullet under packages/analytics and packages/connect-multichain (required by the check_changelog workflow) and add an Unreleased block to playground/browser-playground for the Analytics test bench + echo server additions. No production code changes.
Collapse iterative-development bullets in the Unreleased sections of @metamask/connect-multichain, @metamask/analytics, and @metamask/browser-playground down to what an atomic reader needs: - connect-multichain: 1 Added + 4 Fixed -> 1 Added + 3 Fixed. Dropped the session_expired-removal bullet (no producer ever emitted it, so it's not a user-facing change worth documenting) and the standalone 4900 / transport_disconnect ordering bullet (covered by the classifier description in the Added entry). - analytics: trim the schema-mirror bullet, drop the verbose taxonomy enumeration (it lives in the producer changelog and the schema PR). - browser-playground: collapse the two test-bench bullets into one. No code or behaviour change.
Narrow this PR to just the `failure_reason` taxonomy + classifier work. The two `isRejectionError` correctness fixes (unwrap RPCInvokeMethodErr, drop 4100 from the rejection set, tighten the "user" message heuristic) move to #292 so the two pieces can be reviewed independently. isRejectionError is now byte-identical to main on this branch. The new wallet_unauthorized classifier bucket is forward-compatible dead code until #292 lands and stops routing 4100 to _rejected. Drop the corresponding tests: the isRejectionError describe block in analytics.test.ts and the "code 4001 wrapped in RPCInvokeMethodErr lands in _rejected" regression test in requestRouter.test.ts (both belong to #292's surface area). CHANGELOG: drop the three isRejectionError Fixed bullets — they're covered by #292's changelog. No behavior change vs main for isRejectionError.
Same dead-bucket cleanup we did for no_active_session and session_expired. The classifier had RPCHttpErr / RPCReadonlyRequestErr / RPCReadonlyResponseErr branches mapping to rpc_node_http_error / rpc_node_request_error / rpc_node_response_error, but no producer emits any of them — handleWithRpcNode in RequestRouter throws without ever calling analytics.track. They were unreachable from day one. Drop the three FailureReason union members, the three classifier branches, the three matching tests, and the three error-class imports. Also drop the "Not currently tracked" disclaimer block in the playground test bench (used to call them out as supported-but- not-wired) and update the analytics package schema description to swap the misleading `rpc_node_error` example for `wallet_unauthorized`. When a producer wires up handleWithRpcNode tracking, we will add the buckets back at the same time. Keeping them now just risks reviewers (or future us) thinking they fire and being confused when they never do.
…addEthereumChain recovery When `wallet_switchEthereumChain` throws "Unrecognized chain ID" and the caller supplied a `chainConfiguration`, the shim catches the error and transparently calls `#addEthereumChain` — from the dapp/user's perspective this is still one logical chain change with one outcome. The previous behaviour fired `mmconnect_wallet_action_failed` for the swallowed switch attempt before the recovery ran, producing four Mixpanel events (`switch _requested`, `switch _failed`, `add _requested`, `add _succeeded`) for what should be three. Hoist the recovery-eligibility check above the `_failed` track call so the event only fires when there's no recovery to attempt: - No `chainConfiguration` provided → switch `_failed` still fires (caller sees the rejection too). - Error is not "Unrecognized chain ID" → switch `_failed`/`_rejected` still fires. - Error is "Unrecognized chain ID" + `chainConfiguration` is set → fall through to `#addEthereumChain`, whose own `_succeeded` / `_failed` represents the whole flow. Net effect: exactly one `_failed` per logical chain change, regardless of which RPC method the wallet rejected. Tests: - New `analytics.track` spy on the switchChain test block (the existing tests only asserted network calls, not analytics surface). - Asserts the suppression on the recovery-success path (3 events, not 4). - Asserts `_failed` still fires when no chainConfiguration is provided. - Asserts `_failed`/`_rejected` still fires for non-"Unrecognized chain" errors with chainConfiguration. - New test: when recovery itself fails, suppression of the switch event is preserved and only the add's `_failed` fires (1 `_failed`, not 2). Mock core extended with `options.dapp`, `versions`, `transportType`, and `storage.getAnonId()` so `getWalletActionAnalyticsProperties` can run end-to-end instead of being swallowed by `#trackWalletAction*`'s defensive try/catch. Related: WAPI-1253. The closed #291 conflated this fix with a non-existent shim-vs-router duplication; this PR addresses only the real intra-shim double-fire.
wenfix
previously approved these changes
May 13, 2026
Base automatically changed from
wapi-1253-failure-reason-and-tracking-fixes
to
main
May 14, 2026 20:30
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stacked on top of #290. Targets
wapi-1253-failure-reason-and-tracking-fixes; once #290 lands GitHub will retarget the base tomainautomatically and the diff will narrow to just the three files in this PR.Summary
On the
switchChain → "Unrecognized chain ID" → addEthereumChainrecovery path, the EVM shim was firing four Mixpanel events for what is logically one chain change:The middle
_failedis misleading. The dapp calledswitchChain(chainId, chainConfiguration); the wallet didn't know the chain; the shim's recovery added it and switched; the dapp's promise resolves successfully. The user sees one chain change. Mixpanel saw a failure that never reached the dapp.This PR hoists the recovery-eligibility check above the
_failedtrack call so the failed event only fires when there's no recovery to attempt. Post-merge, the same successful-recovery flow emits three events:Behavioural matrix
switch _requested,switch _succeededchainConfigurationswitch _requested,switch _failedchainConfigurationswitch _requested,switch _failed/_rejectedchainConfiguration, recovery succeeds_requested, switch_failed, add_requested, add_succeeded_requested, add_requested, add_succeededchainConfiguration, recovery itself fails_requested, switch_failed, add_requested, add_failed_requested, add_requested, add_failedThe suppression preserves the "exactly one
_failedper logical chain change" invariant in the unhappy path too — the add's own_failedrepresents the failure of the whole flow.Why not just remove the shim tracking entirely?
That was the framing of the closed #291, which claimed the multichain
RequestRouterwas tracking these methods at the wallet boundary so the shim was duplicating it. Source review showed that's not true —MetamaskConnectEVM.switchChaincallsthis.#request(...)→transport.sendEip1193Message(...), which does a rawpostMessage(default transport) /dappClient.sendRequest({ name: 'metamask-provider', ... })(MWP) directly. Those paths bypassRequestRouter.invokeMethod/wallet_invokeMethodand so bypass#withAnalyticsTrackingentirely. The shim is the only analytics layer for the four EIP-1193 intercepted methods (wallet_switchEthereumChain,wallet_addEthereumChain,eth_requestAccounts/wallet_requestPermissions). #291 was closed for this reason; this PR is the correct, scoped fix for the one real intra-shim double-fire.Recovering the lost signal
The pre-suppression
switch _failedevent was the only signal that told us "wallet X didn't know chain Y." Post-suppression this is still recoverable via a Mixpanel query: anywallet_addEthereumChain _requestedevent for which the same session has no preceding dapp-initiatedaddEthereumChaincall is, by construction, a chain-switch-recovery attempt. Captured as backlog item (b) in thefailure_reasonfollow-up plan.Test plan
New test coverage in
packages/connect-evm/src/connect.test.ts(describe('switchChain', ...)):falls back to wallet_addEthereumChain when wallet_switchEthereumChain fails with "Unrecognized chain ID" and chainConfiguration is provided— extended to assert event names match[switch _requested, add _requested, add _succeeded](no_failed).still emits _failed for the switch when recovery is impossible (no chainConfiguration provided)— renamed from the old title; asserts[switch _requested, switch _failed].rethrows non "Unrecognized chain ID" errors without falling back, even when chainConfiguration is provided— extended to assert one of_failed/_rejectedstill fires.emits _failed for the add when recovery itself fails (suppression only applies on successful recovery)— asserts the unhappy-recovery path emits exactly one_failed(the add's), not two.Test scaffolding:
mockCoreextended withoptions.dapp,versions,transportType, andstorage.getAnonId()sogetWalletActionAnalyticsPropertiesruns end-to-end instead of being swallowed by#trackWalletAction*'s defensive try/catch. Confirmed by running the failing tests against the un-suppressed code — the first round of failures showed empty event arrays (analytics path silently throwing); after the mock fix, the failures show the actualswitch _failedbetween switch and add, exactly the bug.vi.spyOn(analytics, 'track')set up inbeforeEachafter the session-establishment events fire so test assertions only see calls made during the test body.Validation:
yarn workspace @metamask/connect-evm test --run— 64 passed.yarn workspace @metamask/connect-multichain test --run— 354 passed, 4 skipped (no regression from the mock change).yarn build— all 11 tasks succeed.yarn lint:eslint— clean.yarn changelog:validate— clean.Related
failure_reasonto_failedevents + fixisRejectionError#290 —failure_reasonplumbing + classifier hardening. This PR is stacked on it.isRejectionErrorcorrectness fixes (unwrap, drop 4100, tighten "user" heuristic) #292 —isRejectionErrorcorrectness fixes (independent).error_code+error_message_samplecompanions (stacked on feat(analytics): attachfailure_reasonto_failedevents + fixisRejectionError#290).