Skip to content

Unify instanceof with TypeTagJumpTable dispatch, fused IsType, InstanceOfRemoved diagnostic#3340

Merged
antoniosarosi merged 19 commits intocanaryfrom
unify-instanceof-with-typetag
Apr 10, 2026
Merged

Unify instanceof with TypeTagJumpTable dispatch, fused IsType, InstanceOfRemoved diagnostic#3340
antoniosarosi merged 19 commits intocanaryfrom
unify-instanceof-with-typetag

Conversation

@antoniosarosi
Copy link
Copy Markdown
Contributor

@antoniosarosi antoniosarosi commented Apr 7, 2026

Artifacts | Task

What problems was I solving

Class-type match arms compiled to sequential IsType chains (O(n) per arm), while enum variants already enjoyed O(1) jump table dispatch via Discriminant. The SwitchKind::TypeTag infrastructure was fully plumbed through MIR -> bytecode -> VM but sat dead behind #[allow(dead_code)]. Meanwhile, instanceof remained in the V2 surface language despite being removed from V1, silently lowering to Rvalue::IsType instead of producing the intended E0098 diagnostic. Type checks used two different multi-instruction patterns: LoadConst + CmpOp::InstanceOf for classes and TypeTag + LoadConst + CmpOp::Eq for primitives. Sparse switch dispatch used O(log K) binary search, which for K=8 arms emits ~4K instructions of comparisons and branching.

After this PR:

  • Class-type match/catch arms compile through TypeTag + Terminator::Switch -> analyze_switch picks JumpTable (dense, >=4 arms), PerfectHash (sparse, >=4), or IfElseChain (<4)
  • Sparse >=4-arm TypeTag switches use compile-time minimal perfect hashing (Instruction::DenseTag) for O(1) dispatch, replacing the O(log K) BinarySearch path. Memory scales with K (arms), not N (total classes)
  • A single fused Instruction::IsType(usize) replaces both multi-instruction type-check patterns, reducing per-arm instruction count from 2-3 to 1
  • instanceof keyword emits E0098 InstanceofRemoved diagnostic at CST->AST lowering
  • CmpOp::InstanceOf is deleted entirely from bytecode and VM

What user-facing changes did I ship

  • instanceof keyword now produces a helpful error: "instanceof is no longer supported. Use a match expression for type checking instead."
  • Class-type match/catch expressions with 4+ arms now dispatch via jump table (dense tags) or perfect hash + jump table (sparse tags) instead of sequential comparison chains
  • Type checks in sequential chains use a single is_type instruction instead of 2-3 instructions
  • No breaking changes for valid BAML code -- instanceof was already non-functional in V1 and all existing tests use match

How I implemented it

Six incremental commits, each independently testable, delivered in bottom-up order (MIR -> bytecode -> AST cleanup -> optimizations):

Phase 1: Activate SwitchKind::TypeTag (commit dc4a387)

  • lower.rs -- Core change. Removed #[allow(dead_code)] from SwitchKind::TypeTag and class_type_tags. Added type_tag_for_ty() helper. Extended try_lower_match_as_switch to recognize AstPattern::Binding (resolving to a class type) and AstPattern::TypedBinding patterns, routing them through SwitchKind::TypeTag. Added TypeTag switch-operand emission (Rvalue::TypeTag), arm name generation, and exhaustiveness handling.
  • lower.rs -- Added build_class_type_tags() method that assigns tags iterating compiler2_all_files in emitter order, ensuring MIR tags match runtime class.type_tag values.
  • match_types.rs -- Integration tests for JumpTable, IfElseChain, and mixed class+primitive dispatch.

Phase 2: Fused Instruction::IsType (commit 0f43b6e)

  • bytecode.rs -- Added Instruction::IsType(usize). Deleted CmpOp::InstanceOf from CmpOp enum and Display.
  • vm.rs -- Added IsType handler: dispatches on Value::Object for class identity or Value::Int for type tag comparison. Removed all six CmpOp::InstanceOf match arms.
  • emit.rs -- Rewrote is_type() to emit a single Instruction::IsType(const_idx) for both class and primitive paths.
  • debug.rs -- Display for IsType in expanded, textual, and color modes. Textual mode resolves constant metadata for human-readable output (is_type Cat instead of is_type ?2).

Phase 3: Remove instanceof keyword (commit d32147f)

  • lower_expr_body.rs -- KW_INSTANCEOF now emits LoweringDiagnostic::InstanceofRemoved and returns Expr::Missing.
  • lowering_diagnostic.rs -- Added InstanceofRemoved variant with E0098 error code.
  • ast.rs -- Deleted BinaryOp::Instanceof variant.
  • Removed Instanceof arms from builder.rs, lower.rs, from_ast.rs, compiler.rs.
  • instanceof.rs -- Tests verify E0098 diagnostic.

Phase 4: Perfect Hash + DenseTag for sparse switches (commits d365605, 98ae8f5, 119a7e9)

  • emit.rs -- Added SwitchStrategy::PerfectHash variant and find_perfect_hash(): brute-force search over multiply-shift hash family h(x) = ((x as u64).wrapping_mul(M) >> S) & mask to find collision-free mapping. For K <= 128 keys, tries table sizes next_power_of_two(K) then 2x, with M in 1..10_000 and S in 0..64. Completes in microseconds for typical arm counts.
  • emit.rs -- Added emit_switch_perfect_hash(): emits type_tag + DenseTag(table_idx) + JumpTable -- just 3 instructions for O(1) dispatch regardless of tag sparsity. The DenseTag instruction remaps sparse tags to dense [0, K-1] indices; unknown tags push -1 as sentinel for the jump table's default arm.
  • bytecode.rs -- Added Instruction::DenseTag(usize), MatchHashTable (multiply/shift/mask + entries), MatchHashEntry (expected_tag + dense_index). Bytecode gains match_hash_tables: Vec<MatchHashTable> field.
  • vm.rs -- DenseTag handler: computes h(tag), verifies entry.expected_tag == tag, pushes dense_index or -1 sentinel.
  • debug.rs -- Display for DenseTag: shows dense_tag [Cat, Dog, Bird, Fish] with key names from the hash table.

Phase 5: Catch arm TypeTag dispatch + threshold tuning (commit 119a7e9)

  • lower.rs -- Added try_lower_catch_as_switch(): catch arms with 4+ typed patterns now use TypeTag switch dispatch with throw_if_panic for wildcard arms and rethrow for default. Catch switches are never marked exhaustive (panics must always be rethrowable).
  • lower.rs -- Raised TypeTag switch threshold to 4 arms for both match and catch. Below 4 arms, the sequential is_type chain is more compact than the type_tag + copy/cmp if-else chain.
  • lower.rs -- Adopted canary's classify_pattern_type_tag() helper, updated to support class types via type_tag_for_ty(). Fixed otherwise-block lowering to use recomputed is_switch_exhaustive flag.
  • emit.rs -- Single-arm exhaustive switches skip the discriminant entirely (no type_tag + pop 1 for guaranteed matches).
  • exceptions.rs -- Tests: catch_four_typed_arms_jump_table, catch_four_typed_arms_plus_wildcard_jump_table, catch_two_typed_arms_sequential_chain, catch_mixed_literal_and_typed_no_switch, null routing, and canary's primitive TypeTag + instanceof chain tests.

Bug fixes along the way

  • Null literal routing (commit 98ae8f5): Null literal patterns in TypeTag switches now route through the tag path instead of bailing out, since null has a stable type tag (typetag::NULL).
  • Exhaustive match chain fix (commit 9c0bc05): Skip the last-arm pattern test in exhaustive match chains to avoid redundant comparison when the otherwise block is unreachable.
  • Single-arm exhaustive skip (commit d365605): Don't emit type_tag + pop 1 when a match has exactly one exhaustive arm.

Deviations from the plan

Implemented as planned

  • All four phases delivered: TypeTag activation, fused IsType, instanceof removal, perfect hash dispatch
  • Core architecture matches: type_tag_for_ty(), classify_pattern_type_tag(), try_lower_match_as_switch extension, IsType VM handler, emit rewrite, find_perfect_hash() with multiply-shift family, MatchHashTable/MatchHashEntry data structures

Deviations/surprises

  • MatchHash -> DenseTag: Plan specified Instruction::MatchHash(usize), implementation uses Instruction::DenseTag(usize). Naming only -- semantics identical.
  • 4-arm threshold: Plan assumed analyze_switch would implicitly choose the right strategy. Implementation added an explicit 4-arm minimum for TypeTag switches since the copy/cmp overhead of small switches makes sequential is_type chains more compact.
  • build_class_type_tags: Plan described inline population during package iteration. Implementation extracted a dedicated method iterating compiler2_all_files in emitter order -- required for correctness to ensure MIR tags match runtime values.
  • TypeTag exhaustiveness inference: For match + TypeTag with no wildcard, treat as exhaustive since TIR's required_match_cases returns None for class types. Not in plan but required for correctness.
  • Catch never exhaustive: Catch switches never marked exhaustive even when all declared thrown types are covered, because panics must always be rethrowable via the otherwise block.

Additions not in plan

  • Catch arm TypeTag dispatch (try_lower_catch_as_switch) -- the plan only covered match expressions
  • is_type operand metadata for human-readable debug output (is_type Cat not is_type ?2)
  • Single-arm exhaustive switch optimization (skip discriminant entirely)
  • Null literal routing through TypeTag path
  • Exhaustive match chain last-arm skip

Items planned but not implemented

  • None. All four phases fully implemented.

How to verify it

Setup

git fetch
git checkout unify-instanceof-with-typetag
cd baml_language

Automated Tests

cargo test -p baml_tests -p bex_vm -p bex_vm_types

Manual Verification

  • cargo test -p baml_tests -- match_class_type_tag_jump_table -- 4+ dense class arms use type_tag + jump_table
  • cargo test -p baml_tests -- match_class_type_is_type_chain -- <4 class arms use sequential is_type
  • cargo test -p baml_tests -- match_mixed_class_primitive -- mixed class + primitive TypeTag switch
  • cargo test -p baml_tests -- catch_four_typed_arms_jump_table -- catch TypeTag dispatch
  • cargo test -p baml_tests -- instanceof -- E0098 diagnostic emitted
  • Bytecode snapshots show is_type replacing load_const + cmp_op instanceof and type_tag + load_const + cmp_op ==
  • Bytecode snapshots show dense_tag + jump_table for sparse >=4-arm TypeTag switches

Description for the changelog

Unify type checking: activate TypeTag dispatch for class-type match/catch arms with jump table (dense) or perfect hash (sparse) for O(1) dispatch, replace multi-instruction type checks with fused IsType, and remove instanceof keyword with E0098 diagnostic.

Route class-typed and primitive-typed match arms through
Terminator::Switch → analyze_switch (JumpTable/BinarySearch/IfElseChain)
instead of sequential IsType chains.

- Remove dead_code attributes from SwitchKind::TypeTag and class_type_tags
- Add type_tag_for_ty helper for type tag lookup
- Extend try_lower_match_as_switch to recognize Binding and TypedBinding
  patterns resolving to class/primitive types
- Add SwitchKind::TypeTag arm in switch-operand and arm_names emission
- Fix class_type_tags ordering to match emitter iteration order
- Add snapshot tests for JumpTable, IfElseChain, and BinarySearch dispatch
Replace the two multi-instruction type-check patterns with a single
fused Instruction::IsType(usize):
- LoadConst + CmpOp::InstanceOf (classes) → IsType(const_idx)
- TypeTag + LoadConst + CmpOp::Eq (primitives) → IsType(const_idx)

The constant pool index resolves to either a class HeapPtr (identity
check) or an integer tag (type tag comparison). CmpOp::InstanceOf is
deleted entirely from bytecode and VM.
…nostic

Remove BinaryOp::Instanceof from AST, TIR, MIR, visualization, and
onionskin crates. Emit InstanceofRemoved (E0098) diagnostic at
CST→AST lowering with a helpful message pointing users to match
expressions. Lexer/parser/formatter kept for parse-level error recovery.
@vercel
Copy link
Copy Markdown

vercel bot commented Apr 7, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
beps Ready Ready Preview, Comment Apr 10, 2026 6:10pm
promptfiddle Ready Ready Preview, Comment Apr 10, 2026 6:10pm

Request Review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Removed the instanceof binary operator from the AST and lowering, added a InstanceofRemoved lowering diagnostic, introduced type-tag-based match lowering and an Instruction::IsType emission, and updated emitter, VM, bytecode, visualization, and tests to use is_type-based dispatch.

Changes

Cohort / File(s) Summary
AST & Lowering
baml_language/crates/baml_compiler2_ast/src/ast.rs, baml_language/crates/baml_compiler2_ast/src/lower_expr_body.rs, baml_language/crates/baml_compiler2_ast/src/lowering_diagnostic.rs
Removed BinaryOp::Instanceof and its Display mapping; encountering KW_INSTANCEOF now emits LoweringDiagnostic::InstanceofRemoved and returns Expr::Missing.
MIR / Lowering Context
baml_language/crates/baml_compiler2_mir/src/ir.rs, baml_language/crates/baml_compiler2_mir/src/lower.rs
Added centralized class type-tag building (build_class_type_tags) and type_tag_for_ty; removed MIR lowering for AstBinaryOp::Instanceof; implemented SwitchKind::TypeTag lowering using Rvalue::TypeTag and adjusted exhaustiveness/arm materialization.
Emitter / Codegen
baml_language/crates/baml_compiler2_emit/src/emit.rs
Optimized single-arm exhaustive switches; PullSink::is_type now emits Instruction::IsType(const) for known class/typetag constants, replacing load+cmp sequences.
VM & Bytecode
baml_language/crates/bex_vm_types/src/bytecode.rs, baml_language/crates/bex_vm/src/vm.rs, baml_language/crates/bex_vm/src/debug.rs
Added Instruction::IsType(usize) and removed CmpOp::InstanceOf; VM implements IsType (pop value → push Bool via class identity or typetag); debug/display updated.
TIR / Builder
baml_language/crates/baml_compiler2_tir/src/builder.rs
Adjusted diagnostic wording for union narrowing; removed Instanceof from binary-op result-typing paths.
Visualization & Tools
baml_language/crates/baml_compiler2_visualization/src/control_flow/from_ast.rs, baml_language/crates/tools_onionskin/src/compiler.rs
Removed rendering/mapping for BinaryOp::Instanceof in visualization and onionskin operator descriptions.
Tests & Snapshots
baml_language/crates/baml_tests/tests/instanceof.rs, baml_language/crates/baml_tests/tests/exceptions.rs, baml_language/crates/baml_tests/tests/match_basics.rs, baml_language/crates/baml_tests/tests/match_types.rs
Updated/renamed instanceof tests to assert the removal diagnostic; updated many bytecode snapshots to use is_type ?<n> and reflect jump-table vs linear dispatch changes for union type patterns.

Sequence Diagram(s)

sequenceDiagram
  participant Parser
  participant Lowerer as Lowerer (AST)
  participant MIR as MIR/LoweringContext
  participant Emitter as Emitter/Stackify
  participant VM

  Parser->>Lowerer: parse source (including `instanceof` / match type-patterns)
  Lowerer->>Lowerer: detect `KW_INSTANCEOF` -> emit InstanceofRemoved\nreturn Expr::Missing
  Note right of Lowerer: for match type-patterns, build class_type_tags and emit Rvalue::TypeTag
  Lowerer->>MIR: produce SwitchKind::TypeTag / Rvalue::TypeTag
  MIR->>Emitter: lower switch/arms, request type_tag constants
  Emitter->>Emitter: materialize discriminant or emit `IsType(const_idx)`
  Emitter->>VM: emit bytecode with `Instruction::IsType(idx)`
  VM->>VM: `IsType` pops value, compares by class identity or typetag, pushes Bool
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • aaronvg

"I hopped through trees of code today,
Found instanceof had gone away.
I left a note with carrot cheer,
'Use match!' I said, then thumped my ear. 🥕🐇"

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main changes: unifying instanceof with TypeTag dispatch, introducing the fused IsType instruction, and adding the InstanceofRemoved diagnostic.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch unify-instanceof-with-typetag

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2026

Binary size checks passed

7 passed

Artifact Platform Gzip Baseline Delta Status
bridge_cffi Linux 5.8 MB 5.7 MB +121.7 KB (+2.1%) OK
bridge_cffi-stripped Linux 5.8 MB 5.7 MB +98.6 KB (+1.7%) OK
bridge_cffi macOS 4.8 MB 4.6 MB +143.8 KB (+3.1%) OK
bridge_cffi-stripped macOS 4.8 MB 4.7 MB +86.7 KB (+1.9%) OK
bridge_cffi Windows 4.8 MB 4.6 MB +136.5 KB (+3.0%) OK
bridge_cffi-stripped Windows 4.8 MB 4.7 MB +89.5 KB (+1.9%) OK
bridge_wasm WASM 3.1 MB 3.0 MB +117.0 KB (+3.9%) OK

Generated by cargo size-gate · workflow run

coderabbitai[bot]
coderabbitai bot previously requested changes Apr 8, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6a7f463a-39ee-4efb-8547-6a4069ba0ac7

📥 Commits

Reviewing files that changed from the base of the PR and between 3aa278c and d32147f.

⛔ Files ignored due to path filters (13)
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__testing_std__/baml_tests____testing_std____06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/catch_all_keyword/baml_tests__catch_all_keyword__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/catch_throw/baml_tests__catch_throw__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/catch_throw_regressions/baml_tests__catch_throw_regressions__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/error_cases/baml_tests__error_cases__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/format_checks/baml_tests__format_checks__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/format_checks/baml_tests__format_checks__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/match_exhaustiveness/baml_tests__match_exhaustiveness__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/match_exhaustiveness/baml_tests__match_exhaustiveness__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded_unoptimized.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_textual.snap is excluded by !**/*.snap
📒 Files selected for processing (16)
  • baml_language/crates/baml_compiler2_ast/src/ast.rs
  • baml_language/crates/baml_compiler2_ast/src/lower_expr_body.rs
  • baml_language/crates/baml_compiler2_ast/src/lowering_diagnostic.rs
  • baml_language/crates/baml_compiler2_emit/src/emit.rs
  • baml_language/crates/baml_compiler2_mir/src/ir.rs
  • baml_language/crates/baml_compiler2_mir/src/lower.rs
  • baml_language/crates/baml_compiler2_tir/src/builder.rs
  • baml_language/crates/baml_compiler2_visualization/src/control_flow/from_ast.rs
  • baml_language/crates/baml_tests/tests/exceptions.rs
  • baml_language/crates/baml_tests/tests/instanceof.rs
  • baml_language/crates/baml_tests/tests/match_basics.rs
  • baml_language/crates/baml_tests/tests/match_types.rs
  • baml_language/crates/bex_vm/src/debug.rs
  • baml_language/crates/bex_vm/src/vm.rs
  • baml_language/crates/bex_vm_types/src/bytecode.rs
  • baml_language/crates/tools_onionskin/src/compiler.rs
💤 Files with no reviewable changes (3)
  • baml_language/crates/baml_compiler2_visualization/src/control_flow/from_ast.rs
  • baml_language/crates/baml_compiler2_ast/src/ast.rs
  • baml_language/crates/tools_onionskin/src/compiler.rs

Resolve conflicts between canary's error type refactor (VmError →
VmInternalError/VmBamlError/VmRustFnError) and this branch's removal of
CmpOp::InstanceOf in favor of fused Instruction::IsType.

Conflicts resolved:
- bex_vm/src/vm.rs: Dropped CmpOp::InstanceOf match arms that canary
  updated with new error types, since InstanceOf is removed from the enum.
- Bytecode snapshots: Regenerated to reflect both canary's new error
  classes (AllocFailure, etc.) and this branch's is_type instruction.
Avoids emitting wasteful type_tag + pop 1 when a match has exactly one
exhaustive arm—no comparison is needed so the discriminant is never pushed.
Null patterns in match arms now go through SwitchKind::TypeTag instead
of falling back to the chain path. Combined with the single-arm
exhaustive optimization, this eliminates redundant null comparisons.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@baml_language/crates/baml_compiler2_mir/src/lower.rs`:
- Around line 3619-3627: The code sets is_switch_exhaustive using
otherwise_idx.is_none() && (is_exhaustive || matches!(switch_kind,
Some(SwitchKind::TypeTag))), which incorrectly treats any TypeTag switch without
a wildcard as exhaustive; change the condition to only treat the switch as
exhaustive when TIR marked it exhaustive. Update the expression for
is_switch_exhaustive to otherwise_idx.is_none() && is_exhaustive (remove the
matches!(switch_kind, Some(SwitchKind::TypeTag)) clause) so only true exhaustive
switches (as indicated by is_exhaustive) skip the final arm comparison.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d3a966da-1fbe-4e3a-a425-f863d3bc9cdc

📥 Commits

Reviewing files that changed from the base of the PR and between d365605 and 98ae8f5.

⛔ Files ignored due to path filters (10)
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__testing_std__/baml_tests____testing_std____04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__testing_std__/baml_tests____testing_std____06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/lambda_advanced/baml_tests__lambda_advanced__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/match_exhaustiveness/baml_tests__match_exhaustiveness__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/match_exhaustiveness/baml_tests__match_exhaustiveness__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded_unoptimized.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_textual.snap is excluded by !**/*.snap
📒 Files selected for processing (2)
  • baml_language/crates/baml_compiler2_mir/src/lower.rs
  • baml_language/crates/baml_tests/tests/match_basics.rs

Revert null-to-TypeTag routing (it made 2-arm optional matches worse)
and instead teach lower_match_chain about exhaustiveness: when the match
is exhaustive and we reach the last arm with no guard, skip the pattern
test entirely since it must match.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
baml_language/crates/baml_compiler2_mir/src/lower.rs (1)

3596-3604: ⚠️ Potential issue | 🟠 Major

TypeTag switch is marked exhaustive without proof.

This can miscompile partial typed matches by turning the default path into Unreachable when is_exhaustive is false.

Safe minimal fix
-        let is_switch_exhaustive = otherwise_idx.is_none()
-            && (is_exhaustive || matches!(switch_kind, Some(SwitchKind::TypeTag)));
+        let is_switch_exhaustive = otherwise_idx.is_none() && is_exhaustive;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_compiler2_mir/src/lower.rs` around lines 3596 -
3604, The code wrongly treats TypeTag switches as exhaustive without proof;
update the is_switch_exhaustive computation (the binding named
is_switch_exhaustive in lower.rs) to require the original is_exhaustive flag
rather than special-casing SwitchKind::TypeTag — i.e., only set
is_switch_exhaustive when otherwise_idx.is_none() && is_exhaustive, removing the
matches!(..., SwitchKind::TypeTag) clause so partial typed matches don't turn
the default path into Unreachable.
🧹 Nitpick comments (1)
baml_language/crates/baml_compiler2_mir/src/lower.rs (1)

465-493: Avoid recomputing class type tags for every lowering context.

build_class_type_tags walks all files and is invoked in both constructors, which can add avoidable compile-time overhead. Consider memoizing this map behind a DB query and reusing it here.

Also applies to: 621-623, 792-794

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_compiler2_mir/src/lower.rs` around lines 465 - 493,
build_class_type_tags currently recomputes the full map by walking all files
(using compiler2_all_files, file_item_tree, file_package) each time it's called;
convert this to a memoized DB query so the map is computed once and cached and
callers (the constructors mentioned) call that query instead. Move the body of
build_class_type_tags into a new salsa-tracked/DB query (e.g.,
class_type_tags_query or Db::class_type_tags) and have build_class_type_tags
delegate to that query (or replace callers to call the new query directly);
ensure the query returns IndexMap<TypeName,i64> using the same logic
(module_path creation, TypeName construction, type_tag counter) so behavior is
unchanged but cached. Update the two constructors that currently invoke
build_class_type_tags to use the new DB query, and apply the same pattern to the
other similar sites called out (lines ~621-623 and ~792-794).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@baml_language/crates/baml_compiler2_mir/src/lower.rs`:
- Around line 3596-3604: The code wrongly treats TypeTag switches as exhaustive
without proof; update the is_switch_exhaustive computation (the binding named
is_switch_exhaustive in lower.rs) to require the original is_exhaustive flag
rather than special-casing SwitchKind::TypeTag — i.e., only set
is_switch_exhaustive when otherwise_idx.is_none() && is_exhaustive, removing the
matches!(..., SwitchKind::TypeTag) clause so partial typed matches don't turn
the default path into Unreachable.

---

Nitpick comments:
In `@baml_language/crates/baml_compiler2_mir/src/lower.rs`:
- Around line 465-493: build_class_type_tags currently recomputes the full map
by walking all files (using compiler2_all_files, file_item_tree, file_package)
each time it's called; convert this to a memoized DB query so the map is
computed once and cached and callers (the constructors mentioned) call that
query instead. Move the body of build_class_type_tags into a new
salsa-tracked/DB query (e.g., class_type_tags_query or Db::class_type_tags) and
have build_class_type_tags delegate to that query (or replace callers to call
the new query directly); ensure the query returns IndexMap<TypeName,i64> using
the same logic (module_path creation, TypeName construction, type_tag counter)
so behavior is unchanged but cached. Update the two constructors that currently
invoke build_class_type_tags to use the new DB query, and apply the same pattern
to the other similar sites called out (lines ~621-623 and ~792-794).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d414997a-d5f5-42ec-8c71-c9ccf9dfc588

📥 Commits

Reviewing files that changed from the base of the PR and between 98ae8f5 and 9c0bc05.

⛔ Files ignored due to path filters (11)
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__testing_std__/baml_tests____testing_std____06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/format_checks/baml_tests__format_checks__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/format_checks/baml_tests__format_checks__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/match_exhaustiveness/baml_tests__match_exhaustiveness__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/match_exhaustiveness/baml_tests__match_exhaustiveness__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/namespaces_type_resolution/baml_tests__namespaces_type_resolution__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/namespaces_type_resolution/baml_tests__namespaces_type_resolution__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded_unoptimized.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_textual.snap is excluded by !**/*.snap
📒 Files selected for processing (2)
  • baml_language/crates/baml_compiler2_mir/src/lower.rs
  • baml_language/crates/baml_tests/tests/match_basics.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • baml_language/crates/baml_tests/tests/match_basics.rs

…-arm threshold

- Add try_lower_catch_as_switch: catch arms with 4+ typed patterns now
  use type_tag + jump_table dispatch instead of sequential is_type chains.
  Handles throw_if_panic for wildcard arms and rethrow for default.

- Raise TypeTag switch threshold to 4 arms for both match and catch.
  For <4 arms the sequential is_type chain is more compact than the
  type_tag + copy/cmp_op if-else chain.

- Add operand metadata to IsType instructions so bytecode debug output
  shows `is_type Cat` instead of `is_type ?2`.

- Add tests: null in TypeTag switch, catch switch optimization (4-arm
  jump_table, wildcard+throw_if_panic, 2-arm sequential, mixed no-switch).
Resolves conflicts between TypeTag switch dispatch (this branch) and
bytecode optimization PR (canary):

- lower.rs: adopt canary's classify_pattern_type_tag/format_type_tag_name
  helpers, updated to support class types via type_tag_for_ty
- Fix exhaustive switch assertion: only infer TypeTag exhaustiveness for
  match expressions, not catch (any value can be thrown at runtime)
- Fix otherwise-block lowering to use recomputed is_switch_exhaustive
  rather than caller's original is_exhaustive flag
- exceptions.rs: merge both branch's new tests, update inline snapshots
  for canary's InitField optimization
- Accept snapshot updates from canary's empty-block elimination
…ceof tests

- Delete try_lower_catch_as_switch (~160 lines) — the unified
  try_lower_as_switch already handles catch via SwitchOtherwise::Catch
- Delete baml_tests/tests/instanceof.rs (custom test, replaced by snapshots)
- Update LSP tests to expect E0098 instanceof-removed diagnostic
- Add instanceof_removed insta snapshot project for diagnostic coverage
The if-else switch emitter copied the discriminant before each arm's
comparison so it survived for the next arm. On the last arm there is no
next arm, so the copy + two pops (match/fallthrough) were wasted.

New `emit_compare_and_branch_final` lets `cmp_op ==` consume the
discriminant directly, saving 3 instructions per non-exhaustive switch.

Also fixes stale `MatchHash` → `DenseTag` comment in bytecode.rs.
Instead of pushing the discriminant once and keeping it alive across
arms with copy/pop, each arm now re-loads from the operand. This
makes every arm self-contained and eliminates all copy+pop overhead
in the if-else chain (2 instructions saved per non-last arm).

The copy/pop pattern is retained only in emit_compare_and_branch for
the binary search strategy where the discriminant must persist across
the tree traversal.
Add JumpIfFalse (peek TOS, jump if false, no pop) and ShortCircuit
terminator to eliminate redundant store/load pairs from &&/|| evaluation.

The emitter produces the same pattern as the legacy compiler:
  &&: eval(lhs); JumpIfFalse(join); Pop(1); eval(rhs)
  ||: eval(lhs); JumpIfFalse(rhs); Jump(join); Pop(1); eval(rhs)

The destination local is classified as PhiLike via is_short_circuit_phi,
so the value stays on the operand stack with no store_var/load_var.

Total codegen: -24,459 bytes (-7.4%) vs canary across all snapshots.
The stack_carry simulation conservatively rejected all locals whose use
site was a ShortCircuit terminator, forcing redundant store_var/load_var
spills. Since ShortCircuit peeks the operand (value stays on TOS), a
PhiLike value already on the stack works correctly with JumpIfFalse.

Removes ~3.2KB of redundant bytecode from format_checks and 56 bytes
from null_handling.
Update inline snapshots for 3 ignored (compiler2 todo) assignment tests
to reflect the InitField optimization from e817f17. The runtime
assertions still fail due to unimplemented method-call-through-field
patterns — these tests remain #[ignore].
Convert §12/§13 catch tests from external file snapshots to inline
snapshots, consistent with every other test in exceptions.rs.
ShortCircuit is side-effect-free unlike Call/Await, so remove it from
the force-alive list in eliminate_dead_locals. When a ShortCircuit
destination is dead, replace the terminator with Goto to join so
dead-block elimination cleans up the eval_rhs block.

Also inline the last 2 external match_types snapshots and delete the
tests/snapshots/ directory.

null_handling codegen: -150 bytes vs canary (was +128 before).
@antoniosarosi antoniosarosi changed the title feat: unify instanceof with TypeTag — jump table dispatch, fused IsType, E0098 diagnostic Unify instanceof with TypeTag — jump table dispatch, fused IsType, E0098 diagnostic Apr 10, 2026
@antoniosarosi antoniosarosi changed the title Unify instanceof with TypeTag — jump table dispatch, fused IsType, E0098 diagnostic Unify instanceof with TypeTag — jump table dispatch, fused IsType, InstanceOfRemoved diagnostic Apr 10, 2026
@antoniosarosi antoniosarosi enabled auto-merge April 10, 2026 17:52
@antoniosarosi antoniosarosi changed the title Unify instanceof with TypeTag — jump table dispatch, fused IsType, InstanceOfRemoved diagnostic Unify instanceof with TypeTagJumpTable dispatch, fused IsType, InstanceOfRemoved diagnostic Apr 10, 2026
@antoniosarosi antoniosarosi added this pull request to the merge queue Apr 10, 2026
Merged via the queue into canary with commit d124833 Apr 10, 2026
41 checks passed
@antoniosarosi antoniosarosi deleted the unify-instanceof-with-typetag branch April 10, 2026 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant