v2: arm64 codegen perf + transformer/cleanc micro-opts by medvednikov · Pull Request #27258 · vlang/v

medvednikov · 2026-05-25T13:07:40Z

Summary

Speeds up the v2 ARM64 backend by ~10x on self-compile and trims hot loops in the transformer and cleanc backend.

ARM64 codegen

O(N²) → O(N) pending-label scan (e0e5c7a): the branch-fixup loop in gen_func scanned the full pending_label_blks list once per block. Replaced with a per-block linked list (pending_head[blk] + pending_next[i]), backed by a sparse []int (avoids map ops and a v2-self-compile hang seen with the map variant). ARM64 Gen on cmd/v2/v2.v: cleanc-built v2 ~25s → ~3.3s; arm64-built v3 ~36s → ~6.3s. Chain still converges (v4 = v5 except filename byte).
Inner-loop micro-opts in arm64.v (selected_opcode, reg_map single lookup, etc.) — ~6x on top of the original baseline.
Earlier pre-pass restructuring that took the hottest path from ~17s → ~1s.

Transformer

Cache is_native_be on the Transformer struct so hot loops (transform_stmts, IfGuardExpr handling, OrExpr expansion) skip per-call t.pref nil/enum checks.
Lazy smartcast snapshot in transform_stmts: most blocks enter with an empty smartcast stack, so skip the .clone() of stack + counts map when there is nothing to restore.

cleanc backend

gen_direct_fn_pointer_call: when a fnptr expects a pointer param and the argument is already a pointer (or a mutable param), don't re-& it.
Per-stage timing stats (mark_cgen_step) inside gen_passes_1_to_4 for -stats: surfaces setup.generic_struct_bindings, setup.discover_generic_specs, setup.fn_signatures, etc. — no runtime cost when -stats is off.

Test plan

ARM64 self-host chain still converges: v -o cmd/v2/v2 cmd/v2/v2.v && cmd/v2/v2 --no-parallel -backend arm64 -o v3 cmd/v2/v2.v && ./v3 --no-parallel -gc none -backend arm64 -o v4 cmd/v2/v2.v — v4 = v5 modulo filename byte.
ARM64 test suites pass (string 109/109, array 131/131, map 54/54, math, int, hex, sorting, etc.).
CI passes on master.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4078333289

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-25T13:16:57Z

 fn (mut g Gen) record_pending_label(blk int) {
 	off := g.macho.text_data.len - g.curr_offset
+	new_idx := g.pending_label_offs.len
+	prev_head := g.pending_head[blk]


Validate block index before queuing pending labels

record_pending_label now dereferences g.pending_head[blk] directly, but several branch lowering paths can pass through unresolved block IDs (e.g. values that map to -1 in val_to_block) into this helper. Before this change those cases were just queued as unresolved; now they can trigger an out-of-bounds panic and hard-crash codegen. Please guard blk bounds (or keep a safe fallback path) before indexing pending_head.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c776b4816d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-25T15:14:07Z

+		is_native_be:                p != unsafe { nil }
+			&& (p.backend == .arm64 || p.backend == .x64)


Recompute native-backend state when preferences change

Do not treat is_native_be as immutable state: pref.backend is mutable after transformer creation (for example, tests set t.pref.backend = .arm64 on an existing transformer), but this cache is only initialized once here and then used in many lowering paths. When the backend is changed later, native-only rewrites are skipped (or C-specific rewrites are applied), which can produce incorrect transformed AST/code for that backend. Compute this from t.pref.backend at use sites or update the cache whenever pref.backend changes.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 749669eeaa

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-25T15:24:27Z

+	t2 := time.now()
 	g.gen_post_pass()
+	post_ms := f64(time.since(t2)) / f64(time.millisecond)
+	eprintln('ARM64 gen sub: pre=${pre_ms:.1}ms funcs=${funcs_ms:.1}ms post=${post_ms:.1}ms')


Gate ARM64 timing logs behind an opt-in flag

gen() now unconditionally writes ARM64 gen sub: ... to stderr on every successful ARM64 compilation. This changes normal compiler output in non-debug builds and can break callers/tests that treat any stderr text as a failure signal or rely on stable diagnostic streams; the timing print should be behind -stats or an env flag (like the detailed timer line below it) instead of always-on.

Useful? React with 👍 / 👎.

Linear-time pending_label scan via per-block linked list, selected_opcode and reg_map micro-opts, assorted arm64 codegen speedups. Squashed from a series of wip commits.

Replaces `saved := t.pending_stmts.clone(); t.pending_stmts.clear()` with `saved := t.pending_stmts; t.pending_stmts = []ast.Stmt{}` across 19 sites in expr.v, for.v, and transformer.v. The original pattern allocated a deep copy on every save and again on every restore; the swap pattern transfers ownership without copying buffer contents.

Skipping mem2reg + phi elimination by default cuts peak memory significantly and makes compile times shorter. -prod restores the previous optimize-by-default behavior alongside -O3 -flto for the C compiler.

Set V2_MEM=1 to print resident memory at parse / type check / transform / mark used / codegen boundaries. Used to pinpoint phase-level allocation hotspots when chasing v1 memory parity.

Initial flat AST representation: contiguous nodes/edges/strings arenas, FlatBuilder for streaming legacy AST in, FlatReader for rehydration, basic tests.

walk_stmt/walk_expr/walk_type were adding sizeof(Stmt|Expr|Type) at the top of each function while also charging the same 16-byte sumtype slot via add_array_storage at the parent, and never counted the heap-allocated variant payload (FnDecl, CallExpr, ArrayType, ...). Result: legacy memory estimate was undercounted by ~5x and meaningless to compare against flat. Introduce add_variant_payload(size) helper that bumps node_bytes and a new allocs counter together, call it from each variant arm with the real variant struct's sizeof, and strip the top-level sizeof from plain-struct walkers (their storage is already counted at parent/array level). Add LegacyAstStats.allocs and surface it in print_flat_ast_summary: full v2 source now reports 1.15M legacy heap allocations collapsing to 4 flat arenas - the dominant memory win is fewer GC-tracked objects, not raw bytes (still ~2.8% reduction). Round-trip tests (11/11) still pass.

The parser emits ast.empty_expr / ast.empty_stmt (always EmptyExpr(0) / EmptyStmt(0)) anywhere an Expr/Stmt slot is structurally required but absent (for-loop init/cond/post, optional clause arms, ...). The flat converter was minting a fresh 32-byte node for every occurrence: 65,897 expr_empty + 3,032 stmt_empty cells on a full v2 self-build. Cache one singleton per kind in FlatBuilder (same pattern as empty_list_id) and reuse it whenever the source value is the canonical zero. Non-zero u8 payloads still emit fresh nodes for losslessness (none exist in current parser output, but the converter must not silently drop them). Result on cmd/v2/v2.v clean build: - nodes: 854,818 → 785,889 (-68,929, -8.1%) - memory: 31.6 MB → 29.4 MB (-2.2 MB, 9.65% vs legacy AST) - allocs: still 4 arenas Also add FlatAst.count_nodes_by_kind() and a V2_FLAT_HIST=1-gated per-kind histogram in print_flat_ast_summary for future schema work (used to find these singletons in the first place). Round-trip tests (11/11) still pass — signature is unchanged because the singleton has the same extra=0/pos=0 as every original instance.

…lder Set up the public API surface the type checker, transformer, and SSA builder will need to consume FlatAst directly. The recursive AST stays in place for now — downstream consumers haven't been migrated yet — but this commit lands two things that gate every follow-up step: 1. ast.FlatBuilder is now public, with `new_flat_builder()` and a new `append_file(File) FlatNodeId` shim. The legacy converters (add_expr/add_stmt/...) stay internal; front-ends must go through `append_file` during the transition. 2. Parser.parse_files_to_flat(files, fs) ast.FlatAst is the streaming entry point: parse one file, convert, drop the legacy struct, loop. Peak resident AST is one legacy file plus the cumulative flat, instead of the full legacy forest plus the flat copy that `ast.flatten_files(p.parse_files(...))` produces today. flat_streaming_test.v asserts byte-for-byte signature equality between the streamed FlatAst and the two-step `flatten_files(parse_files())` output across a synthetic single-file source, a two-file source (to exercise per-file intern reuse and node-id ordering), and the real vlib/v2/ast/ast.v. All 11 existing round-trip tests still pass. Builder still calls parse_files() — wiring it through to the streaming path comes after the type checker can read FlatAst.

…TRIP) Adds end-to-end validation of the streaming flat AST + lossless roundtrip on the full v2 self-host. With V2_FLAT_ROUNDTRIP=1 the builder routes every parser.parse_files() through parse_files_to_flat + FlatAst.to_files(), producing byte-identical C output (281,705 lines) vs the direct path. - ast.arena_caps_for_bytes(): shared pre-sizing heuristic (~150 nodes/KB, ~170 edges/KB, ~5 strings/KB) used by both flatten_files and the streaming parse path. Eliminates ~10 geometric reallocs on self-host. - parse_files_to_flat: uses the shared helper. - Builder.parse_batch(): wrapper routing parser calls through direct or streaming+roundtrip; controlled by flat_roundtrip_enabled (env-gated). - Warn when V2_FLAT_ROUNDTRIP=1 is set alongside parallel parse (the parallel path doesn't honor the switch yet). Phase 2 invariant validated: streaming parse + lossless conversion. Next step (consumer migration) will let us drop the legacy AST entirely.

Forward declaration of the type checker's FlatAst-aware API surface. The current implementation rehydrates via FlatAst.to_files() then calls check_files; future PRs progressively replace the body with direct flat reads (selector_names first, then file metadata, then statements) so the legacy AST need not be materialized. Includes a regression test that builds both an env-via-files and an env-via-flat for the same source and verifies registered functions and methods match. Guards the eventual direct-FlatAst implementation against behavioral drift.

First real consumer-side replacement: Checker.check_flat now reads the selector_names map directly off FlatFile entries instead of letting the rehydrated legacy ast.File supply them. The rest of check_files's body is inlined into check_flat (rehydrating once into a local) so each migrated step can be pulled out of the to_files() path incrementally. Builder gets a V2_CHECK_FLAT=1 env switch that routes type_check_files through Checker.check_flat. End-to-end validation: V2_CHECK_FLAT=1 (and V2_FLAT_ROUNDTRIP=1 + V2_CHECK_FLAT=1 combined) produce byte-identical C output on the v2 self-host.

Adds public FlatAst.read_file_imports / read_file_stmts / file_mod helpers and migrates register_imported_symbols to a _from_flat variant that drives off them, removing one more dependency on the rehydrated []ast.File path in check_flat. C output remains byte-identical with V2_CHECK_FLAT=1 vs baseline.

Adds preregister_all_scopes_from_flat and preregister_scopes_from_flat which use FlatAst.file_mod + active_file_imports_from_flat directly, removing one more legacy []ast.File hop in check_flat. C output remains byte-identical with V2_CHECK_FLAT=1 vs baseline.

Adds preregister_all_types_from_flat and preregister_all_fn_signatures_from_flat that drive their per-file loops off FlatAst.file_mod + read_file_stmts. check_flat now only rehydrates []ast.File for the final per-file check_file pass; everything before that runs straight off the flat representation. C output remains byte-identical with V2_CHECK_FLAT=1.

Adds check_file_from_flat, check_struct_field_defaults_from_flat, and check_enum_field_values_from_flat. check_flat now drives every step directly off FlatAst; the only remaining to_files() call is gated behind \$if ownership ? for the lifetime/escape validators that still walk recursive ast.File. C output remains byte-identical with V2_CHECK_FLAT=1 vs baseline.

Adds b.flat populated once after parse when V2_CHECK_FLAT=1, and switches type_check_files to consume it directly instead of re-running flatten_files on every checker entry. Sets up the persistent FlatAst surface the transformer and SSA builders will read from next. C output remains byte-identical with V2_CHECK_FLAT=1 vs baseline.

When V2_CHECK_FLAT=1 the builder now owns a persistent ast.FlatBuilder that every parse_batch streams into via the new Parser.parse_files_into_flat helper. Per batch we rehydrate just the freshly-appended FlatFiles through the new FlatAst.to_files_range helper, so legacy downstream consumers still see []ast.File while b.flat ends up populated as a side effect of parsing. The post-parse flatten_files() pass is replaced with a direct b.flat_builder.flat handoff; the parallel parse path keeps the legacy fallback since it bypasses parse_batch. C output remains byte-identical with V2_CHECK_FLAT=1 vs baseline.

`check_flat` no longer rehydrates the file slice for the `$if ownership ?` gate. Adds `lifetime_validate_files_from_flat` and `escape_validate_files_from_flat` (plus a flat passthrough prescan) so lifetime and escape validation walks `flat.read_file_stmts(ff)` directly. Drops the last `flat.to_files()` call out of the checker entry point.

`parse_files` now resolves the core + expanded-user path list upfront and hands it to `init_flat_builder_for_paths`, which stats the union and calls `arena_caps_for_bytes(total * 2)` to seed the flat arenas before the first `parse_batch`. The 2x scale factor reserves room for the import walk, which is the only batch we cannot enumerate ahead of time. Closes most of the realloc churn the one-shot `flatten_files` was side-stepping. The lazy `ensure_flat_builder_inited` fallback stays in place for entry points that bypass `parse_files` (e.g. tests).

Adds a FlatAst-driven entry point to the sequential transformer: - pre_pass_from_flat(flat) mirrors pre_pass; the elided_fns walk, collect_generic_fn_value_refs, and collect_runtime_const_inits all grow _from_flat variants that iterate flat.files and pull stmts via flat.read_file_stmts(ff). - transform_files_from_flat(flat, files) wraps that pre-pass with the existing per-file transform_file loop and post_pass — the rehydrated file slice is still consumed by transform_file itself, so this is purely a source-of-truth change for the accumulators. - Builder routes the serial transform through the new entry when flat_check_enabled; parallel transform is unchanged. Validated byte-identical C output (V2_CHECK_FLAT=1, --no-parallel) against the legacy path; the existing transformer/flat tests still pass.

Routes the parallel transformer's pre-pass through pre_pass_from_flat when flat_check_enabled. The per-file workers and the post-pass still walk rehydrated ast.File values; only the accumulator pass changes its source of truth here. Both parallel and serial transform paths now share the same FlatAst- seeded pre-pass, matching the checker's wiring.

…rial Two pairs of files defined duplicate symbols under `-d parallel`: - builder/type_check_parallel.v vs type_check_d_parallel.v - util/worker_pool.v vs worker_pool_d_parallel.v V's file filter only excludes the always-included variant when it ends in `_notd_<flag>.v`. Renaming unblocks the `-d parallel` build (compile-time duplicate-method/struct errors are gone). Also make the real parallel type-check honor V2_CHECK_FLAT=1 by delegating to the serial flat path; a parallel flat pipeline is future work.

The flat-aware transform entry point used to take both `&FlatAst` and `[]ast.File` and only used the latter. Now it materializes the per-file input via `flat.to_files()` itself, so the transformer no longer reads `b.files` on this path. A precondition for eventually skipping `b.files` between parse and transform when V2_CHECK_FLAT=1 is on.

When V2_CHECK_FLAT=1, materialize the per-file input via flat.to_files() once before chunking, so the parallel worker slices no longer alias b.files. Both transform paths (serial and parallel) now read their input from flat in flat mode — b.files is only fed in on the legacy path.

The `files []ast.File` argument was already ignored (`_ = files`) in both constructors. Removing it makes the API match what the transformer actually consumes — files come in via transform_files / transform_files_from_flat or are pulled per-worker through new_worker_clone, not the constructor. Builder and test call sites updated accordingly.

update_parse_summary_counts and count_parsed_v_lines walked b.files purely for file.name. When V2_CHECK_FLAT=1, pull the same names from b.flat.files via flat.file_name(ff) — no rehydration needed. print_flat_ast_summary also stops re-flattening b.files when b.flat is already populated.

The LambdaExpr arm in transform_expr_to_flat now direct-emits via a recursive transform_expr_to_flat for the body expression, a leaf out.emit_expr(ast.Expr(arg)) per arg Ident, and the new emit_lambda_expr_by_ids builder helper. Mirrors add_expr(LambdaExpr) encoding: edge[0] = body, edge[1..] = args. The transform_expr arm for LambdaExpr is a pure wrapper — args ([]Ident) are copied verbatim while only the body is transformed. Direct-emit mirrors that exactly: each arg goes through the leaf emit path (Idents are identity), the body goes through recursive transform-and-emit. Adds fixture_lambda_expr (apply(f fn (int) int, x int) higher-order fn + use_lambda() caller passing |y| y + 1) registered across all 5 harness invariants — 61 → 66 transformer-diff tests. First port pattern where the legacy arm has partially-transformed children — same shape as future FnLiteral (captured_vars verbatim, stmts transformed) and MatchExpr.branches ports.

The FnLiteral arm in transform_expr_to_flat now direct-emits via three leaf/transform child paths plus the new emit_fn_literal_by_ids builder helper: - typ (FnType): out.emit_type(ast.Type(expr.typ)) — verbatim leaf - captured_vars ([]Expr): out.emit_expr(cv) per element — verbatim leaf - stmts ([]Stmt): transform_stmts(expr.stmts) then out.emit_stmt(...) per result — still allocates a []Stmt slice; the transform_stmts_to_flat seam is the next big port target. Mirrors add_expr(FnLiteral) encoding exactly: edge[0]=type, edge[1..1+captured.len]=captured_vars, edge[1+captured.len..]=stmts; captured_vars.len in extra so the boundary is recoverable on decode. Skips the ast.FnLiteral wrapper struct allocation per occurrence. Adds fixture_fn_literal (make_doubler() returning fn(int) int + use_fn_literal() that binds a fn(int) int to a local and calls it) registered across all 5 harness invariants — 66 → 71 transformer-diff tests. First expression port where the legacy arm transforms a stmt list — the "call existing transform_stmts then emit per-result via out.emit_stmt" pattern lets us land per-arm wins without touching the body driver.

The PostfixExpr arm in transform_expr_to_flat now direct-emits via a recursive transform_expr_to_flat for the inner expression plus the new emit_postfix_expr_by_id builder helper. Skips the ast.PostfixExpr struct allocation for plain `++` / `--` (.inc / .dec). Legacy fallback guard: expr.op == .not || expr.op == .question. - native backends keep the postfix node intact - non-native backends rewrite expr! / expr? into a CastExpr over the inner Result/Option base type - string-range check renames substr → substr_checked when inner is a string[range] expression - multiple fallbacks may produce a CastExpr, an Ident, or an unwrapped inner — none of which are bit-equal to a structural PostfixExpr rebuild Adds fixture_postfix_expr (mut-bound local + a++ a++ a-- + return) registered across all 5 harness invariants — 71 → 76 transformer-diff tests. With ParenExpr/PrefixExpr/PostfixExpr/ModifierExpr ported, all four unary-op wrapper variants are now direct-emit.

The CastExpr arm in transform_expr_to_flat now direct-emits via the leaf out.emit_expr(expr.typ) (type is NOT transformed — copied verbatim, mirroring transform_expr's CastExpr arm) plus a recursive transform_expr_to_flat for the inner expression plus the new emit_cast_expr_by_ids builder helper. Skips the ast.CastExpr struct allocation for the primitive-cast common case (int(x), f64(y), u8(z), etc.). Legacy fallback guard: when expr.typ resolves to a sum type via t.type_expr_name_full(expr.typ) + t.is_sum_type(name). transform_cast_expr rewrites sumtype casts into SumType{type_tag, payload_ptr} struct literal initializers (variant-tag lookup, payload allocation) — not a bit-equal structural rebuild. Falls back to out.emit_expr(t.transform_expr(expr)). Adds fixture_cast_expr (primitive-cast fn use_cast() exercising f64(3.5), int(a), u8(b + 1), int(c) + b) registered across all 5 invariants — 76 → 81 transformer-diff tests. Existing fixture_sumtype_is_as already exercises the sumtype-fallback path. First port to use read-only Transformer helpers (type_expr_name_full, is_sum_type) inside the arm to gate fast path vs legacy fallback — the shape future rewrite-aware ports (InfixExpr operator overload, CallExpr method-to-function, AsCastExpr smartcast snapshot) will follow.

The FieldInit arm in transform_expr_to_flat now direct-emits via a recursive transform_expr_to_flat for the inner value plus the existing emit_field_init_by_id builder helper. Skips the ast.FieldInit struct allocation per occurrence — transform_expr's FieldInit arm just transforms value and copies name verbatim. FieldInit reaches the expression dispatch when it appears as a standalone call arg in struct-shorthand / named-arg syntax (foo(name: value)). The same emit_field_init_by_id helper that ConstDecl uses for its aux_field_init child node covers the expression position too — add_expr(FieldInit) and add_field_init(field) both route through add_field_init so the encodings are bit-equal. Adds fixture_field_init (struct Cfg + a make(name: 'hello', n: 5) call) registered across all 5 invariants — 81 → 86 transformer-diff tests. First port where the helper was already available — pattern for future variants that share an encoding with an aux node (e.g. StructInit fields list, function-call FieldInit args).

The AsCastExpr arm in transform_expr_to_flat now direct-emits via a recursive transform_expr_to_flat for the inner expression plus the leaf out.emit_expr(expr.typ) (type is NOT transformed — copied verbatim) plus the new emit_as_cast_expr_by_ids builder helper. Skips the ast.AsCastExpr struct allocation per occurrence. First port that mutates Transformer state inside the arm. To keep the `as` cast intact (a same-expression smartcast would otherwise rewrite the operand into a direct sum payload access and hide the original storage type from the backend — wrong for nested sum types), the arm snapshots smartcast_stack + smartcast_expr_counts, drains matching smartcast entries via a bounded remove_smartcast_for_expr loop (limit smartcast_search_limit=128), recurses into the inner expression, then restores the snapshots. The legacy arm and direct-emit share the prologue/drain/restore verbatim — only the inner recursion target changes (flat vs legacy). Adds fixture_as_cast_expr (sum type Shape with Circle + Square variants, pick(s Shape, k int) f64 returning (s as Circle).r or (s as Square).side) registered across all 5 invariants — 86 → 91 transformer-diff tests. The shape for future state-mutating ports (Ident smartcast lookup, IfGuardExpr stmt synthesis, OrExpr prefix_stmts collection, UnsafeExpr stmt-list transform).

The UnsafeExpr arm in transform_expr_to_flat now direct-emits via two paths matching transform_expr's arm: 1. unsafe { nil } normalises to a plain nil Ident (detected via t.is_unsafe_nil_expr(expr)) — emitted as a leaf Ident through out.emit_expr(...). Identity in transform_expr's else case. 2. otherwise transform expr.stmts via transform_stmts and emit each result via out.emit_stmt(...), then assemble via the new emit_unsafe_expr_by_ids builder helper. Mirrors add_expr(UnsafeExpr) encoding exactly (edges = body stmts in order). Skips the ast.UnsafeExpr struct allocation per non-nil occurrence. The body stmt-list is still materialised because transform_stmts returns one — the stmt-list expansion seam is the next big port target. Adds fixture_unsafe_expr (use_unsafe() binding p := unsafe { nil }, q := unsafe { a := 1; a + 2 }, and comparing p == unsafe { nil }) registered across all 5 invariants — 91 → 96 transformer-diff tests. Second port (after FnLiteral session 9) that runs transform_stmts inside the arm. Both will share the transform_stmts_to_flat seam refactor when it lands.

The LockExpr arm in transform_expr_to_flat now rewrites `lock x { body }` / `rlock x { body }` into a sequence of mutex lock/unlock calls wrapped in an UnsafeExpr (compound expression). expand_lock_expr(expr) returns [lock_calls..., body..., unlock_calls...]; the legacy arm duplicates the last body stmt after the unlock calls so GCC compound-expression value semantics return the body's tail value instead of void (since the last stmt determines the compound's value). Direct-emit mirrors the rewrite exactly — same expand_lock_expr call, same duplicate-last fix-up, then emits each stmt via out.emit_stmt(...) and assembles via the existing emit_unsafe_expr_by_ids helper (the legacy LockExpr arm produces an UnsafeExpr so session-14's helper covers this directly). Skips the ast.UnsafeExpr wrapper struct allocation per occurrence. CRITICAL encoding detail: the legacy ast.UnsafeExpr{stmts: stmts} is constructed with no explicit pos (defaults to zero), so direct- emit must also pass token.Pos{} — passing expr.pos (the LockExpr's pos) would diverge from the legacy encoding. First port to import v2.token in flat_write.v for the zero-pos literal. Adds fixture_lock_expr (Counter{mut value int} struct + use_lock() with shared c and value-form x := lock c { c.value = 5; c.value }) registered across all 5 invariants — 96 → 101 transformer-diff tests. First port where the rewrite produces a different expression shape than the input (LockExpr → UnsafeExpr) and reuses an existing emit helper for the target shape rather than the input shape.

The OrExpr arm in transform_expr_to_flat now direct-emits the expression-context or-block (statement-level sites go through dedicated handlers and never reach this arm). expand_single_or_expr returns the data-access expression that stands in for the or-block value, and appends prefix stmts (typed temp init, optional-state branch). When prefix_stmts is non-empty the legacy arm wraps everything in an UnsafeExpr (GCC compound expression) — direct-emit mirrors that via the existing emit_unsafe_expr_by_ids helper (passing token.Pos{}). When empty, the result expression goes through out.emit_expr(...) (leaf). Skips the ast.UnsafeExpr wrapper allocation per occurrence in the compound-expression case. Adds fixture_or_expr (maybe(n) ?int + pair(a, b) + use_or_expr(n) returning pair(maybe(n) or { -1 }, 2) — or-block nested in call args) registered across all 5 invariants — 101 → 106 transformer-diff tests. Third shape-changing port reusing emit_unsafe_expr_by_ids (after LockExpr session 15). The "lower to UnsafeExpr-with-stmts" pattern is firmly established.

…rate arm)

… fixture

…xture

… fixture

…ture

The Ident arm in `transform_expr_to_flat` direct-emits via the leaf `out.emit_expr(ast.Expr(expr))` for the default path — Ident is identity in `transform_expr`'s arm (same shape returned, no allocation). Two state-dependent rewrite branches fall back to legacy: 1. `@VMODROOT` → `vmodroot_string_literal` produces a StringLiteral. 2. `find_smartcast_for_expr(expr.name)` hit → `apply_smartcast_direct_ctx` lowers to a sum payload access. Both gate checks are immutable lookups (`&Transformer` receiver), safe to probe upfront. First port where the saving is **dispatch-skip** rather than struct- allocation skip: Ident reaches `transform_expr_to_flat` recursively from every ported ancestor arm (ParenExpr, PrefixExpr, ModifierExpr, CastExpr, AsCastExpr, FieldInit, LambdaExpr, IndexExpr, SelectorExpr, ComptimeExpr default, InitExpr default fields, etc.) — porting skips the `transform_expr` function call + match dispatch per Ident reached via those ports. Adds `fixture_ident` covering file-scope Idents inside already-ported expr arms: `const neg_base = -base` (Ident inside PrefixExpr), `const wrapped_base = (base)` (Ident inside ParenExpr), `const inv_base = ~base` (Ident inside PrefixExpr `~`), plus mut-param + multiple Ident references in InfixExpr operands. 131 → 136 transformer-diff tests, all green; harness runs in 984ms.

…t + fixture

…ompleteness sweep)

…port

…tch-skip)

…rsive descent)

…port)

…tmt port)

…level identity)

…ss-arm routing)

…ia helper)

medvednikov · 2026-05-26T22:46:54Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 13a86143cc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-26T22:50:21Z

+	t2 := time.now()
 	g.gen_post_pass()
+	post_ms := f64(time.since(t2)) / f64(time.millisecond)
+	eprintln('ARM64 gen sub: pre=${pre_ms:.1}ms funcs=${funcs_ms:.1}ms post=${post_ms:.1}ms')


Gate ARM64 timing logs behind an opt-in flag

This now prints ARM64 gen sub: ... on every arm64 compilation, even when no debug/profiling flag is enabled. Unconditional stderr output from the compiler backend is a behavior regression that can break scripts/tests expecting clean tool output and adds per-compile formatting overhead in normal runs. The summary line should be behind the same env-gated path as detailed timing output (or a stats/debug flag).

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-26T22:50:21Z

+					if sym_obj := import_scope.lookup_parent(symbol.name, 0) {
+						file_scope.insert(symbol.name, sym_obj)
+					}


Preserve module-storage symbol filter in flat import binding

The flat-path importer inserts every looked-up symbol directly into file scope, but the legacy path explicitly skips Global symbols where is_module_storage() is true. With V2_CHECK_FLAT=1, import mod { name } can now bind module-storage globals that were intentionally excluded, changing name resolution semantics between flat and non-flat checking modes.

Useful? React with 👍 / 👎.

chatgpt-codex-connector Bot reviewed May 25, 2026

View reviewed changes

medvednikov added 27 commits May 26, 2026 01:58

v2: arm64 codegen perf push

100f127

Linear-time pending_label scan via per-block linked list, selected_opcode and reg_map micro-opts, assorted arm64 codegen speedups. Squashed from a series of wip commits.

v2: -O0 default, -prod enables SSA optimization

0b32e81

Skipping mem2reg + phi elimination by default cuts peak memory significantly and makes compile times shorter. -prod restores the previous optimize-by-default behavior alongside -O3 -flto for the C compiler.

v2: add V2_MEM env gate for per-phase RSS instrumentation

a22cf03

Set V2_MEM=1 to print resident memory at parse / type check / transform / mark used / codegen boundaries. Used to pinpoint phase-level allocation hotspots when chasing v1 memory parity.

v2: FlatAst foundation

fc53e6a

Initial flat AST representation: contiguous nodes/edges/strings arenas, FlatBuilder for streaming legacy AST in, FlatReader for rehydration, basic tests.

weird v2 fixes

211bee4

medvednikov added 27 commits May 26, 2026 20:30

v2: transformer: phase 4 session 17 — IfGuardExpr direct-emit (degene…

b773e34

…rate arm)

v2: transformer: phase 4 session 18 — SelectorExpr deep helper port +…

674a3f8

… fixture

v2: transformer: phase 4 session 19 — IndexExpr deep helper port + fi…

21b741e

…xture

v2: transformer: phase 4 session 20 — ComptimeExpr deep helper port +…

82fdd8c

… fixture

v2: transformer: phase 4 session 21 — InitExpr deep helper port + fix…

62c4fbe

…ture

v2: transformer: phase 4 session 22 — ReturnStmt port + fixture

2d7a459

v2: transformer: phase 4 session 24 — KeywordOperator expr direct-emi…

99d866d

…t + fixture

v2: transformer: phase 4 session 25 — GenericArgs expr direct-emit (c…

ce5251c

…ompleteness sweep)

v2: transformer: phase 4 session 26 — StringInterLiteral deep helper …

f42ae9f

…port

v2: transformer: phase 4 session 27 — ExprStmt port (stmt-level dispa…

5a29329

…tch-skip)

v2: transformer: phase 4 session 28 — LabelStmt port (stmt-level recu…

f3e34ef

…rsive descent)

v2: transformer: phase 4 session 29 — BlockStmt port (wrap-only stmt …

358669c

…port)

v2: transformer: phase 4 session 30 — DeferStmt port (wrap-only stmt …

b2afd98

…port)

v2: transformer: phase 4 session 31 — ComptimeStmt port (two-branch s…

c3d05cc

…tmt port)

v2: transformer: phase 4 session 32 — AssertStmt port (fallback stmt-…

e3cf157

…level identity)

v2: transformer: phase 4 session 33 — GenericArgOrIndexExpr port (cro…

c49cf1c

…ss-arm routing)

v2: transformer: phase 4 session 34 — AssocExpr port (always-lowers v…

13a8614

…ia helper)

medvednikov force-pushed the v2-arm64-perf-may26 branch from 749669e to 13a8614 Compare May 26, 2026 20:53

chatgpt-codex-connector Bot reviewed May 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

v2: arm64 codegen perf + transformer/cleanc micro-opts#27258

v2: arm64 codegen perf + transformer/cleanc micro-opts#27258
medvednikov wants to merge 113 commits into
masterfrom
v2-arm64-perf-may26

medvednikov commented May 25, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 25, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 25, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 25, 2026

Uh oh!

medvednikov commented May 26, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		is_native_be: p != unsafe { nil }
		&& (p.backend == .arm64 \|\| p.backend == .x64)

Uh oh!

Conversation

medvednikov commented May 25, 2026

Summary

ARM64 codegen

Transformer

cleanc backend

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

medvednikov commented May 26, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant