v2: arm64 codegen perf + transformer/cleanc micro-opts#27258
v2: arm64 codegen perf + transformer/cleanc micro-opts#27258medvednikov wants to merge 113 commits into
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4078333289
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| fn (mut g Gen) record_pending_label(blk int) { | ||
| off := g.macho.text_data.len - g.curr_offset | ||
| new_idx := g.pending_label_offs.len | ||
| prev_head := g.pending_head[blk] |
There was a problem hiding this comment.
Validate block index before queuing pending labels
record_pending_label now dereferences g.pending_head[blk] directly, but several branch lowering paths can pass through unresolved block IDs (e.g. values that map to -1 in val_to_block) into this helper. Before this change those cases were just queued as unresolved; now they can trigger an out-of-bounds panic and hard-crash codegen. Please guard blk bounds (or keep a safe fallback path) before indexing pending_head.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c776b4816d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| is_native_be: p != unsafe { nil } | ||
| && (p.backend == .arm64 || p.backend == .x64) |
There was a problem hiding this comment.
Recompute native-backend state when preferences change
Do not treat is_native_be as immutable state: pref.backend is mutable after transformer creation (for example, tests set t.pref.backend = .arm64 on an existing transformer), but this cache is only initialized once here and then used in many lowering paths. When the backend is changed later, native-only rewrites are skipped (or C-specific rewrites are applied), which can produce incorrect transformed AST/code for that backend. Compute this from t.pref.backend at use sites or update the cache whenever pref.backend changes.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 749669eeaa
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| t2 := time.now() | ||
| g.gen_post_pass() | ||
| post_ms := f64(time.since(t2)) / f64(time.millisecond) | ||
| eprintln('ARM64 gen sub: pre=${pre_ms:.1}ms funcs=${funcs_ms:.1}ms post=${post_ms:.1}ms') |
There was a problem hiding this comment.
Gate ARM64 timing logs behind an opt-in flag
gen() now unconditionally writes ARM64 gen sub: ... to stderr on every successful ARM64 compilation. This changes normal compiler output in non-debug builds and can break callers/tests that treat any stderr text as a failure signal or rely on stable diagnostic streams; the timing print should be behind -stats or an env flag (like the detailed timer line below it) instead of always-on.
Useful? React with 👍 / 👎.
Linear-time pending_label scan via per-block linked list, selected_opcode and reg_map micro-opts, assorted arm64 codegen speedups. Squashed from a series of wip commits.
Replaces `saved := t.pending_stmts.clone(); t.pending_stmts.clear()` with
`saved := t.pending_stmts; t.pending_stmts = []ast.Stmt{}` across 19 sites
in expr.v, for.v, and transformer.v. The original pattern allocated a deep
copy on every save and again on every restore; the swap pattern transfers
ownership without copying buffer contents.
Skipping mem2reg + phi elimination by default cuts peak memory significantly and makes compile times shorter. -prod restores the previous optimize-by-default behavior alongside -O3 -flto for the C compiler.
Set V2_MEM=1 to print resident memory at parse / type check / transform / mark used / codegen boundaries. Used to pinpoint phase-level allocation hotspots when chasing v1 memory parity.
Initial flat AST representation: contiguous nodes/edges/strings arenas, FlatBuilder for streaming legacy AST in, FlatReader for rehydration, basic tests.
walk_stmt/walk_expr/walk_type were adding sizeof(Stmt|Expr|Type) at the top of each function while also charging the same 16-byte sumtype slot via add_array_storage at the parent, and never counted the heap-allocated variant payload (FnDecl, CallExpr, ArrayType, ...). Result: legacy memory estimate was undercounted by ~5x and meaningless to compare against flat. Introduce add_variant_payload(size) helper that bumps node_bytes and a new allocs counter together, call it from each variant arm with the real variant struct's sizeof, and strip the top-level sizeof from plain-struct walkers (their storage is already counted at parent/array level). Add LegacyAstStats.allocs and surface it in print_flat_ast_summary: full v2 source now reports 1.15M legacy heap allocations collapsing to 4 flat arenas - the dominant memory win is fewer GC-tracked objects, not raw bytes (still ~2.8% reduction). Round-trip tests (11/11) still pass.
The parser emits ast.empty_expr / ast.empty_stmt (always EmptyExpr(0) / EmptyStmt(0)) anywhere an Expr/Stmt slot is structurally required but absent (for-loop init/cond/post, optional clause arms, ...). The flat converter was minting a fresh 32-byte node for every occurrence: 65,897 expr_empty + 3,032 stmt_empty cells on a full v2 self-build. Cache one singleton per kind in FlatBuilder (same pattern as empty_list_id) and reuse it whenever the source value is the canonical zero. Non-zero u8 payloads still emit fresh nodes for losslessness (none exist in current parser output, but the converter must not silently drop them). Result on cmd/v2/v2.v clean build: - nodes: 854,818 → 785,889 (-68,929, -8.1%) - memory: 31.6 MB → 29.4 MB (-2.2 MB, 9.65% vs legacy AST) - allocs: still 4 arenas Also add FlatAst.count_nodes_by_kind() and a V2_FLAT_HIST=1-gated per-kind histogram in print_flat_ast_summary for future schema work (used to find these singletons in the first place). Round-trip tests (11/11) still pass — signature is unchanged because the singleton has the same extra=0/pos=0 as every original instance.
…lder
Set up the public API surface the type checker, transformer, and SSA
builder will need to consume FlatAst directly. The recursive AST stays
in place for now — downstream consumers haven't been migrated yet — but
this commit lands two things that gate every follow-up step:
1. ast.FlatBuilder is now public, with `new_flat_builder()` and a new
`append_file(File) FlatNodeId` shim. The legacy converters
(add_expr/add_stmt/...) stay internal; front-ends must go through
`append_file` during the transition.
2. Parser.parse_files_to_flat(files, fs) ast.FlatAst is the streaming
entry point: parse one file, convert, drop the legacy struct,
loop. Peak resident AST is one legacy file plus the cumulative
flat, instead of the full legacy forest plus the flat copy that
`ast.flatten_files(p.parse_files(...))` produces today.
flat_streaming_test.v asserts byte-for-byte signature equality between
the streamed FlatAst and the two-step `flatten_files(parse_files())`
output across a synthetic single-file source, a two-file source (to
exercise per-file intern reuse and node-id ordering), and the real
vlib/v2/ast/ast.v. All 11 existing round-trip tests still pass.
Builder still calls parse_files() — wiring it through to the streaming
path comes after the type checker can read FlatAst.
…TRIP) Adds end-to-end validation of the streaming flat AST + lossless roundtrip on the full v2 self-host. With V2_FLAT_ROUNDTRIP=1 the builder routes every parser.parse_files() through parse_files_to_flat + FlatAst.to_files(), producing byte-identical C output (281,705 lines) vs the direct path. - ast.arena_caps_for_bytes(): shared pre-sizing heuristic (~150 nodes/KB, ~170 edges/KB, ~5 strings/KB) used by both flatten_files and the streaming parse path. Eliminates ~10 geometric reallocs on self-host. - parse_files_to_flat: uses the shared helper. - Builder.parse_batch(): wrapper routing parser calls through direct or streaming+roundtrip; controlled by flat_roundtrip_enabled (env-gated). - Warn when V2_FLAT_ROUNDTRIP=1 is set alongside parallel parse (the parallel path doesn't honor the switch yet). Phase 2 invariant validated: streaming parse + lossless conversion. Next step (consumer migration) will let us drop the legacy AST entirely.
Forward declaration of the type checker's FlatAst-aware API surface. The current implementation rehydrates via FlatAst.to_files() then calls check_files; future PRs progressively replace the body with direct flat reads (selector_names first, then file metadata, then statements) so the legacy AST need not be materialized. Includes a regression test that builds both an env-via-files and an env-via-flat for the same source and verifies registered functions and methods match. Guards the eventual direct-FlatAst implementation against behavioral drift.
First real consumer-side replacement: Checker.check_flat now reads the selector_names map directly off FlatFile entries instead of letting the rehydrated legacy ast.File supply them. The rest of check_files's body is inlined into check_flat (rehydrating once into a local) so each migrated step can be pulled out of the to_files() path incrementally. Builder gets a V2_CHECK_FLAT=1 env switch that routes type_check_files through Checker.check_flat. End-to-end validation: V2_CHECK_FLAT=1 (and V2_FLAT_ROUNDTRIP=1 + V2_CHECK_FLAT=1 combined) produce byte-identical C output on the v2 self-host.
Adds public FlatAst.read_file_imports / read_file_stmts / file_mod helpers and migrates register_imported_symbols to a _from_flat variant that drives off them, removing one more dependency on the rehydrated []ast.File path in check_flat. C output remains byte-identical with V2_CHECK_FLAT=1 vs baseline.
Adds preregister_all_scopes_from_flat and preregister_scopes_from_flat which use FlatAst.file_mod + active_file_imports_from_flat directly, removing one more legacy []ast.File hop in check_flat. C output remains byte-identical with V2_CHECK_FLAT=1 vs baseline.
Adds preregister_all_types_from_flat and preregister_all_fn_signatures_from_flat that drive their per-file loops off FlatAst.file_mod + read_file_stmts. check_flat now only rehydrates []ast.File for the final per-file check_file pass; everything before that runs straight off the flat representation. C output remains byte-identical with V2_CHECK_FLAT=1.
Adds check_file_from_flat, check_struct_field_defaults_from_flat, and check_enum_field_values_from_flat. check_flat now drives every step directly off FlatAst; the only remaining to_files() call is gated behind \$if ownership ? for the lifetime/escape validators that still walk recursive ast.File. C output remains byte-identical with V2_CHECK_FLAT=1 vs baseline.
Adds b.flat populated once after parse when V2_CHECK_FLAT=1, and switches type_check_files to consume it directly instead of re-running flatten_files on every checker entry. Sets up the persistent FlatAst surface the transformer and SSA builders will read from next. C output remains byte-identical with V2_CHECK_FLAT=1 vs baseline.
When V2_CHECK_FLAT=1 the builder now owns a persistent ast.FlatBuilder that every parse_batch streams into via the new Parser.parse_files_into_flat helper. Per batch we rehydrate just the freshly-appended FlatFiles through the new FlatAst.to_files_range helper, so legacy downstream consumers still see []ast.File while b.flat ends up populated as a side effect of parsing. The post-parse flatten_files() pass is replaced with a direct b.flat_builder.flat handoff; the parallel parse path keeps the legacy fallback since it bypasses parse_batch. C output remains byte-identical with V2_CHECK_FLAT=1 vs baseline.
`check_flat` no longer rehydrates the file slice for the `$if ownership ?` gate. Adds `lifetime_validate_files_from_flat` and `escape_validate_files_from_flat` (plus a flat passthrough prescan) so lifetime and escape validation walks `flat.read_file_stmts(ff)` directly. Drops the last `flat.to_files()` call out of the checker entry point.
`parse_files` now resolves the core + expanded-user path list upfront and hands it to `init_flat_builder_for_paths`, which stats the union and calls `arena_caps_for_bytes(total * 2)` to seed the flat arenas before the first `parse_batch`. The 2x scale factor reserves room for the import walk, which is the only batch we cannot enumerate ahead of time. Closes most of the realloc churn the one-shot `flatten_files` was side-stepping. The lazy `ensure_flat_builder_inited` fallback stays in place for entry points that bypass `parse_files` (e.g. tests).
Adds a FlatAst-driven entry point to the sequential transformer:
- pre_pass_from_flat(flat) mirrors pre_pass; the elided_fns walk,
collect_generic_fn_value_refs, and collect_runtime_const_inits all
grow _from_flat variants that iterate flat.files and pull stmts
via flat.read_file_stmts(ff).
- transform_files_from_flat(flat, files) wraps that pre-pass with
the existing per-file transform_file loop and post_pass — the
rehydrated file slice is still consumed by transform_file itself,
so this is purely a source-of-truth change for the accumulators.
- Builder routes the serial transform through the new entry when
flat_check_enabled; parallel transform is unchanged.
Validated byte-identical C output (V2_CHECK_FLAT=1, --no-parallel)
against the legacy path; the existing transformer/flat tests still
pass.
Routes the parallel transformer's pre-pass through pre_pass_from_flat when flat_check_enabled. The per-file workers and the post-pass still walk rehydrated ast.File values; only the accumulator pass changes its source of truth here. Both parallel and serial transform paths now share the same FlatAst- seeded pre-pass, matching the checker's wiring.
…rial Two pairs of files defined duplicate symbols under `-d parallel`: - builder/type_check_parallel.v vs type_check_d_parallel.v - util/worker_pool.v vs worker_pool_d_parallel.v V's file filter only excludes the always-included variant when it ends in `_notd_<flag>.v`. Renaming unblocks the `-d parallel` build (compile-time duplicate-method/struct errors are gone). Also make the real parallel type-check honor V2_CHECK_FLAT=1 by delegating to the serial flat path; a parallel flat pipeline is future work.
The flat-aware transform entry point used to take both `&FlatAst` and `[]ast.File` and only used the latter. Now it materializes the per-file input via `flat.to_files()` itself, so the transformer no longer reads `b.files` on this path. A precondition for eventually skipping `b.files` between parse and transform when V2_CHECK_FLAT=1 is on.
When V2_CHECK_FLAT=1, materialize the per-file input via flat.to_files() once before chunking, so the parallel worker slices no longer alias b.files. Both transform paths (serial and parallel) now read their input from flat in flat mode — b.files is only fed in on the legacy path.
The `files []ast.File` argument was already ignored (`_ = files`) in both constructors. Removing it makes the API match what the transformer actually consumes — files come in via transform_files / transform_files_from_flat or are pulled per-worker through new_worker_clone, not the constructor. Builder and test call sites updated accordingly.
update_parse_summary_counts and count_parsed_v_lines walked b.files purely for file.name. When V2_CHECK_FLAT=1, pull the same names from b.flat.files via flat.file_name(ff) — no rehydration needed. print_flat_ast_summary also stops re-flattening b.files when b.flat is already populated.
The LambdaExpr arm in transform_expr_to_flat now direct-emits via a recursive transform_expr_to_flat for the body expression, a leaf out.emit_expr(ast.Expr(arg)) per arg Ident, and the new emit_lambda_expr_by_ids builder helper. Mirrors add_expr(LambdaExpr) encoding: edge[0] = body, edge[1..] = args. The transform_expr arm for LambdaExpr is a pure wrapper — args ([]Ident) are copied verbatim while only the body is transformed. Direct-emit mirrors that exactly: each arg goes through the leaf emit path (Idents are identity), the body goes through recursive transform-and-emit. Adds fixture_lambda_expr (apply(f fn (int) int, x int) higher-order fn + use_lambda() caller passing |y| y + 1) registered across all 5 harness invariants — 61 → 66 transformer-diff tests. First port pattern where the legacy arm has partially-transformed children — same shape as future FnLiteral (captured_vars verbatim, stmts transformed) and MatchExpr.branches ports.
The FnLiteral arm in transform_expr_to_flat now direct-emits via three leaf/transform child paths plus the new emit_fn_literal_by_ids builder helper: - typ (FnType): out.emit_type(ast.Type(expr.typ)) — verbatim leaf - captured_vars ([]Expr): out.emit_expr(cv) per element — verbatim leaf - stmts ([]Stmt): transform_stmts(expr.stmts) then out.emit_stmt(...) per result — still allocates a []Stmt slice; the transform_stmts_to_flat seam is the next big port target. Mirrors add_expr(FnLiteral) encoding exactly: edge[0]=type, edge[1..1+captured.len]=captured_vars, edge[1+captured.len..]=stmts; captured_vars.len in extra so the boundary is recoverable on decode. Skips the ast.FnLiteral wrapper struct allocation per occurrence. Adds fixture_fn_literal (make_doubler() returning fn(int) int + use_fn_literal() that binds a fn(int) int to a local and calls it) registered across all 5 harness invariants — 66 → 71 transformer-diff tests. First expression port where the legacy arm transforms a stmt list — the "call existing transform_stmts then emit per-result via out.emit_stmt" pattern lets us land per-arm wins without touching the body driver.
The PostfixExpr arm in transform_expr_to_flat now direct-emits via a recursive transform_expr_to_flat for the inner expression plus the new emit_postfix_expr_by_id builder helper. Skips the ast.PostfixExpr struct allocation for plain `++` / `--` (.inc / .dec). Legacy fallback guard: expr.op == .not || expr.op == .question. - native backends keep the postfix node intact - non-native backends rewrite expr! / expr? into a CastExpr over the inner Result/Option base type - string-range check renames substr → substr_checked when inner is a string[range] expression - multiple fallbacks may produce a CastExpr, an Ident, or an unwrapped inner — none of which are bit-equal to a structural PostfixExpr rebuild Adds fixture_postfix_expr (mut-bound local + a++ a++ a-- + return) registered across all 5 harness invariants — 71 → 76 transformer-diff tests. With ParenExpr/PrefixExpr/PostfixExpr/ModifierExpr ported, all four unary-op wrapper variants are now direct-emit.
The CastExpr arm in transform_expr_to_flat now direct-emits via the
leaf out.emit_expr(expr.typ) (type is NOT transformed — copied
verbatim, mirroring transform_expr's CastExpr arm) plus a recursive
transform_expr_to_flat for the inner expression plus the new
emit_cast_expr_by_ids builder helper. Skips the ast.CastExpr struct
allocation for the primitive-cast common case (int(x), f64(y),
u8(z), etc.).
Legacy fallback guard: when expr.typ resolves to a sum type via
t.type_expr_name_full(expr.typ) + t.is_sum_type(name).
transform_cast_expr rewrites sumtype casts into SumType{type_tag,
payload_ptr} struct literal initializers (variant-tag lookup,
payload allocation) — not a bit-equal structural rebuild. Falls
back to out.emit_expr(t.transform_expr(expr)).
Adds fixture_cast_expr (primitive-cast fn use_cast() exercising
f64(3.5), int(a), u8(b + 1), int(c) + b) registered across all 5
invariants — 76 → 81 transformer-diff tests. Existing
fixture_sumtype_is_as already exercises the sumtype-fallback path.
First port to use read-only Transformer helpers
(type_expr_name_full, is_sum_type) inside the arm to gate fast path
vs legacy fallback — the shape future rewrite-aware ports
(InfixExpr operator overload, CallExpr method-to-function,
AsCastExpr smartcast snapshot) will follow.
The FieldInit arm in transform_expr_to_flat now direct-emits via a recursive transform_expr_to_flat for the inner value plus the existing emit_field_init_by_id builder helper. Skips the ast.FieldInit struct allocation per occurrence — transform_expr's FieldInit arm just transforms value and copies name verbatim. FieldInit reaches the expression dispatch when it appears as a standalone call arg in struct-shorthand / named-arg syntax (foo(name: value)). The same emit_field_init_by_id helper that ConstDecl uses for its aux_field_init child node covers the expression position too — add_expr(FieldInit) and add_field_init(field) both route through add_field_init so the encodings are bit-equal. Adds fixture_field_init (struct Cfg + a make(name: 'hello', n: 5) call) registered across all 5 invariants — 81 → 86 transformer-diff tests. First port where the helper was already available — pattern for future variants that share an encoding with an aux node (e.g. StructInit fields list, function-call FieldInit args).
The AsCastExpr arm in transform_expr_to_flat now direct-emits via a recursive transform_expr_to_flat for the inner expression plus the leaf out.emit_expr(expr.typ) (type is NOT transformed — copied verbatim) plus the new emit_as_cast_expr_by_ids builder helper. Skips the ast.AsCastExpr struct allocation per occurrence. First port that mutates Transformer state inside the arm. To keep the `as` cast intact (a same-expression smartcast would otherwise rewrite the operand into a direct sum payload access and hide the original storage type from the backend — wrong for nested sum types), the arm snapshots smartcast_stack + smartcast_expr_counts, drains matching smartcast entries via a bounded remove_smartcast_for_expr loop (limit smartcast_search_limit=128), recurses into the inner expression, then restores the snapshots. The legacy arm and direct-emit share the prologue/drain/restore verbatim — only the inner recursion target changes (flat vs legacy). Adds fixture_as_cast_expr (sum type Shape with Circle + Square variants, pick(s Shape, k int) f64 returning (s as Circle).r or (s as Square).side) registered across all 5 invariants — 86 → 91 transformer-diff tests. The shape for future state-mutating ports (Ident smartcast lookup, IfGuardExpr stmt synthesis, OrExpr prefix_stmts collection, UnsafeExpr stmt-list transform).
The UnsafeExpr arm in transform_expr_to_flat now direct-emits via
two paths matching transform_expr's arm:
1. unsafe { nil } normalises to a plain nil Ident (detected via
t.is_unsafe_nil_expr(expr)) — emitted as a leaf Ident through
out.emit_expr(...). Identity in transform_expr's else case.
2. otherwise transform expr.stmts via transform_stmts and emit
each result via out.emit_stmt(...), then assemble via the new
emit_unsafe_expr_by_ids builder helper. Mirrors
add_expr(UnsafeExpr) encoding exactly (edges = body stmts in
order). Skips the ast.UnsafeExpr struct allocation per non-nil
occurrence.
The body stmt-list is still materialised because transform_stmts
returns one — the stmt-list expansion seam is the next big port
target.
Adds fixture_unsafe_expr (use_unsafe() binding p := unsafe { nil },
q := unsafe { a := 1; a + 2 }, and comparing p == unsafe { nil })
registered across all 5 invariants — 91 → 96 transformer-diff tests.
Second port (after FnLiteral session 9) that runs transform_stmts
inside the arm. Both will share the transform_stmts_to_flat seam
refactor when it lands.
The LockExpr arm in transform_expr_to_flat now rewrites
`lock x { body }` / `rlock x { body }` into a sequence of mutex
lock/unlock calls wrapped in an UnsafeExpr (compound expression).
expand_lock_expr(expr) returns [lock_calls..., body...,
unlock_calls...]; the legacy arm duplicates the last body stmt
after the unlock calls so GCC compound-expression value semantics
return the body's tail value instead of void (since the last stmt
determines the compound's value).
Direct-emit mirrors the rewrite exactly — same expand_lock_expr
call, same duplicate-last fix-up, then emits each stmt via
out.emit_stmt(...) and assembles via the existing
emit_unsafe_expr_by_ids helper (the legacy LockExpr arm produces
an UnsafeExpr so session-14's helper covers this directly). Skips
the ast.UnsafeExpr wrapper struct allocation per occurrence.
CRITICAL encoding detail: the legacy ast.UnsafeExpr{stmts: stmts}
is constructed with no explicit pos (defaults to zero), so direct-
emit must also pass token.Pos{} — passing expr.pos (the LockExpr's
pos) would diverge from the legacy encoding. First port to import
v2.token in flat_write.v for the zero-pos literal.
Adds fixture_lock_expr (Counter{mut value int} struct + use_lock()
with shared c and value-form x := lock c { c.value = 5; c.value })
registered across all 5 invariants — 96 → 101 transformer-diff
tests.
First port where the rewrite produces a different expression shape
than the input (LockExpr → UnsafeExpr) and reuses an existing emit
helper for the target shape rather than the input shape.
The OrExpr arm in transform_expr_to_flat now direct-emits the
expression-context or-block (statement-level sites go through
dedicated handlers and never reach this arm).
expand_single_or_expr returns the data-access expression that
stands in for the or-block value, and appends prefix stmts (typed
temp init, optional-state branch). When prefix_stmts is non-empty
the legacy arm wraps everything in an UnsafeExpr (GCC compound
expression) — direct-emit mirrors that via the existing
emit_unsafe_expr_by_ids helper (passing token.Pos{}). When empty,
the result expression goes through out.emit_expr(...) (leaf).
Skips the ast.UnsafeExpr wrapper allocation per occurrence in the
compound-expression case.
Adds fixture_or_expr (maybe(n) ?int + pair(a, b) + use_or_expr(n)
returning pair(maybe(n) or { -1 }, 2) — or-block nested in call
args) registered across all 5 invariants — 101 → 106
transformer-diff tests.
Third shape-changing port reusing emit_unsafe_expr_by_ids (after
LockExpr session 15). The "lower to UnsafeExpr-with-stmts" pattern
is firmly established.
The Ident arm in `transform_expr_to_flat` direct-emits via the leaf
`out.emit_expr(ast.Expr(expr))` for the default path — Ident is identity
in `transform_expr`'s arm (same shape returned, no allocation). Two
state-dependent rewrite branches fall back to legacy:
1. `@VMODROOT` → `vmodroot_string_literal` produces a StringLiteral.
2. `find_smartcast_for_expr(expr.name)` hit → `apply_smartcast_direct_ctx`
lowers to a sum payload access.
Both gate checks are immutable lookups (`&Transformer` receiver), safe
to probe upfront.
First port where the saving is **dispatch-skip** rather than struct-
allocation skip: Ident reaches `transform_expr_to_flat` recursively from
every ported ancestor arm (ParenExpr, PrefixExpr, ModifierExpr, CastExpr,
AsCastExpr, FieldInit, LambdaExpr, IndexExpr, SelectorExpr, ComptimeExpr
default, InitExpr default fields, etc.) — porting skips the
`transform_expr` function call + match dispatch per Ident reached via
those ports.
Adds `fixture_ident` covering file-scope Idents inside already-ported
expr arms: `const neg_base = -base` (Ident inside PrefixExpr),
`const wrapped_base = (base)` (Ident inside ParenExpr), `const inv_base =
~base` (Ident inside PrefixExpr `~`), plus mut-param + multiple Ident
references in InfixExpr operands. 131 → 136 transformer-diff tests, all
green; harness runs in 984ms.
…ompleteness sweep)
749669e to
13a8614
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 13a86143cc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| t2 := time.now() | ||
| g.gen_post_pass() | ||
| post_ms := f64(time.since(t2)) / f64(time.millisecond) | ||
| eprintln('ARM64 gen sub: pre=${pre_ms:.1}ms funcs=${funcs_ms:.1}ms post=${post_ms:.1}ms') |
There was a problem hiding this comment.
Gate ARM64 timing logs behind an opt-in flag
This now prints ARM64 gen sub: ... on every arm64 compilation, even when no debug/profiling flag is enabled. Unconditional stderr output from the compiler backend is a behavior regression that can break scripts/tests expecting clean tool output and adds per-compile formatting overhead in normal runs. The summary line should be behind the same env-gated path as detailed timing output (or a stats/debug flag).
Useful? React with 👍 / 👎.
| if sym_obj := import_scope.lookup_parent(symbol.name, 0) { | ||
| file_scope.insert(symbol.name, sym_obj) | ||
| } |
There was a problem hiding this comment.
Preserve module-storage symbol filter in flat import binding
The flat-path importer inserts every looked-up symbol directly into file scope, but the legacy path explicitly skips Global symbols where is_module_storage() is true. With V2_CHECK_FLAT=1, import mod { name } can now bind module-storage globals that were intentionally excluded, changing name resolution semantics between flat and non-flat checking modes.
Useful? React with 👍 / 👎.
Summary
Speeds up the v2 ARM64 backend by ~10x on self-compile and trims hot loops in the transformer and cleanc backend.
ARM64 codegen
gen_funcscanned the fullpending_label_blkslist once per block. Replaced with a per-block linked list (pending_head[blk]+pending_next[i]), backed by a sparse[]int(avoids map ops and a v2-self-compile hang seen with the map variant). ARM64 Gen oncmd/v2/v2.v: cleanc-built v2 ~25s → ~3.3s; arm64-built v3 ~36s → ~6.3s. Chain still converges (v4 = v5 except filename byte).arm64.v(selected_opcode,reg_mapsingle lookup, etc.) — ~6x on top of the original baseline.Transformer
is_native_beon theTransformerstruct so hot loops (transform_stmts,IfGuardExprhandling,OrExprexpansion) skip per-callt.prefnil/enum checks.transform_stmts: most blocks enter with an empty smartcast stack, so skip the.clone()of stack + counts map when there is nothing to restore.cleanc backend
gen_direct_fn_pointer_call: when a fnptr expects a pointer param and the argument is already a pointer (or a mutable param), don't re-&it.mark_cgen_step) insidegen_passes_1_to_4for-stats: surfacessetup.generic_struct_bindings,setup.discover_generic_specs,setup.fn_signatures, etc. — no runtime cost when-statsis off.Test plan
v -o cmd/v2/v2 cmd/v2/v2.v && cmd/v2/v2 --no-parallel -backend arm64 -o v3 cmd/v2/v2.v && ./v3 --no-parallel -gc none -backend arm64 -o v4 cmd/v2/v2.v— v4 = v5 modulo filename byte.