perf(inference): freeze-time BatchNorm folding (Conv/Dense → BN) — Phase 6#1473

Open
ooples wants to merge 1 commit into master from perf/1462-conv-bn-fold

Owner

@ooples ooples commented May 30, 2026

Phase 6 — freeze-time super-folding (compiled-inference plan)

Folds inference-time BatchNorm into a preceding identity-activation linear op and removes the BN layer — the canonical ResNet/VGG/EfficientNet inference optimization, extended to Dense→BN.

What

At inference, BatchNorm is a fixed per-channel affine map y = γ·(x−μ)/√(σ²+ε) + β. Directly after a linear op z = W·x + b (identity activation), BN(z) = W'·x + b' with s = γ/√(σ²+ε), W' = W·s, b' = (b−μ)·s + β, where s scales each output channel of W. The fold is lossless up to floating-point rounding and eliminates one per-element pass and one intermediate tensor per block.
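The algebra behind the fold can be written out per output channel. This is a sketch using the symbols above; the channel index c and the row notation W_c are added here for illustration, and for a convolution W_c · x stands for the (linear) convolution with output channel c's filter.

```latex
\begin{aligned}
\mathrm{BN}(z)_c
  &= \gamma_c \,\frac{z_c - \mu_c}{\sqrt{\sigma_c^2 + \varepsilon}} + \beta_c,
  \qquad z_c = W_c \cdot x + b_c \\
  &= s_c \,\bigl(W_c \cdot x + b_c - \mu_c\bigr) + \beta_c,
  \qquad s_c = \frac{\gamma_c}{\sqrt{\sigma_c^2 + \varepsilon}} \\
  &= \underbrace{(s_c\, W_c)}_{W'_c} \cdot x
     \;+\; \underbrace{(b_c - \mu_c)\, s_c + \beta_c}_{b'_c}
\end{aligned}
```

Because every step is a rearrangement of an affine expression, the folded op matches the original exactly in real arithmetic; only floating-point rounding differs.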

  • NeuralNetworkBase.FoldBatchNormForInference() — folds Conv2D→BN (per output channel) and Dense→BN (per output feature) in place, then removes the BN layer. Guards: only folds across identity/no activation, and on matching channel counts.
  • InferenceOptimizer.OptimizeForInference gains an ApplyLayerFusion step (before attention rewrites / quantization), gated by InferenceOptimizationConfig.EnableLayerFusion (default true — lossless). Clone-before-mutate now also triggers for foldable Conv/Dense→BN.
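The walk-and-fold logic described in the bullets above can be sketched as follows. This is a minimal Python illustration, not the C# code in this PR: the `Dense` and `BatchNorm` classes, `fold_batch_norm`, and `forward` are hypothetical stand-ins, and only the Dense→BN case is shown.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional


@dataclass
class Dense:  # hypothetical stand-in for a dense layer
    weights: List[List[float]]        # [input_size][output_size]
    biases: List[float]               # [output_size]
    activation: Optional[str] = None  # None or "identity" is foldable


@dataclass
class BatchNorm:  # hypothetical stand-in holding inference-time statistics
    gamma: List[float]
    beta: List[float]
    running_mean: List[float]
    running_var: List[float]
    eps: float = 1e-5


def fold_batch_norm(layers: list) -> int:
    """Fold each BatchNorm into a directly preceding identity-activation
    Dense layer, in place, and remove it. Returns the number of folds."""
    folded = 0
    i = 1
    while i < len(layers):
        prev, cur = layers[i - 1], layers[i]
        foldable = (
            isinstance(cur, BatchNorm)
            and isinstance(prev, Dense)
            and prev.activation in (None, "identity")  # no real nonlinearity
            and len(prev.biases) == len(cur.gamma)     # matching channel count
        )
        if not foldable:
            i += 1
            continue
        for c in range(len(cur.gamma)):
            s = cur.gamma[c] / sqrt(cur.running_var[c] + cur.eps)
            for row in prev.weights:  # scale column c (output feature c)
                row[c] *= s
            prev.biases[c] = (prev.biases[c] - cur.running_mean[c]) * s + cur.beta[c]
        del layers[i]  # the BN layer is now redundant
        folded += 1
    return folded


def forward(layers: list, x: List[float]) -> List[float]:
    """Inference-mode forward pass used to compare pre-fold and post-fold output."""
    for layer in layers:
        if isinstance(layer, Dense):
            x = [
                sum(x[i] * layer.weights[i][j] for i in range(len(x))) + layer.biases[j]
                for j in range(len(layer.biases))
            ]
            if layer.activation == "relu":
                x = [max(0.0, v) for v in x]
        else:
            x = [
                g * (v - m) / sqrt(var + layer.eps) + b
                for v, g, b, m, var in zip(
                    x, layer.gamma, layer.beta, layer.running_mean, layer.running_var
                )
            ]
    return x
```

Mutating the weights in place is why the optimizer needs a clone-before-mutate guard: without a clone, the caller's original model would lose its BN layers too.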

Verification

ConvBatchNormFoldTests (4 cases, all green):

  • Conv→BN and Dense→BN folds reproduce the pre-fold output within 1e-4 (non-trivial γ/β + running μ/σ² warmed by training-mode forwards), and the BN layer is removed.
  • Fusion-off retains BN; Conv(ReLU)→BN is correctly left unfolded.
  • All 30 existing InferenceOptimizer tests pass. net10.0 + net471 build clean.
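Why the Conv(ReLU)→BN case must stay unfolded can be seen with a one-channel scalar example. The sketch below is illustrative only: the numeric values and the function names are made up for this example and do not come from the test suite.

```python
from math import sqrt

EPS = 1e-5


def bn(v, gamma, beta, mean, var):
    """Inference-mode BatchNorm for a single channel."""
    return gamma * (v - mean) / sqrt(var + EPS) + beta


def relu(v):
    return max(0.0, v)


# Arbitrary illustrative parameters for one channel.
w, b = 1.5, 0.25
gamma, beta, mean, var = 2.0, -1.0, 0.5, 4.0

# Folded parameters: s = gamma / sqrt(var + eps), W' = W*s, b' = (b - mean)*s + beta.
s = gamma / sqrt(var + EPS)
w_f, b_f = w * s, (b - mean) * s + beta


def identity_reference(x):   # linear -> BN (foldable)
    return bn(w * x + b, gamma, beta, mean, var)


def identity_folded(x):      # single folded linear op
    return w_f * x + b_f


def relu_reference(x):       # linear -> ReLU -> BN (not foldable)
    return bn(relu(w * x + b), gamma, beta, mean, var)


def relu_wrongly_folded(x):  # what a naive fold across ReLU would compute
    return relu(w_f * x + b_f)


for x in (-2.0, 0.0, 1.0, 3.5):
    print(x, identity_reference(x) - identity_folded(x),
          relu_reference(x) - relu_wrongly_folded(x))
```

With an identity activation the difference stays at rounding level for every input. With ReLU in between, inputs that the ReLU clamps to zero (here x = -2.0) give different results, because BN's affine map no longer commutes with the nonlinearity.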

Note: the fold triggers only when a linear op with no activation is immediately followed by BatchNorm, so it targets BN-heavy backbones (ResNet/EfficientNet/CSPDarknet); the parity-benchmark MLP/CNN have no BN.

🤖 Generated with Claude Code

…nferenceOptimizer (Phase 6)

At inference a BatchNorm layer is a fixed per-channel affine
y = γ·(x − μ)/√(σ² + ε) + β. When it directly follows a linear op with
identity activation — the canonical Conv→BN block in ResNet/VGG/EfficientNet,
and Dense→BN in BN-MLPs — it folds into that op's weights and bias with no
change in output: s = γ/√(σ² + ε), W' = W·s, b' = (b − μ)·s + β. The BatchNorm
layer is then removed, eliminating a full per-element pass and an intermediate
tensor per block.

New: NeuralNetworkBase.FoldBatchNormForInference() walks the layer list and
folds every BatchNorm whose predecessor is an identity-activation
ConvolutionalLayer ([outC,inC,kH,kW], per-output-channel) or DenseLayer
([inputSize,outputSize], per-output-feature), mutating the live kernel/weight
and bias tensors in place via GetFilters/GetWeights/GetBiases, then removing the
BN via RemoveLayerFromCollection. It refuses to fold across a real nonlinearity
(only IdentityActivation / no activation qualifies) and on any channel-count
mismatch. Wired into InferenceOptimizer.OptimizeForInference as a new
ApplyLayerFusion step (runs before attention rewrites / quantization so folded
weights flow into them), gated by InferenceOptimizationConfig.EnableLayerFusion
(default true — folding is lossless). The clone-before-mutate guard now also
triggers when foldable Conv/Dense→BN is present.
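
The two tensor layouts named above put the output dimension on different axes, so the scale s is applied along axis 0 of a conv kernel and along axis 1 of a dense weight matrix. A minimal sketch using nested Python lists; the function names are hypothetical and this is not the C# implementation in this commit:

```python
from math import sqrt


def bn_scales(gamma, running_var, eps=1e-5):
    """Per-channel scale s = gamma / sqrt(var + eps)."""
    return [g / sqrt(v + eps) for g, v in zip(gamma, running_var)]


def fold_conv_kernel(kernel, bias, gamma, beta, mean, var, eps=1e-5):
    """kernel is [outC][inC][kH][kW]: every weight of output channel o
    is scaled by s[o]. Returns False (no mutation) on a channel mismatch."""
    s = bn_scales(gamma, var, eps)
    if len(kernel) != len(s) or len(bias) != len(s):
        return False
    for o, filt in enumerate(kernel):
        for plane in filt:
            for row in plane:
                for k in range(len(row)):
                    row[k] *= s[o]
        bias[o] = (bias[o] - mean[o]) * s[o] + beta[o]
    return True


def fold_dense_weights(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """weights is [inputSize][outputSize]: column j (output feature j)
    is scaled by s[j]. Returns False (no mutation) on a size mismatch."""
    s = bn_scales(gamma, var, eps)
    if len(bias) != len(s) or any(len(row) != len(s) for row in weights):
        return False
    for row in weights:
        for j in range(len(row)):
            row[j] *= s[j]
    for j in range(len(bias)):
        bias[j] = (bias[j] - mean[j]) * s[j] + beta[j]
    return True
```

Checking sizes before touching any weight keeps a refused fold free of side effects, which matters when the tensors are mutated in place.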

Verified (ConvBatchNormFoldTests): Conv→BN and Dense→BN folds reproduce the
pre-fold inference output within 1e-4 (non-trivial γ/β plus running μ/σ² warmed
by training-mode forwards), the BN layer is removed, fusion-off retains BN, and
a Conv(ReLU)→BN block is correctly left unfolded. All 30 existing
InferenceOptimizer tests still pass; net10.0 + net471 build clean.

Note: only triggers on models built for it (a Conv/Dense with no activation
immediately followed by BatchNorm); the parity-benchmark MLP/CNN have no BN, so
this targets BN-heavy backbones (ResNet/EfficientNet/CSPDarknet).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vercel Bot commented May 30, 2026

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| aidotnet_website | Ready | Preview, Comment | May 30, 2026 9:59pm |
| aidotnet-playground-api | Ready | Preview, Comment | May 30, 2026 9:59pm |

Contributor

coderabbitai Bot commented May 30, 2026

Warning

Review limit reached

@ooples, we couldn't start this review because you've reached your PR review rate limit.

📥 Commits

Reviewing files that changed from the base of the PR and between 0db695d and ac4c56f.

📒 Files selected for processing (4)
  • src/Configuration/InferenceOptimizationConfig.cs
  • src/Inference/InferenceOptimizer.cs
  • src/NeuralNetworks/NeuralNetworkBase.cs
  • tests/AiDotNet.Tests/UnitTests/Inference/ConvBatchNormFoldTests.cs
