perf(inference): freeze-time BatchNorm folding (Conv/Dense → BN) — Phase 6#1473
ooples wants to merge 1 commit into
…nferenceOptimizer (Phase 6)

At inference a BatchNorm layer is a fixed per-channel affine map y = γ·(x − μ)/√(σ² + ε) + β. When it directly follows a linear op with identity activation (the canonical Conv→BN block in ResNet/VGG/EfficientNet, and Dense→BN in BN-MLPs), it folds into that op's weights and bias with no change in output up to floating-point rounding: s = γ/√(σ² + ε), W' = W·s, b' = (b − μ)·s + β. The BatchNorm layer is then removed, eliminating a full per-element pass and an intermediate tensor per block.

New: NeuralNetworkBase.FoldBatchNormForInference() walks the layer list and folds every BatchNorm whose predecessor is an identity-activation ConvolutionalLayer ([outC,inC,kH,kW], scaled per output channel) or DenseLayer ([inputSize,outputSize], scaled per output feature). It mutates the live kernel/weight and bias tensors in place via GetFilters/GetWeights/GetBiases, then removes the BN via RemoveLayerFromCollection. It refuses to fold across a real nonlinearity (only IdentityActivation / no activation qualifies) and refuses on any channel-count mismatch.

Wired into InferenceOptimizer.OptimizeForInference as a new ApplyLayerFusion step. The step runs before attention rewrites / quantization so that folded weights flow into them, and it is gated by InferenceOptimizationConfig.EnableLayerFusion (default true, since folding is lossless). The clone-before-mutate guard now also triggers when a foldable Conv/Dense→BN pair is present.

Verified (ConvBatchNormFoldTests): Conv→BN and Dense→BN folds reproduce the pre-fold inference output within 1e-4 (non-trivial γ/β plus running μ/σ² warmed by training-mode forwards), the BN layer is removed, fusion-off retains BN, and a Conv(ReLU)→BN block is correctly left unfolded. All 30 existing InferenceOptimizer tests still pass; net10.0 + net471 build clean.

Note: this only triggers on models built for it (a Conv/Dense with no activation immediately followed by BatchNorm). The parity-benchmark MLP/CNN have no BN, so this targets BN-heavy backbones (ResNet/EfficientNet/CSPDarknet).
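The fold arithmetic above can be checked numerically. The following is a minimal sketch in plain Python, not the project's C# code: the function names, the toy values, and the default `eps=1e-5` are all assumptions made for illustration. It uses the [inputSize, outputSize] weight layout mentioned for DenseLayer and scales per output feature.

```python
import math

def fold_dense_bn(weights, biases, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-time BatchNorm into a dense layer (illustrative only).

    weights is laid out as [input_size][output_size]; every BN vector must
    have one entry per output feature. eps=1e-5 is an assumed default.
    """
    out = len(biases)
    if not (len(gamma) == len(beta) == len(mean) == len(var) == out):
        raise ValueError("channel-count mismatch; refusing to fold")
    # s = gamma / sqrt(var + eps), one scale per output feature
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    # W' = W * s (column j of W is scaled by s[j])
    folded_w = [[w * scale[j] for j, w in enumerate(row)] for row in weights]
    # b' = (b - mean) * s + beta
    folded_b = [(b - m) * s + be
                for b, m, s, be in zip(biases, mean, scale, beta)]
    return folded_w, folded_b

def dense(x, weights, biases):
    """z[j] = sum_i x[i] * W[i][j] + b[j], with identity activation."""
    return [sum(x[i] * weights[i][j] for i in range(len(x))) + biases[j]
            for j in range(len(biases))]

def batch_norm_inference(z, gamma, beta, mean, var, eps=1e-5):
    """y = gamma * (z - mean) / sqrt(var + eps) + beta, per feature."""
    return [g * (zj - m) / math.sqrt(v + eps) + be
            for zj, g, be, m, v in zip(z, gamma, beta, mean, var)]

# Toy parameters (made up): 3 inputs, 2 output features.
W = [[0.5, -1.0], [2.0, 0.25], [-0.75, 1.5]]
b = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.3, -0.4]
mean, var = [0.2, -0.1], [0.9, 2.5]
x = [1.0, -2.0, 0.5]

reference = batch_norm_inference(dense(x, W, b), gamma, beta, mean, var)
W_f, b_f = fold_dense_bn(W, b, gamma, beta, mean, var)
folded = dense(x, W_f, b_f)
max_abs_diff = max(abs(r - f) for r, f in zip(reference, folded))
```

Here `max_abs_diff` measures the gap between the Dense→BN reference and the single folded dense layer. The Conv case follows the same pattern, with the scale applied along the output-channel axis of the [outC, inC, kH, kW] kernel.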
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 6 — freeze-time super-folding (compiled-inference plan)
Folds inference-time BatchNorm into a preceding identity-activation linear op and removes the BN layer — the canonical ResNet/VGG/EfficientNet inference optimization, extended to Dense→BN.
What
At inference BatchNorm is a fixed per-channel affine y = γ·(x−μ)/√(σ²+ε) + β. Directly after a linear z = W·x + b (identity activation), BN(z) = W'·x + b' with s = γ/√(σ²+ε), W' = W·s, b' = (b−μ)·s + β. Lossless; eliminates a per-element pass + an intermediate tensor per block.

- NeuralNetworkBase.FoldBatchNormForInference(): folds Conv2D→BN (per output channel) and Dense→BN (per output feature) in place, then removes the BN layer. Guards: only folds across identity/no activation, and only when channel counts match.
- InferenceOptimizer.OptimizeForInference gains an ApplyLayerFusion step (before attention rewrites / quantization), gated by InferenceOptimizationConfig.EnableLayerFusion (default true, since the fold is lossless). Clone-before-mutate now also triggers for foldable Conv/Dense→BN.

Verification
ConvBatchNormFoldTests (4 cases, all green):

- Conv→BN and Dense→BN folds reproduce the pre-fold inference output within 1e-4 (non-trivial γ/β + running μ/σ² warmed by training-mode forwards), BN removed.
- With fusion off, the BN layer is retained.
- Conv(ReLU)→BN is correctly left unfolded.

All 30 existing InferenceOptimizer tests pass. net10.0 + net471 build clean.

Note: triggers only on models built for it (linear op with no activation immediately followed by BatchNorm). It targets BN-heavy backbones (ResNet/EfficientNet/CSPDarknet); the parity-benchmark MLP/CNN have no BN.
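The guard behaviour exercised by these cases can be sketched structurally. This is a hedged illustration in Python rather than the project's C# implementation: `Layer`, `can_fold`, `fold_batch_norm`, and `optimize_for_inference` are hypothetical stand-ins, and the weight rescaling itself is omitted so that only the walk and the guards are shown.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Layer:
    """Hypothetical stand-in for a network layer (not the real API)."""
    kind: str                         # "conv", "dense", "bn", ...
    channels: int                     # output channels / features
    activation: Optional[str] = None  # None or "identity" counts as linear

FOLDABLE_KINDS = {"conv", "dense"}

def can_fold(prev: Layer, bn: Layer) -> bool:
    """True only if bn directly follows a linear op with matching channels."""
    return (
        bn.kind == "bn"
        and prev.kind in FOLDABLE_KINDS
        and prev.activation in (None, "identity")  # never across a nonlinearity
        and prev.channels == bn.channels           # refuse on mismatch
    )

def fold_batch_norm(layers: List[Layer]) -> List[Layer]:
    """Walk the layer list and drop every foldable BN layer."""
    result: List[Layer] = []
    for layer in layers:
        if result and can_fold(result[-1], layer):
            # A real fold would rescale result[-1]'s weights and bias here.
            continue
        result.append(layer)
    return result

def optimize_for_inference(layers: List[Layer],
                           enable_layer_fusion: bool = True) -> List[Layer]:
    """Apply the fusion step only when the (assumed) flag is on."""
    return fold_batch_norm(layers) if enable_layer_fusion else list(layers)

model = [
    Layer("conv", 16), Layer("bn", 16),               # folds
    Layer("conv", 32, "relu"), Layer("bn", 32),       # ReLU blocks the fold
    Layer("dense", 10, "identity"), Layer("bn", 10),  # folds
    Layer("dense", 4), Layer("bn", 8),                # channel mismatch
]
folded_kinds = [layer.kind for layer in optimize_for_inference(model)]
unfused = optimize_for_inference(model, enable_layer_fusion=False)
```

With this toy model, only the first and third BN layers are dropped; the BN after the ReLU convolution and the BN with mismatched channels remain, and disabling the flag leaves the list unchanged.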
🤖 Generated with Claude Code