chore(clusters): replace single_nat_gateway with nat_gateway_eip_count #570
Open
jeanschmidt wants to merge 2 commits into
This was referenced May 14, 2026
tofu plan (arc-cbr-production): ✅ Plan succeeded
jeanschmidt added a commit that referenced this pull request May 14, 2026
**Impact:** clusters.yaml config consumers (deploy pipeline, tofu plan); no runtime change

**Risk:** low

## What

Replaces the boolean `single_nat_gateway` knob with the new `nat_gateway_eip_count` integer introduced by the per-(bucket, AZ) NAT GW topology refactor (PR 10). Updates the example snippet in `docs/architecture.md` to match.

## Why

PR 10 refactored NAT GW provisioning from a single shared gateway (toggled by `single_nat_gateway`) to a per-pod-subnet topology with a configurable EIP count per gateway. The old boolean no longer maps to anything in the terraform module, so it must be replaced with the new variable that controls EIP allocation.

## How

- Default `nat_gateway_eip_count: 8` (AWS hard cap per NAT GW) replaces `single_nat_gateway: false` in defaults; production gets 96 EIPs total (12 NAT GWs × 8)
- Staging override `nat_gateway_eip_count: 1` replaces `single_nat_gateway: true`; staging gets 8 EIPs total (8 NAT GWs × 1), preserving the cost optimization intent
- Documentation example updated to stay consistent with live config

## Changes

- `clusters.yaml`: replace `single_nat_gateway: false` in defaults with `nat_gateway_eip_count: 8`
- `clusters.yaml`: replace `single_nat_gateway: true` in arc-staging base with `nat_gateway_eip_count: 1`
- `docs/architecture.md`: update example clusters.yaml snippet to use `nat_gateway_eip_count: 1`

## Testing

- `just lint`: all 13 linters pass
- `just test`: all unit tests pass
- `tofu plan` for both clusters: verify no unexpected changes (this PR is config-only; the terraform variable was already added by PR 10)

Signed-off-by: Jean Schmidt <contato@jschmidt.me>
ghstack-source-id: 4e410b6
Pull-Request: #570
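The EIP totals quoted in the description follow from multiplying the NAT GW count by the per-gateway EIP count. A minimal sketch of that arithmetic is below; the function and constant names are hypothetical and do not come from the repository, and the NAT GW counts (12 for production, 8 for staging) are taken from the description as stated.

```python
# Illustrative only: these names are hypothetical, not part of the repo or module.
AWS_MAX_EIPS_PER_NAT_GW = 8  # cap per NAT GW as cited in the PR description


def total_eips(nat_gateway_count: int, nat_gateway_eip_count: int) -> int:
    """Return the total number of EIPs allocated across all NAT gateways."""
    if nat_gateway_count < 0:
        raise ValueError("nat_gateway_count must be non-negative")
    if not 1 <= nat_gateway_eip_count <= AWS_MAX_EIPS_PER_NAT_GW:
        raise ValueError(
            f"nat_gateway_eip_count must be between 1 and {AWS_MAX_EIPS_PER_NAT_GW}"
        )
    return nat_gateway_count * nat_gateway_eip_count


# Figures from the description: 12 NAT GWs in production, 8 in staging.
production = total_eips(nat_gateway_count=12, nat_gateway_eip_count=8)
staging = total_eips(nat_gateway_count=8, nat_gateway_eip_count=1)
print(production, staging)  # 96 8
```

A range check like this could also serve as a pre-deploy sanity test on `clusters.yaml` values, though whether the pipeline validates the field this way is not stated in the PR.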
huydhn (Contributor) approved these changes May 14, 2026
I wonder if it would be better to go after multi-region first, so that we have a form of backup before deploying this change. How easy is it to roll this back in case things go wrong? I assume we would need to create a new cluster from scratch in that case.