chore(clusters): replace single_nat_gateway with nat_gateway_eip_count#570

Open
jeanschmidt wants to merge 2 commits into gh/jeanschmidt/22/base from gh/jeanschmidt/22/head

jeanschmidt commented May 14, 2026

Stack from ghstack (oldest at bottom):

Impact: clusters.yaml config consumers (deploy pipeline, tofu plan); no runtime change
Risk: low

What

Replaces the boolean single_nat_gateway knob with the new nat_gateway_eip_count integer introduced by the per-(bucket, AZ) NAT GW topology refactor (PR 10). Updates the example snippet in docs/architecture.md to match.

Why

PR 10 refactored NAT GW provisioning from a single shared gateway (toggled by single_nat_gateway) to a per-pod-subnet topology with a configurable EIP count per gateway. The old boolean no longer maps to anything in the Terraform module, so it must be replaced with the new variable that controls EIP allocation.

How

  • Default nat_gateway_eip_count: 8 (AWS hard cap per NAT GW) replaces single_nat_gateway: false in defaults — production gets 96 EIPs total (12 NAT GWs × 8)
  • Staging override nat_gateway_eip_count: 1 replaces single_nat_gateway: true — staging gets 8 EIPs total (8 NAT GWs × 1), preserving cost optimization intent
  • Documentation example updated to stay consistent with live config
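
The EIP totals quoted above are the NAT gateway count multiplied by the per-gateway EIP count. A minimal sketch of that arithmetic, assuming the gateway counts stated in this description (12 for production, 8 for staging). The function `total_eips` and the constant name are hypothetical and do not come from the repository:

```python
# Illustrative sketch only: these names are hypothetical, not from the repo.
MAX_EIPS_PER_NAT_GW = 8  # per-gateway cap as cited in this PR description


def total_eips(nat_gateways: int, eip_count: int) -> int:
    """Return the total EIPs allocated for a cluster.

    nat_gateways: number of NAT gateways in the cluster.
    eip_count:    value of nat_gateway_eip_count (EIPs per gateway).
    """
    if nat_gateways < 0:
        raise ValueError("nat_gateways must be non-negative")
    if not 1 <= eip_count <= MAX_EIPS_PER_NAT_GW:
        raise ValueError(
            f"eip_count must be between 1 and {MAX_EIPS_PER_NAT_GW}"
        )
    return nat_gateways * eip_count


if __name__ == "__main__":
    print("production:", total_eips(nat_gateways=12, eip_count=8))  # 96
    print("staging:", total_eips(nat_gateways=8, eip_count=1))      # 8
```

The range check mirrors the cap mentioned in the description; whether the module itself validates the value is not shown here.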

Changes

  • clusters.yaml: replace single_nat_gateway: false in defaults with nat_gateway_eip_count: 8
  • clusters.yaml: replace single_nat_gateway: true in arc-staging base with nat_gateway_eip_count: 1
  • docs/architecture.md: update example clusters.yaml snippet to use nat_gateway_eip_count: 1
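
The key replacement listed above can be pictured as a small mapping from the old boolean to the new integer. This is a sketch under assumptions: the dictionary shape and the helper name `migrate_nat_setting` are hypothetical, and the mapping (false to 8, true to 1) is simply the pair of values chosen in this PR, not a rule enforced by the module:

```python
# Illustrative sketch only: the helper and the config shape are hypothetical.
def migrate_nat_setting(cfg: dict) -> dict:
    """Return a copy of cfg with single_nat_gateway replaced.

    Uses the values chosen in this PR:
      single_nat_gateway: false -> nat_gateway_eip_count: 8
      single_nat_gateway: true  -> nat_gateway_eip_count: 1
    An existing nat_gateway_eip_count is left untouched.
    """
    out = dict(cfg)  # shallow copy so the caller's dict is not mutated
    if "single_nat_gateway" in out:
        single = out.pop("single_nat_gateway")
        out.setdefault("nat_gateway_eip_count", 1 if single else 8)
    return out


if __name__ == "__main__":
    defaults = {"single_nat_gateway": False}
    staging_base = {"single_nat_gateway": True}
    print(migrate_nat_setting(defaults))      # {'nat_gateway_eip_count': 8}
    print(migrate_nat_setting(staging_base))  # {'nat_gateway_eip_count': 1}
```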

Testing

  • just lint — all 13 linters pass
  • just test — all unit tests pass
  • tofu plan for both clusters — verify no unexpected changes (this PR is config-only; the terraform variable was already added by PR 10)

Signed-off-by: Jean Schmidt <contato@jschmidt.me>

[ghstack-poisoned]

github-actions Bot commented May 14, 2026

tofu plan — arc-cbr-production

✅ Plan succeeded · commit 13bdb8d1 · run log

Plan output
Installed 1 package in 2ms
{
    "BucketArn": "arn:aws:s3:::ciforge-tfstate-arc-cbr-prod",
    "BucketRegion": "us-west-2",
    "AccessPointAlias": false
}
━━━ PLAN: Base (arc-cbr-production) ━━━
There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache cannot be opened: stat /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, OpenTofu may not behave as intended.


data.aws_availability_zones.available: Reading...
module.vpc.aws_eip.nat[1]: Refreshing state... [id=eipalloc-023207cd15e79c81a]
module.harbor.aws_iam_user.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-harbor-s3]
module.eks.data.aws_caller_identity.current: Reading...
module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-084ed6fc52db22c39]
module.eks.data.aws_ami.eks_optimized_al2023: Reading...
module.vpc.aws_eip.nat[2]: Refreshing state... [id=eipalloc-0078fd5c0f6bc05eb]
module.eks.aws_iam_role.node: Refreshing state... [id=pytorch-arc-cbr-production-node-role]
module.eks.aws_iam_role.cluster: Refreshing state... [id=pytorch-arc-cbr-production-cluster-role]
module.vpc.aws_vpc.this: Refreshing state... [id=vpc-0a126b1613758a408]
module.eks.data.aws_caller_identity.current: Read complete after 0s [id=308535385114]
module.eks.aws_kms_key.eks_secrets[0]: Refreshing state... [id=8115d61b-1bc1-49ad-b5a3-e8f88fc50cb1]
data.aws_availability_zones.available: Read complete after 0s [id=us-east-2]
module.harbor.aws_s3_bucket.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.harbor.aws_iam_access_key.harbor_s3: Refreshing state... [id=AKIAUPVRELQNMSO5RRNP]
module.eks.aws_iam_role_policy_attachment.node_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role-20260308084936816800000005]
module.eks.aws_iam_role_policy_attachment.ecr_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role-20260308084936734100000003]
module.eks.aws_iam_role_policy_attachment.cni_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role-20260308084936813000000004]
module.eks.aws_iam_role_policy_attachment.ssm_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role-20260316204739334600000001]
module.eks.aws_iam_role_policy_attachment.cluster_policy: Refreshing state... [id=pytorch-arc-cbr-production-cluster-role-20260308084936681500000001]
module.eks.aws_iam_role_policy_attachment.vpc_resource_controller: Refreshing state... [id=pytorch-arc-cbr-production-cluster-role-20260308084936685500000002]
module.eks.aws_kms_alias.eks_secrets[0]: Refreshing state... [id=alias/pytorch-arc-cbr-production-eks-secrets]
module.eks.data.aws_ami.eks_optimized_al2023: Read complete after 1s [id=ami-009f1fe7d56695348]
module.vpc.aws_internet_gateway.this: Refreshing state... [id=igw-03eb66e57d13af64b]
module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-04682fc890bfd4630]
module.vpc.aws_subnet.private[2]: Refreshing state... [id=subnet-0ce6f1dcb7208cad8]
module.vpc.aws_route_table.public: Refreshing state... [id=rtb-07ac52a1aa741f267]
module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-0545d26e4a1d0ba89]
module.vpc.aws_subnet.public[2]: Refreshing state... [id=subnet-06a70b2818e270ed8]
module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-0701693364b79c021]
module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-0610564f678f81c5f]
module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-0d2591f24cba79e7b]
module.vpc.aws_route_table_association.public[2]: Refreshing state... [id=rtbassoc-0aa6ea5c845170545]
module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-04d9bba8d43569bbf]
module.harbor.aws_s3_bucket_server_side_encryption_configuration.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.harbor.aws_iam_policy.harbor_registry: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-harbor-registry]
module.harbor.aws_s3_bucket_public_access_block.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.vpc.aws_nat_gateway.this[1]: Refreshing state... [id=nat-07e2274170282eb8c]
module.vpc.aws_nat_gateway.this[2]: Refreshing state... [id=nat-086e3e66fe238d459]
module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-0f34cc1aafea8fd16]
module.eks.aws_eks_cluster.this: Refreshing state... [id=pytorch-arc-cbr-production]
module.harbor.aws_iam_user_policy_attachment.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-harbor-s3-20260308084938596600000006]
module.vpc.aws_route_table.private[2]: Refreshing state... [id=rtb-0f623a6fa9d7bde45]
module.vpc.aws_route_table.private[1]: Refreshing state... [id=rtb-000d05ecec7d4b66e]
module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-0777285eddd2bacd1]
module.eks.aws_eks_access_entry.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production:arn:aws:iam::308535385114:role/osdc_gha_prod]
module.eks.aws_eks_addon.kube_proxy: Refreshing state... [id=pytorch-arc-cbr-production:kube-proxy]
module.eks.aws_eks_addon.vpc_cni: Refreshing state... [id=pytorch-arc-cbr-production:vpc-cni]
module.eks.data.tls_certificate.cluster[0]: Reading...
module.eks.aws_launch_template.base: Refreshing state... [id=lt-090bac79dddc5b77f]
module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-00dacd13031b1f5de]
module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0ec9764e9015e972e]
module.vpc.aws_route_table_association.private[2]: Refreshing state... [id=rtbassoc-08ccb8cfe4bfa80d7]
module.eks.data.tls_certificate.cluster[0]: Read complete after 0s [id=033a163afb2babc26f7883e642621ac361c93d61]
module.eks.aws_iam_openid_connect_provider.cluster[0]: Refreshing state... [id=arn:aws:iam::308535385114:oidc-provider/oidc.eks.us-east-2.amazonaws.com/id/70AA0C12C21E1A843313EF1BDE82D29A]
module.eks.aws_eks_node_group.base: Refreshing state... [id=pytorch-arc-cbr-production:pytorch-arc-cbr-production-base-nodes]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Reading...
module.harbor.aws_iam_role.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Read complete after 0s [id=2255203180]
module.eks.aws_iam_role.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-ebs-csi-driver-role]
module.harbor.aws_iam_role_policy_attachment.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry-2026030809125509320000000c]
module.eks.aws_eks_access_policy_association.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production#arn:aws:iam::308535385114:role/osdc_gha_prod#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.eks.aws_iam_role_policy_attachment.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-ebs-csi-driver-role-2026030809125522790000000d]
module.eks.aws_eks_addon.coredns: Refreshing state... [id=pytorch-arc-cbr-production:coredns]
module.eks.aws_eks_addon.ebs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production:aws-ebs-csi-driver]

OpenTofu used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create
  ~ update in-place
  - destroy

OpenTofu will perform the following actions:

  # module.eks.aws_eks_addon.vpc_cni will be updated in-place
  ~ resource "aws_eks_addon" "vpc_cni" {
      + configuration_values        = jsonencode(
            {
              + env = {
                  + AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG = "true"
                  + ENABLE_PREFIX_DELEGATION           = "true"
                  + ENI_CONFIG_LABEL_DEF               = "ipam.osdc.internal/eni-config"
                  + WARM_PREFIX_TARGET                 = "1"
                }
            }
        )
        id                          = "pytorch-arc-cbr-production:vpc-cni"
      ~ resolve_conflicts_on_update = "PRESERVE" -> "OVERWRITE"
        tags                        = {
            "Cluster" = "pytorch-arc-cbr-production"
            "Project" = "ciforge"
        }
        # (8 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

  # module.vpc.aws_ec2_subnet_cidr_reservation.pd_prefix["us-east-2a"] will be created
  + resource "aws_ec2_subnet_cidr_reservation" "pd_prefix" {
      + cidr_block       = "10.4.62.0/23"
      + description      = "VPC CNI Prefix Delegation reservation (us-east-2a)"
      + id               = (known after apply)
      + owner_id         = (known after apply)
      + region           = "us-east-2"
      + reservation_type = "prefix"
      + subnet_id        = "subnet-0545d26e4a1d0ba89"
    }

  # module.vpc.aws_ec2_subnet_cidr_reservation.pd_prefix["us-east-2b"] will be created
  + resource "aws_ec2_subnet_cidr_reservation" "pd_prefix" {
      + cidr_block       = "10.4.126.0/23"
      + description      = "VPC CNI Prefix Delegation reservation (us-east-2b)"
      + id               = (known after apply)
      + owner_id         = (known after apply)
      + region           = "us-east-2"
      + reservation_type = "prefix"
      + subnet_id        = "subnet-04682fc890bfd4630"
    }

  # module.vpc.aws_ec2_subnet_cidr_reservation.pd_prefix["us-east-2c"] will be created
  + resource "aws_ec2_subnet_cidr_reservation" "pd_prefix" {
      + cidr_block       = "10.4.190.0/23"
      + description      = "VPC CNI Prefix Delegation reservation (us-east-2c)"
      + id               = (known after apply)
      + owner_id         = (known after apply)
      + region           = "us-east-2"
      + reservation_type = "prefix"
      + subnet_id        = "subnet-0ce6f1dcb7208cad8"
    }

  # module.vpc.aws_eip.nat[0] will be destroyed
  # (because aws_eip.nat is not in configuration)
  - resource "aws_eip" "nat" {
      - allocation_id        = "eipalloc-084ed6fc52db22c39" -> null
      - arn                  = "arn:aws:ec2:us-east-2:308535385114:elastic-ip/eipalloc-084ed6fc52db22c39" -> null
      - association_id       = "eipassoc-08bbc88a043f442e5" -> null
      - domain               = "vpc" -> null
      - id                   = "eipalloc-084ed6fc52db22c39" -> null
      - network_border_group = "us-east-2" -> null
      - network_interface    = "eni-082b2e229a9a9a88f" -> null
      - private_dns          = "ip-10-4-192-110.us-east-2.compute.internal" -> null
      - private_ip           = "10.4.192.110" -> null
      - public_dns           = "ec2-3-139-130-18.us-east-2.compute.amazonaws.com" -> null
      - public_ip            = "3.139.130.18" -> null
      - public_ipv4_pool     = "amazon" -> null
      - region               = "us-east-2" -> null
      - tags                 = {
          - "Cluster"                                          = "pytorch-arc-cbr-production"
          - "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-1"
          - "Project"                                          = "ciforge"
          - "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
        } -> null
      - tags_all             = {
          - "Cluster"                                          = "pytorch-arc-cbr-production"
          - "ManagedBy"                                        = "opentofu"
          - "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-1"
          - "Project"                                          = "ciforge"
          - "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
        } -> null
    }

  # module.vpc.aws_eip.nat[1] will be destroyed
  # (because aws_eip.nat is not in configuration)
  - resource "aws_eip" "nat" {
      - allocation_id        = "eipalloc-023207cd15e79c81a" -> null
      - arn                  = "arn:aws:ec2:us-east-2:308535385114:elastic-ip/eipalloc-023207cd15e79c81a" -> null
      - association_id       = "eipassoc-0401388fd2b22bfd4" -> null
      - domain               = "vpc" -> null
      - id                   = "eipalloc-023207cd15e79c81a" -> null
      - network_border_group = "us-east-2" -> null
      - network_interface    = "eni-0b6a215b181e7a032" -> null
      - private_dns          = "ip-10-4-193-147.us-east-2.compute.internal" -> null
      - private_ip           = "10.4.193.147" -> null
      - public_dns           = "ec2-3-148-32-154.us-east-2.compute.amazonaws.com" -> null
      - public_ip            = "3.148.32.154" -> null
      - public_ipv4_pool     = "amazon" -> null
      - region               = "us-east-2" -> null
      - tags                 = {
          - "Cluster"                                          = "pytorch-arc-cbr-production"
          - "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-2"
          - "Project"                                          = "ciforge"
          - "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
        } -> null
      - tags_all             = {
          - "Cluster"                                          = "pytorch-arc-cbr-production"
          - "ManagedBy"                                        = "opentofu"
          - "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-2"
          - "Project"                                          = "ciforge"
          - "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
        } -> null
    }

  # module.vpc.aws_eip.nat[2] will be destroyed
  # (because aws_eip.nat is not in configuration)
  - resource "aws_eip" "nat" {
      - allocation_id        = "eipalloc-0078fd5c0f6bc05eb" -> null
      - arn                  = "arn:aws:ec2:us-east-2:308535385114:elastic-ip/eipalloc-0078fd5c0f6bc05eb" -> null
      - association_id       = "eipassoc-0d651661c83551847" -> null
      - domain               = "vpc" -> null
      - id                   = "eipalloc-0078fd5c0f6bc05eb" -> null
      - network_border_group = "us-east-2" -> null
      - network_interface    = "eni-0b2f8239ecbf7a6b7" -> null
      - private_dns          = "ip-10-4-194-52.us-east-2.compute.internal" -> null
      - private_ip           = "10.4.194.52" -> null
      - public_dns           = "ec2-3-20-110-33.us-east-2.compute.amazonaws.com" -> null
      - public_ip            = "3.20.110.33" -> null
      - public_ipv4_pool     = "amazon" -> null
      - region               = "us-east-2" -> null
      - tags                 = {
          - "Cluster"                                          = "pytorch-arc-cbr-production"
          - "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-3"
          - "Project"                                          = "ciforge"
          - "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
        } -> null
      - tags_all             = {
          - "Cluster"                                          = "pytorch-arc-cbr-production"
          - "ManagedBy"                                        = "opentofu"
          - "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-3"
          - "Project"                                          = "ciforge"
          - "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
        } -> null
    }

  # module.vpc.aws_eip.nat_primary["bucket-1-us-east-2a"] will be created
  + resource "aws_eip" "nat_primary" {
      + allocation_id        = (known after apply)
      + arn                  = (known after apply)
      + association_id       = (known after apply)
      + carrier_ip           = (known after apply)
      + customer_owned_ip    = (known after apply)
      + domain               = "vpc"
      + id                   = (known after apply)
      + instance             = (known after apply)
      + ipam_pool_id         = (known after apply)
      + network_border_group = (known after apply)
      + network_interface    = (known after apply)
      + private_dns          = (known after apply)
      + private_ip           = (known after apply)
      + ptr_record           = (known after apply)
      + public_dns           = (known after apply)
      + public_ip            = (known after apply)
      + public_ipv4_pool     = (known after apply)
      + region               = "us-east-2"
      + tags                 = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-1-us-east-2a-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2a"
          + "osdc.io/nat-bucket"                               = "bucket-1"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
      + tags_all             = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "ManagedBy"                                        = "opentofu"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-1-us-east-2a-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2a"
          + "osdc.io/nat-bucket"                               = "bucket-1"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
    }

  # module.vpc.aws_eip.nat_primary["bucket-1-us-east-2b"] will be created
  + resource "aws_eip" "nat_primary" {
      + allocation_id        = (known after apply)
      + arn                  = (known after apply)
      + association_id       = (known after apply)
      + carrier_ip           = (known after apply)
      + customer_owned_ip    = (known after apply)
      + domain               = "vpc"
      + id                   = (known after apply)
      + instance             = (known after apply)
      + ipam_pool_id         = (known after apply)
      + network_border_group = (known after apply)
      + network_interface    = (known after apply)
      + private_dns          = (known after apply)
      + private_ip           = (known after apply)
      + ptr_record           = (known after apply)
      + public_dns           = (known after apply)
      + public_ip            = (known after apply)
      + public_ipv4_pool     = (known after apply)
      + region               = "us-east-2"
      + tags                 = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-1-us-east-2b-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2b"
          + "osdc.io/nat-bucket"                               = "bucket-1"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
      + tags_all             = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "ManagedBy"                                        = "opentofu"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-1-us-east-2b-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2b"
          + "osdc.io/nat-bucket"                               = "bucket-1"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
    }

  # module.vpc.aws_eip.nat_primary["bucket-1-us-east-2c"] will be created
  + resource "aws_eip" "nat_primary" {
      + allocation_id        = (known after apply)
      + arn                  = (known after apply)
      + association_id       = (known after apply)
      + carrier_ip           = (known after apply)
      + customer_owned_ip    = (known after apply)
      + domain               = "vpc"
      + id                   = (known after apply)
      + instance             = (known after apply)
      + ipam_pool_id         = (known after apply)
      + network_border_group = (known after apply)
      + network_interface    = (known after apply)
      + private_dns          = (known after apply)
      + private_ip           = (known after apply)
      + ptr_record           = (known after apply)
      + public_dns           = (known after apply)
      + public_ip            = (known after apply)
      + public_ipv4_pool     = (known after apply)
      + region               = "us-east-2"
      + tags                 = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-1-us-east-2c-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2c"
          + "osdc.io/nat-bucket"                               = "bucket-1"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
      + tags_all             = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "ManagedBy"                                        = "opentofu"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-1-us-east-2c-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2c"
          + "osdc.io/nat-bucket"                               = "bucket-1"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
    }

  # module.vpc.aws_eip.nat_primary["bucket-2-us-east-2a"] will be created
  + resource "aws_eip" "nat_primary" {
      + allocation_id        = (known after apply)
      + arn                  = (known after apply)
      + association_id       = (known after apply)
      + carrier_ip           = (known after apply)
      + customer_owned_ip    = (known after apply)
      + domain               = "vpc"
      + id                   = (known after apply)
      + instance             = (known after apply)
      + ipam_pool_id         = (known after apply)
      + network_border_group = (known after apply)
      + network_interface    = (known after apply)
      + private_dns          = (known after apply)
      + private_ip           = (known after apply)
      + ptr_record           = (known after apply)
      + public_dns           = (known after apply)
      + public_ip            = (known after apply)
      + public_ipv4_pool     = (known after apply)
      + region               = "us-east-2"
      + tags                 = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-2-us-east-2a-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2a"
          + "osdc.io/nat-bucket"                               = "bucket-2"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
      + tags_all             = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "ManagedBy"                                        = "opentofu"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-2-us-east-2a-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2a"
          + "osdc.io/nat-bucket"                               = "bucket-2"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
    }

  # module.vpc.aws_eip.nat_primary["bucket-2-us-east-2b"] will be created
  + resource "aws_eip" "nat_primary" {
      + allocation_id        = (known after apply)
      + arn                  = (known after apply)
      + association_id       = (known after apply)
      + carrier_ip           = (known after apply)
      + customer_owned_ip    = (known after apply)
      + domain               = "vpc"
      + id                   = (known after apply)
      + instance             = (known after apply)
      + ipam_pool_id         = (known after apply)
      + network_border_group = (known after apply)
      + network_interface    = (known after apply)
      + private_dns          = (known after apply)
      + private_ip           = (known after apply)
      + ptr_record           = (known after apply)
      + public_dns           = (known after apply)
      + public_ip            = (known after apply)
      + public_ipv4_pool     = (known after apply)
      + region               = "us-east-2"
      + tags                 = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-2-us-east-2b-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2b"
          + "osdc.io/nat-bucket"                               = "bucket-2"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
      + tags_all             = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "ManagedBy"                                        = "opentofu"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-2-us-east-2b-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2b"
          + "osdc.io/nat-bucket"                               = "bucket-2"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
    }

  # module.vpc.aws_eip.nat_primary["bucket-2-us-east-2c"] will be created
  + resource "aws_eip" "nat_primary" {
      + allocation_id        = (known after apply)
      + arn                  = (known after apply)
      + association_id       = (known after apply)
      + carrier_ip           = (known after apply)
      + customer_owned_ip    = (known after apply)
      + domain               = "vpc"
      + id                   = (known after apply)
      + instance             = (known after apply)
      + ipam_pool_id         = (known after apply)
      + network_border_group = (known after apply)
      + network_interface    = (known after apply)
      + private_dns          = (known after apply)
      + private_ip           = (known after apply)
      + ptr_record           = (known after apply)
      + public_dns           = (known after apply)
      + public_ip            = (known after apply)
      + public_ipv4_pool     = (known after apply)
      + region               = "us-east-2"
      + tags                 = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-2-us-east-2c-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2c"
          + "osdc.io/nat-bucket"                               = "bucket-2"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
      + tags_all             = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "ManagedBy"                                        = "opentofu"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-2-us-east-2c-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2c"
          + "osdc.io/nat-bucket"                               = "bucket-2"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
    }

  # module.vpc.aws_eip.nat_primary["bucket-3-us-east-2a"] will be created
  + resource "aws_eip" "nat_primary" {
      + allocation_id        = (known after apply)
      + arn                  = (known after apply)
      + association_id       = (known after apply)
      + carrier_ip           = (known after apply)
      + customer_owned_ip    = (known after apply)
      + domain               = "vpc"
      + id                   = (known after apply)
      + instance             = (known after apply)
      + ipam_pool_id         = (known after apply)
      + network_border_group = (known after apply)
      + network_interface    = (known after apply)
      + private_dns          = (known after apply)
      + private_ip           = (known after apply)
      + ptr_record           = (known after apply)
      + public_dns           = (known after apply)
      + public_ip            = (known after apply)
      + public_ipv4_pool     = (known after apply)
      + region               = "us-east-2"
      + tags                 = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-3-us-east-2a-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2a"
          + "osdc.io/nat-bucket"                               = "bucket-3"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
      + tags_all             = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "ManagedBy"                                        = "opentofu"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-3-us-east-2a-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2a"
          + "osdc.io/nat-bucket"                               = "bucket-3"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
    }

  # module.vpc.aws_eip.nat_primary["bucket-3-us-east-2b"] will be created
  + resource "aws_eip" "nat_primary" {
      + allocation_id        = (known after apply)
      + arn                  = (known after apply)
      + association_id       = (known after apply)
      + carrier_ip           = (known after apply)
      + customer_owned_ip    = (known after apply)
      + domain               = "vpc"
      + id                   = (known after apply)
      + instance             = (known after apply)
      + ipam_pool_id         = (known after apply)
      + network_border_group = (known after apply)
      + network_interface    = (known after apply)
      + private_dns          = (known after apply)
      + private_ip           = (known after apply)
      + ptr_record           = (known after apply)
      + public_dns           = (known after apply)
      + public_ip            = (known after apply)
      + public_ipv4_pool     = (known after apply)
      + region               = "us-east-2"
      + tags                 = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-3-us-east-2b-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2b"
          + "osdc.io/nat-bucket"                               = "bucket-3"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
      + tags_all             = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "ManagedBy"                                        = "opentofu"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-3-us-east-2b-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2b"
          + "osdc.io/nat-bucket"                               = "bucket-3"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
    }

  # module.vpc.aws_eip.nat_primary["bucket-3-us-east-2c"] will be created
  + resource "aws_eip" "nat_primary" {
      + allocation_id        = (known after apply)
      + arn                  = (known after apply)
      + association_id       = (known after apply)
      + carrier_ip           = (known after apply)
      + customer_owned_ip    = (known after apply)
      + domain               = "vpc"
      + id                   = (known after apply)
      + instance             = (known after apply)
      + ipam_pool_id         = (known after apply)
      + network_border_group = (known after apply)
      + network_interface    = (known after apply)
      + private_dns          = (known after apply)
      + private_ip           = (known after apply)
      + ptr_record           = (known after apply)
      + public_dns           = (known after apply)
      + public_ip            = (known after apply)
      + public_ipv4_pool     = (known after apply)
      + region               = "us-east-2"
      + tags                 = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-3-us-east-2c-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2c"
          + "osdc.io/nat-bucket"                               = "bucket-3"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
      + tags_all             = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "ManagedBy"                                        = "opentofu"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-3-us-east-2c-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2c"
          + "osdc.io/nat-bucket"                               = "bucket-3"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
    }

  # module.vpc.aws_eip.nat_primary["bucket-4-us-east-2a"] will be created
  + resource "aws_eip" "nat_primary" {
      + allocation_id        = (known after apply)
      + arn                  = (known after apply)
      + association_id       = (known after apply)
      + carrier_ip           = (known after apply)
      + customer_owned_ip    = (known after apply)
      + domain               = "vpc"
      + id                   = (known after apply)
      + instance             = (known after apply)
      + ipam_pool_id         = (known after apply)
      + network_border_group = (known after apply)
      + network_interface    = (known after apply)
      + private_dns          = (known after apply)
      + private_ip           = (known after apply)
      + ptr_record           = (known after apply)
      + public_dns           = (known after apply)
      + public_ip            = (known after apply)
      + public_ipv4_pool     = (known after apply)
      + region               = "us-east-2"
      + tags                 = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-4-us-east-2a-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2a"
          + "osdc.io/nat-bucket"                               = "bucket-4"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
      + tags_all             = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "ManagedBy"                                        = "opentofu"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-4-us-east-2a-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2a"
          + "osdc.io/nat-bucket"                               = "bucket-4"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
    }

  # module.vpc.aws_eip.nat_primary["bucket-4-us-east-2b"] will be created
  + resource "aws_eip" "nat_primary" {
      + allocation_id        = (known after apply)
      + arn                  = (known after apply)
      + association_id       = (known after apply)
      + carrier_ip           = (known after apply)
      + customer_owned_ip    = (known after apply)
      + domain               = "vpc"
      + id                   = (known after apply)
      + instance             = (known after apply)
      + ipam_pool_id         = (known after apply)
      + network_border_group = (known after apply)
      + network_interface    = (known after apply)
      + private_dns          = (known after apply)
      + private_ip           = (known after apply)
      + ptr_record           = (known after apply)
      + public_dns           = (known after apply)
      + public_ip            = (known after apply)
      + public_ipv4_pool     = (known after apply)
      + region               = "us-east-2"
      + tags                 = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-4-us-east-2b-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2b"
          + "osdc.io/nat-bucket"                               = "bucket-4"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
      + tags_all             = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "ManagedBy"                                        = "opentofu"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-4-us-east-2b-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2b"
          + "osdc.io/nat-bucket"                               = "bucket-4"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
    }

  # module.vpc.aws_eip.nat_primary["bucket-4-us-east-2c"] will be created
  + resource "aws_eip" "nat_primary" {
      + allocation_id        = (known after apply)
      + arn                  = (known after apply)
      + association_id       = (known after apply)
      + carrier_ip           = (known after apply)
      + customer_owned_ip    = (known after apply)
      + domain               = "vpc"
      + id                   = (known after apply)
      + instance             = (known after apply)
      + ipam_pool_id         = (known after apply)
      + network_border_group = (known after apply)
      + network_interface    = (known after apply)
      + private_dns          = (known after apply)
      + private_ip           = (known after apply)
      + ptr_record           = (known after apply)
      + public_dns           = (known after apply)
      + public_ip            = (known after apply)
      + public_ipv4_pool     = (known after apply)
      + region               = "us-east-2"
      + tags                 = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-4-us-east-2c-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2c"
          + "osdc.io/nat-bucket"                               = "bucket-4"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
      + tags_all             = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "ManagedBy"                                        = "opentofu"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-4-us-east-2c-primary"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2c"
          + "osdc.io/nat-bucket"                               = "bucket-4"
          + "osdc.io/nat-eip-role"                             = "primary"
        }
    }

  # module.vpc.aws_eip.nat_secondary["bucket-1-us-east-2a-2"] will be created
  + resource "aws_eip" "nat_secondary" {
      + allocation_id        = (known after apply)
      + arn                  = (known after apply)
      + association_id       = (known after apply)
      + carrier_ip           = (known after apply)
      + customer_owned_ip    = (known after apply)
      + domain               = "vpc"
      + id                   = (known after apply)
      + instance             = (known after apply)
      + ipam_pool_id         = (known after apply)
      + network_border_group = (known after apply)
      + network_interface    = (known after apply)
      + private_dns          = (known after apply)
      + private_ip           = (known after apply)
      + ptr_record           = (known after apply)
      + public_dns           = (known after apply)
      + public_ip            = (known after apply)
      + public_ipv4_pool     = (known after apply)
      + region               = "us-east-2"
      + tags                 = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-1-us-east-2a-secondary-2"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2a"
          + "osdc.io/nat-bucket"                               = "bucket-1"
          + "osdc.io/nat-eip-role"                             = "secondary"
          + "osdc.io/nat-eip-slot"                             = "2"
        }
      + tags_all             = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "ManagedBy"                                        = "opentofu"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-1-us-east-2a-secondary-2"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2a"
          + "osdc.io/nat-bucket"                               = "bucket-1"
          + "osdc.io/nat-eip-role"                             = "secondary"
          + "osdc.io/nat-eip-slot"                             = "2"
        }
    }

  # module.vpc.aws_eip.nat_secondary["bucket-1-us-east-2a-3"] will be created
  + resource "aws_eip" "nat_secondary" {
      + allocation_id        = (known after apply)
      + arn                  = (known after apply)
      + association_id       = (known after apply)
      + carrier_ip           = (known after apply)
      + customer_owned_ip    = (known after apply)
      + domain               = "vpc"
      + id                   = (known after apply)
      + instance             = (known after apply)
      + ipam_pool_id         = (known after apply)
      + network_border_group = (known after apply)
      + network_interface    = (known after apply)
      + private_dns          = (known after apply)
      + private_ip           = (known after apply)
      + ptr_record           = (known after apply)
      + public_dns           = (known after apply)
      + public_ip            = (known after apply)
      + public_ipv4_pool     = (known after apply)
      + region               = "us-east-2"
      + tags                 = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-1-us-east-2a-secondary-3"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2a"
          + "osdc.io/nat-bucket"                               = "bucket-1"
          + "osdc.io/nat-eip-role"                             = "secondary"
          + "osdc.io/nat-eip-slot"                             = "3"
        }
      + tags_all             = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "ManagedBy"                                        = "opentofu"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-1-us-east-2a-secondary-3"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2a"
          + "osdc.io/nat-bucket"                               = "bucket-1"
          + "osdc.io/nat-eip-role"                             = "secondary"
          + "osdc.io/nat-eip-slot"                             = "3"
        }
    }

  # module.vpc.aws_eip.nat_secondary["bucket-1-us-east-2a-4"] will be created
  + resource "aws_eip" "nat_secondary" {
      + allocation_id        = (known after apply)
      + arn                  = (known after apply)
      + association_id       = (known after apply)
      + carrier_ip           = (known after apply)
      + customer_owned_ip    = (known after apply)
      + domain               = "vpc"
      + id                   = (known after apply)
      + instance             = (known after apply)
      + ipam_pool_id         = (known after apply)
      + network_border_group = (known after apply)
      + network_interface    = (known after apply)
      + private_dns          = (known after apply)
      + private_ip           = (known after apply)
      + ptr_record           = (known after apply)
      + public_dns           = (known after apply)
      + public_ip            = (known after apply)
      + public_ipv4_pool     = (known after apply)
      + region               = "us-east-2"
      + tags                 = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-1-us-east-2a-secondary-4"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2a"
          + "osdc.io/nat-bucket"                               = "bucket-1"
          + "osdc.io/nat-eip-role"                             = "secondary"
          + "osdc.io/nat-eip-slot"                             = "4"
        }
      + tags_all             = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "ManagedBy"                                        = "opentofu"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-1-us-east-2a-secondary-4"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2a"
          + "osdc.io/nat-bucket"                               = "bucket-1"
          + "osdc.io/nat-eip-role"                             = "secondary"
          + "osdc.io/nat-eip-slot"                             = "4"
        }
    }

  # module.vpc.aws_eip.nat_secondary["bucket-1-us-east-2a-5"] will be created
  + resource "aws_eip" "nat_secondary" {
      + allocation_id        = (known after apply)
      + arn                  = (known after apply)
      + association_id       = (known after apply)
      + carrier_ip           = (known after apply)
      + customer_owned_ip    = (known after apply)
      + domain               = "vpc"
      + id                   = (known after apply)
      + instance             = (known after apply)
      + ipam_pool_id         = (known after apply)
      + network_border_group = (known after apply)
      + network_interface    = (known after apply)
      + private_dns          = (known after apply)
      + private_ip           = (known after apply)
      + ptr_record           = (known after apply)
      + public_dns           = (known after apply)
      + public_ip            = (known after apply)
      + public_ipv4_pool     = (known after apply)
      + region               = "us-east-2"
      + tags                 = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-1-us-east-2a-secondary-5"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2a"
          + "osdc.io/nat-bucket"                               = "bucket-1"
          + "osdc.io/nat-eip-role"                             = "secondary"
          + "osdc.io/nat-eip-slot"                             = "5"
        }
      + tags_all             = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "ManagedBy"                                        = "opentofu"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-1-us-east-2a-secondary-5"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2a"
          + "osdc.io/nat-bucket"                               = "bucket-1"
          + "osdc.io/nat-eip-role"                             = "secondary"
          + "osdc.io/nat-eip-slot"                             = "5"
        }
    }

  # module.vpc.aws_eip.nat_secondary["bucket-1-us-east-2a-6"] will be created
  + resource "aws_eip" "nat_secondary" {
      + allocation_id        = (known after apply)
      + arn                  = (known after apply)
      + association_id       = (known after apply)
      + carrier_ip           = (known after apply)
      + customer_owned_ip    = (known after apply)
      + domain               = "vpc"
      + id                   = (known after apply)
      + instance             = (known after apply)
      + ipam_pool_id         = (known after apply)
      + network_border_group = (known after apply)
      + network_interface    = (known after apply)
      + private_dns          = (known after apply)
      + private_ip           = (known after apply)
      + ptr_record           = (known after apply)
      + public_dns           = (known after apply)
      + public_ip            = (known after apply)
      + public_ipv4_pool     = (known after apply)
      + region               = "us-east-2"
      + tags                 = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-1-us-east-2a-secondary-6"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2a"
          + "osdc.io/nat-bucket"                               = "bucket-1"
          + "osdc.io/nat-eip-role"                             = "secondary"
          + "osdc.io/nat-eip-slot"                             = "6"
        }
      + tags_all             = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "ManagedBy"                                        = "opentofu"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-1-us-east-2a-secondary-6"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2a"
          + "osdc.io/nat-bucket"                               = "bucket-1"
          + "osdc.io/nat-eip-role"                             = "secondary"
          + "osdc.io/nat-eip-slot"                             = "6"
        }
    }

  # module.vpc.aws_eip.nat_secondary["bucket-1-us-east-2a-7"] will be created
  + resource "aws_eip" "nat_secondary" {
      + allocation_id        = (known after apply)
      + arn                  = (known after apply)
      + association_id       = (known after apply)
      + carrier_ip           = (known after apply)
      + customer_owned_ip    = (known after apply)
      + domain               = "vpc"
      + id                   = (known after apply)
      + instance             = (known after apply)
      + ipam_pool_id         = (known after apply)
      + network_border_group = (known after apply)
      + network_interface    = (known after apply)
      + private_dns          = (known after apply)
      + private_ip           = (known after apply)
      + ptr_record           = (known after apply)
      + public_dns           = (known after apply)
      + public_ip            = (known after apply)
      + public_ipv4_pool     = (known after apply)
      + region               = "us-east-2"
      + tags                 = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-1-us-east-2a-secondary-7"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2a"
          + "osdc.io/nat-bucket"                               = "bucket-1"
          + "osdc.io/nat-eip-role"                             = "secondary"
          + "osdc.io/nat-eip-slot"                             = "7"
        }
      + tags_all             = {
          + "Cluster"                                          = "pytorch-arc-cbr-production"
          + "ManagedBy"                                        = "opentofu"
          + "Name"                                             = "pytorch-arc-cbr-production-vpc-nat-bucket-1-us-east-2a-secondary-7"
          + "Project"                                          = "ciforge"
          + "kubernetes.io/cluster/pytorch-arc-cbr-production" = "shared"
          + "osdc.io/nat-az"                                   = "us-east-2a"
          + "osdc.io/nat-bucket"                               = "bucket-1"
          + "osdc.io/nat-eip-role"                             = "secondary"
          + "osdc.io/nat-eip-slot"                             = "7"
        }
    }

  # module.vpc.aws_eip.nat_secondary["bucket-1-us-east-2a-8"] will be created
  + resource "aws_eip" "nat_secondary" {
      + allocation_id        = (known after apply)
      + arn                  = (known after apply)
      + association_id       = (known after apply)
      + carrier_ip           = (known after apply)
      + customer_owned_ip    = (known after apply)
      + domain               = "vpc"
      + id                   = (known after apply)
      + instance             = (known after apply)
      + ipam_pool_id         = (known after a
... (truncated — see workflow logs for full plan)

[ghstack-poisoned]
jeanschmidt added a commit that referenced this pull request May 14, 2026
**Impact:** clusters.yaml config consumers (deploy pipeline, tofu plan); no runtime change
**Risk:** low

## What
Replaces the boolean `single_nat_gateway` knob with the new `nat_gateway_eip_count` integer introduced by the per-(bucket, AZ) NAT GW topology refactor (PR 10). Updates the example snippet in `docs/architecture.md` to match.

## Why
PR 10 refactored NAT GW provisioning from a single shared gateway (toggled by `single_nat_gateway`) to a per-pod-subnet topology with configurable EIP count per gateway. The old boolean no longer maps to anything in the terraform module — it must be replaced with the new variable that controls EIP allocation.

## How
- Default `nat_gateway_eip_count: 8` (AWS hard cap per NAT GW) replaces `single_nat_gateway: false` in defaults — production gets 96 EIPs total (12 NAT GWs × 8)
- Staging override `nat_gateway_eip_count: 1` replaces `single_nat_gateway: true` — staging gets 8 EIPs total (8 NAT GWs × 1), preserving cost optimization intent
- Documentation example updated to stay consistent with live config
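
As a rough check on the EIP totals quoted above, the sketch below enumerates the `for_each` keys that the plan output appears to use: `bucket-N-<az>` for each primary EIP and `bucket-N-<az>-<slot>` for secondaries, with slots starting at 2. The helper name `nat_eip_keys` is hypothetical, and the 4 buckets × 3 AZs layout is an assumption inferred from the visible plan excerpt and the "12 NAT GWs" figure. It is not taken from the module source.

```python
from itertools import product

# Per-NAT-GW EIP cap as stated in this PR description.
MAX_EIPS_PER_NAT_GW = 8


def nat_eip_keys(buckets, azs, eip_count):
    """Return (primary_keys, secondary_keys) for a per-(bucket, AZ) NAT GW layout.

    Hypothetical helper: mirrors the key naming seen in the plan output,
    not the actual module code.
    """
    if not 1 <= eip_count <= MAX_EIPS_PER_NAT_GW:
        raise ValueError(f"eip_count must be in 1..{MAX_EIPS_PER_NAT_GW}")
    # One NAT GW per (bucket, AZ) pair; each gets exactly one primary EIP.
    gateways = [f"{bucket}-{az}" for bucket, az in product(buckets, azs)]
    primary = list(gateways)
    # Secondary EIPs fill slots 2..eip_count on every gateway.
    secondary = [
        f"{gw}-{slot}" for gw in gateways for slot in range(2, eip_count + 1)
    ]
    return primary, secondary


# Assumed production layout: 4 buckets x 3 AZs = 12 NAT GWs.
buckets = [f"bucket-{n}" for n in range(1, 5)]
azs = ["us-east-2a", "us-east-2b", "us-east-2c"]

primary, secondary = nat_eip_keys(buckets, azs, eip_count=8)
print(len(primary), len(secondary), len(primary) + len(secondary))  # 12 84 96
```

With `eip_count=1` the secondary list is empty, so each gateway keeps only its primary EIP, which matches the staging total of one EIP per NAT GW.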

## Changes
- `clusters.yaml`: replace `single_nat_gateway: false` in defaults with `nat_gateway_eip_count: 8`
- `clusters.yaml`: replace `single_nat_gateway: true` in arc-staging base with `nat_gateway_eip_count: 1`
- `docs/architecture.md`: update example clusters.yaml snippet to use `nat_gateway_eip_count: 1`

## Testing
- `just lint` — all 13 linters pass
- `just test` — all unit tests pass
- `tofu plan` for both clusters — verify no unexpected changes (this PR is config-only; the terraform variable was already added by PR 10)
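
One possible way to sanity-check the `tofu plan` step is to count the `aws_eip` creations in the full plan text and compare them with the expected totals. The sketch below is a minimal, hypothetical helper (`count_planned_eips` is not part of this repo). It assumes the plan header lines have the `# module.vpc.aws_eip.<name>["<key>"] will be created` shape seen in the plan output for this PR, and that the complete plan text is available from the workflow logs.

```python
import re
from collections import Counter

# Matches plan header lines such as:
#   # module.vpc.aws_eip.nat_primary["bucket-3-us-east-2a"] will be created
CREATE_RE = re.compile(
    r'^[ \t]*# module\.vpc\.aws_eip\.(nat_primary|nat_secondary)'
    r'\["([^"]+)"\] will be created$',
    re.MULTILINE,
)


def count_planned_eips(plan_text):
    """Count planned EIP creations per resource name in plan text."""
    return Counter(match.group(1) for match in CREATE_RE.finditer(plan_text))


# Tiny sample in the same shape as the plan output; a real check would
# read the complete plan text instead.
sample = '''
  # module.vpc.aws_eip.nat_primary["bucket-3-us-east-2a"] will be created
  + resource "aws_eip" "nat_primary" {
  # module.vpc.aws_eip.nat_secondary["bucket-1-us-east-2a-2"] will be created
  + resource "aws_eip" "nat_secondary" {
'''

counts = count_planned_eips(sample)
print(counts["nat_primary"], counts["nat_secondary"])  # 1 1
```

Against a complete production plan, the two counts would be expected to sum to 96 if the totals in the How section hold.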

Signed-off-by: Jean Schmidt <contato@jschmidt.me>
ghstack-source-id: 4e410b6
Pull-Request: #570
@huydhn
Contributor

huydhn commented May 14, 2026

I wonder whether it would be better to go after multi-region first, so that we have a form of backup before deploying this change. How easy is it to roll this back in case things go wrong? I assume we would need to create a new cluster from scratch in that case.
