gossip: introduce hybrid worker bootstrap for gossip clusters #18245

Open
hakman wants to merge 5 commits into kubernetes:master from hakman:gossip-migration

Conversation

@hakman
Member

@hakman hakman commented Apr 25, 2026

Implements the hybrid mode from #18240. Workers in gossip clusters bootstrap via the cluster load balancer (or control-plane IPs) instead of via protokube. Control plane keeps gossip and dns-controller.

This gives operators a single kops reconcile path to remove protokube from worker nodes, and with it the unmaintained weaveworks/mesh and memberlistmesh dependencies and the over-broad worker permissions they require.

Activates for gossip clusters with an API load balancer:

  • AWS (NLB)
  • GCE, Azure, OpenStack, Scaleway, DigitalOcean, Hetzner
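
As a rough illustration of the activation rule above, here is a minimal Go sketch. It assumes a literal reading of the description (gossip cluster, API load balancer present, provider in the list). The `Cluster` type, its fields, and `UsesHybridWorkerBootstrap` are hypothetical names for this example, not the kops API or the code in this PR.

```go
package main

import "fmt"

// Cluster is a hypothetical, simplified stand-in for a cluster spec.
// It is NOT the kops type; the fields exist only for this sketch.
type Cluster struct {
	Gossip             bool   // cluster uses gossip-based discovery
	HasAPILoadBalancer bool   // an API load balancer is configured
	CloudProvider      string // lowercase provider name, e.g. "aws"
}

// hybridProviders mirrors the provider list in the PR description.
var hybridProviders = map[string]bool{
	"aws":          true,
	"gce":          true,
	"azure":        true,
	"openstack":    true,
	"scaleway":     true,
	"digitalocean": true,
	"hetzner":      true,
}

// UsesHybridWorkerBootstrap reports whether workers would bootstrap via the
// load balancer instead of protokube, under the assumptions stated above.
func UsesHybridWorkerBootstrap(c Cluster) bool {
	return c.Gossip && c.HasAPILoadBalancer && hybridProviders[c.CloudProvider]
}

func main() {
	c := Cluster{Gossip: true, HasAPILoadBalancer: true, CloudProvider: "aws"}
	fmt.Println(UsesHybridWorkerBootstrap(c)) // prints "true"
}
```

The real predicate may differ per provider; the commit messages further down suggest some providers use control-plane IPs rather than a load balancer.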

/cc @justinsb @rifelpet @ameukam

@k8s-ci-robot k8s-ci-robot requested a review from ameukam April 25, 2026 17:05
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress label Apr 25, 2026
@k8s-ci-robot k8s-ci-robot added the area/api, area/nodeup, area/provider/aws, area/provider/azure, area/provider/gcp, size/M, area/provider/openstack, and cncf-cla: yes labels Apr 25, 2026
@hakman hakman force-pushed the gossip-migration branch from 4fb6bbb to 92e4e83 Compare April 26, 2026 03:29
@hakman hakman force-pushed the gossip-migration branch from 92e4e83 to 07af909 Compare April 26, 2026 18:17
@k8s-ci-robot k8s-ci-robot added size/L and removed size/M labels Apr 27, 2026
@hakman hakman force-pushed the gossip-migration branch from 4e1693e to 7662bba Compare April 27, 2026 04:43
@k8s-ci-robot k8s-ci-robot added size/M and removed size/L labels Apr 27, 2026
@hakman hakman changed the title WIP gossip: Introduce hybrid worker bootstrap for gossip clusters WIP gossip: introduce hybrid worker bootstrap for gossip clusters Apr 27, 2026
@hakman hakman force-pushed the gossip-migration branch from 7662bba to 0e6d12e Compare April 27, 2026 18:58
@k8s-ci-robot k8s-ci-robot added size/L and removed size/M labels Apr 27, 2026
@hakman
Member Author

hakman commented Apr 27, 2026

/assign @justinsb

@hakman
Member Author

hakman commented Apr 30, 2026

/test pull-kops-verify-gofmt

@hakman hakman changed the title WIP gossip: introduce hybrid worker bootstrap for gossip clusters gossip: introduce hybrid worker bootstrap for gossip clusters Apr 30, 2026
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress label Apr 30, 2026
@hakman
Member Author

hakman commented Apr 30, 2026

/hold for feedback

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold label Apr 30, 2026
@hakman
Member Author

hakman commented May 10, 2026

/test pull-kops-aws-upgrade-gossip
/test pull-kops-gce-upgrade-gossip

@ameukam
Member

ameukam commented May 11, 2026

/retest

@hakman
Member Author

hakman commented May 11, 2026

/test pull-kops-aws-gossip
/test pull-kops-aws-gossip-ha
/test pull-kops-azure-gossip
/test pull-kops-azure-gossip-ha
/test pull-kops-gce-gossip
/test pull-kops-gce-gossip-ha
/test pull-kops-do-gossip
/test pull-kops-do-gossip-ha

Workers with a fixed list of API server IPs in BootConfig don't need
protokube to populate /etc/hosts. Use that signal directly instead of
gating on cluster-wide gossip mode.
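
A minimal Go sketch of the signal this commit message describes, for illustration only. The field name `APIServerIPs` comes from the commit text; the `BootConfig` struct shape and the `SkipProtokubeOnWorker` helper are hypothetical and are not taken from the nodeup code.

```go
package main

import "fmt"

// BootConfig is a hypothetical, reduced stand-in for the worker boot
// configuration. Only the APIServerIPs name appears in the commit message.
type BootConfig struct {
	APIServerIPs []string // fixed API server addresses baked in at build time
}

// SkipProtokubeOnWorker reports whether a worker already has a fixed list of
// API server IPs and therefore does not need protokube to populate /etc/hosts.
// It deliberately ignores any cluster-wide gossip flag.
func SkipProtokubeOnWorker(bc BootConfig) bool {
	return len(bc.APIServerIPs) > 0
}

func main() {
	withIPs := BootConfig{APIServerIPs: []string{"10.0.0.10", "10.0.0.11"}}
	fmt.Println(SkipProtokubeOnWorker(withIPs))      // prints "true"
	fmt.Println(SkipProtokubeOnWorker(BootConfig{})) // prints "false"
}
```

The sketch covers only the positive signal; what a worker without fixed IPs should do is outside its scope.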
@hakman hakman force-pushed the gossip-migration branch from 0e6d12e to 2ac74a1 Compare May 11, 2026 08:35
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from justinsb. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

hakman added 4 commits May 11, 2026 11:52
Add UseLoadBalancerForKopsController. When it returns true, expose
kops-controller on the API NLB and bake the LB IPs into worker
BootConfig.APIServerIPs so workers bootstrap without protokube.
Control-plane nodes keep gossip.

Advertise control-plane port IPs as KubeAPIServer endpoints so workers
seed /etc/hosts with control-plane addresses and reach kops-controller
on port 3988 directly. Existing firewall rules already permit the
worker to control-plane path.

Expose kops-controller on the cluster load balancer for gossip Azure
clusters with an API LB. Drop the unused UsesPrivateDNS clause.
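
To make the /etc/hosts seeding in these commit messages concrete, here is a small Go sketch. Port 3988 comes from the commit text. The helper names and the example hostnames are assumptions for illustration and are not taken from this PR.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// kopsControllerPort is the port named in the commit message.
const kopsControllerPort = "3988"

// HostsEntries renders one /etc/hosts line per IP, mapping that IP to every
// hostname given. Hypothetical helper for this sketch only.
func HostsEntries(ips, hostnames []string) []string {
	names := strings.Join(hostnames, " ")
	lines := make([]string, 0, len(ips))
	for _, ip := range ips {
		lines = append(lines, ip+"\t"+names)
	}
	return lines
}

// KopsControllerEndpoints turns each IP into a host:port endpoint on the
// kops-controller port. Hypothetical helper for this sketch only.
func KopsControllerEndpoints(ips []string) []string {
	eps := make([]string, 0, len(ips))
	for _, ip := range ips {
		eps = append(eps, net.JoinHostPort(ip, kopsControllerPort))
	}
	return eps
}

func main() {
	// Illustrative values: the IPs stand in for BootConfig.APIServerIPs and
	// the hostnames are assumed examples for a gossip cluster.
	ips := []string{"10.0.0.10", "10.0.0.11"}
	hosts := []string{
		"api.internal.example.k8s.local",
		"kops-controller.internal.example.k8s.local",
	}
	for _, line := range HostsEntries(ips, hosts) {
		fmt.Println(line)
	}
	fmt.Println(KopsControllerEndpoints(ips))
}
```

Mapping every address to both names is a simplification made for the example; how the actual change resolves each name is not shown in this conversation.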

Signed-off-by: Ciprian Hacman <[email protected]>
Signed-off-by: Ciprian Hacman <[email protected]>
@hakman hakman force-pushed the gossip-migration branch from 2ac74a1 to 5cc3894 Compare May 11, 2026 08:53
@hakman
Member Author

hakman commented May 11, 2026

/test pull-kops-aws-gossip
/test pull-kops-aws-gossip-ha
/test pull-kops-azure-gossip
/test pull-kops-azure-gossip-ha
/test pull-kops-gce-gossip
/test pull-kops-gce-gossip-ha
/test pull-kops-do-gossip
/test pull-kops-do-gossip-ha

@hakman
Member Author

hakman commented May 11, 2026

/test pull-kops-do-gossip

@hakman
Member Author

hakman commented May 11, 2026

/test pull-kops-gce-upgrade-gossip

@hakman
Member Author

hakman commented May 11, 2026

/test pull-kops-do-gossip-ha

@hakman
Member Author

hakman commented May 11, 2026

/test pull-kops-gce-upgrade-gossip

@hakman
Member Author

hakman commented May 11, 2026

/test pull-kops-gce-upgrade-gossip

@hakman
Member Author

hakman commented May 11, 2026

/test pull-kops-gce-upgrade-gossip

@hakman
Member Author

hakman commented May 13, 2026

/test pull-kops-aws-upgrade-gossip
/test pull-kops-azure-upgrade-gossip
/test pull-kops-gce-upgrade-gossip

@hakman
Member Author

hakman commented May 13, 2026

/test pull-kops-azure-upgrade-gossip

@k8s-ci-robot
Contributor

@hakman: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kops-e2e-gce-cni-calico | 4e1693e | link | false | /test pull-kops-e2e-gce-cni-calico |
| pull-kops-azure-gossip-ha | 5cc3894 | link | false | /test pull-kops-azure-gossip-ha |
| pull-kops-azure-upgrade-gossip | 5cc3894 | link | false | /test pull-kops-azure-upgrade-gossip |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

  • area/api
  • area/nodeup
  • area/provider/aws: Issues or PRs related to aws provider
  • area/provider/azure: Issues or PRs related to azure provider
  • area/provider/gcp: Issues or PRs related to gcp provider
  • area/provider/openstack: Issues or PRs related to openstack provider
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.

4 participants