design-proposal: migrate kubernetes workers to Talos and split into kubernetes-nodes by kvaps · Pull Request #8 · cozystack/community

Andrei Kvapil (kvaps) · 2026-05-04T18:11:22Z

Summary

A two-phase reshape of the kubernetes application:

Phase 1: replace Ubuntu+kubeadm worker bootstrap with Talos + cluster-api-bootstrap-provider-talos, inside the existing monolithic chart. No user-facing API change. Seamless migration via standard CAPI MachineDeployment rolling update. Requires a patch to clastix/cluster-api-control-plane-provider-kamaji to render the talos-csr-signer sidecar in the TenantControlPlane.
Phase 2: once workers are uniformly Talos, split the chart into kubernetes (control-plane only) + kubernetes-nodes (one per pool), modelled on the vm-instance / vm-disk precedent. Single backend (kubevirt-talos), no backend.type field yet.

Hybrid clusters (workers in external clouds, BYO, bare-metal) are deferred as Phase 3 to a separate placeholder draft: #9.

Supersedes an earlier scope that tried to land multiple backends and the split in one shot — see commit history for the rescoping rationale.

Test plan

Implementation testing is scoped in the proposal:

Phase 1 unit: Talos machineconfig generation, MachineDeployment shape, signer sidecar wiring
Phase 1 integration: kind + KubeVirt + Kamaji + CABPT + signer, end-to-end Talos worker join + talosctl
Phase 1 migration: synthetic Ubuntu+kubeadm tenant rolled over to Talos via CAPI rolling update
Phase 2 unit: kubernetes-nodes chart rendering
Phase 2 migration: synthetic kubernetes HelmRelease with nodeGroups migrated to split shape

Propose extracting node pools from the kubernetes application into a sibling kubernetes-nodes application, modelled on the vm-instance/vm-disk split. Add a backend abstraction that supports the existing KubeVirt+kubeadm flow alongside new Talos backends: KubeVirt+Talos via clastix/talos-csr-signer, and cloud-talos for Hetzner and Azure without Cluster API. Co-Authored-By: Claude <[email protected]> Signed-off-by: Andrei Kvapil <[email protected]>

gemini-code-assist

Code Review

This pull request proposes a significant architectural change to split the monolithic kubernetes package into separate control-plane and node-pool components, enabling independent lifecycles and multi-backend support for KubeVirt and cloud-native Talos workers. The review feedback identifies a technical error regarding the gRPC protocol (TCP vs UDP), suggests consolidating the node-lifecycle-controller and Talos tokens at the cluster level for better efficiency and simpler configuration, and requests further implementation details for the dependency discovery mechanism in the proposed admission webhook.

gemini-code-assist · 2026-05-04T18:12:29Z

+
+Renders the same CAPI/CAPK objects as above, but with a `TalosConfigTemplate` (from `cluster-api-bootstrap-provider-talos`) replacing `KubeadmConfigTemplate`. Worker VMs boot from a Talos image. Bootstrap fetches the Talos machineconfig from CAPI and joins the cluster via standard Talos PKI.
+
+The tenant's `KamajiControlPlane` carries an `additionalContainers` entry running `clastix/talos-csr-signer` listening on UDP/50001, exposed alongside `:6443` on the tenant API LoadBalancer. This is what allows `talosctl` to operate against worker nodes whose control-plane is Kamaji rather than Talos.


The proposal mentions clastix/talos-csr-signer listening on UDP/50001. Since this is a gRPC-based service (as noted in line 40), it should be TCP/50001 to match standard Talos trustd and gRPC requirements.

gemini-code-assist · 2026-05-04T18:12:29Z

+
+When `cluster-autoscaler` scales a `cloud-talos-*` pool down, it deletes the cloud VM. The tenant's apiserver still has a `Node` object that will linger until something deletes it. CAPI was previously the agent doing this; without CAPI, we need an equivalent.
+
+The `node-lifecycle-controller` from `cozystack/local-ccm` is a good fit for this role. The `kubernetes-nodes` chart for `cloud-talos-*` backends renders an NLC Deployment that runs in the management cluster but uses a kubeconfig pointing to the **tenant** apiserver. It watches Node objects with the `ToBeDeletedByClusterAutoscaler:NoSchedule` taint and removes them after a configurable grace period and unreachability check.


Deploying a separate node-lifecycle-controller (NLC) for each kubernetes-nodes release leads to redundant processes watching the same tenant API server. A more efficient approach would be to manage a single NLC instance per tenant cluster (e.g., within the kubernetes control-plane chart) that handles node cleanup for all associated node pools.

gemini-code-assist · 2026-05-04T18:12:29Z

+## Failure and edge cases
+
+- **`kubernetes-nodes` HelmRelease created before its parent `kubernetes` HelmRelease** → chart `fail`s the render with a clear error message identifying the missing parent. No partial CAPI/autoscaler resources created.
+- **Parent `kubernetes` HelmRelease deleted while children exist** → all `kubernetes-nodes` HelmReleases for that cluster fail subsequent reconciles. An admission webhook on `kubernetes` HelmRelease delete blocks the operation if any `kubernetes-nodes` references it.


The proposed admission webhook for blocking kubernetes HelmRelease deletion requires a mechanism to discover name-based dependencies. Clarifying where this logic will reside and how it will perform discovery would strengthen the proposal, especially since the linkage doesn't use standard Kubernetes owner references.

gemini-code-assist · 2026-05-04T18:12:29Z

+**System layer (chart-managed, not exposed to user):**
+
+- Cluster CA, machine CA, apiserver endpoint — read at template time via `lookup` from the tenant's `KamajiControlPlane`.
+- Talos token — generated once per pool, stored alongside the machineconfig.


Generating a TALOS_TOKEN per pool while running a single talos-csr-signer sidecar per cluster (as discussed in Open Question #3 on line 253) introduces a configuration challenge for the signer, as it must be aware of all active tokens. Using a single TALOS_TOKEN per tenant cluster would simplify the sidecar configuration and the kubernetes-nodes chart logic while maintaining sufficient isolation between tenants.

coderabbitai · 2026-05-04T18:13:03Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2ea550cd-a489-4a9d-954b-557679bbd86e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

@squat

Adjust the proposal to reflect that the controller will be developed as an independent project under the kilo-io organization, per confirmed interest from Kilo maintainer @squat. Generalize the CRD from a tenant-specific TenantMeshLink to a tenant-agnostic ClusterMesh that references peer clusters through a map of kubeconfig Secrets. Move all tenant semantics into a dedicated Cozystack integration section that also accounts for the kubernetes-nodes split (PR cozystack#8) so a single ClusterMesh covers multi-location, multi-backend tenants. Co-Authored-By: Claude <[email protected]> Signed-off-by: Andrei Kvapil <[email protected]>

Andrei Kvapil (kvaps)

Action items from an internal meeting. Capturing here so the design can be revised before this lands.

1. Zone / location ownership model is undefined

The proposal does not specify whether a "location" (zone) is configured by the platform admin or by the tenant. This shapes the entire kubernetes-nodes API:

Admin-owned location: we need a separate platform-level resource that declares per-zone settings (provider creds, base image, template, network config). Node pools reference it by name and the user never sees cloud-init contents or instance template details.
User-owned location: the tenant gets more freedom to pick instance types, templates, disks — closer to the existing KubeVirt path.

These have very different trust and UX implications. The proposal must pick one (or describe an explicit hybrid) and bake it into the values shape.

2. Base image / template source for autoscaler-managed nodes

For cloud-talos-* backends the proposal describes machineconfig injection but is silent on where the base Talos image for the cloud comes from: pre-baked snapshot/VHD, Talos Image Factory schematic computed on the fly, or user-supplied image reference. This needs an explicit field with documented defaults and an operator workflow for adding new images.

3. Credentials when the user brings their own cloud

If we want to support tenants connecting their own cloud accounts (a direction raised in the meeting), the design must cover:

Where per-tenant cloud credentials live (tenant-namespace Secret references in the kubernetes-nodes HelmRelease).
How the management-cluster cluster-autoscaler reads tenant-specific credentials without privilege escalation.
Trust boundary: a compromised tenant must not be able to leak or repurpose those credentials.

The current proposal implicitly assumes platform-supplied credentials. Add an explicit section on tenant-supplied credentials, or push to an open question with a clear statement.

4. Scheduling class semantics break for external backends

tenant.spec.scheduling (allowed and default scheduling classes) is enforced when KubeVirt VMs are scheduled in the management cluster. For cloud-talos-* backends there is no management-cluster pod, so the existing scheduling-class machinery has nothing to act on.

The proposal should:

State explicitly that scheduling classes only apply to kubevirt-* backends.
Decide whether equivalent constraints (per-tenant restrictions on which kubernetes-nodes configurations are allowed) should be lifted into kubernetes-nodes itself rather than relying on scheduling classes.

5. Backend abstraction: should it be a real contract, not just an enum?

Today backend.type is a string switch and the chart branches internally. As we add backends (KubeVirt-Talos, AWS, GCP, on-prem KVM), this becomes hard to maintain. Worth considering:

A minimal contract any backend must implement (CRs produced, credentials interface, autoscaler integration shape).
Documenting it so external contributors can add a backend without reading the entire chart.
Possibly per-backend sub-charts.

Not blocking, but should be an explicit open question.

6. UI form generation per backend

The dashboard needs a different form per backend.type. The proposal should mention that the values schema is structured so the frontend can render per-backend forms cleanly (tagged-union shape, not a flat values bag with conditional fields).

7. Tangential: management cluster on non-KubeVirt

The same kubernetes-nodes model could plausibly apply to the management cluster's own node pools (running it on Azure VMs, not just KubeVirt). The existing autoscaling docs already show the infra-level pattern. The proposal should:

Explicitly say "out of scope — management cluster node management remains separate", or
Note that the design is intentionally compatible with that future direction.

Will revise the proposal to address these points.

Drop the multi-backend design (cloud-talos-hetzner, cloud-talos-azure, LocationProfile, NLC, etc.) and rewrite around two phases of internal restructuring: - Phase 1: replace Ubuntu+kubeadm worker bootstrap with Talos via CABPT, inside the existing monolithic chart, with no user-facing API change. Patch needed in cluster-api-control-plane-provider-kamaji to render the talos-csr-signer sidecar in TenantControlPlane. - Phase 2: once workers are uniformly Talos, split the chart into kubernetes (control-plane) + kubernetes-nodes (per-pool). Single backend, no backend.type field. Hybrid / external-cloud clusters are deferred to Phase 3, tracked separately as a follow-up draft proposal. Co-Authored-By: Claude <[email protected]> Signed-off-by: Andrei Kvapil <[email protected]>

Co-Authored-By: Claude <[email protected]> Signed-off-by: Andrei Kvapil <[email protected]>

gemini-code-assist Bot reviewed May 4, 2026

View reviewed changes

Andrei Kvapil (kvaps) commented May 11, 2026

View reviewed changes

Andrei Kvapil (kvaps) mentioned this pull request May 11, 2026

design-proposal: hybrid kubernetes clusters (Phase 3 placeholder) #9

Draft

design-proposal: link to Phase 3 draft (PR cozystack#9)

017cf28

Co-Authored-By: Claude <[email protected]> Signed-off-by: Andrei Kvapil <[email protected]>

Andrei Kvapil (kvaps) changed the title ~~design-proposal: split kubernetes package and add Talos backends~~ design-proposal: migrate kubernetes workers to Talos and split into kubernetes-nodes May 11, 2026

This was referenced May 12, 2026

feat(kubernetes): bootstrap tenant workers with Talos via CABPT cozystack/cozystack#2610

Open

feat(kubernetes-nodes): split off worker node pool into its own app (Phase 2) cozystack/cozystack#2620

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

design-proposal: migrate kubernetes workers to Talos and split into kubernetes-nodes#8

design-proposal: migrate kubernetes workers to Talos and split into kubernetes-nodes#8
Andrei Kvapil (kvaps) wants to merge 3 commits into
cozystack:mainfrom
kvaps:proposal/kubernetes-nodes-split

Andrei Kvapil (kvaps) commented May 4, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 4, 2026

Uh oh!

gemini-code-assist Bot May 4, 2026

Uh oh!

gemini-code-assist Bot May 4, 2026

Uh oh!

gemini-code-assist Bot May 4, 2026

Uh oh!

coderabbitai Bot commented May 4, 2026 •

edited

Loading

Review skipped

Uh oh!

Andrei Kvapil (kvaps) left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant


		Renders the same CAPI/CAPK objects as above, but with a `TalosConfigTemplate` (from `cluster-api-bootstrap-provider-talos`) replacing `KubeadmConfigTemplate`. Worker VMs boot from a Talos image. Bootstrap fetches the Talos machineconfig from CAPI and joins the cluster via standard Talos PKI.

		The tenant's `KamajiControlPlane` carries an `additionalContainers` entry running `clastix/talos-csr-signer` listening on UDP/50001, exposed alongside `:6443` on the tenant API LoadBalancer. This is what allows `talosctl` to operate against worker nodes whose control-plane is Kamaji rather than Talos.


		When `cluster-autoscaler` scales a `cloud-talos-*` pool down, it deletes the cloud VM. The tenant's apiserver still has a `Node` object that will linger until something deletes it. CAPI was previously the agent doing this; without CAPI, we need an equivalent.

		The `node-lifecycle-controller` from `cozystack/local-ccm` is a good fit for this role. The `kubernetes-nodes` chart for `cloud-talos-` backends renders an NLC Deployment that runs in the management cluster but uses a kubeconfig pointing to the tenant* apiserver. It watches Node objects with the `ToBeDeletedByClusterAutoscaler:NoSchedule` taint and removes them after a configurable grace period and unreachability check.

Conversation

Andrei Kvapil (kvaps) commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

Andrei Kvapil (kvaps) left a comment

Choose a reason for hiding this comment

1. Zone / location ownership model is undefined

2. Base image / template source for autoscaler-managed nodes

3. Credentials when the user brings their own cloud

4. Scheduling class semantics break for external backends

5. Backend abstraction: should it be a real contract, not just an enum?

6. UI form generation per backend

7. Tangential: management cluster on non-KubeVirt

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Andrei Kvapil (kvaps) commented May 4, 2026 •

edited

Loading

coderabbitai Bot commented May 4, 2026 •

edited

Loading