
design-proposal: migrate kubernetes workers to Talos and split into kubernetes-nodes #8

Draft
Andrei Kvapil (kvaps) wants to merge 3 commits into cozystack:main from kvaps:proposal/kubernetes-nodes-split

@kvaps Andrei Kvapil (kvaps) commented May 4, 2026

Summary

A two-phase reshape of the kubernetes application:

  • Phase 1: replace Ubuntu+kubeadm worker bootstrap with Talos + cluster-api-bootstrap-provider-talos, inside the existing monolithic chart. No user-facing API change. Seamless migration via standard CAPI MachineDeployment rolling update. Requires a patch to clastix/cluster-api-control-plane-provider-kamaji to render the talos-csr-signer sidecar in the TenantControlPlane.
  • Phase 2: once workers are uniformly Talos, split the chart into kubernetes (control-plane only) + kubernetes-nodes (one per pool), modelled on the vm-instance / vm-disk precedent. Single backend (kubevirt-talos), no backend.type field yet.

Hybrid clusters (workers in external clouds, BYO, bare-metal) are deferred as Phase 3 to a separate placeholder draft: #9.

Supersedes an earlier scope that tried to land multiple backends and the split in one shot — see commit history for the rescoping rationale.

Test plan

Implementation testing is scoped in the proposal:

  • Phase 1 unit: Talos machineconfig generation, MachineDeployment shape, signer sidecar wiring
  • Phase 1 integration: kind + KubeVirt + Kamaji + CABPT + signer, end-to-end Talos worker join + talosctl
  • Phase 1 migration: synthetic Ubuntu+kubeadm tenant rolled over to Talos via CAPI rolling update
  • Phase 2 unit: kubernetes-nodes chart rendering
  • Phase 2 migration: synthetic kubernetes HelmRelease with nodeGroups migrated to split shape
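
The "MachineDeployment shape" check in the Phase 1 unit tests could look roughly like the sketch below. It assumes the chart output has already been parsed into Python dicts; `uses_talos_bootstrap`, `_md`, and the object names are hypothetical, and the `spec.template.spec.bootstrap.configRef` path follows the Cluster API MachineDeployment layout.

```python
def uses_talos_bootstrap(machine_deployment: dict) -> bool:
    """True if the bootstrap configRef points at a TalosConfigTemplate."""
    config_ref = (
        machine_deployment.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("bootstrap", {})
        .get("configRef", {})
    )
    return config_ref.get("kind") == "TalosConfigTemplate"


def _md(bootstrap_kind: str) -> dict:
    # Hypothetical rendered object, trimmed to the fields the check reads.
    return {
        "kind": "MachineDeployment",
        "metadata": {"name": "tenant-md0"},
        "spec": {"template": {"spec": {"bootstrap": {
            "configRef": {"kind": bootstrap_kind, "name": "tenant-md0"},
        }}}},
    }


migrated = _md("TalosConfigTemplate")   # shape expected after Phase 1
legacy = _md("KubeadmConfigTemplate")   # shape rendered by the current chart
```

The same predicate could double as the pass condition for the Phase 1 migration test once the rolling update has finished.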

Propose extracting node pools from the kubernetes application into a
sibling kubernetes-nodes application, modelled on the vm-instance/vm-disk
split. Add a backend abstraction that supports the existing
KubeVirt+kubeadm flow alongside new Talos backends: KubeVirt+Talos via
clastix/talos-csr-signer, and cloud-talos for Hetzner and Azure without
Cluster API.

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Andrei Kvapil <[email protected]>

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request proposes a significant architectural change to split the monolithic kubernetes package into separate control-plane and node-pool components, enabling independent lifecycles and multi-backend support for KubeVirt and cloud-native Talos workers. The review feedback identifies a technical error regarding the gRPC protocol (TCP vs UDP), suggests consolidating the node-lifecycle-controller and Talos tokens at the cluster level for better efficiency and simpler configuration, and requests further implementation details for the dependency discovery mechanism in the proposed admission webhook.


Renders the same CAPI/CAPK objects as above, but with a `TalosConfigTemplate` (from `cluster-api-bootstrap-provider-talos`) replacing `KubeadmConfigTemplate`. Worker VMs boot from a Talos image. Bootstrap fetches the Talos machineconfig from CAPI and joins the cluster via standard Talos PKI.

The tenant's `KamajiControlPlane` carries an `additionalContainers` entry running `clastix/talos-csr-signer` listening on UDP/50001, exposed alongside `:6443` on the tenant API LoadBalancer. This is what allows `talosctl` to operate against worker nodes whose control-plane is Kamaji rather than Talos.

medium

The proposal mentions clastix/talos-csr-signer listening on UDP/50001. Since this is a gRPC-based service (as noted in line 40), it should be TCP/50001 to match standard Talos trustd and gRPC requirements.
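
gRPC runs over HTTP/2, which is carried on TCP, so a render-time check along these lines would catch the mismatch. This is only a sketch: `non_tcp_ports` and the sample container dicts are hypothetical, and it relies on Kubernetes treating an omitted `protocol` on a container port as TCP.

```python
def non_tcp_ports(container: dict) -> list[int]:
    """Return container ports whose protocol is anything other than TCP."""
    return [
        port["containerPort"]
        for port in container.get("ports", [])
        # Kubernetes defaults the protocol to TCP when the field is omitted.
        if port.get("protocol", "TCP") != "TCP"
    ]


# Hypothetical sidecar entries, as they might appear under additionalContainers.
as_written = {
    "name": "talos-csr-signer",
    "ports": [{"containerPort": 50001, "protocol": "UDP"}],
}
corrected = {
    "name": "talos-csr-signer",
    "ports": [{"containerPort": 50001, "protocol": "TCP"}],
}
```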


When `cluster-autoscaler` scales a `cloud-talos-*` pool down, it deletes the cloud VM. The tenant's apiserver still has a `Node` object that will linger until something deletes it. CAPI was previously the agent doing this; without CAPI, we need an equivalent.

The `node-lifecycle-controller` from `cozystack/local-ccm` is a good fit for this role. The `kubernetes-nodes` chart for `cloud-talos-*` backends renders an NLC Deployment that runs in the management cluster but uses a kubeconfig pointing to the **tenant** apiserver. It watches Node objects with the `ToBeDeletedByClusterAutoscaler:NoSchedule` taint and removes them after a configurable grace period and unreachability check.

medium

Deploying a separate node-lifecycle-controller (NLC) for each kubernetes-nodes release leads to redundant processes watching the same tenant API server. A more efficient approach would be to manage a single NLC instance per tenant cluster (e.g., within the kubernetes control-plane chart) that handles node cleanup for all associated node pools.
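
The cleanup rule itself does not depend on which pool a node belongs to, which is part of why a single per-tenant instance seems sufficient. A minimal sketch of the decision, assuming Node objects are available as dicts: `should_delete`, `first_seen_tainted` (the time the controller first observed the taint), and `reachable` are hypothetical names, not taken from `cozystack/local-ccm`.

```python
from datetime import datetime, timedelta, timezone

AUTOSCALER_TAINT_KEY = "ToBeDeletedByClusterAutoscaler"


def has_autoscaler_taint(node: dict) -> bool:
    """True if the node carries the autoscaler's NoSchedule deletion taint."""
    taints = node.get("spec", {}).get("taints") or []
    return any(
        t.get("key") == AUTOSCALER_TAINT_KEY and t.get("effect") == "NoSchedule"
        for t in taints
    )


def should_delete(
    node: dict,
    first_seen_tainted: datetime,
    now: datetime,
    grace: timedelta,
    reachable: bool,
) -> bool:
    """Delete only tainted, unreachable nodes whose grace period has elapsed."""
    if not has_autoscaler_taint(node) or reachable:
        return False
    return now - first_seen_tainted >= grace


# Hypothetical Node, trimmed to the fields the check reads.
tainted = {"spec": {"taints": [
    {"key": AUTOSCALER_TAINT_KEY, "effect": "NoSchedule"},
]}}
t0 = datetime(2026, 5, 4, 12, 0, tzinfo=timezone.utc)
```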

## Failure and edge cases

- **`kubernetes-nodes` HelmRelease created before its parent `kubernetes` HelmRelease** → chart `fail`s the render with a clear error message identifying the missing parent. No partial CAPI/autoscaler resources created.
- **Parent `kubernetes` HelmRelease deleted while children exist** → all `kubernetes-nodes` HelmReleases for that cluster fail subsequent reconciles. An admission webhook on `kubernetes` HelmRelease delete blocks the operation if any `kubernetes-nodes` references it.

medium

The proposed admission webhook for blocking kubernetes HelmRelease deletion requires a mechanism to discover name-based dependencies. Clarifying where this logic will reside and how it will perform discovery would strengthen the proposal, especially since the linkage doesn't use standard Kubernetes owner references.
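
One possible shape for that discovery, sketched under the assumption that each `kubernetes-nodes` HelmRelease carries labels naming its chart and its parent cluster. Both label keys below are hypothetical (the proposal does not define the linkage), and the returned dict mirrors only the `allowed` and `status.message` fields of an AdmissionReview response.

```python
# Hypothetical label keys; the proposal does not specify how the link is stored.
CHART_LABEL = "example.cozystack.io/chart"
PARENT_LABEL = "example.cozystack.io/parent-cluster"


def dependents_of(parent: str, namespace: str, helm_releases: list[dict]) -> list[str]:
    """Names of kubernetes-nodes releases in the namespace that reference the parent."""
    names = []
    for release in helm_releases:
        meta = release.get("metadata", {})
        labels = meta.get("labels", {})
        if (
            meta.get("namespace") == namespace
            and labels.get(CHART_LABEL) == "kubernetes-nodes"
            and labels.get(PARENT_LABEL) == parent
        ):
            names.append(meta["name"])
    return sorted(names)


def review_delete(parent: str, namespace: str, helm_releases: list[dict]) -> dict:
    """Deny deletion of the parent while any node pool still references it."""
    blocking = dependents_of(parent, namespace, helm_releases)
    if blocking:
        message = f"kubernetes/{parent} is still referenced by: {', '.join(blocking)}"
        return {"allowed": False, "status": {"message": message}}
    return {"allowed": True}


# Hypothetical cluster state used to exercise the check.
releases = [
    {"metadata": {"name": "pool-a", "namespace": "tenant-x", "labels": {
        CHART_LABEL: "kubernetes-nodes", PARENT_LABEL: "prod"}}},
    {"metadata": {"name": "pool-b", "namespace": "tenant-y", "labels": {
        CHART_LABEL: "kubernetes-nodes", PARENT_LABEL: "prod"}}},
]
```

A label-based link keeps the lookup to a single list call with a label selector, but it is only as trustworthy as whatever sets the label, so the chart rather than the user would need to own it.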

**System layer (chart-managed, not exposed to user):**

- Cluster CA, machine CA, apiserver endpoint — read at template time via `lookup` from the tenant's `KamajiControlPlane`.
- Talos token — generated once per pool, stored alongside the machineconfig.

medium

Generating a TALOS_TOKEN per pool while running a single talos-csr-signer sidecar per cluster (as discussed in Open Question #3 on line 253) introduces a configuration challenge for the signer, as it must be aware of all active tokens. Using a single TALOS_TOKEN per tenant cluster would simplify the sidecar configuration and the kubernetes-nodes chart logic while maintaining sufficient isolation between tenants.
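
The difference in what the signer has to know can be sketched as follows. The helper names are hypothetical, and the token layout (six characters, a dot, sixteen characters, lowercase alphanumeric) is an assumption based on the examples in the Talos machine configuration reference rather than anything the proposal states.

```python
import re
import secrets
import string

_ALPHABET = string.ascii_lowercase + string.digits
# Assumed token layout: 6 chars, a dot, 16 chars.
TOKEN_RE = re.compile(r"^[a-z0-9]{6}\.[a-z0-9]{16}$")


def new_token() -> str:
    """Generate a random token in the assumed 6.16 layout."""
    def part(length: int) -> str:
        return "".join(secrets.choice(_ALPHABET) for _ in range(length))
    return f"{part(6)}.{part(16)}"


def tokens_for(pools: list[str], per_pool: bool) -> dict[str, str]:
    """Map each pool to its token: distinct per pool, or one shared per cluster."""
    if per_pool:
        return {pool: new_token() for pool in pools}
    shared = new_token()
    return {pool: shared for pool in pools}


def signer_token_count(assignment: dict[str, str]) -> int:
    """Number of distinct tokens the single signer sidecar must accept."""
    return len(set(assignment.values()))
```

With a per-cluster token the signer's configuration stays constant as pools are added or removed; with per-pool tokens it grows with the pool count and must be updated on every pool change.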

coderabbitai Bot commented May 4, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


Andrei Kvapil (kvaps) added a commit to kvaps/cozystack-community that referenced this pull request May 5, 2026
Adjust the proposal to reflect that the controller will be developed as
an independent project under the kilo-io organization, per confirmed
interest from Kilo maintainer @squat. Generalize the CRD from a
tenant-specific TenantMeshLink to a tenant-agnostic ClusterMesh that
references peer clusters through a map of kubeconfig Secrets. Move all
tenant semantics into a dedicated Cozystack integration section that
also accounts for the kubernetes-nodes split (PR cozystack#8) so a single
ClusterMesh covers multi-location, multi-backend tenants.

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Andrei Kvapil <[email protected]>

@kvaps Andrei Kvapil (kvaps) left a comment


Action items from an internal meeting. Capturing here so the design can be revised before this lands.

1. Zone / location ownership model is undefined

The proposal does not specify whether a "location" (zone) is configured by the platform admin or by the tenant. This shapes the entire kubernetes-nodes API:

  • Admin-owned location: we need a separate platform-level resource that declares per-zone settings (provider creds, base image, template, network config). Node pools reference it by name and the user never sees cloud-init contents or instance template details.
  • User-owned location: the tenant gets more freedom to pick instance types, templates, disks — closer to the existing KubeVirt path.

These have very different trust and UX implications. The proposal must pick one (or describe an explicit hybrid) and bake it into the values shape.

2. Base image / template source for autoscaler-managed nodes

For cloud-talos-* backends the proposal describes machineconfig injection but is silent on where the base Talos image for the cloud comes from: pre-baked snapshot/VHD, Talos Image Factory schematic computed on the fly, or user-supplied image reference. This needs an explicit field with documented defaults and an operator workflow for adding new images.

3. Credentials when the user brings their own cloud

If we want to support tenants connecting their own cloud accounts (a direction raised in the meeting), the design must cover:

  • Where per-tenant cloud credentials live (tenant-namespace Secret references in the kubernetes-nodes HelmRelease).
  • How the management-cluster cluster-autoscaler reads tenant-specific credentials without privilege escalation.
  • Trust boundary: a compromised tenant must not be able to leak or repurpose those credentials.

The current proposal implicitly assumes platform-supplied credentials. Add an explicit section on tenant-supplied credentials, or push to an open question with a clear statement.

4. Scheduling class semantics break for external backends

tenant.spec.scheduling (allowed and default scheduling classes) is enforced when KubeVirt VMs are scheduled in the management cluster. For cloud-talos-* backends there is no management-cluster pod, so the existing scheduling-class machinery has nothing to act on.

The proposal should:

  • State explicitly that scheduling classes only apply to kubevirt-* backends.
  • Decide whether equivalent constraints (per-tenant restrictions on which kubernetes-nodes configurations are allowed) should be lifted into kubernetes-nodes itself rather than relying on scheduling classes.

5. Backend abstraction: should it be a real contract, not just an enum?

Today backend.type is a string switch and the chart branches internally. As we add backends (KubeVirt-Talos, AWS, GCP, on-prem KVM), this becomes hard to maintain. Worth considering:

  • A minimal contract any backend must implement (CRs produced, credentials interface, autoscaler integration shape).
  • Documenting it so external contributors can add a backend without reading the entire chart.
  • Possibly per-backend sub-charts.

Not blocking, but should be an explicit open question.
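
To make the open question concrete, a contract of this kind could be as small as the sketch below. Everything here is hypothetical: the `NodeBackend` protocol, its method names, and the `minReplicas`/`maxReplicas` keys are illustrative only, chosen to mirror the three items listed above (CRs produced, credentials interface, autoscaler integration shape).

```python
from typing import Protocol


class NodeBackend(Protocol):
    """Hypothetical minimal contract a backend would implement."""

    name: str

    def render(self, pool: dict) -> list[dict]:
        """Custom resources the backend produces for one node pool."""
        ...

    def credential_refs(self, pool: dict) -> list[str]:
        """Names of Secrets the backend needs; empty if none."""
        ...

    def autoscaler_group(self, pool: dict) -> dict:
        """Scaling bounds handed to cluster-autoscaler."""
        ...


class KubevirtTalos:
    """Illustrative implementation for the KubeVirt + Talos path."""

    name = "kubevirt-talos"

    def render(self, pool: dict) -> list[dict]:
        kinds = ("MachineDeployment", "KubevirtMachineTemplate", "TalosConfigTemplate")
        return [{"kind": kind, "metadata": {"name": pool["name"]}} for kind in kinds]

    def credential_refs(self, pool: dict) -> list[str]:
        return []  # VMs run in the management cluster; no cloud credentials.

    def autoscaler_group(self, pool: dict) -> dict:
        return {"min": pool.get("minReplicas", 0), "max": pool.get("maxReplicas", 0)}


def describe(backend: NodeBackend, pool: dict) -> dict:
    """Summarise what a backend would produce, using only the contract."""
    return {
        "backend": backend.name,
        "kinds": [obj["kind"] for obj in backend.render(pool)],
        "secrets": backend.credential_refs(pool),
        "scaling": backend.autoscaler_group(pool),
    }
```

Because `describe` touches only the protocol, a new backend could be added without changes to the calling code, which is the property the per-backend sub-chart idea is also after.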

6. UI form generation per backend

The dashboard needs a different form per backend.type. The proposal should mention that the values schema is structured so the frontend can render per-backend forms cleanly (tagged-union shape, not a flat values bag with conditional fields).
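
A tagged-union shape in JSON Schema terms usually means a `oneOf` whose variants each pin `type` with `const`. The sketch below builds such a schema and shows how a frontend could pick the fields for one backend; the per-backend field names (`instanceType`, `diskSize`, `serverType`, `location`) are hypothetical placeholders, not the proposal's values.

```python
# Hypothetical per-backend fields; only the backend names come from the proposal.
VARIANTS = {
    "kubevirt-talos": {
        "instanceType": {"type": "string"},
        "diskSize": {"type": "string"},
    },
    "cloud-talos-hetzner": {
        "serverType": {"type": "string"},
        "location": {"type": "string"},
    },
}


def backend_schema() -> dict:
    """Build a oneOf schema in which `type` is the discriminator of each variant."""
    return {
        "oneOf": [
            {
                "type": "object",
                "properties": {"type": {"const": name}, **fields},
                "required": ["type"],
                # Reject fields that belong to a different backend.
                "additionalProperties": False,
            }
            for name, fields in VARIANTS.items()
        ]
    }


def form_fields(schema: dict, backend_type: str) -> list[str]:
    """Return the field names a form for the given backend would render."""
    for variant in schema["oneOf"]:
        properties = variant["properties"]
        if properties["type"]["const"] == backend_type:
            return sorted(key for key in properties if key != "type")
    raise KeyError(backend_type)
```

Compared with a flat values bag, each variant is closed, so a Hetzner-only field on a KubeVirt pool fails validation instead of being silently ignored.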

7. Tangential: management cluster on non-KubeVirt

The same kubernetes-nodes model could plausibly apply to the management cluster's own node pools (running it on Azure VMs, not just KubeVirt). The existing autoscaling docs already show the infra-level pattern. The proposal should:

  • Explicitly say "out of scope — management cluster node management remains separate", or
  • Note that the design is intentionally compatible with that future direction.

Will revise the proposal to address these points.

Drop the multi-backend design (cloud-talos-hetzner, cloud-talos-azure,
LocationProfile, NLC, etc.) and rewrite around two phases of internal
restructuring:

- Phase 1: replace Ubuntu+kubeadm worker bootstrap with Talos via CABPT,
  inside the existing monolithic chart, with no user-facing API change.
  Patch needed in cluster-api-control-plane-provider-kamaji to render
  the talos-csr-signer sidecar in TenantControlPlane.
- Phase 2: once workers are uniformly Talos, split the chart into
  kubernetes (control-plane) + kubernetes-nodes (per-pool). Single
  backend, no backend.type field.

Hybrid / external-cloud clusters are deferred to Phase 3, tracked
separately as a follow-up draft proposal.

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Andrei Kvapil <[email protected]>
@kvaps Andrei Kvapil (kvaps) changed the title from "design-proposal: split kubernetes package and add Talos backends" to "design-proposal: migrate kubernetes workers to Talos and split into kubernetes-nodes" May 11, 2026