feat(server,k8s): implement pause/resume with rootfs snapshot support #668
Merged: hittyt merged 24 commits into alibaba:main from fengcone:feature/public-k8s-pause-resume on Apr 28, 2026
Commits
2f6a1f8 (fengcone) feat(server,k8s): implement pause/resume with rootfs snapshot support
c90d969 (fengcone) fix(server,k8s): clean up field naming and lint issues
b4930b5 (fengcone) refactor(server,k8s): redesign SandboxSnapshot spec/status boundary a…
3b98183 (fengcone) docs(server,k8s): update doc for pause and resume feature
3a13c3a (fengcone) Merge branch 'main' into feature/public-k8s-pause-resume
486658b (fengcone) fix(server): return workload state when snapshot fails but sandbox st…
6e302bc (fengcone) fix(server,k8s): add sandboxsnapshots RBAC and surface resume failures
17d7008 (fengcone) fix(image-committer): replace crictl with nerdctl for container disco…
7fa689d (fengcone) fix(k8s): surface resume failures and fix paused sandbox image URI
dd1876a (fengcone) fix(controller): requeue on transient API errors in validatePauseSpec
f86fd2a (fengcone) fix(server): include full pause config in re-pause snapshot patch
7474fa0 (fengcone) fix(server,k8s): copy user labels to snapshot and verify full label i…
9efe77c (fengcone) docs(k8s): complete pause/resume state machine and add Chinese docume…
6f6c231 (fengcone) feat(server): expose intermediate pause/resume states with detailed r…
38bee80 (fengcone) fix(kubernetes): replace crictl with nerdctl
dc5902e (fengcone) feat(kubernetes): implement snapshot-based pause/resume lifecycle
40f9230 (fengcone) Merge branch 'main' into feature/public-k8s-pause-resume
eebfd43 (fengcone) fix(kubernetes): stabilize Kubernetes pooled pause-resume flow
c00d9f6 (fengcone) Merge branch 'main' into feature/public-k8s-pause-resume
89e1743 (fengcone) fix(kubernetes): stabilize Kubernetes pooled pause-resume flow
1806da1 (fengcone) fix(kubernetes): harden snapshot commit jobs
77e137e (fengcone) fix(kubernetes): harden pause/resume snapshot lifecycle
7e33f90 (fengcone) fix(kubernetes): harden pause/resume snapshot flow
8d0d3fa (fengcone) Merge branch 'main' into feature/public-k8s-pause-resume
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,71 @@ | ||
| # Kubernetes Operator | ||
|
|
||
| ## Overview | ||
|
|
||
| Kubernetes operator managing sandbox environments via custom resources. Provides BatchSandbox (O(1) batch delivery), Pool (resource pooling for fast provisioning), and optional task orchestration. Built with controller-runtime (Kubebuilder). | ||
|
|
||
| ## Structure | ||
|
|
||
| ``` | ||
| kubernetes/ | ||
| ├── apis/sandbox/v1alpha1/ # CRD type definitions | ||
| │ ├── batchsandbox_types.go # BatchSandbox spec + status | ||
| │ ├── pool_types.go # Pool spec + status | ||
| │ └── sandboxsnapshot_types.go | ||
| ├── cmd/ | ||
| │ ├── controller/main.go # Controller manager entry point | ||
| │ ├── image-committer/main.go # Image committer binary (runs as commit Job) | ||
| │ └── task-executor/main.go # Task executor binary (runs as sidecar) | ||
| ├── internal/ | ||
| │ ├── controller/ # Reconciliation loops | ||
| │ ├── scheduler/ # Pool allocation logic (bufferMin/Max, poolMax) | ||
| │ └── utils/ # Utility functions | ||
| ├── config/ | ||
| │ ├── crd/bases/ # Generated CRD YAML manifests | ||
| │ ├── rbac/ # ClusterRole, ClusterRoleBinding | ||
| │ ├── manager/ # Controller deployment manifest | ||
| │ └── samples/ # Example CRD instances | ||
| ├── charts/ # Helm charts (opensandbox-controller, opensandbox-server, opensandbox) | ||
| ├── test/e2e/ # End-to-end tests + testdata | ||
| ├── Dockerfile # Controller image build | ||
| └── Dockerfile.image-committer # Image-committer image build | ||
| ``` | ||
|
|
||
| ## Where to Look | ||
|
|
||
| | Task | File | Notes | | ||
| |------|------|-------| | ||
| | Add CRD field | `apis/sandbox/v1alpha1/*_types.go` | Run `make install` to update CRDs | | ||
| | Controller logic | `internal/controller/` | BatchSandbox + Pool reconciliation | | ||
| | Pool allocation | `internal/scheduler/` | Buffer management, sandbox→pool assignment | | ||
| | Task execution | `cmd/task-executor/`, `internal/task-executor/` | Process-based tasks in sandboxes | | ||
| | Helm values | `charts/opensandbox-controller/values.yaml` | Controller + task-executor image refs | | ||
| | RBAC permissions | `config/rbac/` | ClusterRole rules | | ||
| | E2E tests | `test/e2e/` | Ginkgo/Gomega test framework | | ||
|
|
||
| ## Conventions | ||
|
|
||
| - **Framework**: Kubebuilder with `controller-runtime` v0.21. | ||
| - **Go version**: 1.24. Own `go.mod` (`github.com/alibaba/opensandbox/sandbox-k8s`). | ||
| - **Concurrency**: BatchSandbox controller concurrency=32, Pool controller concurrency=1. | ||
| - **CRD version**: `v1alpha1` under group `sandbox.opensandbox.io`. | ||
| - **Helm charts**: Umbrella chart (`opensandbox`) wraps controller + server subcharts. | ||
| - **Logging**: `klog/v2` + `zap`. Log level configurable via `--zap-log-level` flag. | ||
|
|
||
| ## Anti-Patterns | ||
|
|
||
| - `pause`/`resume` lifecycle uses SandboxSnapshot CRD + image-committer Job to snapshot and restore containers. | ||
| - BatchSandbox deletion waits for running tasks to terminate before removing the resource. | ||
| - Task-executor requires `shareProcessNamespace: true` and `SYS_PTRACE` capability in pod spec. | ||
| - Pool template changes do not affect already-allocated sandboxes. | ||
|
|
||
| ## Commands | ||
|
|
||
| ```bash | ||
| make install # install CRDs into cluster | ||
| make deploy CONTROLLER_IMG=... TASK_EXECUTOR_IMG=... # deploy controller | ||
| make docker-build # build controller image | ||
| make docker-build-task-executor # build task-executor image | ||
| make docker-build-image-committer # build image-committer image | ||
| make test # run tests | ||
| ``` |
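
The task-executor requirement noted in the guide above (`shareProcessNamespace: true` plus the `SYS_PTRACE` capability) could look roughly like the following Pod spec fragment. This is a minimal sketch, not a manifest from this PR: the Pod name, container names, and image references are hypothetical placeholders, and it assumes the capability is granted on the task-executor sidecar container rather than on the sandbox container.

```yaml
# Hypothetical Pod fragment; every name and image below is a placeholder.
apiVersion: v1
kind: Pod
metadata:
  name: sandbox-example
spec:
  # Lets the sidecar see processes of the other containers in the Pod.
  shareProcessNamespace: true
  containers:
    - name: sandbox
      image: example.com/sandbox:latest
    - name: task-executor
      image: example.com/task-executor:latest
      securityContext:
        capabilities:
          # Assumption: the capability is added on the sidecar container.
          add: ["SYS_PTRACE"]
```

`shareProcessNamespace` is a Pod-level field, while `capabilities` is set per container under `securityContext`, so the two settings live at different levels of the spec.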
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,56 @@ | ||
| # Copyright 2025 Alibaba Group Holding Ltd. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| # Build stage | ||
| FROM golang:1.24-alpine AS builder | ||
|
|
||
| # Use Aliyun mirror for faster downloads in China | ||
| RUN sed -i 's/dl-cdn.alpinelinux.org/mirrors.aliyun.com/g' /etc/apk/repositories | ||
|
|
||
| WORKDIR /workspace | ||
|
|
||
| # Copy go mod files | ||
| COPY go.mod go.sum ./ | ||
| RUN GOPROXY=https://goproxy.cn,direct go mod download | ||
|
|
||
| # Copy source code | ||
| COPY cmd/image-committer/ cmd/image-committer/ | ||
|
|
||
| # Build binary | ||
| RUN CGO_ENABLED=0 GOOS=linux go build -o /usr/local/bin/image-committer ./cmd/image-committer/ | ||
|
|
||
| # Runtime stage | ||
| FROM alpine:3.19 | ||
|
|
||
| # Use Aliyun mirror for faster downloads in China | ||
| RUN sed -i 's/dl-cdn.alpinelinux.org/mirrors.aliyun.com/g' /etc/apk/repositories | ||
|
|
||
| # Install containerd CLI tools | ||
| RUN apk add --no-cache \ | ||
| containerd-ctr \ | ||
| cri-tools \ | ||
| curl \ | ||
| jq \ | ||
| nerdctl | ||
|
|
||
| # Create directories for socket mounts | ||
| RUN mkdir -p /var/run/containerd /run/k8s/containerd | ||
|
|
||
| # Copy the built binary from builder stage | ||
| COPY --from=builder /usr/local/bin/image-committer /usr/local/bin/image-committer | ||
| RUN chmod +x /usr/local/bin/image-committer | ||
|
|
||
| WORKDIR /workspace | ||
|
|
||
| ENTRYPOINT ["/usr/local/bin/image-committer"] |