WIP OCPNODE-4529: Migrate test case 44493 - configurable terminationGracePeriodSeconds for probes#31170
BhargaviGudi wants to merge 1 commit into
Walkthrough
A new Ginkgo e2e test was added to validate configurable TerminationGracePeriodSeconds for liveness and startup probes by parsing pod Events, measuring kill-to-restart timing, and checking probe-level vs pod-level behavior.
Changes
Probe Termination Grace Period E2E Test
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 11 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🧹 Nitpick comments (4)
test/extended/node/node_e2e/node.go (4)
282-432: ⚡ Quick win | Three near-identical pod specs: extract a builder helper.
The three pods (`liveness-probe`, `startup-probe`, `liveness-probe-no-term`) duplicate ~50 lines of spec each (same image, security context, command, ports). Only the probe kind, name, and probe-level `TerminationGracePeriodSeconds` differ. A small builder would shrink the test by ~100 lines and make the differences explicit.
♻️ Sketch
    buildProbePod := func(name string, probe *corev1.Probe, kind string) *corev1.Pod {
        container := corev1.Container{
            Name:    "test",
            Image:   "quay.io/openshifttest/nginx-alpine@sha256:04f316442d48ba60e3ea0b5a67eb89b0b667abf1c198a3d0056ca748736336a0",
            Command: []string{"bash", "-c", "sleep 100000000"},
            Ports:   []corev1.ContainerPort{{ContainerPort: 8080}},
            SecurityContext: &corev1.SecurityContext{
                AllowPrivilegeEscalation: ptr.To(false),
                Capabilities:             &corev1.Capabilities{Drop: []corev1.Capability{"ALL"}},
            },
        }
        switch kind {
        case "liveness":
            container.LivenessProbe = probe
        case "startup":
            container.StartupProbe = probe
        }
        return &corev1.Pod{
            ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
            Spec: corev1.PodSpec{
                TerminationGracePeriodSeconds: ptr.To[int64](60),
                SecurityContext: &corev1.PodSecurityContext{
                    RunAsNonRoot:   ptr.To(true),
                    SeccompProfile: &corev1.SeccompProfile{Type: corev1.SeccompProfileTypeRuntimeDefault},
                },
                Containers: []corev1.Container{container},
            },
        }
    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/node/node_e2e/node.go` around lines 282 - 432, The three pod specs (liveness-probe, startup-probe, liveness-probe-no-term) duplicate container/image/security/command/ports setup — extract a builder like buildProbePod(name string, probe *corev1.Probe, kind string) that constructs the common corev1.Container (Image, Command, Ports, SecurityContext) and attaches either LivenessProbe or StartupProbe based on kind, and returns the full *corev1.Pod with the shared PodSpec (pod-level TerminationGracePeriodSeconds and PodSecurityContext); replace the three inline pod literals with calls to buildProbePod("liveness-probe", livenessProbe, "liveness"), buildProbePod("startup-probe", startupProbe, "startup") and buildProbePod("liveness-probe-no-term", nil, "liveness") and keep verifyProbeTermination calls unchanged.
216-279: 🏗️ Heavy lift | Use the Events API directly instead of parsing humanized `oc describe` output.
The helper extracts timestamps by splitting describe output and indexing `fields[2]`. This is fragile:
- The "Age" column position depends on the describe template and event aggregation. While aggregated events (e.g., `60s (x3 over 90s)`) keep the last-seen value at `fields[2]`, any format change breaks parsing silently.
- The substring match `"Started container"` and the multi-substring match for probe failures could match unrelated events if more containers are added.
Use the Events API directly: query `oc.KubeClient().CoreV1().Events(namespace).List(...)`, filter by `involvedObject` and `reason` (`Started`, `Killing`), and consume `event.FirstTimestamp` / `event.LastTimestamp` directly without parsing humanized durations. This pattern is idiomatic across the test suite.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/node/node_e2e/node.go` around lines 216 - 279, The current verifyProbeTermination function is brittle because it parses humanized `oc describe` output and indexes fields[2]; replace that logic with a direct Events API query: inside verifyProbeTermination (and remove dependence on parseDurationToSeconds), call oc.KubeClient().CoreV1().Events(namespace).List(...) (or use the client wrapper available in the test helpers), filter events by event.InvolvedObject.Name == podName and by event.Reason == "Started" for container starts and event.Reason == "Killing" (or the probe-failure reason used by your cluster) for probe/termination events, then use the event.FirstTimestamp/LastTimestamp fields to compute seconds difference and compare to expectedTerminationSec with the same tolerance logic; log the selected event timestamps and keep the polling/wait.PollUntilContextTimeout wrapper and return true when the time diff is within range.
185-213: ⚡ Quick win | Replace custom duration parser with `time.ParseDuration`.
The Go standard library already parses the exact formats kubectl emits (`"45s"`, `"1m30s"`, `"1h2m3s"`). The current implementation silently returns `0, nil` for unrecognized inputs (e.g., `"5h"`, since it matches neither the `"m"` nor the `"s"` branch), which would cause downstream timing arithmetic to be wrong without surfacing an error. Switching to `time.ParseDuration` removes the custom code path and the silent-zero failure mode.
♻️ Proposed refactor
    -		// Helper function to parse duration string like "1m30s" or "45s" to seconds
    -		parseDurationToSeconds := func(durationStr string) (int, error) {
    -			var totalSeconds int
    -			if strings.Contains(durationStr, "m") {
    -				parts := strings.Split(durationStr, "m")
    -				minutes, err := strconv.Atoi(parts[0])
    -				if err != nil {
    -					return 0, err
    -				}
    -				totalSeconds = minutes * 60
    -				if len(parts) > 1 && strings.Contains(parts[1], "s") {
    -					secStr := strings.TrimSuffix(parts[1], "s")
    -					if secStr != "" {
    -						seconds, err := strconv.Atoi(secStr)
    -						if err != nil {
    -							return 0, err
    -						}
    -						totalSeconds += seconds
    -					}
    -				}
    -			} else if strings.Contains(durationStr, "s") {
    -				secStr := strings.TrimSuffix(durationStr, "s")
    -				seconds, err := strconv.Atoi(secStr)
    -				if err != nil {
    -					return 0, err
    -				}
    -				totalSeconds = seconds
    -			}
    -			return totalSeconds, nil
    -		}
    +		// Helper function to parse duration string like "1m30s" or "45s" to seconds
    +		parseDurationToSeconds := func(durationStr string) (int, error) {
    +			d, err := time.ParseDuration(durationStr)
    +			if err != nil {
    +				return 0, err
    +			}
    +			return int(d.Seconds()), nil
    +		}
The `strconv` import can then be removed.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/node/node_e2e/node.go` around lines 185 - 213, The custom parser function parseDurationToSeconds should be replaced to use time.ParseDuration: call time.ParseDuration(durationStr), return int(duration.Seconds()) on success and propagate the error on failure so unrecognized inputs (e.g., "5h") don't silently return 0; update the function signature/returns accordingly, remove the now-unused strconv import, and ensure callers still get an int number of seconds from the parsed duration.
288-291: 💤 Low value | Use a pointer helper instead of `&[]T{v}[0]`.
The `&[]int64{60}[0]` / `&[]bool{true}[0]` pattern (used 11 times in this test) allocates a single-element slice solely to take its address. Prefer `ptr.To` from `k8s.io/utils/ptr` for clarity and consistency with the rest of the k8s ecosystem.
♻️ Example refactor
     import (
     	...
    +	"k8s.io/utils/ptr"
     	...
     )

    -	TerminationGracePeriodSeconds: &[]int64{60}[0],
    +	TerminationGracePeriodSeconds: ptr.To[int64](60),
     	SecurityContext: &corev1.PodSecurityContext{
    -		RunAsNonRoot: &[]bool{true}[0],
    +		RunAsNonRoot: ptr.To(true),
     		...
     	},
     	...
    -	AllowPrivilegeEscalation: &[]bool{false}[0],
    +	AllowPrivilegeEscalation: ptr.To(false),
     	...
    -	TerminationGracePeriodSeconds: &[]int64{10}[0],
    +	TerminationGracePeriodSeconds: ptr.To[int64](10),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/node/node_e2e/node.go` around lines 288 - 291, Replace the hacky &[]T{v}[0] pointer constructions with the ptr.To helper from k8s.io/utils/ptr for clarity and consistency: e.g. change TerminationGracePeriodSeconds: &[]int64{60}[0] to use ptr.To(int64(60)) and RunAsNonRoot: &[]bool{true}[0] to ptr.To(true); update all other similar occurrences (about 11 spots) in this file (look for TerminationGracePeriodSeconds, RunAsNonRoot, SeccompProfile usages) and add the import for "k8s.io/utils/ptr" if not already present.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/extended/node/node_e2e/node.go`:
- Line 170: Update the test title string used in g.It to reference the correct
Kubernetes field name "terminationGracePeriodSeconds" (currently
"terminationGracePeriod") and ensure any Polarion title/identifier used in the
same test block is updated to match; locate the g.It invocation (the test named
"[OTP] add configurable terminationGracePeriod to liveness and startup probes
[OCP-44493]") and change the human-readable title and Polarion metadata to use
"terminationGracePeriodSeconds" so the test name matches the PR description and
API field.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 3b8a78f5-42e4-4bc0-921b-438671a54f52
📒 Files selected for processing (1)
test/extended/node/node_e2e/node.go
Risk analysis has seen new tests most likely introduced by this PR. New Test Risks for sha: b783cc7
New tests seen in this PR at sha: b783cc7
@BhargaviGudi: This pull request references OCPNODE-4529 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.
Force-pushed from b783cc7 to b453e4b.
Actionable comments posted: 1
🧹 Nitpick comments (1)
test/extended/node/node_e2e/node.go (1)
183-211: ⚡ Quick win | Prefer `time.ParseDuration` over custom parsing.
The custom parser silently returns `0, nil` for unrecognized formats (e.g., empty string, "123") and doesn't handle hours. `time.ParseDuration` handles all these cases correctly and is more maintainable.
♻️ Proposed fix
    -		// Helper function to parse duration string like "1m30s" or "45s" to seconds
    -		parseDurationToSeconds := func(durationStr string) (int, error) {
    -			var totalSeconds int
    -			if strings.Contains(durationStr, "m") {
    -				parts := strings.Split(durationStr, "m")
    -				minutes, err := strconv.Atoi(parts[0])
    -				if err != nil {
    -					return 0, err
    -				}
    -				totalSeconds = minutes * 60
    -				if len(parts) > 1 && strings.Contains(parts[1], "s") {
    -					secStr := strings.TrimSuffix(parts[1], "s")
    -					if secStr != "" {
    -						seconds, err := strconv.Atoi(secStr)
    -						if err != nil {
    -							return 0, err
    -						}
    -						totalSeconds += seconds
    -					}
    -				}
    -			} else if strings.Contains(durationStr, "s") {
    -				secStr := strings.TrimSuffix(durationStr, "s")
    -				seconds, err := strconv.Atoi(secStr)
    -				if err != nil {
    -					return 0, err
    -				}
    -				totalSeconds = seconds
    -			}
    -			return totalSeconds, nil
    -		}
    +		// Helper function to parse duration string like "1m30s" or "45s" to seconds
    +		parseDurationToSeconds := func(durationStr string) (int, error) {
    +			d, err := time.ParseDuration(durationStr)
    +			if err != nil {
    +				return 0, err
    +			}
    +			return int(d.Seconds()), nil
    +		}
This also allows removing the `strconv` import (line 7) if not used elsewhere.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/node/node_e2e/node.go` around lines 183 - 211, Replace the custom parseDurationToSeconds implementation with one that uses time.ParseDuration: call time.ParseDuration(durationStr), return an error if parsing fails, convert the resulting time.Duration to seconds (int(d.Seconds())), and ensure edge cases like empty strings or bare numbers bubble up as parse errors; update any callers expecting the same signature and remove the now-unused strconv import if it is no longer referenced.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/extended/node/node_e2e/node.go`:
- Line 323: Change the container Command invocations that use "bash -c" to use
"sh -c" because the nginx-alpine image does not contain bash; update the Command
entries (e.g. the slice literal Command: []string{"bash", "-c", "sleep
100000000"}) to Command: []string{"sh", "-c", "sleep 100000000"} in all three
places referenced (the Command fields at the occurrences around the given diff
and the similar blocks at the other two occurrences mentioned).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: f58cdb6c-4cd7-49ef-b354-0512c6543c25
📒 Files selected for processing (1)
test/extended/node/node_e2e/node.go
Force-pushed from b453e4b to af206c7.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by: BhargaviGudi
Force-pushed from 94cd990 to 084fa7b.
Force-pushed from 084fa7b to 2aca499.
…nd startup probes
Migrates test from openshift-tests-private to origin.
Test validates probe-level terminationGracePeriodSeconds for:
- Liveness probes with probe-level terminationGracePeriodSeconds (10s)
- Startup probes with probe-level terminationGracePeriodSeconds (10s)
- Liveness probes without probe-level (falls back to pod-level 60s)
The test creates pods with failing probes and verifies that the time difference between probe failure (Killing event) and container restart (Started event) matches the expected termination grace period within an acceptable range.
Event matching logic parses 'oc describe pod' output for:
- Killing events with container name
- Started events after restart
Updates:
- Add test to test/extended/node/node_e2e/node.go
- Document test in test/extended/node/README.md
Relates: https://issues.redhat.com/browse/OCPBUGS-44493
Signed-off-by: Bhargavi Gudi <[email protected]>
Force-pushed from 2aca499 to 5b56b62.
@BhargaviGudi: The following tests failed, say
Summary
Migrates test case OCP-44493 from openshift-tests-private to origin.
Validates that Kubernetes liveness and startup probes honor their probe-level `terminationGracePeriodSeconds` setting instead of defaulting to the pod-level value.
Polarion
https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-44493
Bug
OCPBUGS-44493
Test Coverage
Implementation
Testing
Summary by CodeRabbit