ci(manifest): updates the commit-latest script and action for the updated repo structure #3304
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: …
The full list of commands accepted by this bot can be found here.
Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing …
📝 Walkthrough
The workflow trigger paths were broadened from a single params file to variant-specific patterns for …
The Python script was refactored from single-file inspection to variant-scoped processing: it parses …
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable issues:
scripts/update-commit-latest-env.py
.github/workflows/update-commit-latest-env.yaml
Only actionable issues are listed above; address these to reduce risk of code execution, path traversal, secret leakage, and integrity failures.
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Force-pushed 3124f74 to 39b5a3c
Codecov Report
❌ Patch coverage is …
Additional details and impacted files
@@ Coverage Diff @@
## main #3304 +/- ##
========================================
- Coverage 3.34% 3.23% -0.11%
========================================
Files 31 31
Lines 3562 3680 +118
Branches 572 598 +26
========================================
Hits 119 119
- Misses 3441 3559 +118
Partials 2 2
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report in Codecov by Sentry.
Force-pushed 39b5a3c to a7acbb9
@coderabbitai review
✅ Actions performed: review triggered.
```python
# Using 'docker://' prefix is required for skopeo to identify the transport.
full_image_url = f"docker://{image_url}"
```

```python
# Use 'inspect --config' which is much faster as it only fetches the config blob.
```
why remove the comments, IMO quite nonobvious info
```python
try:
    async with semaphore:
        log.info(f"Semaphore acquired, starting skopeo inspect for: {image_url}")
        # Create an asynchronous subprocess
```

```python
"""
Main function to orchestrate the concurrent inspection of multiple images.
"""
semaphore = asyncio.Semaphore(22)  # Limit concurrent skopeo processes
```
this is somewhat valuable comment
Thank you for letting me know they were important; I'll bring them back.
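For reference, the pattern under discussion in these comments (a `docker://`-prefixed `skopeo inspect --config` call bounded by an `asyncio.Semaphore`) can be sketched as below. This is a minimal illustrative version, not the script's actual API: `build_inspect_command`, `inspect_image`, and `inspect_all` are hypothetical names, and it assumes a `skopeo` binary on PATH.

```python
import asyncio
import json

MAX_CONCURRENT_INSPECTIONS = 22  # limit concurrent skopeo processes


def build_inspect_command(image_url: str) -> list[str]:
    # The 'docker://' prefix is required for skopeo to identify the transport;
    # 'inspect --config' is faster because it only fetches the config blob.
    return ["skopeo", "inspect", "--config", f"docker://{image_url}"]


async def inspect_image(image_url: str, semaphore: asyncio.Semaphore) -> dict:
    async with semaphore:
        proc = await asyncio.create_subprocess_exec(
            *build_inspect_command(image_url),
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        stdout, stderr = await proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError(f"skopeo failed for {image_url}: {stderr.decode()}")
        return json.loads(stdout)


async def inspect_all(urls: list[str]) -> list[dict]:
    # The semaphore bounds how many skopeo subprocesses run at once.
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_INSPECTIONS)
    return await asyncio.gather(*(inspect_image(u, semaphore) for u in urls))
```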
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/update-commit-latest-env.yaml (1)
6-7: ⚠️ Potential issue | 🟡 Minor
Cron comment is incorrect: `0 11 * * *` is 11:00 UTC, not midnight. Either fix the comment or adjust the cron expression.

```diff
-    - cron: '0 11 * * *' # Daily at midnight UTC
+    - cron: '0 11 * * *' # Daily at 11:00 UTC
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/update-commit-latest-env.yaml around lines 6 - 7, The cron comment for the schedule block is wrong: the existing schedule: - cron: '0 11 * * *' runs at 11:00 UTC, not midnight; update either the cron expression to '0 0 * * *' if you want daily midnight UTC, or change the comment to "Daily at 11:00 UTC" to match the current cron, making the edit next to the schedule -> cron: '0 11 * * *' entry.
🧹 Nitpick comments (4)
scripts/update-commit-latest-env.py (3)
174-174: Magic number 22 for semaphore concurrency limit.
Extract to a named constant or CLI argument for clarity and easier tuning.

```diff
+MAX_CONCURRENT_INSPECTIONS = 22
+
 async def inspect_urls(urls: typing.Iterable[str]) -> dict[str, str | None]:
     unique = list(dict.fromkeys(urls))
-    semaphore = asyncio.Semaphore(22)
+    semaphore = asyncio.Semaphore(MAX_CONCURRENT_INSPECTIONS)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/update-commit-latest-env.py` at line 174, Replace the hard-coded semaphore = asyncio.Semaphore(22) with a named constant or configurable parameter: introduce a constant like MAX_CONCURRENT_TASKS (or add a CLI/ENV argument parsed in main) and use asyncio.Semaphore(MAX_CONCURRENT_TASKS) so the concurrency limit is clearly named and can be tuned; update any help text or defaults where the script parses arguments and use that symbol in place of the magic number in scripts/update-commit-latest-env.py.
57-65: Missing file existence check before open.
`load_params_env` will raise `FileNotFoundError` with a raw traceback. Consider checking existence and providing a clearer error message, especially since the workflow depends on both files being present.
Proposed fix

```diff
 def load_params_env(path: pathlib.Path) -> list[tuple[str, str]]:
+    if not path.is_file():
+        log.error("params-latest.env not found", path=str(path))
+        sys.exit(1)
     with open(path, "rt") as file:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/update-commit-latest-env.py` around lines 57 - 65, The function load_params_env currently opens the given pathlib.Path directly and will raise a raw FileNotFoundError; add an explicit existence check at the top of load_params_env (use path.exists() and/or path.is_file()) and raise a FileNotFoundError (or ValueError) with a clear, descriptive message that includes the path value if the file is missing or not a regular file, so callers get a readable error instead of an opaque traceback.
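A self-contained sketch of the guarded loader this comment asks for. It is illustrative only: it uses stdlib `logging` in place of the script's structured logger, and assumes a simple `KEY=VALUE` format with `#` comments.

```python
import logging
import pathlib
import sys

log = logging.getLogger(__name__)


def load_params_env(path: pathlib.Path) -> list[tuple[str, str]]:
    # Fail with a clear message instead of a raw FileNotFoundError traceback.
    if not path.is_file():
        log.error("params-latest.env not found: %s", path)
        sys.exit(1)
    pairs: list[tuple[str, str]] = []
    with open(path, "rt") as file:
        for line in file:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            pairs.append((key.strip(), value.strip()))
    return pairs
```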
78-82: Inconsistent exit patterns: mix of `sys.exit(1)` and `raise SystemExit(1)`.
Lines 78, 82, 99, 180 use `sys.exit(1)` while line 200 uses `raise SystemExit(1)`. Pick one for consistency; `sys.exit(1)` is more common in scripts.

```diff
-    raise SystemExit(1)
+    sys.exit(1)
```

Also applies to: 99-100, 199-200
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/update-commit-latest-env.py` around lines 78 - 82, The script mixes exit styles (sys.exit(1) and raise SystemExit(1)); standardize to sys.exit(1) throughout: replace the raise SystemExit(1) occurrence with sys.exit(1) so all exits use sys.exit, ensuring the sys module is already imported and leaving existing sys.exit(1) calls (e.g., in the blocks referencing odh_by_var[var] and dummy checks) unchanged; update every instance to use sys.exit(1) for consistency.

.github/workflows/update-commit-latest-env.yaml (1)
46-54: `git diff` may fail if files don't exist yet.
If `commit-latest.env` files are newly created (first run), `git diff` on untracked files returns 0 (no error, no diff shown), but `git add` will stage them. The logic works, but consider using `git diff --exit-code` explicitly or checking for untracked files to capture new file creation.
More robust change detection

```diff
-          if ! git diff --quiet \
-            manifests/odh/base/commit-latest.env \
-            manifests/rhoai/base/commit-latest.env
-          then
-            git add manifests/odh/base/commit-latest.env manifests/rhoai/base/commit-latest.env
+          git add manifests/odh/base/commit-latest.env manifests/rhoai/base/commit-latest.env
+          if ! git diff --cached --quiet; then
             git commit -m "ci: update commit SHAs for image digests changes"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/update-commit-latest-env.yaml around lines 46 - 54, The current check using "git diff --quiet" can miss newly added untracked files; change the flow to stage the two commit-latest.env files first (git add manifests/odh/base/commit-latest.env manifests/rhoai/base/commit-latest.env) and then test staged changes with "git diff --cached --quiet --exit-code" (or "! git diff --cached --quiet --exit-code" in the if condition) before running git commit and git push; update the conditional around the staging/commit logic so new files are detected reliably by referencing the existing manifest file names and the git add/git diff --cached commands.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/update-commit-latest-env.py`:
- Around line 96-100: The current re.sub(r"-n$", "-commit-n", var) silently
leaves var unchanged when it doesn't end with "-n"; update the logic around the
lines.append call to validate var first (e.g., check var.endswith("-n") or use
re.search(r"-n$") on var) and if it doesn't match, call log.error (include var
and url) and exit or skip with a warning, then only append (modified_var,
vcs[:7]) when the validation passes so you avoid producing malformed output;
refer to the variables/functions vcs_by_url, vcs, var, lines.append and the
re.sub usage to locate where to add the check and error handling.
- Around line 53-54: resolve_manifest_dir currently resolves a user-supplied
path against PROJECT_ROOT but does not validate that the resolved path remains
inside PROJECT_ROOT, allowing path traversal; update resolve_manifest_dir to
resolve the candidate (use .resolve(strict=False) if needed) and then verify the
resulting path is inside PROJECT_ROOT (e.g., using
Path.is_relative_to(PROJECT_ROOT) or comparing parents via a loop) and raise an
exception (ValueError) if it is not; ensure the function still returns the
resolved path when validation passes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: b9eb554d-0fbe-4fbd-ba63-89960ae7d50a
📒 Files selected for processing (4)
- .github/workflows/update-commit-latest-env.yaml
- manifests/odh/base/commit-latest.env
- manifests/rhoai/base/commit-latest.env
- scripts/update-commit-latest-env.py
scripts/update-commit-latest-env.py (outdated)
```python
def resolve_rhoai_urls(
    rhoai_pairs: list[tuple[str, str]],
    odh_by_var: dict[str, str],
) -> list[tuple[str, str]]:
    """Replace RHOAI dummy placeholders with the same variable's image URL from ODH params."""
```
wait, what? @coderabbitai how can this possibly produce the right answer?
🧩 Analysis chain
Let me look at the actual env files and the failing tests to understand this fully.
🏁 Script executed:
```bash
#!/bin/bash
# Check variable naming in params-latest.env
echo "=== ODH params-latest.env ==="
head -30 manifests/odh/base/params-latest.env 2>/dev/null || echo "NOT FOUND"
echo ""
echo "=== RHOAI params-latest.env ==="
head -30 manifests/rhoai/base/params-latest.env 2>/dev/null || echo "NOT FOUND"
echo ""
# Look at failing tests
echo "=== test_image_manifests_version_alignment ==="
rg -n "test_image_manifests_version_alignment" --type py -A 40
echo ""
echo "=== test_image_pyprojects ==="
rg -n "test_image_pyprojects" --type py -A 20
echo ""
# Check how params-latest and commit-latest are related in tests
echo "=== conftest / fixtures ==="
rg -n "manifest_directory" --type py -A 5 -B 5 | head -100
```
Repository: opendatahub-io/notebooks
Length of output: 10530
🏁 Script executed:
```bash
#!/bin/bash
echo "=== conftest.py manifests_directory fixture ==="
rg -n "manifests_directory" --type py -A 10 -B 2
echo ""
echo "=== load_manifests_file_for ==="
rg -n "load_manifests_file_for" --type py -A 30 -B 2
echo ""
echo "=== Manifest class ==="
rg -n "class Manifest" --type py -A 20
echo ""
echo "=== old manifests/base directory ==="
ls manifests/base/ 2>/dev/null || echo "NOT FOUND (old path)"
ls manifests/odh/base/ 2>/dev/null
ls manifests/rhoai/base/ 2>/dev/null
```
Repository: opendatahub-io/notebooks
Length of output: 22984
🏁 Script executed:
```bash
#!/bin/bash
echo "=== MANIFESTS_ODH_DIR and MANIFESTS_RHOAI_DIR definitions ==="
rg -n "MANIFESTS_ODH_DIR\|MANIFESTS_RHOAI_DIR" --type py -A 2 -B 2
echo ""
echo "=== workflow file ==="
cat .github/workflows/update-commit-latest-env.yaml 2>/dev/null || echo "NOT FOUND"
echo ""
echo "=== Check if manifests/odh/base/base exists ==="
ls manifests/odh/base/base/ 2>/dev/null || echo "manifests/odh/base/base/ does NOT exist"
ls manifests/rhoai/base/base/ 2>/dev/null || echo "manifests/rhoai/base/base/ does NOT exist"
```
Length of output: 1877
🏁 Script executed:
```bash
#!/bin/bash
echo "=== Search for MANIFESTS constants in tests/manifests.py ==="
cat tests/manifests.py | head -60
echo ""
echo "=== Explicit grep for MANIFESTS dir constants ==="
grep -n "MANIFESTS" tests/manifests.py | head -30
echo ""
echo "=== Check rstudio-notebook-imagestream.yaml in rhoai ==="
ls manifests/rhoai/base/rstudio-notebook-imagestream.yaml 2>/dev/null || echo "MISSING in rhoai"
ls manifests/odh/base/rstudio-notebook-imagestream.yaml 2>/dev/null || echo "MISSING in odh"
```
Length of output: 4324
@jiridanek — here's a full trace of what resolve_rhoai_urls does and why it produces the correct result:
The design: manifests/rhoai/base/params-latest.env uses dummy for every single one of its 18 entries (confirmed from the actual file). The variable names are identical to ODH's. resolve_rhoai_urls substitutes each dummy with the real image URL from ODH params-latest.env by matching on variable name:
RHOAI: odh-workbench-jupyter-minimal-cpu-py312-ubi9-n = dummy
ODH: odh-workbench-jupyter-minimal-cpu-py312-ubi9-n = quay.io/opendatahub/odh-workbench-jupyter-minimal-cpu-py312-ubi9:3.4-v1.43
After substitution, RHOAI's resolved pairs are the ODH image URLs → skopeo inspect is called on those to get vcs-ref → RHOAI's commit-latest.env ends up with the same commit hashes as ODH's. This matches the PR's actual diff: both odh/base/commit-latest.env and rhoai/base/commit-latest.env were updated to the exact same hashes (7f2342d, 335acea, c2e4c8a, etc.).
So it does produce the right answer — the premise is that RHOAI and ODH are built from the same git source, so ODH image vcs-ref values are authoritative for both. The dummy convention is an explicit declaration of that dependency.
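The substitution path traced above can be sketched as a minimal, self-contained version. This is illustrative only: it mirrors the behavior described in this thread (which the later discussion shows rests on a false premise), and `DUMMY_IMAGE` plus the pair shapes are assumptions based on the review quotes.

```python
DUMMY_IMAGE = "dummy"


def resolve_rhoai_urls(
    rhoai_pairs: list[tuple[str, str]],
    odh_by_var: dict[str, str],
) -> list[tuple[str, str]]:
    # For every RHOAI variable whose value is the 'dummy' placeholder,
    # substitute the ODH image URL registered under the same variable name.
    out: list[tuple[str, str]] = []
    for var, raw in rhoai_pairs:
        if raw.strip().lower() == DUMMY_IMAGE:
            out.append((var, odh_by_var[var]))
        else:
            out.append((var, raw))
    return out
```

Because every RHOAI entry is `dummy`, the resolved pairs are exactly the ODH URLs, which is why both `commit-latest.env` files end up identical.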
However, the 2 failing test_image_manifests_version_alignment tests appear to be a separate, unrelated issue that this PR has exposed:
- manifests/odh/base/rstudio-notebook-imagestream.yaml ✅ exists
- manifests/rhoai/base/rstudio-notebook-imagestream.yaml ❌ missing
The test is parametrized over both MANIFESTS_ODH_DIR (manifests/odh) and MANIFESTS_RHOAI_DIR (manifests/rhoai), and get_source_of_truth_filepath will try to open manifests/rhoai/base/rstudio-notebook-imagestream.yaml which doesn't exist, raising FileNotFoundError. The RHOAI base/ directory appears to be missing several imagestream files compared to ODH.
Could you clarify — is rstudio-notebook-imagestream.yaml intentionally absent from RHOAI (in which case the test needs a guard), or does it need to be added/symlinked?
the premise is that RHOAI and ODH are built from the same git source, so ODH image vcs-ref values are authoritative for both
@coderabbitai exactly, this is a false premise, RHOAI/rhds images are built from rhoai-3.x branch in red-hat-data-services/notebooks repo, the commit hash will be naturally different
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
echo "=== RHOAI params-latest.env (full) ==="
cat manifests/rhoai/base/params-latest.env 2>/dev/null || echo "NOT FOUND"
echo ""
echo "=== ODH params-latest.env (first few lines) ==="
head -10 manifests/odh/base/params-latest.env 2>/dev/null || echo "NOT FOUND"
echo ""
echo "=== RHOAI commit-latest.env (from this PR) ==="
cat manifests/rhoai/base/commit-latest.env 2>/dev/null || echo "NOT FOUND"
echo ""
echo "=== ODH commit-latest.env (from this PR) ==="
cat manifests/odh/base/commit-latest.env 2>/dev/null || echo "NOT FOUND"
Repository: opendatahub-io/notebooks
Length of output: 4961
@jiridanek — confirmed. The data makes the problem unambiguous: manifests/rhoai/base/commit-latest.env and manifests/odh/base/commit-latest.env are byte-for-byte identical for all 18 shared entries in this PR's diff. That is the smoking gun.
The bug chain in resolve_rhoai_urls:
- RHOAI `params-latest.env`: every entry is `dummy`
- `resolve_rhoai_urls` substitutes each `dummy` with the ODH image URL (e.g. `quay.io/opendatahub/odh-workbench-jupyter-minimal-cpu-py312-ubi9:3.4-v1.43`)
- `skopeo inspect` runs against those ODH images → returns a `vcs-ref` pointing to a commit in opendatahub-io/notebooks
- That ODH commit hash is written into `manifests/rhoai/base/commit-latest.env`
So commit-latest.env for RHOAI is tracking ODH source commits, not RHOAI/RHDS source commits from red-hat-data-services/notebooks @ rhoai-3.x.
The root cause is that manifests/rhoai/base/params-latest.env must be populated with actual RHOAI image URLs (the images built from red-hat-data-services/notebooks) rather than dummy. Only then will skopeo inspect return the correct vcs-ref values from the RHOAI build branch. The dummy substitution mechanism, as designed here, can never yield correct results for RHOAI because it always redirects to ODH images.
@coderabbitai manifests/rhoai/base/params-latest.env must remain dummy, it is populated by rhods-operator during openshift deployment, as described at https://github.com/opendatahub-io/notebooks/blob/main/manifests/README.md. dummy values serve to catch when for some reason the substitutions don't happen correctly; it's better to fail the deployment than to deploy the wrong ("fallback") images
@jiridanek — understood and noted. That is an important design constraint.
This actually strengthens the bug finding: the resolve_rhoai_urls function, rather than being a valid workaround, is definitively broken for RHOAI. Because:
- RHOAI `params-latest.env` always has `dummy` values in the repo (by design, as a deployment-safety guard)
- `resolve_rhoai_urls` replaces those `dummy` values with ODH image URLs from `manifests/odh/base/params-latest.env`
- `skopeo inspect` then runs against those ODH images → returns `vcs-ref` from opendatahub-io/notebooks
- Those ODH commit hashes land in `manifests/rhoai/base/commit-latest.env`, which should track commits from red-hat-data-services/notebooks @ rhoai-3.x
The result is that RHOAI's commit-latest.env silently contains the wrong commit hashes. The CI script, as written in the diff, can never produce correct values for RHOAI — it would need the actual RHOAI image URLs (the real digests served by the registry) to be provided at execution time through a different mechanism (e.g., passed in via CI variable or fetched from the registry catalog), rather than falling back to ODH's params-latest.env.
Force-pushed a7acbb9 to f01e078
♻️ Duplicate comments (2)
scripts/update-commit-latest-env.py (2)
68-86: ⚠️ Potential issue | 🔴 Critical
`resolve_rhoai_urls` is writing RHOAI output from ODH images.
With the current manifests, every `rhoai` entry is `dummy`, so this function always swaps in the matching ODH URL and inspects that image. The generated `manifests/rhoai/base/commit-latest.env` therefore records ODH `vcs-ref` values instead of the RHDS/RHOAI commits. Fail fast on `dummy` here until a real RHOAI image source exists; reusing ODH URLs cannot produce correct RHOAI SHAs.
Minimal safe guard

```diff
 def resolve_rhoai_urls(
     rhoai_pairs: list[tuple[str, str]],
     odh_by_var: dict[str, str],
 ) -> list[tuple[str, str]]:
     """Replace RHOAI dummy placeholders with the same variable's image URL from ODH params."""
     out: list[tuple[str, str]] = []
     for var, raw in rhoai_pairs:
         if raw.strip().lower() == DUMMY_IMAGE:
-            if var not in odh_by_var:
-                log.error("No ODH params-latest entry to resolve RHOAI dummy image", variable=var)
-                sys.exit(1)
-            url = odh_by_var[var]
-            if url.strip().lower() == DUMMY_IMAGE:
-                log.error("ODH params-latest still has dummy for variable", variable=var)
-                sys.exit(1)
-            out.append((var, url))
+            log.error(
+                "RHOAI params-latest.env entry cannot be 'dummy'; provide the real RHOAI image URL",
+                variable=var,
+            )
+            sys.exit(1)
         else:
             out.append((var, raw))
     return out
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/update-commit-latest-env.py` around lines 68 - 86, The function resolve_rhoai_urls currently replaces RHOAI DUMMY_IMAGE entries with ODH URLs (using rhoai_pairs and odh_by_var), which causes RHOAI commits to be recorded from ODH; change the behavior to fail-fast when a rhoai entry is DUMMY_IMAGE: for each (var, raw) in resolve_rhoai_urls, if raw.strip().lower() == DUMMY_IMAGE log an error including the variable name and exit (do not substitute odh_by_var[var] or append it to out); only allow non-dummy rhoai values to be appended to out so RHOAI SHAs come from real RHOAI image sources.
53-54: ⚠️ Potential issue | 🔴 Critical
CWE-22: `--odh-dir`/`--rhoai-dir` can escape the checkout and overwrite files outside the repo.
Severity: critical. A caller can pass an absolute path or `../../...`, and line 54 will then read `params-latest.env` and write `commit-latest.env` outside `PROJECT_ROOT`. Canonicalize the candidate first and reject anything that does not stay under the repository root.
Proposed fix

```diff
 def resolve_manifest_dir(raw: pathlib.Path) -> pathlib.Path:
-    return raw if raw.is_absolute() else (PROJECT_ROOT / raw).resolve()
+    project_root = PROJECT_ROOT.resolve()
+    candidate = raw if raw.is_absolute() else project_root / raw
+    resolved = candidate.resolve(strict=False)
+    try:
+        resolved.relative_to(project_root)
+    except ValueError:
+        log.error("Manifest directory must stay within the repository", path=str(resolved))
+        sys.exit(1)
+    return resolved
```

As per coding guidelines, "Validate file paths (prevent path traversal)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/update-commit-latest-env.py` around lines 53 - 54, The resolve_manifest_dir function currently allows absolute or traversal paths to escape PROJECT_ROOT; change it to canonicalize the candidate path after joining (use .resolve()) and then verify it is inside PROJECT_ROOT (e.g., using Path.is_relative_to or try: candidate.relative_to(PROJECT_ROOT) except ValueError) and raise an exception or reject the input if the resolved path is not under PROJECT_ROOT; update any callers that use resolve_manifest_dir (e.g., where params-latest.env / commit-latest.env are read/written) to handle the rejection instead of proceeding.
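A minimal sketch of the containment check proposed here, using `Path.is_relative_to` (Python 3.9+). The `root` parameter and path values are illustrative assumptions; the script's actual function reads `PROJECT_ROOT` from a module-level constant instead.

```python
import pathlib


def resolve_manifest_dir(raw: pathlib.Path, root: pathlib.Path) -> pathlib.Path:
    # Canonicalize first, then verify the result stays under the repo root.
    root = root.resolve()
    candidate = raw if raw.is_absolute() else root / raw
    resolved = candidate.resolve()
    if not resolved.is_relative_to(root):
        raise ValueError(f"Manifest directory escapes the repository: {resolved}")
    return resolved
```

The key ordering is resolve-then-check: comparing the un-resolved candidate would let `..` segments slip past the prefix test.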
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 9e646073-9504-42ec-85a4-e81980939766
📒 Files selected for processing (4)
- .github/workflows/update-commit-latest-env.yaml
- manifests/odh/base/commit-latest.env
- manifests/rhoai/base/commit-latest.env
- scripts/update-commit-latest-env.py
✅ Files skipped from review due to trivial changes (1)
- manifests/rhoai/base/commit-latest.env
🚧 Files skipped from review as they are similar to previous changes (2)
- .github/workflows/update-commit-latest-env.yaml
- manifests/odh/base/commit-latest.env
Actionable comments posted: 1
♻️ Duplicate comments (1)
scripts/update-commit-latest-env.py (1)
168-180: ⚠️ Potential issue | 🟡 Minor
Inconsistent error handling: `_load_env_dict` silently discards malformed lines.
`_parse_env_lines` (line 155) correctly raises `ValueError` on malformed rows, but `_load_env_dict` (lines 176-177) silently skips them. A typo in `commit.env` becomes silent data loss when the file is re-written.
Proposed fix

```diff
 def _load_env_dict(path: pathlib.Path) -> dict[str, str]:
     d: dict[str, str] = {}
     if not path.is_file():
         return d
-    for line in path.read_text().splitlines():
-        line = line.strip()
+    for lineno, raw_line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
+        line = raw_line.strip()
         if not line or line.startswith("#"):
             continue
         if "=" not in line:
-            continue
+            raise ValueError(f"{path}:{lineno}: expected KEY=VALUE")
         k, _, v = line.partition("=")
         d[k.strip()] = v.strip()
     return d
```
Verify each finding against the current code and only fix it if needed. In `@scripts/update-commit-latest-env.py` around lines 168 - 180, The helper _load_env_dict currently swallows malformed env lines (it skips lines without "=") causing silent data loss; change _load_env_dict to mirror _parse_env_lines by validating each non-empty, non-comment line and raising a ValueError for malformed rows (include the offending line and path or line number in the message) instead of silently skipping, so callers rewriting commit.env will fail fast on typos; locate and update the logic inside the _load_env_dict function to perform the same validation behavior as _parse_env_lines.
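A runnable sketch of the strict loader this comment describes. The function name is hypothetical; it mirrors the proposed diff rather than the script's actual code, and assumes the same `KEY=VALUE` format with `#` comments.

```python
import pathlib


def load_env_dict_strict(path: pathlib.Path) -> dict[str, str]:
    # Raise on malformed rows instead of silently dropping them,
    # so a typo in commit.env cannot turn into silent data loss
    # when the file is re-written.
    d: dict[str, str] = {}
    if not path.is_file():
        return d
    for lineno, raw_line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if "=" not in line:
            raise ValueError(f"{path}:{lineno}: expected KEY=VALUE, got {raw_line!r}")
        k, _, v = line.partition("=")
        d[k.strip()] = v.strip()
    return d
```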
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/update-commit-latest-env.py`:
- Around line 257-259: The assert that checks vcs is not None in the loop over
pairs/results should be replaced with an explicit conditional that raises or
logs an error and skips or aborts consistently (like the check in
sync_commit_latest); locate the loop using variables pairs, results, commit_key,
vcs and replace the assert vcs is not None with an if not vcs: ... block that
either raises a RuntimeError (including commit_key and context) or handles the
missing value and then sets existing[commit_key] = vcs[:7] only when vcs is
present.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 2d5a62fe-e8fc-40a4-aab8-2ea64d43fd05
📒 Files selected for processing (4)
- .github/workflows/update-commit-latest-env.yaml
- manifests/odh/base/commit-latest.env
- manifests/rhoai/base/commit-latest.env
- scripts/update-commit-latest-env.py
✅ Files skipped from review due to trivial changes (2)
- manifests/rhoai/base/commit-latest.env
- manifests/odh/base/commit-latest.env
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/update-commit-latest-env.yaml
Force-pushed 8e18e70 to cc1c7bf
Force-pushed cc1c7bf to b837269
Force-pushed b837269 to b31a687
Force-pushed b31a687 to e8af878
Actionable comments posted: 1
♻️ Duplicate comments (2)
scripts/update-commit-latest-env.py (2)
174-185: ⚠️ Potential issue | 🟠 Major
Fail fast on malformed `commit.env` rows.
`_load_env_dict()` still swallows non-comment lines without `=`, then `_write_env()` rewrites the file without them. A typo in `commit.env` becomes silent data loss on the next run. Mirror `_parse_env_lines()` here.
Proposed fix

```diff
 def _load_env_dict(path: pathlib.Path) -> dict[str, str]:
     d: dict[str, str] = {}
     if not path.is_file():
         return d
-    for line in path.read_text().splitlines():
-        line = line.strip()
+    for lineno, raw_line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
+        line = raw_line.strip()
         if not line or line.startswith("#"):
             continue
         if "=" not in line:
-            continue
+            raise ValueError(f"_load_env_dict: {path}:{lineno}: expected KEY=VALUE, got {raw_line!r}")
         k, _, v = line.partition("=")
         d[k.strip()] = v.strip()
     return d
```

As per coding guidelines, "REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/update-commit-latest-env.py` around lines 174 - 185, The _load_env_dict function currently ignores non-comment lines missing an "=" which silently drops malformed rows; update _load_env_dict to fail fast by raising an explicit ValueError (including the offending line text and the provided path) when a non-empty, non-comment line does not contain "=", mirroring the behavior of _parse_env_lines; keep existing trimming and comment/blank skipping logic and ensure callers like _write_env will no longer rewrite away malformed entries silently.
Lines 189-212: ⚠️ Potential issue | 🟠 Major: `--all` cannot succeed for `rhoai` with placeholder URLs.

This function assumes every `params-latest.env` value is directly inspectable. With the workflow now always running `--all`, the `rhoai` branch will hit checked-in `dummy` placeholders and call `skopeo inspect docker://dummy`, then return `False` before `commit.env` is refreshed. `rhoai` needs a CI-time source of real image refs, or an explicit skip/fail-fast path before inspection.

As per coding guidelines, "REVIEW PRIORITIES: 2. Architectural issues and anti-patterns".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/update-commit-latest-env.py` around lines 189 - 212, In sync_commit_latest, detect and handle placeholder image URLs before calling _inspect_pairs: after building pairs from params_latest (using _PARAM_KEY_RE), filter out entries whose url is a placeholder (e.g., "dummy" or known placeholder pattern used for rhoai), log a message indicating those keys were skipped, and if that filtering leaves no remaining inspectable pairs return True (or otherwise short-circuit appropriately) so rhoai's checked-in placeholders don't trigger skopeo inspection and fail the run; otherwise call _inspect_pairs only with the filtered list and proceed as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/update-commit-latest-env.py`:
- Around line 322-324: Remove the unreachable
sys.exit(asyncio.run(main_async(variants))) call at module bottom: main()
already calls sys.exit() and variants is local to main, so delete the duplicate
asyncio invocation. Keep only the existing if __name__ == "__main__": main()
entry; do not reintroduce module-level references to variants or call main_async
from top-level. Ensure no other code depends on that removed call.
---
Duplicate comments:
In `@scripts/update-commit-latest-env.py`:
- Around line 174-185: The _load_env_dict function currently ignores non-comment
lines missing an "=" which silently drops malformed rows; update _load_env_dict
to fail fast by raising an explicit ValueError (including the offending line
text and the provided path) when a non-empty, non-comment line does not contain
"=", mirroring the behavior of _parse_env_lines; keep existing trimming and
comment/blank skipping logic and ensure callers like _write_env will no longer
rewrite away malformed entries silently.
- Around line 189-212: In sync_commit_latest, detect and handle placeholder
image URLs before calling _inspect_pairs: after building pairs from
params_latest (using _PARAM_KEY_RE), filter out entries whose url is a
placeholder (e.g., "dummy" or known placeholder pattern used for rhoai), log a
message indicating those keys were skipped, and if that filtering leaves no
remaining inspectable pairs return True (or otherwise short-circuit
appropriately) so rhoai's checked-in placeholders don't trigger skopeo
inspection and fail the run; otherwise call _inspect_pairs only with the
filtered list and proceed as before.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 6561ae5c-b5c2-4d4f-b670-df70377cf494
📒 Files selected for processing (2)
- .github/workflows/update-commit-latest-env.yaml
- scripts/update-commit-latest-env.py
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/update-commit-latest-env.yaml
scripts/update-commit-latest-env.py (Outdated)

```python
if __name__ == "__main__":
    main()
    sys.exit(asyncio.run(main_async(variants)))
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
python - <<'PY'
import ast
import pathlib
path = next(pathlib.Path(".").rglob("update-commit-latest-env.py"))
tree = ast.parse(path.read_text(encoding="utf-8"))
class Visitor(ast.NodeVisitor):
def __init__(self):
self.scope = []
def visit_FunctionDef(self, node):
self.scope.append(f"function {node.name}")
self.generic_visit(node)
self.scope.pop()
visit_AsyncFunctionDef = visit_FunctionDef
def visit_Name(self, node):
if node.id == "variants":
scope = self.scope[-1] if self.scope else "module scope"
print(f"Line {node.lineno}: {type(node.ctx).__name__} in {scope}")
Visitor().visit(tree)
PY

Repository: opendatahub-io/notebooks
Length of output: 267
🏁 Script executed:
#!/bin/bash
# Read the relevant section of the file to understand the context
sed -n '310,330p' scripts/update-commit-latest-env.py

Repository: opendatahub-io/notebooks
Length of output: 415
Remove the unreachable sys.exit(asyncio.run(main_async(variants))) on line 324.
The main() function exits via sys.exit() on line 319 and never returns. The duplicate call on line 324 is unreachable dead code, and if sys.exit() is intercepted in tests, it raises NameError because variants is only defined locally within main() and does not exist in module scope.
Proposed fix

```diff
 if __name__ == "__main__":
     main()
-    sys.exit(asyncio.run(main_async(variants)))
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/update-commit-latest-env.py` around lines 322 - 324, Remove the
unreachable sys.exit(asyncio.run(main_async(variants))) call at module bottom:
main() already calls sys.exit() and variants is local to main, so delete the
duplicate asyncio invocation. Keep only the existing if __name__ == "__main__":
main() entry; do not reintroduce module-level references to variants or call
main_async from top-level. Ensure no other code depends on that removed call.
```python
    return True


async def sync_commit_released(variant: str) -> bool:
```
@coderabbitai is this just a duplicated `async def sync_commit_latest(variant: str) -> bool`?
@jiridanek — Yes, the docstring body of _inspect_pairs is simply duplicated — the same two sentences are pasted twice verbatim:
```python
async def _inspect_pairs(pairs: list[tuple[str, str]]) -> list[tuple[str, str | None]]:
    """
    Orchestrate concurrent inspection of multiple images (param key → digest URL pairs).
    Limits parallelism with a semaphore (same default concurrency as the historical main-branch script).
    Orchestrate concurrent inspection of multiple images (param key → digest URL pairs).  # ← duplicate
    Limits parallelism with a semaphore (same default concurrency as the historical main-branch script).  # ← duplicate
    """
```

This is a copy-paste artifact; the function itself is not a duplicate of `sync_commit_latest` — they do different things (`_inspect_pairs` runs skopeo concurrently, `sync_commit_latest` orchestrates reading/writing env files).
While reviewing the file I also noticed a separate bug in the __main__ guard at the bottom of the file:
```python
if __name__ == "__main__":
    main()  # calls sys.exit() internally → process ends here
    sys.exit(asyncio.run(main_async(variants)))  # dead code; `variants` is also not defined in this scope → NameError
```

The second `sys.exit(...)` line is unreachable dead code and would raise a `NameError` on `variants` if it were ever reached. It should simply be removed — `main()` already handles everything including `sys.exit`.
```python
if any(commit_hash is None for variable, commit_hash in results):
    log.error("Failed to get commit hash for some images. Quitting, please try again to try again, like.")
    sys.exit(1)
async def sync_commit_latest(variant: str) -> bool:
```
@coderabbitai are sync_commit_latest and sync_commit_released duplicates of one another?
@jiridanek — they are not duplicates, but they are structurally parallel by design. Here's how they differ:
| | `sync_commit_latest` | `sync_commit_released` |
|---|---|---|
| Input file | `params-latest.env` | `params.env` |
| Output file | `commit-latest.env` | `commit.env` |
| Key filter | `-n` suffix only | Non-`-n` suffixes (e.g. `-1-3`) |
| Missing input | Hard error → `False` | Soft skip → `True` |
| Write strategy | Fresh overwrite | Merges into existing entries via `_load_env_dict` |
| No applicable keys | N/A | Skips silently → `True` |
sync_commit_latest handles the nightly/latest image tier (-n tags), while sync_commit_released handles versioned/released images (e.g. -1-3 digests). The merge behavior in sync_commit_released is deliberate — it preserves existing commit.env keys for pipeline images that only appear in params-latest.env.
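The overwrite-vs-merge distinction in the table can be sketched with plain dicts (all keys and hashes below are invented examples, not the repo's real entries):

```python
# Sketch of the two write strategies, assuming commit files behave like
# KEY=VALUE dicts. All keys and hashes below are invented examples.

def write_fresh(new: dict[str, str]) -> dict[str, str]:
    """sync_commit_latest style: output holds only freshly inspected rows."""
    return dict(new)

def write_merged(existing: dict[str, str], new: dict[str, str]) -> dict[str, str]:
    """sync_commit_released style: new rows update, untouched rows survive."""
    merged = dict(existing)
    merged.update(new)
    return merged

existing = {"example-pipeline-commit": "aaaaaaa", "example-workbench-commit-1-3": "bbbbbbb"}
new = {"example-workbench-commit-1-3": "ccccccc"}

print(write_fresh(new))             # only the freshly inspected key
print(write_merged(existing, new))  # pipeline key preserved, workbench key updated
```

The merge variant is what keeps existing `commit.env` keys alive when an input file only covers a subset of images.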
Two unrelated issues worth flagging while we're here:
- Duplicated docstring in `_inspect_pairs` — the last two sentences in its docstring repeat the first two verbatim (visible in the diff).
- Unreachable + broken code at the bottom of the file:

  ```python
  if __name__ == "__main__":
      main()
      sys.exit(asyncio.run(main_async(variants)))  # unreachable; `variants` not in scope
  ```

  The second line can never execute (it follows `main()` which calls `sys.exit`), and `variants` is not defined at module scope — this would raise a `NameError` if somehow reached.
@mtchoum1 what's the reason for handling params and params-latest differently, like this?
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
```yaml
- 'manifests/base/params-latest.env'
- 'manifests/odh/base/params-latest.env'
- 'manifests/odh/base/params.env'
- 'manifests/rhoai/base/params-latest.env'
```
@mtchoum1 as we discussed with @coderabbitai in #3304 (comment), what can the script possibly usefully do with rhoai/base/params-latest.env? It only contains dummy values
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
cat manifests/rhoai/base/params-latest.env
echo "---"
cat manifests/odh/base/params-latest.env 2>/dev/null | head -30
Repository: opendatahub-io/notebooks
Length of output: 3658
🏁 Script executed:
#!/bin/bash
echo "=== rhoai/base/params.env ==="
cat manifests/rhoai/base/params.env 2>/dev/null | head -30
echo "=== rhoai/base/commit-latest.env (if exists) ==="
cat manifests/rhoai/base/commit-latest.env 2>/dev/null | head -10
echo "=== rhoai/base/commit.env (if exists) ==="
cat manifests/rhoai/base/commit.env 2>/dev/null | head -10
Repository: opendatahub-io/notebooks
Length of output: 6360
@jiridanek, the short answer is: nothing useful. Here's what the data shows:
- `manifests/rhoai/base/params-latest.env` has all 18 entries set to `dummy` — there are no real image URLs.
- When `sync_commit_latest("rhoai")` runs, it will attempt `skopeo inspect dummy` for every entry, fail on each, and abort with `False`. The `rhoai/base/commit-latest.env` will never get updated by this workflow.
The reason dummy is there is structurally sound: RHOAI doesn't publish nightly/latest images — it only ships versioned releases via registry.redhat.io (which are tracked in params.env, not params-latest.env). So the -n / "latest" concept simply doesn't apply to RHOAI.
Notably though, manifests/rhoai/base/commit-latest.env does exist and contains real commit SHAs (e.g. d3137ca, bff12e2) that match ODH commits — because RHOAI images are built from the same source as ODH. So RHOAI's commit-latest.env is meant to track the ODH source commits.
The fix should be Option B from the earlier discussion: the script should derive RHOAI's commit-latest.env from ODH's params-latest.env (filtered to the keys that exist in rhoai/base/params-latest.env), rather than trying to inspect dummy RHOAI URLs. The rhoai/base/params-latest.env file essentially acts as a key manifest declaring which images are relevant to RHOAI — the actual URLs to inspect are in ODH.
@mtchoum1, does that match the intent? If so, the script's sync_commit_latest for the rhoai variant needs to be rerouted to read image URLs from odh/base/params-latest.env using the keys from rhoai/base/params-latest.env, rather than trying to resolve the dummy values directly.
because RHOAI images are built from the same source as ODH. So RHOAI's commit-latest.env is meant to track the ODH source commits
@coderabbitai nope, RHOAI images are built from red-hat-data-services/notebooks#rhoai-3.x branch; has different commit hashes
@jiridanek, thank you for the correction — I'll update my analysis.
That changes the picture significantly. Since RHOAI images are built from red-hat-data-services/notebooks#rhoai-3.x with different commit hashes, Option B is also wrong: using ODH's params-latest.env URLs would yield ODH commit SHAs, not RHOAI ones. So both options from the earlier discussion are invalid for rhoai/base/commit-latest.env.
The core issue is that there are no publicly accessible nightly RHOAI image URLs — rhoai/base/params-latest.env contains only dummy values because RHOAI ships exclusively through versioned releases on registry.redhat.io. The "latest" concept simply doesn't exist for RHOAI's public-facing images.
This means:
- `sync_commit_latest("rhoai")` cannot work in any meaningful way — there's nothing to inspect.
- `rhoai/base/commit-latest.env` cannot be automatically maintained by this workflow. It should either be skipped entirely or manually maintained.
- The workflow's path trigger on `manifests/rhoai/base/params-latest.env` is a no-op since that file will always contain `dummy`.
What could still be useful: the script updating rhoai/base/commit.env by inspecting the versioned images from rhoai/base/params.env (which has real registry.redhat.io URLs) — but that requires registry authentication via the pull-secret for those private images.
@mtchoum1, given this, the rhoai commit-latest.env path in the script should likely be guarded with an explicit skip, and the path trigger for manifests/rhoai/base/params-latest.env in the workflow removed. Does that match the intended design?
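A minimal sketch of such a guard (the `PLACEHOLDER` constant and function shape are assumptions for illustration, not the script's actual API):

```python
# Hypothetical pre-inspection guard: drop placeholder image refs and
# soft-skip a variant that has nothing real to inspect.
PLACEHOLDER = "dummy"

def inspectable_pairs(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only entries whose image URL is not a checked-in placeholder."""
    return [(key, url) for key, url in pairs if url != PLACEHOLDER]

pairs = [
    ("example-workbench-image-n", "quay.io/example/workbench@sha256:abc"),
    ("example-pipeline-image-n", "dummy"),
]
usable = inspectable_pairs(pairs)
if not usable:
    print("only placeholders found, skipping variant")
else:
    print(f"{len(usable)} inspectable image(s)")
```

Filtering before inspection keeps `skopeo` from ever being invoked on a `dummy` ref, so the `rhoai` run can exit cleanly instead of failing.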
```diff
@@ -27,21 +32,54 @@ jobs:
         token: ${{ secrets.GITHUB_TOKEN }}
         persist-credentials: true
```
@coderabbitai it's probably a bit of security theater, but this can be written without having to use persist-credentials: true. See #3202
Up to you if this is worth implementing. The credentials are still in memory and malicious code, if it ever gets to run in our action, can introspect runner's memory. So maybe it's useless.
🧩 Analysis chain
🏁 Script executed:
gh pr view 3202 --repo opendatahub-io/notebooks --json body,files | jq -r '.body, "\n---FILES---", (.files[].path)'
Repository: opendatahub-io/notebooks
Length of output: 2348
🏁 Script executed:
gh pr diff 3202 --repo opendatahub-io/notebooks | grep -A 30 -B 5 "persist-credentials\|x-access-token\|push\|token"
Repository: opendatahub-io/notebooks
Length of output: 2405
@jiridanek — you're right that it's a limited mitigation, but the concrete benefit is that the token is never written to .git/config on disk, so code executing in the workspace (e.g. a compromised build tool, a malicious dep pulled during uv run) cannot read it via a simple file read. Memory introspection is a higher bar. Worth doing for consistency with the pattern established in #3202.
The change is small — apply the same x-access-token push pattern:
```yaml
- name: Checkout repository
  uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
  with:
    persist-credentials: false  # token never written to .git/config
    # no `token:` here — supplied only at push time below

- name: Commit and push changes
  run: |
    git config --global user.name 'github-actions[bot]'
    git config --global user.email 'github-actions[bot]@users.noreply.github.com'
    changed=0
    for f in \
      manifests/odh/base/commit-latest.env \
      manifests/odh/base/commit.env \
      manifests/rhoai/base/commit-latest.env \
      manifests/rhoai/base/commit.env; do
      if ! git diff --quiet -- "$f" 2>/dev/null; then
        changed=1
        git add -- "$f"
      fi
    done
    if [[ "$changed" -eq 1 ]]; then
      git commit -m "ci: refresh commit SHAs from image vcs-ref labels"
      # Supply token only at push time — never persisted to .git/config
      git push "https://x-access-token:${GITHUB_TOKEN}@github.com/${{ github.repository }}.git" HEAD
    else
      echo "No effective changes in commit env files, skipping commit."
    fi
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

The job-level `permissions: contents: write` stays as-is — it governs what `GITHUB_TOKEN` is allowed to do, which is unchanged. This is the same pattern used in #3202 and aligns with the GitHub Security Lab guidance on preventing pwn requests.
checkout@v6, release notes:
Improved credential security: persist-credentials now stores credentials in a separate file under $RUNNER_TEMP instead of directly in .git/config
@jiridanek — you're right, and I stand corrected. persist-credentials in v6 now stores credentials in a separate file under $RUNNER_TEMP instead of directly in .git/config, which directly addresses the concern I raised. The token is no longer sitting in a workspace-accessible file where a compromised build tool could trivially read it.
Since the workflow already pins actions/checkout at the SHA corresponding to v6.0.2, the persist-credentials: true + git push pattern here is fine as-is. The x-access-token push workaround I suggested was the right advice for v4/v5, but is redundant with v6's new storage location. Withdrawing that suggestion.
e8af878 to
b883987
Compare
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/update-commit-latest-env.py`:
- Around line 208-219: The loop collecting pairs from _parse_env_lines is
allowed to produce an empty pairs list which then leads to calling
_inspect_pairs and ultimately overwriting commit-latest.env with zero rows;
change the flow to treat empty pairs as a hard failure: after the loop that
fills pairs (using _PARAM_KEY_RE and suffix "-n"), check if pairs is empty and
raise an exception or exit with a nonzero status (e.g., raise RuntimeError or
call sys.exit(1)) with a clear message indicating no valid -n keys found, and do
not call _inspect_pairs or proceed to write output; this preserves the existing
file on bad input and mirrors sync_commit_released()'s conservative behavior.
- Around line 124-129: Reject non-SHA vcs-ref values by validating the vcs_ref
variable before accepting it: ensure vcs_ref matches the expected commit-hash
pattern (e.g. a 40-character hex SHA) and if it doesn't, log a warning/error and
treat it as missing (set vcs_ref to None or raise) so malformed values like
"main" or truncated SHAs are not written to commit*.env; update the branch
around the existing vcs_ref check (the block that logs "Successfully found
vcs-ref" / "vcs-ref label not found" and returns image_url, vcs_ref) to perform
this validation and only log success/return a vcs_ref when it passes the SHA
pattern.
- Around line 91-102: The subprocess call created in the block using semaphore
and process = await asyncio.create_subprocess_exec (...) must be wrapped with a
timeout and proper cleanup: call asyncio.wait_for on process.communicate() with
a sensible timeout, catch asyncio.TimeoutError, log the timeout (include
image_url), then kill the process (process.kill() or process.terminate()), await
process.wait() to reap it, and ensure you still capture/handle stdout/stderr or
set returncode appropriately before returning; keep the semaphore context (async
with semaphore) intact and add logging around the timeout and cleanup to avoid
permanently burning a semaphore slot.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 77292ed0-a53e-48e3-bff9-06c4798ac8d3
📒 Files selected for processing (2)
- .github/workflows/update-commit-latest-env.yaml
- scripts/update-commit-latest-env.py
✅ Files skipped from review due to trivial changes (1)
- .github/workflows/update-commit-latest-env.yaml
```diff
 try:
     async with semaphore:
-        log.info(f"Semaphore acquired, starting skopeo inspect for: {image_url}")
+        log.info("Semaphore acquired, starting skopeo inspect", image_url=image_url)
         # Create an asynchronous subprocess
         process = await asyncio.create_subprocess_exec(
             *command,
             stdout=asyncio.subprocess.PIPE,
-            stderr=asyncio.subprocess.PIPE
+            stderr=asyncio.subprocess.PIPE,
         )
         # Wait for the command to complete and capture output
         stdout, stderr = await process.communicate()
         returncode = process.returncode
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
sed -n '88,108p' scripts/update-commit-latest-env.py
printf '\n--- timeout-related matches ---\n'
rg -n "communicate\(|wait_for\(|timeout-minutes" scripts/update-commit-latest-env.py .github/workflows/update-commit-latest-env.yaml

Repository: opendatahub-io/notebooks
Length of output: 1039
Add a timeout and cleanup around the skopeo subprocess to prevent indefinite hangs.
The call to await process.communicate() (line 101) has no timeout. If the skopeo inspect command stalls on a registry or network failure, that semaphore slot is burned permanently, wedging the entire workflow.
Fix
```diff
 async with semaphore:
     log.info("Semaphore acquired, starting skopeo inspect", image_url=image_url)
     process = await asyncio.create_subprocess_exec(
         *command,
         stdout=asyncio.subprocess.PIPE,
         stderr=asyncio.subprocess.PIPE,
     )
-    # Wait for the command to complete and capture output
-    stdout, stderr = await process.communicate()
+    try:
+        stdout, stderr = await asyncio.wait_for(process.communicate(), timeout=120)
+    except asyncio.TimeoutError:
+        process.kill()
+        stdout, stderr = await process.communicate()
+        log.error("Skopeo inspect timed out", image_url=image_url, timeout_seconds=120)
+        return image_url, None
     returncode = process.returncode
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/update-commit-latest-env.py` around lines 91 - 102, The subprocess
call created in the block using semaphore and process = await
asyncio.create_subprocess_exec (...) must be wrapped with a timeout and proper
cleanup: call asyncio.wait_for on process.communicate() with a sensible timeout,
catch asyncio.TimeoutError, log the timeout (include image_url), then kill the
process (process.kill() or process.terminate()), await process.wait() to reap
it, and ensure you still capture/handle stdout/stderr or set returncode
appropriately before returning; keep the semaphore context (async with
semaphore) intact and add logging around the timeout and cleanup to avoid
permanently burning a semaphore slot.
```diff
 if vcs_ref:
-    log.info(f"Successfully found 'vcs-ref' for {image_url}: {vcs_ref}")
+    log.info("Successfully found vcs-ref", image_url=image_url, vcs_ref=vcs_ref)
 else:
-    log.warning(f"'vcs-ref' label not found for {image_url}.")
+    log.warning("vcs-ref label not found", image_url=image_url)

 return image_url, vcs_ref
```
Reject malformed vcs-ref labels before generating commit files.
Any non-empty label is accepted here. A value like main, unknown, or a truncated SHA would be silently written into commit*.env, even though downstream replacements treat these as commit hashes.
Proposed fix

```diff
-if vcs_ref:
-    log.info("Successfully found vcs-ref", image_url=image_url, vcs_ref=vcs_ref)
-else:
-    log.warning("vcs-ref label not found", image_url=image_url)
+if vcs_ref and re.fullmatch(r"[0-9a-fA-F]{7,40}", vcs_ref):
+    log.info("Successfully found vcs-ref", image_url=image_url, vcs_ref=vcs_ref)
+else:
+    log.warning("Invalid or missing vcs-ref label", image_url=image_url, vcs_ref=vcs_ref)
+    return image_url, None
```

As per coding guidelines, "REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/update-commit-latest-env.py` around lines 124 - 129, Reject non-SHA
vcs-ref values by validating the vcs_ref variable before accepting it: ensure
vcs_ref matches the expected commit-hash pattern (e.g. a 40-character hex SHA)
and if it doesn't, log a warning/error and treat it as missing (set vcs_ref to
None or raise) so malformed values like "main" or truncated SHAs are not written
to commit*.env; update the branch around the existing vcs_ref check (the block
that logs "Successfully found vcs-ref" / "vcs-ref label not found" and returns
image_url, vcs_ref) to perform this validation and only log success/return a
vcs_ref when it passes the SHA pattern.
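The commit-hash check proposed above can be exercised standalone (the regex is the one from the review; the sample labels are invented):

```python
# Demonstrates the 7-40 hex-character validation from the proposed fix:
# short SHAs and full SHAs pass, branch names and non-hex labels are rejected.
import re

SHA_RE = re.compile(r"[0-9a-fA-F]{7,40}")

samples = ["d3137ca", "main", "unknown", "deadbeefcafebabe"]
for label in samples:
    verdict = "accepted" if SHA_RE.fullmatch(label) else "rejected"
    print(f"{label}: {verdict}")
```

`re.fullmatch` (rather than `re.match`) is what stops a value like `main-deadbeef` from sneaking through on a partial match.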
```python
pairs: list[tuple[str, str]] = []
for key, url in _parse_env_lines(params_latest):
    m = _PARAM_KEY_RE.match(key)
    if not m:
        log.warning("Skipping param key (unrecognized pattern)", key=key)
        continue
    if m.group(2) != "-n":
        log.warning("Expected only -n keys in params-latest.env", key=key)
        continue
    pairs.append((key, url))

results = await _inspect_pairs([(k, u) for k, u in pairs])
```
Fail instead of overwriting commit-latest.env with zero rows.
If every row in params-latest.env is skipped by the regex/suffix filters, Line 219 still proceeds and Lines 236-237 replace the existing file with an empty one. sync_commit_released() already preserves the previous output on empty input; this path should fail closed too.
Proposed fix

```diff
 for key, url in _parse_env_lines(params_latest):
     m = _PARAM_KEY_RE.match(key)
     if not m:
         log.warning("Skipping param key (unrecognized pattern)", key=key)
         continue
     if m.group(2) != "-n":
         log.warning("Expected only -n keys in params-latest.env", key=key)
         continue
     pairs.append((key, url))
-results = await _inspect_pairs([(k, u) for k, u in pairs])
+if not pairs:
+    log.error("No valid -n entries found in params-latest.env", path=str(params_latest))
+    return False
+
+results = await _inspect_pairs(pairs)
```

As per coding guidelines, "REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/update-commit-latest-env.py` around lines 208 - 219, The loop
collecting pairs from _parse_env_lines is allowed to produce an empty pairs list
which then leads to calling _inspect_pairs and ultimately overwriting
commit-latest.env with zero rows; change the flow to treat empty pairs as a hard
failure: after the loop that fills pairs (using _PARAM_KEY_RE and suffix "-n"),
check if pairs is empty and raise an exception or exit with a nonzero status
(e.g., raise RuntimeError or call sys.exit(1)) with a clear message indicating
no valid -n keys found, and do not call _inspect_pairs or proceed to write
output; this preserves the existing file on bad input and mirrors
sync_commit_released()'s conservative behavior.
Description
Changed `scripts/update-commit-latest-env.py` to look at both the odh and rhoai folders under `manifests` for `params-latest.env`.
Changed `.github/workflows/update-commit-latest-env.yaml` to use the updated `scripts/update-commit-latest-env.py`.
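The variant-scoped layout this targets can be sketched like so (paths mirror the workflow triggers; the loop body is illustrative only):

```python
# Illustrative walk over the per-variant manifest layout this PR targets.
import pathlib

for variant in ("odh", "rhoai"):
    params_latest = pathlib.Path("manifests") / variant / "base" / "params-latest.env"
    commit_latest = params_latest.with_name("commit-latest.env")
    print(variant, "->", params_latest, "=>", commit_latest)
```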
How Has This Been Tested?
Self checklist (all need to be checked):
- Run `make test` (`gmake` on macOS) before asking for review.
- `Dockerfile.konflux` files should be done in `odh/notebooks` and automatically synced to `rhds/notebooks`. For Konflux-specific changes, modify `Dockerfile.konflux` files directly in `rhds/notebooks` as these require special attention in the downstream repository and flow to the upcoming RHOAI release.

Merge criteria:
Summary by CodeRabbit