fix: _copy_tree respects .dockerignore patterns to avoid OSError on long paths#3166
fix: _copy_tree respects .dockerignore patterns to avoid OSError on long paths#3166dreamcreated wants to merge 1 commit intoopendatahub-io:mainfrom
Conversation
|
Hi @dreamcreated. Thanks for your PR. I'm waiting for a opendatahub-io member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited) Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdded .dockerignore-aware directory pruning to the sandbox copy logic. New helpers Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
🧹 Nitpick comments (4)
scripts/sandbox_tests.py (2)
64-119: Missing test coverage for edge cases in pattern parsing.Consider adding tests for:
- Malformed patterns like
**/(empty name after stripping)- Patterns with trailing whitespace before newline
.dockerignorewith Windows-style line endings (\r\n)These edge cases could surface in real repositories.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/sandbox_tests.py` around lines 64 - 119, Add unit tests to cover edge cases in .dockerignore parsing: create tests (e.g., in TestLoadDockerignore) that write malformed patterns like "**/\n" and patterns with trailing whitespace ("bin/ \n") and Windows CRLF endings ("\r\n") then assert _load_dockerignore(root) strips whitespace, ignores empty entries, and normalizes line endings so returned list contains only valid patterns like "bin/"; similarly add tests in TestIgnoredDirNames that write "**/\n", "bin/ \r\n" and CRLF files and assert _ignored_dir_names(root) does not include empty names, correctly includes "bin" for recursive patterns, and ignores root-relative or nested-path patterns as before. Ensure tests reference _load_dockerignore and _ignored_dir_names so parsing logic is exercised for these edge cases.
122-164: No symlink traversal test for_copy_treewithdir_ignore_names.The implementation uses
followlinks=True. A test confirming that ignored directory filtering works correctly when symlinks are involved (e.g., a symlink namednode_modulespointing elsewhere) would strengthen confidence.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/sandbox_tests.py` around lines 122 - 164, Add a test to cover symlink traversal with dir_ignore_names: create a real directory outside the src (e.g., outside tmp_path/src) containing files, then inside src create a symlink named "node_modules" (or ".pnpm-store") pointing to that external directory and call _copy_tree with dir_ignore_names including that symlink name; assert that the symlinked directory and its contents are not copied into dst. Place this new test alongside TestCopyTreeWithIgnore (e.g., test_symlink_named_ignored_dir_not_copied) to ensure _copy_tree's followlinks=True does not cause ignored names to be traversed and copied.scripts/sandbox.py (2)
119-128: Negation patterns (!pattern) are silently ignored.
.dockerignoresupports negation (e.g.,!**/important/). The current implementation silently drops these. If a repo uses negation to un-ignore specific directories, they'll still be excluded here.This is documented as intentional, but consider logging a warning when negation patterns are encountered so users aren't surprised.
Suggested change
def _ignored_dir_names(root: pathlib.Path) -> set[str]: ... ignored: set[str] = set() for pattern in _load_dockerignore(root): + if pattern.startswith("!"): + log.debug("negation pattern ignored (unsupported)", pattern=pattern) + continue # Only accept "**/<name>/" – i.e., starts with "**/" and ends with "/" # with no additional "/" in between. if not pattern.startswith("**/") or not pattern.endswith("/"): continue🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/sandbox.py` around lines 119 - 128, The loop that builds `ignored` from `_load_dockerignore` silently skips negation patterns like `!**/important/`; update the loop in the same function (the block that iterates `for pattern in _load_dockerignore(root):`) to detect patterns that start with "!" and emit a warning including the raw pattern (e.g., via logging.warning or the module logger) before continuing, so users are informed that negation patterns are intentionally ignored; keep the existing behavior of not treating negations as un-ignore.
103-106: Variable shadowing reduces readability.
lineis reused for both the raw and stripped value. Minor, but a distinct name likestrippedclarifies intent.Suggested change
for line in dockerignore.read_text().splitlines(): - line = line.strip() - if line and not line.startswith("#"): - lines.append(line) + stripped = line.strip() + if stripped and not stripped.startswith("#"): + lines.append(stripped)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/sandbox.py` around lines 103 - 106, The loop that reads dockerignore currently reuses the variable name `line` for both the raw and stripped value, reducing readability; change the loop to iterate with a distinct name (e.g., `raw_line`) and assign `stripped = raw_line.strip()`, then use `stripped` in the conditional and `lines.append(stripped)` so the original raw text and the processed value are not shadowed (update occurrences around the for loop that currently references `line` and `lines`).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@scripts/sandbox_tests.py`:
- Around line 64-119: Add unit tests to cover edge cases in .dockerignore
parsing: create tests (e.g., in TestLoadDockerignore) that write malformed
patterns like "**/\n" and patterns with trailing whitespace ("bin/ \n") and
Windows CRLF endings ("\r\n") then assert _load_dockerignore(root) strips
whitespace, ignores empty entries, and normalizes line endings so returned list
contains only valid patterns like "bin/"; similarly add tests in
TestIgnoredDirNames that write "**/\n", "bin/ \r\n" and CRLF files and assert
_ignored_dir_names(root) does not include empty names, correctly includes "bin"
for recursive patterns, and ignores root-relative or nested-path patterns as
before. Ensure tests reference _load_dockerignore and _ignored_dir_names so
parsing logic is exercised for these edge cases.
- Around line 122-164: Add a test to cover symlink traversal with
dir_ignore_names: create a real directory outside the src (e.g., outside
tmp_path/src) containing files, then inside src create a symlink named
"node_modules" (or ".pnpm-store") pointing to that external directory and call
_copy_tree with dir_ignore_names including that symlink name; assert that the
symlinked directory and its contents are not copied into dst. Place this new
test alongside TestCopyTreeWithIgnore (e.g.,
test_symlink_named_ignored_dir_not_copied) to ensure _copy_tree's
followlinks=True does not cause ignored names to be traversed and copied.
In `@scripts/sandbox.py`:
- Around line 119-128: The loop that builds `ignored` from `_load_dockerignore`
silently skips negation patterns like `!**/important/`; update the loop in the
same function (the block that iterates `for pattern in
_load_dockerignore(root):`) to detect patterns that start with "!" and emit a
warning including the raw pattern (e.g., via logging.warning or the module
logger) before continuing, so users are informed that negation patterns are
intentionally ignored; keep the existing behavior of not treating negations as
un-ignore.
- Around line 103-106: The loop that reads dockerignore currently reuses the
variable name `line` for both the raw and stripped value, reducing readability;
change the loop to iterate with a distinct name (e.g., `raw_line`) and assign
`stripped = raw_line.strip()`, then use `stripped` in the conditional and
`lines.append(stripped)` so the original raw text and the processed value are
not shadowed (update occurrences around the for loop that currently references
`line` and `lines`).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 7a60ba32-b509-43b5-b8b2-2a76206bc399
📒 Files selected for processing (2)
scripts/sandbox.pyscripts/sandbox_tests.py
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3166 +/- ##
========================================
- Coverage 3.32% 3.23% -0.10%
========================================
Files 31 31
Lines 3550 3651 +101
Branches 569 576 +7
========================================
Hits 118 118
- Misses 3430 3531 +101
Partials 2 2
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
|
/lgtm, it's what the ticket asked for; I'm still a bit unhappy about the amount of code this requires; but I'd be even unhappier bringing in random deps like https://github.com/mherrmann/gitignore_parser, so, ok... I think we'll merge it |
The sandbox script's _copy_tree walked all directories via os.walk, including content-addressed stores like .pnpm-store that have deeply recursive path structures. This caused: OSError: [Errno 63] File name too long Fixes opendatahub-io#3150 Changes: - Add _load_dockerignore(root) to parse .dockerignore lines - Add _ignored_dir_names(root) to extract bare dir names from **/<name>/ patterns (only globstar patterns apply at arbitrary depth; root-relative patterns like ci/ or bin/ are intentionally excluded) - Modify _copy_tree to accept dir_ignore_names: set[str] and prune matching entries from dirnames in-place before os.walk descends - Update setup_sandbox to compute the ignore set once and pass it to all _copy_tree calls - Add TestLoadDockerignore, TestIgnoredDirNames, TestCopyTreeWithIgnore test classes covering the new helpers and the pruning behaviour No new stdlib imports are required.
d619dbd to
79544dc
Compare
There was a problem hiding this comment.
@coderabbitai the path comparison logic is fragile and will cause problems down the read, confirm
There was a problem hiding this comment.
@jiridanek — confirmed, there are several fragility points in the logic:
1. !-negation patterns are silently swallowed (most impactful)
.dockerignore supports negation with !:
**/node_modules/
!**/node_modules/special-keep/
_load_dockerignore retains !**/node_modules/special-keep/ (it's not a comment), but _ignored_dir_names drops it because it doesn't startswith("**/"). The original **/node_modules/ is still added to the ignore set, so special-keep is incorrectly pruned. Negations are silently lost with no warning.
Fix: strip a leading ! before the startswith check, and skip adding the base pattern when a negation for it exists — or at minimum emit a log.warning that negation is unsupported.
2. if dir_ignore_names: — falsy empty set skips pruning
if dir_ignore_names: # empty set() is falsy
dirnames[:] = [...]If _ignored_dir_names returns set() (no matching patterns), the guard is False and pruning is skipped — which is the correct outcome here — but the intent reads as "was a set passed at all?" A cleaner guard is:
if dir_ignore_names is not None:
dirnames[:] = [d for d in dirnames if d not in dir_ignore_names]This also makes the None-vs-empty-set distinction explicit.
3. Magic-number slice pattern[3:-1]
name = pattern[3:-1] # strip leading "**/" and trailing "/"This works today but is brittle. removeprefix / removesuffix is both safer and self-documenting:
name = pattern.removeprefix("**/").removesuffix("/")4. Case-sensitivity on macOS
d not in dir_ignore_names is a plain string equality check. On a case-insensitive filesystem (macOS default), a directory named Node_Modules would not match node_modules in the set and would slip through. This is a minor/theoretical issue for this use case but worth noting.
Summary table:
| Issue | Severity | Correct behavior today? |
|---|---|---|
! negation silently ignored |
Medium | ❌ incorrect pruning possible |
if dir_ignore_names: vs is not None |
Low | ✅ but misleading |
pattern[3:-1] magic slice |
Low | ✅ but fragile to malformed input |
| Case-insensitive FS | Low/Theoretical | ❌ missed pruning on macOS |
Summary
Fixes #3150
The sandbox script's
_copy_treewalked all directories viaos.walk(followlinks=True)without filtering, including content-addressed directory stores like.pnpm-storethat can have deeply recursive path structures. This caused:Changes
_load_dockerignore(root)— parse.dockerignore, return non-comment, non-empty lines. No new dependencies (stdlib only)._ignored_dir_names(root)— extract bare directory names from**/name/patterns only. Root-relative patterns (e.g.ci/,bin/) are intentionally excluded since they only apply at the repo root, not inside arbitrary prerequisite subdirectories._copy_tree— accepts optionaldir_ignore_names: set[str]parameter; prunes matching entries fromdirnamesin-place soos.walknever descends into them.setup_sandbox— computes the ignore set once from.dockerignoreand passes it to all_copy_treecalls.Tests
Added three new test classes:
TestLoadDockerignore— covers missing file, comment skipping, and normal linesTestIgnoredDirNames— covers globstar extraction, exclusion of root-relative patterns, exclusion of nested path patternsTestCopyTreeWithIgnore— coversnode_modules,.pnpm-store, and no-ignore baselineCurrent
.dockerignorepatterns handledNote:
.pnpm-storeis not yet in the root.dockerignore— it was added separately in PR #3142. Once that merges, this fix will also handle it automatically.Summary by CodeRabbit
New Features
Performance
Tests