Skip to content

Commit efd17a4

Browse files
jiridanekclaude
andcommitted
fix: address review findings for hermetic prefetch scripts
- Fix CACHI2_OUT_DIR not used by hermeto-fetch-gomod.sh and download-rpms.sh, which would write deps to wrong directory under namespaced layout - Fix npm download failures silently swallowed: add return 1 on wget failure and check xargs exit code - Fix should_keep_for_arch false positives: parse wheel platform tag instead of naive substring matching (e.g. "any" in "manylinux" was always true) - Fix --arch silently skipped when yq missing in hermeto-fetch-rpm.sh: now fails fast with clear error - Normalize COMPONENT_DIR (strip trailing slash) before hashing to match Makefile's patsubst behavior - Use symlinks instead of cp -r in hermeto-fetch-rpm.sh arch filtering - Remove dead counter variables in download-npm.sh - Update docs to reflect hashed cachi2/output/<hash>/ layout - Fix stale README claim about download-pip-packages.py not being called - Fix missing newline at end of create-requirements-lockfile.sh Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
1 parent 4baeaa0 commit efd17a4

9 files changed

Lines changed: 93 additions & 68 deletions

File tree

docs/learnings/hermetic-build-architecture_.md

Lines changed: 45 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -21,11 +21,11 @@ Orchestrates five lockfile generators in sequence:
2121

2222
| Step | Generator | Input | Output |
2323
|------|-----------|-------|--------|
24-
| 1 | `create-artifact-lockfile.py` | `artifacts.in.yaml` | `cachi2/output/deps/generic/` (GPG keys, nfpm, node headers, oc client, VS Code extensions) |
25-
| 2 | `create-requirements-lockfile.sh` | `pyproject.toml` | `cachi2/output/deps/pip/` (Python wheels) |
26-
| 3 | `download-npm.sh` | `package-lock.json` files | `cachi2/output/deps/npm/` (npm tarballs) |
27-
| 4 | `hermeto-fetch-rpm.sh` | `rpms.lock.yaml` | `cachi2/output/deps/rpm/{arch}/` (RPMs + repo metadata) |
28-
| 5 | `create-go-lockfile.sh` | `go.mod` (via git submodule) | `cachi2/output/deps/gomod/` (Go modules) |
24+
| 1 | `create-artifact-lockfile.py` | `artifacts.in.yaml` | `cachi2/output/<hash>/deps/generic/` (GPG keys, nfpm, node headers, oc client, VS Code extensions) |
25+
| 2 | `create-requirements-lockfile.sh` | `pyproject.toml` | `cachi2/output/<hash>/deps/pip/` (Python wheels) |
26+
| 3 | `download-npm.sh` | `package-lock.json` files | `cachi2/output/<hash>/deps/npm/` (npm tarballs) |
27+
| 4 | `hermeto-fetch-rpm.sh` | `rpms.lock.yaml` | `cachi2/output/<hash>/deps/rpm/{arch}/` (RPMs + repo metadata) |
28+
| 5 | `create-go-lockfile.sh` | `go.mod` (via git submodule) | `cachi2/output/<hash>/deps/gomod/` (Go modules) |
2929

3030
Variants are selected via `--rhds` flag:
3131
- Default (`odh`): uses CentOS Stream + UBI repos (no subscription needed)
@@ -34,15 +34,20 @@ Variants are selected via `--rhds` flag:
3434
### Makefile auto-detection
3535

3636
```makefile
37-
$(eval CACHI2_VOLUME := $(if $(and $(wildcard cachi2/output),$(wildcard $(BUILD_DIR)prefetch-input)),\
38-
--volume $(ROOT_DIR)cachi2/output:/cachi2/output:Z \
39-
--volume $(ROOT_DIR)cachi2/output/deps/rpm/$(RPM_ARCH)/repos.d/:/etc/yum.repos.d/:Z,))
37+
$(eval COMPONENT_DIR_STR := $(patsubst %/,%,$(BUILD_DIR)))
38+
$(eval CACHI2_HASH := $(shell python3 -c "import hashlib; print(hashlib.md5('$(COMPONENT_DIR_STR)'.encode()).hexdigest())"))
39+
$(eval CACHI2_DIR := cachi2/output/$(CACHI2_HASH))
40+
$(eval CACHI2_VOLUME := $(if $(and $(wildcard $(CACHI2_DIR)),$(wildcard $(BUILD_DIR)prefetch-input)),\
41+
--volume $(ROOT_DIR)$(CACHI2_DIR):/cachi2/output:Z \
42+
--volume $(ROOT_DIR)$(CACHI2_DIR)/deps/rpm/$(RPM_ARCH)/repos.d/:/etc/yum.repos.d/:Z,))
4043
```
4144

42-
When both `cachi2/output/` and `<target>/prefetch-input/` exist, the Makefile
43-
automatically mounts the prefetched dependencies into the build. The second
44-
mount overlays `/etc/yum.repos.d/` with hermeto-generated repos, making local
45-
builds behave like Konflux (repos are already in place when the Dockerfile runs).
45+
When both `cachi2/output/<hash>/` and `<target>/prefetch-input/` exist, the Makefile
46+
automatically mounts the per-component prefetched dependencies into the build.
47+
The `<hash>` is the MD5 of the component directory name, allowing concurrent
48+
builds of different components without collisions. The second mount overlays
49+
`/etc/yum.repos.d/` with hermeto-generated repos, making local builds behave
50+
like Konflux (repos are already in place when the Dockerfile runs).
4651

4752
### sandbox.py
4853

@@ -57,25 +62,33 @@ build context.
5762

5863
## cachi2/output Directory Structure
5964

60-
After prefetching, the directory looks like:
65+
After prefetching, each component gets its own namespaced directory under
66+
`cachi2/output/<hash>/` (where `<hash>` is the MD5 of the component directory
67+
name, e.g. `cachi2/output/a1b2c3.../`). This prevents collisions when building
68+
multiple components in parallel.
6169

6270
```text
6371
cachi2/output/
64-
├── deps/
65-
│ ├── rpm/
66-
│ │ ├── x86_64/
67-
│ │ │ ├── <repo-name>/ # RPM files + repodata/
68-
└── repos.d/ # Generated .repo files with file:// URLs
69-
│ │ ├── aarch64/
70-
│ │ ├── ppc64le/
71-
│ │ └── s390x/
72-
│ ├── npm/ # npm tarballs
73-
│ ├── pip/ # Python wheels
74-
│ └── generic/ # GPG keys, tarballs, etc.
75-
├── bom.json
76-
└── .build-config.json
72+
└── <hash>/ # per-component namespace (MD5 of component dir)
73+
├── deps/
74+
│ ├── rpm/
75+
│ │ ├── x86_64/
76+
│ │ │ ├── <repo-name>/ # RPM files + repodata/
77+
└── repos.d/ # Generated .repo files with file:// URLs
78+
│ │ ├── aarch64/
79+
│ │ ├── ppc64le/
80+
│ │ └── s390x/
81+
│ ├── npm/ # npm tarballs
82+
│ ├── pip/ # Python wheels
83+
│ └── generic/ # GPG keys, tarballs, etc.
84+
├── bom.json
85+
└── .build-config.json
7786
```
7887

88+
The Makefile mounts `cachi2/output/<hash>/` at `/cachi2/output/` inside the
89+
container, so the Dockerfile always sees `/cachi2/output/deps/...` regardless
90+
of which component is being built.
91+
7992
Key detail: when `rpms.in.yaml` declares `moduleEnable: [nodejs:22]`, hermeto
8093
downloads module metadata (`modules.yaml`) alongside the RPMs and includes it
8194
in the generated repodata. This allows `dnf module enable nodejs:22` to work
@@ -91,7 +104,7 @@ scripts/lockfile-generators/prefetch-all.sh --component-dir codeserver/ubi9-pyth
91104
make codeserver-ubi9-python-3.12
92105
```
93106

94-
Makefile detects `cachi2/output/` and auto-injects the volume mount.
107+
Makefile detects `cachi2/output/<hash>/` and auto-injects the volume mount.
95108

96109
### GitHub Actions
97110

@@ -108,11 +121,13 @@ The TEMPLATE workflow (`build-notebooks-TEMPLATE.yaml`) handles it transparently
108121

109122
1. PipelineRun YAML declares `prefetch-input` entries pointing to lockfiles
110123
2. cachi2's `prefetch-dependencies` task downloads everything using hermeto
111-
3. Build task mounts `/cachi2/output/` automatically
124+
3. Build task mounts deps at `/cachi2/output/` automatically
112125
4. Network isolation enforced at the pipeline level
113126

114-
All three environments produce the same `/cachi2/output/deps/` structure because
115-
they all use hermeto under the hood for RPM prefetching.
127+
All three environments produce the same `/cachi2/output/deps/` layout inside
128+
the container because they all use hermeto under the hood for RPM prefetching.
129+
On the host, local/GHA builds use `cachi2/output/<hash>/deps/` while Konflux
130+
uses its own staging directory.
116131

117132
## Variant Directories (ODH vs RHDS)
118133

scripts/lockfile-generators/README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -153,7 +153,7 @@ internally. Option 6 (Git submodule) is a manual setup.
153153
| Helper | Used by | Purpose |
154154
|--------|---------|---------|
155155
| `helpers/pylock-to-requirements.py` | pip | Convert `pylock.<flavor>.toml` (PEP 751) to pip-compatible `requirements.<flavor>.txt` with `--hash` lines. |
156-
| `helpers/download-pip-packages.py` | pip | Standalone pip downloader: downloads wheels/sdists from a `requirements.txt` (with `--hash` lines) into `cachi2/output/deps/pip/`. Not called by `create-requirements-lockfile.sh` (which has its own inline download from pylock.toml). |
156+
| `helpers/download-pip-packages.py` | pip | Pip downloader: downloads wheels/sdists from a `requirements.txt` (with `--hash` lines) into `cachi2/output/<hash>/deps/pip/`. Called by `create-requirements-lockfile.sh --download`. Supports `--arch` filtering and parallel downloads. |
157157
| `helpers/download-rpms.sh` | RPM | Download RPMs from `rpms.lock.yaml` via `wget` into `cachi2/output/deps/rpm/` and create DNF repo metadata. Standalone alternative to `hermeto-fetch-rpm.sh`. |
158158
| `helpers/hermeto-fetch-rpm.sh` | RPM | Download RPMs from `rpms.lock.yaml` using [Hermeto](https://github.com/hermetoproject/hermeto) in a container. Handles RHEL entitlement cert extraction for `cdn.redhat.com` auth. Called by `create-rpm-lockfile.sh --download`. |
159159
| `helpers/hermeto-fetch-npm.sh` | npm | Alternative npm fetcher using [Hermeto](https://github.com/hermetoproject/hermeto) in a container. |

scripts/lockfile-generators/create-requirements-lockfile.sh

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -171,4 +171,4 @@ echo " pylock.toml : ${PYLOCK_FILE}"
171171
echo " requirements : ${REQUIREMENTS_FILE}"
172172
if [[ "$DO_DOWNLOAD" == true ]]; then
173173
echo " wheels : ${OUT_DIR}/"
174-
fi
174+
fi

scripts/lockfile-generators/download-npm.sh

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -245,10 +245,6 @@ fi
245245
mkdir -p "$DEST_DIR"
246246

247247
total=$(echo "$refs" | wc -l | tr -d ' ')
248-
count=0
249-
downloaded=0
250-
skipped=0
251-
failed=0
252248

253249
echo ""
254250
echo "Found $total unique packages to download."
@@ -266,12 +262,16 @@ download_file() {
266262
else
267263
echo "FAIL Failed: $download_url" >&2
268264
rm -f "$DEST_DIR/$filename"
265+
return 1
269266
fi
270267
fi
271268
}
272269
export -f download_file
273270

274-
echo "$refs" | xargs -n 2 -P 10 bash -c 'download_file "$1" "$2"' _
271+
if ! echo "$refs" | xargs -n 2 -P 10 bash -c 'download_file "$1" "$2"' _; then
272+
echo "Some npm downloads failed" >&2
273+
exit 1
274+
fi
275275

276276
echo ""
277277
echo "Finished! Location: $DEST_DIR"

scripts/lockfile-generators/helpers/download-pip-packages.py

Lines changed: 30 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -180,33 +180,38 @@ def download_and_verify(item):
180180

181181

182182
def should_keep_for_arch(filename: str, arch: str) -> bool:
183-
if "any" in filename or "noarch" in filename:
183+
# sdists and non-wheel files have no platform tag — always keep
184+
if not filename.endswith(".whl"):
184185
return True
185-
186-
arch_tags = []
187-
if arch in ["amd64", "x86_64"]:
188-
arch_tags = ["x86_64", "amd64"]
189-
elif arch in ["arm64", "aarch64"]:
190-
arch_tags = ["aarch64", "arm64"]
191-
elif arch == "ppc64le":
192-
arch_tags = ["ppc64le"]
193-
elif arch == "s390x":
194-
arch_tags = ["s390x"]
195-
196-
all_arches = ["x86_64", "amd64", "aarch64", "arm64", "ppc64le", "s390x"]
197-
has_arch = False
198-
for a in all_arches:
199-
if a in filename:
200-
has_arch = True
201-
break
202-
203-
if not has_arch:
186+
187+
# Wheel format: {name}-{ver}(-{build})?-{python}-{abi}-{platform}.whl
188+
# The platform tag is the last segment before .whl
189+
stem = filename[:-4] # strip .whl
190+
parts = stem.split("-")
191+
if len(parts) < 3:
204192
return True
205-
206-
for a in arch_tags:
207-
if a in filename:
208-
return True
209-
return False
193+
platform_tag = parts[-1]
194+
195+
# Pure-python wheels use "any" or "none" platform tags
196+
if platform_tag in ("any", "none") or "any" in platform_tag.split("_"):
197+
return True
198+
199+
ARCH_ALIASES = {
200+
"amd64": ["x86_64", "amd64"],
201+
"x86_64": ["x86_64", "amd64"],
202+
"arm64": ["aarch64", "arm64"],
203+
"aarch64": ["aarch64", "arm64"],
204+
"ppc64le": ["ppc64le"],
205+
"s390x": ["s390x"],
206+
}
207+
ALL_ARCHES = {"x86_64", "amd64", "aarch64", "arm64", "ppc64le", "s390x"}
208+
209+
# Check if the platform tag mentions any known architecture
210+
if not any(a in platform_tag for a in ALL_ARCHES):
211+
return True
212+
213+
# Keep only if it matches the target architecture
214+
return any(a in platform_tag for a in ARCH_ALIASES.get(arch, []))
210215

211216

212217
def main():

scripts/lockfile-generators/helpers/download-rpms.sh

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ set -euo pipefail
1515

1616
# --- Configuration & Defaults ---
1717
SCRIPTS_PATH="scripts/lockfile-generators"
18-
DEST_DIR="./cachi2/output/deps/rpm"
18+
DEST_DIR="${CACHI2_OUT_DIR:-cachi2/output}/deps/rpm"
1919
LOCKFILE=""
2020

2121
# --- Functions ---

scripts/lockfile-generators/helpers/hermeto-fetch-gomod.sh

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,7 @@ set -euo pipefail
1212
# directory. No separate lockfile is needed — go.sum pins dependencies.
1313

1414
HERMETO_IMAGE="ghcr.io/hermetoproject/hermeto:0.46.2"
15-
HERMETO_OUTPUT="./cachi2/output"
15+
HERMETO_OUTPUT="${CACHI2_OUT_DIR:-cachi2/output}"
1616

1717
PREFETCH_DIR=""
1818

scripts/lockfile-generators/helpers/hermeto-fetch-rpm.sh

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -214,7 +214,8 @@ HERMETO_STAGING=$(mktemp -d)
214214
trap 'rm -rf "$HERMETO_STAGING" ${CDN_CERT_DIR:+"$CDN_CERT_DIR"} ${CLEANUP_SOURCE:+"$HERMETO_SOURCE"}' EXIT
215215

216216
HERMETO_SOURCE="$(pwd)/$PREFETCH_DIR"
217-
if [[ -n "$ARCH" ]] && command -v yq &>/dev/null; then
217+
if [[ -n "$ARCH" ]]; then
218+
command -v yq &>/dev/null || error_exit "--arch requires yq to filter rpms.lock.yaml"
218219
# Translate generic arch to rpm arch format
219220
rpm_arch="$ARCH"
220221
[[ "$ARCH" == "amd64" ]] && rpm_arch="x86_64"
@@ -223,8 +224,10 @@ if [[ -n "$ARCH" ]] && command -v yq &>/dev/null; then
223224
echo "--- Filtering rpms.lock.yaml for architecture: $rpm_arch ---"
224225
HERMETO_SOURCE=$(mktemp -d)
225226
CLEANUP_SOURCE=1
226-
cp -r "$PREFETCH_DIR"/* "$HERMETO_SOURCE/"
227-
yq eval "del(.arches[] | select(.arch != \"$rpm_arch\" and .arch != \"noarch\"))" -i "$HERMETO_SOURCE/rpms.lock.yaml"
227+
# Symlink everything, then replace rpms.lock.yaml with a filtered copy
228+
ln -s "$(pwd)/$PREFETCH_DIR"/* "$HERMETO_SOURCE/" 2>/dev/null || true
229+
rm -f "$HERMETO_SOURCE/rpms.lock.yaml"
230+
yq eval "del(.arches[] | select(.arch != \"$rpm_arch\" and .arch != \"noarch\"))" "$(pwd)/$PREFETCH_DIR/rpms.lock.yaml" > "$HERMETO_SOURCE/rpms.lock.yaml"
228231
fi
229232

230233
echo "--- Downloading RPMs via hermeto ---"

scripts/lockfile-generators/prefetch-all.sh

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -141,9 +141,11 @@ while [[ $# -gt 0 ]]; do
141141
done
142142

143143
[[ -z "$COMPONENT_DIR" ]] && error_exit "--component-dir is required."
144+
COMPONENT_DIR="${COMPONENT_DIR%/}"
144145
[[ -d "$COMPONENT_DIR" ]] || error_exit "Component directory not found: $COMPONENT_DIR"
145146

146-
export CACHI2_OUT_DIR="cachi2/output/$(python3 -c "import hashlib; print(hashlib.md5('$COMPONENT_DIR'.encode()).hexdigest())")"
147+
CACHI2_HASH="$(python3 -c "import hashlib, sys; print(hashlib.md5(sys.argv[1].encode()).hexdigest())" "$COMPONENT_DIR")"
148+
export CACHI2_OUT_DIR="cachi2/output/${CACHI2_HASH}"
147149
export ARCH
148150

149151
# CLI args take priority; fall back to env vars so GHA can pass secrets

0 commit comments

Comments
 (0)