Fix Bencher reporting permanently broken on main pushes #3146
alexeyr-ci2 wants to merge 2 commits into main from
Conversation
The benchmark workflow passed --start-point master --start-point-hash <github.event.before> for push-to-master events. Since master IS the base branch, Bencher tried to look up a version of master at the "before" hash — which often didn't exist (e.g., docs-only commits skipped by paths-ignore). This caused a 404, the report was never stored, and subsequent pushes also failed because their "before" hash was also missing. This cascading failure meant no master data was stored after the first version (Jan 18).

Fix: don't pass --start-point args for master pushes (thresholds are defined inline via --threshold-* args). For PRs/dispatch where the start-point hash may be missing, retry without --start-point-hash so the report still gets stored using the latest available baseline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
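The retry path described here can be sketched as a small shell harness. The `bencher` function below is a mock standing in for the real CLI, and `--start-point-clone-thresholds` is taken from the review discussion; the wrapper is illustrative, not the exact workflow code:

```shell
#!/usr/bin/env bash

STDERR_LOG=$(mktemp)

# Mock of the Bencher CLI: fails with the "Head Version ... not found"
# error whenever a pinned --start-point-hash is passed.
bencher() {
  if [[ "$*" == *"--start-point-hash"* ]]; then
    echo "Error: Head Version deadbeef not found" >&2
    return 1
  fi
  echo "report stored"
}

run_bencher() {
  # $1 carries multiple flags and is intentionally word-split.
  # shellcheck disable=SC2086
  bencher run $1 2>"$STDERR_LOG"
}

START_POINT_ARGS="--start-point main --start-point-hash deadbeef --start-point-clone-thresholds"

if ! run_bencher "$START_POINT_ARGS"; then
  if grep -q "Head Version" "$STDERR_LOG" && grep -q "not found" "$STDERR_LOG"; then
    # Drop the pinned hash; tr collapses the double space sed leaves behind.
    RETRY_ARGS=$(echo "$START_POINT_ARGS" | sed 's/--start-point-hash [^ ]*//' | tr -s ' ')
    echo "retrying without pinned hash: $RETRY_ARGS"
    run_bencher "$RETRY_ARGS"
  fi
fi
```

Against the mock, the first call fails with the version-not-found error and the retry succeeds with the hash stripped.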
Code Review

Overview

This PR fixes a cascading failure where benchmarks on main stopped being reported to Bencher. The secondary improvement — extracting the Bencher invocation into a run_bencher() helper — is also welcome.

Issues

1. Brittle error detection for the retry (primary concern)

The retry condition matches on literal strings from Bencher's error output:

    grep -q "Head Version" "$BENCHER_STDERR" && grep -q "not found" "$BENCHER_STDERR"

If Bencher changes its error message in a future release, the retry silently never fires — and the failure also can't be distinguished from any other non-zero exit (regressions, network errors, auth errors). Consider checking for a Bencher-specific exit code if one is documented, or at minimum logging the full stderr when the condition does NOT match so the failure cause is always visible.

2. Leftover whitespace after stripping the hash

    RETRY_ARGS=$(echo "$START_POINT_ARGS" | sed 's/--start-point-hash [^ ]*//')

After stripping --start-point-hash <sha>, a double space is left behind. It's harmless because the expansion is word-split, but collapsing it keeps the debug output clean.

3. Unquoted $sp_args

The unquoted expansion is intentional — $sp_args must be word-split to pass multiple flags — and deserves a brief comment saying so.

Positive Notes
    # Bencher (e.g., the base commit was a docs-only change skipped by
    # paths-ignore), retry without --start-point-hash so the report
    # still gets stored using the latest available baseline.
    if [ $BENCHER_EXIT_CODE -ne 0 ] && grep -q "Head Version" "$BENCHER_STDERR" && grep -q "not found" "$BENCHER_STDERR"; then

The retry condition is brittle — it depends on Bencher's exact error message text. If Bencher updates its wording (e.g. "Head Version" changes capitalisation or phrasing), the retry silently never fires and the run just fails.

Two suggestions:

- If Bencher exposes a documented exit code for "version not found", prefer that over text matching.
- Add an `else` branch that logs a summary of why the retry was skipped, so on any non-zero exit without the expected message the cause is still surfaced. Something like:

    elif [ $BENCHER_EXIT_CODE -ne 0 ]; then
      echo "ℹ️ Bencher failed (exit $BENCHER_EXIT_CODE) but error didn't match 'Head Version … not found' — not retrying"
    fi

    # paths-ignore), retry without --start-point-hash so the report
    # still gets stored using the latest available baseline.
    if [ $BENCHER_EXIT_CODE -ne 0 ] && grep -q "Head Version" "$BENCHER_STDERR" && grep -q "not found" "$BENCHER_STDERR"; then
    RETRY_ARGS=$(echo "$START_POINT_ARGS" | sed 's/--start-point-hash [^ ]*//')

After sed strips --start-point-hash <sha> from the middle of the string, a double space is left behind (e.g. "--start-point main  --start-point-clone-thresholds"). It's harmless because $sp_args is intentionally unquoted and word-splits, but collapsing the extra space keeps the debug output clean:

    RETRY_ARGS=$(echo "$START_POINT_ARGS" | sed 's/--start-point-hash [^ ]*//' | tr -s ' ')
    # start-point args if the pinned hash isn't found in Bencher.
    run_bencher() {
      local sp_args="$1"
      # shellcheck disable=SC2086

The # shellcheck disable=SC2086 is intentional — $sp_args must be word-split to pass multiple flags. Worth adding a brief comment so future maintainers understand the constraint:

    local sp_args="$1"
    # $sp_args is intentionally unquoted for word-splitting — branch names
    # and git SHAs cannot contain spaces, so this is safe.
    # shellcheck disable=SC2086
View the full report in the job summary.

Code review posted — see inline comments for specifics.
    _output, status = Open3.capture2(
      "bencher", "run",
      "--project", PROJECT,
      "--token", token,

Security: API token exposed in process listing

Passing --token as a CLI argument makes the token visible to any user who runs ps aux on the same host. The bencher CLI already reads BENCHER_API_TOKEN from the environment — set it there instead and drop the flag:

    _output, status = Open3.capture2(
      { "BENCHER_API_TOKEN" => token },
      "bencher", "run",
      "--project", PROJECT,
      "--branch", "main",
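The same environment-variable approach works at the shell level too. A minimal sketch, where `mock_cli` is a hypothetical stand-in for bencher (which, per the review, reads BENCHER_API_TOKEN itself):

```shell
# Make sure the demo starts with no token in the environment.
unset BENCHER_API_TOKEN

# Stand-in for the bencher CLI: authenticates from the environment,
# never from argv, so the token cannot show up in `ps` output.
mock_cli() {
  if [ -n "${BENCHER_API_TOKEN:-}" ]; then
    echo "authenticated via env"
  else
    echo "error: no token" >&2
    return 1
  fi
}

# The prefix assignment scopes the variable to this one invocation;
# it never appears in the command's argument list.
RESULT=$(BENCHER_API_TOKEN="secret123" mock_cli run --project demo)
echo "$RESULT"
```

The token exists only in the child's environment for the duration of the call, which is exactly what passing an env hash to Open3.capture2 achieves in Ruby.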
    end

    def ensure_cmd!(cmd)
      return if system("command -v #{cmd} > /dev/null 2>&1")

Minor: shell string interpolation in ensure_cmd!

cmd is interpolated directly into a shell string. It's only called with literal values today, but the pattern is fragile. Prefer passing arguments as an array to avoid any shell involvement:

    return if system("which", cmd, out: File::NULL, err: File::NULL)
    end

    def fetch_bencher_hashes(token: nil)
      uri = URI("#{BENCHER_API}/projects/#{PROJECT}/reports?branch=main&per_page=250&direction=asc")

Correctness: only one page fetched — will miss reports beyond 250

With per_page=250 and no pagination loop, once Bencher accumulates >250 main-branch reports this function silently returns an incomplete set. cmd_download caches the result, so an undercount causes already-submitted commits to be re-downloaded and re-pushed on subsequent runs.

Add a page cursor loop — check for a Link: <...>; rel="next" header in the response, or iterate &page=N until you get fewer than per_page results.
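The page-cursor idea can be illustrated in shell with a mocked API. `fetch_page` is a stand-in for the HTTP call, and PER_PAGE is scaled down to 3 so the loop's termination is visible:

```shell
PER_PAGE=3
TOTAL=7   # the mock API holds 7 reports in total

# Mock API call: prints up to PER_PAGE hashes for the requested page.
fetch_page() {
  local page=$1
  local start=$(( (page - 1) * PER_PAGE + 1 ))
  local end=$(( page * PER_PAGE ))
  (( end > TOTAL )) && end=$TOTAL
  (( start > TOTAL )) && return 0
  seq "$start" "$end" | sed 's/^/hash-/'
}

hashes=()
page=1
while :; do
  mapfile -t batch < <(fetch_page "$page")
  (( ${#batch[@]} == 0 )) && break       # empty page: nothing left
  hashes+=("${batch[@]}")
  (( ${#batch[@]} < PER_PAGE )) && break # short page: this was the last one
  page=$((page + 1))
done

echo "collected ${#hashes[@]} hashes across $page pages"
```

Stopping on a short page (fewer than per_page results) avoids one extra empty-page request compared to looping until the API returns nothing.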
    )
    return true if expires.nil? || expires.empty?

    require "time"

Minor: require inside a method body

require is idempotent so this is harmless, but it's unconventional and easy to overlook. Move require "time" up with the other requires at the top of the file.
    puts "\n[#{idx + 1}/#{to_push.size}] #{sha[0, 10]}"

    benchmarks = JSON.parse(File.read(entry[:path]))
    puts " #{benchmarks.size} benchmarks"

Minor: full JSON parse just for a count

benchmarks holds the entire parsed structure but is only used for its .size. Parse inline to avoid the unnecessary allocation:

    puts " #{JSON.parse(File.read(entry[:path])).size} benchmarks"
    run: gem install parallel

    - name: Install Bencher CLI
      uses: bencherdev/bencher@main

Supply-chain: unpinned @main action

The main benchmark workflow pins Bencher to a specific version tag. Using @main here means any commit pushed to Bencher's default branch — including a compromised one — runs in this workflow. Pin to the same version tag (or a commit SHA) used in benchmark.yml.
    ruby-version: '3.3'

    - name: Install dependencies
      run: gem install parallel

Minor: unpinned gem version

Without a version constraint, a future major release with breaking changes could silently break this workflow:

    run: gem install parallel --version '~> 1.26'
    ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR"

    # Count remaining downloadable artifacts
    REMAINING=$(ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR" 2>&1 | grep "To download:" | awk '{print $NF}')

Correctness: script runs twice per loop iteration

The download script is invoked once to do the actual work (line 36), then immediately invoked a second time just to parse the remaining count from its output (line 39). That second invocation re-runs all the download logic and may kick off additional parallel downloads.

Capture the output of the first invocation instead:

    OUTPUT=$(ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR" 2>&1)
    echo "$OUTPUT"
    # Count remaining downloadable artifacts
    REMAINING=$(echo "$OUTPUT" | grep "To download:" | awk '{print $NF}')
    # Bencher (e.g., the base commit was a docs-only change skipped by
    # paths-ignore), retry without --start-point-hash so the report
    # still gets stored using the latest available baseline.
    if [ $BENCHER_EXIT_CODE -ne 0 ] && grep -q "Head Version" "$BENCHER_STDERR" && grep -q "not found" "$BENCHER_STDERR"; then

Fragile: retry triggers on exact error message text

Both "Head Version" and "not found" are literal strings from Bencher's current error output. If Bencher changes its error wording in a future release, the condition silently stops matching and the fallback never fires — the run just fails with no clear explanation.

Consider one of:

- Match on the HTTP 404 status code string (e.g. "404"), which is less likely to change
- Add a comment here quoting the exact Bencher error message this is matching, so it's easy to spot when a Bencher upgrade breaks it
- Assert the Bencher version in the workflow so unexpected upgrades are caught early
    # max-sample-size limits historical data to the most recent 64 runs
    # to keep the baseline relevant as performance evolves.
    - name: Install Bencher CLI
      uses: bencherdev/bencher@main

Unpinned action version — bencherdev/bencher@main floats to whatever HEAD of that repo is. A supply-chain compromise or accidental breaking change there would silently affect every run. Pin to a specific SHA or release tag, e.g.:

    uses: bencherdev/bencher@v0.5.3

(replace with whichever release is current)
    FALLBACK_BODY="${FALLBACK_BODY}"$'\n'$'\n'"View the full report in the [job summary](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }})."
    gh pr comment ${{ github.event.pull_request.number }} --body "$FALLBACK_BODY" || true
    fi
    REMAINING=$(ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR" 2>&1 | grep "To download:" | awk '{print $NF}')

Double invocation per loop iteration — the download script is called once on line 122 (doing real work) and then called a second time on this line purely to read the count. That second call still hits caches, re-checks file existence, and prints noise. Capture the output of the first call instead:

    REMAINING=$(ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR" 2>&1 | tee /dev/stderr | grep "To download:" | awk '{print $NF}')

Or restructure to parse stdout from the single invocation, then break if REMAINING == 0.
    "--branch", "main",
    "--hash", sha,
    "--testbed", TESTBED,
    "--adapter", "json",

Secret leaks into process arguments — --token token_value is visible in ps aux output and GitHub Actions log echoing. bencher reads BENCHER_API_TOKEN automatically from the environment, so drop the --token argument entirely and let the env var already set in the workflow step do the work:

    def run_bencher(sha:, file:)
      _output, status = Open3.capture2(
        "bencher", "run",
        "--project", PROJECT,
        "--branch", "main",
        "--hash", sha,
        "--testbed", TESTBED,
        "--adapter", "json",
        "--file", file,
        "--err"
      )
      status.success?
    end

Update call sites accordingly: run_bencher(sha: sha, file: entry[:path]).
    abort "Error: #{cmd} not found in PATH"
    end

    def fetch_bencher_hashes(token: nil)

No pagination — will silently miss reports beyond 250 — per_page=250 is a single page. If Bencher returns more than 250 reports for main, commits beyond that limit won't appear in existing and will be needlessly re-submitted. Add a loop over pages:

    def fetch_bencher_hashes(token: nil)
      hashes = []
      page = 1
      loop do
        uri = URI("#{BENCHER_API}/projects/#{PROJECT}/reports?branch=main&per_page=250&page=#{page}&direction=asc")
        req = Net::HTTP::Get.new(uri)
        req["Authorization"] = "Bearer #{token}" if token
        resp = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
        break unless resp.is_a?(Net::HTTPSuccess)
        batch = JSON.parse(resp.body)
        break if batch.empty?
        hashes.concat(batch.filter_map { |r| r.dig("branch", "head", "version", "hash") })
        page += 1
      end
      hashes.uniq
    rescue StandardError => e
      warn " Warning: could not fetch Bencher hashes: #{e.message}"
      []
    end

    end
    def ensure_cmd!(cmd)
      return if system("command -v #{cmd} > /dev/null 2>&1")

Prefer Kernel#system with an array to avoid any potential shell-injection if cmd ever becomes dynamic in the future:

    return if system("which", cmd, out: File::NULL, err: File::NULL)
    # ── Helpers ──────────────────────────────────────────────────────────────

    def load_cached(work_dir, filename)

Stale cache in the push phase — load_cached is used in cmd_download and saves bencher_hashes.json at download time. But cmd_push calls fetch_bencher_hashes directly (fresh, line 125), which is correct. The risk is if someone runs download first, then push after several reports have been newly added to Bencher — the stale cache in download won't reflect them. Consider adding a cache TTL (e.g., skip the cache if the file is more than an hour old) or documenting that re-running download with --force invalidates the cache.
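A TTL check of the kind suggested here is easy to express in shell. The cache file name and the 60-minute window below are illustrative, not taken from the script:

```shell
CACHE="bencher_hashes.json"
MAX_AGE_MIN=60

# The cache counts as fresh only if the file exists and was
# modified within the last MAX_AGE_MIN minutes.
cache_fresh() {
  [ -f "$CACHE" ] && [ -n "$(find "$CACHE" -mmin -"$MAX_AGE_MIN" 2>/dev/null)" ]
}

if cache_fresh; then
  echo "using cached hash list"
else
  echo "refreshing hash list from the Bencher API"
fi
```

The `find … -mmin -N` test relies on GNU/BSD find semantics available on typical CI runners; a Ruby equivalent would compare `File.mtime` against `Time.now`.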
Code Review

Overview

This PR adds a one-shot recovery workflow and script to backfill benchmark data that went missing in Bencher from ~Jan 2026 onwards due to the broken --start-point handling.

Important structural question: The PR replaces the entire benchmark workflow with the recovery job.

Issues

Security

Correctness

Minor

Positives
    end

    def run_bencher(token:, sha:, file:)
      # No --err: alerts are expected during backfill and should not block submission
      _output, status = Open3.capture2(
        "bencher", "run",
        "--project", PROJECT,
        "--token", token,

API token passed as a CLI argument — visible in process listings

Passing token via --token makes it a process argument, which is visible in ps aux / /proc/$PID/cmdline to any user on the same host. In a shared CI runner environment this is a real exposure risk.

bencher run already reads BENCHER_API_TOKEN from the environment automatically. Since the calling workflow already sets that env var, the --token arg can simply be dropped:

    def run_bencher(token:, sha:, file:)
      # No --err: alerts are expected during backfill and should not block submission
      _output, status = Open3.capture2(
        { "BENCHER_API_TOKEN" => token },
        "bencher", "run",
        "--project", PROJECT,
        "--branch", "main",
        "--hash", sha,
        "--testbed", TESTBED,
        "--adapter", "json",
        "--file", file
      )
      status.success?
    end

(Open3.capture2 accepts an env hash as the first argument — this passes the token via the child process's environment rather than its argument list.)
Code Review

This PR fixes a real bug (cascading Bencher 404s from missing start-point hashes), but it also removes the benchmark job entirely.

Critical: Benchmark job is entirely removed

The PR replaces the full benchmark workflow with the recovery job.

Is this intentional — a temporary recovery-only workflow to be followed by a separate PR restoring the benchmark job? If so, it should be called out explicitly in the PR description.

Security issues

Bugs

Minor notes
    fi
    - name: SSH debug on failure
      if: failure()
      uses: mxschmitt/action-tmate@v3

Debugging artifact — must be removed before merge.

action-tmate opens an interactive SSH tunnel into the runner on failure. Even with limit-access-to-actor: true, it blocks the job indefinitely (until a 6-hour timeout) waiting for an SSH connection and can expose any secrets mounted in the runner environment. This should never ship in a merged workflow.
    ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR"

    if [ -n "$START_POINT_HASH" ]; then
      echo "Found merge-base via API: $START_POINT_HASH"
    else
      echo "⚠️ Could not find merge-base with main via GitHub API, continuing without it"
    REMAINING=$(ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR" 2>&1 | grep "To download:" | awk '{print $NF}')

Double invocation per iteration. Line 123 runs the full download pass (which does real work — network calls, file writes), then line 125 runs it again just to parse the "To download" count from stdout. This doubles the work on every retry iteration.

A cleaner approach: capture the output from the single download call and parse the count from it, or add a --count-remaining flag to the script that's separate from the full download:

    OUTPUT=$(ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR" 2>&1)
    echo "$OUTPUT"
    REMAINING=$(echo "$OUTPUT" | grep "To download:" | awk '{print $NF}')
    end

    def fetch_bencher_hashes(token: nil)
      uri = URI("#{BENCHER_API}/projects/#{PROJECT}/reports?branch=main&per_page=250&direction=asc")

No pagination — silently truncates beyond 250 reports.

If the main branch already has more than 250 reports in Bencher, existing will be an incomplete set and the download phase will try to re-fetch data that is already in Bencher, wasting artifact quota and potentially creating duplicate submissions.

Consider adding a pagination loop:

    def fetch_bencher_hashes(token: nil)
      hashes = []
      page = 1
      loop do
        uri = URI("#{BENCHER_API}/projects/#{PROJECT}/reports?branch=main&per_page=250&page=#{page}&direction=asc")
        req = Net::HTTP::Get.new(uri)
        req["Authorization"] = "Bearer #{token}" if token
        resp = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
        break unless resp.is_a?(Net::HTTPSuccess)
        page_reports = JSON.parse(resp.body)
        break if page_reports.empty?
        hashes.concat(page_reports.filter_map { |r| r.dig("branch", "head", "version", "hash") })
        page += 1
      end
      hashes.uniq
    rescue StandardError => e
      warn " Warning: could not fetch Bencher hashes: #{e.message}"
      []
    end

    _output, status = Open3.capture2(
      "bencher", "run",
      "--project", PROJECT,
      "--token", token,

Token passed as a CLI argument leaks into the process table.

--token <value> is visible to other processes on the system via ps aux for the duration of the bencher run call. The Bencher CLI also accepts the token via the BENCHER_API_TOKEN environment variable, which is the safer approach:

    _output, status = Open3.capture2(
      { "BENCHER_API_TOKEN" => token },
      "bencher", "run",
      "--project", PROJECT,
      "--branch", "main",
      "--hash", sha,
      "--testbed", TESTBED,
      "--adapter", "json",
      "--file", file
    )
    end

    def ensure_cmd!(cmd)
      return if system("command -v #{cmd} > /dev/null 2>&1")

Minor: cmd is always called with a literal string ("gh" or "bencher"), so the string interpolation is harmless today, but using system with a shell-interpolated string is a code-smell that could become a real injection vector if the call site changes. Prefer the multi-arg form:

    def ensure_cmd!(cmd)
      return if system("which", cmd, out: File::NULL, err: File::NULL)
      abort "Error: #{cmd} not found in PATH"
    end
    def cmd_download(work_dir)
      ensure_cmd!("gh")
      artifacts_dir = File.join(work_dir, "artifacts")

Stale cache risk in the download phase.

bencher_hashes.json is written once and never invalidated. If download is run multiple times (as the retry loop in the workflow does), the cached hash list grows stale — pushes that succeed between runs won't be reflected, and already-uploaded commits will be re-queued for download.

The push phase correctly bypasses the cache and calls fetch_bencher_hashes fresh each time. Consider doing the same in download, or at minimum deleting the cache file at the start of each download run so it is always refreshed:

    existing = Set.new(fetch_bencher_hashes)
Code Review

Overview

This draft PR has two parts: the recovery workflow and the recovery script it drives. The script itself is well-structured and the root-cause analysis in the PR description is sound. However, several issues need to be addressed before this can merge, and the PR description promises a fix (not passing --start-point args for master pushes) that is not present in the diff.

Critical Issues

1. The actual benchmark job is gone — no benchmarks run in this state.
2.
3. The described fix is missing.

Workflow Issues

4. Recovery job runs on the same triggers as the benchmark workflow.
5. Double invocation per retry iteration (see inline comment on lines 123–125).

Script Issues

6.
7. Token passed as a CLI argument (see inline comment on line 284).
8. Stale cache in the download phase.

Minor
    t && t >= "2026-04-15T14:04:00" && t <= "2026-04-15T14:06:00"
    end

    puts " #{reports.size} total reports, #{bad_reports.size} from today (#{today})"

Bug: NameError — today is undefined

today is referenced here but never defined anywhere in the script. This will raise a NameError whenever cmd_delete is called with matching reports.

    puts " #{reports.size} total reports, #{bad_reports.size} in the target window (2026-04-15T14:04–14:06 UTC)"
    def ensure_cmd!(cmd)
      return if system("command -v #{cmd} > /dev/null 2>&1")

Security: API token exposed as a CLI argument

Passing --token as a command-line argument makes the token visible in process listings (ps aux) and in the bencher CLI's own output/error messages. Prefer the BENCHER_API_TOKEN environment variable, which bencher already supports:

    _output, status = Open3.capture2({"BENCHER_API_TOKEN" => token}, *args)

And remove "--token", token, from args.
    end

    def fetch_bencher_hashes(token: nil)
      uri = URI("#{BENCHER_API}/projects/#{PROJECT}/reports?branch=main&per_page=250&direction=asc")

Missing pagination — silently truncates at 250 reports

If main has more than 250 benchmark reports in Bencher (plausible over time), this will miss older ones and treat them as "not in Bencher," potentially causing duplicate submissions. cmd_delete (line 201) has the same issue.

Consider looping with an offset / page parameter until the API returns fewer results than per_page, or using --paginate via the gh CLI instead of a direct Net::HTTP call.
    for attempt in 1 2 3 4 5; do
      echo ""
      echo "=== Download attempt $attempt ==="
      ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR"

Double invocation of the download script per retry loop iteration

Lines 129 and 131 both invoke the download script. Line 129 actually downloads artifacts; line 131 re-runs the same script a second time just to parse the "To download:" count from its output. This means each iteration performs two full download passes (including API calls and parallel downloads), and the count on line 131 reflects the state after a second download attempt, not after the one on line 129.

Consider having the script emit a machine-readable exit code or a dedicated count command so you don't need to run it twice:

    ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR"
    REMAINING=$(ruby scripts/recover-bencher-data.rb count --work-dir="$WORK_DIR")

Or capture the output of the first invocation and parse it.
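The capture-and-parse variant looks like this. `mock_download` is a hypothetical stand-in for the recovery script; the "To download:" line format is the one the workflow greps for:

```shell
# Stand-in for `ruby scripts/recover-bencher-data.rb download ...`:
# does its work, then reports how many artifacts are still missing.
mock_download() {
  echo "Downloading artifacts..."
  echo "To download: 5"
}

# Invoke once, keep the full output, then parse the count from it;
# no second (expensive) invocation is needed just for the number.
OUTPUT=$(mock_download 2>&1)
echo "$OUTPUT"
REMAINING=$(echo "$OUTPUT" | grep "To download:" | awk '{print $NF}')
echo "remaining=$REMAINING"
```

The retry loop can then test `[ "$REMAINING" -eq 0 ]` to break early without re-running the script.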
    - name: SSH debug on failure
      if: failure()
      uses: mxschmitt/action-tmate@v3
      env:
        GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      run: |
        LABEL="performance-regression"
        COMMIT_SHA="${{ github.sha }}"
        COMMIT_SHORT="${COMMIT_SHA:0:7}"
        COMMIT_URL="${{ github.server_url }}/${{ github.repository }}/commit/${COMMIT_SHA}"
        RUN_URL="${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
        BENCHER_URL="https://bencher.dev/perf/react-on-rails-t8a9ncxo"
        ACTOR="${{ github.actor }}"

        # Ensure the label exists (idempotent)
        gh label create "$LABEL" \
          --description "Automated: benchmark regression detected on main" \
          --color "D93F0B" \
          --force 2>/dev/null || true

        # Check for an existing open issue to avoid duplicates
        EXISTING_ISSUE=$(gh issue list \
          --label "$LABEL" \
          --state open \
          --limit 1 \
          --json number \
          --jq '.[0].number // empty')

        # Build the benchmark summary snippet (defensive: don't let column failure block alerting)
        SUMMARY=""
        if [ -f bench_results/summary.txt ]; then
          FORMATTED=$(column -t -s $'\t' "bench_results/summary.txt" 2>/dev/null) || FORMATTED=$(cat "bench_results/summary.txt")
          SUMMARY=$(printf '\n### Benchmark Summary\n\n```\n%s\n```' "$FORMATTED")
        fi

        if [ -n "$EXISTING_ISSUE" ]; then
          echo "Open regression issue already exists: #${EXISTING_ISSUE} — adding comment"

          if ! gh issue comment "$EXISTING_ISSUE" --body "$(cat <<EOF
        ## New regression detected

        **Commit:** [\`${COMMIT_SHORT}\`](${COMMIT_URL}) by @${ACTOR}
        **Workflow run:** [Run #${{ github.run_number }}](${RUN_URL})
        ${SUMMARY}

        > View the full Bencher report in the [workflow run summary](${RUN_URL}) or on the [Bencher dashboard](${BENCHER_URL}).
        EOF
        )"; then
            echo "::warning::Failed to comment on regression issue #${EXISTING_ISSUE}"
          fi
        else
          echo "No open regression issue found — creating one"

          if ! gh issue create \
            --title "Performance Regression Detected on main (${COMMIT_SHORT})" \
            --label "$LABEL" \
            --body "$(cat <<EOF
        ## Performance Regression Detected on main

        A statistically significant performance regression was detected by
        [Bencher](${BENCHER_URL}) using a Student's t-test (95% confidence
        interval, up to 64 sample history).

        | Detail | Value |
        |--------|-------|
        | **Commit** | [\`${COMMIT_SHORT}\`](${COMMIT_URL}) |
        | **Pushed by** | @${ACTOR} |
        | **Workflow run** | [Run #${{ github.run_number }}](${RUN_URL}) |
        | **Bencher dashboard** | [View history](${BENCHER_URL}) |
        ${SUMMARY}

        ### What to do

        1. Check the [workflow run](${RUN_URL}) for the full Bencher HTML report
        2. Review the [Bencher dashboard](${BENCHER_URL}) to see which metrics regressed
        3. Investigate the commit — expected trade-off or unintended regression?
        4. If unintended, open a fix PR and reference this issue
        5. Close this issue once resolved — subsequent regressions will open a new one

        ---
        *This issue was created automatically by the benchmark CI workflow.*
        EOF
        )"; then
            echo "::warning::Failed to create regression issue — check GitHub API permissions"
          fi
        fi

    # ============================================
    # STEP 7c: FAIL WORKFLOW ON MAIN REGRESSION
    # ============================================
    # Only fail on main — PR benchmarks are informational (triggered by 'benchmark' label).
    # Regressions on PRs are surfaced via Bencher report comments, not workflow failures.
    - name: Fail workflow if Bencher detected regression on main
      if: github.event_name == 'push' && github.ref == 'refs/heads/main' && env.BENCHER_EXIT_CODE != '0'
      run: |
        echo "Bencher detected a regression (exit code: ${BENCHER_EXIT_CODE:-1})"
        exit "${BENCHER_EXIT_CODE:-1}"

    # ============================================
    # STEP 8: WORKFLOW COMPLETION
    # ============================================
    - name: Workflow summary
      if: always()
      run: |
        echo "📋 Benchmark Workflow Summary"
        echo "===================================="
        echo "Status: ${{ job.status }}"
        echo "Run number: ${{ github.run_number }}"
        echo "Triggered by: ${{ github.actor }}"
        echo "Branch: ${{ github.ref_name }}"
        echo "Run Core: ${{ env.RUN_CORE || 'false' }}"
        echo "Run Pro Rails: ${{ env.RUN_PRO_RAILS || 'false' }}"
        echo "Run Pro Node Renderer: ${{ env.RUN_PRO_NODE_RENDERER || 'false' }}"
        echo ""
        if [ "${{ job.status }}" == "success" ]; then
          echo "✅ All steps completed successfully"
        else
          echo "❌ Workflow encountered errors - check logs above"
        fi
      env:
        BENCHER_API_TOKEN: ${{ secrets.BENCHER_API_TOKEN }}
      with:
        limit-access-to-actor: true
Security: tmate debug step should be removed before merge

This opens an interactive SSH tunnel on failure, giving shell access to the runner — which has BENCHER_API_TOKEN and GH_TOKEN in its environment. Even with limit-access-to-actor: true, this is a significant attack surface: the actor's GitHub account becoming compromised would directly expose both secrets.

Also, GH_TOKEN and BENCHER_API_TOKEN don't need to be env vars on this step at all (tmate doesn't use them), so they'd be unnecessarily exposed to the tmate session.

This is acceptable for local debugging during development of the script, but should not be merged into a production workflow.
```ruby
  data
end

def ensure_cmd!(cmd)
```
**Minor: command injection risk in `ensure_cmd!`**

```ruby
system("command -v #{cmd} > /dev/null 2>&1")
```

`cmd` is always a hardcoded literal at the two call sites (`"gh"`, `"bencher"`), so this is currently safe. But the pattern is fragile — if a future caller passes user-controlled input, this becomes a shell injection vector. Prefer the array form to avoid shell interpretation entirely:
```ruby
def ensure_cmd!(cmd)
  return if system("command", "-v", cmd, out: File::NULL, err: File::NULL)

  abort "Error: #{cmd} not found in PATH"
end
```
**Code Review**

**Overview**

This PR has two parts:

**Critical: The benchmark workflow is fully disabled by this PR**

The actual benchmarking logic (detect-changes, server startup, Vegeta/k6 runs, Bencher reporting, regression alerting, PR comments) is entirely removed and replaced with a data-recovery job. After merge, no benchmarks will run on push-to-main, PRs, or `workflow_dispatch`. I assume the intent is either:

Either way, the PR description does not call this out, and the checklist items are all unchecked. Please clarify the intent before merge.

**Bugs**

**Security**

**Data integrity**

**Minor**
ab43744 to 4997869 Compare

4997869 to ee4cf67 Compare
````yaml
      - name: SSH debug on failure
        if: failure()
        uses: mxschmitt/action-tmate@v3
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          LABEL="performance-regression"
          COMMIT_SHA="${{ github.sha }}"
          COMMIT_SHORT="${COMMIT_SHA:0:7}"
          COMMIT_URL="${{ github.server_url }}/${{ github.repository }}/commit/${COMMIT_SHA}"
          RUN_URL="${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
          BENCHER_URL="https://bencher.dev/perf/react-on-rails-t8a9ncxo"
          ACTOR="${{ github.actor }}"

          # Ensure the label exists (idempotent)
          gh label create "$LABEL" \
            --description "Automated: benchmark regression detected on main" \
            --color "D93F0B" \
            --force 2>/dev/null || true

          # Check for an existing open issue to avoid duplicates
          EXISTING_ISSUE=$(gh issue list \
            --label "$LABEL" \
            --state open \
            --limit 1 \
            --json number \
            --jq '.[0].number // empty')

          # Build the benchmark summary snippet (defensive: don't let column failure block alerting)
          SUMMARY=""
          if [ -f bench_results/summary.txt ]; then
            FORMATTED=$(column -t -s $'\t' "bench_results/summary.txt" 2>/dev/null) || FORMATTED=$(cat "bench_results/summary.txt")
            SUMMARY=$(printf '\n### Benchmark Summary\n\n```\n%s\n```' "$FORMATTED")
          fi

          if [ -n "$EXISTING_ISSUE" ]; then
            echo "Open regression issue already exists: #${EXISTING_ISSUE} — adding comment"

            if ! gh issue comment "$EXISTING_ISSUE" --body "$(cat <<EOF
          ## New regression detected

          **Commit:** [\`${COMMIT_SHORT}\`](${COMMIT_URL}) by @${ACTOR}
          **Workflow run:** [Run #${{ github.run_number }}](${RUN_URL})
          ${SUMMARY}

          > View the full Bencher report in the [workflow run summary](${RUN_URL}) or on the [Bencher dashboard](${BENCHER_URL}).
          EOF
          )"; then
              echo "::warning::Failed to comment on regression issue #${EXISTING_ISSUE}"
            fi
          else
            echo "No open regression issue found — creating one"

            if ! gh issue create \
              --title "Performance Regression Detected on main (${COMMIT_SHORT})" \
              --label "$LABEL" \
              --body "$(cat <<EOF
          ## Performance Regression Detected on main

          A statistically significant performance regression was detected by
          [Bencher](${BENCHER_URL}) using a Student's t-test (95% confidence
          interval, up to 64 sample history).

          | Detail | Value |
          |--------|-------|
          | **Commit** | [\`${COMMIT_SHORT}\`](${COMMIT_URL}) |
          | **Pushed by** | @${ACTOR} |
          | **Workflow run** | [Run #${{ github.run_number }}](${RUN_URL}) |
          | **Bencher dashboard** | [View history](${BENCHER_URL}) |
          ${SUMMARY}

          ### What to do

          1. Check the [workflow run](${RUN_URL}) for the full Bencher HTML report
          2. Review the [Bencher dashboard](${BENCHER_URL}) to see which metrics regressed
          3. Investigate the commit — expected trade-off or unintended regression?
          4. If unintended, open a fix PR and reference this issue
          5. Close this issue once resolved — subsequent regressions will open a new one

          ---
          *This issue was created automatically by the benchmark CI workflow.*
          EOF
          )"; then
              echo "::warning::Failed to create regression issue — check GitHub API permissions"
            fi
          fi
````
```ruby
reports = JSON.parse(resp.body)
# Delete all reports so we can resubmit in correct chronological order
bad_reports = reports
```
**Destructive: `bad_reports = reports` deletes ALL Bencher history on every run**

The comment says "delete today's bad backfill reports" but the code assigns `reports` (all reports) to `bad_reports` with no filtering. Coupled with the workflow running on every push to main, this means every push will wipe all Bencher history — the opposite of the recovery goal.
At minimum, filter by creation date so only "today's" bad backfill window is targeted:

```ruby
# Example: only delete reports created today
today = Time.now.utc.strftime("%Y-%m-%d")
bad_reports = reports.select { |r| r["start_time"]&.start_with?(today) }
```

Or better, guard the entire `recover-bencher` job so it only runs once (e.g., via `workflow_dispatch` only, or a one-time flag).
```ruby
end

def fetch_bencher_hashes(token: nil)
  uri = URI("#{BENCHER_API}/projects/#{PROJECT}/reports?branch=main&per_page=250&direction=asc")
```
**Missing pagination — only the first 250 reports are checked**

`per_page=250` is the cap but there's no page-walking loop. Once there are >250 Bencher reports the function silently under-reports existing hashes, causing those commits to be submitted again as duplicates.

`cmd_delete` has the same issue at line 201.

Consider a pagination loop:
```ruby
def fetch_bencher_hashes(token: nil)
  hashes = []
  page = 1
  loop do
    uri = URI("#{BENCHER_API}/projects/#{PROJECT}/reports?branch=main&per_page=250&page=#{page}&direction=asc")
    # ... (same request setup)
    reports = JSON.parse(resp.body)
    break if reports.empty?

    hashes.concat(reports.filter_map { |r| r.dig("branch", "head", "version", "hash") })
    break if reports.size < 250

    page += 1
  end
  hashes.uniq
end
```

```ruby
"bencher", "run",
  "--project", PROJECT,
  "--token", token,
  "--branch", "main",
```
**Token exposed in process list**

Passing `--token <value>` on the command line means the secret is visible in `ps aux` output for the duration of the `bencher run` invocation. Other processes running on the same GitHub Actions runner worker could read it.

Prefer injecting the token via an environment variable that bencher already supports:

```ruby
_output, status = Open3.capture2({ "BENCHER_API_TOKEN" => token }, *args)
```

And remove `"--token", token,` from the args array.
```bash
ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR"

# Wait for port 3001 to be free
echo "⏳ Waiting for port 3001 to be free..."
for _ in {1..10}; do
  if ! lsof -ti:3001 > /dev/null 2>&1; then
    echo "✅ Port 3001 is now free"
    exit 0
REMAINING=$(ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR" 2>&1 | grep "To download:" | awk '{print $NF}')
```
**Download script runs twice per retry iteration**

Line 129 does the actual download pass, then line 131 runs the script again just to parse the "To download:" count from its output. This doubles API calls and GH artifact download attempts per iteration.

The script should output the remaining count as part of its normal run (or write it to a file), so the second invocation isn't needed. Alternatively, parse the count from the first run's output:

```bash
OUTPUT=$(ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR" 2>&1)
echo "$OUTPUT"
REMAINING=$(echo "$OUTPUT" | grep "To download:" | awk '{print $NF}')
```
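If the live log output should also be preserved, `tee` plus `PIPESTATUS` keeps the script's exit status while still parsing the count from a single pass. A self-contained sketch, with a stub function standing in for the recovery script:

```shell
#!/usr/bin/env bash
# Sketch: run the download pass once, stream its output, and capture both
# the "To download: N" count and the exit status. `fake_download` is a stub
# standing in for `ruby scripts/recover-bencher-data.rb download ...`.
fake_download() {
  echo "Downloaded 3 artifacts"
  echo "To download: 2"
}

OUTPUT_FILE=$(mktemp)
fake_download 2>&1 | tee "$OUTPUT_FILE"   # output still shows live in the CI log
STATUS=${PIPESTATUS[0]}                   # exit status of the script, not of tee
REMAINING=$(grep "To download:" "$OUTPUT_FILE" | awk '{print $NF}')
rm -f "$OUTPUT_FILE"

echo "remaining=$REMAINING status=$STATUS"
```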
```ruby
def sort_by_commit_date(entries)
  # Try to sort by commit date using git log
  sha_to_entry = entries.to_h { |e| [e[:sha], e] }
  shas = entries.map { |e| e[:sha] }

  # git log can sort multiple commits chronologically
  output, status = Open3.capture2(
    "git", "log", "--format=%H", "--no-walk", "--date-order", *shas
  )

  if status.success?
    sorted_shas = output.lines.map(&:strip).select { |s| sha_to_entry.key?(s) }
    # Reverse because git log shows newest first
    sorted_shas.reverse.map { |s| sha_to_entry[s] }
  else
    # Fall back to original order
    entries
  end
rescue StandardError
  entries
end
```
**Dead code — `sort_by_commit_date` is never called**

This function was presumably superseded by `to_push.sort_by! { |e| e[:created] || "" }` on line 156. It can be removed to avoid confusion.
```yaml
      # STEP 4: RUN CORE BENCHMARKS
      # ============================================
      - name: Install Bencher CLI
        uses: bencherdev/bencher@main
```
**Pin `bencherdev/bencher` to a specific version tag**

`@main` is a mutable ref — a breaking change upstream will silently break this job on the next run. Pin to a specific release tag (e.g. `bencherdev/bencher@v0.4.x`) to get reproducible behaviour.

For comparison, `actions/checkout@v4` is already pinned to a major version — keep that pattern consistent.
```yaml
jobs:
  detect-changes:
    permissions:
      contents: read
      actions: read
    runs-on: ubuntu-22.04
    outputs:
      docs_only: ${{ steps.detect.outputs.docs_only }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 50
          persist-credentials: false
      - name: Detect relevant changes
        id: detect
        run: |
          BASE_REF="${{ github.event.pull_request.base.sha || github.event.before || 'origin/main' }}"
          script/ci-changes-detector "$BASE_REF"
        shell: bash
      - name: Guard docs-only main pushes
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        uses: ./.github/actions/ensure-main-docs-safety
        with:
          docs-only: ${{ steps.detect.outputs.docs_only }}
          previous-sha: ${{ github.event.before }}

  benchmark:
    needs: detect-changes
    # Run on: push to main, workflow_dispatch, or PRs with 'benchmark' label.
    # The 'full-ci' label is intentionally excluded — it controls test workflows,
    # not benchmarks. Use the dedicated 'benchmark' label to trigger perf runs on PRs.
    # See https://bencher.dev/docs/how-to/github-actions/#pull-requests for the extra pull_request condition
    # Skip docs-only pushes to main to avoid wasting CI resources on non-code changes.
    if: |
      !(
        github.event_name == 'push' &&
        github.ref == 'refs/heads/main' &&
        needs.detect-changes.outputs.docs_only == 'true'
      ) && (
        github.event_name == 'push' ||
        github.event_name == 'workflow_dispatch' ||
        (
          github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository &&
          contains(github.event.pull_request.labels.*.name, 'benchmark')
        )
      )
  recover-bencher:
```
**Recovery job has no trigger guard — will run on every push/PR forever**

The `recover-bencher` job runs unconditionally on every push to main, every labeled PR, and every `workflow_dispatch`. Once the one-time backfill is done, subsequent runs will:

- Call `delete` → wipe all Bencher history (see separate comment on `cmd_delete`)
- Try to download artifacts (most already pushed, nothing to do)
- Open a tmate SSH session on any failure

This job should either be:

- Removed from this workflow once the backfill is complete, or
- Guarded to only run via `workflow_dispatch` (`if: github.event_name == 'workflow_dispatch'`), or
- Kept but have the `delete` step removed/guarded

It also needs a no-op path for when there's nothing to recover so subsequent runs don't fail.
**Review: Bencher Recovery**

Good diagnosis of the root cause — the cascading 404 chain from missing start-point hashes. That said, there are several issues that need addressing before this is safe to merge.

**Critical (must fix)**

1. The entire benchmark job was deleted — this PR removes the benchmarking logic itself (see inline comment).
2.
3.
4. Recovery job has no trigger guard — it fires on every push to main.

**Moderate**

5. Token in CLI args exposes secret in process list — prefer the `BENCHER_API_TOKEN` environment variable (see inline comment).
6. No pagination for Bencher API — only the first 250 reports are fetched (see inline comment).
7. Download retry runs the script twice per iteration — doubles API calls per attempt (see inline comment on the workflow step).

**Minor**

8.
9.
10.

The backfill logic is well thought-out. The blocking concerns are the destructive delete-all behavior running on every push, and the removal of the benchmark job itself. Happy to re-review once those are addressed.
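On the retry mechanics discussed in the inline comments: stripping `--start-point-hash` with `sed` over a flat string loses quoting. A minimal quoting-safe sketch using a bash array instead (variable names are illustrative, not the workflow's actual ones):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative: a hash value containing a space survives intact in an array,
# where `sed` over a flat string would split or mangle it.
START_POINT_ARGS=(--start-point master --start-point-hash "deadbeef cafe")

# Drop the `--start-point-hash <value>` pair without touching other args.
RETRY_ARGS=()
skip_next=0
for arg in "${START_POINT_ARGS[@]}"; do
  if [ "$skip_next" -eq 1 ]; then skip_next=0; continue; fi
  if [ "$arg" = "--start-point-hash" ]; then skip_next=1; continue; fi
  RETRY_ARGS+=("$arg")
done

printf '%s\n' "${RETRY_ARGS[@]}"
```

The resulting array can then be expanded as `"${RETRY_ARGS[@]}"` on the retry invocation without re-splitting any values.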
Summary

The benchmark workflow passed `--start-point master --start-point-hash <github.event.before>` for push-to-master events. Since master IS the base branch, Bencher tried to look up a version of master at the "before" hash — which often didn't exist (e.g., docs-only commits skipped by paths-ignore). This caused a 404, the report was never stored, and subsequent pushes also failed because their "before" hash was also missing. This cascading failure meant no master data was stored after the first version (Jan 18).

Fix: don't pass `--start-point*` args for master pushes (thresholds are defined inline via `--threshold-*` args). For PRs/dispatch where the start-point hash may be missing, retry without `--start-point-hash` so the report still gets stored using the latest available baseline.

Pull Request checklist
Fixes #2546.
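The gating described in the summary can be sketched as a small shell helper (function and variable names are illustrative, not the workflow's actual ones):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build the optional --start-point flags. On push-to-master, return none,
# so Bencher never looks up a possibly-missing "before" hash on the base branch.
build_start_point_args() {
  local event="$1" ref="$2" before_sha="$3"
  local -n out="$4"   # nameref: fills the caller's array
  out=()
  if [ "$event" = "push" ] && [ "$ref" = "refs/heads/master" ]; then
    return 0
  fi
  out=(--start-point master --start-point-hash "$before_sha")
}

build_start_point_args push refs/heads/master abc123 MASTER_ARGS
build_start_point_args pull_request refs/pull/1/merge abc123 PR_ARGS
echo "master push flags: ${#MASTER_ARGS[@]}, PR flags: ${#PR_ARGS[@]}"
```

The empty array expands to nothing on master pushes, so `bencher run` is invoked with no start-point flags at all there, while PRs and dispatch runs keep the baseline arguments.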