
Fix Bencher reporting permanently broken on main pushes#3146

Closed
alexeyr-ci2 wants to merge 2 commits into main from alexeyr/2546/Fix-benchmark-workflow

Conversation


@alexeyr-ci2 alexeyr-ci2 commented Apr 15, 2026

Summary

The benchmark workflow passed --start-point master --start-point-hash <github.event.before> for push-to-master events. Since master IS the base branch, Bencher tried to look up a version of master at the "before" hash — which often didn't exist (e.g., docs-only commits skipped by paths-ignore). This caused a 404, the report was never stored, and subsequent pushes also failed because their "before" hash was also missing. This cascading failure meant no master data was stored after the first version (Jan 18).

Fix: don't pass --start-point* args for master pushes (thresholds are defined inline via --threshold-* args). For PRs/dispatch where the start-point hash may be missing, retry without --start-point-hash so the report still gets stored using the latest available baseline.
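The decision logic described above can be sketched as a small sh function. This is a minimal illustration, not the workflow's actual code; the function and variable names (build_start_point_args, event, base, before) are made up for the example.

```shell
#!/bin/sh
# Sketch of the arg selection described above. "event" stands in for
# github.event_name, "base"/"before" for the base branch and
# github.event.before. All names here are illustrative.
build_start_point_args() {
  event="$1" base="$2" before="$3"
  if [ "$event" = "push" ]; then
    # Push to the base branch itself: pass no --start-point* args;
    # thresholds are supplied inline via --threshold-* flags instead.
    printf '%s' ""
  else
    # PRs / dispatch: pin the baseline to a hash, and be prepared to
    # retry without it if Bencher never saw that commit.
    printf '%s' "--start-point $base --start-point-hash $before"
  fi
}
```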

Pull Request checklist

  • Add/update test to cover these changes
  • Update documentation
  • Update CHANGELOG file

Fixes #2546.


coderabbitai bot commented Apr 15, 2026

Important

Review skipped

Draft detected.


Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
alexeyr-ci2 force-pushed the alexeyr/2546/Fix-benchmark-workflow branch from 7b68467 to 05c653d on April 15, 2026 at 12:08

claude bot commented Apr 15, 2026

Code Review

Overview

This PR fixes a cascading failure where benchmarks on main were permanently broken since January because --start-point-hash ${github.event.before} would 404 when the "before" commit was a docs-only change skipped by paths-ignore. The fix is correct: don't pass --start-point* args for main pushes at all.

The secondary improvement — extracting bencher run into a run_bencher() shell function — is a clear win, eliminating ~55 lines of duplication.


Issues

1. Brittle error detection for the retry (primary concern)

The retry condition matches on literal strings from Bencher's error output:

grep -q "Head Version" "$BENCHER_STDERR" && grep -q "not found" "$BENCHER_STDERR"

If Bencher changes its error message in a future release, the retry silently never fires — and the failure also can't be distinguished from any other non-zero exit (regressions, network errors, auth errors). Consider checking for a Bencher-specific exit code if one is documented, or at minimum logging the full stderr when the condition does NOT match so the failure cause is always visible.

2. sed leaves a double-space in RETRY_ARGS

RETRY_ARGS=$(echo "$START_POINT_ARGS" | sed 's/--start-point-hash [^ ]*//')

After stripping --start-point-hash <sha> from the middle of the string, there's a leftover double space (e.g. "--start-point main --start-point-clone-thresholds"). This is harmless because the unquoted $sp_args expansion word-splits correctly, but it would be tidier to collapse extra whitespace: sed 's/--start-point-hash [^ ]*//' | tr -s ' '.

3. Unquoted $sp_args — acknowledged but worth documenting the constraint

The # shellcheck disable=SC2086 on the unquoted expansion is intentional and documented, but it silently breaks if $GITHUB_BASE_REF or $BRANCH ever contains a space. This is unlikely in practice, but a brief comment explaining why splitting is safe here ("branch names cannot contain spaces") would help future maintainers.
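The splitting behavior itself can be shown with a tiny counter function (count_args is a hypothetical helper for this demo, not part of the workflow):

```shell
#!/bin/sh
# Shows why the unquoted expansion is required: quoted, the whole flag
# string arrives as one argument; unquoted, it word-splits into four.
count_args() { echo "$#"; }
sp_args="--start-point main --start-point-hash abc123"
quoted=$(count_args "$sp_args")
# shellcheck disable=SC2086  # intentional word-splitting
unquoted=$(count_args $sp_args)
echo "quoted=$quoted unquoted=$unquoted"
```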


Positive Notes

  • Removing the old "suppress 404 and carry on" approach in favor of an actual retry is a meaningfully better strategy — it ensures data gets stored rather than silently skipped.
  • Handling the workflow_dispatch on main case (new if [ "$BRANCH" = "main" ] guard) is correct and was a gap before.
  • --start-point-reset is correctly preserved only for PRs (not dispatch), matching the previous intent.
  • The trap 'rm -f "$BENCHER_STDERR"' EXIT safety net is good defensive cleanup.
  • Comments explaining why main pushes don't need --start-point are clear and helpful.

Comment thread .github/workflows/benchmark.yml Outdated
# Bencher (e.g., the base commit was a docs-only change skipped by
# paths-ignore), retry without --start-point-hash so the report
# still gets stored using the latest available baseline.
if [ $BENCHER_EXIT_CODE -ne 0 ] && grep -q "Head Version" "$BENCHER_STDERR" && grep -q "not found" "$BENCHER_STDERR"; then

The retry condition is brittle — it depends on Bencher's exact error message text. If Bencher updates its wording (e.g. "Head Version" changes capitalisation or phrasing), the retry silently never fires and the run just fails.

Two suggestions:

  1. If Bencher exposes a documented exit code for "version not found", prefer that over text matching.
  2. Add an else branch that logs a summary of why the retry was skipped, so on any non-zero exit without the expected message the cause is still surfaced:

Something like:

          elif [ $BENCHER_EXIT_CODE -ne 0 ]; then
            echo "ℹ️ Bencher failed (exit $BENCHER_EXIT_CODE) but error didn't match 'Head Version … not found' — not retrying"
          fi

Comment thread .github/workflows/benchmark.yml Outdated
# paths-ignore), retry without --start-point-hash so the report
# still gets stored using the latest available baseline.
if [ $BENCHER_EXIT_CODE -ne 0 ] && grep -q "Head Version" "$BENCHER_STDERR" && grep -q "not found" "$BENCHER_STDERR"; then
RETRY_ARGS=$(echo "$START_POINT_ARGS" | sed 's/--start-point-hash [^ ]*//')

After sed strips --start-point-hash <sha> from the middle of the string, a double space is left behind (e.g. "--start-point main --start-point-clone-thresholds"). It's harmless because $sp_args is intentionally unquoted and word-splits, but collapsing the extra space keeps the debug output clean:

Suggested change
RETRY_ARGS=$(echo "$START_POINT_ARGS" | sed 's/--start-point-hash [^ ]*//')
RETRY_ARGS=$(echo "$START_POINT_ARGS" | sed 's/--start-point-hash [^ ]*//' | tr -s ' ')

Comment thread .github/workflows/benchmark.yml Outdated
# start-point args if the pinned hash isn't found in Bencher.
run_bencher() {
local sp_args="$1"
# shellcheck disable=SC2086

The # shellcheck disable=SC2086 is intentional — $sp_args must be word-split to pass multiple flags. Worth adding a brief comment so future maintainers understand the constraint:

Suggested change
# shellcheck disable=SC2086
local sp_args="$1"
# $sp_args is intentionally unquoted for word-splitting — branch names
# and git SHAs cannot contain spaces, so this is safe.
# shellcheck disable=SC2086

@github-actions

⚠️ Bencher report chunks were too large to post as PR comments.

View the full report in the job summary.


claude bot commented Apr 15, 2026

Code review posted — see inline comments for specifics.

_output, status = Open3.capture2(
"bencher", "run",
"--project", PROJECT,
"--token", token,

Security: API token exposed in process listing

Passing --token as a CLI argument makes the token visible to any user who runs ps aux on the same host. The bencher CLI already reads BENCHER_API_TOKEN from the environment — set it there instead and drop the flag:

Suggested change
"--token", token,
_output, status = Open3.capture2(
{ "BENCHER_API_TOKEN" => token },
"bencher", "run",
"--project", PROJECT,
"--branch", "main",

end

def ensure_cmd!(cmd)
return if system("command -v #{cmd} > /dev/null 2>&1")

Minor: shell string interpolation in ensure_cmd!

cmd is interpolated directly into a shell string. It's only called with literal values today, but the pattern is fragile. Prefer passing arguments as an array to avoid any shell involvement:

Suggested change
return if system("command -v #{cmd} > /dev/null 2>&1")
return if system("which", cmd, out: File::NULL, err: File::NULL)

end

def fetch_bencher_hashes(token: nil)
uri = URI("#{BENCHER_API}/projects/#{PROJECT}/reports?branch=main&per_page=250&direction=asc")

Correctness: only one page fetched — will miss reports beyond 250

With per_page=250 and no pagination loop, once Bencher accumulates >250 main-branch reports this function silently returns an incomplete set. cmd_download caches the result, so an undercount causes already-submitted commits to be re-downloaded and re-pushed on subsequent runs.

Add a page cursor loop — check for a Link: <...>; rel="next" header in the response, or iterate &page=N until you get fewer than per_page results.

)
return true if expires.nil? || expires.empty?

require "time"

Minor: require inside a method body

require is idempotent so this is harmless, but it's unconventional and easy to overlook. Move require "time" up with the other requires at the top of the file.

puts "\n[#{idx + 1}/#{to_push.size}] #{sha[0, 10]}"

benchmarks = JSON.parse(File.read(entry[:path]))
puts " #{benchmarks.size} benchmarks"

Minor: full JSON parse just for a count

benchmarks holds the entire parsed structure but is only used for its .size. Parse inline to avoid the unnecessary allocation:

Suggested change
puts " #{benchmarks.size} benchmarks"
puts " #{JSON.parse(File.read(entry[:path])).size} benchmarks"

run: gem install parallel

- name: Install Bencher CLI
uses: bencherdev/bencher@main

Supply-chain: unpinned @main action

The main benchmark workflow pins Bencher to a specific version tag. Using @main here means any commit pushed to Bencher's default branch — including a compromised one — runs in this workflow. Pin to the same version tag (or a commit SHA) used in benchmark.yml.

ruby-version: '3.3'

- name: Install dependencies
run: gem install parallel

Minor: unpinned gem version

Suggested change
run: gem install parallel
run: gem install parallel --version '~> 1.26'

Without a version constraint, a future major release with breaking changes could silently break this workflow.

Comment on lines +36 to +39
ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR"

# Count remaining downloadable artifacts
REMAINING=$(ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR" 2>&1 | grep "To download:" | awk '{print $NF}')

Correctness: script runs twice per loop iteration

The download script is invoked once to do the actual work (line 36), then immediately invoked a second time just to parse the remaining count from its output (line 39). That second invocation re-runs all the download logic and may kick off additional parallel downloads.

Capture the output of the first invocation instead:

Suggested change
ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR"
# Count remaining downloadable artifacts
REMAINING=$(ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR" 2>&1 | grep "To download:" | awk '{print $NF}')
OUTPUT=$(ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR" 2>&1)
echo "$OUTPUT"
# Count remaining downloadable artifacts
REMAINING=$(echo "$OUTPUT" | grep "To download:" | awk '{print $NF}')

Comment thread .github/workflows/benchmark.yml Outdated
# Bencher (e.g., the base commit was a docs-only change skipped by
# paths-ignore), retry without --start-point-hash so the report
# still gets stored using the latest available baseline.
if [ $BENCHER_EXIT_CODE -ne 0 ] && grep -q "Head Version" "$BENCHER_STDERR" && grep -q "not found" "$BENCHER_STDERR"; then

Fragile: retry triggers on exact error message text

Both "Head Version" and "not found" are literal strings from Bencher's current error output. If Bencher changes its error wording in a future release, the condition silently stops matching and the fallback never fires — the run just fails with no clear explanation.

Consider one of:

  • Match on the HTTP 404 status code string (e.g. "404") which is less likely to change
  • Add a comment here quoting the exact Bencher error message this is matching, so it's easy to spot when a Bencher upgrade breaks it
  • Assert the Bencher version in the workflow so unexpected upgrades are caught early

alexeyr-ci2 force-pushed the alexeyr/2546/Fix-benchmark-workflow branch from c95250a to ba26083 on April 15, 2026 at 13:58
# max-sample-size limits historical data to the most recent 64 runs
# to keep the baseline relevant as performance evolves.
- name: Install Bencher CLI
uses: bencherdev/bencher@main

Unpinned action version — bencherdev/bencher@main floats to whatever HEAD of that repo is. A supply-chain compromise or accidental breaking change there would silently affect every run. Pin to a specific SHA or release tag, e.g.:

Suggested change
uses: bencherdev/bencher@main
uses: bencherdev/bencher@v0.5.3

(replace with whichever release is current)

FALLBACK_BODY="${FALLBACK_BODY}"$'\n'$'\n'"View the full report in the [job summary](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }})."
gh pr comment ${{ github.event.pull_request.number }} --body "$FALLBACK_BODY" || true
fi
REMAINING=$(ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR" 2>&1 | grep "To download:" | awk '{print $NF}')

Double invocation per loop iteration — the download script is called once on line 122 (doing real work) and then called a second time on this line purely to read the count. That second call still hits caches, re-checks file existence, and prints noise. Capture the output of the first call instead:

Suggested change
REMAINING=$(ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR" 2>&1 | grep "To download:" | awk '{print $NF}')
REMAINING=$(ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR" 2>&1 | tee /dev/stderr | grep "To download:" | awk '{print $NF}')

Or restructure to parse stdout from the single invocation, then break if REMAINING == 0.
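One possible restructuring, sketched with a stand-in command. The drain_downloads function and the command it wraps are hypothetical; in the real workflow the wrapped command would be the ruby download invocation:

```shell
#!/bin/sh
# Runs the download command once per iteration, echoes its output, and
# parses the remaining count from that same output; stops at zero.
# "$@" is the download command (a stand-in for the real ruby call).
drain_downloads() {
  while :; do
    OUTPUT=$("$@" 2>&1)
    printf '%s\n' "$OUTPUT"
    REMAINING=$(printf '%s\n' "$OUTPUT" | grep "To download:" | awk '{print $NF}')
    # Treat a missing count as done rather than looping forever.
    [ "${REMAINING:-0}" -gt 0 ] || break
  done
}
```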

alexeyr-ci2 changed the title from "Fix Bencher reporting permanently broken on master pushes" to "Fix Bencher reporting permanently broken on main pushes" on Apr 15, 2026
"--branch", "main",
"--hash", sha,
"--testbed", TESTBED,
"--adapter", "json",

Secret leaks into process arguments — --token token_value is visible in ps aux output and GitHub Actions log echoing. bencher reads BENCHER_API_TOKEN automatically from the environment, so drop the --token argument entirely and let the env var already set in the workflow step do the work:

Suggested change
"--adapter", "json",
def run_bencher(sha:, file:)
_output, status = Open3.capture2(
"bencher", "run",
"--project", PROJECT,
"--branch", "main",
"--hash", sha,
"--testbed", TESTBED,
"--adapter", "json",
"--file", file,
"--err"
)
status.success?
end

Update call sites accordingly: run_bencher(sha: sha, file: entry[:path]).

abort "Error: #{cmd} not found in PATH"
end

def fetch_bencher_hashes(token: nil)

No pagination — silently misses reports beyond 250. per_page=250 fetches a single page. If Bencher returns more than 250 reports for main, commits beyond that limit won't appear in existing and will be needlessly re-submitted. Add a loop over pages:

def fetch_bencher_hashes(token: nil)
  hashes = []
  page = 1
  loop do
    uri = URI("#{BENCHER_API}/projects/#{PROJECT}/reports?branch=main&per_page=250&page=#{page}&direction=asc")
    req = Net::HTTP::Get.new(uri)
    req["Authorization"] = "Bearer #{token}" if token

    resp = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
    break unless resp.is_a?(Net::HTTPSuccess)

    batch = JSON.parse(resp.body)
    break if batch.empty?

    hashes.concat(batch.filter_map { |r| r.dig("branch", "head", "version", "hash") })
    page += 1
  end
  hashes.uniq
rescue StandardError => e
  warn "  Warning: could not fetch Bencher hashes: #{e.message}"
  []
end

end

def ensure_cmd!(cmd)
return if system("command -v #{cmd} > /dev/null 2>&1")

Prefer Kernel#system with an array to avoid any potential shell-injection if cmd ever becomes dynamic in the future:

Suggested change
return if system("command -v #{cmd} > /dev/null 2>&1")
return if system("command", "-v", cmd, out: File::NULL, err: File::NULL)


# ── Helpers ──────────────────────────────────────────────────────────────

def load_cached(work_dir, filename)

Stale cache in the push phase — load_cached is used in cmd_download and saves bencher_hashes.json at download time. But cmd_push calls fetch_bencher_hashes directly (fresh, line 125), which is correct. The risk is if someone runs download first, then push, after several reports have been newly added to Bencher — the stale cache in download won't reflect them. Consider adding a cache TTL (e.g., skip the cache if the file is more than an hour old) or documenting that re-running download with --force invalidates the cache.

alexeyr-ci2 force-pushed the alexeyr/2546/Fix-benchmark-workflow branch from ba26083 to c49faa6 on April 15, 2026 at 14:02

claude bot commented Apr 15, 2026

Code Review

Overview

This PR adds a one-shot recovery workflow and script to backfill benchmark data that went missing in Bencher from ~Jan 2026 onwards due to the broken --start-point-hash argument. The root cause fix is sound (the hash pointed at commits Bencher had never seen, producing a cascading 404). The recovery mechanism is well-structured: a two-phase download-then-push approach with caching and parallel downloads.

Important structural question: The PR replaces the entire benchmark job in benchmark.yml with only the recover-bencher job. As written, no new benchmarks will be collected -- every push to main and every labeled PR will run the recovery script instead of actually benchmarking. All the env vars (RUN_CORE, RUN_PRO, RUBY_VERSION, etc.) are still defined in the file but are now unused, which suggests the benchmark job may have been accidentally omitted. If the recovery is meant to run only once, the workflow triggers should be reduced to workflow_dispatch only rather than firing on every push and labeled PR.


Issues

Security

  • run_bencher passes BENCHER_API_TOKEN as a --token CLI argument, visible in ps aux and CI logs. Fix: drop the arg and let bencher read BENCHER_API_TOKEN from the environment directly (it supports this natively). See inline comment.
  • bencherdev/bencher@main is an unpinned action -- a broken or compromised upstream would affect every run. Pin to a release tag or SHA.

Correctness

  • fetch_bencher_hashes fetches only one page of 250 results with no pagination. If Bencher already has more than 250 reports for main, surplus commits will not appear in existing and will be needlessly re-submitted. See inline comment.
  • The download retry loop calls the script twice per iteration (once to download, once to read the remaining count). The second invocation is redundant. See inline comment.

Minor

  • load_cached will raise on a corrupted JSON cache file rather than falling back to a fresh fetch. A rescue JSON::ParserError around the read path would make it more robust.
  • ensure_cmd! uses string interpolation in system(). The array form is safer. See inline comment.
  • gem install parallel in the workflow step has no version pin -- consider gem install parallel -v X.Y.Z for reproducibility.

Positives

  • The two-phase (download/push) design with idempotent file-based state is clean and retry-friendly.
  • Parallel downloads, with a sensible thread count (4), are a good touch.
  • sort_by_commit_date gracefully falls back if git log is not available.
  • --dry-run support makes local testing safe.
  • artifact_expired? correctly handles the case where GitHub has already purged the artifact.

alexeyr-ci2 force-pushed the alexeyr/2546/Fix-benchmark-workflow branch from c49faa6 to 9a1e275 on April 15, 2026 at 14:04
Comment on lines +277 to +284
end

def run_bencher(token:, sha:, file:)
# No --err: alerts are expected during backfill and should not block submission
_output, status = Open3.capture2(
"bencher", "run",
"--project", PROJECT,
"--token", token,

API token passed as a CLI argument — visible in process listings

Passing token via --token makes it a process argument, which is visible in ps aux / /proc/$PID/cmdline to any user on the same host. In a shared CI runner environment this is a real exposure risk.

bencher run already reads BENCHER_API_TOKEN from the environment automatically. Since the calling workflow already sets that env var, the --token arg can simply be dropped:

Suggested change
end
def run_bencher(token:, sha:, file:)
# No --err: alerts are expected during backfill and should not block submission
_output, status = Open3.capture2(
"bencher", "run",
"--project", PROJECT,
"--token", token,
def run_bencher(token:, sha:, file:)
_output, status = Open3.capture2(
{ "BENCHER_API_TOKEN" => token },
"bencher", "run",
"--project", PROJECT,
"--branch", "main",
"--hash", sha,
"--testbed", TESTBED,
"--adapter", "json",
"--file", file,
"--err"
)
status.success?
end

(Open3.capture2 accepts an env hash as the first argument — this passes the token via the child process's environment rather than its argument list.)


claude bot commented Apr 15, 2026

Code Review

This PR fixes a real bug (cascading Bencher 404s from missing --start-point-hash baseline commits), and the recovery script is well-structured overall. A few issues need attention before this is ready to merge.


Critical: Benchmark job is entirely removed

The PR replaces the full benchmark job with only the recover-bencher job. After merge:

  • No new benchmarks will be collected on push-to-main or labeled PRs.
  • Anyone who adds the benchmark label to a PR will trigger the recovery job instead of getting actual benchmark results.
  • The workflow_dispatch inputs (routes, rate, duration, etc.) defined in lines 1–88 become dead config with no job to consume them.

Is this intentional — a temporary recovery-only workflow to be followed by a separate PR restoring the benchmark job? If so, it should be called out explicitly in the PR description and the on: triggers should be narrowed to workflow_dispatch only (to avoid spurious recovery runs on every push and labeled PR). If the benchmark job is meant to be restored in this same PR, it appears to be missing.


Security issues

  1. tmate debugging step must be removed before merge — see inline comment on benchmark.yml:145–149.
  2. API token passed as CLI argument — visible in ps aux on shared runners; see inline comment on scripts/recover-bencher-data.rb:277–284.

Bugs

  1. Double script execution per retry iteration — download runs twice per loop (once to work, once to count remaining); see inline on benchmark.yml:123–125.
  2. fetch_bencher_hashes is not paginated — hard-capped at 250 reports, silently misses older hashes; see inline on scripts/recover-bencher-data.rb:198.
  3. load_cached goes stale — after push adds new hashes to Bencher, subsequent download calls still read the old cached set and re-queue already-pushed artifacts; see inline on scripts/recover-bencher-data.rb:182–188.

Minor notes

  • bencherdev/bencher@main (floating ref) is used for the Bencher CLI install. Pinning to a specific version or commit SHA would make runs reproducible and prevent unexpected breakage from upstream changes.
  • The ruby-version: '3.3' in Setup Ruby is unpinned. The rest of the workflow uses ${{ env.RUBY_VERSION }} (pinned to 3.3.7) — using the env var here would be consistent.
  • sort_by_commit_date calls git log --no-walk on a shallow clone (default fetch-depth: 1). Most of the SHAs being sorted are historical and won't be present in the shallow clone, so the sort will silently fall back to original order on almost every run. Either fetch with sufficient depth or accept that ordering is best-effort and document it.

fi
- name: SSH debug on failure
if: failure()
uses: mxschmitt/action-tmate@v3

Debugging artifact — must be removed before merge.

action-tmate opens an interactive SSH tunnel into the runner on failure. Even with limit-access-to-actor: true, it blocks the job indefinitely (until a 6-hour timeout) waiting for an SSH connection and can expose any secrets mounted in the runner environment. This should never ship in a merged workflow.

Comment on lines +123 to +125
ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR"

if [ -n "$START_POINT_HASH" ]; then
echo "Found merge-base via API: $START_POINT_HASH"
else
echo "⚠️ Could not find merge-base with main via GitHub API, continuing without it"
REMAINING=$(ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR" 2>&1 | grep "To download:" | awk '{print $NF}')

Double invocation per iteration. Line 123 runs the full download pass (which does real work — network calls, file writes), then line 125 runs it again just to parse the "To download" count from stdout. This doubles the work on every retry iteration.

A cleaner approach: capture the output from the single download call and parse the count from it, or add a --count-remaining flag to the script that's separate from the full download.

Suggested change
ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR"
if [ -n "$START_POINT_HASH" ]; then
echo "Found merge-base via API: $START_POINT_HASH"
else
echo "⚠️ Could not find merge-base with main via GitHub API, continuing without it"
REMAINING=$(ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR" 2>&1 | grep "To download:" | awk '{print $NF}')
OUTPUT=$(ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR" 2>&1)
echo "$OUTPUT"
REMAINING=$(echo "$OUTPUT" | grep "To download:" | awk '{print $NF}')

end

def fetch_bencher_hashes(token: nil)
uri = URI("#{BENCHER_API}/projects/#{PROJECT}/reports?branch=main&per_page=250&direction=asc")

No pagination — silently truncates beyond 250 reports.

If the main branch already has more than 250 reports in Bencher, existing will be an incomplete set and the download phase will try to re-fetch data that is already in Bencher, wasting artifact quota and potentially creating duplicate submissions.

Consider adding a pagination loop:

def fetch_bencher_hashes(token: nil)
  hashes = []
  page = 1
  loop do
    uri = URI("#{BENCHER_API}/projects/#{PROJECT}/reports?branch=main&per_page=250&page=#{page}&direction=asc")
    req = Net::HTTP::Get.new(uri)
    req["Authorization"] = "Bearer #{token}" if token

    resp = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
    break unless resp.is_a?(Net::HTTPSuccess)

    page_reports = JSON.parse(resp.body)
    break if page_reports.empty?

    hashes.concat(page_reports.filter_map { |r| r.dig("branch", "head", "version", "hash") })
    page += 1
  end
  hashes.uniq
rescue StandardError => e
  warn "  Warning: could not fetch Bencher hashes: #{e.message}"
  []
end

_output, status = Open3.capture2(
"bencher", "run",
"--project", PROJECT,
"--token", token,

Token passed as a CLI argument leaks into the process table.

--token <value> is visible to other processes on the system via ps aux for the duration of the bencher run call. The Bencher CLI also accepts the token via the BENCHER_API_TOKEN environment variable, which is the safer approach:

Suggested change
"--token", token,
_output, status = Open3.capture2(
{ "BENCHER_API_TOKEN" => token },
"bencher", "run",
"--project", PROJECT,
"--branch", "main",
"--hash", sha,
"--testbed", TESTBED,
"--adapter", "json",
"--file", file
)

end

def ensure_cmd!(cmd)
return if system("command -v #{cmd} > /dev/null 2>&1")

Minor: cmd is always called with a literal string ("gh" or "bencher"), so the string interpolation is harmless today, but using system with a shell-interpolated string is a code smell that could become a real injection vector if the call site changes. Prefer the multi-arg form (note that command is a shell builtin, so the no-shell form needs an external binary such as which):

Suggested change
return if system("command -v #{cmd} > /dev/null 2>&1")
def ensure_cmd!(cmd)
  return if system("which", cmd, out: File::NULL, err: File::NULL)
  abort "Error: #{cmd} not found in PATH"
end


def cmd_download(work_dir)
ensure_cmd!("gh")
artifacts_dir = File.join(work_dir, "artifacts")

Stale cache risk in the download phase.

bencher_hashes.json is written once and never invalidated. If download is run multiple times (as the retry loop in the workflow does), the cached hash list grows stale — pushes that succeed between runs won't be reflected, and already-uploaded commits will be re-queued for download.

The push phase correctly bypasses the cache and calls fetch_bencher_hashes fresh each time. Consider doing the same in download, or at minimum deleting the cache file at the start of each download run so it is always refreshed:

existing = Set.new(fetch_bencher_hashes)
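A minimal sketch of the refresh-on-entry approach, assuming the cache file is named bencher_hashes.json; fetch_bencher_hashes here is a stand-in stub, not the script's real implementation:

```ruby
require "json"
require "set"
require "tmpdir"

# Stand-in for the real Bencher API call (assumption for this sketch)
def fetch_bencher_hashes
  %w[abc123 def456]
end

# Delete any stale cache at the start of each download run so the hash
# list is always refreshed, then re-cache the fresh result.
def fresh_existing_hashes(work_dir)
  cache_file = File.join(work_dir, "bencher_hashes.json")
  File.delete(cache_file) if File.exist?(cache_file)
  hashes = fetch_bencher_hashes
  File.write(cache_file, JSON.generate(hashes))
  Set.new(hashes)
end

Dir.mktmpdir do |dir|
  # Simulate a stale cache left over from a previous partial run
  File.write(File.join(dir, "bencher_hashes.json"), JSON.generate(["stale"]))
  hashes = fresh_existing_hashes(dir)
  puts hashes.include?("stale") # false: the stale entry was discarded
end
```

This keeps the cache as a debugging artifact on disk while guaranteeing each run starts from the live Bencher state.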

@claude

claude bot commented Apr 15, 2026

Code Review

Overview

This draft PR has two parts:

  1. A one-time recovery script (scripts/recover-bencher-data.rb) to backfill the ~3 months of missing Bencher data caused by the broken --start-point-hash flow.
  2. A stripped-down benchmark.yml that replaces the entire benchmark job with just the recovery job.

The script itself is well-structured and the root-cause analysis in the PR description is sound. However, several issues need to be addressed before this can merge, and the PR description promises a fix (not passing --start-point* for main pushes, retrying without the hash for PRs) that does not appear to exist in the diff.


Critical Issues

1. The actual benchmark job is gone — no benchmarks run in this state.
The PR removes the entire benchmark: job and replaces it with recover-bencher:. This is presumably intentional (run recovery first, then restore benchmarks), but it means no performance data will be collected for any push or PR while this is merged. The PR description should clarify whether this is meant to be a temporary state or if the fixed benchmark job is coming in a follow-up.

2. mxschmitt/action-tmate@v3 is left in the workflow (see inline comment on line 147).
This is a debugging artifact that blocks the job for up to 6 hours on failure and exposes runner secrets. It must be removed before merge.

3. The described fix is missing.
The summary says "Fix: don't pass --start-point* args for master pushes" and "For PRs/dispatch where the start-point hash may be missing, retry without --start-point-hash". Neither change appears in the diff. If the plan is to land the recovery script first and the actual benchmark.yml fix in a follow-up, that should be stated explicitly.


Workflow Issues

4. Recovery job runs on the same triggers as the benchmark workflow.
The on: triggers (push to main, benchmark-labelled PRs, workflow_dispatch) are preserved. A recovery job that hits the GitHub artifact API hundreds of times and re-uploads to Bencher should run exactly once via workflow_dispatch, not on every push or benchmark-labelled PR. Consider restricting to workflow_dispatch only, or guarding with a dedicated label.

5. Double invocation per retry iteration (see inline comment on lines 123–125).
Each loop iteration runs recover-bencher-data.rb download twice — once for the actual work and once to parse the remaining count. The suggestion is to capture stdout from the single run.


Script Issues

6. fetch_bencher_hashes paginates only the first 250 reports (see inline comment on line 201).
If there are already more than 250 reports in Bencher (there may be from before January), the existing set will be incomplete and the script will attempt to re-download/re-push data that is already there.

7. Token passed as a CLI argument (see inline comment on line 284).
--token <value> is visible in ps aux for the lifetime of the subprocess. Use the BENCHER_API_TOKEN env var instead, which Open3.capture2 supports as a first argument hash.

8. Stale cache in the download phase (see inline comment on line 64).
bencher_hashes.json is written once and never refreshed. When the retry loop re-runs download, the cached hash list is already stale. Either skip the cache for this particular file or delete it at the start of each run.


Minor

  • rubocop:disable Metrics at the top of the file disables all Metrics cops globally. More targeted per-method disables (# rubocop:disable Metrics/MethodLength) would be less blunt.
  • fetch_successful_runs can produce duplicate SHAs (multiple successful runs for the same commit) — downstream code handles it gracefully, but deduplicating early (runs.uniq { |r| r[:sha] }) would make intent clearer.
  • The hardcoded ruby-version: '3.3' in the workflow step (line 104) diverges from the RUBY_VERSION env var used elsewhere in the file. Consider using the same env var for consistency.
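The early de-duplication suggested above can be sketched as follows (the run-hash shape is assumed for illustration, not taken from the script):

```ruby
# Multiple successful workflow runs may exist for the same commit SHA;
# uniq with a block keeps the first occurrence of each key.
runs = [
  { sha: "aaa", run_id: 1 },
  { sha: "aaa", run_id: 2 }, # re-run of the same commit
  { sha: "bbb", run_id: 3 },
]
unique_runs = runs.uniq { |r| r[:sha] }
puts unique_runs.map { |r| r[:sha] }.inspect # => ["aaa", "bbb"]
```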

@alexeyr-ci2 alexeyr-ci2 force-pushed the alexeyr/2546/Fix-benchmark-workflow branch from 74b2755 to ab43744 Compare April 15, 2026 14:26
Comment thread scripts/recover-bencher-data.rb Outdated
t && t >= "2026-04-15T14:04:00" && t <= "2026-04-15T14:06:00"
end

puts " #{reports.size} total reports, #{bad_reports.size} from today (#{today})"

Bug: NameError: today is undefined

today is referenced here but never defined anywhere in the script. This will raise a NameError whenever cmd_delete is called with matching reports.

Suggested change
puts " #{reports.size} total reports, #{bad_reports.size} from today (#{today})"
puts " #{reports.size} total reports, #{bad_reports.size} in the target window (2026-04-15T14:04–14:06 UTC)"


def ensure_cmd!(cmd)
return if system("command -v #{cmd} > /dev/null 2>&1")


Security: API token exposed as a CLI argument

Passing --token as a command-line argument makes the token visible in process listings (ps aux) and in the bencher CLI's own output/error messages. Prefer the BENCHER_API_TOKEN environment variable, which bencher already supports:

Suggested change
_output, status = Open3.capture2({"BENCHER_API_TOKEN" => token}, *args)

And remove "--token", token, from args.

end

def fetch_bencher_hashes(token: nil)
uri = URI("#{BENCHER_API}/projects/#{PROJECT}/reports?branch=main&per_page=250&direction=asc")

Missing pagination — silently truncates at 250 reports

If main has more than 250 benchmark reports in Bencher (plausible over time), this will miss older ones and treat them as "not in Bencher," potentially causing duplicate submissions. cmd_delete (line 201) has the same issue.

Consider looping with an offset / page parameter until the API returns fewer results than per_page, or using --paginate via the gh CLI instead of a direct Net::HTTP call.

for attempt in 1 2 3 4 5; do
echo ""
echo "=== Download attempt $attempt ==="
ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR"

Double invocation of the download script per retry loop iteration

Lines 129 and 131 both invoke the download script. Line 129 actually downloads artifacts; line 131 re-runs the same script a second time just to parse "To download:" count from its output. This means each iteration performs two full download passes (including API calls and parallel downloads), and the count on line 131 reflects the state after a second download attempt, not after the one on line 129.

Consider having the script emit a machine-readable exit code or a dedicated count command so you don't need to run it twice:

ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR"
REMAINING=$(ruby scripts/recover-bencher-data.rb count --work-dir="$WORK_DIR")

Or capture the output of the first invocation and parse it.

Comment on lines +151 to +158
- name: SSH debug on failure
if: failure()
uses: mxschmitt/action-tmate@v3
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
LABEL="performance-regression"
COMMIT_SHA="${{ github.sha }}"
COMMIT_SHORT="${COMMIT_SHA:0:7}"
COMMIT_URL="${{ github.server_url }}/${{ github.repository }}/commit/${COMMIT_SHA}"
RUN_URL="${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
BENCHER_URL="https://bencher.dev/perf/react-on-rails-t8a9ncxo"
ACTOR="${{ github.actor }}"

# Ensure the label exists (idempotent)
gh label create "$LABEL" \
--description "Automated: benchmark regression detected on main" \
--color "D93F0B" \
--force 2>/dev/null || true

# Check for an existing open issue to avoid duplicates
EXISTING_ISSUE=$(gh issue list \
--label "$LABEL" \
--state open \
--limit 1 \
--json number \
--jq '.[0].number // empty')

# Build the benchmark summary snippet (defensive: don't let column failure block alerting)
SUMMARY=""
if [ -f bench_results/summary.txt ]; then
FORMATTED=$(column -t -s $'\t' "bench_results/summary.txt" 2>/dev/null) || FORMATTED=$(cat "bench_results/summary.txt")
SUMMARY=$(printf '\n### Benchmark Summary\n\n```\n%s\n```' "$FORMATTED")
fi

if [ -n "$EXISTING_ISSUE" ]; then
echo "Open regression issue already exists: #${EXISTING_ISSUE} — adding comment"

if ! gh issue comment "$EXISTING_ISSUE" --body "$(cat <<EOF
## New regression detected

**Commit:** [\`${COMMIT_SHORT}\`](${COMMIT_URL}) by @${ACTOR}
**Workflow run:** [Run #${{ github.run_number }}](${RUN_URL})
${SUMMARY}

> View the full Bencher report in the [workflow run summary](${RUN_URL}) or on the [Bencher dashboard](${BENCHER_URL}).
EOF
)"; then
echo "::warning::Failed to comment on regression issue #${EXISTING_ISSUE}"
fi
else
echo "No open regression issue found — creating one"

if ! gh issue create \
--title "Performance Regression Detected on main (${COMMIT_SHORT})" \
--label "$LABEL" \
--body "$(cat <<EOF
## Performance Regression Detected on main

A statistically significant performance regression was detected by
[Bencher](${BENCHER_URL}) using a Student's t-test (95% confidence
interval, up to 64 sample history).

| Detail | Value |
|--------|-------|
| **Commit** | [\`${COMMIT_SHORT}\`](${COMMIT_URL}) |
| **Pushed by** | @${ACTOR} |
| **Workflow run** | [Run #${{ github.run_number }}](${RUN_URL}) |
| **Bencher dashboard** | [View history](${BENCHER_URL}) |
${SUMMARY}

### What to do

1. Check the [workflow run](${RUN_URL}) for the full Bencher HTML report
2. Review the [Bencher dashboard](${BENCHER_URL}) to see which metrics regressed
3. Investigate the commit — expected trade-off or unintended regression?
4. If unintended, open a fix PR and reference this issue
5. Close this issue once resolved — subsequent regressions will open a new one

---
*This issue was created automatically by the benchmark CI workflow.*
EOF
)"; then
echo "::warning::Failed to create regression issue — check GitHub API permissions"
fi
fi

# ============================================
# STEP 7c: FAIL WORKFLOW ON MAIN REGRESSION
# ============================================
# Only fail on main — PR benchmarks are informational (triggered by 'benchmark' label).
# Regressions on PRs are surfaced via Bencher report comments, not workflow failures.
- name: Fail workflow if Bencher detected regression on main
if: github.event_name == 'push' && github.ref == 'refs/heads/main' && env.BENCHER_EXIT_CODE != '0'
run: |
echo "Bencher detected a regression (exit code: ${BENCHER_EXIT_CODE:-1})"
exit "${BENCHER_EXIT_CODE:-1}"

# ============================================
# STEP 8: WORKFLOW COMPLETION
# ============================================
- name: Workflow summary
if: always()
run: |
echo "📋 Benchmark Workflow Summary"
echo "===================================="
echo "Status: ${{ job.status }}"
echo "Run number: ${{ github.run_number }}"
echo "Triggered by: ${{ github.actor }}"
echo "Branch: ${{ github.ref_name }}"
echo "Run Core: ${{ env.RUN_CORE || 'false' }}"
echo "Run Pro Rails: ${{ env.RUN_PRO_RAILS || 'false' }}"
echo "Run Pro Node Renderer: ${{ env.RUN_PRO_NODE_RENDERER || 'false' }}"
echo ""
if [ "${{ job.status }}" == "success" ]; then
echo "✅ All steps completed successfully"
else
echo "❌ Workflow encountered errors - check logs above"
fi
BENCHER_API_TOKEN: ${{ secrets.BENCHER_API_TOKEN }}
with:
limit-access-to-actor: true

Security: tmate debug step should be removed before merge

This opens an interactive SSH tunnel on failure, giving shell access to the runner — which has BENCHER_API_TOKEN and GH_TOKEN in its environment. Even with limit-access-to-actor: true, this is a significant attack surface: the actor's GitHub account becoming compromised would directly expose both secrets.

Also, GH_TOKEN and BENCHER_API_TOKEN don't need to be env vars on this step at all (tmate doesn't use them), so they'd be unnecessarily exposed to the tmate session.

This is acceptable for local debugging during development of the script, but should not be merged into a production workflow.

data
end

def ensure_cmd!(cmd)

Minor: command injection risk in ensure_cmd!

system("command -v #{cmd} > /dev/null 2>&1")

cmd is always a hardcoded literal at the two call sites ("gh", "bencher"), so this is currently safe. But the pattern is fragile — if a future caller passes user-controlled input, this becomes a shell injection vector. Prefer the array form to avoid shell interpretation entirely:

Suggested change
def ensure_cmd!(cmd)
def ensure_cmd!(cmd)
  # "command" is a shell builtin; the no-shell multi-arg form needs an external binary like which
  return if system("which", cmd, out: File::NULL, err: File::NULL)
  abort "Error: #{cmd} not found in PATH"
end

@claude

claude bot commented Apr 15, 2026

Code Review

Overview

This PR has two parts:

  1. scripts/recover-bencher-data.rb - A new Ruby script to backfill ~3 months of missing Bencher data by downloading GitHub Actions artifacts and re-submitting them.
  2. .github/workflows/benchmark.yml - The entire benchmark workflow is replaced with a single one-time recover-bencher job that runs this script.

Critical: The benchmark workflow is fully disabled by this PR

The actual benchmarking logic (detect-changes, server startup, Vegeta/k6 runs, Bencher reporting, regression alerting, PR comments) is entirely removed and replaced with a data-recovery job. After merge, no benchmarks will run on push-to-main, PRs, or workflow_dispatch.

I assume the intent is either:

  • This is a temporary branch to run recovery once, then restore the original workflow; or
  • The original workflow will be restored in a follow-up PR with the --start-point* fix applied.

Either way, the PR description does not call this out, and the checklist items are all unchecked. Please clarify the intent before merge.


Bugs

  • NameError: today is undefined in cmd_delete (line 214 of the script). See inline comment.
  • Double download per retry iteration in the workflow retry loop (lines 129-131 of benchmark.yml). The script is invoked twice per loop iteration - once to actually download, then again just to parse the count. See inline comment.

Security

  • tmate SSH debug step opens a shell to the runner on failure, with BENCHER_API_TOKEN and GH_TOKEN in the environment. Must be removed before merge. See inline comment.
  • API token passed as --token CLI argument in run_bencher makes the token visible in ps aux. Pass it via BENCHER_API_TOKEN env var instead (bencher supports this natively). See inline comment.

Data integrity

  • fetch_bencher_hashes and cmd_delete do not paginate - both cap at 250 reports. Once the project accumulates more than 250 reports, older ones will be missed and potentially re-submitted as duplicates.
  • The cmd_delete timestamp window is hardcoded to a two-minute window on 2026-04-15. Fine for the one-time use case, but it will silently do nothing if the workflow is re-run after those reports are already deleted.
  • load_cached has no TTL - a stale cache from a previous partial run will be reused indefinitely. The bencher_hashes.json cache is risky: if download runs, then more reports are pushed to Bencher externally, then push runs with the stale cache, already-pushed reports could be re-submitted.
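One way to bound that staleness is a TTL check in the cache loader; a sketch under the assumption that load_cached takes a path and returns parsed JSON (the 15-minute TTL is illustrative, not from the script):

```ruby
require "json"
require "tmpdir"

CACHE_TTL_SECONDS = 15 * 60 # illustrative TTL, not a value from the script

def load_cached(path, ttl: CACHE_TTL_SECONDS)
  return nil unless File.exist?(path)
  # Treat entries older than the TTL as missing so callers re-fetch them
  return nil if Time.now - File.mtime(path) > ttl
  JSON.parse(File.read(path))
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "bencher_hashes.json")
  File.write(path, '["abc"]')
  puts load_cached(path).inspect # => ["abc"]
  old = Time.now - 3600
  File.utime(old, old, path)     # age the file past the TTL
  puts load_cached(path).inspect # => nil
end
```

A returned nil then signals the caller to re-fetch and rewrite the cache, which addresses the partial-run scenario described above.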

Minor

  • ensure_cmd! uses string interpolation in system() - safe with current hardcoded call sites, but the array form is better practice. See inline comment.
  • The script header comment only lists json/open-uri as requirements but also needs parallel and ruby-progressbar (non-stdlib). The workflow installs them, but the comment should be updated.

@alexeyr-ci2 alexeyr-ci2 force-pushed the alexeyr/2546/Fix-benchmark-workflow branch from ab43744 to 4997869 Compare April 15, 2026 14:53
@alexeyr-ci2 alexeyr-ci2 force-pushed the alexeyr/2546/Fix-benchmark-workflow branch from 4997869 to ee4cf67 Compare April 15, 2026 15:55
Comment on lines +156 to +163
- name: SSH debug on failure
if: failure()
uses: mxschmitt/action-tmate@v3
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
BENCHER_API_TOKEN: ${{ secrets.BENCHER_API_TOKEN }}
with:
limit-access-to-actor: true

Security: Remove tmate debug step before merging

action-tmate opens a live SSH session into the runner on failure. Even with limit-access-to-actor: true, the environment at that point contains both BENCHER_API_TOKEN and GH_TOKEN — any actor with repo access (and commit push rights) would gain shell access with those credentials.

This is fine for a personal debugging session on a draft branch, but must be removed before this job runs in production (i.e., on any push to main).


reports = JSON.parse(resp.body)
# Delete all reports so we can resubmit in correct chronological order
bad_reports = reports

Destructive: bad_reports = reports deletes ALL Bencher history on every run

The comment says "delete today's bad backfill reports" but the code assigns reports (all reports) to bad_reports with no filtering. Coupled with the workflow running on every push to main, this means every push will wipe all Bencher history — the opposite of the recovery goal.

At minimum, filter by creation date so only "today's" bad backfill window is targeted:

# Example: only delete reports created today
today = Time.now.utc.strftime("%Y-%m-%d")
bad_reports = reports.select { |r| r["start_time"]&.start_with?(today) }

Or better, guard the entire recover-bencher job so it only runs once (e.g., via workflow_dispatch only, or a one-time flag).

end

def fetch_bencher_hashes(token: nil)
uri = URI("#{BENCHER_API}/projects/#{PROJECT}/reports?branch=main&per_page=250&direction=asc")

Missing pagination — only the first 250 reports are checked

per_page=250 is the cap but there's no page-walking loop. Once there are >250 Bencher reports the function silently under-reports existing hashes, causing those commits to be submitted again as duplicates.

cmd_delete has the same issue at line 201.

Consider a pagination loop:

def fetch_bencher_hashes(token: nil)
  hashes = []
  page = 1
  loop do
    uri = URI("#{BENCHER_API}/projects/#{PROJECT}/reports?branch=main&per_page=250&page=#{page}&direction=asc")
    # ... (same request setup)
    reports = JSON.parse(resp.body)
    break if reports.empty?
    hashes.concat(reports.filter_map { |r| r.dig("branch", "head", "version", "hash") })
    break if reports.size < 250
    page += 1
  end
  hashes.uniq
end

"bencher", "run",
"--project", PROJECT,
"--token", token,
"--branch", "main",

Token exposed in process list

Passing --token token_value as a positional CLI argument means the secret is visible in ps aux output for the duration of the bencher run invocation. Other processes running on the same GitHub Actions runner worker could read it.

Prefer injecting the token via an environment variable that bencher already supports:

Suggested change
"--token", token,
_output, status = Open3.capture2({"BENCHER_API_TOKEN" => token}, *args)

And remove "--token", token, from the args array.

Comment on lines +129 to +131
ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR"

REMAINING=$(ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR" 2>&1 | grep "To download:" | awk '{print $NF}')

Download script runs twice per retry iteration

Line 129 does the actual download pass, then line 131 runs the script again just to parse the "To download:" count from its output. This doubles API calls and GH artifact download attempts per iteration.

The script should output the remaining count as part of its normal run (or write it to a file), so the second invocation isn't needed. Alternatively, parse the count from the first run's output:

            OUTPUT=$(ruby scripts/recover-bencher-data.rb download --work-dir="$WORK_DIR" 2>&1)
            echo "$OUTPUT"
            REMAINING=$(echo "$OUTPUT" | grep "To download:" | awk '{print $NF}')

Comment on lines +359 to +379
def sort_by_commit_date(entries)
# Try to sort by commit date using git log
sha_to_entry = entries.to_h { |e| [e[:sha], e] }
shas = entries.map { |e| e[:sha] }

# git log can sort multiple commits chronologically
output, status = Open3.capture2(
"git", "log", "--format=%H", "--no-walk", "--date-order", *shas
)

if status.success?
sorted_shas = output.lines.map(&:strip).select { |s| sha_to_entry.key?(s) }
# Reverse because git log shows newest first
sorted_shas.reverse.map { |s| sha_to_entry[s] }
else
# Fall back to original order
entries
end
rescue StandardError
entries
end

Dead code — sort_by_commit_date is never called

This function was presumably superseded by to_push.sort_by! { |e| e[:created] || "" } on line 156. It can be removed to avoid confusion.

# STEP 4: RUN CORE BENCHMARKS
# ============================================
- name: Install Bencher CLI
uses: bencherdev/bencher@main

Pin bencherdev/bencher to a specific version tag

@main is a mutable ref — a breaking change upstream will silently break this job on the next run. Pin to a specific release tag (e.g. bencherdev/bencher@v0.4.x) to get reproducible behaviour.

actions/checkout@v4 is already pinned to a release tag; apply the same pattern here for consistency.
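A pinned variant might look like this (the tag value is a placeholder, not a verified release; check the Bencher releases page for the current tag):

```yaml
- name: Install Bencher CLI
  uses: bencherdev/bencher@v0.4.x # replace with an actual release tag
```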

Comment on lines 92 to +93
jobs:
detect-changes:
permissions:
contents: read
actions: read
runs-on: ubuntu-22.04
outputs:
docs_only: ${{ steps.detect.outputs.docs_only }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 50
persist-credentials: false
- name: Detect relevant changes
id: detect
run: |
BASE_REF="${{ github.event.pull_request.base.sha || github.event.before || 'origin/main' }}"
script/ci-changes-detector "$BASE_REF"
shell: bash
- name: Guard docs-only main pushes
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
uses: ./.github/actions/ensure-main-docs-safety
with:
docs-only: ${{ steps.detect.outputs.docs_only }}
previous-sha: ${{ github.event.before }}

benchmark:
needs: detect-changes
# Run on: push to main, workflow_dispatch, or PRs with 'benchmark' label.
# The 'full-ci' label is intentionally excluded — it controls test workflows,
# not benchmarks. Use the dedicated 'benchmark' label to trigger perf runs on PRs.
# See https://bencher.dev/docs/how-to/github-actions/#pull-requests for the extra pull_request condition
# Skip docs-only pushes to main to avoid wasting CI resources on non-code changes.
if: |
!(
github.event_name == 'push' &&
github.ref == 'refs/heads/main' &&
needs.detect-changes.outputs.docs_only == 'true'
) && (
github.event_name == 'push' ||
github.event_name == 'workflow_dispatch' ||
(
github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository &&
contains(github.event.pull_request.labels.*.name, 'benchmark')
)
)
recover-bencher:

Recovery job has no trigger guard — will run on every push/PR forever

The recover-bencher job runs unconditionally on every push to main, every labeled PR, and every workflow_dispatch. Once the one-time backfill is done, subsequent runs will:

  1. Call delete → wipe all Bencher history (see separate comment on cmd_delete)
  2. Try to download artifacts (most already pushed, nothing to do)
  3. Open a tmate SSH session on any failure

This job should either be:

  • Removed from this workflow once the backfill is complete, or
  • Guarded to only run via workflow_dispatch (if: github.event_name == 'workflow_dispatch'), or
  • Kept but have the delete step removed/guarded

It also needs a no-op path for when there's nothing to recover so subsequent runs don't fail.
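
The second option is a one-line change at the job level; a minimal sketch of the guard, assuming the job should only ever be triggered manually:

```yaml
recover-bencher:
  # One-time backfill: run only when dispatched manually, never on push or PR.
  if: github.event_name == 'workflow_dispatch'
  runs-on: ubuntu-22.04
```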


claude bot commented Apr 15, 2026

Review: Bencher Recovery

Good diagnosis of the root cause — the cascading 404 chain from --start-point-hash pointing at unbenchmarked docs-only commits is well-explained and the recovery approach (download artifacts → delete stale reports → re-push in chronological order) is sound. The Ruby script itself is well-structured for a one-time recovery tool.

That said, there are several issues that need addressing before this is safe to merge.

Critical (must fix)

1. The entire benchmark job was deleted — this PR removes the detect-changes + benchmark jobs and replaces them with only the recovery job. After the backfill, no benchmarks will run on future pushes to main or labeled PRs. Is there a follow-up PR planned to restore a fixed benchmark job? This should be clarified, and ideally the fixed job should be in the same PR.

  2. cmd_delete deletes ALL Bencher reports, not just bad ones — bad_reports = reports with no filter means the delete step wipes the entire main-branch history on every invocation. Since the workflow runs on every push to main, it will repeatedly wipe and re-backfill (see inline comment on line 209 of the script).

3. action-tmate must be removed — opening a live SSH session in a job that holds BENCHER_API_TOKEN and GH_TOKEN is a meaningful credential exposure risk. Fine for local debugging on a feature branch; not safe in a workflow that merges to main (see inline comment on the step).

4. Recovery job has no trigger guard — it fires on every push to main forever, not just once. At minimum gate it to workflow_dispatch only, or remove it from the workflow file once the backfill is complete (see inline comment on the job).
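
On point 2, the delete pass could be narrowed to known-bad reports instead of the full history. A minimal sketch, assuming the script can precompute the set of bad commit hashes and that each report record carries a "hash" field — both are assumptions for illustration, not the actual Bencher report schema:

```ruby
# Hypothetical narrowing of cmd_delete: select only reports whose commit
# hash is in a precomputed known-bad set, instead of `bad_reports = reports`
# (which selects everything and wipes the whole branch history).
def select_bad_reports(reports, bad_hashes)
  reports.select { |report| bad_hashes.include?(report["hash"]) }
end

reports = [
  { "uuid" => "r1", "hash" => "aaa111" },
  { "uuid" => "r2", "hash" => "bbb222" },
  { "uuid" => "r3", "hash" => "ccc333" },
]

bad = select_bad_reports(reports, ["bbb222", "ccc333"])
# Only r2 and r3 are queued for deletion; r1 survives the pass.
```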

Moderate

  5. Token in CLI args exposes secret in process list — --token <value> in run_bencher is visible to ps aux. Pass via env var instead (see inline comment on line 342 of the script).

  6. No pagination for Bencher API — per_page=250 with no page loop means >250 reports will cause missed hashes and duplicate submissions (see inline comment on line 259 of the script).

7. Download retry runs the script twice per iteration — doubles API calls per attempt (see inline comment on the workflow step).
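
Point 6's fix is a standard page loop; a sketch with the HTTP call stubbed out as a block (the real endpoint and parameter names are whatever the script already uses):

```ruby
# Fetch every page of reports instead of a single per_page=250 request.
# The block stands in for the HTTP GET; it receives (page, per_page) and
# returns that page's array of reports.
def fetch_all_reports(per_page: 250)
  all = []
  page = 1
  loop do
    batch = yield(page, per_page)
    all.concat(batch)
    break if batch.size < per_page # short (or empty) page means we're done
    page += 1
  end
  all
end

# Usage with a fake 5-report API served 2 at a time:
pages = [[1, 2], [3, 4], [5]]
result = fetch_all_reports(per_page: 2) { |page, _per| pages[page - 1] || [] }
# result == [1, 2, 3, 4, 5]
```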

Minor

8. sort_by_commit_date is dead code — superseded by the timestamp-based sort but never removed (inline comment on lines 359–379).

9. bencherdev/bencher@main is mutable — pin to a version tag for reproducibility (inline comment on line 110).

10. ruby-version: '3.3' vs env.RUBY_VERSION: '3.3.7' — the recovery job uses a different (floating) Ruby version from the env var defined for the rest of the workflow. Use ${{ env.RUBY_VERSION }} for consistency.


The backfill logic is well thought-out. The blocking concerns are the destructive delete-all behavior running on every push, and the removal of the benchmark job itself. Happy to re-review once those are addressed.
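
For point 5, the Bencher CLI also accepts the token via the BENCHER_API_TOKEN environment variable, so run_bencher can drop --token from argv entirely. A sketch of the shape — the helper name and call sites are illustrative, not the script's actual code:

```ruby
# Build a bencher invocation that passes the API token via the child
# process environment rather than argv (argv is world-readable via ps aux).
def bencher_invocation(token, args)
  [{ "BENCHER_API_TOKEN" => token }, ["bencher", *args]]
end

env, argv = bencher_invocation("s3cr3t", %w[run --project demo])
# Kernel#system accepts an env hash as its first argument:
#   system(env, *argv)
# argv carries no secret, so the process list shows only the clean command.
```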
