Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
162 changes: 88 additions & 74 deletions .github/workflows/benchmark.yml
Original file line number Diff line number Diff line change
Expand Up @@ -553,104 +553,118 @@ jobs:
BOUNDARY=0.95
MAX_SAMPLE=64

# Set branch and start-point based on event type
# Set branch and start-point based on event type.
# Main pushes don't need --start-point because main IS the base
# branch — new reports compare against main's own history.
# Feature branches (PRs, dispatch) use --start-point to inherit
# historical data and thresholds from main.
if [ "${{ github.event_name }}" = "push" ]; then
BRANCH="main"
START_POINT="main"
START_POINT_HASH="${{ github.event.before }}"
EXTRA_ARGS=""
START_POINT_ARGS=""
elif [ "${{ github.event_name }}" = "pull_request" ]; then
BRANCH="$GITHUB_HEAD_REF"
START_POINT="$GITHUB_BASE_REF"
START_POINT_HASH="${{ github.event.pull_request.base.sha }}"
EXTRA_ARGS="--start-point-reset"
START_POINT_ARGS="--start-point $GITHUB_BASE_REF --start-point-hash ${{ github.event.pull_request.base.sha }} --start-point-clone-thresholds --start-point-reset"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The START_POINT_ARGS string for PR events embeds $GITHUB_BASE_REF and ${{ github.event.pull_request.base.sha }} directly into a space-delimited arg string. This is fine for typical branch names, but GITHUB_BASE_REF values like feature/my branch (with a space) would corrupt the arg list when word-split inside run_bencher. The same applies to the workflow_dispatch path with $BRANCH.

In practice, GitHub branch names almost never contain spaces and the SHA is always hex, so this is a very low risk. Just worth documenting.

Suggested change
START_POINT_ARGS="--start-point $GITHUB_BASE_REF --start-point-hash ${{ github.event.pull_request.base.sha }} --start-point-clone-thresholds --start-point-reset"
START_POINT_ARGS="--start-point $GITHUB_BASE_REF --start-point-hash ${{ github.event.pull_request.base.sha }} --start-point-clone-thresholds --start-point-reset"

(No code change needed unless you want to add a comment noting the branch-name assumption.)

elif [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
# Get merge-base from GitHub API (avoids needing deep fetch)
# See: https://stackoverflow.com/a/74710919
BRANCH="${{ github.ref_name }}"
START_POINT="main"
START_POINT_HASH=$(gh api "repos/${{ github.repository }}/compare/main...$BRANCH" --jq '.merge_base_commit.sha' || true)

if [ -n "$START_POINT_HASH" ]; then
echo "Found merge-base via API: $START_POINT_HASH"
if [ "$BRANCH" = "main" ]; then
START_POINT_ARGS=""
else
echo "⚠️ Could not find merge-base with main via GitHub API, continuing without it"
# Get merge-base from GitHub API (avoids needing deep fetch)
# See: https://stackoverflow.com/a/74710919
START_POINT_HASH=$(gh api "repos/${{ github.repository }}/compare/main...$BRANCH" --jq '.merge_base_commit.sha' || true)
if [ -n "$START_POINT_HASH" ]; then
echo "Found merge-base via API: $START_POINT_HASH"
START_POINT_ARGS="--start-point main --start-point-hash $START_POINT_HASH --start-point-clone-thresholds --start-point-reset"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a silent behavior change from the old code: the dispatch case previously had EXTRA_ARGS="" (no --start-point-reset), so manually-dispatched runs on a branch that already existed in Bencher would append to its history rather than resetting it. The new code adds --start-point-reset here (and on line 580 for the no-hash path), so now Bencher will reset the branch to main's state on every manual dispatch.

This is probably the correct behavior (consistent with how PRs work), but a one-line comment would make the intentionality clear:

Suggested change
START_POINT_ARGS="--start-point main --start-point-hash $START_POINT_HASH --start-point-clone-thresholds --start-point-reset"
START_POINT_ARGS="--start-point main --start-point-hash $START_POINT_HASH --start-point-clone-thresholds --start-point-reset" # reset so repeated dispatches don't accumulate stale history

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behavior change from old code: the previous dispatch case used EXTRA_ARGS="", so --start-point-reset was never passed for workflow_dispatch runs. This now resets the Bencher branch state to main on every dispatch run. For a branch benchmarked multiple times via dispatch to build up history, that history gets wiped each run.

Is this intentional? If so, worth a brief comment here explaining why dispatch should match PR behavior.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behavior change from old code: The old dispatch path had EXTRA_ARGS=""--start-point-reset was never passed for dispatch events. Adding it here means every workflow_dispatch run on a non-main branch will reset that branch's Bencher history to main's latest state on each invocation.

If the intent is to always start fresh from main for dispatch runs (consistent with PR behavior), this is correct and intentional. But if someone uses workflow_dispatch to benchmark a long-running branch iteratively, this will wipe their accumulated data each time. Worth a comment either way — e.g.:

Suggested change
START_POINT_ARGS="--start-point main --start-point-hash $START_POINT_HASH --start-point-clone-thresholds --start-point-reset"
START_POINT_ARGS="--start-point main --start-point-hash $START_POINT_HASH --start-point-clone-thresholds --start-point-reset"

(No code change needed if intentional — just add a comment confirming --start-point-reset is deliberate for dispatch.)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behavior change for workflow_dispatch: The old code used EXTRA_ARGS="" for workflow_dispatch — no --start-point-reset. The new code adds --start-point-reset here and on line 580. This aligns workflow_dispatch with PR behavior, which is likely correct, but it isn't mentioned in the PR description. Could you confirm this is intentional?

else
echo "⚠️ Could not find merge-base with main via GitHub API, continuing without hash"
START_POINT_ARGS="--start-point main --start-point-clone-thresholds --start-point-reset"
Comment on lines +577 to +580
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential behavior change from original code: The original workflow_dispatch path had EXTRA_ARGS="", so --start-point-reset was never passed for dispatch events. Both new sub-cases here now include --start-point-reset, which tells Bencher to reset the branch's accumulated history to match main on every run. For a feature branch benchmarked multiple times via dispatch to build up a trending baseline, this would wipe it each time. Was this intentional? If not, --start-point-reset should be removed from these two lines.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's intentional and not worth commenting every flag.

fi
fi
EXTRA_ARGS=""
else
echo "❌ ERROR: Unexpected event type: ${{ github.event_name }}"
exit 1
fi

# Run bencher and capture HTML output (stdout) while letting stderr go to a file
# so we can distinguish missing baselines (404) from actual regression alerts.
# Use set +e to capture exit code without failing immediately.
# Wrap bencher run in a function so we can retry with different
# start-point args if the pinned hash isn't found in Bencher.
run_bencher() {
local sp_args="$1"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The # shellcheck disable=SC2086 on line 592 is correct — word-splitting $sp_args is intentional so the multi-word arg string is split into individual flags. It's worth adding a brief inline note so the next reader understands why it must be unquoted:

Suggested change
local sp_args="$1"
local sp_args="$1"
# $sp_args must be unquoted so the shell word-splits the multi-flag
# string into individual arguments (e.g. "--start-point main --start-point-hash abc123").
# shellcheck disable=SC2086

# Intentional word-splitting: sp_args contains multiple flags
# (e.g. "--start-point main --start-point-hash abc123") that must
# be split into separate argv entries.
# shellcheck disable=SC2086
bencher run \
--project react-on-rails-t8a9ncxo \
--token '${{ secrets.BENCHER_API_TOKEN }}' \
--branch "$BRANCH" \
$sp_args \
--testbed github-actions \
Comment on lines +590 to +601
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: use an array instead of intentional word-splitting. The shellcheck disable=SC2086 escape hatch works for the current inputs (branch names and SHA hashes never contain spaces), but it's fragile for future changes and suppresses a useful lint rule broadly.

read -ra builds a real array and handles the empty-string case (push-to-main where START_POINT_ARGS="") correctly — an empty input produces a zero-element array, so "${sp_args[@]}" expands to nothing:

Suggested change
run_bencher() {
local sp_args="$1"
# Intentional word-splitting: sp_args contains multiple flags
# (e.g. "--start-point main --start-point-hash abc123") that must
# be split into separate argv entries.
# shellcheck disable=SC2086
bencher run \
--project react-on-rails-t8a9ncxo \
--token '${{ secrets.BENCHER_API_TOKEN }}' \
--branch "$BRANCH" \
$sp_args \
--testbed github-actions \
run_bencher() {
local -a sp_args
read -ra sp_args <<< "$1"
bencher run \
--project react-on-rails-t8a9ncxo \
--token '${{ secrets.BENCHER_API_TOKEN }}' \
--branch "$BRANCH" \
"${sp_args[@]}" \

--adapter json \
--file bench_results/benchmark.json \
--err \
--quiet \
--format html \
--threshold-measure rps \
--threshold-test t_test \
--threshold-max-sample-size $MAX_SAMPLE \
--threshold-lower-boundary $BOUNDARY \
--threshold-upper-boundary _ \
--threshold-measure p50_latency \
--threshold-test t_test \
--threshold-max-sample-size $MAX_SAMPLE \
--threshold-lower-boundary _ \
--threshold-upper-boundary $BOUNDARY \
--threshold-measure p90_latency \
--threshold-test t_test \
--threshold-max-sample-size $MAX_SAMPLE \
--threshold-lower-boundary _ \
--threshold-upper-boundary $BOUNDARY \
--threshold-measure p99_latency \
--threshold-test t_test \
--threshold-max-sample-size $MAX_SAMPLE \
--threshold-lower-boundary _ \
--threshold-upper-boundary $BOUNDARY \
--threshold-measure failed_pct \
--threshold-test t_test \
--threshold-max-sample-size $MAX_SAMPLE \
--threshold-lower-boundary _ \
--threshold-upper-boundary $BOUNDARY
}

# Run bencher and capture HTML output (stdout) while letting stderr
# go to a file so we can inspect failure reasons.
BENCHER_STDERR=$(mktemp)
trap 'rm -f "$BENCHER_STDERR"' EXIT
set +e
bencher run \
--project react-on-rails-t8a9ncxo \
--token '${{ secrets.BENCHER_API_TOKEN }}' \
--branch "$BRANCH" \
--start-point "$START_POINT" \
--start-point-hash "$START_POINT_HASH" \
--start-point-clone-thresholds \
--testbed github-actions \
--adapter json \
--file bench_results/benchmark.json \
--err \
--quiet \
--format html \
--threshold-measure rps \
--threshold-test t_test \
--threshold-max-sample-size $MAX_SAMPLE \
--threshold-lower-boundary $BOUNDARY \
--threshold-upper-boundary _ \
--threshold-measure p50_latency \
--threshold-test t_test \
--threshold-max-sample-size $MAX_SAMPLE \
--threshold-lower-boundary _ \
--threshold-upper-boundary $BOUNDARY \
--threshold-measure p90_latency \
--threshold-test t_test \
--threshold-max-sample-size $MAX_SAMPLE \
--threshold-lower-boundary _ \
--threshold-upper-boundary $BOUNDARY \
--threshold-measure p99_latency \
--threshold-test t_test \
--threshold-max-sample-size $MAX_SAMPLE \
--threshold-lower-boundary _ \
--threshold-upper-boundary $BOUNDARY \
--threshold-measure failed_pct \
--threshold-test t_test \
--threshold-max-sample-size $MAX_SAMPLE \
--threshold-lower-boundary _ \
--threshold-upper-boundary $BOUNDARY \
$EXTRA_ARGS > bench_results/bencher_report.html 2>"$BENCHER_STDERR"
run_bencher "$START_POINT_ARGS" > bench_results/bencher_report.html 2>"$BENCHER_STDERR"
BENCHER_EXIT_CODE=$?
set -e

# Print stderr for visibility in logs
cat "$BENCHER_STDERR" >&2

# If bencher failed due to missing baseline data (404 Not Found) and there
# are no regression alerts, treat as a warning instead of failing the workflow.
# This commonly happens when the PR base commit was a docs-only change
# skipped by paths-ignore, so no benchmark data exists in Bencher.
# If bencher failed because the start-point hash doesn't exist in
# Bencher (e.g., the base commit was a docs-only change skipped by
# paths-ignore), retry without --start-point-hash so the report
# still gets stored using the latest available baseline.
#
# Safety checks before overriding exit code:
# 1. stderr must contain "404 Not Found" (HTTP status from Bencher API)
# 2. stderr must NOT contain regression indicators ("alert", "threshold",
# or "boundary") to avoid suppressing actual performance regressions
if [ $BENCHER_EXIT_CODE -ne 0 ] && grep -q "404 Not Found" "$BENCHER_STDERR" && ! grep -qiE "alert|threshold violation|boundary violation" "$BENCHER_STDERR"; then
echo "⚠️ Bencher baseline not found for start-point hash '$START_POINT_HASH' — this is expected when the base commit was not benchmarked (e.g., docs-only changes skipped by paths-ignore)"
echo "⚠️ Benchmark data was collected but regression comparison is unavailable for this run"
echo "📋 Bencher stderr output:"
cat "$BENCHER_STDERR"
echo "::warning::Bencher baseline not found for start-point hash '$START_POINT_HASH' — regression comparison unavailable for this run"
BENCHER_EXIT_CODE=0
# Bencher emits: 'Head Version (..., Some(GitHash("<sha>"))) not found'
# when the pinned hash isn't in its DB. A single combined pattern
# avoids false-positive matches across unrelated lines.
if [ $BENCHER_EXIT_CODE -ne 0 ] && grep -q "Head Version.*not found" "$BENCHER_STDERR" && ! grep -qiE "alert|threshold violation|boundary violation" "$BENCHER_STDERR"; then
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says two separate greps make format changes "surface clearly", but the real risk is the opposite: two independent greps can match on different lines, creating a false-positive retry. A combined pattern is both safer and still surfaces format changes:

Suggested change
if [ $BENCHER_EXIT_CODE -ne 0 ] && grep -q "Head Version.*not found" "$BENCHER_STDERR" && ! grep -qiE "alert|threshold violation|boundary violation" "$BENCHER_STDERR"; then
if [ $BENCHER_EXIT_CODE -ne 0 ] && grep -qE "Head Version.*not found" "$BENCHER_STDERR" && ! grep -qiE "alert|threshold violation|boundary violation" "$BENCHER_STDERR"; then

If Bencher's error format ever changes, the combined pattern still stops matching — you don't lose the diagnostic value, and you eliminate the false-positive risk.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current version already has a single grep to prevent matching on different lines.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fragile error-message coupling: The retry is gated on "Head Version.*not found", which is Bencher's current internal error string. If Bencher changes this message in a future release, the retry will silently stop working and the old cascading-404 failure will return — without any obvious signal that it regressed.

Suggestion: add a comment here with a link to the Bencher CLI source or changelog entry that documents this error format, so it's easy to re-check after a Bencher version bump.

RETRY_ARGS=$(echo "$START_POINT_ARGS" | sed 's/--start-point-hash [^ ]*//')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sed approach works today, but if Bencher ever renames --start-point-hash the substitution silently no-ops, $RETRY_ARGS == $START_POINT_ARGS, and the retry is quietly skipped with no warning. The comment on line 652–653 already notes the concern about format changes surfacing clearly — worth applying the same logic here.

A small improvement: echo + sed can be replaced with bash parameter expansion which is both faster and avoids a subshell:

Suggested change
RETRY_ARGS=$(echo "$START_POINT_ARGS" | sed 's/--start-point-hash [^ ]*//')
RETRY_ARGS="${START_POINT_ARGS/--start-point-hash [^ ]*/}"

[^ ]* isn't a POSIX extended regex here but rather a glob/pattern, so use ${var/pattern/replacement} — though note bash glob patterns don't support [^ ]. In practice the simpler approach is fine:

RETRY_ARGS=$(echo "$START_POINT_ARGS" | sed 's/ --start-point-hash [^ ]*//')

The leading space before --start-point-hash avoids leaving a double-space artefact in the retry args, which is cosmetically cleaner (no functional impact since the shell collapses whitespace during word-splitting).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When --start-point-hash <sha> appears in the middle of $START_POINT_ARGS, the substitution leaves a double-space (e.g. --start-point main --start-point-clone-thresholds). Not a functional bug — shell word-splitting collapses it — but a trailing space in the pattern is cleaner:

Suggested change
RETRY_ARGS=$(echo "$START_POINT_ARGS" | sed 's/--start-point-hash [^ ]*//')
RETRY_ARGS=$(echo "$START_POINT_ARGS" | sed 's/--start-point-hash [^ ]* //')

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's intentional and already replied to.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sed strip leaves a double-space when --start-point-hash sits between two other flags:

"--start-point main --start-point-hash abc123 --start-point-clone-thresholds"
                                             ^ sed leaves two spaces here

This is harmless because the unquoted $sp_args expansion inside run_bencher collapses consecutive whitespace via normal word-splitting. Just noting it so future readers don't think there's a bug.

A slightly cleaner alternative if you ever want it:

Suggested change
RETRY_ARGS=$(echo "$START_POINT_ARGS" | sed 's/--start-point-hash [^ ]*//')
RETRY_ARGS=$(echo "$START_POINT_ARGS" | sed 's/--start-point-hash [^ ]*[[:space:]]*//')

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: double space after removal. s/--start-point-hash [^ ]*/ removes the flag+value but leaves a double space in the result (e.g. "--start-point main --start-point-clone-thresholds ..."). Functionally harmless — word-splitting eats it and the RETRY_ARGS != START_POINT_ARGS guard still fires — but including the trailing space in the pattern keeps things tidy:

Suggested change
RETRY_ARGS=$(echo "$START_POINT_ARGS" | sed 's/--start-point-hash [^ ]*//')
RETRY_ARGS=$(echo "$START_POINT_ARGS" | sed 's/--start-point-hash [^ ]* //')

if [ "$RETRY_ARGS" != "$START_POINT_ARGS" ]; then
echo ""
echo "⚠️ Start-point hash not found in Bencher — retrying without --start-point-hash (will use latest baseline)"
echo "::warning::Start-point hash not found in Bencher — falling back to latest baseline for comparison"
set +e
run_bencher "$RETRY_ARGS" > bench_results/bencher_report.html 2>"$BENCHER_STDERR"
BENCHER_EXIT_CODE=$?
set -e
cat "$BENCHER_STDERR" >&2
fi
fi
rm -f "$BENCHER_STDERR"
# $BENCHER_STDERR is cleaned up by the EXIT trap

# Export exit code early so downstream alerting steps (7b/7c) always see it,
# even if the post-processing below (step summary, PR comments) fails.
Expand Down
Loading