Skip to content

Commit 5d3aa1a

Browse files
m4r1kclaude
andcommitted
fix: address PR #41 review findings (race + shell quoting)
- src/eneru/multi_ups.py: move _global_shutdown_flag.unlink() inside the _local_shutdown_lock block so the boolean reset and the file removal are atomic with respect to _handle_local_shutdown. Without this, a thread that re-touches the flag between the bool reset and the unlink ends up with the coordinator's in-memory state saying "shutdown in flight" while the on-disk evidence is gone — a state worse than the bug we set out to fix. (cubic.dev finding) - tests/e2e/groups/single-ups.sh Test 34: quote $E2E_DIR / $REARM_DIR / $DAEMON_PID expansions and switch the EXIT trap to a single-quoted command so $DAEMON_PID resolves at trap-fire time, not trap-set time. (CodeRabbit nitpick) Co-Authored-By: Claude Opus 4.7 <[email protected]>
1 parent cf23ff4 commit 5d3aa1a

2 files changed

Lines changed: 24 additions & 11 deletions

File tree

src/eneru/multi_ups.py

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -380,10 +380,19 @@ def _clear_local_shutdown_state(self):
380380
381381
Idempotent: safe to call repeatedly when no shutdown was in
382382
flight, or when this group is the second OL transition in a row.
383+
384+
The boolean reset AND the file unlink BOTH live inside the lock
385+
so the two halves of "shutdown is no longer in flight" stay
386+
atomic with respect to ``_handle_local_shutdown``. Without
387+
this, an interleaving where (A) we cleared the bool, (B) a
388+
concurrent ON_BATTERY thread re-took the lock and re-touched
389+
the flag, then (C) we unlinked it would leave the coordinator
390+
believing a sequence is in flight while the on-disk evidence
391+
is gone -- a state worse than the one we set out to fix.
383392
"""
384393
with self._local_shutdown_lock:
385394
self._local_shutdown_initiated = False
386-
self._global_shutdown_flag.unlink(missing_ok=True)
395+
self._global_shutdown_flag.unlink(missing_ok=True)
387396

388397
def _handle_local_shutdown(self, triggered_by: str):
389398
"""Execute local shutdown with defense-in-depth protection."""

tests/e2e/groups/single-ups.sh

Lines changed: 14 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -454,8 +454,8 @@ EOF
454454
# test would still "pass" without exercising the re-arm path. Fail
455455
# loud instead. Scenario format is plain ``key: value`` lines; battery
456456
# charge can be a float like ``14`` or ``14.0``.
457-
LB_CHARGE=$(awk '/^battery\.charge:/{gsub(/[^0-9.]/, "", $2); print $2; exit}' $E2E_DIR/scenarios/low-battery.dev)
458-
OL_STATUS=$(awk -F': ' '/^ups\.status:/{print $2; exit}' $E2E_DIR/scenarios/online-charging.dev)
457+
LB_CHARGE=$(awk '/^battery\.charge:/{gsub(/[^0-9.]/, "", $2); print $2; exit}' "$E2E_DIR/scenarios/low-battery.dev")
458+
OL_STATUS=$(awk -F': ' '/^ups\.status:/{print $2; exit}' "$E2E_DIR/scenarios/online-charging.dev")
459459
LB_CHARGE_INT=${LB_CHARGE%.*}
460460
if [ -z "$LB_CHARGE_INT" ] || [ "$LB_CHARGE_INT" -ge 20 ]; then
461461
echo "FAIL (setup): low-battery.dev battery.charge=${LB_CHARGE:-?} expected < 20"
@@ -472,9 +472,13 @@ sleep 3
472472

473473
# Daemon in background. NO --exit-after-shutdown; we want it to stay
474474
# alive across all three transitions.
475-
PYTHONUNBUFFERED=1 eneru run --config $REARM_DIR/config.yaml > $REARM_DIR/daemon.log 2>&1 &
475+
PYTHONUNBUFFERED=1 eneru run --config "$REARM_DIR/config.yaml" > "$REARM_DIR/daemon.log" 2>&1 &
476476
DAEMON_PID=$!
477-
trap "kill $DAEMON_PID 2>/dev/null || true" EXIT
477+
# Single-quoted trap so $DAEMON_PID resolves at trap-fire time, not
478+
# trap-set time. Functionally equivalent here (DAEMON_PID is already
479+
# set when this line runs and never changes), but matches shell-best-
480+
# practice and silences ShellCheck.
481+
trap 'kill $DAEMON_PID 2>/dev/null || true' EXIT
478482

479483
# Let the daemon initialise + observe OL.
480484
sleep 3
@@ -498,7 +502,7 @@ for i in {1..20}; do
498502
done
499503
if [ "${COUNT:-0}" -lt 1 ]; then
500504
echo "FAIL: first EMERGENCY_SHUTDOWN_INITIATED never recorded"
501-
tail -40 $REARM_DIR/daemon.log
505+
tail -40 "$REARM_DIR/daemon.log"
502506
exit 1
503507
fi
504508

@@ -517,12 +521,12 @@ for i in {1..20}; do
517521
done
518522
if [ "${PR_COUNT:-0}" -lt 1 ]; then
519523
echo "FAIL: POWER_RESTORED never recorded"
520-
tail -40 $REARM_DIR/daemon.log
524+
tail -40 "$REARM_DIR/daemon.log"
521525
exit 1
522526
fi
523527
if [ -f "$REARM_DIR/shutdown-flag" ]; then
524528
echo "FAIL: flag still present after POWER_RESTORED -- re-arm broken (bug #4 regression)"
525-
tail -40 $REARM_DIR/daemon.log
529+
tail -40 "$REARM_DIR/daemon.log"
526530
exit 1
527531
fi
528532

@@ -543,13 +547,13 @@ if [ "${COUNT2:-0}" -lt 2 ]; then
543547
echo "FAIL: second EMERGENCY_SHUTDOWN_INITIATED never recorded -- re-arm broken (bug #4 regression)"
544548
echo "events table:"
545549
sqlite3 "$DB" "SELECT ts, event_type, detail FROM events ORDER BY ts" || true
546-
tail -60 $REARM_DIR/daemon.log
550+
tail -60 "$REARM_DIR/daemon.log"
547551
exit 1
548552
fi
549553

550554
# Stop the daemon cleanly.
551-
kill $DAEMON_PID 2>/dev/null || true
552-
wait $DAEMON_PID 2>/dev/null || true
555+
kill "$DAEMON_PID" 2>/dev/null || true
556+
wait "$DAEMON_PID" 2>/dev/null || true
553557
trap - EXIT
554558

555559
# Restore baseline so any downstream tests start fresh.

0 commit comments

Comments
 (0)