chore(release): cut 5.1.1 — bug-fix release from AI code review #31
Security and supply-chain hardening from a comprehensive third-party
AI code review (CodeRabbit Pro + Cubic.Dev). All four fixes address
exploitable injection vectors or unsafe credential scoping in the
build / release path. No runtime behavior change for users; the
XCP-ng UUID fix repairs a silent no-op on XenServer hosts.
Bundled fixes:
- src/eneru/actions.py:50,56 — XCP-ng `xargs ... xe vm-shutdown uuid=`
left uuid= empty; UUIDs were passed positionally and `xe` (which
parses arguments as key=value pairs) silently no-op'd on XenServer
/ XCP-ng hosts. Switch both the graceful and force passes to
`xargs -r -I {} xe vm-shutdown uuid={}` (and `uuid={} force=true`,
matching xe's keyword form rather than the invalid `--force`).
Reported by @coderabbitai and @cubic-dev-ai.
- src/eneru/actions.py:78,80 (via shutdown/remote.py) — stop_compose
template wrapped {path} in double quotes; double quotes don't block
$(), backticks, or ${...} in the remote shell. Drop the surrounding
double quotes from the template and apply shlex.quote(path) at the
.format() call site in RemoteShutdownMixin so the path is the only
quoting boundary. Reported by @coderabbitai.
- .github/workflows/pypi.yml:45,57 — workflow_dispatch input was
interpolated directly into `run:` shell with PYPI_API_TOKEN in scope
via id-token: write. Pass via env: block, validate against a strict
PEP-440 allow-list before use. Same pattern in release.yml:52
hardened the same way. Reported by @cubic-dev-ai.
- .github/workflows/pypi.yml — single job granted id-token: write to
every step including `pip install build twine`; a compromised
third-party dependency could exfiltrate the OIDC token. Split into
a `build` job (contents: read only, builds the wheel and uploads as
artifact) and a `publish` job (needs: build, id-token: write only,
downloads the artifact and runs twine upload). Reported by
@cubic-dev-ai.
Co-Authored-By: Claude Opus 4.7 <[email protected]>
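The stop_compose quoting fix above can be sketched as follows. This is a minimal illustration, not the project's actual code: the template string and helper name are assumptions; the point is that the placeholder carries no quotes of its own, so `shlex.quote()` at the `.format()` call site becomes the only quoting boundary.

```python
import shlex

# Hypothetical template (illustrative): no quotes around the placeholder,
# so the quoting applied at the call site is the only quoting boundary.
STOP_COMPOSE_TEMPLATE = "docker compose -f {path} down"

def build_stop_command(path: str) -> str:
    # shlex.quote() single-quotes anything unsafe, which neutralizes $(),
    # backticks, and ${...} -- double quotes would have left all three
    # live for the remote shell to expand.
    return STOP_COMPOSE_TEMPLATE.format(path=shlex.quote(path))

# A hostile path is rendered inert:
assert build_stop_command("/srv/$(reboot)/x.yml") == \
    "docker compose -f '/srv/$(reboot)/x.yml' down"
```

A path made only of safe characters passes through unquoted, so the common case stays readable in logs.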
Production-code correctness fixes from the AI code review. These include silent config-drop bugs (notifications.suppress and multi-UPS statistics), the missing local-host poweroff in is_local redundancy groups, the multi-UPS state-file write race, the TUI live-blending key mismatch, and a long tail of robustness fixes across shutdown, stats, logging, and utility modules. Several findings were cross-confirmed by both reviewers.
Bundled fixes:
- src/eneru/config.py — _parse_notifications never read `suppress` or `voltage_hysteresis_seconds` from YAML; defaults were always used, the validator passed against defaults, and user YAML was silently ignored. Read both keys from notif_data and pass them into NotificationsConfig. Reported by @cubic-dev-ai.
- src/eneru/multi_ups.py:104-110 — the per-group monitor-config build dropped `statistics`, so custom stats DB path / retention settings were silently ignored in multi-UPS mode. Pass it through. Reported by @cubic-dev-ai.
- src/eneru/redundancy.py + multi_ups.py — for is_local: true redundancy groups, the executor stopped local services and remote servers but never invoked local_shutdown.command; quorum-loss completed with the local host still running. Add a local_shutdown_callback parameter delegating to the coordinator's defense-in-depth _handle_local_shutdown so the in-memory lock and global flag prevent double-fire if the per-UPS path also requests a local shutdown. Reported by @cubic-dev-ai.
- src/eneru/monitor.py:595 + health/battery.py:36 — `with_suffix('.tmp')` replaced the per-UPS suffix (`.ups1`, `.ups2`) with `.tmp`, collapsing every monitor's temp file onto a shared one and racing on the atomic rename. Use `with_name(name + '.tmp')` in both call sites. Reported by @cubic-dev-ai.
- src/eneru/shutdown/vms.py:53 — the wait-loop `virsh list` ignored exit_code; if libvirtd died mid-shutdown, the loop computed remaining_vms=[] from empty stdout and reported "All VMs stopped gracefully", skipping force-destroy. Treat non-zero exit conservatively: keep the prior remaining_vms and re-poll on the next interval. Reported by @cubic-dev-ai.
- src/eneru/shutdown/filesystems.py:58-61 — `cmd.append(options)` bundled multi-flag option strings like "-l -f" into a single subprocess argument; umount rejected the literal as an unknown option. Use `cmd.extend(shlex.split(options))`. Reported by @cubic-dev-ai.
- src/eneru/tui.py:114 — `_STATE_FILE_TO_COLUMN` was keyed on NUT-style dotted names (`battery.charge`, `ups.load`) but the state file uses uppercase (`BATTERY=`, `LOAD=`). Live blending received zero samples; the TUI graph's rightmost edge always lagged by ≤10s. Realign the dict to actual state-file keys; drop entries that have no state-file source (battery voltage, temperature, frequencies). Reported by @cubic-dev-ai.
- src/eneru/shutdown/remote.py:91 — `cmd.timeout or server.command_timeout` promoted 0 to the default timeout via Python truthiness. Use an explicit `is not None` check. Reported by @coderabbitai.
- src/eneru/shutdown/remote.py:108 — the thread worker swallowed unexpected exceptions silently. Log them with the server display name so silent failures become visible. Reported by @cubic-dev-ai.
- src/eneru/shutdown/remote.py:152 — ssh_options normalization silently broke non-`-o`-style flags (e.g., `-i keyfile`). Pass through any flag starting with `-`; only normalize bare key=value pairs to `-o key=value`; split single "-o foo=bar" strings into two argv entries. Reported by @coderabbitai.
- src/eneru/shutdown/remote.py:179 — the docstring claimed the method returns False on SSH connection failure, but it always returns True. Update the docstring to match the actual best-effort behavior. Reported by @coderabbitai.
- src/eneru/multi_ups.py:_drain_all_groups — invoked shutdown on peer monitors while their main loops were still running. Restructure: signal stop_event first, give peer loops a brief join window, then run per-monitor shutdown sequences sequentially before the final join. Reported by @coderabbitai.
- src/eneru/graph.py:126 — min/max over a list that may contain None / non-numeric placeholders raised TypeError. Filter to numeric values before the bounds calculation; render an empty grid when nothing numeric remains. Reported by @cubic-dev-ai.
- src/eneru/graph.py:212 — the pixel-row aggregator used the wrong direction; the variable named `highest_on` actually tracked the bottommost set pixel, so any "on" cell rendered as the tiniest block. Use min(r) so the topmost set pixel drives the block size. Reported by @cubic-dev-ai.
- src/eneru/health/battery.py:35 — `except Exception: pass` in the persistence helper hid all I/O failures. Log at WARNING level and keep the path non-fatal. Reported by @coderabbitai.
- src/eneru/health/battery.py:116 — a pending anomaly confirmation could fire after the drop was no longer >20%, producing a false alarm. Re-validate the current drop magnitude before firing. Reported by @cubic-dev-ai.
- src/eneru/logger.py:29-30 — handler cleanup directly cleared `logger.handlers` and didn't close existing handlers; FileHandler fds leaked on re-init. Iterate, removeHandler, close; also set propagate=False so root-logger handlers don't duplicate output. Reported by @coderabbitai and @cubic-dev-ai.
- src/eneru/stats.py:148 — `isolation_level=None` made `with self._conn:` a no-op, so each executemany row committed individually instead of being batched. Drop the override and let sqlite3 use the default deferred mode the with-blocks expect. Reported by @coderabbitai.
- src/eneru/stats.py:605 — the SQLite URI was built via raw string interpolation and broke on file paths containing `?` or `#`. Apply urllib.parse.quote to the path before assembling the URI. Reported by @cubic-dev-ai.
- src/eneru/utils.py:37 — `run_command` returned stdout/stderr as None when capture_output=False; callers that indexed/stripped them crashed. Normalize to an empty string in the no-capture branch. Reported by @cubic-dev-ai.
- src/eneru/cli.py:204 — validate skipped top-level YAML warnings for single-UPS configs and silently swallowed parse errors. Re-parse for both single-UPS and multi-UPS configs; surface parse errors with a non-zero exit. Reported by @coderabbitai.
- src/eneru/completion/eneru.bash:37 — bash completion for --config used unquoted `compgen -f` substitution; filenames with spaces split into multiple completions. Use mapfile -t. Reported by @coderabbitai.
- src/eneru/health/voltage.py:324 (F1) — severity escalation from non-severe to severe within the same LOW/HIGH state never refreshed `voltage_pending_severe`, so the severe-bypass path in _maybe_notify_voltage_pending was never reached and operators waited the full hysteresis window for an immediate-class alert. Re-validate severity when the state is unchanged and lift the pending record's severe flag (and refresh the recorded voltage / threshold) so the next _maybe_notify_voltage_pending fires immediately. Reported by @cubic-dev-ai.
Co-Authored-By: Claude Opus 4.7 <[email protected]>
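The `with_suffix` race described above comes down to a pathlib subtlety; the paths below are illustrative stand-ins for the real state files:

```python
from pathlib import Path

# Two per-UPS state files (illustrative paths)
state1 = Path("/run/eneru/state.ups1")
state2 = Path("/run/eneru/state.ups2")

# with_suffix() REPLACES the final suffix, so both monitors collapse
# onto the same temp file and race on the atomic rename:
assert state1.with_suffix(".tmp") == state2.with_suffix(".tmp")

# with_name(name + ".tmp") APPENDS instead, keeping the temp files distinct:
tmp1 = state1.with_name(state1.name + ".tmp")
tmp2 = state2.with_name(state2.name + ".tmp")
assert tmp1 != tmp2
assert tmp1.name == "state.ups1.tmp"
```

The same pattern applies anywhere a per-instance suffix must survive a temp-file rename.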
Packaging-script hardening for chroot/container build environments,
correct Debian lifecycle handling, and a setuptools dynamic-version
robustness fix.
Bundled fixes:
- pyproject.toml — `version = {attr = "eneru.__version__"}` evaluated
src/eneru/__init__.py at build time; a future top-level import
there would break wheel builds. Point at `eneru.version.__version__`
so setuptools resolves directly from version.py. Reported by
@coderabbitai and @cubic-dev-ai.
- packaging/scripts/preremove.sh + postremove.sh — DEB lifecycle args
`failed-upgrade`, `deconfigure`, `abort-install`, `abort-upgrade`,
and `disappear` fell through to the is_removal=true default; the
service was stopped/disabled during non-removal flows. Replace the
if/elif cascade with an explicit `case` covering the full Debian
Policy enumeration; preserve RPM numeric handling (0 = removal,
>=1 = upgrade) in the fallback. Reported by @coderabbitai and
@cubic-dev-ai.
- packaging/scripts/postinstall.sh + postremove.sh — `systemctl
daemon-reload` under `set -e` aborted in chroot/container builds
without a live systemd, leaving the package half-applied. Guard
each call with `[ -d /run/systemd/system ]`. Reported by
@coderabbitai and @cubic-dev-ai.
- packaging/scripts/postinstall.sh — DEB upgrade detection used
`[ "$1" = "configure" ]` plus `systemctl list-unit-files`, but
`configure` is the arg for BOTH fresh installs AND upgrades, and
the unit file is part of the new package so list-unit-files always
returns true. Disambiguate via `$2` (previous version, empty on
fresh install). Drop the unused `was_enabled` variable shellcheck
flagged. Reported by @coderabbitai and @cubic-dev-ai.
Co-Authored-By: Claude Opus 4.7 <[email protected]>
…view CI / supply-chain hardening (pinning, checksum verification, permission scoping, fail-fast on readiness timeouts), e2e test infrastructure robustness, plus reference-config corrections so copy-pasting an example YAML produces a working configuration.
Bundled fixes:
- .github/workflows/release.yml — `git push -f` to gh-pages risked clobbering on transient races between concurrent release jobs. Drop --force; let the push fail loudly on non-fast-forward so the operator can reconcile manually. Reported by @cubic-dev-ai.
- .github/workflows/integration.yml + release.yml — nFPM was downloaded from `releases/latest` with no version pin or integrity check. Pin to a specific release via the NFPM_VERSION env var and verify the archive against the goreleaser-published checksums.txt before extracting / installing. Reported by @cubic-dev-ai.
- .github/workflows/integration.yml — DEB and RPM install commands trailed `|| true`, which silently masked real install failures (kept CI green even when the package never installed). Drop the `|| true` on dpkg + apt-get install -f; preserve the `rpm -ivh ... || dnf install --skip-broken` chain because it's the documented fallback for missing optional deps. Reported by @cubic-dev-ai.
- .github/actions/e2e-setup/action.yml — both readiness loops (NUT on port 3493, SSH on 2222/2223/2224) fell out of the loop on timeout but never `exit 1`; the composite action reported success even when services never came up. Track readiness explicitly per loop and fail with a stderr message if the deadline elapses. Reported by @coderabbitai and @cubic-dev-ai.
- .github/workflows/codeql.yml — only `security-events: write` was declared; once any permission is set, the others default to `none` and actions/checkout fails on private repos. Add `contents: read`. Reported by @cubic-dev-ai.
- .github/workflows/{validate,integration,e2e,codeql,pypi,release}.yml + .github/actions/e2e-setup/action.yml — every third-party action reference is now SHA-pinned (with the corresponding tag in a comment) so a moved tag in the upstream action repo cannot silently change what runs in CI. Refresh by re-running `gh api repos/<owner>/<repo>/git/refs/tags/<tag>`. Reported by @cubic-dev-ai.
- pytest.ini — the blanket `filterwarnings = ignore::DeprecationWarning` hid every Python deprecation across the entire suite, including the daemon's own code. Drop it: the suite is currently warning-clean, and a future deprecation should fail loudly so we react. Reported by @coderabbitai.
- tests/e2e/ssh-target/Dockerfile + entrypoint.sh — Alpine's openssh-server doesn't create /var/run/sshd at install time and sshd refuses to start without it; the container exited immediately after launch. mkdir -p both at build time and as a defensive fallback in the entrypoint. Reported by @cubic-dev-ai.
- examples/config-reference.yaml + config-dual-ups.yaml — SSH `StrictHostKeyChecking=no` was the default in the reference examples; copy-pasters inherited a MITM-able default. Comment the option out, mark it as testing-only, and document that production should rely on populated known_hosts. Reported by @cubic-dev-ai.
- examples/config-enterprise.yaml:50 — the Slack Apprise URL `slack://xoxb-token/CHANNEL_ID?footer=no` doesn't match either Apprise's webhook or bot-OAuth form. Replace it with the documented webhook form and link to the Apprise wiki for the bot-OAuth alternative. Reported by @cubic-dev-ai.
- examples/config-reference.yaml:190 — `compose_files:` written as a YAML key with no value parses as `null`, not an empty list; the loader/runtime fails on a copy-paste-unmodified reference. Use `compose_files: []` (explicit empty list). Reported by @coderabbitai.
- examples/config-homelab.yaml:88-95 (F2) — a comment claimed `parallel: false` runs the NAS LAST; the legacy flag actually runs the server in a phase BEFORE the parallel-order=0 batch (config.py RemoteServerConfig: "False = run sequentially before the parallel batch"). Switch to `shutdown_order: 2` and rewrite the comment to document the legacy flag's real semantics. Reported by @cubic-dev-ai.
Co-Authored-By: Claude Opus 4.7 <[email protected]>
Test-suite quality fixes: SQLite-store leakage to /var/lib/eneru during pytest runs, run_command patches that missed the shutdown-mixin namespaces, failsafe tests that inlined production logic instead of routing through the real loop, swallowed e2e exit codes, and missing assertions on happy-path tests. None affect runtime behavior; all reduce false confidence in the test suite.
Bundled fixes:
- tests/conftest.py — tests instantiating UPSGroupMonitor or StatsStore without overriding StatsConfig.db_directory leaked SQLite files into the real /var/lib/eneru. Add an autouse isolate_stats_db_directory fixture that monkeypatches the dataclass default to a per-test tmp_path so no test can hit the system path. Reported by user during release execution.
- tests/conftest.py — add a patch_run_command_everywhere fixture that patches `run_command` in every shutdown-module namespace. `from eneru.utils import run_command` binds the symbol at import time, so a test that only patches `eneru.monitor.run_command` doesn't intercept calls made by the shutdown mixins (vms, containers, filesystems, remote). Reported by @cubic-dev-ai.
- tests/test_monitor_core.py:312 — test_connection_lost_while_ob_triggers_shutdown inlined the failsafe logic and asserted the inlined simulation; the real _main_loop could regress and the test would still pass. Route through TestAdvisoryTriggers._run_one_iteration so the production code path is exercised. Reported by @coderabbitai.
- tests/test_monitor_core.py:421 — test_shutdown_continues_on_step_failure documented a behavior gap rather than verifying behavior: it caught the RuntimeError in the test body and asserted only that the failing step was called. Rename to test_shutdown_aborts_on_step_failure and assert the actual current contract (the raise propagates, subsequent steps are NOT invoked) so the test is honest and a future best-effort change must update it deliberately. Reported by @coderabbitai and @cubic-dev-ai.
- tests/test_redundancy.py:343 (F3) — test_evaluator_swallows_exceptions_and_continues didn't assert the thread was alive AFTER the bad monitor had time to raise on every tick. A thread killed by the unhandled exception would still leave `not ev.is_alive()` True after join (a dead thread joins instantly), so the test silently passed even when the swallowing contract broke. Add `assert ev.is_alive()` before signaling stop. Reported by @cubic-dev-ai.
- tests/test_cli.py:233 — test_validate_config_nonexistent_file asserted only exit 0 + "Configuration is valid"; it passed whether the typo'd path was used or the default-config fallback fired. Now also assert the fallback warning containing the typo'd path in stdout so a regression that swallowed the warning would fail loudly. Reported by @coderabbitai.
- tests/test_shutdown_vms.py:101 — test_shutdown_vms_graceful_shutdown_completes_first_poll set up the run_command sequence but never asserted what was actually issued; it would pass even if the wrong virsh command (or none) ran. Assert the virsh list / shutdown sequence and the absence of `virsh destroy` on the happy path. Reported by @cubic-dev-ai.
- tests/e2e/groups/single-ups.sh:35,63 — `eneru run ... | tee ... || true` masked any non-zero exit (including a real crash), so the test passed even when eneru never produced log output. Capture the exit code via PIPESTATUS and fail loudly on anything other than 0 (or 124 from the timeout-bounded test). Reported by @cubic-dev-ai.
- tests/e2e/groups/multi-ups.sh:391 — parse_epoch silently produced a "now" epoch when the shutdown.log file was empty, corrupting the phase-ordering comparison further down. Fail loudly on empty input. Reported by @cubic-dev-ai.
- tests/e2e/scenarios/low-battery.dev — runtime was 180 s, below the 600 s critical-runtime threshold; the scenario actually exercised the critical-runtime trigger, not the low-battery trigger, making test failures hard to localise. Bump runtime to 900 s so the test unambiguously exercises the low-battery path. Reported by @cubic-dev-ai.
Co-Authored-By: Claude Opus 4.7 <[email protected]>
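The import-time-binding pitfall behind the patch_run_command_everywhere fixture can be demonstrated with two in-memory stand-in modules (these are illustrative, not the project's real modules):

```python
import sys
import types

# Stand-in for eneru.utils
utils = types.ModuleType("fake_utils")
def run_command(cmd):
    return "real"
utils.run_command = run_command
sys.modules["fake_utils"] = utils

# Stand-in for a shutdown mixin that does a from-import at import time,
# which copies the function object into its OWN namespace.
vms = types.ModuleType("fake_vms")
exec(
    "from fake_utils import run_command\n"
    "def shutdown():\n"
    "    return run_command('virsh shutdown')\n",
    vms.__dict__,
)

# Patching only the source module does NOT intercept the mixin's call:
utils.run_command = lambda cmd: "patched"
assert vms.shutdown() == "real"

# Patching in the consumer's namespace does:
vms.run_command = lambda cmd: "patched"
assert vms.shutdown() == "patched"
```

This is why the fixture must patch `run_command` in every shutdown-module namespace, not just where it was defined.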
…rand Aggregate of cosmetic / lint / robustness fixes from the AI code review, plus the legacy `ups-monitor` syslog identifier rebrand and a hardened SQLite-isolation fixture for the test suite. The emoji-in-logs convention (per root CLAUDE.md "Code Style") is intentional and remains unchanged despite Ruff lint comments to the contrary.
Bundled fixes:
- src/eneru/monitor.py:391 — power events shelled out to `logger -t ups-monitor`, the legacy package identifier from before the v5.0 rebrand. Every other journal line emitted by the daemon already uses the `eneru` identifier, so journalctl filtering on `-t eneru` silently missed the power-event side channel. Rename to `-t eneru`. Reported by user (release execution).
- src/eneru/version.py:3-5 — stale `4.3.0` example in the docstring; refresh to `5.1.0` so the example tracks the current series.
- src/eneru/utils.py — `is_numeric` accepted booleans (a subtype of int) and NaN/±Inf, both of which the UPS-metric callers (voltage/runtime/load math) treat as garbage. Reject bools and use math.isfinite to fence off the float edge cases.
- src/eneru/utils.py — `format_seconds` produced strings like "-1m 30s" when a misbehaving driver briefly returned a negative runtime. Clamp to max(0, ...) before the bucket math.
- tests/conftest.py — the autouse isolate_stats_db_directory fixture introduced earlier in this release patched the dataclass CLASS attribute, which is a no-op for new instances because the dataclass-generated __init__ captures defaults at decoration time. Replace with a two-layer wrapper that patches both StatsConfig.__init__ (for fresh defaults) and StatsStore.__init__ (for direct paths under /var/lib/eneru). Add a no_stats_isolation marker so tests that specifically need to verify the production default can opt out. Reported by user (release execution): leaked TestUPS@localhost DAEMON_START events showing up in the production /var/lib/eneru/default.db.
- pytest.ini — register the new `no_stats_isolation` marker so --strict-markers passes.
- tests/test_stats.py — mark test_defaults + test_default_yaml_omitted_section with @pytest.mark.no_stats_isolation so they continue to verify the literal "/var/lib/eneru" default.
Project-policy items deliberately NOT changed:
- Emoji in log messages (per root CLAUDE.md "Code Style — Emojis in logs/notifications") are intentional semantic markers, not decoration. Suppress the Ruff lint rule rather than the emojis.
Co-Authored-By: Claude Opus 4.7 <[email protected]>
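The dataclass gotcha behind the fixture rewrite above is easy to reproduce. The class below is an illustrative stand-in for the real StatsConfig, not the project's code:

```python
from dataclasses import dataclass

# Illustrative stand-in for the real StatsConfig
@dataclass
class StatsConfig:
    db_directory: str = "/var/lib/eneru"

# Patching the CLASS attribute looks like it changes the default...
StatsConfig.db_directory = "/tmp/pytest-of-user/stats"

# ...but the dataclass-generated __init__ captured the default at
# decoration time, so fresh instances still get the original value:
assert StatsConfig().db_directory == "/var/lib/eneru"

# Wrapping __init__ is what actually redirects fresh defaults:
_orig_init = StatsConfig.__init__
def _patched_init(self, db_directory="/tmp/pytest-of-user/stats"):
    _orig_init(self, db_directory=db_directory)
StatsConfig.__init__ = _patched_init
assert StatsConfig().db_directory == "/tmp/pytest-of-user/stats"
```

Hence the two-layer wrapper: one patch for fresh defaults via `__init__`, one for callers that pass explicit paths.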
Drop the 24h fixed window on the events panel and let the operator scroll through every event in the SQLite store with the arrow keys. The events table is one row per power event (not per poll), so even years of history stay tiny -- truncating it by age was hiding the operator's own outage history without any storage benefit.
Changes:
- src/eneru/tui.py: query_events_for_display now defaults to no time bound (time_range_seconds=None pulls everything; an explicit window is still supported by callers that want one). EVENTS_TIME_WINDOW removed; EVENTS_MAX_ROWS_NORMAL=8 / EVENTS_MAX_ROWS_MORE=500 introduced for the <M>-toggled cap.
- src/eneru/tui.py: render_logs_panel now takes a scroll_offset so the visible slice anchors against the operator's scroll position instead of always pinning to the most recent rows.
- src/eneru/tui.py: arrow-key scrolling in the main loop -- Up/Down moves one row, PgUp/PgDn moves ten, Home jumps to the oldest event, End returns to the most recent. <R> resets the offset. The render path clamps the offset to the current list size so a refresh that shrinks the list doesn't strand the user on an empty view.
- src/eneru/tui.py: the bottom hint row gains <↑↓> Scroll.
- tests/test_tui.py: TestEventsTimescaleDecoupled rewritten to assert the new contract (no positional time-range argument passed to query_events_for_display) instead of the old EVENTS_TIME_WINDOW constant.
Reported by user during release execution: a clean install of v5.1.0 would only show the most recent ~1 day of events even with multi-year retention configured, so historical outages dropped off the operator's TUI as soon as the 24h window expired.
Co-Authored-By: Claude Opus 4.7 <[email protected]>
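The offset-clamping behavior described above can be sketched as a small pure function; the helper name and signature are assumptions, not the TUI's actual API:

```python
def clamp_scroll_offset(offset: int, total_rows: int, visible_rows: int) -> int:
    """Clamp the scroll offset so a refresh that shrinks the event list
    cannot strand the view past the oldest event or above the newest."""
    max_offset = max(0, total_rows - visible_rows)
    return min(max(0, offset), max_offset)

# Operator scrolled deep into history, then retention pruned the list:
assert clamp_scroll_offset(400, 500, 8) == 400   # still a valid position
assert clamp_scroll_offset(400, 50, 8) == 42     # clamped to the oldest page
assert clamp_scroll_offset(-3, 50, 8) == 0       # never above the newest row
```

Running the clamp on every render (rather than only on keypress) is what keeps a shrinking list from leaving the operator on an empty view.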
Bug-fix release with one small TUI improvement (events panel pulls all events with arrow-key scrolling). Bundles fixes from a third-party AI code review (CodeRabbit Pro + Cubic.Dev) of the v5.1.0 codebase. See docs/changelog.md for the grouped summary; `git log v5.1.0..v5.1.1` for per-commit detail with reviewer attribution on every fix. Co-Authored-By: Claude Opus 4.7 <[email protected]>
The first cut clocked in at ~900 words, which is the v5.1.0 density — too much for a bug-fix-only release. Compress per-fix prose into one-line headlines, collapse the "long tail" enumeration into a single bullet, drop redundant attribution preamble, and point readers at `git log v5.1.0..v5.1.1` for detail. Final entry is ~530 words. Co-Authored-By: Claude Opus 4.7 <[email protected]>
📝 Walkthrough
This PR bumps version to 5.1.1 and includes CI hardening via SHA-pinned GitHub Actions, fixes to remote shutdown command UUID binding for XCP-ng and Docker/Podman, TUI event scrolling enhancements replacing the fixed 24-hour window, battery/voltage anomaly detection improvements, packaging script robustness for systemd detection, and configuration example updates removing insecure SSH defaults.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #31 +/- ##
==========================================
- Coverage 79.87% 79.06% -0.82%
==========================================
Files 23 23
Lines 3593 3664 +71
Branches 692 708 +16
==========================================
+ Hits 2870 2897 +27
- Misses 572 607 +35
- Partials 151 160 +9
☔ View full report in Codecov by Sentry.
The Commit 5 bump of low-battery.dev runtime to 900s removed the critical-runtime fallback that was previously masking a real bug: UPS2 in tests/e2e/config-e2e-multi-ups.yaml has low_battery_threshold=15, and the LOW BATTERY trigger uses strict `<`, so battery.charge=15 never fires UPS2's shutdown. Test 15 (Non-local failure) sat for the full 15-minute job timeout waiting for a shutdown that wasn't going to come. Drop battery.charge to 14 so the scenario fires LOW BATTERY against both UPS1 (threshold 20) and UPS2 (threshold 15), and the single-UPS tests still exercise the same trigger. Co-Authored-By: Claude Opus 4.7 <[email protected]>
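The boundary condition above hinges on the strict `<` comparison. A minimal sketch of the trigger logic (the function name is illustrative; the thresholds are the ones quoted from the e2e config):

```python
def low_battery_triggered(charge: float, threshold: float) -> bool:
    # Mirrors the strict `<` described above: a charge exactly equal
    # to the threshold does NOT fire the LOW BATTERY trigger.
    return charge < threshold

# UPS2 in tests/e2e/config-e2e-multi-ups.yaml: low_battery_threshold=15
assert not low_battery_triggered(15, 15)  # old scenario value: never fires
assert low_battery_triggered(14, 15)      # fixed value fires UPS2...
assert low_battery_triggered(14, 20)      # ...and UPS1 (threshold 20)
```

With charge=15, only the critical-runtime fallback was masking the gap, which is exactly what the 900 s runtime bump exposed.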
@coderabbitai full review
@cubic-dev-ai review this pull request |
✅ Actions performed: Full review triggered.
@m4r1k I have started the AI code review. It will take a few minutes to complete. |
5 issues found across 49 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="examples/config-enterprise.yaml">
<violation number="1" location="examples/config-enterprise.yaml:54">
P3: The Slack example URL puts `?footer=no` after the `#channel` fragment, so the footer option is not passed as a query parameter.</violation>
</file>
<file name="src/eneru/shutdown/filesystems.py">
<violation number="1" location="src/eneru/shutdown/filesystems.py:64">
P1: `shlex.split(options)` can raise on malformed config input and crash the shutdown sequence; handle parse errors before extending the command.</violation>
</file>
<file name="tests/e2e/groups/single-ups.sh">
<violation number="1" location="tests/e2e/groups/single-ups.sh:43">
P2: Do not accept a clean 0 exit here; that lets a premature daemon exit pass this monitoring test.</violation>
</file>
<file name="src/eneru/tui.py">
<violation number="1" location="src/eneru/tui.py:906">
P1: Enable keypad mode before handling `curses.KEY_*`; otherwise arrow/page/home keys can be read as raw escape input and trigger the ESC-to-quit branch instead of scrolling.</violation>
</file>
<file name="tests/conftest.py">
<violation number="1" location="tests/conftest.py:260">
P2: Patch `eneru.utils.run_command` too; otherwise `command_exists()` still shells out during the shutdown tests.</violation>
</file>
Actionable comments posted: 7
Caution: Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
examples/config-homelab.yaml (1)
68-99: ⚠️ Potential issue | 🟡 Minor: Update the section guidance to include `shutdown_order` phases.
The NAS block now correctly uses `shutdown_order: 2`, but lines 68-71 still describe only the legacy two-phase `parallel` model. That makes this example internally inconsistent and can mislead users into using `parallel: false` for "shutdown last".
📝 Suggested wording
-# Remote servers with optional pre-shutdown commands
-# Servers are processed in two phases:
-# 1. Sequential: Servers with parallel: false are shutdown one by one (in config order)
-# 2. Parallel: Remaining servers (parallel: true, the default) are shutdown concurrently
+# Remote servers with optional pre-shutdown commands
+# Servers are processed by effective shutdown order, lowest to highest.
+# Servers with the same order run concurrently.
+# - Default: order 0, parallel batch
+# - shutdown_order: N: explicit phase; higher numbers run later
+# - Legacy parallel: false: sequential phases before order 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/config-homelab.yaml` around lines 68 - 99: the example text describing remote_servers is outdated; it only explains the legacy two-phase "parallel" model and misleads users about how to schedule a server "last". Update the guidance to mention the new shutdown_order phases and how they interact with the old parallel flag (i.e., that shutdown_order: <int> controls later phases and parallel: false runs a server sequentially before the parallel batch per RemoteServerConfig), and clarify that to run something after the parallel batch you must set shutdown_order >= 1 (example: NAS uses shutdown_order: 2) rather than using parallel: false.
tests/test_cli.py (1)
245-257: ⚠️ Potential issue | 🟡 Minor: Use `tmp_path` for the missing config path.
The hard-coded `/nonexistent/path/config.yaml` is environment-dependent. A temp path that is intentionally not created keeps the test deterministic.
Proposed fix
- def test_validate_config_nonexistent_file(self, capsys):
+ def test_validate_config_nonexistent_file(self, tmp_path, capsys):
@@
- typo_path = "/nonexistent/path/config.yaml"
+ typo_path = str(tmp_path / "missing-config.yaml")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_cli.py` around lines 245 - 257, Replace the hard-coded missing config path with a temp path created from the pytest tmp_path fixture (e.g., typo_path = tmp_path / "config.yaml") and ensure you do not create that file so it remains missing; update the test to use str(typo_path) where needed, keep the argv patching that calls main(), and leave the same SystemExit and captured output assertions (asserting "Configuration is valid", "Config file not found", and the tmp_path string) so the test is deterministic across environments.
🧹 Nitpick comments (15)
.github/workflows/validate.yml (1)
19-39: LGTM on SHA pins.
Consider adding the same "how to refresh pins" comment as in `codeql.yml` for consistency across workflows.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/validate.yml around lines 19 - 39: add the same "how to refresh pins" explanatory comment next to the pinned action SHAs in this workflow (e.g., the uses: actions/checkout@de0fac2e..., uses: actions/setup-python@a309ff8b4..., and uses: codecov/codecov-action@57e3a13... lines) so it matches codeql.yml; insert a brief one-line note explaining the source/command or script to update the SHA pins and where to run it (copy the exact phrasing from codeql.yml for consistency).
.github/workflows/release.yml (1)
485-495: Recommended: add a `concurrency:` group now that `git push -f` is gone.
Removing the force-push is correct — silently clobbering a parallel release's commits is worse than failing. The flip side is that two overlapping release runs (e.g., a tag push and a manual dispatch, or a tag + an rc cut close together) will each rebuild and sign packages, do all the repo work, and only discover the race at the final `git push`. The loser then has to be re-run manually.
Serializing the job with a concurrency group avoids the wasted work and the operator toil, while still surfacing real divergence if gh-pages was edited out of band.
♻️ Suggested change
```diff
 on:
   release:
     types: [published]
   workflow_dispatch:
@@
 permissions:
   contents: write
 
+# Serialize gh-pages deploys so concurrent releases don't race the
+# non-fast-forward push at the end of the job.
+concurrency:
+  group: release-gh-pages
+  cancel-in-progress: false
+
 jobs:
   build-packages:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/release.yml around lines 485 - 495, Add a concurrency group to the release workflow so overlapping runs are serialized and wasted work avoided: update the job that contains the "Deploy to gh-pages" step to include a concurrency: block (e.g., concurrency: { group: "${{ github.repository }}-release-gh-pages-${{ github.ref }}", cancel-in-progress: false } or similar) at the job level so concurrent release jobs targeting gh-pages are run one-at-a-time while still failing on real divergence from the remote before pushing with the existing git push step.

tests/test_multi_ups.py (1)
1279-1283: Assertion relaxation matches the new two-phase drain.

`_drain_all_groups` now joins each thread twice (short window post-`stop_event`, then final deadline join), so `assert_called_once()` would always fail. `assert live_thread.join.called` is correct. If you want a slightly stronger invariant without coupling to the exact count, `assert live_thread.join.call_count >= 2` would lock in the two-phase contract and catch regressions if someone collapses the loops back into one.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_multi_ups.py` around lines 1279 - 1283, The test was relaxed to use assert live_thread.join.called because _drain_all_groups now calls Thread.join twice per thread; update the assertion to enforce the two-phase contract by replacing assert live_thread.join.called with assert live_thread.join.call_count >= 2 so the test verifies at least two joins occurred for live_thread (keep the existing check for the timeout log: any("still running after drain timeout" in m for m in logs)).

src/eneru/shutdown/filesystems.py (1)
53-57: Dry-run log shows unsplit options.

Now that the real path splits `options` via `shlex.split`, the `[DRY-RUN]` line on Line 55 still prints the raw `{options}` string as a single token, which no longer mirrors the actual argv that would be exec'd (e.g. `"-l -f"` vs `-l -f`). Cosmetic only, but easy to align:

♻️ Proposed fix
```diff
-        if self.config.behavior.dry_run:
-            self._log_message(
-                f"   🧪 [DRY-RUN] Would execute: timeout {timeout}s umount {options} {mount_point}"
-            )
-            continue
+        if self.config.behavior.dry_run:
+            argv = ["umount", *shlex.split(options), mount_point] if options else ["umount", mount_point]
+            self._log_message(
+                f"   🧪 [DRY-RUN] Would execute: timeout {timeout}s {' '.join(shlex.quote(a) for a in argv)}"
+            )
+            continue
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/eneru/shutdown/filesystems.py` around lines 53 - 57, The dry-run message prints the raw options string which no longer matches the actual argv because the real execution uses shlex.split; update the dry-run branch in the same method that checks self.config.behavior.dry_run to reconstruct the tokenized form (e.g. split options with shlex.split and join with spaces or format as separate tokens) before calling self._log_message so the logged command mirrors the real exec args (use the same shlex.split logic already used elsewhere on the options variable).

src/eneru/stats.py (1)
608-619: URL-encoding the DB path — correct, and `urlquote`'s default `safe='/'` is what you want here.

POSIX paths with `?` or `#` would previously be truncated or mis-parsed into URI query/fragment components by `sqlite3.connect(..., uri=True)`. `quote(str(path))` with the default safe-set leaves `/` unescaped (so the URI remains a legal path) while encoding `?`, `#`, space, etc. Good.

Minor: the docstring still says "Returns `None` if the file doesn't exist" but the declared return type is `sqlite3.Connection` (non-Optional) with a `# type: ignore` on Line 610. Consider typing it `Optional[sqlite3.Connection]` so callers like `tui.py:208` (which already does `if conn is not None`) are type-checked honestly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/eneru/stats.py` around lines 608 - 619, The function currently returns None when the DB file doesn't exist but its signature/docstring declare a non-Optional sqlite3.Connection and use a "# type: ignore" to silence this; update the function signature return type to Optional[sqlite3.Connection], remove the "# type: ignore" suppression, and adjust the docstring to say "Returns None if the file doesn't exist" so callers (e.g., the code that checks if conn is not None) are correctly type-checked; no change to the urlquote/uri logic (urlquote, conn, sqlite3.connect, PRAGMA busy_timeout remain the same).

src/eneru/utils.py (1)
8-29: LGTM — stricter `is_numeric` and normalized `run_command` returns line up with the call-site assumptions.

Rejecting `bool` and non-finite floats matches how callers in `monitor.py`, `health/voltage.py`, and the shutdown mixins consume these values (all do `int(float(x))` or treat `depletion_rate > 0` as a number), and normalizing `stdout`/`stderr` to `""` matches `shutdown/containers.py:142` and `shutdown/remote.py:193` that unconditionally `.strip()` them. The negative-clamp in `format_seconds` is also a sensible display fix.

One small thing: moving `import math` to module top would be idiomatic (it's the only inline import here and `is_numeric` is on the hot path for every poll) — purely optional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/eneru/utils.py` around lines 8 - 29, The is_numeric function currently imports math inline; move the import to the module top for idiomatic style and to avoid the per-call import overhead: add a single top-level "import math" and remove the inline "import math" from inside is_numeric so the function continues to use math.isfinite without repeated imports.

tests/test_tui.py (1)
109-128: Source-string inspection is fine as a guardrail, but consider also asserting at call time.

Using `inspect.getsource` + substring/regex checks will silently drift if someone reformats the call (e.g. splits args across lines with a weird inner-space, binds it to a local alias like `me = query_events_for_display`, or renames a kwarg). A complementary behavioral assertion that patches `query_events_for_display` and inspects its actual call args during a short `run_tui` invocation would be more robust. Not blocking — the current checks do catch the exact regression you described.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_tui.py` around lines 109 - 128, Add a runtime assertion test that patches/wraps the query_events_for_display function during a short invocation of run_tui to inspect how it was called: use unittest.mock.patch or monkeypatch to replace eneru.tui.query_events_for_display with a wrapper that records args/kwargs, call tui_mod.run_tui(...) with minimal config, then assert the recorded call has no positional time-range argument (i.e., len(args) == 1 and args[0] is the config) and that 'max_events' is present in kwargs, and finally restore the original function; reference run_tui and query_events_for_display when locating where to add this behavioral check.

src/eneru/multi_ups.py (1)
265-307: Drain ordering is correct; minor nit on the Phase-1 join budget.

Signaling `stop_event` before running peer shutdown sequences avoids concurrent state-file writes and notifications inside peer poll cycles. The two-phase join (short window, then full-timeout) is the right shape.

Minor: `join_deadline = time.time() + max(1, timeout // 4)` caps the per-peer pre-drain wait at `timeout/4` total across all threads combined (since `remaining` shrinks monotonically as threads are joined serially). With default `timeout=120` that's a 30s combined budget — fine, but if a single peer hangs for the full 30s, later peers effectively get a 0-timeout join. Consider splitting the budget per-thread if a peer commonly takes seconds to exit its loop:

🛠️ Optional per-thread budget
```diff
-        self._stop_event.set()
-        join_deadline = time.time() + max(1, timeout // 4)
-        for thread in self._threads:
-            remaining = max(0.0, join_deadline - time.time())
-            thread.join(timeout=remaining)
+        self._stop_event.set()
+        per_thread_budget = max(1.0, (timeout / 4) / max(len(self._threads), 1))
+        for thread in self._threads:
+            thread.join(timeout=per_thread_budget)
```

Not blocking for 5.1.1.
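The per-thread shape can be exercised in isolation. A minimal sketch, assuming the default `timeout=120` from the review and cooperative workers that exit as soon as the event is set (the worker body is invented for illustration):

```python
import threading

stop_event = threading.Event()

def worker() -> None:
    # Stands in for a peer poll loop: blocks until signaled, then exits promptly.
    stop_event.wait()

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()

timeout = 120  # default drain timeout discussed in the review

# Phase 1: signal, then give each thread its own slice of the window
# instead of one shared deadline a single hung peer could exhaust.
stop_event.set()
per_thread_budget = max(1.0, (timeout / 4) / max(len(threads), 1))
for t in threads:
    t.join(timeout=per_thread_budget)
```

With three threads this yields a 10-second slice each; no thread can starve the others of their join budget.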
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/eneru/multi_ups.py` around lines 265 - 307, The pre-drain join loop in _drain_all_groups currently uses a single join_deadline (join_deadline = time.time() + max(1, timeout // 4)) which gives a combined budget across all threads and can starve later threads; change it to compute a per-thread join budget derived from that phase budget divided by max(1, len(self._threads)) (or a sane minimum per-thread like 0.5s) and use that per-thread timeout when calling thread.join so each thread gets its own slice of the Phase-1 window (adjust references to self._threads and join_deadline accordingly in _drain_all_groups).

tests/conftest.py (2)
245-275: LGTM — explicit target list is the right trade-off over `from X import Y` rebinding.

Documenting why a single `patch("eneru.utils.run_command")` is insufficient is exactly the context the next contributor will need. One suggestion for future-proofing: if you add another shutdown mixin (`shutdown/foo.py`) that does `from eneru.utils import run_command`, the fixture will silently under-patch it. Consider an assertion or a helper that discovers these sites automatically (e.g. scanning `eneru.shutdown` submodules for a `run_command` attribute) so new modules don't escape isolation unnoticed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/conftest.py` around lines 245 - 275, Replace the hardcoded targets in the patch_run_command_everywhere fixture with a discovery step that scans the eneru.shutdown package for submodules that expose a run_command symbol: use pkgutil.iter_modules over eneru.shutdown.__path__ and importlib.import_module to load each submodule, collect module paths where hasattr(module, "run_command"), then build the targets list from those module paths (plus any known non-shutdown locations like eneru.monitor and eneru.multi_ups). After discovery, assert that the discovered shutdown modules are included in the targets (or fail the test) so new shutdown submodules that import run_command don’t silently escape patching; keep the rest of the fixture (creating patchers, starting mocks, default return_value, yield, stop) unchanged.
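The discovery idea is plain `pkgutil` iteration. Since `eneru.shutdown` isn't importable here, this sketch demonstrates the pattern against the stdlib `logging` package, looking for submodules that bind an `os` name at module level the way the shutdown mixins bind `run_command`:

```python
import importlib
import pkgutil

def modules_binding(package, attr: str) -> list[str]:
    """Names of a package's direct submodules that expose `attr` at module level."""
    found = []
    for info in pkgutil.iter_modules(package.__path__):
        mod = importlib.import_module(f"{package.__name__}.{info.name}")
        if hasattr(mod, attr):
            found.append(mod.__name__)
    return sorted(found)

import logging
# logging.handlers does `import os`, so it shows up here; the same call
# against eneru.shutdown with attr="run_command" would feed the fixture's
# target list (or an assertion that the hardcoded list is complete).
discovered = modules_binding(logging, "os")
```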
42-103: Solid two-layer isolation; watch the `StatsConfig.__init__` wrapper signature if the dataclass grows.

The approach is correct for today's `StatsConfig`, but `patched_cfg_init(self, db_directory=isolated_str, **kw)` bakes in the assumption that `db_directory` is the only positional-friendly parameter. If someone later adds a second field before `db_directory` in the dataclass, a positional call like `StatsConfig("/path", 100)` will misroute the first arg. A slightly more future-proof shape is to detect whether the caller supplied `db_directory` and only inject the default otherwise:

🛠️ Optional hardening
```diff
-    def patched_cfg_init(self, db_directory=isolated_str, **kw):
-        return original_cfg_init(self, db_directory=db_directory, **kw)
+    def patched_cfg_init(self, *args, **kw):
+        if "db_directory" not in kw and not args:
+            kw["db_directory"] = isolated_str
+        return original_cfg_init(self, *args, **kw)
```

Not urgent — current tests pass cleanly with the existing shape.
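The hardened wrapper behaves like this against a toy stand-in (field names borrowed from the review; the dataclass body itself is invented, not the real eneru one):

```python
from dataclasses import dataclass

@dataclass
class StatsConfig:  # toy stand-in for the real dataclass
    db_directory: str = "/var/lib/eneru"
    retention_days: int = 30

isolated_str = "/tmp/test-isolated"
original_cfg_init = StatsConfig.__init__

def patched_cfg_init(self, *args, **kw):
    # Only inject the isolated default when the caller supplied nothing,
    # so positional calls keep routing to the right fields.
    if "db_directory" not in kw and not args:
        kw["db_directory"] = isolated_str
    return original_cfg_init(self, *args, **kw)

StatsConfig.__init__ = patched_cfg_init
```

A positional `StatsConfig("/explicit")` now reaches the original init untouched, while a bare `StatsConfig()` still lands in the isolated directory.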
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/conftest.py` around lines 42 - 103, The patched StatsConfig.__init__ currently assumes callers won't pass positional args for db_directory; update patched_cfg_init to accept (*args, **kw) and only inject db_directory=isolated_str when the caller did not supply it (i.e. 'db_directory' not in kw and no positional arg was provided for it). Concretely, change patched_cfg_init to signature patched_cfg_init(self, *args, **kw), and if 'db_directory' not in kw and len(args) == 0 set kw['db_directory'] = isolated_str before calling original_cfg_init(self, *args, **kw); otherwise forward arguments unchanged to original_cfg_init. This preserves current behavior but avoids misrouting when new dataclass fields are added before db_directory.

src/eneru/shutdown/remote.py (2)
93-120: LGTM on timeout and thread-worker hardening.

The `is not None` check correctly preserves per-command `timeout=0` as an explicit override (matches `RemoteCommandConfig.timeout: Optional[int] = None` in config.py and the test at test_monitor_core.py:877-886). The `except Exception` in the worker thread at line 113 is appropriate here despite Ruff BLE001 — a bare crash in a thread target would otherwise disappear silently with no log trail. Consider adding `# noqa: BLE001` on line 113 to document the deliberate catch-all and silence the static analyzer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/eneru/shutdown/remote.py` around lines 93 - 120, Add an inline noqa to the broad exception handler in shutdown_server_thread to document the deliberate catch-all and silence the Ruff BLE001 warning: locate the except Exception as exc: block inside the shutdown_server_thread function (in src/eneru/shutdown/remote.py) and append a trailing comment "# noqa: BLE001" to that except line, keeping the existing log message and behavior intact.
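The `is not None` distinction the comment praises fits in a few lines (the helper name is hypothetical, not taken from remote.py):

```python
from typing import Optional

def effective_timeout(cmd_timeout: Optional[int], default: int) -> int:
    # `cmd_timeout or default` would silently discard an explicit 0;
    # the identity check preserves it as a deliberate per-command override.
    return cmd_timeout if cmd_timeout is not None else default

assert effective_timeout(None, 30) == 30  # unset -> fall back to default
assert effective_timeout(0, 30) == 0      # explicit 0 survives
assert (0 or 30) == 30                    # the truthiness bug this avoids
```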
232-249: Validate the `stop_compose` path before formatting the template.

The validation that `stop_compose` has a `path` at lines 245-249 runs after the template has already been rendered with `shlex.quote("")` at line 240. While the `continue` prevents the malformed command from being sent, it leaves the warning effectively dead as a correctness guard — and it obscures intent: for any future template that made `path` mandatory in a way `shlex.quote("")` couldn't satisfy, the ordering would let a bad command slip through. Minor reordering makes the precondition check authoritative.

♻️ Proposed reorder
```diff
         if cmd_config.action:
             action_name = cmd_config.action.lower()
             if action_name not in REMOTE_ACTIONS:
                 self._log_message(
                     f"   ⚠️ [{idx}/{cmd_count}] Unknown action: {action_name} (skipping)"
                 )
                 continue
 
+            # Validate stop_compose has path
+            if action_name == "stop_compose" and not cmd_config.path:
+                self._log_message(
+                    f"   ⚠️ [{idx}/{cmd_count}] stop_compose requires 'path' parameter (skipping)"
+                )
+                continue
+
             # Get command template and substitute placeholders.
             # `path` is shlex-quoted because the template embeds it
             # directly into the remote shell — without quoting, a
             # malicious or malformed path could expand $(), `…`, or
             # ${…} on the remote host.
             command_template = REMOTE_ACTIONS[action_name]
             command = command_template.format(
                 timeout=timeout, path=shlex.quote(cmd_config.path or "")
             )
             description = action_name
-
-            # Validate stop_compose has path
-            if action_name == "stop_compose" and not cmd_config.path:
-                self._log_message(
-                    f"   ⚠️ [{idx}/{cmd_count}] stop_compose requires 'path' parameter (skipping)"
-                )
-                continue
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/eneru/shutdown/remote.py` around lines 232 - 249, Move the "stop_compose requires 'path'" precondition check to run before you fetch/format the REMOTE_ACTIONS template: in the loop where you access REMOTE_ACTIONS[action_name] and call command_template.format(...), check if action_name == "stop_compose" and ensure cmd_config.path is present first (the block that logs the warning and continue), then only construct command_template and call shlex.quote(cmd_config.path) and format the template; this ensures the validation in the loop for stop_compose is authoritative and prevents any template formatting with an empty path.

src/eneru/tui.py (2)
1032-1036: Nit: redundant `max(0, …)` on KEY_HOME.

`len(log_events)` is already non-negative, so `max(0, len(log_events))` is equivalent to `len(log_events)`. The inner clamp inside `render_logs_panel` (lines 721-722) is what actually pins the view to the oldest event — this assignment just needs to overshoot. Trivial simplification:

```diff
 elif key == curses.KEY_HOME:
     # Jump to the oldest event currently in the list.
-    events_scroll = max(0, len(log_events))
+    events_scroll = len(log_events)
 elif key == curses.KEY_END:
     events_scroll = 0
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/eneru/tui.py` around lines 1032 - 1036, The KEY_HOME handler uses max(0, len(log_events)) unnecessarily; replace the assignment events_scroll = max(0, len(log_events)) with the simpler events_scroll = len(log_events) in the input/key handling block where KEY_HOME is processed (the same block that sets events_scroll for KEY_END), since render_logs_panel already clamps the scroll; reference variables/events_scroll, log_events and the render_logs_panel function to locate the change.
962-972: Scroll keys are a no-op in normal mode — consider making `<M>` implicit on first scroll.

When `show_more=False`, `events_cap = EVENTS_MAX_ROWS_NORMAL = 8` at line 963, so `query_events_for_display` pulls at most 8 rows. The events panel typically displays ~8 rows, meaning ↑/PgUp/Home have nothing to scroll to and silently do nothing, despite the always-visible `<↑↓> Scroll` hint at line 759.

The comment at lines 34-39 says the panel "pulls every event from the SQLite store … and lets the operator scroll through with the arrow keys" — but that's only true once the user presses `<M>`. Two low-friction options:

- Auto-promote to `show_more=True` when the user presses ↑/PgUp/Home (keep `<M>` as an explicit toggle).
- Dim the `<↑↓> Scroll` hint when `show_more=False`, or change its label to match reality.

Either clears up the "my arrow keys are broken" moment for first-time users. Not a correctness issue.
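The first option is a small state transition in the key handler. A minimal sketch with invented state names and an assumed value for the expanded cap (the real handler and constants live in tui.py):

```python
EVENTS_MAX_ROWS_NORMAL = 8
EVENTS_MAX_ROWS_MORE = 100  # assumed value, for illustration only

def on_scroll_key(state: dict) -> dict:
    """Promote to the expanded view on the first scroll attempt."""
    if not state["show_more"]:
        state["show_more"] = True
        state["events_cap"] = EVENTS_MAX_ROWS_MORE
        # the caller would re-run query_events_for_display(max_events=events_cap)
    state["events_scroll"] += 1
    return state

state = {"show_more": False, "events_cap": EVENTS_MAX_ROWS_NORMAL, "events_scroll": 0}
on_scroll_key(state)
on_scroll_key(state)  # promotion is idempotent: stays expanded, keeps scrolling
```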
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/eneru/tui.py` around lines 962 - 972, The events panel is limited to EVENTS_MAX_ROWS_NORMAL when show_more is False, making arrow/PgUp/Home appear to do nothing; modify the scroll-key handler (the function that processes ↑/PgUp/Home) to detect the first scroll attempt when show_more is False, set show_more = True, recompute events_cap = EVENTS_MAX_ROWS_MORE and re-run query_events_for_display(config, max_events=events_cap) (or fallback to parse_log_events) before performing the actual scroll action so the implicit "<M>" behavior is applied on first scroll; ensure the state change is idempotent so subsequent scrolls behave normally.

tests/test_monitor_core.py (1)
320-331: Cross-class reference to a helper defined much later in the file.

`TestFailsafe.test_connection_lost_while_ob_triggers_shutdown` calls `TestAdvisoryTriggers._run_one_iteration` (defined 600+ lines below at line 1026). It works because the reference is resolved at test-execution time, but the coupling is awkward — a reader scanning `TestFailsafe` has no hint where `_run_one_iteration` lives, and `TestAdvisoryTriggers` is now an implicit dependency of `TestFailsafe`.

Consider promoting `_run_one_iteration` to a module-level helper (e.g. `def _run_one_iteration(monitor, ups_data_response):` near `make_monitor`) so both test classes reference it symmetrically.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_monitor_core.py` around lines 320 - 331, TestFailsafe.test_connection_lost_while_ob_triggers_shutdown references TestAdvisoryTriggers._run_one_iteration (defined much later), creating a confusing cross-class dependency; move the helper to module scope so both tests can call it. Extract the implementation of TestAdvisoryTriggers._run_one_iteration into a top-level function (e.g. def _run_one_iteration(monitor, ups_data_response):) placed near make_monitor, update both TestAdvisoryTriggers and TestFailsafe to call the module-level _run_one_iteration, and remove or deprecate the class-level helper to eliminate the implicit dependency while preserving behavior used in test_connection_lost_while_ob_triggers_shutdown and other tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/integration.yml:
- Around line 197-203: The dnf fallback using "--skip-broken" can hide a failed
eneru install when rpm fails, so change the "Install dependencies and package"
step (the rpm -ivh --nodeps || dnf install -y ./eneru-*.rpm --skip-broken
command) to either not use --skip-broken or to explicitly verify the install
succeeded afterward; e.g., run dnf without --skip-broken (so failure propagates)
or keep the fallback but add a check that the installed artifact exists (test -f
/opt/ups-monitor/eneru.py or rpm -q eneru) and exit non-zero if it’s missing so
the job fails fast.
In `@src/eneru/config.py`:
- Around line 592-605: The code uses notif_data.get(..., default) which won't
protect against explicit nulls; update the notifications parsing so you only
override the defaults when the YAML value is present and not None — e.g., read
tmp = notif_data.get('suppress') and if tmp is not None then set notif_suppress
= tmp (same for notif_voltage_hysteresis), and ensure any non-None value is of
the expected type before assigning (to satisfy NotificationsConfig and the
checks in validate_config); refer to notif_suppress, notif_voltage_hysteresis,
NotificationsConfig and validate_config to locate and apply this change.
In `@src/eneru/graph.py`:
- Around line 126-137: The bounds-calculation currently filters numeric values
using isinstance(v, (int, float)), which excludes numeric strings; change the
filter in the y_min/y_max auto-bound logic (the block that builds numeric =
[...]) to accept any value coercible by float() by attempting float(v) in a
try/except and collecting the float-converted values, then compute min/max from
those floats; make the same adjustment in the duplicate bounds block used
elsewhere (the similar numeric-filtering code later in the file) so plotting and
scaling accept numeric strings and other float-coercible inputs.
In `@src/eneru/logger.py`:
- Line 26: The logger name is still "ups-monitor" which breaks the syslog
rebrand; update the call to logging.getLogger(...) in the class where
self.logger is assigned (the self.logger = logging.getLogger("ups-monitor")
statement) to use "eneru" instead so the logger identifier matches the rest of
the codebase and docs.
In `@src/eneru/version.py`:
- Around line 4-5: Update the stale format comment to reflect the current
tagged-release example by changing "5.1.0" to "5.1.1" so the comment above the
__version__ assignment matches the bumped value; locate the comment near the
__version__ = "5.1.1" line and update the example string in that comment
accordingly.
In `@tests/test_redundancy.py`:
- Around line 363-370: The test currently checks ev.is_alive() after a fixed
sleep which can race with the evaluator calling snapshot(); modify the test to
wait (with a short timeout) until the failing snapshot path has actually been
exercised by polling the bad_monitor.state.snapshot flag/call indicator (or
equivalent sentinel) before asserting ev.is_alive(); in practice add a small
loop that repeatedly checks bad_monitor.state.snapshot (or a boolean set by the
snapshot implementation) and breaks on success or fails after timeout, then
perform the ev.is_alive() assertion so the check is gated on snapshot having
been invoked.
In `@tests/test_shutdown_vms.py`:
- Around line 118-127: The test currently only checks membership for per-VM
shutdown commands, which doesn't enforce order; update the assertions after
collecting issued = [c.args[0] for c in mock_run.call_args_list] to assert the
shutdown commands occur in the expected order (e.g., compute indices of
["virsh","shutdown","vm1"] and ["virsh","shutdown","vm2"] in issued and assert
index(vm1) < index(vm2) or assert the slice of issued contains the two shutdown
commands in that exact sequence), leaving the existing checks for the initial
"virsh list ..." and the absence of "virsh destroy" intact.
---
Outside diff comments:
In `@examples/config-homelab.yaml`:
- Around line 68-99: The example text describing remote_servers is outdated: it
only explains the legacy two-phase "parallel" model and misleads users about how
to schedule a server "last"; update the guidance to mention the new
shutdown_order phases and how they interact with the old parallel flag (i.e.,
that shutdown_order: <int> controls later phases and parallel: false runs a
server sequentially before the parallel batch per RemoteServerConfig), and
clarify that to run something after the parallel batch you must set
shutdown_order >= 1 (example: NAS uses shutdown_order: 2) rather than using
parallel: false.
---
Nitpick comments:
In @.github/workflows/release.yml:
- Around line 485-495: Add a concurrency group to the release workflow so
overlapping runs are serialized and wasted work avoided: update the job that
contains the "Deploy to gh-pages" step to include a concurrency: block (e.g.,
concurrency: { group: "${{ github.repository }}-release-gh-pages-${{ github.ref
}}", cancel-in-progress: false } or similar) at the job level so concurrent
release jobs targeting gh-pages are run one-at-a-time while still failing on
real divergence from the remote before pushing with the existing git push step.
In @.github/workflows/validate.yml:
- Around line 19-39: Add the same "how to refresh pins" explanatory comment next
to the pinned action SHAs in this workflow (e.g., the uses:
actions/checkout@de0fac2e..., uses: actions/setup-python@a309ff8b4..., and uses:
codecov/codecov-action@57e3a13... lines) so it matches codeql.yml; insert a
brief one-line note explaining the source/command or script to update the SHA
pins and where to run it (copy the exact phrasing from codeql.yml for
consistency).
In `@src/eneru/multi_ups.py`:
- Around line 265-307: The pre-drain join loop in _drain_all_groups currently
uses a single join_deadline (join_deadline = time.time() + max(1, timeout // 4))
which gives a combined budget across all threads and can starve later threads;
change it to compute a per-thread join budget derived from that phase budget
divided by max(1, len(self._threads)) (or a sane minimum per-thread like 0.5s)
and use that per-thread timeout when calling thread.join so each thread gets its
own slice of the Phase-1 window (adjust references to self._threads and
join_deadline accordingly in _drain_all_groups).
In `@src/eneru/shutdown/filesystems.py`:
- Around line 53-57: The dry-run message prints the raw options string which no
longer matches the actual argv because the real execution uses shlex.split;
update the dry-run branch in the same method that checks
self.config.behavior.dry_run to reconstruct the tokenized form (e.g. split
options with shlex.split and join with spaces or format as separate tokens)
before calling self._log_message so the logged command mirrors the real exec
args (use the same shlex.split logic already used elsewhere on the options
variable).
In `@src/eneru/shutdown/remote.py`:
- Around line 93-120: Add an inline noqa to the broad exception handler in
shutdown_server_thread to document the deliberate catch-all and silence the Ruff
BLE001 warning: locate the except Exception as exc: block inside the
shutdown_server_thread function (in src/eneru/shutdown/remote.py) and append a
trailing comment "# noqa: BLE001" to that except line, keeping the existing log
message and behavior intact.
- Around line 232-249: Move the "stop_compose requires 'path'" precondition
check to run before you fetch/format the REMOTE_ACTIONS template: in the loop
where you access REMOTE_ACTIONS[action_name] and call
command_template.format(...), check if action_name == "stop_compose" and ensure
cmd_config.path is present first (the block that logs the warning and continue),
then only construct command_template and call shlex.quote(cmd_config.path) and
format the template; this ensures the validation in the loop for stop_compose is
authoritative and prevents any template formatting with an empty path.
In `@src/eneru/stats.py`:
- Around line 608-619: The function currently returns None when the DB file
doesn't exist but its signature/docstring declare a non-Optional
sqlite3.Connection and use a "# type: ignore" to silence this; update the
function signature return type to Optional[sqlite3.Connection], remove the "#
type: ignore" suppression, and adjust the docstring to say "Returns None if the
file doesn't exist" so callers (e.g., the code that checks if conn is not None)
are correctly type-checked; no change to the urlquote/uri logic (urlquote, conn,
sqlite3.connect, PRAGMA busy_timeout remain the same).
In `@src/eneru/tui.py`:
- Around line 1032-1036: The KEY_HOME handler uses max(0, len(log_events))
unnecessarily; replace the assignment events_scroll = max(0, len(log_events))
with the simpler events_scroll = len(log_events) in the input/key handling block
where KEY_HOME is processed (the same block that sets events_scroll for
KEY_END), since render_logs_panel already clamps the scroll; reference
variables/events_scroll, log_events and the render_logs_panel function to locate
the change.
- Around line 962-972: The events panel is limited to EVENTS_MAX_ROWS_NORMAL
when show_more is False, making arrow/PgUp/Home appear to do nothing; modify the
scroll-key handler (the function that processes ↑/PgUp/Home) to detect the first
scroll attempt when show_more is False, set show_more = True, recompute
events_cap = EVENTS_MAX_ROWS_MORE and re-run query_events_for_display(config,
max_events=events_cap) (or fallback to parse_log_events) before performing the
actual scroll action so the implicit "<M>" behavior is applied on first scroll;
ensure the state change is idempotent so subsequent scrolls behave normally.
In `@src/eneru/utils.py`:
- Around line 8-29: The is_numeric function currently imports math inline; move
the import to the module top for idiomatic style and to avoid the per-call
import overhead: add a single top-level "import math" and remove the inline
"import math" from inside is_numeric so the function continues to use
math.isfinite without repeated imports.
In `@tests/conftest.py`:
- Around line 245-275: Replace the hardcoded targets in the
patch_run_command_everywhere fixture with a discovery step that scans the
eneru.shutdown package for submodules that expose a run_command symbol: use
pkgutil.iter_modules over eneru.shutdown.__path__ and importlib.import_module to
load each submodule, collect module paths where hasattr(module, "run_command"),
then build the targets list from those module paths (plus any known non-shutdown
locations like eneru.monitor and eneru.multi_ups). After discovery, assert that
the discovered shutdown modules are included in the targets (or fail the test)
so new shutdown submodules that import run_command don’t silently escape
patching; keep the rest of the fixture (creating patchers, starting mocks,
default return_value, yield, stop) unchanged.
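The discovery step described above can be sketched as a standalone helper. The function name and the throwaway `fakepkg` demo are illustrative; in the real fixture the package scanned would be eneru.shutdown:

```python
import importlib
import os
import pkgutil
import sys
import tempfile


def discover_modules_with(package, symbol):
    """Return dotted paths of `package` submodules whose top level
    defines `symbol`, via pkgutil.iter_modules + importlib."""
    found = []
    for info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package.__name__}.{info.name}")
        if hasattr(module, symbol):
            found.append(module.__name__)
    return found


# Demo against a throwaway on-disk package (stands in for eneru.shutdown):
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "fakepkg"))
for name, body in [
    ("__init__.py", ""),
    ("a.py", "def run_command():\n    pass\n"),
    ("b.py", "x = 1\n"),
]:
    with open(os.path.join(root, "fakepkg", name), "w") as fh:
        fh.write(body)
sys.path.insert(0, root)
import fakepkg

targets = discover_modules_with(fakepkg, "run_command")
```

Only `fakepkg.a` is discovered; a module that merely defines other names is skipped, which is exactly the property that keeps new shutdown submodules from escaping the patch list.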
- Around line 42-103: The patched StatsConfig.__init__ currently assumes callers
won't pass positional args for db_directory; update patched_cfg_init to accept
(*args, **kw) and only inject db_directory=isolated_str when the caller did not
supply it (i.e. 'db_directory' not in kw and no positional arg was provided for
it). Concretely, change patched_cfg_init to signature patched_cfg_init(self,
*args, **kw), and if 'db_directory' not in kw and len(args) == 0 set
kw['db_directory'] = isolated_str before calling original_cfg_init(self, *args,
**kw); otherwise forward arguments unchanged to original_cfg_init. This
preserves current behavior but avoids misrouting when new dataclass fields are
added before db_directory.
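A self-contained sketch of that wrapper, using a stand-in dataclass (only db_directory matches the real StatsConfig; the other field and the isolated path are assumptions):

```python
from dataclasses import dataclass


@dataclass
class StatsConfig:
    """Stand-in for eneru's real StatsConfig."""
    db_directory: str = "/var/lib/eneru"
    retention_days: int = 30


original_cfg_init = StatsConfig.__init__
isolated_str = "/tmp/eneru-test-stats"  # hypothetical isolated directory


def patched_cfg_init(self, *args, **kw):
    # Inject the isolated default only when the caller supplied neither
    # a positional nor a keyword value for db_directory (the first field).
    if "db_directory" not in kw and len(args) == 0:
        kw["db_directory"] = isolated_str
    original_cfg_init(self, *args, **kw)


StatsConfig.__init__ = patched_cfg_init
```

Explicit positional and keyword values pass through untouched; only the bare default is redirected, so the fixture stays correct even if fields are later reordered after db_directory.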
In `@tests/test_monitor_core.py`:
- Around line 320-331:
TestFailsafe.test_connection_lost_while_ob_triggers_shutdown references
TestAdvisoryTriggers._run_one_iteration (defined much later), creating a
confusing cross-class dependency; move the helper to module scope so both tests
can call it. Extract the implementation of
TestAdvisoryTriggers._run_one_iteration into a top-level function (e.g. def
_run_one_iteration(monitor, ups_data_response):) placed near make_monitor,
update both TestAdvisoryTriggers and TestFailsafe to call the module-level
_run_one_iteration, and remove or deprecate the class-level helper to eliminate
the implicit dependency while preserving behavior used in
test_connection_lost_while_ob_triggers_shutdown and other tests.
In `@tests/test_multi_ups.py`:
- Around line 1279-1283: The test was relaxed to use assert
live_thread.join.called because _drain_all_groups now calls Thread.join twice
per thread; update the assertion to enforce the two-phase contract by replacing
assert live_thread.join.called with assert live_thread.join.call_count >= 2 so
the test verifies at least two joins occurred for live_thread (keep the existing
check for the timeout log: any("still running after drain timeout" in m for m in
logs)).
In `@tests/test_tui.py`:
- Around line 109-128: Add a runtime assertion test that patches/wraps the
query_events_for_display function during a short invocation of run_tui to
inspect how it was called: use unittest.mock.patch or monkeypatch to replace
eneru.tui.query_events_for_display with a wrapper that records args/kwargs, call
tui_mod.run_tui(...) with minimal config, then assert the recorded call has no
positional time-range argument (i.e., len(args) == 1 and args[0] is the config)
and that 'max_events' is present in kwargs, and finally restore the original
function; reference run_tui and query_events_for_display when locating where to
add this behavioral check.
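The record-and-assert pattern can be demonstrated without importing eneru at all. Everything here is a stand-in (a SimpleNamespace plays the tui module; the real names are eneru.tui.run_tui and query_events_for_display), but the calling convention under test — config positional, max_events keyword — is the same:

```python
import types
from unittest import mock

# Fake "tui module" whose run_tui queries events the way the real one
# is expected to: config positionally, max_events by keyword.
tui_mod = types.SimpleNamespace()
tui_mod.query_events_for_display = lambda config, max_events=8: []
tui_mod.run_tui = lambda config: tui_mod.query_events_for_display(
    config, max_events=500
)

recorded = []
original = tui_mod.query_events_for_display


def recorder(*args, **kwargs):
    # Record the call shape, then delegate to the real function.
    recorded.append((args, kwargs))
    return original(*args, **kwargs)


with mock.patch.object(tui_mod, "query_events_for_display", recorder):
    tui_mod.run_tui({"ups": "dummy"})

args, kwargs = recorded[0]
assert len(args) == 1           # no positional time-range argument
assert "max_events" in kwargs
assert tui_mod.query_events_for_display is original  # patch auto-restored
```

mock.patch.object restores the original attribute on context exit, which covers the "finally restore the original function" requirement for free.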
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7db37aa5-3772-4abb-945b-20cc83661dd1
📒 Files selected for processing (49)
.github/actions/e2e-setup/action.yml, .github/workflows/codeql.yml, .github/workflows/e2e.yml, .github/workflows/integration.yml, .github/workflows/pypi.yml, .github/workflows/release.yml, .github/workflows/validate.yml, docs/changelog.md, examples/config-dual-ups.yaml, examples/config-enterprise.yaml, examples/config-homelab.yaml, examples/config-reference.yaml, packaging/scripts/postinstall.sh, packaging/scripts/postremove.sh, packaging/scripts/preremove.sh, pyproject.toml, pytest.ini, src/eneru/actions.py, src/eneru/cli.py, src/eneru/completion/eneru.bash, src/eneru/config.py, src/eneru/graph.py, src/eneru/health/battery.py, src/eneru/health/voltage.py, src/eneru/logger.py, src/eneru/monitor.py, src/eneru/multi_ups.py, src/eneru/redundancy.py, src/eneru/shutdown/filesystems.py, src/eneru/shutdown/remote.py, src/eneru/shutdown/vms.py, src/eneru/stats.py, src/eneru/tui.py, src/eneru/utils.py, src/eneru/version.py, tests/conftest.py, tests/e2e/groups/multi-ups.sh, tests/e2e/groups/single-ups.sh, tests/e2e/scenarios/low-battery.dev, tests/e2e/ssh-target/Dockerfile, tests/e2e/ssh-target/entrypoint.sh, tests/test_cli.py, tests/test_monitor_core.py, tests/test_multi_ups.py, tests/test_redundancy.py, tests/test_remote_commands.py, tests/test_shutdown_vms.py, tests/test_stats.py, tests/test_tui.py
  - name: Install dependencies and package
    run: |
      dnf install -y ${{ matrix.python_pkg }}
-     # Install the .rpm package (ignore dependency on nut-client for CI)
-     rpm -ivh --nodeps eneru-*.rpm || dnf install -y ./eneru-*.rpm --skip-broken || true
+     # Try rpm first (fast, with --nodeps for the missing nut-client).
+     # If that fails for non-dep reasons, fall back to dnf with
+     # --skip-broken. Real install failures still propagate.
+     rpm -ivh --nodeps eneru-*.rpm || dnf install -y ./eneru-*.rpm --skip-broken
Minor: dnf install … --skip-broken can still hide a failed eneru install.
Dropping || true is the right call. One residual gap: if the initial rpm -ivh --nodeps fails for a non-dep reason and the fallback runs, dnf --skip-broken will happily exit 0 while silently skipping the eneru-*.rpm itself (e.g., signature/repo issues). The subsequent test -f /opt/ups-monitor/eneru.py in the Verify step does catch this, so it's not a correctness bug today — just worth noting if you ever remove that sanity check.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/integration.yml around lines 197 - 203, The dnf fallback
using "--skip-broken" can hide a failed eneru install when rpm fails, so change
the "Install dependencies and package" step (the rpm -ivh --nodeps || dnf
install -y ./eneru-*.rpm --skip-broken command) to either not use --skip-broken
or to explicitly verify the install succeeded afterward; e.g., run dnf without
--skip-broken (so failure propagates) or keep the fallback but add a check that
the installed artifact exists (test -f /opt/ups-monitor/eneru.py or rpm -q
eneru) and exit non-zero if it’s missing so the job fails fast.
# Defaults match NotificationsConfig dataclass.
notif_suppress: List[str] = []
notif_voltage_hysteresis = 30

if 'notifications' in data:
    notif_data = data['notifications']
    notif_title = notif_data.get('title')
    avatar_url = notif_data.get('avatar_url')
    notif_timeout = notif_data.get('timeout', 10)
    notif_retry_interval = notif_data.get('retry_interval', 5)
    notif_suppress = notif_data.get('suppress', notif_suppress)
    notif_voltage_hysteresis = notif_data.get(
        'voltage_hysteresis_seconds', notif_voltage_hysteresis,
    )
suppress: null / voltage_hysteresis_seconds: null in YAML bypasses the defaults.
dict.get(key, default) only returns default when the key is absent. If a user writes:

notifications:
  suppress: null              # or just: suppress:
  voltage_hysteresis_seconds:

…the key is present with value None, so notif_suppress becomes None and notif_voltage_hysteresis becomes None, both of which then flow straight into NotificationsConfig(...) and violate the declared List[str] / int contract. validate_config does catch the int case (line 935, not isinstance(hys, int)), but suppress=None slips through: an `if config.notifications.suppress:` guard treats None as falsy, so it never gets validated, while any later code assuming a list would break.
🛡️ Proposed fix
- notif_suppress = notif_data.get('suppress', notif_suppress)
- notif_voltage_hysteresis = notif_data.get(
-     'voltage_hysteresis_seconds', notif_voltage_hysteresis,
- )
+ if notif_data.get('suppress') is not None:
+     notif_suppress = notif_data['suppress']
+ if notif_data.get('voltage_hysteresis_seconds') is not None:
+     notif_voltage_hysteresis = notif_data['voltage_hysteresis_seconds']

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/eneru/config.py` around lines 592 - 605, The code uses
notif_data.get(..., default) which won't protect against explicit nulls; update
the notifications parsing so you only override the defaults when the YAML value
is present and not None — e.g., read tmp = notif_data.get('suppress') and if tmp
is not None then set notif_suppress = tmp (same for notif_voltage_hysteresis),
and ensure any non-None value is of the expected type before assigning (to
satisfy NotificationsConfig and the checks in validate_config); refer to
notif_suppress, notif_voltage_hysteresis, NotificationsConfig and
validate_config to locate and apply this change.
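The `.get`-vs-`None` pitfall and the proposed guard can be reproduced in isolation. A minimal sketch (the function name and defaults are stand-ins; eneru's real parsing lives in src/eneru/config.py):

```python
DEFAULT_SUPPRESS = []       # stand-ins for the NotificationsConfig defaults
DEFAULT_HYSTERESIS = 30


def parse_notifications(notif_data):
    """None-safe parsing as proposed in the review: only override the
    defaults when the key is present AND its value is not None."""
    suppress = DEFAULT_SUPPRESS
    hysteresis = DEFAULT_HYSTERESIS
    if notif_data.get("suppress") is not None:
        suppress = notif_data["suppress"]
    if notif_data.get("voltage_hysteresis_seconds") is not None:
        hysteresis = notif_data["voltage_hysteresis_seconds"]
    return suppress, hysteresis


# The failure mode: `suppress: null` in YAML loads as {"suppress": None},
# and dict.get(key, default) returns None because the key IS present.
assert {"suppress": None}.get("suppress", DEFAULT_SUPPRESS) is None
```

With the guard, both the absent-key and explicit-null cases fall back to the defaults, while real values still win.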
# Auto-bound from the numeric subset only — `data` may carry
# None or non-numeric placeholders for missing samples; min/max
# over a mixed list raises TypeError on Python 3.
if y_min is None or y_max is None:
    numeric = [v for v in data if isinstance(v, (int, float))]
    if not numeric:
        # Nothing to scale against; render an empty grid.
        return [" " * width for _ in range(height)]
    if y_min is None:
        y_min = min(numeric)
    if y_max is None:
        y_max = max(numeric)
Align bounds calculation with plotting's numeric coercion.
Bounds use isinstance(v, (int, float)) but plotting accepts anything float(value) can coerce. This can cause graphs to render blank when data contains numeric strings like "230.5".
Proposed fix
+import math
 import locale
 import os
@@
     if y_min is None or y_max is None:
-        numeric = [v for v in data if isinstance(v, (int, float))]
+        numeric = []
+        for value in data:
+            try:
+                v = float(value)
+            except (TypeError, ValueError):
+                continue
+            if math.isfinite(v):
+                numeric.append(v)
         if not numeric:
             # Nothing to scale against; render an empty grid.
             return [" " * width for _ in range(height)]

Also applies to: 167-170
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/eneru/graph.py` around lines 126 - 137, The bounds-calculation currently
filters numeric values using isinstance(v, (int, float)), which excludes numeric
strings; change the filter in the y_min/y_max auto-bound logic (the block that
builds numeric = [...]) to accept any value coercible by float() by attempting
float(v) in a try/except and collecting the float-converted values, then compute
min/max from those floats; make the same adjustment in the duplicate bounds
block used elsewhere (the similar numeric-filtering code later in the file) so
plotting and scaling accept numeric strings and other float-coercible inputs.
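The filter the review proposes can be sketched as a standalone helper (the name `numeric_bounds` is illustrative; eneru's code computes `y_min`/`y_max` inline in graph.py):

```python
import math


def numeric_bounds(data):
    """Return (min, max) over every float-coercible, finite value in
    `data`, or None when nothing survives the filter. Mirrors the
    proposed bounds logic: accept numeric strings like "230.5", drop
    None, non-numeric placeholders, NaN, and Inf."""
    numeric = []
    for value in data:
        try:
            v = float(value)
        except (TypeError, ValueError):
            continue
        if math.isfinite(v):
            numeric.append(v)
    if not numeric:
        return None
    return min(numeric), max(numeric)
```

This keeps bounds calculation aligned with a plotting path that coerces via `float(value)`, so a series of numeric strings no longer renders a blank grid.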
@@ -26,8 +26,20 @@ def __init__(self, log_file: Optional[str], config: Config):
     self.logger = logging.getLogger("ups-monitor")
Update logger name from "ups-monitor" to "eneru" to complete the syslog rebrand.
The changelog and code comments confirm the syslog identifier was renamed to "eneru" in v5.0 (see docs/changelog.md and the comment in src/eneru/monitor.py line 391), but logging.getLogger("ups-monitor") here was not updated. This inconsistency breaks downstream syslog filtering.
Proposed change
- self.logger = logging.getLogger("ups-monitor")
+ self.logger = logging.getLogger("eneru")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/eneru/logger.py` at line 26, The logger name is still "ups-monitor" which
breaks the syslog rebrand; update the call to logging.getLogger(...) in the
class where self.logger is assigned (the self.logger =
logging.getLogger("ups-monitor") statement) to use "eneru" instead so the logger
identifier matches the rest of the codebase and docs.
# Format: "5.1.0" for tagged releases, "5.1.0-5-gabcdef1" for dev builds
__version__ = "5.1.1"
Stale example in the format comment.
The comment still uses 5.1.0 as the tagged-release example after bumping __version__ to 5.1.1. Minor, but worth refreshing to keep the example aligned with the current version.
📝 Proposed tweak
-# Format: "5.1.0" for tagged releases, "5.1.0-5-gabcdef1" for dev builds
+# Format: "5.1.1" for tagged releases, "5.1.1-5-gabcdef1" for dev builds
 __version__ = "5.1.1"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/eneru/version.py` around lines 4 - 5, Update the stale format comment to
reflect the current tagged-release example by changing "5.1.0" to "5.1.1" so the
comment above the __version__ assignment matches the bumped value; locate the
comment near the __version__ = "5.1.1" line and update the example string in
that comment accordingly.
ev.start()
time.sleep(0.15)
# F3: assert the evaluator thread survived the exception storm
# BEFORE we signal stop. A thread killed by the unhandled
# RuntimeError would still leave `not ev.is_alive()` True after
# join (a dead thread joins instantly), so the original test
# would silently pass even if the swallowing contract broke.
assert ev.is_alive()
Wait until the failing snapshot path is exercised.
ev.is_alive() after a fixed sleep can pass even if the evaluator has not called snapshot() yet. Gate the assertion on bad_monitor.state.snapshot being called first.
Proposed test hardening
ev.start()
- time.sleep(0.15)
+ deadline = time.time() + 2
+ while bad_monitor.state.snapshot.call_count == 0 and time.time() < deadline:
+ time.sleep(0.01)
+ assert bad_monitor.state.snapshot.call_count > 0
  # F3: assert the evaluator thread survived the exception storm
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_redundancy.py` around lines 363 - 370, The test currently checks
ev.is_alive() after a fixed sleep which can race with the evaluator calling
snapshot(); modify the test to wait (with a short timeout) until the failing
snapshot path has actually been exercised by polling the
bad_monitor.state.snapshot flag/call indicator (or equivalent sentinel) before
asserting ev.is_alive(); in practice add a small loop that repeatedly checks
bad_monitor.state.snapshot (or a boolean set by the snapshot implementation) and
breaks on success or fails after timeout, then perform the ev.is_alive()
assertion so the check is gated on snapshot having been invoked.
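The gate-then-assert pattern generalizes to a small polling helper. A self-contained sketch under stand-in names (`wait_until`, the `snapshot` mock, and the toy `evaluator` are all hypothetical; the real test drives eneru's redundancy evaluator):

```python
import threading
import time
from unittest import mock


def wait_until(predicate, timeout=2.0, interval=0.01):
    """Poll `predicate` until it is truthy or `timeout` elapses; return
    the final truthiness. Replaces a fixed sleep that can race."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return bool(predicate())


# A mock whose every call raises, like the failing snapshot path:
snapshot = mock.Mock(side_effect=RuntimeError("boom"))


def evaluator(stop):
    # Toy evaluator loop honoring the swallowing contract under test.
    while not stop.is_set():
        try:
            snapshot()
        except Exception:
            pass
        time.sleep(0.005)


stop = threading.Event()
t = threading.Thread(target=evaluator, args=(stop,), daemon=True)
t.start()
# Gate on the failing path actually being exercised, THEN assert survival.
assert wait_until(lambda: snapshot.call_count > 0)
assert t.is_alive()
stop.set()
t.join(timeout=1)
```

Mock counts the call before raising the side effect, so `call_count` is a reliable "snapshot was invoked" sentinel even on the exception path.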
# Assert the actual virsh command sequence — without this, the test
# passed even when the wrong command (or no command at all) was
# issued during shutdown.
issued = [c.args[0] for c in mock_run.call_args_list]
assert issued[0] == ["virsh", "list", "--name", "--state-running"]
# Per-VM shutdowns are issued in the order virsh list returned them.
assert ["virsh", "shutdown", "vm1"] in issued
assert ["virsh", "shutdown", "vm2"] in issued
# No `destroy` should fire when graceful shutdown completes cleanly.
assert not any(c[:2] == ["virsh", "destroy"] for c in issued)
Pin the ordered command sequence, not just membership.
Lines 123-125 say the per-VM shutdowns are ordered, but in issued would still pass if vm2 was shut down before vm1.
Proposed test tightening
issued = [c.args[0] for c in mock_run.call_args_list]
- assert issued[0] == ["virsh", "list", "--name", "--state-running"]
- # Per-VM shutdowns are issued in the order virsh list returned them.
- assert ["virsh", "shutdown", "vm1"] in issued
- assert ["virsh", "shutdown", "vm2"] in issued
+ assert issued[:4] == [
+ ["virsh", "list", "--name", "--state-running"],
+ ["virsh", "shutdown", "vm1"],
+ ["virsh", "shutdown", "vm2"],
+ ["virsh", "list", "--name", "--state-running"],
+ ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_shutdown_vms.py` around lines 118 - 127, The test currently only
checks membership for per-VM shutdown commands, which doesn't enforce order;
update the assertions after collecting issued = [c.args[0] for c in
mock_run.call_args_list] to assert the shutdown commands occur in the expected
order (e.g., compute indices of ["virsh","shutdown","vm1"] and
["virsh","shutdown","vm2"] in issued and assert index(vm1) < index(vm2) or
assert the slice of issued contains the two shutdown commands in that exact
sequence), leaving the existing checks for the initial "virsh list ..." and the
absence of "virsh destroy" intact.
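The index-based variant from the prompt, sketched against a hard-coded command log (a real test would collect `issued` from `mock_run.call_args_list` as above):

```python
# Example argv log mimicking what mock_run would record.
issued = [
    ["virsh", "list", "--name", "--state-running"],
    ["virsh", "shutdown", "vm1"],
    ["virsh", "shutdown", "vm2"],
    ["virsh", "list", "--name", "--state-running"],
]

# Index comparison enforces membership AND relative order without
# pinning unrelated commands the way a full-slice comparison would.
i_vm1 = issued.index(["virsh", "shutdown", "vm1"])
i_vm2 = issued.index(["virsh", "shutdown", "vm2"])
assert i_vm1 < i_vm2
assert issued[0] == ["virsh", "list", "--name", "--state-running"]
assert not any(c[:2] == ["virsh", "destroy"] for c in issued)
```

`list.index` raises ValueError when the command is missing, so the membership check comes for free.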
Bundle the actionable findings from the second AI review pass (cubic.dev + CodeRabbit Pro on commits 0c16da6..681248b) plus the codecov-driven backfill for the F1 voltage-escalation fix and the new redundancy local_shutdown_callback. Every behaviour change ships with a test.

Cubic.dev — P1 fixes:
- src/eneru/tui.py: enable curses keypad mode (`stdscr.keypad(True)`) in run_tui's main loop. Without it the new arrow / page / home scroll bindings receive raw escape sequences instead of KEY_* constants, and the leading ESC byte (27) hits the quit branch — the entire TUI scroll feature was broken on terminals where curses doesn't enable keypad by default. Reported by @cubic-dev-ai.
  + tests/test_tui.py: source-inspection guardrail asserting stdscr.keypad(True) is present in run_tui.
- src/eneru/shutdown/filesystems.py: wrap shlex.split(options) in try/except ValueError so a malformed YAML options string (e.g. unclosed quote) skips the offending mount with a logged error instead of crashing the entire shutdown sequence. Reported by @cubic-dev-ai.
  + tests/test_shutdown_filesystems.py: new test feeds a malformed options string and asserts the bad mount is skipped while the good one still reaches umount.

Cubic.dev — P2 fixes:
- tests/e2e/groups/single-ups.sh: tighten Test 2's exit-code check to ONLY accept 124 (SIGTERM from `timeout`); the previous "0 or 124" let a premature daemon exit during a "monitor normal state" check pass silently. Reported by @cubic-dev-ai.
- tests/conftest.py: patch_run_command_everywhere now also patches eneru.utils.run_command so command_exists() (which resolves through utils at call time) doesn't shell out during shutdown tests. Reported by @cubic-dev-ai.

Cubic.dev — P3 fix:
- examples/config-enterprise.yaml: Slack URL `?footer=no` moved before `#operations`; the previous order placed the query string inside the URL fragment where Apprise can't parse it. Reported by @cubic-dev-ai.

CodeRabbit — actionable fixes:
- src/eneru/shutdown/remote.py: validate stop_compose has a `path` BEFORE rendering the REMOTE_ACTIONS template; the old ordering formatted with shlex.quote("") first and the precondition warning was effectively dead code (any future template change that made path mandatory in a way shlex.quote("") couldn't satisfy would silently slip through). Reported by @coderabbitai.
  + tests/test_remote_commands.py: new test patches the template's .format and asserts it's never invoked when path is missing.
- src/eneru/tui.py: arrow / PgUp / Home auto-promote `show_more=True` on first press. The 8-row normal cap matched the visible window, so scrolling toward older history was a silent no-op despite the always-visible `<↑↓> Scroll` hint. Also drop the redundant max(0, len(log_events)) on KEY_HOME (render_logs_panel already clamps). Reported by @coderabbitai.
  + tests/test_tui.py: source-inspection guardrail asserting the three keys are grouped and show_more flips before scroll math.
- examples/config-homelab.yaml: section comment above the remote_servers block rewritten to describe the shutdown_order phase model (the old text only described the legacy two-phase parallel flag, which is exactly what the NAS block stopped using in the F2 fix). Reported by @coderabbitai.
- tests/test_cli.py: test_validate_config_nonexistent_file now uses tmp_path instead of a hard-coded /nonexistent/path/config.yaml. Reported by @coderabbitai.
- tests/test_monitor_core.py: promote _run_one_iteration to module scope so TestFailsafe doesn't reach across 700 lines into TestAdvisoryTriggers. Class-level shim kept for the existing self._run_one_iteration call sites. Reported by @coderabbitai.
- src/eneru/utils.py: `import math` moved to module top (was inline inside is_numeric on the per-poll hot path). Reported by @coderabbitai.
- src/eneru/stats.py: open_readonly return type annotated Optional[sqlite3.Connection]; drop the `# type: ignore` since the declared type now matches the documented `None on missing file` behaviour. Reported by @coderabbitai.

Codecov-driven backfill:
- tests/test_voltage.py: TestVoltageSeverityEscalation covers the F1 fix from Commit 2 (severity escalation within the same LOW/HIGH state must refresh voltage_pending_severe so the immediate-notify bypass fires). Three cases: mild→severe escalation, no double-fire after already-notified, HIGH-side symmetry.
- tests/test_redundancy.py: TestLocalShutdownCallback covers the Commit 2 redundancy local-poweroff fix. Four cases: callback fires on is_local quorum loss, callback NOT invoked for non-local groups, callback optional (no crash when unset), callback skipped when remote shutdown raises (documented behaviour).

Co-Authored-By: Claude Opus 4.7 <[email protected]>
… maintenance

Picked up the four worth-doing items skipped from the previous AI review pass; left the four genuinely-not-worth-it ones for the backlog.

CodeRabbit nitpicks landed:
- .github/workflows/release.yml: add `concurrency: group: release-gh-pages, cancel-in-progress: false` so two parallel releases (e.g. tag push overlapping with a manual workflow_dispatch) don't both rebuild + sign packages and only collide at the final non-fast-forward push. Reported by @coderabbitai.
- tests/conftest.py: rewrite `patched_cfg_init` to use *args/**kw and only inject the isolated default when the caller didn't supply one. Today db_directory is the first dataclass field so positional calls work; this hardens the fixture against future StatsConfig field reordering. Reported by @coderabbitai.
  + tests/test_stats.py TestIsolateStatsDbDirectoryFixture: covers default construction, explicit-keyword override, explicit-positional override, and pass-through of unrelated kwargs.
- src/eneru/shutdown/filesystems.py: dry-run log line previously printed the raw `options` string (`umount -l -f /mnt`) which doesn't match the actual argv after shlex.split. Build the argv up-front and render with shlex.quote in the dry-run line so the log matches reality. As a side benefit, malformed-options detection now also fires in dry-run mode, surfacing config errors before a real shutdown attempt. Reported by @coderabbitai.
  + tests/test_shutdown_filesystems.py: two new tests cover the argv-rendered dry-run line and the dry-run malformed-options detection.

Workflow consistency:
- .github/workflows/{validate,integration,e2e,pypi,release}.yml + .github/actions/e2e-setup/action.yml: copy the SHA-pin refresh comment from codeql.yml to every workflow with pinned actions, so a maintainer reading any one file sees the refresh procedure inline. Reported by @coderabbitai.

CLAUDE.md:
- New "GitHub Actions SHA pin maintenance" section documenting that pins drift over time, refresh cadence (~every 3 months / on upstream security advisory / on major-version bump), the exact `gh api` command for both tag-tracked and branch-tracked actions, and a snapshot table of the currently-pinned action set so a future Claude session has the full picture without grepping every workflow.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
Summary
Bug-fix release with one small TUI improvement. Bundles the actionable findings from a third-party AI code review (CodeRabbit Pro + Cubic.Dev) of the v5.1.0 codebase. Drop-in upgrade.
Each commit is one themed group; see the commit body for the per-fix bullets with reviewer attribution.
Commit groups
- notifications.suppress + multi-UPS statistics), is_local redundancy local poweroff, multi-UPS state-file write race, TUI live-blending key mismatch, virsh-list exit unchecked in wait loop, umount multi-flag options, voltage severity escalation (F1), shutdown/remote.py robustness, graph.py correctness, battery.py / logger.py / stats.py / utils.py / cli.py fixes, bash completion quoting.
- daemon-reload chroot guard, postinstall fresh-vs-upgrade disambiguation.
- || true removal, e2e readiness fail-fast, codeql contents: read, every action SHA-pinned across all workflows, ssh-target /var/run/sshd creation, reference-config corrections (SSH host key checking, Apprise URL formats, compose_files: [], F2 homelab parallel: false comment).
- StatsConfig/StatsStore isolation fixture (production /var/lib/eneru no longer leaks from tests), multi-namespace run_command patching helper, failsafe tests routed through real _main_loop, F3 thread-survival assertion in redundancy evaluator, e2e exit-code capture via PIPESTATUS, low-battery scenario runtime above critical threshold, virsh happy-path command sequence assertions.
- format_seconds negative clamp, is_numeric rejects bool/NaN/Inf, hardened logger handler cleanup, legacy ups-monitor syslog tag renamed to eneru (power events were going to journalctl under the pre-v5.0 identifier).
- ↑/↓/PgUp/PgDn/Home/End scroll through full history; <M> toggles 8↔500 visible rows.
Test plan
- validate × 6 Python versions + 4 E2E matrix jobs)
- sudo python3 /opt/ups-monitor/eneru.py --validate-config <example> after building the deb
- <↑>/<↓>, <Home>/<End>)
- -t eneru (was -t ups-monitor)
- /var/lib/eneru/
Notes
@coderabbitai full review + @cubic-dev-ai review this pull request) is gated on operator approval per the project's quota-conservation policy in CLAUDE.md.

🤖 Generated with Claude Code
Summary by cubic
Bug-fix release with one TUI improvement: the events panel now scrolls through full history. Includes security hardening, CI/packaging fixes, and multiple correctness fixes. Drop‑in upgrade.
Bug Fixes
- force=true.
- is_local: quorum loss now powers off the host.
- virsh list failures no longer yield false "all stopped"; force‑destroy still runs.
- notifications.suppress, voltage_hysteresis_seconds, and multi‑UPS statistics.
- eneru (was ups-monitor).

Security, CI, and Packaging
- stop_compose path is shlex.quote'd; template no longer double‑quotes.
- id-token: write limited to publish.
- CLAUDE.md.
- gh-pages deploys to avoid concurrent races; no git push -f to gh-pages.
- || true masks); readiness checks error on timeout; added CodeQL contents: read; E2E low‑battery scenario tuned to fire against all thresholds.
- systemctl daemon-reload when systemd isn't present.
- pyproject.toml now reads version from eneru.version.__version__; examples use safer SSH defaults, correct Slack Apprise URL, compose_files: [], and fixed NAS shutdown ordering.

Written for commit e9ae4fc. Summary will update on new commits.
Summary by CodeRabbit
Release Notes - Version 5.1.1
New Features
Bug Fixes
Security
Documentation