feat(v5.1.0): Phase 2 — redundancy groups, health model, SQLite stats, TUI graphs (#26)
Adds a per-cycle, atomic snapshot of the latest UPS observation that
external readers (the upcoming Phase 2 redundancy-group evaluator) can
consume safely from another thread. `MonitorState` gains a non-reentrant
`_lock` (excluded from `__repr__` and `__eq__`), nine
`latest_*`/`trigger_*` fields, and a `snapshot()` helper that returns a
frozen `HealthSnapshot` namedtuple. The poll loop in `monitor.py` writes
all snapshot fields under one lock acquisition at the bottom of each
successful cycle, alongside `previous_status`. The depletion-rate field
is updated under the same lock from the on-battery and on-line handlers
so the published value reflects the cycle's freshly computed rate (or
zeroed when off battery).

Pure infrastructure commit: no behaviour change for legacy single-UPS
deployments, no new advisory branches yet -- those land in the
redundancy-evaluator commit.

Existing 410 tests still pass; +8 unit tests cover the lock attribute,
snapshot contents, concurrent reader safety, dataclass equality/repr
unaffected by the lock field, and the default round trip.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
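A minimal sketch of the pattern this commit describes, assuming a
reduced field set (the real `MonitorState` publishes nine
`latest_*`/`trigger_*` fields; three stand in for all of them here, and
the `publish` helper is illustrative rather than the actual poll-loop
code):

```python
import threading
import time
from dataclasses import dataclass, field
from typing import NamedTuple

class HealthSnapshot(NamedTuple):
    latest_status: str
    trigger_active: bool
    last_update_time: float

@dataclass
class MonitorState:
    latest_status: str = ""
    trigger_active: bool = False
    last_update_time: float = 0.0
    # compare=False / repr=False keep the lock out of __eq__ and __repr__
    _lock: threading.Lock = field(
        default_factory=threading.Lock, compare=False, repr=False
    )

    def snapshot(self) -> HealthSnapshot:
        # Readers on other threads get one frozen, self-consistent view.
        with self._lock:
            return HealthSnapshot(
                self.latest_status, self.trigger_active, self.last_update_time
            )

    def publish(self, status: str, trigger_active: bool) -> None:
        # The poll loop writes all snapshot fields under one acquisition.
        with self._lock:
            self.latest_status = status
            self.trigger_active = trigger_active
            self.last_update_time = time.time()
```

The `compare=False, repr=False` field options are what keeps dataclass
equality and repr unaffected by the unpicklable lock.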
Walkthrough: This PR introduces Phase 2 of v5.1.0 with four major
features: redundancy groups with quorum-based shutdown logic, per-UPS
SQLite statistics persistence with a background writer, TUI graph
rendering using Braille Unicode characters, and voltage notification
hysteresis with auto-detection re-snapping. Configuration validation,
monitor state management, and the CLI are updated throughout. Extensive
documentation and test coverage accompany the changes.
Estimated code review effort: 4 (Complex), ~65 minutes. The PR
introduces four orthogonal major features (redundancy groups, SQLite
stats, Braille graphs, voltage hysteresis) with significant logic
density and integration across monitor, coordinator, state, and CLI.
While changes are well-organized by feature, the heterogeneity of
concerns, control-flow complexity (especially health-model
classification and quorum evaluation), and broad file spread across
core modules (60+ files) warrant careful review. Extensive test
coverage and documentation mitigate complexity somewhat.
Codecov Report ❌ Patch coverage is
Additional details and impacted files:
@@ Coverage Diff @@
## main #26 +/- ##
==========================================
+ Coverage 70.99% 79.11% +8.11%
==========================================
Files 19 23 +4
Lines 2310 3457 +1147
Branches 470 675 +205
==========================================
+ Hits 1640 2735 +1095
- Misses 548 580 +32
- Partials 122 142 +20
Introduces the Phase 2 config layer for redundancy groups -- the config
mirror of UPSGroupConfig that lets multiple UPS sources share a single
quorum-gated shutdown decision (dual-PSU racks, A+B feeds, etc.).
Behaviour at runtime is unchanged in this commit: the dataclass is
parsed and validated but not yet wired into the monitor. The evaluator,
advisory triggers, and executor land in the next commit.

Config layer:
- New RedundancyGroupConfig dataclass mirroring UPSGroupConfig in full
  (name, ups_sources, min_healthy, degraded/unknown_counts_as, is_local,
  triggers, remote_servers, virtual_machines, containers, filesystems).
- ConfigLoader._parse_redundancy_groups parses the new YAML section,
  inheriting global triggers when the per-group block is omitted.
- validate_config gains the rules: 1 <= min_healthy <= |ups_sources|
  (== |ups_sources| warns "no redundancy"); reject 0/negative/non-int;
  reject empty/duplicate/missing names; reject unknown UPS references;
  reject duplicate sources; enum-check degraded/unknown_counts_as;
  reject local resources on a non-is_local group; enforce at most one
  is_local across all UPS+redundancy groups; reject remote-server
  (host,user) conflicts across tiers and across redundancy groups.
- The pre-existing "multiple UPS groups marked is_local" rule is
  subsumed by the new combined rule and kept message-substring-stable
  for downstream tests.

CLI: eneru validate now prints a "Redundancy groups" section listing
sources, quorum, remote servers, and (when is_local) local resources.

Examples & docs:
- New examples/config-redundancy.yaml -- minimal dual-PSU config.
- examples/config-reference.yaml gains a fully-commented
  redundancy_groups block.
- New docs/redundancy-groups.md with the concept guide, min_healthy
  semantics, scenario tables, and unknown_counts_as rationale; linked
  from configuration.md and registered in mkdocs.yml.

Tests:
- tests/test_config_loading.py +9 tests for parsing + defaults +
  inheritance + multi-group / malformed-entry handling.
- tests/test_config_validation.py +24 tests covering every rule above.
- tests/test_multi_ups.py: 1 message-substring tweak for the combined
  is_local check.

E2E:
- tests/e2e/config-e2e-redundancy.yaml dual-source config.
- New e2e step "Test 20" asserts (a) the valid config validates clean
  with the redundancy section surfaced, and (b) a min_healthy=0 config
  exits non-zero with the expected error.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
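A hypothetical dual-PSU sketch of the new section, using the field
names from this commit; the exact surrounding structure is an
assumption, and examples/config-redundancy.yaml is the shipped version:

```yaml
redundancy_groups:
  - name: rack-a
    ups_sources: [feed-a, feed-b]   # must reference known UPS names
    min_healthy: 1                  # 1 <= min_healthy <= |ups_sources|
    degraded_counts_as: healthy     # enum-checked
    unknown_counts_as: critical     # enum-checked
    is_local: true                  # at most one is_local group overall
    virtual_machines: []            # local resources require is_local
```

Setting `min_healthy: 2` here would validate but warn "no redundancy",
since losing either feed would already breach quorum.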
…executor
The Phase 2 behaviour commit. Brings redundancy groups online: a UPS
listed under `redundancy_groups[*].ups_sources` continues to poll on its
own thread but its per-UPS triggers now record an *advisory* state
flag instead of running a local shutdown. A separate evaluator thread
(one per group, ~1s tick) reads each member's snapshot under the state
lock, applies the group's `degraded_counts_as` / `unknown_counts_as`
policy, and only fires the group's executor when
`healthy_count < min_healthy`.
Health model
- New src/eneru/health_model.py exposes `UPSHealth` (str enum:
HEALTHY/DEGRADED/CRITICAL/UNKNOWN) and a pure `assess_health(snapshot,
triggers, check_interval)` function. Order: stale (`>5*check_interval`)
/ FAILED connection -> UNKNOWN; trigger_active or `FSD` in status ->
CRITICAL; OB or GRACE_PERIOD -> DEGRADED; else HEALTHY. Module name
avoids collision with the existing eneru/health/ mixin package.
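The classification order above can be sketched as a pure function.
This is a reduced model, not the shipped code: the real
`assess_health` also takes the configured triggers, and the
GRACE_PERIOD condition is modelled here as a boolean field on an
assumed `Snap` stand-in:

```python
import time
from enum import Enum
from typing import NamedTuple

class UPSHealth(str, Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    CRITICAL = "critical"
    UNKNOWN = "unknown"

# Stand-in for the snapshot fields this sketch needs.
class Snap(NamedTuple):
    last_update_time: float
    connection_state: str
    trigger_active: bool
    status: str
    in_grace_period: bool

def assess_health(snap: Snap, check_interval: float, now=None) -> UPSHealth:
    now = time.time() if now is None else now
    # Priority order from the commit message: stale / FAILED beat
    # everything, then CRITICAL, then DEGRADED, else HEALTHY.
    if (now - snap.last_update_time > 5 * check_interval
            or snap.connection_state == "FAILED"):
        return UPSHealth.UNKNOWN
    if snap.trigger_active or "FSD" in snap.status:
        return UPSHealth.CRITICAL
    if "OB" in snap.status or snap.in_grace_period:
        return UPSHealth.DEGRADED
    return UPSHealth.HEALTHY
```

Note the ordering matters: a stale snapshot returns UNKNOWN even when
`trigger_active` is set, because a stale trigger flag can't be trusted.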
Redundancy runtime
- New src/eneru/redundancy.py:
- `RedundancyGroupEvaluator(threading.Thread)`: reads snapshots, maps
DEGRADED/UNKNOWN per the group's policy, edge-detected logging
("quorum LOST" / "quorum restored"), idempotent firing.
- `RedundancyGroupExecutor`: composes VM/Container/Filesystem/Remote
shutdown mixins so the group reuses multi-phase ordering,
`shutdown_safety_margin`, and deadline-based join byte-identically.
Per-group flag file at `/var/run/ups-shutdown-redundancy-<sanitized>`
keeps the shutdown idempotent; sanitization mirrors the per-UPS path.
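The policy translation plus quorum check the evaluator applies can be
reduced to a few lines. Function names here are illustrative, not the
shipped API:

```python
def healthy_count(states: list, degraded_counts_as: str,
                  unknown_counts_as: str) -> int:
    # DEGRADED and UNKNOWN members map to whatever the group's
    # *_counts_as policy says before counting.
    mapped = []
    for s in states:
        if s == "degraded":
            s = degraded_counts_as
        elif s == "unknown":
            s = unknown_counts_as
        mapped.append(s)
    return sum(1 for s in mapped if s == "healthy")

def quorum_lost(states, min_healthy,
                degraded_counts_as="healthy",
                unknown_counts_as="critical") -> bool:
    # The group's executor fires only when quorum is breached.
    return healthy_count(states, degraded_counts_as,
                         unknown_counts_as) < min_healthy
```

With the default `unknown_counts_as: critical`, two UNKNOWN members in
a min_healthy=1 group breach quorum -- the fail-safe case the E2E
scenarios below exercise.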
Monitor advisory branches
- `UPSGroupMonitor` gains `in_redundancy_group: bool` ctor arg + two
helpers (`_record_advisory_trigger`, `_clear_advisory_trigger`) that
set/clear `state.trigger_active`/`trigger_reason` under the state lock.
- 3 trigger sites switch on `self._in_redundancy_group`:
1. T1-T4 in `_handle_on_battery` (line ~680)
2. FAILSAFE in `_main_loop` (lines ~818-833)
3. FSD in `_main_loop` (lines ~921-922)
The non-redundancy `else:` branch is byte-identical to the previous
code path -- regression tests guard this. Returning to OL or
recovering from FAILED clears the advisory.
Coordinator wiring
- `MultiUPSCoordinator` precomputes the in-redundancy UPS-name set,
passes the flag to each `UPSGroupMonitor`, and after monitor startup
spins up one `RedundancyGroupExecutor` + `RedundancyGroupEvaluator`
per `config.redundancy_groups` entry. `_wait_for_completion` and
`_handle_signal` now also track / join evaluator threads.
- CLI `_cmd_run` routes through the coordinator when the config has
redundancy groups even if `multi_ups` is False.
Packaging / public API
- `nfpm.yaml`: `health_model.py` and `redundancy.py` added to the
per-file `contents:` list (deb/rpm builds enumerate, never glob).
- Public exports: UPSHealth, assess_health, RedundancyGroupEvaluator,
RedundancyGroupExecutor.
Tests (+78)
- `test_health_model.py` (32): parametrised classification table,
staleness vs check_interval, priority order between tiers, enum API.
- `test_redundancy.py` (28): evaluator counting + policy translation,
cross-group cascade, executor synthetic Config wiring + flag-file
namespace + sanitization, dry-run cleanup, idempotency in-process and
against pre-existing flag files, local-resource gating on `is_local`,
log-prefix + `@`-escape.
- `test_monitor_core.py` (+12): advisory wiring per trigger site +
regression tests `test_failsafe_unchanged_for_single_ups` and
`test_failsafe_unchanged_for_independent_group`.
- `test_multi_ups.py` (+6): coordinator builds `_in_redundancy` set,
passes the flag to every monitor in order, instantiates evaluator +
executor per redundancy group, joins evaluator threads on signal.
E2E (+7 scenarios)
- New configs: `config-e2e-redundancy-cross-group.yaml`,
`config-e2e-redundancy-separate-eneru.yaml`.
- Tests 21-27 cover: quorum holds (1 of 2 healthy); quorum exhausted
(both critical); UNKNOWN handling default; both UNKNOWN -> fail-safe;
cross-group cascade (UPS in both indep + redundancy); advisory-mode
log signature; separate-Eneru-UPS topology.
Docs
- `docs/redundancy-groups.md` extended with the cascade lifecycle, a
dual-PSU timeline table, and load-redistribution guidance.
- `docs/triggers.md` gains a "Triggers in redundancy groups" section.
- `docs/troubleshooting.md` gains "Why isn't my redundancy-group server
shutting down?".
- `docs/testing.md` updated counts: 410 -> 529 unit, 19 -> 27 E2E.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Two layered additions:
1. Per-UPS SQLite statistics store (Phase 2 spec 2.12)
2. Redundancy-evaluator startup grace -- a regression fix discovered
in CI Test 21 against the previous commit
---
Statistics store
- New src/eneru/stats.py exposes:
- StatsStore: WAL-mode SQLite store with `synchronous=NORMAL`,
schema (samples / agg_5min / agg_hourly / events / meta), 10
documented metrics. The hot path is `buffer_sample()` -- a
constant-time append to an in-memory deque, zero I/O. Public
methods (open/close/flush/aggregate/purge/log_event/query_range/
query_events/open_readonly) all catch `sqlite3.Error` + `OSError`,
log once with rate-limit, and swallow.
- StatsWriter(threading.Thread): drains the buffer every 10 s,
runs aggregate+purge every 5 min, also flushes on shutdown.
- SAMPLE_FIELDS, SCHEMA_VERSION, BUCKET_5MIN, BUCKET_HOURLY constants.
- New StatsConfig + StatsRetentionConfig dataclasses
(`statistics:` YAML key, `db_directory: /var/lib/eneru` default,
retention windows 24 h / 30 d / 5 y per tier).
- UPSGroupMonitor wiring:
- One StatsStore per UPS at `<db_directory>/<sanitized-name>.db`.
- `_initialize` opens the store and starts the writer; failures log
once and disable persistence for the run (daemon keeps running).
- `_save_state` calls `buffer_sample(...)` after the text-state
write -- still zero I/O on the hot path.
- `_log_power_event` calls `log_event(...)` so power events appear
in the events table.
- `_cleanup_and_exit` flushes + closes via `_stop_stats()`.
- CLI: routes through MultiUPSCoordinator when `redundancy_groups` is
set even with a single ups_group (already in the previous commit;
unchanged here).
- Public API: StatsConfig, StatsRetentionConfig, StatsStore, StatsWriter
exported from eneru/__init__.py.
- Packaging:
- nfpm.yaml gains a `contents:` entry for src/eneru/stats.py and a
directory entry creating `/var/lib/eneru` (mode 0755, root:root)
on deb/rpm install.
- Pip installs handle the directory creation defensively in
`StatsStore.open()`.
- Example config: examples/config-reference.yaml gains a documented
`statistics:` section.
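The hybrid buffer-then-flush design above can be sketched with a
reduced two-column schema. This is a minimal model, assuming away the
real store's 10 metrics, aggregation tiers, and swallow-on-error
wrappers:

```python
import sqlite3
import threading
from collections import deque

class MiniStatsStore:
    def __init__(self, path: str, maxlen: int = 4096):
        self._buf = deque(maxlen=maxlen)  # overflow drops oldest
        self._lock = threading.Lock()
        self._db = sqlite3.connect(path)
        self._db.execute("PRAGMA journal_mode=WAL")
        self._db.execute("PRAGMA synchronous=NORMAL")
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS samples (ts REAL, charge REAL)"
        )

    def buffer_sample(self, ts: float, charge: float) -> None:
        # Hot path: constant-time in-memory append, zero I/O.
        with self._lock:
            self._buf.append((ts, charge))

    def flush(self) -> int:
        # Writer-thread path, every ~10 s: drain under the lock,
        # then one transaction for all buffered rows.
        with self._lock:
            rows = list(self._buf)
            self._buf.clear()
        with self._db:
            self._db.executemany("INSERT INTO samples VALUES (?, ?)", rows)
        return len(rows)
```

The split keeps the monitor's poll loop independent of disk latency:
a slow SD card only delays the writer thread, never a sample.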
Tests (+47):
- tests/test_stats.py (42): schema + WAL/synchronous pragmas;
in-memory-only buffer; thread-safe buffering across 10 producers;
loose constant-time microbench; deque overflow drops oldest; lenient
numeric coercion; flush single-transaction; aggregate min/max/avg
semantics; 5-min -> hourly rollup with bucket alignment; purge per
tier; query_range tier-selection rules; query_range NULL filtering;
events round-trip and inclusive bounds; open_readonly returns None
for missing DB and rejects writes; concurrent reader+writer under
WAL; StatsWriter thread lifecycle + shutdown flush; failure-isolation
contract for every public method; rate-limited error logging;
StatsConfig YAML round-trip + defaults.
- tests/test_packaging.py (3, NEW FILE): structural defense against
PR #23-class bugs. Asserts every src/eneru/**/*.py is referenced by
nfpm.yaml; no dangling `src:` references; `/var/lib/eneru` directory
entry is present.
E2E (+2 scenarios):
- tests/e2e/config-e2e-stats.yaml: single-UPS + writable /tmp DB dir.
- Test 28: DB created, samples populated, DAEMON_START event recorded.
- Test 29: Stats writer failure isolation -- a broken db_directory
(file-where-dir-expected) logs the warning but does not crash.
Docs:
- New docs/statistics.md: hybrid architecture rationale, schema, SD-card
/ Raspberry Pi guidance, sqlite3 inspection recipes, backup, failure
isolation. Linked from configuration.md and registered in mkdocs.yml.
- docs/testing.md: counts updated 529 -> 577 unit, 27 -> 29 E2E.
---
Redundancy evaluator startup grace (CI fix)
CI Test 21 caught a regression in the previous commit: the evaluator
ran its first tick before the per-UPS monitors had taken their
initial poll, so every member's snapshot had `last_update_time == 0`
and was classified UNKNOWN. With the default
`unknown_counts_as: critical`, the evaluator spuriously fired the
group's shutdown sequence at start-up.
Fix: RedundancyGroupEvaluator gains a `startup_grace_seconds`
parameter that defaults to `5 * max(member check_interval) + 5` s,
mirroring the stale-snapshot rule. The evaluator waits this long
before its first evaluation, giving monitors time to publish real
snapshots. Override is exposed for tests.
E2E timeouts bumped to clear the grace window (Tests 21-27).
Tests (+3 in test_redundancy.py): default grace from check_interval,
explicit override, regression test that reproduces the spurious
UNKNOWN fire and verifies the grace prevents it.
---
Cumulative test totals after this commit:
- 577 unit tests (was 529)
- 29 E2E scenarios (was 27)
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Two layered additions:
1. BrailleGraph module + TUI graph integration (Phase 2 spec 2.13)
2. Test 28 (SQLite stats persistence) hardening -- the daemon's
`_wait_for_initial_connection` was eating the entire 25 s test
timeout in CI, and the asserted DB filename was wrong for single-UPS
mode. Both fixed in this commit per the user's "bundle into the
running commit" workflow.
---
BrailleGraph + TUI graphs
New src/eneru/graph.py (BrailleGraph class):
- Pure, stateless renderer using Unicode Braille pattern characters
(U+2800-U+28FF). Each terminal cell encodes a 2x4 dot grid -- 8
binary pixels per cell -- giving high-density line graphs in a few
rows of text.
- supported(): LANG / locale-based capability check; falls back to
block characters (`▁ ▂ ▃ ▄ ▅ ▆ ▇ █`) when not capable.
- plot(data, width=, height=, y_min=, y_max=, force_fallback=) -> List[str]
Auto-scales when bounds omitted; clips out-of-range; skips None /
non-numeric inputs.
- code_point() / cell() expose the dot-bitmask arithmetic for tests.
- render_to_window() best-effort curses helper for callers that want
layout for free.
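The dot-bitmask arithmetic that `code_point()` / `cell()` expose
follows the standard Unicode Braille dot numbering (dots 1-8 over a
2x4 grid, base U+2800). A sketch, with an illustrative `cell` helper
rather than the module's exact signature:

```python
# Bit weights indexed [row][col] for a 2x4 pixel cell; this is the
# standard Unicode Braille Patterns layout (dots 1-8).
BRAILLE_BITS = [
    [0x01, 0x08],  # row 0: dots 1, 4
    [0x02, 0x10],  # row 1: dots 2, 5
    [0x04, 0x20],  # row 2: dots 3, 6
    [0x40, 0x80],  # row 3: dots 7, 8
]

def cell(pixels) -> str:
    """pixels: set of (row, col) with row in 0..3, col in 0..1."""
    code = 0x2800
    for row, col in pixels:
        code |= BRAILLE_BITS[row][col]
    return chr(code)
```

Eight binary pixels per terminal cell is what makes a few rows of text
enough for a readable line graph.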
TUI integration (src/eneru/tui.py):
- New keybindings:
- G cycle graph mode: off → charge → load → voltage → runtime
- T cycle time range: 1h → 6h → 24h → 7d → 30d
- U (multi-UPS) cycle which UPS the graph shows
- New helpers:
  - cycle(): pure helper used by G/T/U
  - stats_db_path_for(): mirrors MultiUPSCoordinator's UPS-name
    sanitization so the TUI opens the same DB file the daemon writes
  - query_metric_series(): uses StatsStore.open_readonly() (URI
    ?mode=ro), reuses query_range tier-selection, lazy-opens on first
    non-off graph mode
  - render_graph_text(): line-list rendering used by both
    run_once --graph and the curses panel
  - render_graph_panel(): curses panel placed between the config and
    logs panels when graph_mode != off
- Footer hints updated to advertise <G> <T> <U>.
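The pure helper behind the three keybindings can be sketched in a few
lines (this is a plausible shape for `cycle()`, not a copy of the
shipped one):

```python
def cycle(options: list, current):
    """Advance to the next option, wrapping at the end; an unknown
    current value resets to the first option."""
    try:
        return options[(options.index(current) + 1) % len(options)]
    except ValueError:
        return options[0]
```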
CLI (src/eneru/cli.py):
- monitor --graph {charge,load,voltage,runtime} renders the Braille
graph in run_once mode (no curses), suitable for scripts and CI.
- monitor --time {1h,6h,24h,7d,30d} pairs with --graph.
Public API: BrailleGraph exported from eneru/__init__.py.
Packaging: nfpm.yaml gains a `contents:` entry for src/eneru/graph.py.
Tests (+34):
- tests/test_graph.py (24 NEW): code-point arithmetic vs hand-computed
glyphs (top-left, top-right, bottom row, blank, all-dots, invalid);
supported() detection (LANG=C, UTF-8 vs ISO-8859-1); plot() geometry
and auto-scale (max@top, min@bottom, zero-range padding); explicit
bounds clipping (above, below, NULL skipped); fallback path; curses
render_to_window helper.
- tests/test_tui.py (+10): cycle() advances/wraps/resets; stats DB
path mirrors daemon for single + multi UPS; render_graph_text
no-data placeholder, with-samples (writes to a real per-UPS DB),
unknown-metric path; run_once --graph emits the graph block;
run_once without --graph does NOT emit it.
E2E (+1):
- Test 30: `eneru monitor --once --graph charge --time 1h` against the
config-e2e-stats.yaml DB. Reuses Test 28's seeded DB when available;
falls back to spinning a fresh daemon. Asserts the graph header
("charge -- last 1h") and y-axis label ("y-axis: 0-100%") appear.
Docs:
- New docs/tui-graphs.md: keybindings reference, time-range tier
selection table, headless `monitor --once --graph` recipe, fallback
behaviour, troubleshooting. Linked from mkdocs.yml.
- docs/testing.md: counts updated 577 -> 611 unit, 29 -> 30 E2E.
---
Test 28 hardening (CI fix)
Symptoms in CI: Test 28 timed out at 25 s with only "Checking initial
connection to TestUPS@localhost:3493..." in the daemon log. Two bugs:
1. The asserted DB filename was wrong. UPSGroupMonitor in single-UPS
mode has `state_file_suffix=""` -> sanitized="default" -> DB at
`<dir>/default.db`. The test was looking for the multi-UPS-style
`TestUPS-localhost-3493.db` and failing the existence check.
2. The daemon's `_wait_for_initial_connection` is bounded at 30 s
(5 attempts × 5 s). With a 25 s test timeout, the daemon never
reached `_main_loop` to start collecting samples. The test killed
it mid-wait.
Fixes (in this commit, no code change required):
- Test 28 + Test 30 use the correct DB filename (`default.db`).
- Test 28 + Test 30 pre-check NUT responds before launching the
daemon (15 × 1 s `upsc` probe).
- Daemon timeouts bumped from 25 s to 50 s so even the worst-case
connection-wait + writer-flush cycle has headroom.
- PYTHONUNBUFFERED=1 keeps stdout line-buffered under `tee`.
---
Cumulative test totals after this commit:
- 611 unit tests (was 577)
- 30 E2E scenarios (was 29)
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…cation
Two layered changes per the user's "bundle into the running commit"
workflow:
1. TUI events panel sourced from each UPS's SQLite events table
(Phase 2 spec, with parse_log_events kept as a fallback).
2. Width-aware text truncation in the gold logs panel -- a regression
the user reported where emoji-heavy event lines spilled past the
panel's right edge.
---
SQLite events panel
- New `query_events_for_display(config, time_range_seconds)` reads the
per-UPS events table from each UPS's SQLite store via
`StatsStore.open_readonly` (URI ?mode=ro). Rows are formatted as:
HH:MM:SS [LABEL] event_type: detail
([LABEL] is suppressed in single-UPS mode). All UPSes' rows are
merged and sorted by timestamp; capped via `max_events`.
- The function returns `[]` when no DB exists for any UPS, signalling
callers to fall back to `parse_log_events` (the v5.0 log-tail path).
This keeps fresh installs and sandbox runs functional.
- `run_tui` and `run_once` now prefer the SQLite path with the
documented fallback.
- New `eneru monitor --once --events-only` flag prints just the events
list (no status/resources/graph block) for scripts and CI.
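The merge-and-format behaviour described above can be sketched as
follows; the function name and the in-memory row shape are assumptions
(the real reader pulls rows via `StatsStore.open_readonly`):

```python
from datetime import datetime

def format_events(per_ups_rows: dict, max_events: int = 50,
                  multi_ups: bool = True) -> list:
    """per_ups_rows maps a UPS label to (ts, event_type, detail) rows.
    All UPSes' rows are merged, sorted by timestamp, capped, and
    formatted as 'HH:MM:SS [LABEL] event_type: detail'."""
    merged = [(ts, label, etype, detail)
              for label, rows in per_ups_rows.items()
              for ts, etype, detail in rows]
    merged.sort(key=lambda r: r[0])
    lines = []
    for ts, label, etype, detail in merged[-max_events:]:
        # [LABEL] is suppressed in single-UPS mode.
        prefix = f"[{label}] " if multi_ups else ""
        stamp = datetime.fromtimestamp(ts).strftime("%H:%M:%S")
        lines.append(f"{stamp} {prefix}{etype}: {detail}")
    return lines
```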
Tests (+9):
- Single-UPS events: no `[label]` prefix.
- Multi-UPS events: `[label]` prefix; rows from different UPSes
interleave by timestamp.
- Time-window filter (older events excluded), `max_events` cap.
- `run_once --events-only`: prints only events; falls back to log
tail when no DB; "(no events)" placeholder when neither has data.
E2E (+1):
- Test 31: injects a known event row directly into the seeded
SQLite DB and asserts `eneru monitor --once --events-only`
surfaces it.
---
Width-aware truncation (gold logs panel overflow fix)
The previous truncation in `render_logs_panel` and `safe_addstr`
counted code points -- fine for ASCII (1 cell each), but that
UNDER-counts emoji and CJK (each ≈ 2 cells in most terminals).
Long emoji-rich event lines therefore painted past the panel's
visible right edge, breaking the gold border the user pointed out.
- New `display_width(text)` helper: every code point at or above
U+1100 counts as 2 cells (covers emoji + CJK ranges); everything
else counts as 1. Conservative -- it occasionally over-truncates
exotic glyphs, never under-truncates.
- New `truncate_to_width(text, max_width)` helper: returns the
longest prefix whose `display_width` is <= `max_width`, never
splitting a double-width glyph in half.
- `safe_addstr` clips by display-cell width, not character count,
before calling `addnstr`. The right gutter is preserved verbatim.
- `render_logs_panel` uses `display_width` + `truncate_to_width`
with a 2-cell budget for the trailing "..".
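The two helpers reduce to the heuristic described -- every code point
at or above U+1100 counts as 2 cells. A sketch under that assumption:

```python
def display_width(text: str) -> int:
    # Conservative: >= U+1100 counts 2 cells (covers CJK + emoji);
    # everything else counts 1. May over-truncate exotic glyphs,
    # never under-truncates.
    return sum(2 if ord(ch) >= 0x1100 else 1 for ch in text)

def truncate_to_width(text: str, max_width: int) -> str:
    # Longest prefix whose display_width fits; a double-width glyph
    # that would straddle the limit is dropped whole, never split.
    width = 0
    for i, ch in enumerate(text):
        width += 2 if ord(ch) >= 0x1100 else 1
        if width > max_width:
            return text[:i]
    return text
```

The stdlib offers `unicodedata.east_asian_width` for a finer-grained
classification; the single-threshold heuristic trades precision for
speed and simplicity on the render path.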
Tests (+8):
- ASCII width == len; emoji counted as 2; CJK counted as 2.
- truncate_to_width: short input passes through; ASCII clip; clip
before partial emoji; zero max returns "".
- `render_logs_panel` regression: a fake window records every
`addnstr` and asserts no painted line's display width exceeds
the visible width, even with an emoji-heavy event.
Bundled into Commit 6 per the user's request to land TUI fixes
together with the SQLite events panel work.
---
Cumulative test totals after this commit:
- 628 unit tests (was 611)
- 31 E2E scenarios (was 30)
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Three layered changes that wrap up the Phase 2 series:
1. Version bump 5.1.0-rc3 -> 5.1.0-rc4 plus changelog and roadmap
   updates so the [Unreleased] block reflects everything in this PR.
2. Humanizer pass over every doc page touched by Phase 2 -- removes
   AI-writing tells (em-dash overuse, rule-of-three, vague "**X:**"
   bolded headers, copula avoidance, superficial -ing analyses, filler
   phrases) without changing technical accuracy.
3. CI noise fix: every existing `tests/e2e/config-*.yaml` now sets
   `statistics.db_directory: /tmp/eneru-e2e-stats` so the daemon stops
   logging "[Errno 13] Permission denied: '/var/lib/eneru'" on the
   unprivileged GitHub runner. The new stats config files shipped in
   Commits 4-6 already carry the override.

---

Version + changelog
- src/eneru/version.py: 5.1.0-rc3 -> 5.1.0-rc4. The final v5.1.0 tag
  will follow once the user has run real-world hardware tests on rc4.
  Changelog entry stays under [Unreleased].
- docs/changelog.md: Phase 2 additions appended to the existing Added /
  Changed / Migration notes / Technical details sections. Notes the
  always-on stats DB at /var/lib/eneru/<sanitized>.db (created on first
  start of the upgraded daemon) and points the SD-card profile at
  docs/statistics.md.
- docs/roadmap.md: marks v5.1 implementation complete (rc4 available
  for hardware testing). The package-channels item is marked deferred
  to a future point release.

Humanizer pass
The user asked for /humanizer over every doc I added or modified in the
Phase 2 PR. Files touched (and only the new sections, not unrelated
content):
- docs/redundancy-groups.md (NEW): full rewrite for terser voice
- docs/statistics.md (NEW): full rewrite
- docs/tui-graphs.md (NEW): full rewrite
- docs/changelog.md: only the [Unreleased] block
- docs/roadmap.md: only the v5.1 block
- docs/triggers.md: only the new "Triggers in redundancy groups"
- docs/troubleshooting.md: only the new "Why isn't my redundancy-group
  server shutting down?"
- docs/testing.md: only the Phase 2 entries in the test-coverage list
  and the new Tests 20-31 rows
- docs/configuration.md: only the new validate-checks bullet and the
  statistics link
The mkdocs build remains strict-clean.

CI noise fix
The user reported many "/var/lib/eneru failed: [Errno 13] Permission
denied" lines in the green CI output. The daemon defaults to
/var/lib/eneru, which the unprivileged CI runner can't create. Stats
failure is non-fatal (the daemon catches OSError, logs once, keeps
running) but the warning was cluttering CI logs across Tests 1-19 plus
21-27. Every existing tests/e2e/config-*.yaml that runs the daemon now
overrides db_directory to /tmp/eneru-e2e-stats. The configs added in
earlier Phase 2 commits (config-e2e-stats.yaml and the redundancy /
cross-group / separate-eneru configs) already carry the override.

---

The full Phase 2 stack is now on the branch. CI's 31/31 e2e tests
green; 628 unit tests across 25 files; mkdocs --strict clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Post-rc4 audit flagged three test files coming in below the plan's
suggested counts. Coverage areas were already represented; these tests
pin the remaining edge cases the audit called out as genuinely worth
catching.

tests/test_stats.py (+6 in new TestEdgeCases class):
- schema_version_persists_across_reopen: catches the "schema reset on
  reopen" regression class.
- text_fields_round_trip: status + connection_state survive flush (only
  their numeric siblings were directly asserted before).
- query_range_for_unaggregated_metric_at_agg_tier_returns_empty:
  output_voltage / depletion_rate aren't in agg_5min/agg_hourly; the
  SQL references a non-existent column and the swallow path must
  return [], not propagate.
- aggregate_single_sample_yields_min_eq_max_eq_avg: the boundary case
  where AVG/MIN/MAX collapse to one number.
- purge_keeps_row_at_exact_cutoff: pins the `WHERE ts < cutoff`
  semantics (rows AT the cutoff stay, only strictly older rows go).
- query_range_empty_window_returns_empty_list: covers both
  no-rows-in-window and inverted (start > end) windows.

tests/test_redundancy.py (+3):
- Two TestThreeStateMix tests: a 3-UPS group with one HEALTHY + one
  DEGRADED + one CRITICAL member at the same time, evaluated under both
  `degraded_counts_as: healthy` (yields healthy_count=2, no shutdown)
  and `degraded_counts_as: critical` (yields healthy_count=1, fires).
  The earlier policy tests only mixed two states at a time.
- TestExecutorNotificationContent: the executor's headline shutdown
  notification actually contains the group name, the reason string, and
  every UPS source (with the @-escape applied). Only the @-escape
  behaviour was asserted before.

Test counts: 628 -> 637 unit tests across 25 files. No behaviour
change; pure test coverage.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Bump version 5.1.0-rc4 -> 5.1.0-rc5 and refresh the test count
references that lagged behind the +9 tests added in 50f27c2.
- src/eneru/version.py: 5.1.0-rc4 -> 5.1.0-rc5
- docs/testing.md: 628 -> 637 unit tests across 25 files
- docs/changelog.md: rc4 -> rc5 in the Unreleased status block, the
  migration-notes line, and the technical-details test counts
  (410 -> 628 -> 637)
- docs/roadmap.md: v5.1 header and "available for hardware testing"
  line bumped to rc5

Verified locally: eneru version -> "Eneru v5.1.0-rc5"; 637 unit tests
pass; mkdocs --strict clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
- I1: provision /var/lib/eneru in the e2e workflow so Tests 1-27
  exercise the same path users hit after deb/rpm install (no more
  silent "stats store open failed" warnings disabling stats
  persistence).
- I2: stats schema bumped 1 -> 2. Added 4 raw NUT metrics from spec
  2.12 (battery.voltage, ups.temperature, input.frequency,
  output.frequency); closes S3 by adding output_voltage_avg to the agg
  tables. Auto-migrates pre-existing v1 DBs via idempotent ALTER TABLE
  on first start; existing samples preserved with NULLs for new
  columns.
- I3: TUI graphs blend SQLite + a per-UPS live deque (spec 2.13).
  60-entry deque populated from the state file each refresh;
  query_metric_series extends the SQLite tail with newer entries
  deduped by timestamp. Bridges the 0-10 s flush gap so the graph's
  right edge stays current.
- S1: PRAGMA busy_timeout=500 on both stats SQLite connections (writer
  + readonly). Bounds reader-writer contention to half a second on
  slow storage (SD card on Pi-class hardware).
- S2: TUI footer interpolates current cycle state -- "<G> Graph:
  charge", "<T> Time: 1h", "<U> UPS: 1/2" instead of static labels.
  Truncates gracefully on narrow terminals.

Cuts 5.1.0-rc6 for hardware verification. Test counts: 649 -> 658.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
…(#27)
Issue #27 reported noisy OVER_VOLTAGE notifications on a US 120V grid
where the UPS firmware mis-reports input.voltage.nominal=230. Five-part
fix that solves the noise WITHOUT exposing safety-threshold overrides
(which would let a misconfiguration mask real hardware-damaging
events):
1. B1 -- Auto-detect: input.voltage.nominal is snapped to the nearest
   standard grid (100/110/115/120/127/200/208/220/230/240) at startup.
   After ~10 polls, the median of observed input.voltage is
   cross-checked against NUT's nominal; if they disagree by more than
   25V the nominal is re-snapped (US-grid case where firmware reports
   230V on a 120V UPS) and a VOLTAGE_AUTODETECT_MISMATCH event is
   recorded. No user-tunable warning_low / warning_high / nominal
   override -- those would defeat the safety contract.
2. B2 -- notifications.voltage_hysteresis_seconds (default 30s)
   debounces transient flaps. The OVER_VOLTAGE/BROWNOUT log row +
   SQLite event are written immediately on transition (the operational
   record is sacred); only the notification dispatch is gated. A 2s
   spike no longer emails; a sustained 30s over-voltage still does,
   with a "(persisted Ns)" annotation.
3. B3 -- notifications.suppress lets users mute chatty informational
   events (AVR_BOOST_ACTIVE, VOLTAGE_NORMALIZED, BYPASS_MODE_INACTIVE,
   etc.). Safety-critical events (OVER_VOLTAGE_DETECTED,
   BROWNOUT_DETECTED, OVERLOAD_ACTIVE, BYPASS_MODE_ACTIVE, ON_BATTERY,
   CONNECTION_LOST, anything starting with SHUTDOWN_) are hard-blocked
   in config validation with an error pointing at hysteresis as the
   right knob for flap-debounce.
4. B4 -- Stats schema bumped 2 -> 3 with idempotent migration:
   events.notification_sent INTEGER DEFAULT 1 lets users audit muted
   vs delivered events. Pre-existing rows backfill to 1 (the v2 daemon
   always notified). New event types VOLTAGE_AUTODETECT_MISMATCH and
   VOLTAGE_FLAP_SUPPRESSED round out the audit trail.
5. Documentation: the schema-evolution convention now lives in both
   the root CLAUDE.md (one-liner under Conventions) and
   src/eneru/CLAUDE.md (full pattern + when-to-add-a-column guidance).
   Future features that grow persistent state follow the documented
   mechanic.

Test 32 in the e2e workflow exercises the headline US-grid scenario
end-to-end: a dummy NUT reporting nominal=230 + actual=120 must produce
the re-snap log line, the VOLTAGE_AUTODETECT_MISMATCH event in SQLite
(with notification_sent=0), and a meta.schema_version=3 DB. Runs
against a real /var/lib/eneru thanks to the rc6 CI fix.

Cuts 5.1.0-rc7 for hardware verification. Test counts: 658 -> 711
(+53: 17 config validation, 13 schema migration + log_event audit,
28 voltage auto-detect + hysteresis). Closes #27.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
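The B1 snap-and-cross-check logic can be sketched as two small
functions; the names and return shape here are illustrative, not the
shipped API:

```python
import statistics

STANDARD_NOMINALS = [100, 110, 115, 120, 127, 200, 208, 220, 230, 240]

def snap_nominal(reported: float) -> int:
    # Snap the firmware-reported nominal to the nearest standard grid.
    return min(STANDARD_NOMINALS, key=lambda n: abs(n - reported))

def resnap_if_mismatched(nominal: int, observed, threshold: float = 25.0):
    """After ~10 polls, cross-check the median observed input.voltage
    against the current nominal; re-snap when they disagree by more
    than `threshold` volts. Returns (nominal, mismatch_detected)."""
    median = statistics.median(observed)
    if abs(median - nominal) > threshold:
        return snap_nominal(median), True
    return nominal, False
```

Using the median rather than the mean keeps a single transient spike
from dragging the cross-check toward a false mismatch.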
The single-UPS dict form (`ups: { name: ... }`) writes its stats DB
to `<db_directory>/default.db`, not to a sanitized-name path -- the
sanitized path is reserved for multi-UPS list configs (see
tui.py:stats_db_path_for). Test 32 was checking
`/var/lib/eneru/TestUPS-localhost-3493.db` which never gets created
in the single-UPS code path; the daemon was working correctly all
along (the auto-detect re-snap log line and VOLTAGE_AUTODETECT_MISMATCH
event were both produced; PASS 32a + 32b were green) but the test's
existence check was looking at the wrong file.
Also filter the events query by `ts >= T_START` so we assert the
event came from THIS test step, not from leftover rows in default.db
written by earlier single-UPS e2e tests (default.db is shared).
Co-Authored-By: Claude Opus 4.7 <[email protected]>
Summary
Implements Phase 2 of the v5.1.0 multi-UPS roadmap on top of the rc3
baseline (Phase 1 + multi-phase shutdown ordering + per-server
`shutdown_safety_margin` + mixin decomposition were all shipped earlier
under v5.0/v5.1.0-rc1..rc3). This PR layers in 7 atomic commits, each
landing one logical chunk so reviewers can read it commit-by-commit
while CI exercises the whole stack on every push.
Commits (in order)
1. feat(state) -- `MonitorState._lock` + snapshot fields + `snapshot()`
   (infrastructure, no behaviour change). ← landed
2. feat(config) -- `RedundancyGroupConfig` dataclass + parsing +
   validation rules + example.
3. feat(redundancy) -- `health_model.py` + `RedundancyGroupEvaluator` +
   `RedundancyGroupExecutor` + advisory branches at the 3 trigger sites
   in `monitor.py`.
4. feat(stats) -- `StatsStore` + `StatsWriter` (per-UPS SQLite, hybrid
   in-memory -> disk) + monitor integration + packaging.
5. feat(tui) -- `BrailleGraph` module + `G`/`T`/`U` keys + lazy DB read.
6. feat(tui) -- TUI events panel sourced from SQLite (log-tail fallback
   retained).
7. chore(release) -- bump to 5.1.0-rc4, append Phase 2 to [Unreleased],
   polish docs.

5.1.0 is not cut from this PR -- the user wants to verify rc4 against
real hardware first. Promotion (rc4 -> 5.1.0, changelog date, tag) is a
trivial follow-up after that.

Key design decisions
- `RedundancyGroupConfig` mirrors `UPSGroupConfig` in full
  (`remote_servers`, `virtual_machines`, `containers`, `filesystems`);
  only an `is_local: true` group can own local resources, and at most
  one group across all (independent + redundancy) can be `is_local`.
- The health model lives in `src/eneru/health_model.py` (avoids
  collision with the existing `src/eneru/health/` mixin package).
- The group executor composes the existing shutdown mixins, reusing
  multi-phase ordering, `shutdown_safety_margin`, and deadline-based
  join verbatim.
- Stats DBs live at `/var/lib/eneru/{sanitized-name}.db`; hot path is
  in-memory deque only (zero I/O); writer flushes every 10 s,
  aggregates+purges every 5 min; SQLite errors are logged once and
  swallowed.

Test plan
- `MonitorState._lock` / `snapshot()` (commit 1)
- `RedundancyGroupConfig` (commit 2)
- `StatsStore` + structural `nfpm.yaml` packaging guard (commit 4)
- `.github/workflows/e2e.yml` green
- `mkdocs build --strict` clean

🤖 Generated with Claude Code