feat(v5.1.0): Phase 2 — redundancy groups, health model, SQLite stats, TUI graphs (#26)
Adds a per-cycle, atomic snapshot of the latest UPS observation that
external readers (the upcoming Phase 2 redundancy-group evaluator) can
consume safely from another thread. `MonitorState` gains a non-reentrant
`_lock` (excluded from `__repr__` and `__eq__`), nine
`latest_*`/`trigger_*` fields, and a `snapshot()` helper that returns a
frozen `HealthSnapshot` namedtuple. The poll loop in `monitor.py` writes
all snapshot fields under one lock acquisition at the bottom of each
successful cycle, alongside `previous_status`. The depletion-rate field
is updated under the same lock from the on-battery and on-line handlers
so the published value reflects the cycle's freshly computed rate (or
zeroed when off battery).

Pure infrastructure commit: no behaviour change for legacy single-UPS
deployments, no new advisory branches yet -- those land in the
redundancy-evaluator commit.

Existing 410 tests still pass; +8 unit tests cover the lock attribute,
snapshot contents, concurrent reader safety, dataclass equality/repr
unaffected by the lock field, and the default round trip.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
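A minimal sketch of the pattern this commit describes, assuming a
reduced field set (the real `MonitorState` publishes nine
`latest_*`/`trigger_*` fields; three stand in for all of them here, and
the `publish` helper is illustrative rather than the actual poll-loop
code):

```python
import threading
import time
from dataclasses import dataclass, field
from typing import NamedTuple

class HealthSnapshot(NamedTuple):
    latest_status: str
    trigger_active: bool
    last_update_time: float

@dataclass
class MonitorState:
    latest_status: str = ""
    trigger_active: bool = False
    last_update_time: float = 0.0
    # compare=False / repr=False keep the lock out of __eq__ and __repr__
    _lock: threading.Lock = field(
        default_factory=threading.Lock, compare=False, repr=False
    )

    def snapshot(self) -> HealthSnapshot:
        # Readers on other threads get one frozen, self-consistent view.
        with self._lock:
            return HealthSnapshot(
                self.latest_status, self.trigger_active, self.last_update_time
            )

    def publish(self, status: str, trigger_active: bool) -> None:
        # The poll loop writes all snapshot fields under one acquisition.
        with self._lock:
            self.latest_status = status
            self.trigger_active = trigger_active
            self.last_update_time = time.time()
```

The `compare=False, repr=False` field options are what keeps dataclass
equality and repr unaffected by the unpicklable lock.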
Walkthrough: This PR introduces Phase 2 of v5.1.0 with four major
features: redundancy groups with quorum-based shutdown logic, per-UPS
SQLite statistics persistence with a background writer, TUI graph
rendering using Braille Unicode characters, and voltage notification
hysteresis with auto-detection re-snapping. Configuration validation,
monitor state management, and the CLI are updated throughout. Extensive
documentation and test coverage accompany the changes.
Estimated code review effort: 4 (Complex), ~65 minutes. The PR
introduces four orthogonal major features (redundancy groups, SQLite
stats, Braille graphs, voltage hysteresis) with significant logic
density and integration across monitor, coordinator, state, and CLI.
While changes are well-organized by feature, the heterogeneity of
concerns, control-flow complexity (especially health-model
classification and quorum evaluation), and broad file spread across
core modules (60+ files) warrant careful review. Extensive test
coverage and documentation mitigate complexity somewhat.
Codecov Report ❌ Patch coverage is
Additional details and impacted files:
@@ Coverage Diff @@
## main #26 +/- ##
==========================================
+ Coverage 70.99% 79.11% +8.11%
==========================================
Files 19 23 +4
Lines 2310 3457 +1147
Branches 470 675 +205
==========================================
+ Hits 1640 2735 +1095
- Misses 548 580 +32
- Partials 122 142 +20
Introduces the Phase 2 config layer for redundancy groups -- the config
mirror of UPSGroupConfig that lets multiple UPS sources share a single
quorum-gated shutdown decision (dual-PSU racks, A+B feeds, etc.).
Behaviour at runtime is unchanged in this commit: the dataclass is
parsed and validated but not yet wired into the monitor. The evaluator,
advisory triggers, and executor land in the next commit.

Config layer:
- New RedundancyGroupConfig dataclass mirroring UPSGroupConfig in full
  (name, ups_sources, min_healthy, degraded/unknown_counts_as, is_local,
  triggers, remote_servers, virtual_machines, containers, filesystems).
- ConfigLoader._parse_redundancy_groups parses the new YAML section,
  inheriting global triggers when the per-group block is omitted.
- validate_config gains the rules: 1 <= min_healthy <= |ups_sources|
  (== |ups_sources| warns "no redundancy"); reject 0/negative/non-int;
  reject empty/duplicate/missing names; reject unknown UPS references;
  reject duplicate sources; enum-check degraded/unknown_counts_as;
  reject local resources on a non-is_local group; enforce at most one
  is_local across all UPS+redundancy groups; reject remote-server
  (host,user) conflicts across tiers and across redundancy groups.
- The pre-existing "multiple UPS groups marked is_local" rule is
  subsumed by the new combined rule and kept message-substring-stable
  for downstream tests.

CLI: eneru validate now prints a "Redundancy groups" section listing
sources, quorum, remote servers, and (when is_local) local resources.

Examples & docs:
- New examples/config-redundancy.yaml -- minimal dual-PSU config.
- examples/config-reference.yaml gains a fully-commented
  redundancy_groups block.
- New docs/redundancy-groups.md with the concept guide, min_healthy
  semantics, scenario tables, and unknown_counts_as rationale; linked
  from configuration.md and registered in mkdocs.yml.

Tests:
- tests/test_config_loading.py +9 tests for parsing + defaults +
  inheritance + multi-group / malformed-entry handling.
- tests/test_config_validation.py +24 tests covering every rule above.
- tests/test_multi_ups.py: 1 message-substring tweak for the combined
  is_local check.

E2E:
- tests/e2e/config-e2e-redundancy.yaml dual-source config.
- New e2e step "Test 20" asserts (a) the valid config validates clean
  with the redundancy section surfaced, and (b) a min_healthy=0 config
  exits non-zero with the expected error.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
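A hypothetical dual-PSU sketch of the new section, using the field
names from this commit; the exact surrounding structure is an
assumption, and examples/config-redundancy.yaml is the shipped version:

```yaml
redundancy_groups:
  - name: rack-a
    ups_sources: [feed-a, feed-b]   # must reference known UPS names
    min_healthy: 1                  # 1 <= min_healthy <= |ups_sources|
    degraded_counts_as: healthy     # enum-checked
    unknown_counts_as: critical     # enum-checked
    is_local: true                  # at most one is_local group overall
    virtual_machines: []            # local resources require is_local
```

Setting `min_healthy: 2` here would validate but warn "no redundancy",
since losing either feed would already breach quorum.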
…executor
The Phase 2 behaviour commit. Brings redundancy groups online: a UPS
listed under `redundancy_groups[*].ups_sources` continues to poll on its
own thread but its per-UPS triggers now record an *advisory* state
flag instead of running a local shutdown. A separate evaluator thread
(one per group, ~1s tick) reads each member's snapshot under the state
lock, applies the group's `degraded_counts_as` / `unknown_counts_as`
policy, and only fires the group's executor when
`healthy_count < min_healthy`.
Health model
- New src/eneru/health_model.py exposes `UPSHealth` (str enum:
HEALTHY/DEGRADED/CRITICAL/UNKNOWN) and a pure `assess_health(snapshot,
triggers, check_interval)` function. Order: stale (`>5*check_interval`)
/ FAILED connection -> UNKNOWN; trigger_active or `FSD` in status ->
CRITICAL; OB or GRACE_PERIOD -> DEGRADED; else HEALTHY. Module name
avoids collision with the existing eneru/health/ mixin package.
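The classification order above can be sketched as a pure function.
This is a reduced model, not the shipped code: the real
`assess_health` also takes the configured triggers, and the
GRACE_PERIOD condition is modelled here as a boolean field on an
assumed `Snap` stand-in:

```python
import time
from enum import Enum
from typing import NamedTuple

class UPSHealth(str, Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    CRITICAL = "critical"
    UNKNOWN = "unknown"

# Stand-in for the snapshot fields this sketch needs.
class Snap(NamedTuple):
    last_update_time: float
    connection_state: str
    trigger_active: bool
    status: str
    in_grace_period: bool

def assess_health(snap: Snap, check_interval: float, now=None) -> UPSHealth:
    now = time.time() if now is None else now
    # Priority order from the commit message: stale / FAILED beat
    # everything, then CRITICAL, then DEGRADED, else HEALTHY.
    if (now - snap.last_update_time > 5 * check_interval
            or snap.connection_state == "FAILED"):
        return UPSHealth.UNKNOWN
    if snap.trigger_active or "FSD" in snap.status:
        return UPSHealth.CRITICAL
    if "OB" in snap.status or snap.in_grace_period:
        return UPSHealth.DEGRADED
    return UPSHealth.HEALTHY
```

Note the ordering matters: a stale snapshot returns UNKNOWN even when
`trigger_active` is set, because a stale trigger flag can't be trusted.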
Redundancy runtime
- New src/eneru/redundancy.py:
- `RedundancyGroupEvaluator(threading.Thread)`: reads snapshots, maps
DEGRADED/UNKNOWN per the group's policy, edge-detected logging
("quorum LOST" / "quorum restored"), idempotent firing.
- `RedundancyGroupExecutor`: composes VM/Container/Filesystem/Remote
shutdown mixins so the group reuses multi-phase ordering,
`shutdown_safety_margin`, and deadline-based join byte-identically.
Per-group flag file at `/var/run/ups-shutdown-redundancy-<sanitized>`
keeps the shutdown idempotent; sanitization mirrors the per-UPS path.
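The policy translation plus quorum check the evaluator applies can be
reduced to a few lines. Function names here are illustrative, not the
shipped API:

```python
def healthy_count(states: list, degraded_counts_as: str,
                  unknown_counts_as: str) -> int:
    # DEGRADED and UNKNOWN members map to whatever the group's
    # *_counts_as policy says before counting.
    mapped = []
    for s in states:
        if s == "degraded":
            s = degraded_counts_as
        elif s == "unknown":
            s = unknown_counts_as
        mapped.append(s)
    return sum(1 for s in mapped if s == "healthy")

def quorum_lost(states, min_healthy,
                degraded_counts_as="healthy",
                unknown_counts_as="critical") -> bool:
    # The group's executor fires only when quorum is breached.
    return healthy_count(states, degraded_counts_as,
                         unknown_counts_as) < min_healthy
```

With the default `unknown_counts_as: critical`, two UNKNOWN members in
a min_healthy=1 group breach quorum -- the fail-safe case the E2E
scenarios below exercise.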
Monitor advisory branches
- `UPSGroupMonitor` gains `in_redundancy_group: bool` ctor arg + two
helpers (`_record_advisory_trigger`, `_clear_advisory_trigger`) that
set/clear `state.trigger_active`/`trigger_reason` under the state lock.
- 3 trigger sites switch on `self._in_redundancy_group`:
1. T1-T4 in `_handle_on_battery` (line ~680)
2. FAILSAFE in `_main_loop` (lines ~818-833)
3. FSD in `_main_loop` (lines ~921-922)
The non-redundancy `else:` branch is byte-identical to the previous
code path -- regression tests guard this. Returning to OL or
recovering from FAILED clears the advisory.
Coordinator wiring
- `MultiUPSCoordinator` precomputes the in-redundancy UPS-name set,
passes the flag to each `UPSGroupMonitor`, and after monitor startup
spins up one `RedundancyGroupExecutor` + `RedundancyGroupEvaluator`
per `config.redundancy_groups` entry. `_wait_for_completion` and
`_handle_signal` now also track / join evaluator threads.
- CLI `_cmd_run` routes through the coordinator when the config has
redundancy groups even if `multi_ups` is False.
Packaging / public API
- `nfpm.yaml`: `health_model.py` and `redundancy.py` added to the
per-file `contents:` list (deb/rpm builds enumerate, never glob).
- Public exports: UPSHealth, assess_health, RedundancyGroupEvaluator,
RedundancyGroupExecutor.
Tests (+78)
- `test_health_model.py` (32): parametrised classification table,
staleness vs check_interval, priority order between tiers, enum API.
- `test_redundancy.py` (28): evaluator counting + policy translation,
cross-group cascade, executor synthetic Config wiring + flag-file
namespace + sanitization, dry-run cleanup, idempotency in-process and
against pre-existing flag files, local-resource gating on `is_local`,
log-prefix + `@`-escape.
- `test_monitor_core.py` (+12): advisory wiring per trigger site +
regression tests `test_failsafe_unchanged_for_single_ups` and
`test_failsafe_unchanged_for_independent_group`.
- `test_multi_ups.py` (+6): coordinator builds `_in_redundancy` set,
passes the flag to every monitor in order, instantiates evaluator +
executor per redundancy group, joins evaluator threads on signal.
E2E (+7 scenarios)
- New configs: `config-e2e-redundancy-cross-group.yaml`,
`config-e2e-redundancy-separate-eneru.yaml`.
- Tests 21-27 cover: quorum holds (1 of 2 healthy); quorum exhausted
(both critical); UNKNOWN handling default; both UNKNOWN -> fail-safe;
cross-group cascade (UPS in both indep + redundancy); advisory-mode
log signature; separate-Eneru-UPS topology.
Docs
- `docs/redundancy-groups.md` extended with the cascade lifecycle, a
dual-PSU timeline table, and load-redistribution guidance.
- `docs/triggers.md` gains a "Triggers in redundancy groups" section.
- `docs/troubleshooting.md` gains "Why isn't my redundancy-group server
shutting down?".
- `docs/testing.md` updated counts: 410 -> 529 unit, 19 -> 27 E2E.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Two layered additions:
1. Per-UPS SQLite statistics store (Phase 2 spec 2.12)
2. Redundancy-evaluator startup grace -- a regression fix discovered
in CI Test 21 against the previous commit
---
Statistics store
- New src/eneru/stats.py exposes:
- StatsStore: WAL-mode SQLite store with `synchronous=NORMAL`,
schema (samples / agg_5min / agg_hourly / events / meta), 10
documented metrics. The hot path is `buffer_sample()` -- a
constant-time append to an in-memory deque, zero I/O. Public
methods (open/close/flush/aggregate/purge/log_event/query_range/
query_events/open_readonly) all catch `sqlite3.Error` + `OSError`,
log once with rate-limit, and swallow.
- StatsWriter(threading.Thread): drains the buffer every 10 s,
runs aggregate+purge every 5 min, also flushes on shutdown.
- SAMPLE_FIELDS, SCHEMA_VERSION, BUCKET_5MIN, BUCKET_HOURLY constants.
- New StatsConfig + StatsRetentionConfig dataclasses
(`statistics:` YAML key, `db_directory: /var/lib/eneru` default,
retention windows 24 h / 30 d / 5 y per tier).
- UPSGroupMonitor wiring:
- One StatsStore per UPS at `<db_directory>/<sanitized-name>.db`.
- `_initialize` opens the store and starts the writer; failures log
once and disable persistence for the run (daemon keeps running).
- `_save_state` calls `buffer_sample(...)` after the text-state
write -- still zero I/O on the hot path.
- `_log_power_event` calls `log_event(...)` so power events appear
in the events table.
- `_cleanup_and_exit` flushes + closes via `_stop_stats()`.
- CLI: routes through MultiUPSCoordinator when `redundancy_groups` is
set even with a single ups_group (already in the previous commit;
unchanged here).
- Public API: StatsConfig, StatsRetentionConfig, StatsStore, StatsWriter
exported from eneru/__init__.py.
- Packaging:
- nfpm.yaml gains a `contents:` entry for src/eneru/stats.py and a
directory entry creating `/var/lib/eneru` (mode 0755, root:root)
on deb/rpm install.
- Pip installs handle the directory creation defensively in
`StatsStore.open()`.
- Example config: examples/config-reference.yaml gains a documented
`statistics:` section.
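The hybrid buffer-then-flush design above can be sketched with a
reduced two-column schema. This is a minimal model, assuming away the
real store's 10 metrics, aggregation tiers, and swallow-on-error
wrappers:

```python
import sqlite3
import threading
from collections import deque

class MiniStatsStore:
    def __init__(self, path: str, maxlen: int = 4096):
        self._buf = deque(maxlen=maxlen)  # overflow drops oldest
        self._lock = threading.Lock()
        self._db = sqlite3.connect(path)
        self._db.execute("PRAGMA journal_mode=WAL")
        self._db.execute("PRAGMA synchronous=NORMAL")
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS samples (ts REAL, charge REAL)"
        )

    def buffer_sample(self, ts: float, charge: float) -> None:
        # Hot path: constant-time in-memory append, zero I/O.
        with self._lock:
            self._buf.append((ts, charge))

    def flush(self) -> int:
        # Writer-thread path, every ~10 s: drain under the lock,
        # then one transaction for all buffered rows.
        with self._lock:
            rows = list(self._buf)
            self._buf.clear()
        with self._db:
            self._db.executemany("INSERT INTO samples VALUES (?, ?)", rows)
        return len(rows)
```

The split keeps the monitor's poll loop independent of disk latency:
a slow SD card only delays the writer thread, never a sample.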
Tests (+47):
- tests/test_stats.py (42): schema + WAL/synchronous pragmas;
in-memory-only buffer; thread-safe buffering across 10 producers;
loose constant-time microbench; deque overflow drops oldest; lenient
numeric coercion; flush single-transaction; aggregate min/max/avg
semantics; 5-min -> hourly rollup with bucket alignment; purge per
tier; query_range tier-selection rules; query_range NULL filtering;
events round-trip and inclusive bounds; open_readonly returns None
for missing DB and rejects writes; concurrent reader+writer under
WAL; StatsWriter thread lifecycle + shutdown flush; failure-isolation
contract for every public method; rate-limited error logging;
StatsConfig YAML round-trip + defaults.
- tests/test_packaging.py (3, NEW FILE): structural defense against
PR #23-class bugs. Asserts every src/eneru/**/*.py is referenced by
nfpm.yaml; no dangling `src:` references; `/var/lib/eneru` directory
entry is present.
E2E (+2 scenarios):
- tests/e2e/config-e2e-stats.yaml: single-UPS + writable /tmp DB dir.
- Test 28: DB created, samples populated, DAEMON_START event recorded.
- Test 29: Stats writer failure isolation -- a broken db_directory
(file-where-dir-expected) logs the warning but does not crash.
Docs:
- New docs/statistics.md: hybrid architecture rationale, schema, SD-card
/ Raspberry Pi guidance, sqlite3 inspection recipes, backup, failure
isolation. Linked from configuration.md and registered in mkdocs.yml.
- docs/testing.md: counts updated 529 -> 577 unit, 27 -> 29 E2E.
---
Redundancy evaluator startup grace (CI fix)
CI Test 21 caught a regression in the previous commit: the evaluator
ran its first tick before the per-UPS monitors had taken their
initial poll, so every member's snapshot had `last_update_time == 0`
and was classified UNKNOWN. With the default
`unknown_counts_as: critical`, the evaluator spuriously fired the
group's shutdown sequence at start-up.
Fix: RedundancyGroupEvaluator gains a `startup_grace_seconds`
parameter that defaults to `5 * max(member check_interval) + 5` s,
mirroring the stale-snapshot rule. The evaluator waits this long
before its first evaluation, giving monitors time to publish real
snapshots. Override is exposed for tests.
E2E timeouts bumped to clear the grace window (Tests 21-27).
Tests (+3 in test_redundancy.py): default grace from check_interval,
explicit override, regression test that reproduces the spurious
UNKNOWN fire and verifies the grace prevents it.
---
Cumulative test totals after this commit:
- 577 unit tests (was 529)
- 29 E2E scenarios (was 27)
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Two layered additions:
1. BrailleGraph module + TUI graph integration (Phase 2 spec 2.13)
2. Test 28 (SQLite stats persistence) hardening -- the daemon's
`_wait_for_initial_connection` was eating the entire 25 s test
timeout in CI, and the asserted DB filename was wrong for single-UPS
mode. Both fixed in this commit per the user's "bundle into the
running commit" workflow.
---
BrailleGraph + TUI graphs
New src/eneru/graph.py (BrailleGraph class):
- Pure, stateless renderer using Unicode Braille pattern characters
(U+2800-U+28FF). Each terminal cell encodes a 2x4 dot grid -- 8
binary pixels per cell -- giving high-density line graphs in a few
rows of text.
- supported(): LANG / locale-based capability check; falls back to
block characters (`▁ ▂ ▃ ▄ ▅ ▆ ▇ █`) when not capable.
- plot(data, width=, height=, y_min=, y_max=, force_fallback=) -> List[str]
Auto-scales when bounds omitted; clips out-of-range; skips None /
non-numeric inputs.
- code_point() / cell() expose the dot-bitmask arithmetic for tests.
- render_to_window() best-effort curses helper for callers that want
layout for free.
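The dot-bitmask arithmetic that `code_point()` / `cell()` expose
follows the standard Unicode Braille dot numbering (dots 1-8 over a
2x4 grid, base U+2800). A sketch, with an illustrative `cell` helper
rather than the module's exact signature:

```python
# Bit weights indexed [row][col] for a 2x4 pixel cell; this is the
# standard Unicode Braille Patterns layout (dots 1-8).
BRAILLE_BITS = [
    [0x01, 0x08],  # row 0: dots 1, 4
    [0x02, 0x10],  # row 1: dots 2, 5
    [0x04, 0x20],  # row 2: dots 3, 6
    [0x40, 0x80],  # row 3: dots 7, 8
]

def cell(pixels) -> str:
    """pixels: set of (row, col) with row in 0..3, col in 0..1."""
    code = 0x2800
    for row, col in pixels:
        code |= BRAILLE_BITS[row][col]
    return chr(code)
```

Eight binary pixels per terminal cell is what makes a few rows of text
enough for a readable line graph.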
TUI integration (src/eneru/tui.py):
- New keybindings:
- G cycle graph mode: off → charge → load → voltage → runtime
- T cycle time range: 1h → 6h → 24h → 7d → 30d
- U (multi-UPS) cycle which UPS the graph shows
- New helpers:
  - cycle(): pure helper used by G/T/U
  - stats_db_path_for(): mirrors MultiUPSCoordinator's UPS-name
    sanitization so the TUI opens the same DB file the daemon writes
  - query_metric_series(): uses StatsStore.open_readonly() (URI
    ?mode=ro), reuses query_range tier-selection, lazy-opens on first
    non-off graph mode
  - render_graph_text(): line-list rendering used by both
    run_once --graph and the curses panel
  - render_graph_panel(): curses panel placed between the config and
    logs panels when graph_mode != off
- Footer hints updated to advertise <G> <T> <U>.
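The pure helper behind the three keybindings can be sketched in a few
lines (this is a plausible shape for `cycle()`, not a copy of the
shipped one):

```python
def cycle(options: list, current):
    """Advance to the next option, wrapping at the end; an unknown
    current value resets to the first option."""
    try:
        return options[(options.index(current) + 1) % len(options)]
    except ValueError:
        return options[0]
```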
CLI (src/eneru/cli.py):
- monitor --graph {charge,load,voltage,runtime} renders the Braille
graph in run_once mode (no curses), suitable for scripts and CI.
- monitor --time {1h,6h,24h,7d,30d} pairs with --graph.
Public API: BrailleGraph exported from eneru/__init__.py.
Packaging: nfpm.yaml gains a `contents:` entry for src/eneru/graph.py.
Tests (+34):
- tests/test_graph.py (24 NEW): code-point arithmetic vs hand-computed
glyphs (top-left, top-right, bottom row, blank, all-dots, invalid);
supported() detection (LANG=C, UTF-8 vs ISO-8859-1); plot() geometry
and auto-scale (max@top, min@bottom, zero-range padding); explicit
bounds clipping (above, below, NULL skipped); fallback path; curses
render_to_window helper.
- tests/test_tui.py (+10): cycle() advances/wraps/resets; stats DB
path mirrors daemon for single + multi UPS; render_graph_text
no-data placeholder, with-samples (writes to a real per-UPS DB),
unknown-metric path; run_once --graph emits the graph block;
run_once without --graph does NOT emit it.
E2E (+1):
- Test 30: `eneru monitor --once --graph charge --time 1h` against the
config-e2e-stats.yaml DB. Reuses Test 28's seeded DB when available;
falls back to spinning a fresh daemon. Asserts the graph header
("charge -- last 1h") and y-axis label ("y-axis: 0-100%") appear.
Docs:
- New docs/tui-graphs.md: keybindings reference, time-range tier
selection table, headless `monitor --once --graph` recipe, fallback
behaviour, troubleshooting. Linked from mkdocs.yml.
- docs/testing.md: counts updated 577 -> 611 unit, 29 -> 30 E2E.
---
Test 28 hardening (CI fix)
Symptoms in CI: Test 28 timed out at 25 s with only "Checking initial
connection to TestUPS@localhost:3493..." in the daemon log. Two bugs:
1. The asserted DB filename was wrong. UPSGroupMonitor in single-UPS
mode has `state_file_suffix=""` -> sanitized="default" -> DB at
`<dir>/default.db`. The test was looking for the multi-UPS-style
`TestUPS-localhost-3493.db` and failing the existence check.
2. The daemon's `_wait_for_initial_connection` is bounded at 30 s
(5 attempts × 5 s). With a 25 s test timeout, the daemon never
reached `_main_loop` to start collecting samples. The test killed
it mid-wait.
Fixes (in this commit, no code change required):
- Test 28 + Test 30 use the correct DB filename (`default.db`).
- Test 28 + Test 30 pre-check NUT responds before launching the
daemon (15 × 1 s `upsc` probe).
- Daemon timeouts bumped from 25 s to 50 s so even the worst-case
connection-wait + writer-flush cycle has headroom.
- PYTHONUNBUFFERED=1 keeps stdout line-buffered under `tee`.
---
Cumulative test totals after this commit:
- 611 unit tests (was 577)
- 30 E2E scenarios (was 29)
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…cation
Two layered changes per the user's "bundle into the running commit"
workflow:
1. TUI events panel sourced from each UPS's SQLite events table
(Phase 2 spec, with parse_log_events kept as a fallback).
2. Width-aware text truncation in the gold logs panel -- a regression
the user reported where emoji-heavy event lines spilled past the
panel's right edge.
---
SQLite events panel
- New `query_events_for_display(config, time_range_seconds)` reads the
per-UPS events table from each UPS's SQLite store via
`StatsStore.open_readonly` (URI ?mode=ro). Rows are formatted as:
HH:MM:SS [LABEL] event_type: detail
([LABEL] is suppressed in single-UPS mode). All UPSes' rows are
merged and sorted by timestamp; capped via `max_events`.
- The function returns `[]` when no DB exists for any UPS, signalling
callers to fall back to `parse_log_events` (the v5.0 log-tail path).
This keeps fresh installs and sandbox runs functional.
- `run_tui` and `run_once` now prefer the SQLite path with the
documented fallback.
- New `eneru monitor --once --events-only` flag prints just the events
list (no status/resources/graph block) for scripts and CI.
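The merge-and-format behaviour described above can be sketched as
follows; the function name and the in-memory row shape are assumptions
(the real reader pulls rows via `StatsStore.open_readonly`):

```python
from datetime import datetime

def format_events(per_ups_rows: dict, max_events: int = 50,
                  multi_ups: bool = True) -> list:
    """per_ups_rows maps a UPS label to (ts, event_type, detail) rows.
    All UPSes' rows are merged, sorted by timestamp, capped, and
    formatted as 'HH:MM:SS [LABEL] event_type: detail'."""
    merged = [(ts, label, etype, detail)
              for label, rows in per_ups_rows.items()
              for ts, etype, detail in rows]
    merged.sort(key=lambda r: r[0])
    lines = []
    for ts, label, etype, detail in merged[-max_events:]:
        # [LABEL] is suppressed in single-UPS mode.
        prefix = f"[{label}] " if multi_ups else ""
        stamp = datetime.fromtimestamp(ts).strftime("%H:%M:%S")
        lines.append(f"{stamp} {prefix}{etype}: {detail}")
    return lines
```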
Tests (+9):
- Single-UPS events: no `[label]` prefix.
- Multi-UPS events: `[label]` prefix; rows from different UPSes
interleave by timestamp.
- Time-window filter (older events excluded), `max_events` cap.
- `run_once --events-only`: prints only events; falls back to log
tail when no DB; "(no events)" placeholder when neither has data.
E2E (+1):
- Test 31: injects a known event row directly into the seeded
SQLite DB and asserts `eneru monitor --once --events-only`
surfaces it.
---
Width-aware truncation (gold logs panel overflow fix)
The previous truncation in `render_logs_panel` and `safe_addstr`
counted code points -- fine for ASCII (1 cell each), but that
UNDER-counts emoji and CJK (each ≈ 2 cells in most terminals).
Long emoji-rich event lines therefore painted past the panel's
visible right edge, breaking the gold border the user pointed out.
- New `display_width(text)` helper: every code point at or above
U+1100 counts as 2 cells (covers emoji + CJK ranges); everything
else counts as 1. Conservative -- it occasionally over-truncates
exotic glyphs, never under-truncates.
- New `truncate_to_width(text, max_width)` helper: returns the
longest prefix whose `display_width` is <= `max_width`, never
splitting a double-width glyph in half.
- `safe_addstr` clips by display-cell width, not character count,
before calling `addnstr`. The right gutter is preserved verbatim.
- `render_logs_panel` uses `display_width` + `truncate_to_width`
with a 2-cell budget for the trailing "..".
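The two helpers reduce to the heuristic described -- every code point
at or above U+1100 counts as 2 cells. A sketch under that assumption:

```python
def display_width(text: str) -> int:
    # Conservative: >= U+1100 counts 2 cells (covers CJK + emoji);
    # everything else counts 1. May over-truncate exotic glyphs,
    # never under-truncates.
    return sum(2 if ord(ch) >= 0x1100 else 1 for ch in text)

def truncate_to_width(text: str, max_width: int) -> str:
    # Longest prefix whose display_width fits; a double-width glyph
    # that would straddle the limit is dropped whole, never split.
    width = 0
    for i, ch in enumerate(text):
        width += 2 if ord(ch) >= 0x1100 else 1
        if width > max_width:
            return text[:i]
    return text
```

The stdlib offers `unicodedata.east_asian_width` for a finer-grained
classification; the single-threshold heuristic trades precision for
speed and simplicity on the render path.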
Tests (+8):
- ASCII width == len; emoji counted as 2; CJK counted as 2.
- truncate_to_width: short input passes through; ASCII clip; clip
before partial emoji; zero max returns "".
- `render_logs_panel` regression: a fake window records every
`addnstr` and asserts no painted line's display width exceeds
the visible width, even with an emoji-heavy event.
Bundled into Commit 6 per the user's request to land TUI fixes
together with the SQLite events panel work.
---
Cumulative test totals after this commit:
- 628 unit tests (was 611)
- 31 E2E scenarios (was 30)
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Three layered changes that wrap up the Phase 2 series:
1. Version bump 5.1.0-rc3 -> 5.1.0-rc4 plus changelog and roadmap
   updates so the [Unreleased] block reflects everything in this PR.
2. Humanizer pass over every doc page touched by Phase 2 -- removes
   AI-writing tells (em-dash overuse, rule-of-three, vague "**X:**"
   bolded headers, copula avoidance, superficial -ing analyses, filler
   phrases) without changing technical accuracy.
3. CI noise fix: every existing `tests/e2e/config-*.yaml` now sets
   `statistics.db_directory: /tmp/eneru-e2e-stats` so the daemon stops
   logging "[Errno 13] Permission denied: '/var/lib/eneru'" on the
   unprivileged GitHub runner. The new stats config files shipped in
   Commits 4-6 already carry the override.

---

Version + changelog
- src/eneru/version.py: 5.1.0-rc3 -> 5.1.0-rc4. The final v5.1.0 tag
  will follow once the user has run real-world hardware tests on rc4.
  Changelog entry stays under [Unreleased].
- docs/changelog.md: Phase 2 additions appended to the existing Added /
  Changed / Migration notes / Technical details sections. Notes the
  always-on stats DB at /var/lib/eneru/<sanitized>.db (created on first
  start of the upgraded daemon) and points the SD-card profile at
  docs/statistics.md.
- docs/roadmap.md: marks v5.1 implementation complete (rc4 available
  for hardware testing). The package-channels item is marked deferred
  to a future point release.

Humanizer pass
The user asked for /humanizer over every doc I added or modified in the
Phase 2 PR. Files touched (and only the new sections, not unrelated
content):
- docs/redundancy-groups.md (NEW): full rewrite for terser voice
- docs/statistics.md (NEW): full rewrite
- docs/tui-graphs.md (NEW): full rewrite
- docs/changelog.md: only the [Unreleased] block
- docs/roadmap.md: only the v5.1 block
- docs/triggers.md: only the new "Triggers in redundancy groups"
- docs/troubleshooting.md: only the new "Why isn't my redundancy-group
  server shutting down?"
- docs/testing.md: only the Phase 2 entries in the test-coverage list
  and the new Tests 20-31 rows
- docs/configuration.md: only the new validate-checks bullet and the
  statistics link
The mkdocs build remains strict-clean.

CI noise fix
The user reported many "/var/lib/eneru failed: [Errno 13] Permission
denied" lines in the green CI output. The daemon defaults to
/var/lib/eneru, which the unprivileged CI runner can't create. Stats
failure is non-fatal (the daemon catches OSError, logs once, keeps
running) but the warning was cluttering CI logs across Tests 1-19 plus
21-27. Every existing tests/e2e/config-*.yaml that runs the daemon now
overrides db_directory to /tmp/eneru-e2e-stats. The configs added in
earlier Phase 2 commits (config-e2e-stats.yaml and the redundancy /
cross-group / separate-eneru configs) already carry the override.

---

The full Phase 2 stack is now on the branch. CI's 31/31 e2e tests
green; 628 unit tests across 25 files; mkdocs --strict clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Post-rc4 audit flagged three test files coming in below the plan's
suggested counts. Coverage areas were already represented; these tests
pin the remaining edge cases the audit called out as genuinely worth
catching.

tests/test_stats.py (+6 in new TestEdgeCases class):
- schema_version_persists_across_reopen: catches the "schema reset on
  reopen" regression class.
- text_fields_round_trip: status + connection_state survive flush (only
  their numeric siblings were directly asserted before).
- query_range_for_unaggregated_metric_at_agg_tier_returns_empty:
  output_voltage / depletion_rate aren't in agg_5min/agg_hourly; the
  SQL references a non-existent column and the swallow path must
  return [], not propagate.
- aggregate_single_sample_yields_min_eq_max_eq_avg: the boundary case
  where AVG/MIN/MAX collapse to one number.
- purge_keeps_row_at_exact_cutoff: pins the `WHERE ts < cutoff`
  semantics (rows AT the cutoff stay, only strictly older rows go).
- query_range_empty_window_returns_empty_list: covers both
  no-rows-in-window and inverted (start > end) windows.

tests/test_redundancy.py (+3):
- Two TestThreeStateMix tests: a 3-UPS group with one HEALTHY + one
  DEGRADED + one CRITICAL member at the same time, evaluated under both
  `degraded_counts_as: healthy` (yields healthy_count=2, no shutdown)
  and `degraded_counts_as: critical` (yields healthy_count=1, fires).
  The earlier policy tests only mixed two states at a time.
- TestExecutorNotificationContent: the executor's headline shutdown
  notification actually contains the group name, the reason string, and
  every UPS source (with the @-escape applied). Only the @-escape
  behaviour was asserted before.

Test counts: 628 -> 637 unit tests across 25 files. No behaviour
change; pure test coverage.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Bump version 5.1.0-rc4 -> 5.1.0-rc5 and refresh the test count
references that lagged behind the +9 tests added in 50f27c2.
- src/eneru/version.py: 5.1.0-rc4 -> 5.1.0-rc5
- docs/testing.md: 628 -> 637 unit tests across 25 files
- docs/changelog.md: rc4 -> rc5 in the Unreleased status block, the
  migration-notes line, and the technical-details test counts
  (410 -> 628 -> 637)
- docs/roadmap.md: v5.1 header and "available for hardware testing"
  line bumped to rc5

Verified locally: eneru version -> "Eneru v5.1.0-rc5"; 637 unit tests
pass; mkdocs --strict clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
- I1: provision /var/lib/eneru in the e2e workflow so Tests 1-27
  exercise the same path users hit after deb/rpm install (no more
  silent "stats store open failed" warnings disabling stats
  persistence).
- I2: stats schema bumped 1 -> 2. Added 4 raw NUT metrics from spec
  2.12 (battery.voltage, ups.temperature, input.frequency,
  output.frequency); closes S3 by adding output_voltage_avg to the agg
  tables. Auto-migrates pre-existing v1 DBs via idempotent ALTER TABLE
  on first start; existing samples preserved with NULLs for new
  columns.
- I3: TUI graphs blend SQLite + a per-UPS live deque (spec 2.13).
  60-entry deque populated from the state file each refresh;
  query_metric_series extends the SQLite tail with newer entries
  deduped by timestamp. Bridges the 0-10 s flush gap so the graph's
  right edge stays current.
- S1: PRAGMA busy_timeout=500 on both stats SQLite connections (writer
  + readonly). Bounds reader-writer contention to half a second on
  slow storage (SD card on Pi-class hardware).
- S2: TUI footer interpolates current cycle state -- "<G> Graph:
  charge", "<T> Time: 1h", "<U> UPS: 1/2" instead of static labels.
  Truncates gracefully on narrow terminals.

Cuts 5.1.0-rc6 for hardware verification. Test counts: 649 -> 658.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
…(#27)
Issue #27 reported noisy OVER_VOLTAGE notifications on a US 120V grid
where the UPS firmware mis-reports input.voltage.nominal=230. Five-part
fix that solves the noise WITHOUT exposing safety-threshold overrides
(which would let a misconfiguration mask real hardware-damaging
events):
1. B1 -- Auto-detect: input.voltage.nominal is snapped to the nearest
   standard grid (100/110/115/120/127/200/208/220/230/240) at startup.
   After ~10 polls, the median of observed input.voltage is
   cross-checked against NUT's nominal; if they disagree by more than
   25V the nominal is re-snapped (US-grid case where firmware reports
   230V on a 120V UPS) and a VOLTAGE_AUTODETECT_MISMATCH event is
   recorded. No user-tunable warning_low / warning_high / nominal
   override -- those would defeat the safety contract.
2. B2 -- notifications.voltage_hysteresis_seconds (default 30s)
   debounces transient flaps. The OVER_VOLTAGE/BROWNOUT log row +
   SQLite event are written immediately on transition (the operational
   record is sacred); only the notification dispatch is gated. A 2s
   spike no longer emails; a sustained 30s over-voltage still does,
   with a "(persisted Ns)" annotation.
3. B3 -- notifications.suppress lets users mute chatty informational
   events (AVR_BOOST_ACTIVE, VOLTAGE_NORMALIZED, BYPASS_MODE_INACTIVE,
   etc.). Safety-critical events (OVER_VOLTAGE_DETECTED,
   BROWNOUT_DETECTED, OVERLOAD_ACTIVE, BYPASS_MODE_ACTIVE, ON_BATTERY,
   CONNECTION_LOST, anything starting with SHUTDOWN_) are hard-blocked
   in config validation with an error pointing at hysteresis as the
   right knob for flap-debounce.
4. B4 -- Stats schema bumped 2 -> 3 with idempotent migration:
   events.notification_sent INTEGER DEFAULT 1 lets users audit muted
   vs delivered events. Pre-existing rows backfill to 1 (the v2 daemon
   always notified). New event types VOLTAGE_AUTODETECT_MISMATCH and
   VOLTAGE_FLAP_SUPPRESSED round out the audit trail.
5. Documentation: the schema-evolution convention now lives in both
   the root CLAUDE.md (one-liner under Conventions) and
   src/eneru/CLAUDE.md (full pattern + when-to-add-a-column guidance).
   Future features that grow persistent state follow the documented
   mechanic.

Test 32 in the e2e workflow exercises the headline US-grid scenario
end-to-end: a dummy NUT reporting nominal=230 + actual=120 must produce
the re-snap log line, the VOLTAGE_AUTODETECT_MISMATCH event in SQLite
(with notification_sent=0), and a meta.schema_version=3 DB. Runs
against a real /var/lib/eneru thanks to the rc6 CI fix.

Cuts 5.1.0-rc7 for hardware verification. Test counts: 658 -> 711
(+53: 17 config validation, 13 schema migration + log_event audit,
28 voltage auto-detect + hysteresis). Closes #27.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
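The B1 snap-and-cross-check logic can be sketched as two small
functions; the names and return shape here are illustrative, not the
shipped API:

```python
import statistics

STANDARD_NOMINALS = [100, 110, 115, 120, 127, 200, 208, 220, 230, 240]

def snap_nominal(reported: float) -> int:
    # Snap the firmware-reported nominal to the nearest standard grid.
    return min(STANDARD_NOMINALS, key=lambda n: abs(n - reported))

def resnap_if_mismatched(nominal: int, observed, threshold: float = 25.0):
    """After ~10 polls, cross-check the median observed input.voltage
    against the current nominal; re-snap when they disagree by more
    than `threshold` volts. Returns (nominal, mismatch_detected)."""
    median = statistics.median(observed)
    if abs(median - nominal) > threshold:
        return snap_nominal(median), True
    return nominal, False
```

Using the median rather than the mean keeps a single transient spike
from dragging the cross-check toward a false mismatch.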
The single-UPS dict form (`ups: { name: ... }`) writes its stats DB
to `<db_directory>/default.db`, not to a sanitized-name path -- the
sanitized path is reserved for multi-UPS list configs (see
tui.py:stats_db_path_for). Test 32 was checking
`/var/lib/eneru/TestUPS-localhost-3493.db` which never gets created
in the single-UPS code path; the daemon was working correctly all
along (the auto-detect re-snap log line and VOLTAGE_AUTODETECT_MISMATCH
event were both produced; PASS 32a + 32b were green) but the test's
existence check was looking at the wrong file.
Also filter the events query by `ts >= T_START` so we assert the
event came from THIS test step, not from leftover rows in default.db
written by earlier single-UPS e2e tests (default.db is shared).
Co-Authored-By: Claude Opus 4.7 <[email protected]>
Summary
Implements Phase 2 of the v5.1.0 multi-UPS roadmap on top of the rc3
baseline (Phase 1 + multi-phase shutdown ordering + per-server
`shutdown_safety_margin` + mixin decomposition were all shipped earlier
under v5.0/v5.1.0-rc1..rc3). This PR layers in 7 atomic commits, each
landing one logical chunk so reviewers can read it commit-by-commit
while CI exercises the whole stack on every push.
Commits (in order)
1. feat(state) -- `MonitorState._lock` + snapshot fields + `snapshot()`
   (infrastructure, no behaviour change). ← landed
2. feat(config) -- `RedundancyGroupConfig` dataclass + parsing +
   validation rules + example.
3. feat(redundancy) -- `health_model.py` + `RedundancyGroupEvaluator` +
   `RedundancyGroupExecutor` + advisory branches at the 3 trigger sites
   in `monitor.py`.
4. feat(stats) -- `StatsStore` + `StatsWriter` (per-UPS SQLite, hybrid
   in-memory -> disk) + monitor integration + packaging.
5. feat(tui) -- `BrailleGraph` module + `G`/`T`/`U` keys + lazy DB read.
6. feat(tui) -- TUI events panel sourced from SQLite (log-tail fallback
   retained).
7. chore(release) -- bump to 5.1.0-rc4, append Phase 2 to [Unreleased],
   polish docs.

5.1.0 is not cut from this PR -- the user wants to verify rc4 against
real hardware first. Promotion (rc4 -> 5.1.0, changelog date, tag) is a
trivial follow-up after that.

Key design decisions
- `RedundancyGroupConfig` mirrors `UPSGroupConfig` in full
  (`remote_servers`, `virtual_machines`, `containers`, `filesystems`);
  only an `is_local: true` group can own local resources, and at most
  one group across all (independent + redundancy) can be `is_local`.
- The health model lives in `src/eneru/health_model.py` (avoids
  collision with the existing `src/eneru/health/` mixin package).
- The group executor composes the existing shutdown mixins, reusing
  multi-phase ordering, `shutdown_safety_margin`, and deadline-based
  join verbatim.
- Stats DBs live at `/var/lib/eneru/{sanitized-name}.db`; hot path is
  in-memory deque only (zero I/O); writer flushes every 10 s,
  aggregates+purges every 5 min; SQLite errors are logged once and
  swallowed.

Test plan
- `MonitorState._lock` / `snapshot()` (commit 1)
- `RedundancyGroupConfig` (commit 2)
- `StatsStore` + structural `nfpm.yaml` packaging guard (commit 4)
- `.github/workflows/e2e.yml` green
- `mkdocs build --strict` clean

🤖 Generated with Claude Code