
feat(v5.1.0): Phase 2 — redundancy groups, health model, SQLite stats, TUI graphs #26

Merged: m4r1k merged 12 commits into main from feat/v5.1-phase-2 on Apr 20, 2026


Conversation

@m4r1k (Owner) commented Apr 20, 2026

Summary

Implements Phase 2 of the v5.1.0 multi-UPS roadmap on top of the rc3 baseline (Phase 1 + multi-phase shutdown ordering + per-server shutdown_safety_margin + mixin decomposition were all shipped earlier under v5.0/v5.1.0-rc1..rc3).

This PR layers in 7 atomic commits, each landing one logical chunk so reviewers can read it commit-by-commit while CI exercises the whole stack on every push.

Commits (in order)

  1. feat(state): MonitorState._lock + snapshot fields + snapshot() (infrastructure, no behaviour change). ← landed
  2. feat(config): RedundancyGroupConfig dataclass + parsing + validation rules + example.
  3. feat(redundancy): health_model.py + RedundancyGroupEvaluator + RedundancyGroupExecutor + advisory branches at the 3 trigger sites in monitor.py.
  4. feat(stats): StatsStore + StatsWriter (per-UPS SQLite, hybrid in-memory→disk) + monitor integration + packaging.
  5. feat(tui): BrailleGraph module + G/T/U keys + lazy DB read.
  6. feat(tui): TUI events panel sourced from SQLite (log-tail fallback retained).
  7. chore(release): bump to 5.1.0-rc4, append Phase 2 to [Unreleased], polish docs.

5.1.0 is not cut from this PR — the user wants to verify rc4 against real hardware first. Promotion (-rc4 → 5.1.0, changelog date, tag) is a trivial follow-up after that.

Key design decisions

  • RedundancyGroupConfig mirrors UPSGroupConfig in full (remote_servers, virtual_machines, containers, filesystems); only an is_local: true group can own local resources, and at most one group across all groups (independent and redundancy) may set is_local.
  • Health model lives in src/eneru/health_model.py (avoids collision with the existing src/eneru/health/ mixin package).
  • Redundancy executor composes all four shutdown mixins so it inherits multi-phase ordering, shutdown_safety_margin, and deadline-based join verbatim.
  • Stats are always-on; per-UPS DB at /var/lib/eneru/{sanitized-name}.db; the hot path is an in-memory deque only (zero I/O); the writer flushes every 10 s and aggregates + purges every 5 min; SQLite errors are logged once and swallowed.
  • Failsafe / FSD code paths are byte-identical for non-redundancy UPSes — explicit regression tests guard this.

Test plan

  • +8 unit tests for MonitorState._lock / snapshot() (commit 1)
  • Config validation tests for RedundancyGroupConfig (commit 2)
  • Health-model parametrised table + evaluator/executor/idempotency/cascade tests (commit 3)
  • StatsStore + structural nfpm.yaml packaging guard (commit 4)
  • BrailleGraph + TUI graph-mode tests (commit 5)
  • TUI events-from-SQLite tests with fallback path (commit 6)
  • All 7 E2E checks in .github/workflows/e2e.yml green
  • mkdocs build --strict clean
  • Manual smoke on dummy 2-UPS setup (one redundancy + one independent)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added redundancy-group support for multi-UPS quorum-based shutdown coordination with configurable min-healthy thresholds.
    • Introduced per-UPS SQLite statistics store with automatic aggregation and retention policies for historical metrics tracking.
    • Added TUI line graphs (Unicode Braille rendering) and event panel sourced from SQLite for historical analysis.
    • Implemented voltage notification hysteresis debouncing and automatic nominal-voltage detection with cross-check validation.
  • Documentation

    • Extended configuration and troubleshooting guides with redundancy-group semantics and diagnostic steps.

Adds a per-cycle, atomic snapshot of the latest UPS observation that
external readers (the upcoming Phase 2 redundancy-group evaluator) can
consume safely from another thread.

`MonitorState` gains a non-reentrant `_lock` (excluded from `__repr__`
and `__eq__`), nine `latest_*`/`trigger_*` fields, and a `snapshot()`
helper that returns a frozen `HealthSnapshot` namedtuple. The poll loop
in `monitor.py` writes all snapshot fields under one lock acquisition at
the bottom of each successful cycle, alongside `previous_status`. The
depletion-rate field is updated under the same lock from the on-battery
and on-line handlers so the published value reflects the cycle's
freshly computed rate (or zeroed when off battery).
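The lock-plus-snapshot contract above can be sketched with a plain dataclass. This is an illustration, not Eneru's `state.py`: only a few of the nine snapshot fields are modelled, and `latest_battery_charge`, `publish()` and the exact `HealthSnapshot` field names are hypothetical. `trigger_active`, `trigger_reason` and `last_update_time` are the names used elsewhere in this PR.

```python
import threading
from dataclasses import dataclass, field
from typing import NamedTuple


class HealthSnapshot(NamedTuple):
    # Hypothetical subset of the real snapshot fields.
    status: str
    battery_charge: float
    last_update_time: float
    trigger_active: bool
    trigger_reason: str


@dataclass
class MonitorState:
    latest_status: str = ""
    latest_battery_charge: float = 0.0
    last_update_time: float = 0.0
    trigger_active: bool = False
    trigger_reason: str = ""
    # Non-reentrant lock; excluded from the generated __repr__ and __eq__.
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)

    def publish(self, status: str, charge: float, now: float) -> None:
        """Poll-loop side: write every snapshot field under one lock acquisition."""
        with self._lock:
            self.latest_status = status
            self.latest_battery_charge = charge
            self.last_update_time = now

    def snapshot(self) -> HealthSnapshot:
        """Reader side: copy the fields atomically into an immutable tuple."""
        with self._lock:
            return HealthSnapshot(self.latest_status, self.latest_battery_charge,
                                  self.last_update_time, self.trigger_active,
                                  self.trigger_reason)
```

Declaring the lock with `field(repr=False, compare=False)` is what keeps `__repr__` and `__eq__` unaffected; returning an immutable namedtuple means a reader on another thread never observes a half-written cycle.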

Pure infrastructure commit: no behaviour change for legacy single-UPS
deployments, no new advisory branches yet -- those land in the
redundancy-evaluator commit. Existing 410 tests still pass; +8 unit
tests cover the lock attribute, snapshot contents, concurrent reader
safety, dataclass equality/repr unaffected by the lock field, and the
default round trip.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

coderabbitai Bot commented Apr 20, 2026

Caution: review failed. The pull request is closed.


📥 Commits

Reviewing files that changed from the base of the PR and between 3c3501e and 210b0a0.

📒 Files selected for processing (54)
  • .github/workflows/e2e.yml
  • CLAUDE.md
  • docs/changelog.md
  • docs/configuration.md
  • docs/notifications.md
  • docs/redundancy-groups.md
  • docs/roadmap.md
  • docs/statistics.md
  • docs/testing.md
  • docs/triggers.md
  • docs/troubleshooting.md
  • docs/tui-graphs.md
  • examples/config-redundancy.yaml
  • examples/config-reference.yaml
  • mkdocs.yml
  • nfpm.yaml
  • src/eneru/CLAUDE.md
  • src/eneru/__init__.py
  • src/eneru/cli.py
  • src/eneru/config.py
  • src/eneru/graph.py
  • src/eneru/health/voltage.py
  • src/eneru/health_model.py
  • src/eneru/monitor.py
  • src/eneru/multi_ups.py
  • src/eneru/redundancy.py
  • src/eneru/state.py
  • src/eneru/stats.py
  • src/eneru/tui.py
  • src/eneru/version.py
  • tests/e2e/config-e2e-dry-run.yaml
  • tests/e2e/config-e2e-multi-ups-drain.yaml
  • tests/e2e/config-e2e-multi-ups.yaml
  • tests/e2e/config-e2e-notifications.yaml
  • tests/e2e/config-e2e-redundancy-cross-group.yaml
  • tests/e2e/config-e2e-redundancy-separate-eneru.yaml
  • tests/e2e/config-e2e-redundancy.yaml
  • tests/e2e/config-e2e-shutdown-order.yaml
  • tests/e2e/config-e2e-stats.yaml
  • tests/e2e/config-e2e-voltage-autodetect.yaml
  • tests/e2e/config-e2e.yaml
  • tests/e2e/scenarios/us-grid-misreport.dev
  • tests/test_config_loading.py
  • tests/test_config_validation.py
  • tests/test_graph.py
  • tests/test_health_model.py
  • tests/test_monitor_core.py
  • tests/test_multi_ups.py
  • tests/test_packaging.py
  • tests/test_redundancy.py
  • tests/test_state.py
  • tests/test_stats.py
  • tests/test_tui.py
  • tests/test_voltage.py

📝 Walkthrough

This PR introduces Phase 2 of v5.1.0 with four major features: redundancy groups with quorum-based shutdown logic, per-UPS SQLite statistics persistence with background writer, TUI graph rendering using Braille Unicode characters, and voltage notification hysteresis with auto-detection re-snapping. Configuration validation, monitor state management, and CLI are updated throughout. Extensive documentation and test coverage accompany the changes.

Changes (cohort, files: summary)

  • Redundancy Groups (src/eneru/redundancy.py, src/eneru/health_model.py, src/eneru/config.py, src/eneru/multi_ups.py): Adds redundancy group orchestration with RedundancyGroupEvaluator (quorum polling), RedundancyGroupExecutor (shutdown trigger), UPSHealth enum (health classification), and config parsing/validation for redundancy_groups, min_healthy, degraded_counts_as, unknown_counts_as, is_local, and remote servers.
  • Advisory Mode Triggers (src/eneru/monitor.py, src/eneru/state.py): Introduces advisory trigger recording in redundancy-group members instead of immediate local shutdown; adds trigger_active/trigger_reason snapshot fields, _record_advisory_trigger, _clear_advisory_trigger, and updated FAILSAFE/FSD/battery paths.
  • SQLite Statistics (src/eneru/stats.py, src/eneru/config.py, src/eneru/monitor.py, src/eneru/tui.py): Implements per-UPS SQLite persistence with StatsStore (schema, buffering, aggregation, retention), StatsWriter (background thread), hot-path buffering, failure isolation (errors swallowed with rate-limited logging), and stats-aware event logging with a notification_sent flag.
  • TUI Graphs (src/eneru/graph.py, src/eneru/tui.py): Adds BrailleGraph for Braille Unicode line-graph rendering with auto-scaling, fallback block characters, and curses integration; extends the TUI with graph mode cycling (--graph), time-range selection (--time), events-only mode (--events-only), and live-deque blending for up-to-date edges.
  • Voltage Hysteresis & Auto-Detect (src/eneru/health/voltage.py, src/eneru/config.py): Introduces notification hysteresis for voltage transitions (debounced via voltage_hysteresis_seconds), grid-snapping for nominal voltage, an observed-range cross-check that re-snaps on NUT misreport, and VOLTAGE_FLAP_SUPPRESSED event emission.
  • Notifications & Configuration (src/eneru/config.py, examples/config-reference.yaml): Adds notifications.suppress (per-event mute list with a safety-critical blocklist) and notifications.voltage_hysteresis_seconds; extensive validation for redundancy groups, notification suppression, and voltage hysteresis settings.
  • CLI & TUI Updates (src/eneru/cli.py, src/eneru/tui.py): Extends the validate command to enumerate redundancy groups with quorum/health policies; adds monitor subcommand flags --graph, --time, --events-only for headless graph/events rendering; switches to MultiUPSCoordinator when redundancy groups are present.
  • Core Module Exports (src/eneru/__init__.py): Exports new redundancy/health/stats/graph entities: RedundancyGroupConfig, RedundancyGroupEvaluator, RedundancyGroupExecutor, UPSHealth, assess_health, StatsConfig, StatsRetentionConfig, StatsStore, StatsWriter, BrailleGraph.
  • Package & Manifest (nfpm.yaml, src/eneru/version.py, src/eneru/__init__.py): Adds modules health_model.py, redundancy.py, stats.py, graph.py; creates the /var/lib/eneru directory for SQLite stats storage; bumps version to 5.1.0-rc7.
  • Documentation (docs/redundancy-groups.md, docs/statistics.md, docs/tui-graphs.md, docs/triggers.md, docs/notifications.md, docs/testing.md, docs/changelog.md): Comprehensive docs for redundancy group configuration/runtime/behavior, SQLite schema/retention/config, Braille graph rendering/keybindings, voltage threshold handling/hysteresis, notification suppression controls, and updated test coverage metrics (637 tests across 25 files).
  • E2E Configuration (tests/e2e/config-e2e*.yaml): Adds statistics.db_directory to all E2E configs; new configs for redundancy-group scenarios (config-e2e-redundancy.yaml, config-e2e-redundancy-cross-group.yaml, config-e2e-redundancy-separate-eneru.yaml), stats validation (config-e2e-stats.yaml), and voltage auto-detect (config-e2e-voltage-autodetect.yaml).
  • E2E Workflow (.github/workflows/e2e.yml): Adds /var/lib/eneru provisioning; extends test coverage to 31 tests: Tests 20–27 cover redundancy-group config validation and quorum/shutdown scenarios; Tests 28–31 add SQLite stats DB validation, stats-writer error isolation, and monitor --graph/--events-only TUI assertions.
  • Unit Tests (tests/test_*.py, 6 new files + 6 expanded): New test modules: test_health_model.py, test_redundancy.py, test_stats.py, test_graph.py, test_voltage.py, test_packaging.py. Expanded: test_config_loading.py, test_config_validation.py, test_monitor_core.py, test_multi_ups.py, test_state.py, test_tui.py. Coverage includes health classification, redundancy wiring, SQLite persistence, Braille rendering, voltage hysteresis, and packaging structural validation.
  • Development Docs (CLAUDE.md, src/eneru/CLAUDE.md, docs/roadmap.md): Guidance on SQLite schema evolution (append-only migrations, idempotent ALTER TABLE); roadmap updated to "rc5" with implementation status and feature descriptions.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

The PR introduces four orthogonal major features (redundancy groups, SQLite stats, Braille graphs, voltage hysteresis) with significant logic density and integration across monitor, coordinator, state, and CLI. While changes are well-organized by feature, the heterogeneity of concerns, the control-flow complexity (especially health model classification and quorum evaluation), and the broad file spread across core modules (54 files) warrant careful review. Extensive test coverage and documentation mitigate the complexity somewhat.


Poem

🐰 Whiskers twitch with glee—
Quorum counts and graphs appear,
Statistics flow so free,
Voltage settles without fear,
Redundancy brings peace!



codecov-commenter commented Apr 20, 2026


Codecov Report

❌ Patch coverage is 86.51778% with 163 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.11%. Comparing base (3c3501e) to head (210b0a0).

Files with missing lines (patch coverage; missing and partial lines):
  • src/eneru/tui.py: 73.06%; 65 missing, 8 partials ⚠️
  • src/eneru/cli.py: 9.67%; 27 missing, 1 partial ⚠️
  • src/eneru/monitor.py: 77.90%; 17 missing, 2 partials ⚠️
  • src/eneru/redundancy.py: 90.64%; 12 missing, 4 partials ⚠️
  • src/eneru/stats.py: 95.56%; 9 missing ⚠️
  • src/eneru/graph.py: 93.39%; 5 missing, 2 partials ⚠️
  • src/eneru/health/voltage.py: 93.96%; 6 missing, 1 partial ⚠️
  • src/eneru/config.py: 97.22%; 1 missing, 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #26      +/-   ##
==========================================
+ Coverage   70.99%   79.11%   +8.11%     
==========================================
  Files          19       23       +4     
  Lines        2310     3457    +1147     
  Branches      470      675     +205     
==========================================
+ Hits         1640     2735    +1095     
- Misses        548      580      +32     
- Partials      122      142      +20     


m4r1k and others added 11 commits April 20, 2026 15:01
Introduces the Phase 2 config layer for redundancy groups -- the config
mirror of UPSGroupConfig that lets multiple UPS sources share a single
quorum-gated shutdown decision (dual-PSU racks, A+B feeds, etc.).

Behaviour at runtime is unchanged in this commit: the dataclass is
parsed and validated but not yet wired into the monitor. The evaluator,
advisory triggers, and executor land in the next commit.

Config layer:
- New RedundancyGroupConfig dataclass mirroring UPSGroupConfig in full
  (name, ups_sources, min_healthy, degraded/unknown_counts_as, is_local,
  triggers, remote_servers, virtual_machines, containers, filesystems).
- ConfigLoader._parse_redundancy_groups parses the new YAML section,
  inheriting global triggers when the per-group block is omitted.
- validate_config gains the rules: 1 <= min_healthy <= |ups_sources|
  (== |ups_sources| warns "no redundancy"); reject 0/negative/non-int;
  reject empty/duplicate/missing names; reject unknown UPS references;
  reject duplicate sources; enum-check degraded/unknown_counts_as;
  reject local resources on a non-is_local group; enforce at most one
  is_local across all UPS+redundancy groups; reject remote-server
  (host,user) conflicts across tiers and across redundancy groups.
- The pre-existing "multiple UPS groups marked is_local" rule is
  subsumed by the new combined rule and kept message-substring-stable
  for downstream tests.
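A minimal sketch of the `min_healthy` / source-list rules and the combined `is_local` rule listed above, assuming plain Python values as input. The function names `check_redundancy_group` and `check_single_is_local` and every error string are hypothetical; they are not Eneru's `validate_config` messages.

```python
from typing import Dict, List, Sequence, Set, Tuple


def check_redundancy_group(name: str, ups_sources: Sequence[str], min_healthy: object,
                           known_ups: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for one group, following the rules listed above."""
    errors: List[str] = []
    warnings: List[str] = []
    if not name:
        errors.append("redundancy group name must not be empty")
    if len(set(ups_sources)) != len(ups_sources):
        errors.append(f"{name}: duplicate entries in ups_sources")
    errors += [f"{name}: unknown UPS {src!r}" for src in ups_sources if src not in known_ups]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(min_healthy, bool) or not isinstance(min_healthy, int):
        errors.append(f"{name}: min_healthy must be an integer")
    elif not 1 <= min_healthy <= len(ups_sources):
        errors.append(f"{name}: min_healthy must be between 1 and {len(ups_sources)}")
    elif min_healthy == len(ups_sources):
        warnings.append(f"{name}: min_healthy equals the source count (no redundancy)")
    return errors, warnings


def check_single_is_local(is_local_by_group: Dict[str, bool]) -> List[str]:
    """At most one group, independent or redundancy, may set is_local."""
    local = sorted(n for n, flag in is_local_by_group.items() if flag)
    return [f"multiple groups marked is_local: {', '.join(local)}"] if len(local) > 1 else []
```

Treating `min_healthy == |ups_sources|` as a warning rather than an error matches the text: the config is legal, it just provides no redundancy because losing any one source breaks quorum.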

CLI: eneru validate now prints a "Redundancy groups" section listing
sources, quorum, remote servers, and (when is_local) local resources.

Examples & docs:
- New examples/config-redundancy.yaml -- minimal dual-PSU config.
- examples/config-reference.yaml gains a fully-commented
  redundancy_groups block.
- New docs/redundancy-groups.md with the concept guide, min_healthy
  semantics, scenario tables, and unknown_counts_as rationale; linked
  from configuration.md and registered in mkdocs.yml.

Tests:
- tests/test_config_loading.py +9 tests for parsing + defaults +
  inheritance + multi-group / malformed-entry handling.
- tests/test_config_validation.py +24 tests covering every rule above.
- tests/test_multi_ups.py: 1 message-substring tweak for the combined
  is_local check.

E2E:
- tests/e2e/config-e2e-redundancy.yaml dual-source config.
- New e2e step "Test 20" asserts (a) the valid config validates clean
  with the redundancy section surfaced, and (b) a min_healthy=0 config
  exits non-zero with the expected error.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…executor

The Phase 2 behavior commit. Brings redundancy groups online: a UPS
listed under `redundancy_groups[*].ups_sources` continues to poll on its
own thread but its per-UPS triggers now record an *advisory* state
flag instead of running a local shutdown. A separate evaluator thread
(one per group, ~1s tick) reads each member's snapshot under the state
lock, applies the group's `degraded_counts_as` / `unknown_counts_as`
policy, and only fires the group's executor when
`healthy_count < min_healthy`.

Health model
- New src/eneru/health_model.py exposes `UPSHealth` (str enum:
  HEALTHY/DEGRADED/CRITICAL/UNKNOWN) and a pure `assess_health(snapshot,
  triggers, check_interval)` function. Order: stale (`>5*check_interval`)
  / FAILED connection -> UNKNOWN; trigger_active or `FSD` in status ->
  CRITICAL; OB or GRACE_PERIOD -> DEGRADED; else HEALTHY. Module name
  avoids collision with the existing eneru/health/ mixin package.
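The classification order reads naturally as a chain of early returns. The sketch below assumes a simplified snapshot type (`MemberSnapshot`, hypothetical) with a `connection_state` field holding `"OK"`, `"GRACE_PERIOD"` or `"FAILED"`; the real `assess_health(snapshot, triggers, check_interval)` also receives the trigger config, which this illustration omits.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class UPSHealth(str, Enum):
    HEALTHY = "HEALTHY"
    DEGRADED = "DEGRADED"
    CRITICAL = "CRITICAL"
    UNKNOWN = "UNKNOWN"


@dataclass(frozen=True)
class MemberSnapshot:
    status: str              # NUT ups.status tokens, e.g. "OL", "OB DISCHRG", "OB LB FSD"
    connection_state: str    # assumed values: "OK", "GRACE_PERIOD", "FAILED"
    last_update_time: float  # epoch seconds of the last successful poll; 0 means never
    trigger_active: bool = False


def classify(snap: MemberSnapshot, check_interval: float,
             now: Optional[float] = None) -> UPSHealth:
    now = time.time() if now is None else now
    tokens = snap.status.split()
    # 1. Stale snapshot or failed connection: nothing else in it can be trusted.
    if now - snap.last_update_time > 5 * check_interval or snap.connection_state == "FAILED":
        return UPSHealth.UNKNOWN
    # 2. A local trigger fired (advisory) or NUT asserted forced shutdown.
    if snap.trigger_active or "FSD" in tokens:
        return UPSHealth.CRITICAL
    # 3. Running on battery, or the connection is inside its grace period.
    if "OB" in tokens or snap.connection_state == "GRACE_PERIOD":
        return UPSHealth.DEGRADED
    return UPSHealth.HEALTHY
```

Putting the staleness check first is what makes a never-polled member (`last_update_time == 0`) classify as UNKNOWN regardless of its other fields.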

Redundancy runtime
- New src/eneru/redundancy.py:
  - `RedundancyGroupEvaluator(threading.Thread)`: reads snapshots, maps
    DEGRADED/UNKNOWN per the group's policy, edge-detected logging
    ("quorum LOST" / "quorum restored"), idempotent firing.
  - `RedundancyGroupExecutor`: composes VM/Container/Filesystem/Remote
    shutdown mixins so the group reuses multi-phase ordering,
    `shutdown_safety_margin`, and deadline-based join byte-identically.
    Per-group flag file at `/var/run/ups-shutdown-redundancy-<sanitized>`
    keeps the shutdown idempotent; sanitization mirrors the per-UPS path.
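Below is a hedged sketch of one evaluator tick: policy translation, the `healthy_count < min_healthy` test, edge-detected logging and fire-once idempotency. `QuorumTracker` is a hypothetical name, the thread loop and executor are left out, and the policy strings `"healthy"` / `"critical"` plus the `degraded_counts_as="healthy"` default are assumptions (only the `unknown_counts_as: critical` default is stated in this PR).

```python
import logging
from enum import Enum
from typing import Callable, Iterable

log = logging.getLogger("redundancy-sketch")


class UPSHealth(str, Enum):
    HEALTHY = "HEALTHY"
    DEGRADED = "DEGRADED"
    CRITICAL = "CRITICAL"
    UNKNOWN = "UNKNOWN"


def counts_as_healthy(h: UPSHealth, degraded_counts_as: str, unknown_counts_as: str) -> bool:
    """Translate DEGRADED/UNKNOWN through the group policy; CRITICAL never counts."""
    if h is UPSHealth.DEGRADED:
        return degraded_counts_as == "healthy"
    if h is UPSHealth.UNKNOWN:
        return unknown_counts_as == "healthy"
    return h is UPSHealth.HEALTHY


class QuorumTracker:
    def __init__(self, min_healthy: int, on_quorum_lost: Callable[[], None],
                 degraded_counts_as: str = "healthy",
                 unknown_counts_as: str = "critical") -> None:
        self.min_healthy = min_healthy
        self.on_quorum_lost = on_quorum_lost
        self.degraded_counts_as = degraded_counts_as
        self.unknown_counts_as = unknown_counts_as
        self.quorum_ok = True  # last logged state, for edge detection
        self.fired = False     # the shutdown callback runs at most once

    def tick(self, member_health: Iterable[UPSHealth]) -> int:
        healthy = sum(counts_as_healthy(h, self.degraded_counts_as, self.unknown_counts_as)
                      for h in member_health)
        ok = healthy >= self.min_healthy
        if ok != self.quorum_ok:  # log only on transitions, not on every tick
            log.warning("quorum %s: %d healthy, need %d",
                        "restored" if ok else "LOST", healthy, self.min_healthy)
            self.quorum_ok = ok
        if not ok and not self.fired:
            self.fired = True
            self.on_quorum_lost()
        return healthy
```

This sketch never re-arms after quorum is restored; once the group's shutdown has started, later ticks only log.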

Monitor advisory branches
- `UPSGroupMonitor` gains `in_redundancy_group: bool` ctor arg + two
  helpers (`_record_advisory_trigger`, `_clear_advisory_trigger`) that
  set/clear `state.trigger_active`/`trigger_reason` under the state lock.
- 3 trigger sites switch on `self._in_redundancy_group`:
    1. T1-T4 in `_handle_on_battery` (line ~680)
    2. FAILSAFE in `_main_loop` (lines ~818-833)
    3. FSD in `_main_loop` (lines ~921-922)
  The non-redundancy `else:` branch is byte-identical to the previous
  code path -- regression tests guard this. Returning to OL or
  recovering from FAILED clears the advisory.

Coordinator wiring
- `MultiUPSCoordinator` precomputes the in-redundancy UPS-name set,
  passes the flag to each `UPSGroupMonitor`, and after monitor startup
  spins up one `RedundancyGroupExecutor` + `RedundancyGroupEvaluator`
  per `config.redundancy_groups` entry. `_wait_for_completion` and
  `_handle_signal` now also track / join evaluator threads.
- CLI `_cmd_run` routes through the coordinator when the config has
  redundancy groups even if `multi_ups` is False.

Packaging / public API
- `nfpm.yaml`: `health_model.py` and `redundancy.py` added to the
  per-file `contents:` list (deb/rpm builds enumerate, never glob).
- Public exports: UPSHealth, assess_health, RedundancyGroupEvaluator,
  RedundancyGroupExecutor.

Tests (+78)
- `test_health_model.py` (32): parametrised classification table,
  staleness vs check_interval, priority order between tiers, enum API.
- `test_redundancy.py` (28): evaluator counting + policy translation,
  cross-group cascade, executor synthetic Config wiring + flag-file
  namespace + sanitization, dry-run cleanup, idempotency in-process and
  against pre-existing flag files, local-resource gating on `is_local`,
  log-prefix + `@`-escape.
- `test_monitor_core.py` (+12): advisory wiring per trigger site +
  regression tests `test_failsafe_unchanged_for_single_ups` and
  `test_failsafe_unchanged_for_independent_group`.
- `test_multi_ups.py` (+6): coordinator builds `_in_redundancy` set,
  passes the flag to every monitor in order, instantiates evaluator +
  executor per redundancy group, joins evaluator threads on signal.

E2E (+7 scenarios)
- New configs: `config-e2e-redundancy-cross-group.yaml`,
  `config-e2e-redundancy-separate-eneru.yaml`.
- Tests 21-27 cover: quorum holds (1 of 2 healthy); quorum exhausted
  (both critical); UNKNOWN handling default; both UNKNOWN -> fail-safe;
  cross-group cascade (UPS in both indep + redundancy); advisory-mode
  log signature; separate-Eneru-UPS topology.

Docs
- `docs/redundancy-groups.md` extended with the cascade lifecycle, a
  dual-PSU timeline table, and load-redistribution guidance.
- `docs/triggers.md` gains a "Triggers in redundancy groups" section.
- `docs/troubleshooting.md` gains "Why isn't my redundancy-group server
  shutting down?".
- `docs/testing.md` updated counts: 410 -> 529 unit, 19 -> 27 E2E.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Two layered additions:

1. Per-UPS SQLite statistics store (Phase 2 spec 2.12)
2. Redundancy-evaluator startup grace -- a regression fix discovered
   in CI Test 21 against the previous commit

---

Statistics store
- New src/eneru/stats.py exposes:
  - StatsStore: WAL-mode SQLite store with `synchronous=NORMAL`,
    schema (samples / agg_5min / agg_hourly / events / meta), 10
    documented metrics. The hot path is `buffer_sample()` -- a
    constant-time append to an in-memory deque, zero I/O. Public
    methods (open/close/flush/aggregate/purge/log_event/query_range/
    query_events/open_readonly) all catch `sqlite3.Error` + `OSError`,
    log once with rate-limit, and swallow.
  - StatsWriter(threading.Thread): drains the buffer every 10 s,
    runs aggregate+purge every 5 min, also flushes on shutdown.
  - SAMPLE_FIELDS, SCHEMA_VERSION, BUCKET_5MIN, BUCKET_HOURLY constants.
- New StatsConfig + StatsRetentionConfig dataclasses
  (`statistics:` YAML key, `db_directory: /var/lib/eneru` default,
  retention windows 24 h / 30 d / 5 y per tier).
- UPSGroupMonitor wiring:
  - One StatsStore per UPS at `<db_directory>/<sanitized-name>.db`.
  - `_initialize` opens the store and starts the writer; failures log
    once and disable persistence for the run (daemon keeps running).
  - `_save_state` calls `buffer_sample(...)` after the text-state
    write -- still zero I/O on the hot path.
  - `_log_power_event` calls `log_event(...)` so power events appear
    in the events table.
  - `_cleanup_and_exit` flushes + closes via `_stop_stats()`.
- CLI: routes through MultiUPSCoordinator when `redundancy_groups` is
  set even with a single ups_group (already in the previous commit;
  unchanged here).
- Public API: StatsConfig, StatsRetentionConfig, StatsStore, StatsWriter
  exported from eneru/__init__.py.
- Packaging:
  - nfpm.yaml gains a `contents:` entry for src/eneru/stats.py and a
    directory entry creating `/var/lib/eneru` (mode 0755, root:root)
    on deb/rpm install.
  - Pip installs handle the directory creation defensively in
    `StatsStore.open()`.
- Example config: examples/config-reference.yaml gains a documented
  `statistics:` section.
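The hybrid hot-path/writer split can be illustrated with the standard-library `sqlite3` and `collections.deque`. This is a simplified sketch, not `StatsStore`: `BufferedStore`, `Writer`, `open_readonly`, the three-column `samples` table and the 10 000-entry buffer bound are assumptions, and aggregation and purging are omitted. It does show the pieces described above: a bounded deque that drops the oldest sample, one transaction per flush, WAL with `synchronous=NORMAL`, log-once error swallowing, and a read-only `?mode=ro` URI open that returns `None` when the DB is missing.

```python
import logging
import sqlite3
import threading
from collections import deque
from pathlib import Path
from typing import Deque, List, Optional, Tuple

log = logging.getLogger("stats-sketch")
Sample = Tuple[float, float, float]  # illustrative columns: (ts, battery_charge, load)


class BufferedStore:
    def __init__(self, path: str, maxlen: int = 10_000) -> None:
        self._buf: Deque[Sample] = deque(maxlen=maxlen)  # when full, the oldest sample drops
        self._path = path
        self._conn: Optional[sqlite3.Connection] = None
        self._warned = False

    def _log_once(self, exc: Exception) -> None:
        if not self._warned:
            log.warning("stats persistence degraded: %s", exc)
            self._warned = True

    def open(self) -> None:
        try:
            # check_same_thread=False: the writer thread flushes on this connection.
            conn = sqlite3.connect(self._path, check_same_thread=False)
            conn.execute("PRAGMA journal_mode=WAL")
            conn.execute("PRAGMA synchronous=NORMAL")
            conn.execute("CREATE TABLE IF NOT EXISTS samples (ts REAL, charge REAL, load REAL)")
            self._conn = conn
        except (sqlite3.Error, OSError) as exc:
            self._log_once(exc)  # swallow: the daemon keeps running without stats

    def buffer_sample(self, ts: float, charge: float, load: float) -> None:
        # Hot path: constant-time append, no I/O (deque appends are thread-safe).
        self._buf.append((ts, charge, load))

    def flush(self) -> int:
        if self._conn is None:
            return 0
        batch: List[Sample] = []
        try:
            while True:
                batch.append(self._buf.popleft())
        except IndexError:
            pass  # buffer drained
        try:
            with self._conn:  # one transaction: commit on success, roll back on error
                self._conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", batch)
        except sqlite3.Error as exc:
            self._log_once(exc)
            return 0
        return len(batch)


class Writer(threading.Thread):
    def __init__(self, store: BufferedStore, interval: float = 10.0) -> None:
        super().__init__(daemon=True)
        self.store = store
        self.interval = interval
        self._stop_evt = threading.Event()

    def run(self) -> None:
        while not self._stop_evt.wait(self.interval):
            self.store.flush()
        self.store.flush()  # final flush on shutdown

    def stop(self) -> None:
        self._stop_evt.set()
        self.join()


def open_readonly(path: str) -> Optional[sqlite3.Connection]:
    p = Path(path)
    if not p.exists():
        return None  # no DB yet: the caller falls back
    return sqlite3.connect(p.resolve().as_uri() + "?mode=ro", uri=True)
```

Keeping the poll loop to a single `deque.append` is what makes the hot path independent of disk latency, which matters most on SD-card hosts.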

Tests (+47):
- tests/test_stats.py (42): schema + WAL/synchronous pragmas;
  in-memory-only buffer; thread-safe buffering across 10 producers;
  loose constant-time microbench; deque overflow drops oldest; lenient
  numeric coercion; flush single-transaction; aggregate min/max/avg
  semantics; 5-min -> hourly rollup with bucket alignment; purge per
  tier; query_range tier-selection rules; query_range NULL filtering;
  events round-trip and inclusive bounds; open_readonly returns None
  for missing DB and rejects writes; concurrent reader+writer under
  WAL; StatsWriter thread lifecycle + shutdown flush; failure-isolation
  contract for every public method; rate-limited error logging;
  StatsConfig YAML round-trip + defaults.
- tests/test_packaging.py (3, NEW FILE): structural defense against
  PR #23-class bugs. Asserts every src/eneru/**/*.py is referenced by
  nfpm.yaml; no dangling `src:` references; `/var/lib/eneru` directory
  entry is present.
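The packaging guard can be approximated without a YAML parser. This sketch assumes each packaged file appears in `nfpm.yaml` as a `- src: <path>` line and scans the text with a regular expression; the function names are hypothetical, and the real `tests/test_packaging.py` may parse the manifest differently.

```python
import re
from pathlib import Path
from typing import List, Set

# Assumes entries look like `- src: src/eneru/stats.py`; a text scan, not a YAML parse.
_SRC_RE = re.compile(r"^\s*-?\s*src:\s*['\"]?([^'\"\s]+)", re.MULTILINE)


def referenced_sources(repo_root: Path) -> Set[str]:
    text = (repo_root / "nfpm.yaml").read_text(encoding="utf-8")
    return set(_SRC_RE.findall(text))


def unpackaged_modules(repo_root: Path) -> List[str]:
    """Python files under src/eneru/ that nfpm.yaml never references."""
    referenced = referenced_sources(repo_root)
    modules = (p.relative_to(repo_root).as_posix()
               for p in (repo_root / "src" / "eneru").rglob("*.py"))
    return sorted(m for m in modules if m not in referenced)


def dangling_sources(repo_root: Path) -> List[str]:
    """`src:` entries that point at files which no longer exist."""
    return sorted(s for s in referenced_sources(repo_root) if not (repo_root / s).exists())
```

Because deb/rpm builds enumerate files rather than glob, a new module that is missing from the manifest ships silently broken; a test like this fails at PR time instead.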

E2E (+2 scenarios):
- tests/e2e/config-e2e-stats.yaml: single-UPS + writable /tmp DB dir.
- Test 28: DB created, samples populated, DAEMON_START event recorded.
- Test 29: Stats writer failure isolation -- a broken db_directory
  (file-where-dir-expected) logs the warning but does not crash.

Docs:
- New docs/statistics.md: hybrid architecture rationale, schema, SD-card
  / Raspberry Pi guidance, sqlite3 inspection recipes, backup, failure
  isolation. Linked from configuration.md and registered in mkdocs.yml.
- docs/testing.md: counts updated 529 -> 577 unit, 27 -> 29 E2E.

---

Redundancy evaluator startup grace (CI fix)

CI Test 21 caught a regression in the previous commit: the evaluator
ran its first tick before the per-UPS monitors had taken their
initial poll, so every member's snapshot had `last_update_time == 0`
and was classified UNKNOWN. With the default
`unknown_counts_as: critical`, the evaluator spuriously fired the
group's shutdown sequence at start-up.

Fix: RedundancyGroupEvaluator gains a `startup_grace_seconds`
parameter that defaults to `5 * max(member check_interval) + 5` s,
mirroring the stale-snapshot rule. The evaluator waits this long
before its first evaluation, giving monitors time to publish real
snapshots. Override is exposed for tests.
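The grace rule is small enough to state directly. A sketch, assuming the evaluator loop can be reduced to a `threading.Event`-driven function; `default_startup_grace` and `run_evaluator` are hypothetical names, and the 1 s tick comes from the earlier commit description.

```python
import threading
from typing import Callable, Iterable, Optional


def default_startup_grace(member_check_intervals: Iterable[float]) -> float:
    """Stale-snapshot bound of the slowest member (5 x check_interval) plus 5 s of slack."""
    return 5 * max(member_check_intervals) + 5


def run_evaluator(stop: threading.Event, evaluate_once: Callable[[], None],
                  member_check_intervals: Iterable[float],
                  startup_grace_seconds: Optional[float] = None,
                  tick_seconds: float = 1.0) -> None:
    grace = (default_startup_grace(member_check_intervals)
             if startup_grace_seconds is None else startup_grace_seconds)
    # Sleep through the grace window first so members can publish a real snapshot
    # (last_update_time != 0); Event.wait() returns True early if shutdown is requested.
    if stop.wait(grace):
        return
    while not stop.is_set():
        evaluate_once()
        stop.wait(tick_seconds)
```

With members polling every 5 s the grace works out to 30 s: the 25 s stale-snapshot bound plus 5 s of slack.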

E2E timeouts bumped to clear the grace window (Tests 21-27).

Tests (+3 in test_redundancy.py): default grace from check_interval,
explicit override, regression test that reproduces the spurious
UNKNOWN fire and verifies the grace prevents it.

---

Cumulative test totals after this commit:
- 577 unit tests (was 529)
- 29 E2E scenarios (was 27)

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Two layered additions:

1. BrailleGraph module + TUI graph integration (Phase 2 spec 2.13)
2. Test 28 (SQLite stats persistence) hardening -- the daemon's
   `_wait_for_initial_connection` was eating the entire 25 s test
   timeout in CI, and the asserted DB filename was wrong for single-UPS
   mode. Both fixed in this commit per the user's "bundle into the
   running commit" workflow.

---

BrailleGraph + TUI graphs

New src/eneru/graph.py (BrailleGraph class):
- Pure, stateless renderer using Unicode Braille pattern characters
  (U+2800-U+28FF). Each terminal cell encodes a 2x4 dot grid -- 8
  binary pixels per cell -- giving high-density line graphs in a few
  rows of text.
- supported(): LANG / locale-based capability check; falls back to
  block characters (`▁ ▂ ▃ ▄ ▅ ▆ ▇ █`) when not capable.
- plot(data, width=, height=, y_min=, y_max=, force_fallback=) -> List[str]
  Auto-scales when bounds omitted; clips out-of-range; skips None /
  non-numeric inputs.
- code_point() / cell() expose the dot-bitmask arithmetic for tests.
- render_to_window() best-effort curses helper for callers that want
  layout for free.
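The dot-bitmask arithmetic behind `code_point()` / `cell()` follows the Unicode Braille Patterns block: base U+2800, dots 1-3 and 7 down the left column, dots 4-6 and 8 down the right, one bit per dot. The sketch below shows that mapping plus a point-only plotter; `plot_points` is a hypothetical simplification of `BrailleGraph.plot()` that does not connect samples into lines or fall back to block characters.

```python
from typing import List, Optional, Sequence, Tuple

BRAILLE_BASE = 0x2800
# Bit for each dot, indexed [row][col] within one 2-wide x 4-tall cell.
DOT_BITS = (
    (0x01, 0x08),  # row 0: dot 1, dot 4
    (0x02, 0x10),  # row 1: dot 2, dot 5
    (0x04, 0x20),  # row 2: dot 3, dot 6
    (0x40, 0x80),  # row 3: dot 7, dot 8
)


def code_point(dots: Sequence[Tuple[int, int]]) -> int:
    """Code point of a cell with the given (col, row) dots raised."""
    mask = 0
    for col, row in dots:
        if not (0 <= col < 2 and 0 <= row < 4):
            raise ValueError(f"dot ({col}, {row}) is outside the 2x4 cell")
        mask |= DOT_BITS[row][col]
    return BRAILLE_BASE + mask


def plot_points(data: Sequence[Optional[float]], width: int, height: int,
                y_min: float, y_max: float) -> List[str]:
    """Scatter up to 2*width samples onto a width x height grid of Braille cells."""
    px_rows = 4 * height
    masks = [[0] * width for _ in range(height)]
    span = (y_max - y_min) or 1.0  # avoid division by zero on a flat range
    for x, v in enumerate(data[: 2 * width]):
        if not isinstance(v, (int, float)):
            continue  # skip None / non-numeric samples
        v = min(max(v, y_min), y_max)  # clip to the explicit bounds
        y_from_bottom = round((v - y_min) / span * (px_rows - 1))
        cell_row, dot_row = divmod(px_rows - 1 - y_from_bottom, 4)
        cell_col, dot_col = divmod(x, 2)
        masks[cell_row][cell_col] |= DOT_BITS[dot_row][dot_col]
    return ["".join(chr(BRAILLE_BASE + m) for m in row) for row in masks]
```

For example, `plot_points([0, 100], 1, 1, 0, 100)` raises dot 7 (bottom-left) and dot 4 (top-right) in a single cell, giving U+2848.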

TUI integration (src/eneru/tui.py):
- New keybindings:
  - G  cycle graph mode: off → charge → load → voltage → runtime
  - T  cycle time range: 1h → 6h → 24h → 7d → 30d
  - U  (multi-UPS) cycle which UPS the graph shows
- New helpers:
  - cycle()                     pure helper used by G/T/U
  - stats_db_path_for()         mirrors MultiUPSCoordinator's UPS-name
                                sanitization so the TUI opens the same
                                DB file the daemon writes
  - query_metric_series()       uses StatsStore.open_readonly() (URI
                                ?mode=ro), reuses query_range tier-
                                selection, lazy-opens on first non-off
                                graph mode
  - render_graph_text()         line-list rendering used by both
                                run_once --graph and the curses panel
  - render_graph_panel()        curses panel placed between the config
                                and logs panels when graph_mode != off
- footer hints updated to advertise <G> <T> <U>.
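The `cycle()` helper and the G/T/U bindings can be sketched as a pure function plus a small dispatcher. The tuples mirror the orders listed above; `on_key`, the state-dict keys and the rule that an unknown current value resets to the first option are assumptions.

```python
from typing import Dict, Sequence, TypeVar

T = TypeVar("T")

GRAPH_MODES = ("off", "charge", "load", "voltage", "runtime")
TIME_RANGES = ("1h", "6h", "24h", "7d", "30d")


def cycle(options: Sequence[T], current: T) -> T:
    """Next option after `current`, wrapping at the end; unknown values reset to the first."""
    items = list(options)
    if current not in items:
        return items[0]
    return items[(items.index(current) + 1) % len(items)]


def on_key(ch: str, state: Dict[str, str], ups_names: Sequence[str]) -> None:
    if ch == "G":
        state["graph_mode"] = cycle(GRAPH_MODES, state["graph_mode"])
    elif ch == "T":
        state["time_range"] = cycle(TIME_RANGES, state["time_range"])
    elif ch == "U" and len(ups_names) > 1:  # only meaningful with several UPSes
        state["ups"] = cycle(ups_names, state["ups"])
```

Keeping `cycle()` free of curses state is what lets the advance/wrap/reset behaviour be unit-tested without a terminal.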

CLI (src/eneru/cli.py):
- monitor --graph {charge,load,voltage,runtime} renders the Braille
  graph in run_once mode (no curses), suitable for scripts and CI.
- monitor --time {1h,6h,24h,7d,30d} pairs with --graph.

Public API: BrailleGraph exported from eneru/__init__.py.

Packaging: nfpm.yaml gains a `contents:` entry for src/eneru/graph.py.

Tests (+34):
- tests/test_graph.py (24 NEW): code-point arithmetic vs hand-computed
  glyphs (top-left, top-right, bottom row, blank, all-dots, invalid);
  supported() detection (LANG=C, UTF-8 vs ISO-8859-1); plot() geometry
  and auto-scale (max@top, min@bottom, zero-range padding); explicit
  bounds clipping (above, below, NULL skipped); fallback path; curses
  render_to_window helper.
- tests/test_tui.py (+10): cycle() advances/wraps/resets; stats DB
  path mirrors daemon for single + multi UPS; render_graph_text
  no-data placeholder, with-samples (writes to a real per-UPS DB),
  unknown-metric path; run_once --graph emits the graph block;
  run_once without --graph does NOT emit it.

E2E (+1):
- Test 30: `eneru monitor --once --graph charge --time 1h` against the
  config-e2e-stats.yaml DB. Reuses Test 28's seeded DB when available;
  falls back to spinning a fresh daemon. Asserts the graph header
  ("charge -- last 1h") and y-axis label ("y-axis: 0-100%") appear.

Docs:
- New docs/tui-graphs.md: keybindings reference, time-range tier
  selection table, headless `monitor --once --graph` recipe, fallback
  behaviour, troubleshooting. Linked from mkdocs.yml.
- docs/testing.md: counts updated 577 -> 611 unit, 29 -> 30 E2E.

---

Test 28 hardening (CI fix)

Symptoms in CI: Test 28 timed out at 25 s with only "Checking initial
connection to TestUPS@localhost:3493..." in the daemon log. Two bugs:

1. The asserted DB filename was wrong. UPSGroupMonitor in single-UPS
   mode has `state_file_suffix=""` -> sanitized="default" -> DB at
   `<dir>/default.db`. The test was looking for the multi-UPS-style
   `TestUPS-localhost-3493.db` and failing the existence check.

2. The daemon's `_wait_for_initial_connection` is bounded at 30 s
   (5 attempts × 5 s). With a 25 s test timeout, the daemon never
   reached `_main_loop` to start collecting samples. The test killed
   it mid-wait.
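The filename mismatch is easy to reproduce with a sketch of the path rule. The single-UPS `default.db` mapping and the `TestUPS@localhost:3493` → `TestUPS-localhost-3493.db` example come from the text above; the concrete sanitization regex (anything outside `[A-Za-z0-9._-]` becomes `-`) and both function names are assumptions, not Eneru's `stats_db_path_for()`.

```python
import re
from pathlib import Path


def sanitize_ups_name(name: str) -> str:
    # Assumed rule: every character outside [A-Za-z0-9._-] becomes "-".
    return re.sub(r"[^A-Za-z0-9._-]", "-", name)


def stats_db_path(db_directory: str, ups_name: str, multi_ups: bool) -> Path:
    # Single-UPS mode runs with state_file_suffix="", which maps to "default".
    stem = sanitize_ups_name(ups_name) if multi_ups else "default"
    return Path(db_directory) / f"{stem}.db"
```

Test 28 failed because it encoded the multi-UPS branch while the daemon ran the single-UPS one; sharing one path rule between daemon, TUI and tests avoids that class of bug.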

Fixes (in this commit, no code change required):
- Test 28 + Test 30 use the correct DB filename (`default.db`).
- Test 28 + Test 30 pre-check NUT responds before launching the
  daemon (15 × 1 s `upsc` probe).
- Daemon timeouts bumped from 25 s to 50 s so even the worst-case
  connection-wait + writer-flush cycle has headroom.
- PYTHONUNBUFFERED=1 keeps stdout line-buffered under `tee`.

---

Cumulative test totals after this commit:
- 611 unit tests (was 577)
- 30 E2E scenarios (was 29)

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…cation

Two layered changes per the user's "bundle into the running commit"
workflow:

1. TUI events panel sourced from each UPS's SQLite events table
   (Phase 2 spec, with parse_log_events kept as a fallback).
2. Width-aware text truncation in the gold logs panel -- a regression
   the user reported where emoji-heavy event lines spilled past the
   panel's right edge.

---

SQLite events panel

- New `query_events_for_display(config, time_range_seconds)` reads the
  per-UPS events table from each UPS's SQLite store via
  `StatsStore.open_readonly` (URI ?mode=ro). Rows are formatted as:
    HH:MM:SS  [LABEL] event_type: detail
  ([LABEL] is suppressed in single-UPS mode). All UPSes' rows are
  merged and sorted by timestamp; capped via `max_events`.
- The function returns `[]` when no DB exists for any UPS, signalling
  callers to fall back to `parse_log_events` (the v5.0 log-tail path).
  This keeps fresh installs and sandbox runs functional.
- `run_tui` and `run_once` now prefer the SQLite path with the
  documented fallback.
- New `eneru monitor --once --events-only` flag prints just the events
  list (no status/resources/graph block) for scripts and CI.
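A minimal sketch of the SQLite events read path, assuming each per-UPS DB has an `events(ts, event_type, detail)` table with `ts` in epoch seconds. `events_for_display` and `open_readonly` are hypothetical stand-ins: the real `query_events_for_display` takes the daemon config, not a label-to-path mapping. The `?mode=ro` URI and `PRAGMA busy_timeout=500` mirror the read-only open and the reader timeout described in this PR.

```python
import os
import sqlite3
import time
from datetime import datetime
from typing import Dict, List, Optional


def open_readonly(path: str) -> sqlite3.Connection:
    # mode=ro: fail instead of silently creating an empty DB file.
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    conn.execute("PRAGMA busy_timeout=500")  # wait up to 500 ms on writer locks
    return conn


def events_for_display(
    db_paths: Dict[str, str],  # UPS label -> DB path (hypothetical shape)
    time_range_seconds: int,
    max_events: int = 50,
    now: Optional[float] = None,
) -> List[str]:
    now = time.time() if now is None else now
    cutoff = now - time_range_seconds
    multi = len(db_paths) > 1
    rows = []  # (ts, formatted line) from every UPS
    for label, path in db_paths.items():
        if not os.path.exists(path):
            continue  # no DB yet for this UPS
        conn = open_readonly(path)
        try:
            for ts, event_type, detail in conn.execute(
                "SELECT ts, event_type, detail FROM events WHERE ts >= ?",
                (cutoff,),
            ):
                prefix = f"[{label}] " if multi else ""  # no label in single-UPS mode
                clock = datetime.fromtimestamp(ts).strftime("%H:%M:%S")
                rows.append((ts, f"{clock}  {prefix}{event_type}: {detail}"))
        finally:
            conn.close()
    rows.sort(key=lambda r: r[0])  # interleave all UPSes by timestamp
    keep = rows[max(len(rows) - max_events, 0):]  # newest max_events rows
    return [line for _, line in keep]
```

An empty result is the caller's cue to fall back to `parse_log_events`.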

Tests (+9):
- Single-UPS events: no `[label]` prefix.
- Multi-UPS events: `[label]` prefix; rows from different UPSes
  interleave by timestamp.
- Time-window filter (older events excluded), `max_events` cap.
- `run_once --events-only`: prints only events; falls back to log
  tail when no DB; "(no events)" placeholder when neither has data.

E2E (+1):
- Test 31: injects a known event row directly into the seeded
  SQLite DB and asserts `eneru monitor --once --events-only`
  surfaces it.

---

Width-aware truncation (gold logs panel overflow fix)

The previous truncation in `render_logs_panel` and `safe_addstr`
counted code points. That is exact for ASCII (1 cell each) but
UNDER-counts emoji and CJK (each ≈ 2 cells in most terminals).
Long emoji-rich event lines therefore painted past the panel's
visible right edge, breaking the gold border the user pointed out.

- New `display_width(text)` helper: every code point at or above
  U+1100 counts as 2 cells (covers emoji + CJK ranges); everything
  else counts as 1. Conservative -- it occasionally over-truncates
  exotic glyphs, never under-truncates.
- New `truncate_to_width(text, max_width)` helper: returns the
  longest prefix whose `display_width` is <= `max_width`, never
  splitting a double-width glyph in half.
- `safe_addstr` clips by display-cell width, not character count,
  before calling `addnstr`. The right gutter is preserved verbatim.
- `render_logs_panel` uses `display_width` + `truncate_to_width`
  with a 2-cell budget for the trailing "..".
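A minimal sketch of the two helpers as described (every code point at or above U+1100 counts as 2 cells, and a wide glyph is never split). `fit_line` is a hypothetical name for the panel's clip-and-ellipsize step with its 2-cell budget for "..".

```python
def display_width(text: str) -> int:
    # Conservative heuristic: code points >= U+1100 take 2 cells.
    return sum(2 if ord(ch) >= 0x1100 else 1 for ch in text)


def truncate_to_width(text: str, max_width: int) -> str:
    """Longest prefix whose display width is <= max_width."""
    used = 0
    for i, ch in enumerate(text):
        w = 2 if ord(ch) >= 0x1100 else 1
        if used + w > max_width:
            return text[:i]  # stop before a glyph that would overflow
        used += w
    return text


def fit_line(text: str, max_width: int) -> str:
    # Reserve 2 cells for the trailing ".." only when the line does not fit.
    if display_width(text) <= max_width:
        return text
    if max_width < 2:
        return truncate_to_width(text, max_width)
    return truncate_to_width(text, max_width - 2) + ".."
```

Because the result is measured in cells rather than code points, a line such as "🔋 ON_BATTERY ⚡⚡⚡" clipped to 8 cells becomes "🔋 ON_.." and stays inside the border.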

Tests (+8):
- ASCII width == len; emoji counted as 2; CJK counted as 2.
- truncate_to_width: short input passes through; ASCII clip; clip
  before partial emoji; zero max returns "".
- `render_logs_panel` regression: a fake window records every
  `addnstr` and asserts no painted line's display width exceeds
  the visible width, even with an emoji-heavy event.

Bundled into Commit 6 per the user's request to land TUI fixes
together with the SQLite events panel work.

---

Cumulative test totals after this commit:
- 628 unit tests (was 611)
- 31 E2E scenarios (was 30)

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Three layered changes that wrap up the Phase 2 series:

1. Version bump 5.1.0-rc3 -> 5.1.0-rc4 plus changelog and roadmap
   updates so the [Unreleased] block reflects everything in this PR.
2. Humanizer pass over every doc page touched by Phase 2 -- removes
   AI-writing tells (em-dash overuse, rule-of-three, vague "**X:**"
   bolded headers, copula avoidance, superficial -ing analyses,
   filler phrases) without changing technical accuracy.
3. CI noise fix: every existing `tests/e2e/config-*.yaml` now sets
   `statistics.db_directory: /tmp/eneru-e2e-stats` so the daemon
   stops logging "[Errno 13] Permission denied: '/var/lib/eneru'"
   on the unprivileged GitHub runner. The new stats config files
   shipped in Commits 4-6 already carry the override.

---

Version + changelog
- src/eneru/version.py: 5.1.0-rc3 -> 5.1.0-rc4. The final v5.1.0
  tag will follow once the user has run real-world hardware tests
  on rc4. Changelog entry stays under [Unreleased].
- docs/changelog.md: Phase 2 additions appended to the existing
  Added / Changed / Migration notes / Technical details sections.
  Notes the always-on stats DB at /var/lib/eneru/<sanitized>.db
  (created on first start of the upgraded daemon) and points the
  SD-card profile at docs/statistics.md.
- docs/roadmap.md: marks v5.1 implementation complete (rc4
  available for hardware testing). The package-channels item is
  marked deferred to a future point release.

Humanizer pass
The user asked for /humanizer over every doc I added or modified
in the Phase 2 PR. Files touched (and only the new sections, not
unrelated content):
- docs/redundancy-groups.md (NEW): full rewrite for terser voice
- docs/statistics.md (NEW): full rewrite
- docs/tui-graphs.md (NEW): full rewrite
- docs/changelog.md: only the [Unreleased] block
- docs/roadmap.md: only the v5.1 block
- docs/triggers.md: only the new "Triggers in redundancy groups"
  section
- docs/troubleshooting.md: only the new "Why isn't my
  redundancy-group server shutting down?" entry
- docs/testing.md: only the Phase 2 entries in the test-coverage
  list and the new Tests 20-31 rows
- docs/configuration.md: only the new validate-checks bullet and
  the statistics link
The mkdocs build remains strict-clean.

CI noise fix
The user reported many "/var/lib/eneru failed: [Errno 13] Permission
denied" lines in the green CI output. The daemon defaults to
/var/lib/eneru, which the unprivileged CI runner can't create.
Stats failure is non-fatal (the daemon catches OSError, logs once,
keeps running) but the warning was cluttering CI logs across
Tests 1-19 plus 21-27. Every existing tests/e2e/config-*.yaml that
runs the daemon now overrides db_directory to /tmp/eneru-e2e-stats.
The configs added in earlier Phase 2 commits (config-e2e-stats.yaml
and the redundancy / cross-group / separate-eneru configs) already
carry the override.

---

The full Phase 2 stack is now on the branch. CI is green on all 31
e2e tests; 628 unit tests pass across 25 files; mkdocs --strict is
clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Post-rc4 audit flagged three test files coming in below the plan's
suggested counts. Coverage areas were already represented; these
tests pin the remaining edge cases the audit called out as
genuinely worth catching.

tests/test_stats.py (+6 in new TestEdgeCases class):
- schema_version_persists_across_reopen: catches the "schema reset
  on reopen" regression class.
- text_fields_round_trip: status + connection_state survive flush
  (only their numeric siblings were directly asserted before).
- query_range_for_unaggregated_metric_at_agg_tier_returns_empty:
  output_voltage / depletion_rate aren't in agg_5min/agg_hourly;
  the SQL references a non-existent column and the swallow path
  must return [], not propagate.
- aggregate_single_sample_yields_min_eq_max_eq_avg: the boundary
  case where AVG/MIN/MAX collapse to one number.
- purge_keeps_row_at_exact_cutoff: pins the `WHERE ts < cutoff`
  semantics (rows AT the cutoff stay, only strictly older rows go).
- query_range_empty_window_returns_empty_list: covers both
  no-rows-in-window and inverted (start > end) windows.
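The purge and agg-tier edge cases above can be illustrated with a small sketch. Table and column names (`samples`, `agg_5min`, `charge`, `output_voltage_avg`) and the function names are assumptions for illustration, not the real `StatsStore` schema. Table and column identifiers are interpolated into SQL here, so in real code they must come from a fixed allow-list, never from user input.

```python
import sqlite3
from typing import List, Tuple


def purge_older_than(conn: sqlite3.Connection, table: str, cutoff: float) -> int:
    # Strictly older rows go; a row exactly AT the cutoff survives.
    cur = conn.execute(f"DELETE FROM {table} WHERE ts < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


def query_range(
    conn: sqlite3.Connection, table: str, column: str, start: float, end: float
) -> List[Tuple[float, float]]:
    if start > end:  # inverted window: nothing to return
        return []
    try:
        return conn.execute(
            f"SELECT ts, {column} FROM {table} WHERE ts BETWEEN ? AND ? ORDER BY ts",
            (start, end),
        ).fetchall()
    except sqlite3.OperationalError:
        # e.g. "no such column" for a metric not aggregated at this tier
        return []
```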

tests/test_redundancy.py (+3):
- Two TestThreeStateMix tests: a 3-UPS group with one HEALTHY +
  one DEGRADED + one CRITICAL member at the same time, evaluated
  under both `degraded_counts_as: healthy` (yields healthy_count=2,
  no shutdown) and `degraded_counts_as: critical` (yields
  healthy_count=1, fires). The earlier policy tests only mixed two
  states at a time.
- TestExecutorNotificationContent: the executor's headline shutdown
  notification actually contains the group name, the reason string,
  and every UPS source (with the @-escape applied). Only the
  @-escape behaviour was asserted before.
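The three-state mix boils down to how DEGRADED members are counted. Here is a minimal sketch, assuming a hypothetical quorum rule `min_healthy=2` (chosen so the numbers match the test description; the real threshold key and evaluator API are not shown in this PR text).

```python
from enum import Enum
from typing import Iterable


class Health(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    CRITICAL = "critical"


def healthy_count(states: Iterable[Health], degraded_counts_as: str) -> int:
    if degraded_counts_as not in ("healthy", "critical"):
        raise ValueError(f"unknown degraded_counts_as: {degraded_counts_as!r}")
    count = 0
    for state in states:
        if state is Health.HEALTHY:
            count += 1
        elif state is Health.DEGRADED and degraded_counts_as == "healthy":
            count += 1  # DEGRADED is promoted only under the "healthy" policy
    return count


def should_shutdown(states, degraded_counts_as: str, min_healthy: int) -> bool:
    # Hypothetical quorum: fire once fewer than min_healthy members are healthy.
    return healthy_count(states, degraded_counts_as) < min_healthy
```

With HEALTHY + DEGRADED + CRITICAL, the "healthy" policy yields 2 (no shutdown) and the "critical" policy yields 1 (fires).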

Test counts: 628 -> 637 unit tests across 25 files. No behaviour
change; pure test coverage.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Bump version 5.1.0-rc4 -> 5.1.0-rc5 and refresh the test count
references that lagged behind the +9 tests added in 50f27c2.

- src/eneru/version.py: 5.1.0-rc4 -> 5.1.0-rc5
- docs/testing.md: 628 -> 637 unit tests across 25 files
- docs/changelog.md: rc4 -> rc5 in the Unreleased status block, the
  migration-notes line, and the technical-details test counts
  (410 -> 628 -> 637)
- docs/roadmap.md: v5.1 header and "available for hardware testing"
  line bumped to rc5

Verified locally: eneru version -> "Eneru v5.1.0-rc5"; 637 unit
tests pass; mkdocs --strict clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
- I1: provision /var/lib/eneru in e2e workflow so Tests 1-27 exercise
  the same path users hit after deb/rpm install (no more silent
  "stats store open failed" warnings disabling stats persistence).
- I2: stats schema bumped 1 -> 2. Added 4 raw NUT metrics from spec
  2.12 (battery.voltage, ups.temperature, input.frequency,
  output.frequency); closes S3 by adding output_voltage_avg to the
  agg tables. Auto-migrates pre-existing v1 DBs via idempotent
  ALTER TABLE on first start; existing samples preserved with NULLs
  for new columns.
- I3: TUI graphs blend SQLite + a per-UPS live deque (spec 2.13).
  60-entry deque populated from the state file each refresh;
  query_metric_series extends the SQLite tail with newer entries
  deduped by timestamp. Bridges the 0-10s flush gap so the graph's
  right edge stays current.
- S1: PRAGMA busy_timeout=500 on both stats SQLite connections
  (writer + readonly). Bounds reader-writer contention to half a
  second on slow storage (SD card on Pi-class hardware).
- S2: TUI footer interpolates current cycle state -- "<G> Graph:
  charge", "<T> Time: 1h", "<U> UPS: 1/2" instead of static labels.
  Truncates gracefully on narrow terminals.
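The I2 migration is idempotent because it checks which columns already exist before each `ALTER TABLE`. A minimal sketch follows; the column names, the `samples` table, and the key/value `meta` layout are assumptions for illustration. `PRAGMA table_info` rows carry the column name at index 1.

```python
import sqlite3

# Hypothetical v2 additions; the real list lives in the stats module.
V2_COLUMNS = {
    "samples": [
        ("battery_voltage", "REAL"),
        ("ups_temperature", "REAL"),
        ("input_frequency", "REAL"),
        ("output_frequency", "REAL"),
    ],
}
SCHEMA_VERSION = 2


def existing_columns(conn: sqlite3.Connection, table: str) -> set:
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def migrate(conn: sqlite3.Connection) -> None:
    """Add any missing columns; safe to run on every start."""
    for table, columns in V2_COLUMNS.items():
        have = existing_columns(conn, table)
        for name, decl in columns:
            if name not in have:
                # Existing rows read back NULL for the new column.
                conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {decl}")
    conn.execute("CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)")
    conn.execute(
        "INSERT OR REPLACE INTO meta (key, value) VALUES ('schema_version', ?)",
        (str(SCHEMA_VERSION),),
    )
    conn.commit()
```

Running `migrate` twice is a no-op the second time, which is what makes "on first start" safe to repeat on every start.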

Cuts 5.1.0-rc6 for hardware verification. Test counts: 649 -> 658.
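The I3 blend (SQLite tail plus a 60-entry live deque, deduped by timestamp) can be sketched as follows. `LiveBuffer` and `blend` are hypothetical names, and samples are modelled as `(timestamp, value)` tuples.

```python
from collections import deque
from typing import Deque, List, Tuple

Sample = Tuple[float, float]  # (timestamp, value)


class LiveBuffer:
    """Per-UPS ring of the most recent samples read from the state file."""

    def __init__(self, maxlen: int = 60) -> None:
        self.samples: Deque[Sample] = deque(maxlen=maxlen)

    def push(self, ts: float, value: float) -> None:
        # Skip repeats when the state file has not changed since the last refresh.
        if self.samples and ts <= self.samples[-1][0]:
            return
        self.samples.append((ts, value))


def blend(sqlite_series: List[Sample], live: LiveBuffer) -> List[Sample]:
    """Extend the SQLite tail with strictly newer live samples."""
    last_ts = sqlite_series[-1][0] if sqlite_series else float("-inf")
    newer = [s for s in live.samples if s[0] > last_ts]
    return sqlite_series + newer
```

Only samples newer than the last flushed row are appended, so the graph's right edge stays current without double-plotting points the writer has already persisted.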

Co-Authored-By: Claude Opus 4.7 <[email protected]>
#27)

Issue #27 reported noisy OVER_VOLTAGE notifications on a US 120V grid
where the UPS firmware mis-reports input.voltage.nominal=230. Five-part
fix that solves the noise WITHOUT exposing safety-threshold overrides
(which would let a misconfiguration mask real hardware-damaging events):

1. B1 -- Auto-detect: input.voltage.nominal is snapped to the nearest
   standard grid voltage (100/110/115/120/127/200/208/220/230/240 V)
   at startup.
   After ~10 polls, the median of observed input.voltage is cross-
   checked against NUT's nominal; if they disagree by more than 25V
   the nominal is re-snapped (US-grid case where firmware reports
   230V on a 120V UPS) and a VOLTAGE_AUTODETECT_MISMATCH event is
   recorded. No user-tunable warning_low / warning_high / nominal
   override -- those would defeat the safety contract.

2. B2 -- notifications.voltage_hysteresis_seconds (default 30s)
   debounces transient flaps. The OVER_VOLTAGE/BROWNOUT log row +
   SQLite event are written immediately on transition (operational
   record is sacred); only the notification dispatch is gated. A 2s
   spike no longer emails; a sustained 30s over-voltage still does
   with a "(persisted Ns)" annotation.

3. B3 -- notifications.suppress lets users mute chatty informational
   events (AVR_BOOST_ACTIVE, VOLTAGE_NORMALIZED, BYPASS_MODE_INACTIVE,
   etc.). Safety-critical events (OVER_VOLTAGE_DETECTED,
   BROWNOUT_DETECTED, OVERLOAD_ACTIVE, BYPASS_MODE_ACTIVE, ON_BATTERY,
   CONNECTION_LOST, anything starting with SHUTDOWN_) are hard-blocked
   in config validation with an error pointing at hysteresis as the
   right knob for flap-debounce.

4. B4 -- Stats schema bumped 2 -> 3 with idempotent migration:
   events.notification_sent INTEGER DEFAULT 1 lets users audit muted
   vs delivered events. Pre-existing rows backfill to 1 (the v2
   daemon always notified). New event types VOLTAGE_AUTODETECT_MISMATCH
   and VOLTAGE_FLAP_SUPPRESSED round out the audit trail.

5. Documentation: schema-evolution convention now lives in both the
   root CLAUDE.md (one-liner under Conventions) and src/eneru/CLAUDE.md
   (full pattern + when-to-add-a-column guidance). Future features
   that grow persistent state follow the documented mechanic.
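B1 can be sketched as a nearest-grid snap plus a median cross-check. The grid list, ~10-poll minimum, and 25 V tolerance come from the text above; `snap_to_grid` and `cross_check_nominal` are hypothetical names, and ties snapping to the lower grid is an assumption of this sketch.

```python
import statistics
from typing import Iterable, Tuple

STANDARD_GRIDS = (100, 110, 115, 120, 127, 200, 208, 220, 230, 240)


def snap_to_grid(voltage: float) -> int:
    # Nearest standard grid; on a tie min() keeps the first (lower) value.
    return min(STANDARD_GRIDS, key=lambda grid: abs(grid - voltage))


def cross_check_nominal(
    reported_nominal: float,
    observed: Iterable[float],
    min_samples: int = 10,
    tolerance: float = 25.0,
) -> Tuple[int, bool]:
    """Return (nominal, mismatch) after comparing NUT's nominal with readings."""
    nominal = snap_to_grid(reported_nominal)
    readings = list(observed)
    if len(readings) < min_samples:
        return nominal, False  # not enough polls yet
    median = statistics.median(readings)
    if abs(median - nominal) > tolerance:
        # e.g. firmware reports 230 V on a 120 V unit: trust the measurements
        return snap_to_grid(median), True
    return nominal, False
```

A `True` mismatch flag is the point at which the daemon would record VOLTAGE_AUTODETECT_MISMATCH. The median (rather than the mean) keeps a few spikes from dragging the estimate.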

Test 32 in the e2e workflow exercises the headline US-grid scenario
end-to-end: a dummy NUT reporting nominal=230 + actual=120 must
produce the re-snap log line, the VOLTAGE_AUTODETECT_MISMATCH event
in SQLite (with notification_sent=0), and a meta.schema_version=3
DB. Runs against a real /var/lib/eneru thanks to the rc6 CI fix.

Cuts 5.1.0-rc7 for hardware verification. Test counts: 658 -> 711
(+53: 17 config validation, 13 schema migration + log_event audit,
28 voltage auto-detect + hysteresis).
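The B2 hysteresis gate only delays the notification; the log row and SQLite event are written at the transition regardless. Here is a minimal sketch of that dispatch gate as a small state machine. `VoltageNotifyGate` is a hypothetical name, and the message format follows the "(persisted Ns)" annotation described above.

```python
from typing import Optional


class VoltageNotifyGate:
    """Notify only if a voltage condition persists for hold_seconds."""

    def __init__(self, hold_seconds: float = 30.0) -> None:
        self.hold_seconds = hold_seconds
        self.condition: Optional[str] = None  # e.g. "OVER_VOLTAGE", None = normal
        self.since = 0.0
        self.notified = False

    def update(self, condition: Optional[str], now: float) -> Optional[str]:
        if condition != self.condition:
            # Transition: restart the clock (logging happens elsewhere, immediately).
            self.condition, self.since, self.notified = condition, now, False
        if condition is None or self.notified:
            return None
        held = now - self.since
        if held >= self.hold_seconds:
            self.notified = True  # one notification per sustained episode
            return f"{condition} (persisted {int(held)}s)"
        return None
```

A 2 s spike resets before the hold expires and never produces a message; a sustained condition produces exactly one.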

Closes #27.
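The B3 validation rule can be sketched as a blocklist check. The event names and the `SHUTDOWN_` prefix come from the text above; `validate_suppress` and the exact error wording are hypothetical.

```python
from typing import Iterable, List

# Safety-critical events that notifications.suppress must never mute.
NEVER_SUPPRESSIBLE = frozenset({
    "OVER_VOLTAGE_DETECTED",
    "BROWNOUT_DETECTED",
    "OVERLOAD_ACTIVE",
    "BYPASS_MODE_ACTIVE",
    "ON_BATTERY",
    "CONNECTION_LOST",
})


def validate_suppress(events: Iterable[str]) -> List[str]:
    """Return one error per safety-critical entry in notifications.suppress."""
    errors = []
    for name in events:
        if name in NEVER_SUPPRESSIBLE or name.startswith("SHUTDOWN_"):
            errors.append(
                f"notifications.suppress: {name!r} is safety-critical and cannot "
                "be muted; use notifications.voltage_hysteresis_seconds to "
                "debounce flaps"
            )
    return errors
```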

Co-Authored-By: Claude Opus 4.7 <[email protected]>
The single-UPS dict form (`ups: { name: ... }`) writes its stats DB
to `<db_directory>/default.db`, not to a sanitized-name path -- the
sanitized path is reserved for multi-UPS list configs (see
tui.py:stats_db_path_for). Test 32 was checking
`/var/lib/eneru/TestUPS-localhost-3493.db` which never gets created
in the single-UPS code path; the daemon was working correctly all
along (the auto-detect re-snap log line and VOLTAGE_AUTODETECT_MISMATCH
event were both produced; PASS 32a + 32b were green) but the test's
existence check was looking at the wrong file.

Also filter the events query by `ts >= T_START` so we assert the
event came from THIS test step, not from leftover rows in default.db
written by earlier single-UPS e2e tests (default.db is shared).
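The path rule behind this fix can be sketched as follows. The real logic is `tui.py:stats_db_path_for`, whose signature is not shown here; `stats_db_path` is a hypothetical stand-in, and the sanitizer is inferred from the `TestUPS-localhost-3493.db` filename above (runs of characters outside `[A-Za-z0-9_.]` become "-").

```python
import os
import re
from typing import Optional


def stats_db_path(
    db_directory: str, ups_name: Optional[str] = None, multi_ups: bool = False
) -> str:
    if not multi_ups:
        # Single-UPS dict form: empty state-file suffix -> "default".
        return os.path.join(db_directory, "default.db")
    # Hypothetical sanitizer: TestUPS@localhost:3493 -> TestUPS-localhost-3493
    sanitized = re.sub(r"[^A-Za-z0-9_.]+", "-", ups_name or "").strip("-")
    return os.path.join(db_directory, f"{sanitized}.db")
```

Because every single-UPS config shares `default.db`, test assertions against it need a time filter such as `ts >= T_START`.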

Co-Authored-By: Claude Opus 4.7 <[email protected]>
m4r1k marked this pull request as ready for review April 20, 2026 22:56
m4r1k merged commit 2f7f151 into main Apr 20, 2026
34 checks passed
m4r1k deleted the feat/v5.1-phase-2 branch April 20, 2026 22:56