Commit d107368

Add reconnect logic for handling update failures in DbusService (#250)
* Add reconnect logic for AhoyDTU/OpenDTU after failed update attempts
* Add reconnect logic tests for DbusService to handle update failures
* Refactor DbusService tests to reset meter data and clean up comments
* Implement reconnect logic in DbusService to handle update failures and manage status codes
* Refactor ReconnectLogicTest to improve setup and add comprehensive tests for reconnect behavior and status code handling
* Enhance reconnect logic in DbusService and configuration files to improve error handling and retry mechanisms
* Add configuration options for retry logic in README and config.example
* Refactor status code handling in DbusService to use constants and improve update logic
* Add error handling modes and update logic in DbusService and tests
  - Introduced ErrorMode configuration to switch between "retrycount" and "timeout" modes.
  - Updated DbusService to handle error states based on the selected mode.
  - Enhanced unit tests to cover new timeout behavior and ensure correct status code handling.
* Refactor DbusService update logic to ensure _refresh_data is called in normal operation and add unit tests for successful update scenarios
* Add error handling modes to README and config.example for improved error management
* Enhance error handling configuration in DbusService and tests
  - Added error handling mode and related configuration options to config.example.
  - Refactored DbusService to initialize and load error handling properties from configuration.
  - Updated unit tests to verify correct reading of error handling configuration values.
* Refactor error handling logic in DbusService to fix the retry condition
* Refactor error handling configuration in DbusService to use constants and update several places accordingly to the code review
* minor changes
1 parent c2ad8e2 commit d107368

5 files changed

Lines changed: 445 additions & 53 deletions

README.md

Lines changed: 53 additions & 1 deletion
@@ -15,6 +15,7 @@
 - [Default options](#default-options)
 - [Inverter options](#inverter-options)
 - [Template options](#template-options)
+- [Error Handling Modes](#error-handling-modes)
 - [Service names](#service-names)
 - [Videos how to install](#videos-how-to-install)
 - [Use Cases](#use-cases)
@@ -122,12 +123,15 @@ Within the project there is a file `/data/dbus-opendtu/config.ini`. Most importa
 | useYieldDay | send YieldDay instead of YieldTotal. Set this to 1 to prevent VRM from adding the total value to the history on one day. E.g. if you don't start using the inverter at 0. |
 | ESP8266PollingIntervall | For ESP8266 reduce polling interval to reduce load, default 10000ms|
 | Logging | Valid options for log level: CRITICAL, ERROR, WARNING, INFO, DEBUG, NOTSET, to keep logfile small use ERROR or CRITICAL |
-MaxAgeTsLastSuccess | Maximum accepted age of ts_last_success in Ahoy status message. If ts_last_success is older than this number of seconds, values are not used. Set this to < 0 to disable this check. |
+| MaxAgeTsLastSuccess | Maximum accepted age of ts_last_success in Ahoy status message. If ts_last_success is older than this number of seconds, values are not used. Set this to < 0 to disable this check. |
 | DryRun | Set this to a value different to "0" to prevent values from being sent. Use this for debugging or experiments. |
 | Host | IP or hostname of ahoy or OpenDTU API/web-interface |
 | HTTPTimeout | Timeout when doing the HTTP request to the DTU or template. Default: 2.5 sec |
 | Username | use if authentication required, leave empty if no authentication needed |
 | Password | use if authentication required, leave empty if no authentication needed |
+| MinRetriesUntilFail | Minimum number of consecutive update failures before entering error state (StatusCode=10, zero values). Default is 3. |
+| RetryAfterSeconds | If AhoyDTU/OpenDTU is not reachable, try to reconnect after this many seconds. Default is 120. |
+| ErrorMode | Error handling mode: `retrycount` (default, error after N failures) or `timeout` (error after a time period without success). See section below for details. |

 *1: Please ensure that the order is correct in the DTU, we can only extract the first one in a row.

@@ -182,6 +186,54 @@ This applies to each `TEMPLATE[X]` section. X is the number of Template starting

 *4: Path in JSON: use keywords and array index numbers separated by `/`. Example (compare [tasmota_shelly_2pm.json](docs/tasmota_shelly_2pm.json)): `StatusSNS/ENERGY/Current/0` fetches dictionary (map) entry `StatusSNS` containing an entry `ENERGY` containing an entry `Current` containing an array where the first element (index 0) is taken.

+---
+
+#### Error Handling Modes
+
+The error handling behavior of dbus-opendtu can be configured using the `ErrorMode` and `ErrorStateAfterSeconds` options in your configuration file. This allows you to choose between two flexible strategies for handling communication errors with your DTU (Data Transfer Unit):
+
+##### 1. `retrycount` Mode (Default)
+- **Behavior:**
+  - The system will attempt to update data from the DTU on every cycle.
+  - If a number of consecutive update attempts fail (as set by `MinRetriesUntilFail`), the system enters an error state:
+    - All DBus values are set to zero.
+    - The DBus `StatusCode` is set to 10 (error).
+  - After waiting for `RetryAfterSeconds`, the system will attempt to reconnect and recover.
+- **Configuration:**
+  - `ErrorMode=retrycount`
+  - `MinRetriesUntilFail=3` (default)
+  - `RetryAfterSeconds=120` (default)
+
+##### 2. `timeout` Mode
+- **Behavior:**
+  - The system always attempts to reconnect and refresh data every `RetryAfterSeconds`.
+  - Zero values and error state are only set if the time since the last successful update exceeds `ErrorStateAfterSeconds`.
+  - This means the system will keep trying to reconnect, but will only show an error after a defined timeout period has passed without success.
+- **Configuration:**
+  - `ErrorMode=timeout`
+  - `ErrorStateAfterSeconds=600` (for example, 10 minutes)
+  - `RetryAfterSeconds=120` (default)
+
+##### Example Configuration
+
+```
+# Error handling mode: "retrycount" (default, as before) or "timeout" (after a time period)
+ErrorMode=timeout
+# For "timeout" mode:
+ErrorStateAfterSeconds=600
+# For both modes:
+RetryAfterSeconds=120
+MinRetriesUntilFail=3
+```
+
+##### Summary Table
+| Mode        | When are zero values set?                | When is reconnect attempted?         |
+|-------------|------------------------------------------|--------------------------------------|
+| retrycount  | After N consecutive failures             | After `RetryAfterSeconds`            |
+| timeout     | After `ErrorStateAfterSeconds` timeout   | Always, every `RetryAfterSeconds`    |
+
+Choose the mode that best fits your reliability and error reporting needs. For most users, the default `retrycount` mode is sufficient. Use `timeout` mode if you want to avoid error states for short outages and only show errors after a longer period without successful updates.
+
 ### Service names

 The following servicenames are supported:
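
Review note: the two modes in the README summary table can be read as a pair of small decision functions. The following is a minimal Python sketch of that table only, not the code from `dbus_service.py`. `ErrorPolicy`, `should_zero_values`, and `should_attempt_update` are hypothetical names introduced here for illustration, and the defaults are the ones stated in the README.

```python
from dataclasses import dataclass

MODE_RETRYCOUNT = "retrycount"
MODE_TIMEOUT = "timeout"


@dataclass
class ErrorPolicy:
    """Hypothetical container for the options described in the README."""
    error_mode: str = MODE_RETRYCOUNT
    min_retries_until_fail: int = 3
    retry_after_seconds: int = 120
    error_state_after_seconds: int = 600


def should_zero_values(policy, failed_count, seconds_since_success):
    """Return True when the error state (zero values, StatusCode 10) applies."""
    if policy.error_mode == MODE_TIMEOUT:
        # timeout mode: only the elapsed time without success matters
        return failed_count > 0 and seconds_since_success >= policy.error_state_after_seconds
    # retrycount mode: N consecutive failures trigger the error state
    return failed_count >= policy.min_retries_until_fail


def should_attempt_update(policy, failed_count, seconds_since_last_attempt):
    """Return True when the DTU should be polled in this cycle."""
    if failed_count == 0:
        return True  # normal operation: poll on every cycle
    if policy.error_mode == MODE_RETRYCOUNT and failed_count < policy.min_retries_until_fail:
        return True  # still within the retry budget
    # otherwise wait for the reconnect interval
    return seconds_since_last_attempt >= policy.retry_after_seconds
```

With the default policy, `should_zero_values(ErrorPolicy(), 3, 30)` is true after the third consecutive failure, while a `timeout` policy with the same inputs stays out of the error state until 600 seconds have passed without success.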

config.example

Lines changed: 14 additions & 0 deletions
@@ -27,6 +27,19 @@ Logging=ERROR
 # Set this to < 0 to disable this check.
 MaxAgeTsLastSuccess=600

+# Error handling mode: "retrycount" (default, as before) or "timeout" (after a time period)
+ErrorMode=retrycount
+
+# If AhoyDTU/OpenDTU is not reachable, try to reconnect after this many seconds. Default is 120.
+RetryAfterSeconds=120
+
+# Minimum number of consecutive update failures before entering error state (StatusCode=10, zero values). Default is 3.
+MinRetriesUntilFail=3
+
+# This configuration option is used for the "timeout" mode.
+# The value should be specified in seconds (e.g., 600 seconds for 10 minutes).
+ErrorStateAfterSeconds=600
+
 # if this is not 0, then no values are actually sent via dbus to vrm/venus.
 DryRun=0

@@ -39,6 +52,7 @@ HTTPTimeout=2.5
 Username =
 Password =

+
 ### Only needed for OpenDTU and ahoy
 # Phase: Either L1, L2, L3 or 3P for 3 phase HMT series, if unsure use L1
 # AcPosition 0=AC input 1; 1=AC output; 2=AC input 2
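
Review note: a minimal sketch of how the four new options could be read and validated with the standard-library `configparser`. It assumes the options live in the `[DEFAULT]` section and uses the defaults documented in the README. `load_error_handling` and `SAMPLE` are hypothetical names, and the project itself reads these values through its own `get_default_config` helper rather than this function.

```python
import configparser

# Sample text mirroring the new block in config.example (section name assumed).
SAMPLE = """
[DEFAULT]
ErrorMode=timeout
RetryAfterSeconds=120
MinRetriesUntilFail=3
ErrorStateAfterSeconds=600
"""

VALID_MODES = ("retrycount", "timeout")


def load_error_handling(text):
    """Parse the error handling options and fall back to documented defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    defaults = parser["DEFAULT"]

    mode = defaults.get("ErrorMode", fallback="retrycount").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown ErrorMode: {mode!r}")

    return {
        "error_mode": mode,
        "retry_after_seconds": defaults.getint("RetryAfterSeconds", fallback=120),
        "min_retries_until_fail": defaults.getint("MinRetriesUntilFail", fallback=3),
        "error_state_after_seconds": defaults.getint("ErrorStateAfterSeconds", fallback=0),
    }


settings = load_error_handling(SAMPLE)
print(settings["error_mode"], settings["error_state_after_seconds"])
```

Rejecting unknown mode strings up front is a design choice of this sketch; it avoids a silently inactive update loop when `ErrorMode` contains a typo.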

constants.py

Lines changed: 9 additions & 0 deletions
@@ -7,6 +7,15 @@
 DTUVARIANT_TEMPLATE = "template"
 PRODUCTNAME = "henne49_dbus-opendtu"
 CONNECTION = "TCP/IP (HTTP)"
+MODE_TIMEOUT = "timeout"
+MODE_RETRYCOUNT = "retrycount"
+
+# Status codes for the DTU
+STATUSCODE_STARTUP = 0
+STATUSCODE_RUNNING = 7
+STATUSCODE_STANDBY = 8
+STATUSCODE_BOOTLOADING = 9
+STATUSCODE_ERROR = 10


 VICTRON_PATHS = {

dbus_service.py

Lines changed: 154 additions & 48 deletions
@@ -79,6 +79,14 @@ def __init__(
         self.meter_data = None
         self.dtuvariant = None

+        # Initialize error handling properties
+        self.error_mode = None
+        self.retry_after_seconds = 0
+        self.min_retries_until_fail = 0
+        self.error_state_after_seconds = 0
+        self.failed_update_count = 0
+        self.reset_statuscode_on_next_success = False
+
         if not istemplate:
             self._read_config_dtu(actual_inverter)
             self.numberofinverters = self.get_number_of_inverters()
@@ -118,7 +126,7 @@ def __init__(
         self._dbusservice.add_path("/Serial", self._get_serial(self.pvinverternumber))
         self._dbusservice.add_path("/UpdateIndex", 0)
         # set path StatusCode to 7=Running so VRM detects a working PV-Inverter
-        self._dbusservice.add_path("/StatusCode", 7)
+        self._dbusservice.add_path("/StatusCode", constants.STATUSCODE_RUNNING)

         # If the Servicename is an (AC-)Inverter, add the Mode path (to show it as ON)
         # Also, we will set different paths and variables in the _update(self) method.
@@ -143,7 +151,7 @@ def __init__(
                     writeable=True,
                     onchangecallback=self._handlechangedvalue,
                 )
-
+
         self._dbusservice.register()

         self.polling_interval = self._get_polling_interval()
@@ -214,6 +222,7 @@ def _read_config_dtu(self, actual_inverter):
         self.pollinginterval = int(get_config_value(config, "ESP8266PollingIntervall", "DEFAULT", "", 10000))
         self.meter_data = 0
         self.httptimeout = get_default_config(config, "HTTPTimeout", 2.5)
+        self._load_error_handling_config(config)

     def _read_config_template(self, template_number):
         config = self._get_config()
@@ -267,6 +276,15 @@ def _read_config_template(self, template_number):
         self.dry_run = is_true(get_default_config(config, "DryRun", False))
         self.meter_data = 0
         self.httptimeout = get_default_config(config, "HTTPTimeout", 2.5)
+        self._load_error_handling_config(config)
+
+    def _load_error_handling_config(self, config):
+        '''Loads error handling configuration values from the provided config object.'''
+
+        self.error_mode = get_default_config(config, "ErrorMode", constants.MODE_RETRYCOUNT).strip()
+        self.retry_after_seconds = int(get_default_config(config, "RetryAfterSeconds", 120))
+        self.min_retries_until_fail = int(get_default_config(config, "MinRetriesUntilFail", 3))
+        self.error_state_after_seconds = int(get_default_config(config, "ErrorStateAfterSeconds", 0))

     # get the Serialnumber
     def _get_serial(self, pvinverternumber):
@@ -539,61 +557,111 @@ def sign_of_life(self):
                      self.pvinverternumber, self._dbusservice["/Ac/Power"])
         return True

+    def _refresh_and_update(self):
+        """
+        Helper method to refresh data, handle data update if up-to-date, update index, and set successful flag.
+        """
+        self._refresh_data()
+        if self.is_data_up2date():
+            self._handle_data_update()
+        self._update_index()
+        return True
+
     def update(self):
         """
-        Updates the data from the DTU (Data Transfer Unit) and sets the DBus values if the data is up-to-date.
-
-        This method performs the following steps:
-        1. Refreshes the data from the DTU.
-        2. Checks if the data is up-to-date.
-        3. If in dry run mode, logs that no data is sent.
-        4. If not in dry run mode, sets the DBus values.
-        5. Updates the index.
-        6. Handles various exceptions that may occur during the update process:
-            - requests.exceptions.RequestException: Logs an HTTP error if the last update was successful.
-            - ValueError: Logs a value error if the last update was successful.
-            - Exception: Logs a general error if the last update was successful.
-        7. Logs a recovery message if the update was successful after a previous failure.
-
-        Attributes:
-            successful (bool): Indicates whether the update was successful.
+        Updates inverter data from the DTU (Data Transfer Unit) and sets DBus values if the data is up-to-date.
+
+        Main logic:
+        - In timeout mode: Always attempt reconnect every RetryAfterSeconds. Only set zero values after ErrorStateAfterSeconds has elapsed since last success.
+        - In retrycount mode: After min_retries_until_fail failures, wait RetryAfterSeconds before next attempt and set zero values immediately.
+        - Always updates the DBus update index after a refresh.
+        - Tracks success/failure state and manages reconnect timing.
+
+        Exception handling:
+        - Catches and logs HTTP, value, and general exceptions during update.
+        - Ensures update state is finalized regardless of outcome.
+
+        Returns:
+            None
         """
         logging.debug("_update")
         successful = False
+        now = time.time()
         try:
-            # update data from DTU once per _update call:
-            self._refresh_data()
-
-            if self.is_data_up2date():
-                if self.dry_run:
-                    logging.info("DRY RUN. No data is sent!!")
-                else:
-                    self.set_dbus_values()
-            self._update_index()
-            successful = True
+            if self.error_mode == constants.MODE_TIMEOUT and self.error_state_after_seconds > 0:
+                # Set zero values only after ErrorStateAfterSeconds has elapsed since last success
+                if (not self.last_update_successful and (now - self._last_update) >= self.error_state_after_seconds):
+                    self._handle_reconnect_wait()
+                # Always allow a reconnect attempt every RetryAfterSeconds
+                if (now - self._last_update) >= self.retry_after_seconds:
+                    successful = self._refresh_and_update()
+                # In normal operation (no error), always call _refresh_data on every update
+                if self.last_update_successful:
+                    successful = self._refresh_and_update()
+            elif self.error_mode == constants.MODE_RETRYCOUNT:
+                # Classic retry-count-based error handling
+                if self.failed_update_count >= self.min_retries_until_fail:
+                    self._handle_reconnect_wait()
+                # Determine if we should refresh data based on current state and timing
+                is_last_update_successful = self.last_update_successful
+                time_since_last_update = now - self._last_update
+                is_retry_interval_elapsed = time_since_last_update >= self.retry_after_seconds
+                is_below_min_retries = self.failed_update_count < self.min_retries_until_fail
+
+                should_refresh_data = (
+                    is_last_update_successful or
+                    is_retry_interval_elapsed or
+                    is_below_min_retries
+                )
+
+                if should_refresh_data:
+                    successful = self._refresh_and_update()
         except requests.exceptions.RequestException as exception:
-            if self.last_update_successful:
-                logging.warning(f"HTTP Error at _update for inverter "
-                                f"{self.pvinverternumber} ({self._get_name()}): {str(exception)}")
+            logging.warning(f"HTTP Error at _update for inverter "
+                            f"{self.pvinverternumber} ({self._get_name()}): {str(exception)}")
         except ValueError as error:
-            if self.last_update_successful:
-                logging.warning(f"Error at _update for inverter "
-                                f"{self.pvinverternumber} ({self._get_name()}): {str(error)}")
+            logging.warning(f"Error at _update for inverter "
+                            f"{self.pvinverternumber} ({self._get_name()}): {str(error)}")
         except Exception as error:  # pylint: disable=broad-except
-            if self.last_update_successful:
-                logging.warning(f"Error at _update for inverter "
-                                f"{self.pvinverternumber} ({self._get_name()})", exc_info=error)
+            logging.warning(f"Error at _update for inverter "
+                            f"{self.pvinverternumber} ({self._get_name()})", exc_info=error)
         finally:
-            if successful:
-                if not self.last_update_successful:
-                    logging.warning(
-                        f"Recovered inverter {self.pvinverternumber} ({self._get_name()}): "
-                        f"Successfully fetched data now: "
-                        f"{'NOT (yet?)' if not self.is_data_up2date() else 'Is'} up-to-date"
-                    )
-                self.last_update_successful = True
-            else:
-                self.last_update_successful = False
+            self._finalize_update(successful)
+
+    def _handle_reconnect_wait(self):
+        if not self.reset_statuscode_on_next_success:
+            self.set_dbus_values_to_zero()
+            self.reset_statuscode_on_next_success = True
+
+    def _should_refresh_data(self, now):
+        return (
+            self.last_update_successful or
+            (now - self._last_update) >= self.retry_after_seconds or
+            self.failed_update_count < self.min_retries_until_fail
+        )
+
+    def _handle_data_update(self):
+        if self.dry_run:
+            logging.info("DRY RUN. No data is sent!!")
+        else:
+            self.set_dbus_values()
+
+    def _finalize_update(self, successful):
+        if successful:
+            if self.reset_statuscode_on_next_success:
+                self._dbusservice["/StatusCode"] = constants.STATUSCODE_RUNNING
+            if not self.last_update_successful:
+                logging.warning(
+                    f"Recovered inverter {self.pvinverternumber} ({self._get_name()}): "
+                    f"Successfully fetched data now: "
+                    f"{'NOT (yet?)' if not self.is_data_up2date() else 'Is'} up-to-date"
+                )
+            self.last_update_successful = True
+            self.failed_update_count = 0
+            self.reset_statuscode_on_next_success = False
+        else:
+            self.last_update_successful = False
+            self.failed_update_count += 1

     def _update_index(self):
         if self.dry_run:

598666
def _update_index(self):
599667
if self.dry_run:
@@ -657,12 +725,50 @@ def get_values_for_inverter(self):

         return (power, pvyield, current, voltage, dc_voltage)

+    def set_dbus_values_to_zero(self):
+        '''zero power data and clear connection status and set dbus values'''
+
+        if self._servicename == "com.victronenergy.inverter":
+            # see https://github.com/victronenergy/venus/wiki/dbus#inverter
+            self._dbusservice["/Ac/Out/L1/V"] = 0
+            self._dbusservice["/Ac/Out/L1/I"] = 0
+            self._dbusservice["/Ac/Out/L1/P"] = 0
+            self._dbusservice["/Dc/0/Voltage"] = 0
+            self._dbusservice["/Ac/Power"] = 0
+
+            self._dbusservice["/Ac/L1/Current"] = 0
+            self._dbusservice["/Ac/L1/Power"] = 0
+            self._dbusservice["/Ac/L1/Voltage"] = 0
+        else:
+            # 0=Startup 0; 1=Startup 1; 2=Startup 2; 3=Startup 3; 4=Startup 4; 5=Startup 5; 6=Startup 6; 7=Running; 8=Standby; 9=Boot loading; 10=Error
+            self._dbusservice["/StatusCode"] = constants.STATUSCODE_ERROR
+
+            # three-phase inverter: zero the values on all three phases
+            if "3P" == self.pvinverterphase:
+
+                self._dbusservice["/Ac/L1/Voltage"] = 0
+                self._dbusservice["/Ac/L1/Current"] = 0
+                self._dbusservice["/Ac/L1/Power"] = 0
+                self._dbusservice["/Ac/L2/Voltage"] = 0
+                self._dbusservice["/Ac/L2/Current"] = 0
+                self._dbusservice["/Ac/L2/Power"] = 0
+                self._dbusservice["/Ac/L3/Voltage"] = 0
+                self._dbusservice["/Ac/L3/Current"] = 0
+                self._dbusservice["/Ac/L3/Power"] = 0
+                self._dbusservice["/Ac/Power"] = 0
+
+            else:
+                pre = "/Ac/" + self.pvinverterphase
+                self._dbusservice[pre + "/Voltage"] = 0
+                self._dbusservice[pre + "/Current"] = 0
+                self._dbusservice[pre + "/Power"] = 0
+                self._dbusservice["/Ac/Power"] = 0
+
     def set_dbus_values(self):
         '''read data and set dbus values'''
         (power, pvyield, current, voltage, dc_voltage) = self.get_values_for_inverter()
         state = self.get_ac_inverter_state(current)

-        # This will be refactored later in classes
         if self._servicename == "com.victronenergy.inverter":
             # see https://github.com/victronenergy/venus/wiki/dbus#inverter
             self._dbusservice["/Ac/Out/L1/V"] = voltage
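
Review note: the interplay of `failed_update_count`, `reset_statuscode_on_next_success`, and the status code can be exercised without DBus. The following is a simplified, self-contained Python sketch for `retrycount` mode under stated assumptions. `ReconnectTracker` is a hypothetical stand-in, a plain dict replaces the DBus service, and the error state is entered directly on the failure that reaches the threshold, whereas the diff above zeroes the values at the start of the following `update()` call.

```python
STATUSCODE_RUNNING = 7
STATUSCODE_ERROR = 10


class ReconnectTracker:
    """Hypothetical stand-in for the bookkeeping in DbusService (retrycount mode)."""

    def __init__(self, min_retries_until_fail=3):
        self.min_retries_until_fail = min_retries_until_fail
        self.failed_update_count = 0
        self.reset_statuscode_on_next_success = False
        # A plain dict plays the role of the DBus service paths.
        self.dbus = {"/StatusCode": STATUSCODE_RUNNING, "/Ac/Power": 0}

    def record(self, successful, power=0):
        """Apply the outcome of one update cycle."""
        if successful:
            self.dbus["/Ac/Power"] = power
            if self.reset_statuscode_on_next_success:
                # Leave the error state on the first success after an outage.
                self.dbus["/StatusCode"] = STATUSCODE_RUNNING
            self.failed_update_count = 0
            self.reset_statuscode_on_next_success = False
            return
        self.failed_update_count += 1
        threshold_reached = self.failed_update_count >= self.min_retries_until_fail
        if threshold_reached and not self.reset_statuscode_on_next_success:
            # Enter the error state exactly once per outage.
            self.dbus["/Ac/Power"] = 0
            self.dbus["/StatusCode"] = STATUSCODE_ERROR
            self.reset_statuscode_on_next_success = True


tracker = ReconnectTracker()
tracker.record(True, power=250)
for _ in range(3):
    tracker.record(False)
print(tracker.dbus)  # error state after the third consecutive failure
tracker.record(True, power=180)
print(tracker.dbus)  # back to running with fresh values
```

The flag guards against zeroing the values on every cycle of a long outage, which matches the purpose of `_handle_reconnect_wait` in the diff.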
