refactor(agnocastlib): Refactor epoll processing and clean up hardcoded event-specific logic#1245
refactor(agnocastlib): Refactor epoll processing and clean up hardcoded event-specific logic#1245
Conversation
f8c0de0 to
f37d50c
Compare
Previously, when events managed by epoll changed, we notified each Executor to call `prepare_epoll_impl()` by setting the global atomic variable `need_epoll_updates` to true. However, this implementation will become problematic for future refactoring efforts aimed at extracting event-specific processing from `agnocast_epoll.hpp` and `.cpp`. This commit removes this global variable and introduces an alternative notification mechanism. While the current implementation only supports broadcasting notifications to all Executors—which leaves some performance challenges—it establishes a 1-to-1 tracking structure for each Executor. This lays the groundwork for implementing targeted 1-to-1 notifications in the future. Relates to: #969 Signed-off-by: Takumi Jin <primenumber_2_3_5@yahoo.co.jp>
Previously, all event-specific logic was hardcoded in `agnocast_epoll.cpp` and `.hpp`. This caused dependency issues and made it difficult to add new event types. This commit refactors the code by categorizing events and moving their implementations into separate source files. Key changes include: - Change epoll_data format from u32 to u64 to hold both event kind and local identifier. - Introduce `EpollManager` to manage and dispatch events from the Epoll class. - Introduce `EpollEventSource` as an abstract base class for specific event handlers. - Replace raw file descriptor `epoll_fd_` usage with the encapsulated `Epoll` class. - Move implementation details from headers to `.cpp` files. Relates to: #969 Signed-off-by: Takumi Jin <primenumber_2_3_5@yahoo.co.jp>
dd979fe to
0194636
Compare
There was a problem hiding this comment.
Pull request overview
This PR refactors agnocastlib’s epoll integration to remove global “polling for updates” (need_epoll_updates), introduce per-executor update tracking, and decouple event-type-specific epoll handling into dedicated event-source classes.
Changes:
- Replaced the global epoll-update flag with an
EpollUpdateDispatcher/EpollUpdateTrackermechanism so each executor can independently decide when to re-prepare epoll. - Introduced
EpollManager+EpollEventSourceabstraction, plusEpollwrapper that packs(event_type, local_id)intoepoll_event.data.u64. - Moved subscription/timer epoll prepare/handle logic out of
agnocast_epoll.cppintoSubscriptionEventSourceandTimerEventSource.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/agnocastlib/test/unit/test_mocked_agnocast.cpp | Updates ID overflow tests to use new max-ID constants. |
| src/agnocastlib/src/node/agnocast_only_single_threaded_executor.cpp | Switches epoll update triggering to epoll_update_tracker_ + epoll_manager_. |
| src/agnocastlib/src/node/agnocast_only_multi_threaded_executor.cpp | Same as above for the MT executor variant. |
| src/agnocastlib/src/node/agnocast_only_executor.cpp | Replaces raw epoll fd usage with EpollManager and registers shutdown via manager. |
| src/agnocastlib/src/agnocast_timer_info.cpp | Replaces global update flag with dispatcher notification; adds TimerEventSource. |
| src/agnocastlib/src/agnocast_single_threaded_executor.cpp | Uses per-executor update tracker instead of global flag. |
| src/agnocastlib/src/agnocast_multi_threaded_executor.cpp | Uses per-executor update tracker instead of global flag. |
| src/agnocastlib/src/agnocast_executor.cpp | Introduces EpollManager and per-executor tracker; delegates prepare/wait to manager. |
| src/agnocastlib/src/agnocast_epoll.cpp | Replaces monolithic event handling with EpollManager dispatch over event sources. |
| src/agnocastlib/src/agnocast_epoll_update_dispatcher.cpp | Adds dispatcher/tracker implementation. |
| src/agnocastlib/src/agnocast_epoll_event.cpp | Adds Epoll wrapper implementation for add/remove/wait with packed event data. |
| src/agnocastlib/src/agnocast_callback_info.cpp | Adds SubscriptionEventSource prepare/handle and new callback-id max check. |
| src/agnocastlib/include/agnocast/node/agnocast_only_executor.hpp | Stores EpollManager + EpollUpdateTracker instead of raw epoll fd. |
| src/agnocastlib/include/agnocast/agnocast_timer_info.hpp | Adds MAX_TIMER_ID and declares TimerEventSource. |
| src/agnocastlib/include/agnocast/agnocast_executor.hpp | Stores EpollManager + EpollUpdateTracker instead of raw epoll fd. |
| src/agnocastlib/include/agnocast/agnocast_epoll.hpp | Defines event-source abstraction and EpollManager. |
| src/agnocastlib/include/agnocast/agnocast_epoll_update_dispatcher.hpp | Declares dispatcher/tracker API. |
| src/agnocastlib/include/agnocast/agnocast_epoll_event.hpp | Declares packed epoll event encoding + Epoll wrapper API. |
| src/agnocastlib/include/agnocast/agnocast_callback_info.hpp | Adds MAX_CALLBACK_INFO_ID, dispatcher notification, and declares SubscriptionEventSource. |
| src/agnocastlib/CMakeLists.txt | Adds new source files to the shared library build. |
Comments suppressed due to low confidence (1)
src/agnocastlib/include/agnocast/agnocast_epoll.hpp:10
agnocast_epoll.hppusesstd::function,std::array, andstd::unique_ptrbut does not include the corresponding standard headers. This makes the header non-self-contained and can cause compilation failures depending on include order. Add the missing includes (e.g.,<functional>,<array>,<memory>) in this header.
#include "agnocast/agnocast_epoll_event.hpp"
#include <rclcpp/callback_group.hpp>
#include <atomic>
#include <mutex>
#include <shared_mutex>
#include <vector>
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/agnocastlib/include/agnocast/agnocast_epoll_update_dispatcher.hpp
Outdated
Show resolved
Hide resolved
Coverage Report (jazzy) |
Coverage Report (humble) |
Signed-off-by: Takumi Jin <primenumber_2_3_5@yahoo.co.jp>
Signed-off-by: Takumi Jin <primenumber_2_3_5@yahoo.co.jp>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| while (spinning_.load()) { | ||
| if (need_epoll_updates.load()) { | ||
| if (epoll_update_tracker_.need_update()) { | ||
| add_callback_groups_from_nodes_associated_to_executor(); | ||
| agnocast::prepare_epoll_impl( | ||
| epoll_fd_, my_pid_, ready_agnocast_executables_mutex_, ready_agnocast_executables_, | ||
| [this](const rclcpp::CallbackGroup::SharedPtr & group) { | ||
| return is_callback_group_associated(group); | ||
| }); | ||
| epoll_manager_->prepare_epoll([this](const rclcpp::CallbackGroup::SharedPtr & group) { | ||
| return is_callback_group_associated(group); | ||
| }); | ||
| } |
There was a problem hiding this comment.
epoll_update_tracker_.need_update() is edge-triggered (it clears the flag via exchange). If prepare_epoll() skips registrations because is_callback_group_associated(group) is false (e.g., subscription/timer created before the node/callback group is added/associated), the corresponding need_epoll_update flags stay true but this loop won’t call prepare_epoll() again unless some unrelated notify_all() happens. This can leave fds permanently unregistered in this executor’s epoll set. Consider re-notifying (or keeping this executor’s tracker flagged) while there are pending need_epoll_update entries, and/or triggering an epoll-update notification from add_node/callback-group association changes.
| while (spinning_.load()) { | ||
| if (need_epoll_updates.load()) { | ||
| if (epoll_update_tracker_.need_update()) { | ||
| add_callback_groups_from_nodes_associated_to_executor(); | ||
| agnocast::prepare_epoll_impl( | ||
| epoll_fd_, my_pid_, ready_agnocast_executables_mutex_, ready_agnocast_executables_, | ||
| [this](const rclcpp::CallbackGroup::SharedPtr & group) { | ||
| return is_callback_group_associated(group); | ||
| }); | ||
| epoll_manager_->prepare_epoll([this](const rclcpp::CallbackGroup::SharedPtr & group) { | ||
| return is_callback_group_associated(group); | ||
| }); | ||
| } |
There was a problem hiding this comment.
Same edge-trigger issue as the single-threaded version: if prepare_epoll() skips entries because callback groups are not yet associated, need_epoll_update remains true but epoll_update_tracker_ is cleared, so later association (e.g., add_node() after subscription creation) won’t cause another prepare_epoll() call. This can strand subscriptions/timers unregistered. Consider keeping the tracker flagged until pending updates are resolved, or notify on node/callback-group association changes.
| TEST(AllocateCallbackInfoIdTest, throws_when_id_has_reserved_epoll_flag_bits) | ||
| { | ||
| // Arrange: Set next_callback_info_id so the next allocation hits SHUTDOWN_EVENT_FLAG (bit 29). | ||
| const uint32_t original = next_callback_info_id.load(); | ||
| next_callback_info_id.store(SHUTDOWN_EVENT_FLAG); | ||
| next_callback_info_id.store(MAX_CALLBACK_INFO_ID); | ||
|
|
||
| // Act & Assert | ||
| EXPECT_THROW(allocate_callback_info_id(), std::runtime_error); | ||
|
|
||
| // Cleanup | ||
| next_callback_info_id.store(original); | ||
| } | ||
|
|
||
| TEST(AllocateCallbackInfoIdTest, succeeds_just_below_reserved_range) | ||
| { | ||
| // Arrange: Set next_callback_info_id to the maximum valid value (one below SHUTDOWN_EVENT_FLAG). | ||
| const uint32_t original = next_callback_info_id.load(); | ||
| next_callback_info_id.store(SHUTDOWN_EVENT_FLAG - 1); | ||
| next_callback_info_id.store(MAX_CALLBACK_INFO_ID - 1); | ||
|
|
||
| // Act & Assert | ||
| EXPECT_EQ(allocate_callback_info_id(), SHUTDOWN_EVENT_FLAG - 1); | ||
| EXPECT_EQ(allocate_callback_info_id(), MAX_CALLBACK_INFO_ID - 1); | ||
|
|
||
| // Cleanup | ||
| next_callback_info_id.store(original); | ||
| } | ||
|
|
||
| TEST(AllocateTimerIdTest, throws_when_id_has_reserved_epoll_flag_bits) | ||
| { | ||
| // Arrange: Set next_timer_id so the returned ID includes a reserved epoll flag bit. | ||
| // This covers the bug where timer IDs OR'd with CLOCK_EVENT_FLAG/TIMER_EVENT_FLAG | ||
| // could collide with reserved flags. | ||
| const uint32_t original = next_timer_id.load(); |
There was a problem hiding this comment.
These test comments still refer to the old epoll-flag-bit scheme (SHUTDOWN_EVENT_FLAG, CLOCK_EVENT_FLAG, TIMER_EVENT_FLAG). After the refactor to MAX_*_ID limits / packed epoll data, the wording is misleading. Update the comments to describe the new overflow boundary being tested (MAX_*_ID) to keep the tests self-explanatory.
| { | ||
|
|
||
| constexpr uint32_t MAX_CALLBACK_INFO_ID = 0x10000000; | ||
|
|
||
| struct AgnocastExecutable; |
There was a problem hiding this comment.
MAX_CALLBACK_INFO_ID is now hardcoded to 0x10000000, which is a stricter limit than the previous reserved-bit boundary and no longer obviously required since epoll data is now packed into a u64 (type in upper 32 bits, id in lower 32). If there’s no remaining bit-level constraint, consider raising this to the previous limit (or to UINT32_MAX-based overflow detection) and/or document why 0x10000000 is the intended cap to avoid an unintended capacity regression.
|
|
||
| constexpr int64_t NANOSECONDS_PER_SECOND = 1000000000; | ||
| constexpr uint32_t MAX_TIMER_ID = 0x10000000; | ||
|
|
||
| struct AgnocastExecutable; | ||
|
|
There was a problem hiding this comment.
MAX_TIMER_ID is hardcoded to 0x10000000. With the new u64 epoll data packing, timer IDs no longer need to avoid top-bit flag collisions. Unless there’s a separate constraint, this lowers the maximum number of timers compared to the prior reserved-bit scheme. Consider using a UINT32_MAX-style overflow check (or documenting the rationale for 0x10000000) to avoid an accidental capacity regression.
| if (!results.empty()) { | ||
| return -1; | ||
| } | ||
|
|
There was a problem hiding this comment.
Epoll::wait() returns -1 if the caller passes a non-empty results vector, but it doesn’t set errno in that path. Since callers (e.g., EpollManager) interpret -1 using errno, this can lead to misleading error handling. Consider always clearing results (or explicitly setting errno = EINVAL / documenting the precondition) rather than treating non-empty input as an error silently.
| if (!results.empty()) { | |
| return -1; | |
| } |
| for (uint32_t type = 0; type < static_cast<size_t>(EpollEventType::NrEventType); type++) { | ||
| if (!sources_[type]) { | ||
| RCLCPP_ERROR(logger, "invalid epoll event source array: sources_[%d] is nullptr", type); | ||
| exit(EXIT_FAILURE); | ||
| } |
There was a problem hiding this comment.
RCLCPP_ERROR(logger, "... sources_[%d] ...", type) uses %d but type is uint32_t. With printf-style formatting this is undefined behavior / can trigger -Wformat warnings. Use %u (or cast type to int if you really want signed).
| if (EpollEventType::NrEventType <= event_type) { | ||
| RCLCPP_ERROR( | ||
| logger, "Agnocast internal implementation error: invalid epoll event type %d", | ||
| static_cast<uint32_t>(event_type)); | ||
| close(agnocast_fd); | ||
| exit(EXIT_FAILURE); |
There was a problem hiding this comment.
RCLCPP_ERROR(logger, "... invalid epoll event type %d", static_cast<uint32_t>(event_type)) uses %d but passes an unsigned value. This is a format/argument type mismatch (potential UB, -Wformat). Prefer %u (or cast to int and keep %d).
Description
This PR refactors
agnocast_epoll.cppand.hppto eliminate tight dependencies, improve code extensibility, and optimize how Executors handle epoll updates.Key Changes & Motivations
Replaced Global Polling with Individual Notifications
need_epoll_updatesglobal variable. Replaced the polling approach with a new mechanism that notifies each Executor individually when an update is needed.Decoupled Event Handling
agnocast_epoll.cppand.hppand moved them into their respective, event-specific files.Future-proofing
Related links
close #969
How was this PR tested?
bash scripts/test/e2e_test_1to1.bash(required)bash scripts/test/e2e_test_2to2.bash(required)bash scripts/test/run_requires_kernel_module_tests.bash(required)Notes for reviewers
Version Update Label (Required)
Please add exactly one of the following labels to this PR:
need-major-update: User API breaking changesneed-minor-update: Internal API breaking changes (heaphook/kmod/agnocastlib compatibility)need-patch-update: Bug fixes and other changesImportant notes:
need-major-updateorneed-minor-update, please include this in the PR title as well.fix(foo)[needs major version update]: barorfeat(baz)[needs minor version update]: quxrun-build-testlabel. The PR can only be merged after the build tests pass.See CONTRIBUTING.md for detailed versioning rules.