viewer: eliminate UI freeze and slider snap-back on option change#15029
viewer: eliminate UI freeze and slider snap-back on option change#15029remibettan wants to merge 4 commits intorealsenseai:developmentfrom
Conversation
… blocks on FW round-trip Each option_model::set_option used to call endpoint->set_option synchronously on the UI thread. That call is ~200 ms on UVC and serializes on the per-device USB lock against options_watcher's 1 s poll cycle, so any slider drag or checkbox toggle visibly froze the viewer. Introduces option_async_setter, a per-option worker thread that coalesces posted values (only the latest between two FW writes wins, no stale-value queue), and routes set_option / slider_selected through it. Adds last_user_set_stopwatch on subdevice_model: option_model::set_option_async resets it on every dispatch, and subdevice_model::update() returns early for 500 ms after the last user write so its per-frame get_option_value() polling and the JSON config-save block don't fight the in-flight worker write on the same USB bus. The cached value still refreshes during the gate via options_watcher -> on_options_changed -> update_value. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When _options_invalidated is set, subdevice_model::update() previously called save_processing_block_to_config_file 5x plus once per post-processing model on the UI thread. Each call iterates the block's options and runs config_file::set per option, and every set rewrites the entire JSON config to disk synchronously (see common/rs-config.cpp). For a typical depth+color setup that's tens of full file rewrites per change — ~400 ms of disk-IO blocking the render loop. Adds an anonymous-namespace config_save_worker singleton with one persistent thread. update() now captures the processing-block shared_ptrs by value and posts the save lambda to the worker; the UI thread returns immediately. Posts coalesce by opaque void* key (subdevice identity), so rapid invalidation events collapse into at most one save pass per wake-up cycle. The lambda never dereferences `this` — UAF-safe even if the subdevice is destroyed mid-save. ~subdevice_model still calls cancel(this) to drop any not-yet-started job for tidy shutdown. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… in flight After dispatching an option write asynchronously, the cached option_model::value isn't refreshed until either the options_watcher callback fires (~1 s later) or the post-gate update_all_fields poll reads it back. In the meantime, draw_slider re-reads value_as_float() on the next render frame and the slider visually snaps back to the old value, then jumps to the new value once the echo finally arrives — a regression introduced by making the write async. Adds an optimistic local cache (_optimistic_value + stopwatch) seeded by set_option_async. value_as_float() returns the optimistic value while it's fresh (capped at 2 s in case the FW rejects/clamps and never echoes the requested value), so the slider stays pinned to the user's choice from the moment they release. The flag is cleared as soon as either authoritative path refreshes value: update_value (options_watcher callback) or update_all_fields (per-frame poll). After clearing, value_as_float() returns whatever FW actually applied — so a clamped/rejected value shows real applied state instead of the user's request. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
a39c375 to
bfc3392
Compare
| _async_setter = std::make_shared< option_async_setter >( endpoint, opt ); | ||
| _async_setter->post( value ); | ||
|
|
||
| // Mask the stale cached `value` with the user's choice until the FW echo |
| // Capture by value so the worker stays UAF-safe even if `this` dies mid-save. | ||
| // shared_ptrs keep the underlying processing blocks alive until the job runs. | ||
| auto colorizer = depth_colorizer; | ||
| auto yuy2 = yuy2rgb; | ||
| auto m420 = m420_to_rgb; | ||
| auto nv12 = nv12_to_rgb; | ||
| auto y411_ptr = y411; | ||
| auto pp = post_processing; | ||
| config_save_worker::instance().post( this, | ||
| [ colorizer, yuy2, m420, nv12, y411_ptr, pp ] | ||
| { | ||
| save_processing_block_to_config_file( "colorizer", colorizer ); | ||
| save_processing_block_to_config_file( "yuy2rgb", yuy2 ); | ||
| save_processing_block_to_config_file( "m420_to_rgb", m420 ); | ||
| save_processing_block_to_config_file( "nv12_to_rgb", nv12 ); | ||
| save_processing_block_to_config_file( "y411", y411_ptr ); | ||
| for( auto & pbm : pp ) pbm->save_to_config_file(); | ||
| } ); |
| // While the optimistic cache is fresh, prefer the user's just-committed value. | ||
| // Cap at 2s in case the FW echo never matches (rejected/clamped value). | ||
| if( _has_optimistic && _optimistic_stopwatch.get_elapsed_ms() < 2000 ) | ||
| return _optimistic_value; | ||
|
|
| bool option_model::set_option(rs2_option opt, | ||
| float req_value, | ||
| std::string& error_message, | ||
| std::chrono::steady_clock::duration ignore_period) | ||
| std::string& /*error_message*/, | ||
| std::chrono::steady_clock::duration /*ignore_period*/) | ||
| { | ||
| // Only set the value if `ignore_period` time past since last set_option() call for this option | ||
| if (last_set_stopwatch.get_elapsed() < ignore_period) | ||
| return false; | ||
|
|
||
| try | ||
| { | ||
| last_set_stopwatch.reset(); | ||
| endpoint->set_option(opt, req_value); | ||
| } | ||
| catch (const error& e) | ||
| { | ||
| error_message = error_to_string(e); | ||
| } | ||
|
|
||
| // Only update the cached value once set_option is done! That way, if it doesn't change | ||
| // anything... | ||
| try | ||
| { | ||
| value = endpoint->get_option_value(opt); | ||
| } | ||
| catch (...) | ||
| { | ||
| } | ||
|
|
||
| // All UI-thread option writes go through the async worker so the render loop is never | ||
| // blocked on the FW round-trip (set_option is ~200 ms on UVC; readback is another ~200 ms; | ||
| // and they serialize on the per-device USB lock against options_watcher's 1 s poll cycle). | ||
| // Coalescing in option_async_setter replaces the previous ignore_period rate-limit, and | ||
| // the cached `value` is refreshed by options_watcher -> on_options_changed -> update_value. | ||
| last_set_stopwatch.reset(); | ||
| set_option_async( opt, req_value ); | ||
| return true; | ||
| } |
Nir-Az
left a comment
There was a problem hiding this comment.
🤖✨ Generated by rs-agentic bot ✨🤖
Reviewed PR #15029 — viewer: eliminate UI freeze and slider snap-back on option change.
The three-commit approach is well-reasoned and tackles measurable pain points. Inline comments below cover the key observations. 🔍
| option_async_setter( option_async_setter const & ) = delete; | ||
| option_async_setter & operator=( option_async_setter const & ) = delete; | ||
|
|
||
| void post( float value ); |
There was a problem hiding this comment.
🤖✨ Generated by rs-agentic bot ✨🤖
is_all_integers(), is_enum(), is_checkbox() accidentally made public
In the option_model class, the refactor moved private: two lines down, unintentionally promoting these three predicate helpers out of the private section. They are implementation details of the draw/slider rendering and should not be callable by external code.
Suggested fix — restore them under private::
private:
bool is_all_integers() const;
bool is_enum() const;
bool is_checkbox() const;
bool draw_checkbox(...);🔧🐛
| { | ||
| _endpoint->set_option( _opt, value ); | ||
| } | ||
| catch( ... ) |
There was a problem hiding this comment.
🤖✨ Generated by rs-agentic bot ✨🤖
The catch(...) block suppresses ALL errors from _endpoint->set_option(), including rs2::error messages that carry FW rejection/clamping details. While the rationale is sound (UI thread must not block), with zero logging it becomes very hard to diagnose why a slider settles at an unexpected value.
Consider at minimum a LOG_WARNING:
catch( const rs2::error & e )
{
LOG_WARNING( "Async set_option opt=" << _opt << " value=" << value << " failed: " << e.what() );
}
catch( ... ) {}The update_all_fields readback will still correct the UI; this just adds diagnosability.
🔍🛠️
| // Mask the stale cached `value` with the user's choice until the FW echo | ||
| // refreshes it, so draw_slider doesn't snap back to the old value next frame. | ||
| _optimistic_value = value; | ||
| _has_optimistic = true; |
There was a problem hiding this comment.
🤖✨ Generated by rs-agentic bot ✨🤖
🔍 Dead parameters still in the public set_option signature
error_message and ignore_period are now permanently unused (both silenced with /*...*/), yet they remain in the public signature. Call sites passing an error_message string will silently receive no feedback on FW rejection — a silent behavior change that is easy to miss in code review.
Options:
- Remove both params and update call sites (cleanest).
- Keep for ABI stability but update the doc-comment to clearly state both are no-ops.
- Route async errors back via a
std::optional<std::string> last_async_errorfield.
🛠️📋
| static config_save_worker w; | ||
| return w; | ||
| } | ||
|
|
There was a problem hiding this comment.
🤖✨ Generated by rs-agentic bot ✨🤖
🔒 void* key — narrow ABA/reuse hazard after object destruction
Using this as a raw-pointer key is safe in the common case and cancel(this) in ~subdevice_model() prevents most UAF scenarios. However there is a narrow theoretical hazard:
- Worker dequeues lambda for key
K(the old object), begins executing. ~subdevice_model()callscancel(K)— key is already gone, no-op.- New
subdevice_modelis allocated at the same addressKand callspost(K, newJob). - The OLD lambda finishes and writes stale config data, potentially overwriting what the new object saved.
The shared_ptr captures prevent crashes, but step 4 is a logical-correctness hazard. Consider adding a generation counter to distinguish old from new instances: std::pair<void*, uint64_t> as the map key.
🛡️
| // async option-write worker and with options_watcher's 1 s poll cycle, so leaving | ||
| // it in place reintroduces the UI freeze the async dispatch is meant to fix. | ||
| // Both jobs are coalesced into a single pass once the user stops interacting. | ||
| // `value` stays fresh during the gate via options_watcher -> on_options_changed. |
There was a problem hiding this comment.
🤖✨ Generated by rs-agentic bot ✨🤖
⏸️ Early-return gate blankets ALL of update(), not just the hot paths
The 500 ms gate currently causes update() to return before ANY work runs — including next_option cursor advancement (option-discovery), and any future logic added to update(). If a user changes an option during initial option-discovery, that pass stalls for 500 ms on every interaction.
Consider scoping the guard more narrowly:
void subdevice_model::update(...)
{
bool user_writing = last_user_set_stopwatch.get_elapsed_ms() < 500;
if (!user_writing && _options_invalidated) { /* config save */ }
if (!user_writing && next_option < supported_options.size()) { /* option poll */ }
// other update logic continues unaffected...
}This keeps the USB-contention gate targeted and doesn't throttle unrelated update() logic.
🎯🔧
|
retrigger |
Nir-Az
left a comment
There was a problem hiding this comment.
rs-agentic bot — Auto-generated Review
Summary
This PR addresses a real and measurable UX problem (UI freeze + slider snap-back on option changes) with three layered fixes: async option writes, async config saves, and an optimistic slider cache. The approach is sound, the commit separation is clean, and the comments are thorough. That said, a few thread-safety and API-contract issues need attention before merge.
🔴 Potential Bugs
- Data race on
_has_optimistic/_optimistic_value— written by the UI thread inset_option_asyncand cleared by the options_watcher background thread inupdate_value. No mutex orstd::atomicprotection. Undefined behaviour in C++11 and later. See inline comment. set_option()API contract silently broken —ignore_periodis now always ignored (always returnstrue), anderror_messageis discarded. Any existing caller that passes a non-zero ignore period or inspects the error string will get broken behaviour without a compile-time warning. See inline comment.- 500 ms gate blocks the entire
update()function — the early-return at the top ofsubdevice_model::update()skips all processing (not just the USB polling and config save). If other work is done later in that function it will also be suppressed for 500 ms after every option write. See inline comment. _async_setterre-use doesn't validate_optmatch — the setter is created once for the firstoptpassed toset_option_async. If somehow the method is called later with a differentopt, the wrong option gets the write silently. See inline comment.
🟡 Code Enhancements
- Error swallowing with zero logging — the
catch(...)inoption_async_setter::run()silently drops all firmware errors with no diagnostic trace. At minimum, log the error at debug/trace level viaLOG_DEBUGorLOG_WARNINGso developers can diagnose FW rejections. - One thread per option model —
option_async_setterspawns a dedicatedstd::threadperoption_model(one per sensor option), created lazily. With a sensor exposing 30+ options and multiple sensors, this can add up to many live threads. Consider a shared per-sensor (or per-device) single-threaded executor. - New fields exposed as public —
_async_setter,_optimistic_value,_has_optimistic,_optimistic_stopwatchare implementation details but are declaredpublicin the class. Consider making themprivate(following the naming convention you already used with the leading_).
🧪 Tests
- No automated tests added — the PR description lists manual test steps only. Consider adding unit tests (or extending existing option-model tests) to cover: async dispatch coalescing, optimistic cache lifecycle (set → echo → clear), cache timeout fallback (no FW echo), and error-path behaviour.
| // options_watcher -> update_value or the post-gate update_all_fields poll. | ||
| // Cleared when either authoritative path refreshes `value`, or after the timeout | ||
| // below (in case FW rejects/clamps and never echoes the requested value). | ||
| float _optimistic_value = 0.f; |
There was a problem hiding this comment.
🔴 [BUG — Data Race] _has_optimistic / _optimistic_value are not thread-safe (auto-generated by rs-agentic bot)
_has_optimistic and _optimistic_value are written by the UI thread inside set_option_async (_has_optimistic = true; _optimistic_value = value;) but are cleared by the options_watcher background thread through update_value (_has_optimistic = false). Concurrent unsynchronised access to a non-atomic bool is a data race, which is undefined behaviour in C++11 and later (can manifest as torn reads, stale cache lines, or compiler-optimised reads that never see the update).
Suggested fix: wrap the three optimistic fields and their reads/writes in the same mutex already used for nearby state, or declare _has_optimistic as std::atomic<bool> and _optimistic_value as std::atomic<float>. value_as_float() accesses both under no lock, so it also needs guarding.
| { | ||
| _endpoint->set_option( _opt, value ); | ||
| } | ||
| catch( ... ) |
There was a problem hiding this comment.
🟡 [Enhancement — Observability] Silent error swallowing (auto-generated by rs-agentic bot)
All firmware errors are caught and silently discarded with no diagnostic output. This makes it very hard to distinguish "FW rejected/clamped the write" from "set succeeded but FW echo is slow". At minimum, please add a LOG_DEBUG or LOG_WARNING call here so developers can see firmware rejections in trace logs without attaching a debugger:
catch( const rs2::error & e )
{
LOG_DEBUG( "async set_option " << _opt << " failed: " << e.what() );
}
catch( ... )
{
LOG_DEBUG( "async set_option " << _opt << " failed with unknown exception" );
}| std::string& error_message, | ||
| std::chrono::steady_clock::duration ignore_period) | ||
| std::string& /*error_message*/, | ||
| std::chrono::steady_clock::duration /*ignore_period*/) |
There was a problem hiding this comment.
🔴 [BUG — API Contract] set_option() silently breaks callers that rely on ignore_period or error_message (auto-generated by rs-agentic bot)
The old implementation:
- Returned
falsewhen withinignore_period(rate-limiter). - Populated
error_messageon FW error. - Updated
valuesynchronously so the caller saw the FW-applied value immediately.
All three behaviours are now gone and the signature change is not reflected in the public API (option-model.h): the parameters are still declared with meaningful types. Callers outside the slider path (e.g. checkbox toggles, programmatic option changes, or external consumers of option_model) that:
- Pass
ignore_period > 0sexpecting rate-limiting — will now fire every time. - Check
error_messageafter the call — will always see an empty string even on firmware error. - Inspect the return value for failure — will always get
true.
This is a silent breaking behavioural change. Please either:
- Update the header to document the new semantics clearly, or
- Provide a separate synchronous overload for non-UI callers that need error feedback.
| } | ||
|
|
||
| void option_model::set_option_async( rs2_option opt, float value ) | ||
| { |
There was a problem hiding this comment.
🟡 [Enhancement — Correctness Risk] _async_setter reuse doesn't validate option identity (auto-generated by rs-agentic bot)
if( ! _async_setter )
_async_setter = std::make_shared< option_async_setter >( endpoint, opt );
_async_setter->post( value );_async_setter is created once with a specific rs2_option baked in. If set_option_async is ever called with a different opt (e.g., if two options share an option_model instance, or if the model is reused), the setter will write to the wrong option with no error.
Suggested defensive check:
if( ! _async_setter || _async_setter->opt() != opt )
_async_setter = std::make_shared< option_async_setter >( endpoint, opt );(Requires exposing _opt via a getter on option_async_setter.)
|
|
||
| value = updated_value; | ||
| _has_optimistic = false; | ||
| } |
There was a problem hiding this comment.
🟡 [Enhancement — Race Window] Optimistic cache cleared without holding the slider-hold guard (auto-generated by rs-agentic bot)
update_value has two paths:
- Early-return if
last_slider_hold_stopwatch < 1000 ms(correct — prevents updates during active drag). - Otherwise:
value = updated_value; _has_optimistic = false;
Path 2 clears _has_optimistic from the options_watcher thread with no synchronisation against value_as_float() (UI thread reading _has_optimistic) or set_option_async() (UI thread writing it). This is the same data race flagged on the field declaration. Please ensure all reads and writes of _has_optimistic are protected consistently.
| // it in place reintroduces the UI freeze the async dispatch is meant to fix. | ||
| // Both jobs are coalesced into a single pass once the user stops interacting. | ||
| // `value` stays fresh during the gate via options_watcher -> on_options_changed. | ||
| if (last_user_set_stopwatch.get_elapsed_ms() < 500) |
There was a problem hiding this comment.
🔴 [BUG — Scope Too Broad] 500 ms gate blocks the entire update() function (auto-generated by rs-agentic bot)
if (last_user_set_stopwatch.get_elapsed_ms() < 500)
return;This early return skips all work inside update() — not just the USB-polling and config-save blocks that are the actual source of the freeze. If subdevice_model::update() contains any other processing beyond _options_invalidated and next_option (e.g., stream-state checks, error propagation, or UI feedback logic), those will also be silently suppressed for 500 ms after every option write.
Please scope the gate more tightly — either move the return to just before the two specific blocks you want to skip, or use a local flag to skip only the USB-polling and config-save sections, leaving the rest of the function unaffected. At minimum, add a comment listing exactly which code blocks are intentionally skipped so reviewers and future maintainers know the full impact.
|
|
||
| subdevice_model::~subdevice_model() | ||
| { | ||
| config_save_worker::instance().cancel( this ); |
There was a problem hiding this comment.
🟡 [Enhancement — Shutdown Race] cancel() in destructor does not guarantee the running job has finished (auto-generated by rs-agentic bot)
config_save_worker::instance().cancel( this );cancel() only removes the job from _pending if it has not been picked up yet. If the worker thread has already dequeued the lambda and started executing it, cancel() is a no-op, and this continues to be referenced by the in-flight lambda. Since the lambda captures only shared_ptrs to processing blocks (not this), there is no UAF on subdevice_model itself — good. But this subtlety is not documented. Consider adding a comment explaining that the lambda intentionally does not capture this, and why that makes the cancel-only approach safe here.
- Serialize all config_file mutations: add a recursive_mutex inside config_file so the background config_save_worker (subdevice-model.cpp) and UI-thread callers (e.g., processing-block checkbox handlers in device-model.cpp) no longer race on _j or concurrent on-disk writes. - Make _has_optimistic atomic via a shared_ptr<atomic<bool>> wrapper. The flag is read on the UI thread (value_as_float) and written on both the UI thread (set_option_async, update_all_fields) and the on_options_changed callback thread (update_value); shared_ptr keeps option_model copyable since std::atomic deletes its copy/move ctors. Reorder writes in set_option_async so observers that load the flag set also see matching _optimistic_value / _optimistic_stopwatch. - Drop dead error_message / ignore_period parameters from option_model::set_option (now fire-and-forget). Update all call sites accordingly. Reduce slider_unselected to a no-op since the rate-limit retry path it guarded is now unreachable. - Bypass option_model in on_chip_calib_manager: calibration needs synchronous set/verify semantics, so call sensor->set_option / sensor->get_option directly instead of going through the (now async) UI cache. - Log FW errors from option_async_setter::run with LOG_WARNING for rs2::error / std::exception / unknown — silent rejections/clamps were undiagnosable before. - Scope the 500 ms USB-bus contention gate to just the two paths it protects (config save + per-frame option polling) instead of returning early from the entire subdevice_model::update(). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
3982965 to
158f4ef
Compare
| float value, | ||
| std::string & error_message, | ||
| notifications_model & model ) | ||
| bool option_model::slider_unselected( rs2_option /*opt*/, |
There was a problem hiding this comment.
slider_unselected was turned into a no-op, removing the previous retry-on-release behavior; revert or document as intended
Details
✨ AI Reasoning
A previously-existing retry-on-release code path (which attempted a deferred retry when rate-limited) was removed and replaced by a no-op with an explanatory comment. That change intentionally disables prior retry behavior and leaves a documented bypass in place; this is a behavioral shortcut introduced by the PR and can be surprising or cause subtle regressions for callers that expected the old retry semantics.
🔧 How do I fix it?
Remove debugging statements like console.log, debugger, dd(), or logic bypasses like || true. Keep legitimate logging for monitoring and error handling.
Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
Tracked by: RSDSO-20803
Summary
While streaming depth/color in realsense-viewer, every sensor option change (slider drag, checkbox toggle) caused a ~1.5 s UI freeze plus a visible "slider snaps back to old value, then jumps to new value" regression. Three layered fixes:
Each commit compiles independently and addresses a distinct, measurable contributor (UI_GAP & WORKER_SAVE_END trace data captured during diagnosis).
Test plan