viewer: eliminate UI freeze and slider snap-back on option change by remibettan · Pull Request #15029 · realsenseai/librealsense

remibettan · 2026-05-06T10:58:43Z

Tracked by: RSDSO-20803

Summary

While streaming depth/color in realsense-viewer, every sensor option change (slider drag, checkbox toggle) caused a ~1.5 s UI freeze plus a visible "slider snaps back to old value, then jumps to new value" regression. Three layered fixes:

Async option-write dispatch (commit 1) - option_async_setter worker thread + last-value coalescing; option_model::set_option now returns immediately. Adds a 500 ms USB-bus contention gate in subdevice_model::update() so per-frame get_option_value() polling and the JSON save block don't fight the in-flight worker write on the same per-device USB lock.
Async config-file save (commit 2) - moves the _options_invalidated save block (5 fixed processing blocks + per-pp save_to_config_file, each rewriting the entire JSON config to disk synchronously, ~400 ms total) to a background config_save_worker singleton with void*-keyed coalescing.
Optimistic slider value (commit 3) - seeds an optimistic float cache from set_option_async; �alue_as_float() returns it while fresh (capped at 2 s) so draw_slider doesn't snap back to the stale option_model::value between dispatch and FW echo. Cleared as soon as update_value (options_watcher) or update_all_fields (post-gate poll) refreshes �alue authoritatively.

Each commit compiles independently and addresses a distinct, measurable contributor (UI_GAP & WORKER_SAVE_END trace data captured during diagnosis).

Test plan

realsense-viewer streams depth + color
Drag Exposure slider rapidly: no UI freeze, slider stays pinned to chosen value, no snap-back
Toggle Auto Exposure / Backlight Compensation: no visible pause
Slider releases settle to actual FW-applied value (verifies optimistic cache clears on FW echo)
All three commits build cleanly (Release, MSVC)

Copilot

Copilot wasn't able to review this pull request because it exceeds the maximum number of files (300). Try reducing the number of changed files and requesting a review from Copilot again.

… blocks on FW round-trip Each option_model::set_option used to call endpoint->set_option synchronously on the UI thread. That call is ~200 ms on UVC and serializes on the per-device USB lock against options_watcher's 1 s poll cycle, so any slider drag or checkbox toggle visibly froze the viewer. Introduces option_async_setter, a per-option worker thread that coalesces posted values (only the latest between two FW writes wins, no stale-value queue), and routes set_option / slider_selected through it. Adds last_user_set_stopwatch on subdevice_model: option_model::set_option_async resets it on every dispatch, and subdevice_model::update() returns early for 500 ms after the last user write so its per-frame get_option_value() polling and the JSON config-save block don't fight the in-flight worker write on the same USB bus. The cached value still refreshes during the gate via options_watcher -> on_options_changed -> update_value. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

When _options_invalidated is set, subdevice_model::update() previously called save_processing_block_to_config_file 5x plus once per post-processing model on the UI thread. Each call iterates the block's options and runs config_file::set per option, and every set rewrites the entire JSON config to disk synchronously (see common/rs-config.cpp). For a typical depth+color setup that's tens of full file rewrites per change — ~400 ms of disk-IO blocking the render loop. Adds an anonymous-namespace config_save_worker singleton with one persistent thread. update() now captures the processing-block shared_ptrs by value and posts the save lambda to the worker; the UI thread returns immediately. Posts coalesce by opaque void* key (subdevice identity), so rapid invalidation events collapse into at most one save pass per wake-up cycle. The lambda never dereferences `this` — UAF-safe even if the subdevice is destroyed mid-save. ~subdevice_model still calls cancel(this) to drop any not-yet-started job for tidy shutdown. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… in flight After dispatching an option write asynchronously, the cached option_model::value isn't refreshed until either the options_watcher callback fires (~1 s later) or the post-gate update_all_fields poll reads it back. In the meantime, draw_slider re-reads value_as_float() on the next render frame and the slider visually snaps back to the old value, then jumps to the new value once the echo finally arrives — a regression introduced by making the write async. Adds an optimistic local cache (_optimistic_value + stopwatch) seeded by set_option_async. value_as_float() returns the optimistic value while it's fresh (capped at 2 s in case the FW rejects/clamps and never echoes the requested value), so the slider stays pinned to the user's choice from the moment they release. The flag is cleared as soon as either authoritative path refreshes value: update_value (options_watcher callback) or update_all_fields (per-frame poll). After clearing, value_as_float() returns whatever FW actually applied — so a clamped/rejected value shows real applied state instead of the user's request. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az · 2026-05-06T11:10:10Z

+        _async_setter = std::make_shared< option_async_setter >( endpoint, opt );
+    _async_setter->post( value );
+
+    // Mask the stale cached `value` with the user's choice until the FW echo


Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Nir-Az

test

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

+            // Capture by value so the worker stays UAF-safe even if `this` dies mid-save.
+            // shared_ptrs keep the underlying processing blocks alive until the job runs.
+            auto colorizer = depth_colorizer;
+            auto yuy2      = yuy2rgb;
+            auto m420      = m420_to_rgb;
+            auto nv12      = nv12_to_rgb;
+            auto y411_ptr  = y411;
+            auto pp        = post_processing;
+            config_save_worker::instance().post( this,
+                [ colorizer, yuy2, m420, nv12, y411_ptr, pp ]
+                {
+                    save_processing_block_to_config_file( "colorizer",   colorizer );
+                    save_processing_block_to_config_file( "yuy2rgb",     yuy2 );
+                    save_processing_block_to_config_file( "m420_to_rgb", m420 );
+                    save_processing_block_to_config_file( "nv12_to_rgb", nv12 );
+                    save_processing_block_to_config_file( "y411",        y411_ptr );
+                    for( auto & pbm : pp ) pbm->save_to_config_file();
+                } );


+    // While the optimistic cache is fresh, prefer the user's just-committed value.
+    // Cap at 2s in case the FW echo never matches (rejected/clamped value).
+    if( _has_optimistic && _optimistic_stopwatch.get_elapsed_ms() < 2000 )
+        return _optimistic_value;
+


 bool option_model::set_option(rs2_option opt,
    float req_value,
-    std::string& error_message,
-    std::chrono::steady_clock::duration ignore_period)
+    std::string& /*error_message*/,
+    std::chrono::steady_clock::duration /*ignore_period*/)
 {
-    // Only set the value if `ignore_period` time past since last set_option() call for this option
-    if (last_set_stopwatch.get_elapsed() < ignore_period)
-        return false;
-
-    try
-    {
-        last_set_stopwatch.reset();
-        endpoint->set_option(opt, req_value);
-    }
-    catch (const error& e)
-    {
-        error_message = error_to_string(e);
-    }
-
-    // Only update the cached value once set_option is done! That way, if it doesn't change
-    // anything...
-    try
-    {
-        value = endpoint->get_option_value(opt);
-    }
-    catch (...)
-    {
-    }
-
+    // All UI-thread option writes go through the async worker so the render loop is never
+    // blocked on the FW round-trip (set_option is ~200 ms on UVC; readback is another ~200 ms;
+    // and they serialize on the per-device USB lock against options_watcher's 1 s poll cycle).
+    // Coalescing in option_async_setter replaces the previous ignore_period rate-limit, and
+    // the cached `value` is refreshed by options_watcher -> on_options_changed -> update_value.
+    last_set_stopwatch.reset();
+    set_option_async( opt, req_value );
    return true;
 }


Nir-Az

🤖✨ Generated by rs-agentic bot ✨🤖

Reviewed PR #15029 — viewer: eliminate UI freeze and slider snap-back on option change.

The three-commit approach is well-reasoned and tackles measurable pain points. Inline comments below cover the key observations. 🔍

Nir-Az · 2026-05-06T11:15:15Z

+        option_async_setter( option_async_setter const & ) = delete;
+        option_async_setter & operator=( option_async_setter const & ) = delete;
+
+        void post( float value );


🤖✨ Generated by rs-agentic bot ✨🤖

⚠️ Internal helpers is_all_integers(), is_enum(), is_checkbox() accidentally made public

In the option_model class, the refactor moved private: two lines down, unintentionally promoting these three predicate helpers out of the private section. They are implementation details of the draw/slider rendering and should not be callable by external code.

Suggested fix — restore them under private::

private: bool is_all_integers() const; bool is_enum() const; bool is_checkbox() const; bool draw_checkbox(...);

🔧🐛

Nir-Az · 2026-05-06T11:15:15Z

+        {
+            _endpoint->set_option( _opt, value );
+        }
+        catch( ... )


🤖✨ Generated by rs-agentic bot ✨🤖

⚠️ Silent exception swallowing with no diagnostics

The catch(...) block suppresses ALL errors from _endpoint->set_option(), including rs2::error messages that carry FW rejection/clamping details. While the rationale is sound (UI thread must not block), with zero logging it becomes very hard to diagnose why a slider settles at an unexpected value.

Consider at minimum a LOG_WARNING:

catch( const rs2::error & e ) { LOG_WARNING( "Async set_option opt=" << _opt << " value=" << value << " failed: " << e.what() ); } catch( ... ) {}

The update_all_fields readback will still correct the UI; this just adds diagnosability.

🔍🛠️

Nir-Az · 2026-05-06T11:15:15Z

+    // Mask the stale cached `value` with the user's choice until the FW echo
+    // refreshes it, so draw_slider doesn't snap back to the old value next frame.
+    _optimistic_value = value;
+    _has_optimistic = true;


🤖✨ Generated by rs-agentic bot ✨🤖

🔍 Dead parameters still in the public set_option signature

error_message and ignore_period are now permanently unused (both silenced with /*...*/), yet they remain in the public signature. Call sites passing an error_message string will silently receive no feedback on FW rejection — a silent behavior change that is easy to miss in code review.

Options:

Remove both params and update call sites (cleanest).

Keep for ABI stability but update the doc-comment to clearly state both are no-ops.

Route async errors back via a std::optional<std::string> last_async_error field.

🛠️📋

Nir-Az · 2026-05-06T11:15:15Z

+        static config_save_worker w;
+        return w;
+    }
+


🤖✨ Generated by rs-agentic bot ✨🤖

🔒 void* key — narrow ABA/reuse hazard after object destruction

Using this as a raw-pointer key is safe in the common case and cancel(this) in ~subdevice_model() prevents most UAF scenarios. However there is a narrow theoretical hazard:

Worker dequeues lambda for key K (the old object), begins executing.

~subdevice_model() calls cancel(K) — key is already gone, no-op.

New subdevice_model is allocated at the same address K and calls post(K, newJob).

The OLD lambda finishes and writes stale config data, potentially overwriting what the new object saved.

The shared_ptr captures prevent crashes, but step 4 is a logical-correctness hazard. Consider adding a generation counter to distinguish old from new instances: std::pair<void*, uint64_t> as the map key.

🛡️⚠️

Nir-Az · 2026-05-06T11:15:15Z

+        //     async option-write worker and with options_watcher's 1 s poll cycle, so leaving
+        //     it in place reintroduces the UI freeze the async dispatch is meant to fix.
+        // Both jobs are coalesced into a single pass once the user stops interacting.
+        // `value` stays fresh during the gate via options_watcher -> on_options_changed.


🤖✨ Generated by rs-agentic bot ✨🤖

⏸️ Early-return gate blankets ALL of update(), not just the hot paths

The 500 ms gate currently causes update() to return before ANY work runs — including next_option cursor advancement (option-discovery), and any future logic added to update(). If a user changes an option during initial option-discovery, that pass stalls for 500 ms on every interaction.

Consider scoping the guard more narrowly:

void subdevice_model::update(...) { bool user_writing = last_user_set_stopwatch.get_elapsed_ms() < 500; if (!user_writing && _options_invalidated) { /* config save */ } if (!user_writing && next_option < supported_options.size()) { /* option poll */ } // other update logic continues unaffected... }

This keeps the USB-contention gate targeted and doesn't throttle unrelated update() logic.

🎯🔧

Nir-Az · 2026-05-06T11:35:35Z

retrigger

Nir-Az

rs-agentic bot — Auto-generated Review

Summary

This PR addresses a real and measurable UX problem (UI freeze + slider snap-back on option changes) with three layered fixes: async option writes, async config saves, and an optimistic slider cache. The approach is sound, the commit separation is clean, and the comments are thorough. That said, a few thread-safety and API-contract issues need attention before merge.

🔴 Potential Bugs

Data race on _has_optimistic / _optimistic_value — written by the UI thread in set_option_async and cleared by the options_watcher background thread in update_value. No mutex or std::atomic protection. Undefined behaviour in C++11 and later. See inline comment.
set_option() API contract silently broken — ignore_period is now always ignored (always returns true), and error_message is discarded. Any existing caller that passes a non-zero ignore period or inspects the error string will get broken behaviour without a compile-time warning. See inline comment.
500 ms gate blocks the entire update() function — the early-return at the top of subdevice_model::update() skips all processing (not just the USB polling and config save). If other work is done later in that function it will also be suppressed for 500 ms after every option write. See inline comment.
_async_setter re-use doesn't validate _opt match — the setter is created once for the first opt passed to set_option_async. If somehow the method is called later with a different opt, the wrong option gets the write silently. See inline comment.

🟡 Code Enhancements

Error swallowing with zero logging — the catch(...) in option_async_setter::run() silently drops all firmware errors with no diagnostic trace. At minimum, log the error at debug/trace level via LOG_DEBUG or LOG_WARNING so developers can diagnose FW rejections.
One thread per option model — option_async_setter spawns a dedicated std::thread per option_model (one per sensor option), created lazily. With a sensor exposing 30+ options and multiple sensors, this can add up to many live threads. Consider a shared per-sensor (or per-device) single-threaded executor.
New fields exposed as public — _async_setter, _optimistic_value, _has_optimistic, _optimistic_stopwatch are implementation details but are declared public in the class. Consider making them private (following the naming convention you already used with the leading _).

🧪 Tests

No automated tests added — the PR description lists manual test steps only. Consider adding unit tests (or extending existing option-model tests) to cover: async dispatch coalescing, optimistic cache lifecycle (set → echo → clear), cache timeout fallback (no FW echo), and error-path behaviour.

Nir-Az · 2026-05-06T11:44:42Z

+        // options_watcher -> update_value or the post-gate update_all_fields poll.
+        // Cleared when either authoritative path refreshes `value`, or after the timeout
+        // below (in case FW rejects/clamps and never echoes the requested value).
+        float _optimistic_value = 0.f;


🔴 [BUG — Data Race] _has_optimistic / _optimistic_value are not thread-safe (auto-generated by rs-agentic bot)

_has_optimistic and _optimistic_value are written by the UI thread inside set_option_async (_has_optimistic = true; _optimistic_value = value;) but are cleared by the options_watcher background thread through update_value (_has_optimistic = false). Concurrent unsynchronised access to a non-atomic bool is a data race, which is undefined behaviour in C++11 and later (can manifest as torn reads, stale cache lines, or compiler-optimised reads that never see the update).

Suggested fix: wrap the three optimistic fields and their reads/writes in the same mutex already used for nearby state, or declare _has_optimistic as std::atomic<bool> and _optimistic_value as std::atomic<float>. value_as_float() accesses both under no lock, so it also needs guarding.

Nir-Az · 2026-05-06T11:44:42Z

+        {
+            _endpoint->set_option( _opt, value );
+        }
+        catch( ... )


🟡 [Enhancement — Observability] Silent error swallowing (auto-generated by rs-agentic bot)

All firmware errors are caught and silently discarded with no diagnostic output. This makes it very hard to distinguish "FW rejected/clamped the write" from "set succeeded but FW echo is slow". At minimum, please add a LOG_DEBUG or LOG_WARNING call here so developers can see firmware rejections in trace logs without attaching a debugger:

catch( const rs2::error & e ) { LOG_DEBUG( "async set_option " << _opt << " failed: " << e.what() ); } catch( ... ) { LOG_DEBUG( "async set_option " << _opt << " failed with unknown exception" ); }

Nir-Az · 2026-05-06T11:44:42Z

-    std::string& error_message,
-    std::chrono::steady_clock::duration ignore_period)
+    std::string& /*error_message*/,
+    std::chrono::steady_clock::duration /*ignore_period*/)


🔴 [BUG — API Contract] set_option() silently breaks callers that rely on ignore_period or error_message (auto-generated by rs-agentic bot)

The old implementation:

Returned false when within ignore_period (rate-limiter).

Populated error_message on FW error.

Updated value synchronously so the caller saw the FW-applied value immediately.

All three behaviours are now gone and the signature change is not reflected in the public API (option-model.h): the parameters are still declared with meaningful types. Callers outside the slider path (e.g. checkbox toggles, programmatic option changes, or external consumers of option_model) that:

Pass ignore_period > 0s expecting rate-limiting — will now fire every time.

Check error_message after the call — will always see an empty string even on firmware error.

Inspect the return value for failure — will always get true.

This is a silent breaking behavioural change. Please either:

Update the header to document the new semantics clearly, or

Provide a separate synchronous overload for non-UI callers that need error feedback.

Nir-Az · 2026-05-06T11:44:42Z

 }

+void option_model::set_option_async( rs2_option opt, float value )
+{


🟡 [Enhancement — Correctness Risk] _async_setter reuse doesn't validate option identity (auto-generated by rs-agentic bot)

if( ! _async_setter ) _async_setter = std::make_shared< option_async_setter >( endpoint, opt ); _async_setter->post( value );

_async_setter is created once with a specific rs2_option baked in. If set_option_async is ever called with a different opt (e.g., if two options share an option_model instance, or if the model is reused), the setter will write to the wrong option with no error.

Suggested defensive check:

if( ! _async_setter || _async_setter->opt() != opt ) _async_setter = std::make_shared< option_async_setter >( endpoint, opt );

(Requires exposing _opt via a getter on option_async_setter.)

Nir-Az · 2026-05-06T11:44:42Z


    value = updated_value;
+    _has_optimistic = false;
 }


🟡 [Enhancement — Race Window] Optimistic cache cleared without holding the slider-hold guard (auto-generated by rs-agentic bot)

update_value has two paths:

Early-return if last_slider_hold_stopwatch < 1000 ms (correct — prevents updates during active drag).

Otherwise: value = updated_value; _has_optimistic = false;

Path 2 clears _has_optimistic from the options_watcher thread with no synchronisation against value_as_float() (UI thread reading _has_optimistic) or set_option_async() (UI thread writing it). This is the same data race flagged on the field declaration. Please ensure all reads and writes of _has_optimistic are protected consistently.

Nir-Az · 2026-05-06T11:44:42Z

+        //     it in place reintroduces the UI freeze the async dispatch is meant to fix.
+        // Both jobs are coalesced into a single pass once the user stops interacting.
+        // `value` stays fresh during the gate via options_watcher -> on_options_changed.
+        if (last_user_set_stopwatch.get_elapsed_ms() < 500)


🔴 [BUG — Scope Too Broad] 500 ms gate blocks the entire update() function (auto-generated by rs-agentic bot)

if (last_user_set_stopwatch.get_elapsed_ms() < 500) return;

This early return skips all work inside update() — not just the USB-polling and config-save blocks that are the actual source of the freeze. If subdevice_model::update() contains any other processing beyond _options_invalidated and next_option (e.g., stream-state checks, error propagation, or UI feedback logic), those will also be silently suppressed for 500 ms after every option write.

Please scope the gate more tightly — either move the return to just before the two specific blocks you want to skip, or use a local flag to skip only the USB-polling and config-save sections, leaving the rest of the function unaffected. At minimum, add a comment listing exactly which code blocks are intentionally skipped so reviewers and future maintainers know the full impact.

Nir-Az · 2026-05-06T11:44:42Z


    subdevice_model::~subdevice_model()
    {
+        config_save_worker::instance().cancel( this );


🟡 [Enhancement — Shutdown Race] cancel() in destructor does not guarantee the running job has finished (auto-generated by rs-agentic bot)

config_save_worker::instance().cancel( this );

cancel() only removes the job from _pending if it has not been picked up yet. If the worker thread has already dequeued the lambda and started executing it, cancel() is a no-op, and this continues to be referenced by the in-flight lambda. Since the lambda captures only shared_ptrs to processing blocks (not this), there is no UAF on subdevice_model itself — good. But this subtlety is not documented. Consider adding a comment explaining that the lambda intentionally does not capture this, and why that makes the cancel-only approach safe here.

- Serialize all config_file mutations: add a recursive_mutex inside config_file so the background config_save_worker (subdevice-model.cpp) and UI-thread callers (e.g., processing-block checkbox handlers in device-model.cpp) no longer race on _j or concurrent on-disk writes. - Make _has_optimistic atomic via a shared_ptr<atomic<bool>> wrapper. The flag is read on the UI thread (value_as_float) and written on both the UI thread (set_option_async, update_all_fields) and the on_options_changed callback thread (update_value); shared_ptr keeps option_model copyable since std::atomic deletes its copy/move ctors. Reorder writes in set_option_async so observers that load the flag set also see matching _optimistic_value / _optimistic_stopwatch. - Drop dead error_message / ignore_period parameters from option_model::set_option (now fire-and-forget). Update all call sites accordingly. Reduce slider_unselected to a no-op since the rate-limit retry path it guarded is now unreachable. - Bypass option_model in on_chip_calib_manager: calibration needs synchronous set/verify semantics, so call sensor->set_option / sensor->get_option directly instead of going through the (now async) UI cache. - Log FW errors from option_async_setter::run with LOG_WARNING for rs2::error / std::exception / unknown — silent rejections/clamps were undiagnosable before. - Scope the 500 ms USB-bus contention gate to just the two paths it protects (config save + per-frame option polling) instead of returning early from the entire subdevice_model::update(). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

aikido-pr-checks · 2026-05-06T13:49:52Z

-                                      float value,
-                                      std::string & error_message,
-                                      notifications_model & model )
+bool option_model::slider_unselected( rs2_option /*opt*/,


slider_unselected was turned into a no-op, removing the previous retry-on-release behavior; revert or document as intended

Details

✨ AI Reasoning
A previously-existing retry-on-release code path (which attempted a deferred retry when rate-limited) was removed and replaced by a no-op with an explanatory comment. That change intentionally disables prior retry behavior and leaves a documented bypass in place; this is a behavioral shortcut introduced by the PR and can be surprising or cause subtle regressions for callers that expected the old retry semantics.

🔧 How do I fix it?
Remove debugging statements like console.log, debugger, dd(), or logic bypasses like || true. Keep legitimate logging for monitoring and error handling.

_{Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.}
_{Reply @AikidoSec ignore: [REASON] to ignore this issue.}
_{More info}

Copilot AI review requested due to automatic review settings May 6, 2026 10:58

Copilot AI reviewed May 6, 2026

View reviewed changes

remibettan changed the base branch from master to development May 6, 2026 11:01

remibettan and others added 3 commits May 6, 2026 14:04

remibettan force-pushed the viewer-ui-exp-change-avoid-stream-pause branch from a39c375 to bfc3392 Compare May 6, 2026 11:07

aikido-pr-checks Bot reviewed May 6, 2026

View reviewed changes

Comment thread common/option-model.cpp