feat: Add v3 gaze estimation with auto-calibration and head pose compensation #6
circlenaut wants to merge 7 commits into WangWilly:master
Conversation
Co-authored-by: WangWilly <[email protected]>
# Conflicts:
#   model_managers/gaze_corrector_v1.py
This PR introduces significant improvements to gaze tracking and correction:
## New Features
### Gaze Estimation v3 Algorithm
- Implements 2D iris displacement-based gaze estimation with calibration support
- Adds head pose compensation using MediaPipe landmarks + solvePnP
- Combines iris gaze (eye-in-head) with head pose for accurate gaze direction
- Falls back gracefully to v1 geometric method when iris points unavailable
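As a rough illustration of the iris-displacement idea, here is one way a 2D iris offset can be mapped to eye-in-head gaze angles. All names and the linear `sensitivity_deg` mapping are assumptions for this sketch, not the PR's actual code:

```python
import numpy as np

def iris_gaze_from_displacement(iris_center, eye_corner_inner, eye_corner_outer,
                                sensitivity_deg=25.0):
    """Estimate eye-in-head gaze angles from 2D iris displacement.

    The iris center's offset from the midpoint of the (fixed) eye corners,
    normalized by eye width, is mapped linearly to horizontal/vertical gaze
    angles. `sensitivity_deg` is an assumed tuning constant.
    """
    inner = np.asarray(eye_corner_inner, dtype=float)
    outer = np.asarray(eye_corner_outer, dtype=float)
    iris = np.asarray(iris_center, dtype=float)

    eye_center = (inner + outer) / 2.0
    eye_width = np.linalg.norm(outer - inner)
    if eye_width < 1e-6:
        return 0.0, 0.0  # degenerate detection; treat as neutral gaze

    # Normalized displacement: roughly -0.5..0.5 across the eye opening.
    dx, dy = (iris - eye_center) / eye_width
    gaze_h = dx * sensitivity_deg  # degrees; sign convention depends on image axes
    gaze_v = dy * sensitivity_deg
    return gaze_h, gaze_v
```

In the PR this per-eye estimate is then combined with the solvePnP head pose to produce the final gaze direction.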
### Auto-Calibration ('z' key)
- Press 'z' while looking at camera to set neutral gaze position
- Works anytime (independent of calibration mode)
- Stores raw gaze angles as baseline for future corrections
- Calibration persists across sessions via database
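The baseline mechanism can be illustrated with a minimal sketch; class and method names here are hypothetical, not the PR's API:

```python
class GazeCalibration:
    """Sketch of 'z'-key auto-calibration: store the raw gaze angles
    observed while the user looks at the camera, then report subsequent
    gaze relative to that neutral baseline."""

    def __init__(self):
        self.baseline = None  # (h, v) in degrees; None until calibrated

    def calibrate(self, raw_h, raw_v):
        # Called on 'z': the current raw angles become the neutral position.
        self.baseline = (raw_h, raw_v)

    def corrected(self, raw_h, raw_v):
        # Before calibration, pass raw angles through unchanged.
        if self.baseline is None:
            return raw_h, raw_v
        return raw_h - self.baseline[0], raw_v - self.baseline[1]
```

Persisting `baseline` (as the PR does via the database) is what lets the calibration survive across sessions.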
### Visualization Enhancements
- Face mesh visualization ('m' key): Shows face oval, eyebrows, eyes, nose, lips
- Iris landmark visualization ('l' key): Shows 4 iris points per eye with labels
- Vector visualization ('v' key): Shows gaze direction vectors
- All visualization modes can be toggled independently
### Head Pose Display
- Shows pitch, yaw, roll in calibration overlay when using mediapipe backend
- HeadPose dataclass for structured head pose data
### TensorFlow Device Selection
- New --device flag (auto/cpu/gpu) to control TensorFlow device
- Deferred imports to ensure device is configured before TF loads
### Other Improvements
- Auto-detect camera resolution at startup
- Verbose debug logging with --verbose flag
- Updated terminal instructions to show all available controls
## Controls Summary
- 'g': Toggle gaze correction
- 'z': Auto-calibrate (works anytime)
- 'c': Calibration mode (WASD for manual adjustment)
- 'v': Vector visualization
- 'l': Iris landmark visualization
- 'm': Face mesh visualization
- 'f': FPS display
- 'q': Quit
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Pull request overview
This PR implements a v3 gaze estimation algorithm with auto-calibration, head pose compensation, and adds a macOS virtual camera implementation using CoreMediaIO. The changes introduce significant new functionality including 2D iris displacement-based gaze tracking, WASD-adjustable manual offsets, and multiple visualization modes.
Changes:
- New v3 gaze estimation using 2D iris displacement with calibration support and head pose compensation via MediaPipe + solvePnP
- Auto-calibration feature ('z' key) that captures neutral gaze position for relative corrections
- macOS virtual camera extension with Swift-based settings app for system-wide gaze correction
- New visualization modes for vectors, iris landmarks, and face mesh; FPS display and TensorFlow device selection
- Enhanced eye blending with alpha feathering to reduce warping artifacts
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 25 comments.
Summary per file:
| File | Description |
|---|---|
| pyproject.toml | Updated Python version constraint and added tensorflow-metal dependency |
| poetry.lock | Updated lock file with new dependency versions |
| model_managers/gaze_corrector_v1.py | Implemented v3 gaze estimation, auto-calibration, manual offset API, head pose integration, and improved eye blending |
| displayers/face_predictor.py | Added HeadPose dataclass, iris landmark extraction, solvePnP-based head pose estimation, and updated eye center calculation |
| displayers/dis_single_window.py | Added new keyboard controls (z/v/l/m/f), FPS tracking, enhanced calibration overlay, and visualization features |
| bin_single_window.py | Added TensorFlow device configuration, verbose flag, and graceful interrupt handling |
| VirtualCamera/* | New macOS virtual camera implementation with CoreMediaIO extension and SwiftUI settings app |
| README.md | Documented virtual camera feature |
```python
# 3D model points (generic face model in cm, centered at nose)
# These are approximate positions for a generic face
model_points = np.array([
    (0.0, 0.0, 0.0),      # Nose tip
    (0.0, -3.3, -0.65),   # Chin
    (-2.25, 1.7, -1.35),  # Left eye outer corner
    (2.25, 1.7, -1.35),   # Right eye outer corner
    (-1.5, -1.0, -1.25),  # Left mouth corner
    (1.5, -1.0, -1.25),   # Right mouth corner
], dtype=np.float64)
```
The head pose estimation uses hardcoded 3D model points for a generic face. These may not accurately represent all face shapes, particularly for different ethnicities, ages, or genders. Consider adding a note in the documentation about this limitation or allowing users to adjust these values for better accuracy.
```swift
let providerSource = VirtualCameraProviderSource(clientQueue: nil)
CMIOExtensionProviderSource.startService(provider: providerSource.provider)

// Keep the extension running
CFRunLoopRun()
```
The VirtualCameraProviderSource is instantiated on line 15 with clientQueue: nil, but then stored and never used again. The provider is stored in a local variable that goes out of scope when main.swift finishes executing after CFRunLoopRun(). This means the provider could be deallocated. Store providerSource in a global variable to ensure it stays alive throughout the extension's lifetime.
```swift
var cameraOffsetX: Float = 0.0

/// Camera offset Y (vertical, positive = down) in cm
var cameraOffsetY: Float = -21.0
```
The cameraOffsetY default value in the Swift code is -21.0 (line 23), but the Python code has been changed to default to 5.0 (line 69 in gaze_corrector_v1.py). This inconsistency will cause different behavior between the Python backend and the Swift settings app. Ensure both use the same default values or document why they differ.
Suggested change:
```diff
-var cameraOffsetY: Float = -21.0
+var cameraOffsetY: Float = 5.0
```
```toml
"pyobjc (>=11.0,<12.0)",
"mediapipe (>=0.10.32,<0.11.0)",
"pyyaml (>=6.0.3,<7.0.0)",
"tensorflow-metal (>=1.2.0,<2.0.0)",
```
The tensorflow-metal dependency is platform-specific and only works on macOS with Apple Silicon. This will cause installation failures on other platforms (Linux, Windows, Intel Macs). Consider making this an optional dependency or using platform markers to install it only on compatible systems. Example: tensorflow-metal (>=1.2.0,<2.0.0) ; platform_machine == "arm64" and sys_platform == "darwin"
Suggested change:
```diff
-"tensorflow-metal (>=1.2.0,<2.0.0)",
+"tensorflow-metal (>=1.2.0,<2.0.0); platform_machine == \"arm64\" and sys_platform == \"darwin\"",
```
```diff
 ipd: float = 6.3  # Inter-pupillary distance in cm
-camera_offset: tuple[float, float, float] = (0, -21, -1)  # relative to screen center
+camera_offset: tuple[float, float, float] = (0, 5, -1)  # relative to screen center (Y positive = above)
```
The default camera_offset Y value has changed from -21 to 5 (positive = above screen center). This is a significant breaking change for existing users who may have calibrated their setup with the old default. Consider documenting this change in a migration guide or adding a version check to migrate old settings.
```python
# Eye center calculation using anatomical eye corners (canthus points)
# These are FIXED landmarks that don't move with gaze direction
# Left eye: inner corner = 362, outer corner = 263
# Right eye: inner corner = 133, outer corner = 33
LEFT_EYE_CORNERS = (362, 263)  # Inner and outer canthus (stable)
RIGHT_EYE_CORNERS = (133, 33)  # Inner and outer canthus (stable)
```
The LEFT_EYE_CORNERS and RIGHT_EYE_CORNERS have been changed from iris points (474, 476) and (471, 469) to anatomical corners (362, 263) and (133, 33). While the comment correctly explains these are stable landmarks, this is a significant algorithmic change that will affect eye center calculation. Verify that this doesn't break existing calibrations and consider documenting this breaking change.
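The stability argument is that the canthus landmarks are fixed anatomy, so an eye center derived from them does not drift as the iris moves. A minimal sketch of the midpoint computation (the landmark container shape is assumed):

```python
import numpy as np

# MediaPipe FaceMesh indices from the diff above: anatomical eye corners.
LEFT_EYE_CORNERS = (362, 263)
RIGHT_EYE_CORNERS = (133, 33)

def eye_center_from_corners(landmarks, corner_indices):
    """Eye center as the midpoint of the inner/outer canthus landmarks.

    `landmarks` is assumed to be indexable as landmarks[i] -> (x, y).
    Because the corners are fixed anatomy, the result is invariant to
    gaze direction, unlike a center computed from iris points.
    """
    inner, outer = corner_indices
    p1 = np.asarray(landmarks[inner], dtype=float)
    p2 = np.asarray(landmarks[outer], dtype=float)
    return (p1 + p2) / 2.0
```

The trade-off the reviewer flags: any calibration captured against the old iris-based center will be offset under the new definition.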
```python
def configure_tensorflow_device(device: str) -> str:
    """
    Configure TensorFlow to use specified device.
    Must be called before importing TensorFlow.

    Args:
        device: 'auto', 'cpu', or 'gpu'

    Returns:
        Actual device being used
    """
    import tensorflow as tf

    if device == "cpu":
        # Hide all GPUs to force CPU
        tf.config.set_visible_devices([], 'GPU')
        print("TensorFlow: Forcing CPU mode")
        return "CPU"
    elif device == "gpu":
        gpus = tf.config.list_physical_devices('GPU')
        if gpus:
            print(f"TensorFlow: Using GPU ({gpus[0].name})")
            return "GPU"
        else:
            print("TensorFlow: No GPU found, falling back to CPU")
            return "CPU"
    else:  # auto
        gpus = tf.config.list_physical_devices('GPU')
        if gpus:
            print(f"TensorFlow: Auto-detected GPU ({gpus[0].name})")
            return "GPU"
        else:
            print("TensorFlow: Auto-detected CPU only")
            return "CPU"
```
The TensorFlow device configuration happens in configure_tensorflow_device which is called before importing TF-dependent modules, but TensorFlow is already imported at line 38 inside the function. This means the device configuration happens after TensorFlow is loaded. For the GPU visibility settings to work correctly, they need to be set before TensorFlow is first imported. Consider using environment variables (TF_CPP_MIN_LOG_LEVEL, CUDA_VISIBLE_DEVICES) instead or ensure no TF imports happen before this configuration.
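A sketch of the environment-variable approach suggested here. Note this is effective on CUDA builds; tensorflow-metal on Apple Silicon ignores `CUDA_VISIBLE_DEVICES`, so macOS would still need the `set_visible_devices` path:

```python
import os

def configure_tensorflow_device(device: str) -> None:
    """Select the TensorFlow device via environment variables.

    These take effect only if they are set before the first
    `import tensorflow` anywhere in the process, which is exactly
    the property the reviewer is asking for.
    """
    if device == "cpu":
        # A device list of "-1" hides all GPUs from TF on CUDA systems.
        os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    # Reduce TF startup log noise in all modes.
    os.environ.setdefault("TF_CPP_MIN_LOG_LEVEL", "2")

configure_tensorflow_device("cpu")
# `import tensorflow as tf` must happen only after this point.
```

The key design point: since the function no longer imports TensorFlow itself, it cannot accidentally trigger the first TF import before configuration is applied.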
```swift
    return nil
}

// In a full implementation, this would:
// 1. Convert pixelBuffer to image data
// 2. Send to Python backend via named pipe or shared memory
// 3. Receive processed frame
// 4. Convert back to CVPixelBuffer

// For now, return nil to indicate backend processing not available
return nil
```
The BackendCommunicator's processFrame method always returns nil (line 184), meaning backend processing is not actually implemented. This makes the virtual camera extension non-functional for gaze correction. The extension will only pass through frames without any correction. Consider adding a TODO comment or implementation plan, or document this limitation clearly in the BUILD.md file.
Suggested change:
```diff
-    return nil
-}
-
-// In a full implementation, this would:
-// 1. Convert pixelBuffer to image data
-// 2. Send to Python backend via named pipe or shared memory
-// 3. Receive processed frame
-// 4. Convert back to CVPixelBuffer
-
-// For now, return nil to indicate backend processing not available
-return nil
+    // Backend is not available; signal caller to use a fallback path.
+    return nil
+}
+
+// TODO: Implement IPC-based gaze correction.
+// In a full implementation, this would:
+// 1. Convert `pixelBuffer` to a suitable image/data representation.
+// 2. Send the data to the Python backend via the configured IPC mechanism
+//    (named pipes or shared memory) using `framePipePath`.
+// 3. Receive the processed frame data from `resultPipePath`.
+// 4. Convert the processed data back into a `CVPixelBuffer` and return it.
+//
+// Until the above is implemented, we return the original frame unchanged
+// when the backend is marked as connected, so callers still receive a
+// valid buffer instead of `nil`.
+return pixelBuffer
```
```python
for i in range(4):
    for pts in [le_iris_points, re_iris_points]:
        if np.isnan(pts[i][0]) or np.isnan(pts[i][1]):
            if self.debug_v2:
                self.logger.log(f"v3: NaN in iris points, falling back to v1")
            return self.estimate_gaze_angle(le_center, re_center, video_size)
```
The same NaN validation issue exists in estimate_gaze_angle_v3. The code should check if le_iris_points or re_iris_points is None before iterating. Add: if le_iris_points is None or re_iris_points is None: before line 873 and fall back to v1.
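A minimal sketch of the suggested guard; the function name and point shapes are illustrative, not the PR's API:

```python
import numpy as np

def validate_iris_points(le_iris_points, re_iris_points):
    """Guard for the v3 path: both eyes need four finite iris points.

    Checking `None` first (the reviewer's suggestion) avoids a TypeError
    when iris landmarks were never detected; the finiteness check then
    covers the existing NaN case.
    """
    if le_iris_points is None or re_iris_points is None:
        return False  # iris landmarks unavailable -> caller falls back to v1
    for pts in (le_iris_points, re_iris_points):
        arr = np.asarray(pts, dtype=float)
        if arr.shape[0] < 4 or not np.isfinite(arr[:4]).all():
            return False
    return True
```

Calling one guard like this before the NaN loop keeps the v1 fallback in a single place instead of two separate checks.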
```python
except Exception:
    # Head pose handling failed, continue without it
    self.last_head_pose = None
    raw_gaze_h = iris_gaze_h
    raw_gaze_v = iris_gaze_v
```
The head pose extraction uses bare try-except with no specific exception types. This catches all exceptions including KeyboardInterrupt and SystemExit. Change to except Exception: to catch only runtime errors while allowing interrupt signals to propagate.
|
Valid points! I'll address these over the weekend and update the PR.
Captures code review findings (WangWilly#3, WangWilly#4, WangWilly#6, WangWilly#7, #8, #9, #10) and architectural improvements (A–D) identified but not yet implemented. Includes commit hashes for already-shipped items for traceability.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Summary

New Controls
- 'g', 'z', 'c', 'v', 'l', 'm', 'f', 'q' (see the Controls Summary above)

Technical Details

Gaze Estimation v3

Head Pose Estimation

Test plan

```
python bin_single_window.py --backend mediapipe
```

🤖 Generated with Claude Code