feat: Add v3 gaze estimation with auto-calibration and head pose compensation #6
circlenaut wants to merge 7 commits into WangWilly:master
Conversation
Co-authored-by: WangWilly <[email protected]>
# Conflicts:
#   model_managers/gaze_corrector_v1.py
This PR introduces significant improvements to gaze tracking and correction:
## New Features
### Gaze Estimation v3 Algorithm
- Implements 2D iris displacement-based gaze estimation with calibration support
- Adds head pose compensation using MediaPipe landmarks + solvePnP
- Combines iris gaze (eye-in-head) with head pose for accurate gaze direction
- Falls back gracefully to v1 geometric method when iris points unavailable
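As a rough illustration of the iris-displacement idea, here is one way a 2D iris offset can be mapped to eye-in-head gaze angles. All names and the linear `sensitivity_deg` mapping are assumptions for this sketch, not the PR's actual code:

```python
import numpy as np

def iris_gaze_from_displacement(iris_center, eye_corner_inner, eye_corner_outer,
                                sensitivity_deg=25.0):
    """Estimate eye-in-head gaze angles from 2D iris displacement.

    The iris center's offset from the midpoint of the (fixed) eye corners,
    normalized by eye width, is mapped linearly to horizontal/vertical gaze
    angles. `sensitivity_deg` is an assumed tuning constant.
    """
    inner = np.asarray(eye_corner_inner, dtype=float)
    outer = np.asarray(eye_corner_outer, dtype=float)
    iris = np.asarray(iris_center, dtype=float)

    eye_center = (inner + outer) / 2.0
    eye_width = np.linalg.norm(outer - inner)
    if eye_width < 1e-6:
        return 0.0, 0.0  # degenerate detection; treat as neutral gaze

    # Normalized displacement: roughly -0.5..0.5 across the eye opening.
    dx, dy = (iris - eye_center) / eye_width
    gaze_h = dx * sensitivity_deg  # degrees; sign convention depends on image axes
    gaze_v = dy * sensitivity_deg
    return gaze_h, gaze_v
```

In the PR this per-eye estimate is then combined with the solvePnP head pose to produce the final gaze direction.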
### Auto-Calibration ('z' key)
- Press 'z' while looking at camera to set neutral gaze position
- Works anytime (independent of calibration mode)
- Stores raw gaze angles as baseline for future corrections
- Calibration persists across sessions via database
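The baseline mechanism can be illustrated with a minimal sketch; class and method names here are hypothetical, not the PR's API:

```python
class GazeCalibration:
    """Sketch of 'z'-key auto-calibration: store the raw gaze angles
    observed while the user looks at the camera, then report subsequent
    gaze relative to that neutral baseline."""

    def __init__(self):
        self.baseline = None  # (h, v) in degrees; None until calibrated

    def calibrate(self, raw_h, raw_v):
        # Called on 'z': the current raw angles become the neutral position.
        self.baseline = (raw_h, raw_v)

    def corrected(self, raw_h, raw_v):
        # Before calibration, pass raw angles through unchanged.
        if self.baseline is None:
            return raw_h, raw_v
        return raw_h - self.baseline[0], raw_v - self.baseline[1]
```

Persisting `baseline` (as the PR does via the database) is what lets the calibration survive across sessions.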
### Visualization Enhancements
- Face mesh visualization ('m' key): Shows face oval, eyebrows, eyes, nose, lips
- Iris landmark visualization ('l' key): Shows 4 iris points per eye with labels
- Vector visualization ('v' key): Shows gaze direction vectors
- All visualization modes can be toggled independently
### Head Pose Display
- Shows pitch, yaw, roll in calibration overlay when using mediapipe backend
- HeadPose dataclass for structured head pose data
### TensorFlow Device Selection
- New --device flag (auto/cpu/gpu) to control TensorFlow device
- Deferred imports to ensure device is configured before TF loads
### Other Improvements
- Auto-detect camera resolution at startup
- Verbose debug logging with --verbose flag
- Updated terminal instructions to show all available controls
## Controls Summary
- 'g': Toggle gaze correction
- 'z': Auto-calibrate (works anytime)
- 'c': Calibration mode (WASD for manual adjustment)
- 'v': Vector visualization
- 'l': Iris landmark visualization
- 'm': Face mesh visualization
- 'f': FPS display
- 'q': Quit
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Pull request overview
This PR implements a v3 gaze estimation algorithm with auto-calibration, head pose compensation, and adds a macOS virtual camera implementation using CoreMediaIO. The changes introduce significant new functionality including 2D iris displacement-based gaze tracking, WASD-adjustable manual offsets, and multiple visualization modes.
Changes:
- New v3 gaze estimation using 2D iris displacement with calibration support and head pose compensation via MediaPipe + solvePnP
- Auto-calibration feature ('z' key) that captures neutral gaze position for relative corrections
- macOS virtual camera extension with Swift-based settings app for system-wide gaze correction
- New visualization modes for vectors, iris landmarks, and face mesh; FPS display and TensorFlow device selection
- Enhanced eye blending with alpha feathering to reduce warping artifacts
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 25 comments.
Summary per file:
| File | Description |
|---|---|
| pyproject.toml | Updated Python version constraint and added tensorflow-metal dependency |
| poetry.lock | Updated lock file with new dependency versions |
| model_managers/gaze_corrector_v1.py | Implemented v3 gaze estimation, auto-calibration, manual offset API, head pose integration, and improved eye blending |
| displayers/face_predictor.py | Added HeadPose dataclass, iris landmark extraction, solvePnP-based head pose estimation, and updated eye center calculation |
| displayers/dis_single_window.py | Added new keyboard controls (z/v/l/m/f), FPS tracking, enhanced calibration overlay, and visualization features |
| bin_single_window.py | Added TensorFlow device configuration, verbose flag, and graceful interrupt handling |
| VirtualCamera/* | New macOS virtual camera implementation with CoreMediaIO extension and SwiftUI settings app |
| README.md | Documented virtual camera feature |
```python
# 3D model points (generic face model in cm, centered at nose)
# These are approximate positions for a generic face
model_points = np.array([
    (0.0, 0.0, 0.0),      # Nose tip
    (0.0, -3.3, -0.65),   # Chin
    (-2.25, 1.7, -1.35),  # Left eye outer corner
    (2.25, 1.7, -1.35),   # Right eye outer corner
    (-1.5, -1.0, -1.25),  # Left mouth corner
    (1.5, -1.0, -1.25),   # Right mouth corner
], dtype=np.float64)
```
The head pose estimation uses hardcoded 3D model points for a generic face. These may not accurately represent all face shapes, particularly for different ethnicities, ages, or genders. Consider adding a note in the documentation about this limitation or allowing users to adjust these values for better accuracy.
```swift
let providerSource = VirtualCameraProviderSource(clientQueue: nil)
CMIOExtensionProviderSource.startService(provider: providerSource.provider)

// Keep the extension running
CFRunLoopRun()
```
The VirtualCameraProviderSource is instantiated on line 15 with clientQueue: nil, but then stored and never used again. The provider is stored in a local variable that goes out of scope when main.swift finishes executing after CFRunLoopRun(). This means the provider could be deallocated. Store providerSource in a global variable to ensure it stays alive throughout the extension's lifetime.
```swift
var cameraOffsetX: Float = 0.0

/// Camera offset Y (vertical, positive = down) in cm
var cameraOffsetY: Float = -21.0
```
The cameraOffsetY default value in the Swift code is -21.0 (line 23), but the Python code has been changed to default to 5.0 (line 69 in gaze_corrector_v1.py). This inconsistency will cause different behavior between the Python backend and the Swift settings app. Ensure both use the same default values or document why they differ.
Suggested change:
```diff
-var cameraOffsetY: Float = -21.0
+var cameraOffsetY: Float = 5.0
```
```toml
"pyobjc (>=11.0,<12.0)",
"mediapipe (>=0.10.32,<0.11.0)",
"pyyaml (>=6.0.3,<7.0.0)",
"tensorflow-metal (>=1.2.0,<2.0.0)",
```
The tensorflow-metal dependency is platform-specific and only works on macOS with Apple Silicon. This will cause installation failures on other platforms (Linux, Windows, Intel Macs). Consider making this an optional dependency or using platform markers to install it only on compatible systems. Example: tensorflow-metal (>=1.2.0,<2.0.0) ; platform_machine == "arm64" and sys_platform == "darwin"
Suggested change:
```diff
-"tensorflow-metal (>=1.2.0,<2.0.0)",
+"tensorflow-metal (>=1.2.0,<2.0.0); platform_machine == \"arm64\" and sys_platform == \"darwin\"",
```
```diff
 ipd: float = 6.3  # Inter-pupillary distance in cm
-camera_offset: tuple[float, float, float] = (0, -21, -1)  # relative to screen center
+camera_offset: tuple[float, float, float] = (0, 5, -1)  # relative to screen center (Y positive = above)
```
The default camera_offset Y value has changed from -21 to 5 (positive = above screen center). This is a significant breaking change for existing users who may have calibrated their setup with the old default. Consider documenting this change in a migration guide or adding a version check to migrate old settings.
```python
# Eye center calculation using anatomical eye corners (canthus points)
# These are FIXED landmarks that don't move with gaze direction
# Left eye: inner corner = 362, outer corner = 263
# Right eye: inner corner = 133, outer corner = 33
LEFT_EYE_CORNERS = (362, 263)  # Inner and outer canthus (stable)
RIGHT_EYE_CORNERS = (133, 33)  # Inner and outer canthus (stable)
```
The LEFT_EYE_CORNERS and RIGHT_EYE_CORNERS have been changed from iris points (474, 476) and (471, 469) to anatomical corners (362, 263) and (133, 33). While the comment correctly explains these are stable landmarks, this is a significant algorithmic change that will affect eye center calculation. Verify that this doesn't break existing calibrations and consider documenting this breaking change.
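The stability argument is that the canthus landmarks are fixed anatomy, so an eye center derived from them does not drift as the iris moves. A minimal sketch of the midpoint computation (the landmark container shape is assumed):

```python
import numpy as np

# MediaPipe FaceMesh indices from the diff above: anatomical eye corners.
LEFT_EYE_CORNERS = (362, 263)
RIGHT_EYE_CORNERS = (133, 33)

def eye_center_from_corners(landmarks, corner_indices):
    """Eye center as the midpoint of the inner/outer canthus landmarks.

    `landmarks` is assumed to be indexable as landmarks[i] -> (x, y).
    Because the corners are fixed anatomy, the result is invariant to
    gaze direction, unlike a center computed from iris points.
    """
    inner, outer = corner_indices
    p1 = np.asarray(landmarks[inner], dtype=float)
    p2 = np.asarray(landmarks[outer], dtype=float)
    return (p1 + p2) / 2.0
```

The trade-off the reviewer flags: any calibration captured against the old iris-based center will be offset under the new definition.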
```python
def configure_tensorflow_device(device: str) -> str:
    """
    Configure TensorFlow to use specified device.
    Must be called before importing TensorFlow.

    Args:
        device: 'auto', 'cpu', or 'gpu'

    Returns:
        Actual device being used
    """
    import tensorflow as tf

    if device == "cpu":
        # Hide all GPUs to force CPU
        tf.config.set_visible_devices([], 'GPU')
        print("TensorFlow: Forcing CPU mode")
        return "CPU"
    elif device == "gpu":
        gpus = tf.config.list_physical_devices('GPU')
        if gpus:
            print(f"TensorFlow: Using GPU ({gpus[0].name})")
            return "GPU"
        else:
            print("TensorFlow: No GPU found, falling back to CPU")
            return "CPU"
    else:  # auto
        gpus = tf.config.list_physical_devices('GPU')
        if gpus:
            print(f"TensorFlow: Auto-detected GPU ({gpus[0].name})")
            return "GPU"
        else:
            print("TensorFlow: Auto-detected CPU only")
            return "CPU"
```
The TensorFlow device configuration happens in configure_tensorflow_device which is called before importing TF-dependent modules, but TensorFlow is already imported at line 38 inside the function. This means the device configuration happens after TensorFlow is loaded. For the GPU visibility settings to work correctly, they need to be set before TensorFlow is first imported. Consider using environment variables (TF_CPP_MIN_LOG_LEVEL, CUDA_VISIBLE_DEVICES) instead or ensure no TF imports happen before this configuration.
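A sketch of the environment-variable approach suggested here. Note this is effective on CUDA builds; tensorflow-metal on Apple Silicon ignores `CUDA_VISIBLE_DEVICES`, so macOS would still need the `set_visible_devices` path:

```python
import os

def configure_tensorflow_device(device: str) -> None:
    """Select the TensorFlow device via environment variables.

    These take effect only if they are set before the first
    `import tensorflow` anywhere in the process, which is exactly
    the property the reviewer is asking for.
    """
    if device == "cpu":
        # A device list of "-1" hides all GPUs from TF on CUDA systems.
        os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    # Reduce TF startup log noise in all modes.
    os.environ.setdefault("TF_CPP_MIN_LOG_LEVEL", "2")

configure_tensorflow_device("cpu")
# `import tensorflow as tf` must happen only after this point.
```

The key design point: since the function no longer imports TensorFlow itself, it cannot accidentally trigger the first TF import before configuration is applied.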
```swift
    return nil
}

// In a full implementation, this would:
// 1. Convert pixelBuffer to image data
// 2. Send to Python backend via named pipe or shared memory
// 3. Receive processed frame
// 4. Convert back to CVPixelBuffer

// For now, return nil to indicate backend processing not available
return nil
```
The BackendCommunicator's processFrame method always returns nil (line 184), meaning backend processing is not actually implemented. This makes the virtual camera extension non-functional for gaze correction. The extension will only pass through frames without any correction. Consider adding a TODO comment or implementation plan, or document this limitation clearly in the BUILD.md file.
Suggested change:
```diff
-    return nil
-}
-
-// In a full implementation, this would:
-// 1. Convert pixelBuffer to image data
-// 2. Send to Python backend via named pipe or shared memory
-// 3. Receive processed frame
-// 4. Convert back to CVPixelBuffer
-
-// For now, return nil to indicate backend processing not available
-return nil
+    // Backend is not available; signal caller to use a fallback path.
+    return nil
+}
+
+// TODO: Implement IPC-based gaze correction.
+// In a full implementation, this would:
+// 1. Convert `pixelBuffer` to a suitable image/data representation.
+// 2. Send the data to the Python backend via the configured IPC mechanism
+//    (named pipes or shared memory) using `framePipePath`.
+// 3. Receive the processed frame data from `resultPipePath`.
+// 4. Convert the processed data back into a `CVPixelBuffer` and return it.
+//
+// Until the above is implemented, we return the original frame unchanged
+// when the backend is marked as connected, so callers still receive a
+// valid buffer instead of `nil`.
+return pixelBuffer
```
```python
for i in range(4):
    for pts in [le_iris_points, re_iris_points]:
        if np.isnan(pts[i][0]) or np.isnan(pts[i][1]):
            if self.debug_v2:
                self.logger.log(f"v3: NaN in iris points, falling back to v1")
            return self.estimate_gaze_angle(le_center, re_center, video_size)
```
The same NaN validation issue exists in estimate_gaze_angle_v3. The code should check if le_iris_points or re_iris_points is None before iterating. Add: if le_iris_points is None or re_iris_points is None: before line 873 and fall back to v1.
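A minimal sketch of the suggested guard; the function name and point shapes are illustrative, not the PR's API:

```python
import numpy as np

def validate_iris_points(le_iris_points, re_iris_points):
    """Guard for the v3 path: both eyes need four finite iris points.

    Checking `None` first (the reviewer's suggestion) avoids a TypeError
    when iris landmarks were never detected; the finiteness check then
    covers the existing NaN case.
    """
    if le_iris_points is None or re_iris_points is None:
        return False  # iris landmarks unavailable -> caller falls back to v1
    for pts in (le_iris_points, re_iris_points):
        arr = np.asarray(pts, dtype=float)
        if arr.shape[0] < 4 or not np.isfinite(arr[:4]).all():
            return False
    return True
```

Calling one guard like this before the NaN loop keeps the v1 fallback in a single place instead of two separate checks.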
```python
except Exception:
    # Head pose handling failed, continue without it
    self.last_head_pose = None
    raw_gaze_h = iris_gaze_h
    raw_gaze_v = iris_gaze_v
```
The head pose extraction uses bare try-except with no specific exception types. This catches all exceptions including KeyboardInterrupt and SystemExit. Change to except Exception: to catch only runtime errors while allowing interrupt signals to propagate.
|
Valid points! I'll address these over the weekend and update the PR.
Captures code review findings (WangWilly#3, WangWilly#4, WangWilly#6, WangWilly#7, #8, #9, #10) and architectural improvements (A–D) identified but not yet implemented. Includes commit hashes for already-shipped items for traceability.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Summary

New Controls
- 'g', 'z', 'c', 'v', 'l', 'm', 'f', 'q' (see the Controls Summary above)

Technical Details

Gaze Estimation v3

Head Pose Estimation

Test plan

```
python bin_single_window.py --backend mediapipe
```

🤖 Generated with Claude Code