
feat: Add v3 gaze estimation with auto-calibration and head pose compensation#6

Open
circlenaut wants to merge 7 commits into WangWilly:master from circlenaut:combined-prs

Conversation

@circlenaut

Summary

  • Implements new v3 gaze estimation algorithm using 2D iris displacement with calibration support
  • Adds head pose compensation using MediaPipe landmarks + OpenCV solvePnP
  • Auto-calibration ('z' key) that works anytime: just look at the camera and press 'z'
  • New visualization modes: face mesh ('m'), iris landmarks ('l'), vectors ('v')
  • TensorFlow device selection (--device cpu/gpu/auto)
  • Auto-detect camera resolution at startup
  • Verbose debug logging with --verbose flag

New Controls

| Key | Function |
| --- | --- |
| g | Toggle gaze correction on/off |
| z | Auto-calibrate (works anytime) |
| c | Calibration mode (WASD for manual adjustment) |
| v | Vector visualization |
| l | Iris landmark visualization |
| m | Face mesh visualization |
| f | FPS display |
| q | Quit |

Technical Details

Gaze Estimation v3

  • Uses 2D iris displacement (reliable with single camera)
  • Combines iris gaze (eye-in-head) with head pose for total gaze direction
  • Falls back gracefully to v1 geometric method when iris points unavailable
  • Calibration stores neutral gaze position and offsets all future corrections
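
A minimal sketch of how these pieces can fit together, shown for a single eye for clarity. The function name, the `sensitivity` scaling constant, and the sign conventions are illustrative assumptions, not the project's actual API:

```python
import numpy as np

def estimate_gaze_v3(iris_center, eye_corner_inner, eye_corner_outer,
                     head_pitch_deg, head_yaw_deg,
                     calib_h=0.0, calib_v=0.0, sensitivity=40.0):
    """Estimate total gaze angles (degrees) from 2D iris displacement.

    The iris center is compared against the midpoint of the fixed eye
    corners (canthus points); the displacement, normalized by eye width,
    is scaled into eye-in-head angles, head pose is added, and the
    calibration baseline is subtracted.
    """
    inner = np.asarray(eye_corner_inner, dtype=np.float64)
    outer = np.asarray(eye_corner_outer, dtype=np.float64)
    eye_center = (inner + outer) / 2.0
    eye_width = np.linalg.norm(outer - inner)
    if eye_width < 1e-6:
        return None  # degenerate landmarks; caller falls back to v1

    # Normalized displacement as a fraction of eye width
    dx, dy = (np.asarray(iris_center, dtype=np.float64) - eye_center) / eye_width

    iris_gaze_h = dx * sensitivity  # eye-in-head horizontal angle
    iris_gaze_v = dy * sensitivity  # eye-in-head vertical angle

    # Total gaze = eye-in-head + head pose, offset by the calibration baseline
    gaze_h = iris_gaze_h + head_yaw_deg - calib_h
    gaze_v = iris_gaze_v + head_pitch_deg - calib_v
    return gaze_h, gaze_v
```

With the iris centered between the corners, the result reduces to head pose minus the calibration baseline, which is the behavior the calibration step relies on.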

Head Pose Estimation

  • Uses 6 facial landmarks with solvePnP
  • Provides pitch, yaw, roll compensation
  • Displayed in calibration overlay when using mediapipe backend

Test plan

  • Run with the mediapipe backend: python bin_single_window.py --backend mediapipe
  • Test auto-calibration: look at the camera, press 'z', and verify that "CALIBRATED" is shown
  • Verify that 'z' works outside calibration mode
  • Toggle all visualizations: 'v', 'l', 'm'
  • Test the --device cpu and --device gpu flags
  • Verify fallback to v1 with the dlib backend

🤖 Generated with Claude Code

WangWilly and others added 7 commits January 27, 2026 13:35

This PR introduces significant improvements to gaze tracking and correction:

## New Features

### Gaze Estimation v3 Algorithm
- Implements 2D iris displacement-based gaze estimation with calibration support
- Adds head pose compensation using MediaPipe landmarks + solvePnP
- Combines iris gaze (eye-in-head) with head pose for accurate gaze direction
- Falls back gracefully to v1 geometric method when iris points unavailable

### Auto-Calibration ('z' key)
- Press 'z' while looking at camera to set neutral gaze position
- Works anytime (independent of calibration mode)
- Stores raw gaze angles as baseline for future corrections
- Calibration persists across sessions via database

### Visualization Enhancements
- Face mesh visualization ('m' key): Shows face oval, eyebrows, eyes, nose, lips
- Iris landmark visualization ('l' key): Shows 4 iris points per eye with labels
- Vector visualization ('v' key): Shows gaze direction vectors
- All visualization modes can be toggled independently

### Head Pose Display
- Shows pitch, yaw, roll in calibration overlay when using mediapipe backend
- HeadPose dataclass for structured head pose data

### TensorFlow Device Selection
- New --device flag (auto/cpu/gpu) to control TensorFlow device
- Deferred imports to ensure device is configured before TF loads

### Other Improvements
- Auto-detect camera resolution at startup
- Verbose debug logging with --verbose flag
- Updated terminal instructions to show all available controls

## Controls Summary
- 'g': Toggle gaze correction
- 'z': Auto-calibrate (works anytime)
- 'c': Calibration mode (WASD for manual adjustment)
- 'v': Vector visualization
- 'l': Iris landmark visualization
- 'm': Face mesh visualization
- 'f': FPS display
- 'q': Quit

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Copilot AI left a comment


Pull request overview

This PR implements a v3 gaze estimation algorithm with auto-calibration and head pose compensation, and adds a macOS virtual camera implementation using CoreMediaIO. The changes introduce significant new functionality, including 2D iris displacement-based gaze tracking, WASD-adjustable manual offsets, and multiple visualization modes.

Changes:

  • New v3 gaze estimation using 2D iris displacement with calibration support and head pose compensation via MediaPipe + solvePnP
  • Auto-calibration feature ('z' key) that captures neutral gaze position for relative corrections
  • macOS virtual camera extension with Swift-based settings app for system-wide gaze correction
  • New visualization modes for vectors, iris landmarks, and face mesh; FPS display and TensorFlow device selection
  • Enhanced eye blending with alpha feathering to reduce warping artifacts

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 25 comments.

| File | Description |
| --- | --- |
| pyproject.toml | Updated Python version constraint and added tensorflow-metal dependency |
| poetry.lock | Updated lock file with new dependency versions |
| model_managers/gaze_corrector_v1.py | Implemented v3 gaze estimation, auto-calibration, manual offset API, head pose integration, and improved eye blending |
| displayers/face_predictor.py | Added HeadPose dataclass, iris landmark extraction, solvePnP-based head pose estimation, and updated eye center calculation |
| displayers/dis_single_window.py | Added new keyboard controls (z/v/l/m/f), FPS tracking, enhanced calibration overlay, and visualization features |
| bin_single_window.py | Added TensorFlow device configuration, verbose flag, and graceful interrupt handling |
| VirtualCamera/* | New macOS virtual camera implementation with CoreMediaIO extension and SwiftUI settings app |
| README.md | Documented virtual camera feature |


Comment on lines +457 to +466
# 3D model points (generic face model in cm, centered at nose)
# These are approximate positions for a generic face
model_points = np.array([
    (0.0, 0.0, 0.0),      # Nose tip
    (0.0, -3.3, -0.65),   # Chin
    (-2.25, 1.7, -1.35),  # Left eye outer corner
    (2.25, 1.7, -1.35),   # Right eye outer corner
    (-1.5, -1.0, -1.25),  # Left mouth corner
    (1.5, -1.0, -1.25),   # Right mouth corner
], dtype=np.float64)

Copilot AI Feb 4, 2026


The head pose estimation uses hardcoded 3D model points for a generic face. These may not accurately represent all face shapes, particularly for different ethnicities, ages, or genders. Consider adding a note in the documentation about this limitation or allowing users to adjust these values for better accuracy.

Comment on lines +15 to +19
let providerSource = VirtualCameraProviderSource(clientQueue: nil)
CMIOExtensionProviderSource.startService(provider: providerSource.provider)

// Keep the extension running
CFRunLoopRun()

Copilot AI Feb 4, 2026


The VirtualCameraProviderSource is instantiated on line 15 with clientQueue: nil, but then stored and never used again. The provider is stored in a local variable that goes out of scope when main.swift finishes executing after CFRunLoopRun(). This means the provider could be deallocated. Store providerSource in a global variable to ensure it stays alive throughout the extension's lifetime.

var cameraOffsetX: Float = 0.0

/// Camera offset Y (vertical, positive = down) in cm
var cameraOffsetY: Float = -21.0

Copilot AI Feb 4, 2026


The cameraOffsetY default value in the Swift code is -21.0 (line 23), but the Python code has been changed to default to 5.0 (line 69 in gaze_corrector_v1.py). This inconsistency will cause different behavior between the Python backend and the Swift settings app. Ensure both use the same default values or document why they differ.

Suggested change
var cameraOffsetY: Float = -21.0
var cameraOffsetY: Float = 5.0

"pyobjc (>=11.0,<12.0)",
"mediapipe (>=0.10.32,<0.11.0)",
"pyyaml (>=6.0.3,<7.0.0)",
"tensorflow-metal (>=1.2.0,<2.0.0)",

Copilot AI Feb 4, 2026


The tensorflow-metal dependency is platform-specific and only works on macOS with Apple Silicon. This will cause installation failures on other platforms (Linux, Windows, Intel Macs). Consider making this an optional dependency or using platform markers to install it only on compatible systems. Example: tensorflow-metal (>=1.2.0,<2.0.0) ; platform_machine == "arm64" and sys_platform == "darwin"

Suggested change
"tensorflow-metal (>=1.2.0,<2.0.0)",
"tensorflow-metal (>=1.2.0,<2.0.0); platform_machine == \"arm64\" and sys_platform == \"darwin\"",

ipd: float = 6.3 # Inter-pupillary distance in cm
camera_offset: tuple[float, float, float] = (0, -21, -1) # relative to screen center

camera_offset: tuple[float, float, float] = (0, 5, -1) # relative to screen center (Y positive = above)

Copilot AI Feb 4, 2026


The default camera_offset Y value has changed from -21 to 5 (positive = above screen center). This is a significant breaking change for existing users who may have calibrated their setup with the old default. Consider documenting this change in a migration guide or adding a version check to migrate old settings.

Comment on lines +324 to +329
# Eye center calculation using anatomical eye corners (canthus points)
# These are FIXED landmarks that don't move with gaze direction
# Left eye: inner corner = 362, outer corner = 263
# Right eye: inner corner = 133, outer corner = 33
LEFT_EYE_CORNERS = (362, 263) # Inner and outer canthus (stable)
RIGHT_EYE_CORNERS = (133, 33) # Inner and outer canthus (stable)

Copilot AI Feb 4, 2026


The LEFT_EYE_CORNERS and RIGHT_EYE_CORNERS have been changed from iris points (474, 476) and (471, 469) to anatomical corners (362, 263) and (133, 33). While the comment correctly explains these are stable landmarks, this is a significant algorithmic change that will affect eye center calculation. Verify that this doesn't break existing calibrations and consider documenting this breaking change.

Comment on lines +27 to +60
def configure_tensorflow_device(device: str) -> str:
    """
    Configure TensorFlow to use specified device.
    Must be called before importing TensorFlow.

    Args:
        device: 'auto', 'cpu', or 'gpu'

    Returns:
        Actual device being used
    """
    import tensorflow as tf

    if device == "cpu":
        # Hide all GPUs to force CPU
        tf.config.set_visible_devices([], 'GPU')
        print("TensorFlow: Forcing CPU mode")
        return "CPU"
    elif device == "gpu":
        gpus = tf.config.list_physical_devices('GPU')
        if gpus:
            print(f"TensorFlow: Using GPU ({gpus[0].name})")
            return "GPU"
        else:
            print("TensorFlow: No GPU found, falling back to CPU")
            return "CPU"
    else:  # auto
        gpus = tf.config.list_physical_devices('GPU')
        if gpus:
            print(f"TensorFlow: Auto-detected GPU ({gpus[0].name})")
            return "GPU"
        else:
            print("TensorFlow: Auto-detected CPU only")
            return "CPU"

Copilot AI Feb 4, 2026


The TensorFlow device configuration happens in configure_tensorflow_device which is called before importing TF-dependent modules, but TensorFlow is already imported at line 38 inside the function. This means the device configuration happens after TensorFlow is loaded. For the GPU visibility settings to work correctly, they need to be set before TensorFlow is first imported. Consider using environment variables (TF_CPP_MIN_LOG_LEVEL, CUDA_VISIBLE_DEVICES) instead or ensure no TF imports happen before this configuration.
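
A sketch of the environment-variable approach the comment suggests. Note that `CUDA_VISIBLE_DEVICES` only affects CUDA builds of TensorFlow, so the tensorflow-metal plugin on macOS would need separate handling; the function name here is illustrative:

```python
import os
import sys

def configure_device_env(device: str) -> None:
    """Set device-related environment variables before TensorFlow is imported.

    Must run before the first `import tensorflow` anywhere in the process,
    so we check that TF is not already loaded.
    """
    assert "tensorflow" not in sys.modules, "TensorFlow already imported"
    if device == "cpu":
        # Hide CUDA GPUs; on CUDA builds this forces CPU execution.
        os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    # Quiet TF's C++ startup logs (info/warning suppressed)
    os.environ.setdefault("TF_CPP_MIN_LOG_LEVEL", "2")
```

Calling this at the very top of the entry script, before any module that transitively imports TensorFlow, guarantees the settings take effect.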

Comment on lines +174 to +184
return nil
}

// In a full implementation, this would:
// 1. Convert pixelBuffer to image data
// 2. Send to Python backend via named pipe or shared memory
// 3. Receive processed frame
// 4. Convert back to CVPixelBuffer

// For now, return nil to indicate backend processing not available
return nil

Copilot AI Feb 4, 2026


The BackendCommunicator's processFrame method always returns nil (line 184), meaning backend processing is not actually implemented. This makes the virtual camera extension non-functional for gaze correction. The extension will only pass through frames without any correction. Consider adding a TODO comment or implementation plan, or document this limitation clearly in the BUILD.md file.

Suggested change
return nil
}
// In a full implementation, this would:
// 1. Convert pixelBuffer to image data
// 2. Send to Python backend via named pipe or shared memory
// 3. Receive processed frame
// 4. Convert back to CVPixelBuffer
// For now, return nil to indicate backend processing not available
return nil
// Backend is not available; signal caller to use a fallback path.
return nil
}
// TODO: Implement IPC-based gaze correction.
// In a full implementation, this would:
// 1. Convert `pixelBuffer` to a suitable image/data representation.
// 2. Send the data to the Python backend via the configured IPC mechanism
// (named pipes or shared memory) using `framePipePath`.
// 3. Receive the processed frame data from `resultPipePath`.
// 4. Convert the processed data back into a `CVPixelBuffer` and return it.
//
// Until the above is implemented, we return the original frame unchanged
// when the backend is marked as connected, so callers still receive a
// valid buffer instead of `nil`.
return pixelBuffer

Comment on lines +873 to +878
for i in range(4):
    for pts in [le_iris_points, re_iris_points]:
        if np.isnan(pts[i][0]) or np.isnan(pts[i][1]):
            if self.debug_v2:
                self.logger.log(f"v3: NaN in iris points, falling back to v1")
            return self.estimate_gaze_angle(le_center, re_center, video_size)

Copilot AI Feb 4, 2026


The same NaN validation issue exists in estimate_gaze_angle_v3. The code should check if le_iris_points or re_iris_points is None before iterating. Add: if le_iris_points is None or re_iris_points is None: before line 873 and fall back to v1.
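
The suggested guard could look roughly like this, as a standalone sketch rather than the project's code:

```python
import numpy as np

def iris_points_valid(le_iris_points, re_iris_points):
    """Return True only when both iris point sets exist and contain no NaNs."""
    if le_iris_points is None or re_iris_points is None:
        return False  # landmarks missing entirely; caller falls back to v1
    for pts in (le_iris_points, re_iris_points):
        arr = np.asarray(pts, dtype=np.float64)
        # Require at least the 4 iris points, with finite x/y coordinates
        if arr.shape[0] < 4 or np.isnan(arr[:4, :2]).any():
            return False
    return True
```

Checking for `None` before iterating avoids a `TypeError` when the backend fails to produce iris landmarks, and keeps the fallback path to v1 intact.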

Comment on lines +931 to +935
except Exception:
    # Head pose handling failed, continue without it
    self.last_head_pose = None
    raw_gaze_h = iris_gaze_h
    raw_gaze_v = iris_gaze_v

Copilot AI Feb 4, 2026


The head pose extraction uses bare try-except with no specific exception types. This catches all exceptions including KeyboardInterrupt and SystemExit. Change to except Exception: to catch only runtime errors while allowing interrupt signals to propagate.

@circlenaut
Author

Valid points! I'll address these over the weekend and update the PR.

peterdenham pushed a commit to peterdenham/gaze-correction-cam that referenced this pull request Feb 25, 2026
Captures code review findings (WangWilly#3, WangWilly#4, WangWilly#6, WangWilly#7, #8, #9, #10) and
architectural improvements (A–D) identified but not yet implemented.
Includes commit hashes for already-shipped items for traceability.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>