Husseinhhameed/Quaternion-Aware-CoHAtNet

Quaternion-Aware CoHAtNet

Quaternion-Aware CoHAtNet is a hybrid CNN–Transformer framework for end-to-end camera localization that extends CoHAtNet with a quaternion-aware rotation branch. Instead of treating orientation prediction as a standard real-valued regression problem, this project introduces a structured rotation modeling strategy that explicitly follows quaternion algebra, with feature interactions that are consistent with the Hamilton product.

The main goal is to improve camera orientation estimation while preserving the original strengths of CoHAtNet in extracting both local geometric cues and global contextual information.


Overview

Camera localization aims to estimate the 6-DoF pose of a camera, which consists of:

  • Translation (3 DoF): the camera position in 3D space
  • Rotation (3 DoF): the camera orientation

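Many deep learning methods regress rotation with standard fully connected layers, but the rotation target naturally lives on the manifold of unit quaternions. A fully connected layer outputs an unconstrained 4D vector: nothing guarantees unit norm, and nothing encodes that q and -q represent the same rotation. This creates a mismatch between the structure of the target and the representation used by conventional regression heads.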
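To make this mismatch concrete, here is a minimal plain-Python sketch written for this README (it is not taken from the repository's code). It assumes quaternions are stored in (w, x, y, z) order and compares the rotation angle between two quaternions with their plain Euclidean distance. Because q and -q describe the same rotation, the angle uses the absolute value of the dot product, whereas an unconstrained L2 comparison treats them as maximally far apart.

```python
import math


def normalize(q):
    """Scale a 4-vector (w, x, y, z) to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return tuple(c / n for c in q)


def rotation_angle_deg(q1, q2):
    """Angle in degrees of the relative rotation between q1 and q2.

    abs() on the dot product handles the double cover: q and -q
    represent the same 3D rotation.
    """
    a, b = normalize(q1), normalize(q2)
    dot = min(1.0, abs(sum(x * y for x, y in zip(a, b))))  # clamp rounding error
    return math.degrees(2.0 * math.acos(dot))


def euclidean(q1, q2):
    """Plain L2 distance, as an unconstrained regression loss would see it."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(q1, q2)))


if __name__ == "__main__":
    q = (1.0, 0.0, 0.0, 0.0)              # identity rotation
    neg_q = tuple(-c for c in q)          # same rotation, opposite sign
    print(rotation_angle_deg(q, neg_q))   # 0.0
    print(euclidean(q, neg_q))            # 2.0
    quarter_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
    print(round(rotation_angle_deg(q, quarter_z), 6))  # 90.0
```

The identity and its negation are 0° apart as rotations, yet they sit at the maximum possible distance (2.0) between unit 4-vectors. This is one reason rotation heads usually normalize their output and compare quaternions in a sign-aware way.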

This repository explores a simple but effective solution: replacing the conventional rotation head with a quaternion-aware projection module that better aligns learned features with the geometry of 3D rotation.


Key Idea

The proposed method keeps the original CoHAtNet backbone unchanged and modifies only the rotation estimation branch.

What changes?

  • The backbone still extracts hybrid convolutional and attention-based features.
  • The final feature vector is split into four equal-sized parts, treated as the real part and the three imaginary parts (i, j, k) of a quaternion-valued feature.
  • A quaternion-aware transformation is then applied, in which the four parts interact according to the Hamilton product.
  • The predicted quaternion is normalized to unit norm so that it represents a valid rotation.
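One common way to realize such Hamilton-consistent interactions is a quaternion-valued linear layer. This is assumed here for illustration; the repository's exact layer may differ. Write the split feature as x = x_r + x_i i + x_j j + x_k k and the weights as W = W_r + W_i i + W_j j + W_k k (all symbols are illustrative). The layer computes the Hamilton product W ⊗ x, which expands into the block matrix below. The resulting quaternion q is then normalized:

```latex
% Hamilton product W \otimes x written as a block matrix (i^2 = j^2 = k^2 = ijk = -1)
\begin{bmatrix} \mathbf{y}_r \\ \mathbf{y}_i \\ \mathbf{y}_j \\ \mathbf{y}_k \end{bmatrix}
=
\begin{bmatrix}
 W_r & -W_i & -W_j & -W_k \\
 W_i &  W_r & -W_k &  W_j \\
 W_j &  W_k &  W_r & -W_i \\
 W_k & -W_j &  W_i &  W_r
\end{bmatrix}
\begin{bmatrix} \mathbf{x}_r \\ \mathbf{x}_i \\ \mathbf{x}_j \\ \mathbf{x}_k \end{bmatrix},
\qquad
\hat{\mathbf{q}} = \frac{\mathbf{q}}{\lVert \mathbf{q} \rVert_2}
```

With 1×1 blocks this reduces to the ordinary product of two quaternions (for example, i ⊗ j = k). A dense layer mapping the same 4n inputs to 4m outputs would learn 16 independent m×n blocks. Here only 4 are learned and reused with fixed signs, which is what ties the four components together.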

Why is this useful?

  • It injects rotation-aware structure into the learning process.
  • It reduces the gap between the learned feature space and quaternion-based pose representation.
  • It improves rotation estimation without redesigning the entire model.

Main Contributions

  • A quaternion-aware rotation modeling strategy for camera localization.
  • A lightweight extension of CoHAtNet that preserves the original backbone.
  • Explicit modeling of quaternion structure through Hamilton-consistent feature interactions.
  • A practical framework for studying the effect of quaternion-aware heads in pose regression.

Method Summary

The full architecture follows a two-branch pose regression design on top of a shared backbone:

  1. Shared Backbone

    • A hybrid CNN–Transformer feature extractor based on CoHAtNet.
    • Combines local feature extraction and global reasoning.
  2. Translation Head

    • A standard regression head for estimating 3D translation.
  3. Quaternion-Aware Rotation Head

    • Reinterprets the learned feature vector as a quaternion-structured representation.
    • Applies structured transformations that preserve relationships between quaternion components.
    • Produces a 4D quaternion output.
    • Normalizes the output to ensure a valid rotation quaternion.
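The two heads can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the repository's implementation. The names `quaternion_linear` and `pose_head` are hypothetical. The backbone is replaced by a given feature vector, the translation head is a single linear map, and the rotation head is a single quaternion-valued linear layer (Hamilton-product weight sharing) that produces one quaternion, which is then normalized.

```python
import math


def matvec(W, v):
    """Dense matrix-vector product; W is a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def signed_sum(terms):
    """Elementwise sum of (sign, vector) pairs of equal length."""
    size = len(terms[0][1])
    return [sum(s * vec[idx] for s, vec in terms) for idx in range(size)]


def quaternion_linear(x, Wr, Wi, Wj, Wk):
    """Hypothetical quaternion-valued linear layer (Hamilton product W (x) x).

    x (length 4*n) is split into real, i, j, k parts of length n.
    Each weight matrix is m x n; the output [y_r, y_i, y_j, y_k] has length 4*m.
    """
    if len(x) % 4:
        raise ValueError("feature length must be divisible by 4")
    n = len(x) // 4
    xr, xi, xj, xk = x[:n], x[n:2 * n], x[2 * n:3 * n], x[3 * n:]
    yr = signed_sum([(1, matvec(Wr, xr)), (-1, matvec(Wi, xi)),
                     (-1, matvec(Wj, xj)), (-1, matvec(Wk, xk))])
    yi = signed_sum([(1, matvec(Wr, xi)), (1, matvec(Wi, xr)),
                     (1, matvec(Wj, xk)), (-1, matvec(Wk, xj))])
    yj = signed_sum([(1, matvec(Wr, xj)), (-1, matvec(Wi, xk)),
                     (1, matvec(Wj, xr)), (1, matvec(Wk, xi))])
    yk = signed_sum([(1, matvec(Wr, xk)), (1, matvec(Wi, xj)),
                     (-1, matvec(Wj, xi)), (1, matvec(Wk, xr))])
    return yr + yi + yj + yk


def normalize(q, eps=1e-12):
    """Project a 4-vector onto the unit sphere (eps avoids division by zero)."""
    norm = math.sqrt(sum(c * c for c in q))
    return [c / max(norm, eps) for c in q]


def pose_head(features, W_t, b_t, W_q):
    """Hypothetical two-branch head on a shared feature vector.

    W_t (3 x D) and b_t (length 3) form a plain linear translation head.
    W_q = (Wr, Wi, Wj, Wk), each 1 x D/4, yields one quaternion, then normalized.
    """
    t = [a + b for a, b in zip(matvec(W_t, features), b_t)]
    q = normalize(quaternion_linear(features, *W_q))
    return t, q


if __name__ == "__main__":
    feats = [0.5, 0.1, -0.2, 0.3, 0.0, 0.4, 0.2, -0.1]  # D = 8, so n = 2
    W_t = [[1, 0, 0, 0, 0, 0, 0, 0],
           [0, 1, 0, 0, 0, 0, 0, 0],
           [0, 0, 1, 0, 0, 0, 0, 0]]
    W_q = ([[1.0, 0.5]], [[0.2, 0.0]], [[0.0, -0.3]], [[0.1, 0.1]])
    t, q = pose_head(feats, W_t, [0.0, 0.0, 0.0], W_q)
    print(t)                                # [0.5, 0.1, -0.2]
    print(round(sum(c * c for c in q), 9))  # 1.0
```

In a real model these would be learned layers in a deep-learning framework, and the rotation branch would likely stack several quaternion layers before the final 4D output. The point of the sketch is the wiring: both heads read the same features, only the rotation head uses the Hamilton structure, and normalization is the last step.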

This design allows the model to remain simple, efficient, and easy to integrate into existing end-to-end localization pipelines.


Expected Use Cases

This repository is intended for:

  • Research on camera pose regression
  • Studies on geometry-aware deep learning
  • Hybrid CNN–Transformer models for visual localization
  • Experiments on quaternion-aware rotation prediction
  • Extensions of CoHAtNet and related localization architectures

Dataset

The method is designed for standard camera localization benchmarks such as:

  • 7-Scenes
  • Potentially, other RGB or RGB-D localization datasets with pose annotations

Typical dataset requirements

Each sample should provide:

  • an input image (RGB or RGB-D depending on the setup)
  • a ground-truth translation vector
  • a ground-truth quaternion rotation

Organize your dataset to match the loading logic in this repository's training code.
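Some datasets store the pose annotation as a 4×4 homogeneous matrix in a plain-text file, as 7-Scenes does for its per-frame pose files. In that case, the translation and quaternion targets can be derived as in the sketch below. This is an illustration, not the repository's loader. The function names, the (w, x, y, z) order, and the non-negative-w sign convention are assumptions made for this example. Whether the matrix is camera-to-world or world-to-camera must also match the convention the training code expects.

```python
import math


def parse_pose_matrix(text):
    """Parse a 4x4 homogeneous matrix stored as whitespace-separated rows."""
    rows = [[float(v) for v in line.split()]
            for line in text.strip().splitlines() if line.strip()]
    if len(rows) != 4 or any(len(r) != 4 for r in rows):
        raise ValueError("expected a 4x4 matrix")
    return rows


def matrix_to_quaternion(R):
    """Convert a 3x3 rotation matrix to a unit quaternion (w, x, y, z).

    Branches on the trace / largest diagonal entry for numerical stability,
    then picks the sign with w >= 0 (q and -q are the same rotation).
    """
    trace = R[0][0] + R[1][1] + R[2][2]
    if trace > 0.0:
        s = 2.0 * math.sqrt(1.0 + trace)  # s = 4w
        q = (0.25 * s, (R[2][1] - R[1][2]) / s,
             (R[0][2] - R[2][0]) / s, (R[1][0] - R[0][1]) / s)
    elif R[0][0] > R[1][1] and R[0][0] > R[2][2]:
        s = 2.0 * math.sqrt(1.0 + R[0][0] - R[1][1] - R[2][2])  # s = 4x
        q = ((R[2][1] - R[1][2]) / s, 0.25 * s,
             (R[0][1] + R[1][0]) / s, (R[0][2] + R[2][0]) / s)
    elif R[1][1] > R[2][2]:
        s = 2.0 * math.sqrt(1.0 + R[1][1] - R[0][0] - R[2][2])  # s = 4y
        q = ((R[0][2] - R[2][0]) / s, (R[0][1] + R[1][0]) / s,
             0.25 * s, (R[1][2] + R[2][1]) / s)
    else:
        s = 2.0 * math.sqrt(1.0 + R[2][2] - R[0][0] - R[1][1])  # s = 4z
        q = ((R[1][0] - R[0][1]) / s, (R[0][2] + R[2][0]) / s,
             (R[1][2] + R[2][1]) / s, 0.25 * s)
    norm = math.sqrt(sum(c * c for c in q))
    q = tuple(c / norm for c in q)
    return q if q[0] >= 0.0 else tuple(-c for c in q)


def pose_to_targets(text):
    """Split a 4x4 pose into a translation vector and a rotation quaternion."""
    M = parse_pose_matrix(text)
    translation = (M[0][3], M[1][3], M[2][3])
    rotation = [row[:3] for row in M[:3]]
    return translation, matrix_to_quaternion(rotation)


if __name__ == "__main__":
    # 90-degree rotation about z plus a translation (illustrative values only).
    pose = "0 -1 0 1.5\n1 0 0 -2.0\n0 0 1 0.25\n0 0 0 1\n"
    t, q = pose_to_targets(pose)
    print(t)                         # (1.5, -2.0, 0.25)
    print([round(c, 4) for c in q])  # [0.7071, 0.0, 0.0, 0.7071]
```

Fixing the sign of w gives each rotation a single target quaternion. This keeps the ground truth consistent across frames when a loss or metric is not sign-aware.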

