Quaternion-Aware CoHAtNet

Quaternion-Aware CoHAtNet is a hybrid CNN–Transformer framework for end-to-end camera localization that extends CoHAtNet with a quaternion-aware rotation branch. Instead of treating orientation prediction as a standard real-valued regression problem, this project introduces a structured rotation modeling strategy that explicitly reflects quaternion algebra through Hamilton-consistent interactions.

The main goal is to improve camera orientation estimation while preserving the original strengths of CoHAtNet in extracting both local geometric cues and global contextual information.

Overview

Camera localization aims to estimate the 6-DoF pose of a camera, including:

- Translation: the camera position in 3D space
- Rotation: the camera orientation

While many deep learning methods regress rotation using standard fully connected layers, the rotation target is naturally defined on the unit quaternion manifold. This creates a mismatch between the structure of the target and the representation used by conventional regression heads.
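
The constraint behind this mismatch can be written compactly in standard quaternion notation. The symbols below are introduced here for illustration and are not taken from the repository:

```latex
\begin{aligned}
q &= w + x\,\mathbf{i} + y\,\mathbf{j} + z\,\mathbf{k}, \qquad \|q\|^2 = w^2 + x^2 + y^2 + z^2 = 1, \\
\hat{q} &\in \mathbb{R}^4 \;\longmapsto\; \frac{\hat{q}}{\|\hat{q}\|} \in S^3 \quad (\hat{q} \neq 0).
\end{aligned}
```

A plain fully connected head outputs an arbitrary 4-vector, so its prediction only becomes a valid rotation after this projection onto the unit sphere. Note also that q and -q encode the same rotation.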

This repository explores a simple but effective solution: replacing the conventional rotation head with a quaternion-aware projection module that better aligns learned features with the geometry of 3D rotation.

Key Idea

The proposed method keeps the original CoHAtNet backbone unchanged and modifies only the rotation estimation branch.

What changes?

- The backbone still extracts hybrid convolutional and attention-based features.
- The final feature vector is split into four components corresponding to the real and imaginary parts of a quaternion.
- A quaternion-aware transformation stage is applied using Hamilton-consistent interactions.
- The final predicted quaternion is normalized to enforce a valid unit-norm rotation representation.
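
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebook's implementation: the function names (`hamilton`, `split_features`, `rotation_head`), the contiguous-chunk split order (real, i, j, k), and the use of one quaternion weight per feature quaternion are hypothetical choices, and the actual layer shapes in the repository may differ.

```python
import math


def hamilton(p, q):
    """Hamilton product of two quaternions given as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def split_features(features):
    """Split a flat feature vector into four equal chunks (real, i, j, k)
    and regroup them as a list of feature quaternions.
    Assumption: chunks are contiguous and in that order."""
    n, remainder = divmod(len(features), 4)
    if n == 0 or remainder != 0:
        raise ValueError("feature length must be a positive multiple of 4")
    real, i, j, k = (features[c * n:(c + 1) * n] for c in range(4))
    return list(zip(real, i, j, k))


def rotation_head(features, weights, eps=1e-8):
    """Hypothetical quaternion-aware head: multiply each feature quaternion
    by its weight quaternion (Hamilton product), sum the results, and
    normalize the sum to unit length."""
    quats = split_features(features)
    if len(weights) != len(quats):
        raise ValueError("need one weight quaternion per feature quaternion")
    acc = [0.0, 0.0, 0.0, 0.0]
    for w, x in zip(weights, quats):
        acc = [a + p for a, p in zip(acc, hamilton(w, x))]
    norm = math.sqrt(sum(c * c for c in acc))
    return tuple(c / max(norm, eps) for c in acc)


# Example: an 8-dimensional feature vector yields two feature quaternions.
features = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, 2.0, 1.0]
identity_weights = [(1.0, 0.0, 0.0, 0.0)] * 2
q_pred = rotation_head(features, identity_weights)
```

With identity weights the head reduces to summing the feature quaternions and normalizing. Learned weights would instead mix the four components of each feature quaternion according to the Hamilton product rules, which is what ties the components together compared with an unstructured linear layer.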

Why is this useful?

- It injects rotation-aware structure into the learning process.
- It reduces the gap between the learned feature space and the quaternion-based pose representation.
- It aims to improve rotation estimation without redesigning the entire model.

Main Contributions

- A quaternion-aware rotation modeling strategy for camera localization.
- A lightweight extension of CoHAtNet that preserves the original backbone.
- Explicit modeling of quaternion structure through Hamilton-consistent feature interactions.
- A practical framework for studying the effect of quaternion-aware heads in pose regression.

Method Summary

The full architecture follows a two-branch pose regression design:

Shared Backbone
- A hybrid CNN–Transformer feature extractor based on CoHAtNet.
- Combines local feature extraction and global reasoning.
Translation Head
- A standard regression head for estimating 3D translation.
Quaternion-Aware Rotation Head
- Reinterprets the learned feature vector as a quaternion-structured representation.
- Applies structured transformations that preserve relationships between quaternion components.
- Produces a 4D quaternion output.
- Normalizes the output to ensure a valid rotation quaternion.

This design allows the model to remain simple, efficient, and easy to integrate into existing end-to-end localization pipelines.

Expected Use Cases

This repository is intended for:

- Research on camera pose regression
- Studies on geometry-aware deep learning
- Hybrid CNN–Transformer models for visual localization
- Experiments on quaternion-aware rotation prediction
- Extensions of CoHAtNet and related localization architectures

Dataset

The method is designed for standard camera localization benchmarks such as:

- 7-Scenes
- Other RGB or RGB-D localization datasets that provide pose annotations

Typical dataset requirements

Each sample should provide:

- An input image (RGB or RGB-D, depending on the setup)
- A ground-truth translation vector
- A ground-truth quaternion rotation
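
As a rough illustration of these requirements, one sample could be represented as below. The class name `PoseSample`, its field names, and the (w, x, y, z) component order are assumptions made for this sketch; the loader in the notebook may use a different layout.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PoseSample:
    """Hypothetical container for one localization sample."""
    image_path: str                                  # path to the RGB or RGB-D input
    translation: Tuple[float, float, float]          # camera position
    quaternion: Tuple[float, float, float, float]    # assumed order: (w, x, y, z)

    def __post_init__(self):
        if len(self.translation) != 3:
            raise ValueError("translation must have 3 components")
        if len(self.quaternion) != 4:
            raise ValueError("quaternion must have 4 components")
        norm = math.sqrt(sum(c * c for c in self.quaternion))
        if norm == 0.0:
            raise ValueError("quaternion must be non-zero")
        # Store the ground-truth rotation as a unit quaternion.
        # object.__setattr__ is needed because the dataclass is frozen.
        object.__setattr__(
            self, "quaternion", tuple(c / norm for c in self.quaternion)
        )


sample = PoseSample("example.png", (0.1, 0.2, 0.3), (2.0, 0.0, 0.0, 0.0))
```

Normalizing the label at load time keeps the ground truth on the same unit-norm footing as the normalized network output.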

Organize your dataset to match the loading logic in the training code in this repository (Quaternion_CoHAtNet.ipynb).
