This is the OFFICIAL source code of the paper titled "MMF-Gait: A Multi-Model Fusion-Enhanced gait recognition framework integrating convolutional and attention networks"
Gait recognition is an emerging biometric technology that identifies individuals based on their unique walking patterns. Unlike other biometric systems such as fingerprint, iris, or facial recognition, gait recognition has the distinct advantage of being unobtrusive and capable of functioning at a distance. This makes it particularly valuable in surveillance and security applications, where identifying individuals without their active cooperation is often necessary.
The fundamental premise of gait recognition lies in the fact that each person has a distinctive way of walking, influenced by a combination of anatomical and behavioral characteristics. These include factors like leg length, body shape, posture, and the rhythm of movement. By capturing and analyzing these gait patterns, a gait recognition system can accurately distinguish between different individuals.
Traditional approaches to gait recognition have relied heavily on handcrafted features and classical machine learning techniques. However, recent advancements in deep learning have revolutionized the field, enabling the automatic extraction of complex and discriminative features from raw gait data. Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have shown particular promise in this regard, offering robust performance in various gait recognition tasks.
- Project Overview
- Project Requirements
- Installation
- Dataset
- Architecture
- Training
- Testing
- Results
- Use Cases
In this project, we leverage the strengths of five state-of-the-art (SOTA) classification models: VGG16, ResNet50, Vision Transformer (ViT), GoogLeNet (Inception v1), and EfficientNet-B0. These models have demonstrated superior capability in capturing intricate patterns in visual data. To further enhance recognition accuracy, we fine-tuned these models and employed a combination of fusion techniques:
This involves combining the feature representations extracted from each model to create a richer and more comprehensive feature set. In FLF, we extract features with the different models and aggregate them via pointwise (element-wise) addition before classification. [See the main architecture for a better understanding.]
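As a rough sketch of this idea (the class name, feature dimension, and linear classifier head below are illustrative assumptions, not taken from this repository), pointwise addition of per-model features might look like:

```python
import torch
import torch.nn as nn

class FeatureLevelFusion(nn.Module):
    """Illustrative FLF head: element-wise (pointwise) addition of
    same-sized feature vectors from several backbones, followed by
    a shared classifier."""

    def __init__(self, feature_dim=512, num_classes=124):
        super().__init__()
        self.classifier = nn.Linear(feature_dim, num_classes)

    def forward(self, features):
        # features: list of tensors, each of shape (batch, feature_dim)
        fused = torch.stack(features, dim=0).sum(dim=0)  # pointwise addition
        return self.classifier(fused)

# Dummy 512-d feature batches standing in for two backbones' outputs
f1 = torch.randn(4, 512)
f2 = torch.randn(4, 512)
logits = FeatureLevelFusion()([f1, f2])  # shape: (4, 124)
```

In practice the backbones' feature dimensions must match (or be projected to a common size) before addition.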
Here, the outputs of the individual models are combined, typically via majority voting or weighted averaging, to make the final decision. In DLF, each model first classifies the input independently, and a majority-voting mechanism then determines the final class. [See the main architecture for a better understanding.]
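The majority-voting step can be sketched in a few lines (the function name and labels are illustrative; ties here fall to the first label seen):

```python
from collections import Counter

def majority_vote(predictions):
    """Illustrative DLF step: each entry in `predictions` is one model's
    predicted class label for the same sample; the most common label wins
    (ties broken by first occurrence)."""
    return Counter(predictions).most_common(1)[0][0]

# Three of five hypothetical models agree on subject 42
final_class = majority_vote([42, 42, 17, 42, 3])  # -> 42
```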
A sophisticated approach that integrates both feature-level and decision-level fusion, aiming to harness the benefits of both techniques for improved performance. In hybrid fusion, we apply FLF to ResNet50 and Vision Transformer (ViT), and separately apply FLF to GoogLeNet (Inception v1) and EfficientNet-B0. We then combine the outputs of these two FLF branches with VGG16 via DLF. Among the various possible combinations, this one yields the highest classification accuracy. [See the main architecture for a better understanding.]
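A minimal sketch of the final DLF stage of hybrid fusion, assuming each of the three branches (two FLF heads and VGG16) produces class logits for the same batch; the function name and dummy logits are illustrative, not the repository's API:

```python
from collections import Counter

import torch

def hybrid_fusion(flf1_logits, flf2_logits, vgg_logits):
    """Illustrative hybrid fusion: two FLF branches and VGG16 each
    produce class logits; their argmax predictions are combined per
    sample by majority voting (DLF)."""
    votes = [flf1_logits.argmax(dim=1),
             flf2_logits.argmax(dim=1),
             vgg_logits.argmax(dim=1)]
    final = []
    for per_sample in zip(*votes):
        labels = [int(v) for v in per_sample]
        final.append(Counter(labels).most_common(1)[0][0])
    return final

# Dummy logits for a batch of 2 samples over 5 classes
a = torch.tensor([[0.1, 0.9, 0.0, 0.0, 0.0], [0.9, 0.0, 0.0, 0.0, 0.1]])
b = torch.tensor([[0.0, 0.8, 0.2, 0.0, 0.0], [0.0, 0.0, 0.7, 0.3, 0.0]])
c = torch.tensor([[0.2, 0.1, 0.7, 0.0, 0.0], [0.6, 0.4, 0.0, 0.0, 0.0]])
preds = hybrid_fusion(a, b, c)  # -> [1, 0]
```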
- Python 3.8+
- PyTorch 1.10+
- NumPy
- Matplotlib
- Pillow
- OpenCV
- scikit-learn
- torchvision
- Clone the repository:

```shell
git clone https://github.com/kamrulhasanrony/MMH-Gait.git
cd MMH-Gait
```

- Create a virtual environment and activate it:

```shell
conda create -n environment_name
conda activate environment_name
```

- Install the required packages:

```shell
pip install -r requirements.txt
```
In this project, we used the CASIA-B gait dataset, which consists of 124 subjects. Each subject has 10 walking sequences: six normal walking (NM), two carrying a bag (BG), and two wearing a coat (CL). Each walking sequence is captured from 11 viewing angles. We used a gait energy image (GEI) for each sequence. A Gait Energy Image (GEI) is a template representation that captures the motion pattern of a walking person; it is created by averaging the silhouette images over one complete walking cycle. An example of a GEI is illustrated in the following figure. In this project, 70% of the data is used for training and the remaining 30% for testing. Specifically, the NM-01, NM-02, NM-03, NM-04, NM-05, BG-02, and CL-02 sequences are used for training, and NM-06, BG-01, and CL-01 are used for testing. This fixed split guarantees that every variation (NM, BG, CL) appears in both training and testing, unlike random selection. Illustrations of these variations are given in the following figures:
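The silhouette-averaging step behind a GEI can be sketched as follows (the function name and the tiny toy frames are illustrative, not CASIA-B data):

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Compute a Gait Energy Image (GEI) by averaging aligned binary
    silhouette frames over one complete walking cycle.

    silhouettes: array-like of shape (T, H, W) with values in {0, 1}.
    Returns a float array of shape (H, W) with values in [0, 1],
    where brighter pixels mark regions occupied in more frames.
    """
    frames = np.asarray(silhouettes, dtype=np.float32)
    return frames.mean(axis=0)

# Toy example: three 2x2 "silhouette" frames from one cycle
sils = [[[1, 0], [1, 1]],
        [[1, 0], [0, 1]],
        [[1, 0], [1, 1]]]
gei = gait_energy_image(sils)  # gei[0, 0] == 1.0 (always occupied)
```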
The overall architectures of the proposed DLF, FLF, and hybrid models are given below:
To train a model, run the `train.py` script, setting the `--model` argument to a/b/c/d/e to select a specific model among the five:

```shell
python train.py
```

We trained (fine-tuned) each model for 100 epochs with the same configuration for a fair comparison. To evaluate decision-level fusion (DLF), feature-level fusion (FLF), and hybrid fusion, run the following scripts:

```shell
python decision_level_fusion.py
python feature_level_fusion.py
python hybrid_fusion.py
```

The overall results are shown in the following table (bold and italic numbers denote the best and second-best accuracy, respectively):
| Model | Accuracy | Precision (Macro) | Recall (Macro) | F1-Score (Macro) | Precision (Weighted) | Recall (Weighted) | F1-Score (Weighted) |
|---|---|---|---|---|---|---|---|
| VGG16 | 97.70% | 97.84% | 97.70% | 97.65% | 97.84% | 97.70% | 97.65% |
| ResNet50 | 98.16% | 98.24% | 98.16% | 98.13% | 98.24% | 98.16% | 98.13% |
| ViT | 97.60% | 97.69% | 97.59% | 97.57% | 97.69% | 97.60% | 97.57% |
| Inception_v1 | 96.96% | 97.17% | 96.96% | 96.92% | 97.16% | 96.96% | 96.92% |
| Efficientnet_b0 | 97.33% | 97.48% | 97.33% | 97.29% | 97.49% | 97.33% | 97.29% |
| DLF | 98.16% | 98.24% | 98.16% | 98.13% | 98.24% | 98.16% | 98.13% |
| FLF | 99.09% | 99.16% | 99.09% | 99.08% | 99.16% | 99.09% | 99.08% |
| Hybrid | 99.05% | 99.11% | 99.04% | 99.02% | 99.11% | 99.05% | 99.03% |
- Biometric Authentication
- Multimodal Medical Diagnosis
- Surveillance Systems
- Classification
@article{hasan2025mmf,
title={MMF-Gait: A Multi-Model Fusion-Enhanced gait recognition framework integrating convolutional and attention networks},
author={Hasan, Kamrul and Tuhin, Khandokar Alisha and Bapary, Md Rasul Islam and Doula, Md Shafi Ud and Alam, Md Ashraful and Ahad, Md Atiqur Rahman and Uddin, Md Zasim},
journal={Symmetry},
volume={17},
number={7},
pages={1155},
year={2025},
publisher={MDPI}
}







