A comprehensive PyTorch DataLoader for Indian Sign Language (ISL) video datasets stored in HDF5 format.
The dataset is organized as follows:
```
h5_output/
├── train/        (50 classes, 626 samples)
├── test/         (49 classes, 178 samples)
└── validation/   (42 classes, 68 samples)
```
Each split contains one subdirectory per ISL class (e.g., "1. Dog", "2. Death"), and each class directory contains .h5 files with processed video data.
Each .h5 file contains:
- `hand_landmarks`: (num_frames, 2, 21, 3) - 2 hands, 21 landmarks each, 3D coordinates
- `world_landmarks`: (num_frames, 2, 21, 3) - same structure in world coordinates
- `frame_metadata`: (num_frames,) - metadata for each frame
- `file_metadata`: additional file-level metadata
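For reference, a file can be inspected directly with h5py (the file name below is hypothetical; substitute any .h5 file from the dataset):

```python
import h5py

path = "h5_output/train/1. Dog/sample_000.h5"  # hypothetical file name

with h5py.File(path, "r") as f:
    hand = f["hand_landmarks"][:]    # (num_frames, 2, 21, 3)
    world = f["world_landmarks"][:]  # (num_frames, 2, 21, 3)
    print(hand.shape, world.shape, f["frame_metadata"].shape)
```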
Install the dependencies:

```bash
pip install -r requirements.txt
```

```python
from scripts.isl_dataloader import create_data_loaders

# Create dataloaders for all splits
dataloaders = create_data_loaders(
    root_dir='h5_output',
    batch_size=16,
    num_workers=4,
    use_world_landmarks=True,
    use_hand_landmarks=True,
    normalize=True,
    max_frames=150
)

# Use in training loop
for data, labels, lengths, metadata in dataloaders['train']:
    # data: (batch_size, max_seq_len, num_features)
    # labels: (batch_size,) class indices
    # lengths: (batch_size,) actual sequence lengths
    # metadata: list of dicts with additional info
    pass
```
```python
from scripts.isl_dataloader import ISLVideoDataset

# Create custom dataset
dataset = ISLVideoDataset(
    root_dir='h5_output',
    split='train',
    use_world_landmarks=True,
    use_hand_landmarks=False,  # Use only world landmarks
    normalize=True,
    max_frames=100
)

# Get dataset info
print(f"Number of classes: {dataset.get_num_classes()}")
print(f"Class names: {dataset.get_class_names()}")

# Access a single sample
data, label, metadata = dataset[0]
print(f"Data shape: {data.shape}")
print(f"Class: {metadata['class_name']}")
```

- Variable-length sequences: Handles videos with different frame counts
- Flexible feature selection: Choose between hand landmarks, world landmarks, or both
- Automatic padding: Sequences are padded to the batch maximum length (see the sketch after this list)
- Normalization: Optional per-sequence normalization of landmark coordinates
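As an illustration of the padding behavior, a minimal collate function that pads each batch to its longest sequence might look like the following (a sketch; the package's actual collate function may differ):

```python
import torch

def pad_collate(batch):
    """Pad variable-length (seq_len, num_features) tensors to the batch max."""
    datas, labels, metas = zip(*batch)               # samples as (data, label, metadata)
    lengths = torch.tensor([d.shape[0] for d in datas])
    padded = torch.zeros(len(datas), int(lengths.max()), datas[0].shape[1])
    for i, d in enumerate(datas):
        padded[i, : d.shape[0]] = d                  # copy real frames, leave padding at zero
    return padded, torch.tensor(labels), lengths, list(metas)
```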
- Train: 626 samples, 50 classes
- Test: 178 samples, 49 classes
- Validation: 68 samples, 42 classes
- Feature dimensions:
- Hand landmarks only: 126 features (2 hands × 21 landmarks × 3 coords)
- World landmarks only: 126 features
- Both: 252 features (see the sanity check after this list)
- Lazy loading: H5 files are loaded only when needed
- Configurable workers: Multi-process data loading support
- GPU memory optimization: Automatic pin_memory when CUDA available
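The feature dimensions listed above follow from flattening each frame's landmark array; a quick sanity check (assumed layout, with illustrative values):

```python
import numpy as np

hand = np.zeros((76, 2, 21, 3))                          # (num_frames, hands, landmarks, xyz)
hand_flat = hand.reshape(hand.shape[0], -1)              # (76, 126)
world_flat = hand_flat.copy()                            # world landmarks flatten identically
both = np.concatenate([hand_flat, world_flat], axis=1)   # (76, 252)
print(hand_flat.shape[1], both.shape[1])                 # 126 252
```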
Parameters for `create_data_loaders`:

- `root_dir`: Path to the h5_output directory
- `batch_size`: Batch size for the DataLoader (default: 32)
- `num_workers`: Number of worker processes (default: 4)
- `use_world_landmarks`: Include world landmarks (default: True)
- `use_hand_landmarks`: Include hand landmarks (default: True)
- `normalize`: Normalize landmark coordinates (default: True)
- `max_frames`: Maximum frames per sequence (default: None)
- `shuffle_train`: Shuffle training data (default: True)
Parameters for `ISLVideoDataset`:

- `root_dir`: Path to the h5_output directory
- `split`: Dataset split - 'train', 'test', or 'validation'
- `use_world_landmarks`: Include world landmarks (default: True)
- `use_hand_landmarks`: Include hand landmarks (default: True)
- `normalize`: Normalize coordinates (default: True; a possible scheme is sketched below)
- `max_frames`: Maximum frames per sequence (default: None)
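The exact scheme behind `normalize=True` is internal to the loader; one plausible per-sequence normalization looks like this (a sketch, not the library's implementation):

```python
import torch

def normalize_sequence(seq: torch.Tensor) -> torch.Tensor:
    """Zero-mean, unit-variance normalization over a (seq_len, num_features) tensor."""
    mean = seq.mean(dim=0, keepdim=True)
    std = seq.std(dim=0, keepdim=True).clamp_min(1e-6)  # avoid division by zero
    return (seq - mean) / std
```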
See `train_example.py` for a complete training example with:
- Simple LSTM classifier (see the sketch after this list)
- Training and validation loops
- Progress tracking with tqdm
- Model saving and loading
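For orientation, a minimal version of such an LSTM classifier might look like the following (layer sizes here are illustrative, not the script's actual values):

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, num_features=252, hidden_size=128, num_classes=50):
        super().__init__()
        self.lstm = nn.LSTM(num_features, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x, lengths):
        # Pack so the LSTM skips padded timesteps.
        packed = nn.utils.rnn.pack_padded_sequence(
            x, lengths.cpu(), batch_first=True, enforce_sorted=False
        )
        _, (h_n, _) = self.lstm(packed)
        return self.fc(h_n[-1])  # logits from the final hidden state
```

With the dataloader's `(data, labels, lengths, metadata)` batches, this would be called as `logits = model(data, lengths)`.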
Run it with:

```bash
python train_example.py
```

Batches and features:

- Batch: `(batch_size, max_seq_len, num_features)`
- Features:
  - Hand landmarks: 126 (2 hands × 21 landmarks × 3 coordinates)
  - World landmarks: 126 (2 hands × 21 landmarks × 3 coordinates)
  - Combined: 252 features
Labels:

- Type: Integer class indices (0 to num_classes - 1)
- Mapping: Available via `dataset.class_to_idx` and `dataset.idx_to_class`
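For example, continuing from the `dataset` created above, predictions can be mapped back to class names (the logits here are random placeholders for real model output):

```python
import torch

logits = torch.randn(4, dataset.get_num_classes())  # placeholder model output
preds = logits.argmax(dim=1)
print([dataset.idx_to_class[i.item()] for i in preds])
```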
Each sample includes metadata:

```python
{
    'file_path': '/path/to/file.h5',
    'class_name': '1. Dog',
    'num_frames': 76,
    'frame_metadata': array([...]),
    'file_metadata': {...}
}
```

Performance tips:

- Batch Size: Start with smaller batches (8-16) due to variable sequence lengths
- Workers: Use 2-4 workers to avoid I/O bottlenecks
- Max Frames: Set a reasonable `max_frames` (100-200) to control memory usage
- Normalization: Enable normalization for better training stability
- GPU: Use `pin_memory=True` for faster GPU transfers (enabled automatically when CUDA is available; see the sketch below)
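As a sketch of the GPU transfer pattern (assuming the `dataloaders` dict from the quick start), `non_blocking=True` pairs with pinned memory for asynchronous host-to-device copies:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

for data, labels, lengths, metadata in dataloaders['train']:
    data = data.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    # ... forward/backward pass ...
    break  # single batch shown for brevity
```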
The dataloader includes error handling for:
- Missing directories
- Empty class directories
- Corrupted H5 files (see the sketch below)
- Invalid parameter combinations
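A sketch of the kind of per-file guard this implies; the dataloader's internal handling may log and skip rather than return `None` (the function name here is hypothetical):

```python
import h5py

def safe_load_landmarks(path):
    """Return hand landmarks, or None if the file is unreadable."""
    try:
        with h5py.File(path, "r") as f:
            return f["hand_landmarks"][:]
    except (OSError, KeyError) as exc:  # corrupt file or missing dataset key
        print(f"Skipping unreadable file {path}: {exc}")
        return None
```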
This dataloader is provided as-is for research and educational purposes.