A robust recommendation system combining text, images, and user behavior features with deep learning and rule-based filtering.
- Features
- Architecture
- Project Structure
- Getting Started
- Usage
- Configuration
- Model Details
- Evaluation
- Future Enhancements
- Contributing
- License
## Features

- Multi-Modal Fusion: Combines text (BERT), images (ResNet), and user features
- Rule-Based Filtering: Customizable business logic adjustments
- Scalable Training: Distributed training with PyTorch DDP
- Production API: Flask/Gunicorn REST API with Docker support
- Robust Pipeline: Comprehensive logging & error handling
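The rule-based filtering layer sits after the model's raw prediction. A minimal sketch of what such an adjustment might look like, assuming a 1-5 rating scale and using the new-user penalty described under Model Details (the `is_new_user` flag and the clamping bounds are illustrative, not the repo's actual API):

```python
def apply_rules(predicted_rating: float, is_new_user: bool) -> float:
    """Apply business-rule adjustments to a raw model prediction.

    Illustrative sketch: a -10% adjustment for new users, then clamping
    to an assumed 1-5 rating scale.
    """
    if is_new_user:
        predicted_rating *= 0.9  # -10% rating for new users
    return max(1.0, min(5.0, predicted_rating))
```

New rules (e.g., genre blocklists or recency boosts) would slot into this function without touching the model itself, which is the main appeal of keeping business logic outside the network.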
## Architecture

```mermaid
graph TD
    subgraph Client["Client"]
        A["User App"] -->|HTTP Req| B["POST /recommend"]
    end
    subgraph API["API Service"]
        B --> C{"Validate & Preprocess"}
        C --> D["Text (BERT)"]
        C --> E["Image (ResNet)"]
        C --> F["User Features"]
        D --> H["MM Recommender"]
        E --> H
        F --> H
        H --> I["Rule Filter"]
        I --> K["Rating Prediction"]
    end
    subgraph Training["Training"]
        L["Data"] --> M["Loader"]
        M --> N["Preprocess"]
        N --> O["DataLoader"]
        O --> P["Model Training"]
        P --> Q["Checkpoint"]
    end
```
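The training branch of the diagram (data, loader, preprocessing, batched DataLoader, training, checkpoint) can be sketched roughly as below. The model, toy data, and checkpoint filename are illustrative stand-ins, not the repo's actual modules:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the fused multi-modal model (see Model Details).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Toy tensors standing in for preprocessed features and ratings.
features = torch.randn(64, 16)
ratings = torch.rand(64, 1) * 4 + 1  # ratings in [1, 5]
loader = DataLoader(TensorDataset(features, ratings), batch_size=32, shuffle=True)

best_loss = float("inf")
for epoch in range(3):
    model.train()
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    # Checkpoint on improvement (proper validation loop omitted for brevity).
    if loss.item() < best_loss:
        best_loss = loss.item()
        torch.save(model.state_dict(), "checkpoint.pt")
```

In the actual pipeline the same loop runs under PyTorch DDP for multi-GPU training, with the DataLoader wrapped in a `DistributedSampler`.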
## Project Structure

```
multi_modal_recommendation/
├── data/              # Data processing modules
├── models/            # Model architectures
├── training/          # Training scripts
├── api/               # Flask API implementation
├── main.py            # Main entry point
├── requirements.txt   # Dependencies
└── Dockerfile         # Container configuration
```
## Getting Started

Prerequisites:

- Python 3.9
- CUDA-enabled GPU (recommended)
- Docker (optional)
```bash
git clone https://github.com/your-username/multi-modal-recommendation.git
cd multi-modal-recommendation

python3 -m venv venv
source venv/bin/activate   # Linux/macOS
venv\Scripts\activate      # Windows
```
```bash
pip install -r requirements.txt
```

## Usage

```bash
# Single-GPU training
python main.py --epochs 10 --batch_size 32 --data_path data/movies.csv

# Distributed training (2 GPUs)
python -m torch.distributed.launch --nproc_per_node=2 main.py --distributed
```

```bash
# Local deployment
gunicorn --bind 0.0.0.0:5000 api.recommendation_api:app

# Docker deployment
docker build -t multi-modal-recommender .
docker run -p 5000:5000 multi-modal-recommender
```

Multipart request:

```bash
curl -X POST -F "text=Space exploration movie" \
     -F "user_features=0.5,1.2,0.8,2.1,1.5" \
     -F "image=@poster.jpg" \
     http://localhost:5000/recommend
```

JSON request:

```bash
curl -X POST -H "Content-Type: application/json" \
     -d '{"text": "Romantic comedy", "user_features": "0.5,1.8,2.2", "image_path": "data/movie.jpg"}' \
     http://localhost:5000/recommend
```

Example response:

```json
{
  "predicted_rating": 4.2
}
```

## Model Details

- Text Encoder: BERT-base (768d → 128d FC)
- Image Encoder: ResNet-18 (512d → 128d FC)
- User Encoder: FC Network (10d → 128d)
- Fusion: Concatenation + FC Network
- Rules: -10% rating for new users
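The fusion head with the dimensions listed above can be sketched as follows. This assumes the BERT and ResNet encoders have already produced pooled 768-d and 512-d features upstream; the hidden sizes of the fusion MLP and the activation choices are assumptions, not the repo's exact architecture:

```python
import torch
from torch import nn

class MultiModalRecommender(nn.Module):
    """Sketch of the fusion head: per-modality projections to 128d,
    concatenation, then an FC network producing a rating."""

    def __init__(self, user_dim: int = 10):
        super().__init__()
        self.text_proj = nn.Linear(768, 128)   # BERT-base pooled output -> 128d
        self.image_proj = nn.Linear(512, 128)  # ResNet-18 pooled features -> 128d
        self.user_proj = nn.Linear(user_dim, 128)
        self.fusion = nn.Sequential(           # concatenation + FC network
            nn.Linear(128 * 3, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, text_feat, image_feat, user_feat):
        fused = torch.cat([
            torch.relu(self.text_proj(text_feat)),
            torch.relu(self.image_proj(image_feat)),
            torch.relu(self.user_proj(user_feat)),
        ], dim=-1)
        return self.fusion(fused).squeeze(-1)  # predicted rating per sample

model = MultiModalRecommender()
rating = model(torch.randn(1, 768), torch.randn(1, 512), torch.randn(1, 10))
```

Late fusion by concatenation keeps each encoder independently replaceable, at the cost of ignoring cross-modal interactions until the final FC layers.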
## Evaluation

Metric: Root Mean Squared Error (RMSE)

```
Epoch 5/10 | Train Loss: 0.32 | Val Loss: 0.41
Best model saved with RMSE: 0.38
```
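RMSE is the square root of the mean squared error between predicted and actual ratings; a minimal reference implementation:

```python
import math

def rmse(predictions, targets):
    """Root Mean Squared Error over paired rating lists."""
    assert predictions and len(predictions) == len(targets)
    se = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(se / len(predictions))

print(rmse([3.0, 4.0, 5.0], [3.0, 5.0, 5.0]))  # one error of 1 over 3 samples: sqrt(1/3) ~ 0.577
```

Lower is better; an RMSE of 0.38 means predictions are off by roughly a third of a rating point on average.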
## Future Enhancements

- Real-time learning pipeline
- Advanced feature engineering
- Explainable AI components
- Multi-modal attention mechanisms
## License

Distributed under the MIT License. See LICENSE for more information.
