A comprehensive Schema-Aware Natural Language to SQL (NL2SQL) system that converts natural language questions into accurate SQL queries across dynamic database schemas. Features both a web interface and production-ready REST API with deployment support for any cloud platform.
- 🧠 Schema-Aware Intelligence: Dynamic schema extraction and understanding
- 🔄 Multi-Database Support: SQLite, PostgreSQL, MySQL with dialect transpilation
- 🌐 Production API: Complete REST API with authentication and monitoring
- 🖥️ Web Interface: Intuitive Streamlit UI for interactive querying
- 🚀 Cloud Ready: Docker, Kubernetes, and multi-cloud deployment support
- 🔒 Security First: API authentication, SQL injection prevention, query validation
- 📊 Analytics: Query history, confidence scoring, and usage statistics
- 🧪 Fully Tested: Comprehensive test suite with CI/CD ready structure
Schema-Aware-NL2SQL/
├── 📄 README.md # Main documentation
├── 📄 requirements.txt # Python dependencies
├── 📄 setup.py # Package setup
├── 📄 config.py # Configuration management
├── 📄 .env.example # Environment template
├── 📄 .gitignore # Git ignore rules
│
├── 🔧 api.py # FastAPI REST API server
├── 🖥️ app.py # Streamlit web interface
├── 🎯 demo.py # Comprehensive demo script
│
├── 📂 src/ # Core source code
│ ├── __init__.py
│ ├── nl2sql_agent.py # Main orchestrator
│ ├── nl2sql_model.py # T5 model wrapper
│ └── schema_retriever.py # Database schema extraction
│
├── 📂 docs/ # Documentation
│ ├── README.md # Detailed documentation
│ ├── API_DOCUMENTATION.md # API reference
│ ├── SETUP_COMPLETE.md # Setup guide
│ ├── ENVIRONMENT_SETUP.md # Environment guide
│ └── GITHUB_SETUP.md # GitHub integration
│
├── 📂 examples/ # Example scripts
│ ├── quickstart.py # Quick start demo
│ └── client_example.py # API client example
│
├── 📂 tests/ # Test suite
│ ├── __init__.py
│ ├── test_api.py # API endpoint tests
│ └── test_nl2sql_agent.py # Core functionality tests
│
├── 📂 scripts/ # Utility scripts
│ ├── deploy.sh # Deployment automation
│ ├── run_tests.sh # Test runner
│ └── setup_new_environment.py # Environment setup
│
├── 📂 deployment/ # Deployment configurations
│ ├── docker/
│ │ ├── Dockerfile # Container definition
│ │ └── docker-compose.yml # Multi-service orchestration
│ ├── kubernetes/
│ │ └── deployment.yaml # K8s deployment config
│ └── cloud/
│ └── aws-ecs-task.json # AWS ECS task definition
│
├── 📂 data/ # Database files
│ └── quickstart_sample.db # Sample SQLite database
│
├── 📂 models/ # Model cache (auto-created)
├── 📂 logs/ # Application logs (auto-created)
└── 📂 nl2sql_env/ # Virtual environment
# Clone repository
git clone https://github.com/Srijan-Ratrey/Schema-Aware-Natural-Language-to-SQL-Agent.git
cd Schema-Aware-Natural-Language-to-SQL-Agent
# Quick setup with deployment script
chmod +x scripts/deploy.sh
./scripts/deploy.sh devstreamlit run app.pypython api.pyAccess API documentation at: http://localhost:8000/docs
import requests
# API configuration
API_BASE = "http://localhost:8000"
API_KEY = "your-api-key-here"
headers = {"Authorization": f"Bearer {API_KEY}"}
# Connect to database
requests.post(f"{API_BASE}/connect",
json={"db_type": "sqlite", "db_path": "data/quickstart_sample.db"},
headers=headers
)
# Query database
response = requests.post(f"{API_BASE}/query",
json={"query": "Show all books with rating above 4.5"},
headers=headers
)
print(response.json())./scripts/deploy.sh dev./scripts/deploy.sh docker./scripts/deploy.sh composekubectl apply -f deployment/kubernetes/deployment.yaml- AWS ECS: Use
deployment/cloud/aws-ecs-task.json - Google Cloud Run: Build with Docker and deploy
- Azure Container Instances: Deploy with Docker image
- Heroku: Deploy with git push
# Run comprehensive test suite
./scripts/run_tests.sh
# Run specific tests
python -m pytest tests/test_api.py -v
python -m pytest tests/test_nl2sql_agent.py -v- ✅ Dynamic schema extraction and understanding
- ✅ Fine-tuned T5 models (Spider dataset trained)
- ✅ Multi-database support (SQLite, PostgreSQL, MySQL)
- ✅ Real-time SQL generation and execution
- ✅ Confidence scoring and query validation
- ✅ Query history and analytics
- ✅ Interactive Streamlit UI
- ✅ Schema visualization
- ✅ Query result visualization
- ✅ Batch query processing
- ✅ Export capabilities
- ✅ RESTful API with OpenAPI documentation
- ✅ Bearer token authentication
- ✅ Rate limiting and security
- ✅ Batch query processing
- ✅ Health monitoring
- ✅ Comprehensive error handling
- ✅ Docker containerization
- ✅ Kubernetes deployment
- ✅ Multi-cloud support
- ✅ Logging and monitoring
- ✅ Auto-scaling ready
- ✅ Security best practices
- 🔐 API key authentication
- 🛡️ SQL injection prevention
- ✅ Query validation and sanitization
- 🔒 Read-only query enforcement
- 📊 Rate limiting and monitoring
- 🔍 Comprehensive logging
- ⚡ Optimized T5 model inference
- 🚀 Async API endpoints
- 💾 Schema caching
- 📊 Query result caching
- 🔄 Connection pooling
- 📈 Horizontal scaling support
- Fork the repository
- Create feature branch:
git checkout -b feature/amazing-feature - Run tests:
./scripts/run_tests.sh - Commit changes:
git commit -m 'Add amazing feature' - Push to branch:
git push origin feature/amazing-feature - Open Pull Request
- 📘 Spider Dataset - Training data
- 🤖 Hugging Face Models - Pre-trained models
- ⚙️ SQLGlot - SQL transpilation
- 📚 NL2SQL Papers - Research
This project is licensed under the MIT License - see the LICENSE file for details.
- Spider Dataset Team for high-quality NL2SQL benchmarks
- Hugging Face for transformer models and infrastructure
- FastAPI & Streamlit teams for excellent frameworks
- SQLAlchemy & SQLGlot for robust SQL handling
🌟 Star this repo if you find it useful!
"Making databases conversational, one query at a time." 🚀