Merged
Changes from 6 commits
2 changes: 1 addition & 1 deletion .github/workflows/deploy-lambda-api.yml
@@ -9,8 +9,8 @@ on:
paths:
- 'backend/api/**'
- 'backend/requirements.txt'
- 'backend/requirements-dev.txt'
- '!backend/api/**/*.md'
- '!backend/**/*.txt'
- '.github/workflows/deploy-lambda-api.yml'
workflow_dispatch:
inputs:
2 changes: 1 addition & 1 deletion .github/workflows/deploy-training-container.yml
@@ -8,8 +8,8 @@ on:
branches: [dev, main]
paths:
- 'backend/training/**'
- 'backend/requirements-dev.txt'
- '!backend/training/**/*.md'
- '!backend/training/**/*.txt'
- '.github/workflows/deploy-training-container.yml'
workflow_dispatch:
inputs:
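Both `paths` lists rely on GitHub Actions evaluating filters in order: a later `!` pattern excludes a path that an earlier pattern matched, and a later positive pattern can include it again. The Python sketch below models that rule so the filters can be checked against sample paths. It is a simplification, not GitHub's matcher: it only handles `*` (no `/`) and `**` (any characters, with `**/` also matching zero directories, as GitHub's filter cheat sheet shows for `**/migrate-*.sql`). The names `glob_to_regex`, `triggers`, `LAMBDA_PATHS`, and `TRAINING_PATHS` are hypothetical. Because the extracted diff lost its +/- markers, each list copies every filter line shown in its hunk, so one removed and one added line are mixed together.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a GitHub-style path glob into a regex (simplified sketch)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")   # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")         # any characters, including '/'
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")      # any characters except '/'
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def triggers(path: str, patterns: list[str]) -> bool:
    """Evaluate patterns in order; the last pattern that matches decides."""
    included = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        body = pattern[1:] if negated else pattern
        if glob_to_regex(body).fullmatch(path):
            included = not negated
    return included


# Filter lines as extracted from the two workflow hunks (assumption: order preserved).
LAMBDA_PATHS = [
    "backend/api/**",
    "backend/requirements.txt",
    "backend/requirements-dev.txt",
    "!backend/api/**/*.md",
    "!backend/**/*.txt",
    ".github/workflows/deploy-lambda-api.yml",
]
TRAINING_PATHS = [
    "backend/training/**",
    "backend/requirements-dev.txt",
    "!backend/training/**/*.md",
    "!backend/training/**/*.txt",
    ".github/workflows/deploy-training-container.yml",
]

if __name__ == "__main__":
    for p in ["backend/api/main.py", "backend/requirements.txt", "backend/requirements-dev.txt"]:
        print(p, triggers(p, LAMBDA_PATHS), triggers(p, TRAINING_PATHS))
```

Under these assumptions the trailing `!backend/**/*.txt` also excludes `backend/requirements.txt` from the Lambda list, while the training list keeps `backend/requirements-dev.txt` because its `.txt` exclusion is scoped to `backend/training/`. Whether GitHub matches `**/` exactly this way is worth confirming before relying on the ordering.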
2 changes: 1 addition & 1 deletion .gitignore
@@ -18,7 +18,7 @@ build/
coverage/
htmlcov/
test-results/
*-coverage.xml
*coverage*.xml
*-tests.xml

# AWS SAM
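The `.gitignore` change widens what is ignored. For a pattern without a slash, Git matches against the file name at any depth, so Python's `fnmatch.fnmatchcase` is a reasonable stand-in for checking basenames. This is a sketch: it ignores gitignore features such as `**`, negation, and directory-only patterns. The candidate file names are hypothetical examples, although `coverage.xml` is coverage.py's default XML report name.

```python
from fnmatch import fnmatchcase

OLD_PATTERN = "*-coverage.xml"
NEW_PATTERN = "*coverage*.xml"

# Hypothetical report names a test run might produce.
CANDIDATES = ["api-coverage.xml", "coverage.xml", "coverage-training.xml", "api-tests.xml"]


def ignored(filename: str, pattern: str) -> bool:
    """Approximate gitignore matching for a slash-free pattern on a basename."""
    return fnmatchcase(filename, pattern)


if __name__ == "__main__":
    for name in CANDIDATES:
        print(f"{name:24} old={ignored(name, OLD_PATTERN)!s:5} new={ignored(name, NEW_PATTERN)}")
```

Only names containing `-coverage.xml` matched the old pattern; the new one also catches a bare `coverage.xml` and names with text after `coverage`.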
13 changes: 11 additions & 2 deletions CHANGELOG.md
@@ -65,6 +65,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Missing values warning with affected columns list
- Selected column details with unique ratio visualization

### Dependencies
- **Dependency Audit & Version Updates** - Production-stable versions with flexible ranges
- FastAPI upgraded from 0.109.0 to >=0.115.0 (fixes ReDoc CDN issue with `redoc@next`)
- scikit-learn pinned to <1.6.0 (skl2onnx compatibility, avoids breaking API changes)
- LightGBM 4.6.0 with improved memory efficiency and faster training
- Pydantic 2.x with better validation performance and error messages
- ONNX Runtime 1.19+ for training, 1.23+ for API (latest optimizations)
- All 263 tests passing with updated dependencies

### Fixed
- **Problem Type Detection** - Regression datasets were incorrectly classified as classification
- Fixed heuristic logic: now requires BOTH integer-like values AND low cardinality
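The fixed heuristic can be sketched as follows. This is a minimal, hypothetical version of `detect_problem_type`, not the project's implementation: the `max_classes=20` threshold, treating any string value as a class label, and the plain-list input are all assumptions. It returns classification only when the target is integer-like and also has few distinct values, so an integer-valued but high-cardinality target (for example, prices rounded to whole units) stays regression.

```python
import math
from collections.abc import Iterable


def detect_problem_type(values: Iterable[object], max_classes: int = 20) -> str:
    """Hypothetical sketch: 'classification' only if integer-like AND low-cardinality."""
    # Ignore missing values (None or float NaN).
    observed = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    # Text labels are treated as classes outright (assumption).
    if any(isinstance(v, str) for v in observed):
        return "classification"
    integer_like = all(float(v).is_integer() for v in observed)
    low_cardinality = len(set(observed)) <= max_classes
    if integer_like and low_cardinality:
        return "classification"
    return "regression"


if __name__ == "__main__":
    print(detect_problem_type([0, 1, 1, 0]))
    print(detect_problem_type(list(range(100))))
```

With both conditions required, `list(range(100))` is integer-like but has 100 distinct values, so the sketch returns `"regression"` for it.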
@@ -86,8 +95,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Testing
- **Comprehensive Test Suite** - Unit and integration tests for backend (v1.1.0)
- 197 total tests (104 API + 93 Training)
- API coverage: 69%, Training coverage: 85%+
- 263 total tests (104 API + 159 Training)
- API coverage: 69%, Training coverage: 53%+
- Tests run automatically in CI/CD before deployment
- Coverage reports published to GitHub Actions

2 changes: 1 addition & 1 deletion backend/Dockerfile.api
@@ -1,4 +1,4 @@
# Backend API Dockerfile for local development
# Backend API Dockerfile
FROM python:3.11-slim

WORKDIR /app
36 changes: 24 additions & 12 deletions backend/README.md
@@ -82,12 +82,22 @@ backend/
│ ├── services/ # AWS service integrations
│ └── utils/ # Helper functions
├── training/ # Training container code
│ ├── train.py # Main training script
│ ├── preprocessor.py # Data preprocessing
│ ├── model_trainer.py # FLAML AutoML training
│ ├── eda.py # EDA report generation
│ ├── __init__.py # Package root
│ ├── main.py # Entry point (AWS Batch)
│ ├── Dockerfile # Training container image
│ └── requirements.txt # Training dependencies
│ ├── requirements.txt # Training dependencies
│ ├── core/ # Core ML components
│ │ ├── __init__.py
│ │ ├── preprocessor.py # Data preprocessing
│ │ ├── trainer.py # FLAML AutoML training
│ │ └── exporter.py # ONNX model export
│ ├── reports/ # Report generation
│ │ ├── __init__.py
│ │ ├── eda.py # EDA report generation
│ │ └── training.py # Training results report
│ └── utils/ # Shared utilities
│ ├── __init__.py
│ └── detection.py # Problem type detection
├── Dockerfile.api # API container image
└── requirements.txt # API dependencies
```
@@ -265,12 +275,14 @@ backend/tests/
│ ├── test_dynamo_service.py # DynamoDB service tests
│ ├── test_s3_service.py # S3 service tests
│ └── test_services_integration.py # moto-based integration tests (21 tests)
└── training/ # Training tests (93 tests, 85%+ coverage)
└── training/ # Training tests (159 tests, 53% coverage)
├── conftest.py # Shared fixtures
├── unit/ # Pure unit tests
│ ├── test_preprocessor.py
│ ├── test_utils.py
│ └── test_model_trainer.py
│ ├── test_column_detection.py
│ ├── test_detect_problem_type.py
│ ├── test_eda.py
│ └── test_training_report.py
└── integration/ # Training integration tests
```

@@ -355,11 +367,11 @@ The API automatically detects whether a target column is for **Classification**

### Shared Utility Module

The training module uses a centralized `utils.py` for shared functions:
The training module uses a centralized `utils/detection.py` for shared functions:

```python
# backend/training/utils.py
from .utils import (
# backend/training/utils/detection.py
from training.utils.detection import (
detect_problem_type, # Classification vs Regression
is_id_column, # Detect identifier columns
is_constant_column, # Detect constant features
@@ -368,6 +380,6 @@ from .utils import (
)
```

This follows the DRY principle - logic is defined once and reused across `preprocessor.py` and `eda.py`.
This follows the DRY principle - logic is defined once and reused across `core/preprocessor.py` and `reports/eda.py`.

This detection is performed both in the API (for UI display) and in the training container (for model training).
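The README lists `is_id_column` and `is_constant_column` without showing their bodies. The following is a hypothetical sketch of heuristics such helpers commonly use, not the code in `training/utils/detection.py`: the name regex, the restriction of uniqueness checks to int or str values, and the `min_rows=10` threshold are all assumptions.

```python
import re
from collections.abc import Sequence

# Hypothetical name pattern: "id", "customer_id", "order_uuid", "row_key", ...
_ID_NAME = re.compile(r"(?:^|_)(?:id|uuid|key)$", re.IGNORECASE)


def is_constant_column(values: Sequence[object]) -> bool:
    """True when the column has at most one distinct non-missing value."""
    return len({v for v in values if v is not None}) <= 1


def is_id_column(name: str, values: Sequence[object], min_rows: int = 10) -> bool:
    """Flag identifier-like columns by name, or by all-unique int/str values."""
    if _ID_NAME.search(name):
        return True
    observed = [v for v in values if v is not None]
    if len(observed) < min_rows:
        return False  # too few rows to judge uniqueness
    if not all(isinstance(v, (int, str)) for v in observed):
        return False  # continuous floats are often unique without being IDs
    return len(set(observed)) == len(observed)


if __name__ == "__main__":
    print(is_id_column("customer_id", [1, 1]), is_constant_column([5, 5, None]))
```

Checking the name first lets short samples still be flagged, while the value-based fallback only fires on enough rows to make "every value is unique" meaningful.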
4 changes: 2 additions & 2 deletions backend/api/main.py
@@ -12,7 +12,7 @@
app = FastAPI(
title="AWS AutoML Lite API",
description="Lightweight AutoML platform on AWS",
version="1.0.0",
version="1.1.0",
docs_url="/docs",
redoc_url="/redoc"
)
@@ -39,7 +39,7 @@ async def root() -> Dict[str, str]:
"""Health check endpoint"""
return {
"message": "AWS AutoML Lite API",
"version": "1.0.0",
"version": "1.1.0",
"status": "healthy"
}
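Both `main.py` hunks bump the same literal from `1.0.0` to `1.1.0`. One way to keep the OpenAPI metadata and the health check from drifting is a single module-level constant. The sketch assumes a hypothetical `API_VERSION` name and leaves FastAPI out so it runs standalone; the comment shows where the constant would be passed to `FastAPI(version=...)`.

```python
import asyncio

# Hypothetical single source of truth for the API version.
API_VERSION = "1.1.0"

# With FastAPI installed, the application would reuse the constant:
#     app = FastAPI(title="AWS AutoML Lite API", version=API_VERSION,
#                   docs_url="/docs", redoc_url="/redoc")


async def root() -> dict[str, str]:
    """Health check endpoint"""
    return {
        "message": "AWS AutoML Lite API",
        "version": API_VERSION,
        "status": "healthy",
    }


if __name__ == "__main__":
    print(asyncio.run(root()))
```

A future release then needs a one-line change instead of two matching edits.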

16 changes: 8 additions & 8 deletions backend/requirements-dev.txt
@@ -5,17 +5,17 @@
-r requirements.txt

# Testing framework
pytest==8.3.4
pytest-cov==6.0.0
pytest-xdist==3.5.0
pytest-asyncio==0.24.0
pytest>=8.3.0,<9.0.0 # Test framework
pytest-cov>=6.0.0,<7.0.0 # Coverage plugin
pytest-xdist>=3.5.0,<4.0.0 # Parallel execution
pytest-asyncio>=0.24.0,<1.0.0 # Async test support

# FastAPI testing
httpx==0.27.2
httpx>=0.27.0,<1.0.0 # HTTP client for testing

# AWS mocking
moto[s3,dynamodb,batch]==5.0.26
moto[s3,dynamodb,batch]>=5.0.0,<6.0.0 # AWS service mocks

# Code quality
ruff==0.8.4
mypy==1.14.0
ruff>=0.8.0,<1.0.0 # Linter and formatter
mypy>=1.14.0,<2.0.0 # Type checking
33 changes: 22 additions & 11 deletions backend/requirements.txt
@@ -1,11 +1,22 @@
fastapi==0.109.0
uvicorn==0.27.0
mangum==0.17.0
boto3==1.34.0
pydantic==2.5.0
pydantic-settings==2.1.0
python-multipart==0.0.6
pandas==2.1.4
numpy==1.26.3
python-jose==3.3.0
onnxruntime==1.16.3
# FastAPI Stack
fastapi>=0.115.0,<1.0.0 # ReDoc fix in 0.115+
uvicorn>=0.27.0,<0.41.0 # ASGI server
mangum>=0.17.0,<0.20.0 # AWS Lambda adapter
python-multipart>=0.0.6,<1.0.0 # Form/file upload support

# AWS SDK
boto3>=1.34.0,<2.0.0 # AWS services

# Data Validation
pydantic>=2.5.0,<3.0.0 # Schema validation
pydantic-settings>=2.1.0,<3.0.0 # Settings management

# Data Processing
pandas>=2.1.4,<3.0.0 # DataFrame operations
numpy>=1.26.0,<2.0.0 # Numerical operations

# Auth
python-jose>=3.3.0,<4.0.0 # JWT handling

# Model Inference
onnxruntime>=1.16.0,<2.0.0 # ONNX model inference
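Replacing exact pins with `>=lower,<upper` ranges means any release in that half-open interval is accepted. pip evaluates these with PEP 440 rules (the `packaging.specifiers.SpecifierSet` class implements them). The sketch below is a deliberately simplified checker for plain numeric releases only, with hypothetical names, to show how a range such as `fastapi>=0.115.0,<1.0.0` admits or rejects a version. It ignores pre-releases, local versions, wildcards, and `~=`.

```python
import operator
from itertools import zip_longest

# Two-character operators come first so ">=" is not read as ">".
_OPERATORS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">": operator.gt, "<": operator.lt,
}


def _release(version: str) -> tuple[int, ...]:
    """Parse a plain numeric release such as '0.115.0' into (0, 115, 0)."""
    return tuple(int(part) for part in version.strip().split("."))


def satisfies(version: str, spec: str) -> bool:
    """Check a version against comma-separated clauses, e.g. '>=0.115.0,<1.0.0'."""
    current = _release(version)
    for clause in spec.split(","):
        clause = clause.strip()
        symbol = next((op for op in _OPERATORS if clause.startswith(op)), None)
        if symbol is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        bound = _release(clause[len(symbol):])
        # Pad with zeros so '1.6' and '1.6.0' compare as equal.
        pairs = list(zip_longest(current, bound, fillvalue=0))
        left = tuple(a for a, _ in pairs)
        right = tuple(b for _, b in pairs)
        if not _OPERATORS[symbol](left, right):
            return False
    return True


if __name__ == "__main__":
    print(satisfies("0.115.4", ">=0.115.0,<1.0.0"))
```

The exclusive upper bound is what keeps a major bump (such as NumPy 2.0.0 under `<2.0.0`) out, and the same logic applies to the `scikit-learn<1.6.0` pin noted in the changelog.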
7 changes: 3 additions & 4 deletions backend/tests/training/conftest.py
@@ -10,12 +10,11 @@
import numpy as np
from pathlib import Path

# Add backend/training to path for imports
# Path: tests/training/conftest.py -> backend/training
# Add backend to path for package imports
# The training module is now a proper package: backend/training/
# Tests import using: from training.core.preprocessor import AutoPreprocessor
backend_path = Path(__file__).parent.parent.parent
training_path = backend_path / "training"
sys.path.insert(0, str(backend_path))
sys.path.insert(0, str(training_path))


# =============================================================================
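The new `conftest.py` puts `backend/` rather than `backend/training/` on `sys.path`, so tests import `training.utils.detection` by its package path. The sketch reproduces that mechanism in a temporary directory: it builds a throwaway `backend/training/utils/detection.py` tree (with a hypothetical stub body plus the `__init__.py` files a regular package uses), inserts `backend/` at the front of `sys.path`, and imports the dotted module. The helper names `build_fake_backend`, `import_from_backend`, and `demo` are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def build_fake_backend(root: Path) -> Path:
    """Create backend/training/utils/detection.py with a stub function (hypothetical)."""
    backend = root / "backend"
    utils_dir = backend / "training" / "utils"
    utils_dir.mkdir(parents=True)
    (backend / "training" / "__init__.py").write_text("")
    (utils_dir / "__init__.py").write_text("")
    (utils_dir / "detection.py").write_text(
        "def detect_problem_type(values):\n"
        "    return 'classification'\n"
    )
    return backend


def import_from_backend(backend: Path, dotted_name: str) -> ModuleType:
    """Mirror conftest.py: put backend/ first on sys.path, then import by package path."""
    sys.path.insert(0, str(backend))
    try:
        importlib.invalidate_caches()  # files were created after interpreter start
        return importlib.import_module(dotted_name)
    finally:
        sys.path.remove(str(backend))


def demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        backend = build_fake_backend(Path(tmp))
        module = import_from_backend(backend, "training.utils.detection")
        result = module.detect_problem_type([0, 1])
    # Drop the stub modules so they do not shadow a real `training` package later.
    for name in ("training.utils.detection", "training.utils", "training"):
        sys.modules.pop(name, None)
    return result


if __name__ == "__main__":
    print(demo())
```

Because only `backend/` is on the path, a bare `from utils import ...` would no longer resolve, which is why the test modules switch to the fully qualified `training.utils.detection` import.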
14 changes: 6 additions & 8 deletions backend/tests/training/unit/test_column_detection.py
@@ -7,15 +7,13 @@
import pytest
import pandas as pd
import numpy as np
import sys
from pathlib import Path

# Add training module to path
# Path: tests/training/unit/test_file.py -> backend/training
training_path = Path(__file__).parent.parent.parent.parent / "training"
sys.path.insert(0, str(training_path))

from utils import is_id_column, is_constant_column, is_high_cardinality_categorical
# Import from new package structure (path setup in conftest.py)
from training.utils.detection import (
is_id_column,
is_constant_column,
is_high_cardinality_categorical
)


class TestIsIdColumnByName:
10 changes: 2 additions & 8 deletions backend/tests/training/unit/test_detect_problem_type.py
@@ -7,15 +7,9 @@
import pytest
import pandas as pd
import numpy as np
import sys
from pathlib import Path

# Add training module to path
# Path: tests/training/unit/test_file.py -> backend/training
training_path = Path(__file__).parent.parent.parent.parent / "training"
sys.path.insert(0, str(training_path))

from utils import detect_problem_type
# Import from new package structure (path setup in conftest.py)
from training.utils.detection import detect_problem_type


class TestDetectProblemTypeClassification: