Quick Start Guide

Get up and running with the Data Contract Execution Engine.

Prerequisites

  • Python 3.7+
  • pip
  • AWS Account (for Lambda deployment)

1. Installation

# Clone the repository
git clone https://github.com/<your-username>/data-contract-execution-engine.git
cd data-contract-execution-engine

# Install dependencies
pip install -r requirements.txt

2. Define Your Contract

Create or edit contracts/my_contract.yaml:

name: "Customer Data"
version: "1.0"

source_s3_path: "s3://my-bucket/input/customers.csv"
target_s3_path: "s3://my-bucket/output/customers_validated.csv"

schema:
  columns:
    customer_id:
      type: "integer"
      nullable: false
    name:
      type: "string"
      nullable: false
    email:
      type: "string"
    age:
      type: "integer"

sla:
  min_rows: 1
  max_rows: 1000000
  completeness_threshold: 0.95
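
Before running the engine, it can help to confirm the file is well-formed YAML. A minimal sanity check, assuming PyYAML is installed (a typical dependency for YAML-based contracts); the required-key list below is inferred from the sample contract above:

python -c "
import yaml

# Illustrative check only: parse the contract and look for the expected top-level keys
contract = yaml.safe_load(open('contracts/my_contract.yaml'))
missing = [k for k in ('name', 'version', 'schema', 'sla') if k not in contract]
print('Contract looks well-formed' if not missing else 'Missing keys: ' + ', '.join(missing))
"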

3. Test Locally

Run the test suite to verify everything works:

# Run all tests
python -m pytest tests/ -v

# Run with coverage report
python -m pytest tests/ --cov=engine --cov=runtime

Expected output:

tests/test_contract_parser.py::test_load_contract PASSED
tests/test_validation_engine.py::test_validate_schema PASSED
tests/test_sla_enforcer.py::test_enforce_sla PASSED
tests/test_pipeline_generator.py::test_generate_pipeline PASSED
tests/test_lambda_handler.py::test_handler PASSED

======================== 18+ tests passed ========================

Or test manually with sample data:

python -c "
import pandas as pd
from engine.contract_parser import load_contract
from engine.pipeline_generator import PipelineGenerator

# Load contract and test
contract = load_contract('contracts/sample_contract.yaml')
df = pd.read_csv('examples/customers_expanded.csv')
pipeline = PipelineGenerator(contract)
results = pipeline.generate(df)
print('Validation Success!' if results.get('success') else 'Validation Failed')
print(results)
"

4. Test with Local Files

The engine supports both local file paths and S3 paths. This is useful for testing before deploying to Lambda.
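
Under the hood, telling the two apart usually comes down to checking for the s3:// prefix. A rough sketch of the idea (illustrative only; the engine's own helpers may differ):

# Illustrative only -- not the engine's actual implementation
import io
import boto3
import pandas as pd

def read_dataframe(path: str) -> pd.DataFrame:
    if path.startswith("s3://"):
        bucket, key = path[len("s3://"):].split("/", 1)
        body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
        return pd.read_csv(io.BytesIO(body))
    return pd.read_csv(path)  # plain local path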

Step 1: Create output directory

mkdir output

Step 2: Test with local paths

python -c "
from runtime.lambda_handler import handler

# Test with local files
event = {
    'contract_path': 'contracts/sample_contract.yaml',
    'source_path': 'examples/customers_expanded.csv',
    'target_path': 'output/customers_validated.csv'
}

result = handler(event, None)
print(result)
"

Step 3: Verify output

# Check the validated data
cat output/customers_validated.csv

You can mix local and S3 paths:

# Read from local, write to S3
{
    "contract_path": "contracts/sample_contract.yaml",
    "source_path": "examples/customers_expanded.csv",
    "target_path": "s3://my-bucket/output/validated.csv"
}

# Read from S3, write to local
{
    "contract_path": "s3://my-bucket/contracts/contract.yaml",
    "source_path": "s3://my-bucket/data/customers.csv",
    "target_path": "output/customers_validated.csv"
}

5. Deploy to Lambda

For detailed Lambda deployment instructions, see LAMBDA_DEPLOYMENT.md.

Quick deployment:

# Create deployment package
mkdir lambda_build
cp -r engine/ lambda_build/
cp -r runtime/ lambda_build/
cp -r contracts/ lambda_build/
cp requirements.txt lambda_build/

# Create ZIP
cd lambda_build
zip -r ../lambda_function.zip .
cd ..

# Deploy (requires AWS CLI configured)
aws lambda update-function-code \
  --function-name data-contract-executor \
  --zip-file fileb://lambda_function.zip
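
Note that copying requirements.txt into lambda_build/ does not bundle the libraries themselves. Unless the function pulls its dependencies from a Lambda layer, you will likely need to install them into the package before the ZIP step, for example:

# Install dependencies into the deployment package (run before zipping)
pip install -r requirements.txt -t lambda_build/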

6. Test Lambda

Create a test event:

{
  "contract_path": "contracts/my_contract.yaml"
}

Or override paths (supports both S3 and local):

{
  "contract_path": "contracts/my_contract.yaml",
  "source_path": "examples/customers_expanded.csv",
  "target_path": "output/customers_validated.csv"
}
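
One way to exercise the deployed function from the command line is the AWS CLI; examples/lambda_event.json in this repository is a natural payload to start with. (With AWS CLI v2, the --cli-binary-format flag keeps the JSON payload from being treated as base64.)

# Invoke the deployed function with a test event
aws lambda invoke \
  --function-name data-contract-executor \
  --cli-binary-format raw-in-base64-out \
  --payload file://examples/lambda_event.json \
  response.json

cat response.json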

The Lambda handler will (see the sketch after this list):

  1. Load the contract (S3 or local)
  2. Read data from source path (S3 or local)
  3. Validate against schema & SLA rules
  4. Write to target path (S3 or local)
  5. Return validation results
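
A stripped-down sketch of that flow, for orientation only (the real implementation is runtime/lambda_handler.py and also handles S3 reads, writes, and error reporting):

# Illustrative skeleton; not the actual handler
import pandas as pd
from engine.contract_parser import load_contract
from engine.pipeline_generator import PipelineGenerator

def handler(event, context):
    contract = load_contract(event["contract_path"])       # 1. load the contract
    df = pd.read_csv(event["source_path"])                  # 2. read source data (local path shown)
    results = PipelineGenerator(contract).generate(df)      # 3. validate against schema & SLA
    # 4. writing the validated frame to target_path is omitted in this sketch
    return {                                                 # 5. return validation results
        "statusCode": 200 if results.get("success") else 500,
        "body": {"pipeline_results": results},
    }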

Response Example

Success:

{
  "statusCode": 200,
  "body": {
    "message": "Data contract execution completed successfully",
    "contract": "Customer Data",
    "input_rows": 10,
    "output_rows": 5,
    "pipeline_results": {
      "input_rows": 5,
      "success": true,
      "steps": [...]
    }
  }
}

Error:

{
  "statusCode": 500,
  "body": {
    "message": "Data contract execution failed: error details"
  }
}

Running Tests

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=engine --cov=runtime

# Run specific test
pytest tests/test_validation_engine.py -v

Common Issues

Issue              | Solution
Missing columns    | Add columns to CSV or update contract schema
SLA violation      | Check min_rows, max_rows, completeness_threshold
S3 access denied   | Verify Lambda IAM role has S3 permissions
FileNotFoundError  | Ensure contract_path is correct

Project Structure

data-contract-execution-engine/
├── engine/                  # Core modules
│   ├── contract_parser.py
│   ├── validation_engine.py
│   ├── sla_enforcer.py
│   └── pipeline_generator.py
├── runtime/
│   └── lambda_handler.py
├── contracts/
│   └── sample_contract.yaml
├── examples/
│   ├── sample_data.csv
│   └── lambda_event.json
├── tests/                   # Test suite
├── requirements.txt
├── README.md
└── QUICKSTART.md (this file)

Next Steps

Security Best Practices

  1. Use IAM Roles - Never store AWS credentials in code
  2. S3 Encryption - Enable encryption for sensitive data
  3. Contract Validation - Validate contracts before deployment
  4. Logging - Review CloudWatch logs for security issues
  5. Least Privilege - Grant minimum required permissions (see the example policy below)
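
For item 5, an illustrative least-privilege policy for the Lambda role might look like the following (bucket name and prefixes are placeholders; adjust them to your own layout):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": [
        "arn:aws:s3:::my-bucket/contracts/*",
        "arn:aws:s3:::my-bucket/input/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::my-bucket/output/*"
    }
  ]
}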

Performance Tips

For Lambda with data under 100MB:

  • Set Lambda memory to at least 3GB
  • Use CSV format
  • Keep transformations simple
  • Set timeout to 15 minutes

For larger datasets over 100MB, consider AWS Glue with Parquet format and auto-scaling workers.


Questions? Check the documentation or open an issue on GitHub.