Testing the Python Conversion Script

This document explains how to test the improved Python conversion script against the real production RDF files generated by the Jupyter notebook.

GitHub Action Testing

Automatic Testing

Runs weekly: Every Saturday at 09:00 UTC (1 hour after production RDF generation)
Compares directly: Against the actual production files just generated

Manual Testing

Go to the Actions tab in GitHub
Select "Test Python Conversion Script"
Click "Run workflow"
Choose whether to compare with current production files:
- true: Compare with the latest production RDF files in data/
- false: Only run Python script and validate output (for development)

What the Action Does

The test workflow:

Backs up production files: Copies current data/*.ttl files for comparison
Runs Python script: Generates RDF files in data-test/ directory
Compares with production: Direct comparison with real production files
Validates TTL syntax: Ensures all generated files are valid Turtle format
Creates report: Generates markdown comparison report
Uploads artifacts: Test files, production backup, and comparison report

Interpreting Results

✅ Success indicators:

All TTL files validate successfully
File sizes match production files (±5% is normal)
Files are identical OR only differ in timestamps
Python script completes without errors

❌ Issues to investigate:

TTL validation failures (syntax errors)
Large file size differences (>10% from production)
Missing output files
Content differences beyond timestamps
Python script errors in logs
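
The size thresholds above can be checked with a short script. This is a minimal sketch, not part of the workflow itself: the function name `size_report` and the status labels are hypothetical, and treating the 5% to 10% band as "check" is an assumption, since the guidance above only names the two ends.

```python
from pathlib import Path


def size_report(test_dir, prod_dir, normal=0.05, large=0.10):
    """Classify each production .ttl file by how far the test copy's size deviates."""
    report = {}
    for prod_file in sorted(Path(prod_dir).glob("*.ttl")):
        test_file = Path(test_dir) / prod_file.name
        if not test_file.exists():
            report[prod_file.name] = "missing"
            continue
        prod_size = prod_file.stat().st_size
        test_size = test_file.stat().st_size
        if prod_size:
            # Relative deviation from the production file size.
            deviation = abs(test_size - prod_size) / prod_size
        else:
            # Empty production file: any non-empty test file counts as a full deviation.
            deviation = float(test_size > 0)
        if deviation <= normal:
            report[prod_file.name] = "ok"
        elif deviation <= large:
            report[prod_file.name] = "check"  # assumption: band between 5% and 10%
        else:
            report[prod_file.name] = "investigate"
    return report


if __name__ == "__main__":
    for name, status in size_report("data-test", "data").items():
        print(f"{name}: {status}")
```

Run from the repository root, this prints one status per production file, including files that the Python script failed to produce.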

⚠️ Expected differences:

Creation timestamps (pav:createdOn, dcterms:modified, pav:importedOn)
Minor whitespace variations
Triple ordering (RDF allows different valid orderings)
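
To separate expected differences from real ones, the files can be compared after normalization. The sketch below is a text-level heuristic under stated assumptions, not the comparison the workflow performs: it assumes the timestamp predicates appear as prefixed names on their own lines, and the names `normalized_lines` and `unexpected_differences` are hypothetical. If reordering changes a line's trailing `;` or `.`, it reports a false positive.

```python
from collections import Counter

# Predicates whose values are expected to differ between runs.
TIMESTAMP_PREDICATES = ("pav:createdOn", "dcterms:modified", "pav:importedOn")


def normalized_lines(text):
    """Return a multiset of lines with whitespace collapsed and timestamp lines dropped."""
    lines = Counter()
    for raw in text.splitlines():
        line = " ".join(raw.split())  # collapse runs of whitespace
        if not line or any(pred in line for pred in TIMESTAMP_PREDICATES):
            continue
        lines[line] += 1
    return lines


def unexpected_differences(test_text, prod_text):
    """Return (lines only in test, lines only in production), ignoring line order."""
    test_lines = normalized_lines(test_text)
    prod_lines = normalized_lines(prod_text)
    only_test = sorted((test_lines - prod_lines).elements())
    only_prod = sorted((prod_lines - test_lines).elements())
    return only_test, only_prod
```

Two empty lists mean the files differ only in timestamps, whitespace, or line order. Anything else is worth a closer look with a graph-level comparison.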

Local Testing

Prerequisites

pip install -r requirements.txt

Run Python Script Only

# Test version (outputs to data-test/)
python run_conversion.py --output-dir data-test/

# Compare file sizes against the production files in data/
ls -lh data-test/*.ttl data/*.ttl

Run Both and Compare

# Run Python version
python run_conversion.py --output-dir data-test/

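# Run Jupyter version (move the real data/ aside first; otherwise ln creates the link inside it)
mkdir -p data-jupyter
mv data data-production
ln -s data-jupyter data
jupyter execute AOP-Wiki_XML_to_RDF_conversion.ipynb
rm data
mv data-production data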

# Compare outputs
diff data-test/AOPWikiRDF.ttl data-jupyter/AOPWikiRDF.ttl

Expected Differences

Some differences between the output of the Python script and the output of the Jupyter notebook are expected:

Timestamps: Creation dates will differ
Whitespace: Minor formatting differences
Order: Some triples might be in different order (still valid RDF)

Validation Tools

TTL Syntax Check

pip install rdflib
python -c "from rdflib import Graph; g=Graph(); g.parse('data-test/AOPWikiRDF.ttl', format='turtle'); print(f'Valid TTL with {len(g)} triples')"

Triple Count Comparison

# Rough check only: counts lines that start in column 1 (statement starts and @prefix lines), not individual triples
grep -c '^\S' data-test/AOPWikiRDF.ttl
grep -c '^\S' data-jupyter/AOPWikiRDF.ttl
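
For an exact count, the files can be parsed with rdflib, which also confirms that each file is valid Turtle. This is a sketch under assumptions: `count_triples` and `compare_counts` are hypothetical helper names, and the directory names are the ones used in the commands above.

```python
from pathlib import Path


def count_triples(path):
    """Parse a Turtle file and return its triple count; raises if the syntax is invalid."""
    from rdflib import Graph  # third-party: pip install rdflib

    graph = Graph()
    graph.parse(str(path), format="turtle")
    return len(graph)


def compare_counts(test_dir="data-test", other_dir="data-jupyter"):
    """Map each .ttl file name in test_dir to (test count, other count or None)."""
    counts = {}
    for test_file in sorted(Path(test_dir).glob("*.ttl")):
        other_file = Path(other_dir) / test_file.name
        other = count_triples(other_file) if other_file.exists() else None
        counts[test_file.name] = (count_triples(test_file), other)
    return counts


if __name__ == "__main__":
    for name, (test_count, other_count) in compare_counts().items():
        marker = "" if test_count == other_count else "  <-- differs"
        print(f"{name}: {test_count} vs {other_count}{marker}")
```

Parsing large files this way takes noticeably longer than `grep`, so the line count remains useful as a quick first look.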

Troubleshooting

Common Issues

Network failures: Check internet connectivity for XML/data downloads
File permissions: Ensure write access to test directories
Memory issues: Large XML files may require more RAM
Dependency conflicts: Use a fresh virtual environment if needed

Log Analysis

Check the log files for detailed error information:

GitHub Actions: Download artifacts and check log files
Local: Check aop_conversion.log file
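
A quick way to pull the relevant lines out of a long log is sketched below. The log format is not documented here, so the markers are an assumption: they match the standard Python `logging` level names and the first line of a traceback. The function name `error_lines` is hypothetical.

```python
from pathlib import Path


def error_lines(log_path="aop_conversion.log", markers=("ERROR", "CRITICAL", "Traceback")):
    """Return (line number, text) for every log line containing one of the markers."""
    hits = []
    # errors="replace" keeps the scan going if the log contains undecodable bytes.
    with Path(log_path).open(encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if any(marker in line for marker in markers):
                hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    for number, text in error_lines():
        print(f"{number}: {text}")
```

The same function works on a log file downloaded from the GitHub Actions artifacts by passing its path as `log_path`.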
