This document explains how to test the improved Python conversion script against the real production RDF files generated by the Jupyter notebook.
- Runs weekly: Every Saturday at 09:00 UTC (1 hour after production RDF generation)
- Compares directly: Against the actual production files just generated
- Go to the Actions tab in GitHub
- Select "Test Python Conversion Script"
- Click "Run workflow"
- Choose whether to compare with current production files:
  - `true`: Compare with the latest production RDF files in `data/`
  - `false`: Only run Python script and validate output (for development)
The test workflow:
- Backs up production files: Copies current `data/*.ttl` files for comparison
- Runs Python script: Generates RDF files in `data-test/` directory
- Compares with production: Direct comparison with real production files
- Validates TTL syntax: Ensures all generated files are valid Turtle format
- Creates report: Generates markdown comparison report
- Uploads artifacts: Test files, production backup, and comparison report
✅ Success indicators:
- All TTL files validate successfully
- File sizes match production files (±5% is normal)
- Files are identical OR only differ in timestamps
- Python script completes without errors
❌ Issues to investigate:
- TTL validation failures (syntax errors)
- Large file size differences (>10% from production)
- Missing output files
- Content differences beyond timestamps
- Python script errors in logs
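The size thresholds above can be checked mechanically. The following is a minimal sketch, not part of the workflow itself: the function names `size_status` and `check_sizes` and the "review" label for the 5% to 10% band (which the lists above do not classify) are assumptions made for illustration.

```python
from pathlib import Path


def size_status(test_size: int, prod_size: int) -> str:
    """Classify a size difference using the thresholds listed above."""
    if prod_size == 0:
        return "ok" if test_size == 0 else "investigate"
    change = abs(test_size - prod_size) / prod_size
    if change <= 0.05:
        return "ok"           # within the normal ±5% band
    if change > 0.10:
        return "investigate"  # large difference (>10%)
    return "review"           # 5-10%: not classified above, treated here as a soft warning


def check_sizes(test_dir: str, prod_dir: str) -> dict:
    """Return a status per production .ttl file; absent test files are 'missing'."""
    results = {}
    for prod_file in sorted(Path(prod_dir).glob("*.ttl")):
        test_file = Path(test_dir) / prod_file.name
        if not test_file.exists():
            results[prod_file.name] = "missing"
        else:
            results[prod_file.name] = size_status(
                test_file.stat().st_size, prod_file.stat().st_size
            )
    return results
```

For a local run, `check_sizes("data-test", "data")` would return one status per production file, so anything other than "ok" is easy to spot.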
- Creation timestamps (`pav:createdOn`, `dcterms:modified`, `pav:importedOn`)
- Minor whitespace variations
- Triple ordering (RDF allows different valid orderings)
pip install -r requirements.txt
# Test version (outputs to data-test/)
python run_conversion.py --output-dir data-test/
# Compare file sizes
ls -lh data-test/*.ttl
# Run Python version
python run_conversion.py --output-dir data-test/
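# Run Jupyter version (assumes the notebook writes to data/, so data/ is temporarily pointed at data-jupyter/)
mkdir -p data-jupyter
# Move a real data/ directory aside first; otherwise ln creates the link inside it (data-production is an arbitrary temporary name)
[ -d data ] && [ ! -L data ] && mv data data-production
ln -sfn data-jupyter data
jupyter execute AOP-Wiki_XML_to_RDF_conversion.ipynb
rm data
# Restore the original data/ directory if it was moved
[ -d data-production ] && mv data-production data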
# Compare outputs
diff data-test/AOPWikiRDF.ttl data-jupyter/AOPWikiRDF.ttl

Some differences between the Python script and the Jupyter notebook are expected:
- Timestamps: Creation dates will differ
- Whitespace: Minor formatting differences
- Order: Some triples might be in different order (still valid RDF)
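Because a plain `diff` reports all three kinds of expected difference, it can help to normalize both files first. The sketch below is a line-level heuristic, not the project's comparison logic. It assumes each timestamp predicate sits on its own line, and the function names are hypothetical. It tolerates reordered subject blocks and reordered continuation lines, but not a change in which predicate shares a line with its subject. For a rigorous check, parsing both files and comparing graphs with rdflib's `rdflib.compare` module is the more reliable route.

```python
import re
from collections import Counter

# Predicates this document lists as expected to differ between runs.
TIMESTAMP_PREDICATES = ("pav:createdOn", "dcterms:modified", "pav:importedOn")


def normalized_lines(turtle_text: str) -> list:
    """Collapse whitespace, drop blank and timestamp lines, strip statement
    terminators, and sort, so formatting and ordering no longer matter."""
    lines = []
    for raw in turtle_text.splitlines():
        line = re.sub(r"\s+", " ", raw).strip().rstrip(" ;.,")
        if not line or any(p in line for p in TIMESTAMP_PREDICATES):
            continue
        lines.append(line)
    return sorted(lines)


def unexpected_differences(text_a: str, text_b: str) -> tuple:
    """Return (only_in_a, only_in_b) as sorted lists of normalized lines."""
    a = Counter(normalized_lines(text_a))
    b = Counter(normalized_lines(text_b))
    return sorted((a - b).elements()), sorted((b - a).elements())
```

Two empty lists mean the files differ only in ways this document treats as expected. Any remaining lines are candidates for closer inspection.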
pip install rdflib
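python -c "from rdflib import Graph; g=Graph(); g.parse('data-test/AOPWikiRDF.ttl', format='turtle'); print(f'Valid TTL with {len(g)} triples')"
# Rough size check: count non-indented lines (prefix declarations and subject lines) in each file.
# This approximates the number of subjects, not the exact triple count; the rdflib command above reports the exact count.
grep -c '^\S' data-test/AOPWikiRDF.ttl
grep -c '^\S' data-jupyter/AOPWikiRDF.ttl

- Network failures: Check internet connectivity for XML/data downloads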
- File permissions: Ensure write access to test directories
- Memory issues: Large XML files may require more RAM
- Dependency conflicts: Use fresh virtual environment if needed
Check the log files for detailed error information:
- GitHub Actions: Download artifacts and check log files
- Local: Check `aop_conversion.log` file
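To pull the likely problem lines out of a long log, something like the following may help. It is a sketch under one assumption: that the log uses standard Python `logging` level names such as `ERROR` and that tracebacks are written to it. The actual format of `aop_conversion.log` is not specified here, and `problem_lines` is a hypothetical helper name.

```python
from pathlib import Path

# Assumed markers: standard logging level names plus the start of a traceback.
MARKERS = ("ERROR", "CRITICAL", "Traceback")


def problem_lines(log_text: str, markers=MARKERS) -> list:
    """Return (line_number, line) pairs for lines containing any marker."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if any(marker in line for marker in markers)
    ]


if __name__ == "__main__":
    log_path = Path("aop_conversion.log")
    if log_path.exists():
        text = log_path.read_text(encoding="utf-8", errors="replace")
        for number, line in problem_lines(text):
            print(f"{number}: {line}")
```

The same function works on a log file extracted from the downloaded GitHub Actions artifacts.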