Testing the Python Conversion Script

This document explains how to test the improved Python conversion script against the real production RDF files generated by the Jupyter notebook.

GitHub Action Testing

Automatic Testing

  • Runs weekly: Every Saturday at 09:00 UTC (1 hour after production RDF generation)
  • Compares directly: Against the actual production files just generated

Manual Testing

  1. Go to the Actions tab in GitHub
  2. Select "Test Python Conversion Script"
  3. Click "Run workflow"
  4. Choose whether to compare with current production files:
    • true: Compare with the latest production RDF files in data/
    • false: Only run Python script and validate output (for development)

What the Action Does

The test workflow:

  1. Backs up production files: Copies current data/*.ttl files for comparison
  2. Runs Python script: Generates RDF files in data-test/ directory
  3. Compares with production: Direct comparison with real production files
  4. Validates TTL syntax: Ensures all generated files are valid Turtle format
  5. Creates report: Generates markdown comparison report
  6. Uploads artifacts: Test files, production backup, and comparison report

Interpreting Results

✅ Success indicators:

  • All TTL files validate successfully
  • File sizes are close to the production files (a difference of up to ±5% is normal)
  • Files are identical OR only differ in timestamps
  • Python script completes without errors

❌ Issues to investigate:

  • TTL validation failures (syntax errors)
  • Large file size differences (>10% from production)
  • Missing output files
  • Content differences beyond timestamps
  • Python script errors in logs

⚠️ Expected differences:

  • Creation timestamps (pav:createdOn, dcterms:modified, pav:importedOn)
  • Minor whitespace variations
  • Triple ordering (RDF allows different valid orderings)
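The size thresholds above can be checked mechanically. The following is a minimal sketch, not part of the workflow itself: the function names and the "borderline" label for the 5% to 10% range are illustrative assumptions, while the directory names data/ and data-test/ come from this document.

```python
from pathlib import Path

# Thresholds from the guidance above: up to 5% is normal, more than 10% needs a look.
NORMAL_PCT = 5.0
INVESTIGATE_PCT = 10.0


def classify_size(prod_bytes, test_bytes):
    """Return (percent_difference, verdict) for one pair of file sizes."""
    if prod_bytes == 0:
        return (0.0, "ok") if test_bytes == 0 else (float("inf"), "investigate")
    pct = abs(test_bytes - prod_bytes) / prod_bytes * 100
    if pct <= NORMAL_PCT:
        return pct, "ok"
    if pct > INVESTIGATE_PCT:
        return pct, "investigate"
    # The document does not say how to treat 5-10%; "borderline" is an assumption.
    return pct, "borderline"


def compare_dirs(prod_dir="data", test_dir="data-test"):
    """Compare every production .ttl file with its counterpart in the test directory."""
    results = {}
    for prod_file in sorted(Path(prod_dir).glob("*.ttl")):
        test_file = Path(test_dir) / prod_file.name
        if not test_file.exists():
            # Missing output files are listed above as an issue to investigate.
            results[prod_file.name] = (None, "missing")
            continue
        results[prod_file.name] = classify_size(
            prod_file.stat().st_size, test_file.stat().st_size
        )
    return results


# Example usage from the repository root:
# for name, (pct, verdict) in compare_dirs().items():
#     print(name, "n/a" if pct is None else f"{pct:.1f}%", verdict)
```

A size check only catches gross problems, so it complements the content comparison rather than replacing it.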

Local Testing

Prerequisites

pip install -r requirements.txt

Run Python Script Only

# Test version (outputs to data-test/)
python run_conversion.py --output-dir data-test/

# Compare file sizes against the production files in data/
ls -lh data/*.ttl data-test/*.ttl

Run Both and Compare

# Run Python version
python run_conversion.py --output-dir data-test/

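# Run Jupyter version (the notebook writes to data/, so point data at a scratch directory)
mv data data-production   # ln cannot replace a real directory, so move it aside (temporary name)
mkdir -p data-jupyter
ln -s data-jupyter data
jupyter execute AOP-Wiki_XML_to_RDF_conversion.ipynb
rm data                   # removes only the symlink
mv data-production data   # restore the production files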

# Compare outputs
diff data-test/AOPWikiRDF.ttl data-jupyter/AOPWikiRDF.ttl

Expected Differences

Some differences between the output of the Python script and that of the Jupyter notebook are expected:

  1. Timestamps: Creation dates will differ
  2. Whitespace: Minor formatting differences
  3. Order: Some triples might be in a different order (still valid RDF)
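Because a plain diff reports all three kinds of expected difference, a graph-level comparison can be easier to read. The sketch below is one possible approach, not the method used by the workflow: it drops the timestamp predicates named under Interpreting Results and compares what remains as sets, which ignores both whitespace and triple order. The helper names are hypothetical, and the namespace IRIs are the standard PAV and Dublin Core Terms ones, assumed to match the prefixes bound in the generated files.

```python
# Predicates whose values are expected to change between runs.
# Assumption: pav: and dcterms: are bound to these standard namespace IRIs.
VOLATILE_PREDICATES = {
    "http://purl.org/pav/createdOn",
    "http://purl.org/pav/importedOn",
    "http://purl.org/dc/terms/modified",
}


def stable_triples(triples):
    """Return the set of triples whose predicate is not a known timestamp."""
    return {(s, p, o) for s, p, o in triples if str(p) not in VOLATILE_PREDICATES}


def unexpected_differences(python_triples, jupyter_triples):
    """Return (only_in_python, only_in_jupyter) after dropping timestamp triples."""
    left = stable_triples(python_triples)
    right = stable_triples(jupyter_triples)
    return left - right, right - left


def load_triples(path):
    """Parse a Turtle file with rdflib and return its triples as a set."""
    # Imported here so the helpers above work without rdflib installed.
    from rdflib import Graph

    graph = Graph()
    graph.parse(path, format="turtle")
    return set(graph)


# Example usage:
# only_py, only_nb = unexpected_differences(
#     load_triples("data-test/AOPWikiRDF.ttl"),
#     load_triples("data-jupyter/AOPWikiRDF.ttl"),
# )
# print(len(only_py), "triples only in the Python output")
# print(len(only_nb), "triples only in the notebook output")
```

Blank nodes receive fresh identifiers on every parse, so any triples that involve them would show up as differences here. If the files contain blank nodes, the isomorphism helpers in rdflib.compare are a better fit.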

Validation Tools

TTL Syntax Check

pip install rdflib
python -c "from rdflib import Graph; g=Graph(); g.parse('data-test/AOPWikiRDF.ttl', format='turtle'); print(f'Valid TTL with {len(g)} triples')"

Triple Count Comparison

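# Count triples in each file (parsed with rdflib; counting lines with grep would not give a triple count)
python -c "from rdflib import Graph; g=Graph(); g.parse('data-test/AOPWikiRDF.ttl', format='turtle'); print(len(g))"
python -c "from rdflib import Graph; g=Graph(); g.parse('data-jupyter/AOPWikiRDF.ttl', format='turtle'); print(len(g))"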

Troubleshooting

Common Issues

  • Network failures: Check internet connectivity for XML/data downloads
  • File permissions: Ensure write access to test directories
  • Memory issues: Large XML files may require more RAM
  • Dependency conflicts: Use a fresh virtual environment if needed

Log Analysis

Check the log files for detailed error information:

  • GitHub Actions: Download the artifacts and check the log files
  • Local: Check the aop_conversion.log file
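For long logs, a small filter can surface the relevant lines. This is a minimal sketch that assumes the log lines contain the standard Python logging level names (ERROR, CRITICAL); the actual format of aop_conversion.log is not described here, so adjust the markers if it differs. The function name is hypothetical.

```python
from pathlib import Path

# Assumption: each problem line contains a standard logging level name.
PROBLEM_LEVELS = ("ERROR", "CRITICAL")


def find_problem_lines(log_path="aop_conversion.log", levels=PROBLEM_LEVELS):
    """Return (line_number, text) pairs for log lines mentioning one of the levels."""
    hits = []
    with Path(log_path).open(encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if any(level in line for level in levels):
                hits.append((number, line.rstrip("\n")))
    return hits


# Example usage:
# for number, text in find_problem_lines():
#     print(f"{number}: {text}")
```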