[Feature] Run two models, have the AI compare the results, and report any differences

### Summary

Suggested automated workflow:

- Run Sonnet to produce the primary translation
- Run GPT on the same source text in parallel
- Use a QA-report approach: an automated comparison catches the divergences, and the divergences are exactly where the interesting translation decisions lie
- A human reviewer then adjudicates the divergences
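
The comparison and reporting steps above could look roughly like the following. This is a minimal sketch that makes several assumptions. Both translations are assumed to be split into aligned segments already (for example, one per sentence or paragraph). Calling the models is out of scope. A plain character-level similarity check from Python's standard `difflib` stands in for the AI comparison described in the title. The names `find_divergences`, `format_report`, and `Divergence`, and the 0.9 threshold, are all hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Divergence:
    """One segment where the two model outputs disagree (hypothetical type)."""
    index: int
    primary: str       # e.g. the Sonnet translation of this segment
    secondary: str     # e.g. the GPT translation of this segment
    similarity: float  # 0.0 (completely different) to 1.0 (identical)


def find_divergences(primary, secondary, threshold=0.9):
    """Compare two segment-aligned translations and return every segment whose
    similarity ratio is below `threshold` (0.9 is an arbitrary, hypothetical default)."""
    if len(primary) != len(secondary):
        raise ValueError("translations must have the same number of segments")
    divergences = []
    for i, (a, b) in enumerate(zip(primary, secondary)):
        # Collapse whitespace so trivial spacing differences are not flagged.
        a_norm, b_norm = " ".join(a.split()), " ".join(b.split())
        ratio = SequenceMatcher(None, a_norm, b_norm).ratio()
        if ratio < threshold:
            divergences.append(Divergence(i, a, b, ratio))
    return divergences


def format_report(divergences):
    """Render the divergences as a plain-text QA report for a human reviewer."""
    if not divergences:
        return "No divergences found."
    lines = [f"{len(divergences)} divergent segment(s):"]
    for d in divergences:
        lines.append(f"- Segment {d.index} (similarity {d.similarity:.2f}):")
        lines.append(f"    primary:   {d.primary}")
        lines.append(f"    secondary: {d.secondary}")
    return "\n".join(lines)


if __name__ == "__main__":
    sonnet_out = ["Hello world.", "The cat sat on the mat."]
    gpt_out = ["Hello  world.", "A dog lay on the rug."]
    print(format_report(find_divergences(sonnet_out, gpt_out)))
```

A string-similarity check only catches differences in wording, not in meaning. Two faithful translations can also be phrased differently and still get flagged. In the workflow as proposed, this step would more likely ask a model to judge whether each pair of segments differs in substance. The report shape (segment index, both versions, a score) would stay the same for the human reviewer either way.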