A Python application that translates Microsoft Word (.docx) documents while preserving document structure, formatting, images, and logos.
- Document Structure - Maintains all headings, paragraphs, and sections
- Text Formatting - Preserves bold, italic, underline, colors, fonts, and sizes
- Images & Logos - Keeps all images intact and in their original positions
- Tables - Maintains table structure, borders, and formatting
- Lists - Preserves numbered and bulleted lists
- Headers & Footers - Translates and preserves headers and footers
- Styles - Keeps all paragraph and character styles
- Layout - Maintains page layout, margins, and spacing
- Supports 15+ languages
- Auto-detects source language
- Skips email addresses (preserves formatting)
- Skips URLs and web links
- Skips pure numbers and dates
- Caches translations for efficiency
- Python 3.8 or higher
- Node.js (for creating sample documents)
- Internet connection (for translation API)
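pip install googletrans==4.0.0rc1 defusedxml --break-system-packages

The GUI uses tkinter, which is part of the Python standard library and is not installed with pip. On Debian or Ubuntu it may require the python3-tk system package.

- Clone or download the application files
- Install dependencies:
pip install googletrans==4.0.0rc1 defusedxml --break-system-packages
- Verify installation:
python document_translator.py --help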
python document_translator.py input.docx output.docx [target_language] [source_language]

Translate to Spanish (default):
python document_translator.py document.docx translated.docx

Translate to French:
python document_translator.py document.docx translated_fr.docx fr

Translate from English to German:
python document_translator.py document.docx translated_de.docx de en

Translate to Portuguese with auto-detect:
python document_translator.py document.docx translated_pt.docx pt auto

For a user-friendly interface:
python document_translator_gui.py

GUI Features:
- Browse and select input/output files
- Choose source and target languages from dropdown
- Real-time progress monitoring
- Error handling with detailed messages
| Language | Code | Language | Code |
|---|---|---|---|
| English | en | Spanish | es |
| French | fr | German | de |
| Italian | it | Portuguese | pt |
| Russian | ru | Chinese (Simplified) | zh-cn |
| Japanese | ja | Korean | ko |
| Arabic | ar | Dutch | nl |
| Polish | pl | Turkish | tr |
| Hindi | hi | Auto-detect | auto |
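
The command-line form and the language table above can be combined into a small argument parser. The following is only a sketch of how such parsing might look, not the code in document_translator.py: the function name parse_args and the set LANGUAGE_CODES are hypothetical, and the defaults (es for the target, auto for the source) are taken from the usage examples.

```python
import argparse

# Language codes copied from the table above ("auto" is source-only).
LANGUAGE_CODES = {
    "en", "es", "fr", "de", "it", "pt", "ru", "zh-cn",
    "ja", "ko", "ar", "nl", "pl", "tr", "hi",
}


def parse_args(argv):
    """Parse: input.docx output.docx [target_language] [source_language]."""
    parser = argparse.ArgumentParser(prog="document_translator.py")
    parser.add_argument("input", help="path to the source .docx file")
    parser.add_argument("output", help="path for the translated .docx file")
    # Optional positionals: omitted values fall back to the defaults.
    parser.add_argument("target_language", nargs="?", default="es",
                        choices=sorted(LANGUAGE_CODES))
    parser.add_argument("source_language", nargs="?", default="auto",
                        choices=sorted(LANGUAGE_CODES | {"auto"}))
    return parser.parse_args(argv)


args = parse_args(["document.docx", "translated_de.docx", "de", "en"])
print(args.target_language, args.source_language)
```

With choices set, an unsupported code is rejected by argparse before any translation work starts.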
To test the translator, create a sample document:
node create_sample_document.js

This creates sample_document.docx with:
- Company letterhead
- Address information
- Multiple sections
- Various formatting styles
- Professional layout
Then translate it:
python document_translator.py sample_document.docx translated.docx es

- Unpacking - Extracts the DOCX file (which is a ZIP archive containing XML files)
- Parsing - Reads the XML structure while preserving all formatting
- Translation - Translates text content using Google Translate (through the googletrans package)
- Smart Filtering - Skips emails, URLs, and numbers
- Preservation - Maintains all images, styles, and structure
- Repacking - Creates the translated DOCX file
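
The unpack, translate, and repack steps above can be illustrated with the standard library alone. This is a simplified sketch under stated assumptions, not the application's actual implementation: it uses xml.etree.ElementTree instead of defusedxml and the ooxml helpers, only touches word/document.xml (headers and footers live in separate parts), and fake_translate, translate_docx_bytes, and make_demo_docx are hypothetical names. The demo archive is a minimal stand-in rather than a file Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # keep the conventional "w:" prefix on output


def fake_translate(text):
    """Stand-in for a real translation call; it only upper-cases the text."""
    return text.upper()


def translate_docx_bytes(docx_bytes, translate=fake_translate):
    """Rewrite <w:t> text in word/document.xml and copy every other part as-is."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                root = ET.fromstring(data)
                # Only text nodes change; run properties such as bold stay put.
                for node in root.iter(f"{{{W_NS}}}t"):
                    if node.text and node.text.strip():
                        node.text = translate(node.text)
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            dst.writestr(item, data)  # images and other parts pass through unchanged
    return out.getvalue()


def make_demo_docx():
    """Build a tiny DOCX-like archive: one bold run plus a fake image part."""
    xml = (f'<w:document xmlns:w="{W_NS}"><w:body><w:p><w:r>'
           '<w:rPr><w:b/></w:rPr><w:t>hello world</w:t>'
           '</w:r></w:p></w:body></w:document>')
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
        zf.writestr("word/media/logo.png", b"\x89PNG-demo-bytes")
    return buf.getvalue()


result = translate_docx_bytes(make_demo_docx())
with zipfile.ZipFile(io.BytesIO(result)) as zf:
    print(zf.read("word/document.xml").decode("utf-8"))
```

Because formatting lives in sibling elements and images in separate archive members, changing only the text nodes leaves both untouched in this sketch. Real documents declare many more namespaces, which a production implementation has to preserve on output.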
document_translator.py              # Main translation logic
├── DocumentTranslator              # Core translator class
│   ├── translate_text()            # Translates individual text segments
│   ├── should_translate_node()     # Determines what to translate
│   └── translate_xml_text_nodes()  # Recursively processes XML
├── ooxml/                          # OOXML manipulation library
│   └── document.py                 # Document structure handler
└── scripts/                        # Utility scripts
    ├── unpack.py                   # Extracts DOCX files
    └── pack.py                     # Creates DOCX files
The following content is translated:
- Paragraph text
- Headings and titles
- Table content
- List items
- Headers and footers
- Text in text boxes
- Captions

The following content is left unchanged:
- Email addresses (e.g., info@company.com)
- URLs (e.g., www.example.com)
- Pure numbers (e.g., 12345, $1,000)
- Simple dates (e.g., 2025-10-27)
- Images and logos
- Charts and diagrams
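
The skip rules for emails, URLs, numbers, and dates could be expressed as a few regular expressions. The sketch below is one possible reading of those rules, not the logic inside should_translate_node(): the function name should_translate and the three patterns are assumptions chosen to match the examples listed above.

```python
import re

# Whole-string patterns: a segment is skipped only if it is nothing but the match.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")
URL_RE = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)
# At least one digit, and otherwise only digits, spaces, currency signs,
# and common numeric punctuation (covers 12345, $1,000, 2025-10-27).
NUMBER_RE = re.compile(r"^(?=.*\d)[\d\s$€£%.,:/()+-]+$")


def should_translate(text):
    """Return False for empty text, emails, URLs, and purely numeric segments."""
    stripped = text.strip()
    if not stripped:
        return False
    if EMAIL_RE.match(stripped) or URL_RE.match(stripped):
        return False
    if NUMBER_RE.match(stripped):
        return False
    return True


for sample in ["info@company.com", "www.example.com", "$1,000",
               "2025-10-27", "Annual report 2025"]:
    print(sample, should_translate(sample))
```

Matching the whole segment rather than searching inside it means a sentence that merely contains a URL or a number is still sent for translation.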
- Translate business proposals while keeping company logos
- Convert contracts while maintaining legal formatting
- Localize marketing materials with branded headers
- Translate research papers preserving structure
- Convert thesis documents maintaining formatting
- Localize educational materials with diagrams
- Translate user manuals with screenshots
- Convert technical specifications with tables
- Localize product documentation with logos
To use a different translation service, modify the translate_text() method in the DocumentTranslator class:
def translate_text(self, text):
    # Replace with your translation API (for example Azure Translator or DeepL).
    # your_translation_service is a placeholder for that service's client.
    result = your_translation_service.translate(text, target=self.target_lang)
    return result

Process multiple documents:
import glob
from document_translator import DocumentTranslator

translator = DocumentTranslator(target_lang='fr')
for doc in glob.glob('*.docx'):
    # Skip files produced by an earlier run
    if doc.endswith('_translated.docx'):
        continue
    output = doc[:-len('.docx')] + '_translated.docx'
    translator.translate_document(doc, output)

Issue: "Translation failed" errors
- Check internet connection
- Verify the Google Translate service is accessible
- Try reducing translation frequency (add delay in translate_text)
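
One way to reduce translation frequency is to retry a failed call after a growing pause. The helper below is a hypothetical sketch (translate_with_retry is not part of the application); it wraps any callable that takes a string, so it could be placed around the call made inside translate_text().

```python
import time


def translate_with_retry(translate, text, retries=3, delay=1.0, sleep=time.sleep):
    """Call translate(text); after each failure wait delay, 2*delay, 4*delay, ..."""
    last_error = None
    for attempt in range(retries):
        try:
            return translate(text)
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            if attempt < retries - 1:
                sleep(delay * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f"Translation failed after {retries} attempts") from last_error


# Usage with a stand-in translator that always succeeds:
print(translate_with_retry(str.upper, "hello"))
```

The sleep parameter exists only so the waiting behaviour can be replaced in tests; in normal use the default time.sleep applies.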
Issue: Document structure is altered
- Ensure input file is a valid .docx (not .doc)
- Verify file is not password-protected
- Check that file is not corrupted
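
These three checks can be partly automated before translation. The sketch below relies on the fact that a .docx file is a ZIP archive, whereas legacy .doc files and, typically, password-protected documents are not; check_docx is a hypothetical helper, not a function shipped with the application.

```python
import zipfile


def check_docx(path):
    """Return a list of problems; an empty list means the basic checks passed.

    `path` may be a file path or a binary file-like object.
    """
    problems = []
    if not zipfile.is_zipfile(path):
        problems.append("not a ZIP archive (legacy .doc or password-protected file?)")
        return problems
    with zipfile.ZipFile(path) as zf:
        bad_member = zf.testzip()  # first member whose CRC check fails, or None
        if bad_member is not None:
            problems.append(f"corrupted archive member: {bad_member}")
        if "word/document.xml" not in zf.namelist():
            problems.append("missing word/document.xml")
    return problems


# Example call; a missing or invalid file is reported as "not a ZIP archive".
print(check_docx("document.docx"))
```

An empty result does not guarantee that Word can open the file; it only rules out the most common causes listed above.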
Issue: Some text not translated
- Check if text appears as an email, URL, or number
- Verify the text is not inside an image
- Some text boxes may require manual translation
Issue: Images missing
- Ensure images are embedded, not linked
- Check that original document opens correctly in Word
- Verify sufficient disk space for temporary files
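
To confirm whether images actually went missing, the embedded media of the original and translated files can be compared. This is a small diagnostic sketch with hypothetical helper names; it assumes embedded images are stored under word/media/ inside the archive, which is the usual layout for Word documents.

```python
import zipfile


def media_parts(docx):
    """Map each word/media/ entry to the CRC-32 recorded in the archive."""
    with zipfile.ZipFile(docx) as zf:
        return {info.filename: info.CRC for info in zf.infolist()
                if info.filename.startswith("word/media/")}


def missing_or_changed_media(original, translated):
    """List media entries that are absent or different in the translated file."""
    before = media_parts(original)
    after = media_parts(translated)
    return sorted(name for name, crc in before.items() if after.get(name) != crc)
```

For example, missing_or_changed_media("document.docx", "translated.docx") returns an empty list when every embedded image was carried over unchanged.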
document_translator/
├── document_translator.py       # Main CLI application
├── document_translator_gui.py   # GUI application
├── create_sample_document.js    # Sample document generator
├── README.md                    # This file
├── requirements.txt             # Python dependencies
├── ooxml/                       # OOXML library
│   ├── __init__.py
│   ├── document.py
│   └── xmleditor.py
└── scripts/                     # Utility scripts
    ├── unpack.py
    └── pack.py
This application uses the OOXML library, which is proprietary. See LICENSE.txt for details.
Contributions are welcome! Areas for improvement:
- Support for more translation services (DeepL, Azure, AWS)
- Better handling of complex formatting
- Support for .doc (older Word format)
- Parallel processing for large documents
- Translation memory/glossary support
- Large Documents: For documents over 50 pages, consider breaking into sections
- Technical Terms: Create a glossary file for consistent technical translations
- Review: Always review translated documents for context-specific accuracy
- Backup: Keep original documents before translation
- Testing: Test with sample documents first to verify formatting preservation
For issues or questions:
- Check the troubleshooting section
- Verify all requirements are installed
- Test with the provided sample document
- Review error messages for specific issues
- Always preview translated documents in Word before distribution
- Keep originals - never overwrite source documents
- Use specific language codes when you know the source language
- Review technical terms that may need manual adjustment
- Test formatting by opening in Word after translation
- Back up important documents before batch processing
- Small documents (1-5 pages): ~30 seconds
- Medium documents (10-20 pages): ~2-3 minutes
- Large documents (50+ pages): ~10-15 minutes
Translation speed depends on:
- Document size and complexity
- Internet connection speed
- Google Translate API response time
- Number of unique text segments
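
The number of unique text segments matters because repeated segments only need one network call when translations are cached. The class below is a minimal illustration of that idea using an in-memory dictionary; CachingTranslator is a hypothetical name and the application's own cache may work differently.

```python
class CachingTranslator:
    """Wrap a translate(text) callable and remember results per input string."""

    def __init__(self, translate):
        self._translate = translate
        self._cache = {}
        self.calls = 0  # number of times the wrapped callable was actually invoked

    def __call__(self, text):
        if text not in self._cache:
            self.calls += 1
            self._cache[text] = self._translate(text)
        return self._cache[text]


# str.upper stands in for a real translation call.
cached = CachingTranslator(str.upper)
segments = ["Page", "Total", "Page", "Page", "Total"]
translated = [cached(s) for s in segments]
print(translated, cached.calls)
```

In documents with repeated table headers or boilerplate lines, the wrapped service is called once per distinct string rather than once per occurrence.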
Built with ❤️ using Python, OOXML, and Google Translate