
MonDevHub/monocr-web


A privacy-first, in-browser OCR engine for the Mon language (mnw), powered by Rust, WebAssembly, and ONNX Runtime.

Note

The Mon language is classified as a "vulnerable" language in UNESCO's Atlas of the World’s Languages in Danger.

This project aims to digitize the Mon script, establishing a digital foundation suitable for future development, system integrations, and AI-driven preservation efforts.

Overview

MonOCR Web brings high-performance optical character recognition for the Mon script directly to the browser. By leveraging ONNX Runtime Web and a custom Wasm backend, all processing is performed locally on the user's device. Because nothing is uploaded for recognition, there is no network round-trip latency, OCR keeps working offline, and images never leave the browser during recognition.

Key Features

  • On-Device Inference: Runs entirely in the browser via WebAssembly (Wasm).
  • Privacy by Design: Zero data collection; OCR processing is 100% local.
  • Optional Cloud Sync: Secure, opt-in synchronization for contributing corrected scans to the open-source Mon language dataset.
  • High Performance: Optimized MobileNetV3 + BiLSTM OCR engine (~6.6M parameters).
  • Format Support: Handles PDFs and images up to 50MB.
  • Script Specialized: Purpose-built for Mon script recognition, with supplementary support for Burmese and English.

Tip

File size is limited to 50MB on web and 20MB on mobile. To process larger files or use more powerful hardware, use the CLI or the package directly, installed via uv add monocr or pip install monocr.
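
The limits above could be enforced before any file is decoded. The following is a minimal sketch, not code from this repository: the names checkFileSize, MAX_UPLOAD_MB, and SizeCheck are hypothetical, and it assumes "MB" means 1024 × 1024 bytes.

```typescript
// Hypothetical helper; names and the byte interpretation of "MB" are assumptions.
type Platform = "web" | "mobile";

const BYTES_PER_MB = 1024 * 1024; // assumption: 1 MB = 2^20 bytes

// Limits as stated in the tip: 50MB for web, 20MB for mobile.
const MAX_UPLOAD_MB: Record<Platform, number> = { web: 50, mobile: 20 };

interface SizeCheck {
  ok: boolean;        // true when the file is within the limit
  limitBytes: number; // the limit that was applied
  message: string;    // empty when ok, otherwise a user-facing hint
}

function checkFileSize(sizeBytes: number, platform: Platform): SizeCheck {
  const limitBytes = MAX_UPLOAD_MB[platform] * BYTES_PER_MB;
  if (sizeBytes <= limitBytes) {
    return { ok: true, limitBytes, message: "" };
  }
  return {
    ok: false,
    limitBytes,
    message:
      `File exceeds the ${MAX_UPLOAD_MB[platform]}MB ${platform} limit; ` +
      "consider the monocr CLI or package for larger inputs.",
  };
}
```

In a browser this would typically be called with file.size from an input element, so oversized files are rejected before any PDF or image decoding starts.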

Architecture

Image (Canvas/Blob)
  LineSegmenter     → horizontal projection profile → List<LineSegment>
  ImagePreprocessor  → grayscale + normalize [-1.0, 1.0]
  MonOcrEngine      → ONNX Runtime Web (monocr.onnx)
  CtcDecoder        → greedy CTC decode → String
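
The first two stages of the pipeline above can be sketched as follows. This is an illustration of the general techniques (horizontal projection profile, grayscale conversion, scaling to [-1.0, 1.0]), not the repository's LineSegmenter or ImagePreprocessor. The function names, the GrayImage type, the ink threshold of 128, and the BT.601 luma weights are all assumptions.

```typescript
// Illustrative sketch only; types, names, and thresholds are assumptions.
interface GrayImage {
  width: number;
  height: number;
  data: Uint8Array; // row-major, one byte per pixel (0 = black, 255 = white)
}

interface LineSegment {
  top: number;    // first row of the text line (inclusive)
  bottom: number; // row just past the text line (exclusive)
}

// Convert RGBA pixels (the layout used by canvas ImageData) to grayscale
// using BT.601 luma weights. Alpha is ignored.
function toGrayscale(rgba: Uint8ClampedArray, width: number, height: number): GrayImage {
  const data = new Uint8Array(width * height);
  for (let i = 0; i < width * height; i++) {
    const r = rgba[4 * i];
    const g = rgba[4 * i + 1];
    const b = rgba[4 * i + 2];
    data[i] = Math.round(0.299 * r + 0.587 * g + 0.114 * b);
  }
  return { width, height, data };
}

// Horizontal projection profile: count dark pixels in each row, then group
// consecutive rows that contain ink into one line segment.
function segmentLines(img: GrayImage, inkThreshold = 128, minInkPixels = 1): LineSegment[] {
  const segments: LineSegment[] = [];
  let start = -1; // -1 means "not currently inside a text line"
  for (let y = 0; y < img.height; y++) {
    let ink = 0;
    for (let x = 0; x < img.width; x++) {
      if (img.data[y * img.width + x] < inkThreshold) ink++;
    }
    const hasInk = ink >= minInkPixels;
    if (hasInk && start < 0) start = y;
    if (!hasInk && start >= 0) {
      segments.push({ top: start, bottom: y });
      start = -1;
    }
  }
  // Close a line that runs to the bottom edge of the image.
  if (start >= 0) segments.push({ top: start, bottom: img.height });
  return segments;
}

// Map 8-bit pixel values 0..255 linearly onto [-1.0, 1.0].
function normalize(data: Uint8Array): Float32Array {
  const out = new Float32Array(data.length);
  for (let i = 0; i < data.length; i++) out[i] = data[i] / 127.5 - 1.0;
  return out;
}
```

A real segmenter would likely need noise handling (a higher minInkPixels, or merging segments separated by very small gaps), which matters for scripts with stacked diacritics above and below the baseline.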

Model Specification

Attribute     Specification
Architecture  MobileNetV3 + BiLSTM-384 + CTC
Precision     FP32 (ONNX)
Parameters    ~6.6M
Input         128 × Variable (H × W)
Asset Size    ~25 MB
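
Because the model ends in a CTC head, its per-timestep outputs have to be collapsed into text. A minimal sketch of greedy CTC decoding follows: take the best class at each timestep, merge consecutive repeats, and drop blanks. The function name, the flattened [timeSteps, numClasses] layout, the blank index of 0, and the placeholder charset are assumptions; the actual output shape and character table of monocr.onnx are not described here.

```typescript
// Illustrative greedy CTC decoder; layout and blank index are assumptions.
function greedyCtcDecode(
  logits: Float32Array, // flattened [timeSteps, numClasses], row-major
  timeSteps: number,
  numClasses: number,
  charset: string[],    // charset[i] is the character for class i
  blankIndex = 0,       // assumption: class 0 is the CTC blank
): string {
  let out = "";
  let prev = -1; // class chosen at the previous timestep
  for (let t = 0; t < timeSteps; t++) {
    // Argmax over classes at timestep t.
    let best = 0;
    let bestScore = -Infinity;
    for (let c = 0; c < numClasses; c++) {
      const score = logits[t * numClasses + c];
      if (score > bestScore) {
        bestScore = score;
        best = c;
      }
    }
    // Emit only when the class is not blank and differs from the previous
    // timestep; a blank between two equal classes keeps both characters.
    if (best !== blankIndex && best !== prev) out += charset[best];
    prev = best;
  }
  return out;
}

// Usage with a toy 3-class alphabet (index 0 is blank).
const toyPath = [1, 1, 0, 1, 2, 2]; // a a _ a b b
const toyLogits = new Float32Array(toyPath.length * 3);
for (let t = 0; t < toyPath.length; t++) toyLogits[t * 3 + toyPath[t]] = 1;
const toyText = greedyCtcDecode(toyLogits, toyPath.length, 3, ["", "a", "b"]); // "aab"
```

Greedy decoding is the cheapest option and matches the "greedy CTC decode" step in the architecture; beam search would be an alternative at higher cost.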

Project Structure

monocr-web/
├── src/
│   ├── lib/
│   │   ├── engine/           # OCR Pipeline (ONNX/Wasm)
│   │   ├── components/       # Svelte UI Components
│   │   └── utils/            # Image & PDF Processing
│   └── routes/               # Application Pages
├── static/
│   ├── wasm/                 # ONNX Runtime Wasm Binaries
│   └── fonts/                # Mon/Myanmar Unicode Fonts
├── scripts/                  # Build & Asset Management
└── playwright/               # E2E Testing Suite

Ecosystem

MonOCR is a unified cross-platform ecosystem designed for parity and performance:

  • MonOCR Web: (This Repository) Privacy-first in-browser OCR.
  • MonOCR Android: Native Jetpack Compose app with Material 3.
  • MonOCR iOS: Native SwiftUI app with SwiftData persistence.

Development

Prerequisites

  • Node.js 24+
  • pnpm 11+

1. Setup

pnpm install

2. Prepare Assets

Copy the pre-built ONNX Runtime Wasm files to the static directory:

pnpm run copy-wasm

3. Local Development

pnpm dev

4. Production Build

pnpm build

Important

The build script automatically optimizes how the monocr.onnx model is deployed so that the build complies with edge asset limits. In production, models are fetched from the Hugging Face CDN.

License

MIT
