HackFax × PatriotHacks 2026 at George Mason University
Transform complex CBC lab reports into clear, patient-friendly explanations using AI with clinical reasoning
Clarion AI turns cryptic lab reports into understandable health insights with clinical reasoning:
- Upload a CBC lab report PDF (drag-and-drop or click)
- Extract text using pdf-parse with automatic OCR fallback (Tesseract.js)
- Collect optional patient context (age, sex, pregnancy status, symptoms)
- Stream live extraction progress with real-time page-by-page updates
- Identify test candidates using multi-line OCR-aware regex patterns
- Normalize via Neo4j knowledge graph with batch AI matching (Gemini 2.5 Flash)
- Evaluate clinical reasoning rules against patient context (deterministic graph logic)
- Explain results in plain English using RAG-enhanced AI generation with clinical signals
- Listen to audio summary via ElevenLabs text-to-speech
- Return patient-friendly JSON: summary, findings, red flags, next steps
- Zero persistence: PDFs processed in-memory only, patient context never stored
- No storage: Files never touch disk
- PHI protection: Safe logging with automatic redaction
- Educational only: Clear medical disclaimers on all outputs
- Drag-and-drop PDF upload with visual feedback
- Hero section with gradient healthcare design (#667eea → #764ba2)
- Pipeline indicators showing extraction → patient context → analysis → explanation flow
- Patient intake form with age, sex, pregnancy status, symptoms
- Progress animations with page-by-page OCR status
- Voice playback with play/pause/stop controls (ElevenLabs TTS)
- Responsive design using clamp() sizing
- "Try Sample Report" button for instant demo
- Accessibility: ARIA labels, focus states, keyboard navigation
- Primary: pdf-parse for native PDF text extraction
- Fallback: Automatic OCR when pdf-parse fails (handles scanned/image PDFs)
- Streaming progress: Server-Sent Events (SSE) with real-time page updates
- Multi-line extraction: Custom regex patterns for OCR table format
- Deterministic rules: Evidence-based clinical logic in Neo4j graph
- Patient-aware: Demographic constraints (age, sex, pregnancy status)
- Threshold evaluation: Operators (>, <, >=, <=, between, abnormal_flag)
- Multi-layer graph: Tests → Findings → Conditions → Actions
- Safety signals: Urgency levels (mild, moderate, severe, critical)
- Contextual guidance: Tailored recommendations based on patient context
- Batch matching: Reduces API calls 15x (single batch call vs. sequential)
- Rate limit handling: Exponential backoff with jitter
- Token optimization: 8192 token limit with conciseness prompts
- 3-tier JSON parsing: Direct parse → markdown strip → regex extraction
- Clinical signals integration: Graph findings injected into AI prompts
- 15 CBC test nodes with aliases, units, LOINC codes, NHANES mappings
- 10 clinical findings (anemia, infection, thrombocytopenia, etc.)
- 6 medical conditions with urgency levels
- 4 action recommendations (contact doctor, emergency care, follow-up)
- 10 clinical rules with 13 threshold nodes
- 4 demographic constraints (pregnancy, age ranges, sex-specific)
- Canonical normalization: Fuzzy matching ("Hgb" → "Hemoglobin")
- Relationship tracking: Test→Panel, Rule→Finding→Condition→Action
- Sub-second queries: Pure Cypher without ML overhead
- ElevenLabs integration: Natural-sounding voice synthesis
- Medical disclaimer: Auto-appended to all audio
- 2000 char limit: Automatic truncation for API limits
- Audio controls: Play, pause, stop, cancel loading
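The "exponential backoff with jitter" item above can be illustrated with a small retry helper. This is a minimal sketch, not the project's code: the names `backoffDelayMs` and `withRetry`, the 500 ms base, the 8 s cap, and the "full jitter" variant are all assumptions chosen for the example.

```typescript
// Hypothetical helpers; names and default values are illustrative assumptions.
function backoffDelayMs(
  attempt: number, // 0-based retry attempt
  baseMs = 500,
  capMs = 8000,
  random: () => number = Math.random,
): number {
  // The ceiling doubles on every attempt until it reaches the cap.
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  // "Full jitter": pick uniformly in [0, ceiling) so clients do not retry in lockstep.
  return Math.floor(random() * ceiling);
}

async function withRetry<T>(
  call: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Give up on the last attempt or on errors that retrying cannot fix.
      if (attempt + 1 >= maxAttempts || !isRetryable(err)) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

A caller would typically pass an `isRetryable` predicate that returns true only for rate-limit (HTTP 429) or transient network errors, so that validation failures surface immediately.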
┌─────────────────────────────────────────────────────────────┐
│  Client (React)                                             │
│  • Drag-and-drop upload                                     │
│  • Streaming progress display (SSE)                         │
│  • Results visualization                                    │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│  POST /api/extract?stream=true                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ 1. Try pdf-parse (native extraction)                 │   │
│  │    ↓ (if it fails: XRef errors, scanned PDFs)        │   │
│  │ 2. Fallback to OCR (pdf2pic + Tesseract.js)          │   │
│  │    • Convert PDF pages to PNG (1600×2200, 160 DPI)   │   │
│  │    • OCR each page with Tesseract                    │   │
│  │    • Stream progress: {page, total, textLength}      │   │
│  └──────────────────────────────────────────────────────┘   │
│                     │                                       │
│                     ├── SSE stream ──► Client progress bar  │
│                     │                                       │
│                     ▼                                       │
│               Extracted Text                                │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│  POST /api/explain                                          │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ 1. Extract candidates (regex on multi-line format)   │   │
│  │    Pattern: "Test Name\nValue\nUnit Range\nFlag"     │   │
│  │    Example: "WBC\n11.8\n10^3/mcL4.5-11.0\nH"         │   │
│  └──────────────────┬───────────────────────────────────┘   │
│                     │                                       │
│                     ▼                                       │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ 2. Batch normalize via Gemini 2.5 Flash              │   │
│  │    • Single API call for all 15 candidates           │   │
│  │    • Match to Neo4j canonical test names             │   │
│  │    • Returns Map<rawName, canonical | null>          │   │
│  └──────────────────┬───────────────────────────────────┘   │
│                     │                                       │
│                     ▼                                       │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ 3. Query Neo4j for test metadata                     │   │
│  │    MATCH (t:Test {name: $canonical})                 │   │
│  │    RETURN t.label, t.unit, t.panel, t.aliases        │   │
│  └──────────────────┬───────────────────────────────────┘   │
│                     │                                       │
│                     ▼                                       │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ 4. Generate explanation (Gemini 2.5 Flash)           │   │
│  │    • System prompt: medical educator, safety rules   │   │
│  │    • User message: extracted text + Neo4j context    │   │
│  │    • Config: temp=0.1, maxTokens=8192, concise       │   │
│  │    • 3-tier JSON parsing with error recovery         │   │
│  └──────────────────┬───────────────────────────────────┘   │
│                     │                                       │
│                     ▼                                       │
│            LabExplanation JSON                              │
│  {patient_summary, key_findings, results_table,             │
│   red_flags, next_steps, disclaimer}                        │
└─────────────────────────────────────────────────────────────┘
# Test nodes (15 CBC tests)
(:Test {
id: "WBC",
name: "White Blood Cell Count",
aliases: ["WBC", "Leukocyte Count", "White Cell Count"],
unit: "10^3/mcL",
loinc: "6690-2",
nhanes_variable: "LBXWBCSI",
label: "Total immune cells; reflects infection status",
description: "Total count of white blood cells"
})
# Clinical findings (10 findings)
(:Finding {
finding_id: "F001",
name: "Anemia",
description: "Low red blood cell count or hemoglobin",
severity: "moderate", # mild | moderate | severe | critical
patient_guidance: "May cause fatigue and weakness. Discuss with doctor."
})
# Medical conditions (6 conditions)
(:Condition {
condition_id: "C001",
name: "Anemia",
description: "Condition characterized by low hemoglobin",
urgency_level: "moderate", # low | moderate | high | critical
typical_causes: "Iron deficiency, vitamin deficiency, chronic disease"
})
# Action recommendations (4 actions)
(:Action {
action_id: "A001",
name: "Contact Primary Care Physician",
guidance_text: "Schedule follow-up with your doctor within 1-2 weeks",
urgency: "moderate" # low | moderate | high | critical
})
# Clinical rules (10 rules with threshold logic)
(:Rule {
rule_id: "R001",
name: "Low Hemoglobin Detection",
description: "Detects anemia from hemoglobin values",
logic_type: "threshold", # threshold | pattern | combination
required_tests: ["HGB"],
priority: 100
})
# Threshold nodes (13 thresholds)
(:Threshold {
threshold_id: "TH001",
test_id: "HGB",
operator: "<", # < | > | <= | >= | between | abnormal_flag
value: 12.0,
unit: "g/dL"
})
# Demographic constraints (4 constraints)
(:DemographicConstraint {
constraint_id: "DC001",
constraint_type: "pregnancy_status", # age | sex_at_birth | pregnancy_status
required_value: "pregnant"
})
# Relationships
(Rule)-[:EVALUATES]->(Threshold)-[:APPLIES_TO]->(Test)
(Rule)-[:HAS_DEMOGRAPHIC_CONSTRAINT]->(DemographicConstraint)
(Rule)-[:PRODUCES_FINDING]->(Finding)
(Finding)-[:INDICATES]->(Condition)
(Condition)-[:RECOMMENDS]->(Action)
(Test)-[:IN_PANEL]->(:Panel {name: "CBC"})

Patient Context (age, sex, pregnancy, symptoms)
↓
Test Results (HGB=9.5, WBC=18.5, PLT=80)
↓
Rule Evaluation (deterministic graph traversal)
↓
Matched Findings (F001: Anemia (severe), F002: Infection (moderate))
↓
Triggered Conditions (C001: Anemia (high urgency), C002: Acute Infection)
↓
Recommended Actions (A002: Seek urgent care, A001: Contact doctor)
↓
Clinical Signals → Injected into Gemini Prompt → Enhanced Explanation
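The rule-evaluation step in the flow above can be sketched in plain TypeScript, independent of Neo4j. This is an illustration under stated assumptions rather than the project's engine: the in-memory `Rule` shape, the `low`/`high` fields for `between`, and modelling demographic constraints as predicate functions are all hypothetical.

```typescript
type Operator = "<" | ">" | "<=" | ">=" | "between" | "abnormal_flag";

interface Threshold {
  testId: string;
  operator: Operator;
  value?: number; // bound for <, >, <=, >=
  low?: number; // inclusive bounds for "between" (assumed field names)
  high?: number;
}

interface TestResult {
  testId: string;
  value: number;
  flag?: "H" | "L";
}

interface PatientContext {
  age: number;
  sex_at_birth: string;
  pregnancy_status?: string;
}

interface Rule {
  ruleId: string;
  findingId: string;
  priority: number;
  thresholds: Threshold[];
  // Every constraint must hold for the rule to apply (assumed shape).
  constraints?: Array<(ctx: PatientContext) => boolean>;
}

function thresholdMatches(t: Threshold, r: TestResult): boolean {
  switch (t.operator) {
    case "<": return t.value !== undefined && r.value < t.value;
    case ">": return t.value !== undefined && r.value > t.value;
    case "<=": return t.value !== undefined && r.value <= t.value;
    case ">=": return t.value !== undefined && r.value >= t.value;
    case "between":
      return t.low !== undefined && t.high !== undefined &&
        r.value >= t.low && r.value <= t.high;
    case "abnormal_flag": return r.flag !== undefined;
    default: return false;
  }
}

// Returns the finding IDs of all rules that fire, highest priority first.
function evaluateRules(rules: Rule[], results: TestResult[], ctx: PatientContext): string[] {
  const byTest = new Map<string, TestResult>();
  for (const r of results) byTest.set(r.testId, r);

  return rules
    .filter((rule) => (rule.constraints ?? []).every((holds) => holds(ctx)))
    .filter((rule) =>
      rule.thresholds.every((t) => {
        const result = byTest.get(t.testId);
        // A rule cannot fire if one of its required tests is missing.
        return result !== undefined && thresholdMatches(t, result);
      }),
    )
    .sort((a, b) => b.priority - a.priority)
    .map((rule) => rule.findingId);
}
```

Because the evaluation is a pure function of the rules, results, and patient context, the same inputs always produce the same findings, which is the property the "deterministic" label refers to.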
| Tool | Version | Purpose |
|---|---|---|
| Node.js | ≥ 18 | Runtime |
| npm | ≥ 9 | Package manager |
| Neo4j | 5.x | Knowledge graph |
| Docker | (optional) | Neo4j container |
git clone https://github.com/royalgillz/Clarion-AI.git
cd Clarion-AI
npm install
# Create .env.local file
cat > .env.local << EOF
GEMINI_API_KEY=your_gemini_api_key_here
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_neo4j_password
EOF
Get API Keys:
- Gemini: https://aistudio.google.com/app/apikey
- ElevenLabs: https://elevenlabs.io/app/settings/api-keys (for voice output)
Option A - Docker (Recommended):
docker run \
--name clarion-neo4j \
-p 7474:7474 -p 7687:7687 \
-e NEO4J_AUTH=neo4j/yourpassword \
-d neo4j:5.21
Option B - Neo4j Desktop:
- Download from https://neo4j.com/download/
- Create new database → set password → Start
- Verify at http://localhost:7474
# Seed basic test nodes (original)
npm run seed
# Seed complete clinical reasoning graph (NEW)
npm run seed:reasoning
What npm run seed:reasoning does (completes in ~3 seconds):
- Clears existing graph (DETACH DELETE all nodes)
- Creates 15 CBC test nodes with LOINC codes, units, aliases
- Creates 1 CBC panel node
- Creates 10 clinical findings (F001-F010: anemia, infection, thrombocytopenia, etc.)
- Creates 6 medical conditions (C001-C006: anemia, acute infection, bleeding risk, etc.)
- Creates 4 action recommendations (A001-A004: contact doctor, emergency care, follow-up, avoid risk)
- Creates 10 clinical rules (R001-R010) with deterministic logic
- Creates 13 threshold nodes (TH001-TH013) with operators (<, >, <=, >=, between, abnormal_flag)
- Creates 4 demographic constraints (DC001-DC004: pregnancy, age, sex-specific rules)
- Links all relationships: Test→Threshold→Finding→Condition→Action
Expected output:
Clearing existing graph...
✅ Graph cleared
Seeding Test nodes...
✅ Created 15 Test nodes
Seeding Panel nodes and relationships...
✅ Created CBC panel and relationships
Seeding Finding nodes...
✅ Created 10 Finding nodes
🏥 Seeding Condition nodes...
✅ Created 6 Condition nodes
⚡ Seeding Action nodes...
✅ Created 4 Action nodes
👤 Seeding DemographicConstraint nodes...
✅ Created 4 DemographicConstraint nodes
Seeding Threshold nodes...
✅ Created 13 Threshold nodes
🧠 Seeding Rule nodes and relationships...
✅ Created 10 Rule nodes with relationships
✅ All seeding complete!
Graph now contains:
- 15 Test nodes
- 1 Panel node
- 10 Finding nodes
- 6 Condition nodes
- 4 Action nodes
- 4 DemographicConstraint nodes
- 13 Threshold nodes
- 10 Rule nodes with relationships
npm run dev
Open http://localhost:3000
Method 1 - UI Upload with Patient Context:
- Drag & drop any CBC PDF onto the upload zone
- Watch real-time OCR progress (if PDF is scanned)
- NEW: Fill out patient intake form (age, sex, pregnancy, symptoms) or skip
- View context-aware explanation with clinical reasoning signals
- NEW: Click "Listen to Summary" to hear audio explanation
Method 2 - "Try Sample Report" Button:
- Click "Try Sample Report" on homepage
- Automatically loads data/sample_cbc_report.pdf
- Streams extraction progress
- Provide patient context for enhanced analysis
Method 3 - API Testing:
# Stream extraction progress (Server-Sent Events)
curl -N "http://localhost:3000/api/extract?stream=true" \
-F "file=@data/sample_cbc_report.pdf"
# Get explanation with patient context
curl -X POST http://localhost:3000/api/explain \
-H "Content-Type: application/json" \
-d '{
"extractedText": "WBC 11.8 10^3/mcL [4.5-11.0] H\nRBC 4.8 million cells/mcL\nHemoglobin 13.5 g/dL",
"patientContext": {
"age": 35,
"sex_at_birth": "female",
"pregnancy_status": "unknown",
"symptoms": ["fatigue"]
}
}' | jq '.output.patient_summary'
# Generate voice audio
curl -X POST http://localhost:3000/api/speak \
-H "Content-Type: application/json" \
-d '{"text": "Your white blood cell count is slightly elevated at 11.8."}' \
--output summary.mp3
Patient Context Fields:
- age: 0-120 (required)
- sex_at_birth: "female" | "male" | "intersex" | "prefer_not_say" (required)
- pregnancy_status: "pregnant" | "not_pregnant" | "unknown" (conditional; only when sex_at_birth is "female")
- symptoms: Array of "fever" | "fatigue" | "shortness_of_breath" | "bleeding_bruising" | "infection_symptoms" | "none" | "other"
- symptoms_other_text: String (required if "other" is selected, max 500 chars)
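The field rules above can be expressed as a small validator. The project tree lists Zod validation in src/types/patient.ts; the sketch below is a dependency-free stand-in written for illustration, and `validatePatientContext` is a hypothetical name. It also assumes `symptoms` may be omitted, which the field list does not state.

```typescript
const SEXES = ["female", "male", "intersex", "prefer_not_say"] as const;
const PREGNANCY = ["pregnant", "not_pregnant", "unknown"] as const;
const SYMPTOMS = [
  "fever", "fatigue", "shortness_of_breath", "bleeding_bruising",
  "infection_symptoms", "none", "other",
] as const;

const isOneOf = (allowed: readonly string[], v: unknown): v is string =>
  typeof v === "string" && allowed.includes(v);

// Returns a list of human-readable problems; an empty list means the input is valid.
function validatePatientContext(input: Record<string, unknown>): string[] {
  const errors: string[] = [];
  const { age, sex_at_birth, pregnancy_status, symptoms, symptoms_other_text } = input;

  if (typeof age !== "number" || !Number.isFinite(age) || age < 0 || age > 120) {
    errors.push("age must be a number from 0 to 120");
  }
  if (!isOneOf(SEXES, sex_at_birth)) {
    errors.push("sex_at_birth must be one of: " + SEXES.join(", "));
  }
  if (pregnancy_status !== undefined) {
    if (sex_at_birth !== "female") {
      errors.push("pregnancy_status only applies when sex_at_birth is female");
    } else if (!isOneOf(PREGNANCY, pregnancy_status)) {
      errors.push("pregnancy_status must be one of: " + PREGNANCY.join(", "));
    }
  }

  // Assumption: a missing symptoms field is treated as an empty list.
  const list: unknown = symptoms === undefined ? [] : symptoms;
  if (!Array.isArray(list) || !list.every((s) => isOneOf(SYMPTOMS, s))) {
    errors.push("symptoms must be an array of known symptom codes");
  } else if (list.includes("other")) {
    const text = symptoms_other_text;
    if (typeof text !== "string" || text.trim() === "" || text.length > 500) {
      errors.push("symptoms_other_text is required (max 500 chars) when \"other\" is selected");
    }
  }
  return errors;
}
```

Returning every problem at once, instead of throwing on the first, lets an intake form highlight all invalid fields in a single pass.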
POST /api/extract: Extract text from a PDF, with optional streaming progress.
Query Parameters:
- stream=true (optional): Enable Server-Sent Events for real-time progress
Request (multipart/form-data):
file: <PDF binary>
Response (non-streaming):
{
"ok": true,
"extractedText": "WBC 5.2 10^3/mcL...",
"source": "pdf" | "ocr"
}
SSE Events (streaming):
data: {"type":"progress","current":1,"total":3,"textLength":450}
data: {"type":"progress","current":2,"total":3,"textLength":920}
data: {"type":"complete","extractedText":"...","source":"ocr"}
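One way a client could consume these events is to read the POST response body incrementally and split it on the blank line that terminates each SSE event. The sketch below is an assumption-laden illustration, not the project's client: `createSseParser` and `streamExtraction` are hypothetical names, and it assumes the server separates events with "\n\n" and sends exactly the JSON payloads shown above.

```typescript
type ExtractEvent =
  | { type: "progress"; current: number; total: number; textLength: number }
  | { type: "complete"; extractedText: string; source: "pdf" | "ocr" };

// Incremental parser: feed it decoded text chunks in arrival order.
function createSseParser(onEvent: (e: ExtractEvent) => void): (chunk: string) => void {
  let buffer = "";
  return (chunk: string): void => {
    buffer += chunk;
    let sep = buffer.indexOf("\n\n");
    while (sep !== -1) {
      const rawEvent = buffer.slice(0, sep);
      buffer = buffer.slice(sep + 2);
      // Join all "data:" lines of the event; other SSE fields are ignored here.
      const data = rawEvent
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).trimStart())
        .join("\n");
      if (data !== "") onEvent(JSON.parse(data) as ExtractEvent);
      sep = buffer.indexOf("\n\n");
    }
  };
}

// Uploads the PDF as multipart form data and reports events as they arrive.
async function streamExtraction(file: File, onEvent: (e: ExtractEvent) => void): Promise<void> {
  const body = new FormData();
  body.append("file", file);
  const res = await fetch("/api/extract?stream=true", { method: "POST", body });
  if (!res.ok || res.body === null) throw new Error(`extract failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  const feed = createSseParser(onEvent);
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    feed(decoder.decode(value, { stream: true }));
  }
}
```

Reading the stream through `fetch` keeps the upload and the progress events on a single POST request; the buffering in `createSseParser` matters because a network chunk can end in the middle of an event.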
POST /api/explain: Generate a patient-friendly explanation from extracted text, with optional patient context and clinical reasoning.
Request:
{
"extractedText": "WBC 5.2 10^3/uL [4.5-11.0]...",
"patientContext": {
"age": 35,
"sex_at_birth": "female",
"pregnancy_status": "unknown",
"symptoms": ["fatigue"]
}
}
Response:
{
"ok": true,
"output": {
"patient_summary": "Your blood counts appear generally normal. Based on your age (35) and reported fatigue, we've analyzed your results with clinical reasoning...",
"key_findings": [
"White blood cell count is within normal range",
"No anemia detected based on hemoglobin levels"
],
"results_table": [
{
"test": "White Blood Cell Count",
"value": "5.2 10^3/mcL",
"range": "4.5-11.0",
"meaning_plain_english": "Normal immune cell count",
"what_can_affect_it": ["Infection", "Stress"],
"questions_for_doctor": ["Should I monitor this?"]
}
],
"red_flags": [],
"next_steps": [
"Discuss fatigue with your healthcare provider",
"Consider follow-up in 3-6 months"
],
"disclaimer": "This explanation is for educational purposes only..."
},
"debug": {
"candidatesFound": 15,
"testsNormalized": 3,
"normalizedTests": [
{"raw": "WBC", "canonical": "White Blood Cell Count"}
],
"clinicalSignals": {
"findings": [],
"conditions": [],
"actions": []
}
}
}
POST /api/speak: Generate audio narration using ElevenLabs TTS.
Request:
{
"text": "Your white blood cell count is slightly elevated at 11.8."
}
Response: Binary audio/mpeg stream (MP3)
Features:
- Auto-appends medical disclaimer
- 2000 character limit (automatically truncated)
- Voice ID: EST9Ui6982FZPSi7gCHi (configurable)
- Natural-sounding voice synthesis
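The truncation and disclaimer behavior listed above could be combined as follows. This is only a sketch: `prepareSpeechText` is a hypothetical name, the disclaimer wording is a placeholder, and it assumes the 2000-character limit covers the disclaimer too, which the feature list does not specify.

```typescript
const MAX_TTS_CHARS = 2000;
// Placeholder wording; the text the project actually appends is not shown in this README.
const SPOKEN_DISCLAIMER =
  "This summary is for educational purposes only and is not medical advice.";

function prepareSpeechText(
  text: string,
  maxChars: number = MAX_TTS_CHARS,
  disclaimer: string = SPOKEN_DISCLAIMER,
): string {
  const suffix = " " + disclaimer;
  const budget = maxChars - suffix.length;
  if (budget <= 0) throw new Error("maxChars is too small to fit the disclaimer");

  let body = text.trim();
  if (body.length > budget) {
    body = body.slice(0, budget);
    // Prefer cutting at a word boundary so the narration does not end mid-word.
    const lastSpace = body.lastIndexOf(" ");
    if (lastSpace > 0) body = body.slice(0, lastSpace);
  }
  return body + suffix;
}
```

Reserving room for the disclaimer before truncating guarantees that it is never the part that gets cut off.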
Clarion-AI/
├── src/
│   ├── app/
│   │   ├── api/
│   │   │   ├── extract/route.ts  # PDF extraction with OCR fallback + SSE streaming
│   │   │   ├── explain/route.ts  # Batch normalization + clinical reasoning + Gemini
│   │   │   ├── speak/route.ts  # ElevenLabs TTS integration
│   │   │   └── ocr/route.ts  # Legacy OCR endpoint
│   │   ├── page.tsx  # Modern UI with patient intake + voice player
│   │   ├── layout.tsx
│   │   └── globals.css
│   ├── components/
│   │   ├── PatientIntakeForm.tsx  # Patient context collection form
│   │   ├── VoicePlayer.tsx  # Audio playback controls
│   │   ├── PipelineIndicator.tsx  # Processing stage visualization
│   │   ├── LoadingProgress.tsx  # Progress bars with cancel
│   │   ├── UploadCard.tsx  # File upload UI
│   │   ├── ErrorDisplay.tsx  # Error handling
│   │   ├── TestResultCard.tsx  # Individual test display
│   │   ├── SearchFilter.tsx  # Search/filter UI
│   │   ├── ExportActions.tsx  # Export functionality
│   │   └── Button.tsx  # Reusable button component
│   ├── lib/
│   │   ├── gemini.ts  # Batch matching + explanation generation
│   │   ├── neo4j.ts  # Knowledge graph queries
│   │   ├── neo4j/reasoning.ts  # Clinical reasoning evaluation engine
│   │   ├── logging.ts  # PHI-safe logging with redaction
│   │   ├── ocr.ts  # Tesseract.js OCR with progress callbacks
│   │   ├── extractLabs.ts  # Multi-line regex extraction patterns
│   │   ├── triageRules.ts  # Basic triage logic
│   │   ├── redact.ts  # PII/PHI redaction
│   │   └── theme.ts  # Centralized design system
│   └── types/
│       ├── reasoning.ts  # Clinical reasoning types
│       └── patient.ts  # Patient context types with Zod validation
├── scripts/
│   ├── convert_xpt.py  # (Optional) NHANES CBC_J.xpt → JSON
│   ├── seed_neo4j.ts  # Basic Neo4j seeding (original tests)
│   ├── seedReasoningGraph.ts  # Complete clinical reasoning graph seeding
│   └── create_fpdf2_sample.py  # Generate sample CBC PDF from NHANES data
├── __tests__/
│   └── reasoning.test.ts  # Unit tests for clinical reasoning
├── data/
│   ├── CBC_J.json  # NHANES metadata (optional)
│   └── sample_cbc_report.pdf  # Demo PDF for "Try Sample Report"
├── .env.local  # Environment variables (gitignored)
├── package.json
├── tsconfig.json
└── README.md
Original plan: Use Gemini text-embedding-004 + Neo4j vector search. However:
- Free tier limitation: Gemini embedding API not available on all keys
- Hackathon pragmatism: Seeding 20 test embeddings takes ~30 seconds
- Simplicity: Text-based matching via Gemini 2.5 Flash works excellently
Solution: Store canonical test names in Neo4j, use Gemini batch matching for normalization.
Lab reports from OCR have no spaces between columns:
White Blood Cell Count (WBC)
11.8
10^3/mcL4.5 - 11.0
H
Custom regex patterns extract:
- Test name line
- Value line
- Unit + range (concatenated)
- Flag (H/L) if present
See src/lib/extractLabs.ts for implementation.
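As an illustration of the multi-line pattern described above, the sketch below parses the four-line layout with a single regular expression. It is not the code in src/lib/extractLabs.ts; `extractLabCandidates` and the `LabCandidate` shape are hypothetical, and the pattern assumes a numeric value line and a "low - high" range fused onto the end of the unit.

```typescript
interface LabCandidate {
  name: string;
  value: number;
  unit: string;
  low: number;
  high: number;
  flag: "H" | "L" | null;
}

function extractLabCandidates(ocrText: string): LabCandidate[] {
  // Normalize Windows and old Mac line endings so the pattern only deals with "\n".
  const text = ocrText.replace(/\r\n?/g, "\n");
  // Line 1: test name. Line 2: numeric value. Line 3: unit fused with "low - high".
  // Line 4 (optional): a lone H or L flag. The unit group is lazy, so it stops at
  // the first position from which a valid "low - high" range can be read.
  const pattern =
    /^([A-Za-z][^\n]*)\n(\d+(?:\.\d+)?)[ \t]*\n([^\n]*?)(\d+(?:\.\d+)?)[ \t]*-[ \t]*(\d+(?:\.\d+)?)[ \t]*(?:\n([HL])[ \t]*)?$/gm;

  const candidates: LabCandidate[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    const rawFlag = m[6];
    candidates.push({
      name: m[1].trim(),
      value: Number(m[2]),
      unit: m[3].trim(),
      low: Number(m[4]),
      high: Number(m[5]),
      flag: rawFlag === "H" || rawFlag === "L" ? rawFlag : null,
    });
  }
  return candidates;
}
```

A known limitation of this approach: when a unit ends in a digit and is fused directly to the range, the split between unit and lower bound is ambiguous, so real reports may need per-unit handling.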
Before: 15 sequential Gemini calls → 429 rate limit error
After: 1 batch call matching all tests → 2 total API calls (match + explain)
Impact: 15x fewer normalization calls, sub-second normalization.
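The batch approach boils down to one prompt that carries every raw name plus a strict parser for the reply. The sketch below shows only those two pure steps and leaves out the Gemini call itself; `buildBatchPrompt`, `parseBatchReply`, and the prompt wording are assumptions made for illustration.

```typescript
// Builds a single prompt that asks for all matches at once.
function buildBatchPrompt(rawNames: string[], canonicalNames: string[]): string {
  return [
    "Match each raw lab test name to exactly one canonical name, or null if none fits.",
    "Reply with a single JSON object that maps each raw name to a canonical name or null.",
    `Canonical names: ${JSON.stringify(canonicalNames)}`,
    `Raw names: ${JSON.stringify(rawNames)}`,
  ].join("\n");
}

// Turns the model's JSON reply into Map<rawName, canonical | null>.
function parseBatchReply(
  reply: string,
  rawNames: string[],
  canonicalNames: string[],
): Map<string, string | null> {
  const parsed: unknown = JSON.parse(reply);
  const obj: Record<string, unknown> =
    typeof parsed === "object" && parsed !== null ? (parsed as Record<string, unknown>) : {};
  const allowed = new Set(canonicalNames);

  const result = new Map<string, string | null>();
  for (const raw of rawNames) {
    const candidate = obj[raw];
    // Accept only names from the canonical list, so an invented name becomes null.
    result.set(raw, typeof candidate === "string" && allowed.has(candidate) ? candidate : null);
  }
  return result;
}
```

Iterating over the requested raw names, instead of over the reply's keys, means every input gets an entry even when the model omits it.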
Server-Sent Events (SSE) stream OCR progress:
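// Server (extract route)
send({ type: "progress", current: 1, total: 3, textLength: 450 });
// Client (page.tsx)
// Note: EventSource only issues GET requests, so it cannot carry the multipart
// PDF upload itself; this snippet assumes the stream is reachable over GET.
const eventSource = new EventSource('/api/extract?stream=true');
eventSource.onmessage = (e) => {
  const data = JSON.parse(e.data);
  if (data.type === 'progress') {
    setOcrProgress({ current: data.current, total: data.total });
  } else if (data.type === 'complete') {
    eventSource.close(); // stop the browser from auto-reconnecting
  }
};

| Challenge | Solution |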
|---|---|
| PDF XRef errors | Auto-fallback to OCR (pdf2pic + Tesseract.js) |
| Zero extraction candidates | Multi-line regex patterns for OCR format |
| Gemini rate limits (5/min) | Batch matching (15 calls → 1) |
| JSON truncation at 4096 tokens | Increased to 8192 + conciseness prompts |
| Embedding API not available | Switched to text-based Gemini matching |
- No vector search: Text matching via Gemini 2.5 Flash is more reliable for 20 canonical names
- OCR streaming: Real-time progress UX critical for multi-page scanned PDFs (30+ seconds)
- Batch normalization: Essential to stay within free tier rate limits
- 3-tier JSON parsing: Handles Gemini's occasional markdown/truncation issues
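The three parsing tiers named above (direct parse, markdown strip, regex extraction) can be sketched as a single fallback chain. `parseModelJson` is a hypothetical name and the exact patterns are assumptions; the project's implementation may differ.

```typescript
// Built with repeat() so this example does not contain a literal code-fence marker.
const FENCE = "`".repeat(3);

function parseModelJson(raw: string): unknown {
  // Tier 1: the reply is already valid JSON.
  try {
    return JSON.parse(raw);
  } catch {
    // fall through to the next tier
  }

  // Tier 2: strip a surrounding markdown code fence, with or without a "json" tag.
  const stripped = raw
    .trim()
    .replace(new RegExp("^" + FENCE + "(?:json)?\\s*", "i"), "")
    .replace(new RegExp("\\s*" + FENCE + "$"), "");
  try {
    return JSON.parse(stripped);
  } catch {
    // fall through to the next tier
  }

  // Tier 3: take everything from the first "{" to the last "}" and try that.
  const match = raw.match(/\{[\s\S]*\}/);
  if (match !== null) {
    try {
      return JSON.parse(match[0]);
    } catch {
      // fall through to the error below
    }
  }
  throw new Error("Model response did not contain parseable JSON");
}
```

None of the tiers can repair a reply that was cut off mid-object, so this chain complements the higher token limit and conciseness prompts and does not replace them.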
Built with ❤️ for HackFax × PatriotHacks 2026 by:
- Sehaj Gill
- Erica Mathias
- Dibyashree Basu
- Jash Bisai
MIT License - see LICENSE for details
- NHANES for CBC reference data (CDC)
- Neo4j for graph database infrastructure
- Google Gemini for AI capabilities
- Tesseract.js for OCR
- Next.js/React for modern web framework
This tool is for educational purposes only and is not medical advice.
Always consult qualified healthcare professionals for interpretation of lab results and medical decisions.
Results generated by this application should not be used for:
- Self-diagnosis
- Treatment decisions
- Medication changes
- Emergency medical situations
If you have urgent medical concerns, contact your healthcare provider immediately or call emergency services.