An AI-powered chatbot that bridges medical expertise with real-world patient experience.
TrustMed AI helps patients find clear, reliable information about Diabetes and Cardiovascular Disease. It combines authoritative medical content (Mayo Clinic, CDC, MedlinePlus) with patient discussions from Reddit to deliver answers that are accurate, empathetic, and cited.
Built as a Retrieval-Augmented Generation (RAG) system, it grounds every response in verifiable evidence — unlike systems that rely solely on clinical sources (hard to understand) or patient forums (unverified).
- Dataset: 1,727 documents — 201 authoritative medical articles + 1,525 Reddit patient discussions
- Retrieval: Hybrid k-NN + BM25 search (Amazon OpenSearch Serverless)
- LLM: Meta Llama 3 8B Instruct via AWS Bedrock
- Evaluation: Context Relevance 0.72 | Answer Relevance 0.68 | Groundedness 0.74
```mermaid
graph TD
    subgraph Data Pipeline
        A1[Mayo Clinic\nCDC, Healthline\n+5 more] -->|scrape| P[Python\nPreprocessing]
        A2[Reddit\nr/diabetes\nr/heartdisease\n+6 more] -->|scrape| P
        P -->|1727 docs| S3[Amazon S3]
        S3 --> KB[Bedrock Knowledge Base\nTitan Embeddings v2]
        KB --> OS[OpenSearch Serverless\nVector Index]
    end
    subgraph RAG Flow
        U[User Query] --> CL[Chainlit UI]
        CL --> RG[Bedrock retrieve_and_generate\nHybrid k-NN + BM25\ntop-k=8]
        OS --> RG
        RG --> LLM[Llama 3 8B Instruct]
        LLM --> R[Response + Citations]
        R --> CL
    end
```
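In code, the entire RAG flow above collapses into a single Bedrock Agent Runtime call. A minimal sketch, assuming placeholder knowledge base ID and model ARN values (the hybrid top-k=8 settings mirror the diagram; this is an illustration, not the project's exact `app.py`):

```python
def build_rag_request(query: str, kb_id: str, model_arn: str, top_k: int = 8) -> dict:
    """Assemble the retrieve_and_generate payload: hybrid k-NN + BM25, top-k chunks."""
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {
                        "numberOfResults": top_k,
                        "overrideSearchType": "HYBRID",  # k-NN + BM25
                    }
                },
            },
        },
    }


if __name__ == "__main__":
    import boto3  # imported here so the payload builder stays dependency-free

    client = boto3.client("bedrock-agent-runtime")
    request = build_rag_request(
        "What is an A1C test?",
        kb_id="YOUR_KB_ID",                 # placeholder
        model_arn="YOUR_LLAMA3_MODEL_ARN",  # placeholder
    )
    response = client.retrieve_and_generate(**request)
    print(response["output"]["text"])
```

The response also carries a `citations` list alongside the generated text, which is what feeds the inline source attributions in the UI.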
| Layer | Technology |
|---|---|
| UI | Chainlit (streaming chat with citations) |
| RAG Orchestration | LangChain + AWS Bedrock retrieve_and_generate |
| LLM | Meta Llama 3 8B Instruct (AWS Bedrock) |
| Embeddings | Amazon Titan Text Embeddings G1 (1536-dim) |
| Vector DB | Amazon OpenSearch Serverless (k-NN + BM25 hybrid) |
| Data Storage | Amazon S3 |
| Data Collection | Python, Requests, BeautifulSoup, PRAW |
| Evaluation | TruLens-style metrics (Sentence Transformers) |
- Dual-source RAG — 65% authoritative sources, 35% patient forum content per query, automatically balanced by hybrid retrieval
- Transparent citations — every response includes inline citations with source name, type (clinical vs. forum), title, URL, and a content excerpt
- Hybrid search — combines semantic similarity (k-NN) with keyword matching (BM25) for better retrieval than either alone
- Evaluated quality — custom TruLens-style evaluation across 3 dimensions on 6 representative queries
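OpenSearch Serverless performs the hybrid fusion server-side, but the idea behind combining the two rankers is easy to sketch: normalize each ranker's scores onto a common scale, then take a weighted sum. A toy illustration (the min-max scheme and 50/50 weights are assumptions for clarity, not the service's exact internals):

```python
def min_max(scores: dict) -> dict:
    """Rescale a {doc: score} map onto [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_fuse(knn_scores: dict, bm25_scores: dict,
                w_knn: float = 0.5, w_bm25: float = 0.5) -> list:
    """Fuse semantic (k-NN) and keyword (BM25) scores after normalization."""
    knn_n, bm25_n = min_max(knn_scores), min_max(bm25_scores)
    docs = set(knn_n) | set(bm25_n)
    fused = {d: w_knn * knn_n.get(d, 0.0) + w_bm25 * bm25_n.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Toy example: doc "a" ranks high semantically, "c" high lexically.
knn = {"a": 0.92, "b": 0.55, "c": 0.40}
bm25 = {"a": 3.1, "b": 2.0, "c": 9.4}
print(hybrid_fuse(knn, bm25))  # "a" wins overall, "c" beats "b" on keywords
```

A query phrased in lay terms ("sugar crash at night") can still surface a clinical article on nocturnal hypoglycemia via the semantic side, while exact drug names stay reliable via BM25.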
| Metric | Score | What it measures |
|---|---|---|
| Context Relevance | 0.72 | Retrieved docs match the query |
| Answer Relevance | 0.68 | Response stays on topic |
| Groundedness | 0.74 | Claims are supported by retrieved sources |
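All three metrics reduce to embedding cosine similarity: between the query and the retrieved chunks (context relevance), the query and the answer (answer relevance), or each answer claim and its best-supporting chunk (groundedness). A minimal sketch with toy 3-dimensional vectors standing in for Sentence Transformers embeddings (function names are illustrative, not the repository's exact code):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def context_relevance(query_emb, chunk_embs):
    """Mean query-to-chunk similarity: do retrieved docs match the query?"""
    return sum(cosine(query_emb, c) for c in chunk_embs) / len(chunk_embs)

def groundedness(claim_embs, chunk_embs):
    """Each answer claim scored against its best-supporting chunk, then averaged."""
    return sum(max(cosine(cl, ch) for ch in chunk_embs) for cl in claim_embs) / len(claim_embs)

# Toy vectors; real ones would come from a Sentence Transformers model.
q = [1.0, 0.0, 0.0]
chunks = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
print(round(context_relevance(q, chunks), 3))  # 0.497 — one relevant, one off-topic chunk
```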
Authoritative (201 articles): Mayo Clinic, CDC, Healthline, MedicalNewsToday, Medical Xpress, Johns Hopkins, MedlinePlus, WebMD
Patient Forums (1,525 threads): r/diabetes, r/diabetes_t2, r/type2diabetes, r/prediabetes, r/hypertension, r/heartdisease, r/cardiology, r/askcardiology
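Before upload to S3, both source types need to land in one shared schema so the UI can label each citation as clinical or forum. A sketch of what that normalization might look like (the field names are assumptions, not the repository's exact preprocessing code):

```python
def normalize_doc(raw: dict, source_type: str) -> dict:
    """Map a scraped article or Reddit thread onto one shared document schema.

    source_type: "clinical" (Mayo Clinic, CDC, ...) or "forum" (r/diabetes, ...).
    """
    if source_type not in ("clinical", "forum"):
        raise ValueError(f"unknown source_type: {source_type}")
    return {
        "title": raw["title"].strip(),
        "text": raw["body"].strip(),
        "url": raw["url"],
        "source": raw["source"],      # e.g. "CDC" or "r/diabetes"
        "source_type": source_type,   # drives the clinical-vs-forum citation label
    }

# Hypothetical scraped thread (fields as PRAW-style output might be flattened).
thread = {
    "title": "A1C finally under 7!",
    "body": "What worked for me was ...",
    "url": "https://reddit.com/r/diabetes/example",
    "source": "r/diabetes",
}
doc = normalize_doc(thread, "forum")
print(doc["source_type"])  # forum
```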
- Python 3.10+
- AWS account with Bedrock access (Llama 3 8B + Titan Embeddings enabled)
- Amazon OpenSearch Serverless collection
- S3 bucket with processed documents
Install dependencies and configure AWS credentials:

```bash
pip install -r requirements.txt
aws configure
```

Start the app:

```bash
cd app
chainlit run app.py
```

Then open http://localhost:8000 in your browser.
```
.
├── app/
│   ├── app.py                    # Chainlit app + Bedrock RAG integration
│   └── chainlit.md               # Welcome page content
├── data-collection/
│   ├── scripts/                  # Scraping and preprocessing scripts
│   └── data/                     # Collected medical articles and forum threads
├── evaluation/
│   ├── evaluations.py            # TruLens-style evaluation framework
│   ├── consolidated_trulens.png  # Metrics visualization
│   └── indi_query.png            # Per-query analysis
├── docs/                         # Final report, proposal, and presentation
├── knowledge-base/               # Processed documents for S3/Bedrock KB
└── requirements.txt
```
| Consolidated Metrics | Per-Query Analysis |
|---|---|
| ![Consolidated metrics](evaluation/consolidated_trulens.png) | ![Per-query analysis](evaluation/indi_query.png) |
MIT License — see LICENSE


