shitijkarsolia/trustmedAI

TrustMed AI

An AI-powered chatbot that bridges medical expertise with real-world patient experience.

Python · AWS Bedrock · LangChain · Chainlit


Overview

TrustMed AI helps patients find clear, reliable information about Diabetes and Cardiovascular Disease. It combines authoritative medical content (Mayo Clinic, CDC, MedlinePlus) with patient discussions from Reddit to deliver answers that are accurate, empathetic, and cited.

Built as a Retrieval-Augmented Generation (RAG) system, it is designed to ground every response in retrieved, citable sources. This addresses two weaknesses of single-source systems: clinical sources alone are often hard for patients to understand, and patient forums alone are unverified.

  • Dataset: 1,727 documents, including 201 authoritative medical articles and 1,525 Reddit patient discussions
  • Retrieval: Hybrid k-NN + BM25 search (Amazon OpenSearch Serverless)
  • LLM: Meta Llama 3 8B Instruct via AWS Bedrock
  • Evaluation: Context Relevance 0.72 | Answer Relevance 0.68 | Groundedness 0.74


Architecture

graph TD
    subgraph Data Pipeline
        A1[Mayo Clinic\nCDC, Healthline\n+5 more] -->|scrape| P[Python\nPreprocessing]
        A2[Reddit\nr/diabetes\nr/heartdisease\n+6 more] -->|scrape| P
        P -->|1727 docs| S3[Amazon S3]
        S3 --> KB[Bedrock Knowledge Base\nTitan Embeddings v2]
        KB --> OS[OpenSearch Serverless\nVector Index]
    end

    subgraph RAG Flow
        U[User Query] --> CL[Chainlit UI]
        CL --> RG[Bedrock retrieve_and_generate\nHybrid k-NN + BM25\ntop-k=8]
        OS --> RG
        RG --> LLM[Llama 3 8B Instruct]
        LLM --> R[Response + Citations]
        R --> CL
    end
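
The RAG flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the code in app/app.py: the helper names (`build_rag_request`, `extract_citations`, `ask`), the knowledge base ID, and the model ARN are assumptions introduced here. The request shape follows the boto3 `bedrock-agent-runtime` `retrieve_and_generate` call, with hybrid search and top-k=8 as shown in the diagram.

```python
from typing import Any

# Assumed ARN for Llama 3 8B Instruct; adjust the region and model for your account.
MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/meta.llama3-8b-instruct-v1:0"


def build_rag_request(query: str, kb_id: str, model_arn: str = MODEL_ARN,
                      top_k: int = 8) -> dict[str, Any]:
    """Build keyword arguments for bedrock-agent-runtime retrieve_and_generate."""
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {
                        "numberOfResults": top_k,        # top-k retrieved chunks
                        "overrideSearchType": "HYBRID",  # k-NN + keyword search
                    }
                },
            },
        },
    }


def extract_citations(response: dict[str, Any]) -> list[dict[str, str]]:
    """Flatten the response's citations into one record per retrieved reference."""
    records = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri", "")
            text = ref.get("content", {}).get("text", "")
            records.append({"uri": uri, "excerpt": text[:200]})
    return records


def ask(query: str, kb_id: str) -> tuple[str, list[dict[str, str]]]:
    """Call Bedrock (needs boto3 and AWS credentials); return answer and citations."""
    import boto3  # imported lazily so the helpers above work without AWS set up

    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve_and_generate(**build_rag_request(query, kb_id))
    return response["output"]["text"], extract_citations(response)
```

Keeping the request builder separate from the network call makes the retrieval settings easy to inspect and test without AWS access. The sketch only surfaces the S3 URI and an excerpt for each reference; richer citation fields such as source name or type would have to come from document metadata.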

Tech Stack

| Layer | Technology |
|---|---|
| UI | Chainlit (streaming chat with citations) |
| RAG Orchestration | LangChain + AWS Bedrock retrieve_and_generate |
| LLM | Meta Llama 3 8B Instruct (AWS Bedrock) |
| Embeddings | Amazon Titan Text Embeddings G1 (1536-dim) |
| Vector DB | Amazon OpenSearch Serverless (k-NN + BM25 hybrid) |
| Data Storage | Amazon S3 |
| Data Collection | Python, Requests, BeautifulSoup, PRAW |
| Evaluation | TruLens-style metrics (Sentence Transformers) |

Key Features

  • Dual-source RAG — 65% authoritative sources, 35% patient forum content per query, automatically balanced by hybrid retrieval
  • Transparent citations — every response includes inline citations with source name, type (clinical vs. forum), title, URL, and a content excerpt
  • Hybrid search — combines semantic similarity (k-NN) with keyword matching (BM25) for better retrieval than either alone
  • Evaluated quality — custom TruLens-style evaluation across 3 dimensions on 6 representative queries
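
To make the hybrid search idea concrete, here is a small sketch of score fusion. It is an illustration only: OpenSearch Serverless performs its own hybrid scoring internally, and the min-max normalization, the 0.5 weight, the function names, and the document IDs below are all assumptions chosen for the example.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(knn: dict[str, float], bm25: dict[str, float],
                weight: float = 0.5, top_k: int = 8) -> list[tuple[str, float]]:
    """Fuse semantic (k-NN) and keyword (BM25) scores into one ranking.

    `weight` is the share given to the k-NN score; a document missing from
    one result list contributes 0 for that component.
    """
    knn_n, bm25_n = min_max(knn), min_max(bm25)
    fused = {
        doc: weight * knn_n.get(doc, 0.0) + (1 - weight) * bm25_n.get(doc, 0.0)
        for doc in set(knn_n) | set(bm25_n)
    }
    # Sort by fused score (descending), breaking ties by document ID.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Hypothetical scores for four documents.
knn_scores = {"mayo_a1c": 0.82, "reddit_a1c": 0.78, "cdc_bp": 0.40}
bm25_scores = {"mayo_a1c": 7.0, "cdc_bp": 9.0, "webmd_diet": 3.0}
ranking = hybrid_rank(knn_scores, bm25_scores)
print([doc for doc, _ in ranking])
```

A document that scores well on both signals (here `mayo_a1c`) rises to the top, while documents found by only one retriever still appear in the ranking, which is the practical benefit of combining the two.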

Evaluation Results

| Metric | Score | What it measures |
|---|---|---|
| Context Relevance | 0.72 | Retrieved docs match the query |
| Answer Relevance | 0.68 | Response stays on topic |
| Groundedness | 0.74 | Claims are supported by retrieved sources |
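
One plausible way to compute similarity-based versions of these three metrics is sketched below. This is not the code in evaluation/evaluations.py: the formulas, the function names, and the bag-of-words `toy_embed` stand-in (used here so the example runs without downloading a Sentence Transformers model) are assumptions for illustration.

```python
import math
import re
from collections import Counter
from typing import Callable

Vector = dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in embedder: bag-of-words counts (an assumption, not the project's model)."""
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def evaluate(query: str, contexts: list[str], answer: str,
             embed: Callable[[str], Vector] = toy_embed) -> dict[str, float]:
    """Score one query/contexts/answer triple on three similarity-based metrics."""
    q = embed(query)
    ctx = [embed(c) for c in contexts]
    # Context relevance: average similarity between the query and each retrieved chunk.
    context_relevance = sum(cosine(q, c) for c in ctx) / len(ctx) if ctx else 0.0
    # Answer relevance: similarity between the query and the generated answer.
    answer_relevance = cosine(q, embed(answer))
    # Groundedness: for each answer sentence, take its best-matching chunk, then average.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    groundedness = (
        sum(max(cosine(embed(s), c) for c in ctx) for s in sentences) / len(sentences)
        if sentences and ctx else 0.0
    )
    return {
        "context_relevance": context_relevance,
        "answer_relevance": answer_relevance,
        "groundedness": groundedness,
    }


scores = evaluate(
    query="what is a1c",
    contexts=["a1c is a blood test"],
    answer="A1c is a blood test.",
)
print(scores)
```

Averaging the per-query results over a set of representative queries would yield aggregate scores of the kind reported in the table.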

Data Sources

Authoritative (201 articles): Mayo Clinic, CDC, Healthline, MedicalNewsToday, Medical Xpress, Johns Hopkins, MedlinePlus, WebMD

Patient Forums (1,525 threads): r/diabetes, r/diabetes_t2, r/type2diabetes, r/prediabetes, r/hypertension, r/heartdisease, r/cardiology, r/askcardiology
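
As an illustration of how collected documents might be prepared for the knowledge base, the sketch below tags each document as clinical or forum content and writes a metadata sidecar file next to it. The file layout, the attribute names, and the `source_type` and `write_document` helpers are assumptions, not the project's actual preprocessing. The `<file>.metadata.json` naming and the `metadataAttributes` key follow the convention Bedrock Knowledge Bases use for S3 metadata files.

```python
import json
from pathlib import Path
from urllib.parse import urlparse


def source_type(url: str) -> str:
    """Classify a URL as 'forum' (Reddit) or 'clinical' (everything else)."""
    host = urlparse(url).netloc.lower()
    is_reddit = host == "reddit.com" or host.endswith(".reddit.com")
    return "forum" if is_reddit else "clinical"


def write_document(out_dir: Path, doc_id: str, title: str, url: str, text: str) -> Path:
    """Write the document text plus a metadata sidecar file for ingestion."""
    out_dir.mkdir(parents=True, exist_ok=True)
    doc_path = out_dir / f"{doc_id}.txt"
    doc_path.write_text(text, encoding="utf-8")
    metadata = {
        "metadataAttributes": {
            "title": title,
            "url": url,
            "source_type": source_type(url),  # hypothetical attribute name
        }
    }
    sidecar = Path(f"{doc_path}.metadata.json")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return doc_path
```

Storing the source type and URL as metadata at ingestion time is one way to let the app label each citation as clinical or forum content later on.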


Setup

Prerequisites

  • Python 3.10+
  • AWS account with Bedrock access (Llama 3 8B + Titan Embeddings enabled)
  • Amazon OpenSearch Serverless collection
  • S3 bucket with processed documents

Install dependencies

pip install -r requirements.txt

Configure AWS

aws configure

Run the app

cd app
chainlit run app.py

Open http://localhost:8000 in your browser.


Project Structure

.
├── app/
│   ├── app.py                    # Chainlit app + Bedrock RAG integration
│   └── chainlit.md               # Welcome page content
├── data-collection/
│   ├── scripts/                  # Scraping and preprocessing scripts
│   └── data/                     # Collected medical articles and forum threads
├── evaluation/
│   ├── evaluations.py            # TruLens-style evaluation framework
│   ├── consolidated_trulens.png  # Metrics visualization
│   └── indi_query.png            # Per-query analysis
├── docs/                         # Final report, proposal, and presentation
├── knowledge-base/               # Processed documents for S3/Bedrock KB
└── requirements.txt

Screenshots

  • Consolidated Metrics: evaluation/consolidated_trulens.png
  • Per-Query Analysis: evaluation/indi_query.png

License

MIT License — see LICENSE
