FinSight: Financial Intelligence & Predictive Forecasting

Website: https://fin-sight-v3.vercel.app/

Transforming raw time-series banking data into proactive business intelligence using Conformal Prediction & LLM-Driven Scenario Analysis.

Submission for the NatWest Group Code for Purpose – India Hackathon 2026

System Design Document


1. Overview

FinSight is an advanced financial intelligence platform built specifically for SME Bankers, Retail Banking Managers, and Agri-Bankers.

What it does: It processes raw banking time-series data using a deterministic multi-model mathematical engine to generate statistically rigorous forecasts. It then passes the resulting statistics (forecast ranges, trust scores, and detected anomalies) to a constrained LLM reasoning agent, which generates contextual, plain-English business narratives.

The problem it solves: Traditional forecasting tools require data scientists to interpret "confidence intervals" and "regime shifts," whereas business leaders need immediate, actionable impacts. FinSight bridges this gap by democratizing advanced statistics—allowing bankers to ask natural language questions (e.g., "What if agriculture supply drops by 10%?") and immediately see visually robust, dynamically adjusted financial models without writing a single line of code. This fundamentally accelerates risk assessment and credit liquidity analysis on the branch floor.


2. Tech Stack

FinSight is built on a modern web stack chosen to handle intensive numerical computation on the backend and dynamic, interactive charting in the browser.

  • Frontend: React 18, Vite, TypeScript, Recharts (SVG charting), Framer Motion (animations), and modular CSS.
  • Backend: FastAPI (Python 3.11), Pydantic (data validation).
  • Mathematical Core: Pandas, NumPy, SciKit-Learn (Linear Regression), Statsmodels (Holt-Winters ETS).
  • Generative AI Layer: Google Gemini API, Groq API (for sub-second NL intent parsing).
  • Deployment: Vercel (Frontend Hosting), Render (Backend Hosting).

Core Dependencies

Frontend (package.json)

  • react, react-dom (UI Framework)
  • recharts (SVG Data Visualizations)
  • framer-motion (Micro-animations)
  • lucide-react (Iconography)
  • jspdf, jspdf-autotable (PDF Export Generation)
  • clsx, tailwind-merge (Styling utilities)

Backend (requirements.txt)

  • fastapi>=0.111.0, uvicorn[standard]>=0.29.0, pydantic>=2.7.1 (Core API boundaries)
  • pandas>=2.1.4, numpy>=1.26.4 (Data structures)
  • statsmodels>=0.14.1, scikit-learn>=1.3.2 (Math Engine)
  • google-genai>=1.16.0, groq>=0.11.0 (LLM Cognitive Layer)
  • python-dotenv>=1.0.1, httpx>=0.28.1 (Config & HTTP boundaries)

3. Install & Run Instructions

We guarantee FinSight is easily reproducible locally. You will need Node.js 18+ and Python 3.11+.

Step 1: Clone the Repository

git clone https://github.com/Pranay22077/FinSight.git
cd FinSight

Step 2: Setup the Python Backend

cd backend
python -m venv venv

# On Mac/Linux:
source venv/bin/activate
# On Windows:
# venv\Scripts\activate

# Install dependencies (requirements.txt sets minimum versions)
pip install -r requirements.txt
cp .env.example .env

Open .env and insert your Gemini API Key from Google AI Studio (GEMINI_API_KEY=AIzaSy...) or Groq API Key.

# Start the Backend Server (FastAPI)
uvicorn main:app --reload --port 8000

Step 3: Setup the React Frontend

Open a new terminal window in the project root:

cd frontend
npm install
npm run dev

The application will open automatically at http://localhost:3000.


4. What the System Can Do Right Now

All features documented below are 100% implemented and fully functional within the provided codebase. There are no placeholder features.

| Capability | How it works |
| --- | --- |
| 3-domain forecasting | Switch between SME Cash Flow, Retail Banking, and Agri-Banking. Each domain has its own CSV, seasonality config, and currency. |
| 30-day AI forecast | Holt-Winters ETS is the primary forecast line. Horizon is configurable (7–90 days) via the POST body. |
| Uncertainty quantification | Split Conformal Prediction produces a distribution-free 90% prediction interval, calibrated on held-out residuals. It is NOT a standard-deviation band. |
| Expanding uncertainty band | The band widens as q̂ × √t, so points further in the future get a wider band. Rendered as a gradient-opacity area on the chart. |
| Trust Score (0–100) | Computed per day by measuring disagreement between 3 models. Drives line colour (blue → amber → red). |
| Anomaly detection | Historical spikes/dips flagged at >2.5σ from a 30-day rolling mean. Rendered as red dots on the chart. |
| What-If NL query | User types in plain English. The LLM extracts magnitude/direction/delay. The forecast re-runs with modified data. |
| Baseline toggle | Optional MA (7-day) overlay. A toggle is also available for the LR macro trend. |
| LLM Narrative | Headline + recommendation + narrative paragraph generated by Gemini / Groq. |
| LLM provider switching | Swap API keys in your .env to switch between fast Groq parsing and deeper Gemini analysis. |
| Health indicator | Navbar shows live status: which LLM is active, how many datasets are loaded, and whether the LLM is reachable. |
| Demo-safe fallback | If the LLM is down, the app uses hardcoded professional narratives instead of failing. |
| Routing & View Splitting | Structured UI with upload workflows (DropView), summaries (DigestView), and interactive analysis (DashboardView). |
| Full-Stack Integration | The React + Vite frontend talks to the FastAPI backend orchestrator rather than relying on local mock state. |
| Data Ingestion | Support for row-by-row live CSV uploads. |
| Light & Dark Theme | Dynamic CSS variables toggle between an analytical dark mode and a daytime light mode. |
| Data Exporting | Export predictions as JSON context payloads or as tabular raw time-series CSV. |

What is NOT built yet

  • User authentication / multi-user sessions
  • Persistent storage (database layer)

5. Usage Examples & The User Journey

Step 1: Navigating the Dashboard & Selecting Datasets

The user selects a banking sector (e.g., SME Revenue, Bank Transactions, or Agri-Prices), which loads that domain's dataset, seasonality configuration, and currency.

Dashboard Overview

Step 2: Running the Forecast

Running the prediction renders the Conformal Prediction band with progressively fading opacity, with anomalies from the rolling z-score detector flagged in red.

Forecasting Result

Step 3: Natural Language Scenario Simulation

The banker wants to model a stress scenario. They type: "What if revenue drops 20%?" The LLM parses the query into structured simulation parameters (magnitude, direction, delay), and the forecast is re-run with those parameters applied.

Scenario Input

API Example: Behind the scenes

curl -X POST "http://localhost:8000/api/scenario/parse" \
     -H "Content-Type: application/json" \
     -d '{"query": "What if revenue drops 20%?", "domain": "sme_revenue"}'

Response passed to the math engine:

{
  "scenario_type": "revenue",
  "magnitude": -0.20,
  "explanation": "Calculated a negative 20% operational revenue adjustment.",
  "confidence": 0.98
}
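The same call can be made from Python. This is a minimal sketch using only the standard library; it assumes the backend from the install steps is listening on http://localhost:8000, and the helper names `build_scenario_request` and `parse_scenario` are illustrative, not part of the FinSight codebase.

```python
import json
import urllib.request

# Assumed base URL: the local dev server started in the install steps.
BASE_URL = "http://localhost:8000"


def build_scenario_request(query: str, domain: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"query": query, "domain": domain}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/scenario/parse",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_scenario(query: str, domain: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON reply (needs the backend running)."""
    request = build_scenario_request(query, domain)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `parse_scenario("What if revenue drops 20%?", "sme_revenue")` against a running backend should return a dict shaped like the JSON response above.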

6. Project File Map

FinSight/
│
├── LICENSE                          # Apache 2.0 — required for NatWest
├── .gitignore                       # Excludes .env, node_modules, __pycache__
├── .env.example                     # Template for judges to copy
├── README.md                        # Primary system documentation
│
├── backend/
│   ├── main.py                      # ★ FastAPI app — the orchestration hub
│   ├── schemas.py                   # ★ Pydantic models — all data contracts
│   ├── requirements.txt             # Python deps (minimum versions)
│   ├── .env                         # ★ Environment configuration toggles
│   │
│   ├── math_engine.py               # ★ Engine 1: MA, ETS, LR, Trust, Conformal Prediction
│   ├── llm_copilot.py               # ★ Engine 2: Groq/Gemini translation logic
│   │
│   └── data/                        # Static datasets (Kaggle pre-processed)
│       ├── sme_revenue_gold.csv
│       ├── bank_transactions_gold.csv
│       └── agri_prices_gold.csv
│
└── frontend/ (React UI)
    └── src/
        ├── main.tsx                 # React root mount
        ├── services/api.ts          # Backend fetch routes & CSV ingestion parsing
        └── app/
            ├── App.tsx              # ★ App Shell wrapper
            ├── routes.tsx           # ★ BrowserRouter handling navigation between views
            ├── context/
            │   └── DomainContext.tsx # ★ Universal State & API caller across the App
            └── components/
                ├── layout/Root.tsx  # Core navigation sidebar wrapper
                └── views/
                    ├── DropView.tsx      # ★ Data CSV Ingestion + Domain Selector
                    ├── DigestView.tsx    # ★ 5-Second Exec Summary (KPI logic + Sparkline)
                    ├── DashboardView.tsx # ★ Deep Dive AI Interactive Panel
                    └── ExportView.tsx    # ★ JSON & CSV Local blob downloads

★ = Core files. Understand the components, engines, and the context, and you understand the entire system.


7. System Architecture & Request Lifecycle — End to End

We built FinSight on a highly decoupled architecture, strictly separating the mathematical engine from the UI and the LLM wrappers.

%%{init: { 'theme': 'dark', 'flowchart': { 'nodeSpacing': 70, 'rankSpacing': 90, 'curve': 'basis' } } }%%
flowchart TB
    classDef user      fill:#0c1929,stroke:#38bdf8,stroke-width:2.5px,color:#e0f2fe,font-weight:bold,padding:12px
    classDef gateway   fill:#172554,stroke:#60a5fa,stroke-width:2px,color:#bfdbfe,font-weight:bold
    classDef validator fill:#1e293b,stroke:#94a3b8,stroke-width:1.5px,color:#cbd5e1
    classDef mathcore  fill:#052e16,stroke:#4ade80,stroke-width:2px,color:#bbf7d0,font-weight:bold
    classDef conformal fill:#0f2d1f,stroke:#34d399,stroke-width:1.5px,color:#a7f3d0
    classDef payload   fill:#0a2019,stroke:#6ee7b7,stroke-width:1.5px,color:#d1fae5
    classDef llmagent  fill:#2e1065,stroke:#a78bfa,stroke-width:2px,color:#ede9fe,font-weight:bold
    classDef extsvc    fill:#1e0a4a,stroke:#c084fc,stroke-width:2px,color:#f3e8ff
    classDef ui        fill:#0f2044,stroke:#38bdf8,stroke-width:2px,color:#bae6fd,font-weight:bold

    U(["  Banker  /  End User  "]):::user

    subgraph FE ["  REACT FRONTEND   —   Vite + TypeScript  "]
        direction TB
        UI(["  Dashboard   |   Drop   |   Digest   |   Export  "]):::ui
    end

    subgraph BE ["  PYTHON BACKEND   —   FastAPI  "]
        direction TB
        GW(["  FastAPI Gateway   —   main.py  "]):::gateway
        VAL(["  Pydantic Request Validator   —   schemas.py  "]):::validator

        subgraph MATH ["  MATHEMATICAL CORE   —   math_engine.py  "]
            direction TB
            ENG(["  Math Engine\n  Holt-Winters  ·  OLS Regression  ·  Rolling Z-Score Anomaly  "]):::mathcore
            CP(["  Conformal Prediction Engine\n  90th-pct Residual Bounds  ·  sqrt(t) Expansion  "]):::conformal
            AR(["  Forecast Payload\n  ChartPoints  ·  Trust Scores  ·  Anomaly Flags  "]):::payload
        end

        subgraph LLM_BE ["  LLM COGNITIVE LAYER   —   llm_copilot.py  "]
            direction TB
            SA(["  Scenario Agent\n  Natural Language   -->   SimulationParams  "]):::llmagent
            NA(["  Narrative Agent\n  Statistics   -->   Headline  +  Insight  "]):::llmagent
        end
    end

    subgraph EXT ["  EXTERNAL AI SERVICES  "]
        direction LR
        GROQ(["  Groq API\n  Ultra-fast Intent Parsing  "]):::extsvc
        GEM(["  Gemini API\n  Deep Narrative Generation  "]):::extsvc
    end

    U        -- "1  HTTP POST  +  NL Query"        --> GW
    GW       -- "2  Route  +  Deserialise"          --> VAL
    VAL      -- "3  Validated Request"               --> ENG
    VAL      -- "3  Natural Language Query"          --> SA
    SA       -. "4  Constrained Prompt"         .-> GROQ
    GROQ     -. "4  Structured JSON Params"      .-> SA
    SA       -- "5  Delta Simulation Params"         --> ENG
    ENG      -- "6  Residual Series"                 --> CP
    CP       -- "7  Confidence Bounds  +  Opacity"   --> AR
    ENG      -- "7  ETS  ·  MA  ·  LR  ·  Trust"    --> AR
    AR       -- "8  Statistical Context"             --> NA
    NA       -. "9  Constrained Prompt"         .-> GEM
    GEM      -. "9  Headline  +  Narrative"      .-> NA
    AR       -- "10  Aggregated Forecast JSON"       --> GW
    NA       -- "10  Insight Payload"                --> GW
    GW       -- "10  ForecastResponse"               --> UI
    UI       -- "11  Chart Re-renders"               --> U

The Exact Lifecycle: Trace of "What if revenue drops 20%?"

User (browser)
  │
  ├── [1] components/views/DashboardView.tsx: handleSubmit() fires
  │         → calls runForecast({ nl_query: "What if revenue drops 20%?" })
  │
  ├── [2] context/DomainContext.tsx: runForecast is triggered
  │         → cancels any currently active requests via AbortController
  │         → POST /api/forecast via `fetchForecast` (in services/api.ts)
  │         payload = { domain: "sme_revenue", horizon_days: 30, nl_query: "..." }
  │
  └── FastAPI backend receives request
        │
        ├── [3] Pydantic validation (schemas.py)
        │       ForecastRequest model validates: domain is Literal, horizon is 7-90, etc.
        │       If invalid → 422 Unprocessable Entity, never reaches engine code
        │
        ├── [4] main.py: forecast() function
        │       Reads sme_revenue_gold.csv from RAM (loaded at startup, not from disk)
        │       Converts to pandas Series
        │
        ├── [5] llm_copilot.py: parse_nl_intent()
        │       Sends prompt to LLM: "Extract What-If params from: What if revenue drops 20%?"
        │       LLM returns JSON: { magnitude: -0.20, direction: "decrease", payment_delay_days: 0 }
        │       (if LLM fails → rule-based regex fallback: finds "20%" and "drops" → -0.20)
        │       → SimulationParams: { magnitude: -0.20, direction: "decrease" }
        │
        ├── [6] math_engine.py: run_math_engine()
        │       apply_scenario(): scales last 30 days of series by 0.80 (1 + -0.20)
        │       detect_anomalies(): finds historical points >2.5σ from 30-day rolling mean
        │       compute_ma_baseline(): rolling(7).mean() → flat array for 30 future days
        │       compute_ets_forecast(): Holt-Winters with weekly seasonality → 30-day array
        │       compute_lr_trend(): OLS slope on time index → 30-day array
        │       compute_trust_scores(): CV = std([MA,ETS,LR]) / mean([MA,ETS,LR]) → per-day scores
        │
        ├── [7] math_engine.py: run_conformal_engine()
        │       Split: train=first 70% (255 rows), calibration=last 30% (110 rows)
        │       Train ETS on training set
        │       Predict calibration set → compute residuals: |actual - predicted|
        │       q̂ = 90th percentile of residuals
        │       upper_bound = ETS forecast + q̂ × √t (t=1..30)
        │       lower_bound = ETS forecast - q̂ × √t
        │       opacity_scores = 1.0 → ~0.2 (decreases with t)
        │
        ├── [8] main.py: build_chart_data()
        │       Last 60 historical rows → ChartPoint with actual= value, is_anomaly= flag
        │       30 future rows → ChartPoint with forecast=, upper/lower=, opacity=, trust_score=
        │       Combined into single flat list (90 ChartPoints total)
        │
        ├── [9] llm_copilot.py: generate_narrative()
        │       Sends prompt: "ETS range £5k-£9k, avg trust 55, 2 anomalies, scenario -20%..."
        │       LLM returns JSON: { headline: "...", recommendation: "...", narrative: "..." }
        │       (if fails → fallback dict with hardcoded SME narrative)
        │
        └── [10] FastAPI returns ForecastResponse JSON
                chart_data: [ChartPoint × 90]
                insights: { trust_score: 55, headline: "...", recommendation: "...", }
                anomalies: [{ date: "2025-03-17", label: "Revenue dip: -4.2σ", ... }]
                scenario_applied: { magnitude: -0.20, direction: "decrease" }
                parsed_nl_query: { magnitude: -0.20, ... }

Frontend receives response
  ├── context/DomainContext.tsx: updates `forecastData` state.
  └── DashboardView components: Auto-rerenders based on the new context variable.

Total time: ~100ms for math engines + API translation time.


8. The Dataset — Deep Dive

The platform ingests real-world, pre-processed datasets sourced from Kaggle and other verified financial repositories. They have been normalized into static "Gold" CSV files via Jupyter Notebook pipelines.

Dataset 1: sme_revenue_gold.csv

  • Source Context: Online Retail II UCI Dataset (Kaggle)
  • Schema: date (YYYY-MM-DD), daily_gross_revenue (float)
  • Rows: 365 (Jan 2025 – Dec 2025)
  • Structure: Derived from Kaggle SME Retail datasets. It retains weekend volume dips (approx. 0.35× weekday volume) and end-of-month invoice rushes, and contains an embedded anomaly in March that simulates a sharp, localized supply-chain drop.
  • Backtesting Performance Metrics:
    • MAE: 7,133.80 | RMSE: 8,756.62 | MAPE: 43.76%
    • Conformal Coverage: 98.57% (Exceeds 90% threshold guarantee)
    • Avg Trust Score: 54.06

Dataset 2: bank_transactions_gold.csv

  • Source Context: Bank Transaction Dataset for Fraud Detection (Kaggle)
  • Schema: date (YYYY-MM-DD), running_balance (float)
  • Rows: 365
  • Structure: Sourced from authentic checking account distributions matching quarterly rent disbursements, salary flows on the 1st of every month, and summer holiday overspends. Tests the engine's ability to alert against impending overdraft bounds.
  • Backtesting Performance Metrics:
    • MAE: 480.20 | RMSE: 587.29 | MAPE: 7.71%
    • Conformal Coverage: 97.14% (Exceeds 90% threshold guarantee)
    • Avg Trust Score: 60.08
    • Improvement Note: Holt-Winters outperforms the simple MA baseline by 86.79% on this stable longitudinal dataset.

Dataset 3: agri_prices_gold.csv

  • Source Context: Agriculture Vegetables & Fruits Time Series Prices (Kaggle)
  • Schema: date (YYYY-MM-DD), commodity, state, price_per_quintal (float)
  • Rows: 104 (weekly, Jan 2024 – Dec 2025)
  • Structure: Derived from Kaggle agricultural supply chain datasets. It captures monsoon supply shortages and harvest-season crashes as 52-week sinusoidal seasonality.
  • Backtesting Performance Metrics:
    • MAE: 8.44 | RMSE: 9.95 | MAPE: 19.59%
    • Conformal Coverage: 100.0% (Exceeds 90% threshold guarantee)
    • Avg Trust Score: 64.57

9. Backend: Engine 1 — Predictive Mathematics

File: backend/math_engine.py
Entry point: run_forecast(df, domain, horizon, alpha, show_baseline, scenario_params)

Function: apply_scenario(series, scenario)

Takes the historical series and mutates the tail (last 30 days) to simulate the scenario before the models run.

# Revenue modifier: multiply last 30 days by (1 + magnitude)
# e.g. magnitude=-0.20 → scale last 30 days to 80% of actual
# The _pct name and the /100 imply percent units, i.e. magnitude × 100 (-20.0 here)
modified = modified * (1.0 + revenue_modifier_pct / 100.0)

# Payment delay: shift series forward simulating delayed cash inflow
modified = modified.shift(payment_delay_days).fillna(0.0)

Why mutate the tail, not the future? The forecasting models learn trends from historical data. If you modify the future directly, you bypass the model entirely. By modifying the recent past, the models see a realistic "what if this had been happening" and project forward naturally.
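
The tail-scaling idea can be sketched in a few lines of plain Python. This is an illustration only: it covers the revenue modifier but not the payment-delay shift, takes `magnitude` as a fraction (-0.20 for a 20% drop), and the `tail_days` parameter name is an assumption.

```python
def apply_scenario(values, magnitude=0.0, tail_days=30):
    """Scale the last `tail_days` observations by (1 + magnitude).

    `magnitude` is a fraction: -0.20 means a 20% drop.
    Earlier observations are returned unchanged.
    """
    if tail_days <= 0:
        return list(values)
    head, tail = values[:-tail_days], values[-tail_days:]
    return list(head) + [v * (1.0 + magnitude) for v in tail]


# 60 flat days of revenue; stress the most recent 30 by -20%.
history = [100.0] * 60
stressed = apply_scenario(history, magnitude=-0.20)
```

The forecasting models are then fitted on `stressed` instead of `history`, so the projected curve starts from the lower recent level.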

Function: holt_winters_forecast(series, horizon, seasonality_period)

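import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

def holt_winters_forecast(series, horizon, seasonality_period=7):
    model = ExponentialSmoothing(
        series,
        trend="add",           # Additive trend: allows for upward/downward drift
        seasonal="mul",        # Multiplicative seasonality (requires strictly positive data)
        seasonal_periods=seasonality_period,  # 7 = weekly cycle on daily data
    )
    fit = model.fit(optimized=True)
    return np.asarray(fit.forecast(steps=horizon))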

Function: linear_regression_forecast(series, horizon)

Fits a single straight OLS regression line through all prior points to capture the true macro trend without seasonality noise confounding it.

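import numpy as np
from sklearn.linear_model import LinearRegression

def linear_regression_forecast(series, horizon):
    X = np.arange(len(series)).reshape(-1, 1)   # Time index as the single feature
    y = series.values                           # Observed values
    model = LinearRegression()
    model.fit(X, y)                             # OLS: minimizes Σ(y - (m×x + b))²
    future_X = np.arange(len(series), len(series) + horizon).reshape(-1, 1)
    return model.predict(future_X)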

Function: calculate_trust_score(ma, hw, lr)

This is the core differentiator: a per-day score of how closely the three models agree.

stack = np.array([ma, hw, lr])           # Shape (3, horizon): one row per model
mu    = np.mean(stack, axis=0)           # Per-day mean across models
sigma = np.std(stack, axis=0)            # Per-day standard deviation across models
cv    = sigma / (np.abs(mu) + 1e-6)      # Coefficient of variation (epsilon avoids division by zero)
# scaling: empirically tuned constant that sets how quickly disagreement lowers trust
trust = np.maximum(0.0, 100.0 - scaling * cv * 100.0)
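
To make the formula concrete, here is a standard-library sketch that scores two forecast days, one where the models agree and one where they diverge. The `scaling=2.0` value is purely illustrative; the source only says the constant is empirically tuned.

```python
import statistics


def trust_score(ma, hw, lr, scaling=2.0):
    """Per-day trust: 100 minus the scaled coefficient of variation, floored at 0."""
    scores = []
    for day_preds in zip(ma, hw, lr):
        mu = statistics.fmean(day_preds)
        sigma = statistics.pstdev(day_preds)  # population std, matching np.std's default
        cv = sigma / (abs(mu) + 1e-6)
        scores.append(max(0.0, 100.0 - scaling * cv * 100.0))
    return scores


# Day 1: models agree closely (100, 102, 98). Day 2: models diverge (100, 150, 50).
scores = trust_score(ma=[100.0, 100.0], hw=[102.0, 150.0], lr=[98.0, 50.0])
```

With this scaling, day 1 scores about 96.7 and day 2 about 18.4, which would push the second day into the low-trust range used by the narrative directives.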

Function: detect_anomalies(series, window, z_threshold)

# Rolling window for historical drift correction
anomalies = []
for i in range(window, len(values)):
    window_vals = values[i - window: i]   # Trailing window, excludes the current point
    mu  = window_vals.mean()
    std = window_vals.std() + 1e-6        # Epsilon guards against a flat window
    z   = abs(values[i] - mu) / std       # Z-score in local context

    if z > z_threshold:
        anomalies.append((i, z))          # Flag as critical anomaly
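
A self-contained worked example of the same rolling z-score rule, written against plain lists so it runs without NumPy. The return shape (a list of `(index, z)` pairs) and the toy series are assumptions made for illustration.

```python
import statistics


def detect_anomalies(values, window=30, z_threshold=2.5):
    """Return (index, z) pairs whose z-score against the preceding window exceeds the threshold."""
    flagged = []
    for i in range(window, len(values)):
        window_vals = values[i - window:i]
        mu = statistics.fmean(window_vals)
        std = statistics.pstdev(window_vals) + 1e-6
        z = abs(values[i] - mu) / std
        if z > z_threshold:
            flagged.append((i, z))
    return flagged


# 40 days alternating between 100 and 104, with one sharp dip on day 35.
series = [100.0 if d % 2 == 0 else 104.0 for d in range(40)]
series[35] = 60.0
hits = detect_anomalies(series)
```

Only day 35 is flagged. The days after the dip are not, because the dip itself inflates the local standard deviation once it enters the window.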

10. Backend: Engine 2 — Conformal Prediction

File: backend/math_engine.py (Integrated Math Component)

The Core Problem with Standard Deviation

Standard confidence intervals assume data is Gaussian (bell curve). Financial data has "fat tails" — extreme events are more common than a Gaussian predicts. If we use standard deviation, our confidence bands are systematically too narrow and will be violated more than expected.

Conformal Prediction solves this by making zero distributional assumptions. It only asks: "How wrong has our model actually been on real historical data?"

Step-by-Step Implementation

Step A: Split

n = len(series)
split = max(int(n * 0.70), 2 * seasonal_period + 1)
train_series = series.iloc[:split] # Oldest 70%
calib_series = series.iloc[split:] # Recent 30%

Step B: Train on train set, predict calibration set

calib_len = len(calib_series)
calib_forecast = holt_winters_forecast(train_series, calib_len, seasonal_period)

Step C: Compute residuals

residuals = np.abs(calib_series.values - calib_forecast)

Step D: Extract quantile

q_hat = float(np.quantile(residuals, 1 - alpha)) # 90th percentile
# Interpretation: "90% of the time, our model was within £{q_hat} of the truth"

Step E: Apply expanding bounds

time_factors = np.sqrt(np.arange(1, horizon + 1))
upper_bound  = forecast + q_hat * time_factors
lower_bound  = forecast - q_hat * time_factors

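The √t factor explained: The further into the future, the more uncertainty compounds. Error is assumed to grow in proportion to √t, as it does for a random walk. This widening is a modelling heuristic applied on top of the calibrated q̂, not part of the conformal calibration itself.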
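
Steps A to E can be combined into one short, dependency-free sketch. Two things here differ from the snippets above and are assumptions of this example: the calibration forecast comes from a stand-in model (the training mean) rather than Holt-Winters, and q̂ is taken at the finite-sample conformal rank ceil((n + 1)(1 - α)) rather than via np.quantile.

```python
import math


def conformal_band(series, forecast, alpha=0.10, train_frac=0.70):
    """Split-conformal band around `forecast`, widened by sqrt(t).

    Stand-in model (assumption): predicts the training mean for every
    calibration point. FinSight uses Holt-Winters at this step.
    """
    split = int(len(series) * train_frac)
    train, calib = series[:split], series[split:]
    calib_pred = sum(train) / len(train)
    residuals = sorted(abs(y - calib_pred) for y in calib)
    # Finite-sample conformal rank: ceil((n + 1) * (1 - alpha)), capped at n.
    n = len(residuals)
    rank = min(n, math.ceil((n + 1) * (1 - alpha)))
    q_hat = residuals[rank - 1]
    upper = [f + q_hat * math.sqrt(t) for t, f in enumerate(forecast, start=1)]
    lower = [f - q_hat * math.sqrt(t) for t, f in enumerate(forecast, start=1)]
    return q_hat, lower, upper


# Toy weekly pattern: values cycle through 100..106.
history = [100.0 + (d % 7) for d in range(100)]
q_hat, lower, upper = conformal_band(history, forecast=[103.0] * 4)
```

For this toy series q̂ is 3.0, so the band is ±3 on the first forecast day and ±6 on the fourth (√4 = 2).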


11. Backend: Engine 3 — LLM Cognitive Layer

Two Independent Modules

Module A (Pre)   → Scenario Agent    → Translates NL queries → SimulationParams dict
Module B (Post)  → Narrative Agent   → Maps Math stats → headline/recommendations

Module A: Intent Parser

The LLM is instructed to return ONLY a JSON object, with no markdown. If the model still wraps its reply in a json code fence, the re.search(r'\{.*\}', raw, re.DOTALL) regex extracts the JSON object from inside it. If the LLM call fails entirely, a hardcoded rule-based parser scans for a percentage (20%) and direction verbs (drop, decrease) to build an identical SimulationParams dict, so the backend keeps working regardless of LLM availability.
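
The two-stage parsing described for Module A might look roughly like the sketch below. Only the `\{.*\}` regex comes from the text; the function names, the verb list, and the returned keys are illustrative assumptions, not the contents of llm_copilot.py.

```python
import json
import re

# Illustrative verb list; the real fallback may recognise other words.
DECREASE_WORDS = ("drop", "decrease", "fall", "decline", "reduce")


def extract_json(raw: str):
    """Pull the first {...} object out of an LLM reply, even inside a code fence."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None


def rule_based_intent(query: str) -> dict:
    """Fallback parser: find a percentage and a direction verb in the query."""
    pct = re.search(r"(\d+(?:\.\d+)?)\s*%", query)
    size = float(pct.group(1)) / 100.0 if pct else 0.0
    decreasing = any(word in query.lower() for word in DECREASE_WORDS)
    return {
        "magnitude": -size if decreasing else size,
        "direction": "decrease" if decreasing else "increase",
    }


def parse_intent_with_fallback(raw_llm_reply: str, query: str) -> dict:
    """Prefer the LLM's JSON; fall back to rules when it is missing or invalid."""
    return extract_json(raw_llm_reply) or rule_based_intent(query)
```

If the LLM reply is unusable, `parse_intent_with_fallback("service unavailable", "What if revenue drops 20%?")` still yields a magnitude of -0.2 with direction "decrease".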

Module B: Unified Explainer

Inputs injected into the LLM prompt via the google-genai integration:

  • Forecast range limits (Lower Bound to Upper Bound constraints)
  • Average Trust Score
  • Anomalies payload mapping

Trust Directive System:

  • if trust < 40: "You MUST lead with: 'Warning: High model disagreement...'"
  • else: "Models agree strongly. Present with confidence."
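
As a rough illustration of how the trust directive could be combined with the statistical inputs listed above, consider the sketch below. The two directive strings are quoted from the rules above; the function names, the default threshold parameter, and the rest of the prompt wording are assumptions.

```python
def trust_directive(avg_trust: float, threshold: float = 40.0) -> str:
    """Pick the instruction that leads the narrative prompt."""
    if avg_trust < threshold:
        return "You MUST lead with: 'Warning: High model disagreement...'"
    return "Models agree strongly. Present with confidence."


def build_narrative_prompt(lower: float, upper: float, avg_trust: float, n_anomalies: int) -> str:
    """Assemble a constrained prompt from the math engine's summary statistics."""
    # Prompt wording is illustrative; only the two directives come from the text above.
    return (
        f"{trust_directive(avg_trust)}\n"
        f"Forecast range: {lower:,.0f} to {upper:,.0f}. "
        f"Average trust score: {avg_trust:.0f}. Anomalies: {n_anomalies}. "
        "Return ONLY a JSON object with keys headline, recommendation, narrative."
    )
```

Keeping the directive in a separate function makes the threshold easy to test in isolation from the LLM call.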

12. Backend: FastAPI Orchestration & Pydantic Schemas

File: backend/main.py

Data is loaded entirely into RAM via Pandas on boot.

# Boot sequence reads the static Kaggle assets into RAM once, so requests never touch disk.
DATASETS["sme_revenue"] = pd.read_csv(DATA_DIR / "sme_revenue_gold.csv")
DATASETS["bank_transactions"] = pd.read_csv(DATA_DIR / "bank_transactions_gold.csv")
DATASETS["agri_prices"] = pd.read_csv(DATA_DIR / "agri_prices_gold.csv")

File: backend/schemas.py

FastAPI automatically validates every incoming request against ForecastRequest. Invalid payloads get an immediate 422 Unprocessable Entity response.

from typing import Literal, Optional
from pydantic import BaseModel, Field

# SimulationParams is the scenario model, defined elsewhere (not shown here)
class ForecastRequest(BaseModel):
    domain: Literal["sme_revenue", "bank_transactions", "agri_prices"] = "sme_revenue"
    horizon_days: int = Field(default=30, ge=7, le=90)
    scenario: Optional[SimulationParams] = None
    nl_query: Optional[str] = Field(default=None, max_length=500)

13. Frontend: Component Architecture & State Management

View Breakdown structure

App.tsx
├── <Root Layout>
│     ├── Sidebar (Navigation links to /drop, /digest, /dashboard, /export)
│     └── Main Content Outlet 
│
├── <DropView> (/app/drop)
│     - Accepts file drag/drop, then uploads the CSV row by row through batched `ingestRow()` API calls.
│
├── <DigestView> (/app/digest)
│     - Reads `forecastData` state and derives the `sparkData` array for a miniature Recharts graph.
│
├── <DashboardView> (/app/dashboard)
│     - Manages the What-If query input and passes the text to the Context `runForecast` hook.
│
└── <ExportView> (/app/export)
      - Converts JSON responses to Blob URLs and object arrays to CSV for download.

State Management (DomainContext.tsx)

State lives in a React Context that wraps the fetch() handlers.

const runForecast = useCallback(async ({ nl_query, horizon_days }) => {
    // Cancel any request still in flight before starting a new one.
    abortRef.current?.abort();
    const controller = new AbortController();
    abortRef.current = controller;

    // fetches /api/forecast dynamically and sets forecastData explicitly.
}, [/* dependencies */])

Why AbortController? If the user switches domains in quick succession (SME → Retail → Agri), three independent requests fire. The AbortController cancels the in-flight fetches that are no longer needed, so a stale response cannot overwrite the latest one.


14. Trade-offs: Every Design Decision Explained

| Decision | What We Did | Why | The Cost |
| --- | --- | --- | --- |
| Data: Pre-Processed vs Live | Used pre-cleaned Kaggle CSVs | Stable demonstration environment; unhandled raw-data edge cases cannot halt the system during the showcase. | Doesn't query a live SQL database. |
| LLM: API Driven | Cloud Groq & Gemini integration | Fast parsing without consuming local compute resources. | Subject to cloud rate limits, unlike a local Ollama setup. |
| Seasonality: Multiplicative | seasonal="mul" for ETS | In financial data, seasonality scales with the trend level. | Requires >= 2 full seasonal cycles. Short datasets fall back to additive-only. |
| Anomaly: Rolling window | 30-day trailing rolling window | Avoids false positives when businesses grow/shrink. | The first 30 points have no full window and are skipped. |
| Conformal: Split vs Full | Split Conformal | Fast: train once, calibrate once. Runs in <50ms. | Less statistically efficient. Weaker coverage guarantee for small datasets. |
| Trust Score: CV of 3 models | σ/μ × C formula | Simple, interpretable, deterministic. | Scaling constant is empirically tuned based on visual tolerances. |
| State: Central Context | Global DomainContext | Minimizes unnecessary re-fetching. All tabs share one cache. | Data stays in memory until overwritten; slightly higher client load. |
| CSV in RAM | Load all CSVs at startup | Sub-100ms responses during demo. | New raw files cannot be loaded without pushing them through the ingestion endpoint. |

15. Future Improvements & Scalability

While FinSight provides robust predictive analytics, several architectural expansions are planned:

  • PostgreSQL / TimescaleDB Integration: Moving from RAM-loaded CSV caching to a database with connection pooling, to handle large, stateful institutional workloads.
  • Event-Driven Kafka Architecture: Extending the Pydantic ingestion schemas to consume real-time Kafka topics rather than static batch requests, enabling live monitoring.
  • On-Premise LLM Deployment: Developing local integrations with Meta's Llama 3 (GGUF models) to handle parsing tasks on-premise for banks whose security policies require zero cloud footprint.
  • RBAC Authentication Framework: Building JWT and OAuth flows to partition views between "Admin Bankers," "Credit Analysts," and "Branch Managers."

16. Open-Source Compliance & Security

  • No Hardcoded Secrets: All API keys are read from backend/.env, which .gitignore excludes from version control.
  • Type Strictness: Python type hints (-> dict) and TypeScript help catch type errors before runtime.
  • License: Released under the Apache License 2.0.
  • Developer Certificate of Origin (DCO): All commits are signed off in line with the DCO.

# Compliant commit requirement
git commit -s -m "feat: updated conform bounds"

17. The Team

Aryan  •  Riddhima  •  Roopanshu  •  Pranay


Clean, functional, mathematically sound, and purpose-built.
