A full-stack analytics app that lets users query a retail dataset in plain English. GPT-5 translates the input into ClickHouse SQL using a context-free grammar (CFG) that constrains the model's output to syntactically valid, safe SELECT statements.
```
User (browser)
      │
      ▼
Frontend — React + Vite + TypeScript
      │  POST /api/query { natural_language }
      ▼
Backend — Python + FastAPI
      │
      ├── LLM Service (llm.py)
      │     GPT-5 Responses API
      │     tools: [{ type: "custom", format: { type: "grammar", syntax: "lark", definition: CFG } }]
      │     → grammar-constrained SQL string
      │
      ├── CFG Grammar (cfg_grammar.py)  ◄── see below
      │
      └── ClickHouse Service (clickhouse.py)
            executes SQL against the train_clean view
            returns columns + rows
      │
      ▼
Frontend renders StatCard (single value) or paginated ResultsTable
```
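The LLM step above can be sketched as plain Python. This is a minimal sketch of the request payload the backend might assemble; the tool name, description, and placeholder grammar are illustrative assumptions, not the real `llm.py` or `cfg_grammar.py`:

```python
# Sketch of the Responses API payload built by the LLM service (llm.py).
# Field names mirror the tool spec shown in the diagram; the grammar
# definition and tool name here are placeholders, not the real CFG.

SQL_GRAMMAR = r'''
start: "SELECT" /.+/
'''  # placeholder; the real grammar lives in cfg_grammar.py


def build_query_payload(natural_language: str) -> dict:
    """Assemble a Responses API request whose custom-tool output is
    constrained by a Lark grammar."""
    return {
        "model": "gpt-5",
        "input": natural_language,
        "tools": [{
            "type": "custom",
            "name": "clickhouse_sql",  # hypothetical tool name
            "description": "Emit one safe SELECT over train_clean",
            "format": {
                "type": "grammar",
                "syntax": "lark",
                "definition": SQL_GRAMMAR,
            },
        }],
    }


payload = build_query_payload("total sales by region")
tool = payload["tools"][0]
print(tool["format"]["syntax"])  # -> lark
```

The payload would then be sent with the OpenAI SDK's `responses.create`; the dict above only shows the shape of the grammar-constrained tool entry.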
The grammar is the core safety and correctness mechanism. It is written in Lark EBNF and passed directly to GPT-5 via the Responses API. The model's token sampler is constrained by a pushdown automaton over the grammar — it is physically unable to emit tokens that violate the rules.
What the grammar enforces:
| Rule | Effect |
|---|---|
| `select_stmt` starts with `SELECT` | No INSERT, UPDATE, DELETE, DROP, CREATE |
| `FROM train_clean` is hardcoded | Cannot query any other table |
| `column_ref: COLUMN_NAME` | Only whitelisted columns can be referenced |
| `BLOCKED_NAME` terminal | `customer_name`, `credit_card`, `ssn`, `password` can never appear — not even as aliases |
| `agg_func` whitelist | Only `sum`, `count`, `avg`, `min`, `max` |
| Date functions | Only `subtractHours`, `subtractDays`, `toMonth`, `toYear`, etc. |
| No subqueries, JOINs, HAVING, UNION | Grammar has no production rules for them |
Key design: string terminals in Lark have higher priority than regex terminals. `BLOCKED_NAME: "customer_name" | "credit_card" | ...` causes the lexer to match those tokens before `IDENTIFIER` can, making them unusable anywhere in the grammar — column, alias, or string value.
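A condensed sketch of what such a grammar could look like. This is illustrative only, with an abbreviated column whitelist and optional clauses omitted; the real `cfg_grammar.py` is more complete:

```lark
start: select_stmt
select_stmt: "SELECT" select_list "FROM" "train_clean" group_clause?
select_list: select_item ("," select_item)*
select_item: column_ref | agg_func "(" column_ref ")"
group_clause: "GROUP" "BY" column_ref
agg_func: "sum" | "count" | "avg" | "min" | "max"
column_ref: COLUMN_NAME

// abbreviated whitelist; the real grammar lists every train_clean column
COLUMN_NAME: "region" | "category" | "sales" | "order_date"
// declared so the lexer claims these strings before any identifier terminal can
BLOCKED_NAME: "customer_name" | "credit_card" | "ssn" | "password"

%import common.WS
%ignore WS
```

Because the grammar simply has no productions for `JOIN`, subqueries, or DML, the constrained sampler can never reach those constructs.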
ClickHouse view: `train_clean` — Global Superstore retail dataset, 4 years of orders.
| Column | Type |
|---|---|
| order_id, customer_id, product_id | String |
| order_date, ship_date | DateTime |
| region, category, sub_category, segment | String |
| ship_mode, city, state, country | String |
| product_name | String |
| sales | Float64 |
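For orientation, an illustrative query of the kind the grammar permits against this view (assumed shape, using the aggregate and date-function whitelists above):

```sql
SELECT region, sum(sales) AS total_sales
FROM train_clean
WHERE order_date >= subtractDays(now(), 365)
GROUP BY region
ORDER BY total_sales DESC
LIMIT 10
```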
A script-based eval framework that tests the system across three dimensions. No test framework required — plain Python.
```
evals/
├── helpers.py            # run_query(nl) + normalize(sql)
├── test_consistency.py   # semantic equivalents → identical SQL
├── test_correctness.py   # SQL contains expected structural elements
├── test_safety.py        # adversarial inputs + CFG enforcement
└── run_evals.py          # runner with summary + exit code
```
Run:

```shell
cd evals
python run_evals.py
```

Categories:
- Consistency — semantically equivalent phrasings (e.g. "total sales by region" vs "show sales per region") produce identical SQL after normalization
- Correctness — generated SQL contains expected fragments (`sum(`, `group by`, `subtractdays`, etc.)
- Safety — adversarial inputs cannot produce destructive SQL, JOINs, PII column references, or out-of-scope table access
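The consistency check relies on a normalization step before comparing SQL strings. A minimal sketch of what `normalize` in `helpers.py` could do (assumed behavior; the real implementation may differ):

```python
import re


def normalize(sql: str) -> str:
    """Canonicalize generated SQL so semantically identical outputs
    compare equal: lowercase, collapse whitespace, strip a trailing
    semicolon. (Assumed behavior; the real helpers.py may differ.)"""
    sql = sql.strip().rstrip(";")
    sql = re.sub(r"\s+", " ", sql)  # collapse runs of whitespace/newlines
    return sql.lower()


a = normalize("SELECT region,  sum(sales)\nFROM train_clean GROUP BY region;")
b = normalize("select region, sum(sales) from train_clean group by region")
print(a == b)  # -> True
```

Comparing normalized strings means "total sales by region" and "show sales per region" count as consistent even if the model varies casing or whitespace between runs.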
Current score: 20/20
```shell
# Backend
cd backend
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env   # fill in OPENAI_API_KEY + ClickHouse vars
uvicorn app.main:app --reload --port 8000
```

```shell
# Frontend
cd frontend
npm install
cp .env.example .env   # set VITE_API_BASE_URL=http://localhost:8000
npm run dev
```

Two Railway services:
| Service | Root Dir | Build | Start |
|---|---|---|---|
| Backend | `backend/` | `pip install -r requirements.txt` | `uvicorn app.main:app --host 0.0.0.0 --port $PORT` |
| Frontend | `frontend/` | `npm run build` | `npm run start` |
Set `VITE_API_BASE_URL` to the backend's Railway URL before the frontend build. Set `CORS_ORIGINS` in the backend to the frontend's Railway URL.
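Collected as a sketch, the deployment-time variables described above (hostnames are placeholders for the actual Railway service URLs):

```
# Frontend service (must be set before `npm run build`)
VITE_API_BASE_URL=https://<backend-service>.up.railway.app

# Backend service
CORS_ORIGINS=https://<frontend-service>.up.railway.app
OPENAI_API_KEY=sk-...
```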