Commit e91791c

Merge pull request #17 from dbt-labs/answering-questions
2 parents 82944fb + 2cc8977 commit e91791c

1 file changed

Lines changed: 198 additions & 0 deletions

  • dbt-semantic-layer/answering-natural-language-questions-with-dbt
@@ -0,0 +1,198 @@
1+
---
2+
name: answering-natural-language-questions-with-dbt
3+
description: Use when a user asks a business question that requires querying data (e.g., "What were total sales last quarter?"). NOT for validating, testing, or building dbt models during development.
4+
---
5+
6+
# Answering Natural Language Questions with dbt
7+
8+
## Overview
9+
10+
Answer data questions using the best available method: semantic layer first, then SQL modification, then model discovery, then manifest analysis. Always exhaust options before saying "cannot answer."
11+
12+
**Use for:** Business questions from users that need data answers
13+
- "What were total sales last month?"
14+
- "How many active customers do we have?"
15+
- "Show me revenue by region"
16+
17+
**Not for:**
18+
- Validating model logic during development
19+
- Testing dbt models or semantic layer definitions
20+
- Building or modifying dbt models
21+
- `dbt run`, `dbt test`, or `dbt build` workflows
22+
23+
## Decision Flow
24+
25+
```mermaid
flowchart TD
    start([Business question received])
    check_sl{Semantic layer tools available?}
    list_metrics[list_metrics]
    metric_exists{Relevant metric exists?}
    get_dims[get_dimensions]
    sl_sufficient{SL can answer directly?}
    query_metrics[query_metrics]
    answer([Return answer])
    try_compiled[get_metrics_compiled_sql<br/>Modify SQL, execute_sql]
    check_discovery{Model discovery tools available?}
    try_discovery[get_mart_models<br/>get_model_details<br/>Write SQL, execute]
    check_manifest{In dbt project?}
    try_manifest[Analyze manifest/catalog<br/>Write SQL]
    cannot([Cannot answer])
    suggest{In dbt project?}
    improvements[Suggest semantic layer changes]
    done([Done])

    start --> check_sl
    check_sl -->|yes| list_metrics
    check_sl -->|no| check_discovery
    list_metrics --> metric_exists
    metric_exists -->|yes| get_dims
    metric_exists -->|no| check_discovery
    get_dims --> sl_sufficient
    sl_sufficient -->|yes| query_metrics
    sl_sufficient -->|no| try_compiled
    query_metrics --> answer
    try_compiled -->|success| answer
    try_compiled -->|fail| check_discovery
    check_discovery -->|yes| try_discovery
    check_discovery -->|no| check_manifest
    try_discovery -->|success| answer
    try_discovery -->|fail| check_manifest
    check_manifest -->|yes| try_manifest
    check_manifest -->|no| cannot
    try_manifest -->|SQL ready| answer
    answer --> suggest
    cannot --> done
    suggest -->|yes| improvements
    suggest -->|no| done
    improvements --> done
```
70+
71+
## Quick Reference
72+
73+
| Priority | Condition | Approach | Tools |
74+
|----------|-----------|----------|-------|
75+
| 1 | Semantic layer active | Query metrics directly | `list_metrics`, `get_dimensions`, `query_metrics` |
76+
| 2 | SL active but minor modifications needed (missing dimension, custom filter, case when, different aggregation) | Modify compiled SQL | `get_metrics_compiled_sql`, then `execute_sql` |
77+
| 3 | No SL, discovery tools active | Explore models, write SQL | `get_mart_models`, `get_model_details`, then `show`/`execute_sql` |
78+
| 4 | No MCP, in dbt project | Analyze artifacts, write SQL | Read `target/manifest.json`, `target/catalog.json` |
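
The priority order in the table can be sketched as a small helper that maps the tools currently exposed to the approaches worth attempting. This is only an illustration: the tool names come from the table, but `approaches_to_try` and the grouping constants are hypothetical names, and the function assumes a tool counts as "available" when its name appears in the set passed in.

```python
from typing import List, Set

# Tool names are taken from the Quick Reference table.
# The grouping constants and the function are hypothetical helpers.
SL_TOOLS = {"list_metrics", "get_dimensions", "query_metrics"}
COMPILED_SQL_TOOLS = {"get_metrics_compiled_sql", "execute_sql"}
DISCOVERY_TOOLS = {"get_mart_models", "get_model_details"}


def approaches_to_try(available_tools: Set[str], in_dbt_project: bool) -> List[int]:
    """Return the approach numbers (1-4) worth attempting, in priority order."""
    order: List[int] = []
    if SL_TOOLS <= available_tools:
        order.append(1)
        # Approach 2 builds on a metric found through the semantic layer.
        if COMPILED_SQL_TOOLS <= available_tools:
            order.append(2)
    if DISCOVERY_TOOLS <= available_tools:
        order.append(3)
    if in_dbt_project:
        # Local artifacts are the last fallback before "cannot answer".
        order.append(4)
    return order
```

An empty result corresponds to the "Cannot answer" node in the decision flow: no tools are exposed and there is no dbt project to inspect.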
79+
80+
## Approach 1: Semantic Layer Query
81+
82+
When `list_metrics` and `query_metrics` are available:
83+
84+
1. `list_metrics` - find relevant metric
85+
2. `get_dimensions` - verify required dimensions exist
86+
3. `query_metrics` - execute with appropriate filters
87+
88+
If the semantic layer can't answer directly (missing dimension, need custom logic) → go to Approach 2.
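
The check in step 2 amounts to a set comparison between the dimensions the question needs and the ones the metric exposes. A minimal sketch, assuming dimension names are plain strings compared exactly; `missing_dimensions` and `next_step` are hypothetical helper names, not part of any dbt tool:

```python
from typing import Iterable, List


def missing_dimensions(required: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return the required dimensions the metric does not offer, in request order."""
    available_set = set(available)
    return [name for name in required if name not in available_set]


def next_step(required: Iterable[str], available: Iterable[str]) -> str:
    """Use query_metrics when every dimension is present, else fall through to Approach 2."""
    return "approach_2" if missing_dimensions(required, available) else "query_metrics"
```

For a question that needs `region` and `sales_rep` against a metric exposing only `region` and `order_date`, `missing_dimensions` reports `sales_rep`, which is also the dimension to name when suggesting a semantic model update.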
89+
90+
## Approach 2: Modified Compiled SQL
91+
92+
When the semantic layer has the metric but the query needs minor modifications:
93+
94+
- Missing dimension (join + group by)
95+
- Custom filter not available as a dimension
96+
- Case when logic for custom categorization
97+
- Different aggregation than what's defined
98+
99+
1. `get_metrics_compiled_sql` - get the SQL that would run (returns raw SQL, not Jinja)
100+
2. Modify SQL to add what's needed
101+
3. `execute_sql` to run the raw SQL
102+
4. **Always suggest** updating the semantic model if the modification would be reusable
103+
104+
```sql
-- Example: Adding sales_rep dimension
-- (assumes the compiled metric exposes rep_id and amount)
WITH base AS (
  -- ... compiled metric logic (already resolved to table names) ...
)
SELECT reps.sales_rep_name, SUM(base.amount) AS total_amount
FROM base
JOIN analytics.dim_sales_reps reps ON base.rep_id = reps.id
GROUP BY reps.sales_rep_name;

-- Example: Custom filter
-- (compiled_metric_sql is a placeholder for the SQL returned by get_metrics_compiled_sql)
SELECT * FROM (compiled_metric_sql) AS metric WHERE region = 'EMEA';

-- Example: Case when categorization
SELECT
  CASE WHEN amount > 1000 THEN 'large' ELSE 'small' END AS deal_size,
  SUM(amount) AS total_amount
FROM (compiled_metric_sql) AS metric
GROUP BY 1;
```
124+
125+
**Note:** The compiled SQL contains resolved table names, not `{{ ref() }}`. Work with the raw SQL as returned.
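
Because the compiled SQL arrives as a plain string, the modification step can be done by wrapping it rather than editing it in place. The sketch below shows one way to apply a custom filter; `wrap_compiled_sql` is a hypothetical helper, and it assumes the filter column is already in the compiled query's output and that the warehouse accepts a `WITH` clause nested inside a CTE (relevant when the compiled SQL itself starts with `WITH`).

```python
def wrap_compiled_sql(compiled_sql: str, where_clause: str) -> str:
    """Wrap compiled metric SQL in a CTE and apply an extra filter on top of it."""
    # Drop surrounding whitespace and a trailing semicolon so the SQL nests cleanly.
    inner = compiled_sql.strip().rstrip(";")
    return (
        "WITH metric AS (\n"
        f"{inner}\n"
        ")\n"
        "SELECT *\n"
        "FROM metric\n"
        f"WHERE {where_clause}"
    )
```

The resulting string is what would be passed to `execute_sql`. The filter text is interpolated as-is, so it should come from the agent's own reasoning about the question, not be copied verbatim from untrusted input.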
126+
127+
## Approach 3: Model Discovery
128+
129+
When there is no semantic layer but `get_mart_models`/`get_model_details` are available:
130+
131+
1. `get_mart_models` - start with marts, not staging
132+
2. `get_model_details` for relevant models - understand schema
133+
3. Write SQL using `{{ ref('model_name') }}`
134+
4. `show --inline "..."` or `execute_sql`
135+
136+
**Prefer marts over staging** - marts have business logic applied.
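
When several candidate models turn up, the mart-first preference can be expressed as a sort. This sketch assumes the project follows the common dbt naming convention (`fct_`/`dim_` for marts, `int_` for intermediate, `stg_` for staging); projects that name models differently would need another signal, such as the model's folder. `rank_models` is a hypothetical helper.

```python
from typing import List


def rank_models(model_names: List[str]) -> List[str]:
    """Order candidate models so marts come first and staging last."""

    def layer(name: str) -> int:
        if name.startswith(("fct_", "dim_")):
            return 0  # mart-style names
        if name.startswith("stg_"):
            return 2  # staging
        return 1  # intermediate or unprefixed

    # sorted() is stable, so the original order is kept within each layer.
    return sorted(model_names, key=layer)
```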
137+
138+
## Approach 4: Manifest/Catalog Analysis
139+
140+
When in a dbt project but no MCP server:
141+
142+
1. Check for `target/manifest.json` and `target/catalog.json`
143+
2. **Filter before reading** - these files can be large
144+
145+
```bash
# Find mart models in the manifest (match on file path; unique IDs omit the folder name)
jq '.nodes | to_entries | map(select((.key | startswith("model.")) and (.value.original_file_path | contains("mart")))) | .[].value | {name: .name, schema: .schema, columns: .columns}' target/manifest.json

# Get column info from the catalog (substitute your project and model names)
jq '.nodes["model.project_name.model_name"].columns' target/catalog.json
```
152+
153+
3. Write SQL based on discovered schema
154+
4. Explain: "This SQL should run in your warehouse. I cannot execute it without database access."
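
Where `jq` is not installed, the same filtering can be done with the Python standard library. The sketch below matches on each model's `original_file_path` rather than its unique ID, since the folder name does not appear in the unique ID. It assumes marts live under a path containing `mart`, and `mart_models` and `load_mart_models` are hypothetical helper names. The file is still parsed in full, but only the filtered slice is returned for inspection.

```python
import json
from typing import Any, Dict, List


def mart_models(manifest: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return name, schema, and column names for model nodes stored under a mart path."""
    results = []
    for unique_id, node in manifest.get("nodes", {}).items():
        path = node.get("original_file_path") or ""
        if unique_id.startswith("model.") and "mart" in path:
            results.append({
                "name": node.get("name"),
                "schema": node.get("schema"),
                # Keep only column names; full column metadata can be large.
                "columns": sorted(node.get("columns", {})),
            })
    return results


def load_mart_models(path: str = "target/manifest.json") -> List[Dict[str, Any]]:
    """Read the manifest from disk and return only the mart model summaries."""
    with open(path, encoding="utf-8") as handle:
        return mart_models(json.load(handle))
```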
155+
156+
## Suggesting Improvements
157+
158+
**When in a dbt project**, suggest semantic layer changes after answering (or when you cannot answer):
159+
160+
| Gap | Suggestion |
161+
|-----|------------|
162+
| Metric doesn't exist | "Add a metric definition to your semantic model" |
163+
| Dimension missing | "Add `dimension_name` to the dimensions list in the semantic model" |
164+
| No semantic layer | "Consider adding a semantic layer for this data" |
165+
166+
**Stay at semantic layer level.** Do NOT suggest:
167+
- Database schema changes
168+
- ETL pipeline modifications
169+
- "Ask your data engineering team to..."
170+
171+
## Rationalizations to Resist
172+
173+
| You're Thinking... | Reality |
174+
|--------------------|---------|
175+
| "Semantic layer doesn't support this exact query" | Get compiled SQL and modify it (Approach 2) |
176+
| "No MCP tools, can't help" | Check for manifest/catalog locally |
177+
| "User needs this quickly, skip the systematic check" | Systematic approach IS the fastest path |
178+
| "Just write SQL, it's faster" | Semantic layer exists for a reason - use it first |
179+
| "The dimension doesn't exist in the data" | Maybe it exists but not in semantic layer config |
180+
181+
## Red Flags - STOP
182+
183+
- Writing SQL without checking if semantic layer can answer
184+
- Saying "cannot answer" without trying all 4 approaches
185+
- Suggesting database-level fixes for semantic layer gaps
186+
- Reading entire manifest.json without filtering
187+
- Using staging models when mart models exist
188+
- Using this to validate model correctness rather than answer business questions
189+
190+
## Common Mistakes
191+
192+
| Mistake | Fix |
193+
|---------|-----|
194+
| Giving up when SL can't answer directly | Get compiled SQL and modify it |
195+
| Querying staging models | Use `get_mart_models` first |
196+
| Reading full manifest.json | Use jq to filter |
197+
| Suggesting ETL changes | Keep suggestions at semantic layer |
198+
| Not checking tool availability | List available tools before choosing approach |
