| 1 | +--- |
| 2 | +name: answering-natural-language-questions-with-dbt |
| 3 | +description: Use when a user asks a business question that requires querying data (e.g., "What were total sales last quarter?"). NOT for validating, testing, or building dbt models during development. |
| 4 | +--- |
| 5 | + |
| 6 | +# Answering Natural Language Questions with dbt |
| 7 | + |
| 8 | +## Overview |
| 9 | + |
| 10 | +Answer data questions using the best available method: semantic layer first, then SQL modification, then model discovery, then manifest analysis. Always exhaust options before saying "cannot answer." |
| 11 | + |
| 12 | +**Use for:** Business questions from users that need data answers |
| 13 | +- "What were total sales last month?" |
| 14 | +- "How many active customers do we have?" |
| 15 | +- "Show me revenue by region" |
| 16 | + |
| 17 | +**Not for:** |
| 18 | +- Validating model logic during development |
| 19 | +- Testing dbt models or semantic layer definitions |
| 20 | +- Building or modifying dbt models |
| 21 | +- `dbt run`, `dbt test`, or `dbt build` workflows |
| 22 | + |
| 23 | +## Decision Flow |
| 24 | + |
| 25 | +```mermaid |
| 26 | +flowchart TD |
| 27 | + start([Business question received]) |
| 28 | + check_sl{Semantic layer tools available?} |
| 29 | + list_metrics[list_metrics] |
| 30 | + metric_exists{Relevant metric exists?} |
| 31 | + get_dims[get_dimensions] |
| 32 | + sl_sufficient{SL can answer directly?} |
| 33 | + query_metrics[query_metrics] |
| 34 | + answer([Return answer]) |
| 35 | + try_compiled[get_metrics_compiled_sql<br/>Modify SQL, execute_sql] |
| 36 | + check_discovery{Model discovery tools available?} |
| 37 | + try_discovery[get_mart_models<br/>get_model_details<br/>Write SQL, execute] |
| 38 | + check_manifest{In dbt project?} |
| 39 | + try_manifest[Analyze manifest/catalog<br/>Write SQL] |
| 40 | + cannot([Cannot answer]) |
| 41 | + suggest{In dbt project?} |
| 42 | + improvements[Suggest semantic layer changes] |
| 43 | + done([Done]) |
| 44 | + |
| 45 | + start --> check_sl |
| 46 | + check_sl -->|yes| list_metrics |
| 47 | + check_sl -->|no| check_discovery |
| 48 | + list_metrics --> metric_exists |
| 49 | + metric_exists -->|yes| get_dims |
| 50 | + metric_exists -->|no| check_discovery |
| 51 | + get_dims --> sl_sufficient |
| 52 | + sl_sufficient -->|yes| query_metrics |
| 53 | + sl_sufficient -->|no| try_compiled |
| 54 | + query_metrics --> answer |
| 55 | + try_compiled -->|success| answer |
| 56 | + try_compiled -->|fail| check_discovery |
| 57 | + check_discovery -->|yes| try_discovery |
| 58 | + check_discovery -->|no| check_manifest |
| 59 | + try_discovery -->|success| answer |
| 60 | + try_discovery -->|fail| check_manifest |
| 61 | + check_manifest -->|yes| try_manifest |
| 62 | + check_manifest -->|no| cannot |
| 63 | + try_manifest -->|SQL ready| answer |
| 64 | + answer --> suggest |
| 65 | + cannot --> done |
| 66 | + suggest -->|yes| improvements |
| 67 | + suggest -->|no| done |
| 68 | + improvements --> done |
| 69 | +``` |
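
The flowchart's priority order can also be sketched in code. This is a minimal illustration, not part of the skill itself: the tool names come from this document, while the function name `choose_approaches` and the grouping of tools into sets are assumptions made for the example.

```python
from typing import Iterable

# Tool names are taken from this document; the set groupings and the
# function name are hypothetical and exist only for this sketch.
SEMANTIC_LAYER_TOOLS = {"list_metrics", "get_dimensions", "query_metrics"}
COMPILED_SQL_TOOLS = {"get_metrics_compiled_sql", "execute_sql"}
DISCOVERY_TOOLS = {"get_mart_models", "get_model_details"}


def choose_approaches(available_tools: Iterable[str], in_dbt_project: bool) -> list[str]:
    """Return the approaches worth trying, in priority order.

    An empty list corresponds to the "Cannot answer" node in the flowchart.
    """
    tools = set(available_tools)
    approaches = []
    if SEMANTIC_LAYER_TOOLS <= tools:
        approaches.append("1: semantic layer query")
        # Approach 2 only makes sense when the metric lives in the semantic layer.
        if COMPILED_SQL_TOOLS <= tools:
            approaches.append("2: modified compiled SQL")
    if DISCOVERY_TOOLS <= tools:
        approaches.append("3: model discovery")
    if in_dbt_project:
        approaches.append("4: manifest/catalog analysis")
    return approaches


if __name__ == "__main__":
    # Example: discovery tools only, working inside a dbt project.
    for step in choose_approaches(["get_mart_models", "get_model_details"], True):
        print(step)
```

The sketch only orders the candidates; whether a given approach actually succeeds (for example, whether a relevant metric exists) still has to be checked at each step, as the flowchart shows.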
| 70 | + |
| 71 | +## Quick Reference |
| 72 | + |
| 73 | +| Priority | Condition | Approach | Tools | |
| 74 | +|----------|-----------|----------|-------| |
| 75 | +| 1 | Semantic layer active | Query metrics directly | `list_metrics`, `get_dimensions`, `query_metrics` | |
| 76 | +| 2 | SL active but minor modifications needed (missing dimension, custom filter, case when, different aggregation) | Modify compiled SQL | `get_metrics_compiled_sql`, then `execute_sql` | |
| 77 | +| 3 | No SL, discovery tools active | Explore models, write SQL | `get_mart_models`, `get_model_details`, then `show`/`execute_sql` | |
| 78 | +| 4 | No MCP, in dbt project | Analyze artifacts, write SQL | Read `target/manifest.json`, `target/catalog.json` | |
| 79 | + |
| 80 | +## Approach 1: Semantic Layer Query |
| 81 | + |
| 82 | +When `list_metrics` and `query_metrics` are available: |
| 83 | + |
| 84 | +1. `list_metrics` - find relevant metric |
| 85 | +2. `get_dimensions` - verify required dimensions exist |
| 86 | +3. `query_metrics` - execute with appropriate filters |
| 87 | + |
| 88 | +If semantic layer can't answer directly (missing dimension, need custom logic) → go to Approach 2. |
| 89 | + |
| 90 | +## Approach 2: Modified Compiled SQL |
| 91 | + |
| 92 | +When semantic layer has the metric but needs minor modifications: |
| 93 | + |
| 94 | +- Missing dimension (join + group by) |
| 95 | +- Custom filter not available as a dimension |
| 96 | +- Case when logic for custom categorization |
| 97 | +- Different aggregation than what's defined |
| 98 | + |
| 99 | +1. `get_metrics_compiled_sql` - get the SQL that would run (returns raw SQL, not Jinja) |
| 100 | +2. Modify SQL to add what's needed |
| 101 | +3. `execute_sql` to run the raw SQL |
| 102 | +4. **Always suggest** updating the semantic model if the modification would be reusable |
| 103 | + |
| 104 | +```sql |
| 105 | +-- Example: Adding sales_rep dimension |
| 106 | +WITH base AS ( |
| 107 | + -- ... compiled metric logic (already resolved to table names) ... |
| 108 | +) |
| 109 | +SELECT base.*, reps.sales_rep_name |
| 110 | +FROM base |
| 111 | +JOIN analytics.dim_sales_reps reps ON base.rep_id = reps.id |
| 112 | +GROUP BY ... |
| 113 | + |
| 114 | +-- Example: Custom filter |
| 115 | +SELECT * FROM (compiled_metric_sql) AS metric_results WHERE region = 'EMEA' |
| 116 | + |
| 117 | +-- Example: Case when categorization |
| 118 | +SELECT |
| 119 | + CASE WHEN amount > 1000 THEN 'large' ELSE 'small' END AS deal_size, |
| 120 | + SUM(amount) AS total_amount |
| 121 | +FROM (compiled_metric_sql) AS metric_results |
| 122 | +GROUP BY 1 |
| 123 | +``` |
| 124 | + |
| 125 | +**Note:** The compiled SQL contains resolved table names, not `{{ ref() }}`. Work with the raw SQL as returned. |
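
The custom-filter pattern above amounts to wrapping the compiled SQL in a subquery. A minimal sketch of that wrapping step follows; the helper name `wrap_compiled_sql`, the default alias, and the sample table `analytics.fct_orders` are all hypothetical. The filter text is inserted as-is, so this is only suitable for SQL you write yourself, not for untrusted input.

```python
def wrap_compiled_sql(compiled_sql: str, where: str, alias: str = "metric_results") -> str:
    """Wrap compiled metric SQL in a subquery and apply an extra filter.

    Hypothetical helper for illustration: `where` is interpolated verbatim.
    """
    # Drop surrounding whitespace and a trailing semicolon so the SQL nests cleanly.
    inner = compiled_sql.strip().rstrip(";").strip()
    # Some warehouses require a subquery alias, so one is always added.
    return f"SELECT *\nFROM (\n{inner}\n) AS {alias}\nWHERE {where}"


if __name__ == "__main__":
    # Stand-in for what get_metrics_compiled_sql might return (assumed shape).
    compiled = "SELECT region, SUM(amount) AS revenue FROM analytics.fct_orders GROUP BY 1;"
    print(wrap_compiled_sql(compiled, "region = 'EMEA'"))
```

The resulting string is what would be passed to `execute_sql`.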
| 126 | + |
| 127 | +## Approach 3: Model Discovery |
| 128 | + |
| 129 | +When there is no semantic layer but `get_mart_models`/`get_model_details` are available: |
| 130 | + |
| 131 | +1. `get_mart_models` - start with marts, not staging |
| 132 | +2. `get_model_details` for relevant models - understand schema |
| 133 | +3. Write SQL using `{{ ref('model_name') }}` |
| 134 | +4. `show --inline "..."` or `execute_sql` |
| 135 | + |
| 136 | +**Prefer marts over staging** - marts have business logic applied. |
| 137 | + |
| 138 | +## Approach 4: Manifest/Catalog Analysis |
| 139 | + |
| 140 | +When in a dbt project but no MCP server: |
| 141 | + |
| 142 | +1. Check for `target/manifest.json` and `target/catalog.json` |
| 143 | +2. **Filter before reading** - these files can be large |
| 144 | + |
| 145 | +```bash |
| 146 | +# Find mart models in manifest |
| 147 | +jq '.nodes | to_entries | map(select(.key | startswith("model.") and contains("mart"))) | .[].value | {name: .name, schema: .schema, columns: .columns}' target/manifest.json |
| 148 | + |
| 149 | +# Get column info from catalog |
| 150 | +jq '.nodes["model.project_name.model_name"].columns' target/catalog.json |
| 151 | +``` |
| 152 | + |
| 153 | +3. Write SQL based on discovered schema |
| 154 | +4. Explain: "This SQL should run in your warehouse. I cannot execute it without database access." |
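
Where `jq` is not installed, the same filter can be approximated in Python. This is a sketch under assumptions: the function names are hypothetical, and the only manifest keys it relies on are the ones the `jq` command above already uses (`nodes`, `name`, `schema`, `columns`). Unlike the `jq` version, it returns column names only, which keeps the output small.

```python
import json
from pathlib import Path


def find_mart_models(manifest: dict) -> list[dict]:
    """Keep only model nodes whose unique ID contains 'mart'.

    Mirrors the jq filter: key starts with "model." and contains "mart".
    """
    results = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if unique_id.startswith("model.") and "mart" in unique_id:
            results.append({
                "name": node.get("name"),
                "schema": node.get("schema"),
                # Column names only, sorted for stable output.
                "columns": sorted(node.get("columns", {})),
            })
    return results


def load_manifest(path: str = "target/manifest.json") -> dict:
    """Load the manifest from disk; the default path follows this document."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    for model in find_mart_models(load_manifest()):
        print(model)
```

The script still parses the whole file in memory, but only the filtered summary is printed, which is the part that needs to be read.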
| 155 | + |
| 156 | +## Suggesting Improvements |
| 157 | + |
| 158 | +**When in a dbt project**, suggest semantic layer changes after answering (or when you cannot answer): |
| 159 | + |
| 160 | +| Gap | Suggestion | |
| 161 | +|-----|------------| |
| 162 | +| Metric doesn't exist | "Add a metric definition to your semantic model" | |
| 163 | +| Dimension missing | "Add `dimension_name` to the dimensions list in the semantic model" | |
| 164 | +| No semantic layer | "Consider adding a semantic layer for this data" | |
| 165 | + |
| 166 | +**Stay at semantic layer level.** Do NOT suggest: |
| 167 | +- Database schema changes |
| 168 | +- ETL pipeline modifications |
| 169 | +- "Ask your data engineering team to..." |
| 170 | + |
| 171 | +## Rationalizations to Resist |
| 172 | + |
| 173 | +| You're Thinking... | Reality | |
| 174 | +|--------------------|---------| |
| 175 | +| "Semantic layer doesn't support this exact query" | Get compiled SQL and modify it (Approach 2) | |
| 176 | +| "No MCP tools, can't help" | Check for manifest/catalog locally | |
| 177 | +| "User needs this quickly, skip the systematic check" | Systematic approach IS the fastest path | |
| 178 | +| "Just write SQL, it's faster" | Semantic layer exists for a reason - use it first | |
| 179 | +| "The dimension doesn't exist in the data" | Maybe it exists but not in semantic layer config | |
| 180 | + |
| 181 | +## Red Flags - STOP |
| 182 | + |
| 183 | +- Writing SQL without checking if semantic layer can answer |
| 184 | +- Saying "cannot answer" without trying all 4 approaches |
| 185 | +- Suggesting database-level fixes for semantic layer gaps |
| 186 | +- Reading entire manifest.json without filtering |
| 187 | +- Using staging models when mart models exist |
| 188 | +- Using this to validate model correctness rather than answer business questions |
| 189 | + |
| 190 | +## Common Mistakes |
| 191 | + |
| 192 | +| Mistake | Fix | |
| 193 | +|---------|-----| |
| 194 | +| Giving up when SL can't answer directly | Get compiled SQL and modify it | |
| 195 | +| Querying staging models | Use `get_mart_models` first | |
| 196 | +| Reading full manifest.json | Use jq to filter | |
| 197 | +| Suggesting ETL changes | Keep suggestions at semantic layer | |
| 198 | +| Not checking tool availability | List available tools before choosing approach | |