jasnonaz
left a comment
Bunch of comments, mostly around splitting out some logic in here that probably doesn't need to live in the main skill.
I feel like this is a lot of "what", some "how", and very little "why". To the extent we can bake in more opinionated, high-signal heuristics for the models, we should.
| dbt unit test uses a trio of the model, given inputs, and expected outputs (Model-Inputs-Outputs):
|
| 1. `model` - when building this model
| 2. `given` inputs - given a set of source, seeds, and models as preconditions
Suggested change:
| 2. `given` inputs - given a set of source, seeds, and models as preconditions
| 2. `given` inputs - given a set of source, seeds, macros and models as preconditions
Since macros go under `overrides` rather than under `given`, I'm not going to commit this as-is.
But good point that macros fall under the inputs / pre-conditions umbrella (along with project variables and environment variables).
So we'll either want to add that info here or just leave it to the later sections that cover those scenarios.
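For reference, a minimal sketch of where each kind of precondition sits in the unit test YAML. It assumes a hypothetical incremental model; the test, model, input, column, and variable names (`my_model`, `stg_events`, `run_date`, `DBT_ENV`) are illustrative only and do not come from the skill under review:

```yaml
unit_tests:
  - name: test_my_model_incremental_mode   # hypothetical test name
    model: my_model                        # hypothetical incremental model
    overrides:
      # Macros, project variables, and environment variables are
      # preconditions too, but they are set here rather than under `given`.
      macros:
        is_incremental: true
      vars:
        run_date: "2024-01-01"             # hypothetical project variable
      env_vars:
        DBT_ENV: ci                        # hypothetical environment variable
    given:
      # Sources, seeds, and models are mocked as inputs.
      - input: ref('stg_events')           # hypothetical upstream model
        rows:
          - {event_id: 1, event_date: "2024-01-01"}
      - input: this                        # existing rows of the model itself
        rows:
          - {event_id: 0, event_date: "2023-12-31"}
    expect:
      # Rows the hypothetical model would produce on this incremental run.
      rows:
        - {event_id: 1, event_date: "2024-01-01"}
```

The split is the point here: `given` mocks relations, while `overrides` pins the behaviour of macros, vars, and env vars.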
| ## `sql`
|
| Using this format:
| - Provides more flexibility for the unit testing column that have a data type not supported by the `dict` or `csv` formats
THIS is great - this is the stuff we keep
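A short illustration of the quoted point may be worth keeping alongside the prose. This is only a sketch: the model `fct_orders`, the input `stg_orders`, and the columns are hypothetical, and the array literal syntax varies by warehouse, so it would need adapting per adapter:

```yaml
unit_tests:
  - name: test_order_tag_count             # hypothetical test name
    model: fct_orders                      # hypothetical model
    given:
      - input: ref('stg_orders')           # hypothetical upstream model
        format: sql
        # With `format: sql`, the fixture is a query, so column types that
        # the `dict` or `csv` formats cannot express can be built directly.
        rows: |
          select 1 as order_id, array['a', 'b'] as tags
    expect:
      format: sql
      rows: |
        select 1 as order_id, 2 as tag_count
```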
|
| # Special cases
|
| ## Unit testing incremental models
Incremental and versioned examples should live in their own file, with callouts that these are special cases and the model should look into them. For the most part: keep the prose, move the code.
| 1. There was an error in the way the unit test was constructed (false positive)
| 2. There is an bug is the model (true positive)
|
| It takes expert judgement to determine one from the other.
Can we have some guidance on how to determine one from the other?
|
| ```
|
| ### Fixture files
Is this needed given the above section talking about files?
Partially addressed in 5719ff4.
Can revisit to consolidate any duplicated content.
Co-authored-by: Jason Ganz <jason.ganz64@gmail.com>
…skills into dbeatty/unit-tests
| @@ -0,0 +1,3 @@
| # Caveats for BigQuery
I really think we need this type of info! Thanks for adding.
My comments:
- is caveat the correct word here? if we add recommendations and best practices for other parts in dbt for BQ, those might not be caveats. I like `instructions_bigquery.md` or something like that - we could add all of those under a specific folder for adapter/warehouse specific instructions
- we should call out the name of the file in any skill that would need to get this information, otherwise the LLM will not look at it
> we should call out the name of the file in any skill that would need to get this information, otherwise the LLM will not look at it

✅ Done!

> is caveat the correct word here? if we add recommendations and best practices for other parts in dbt for BQ, those might not be caveats. I like instructions_bigquery.md or something like that

One definition of caveat is "specific stipulations, conditions, or limitations", so it is a good descriptor in this case. We can call it whatever we want though, and I don't feel strongly either way.

> we could add all of those under a specific folder for adapter/warehouse specific instructions

Are you thinking something like this within a subfolder with simply the name of the adapter/warehouse?
my-skill/
├── SKILL.md
└── adapter/
    ├── bigquery.md (adapter/warehouse specific instructions - loaded when needed)
    ├── databricks.md (adapter/warehouse specific instructions - loaded when needed)
    ├── redshift.md (adapter/warehouse specific instructions - loaded when needed)
    └── snowflake.md (adapter/warehouse specific instructions - loaded when needed)
Or flat like this?
my-skill/
├── SKILL.md
├── bigquery.md (adapter/warehouse specific instructions - loaded when needed)
├── databricks.md (adapter/warehouse specific instructions - loaded when needed)
├── redshift.md (adapter/warehouse specific instructions - loaded when needed)
└── snowflake.md (adapter/warehouse specific instructions - loaded when needed)
I was actually thinking of a third option (close to the 1st one)
my-skill1/
└── SKILL.md
my-skill2/
└── SKILL.md
adapter/
├── bigquery.md (adapter/warehouse specific instructions - loaded when needed)
├── databricks.md (adapter/warehouse specific instructions - loaded when needed)
├── redshift.md (adapter/warehouse specific instructions - loaded when needed)
└── snowflake.md (adapter/warehouse specific instructions - loaded when needed)
The adapter specific things we'd have to say will be short enough that we can have 1 file to cover all the skills.
b-per
left a comment
TY Doug! Added some comments
|
| ## What are unit tests in dbt
|
| In software programming, unit tests validate small portions of your functional code, and they work much the same way in dbt. dbt uwnit tests allow you to validate your SQL modeling logic on a small set of static inputs _before_ you materialize your full model in production. dbt unit tests enable test-driven development, benefiting developer efficiency and code reliability.
Suggested change:
| In software programming, unit tests validate small portions of your functional code, and they work much the same way in dbt. dbt uwnit tests allow you to validate your SQL modeling logic on a small set of static inputs _before_ you materialize your full model in production. dbt unit tests enable test-driven development, benefiting developer efficiency and code reliability.
| In software programming, unit tests validate small portions of your functional code, and they work much the same way in dbt. dbt unit tests allow you to validate your SQL modeling logic on a small set of static inputs _before_ you materialize your full model in production. dbt unit tests enable test-driven development, benefiting developer efficiency and code reliability.
| @@ -0,0 +1,42 @@
| See below for all the required and optional keys in the YAML definition of unit tests.
Yes to this type of info!!!
This is the information dense type of content that is maybe too dense for humans to reason through, but it is exactly what LLMs need!
|
| This example creates a new `dim_customers` model with a field `is_valid_email_address` that calculates whether or not the customer’s email is valid:
|
| <file name='dim_customers.sql'>
We haven't used this `<file ...>` syntax in the other skills. Open question: should we use it everywhere, or not at all? Either way, I think we should be consistent.
| - Verifying that a bug fix solves a bug report for an existing dbt model.
|
| More examples:
| - When your SQL contains complex logic:
Should we add complex joins logic?
|
| Self explanatory -- the title says it all!
|
| ### 2. Mock the inputs
Should we recommend using `dbt show` to explore the existing inputs/outputs and sanitize them to make sure they don't contain sensitive data?
This is a fantastic idea!
Might even be a good stand-alone and reusable skill.
From Benoit in internal slack:
- Add complex joins to list of scenarios that warrant unit tests
- Add tip to use dbt show for exploring input data with sanitization reminder
Merging this for now as a v1 🚀
resolves #
Description
Checklist
`changie new` to create a changelog entry