DBT Basics
A quick Introduction to DBT
DBT is pretty powerful and you can do a lot of fancy stuff with it. But let's keep it simple first and focus on the basic dbt project structure first:
my_dbt_project
├── dbt_project.yaml
├── profiles.yaml
└── models
├── pokemon__model_a.sql
├── pokemon__model_b.sql
├── ...
└── schema.yaml
dbt_project.yaml
In the dbt_project.yaml
, you can define the metadata of your project. For example:
- Naming
- How to store your tables?
- Defining directories
An example model dbt_project.yaml
file
name: 'pokemon_analytics'
version: '1.0.0'
config-version: 2
profile: 'postgres'
model-paths: ["models"]
target-path: "target"
clean-targets:
- "target"
- "dbt_packages"
models:
pokemon_analytics:
materialized: table
intermediate:
+schema: intermediate
analytics:
+schema: analytics
profiles.yaml
The profiles.yaml
holds all of your connection information. Here you can add multiple outputs (for example database environments).
An example model profiles.yaml
file
postgres:
target: dev
outputs:
dev:
type: postgres
host: localhost
port: 5432
dbname: pokemon
user: postgre
models/
DBT uses the term model to define a sql logic. So each model represents a .sql
file. In the models directory, you can also use subdirectories to organize your models (e.g. "finance models", "marketing models", etc.).
Within your models directory, you can also create documentation and declare tests for your models. You can do that by simply creating another .yaml
file and fill it with the required information.
An example model schema.yaml
file
version: 2
models:
- name: pokemon__model_a
description: This table is super nice!
columns:
- name: pokemon_id
desription: Unique identifier for a pokemon.
tests:
- unique
- not_null
With this .yaml we are saying, that the column pokemon_id
in the model pokemon__model_a
should be unique and not null.
DBT will then automatically test this for us and throw an error, if the test fails.
ref()
As mentioned above, dbt is using jinja templating
to boost your SQL development.
Maybe the most famous use of jinja in DBT is the ref()
function, which is used to reference other DBT models.
-- With the ref() function you can reference other DBT models:
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
SELECT
*
FROM
{{ ref(my_table) }}
-- This compiles to this:
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
SELECT
*
FROM
my_schema.my_table
Running dbt
There are multiple DBT commands to choose from. Here are the basic ones:
dbt run # creates all models
dbt test # runs all tests
dbt build # runs both commands for each model
With dbt run
, dbt checks your models directory and runs all of your models in the correct order.
If you want to select a specific model or directory you can use --select
:
dbt build --select 1_staging # runs and tests all models in the 1_staging directory
dbt build --select stg__pokemon__donations # rund and tests ONLY the model called stg__pokemon__donations
You can also create a webserver with all of your documentation:
dbt docs generate
dbt docs serve
If you want to compile your jinja sql model, use:
dbt compile --select stg__pokemon__donations # compiles the models outputs the actual sql that runs on your database