This is an example dbt project, for dbt-core (refeerd to as the CLI/Community/Open Source edition)
of dbt (which can be used in any environment).
Note If you are new to dbt, this isn't a place to start. Go and read there great documentation (for this CLI, you want dbt core): https://docs.getdbt.com/docs/get-started/getting-started/overview
My intention is for this example (and anything built like it/extending on it) to be self-contained, so it can be replaced and deleted with minimal impact on the computer it's being run on. I hope that it paves the way for self-contained demo projects, useful for tutorials and sharing solutions within the community.
This project is using the dbt-DuckDB database connector, with a local (to the project) profiles.yml configuration.
Using DuckDB as a database is the lightest and most self-contained option.
Developers using dbt use a wide array of databases, and I wanted to create something anyone in the community could use for dbt demos (focusing on dbt rather than a specific database) with minimal overhead.
I was inspired by the recent changes to allow for a local (to the project) profiles.yml configuration file.
This can have someone running a pre-built dbt project without installing any large database software, with everything self-contained in a python virtual environment and local directory structure (checked out from GitHub).
The instructions are the fastest way to checkout, install dependencies and run this project in your environment.
How software is installed on your system is a personal choice and beyond this document (but as I intend this to be for tutorials I will give some pointers).
- If you use Linux, chances are you will already know the package manager on your distribution.
- If you use Mac OS, then the best way to get all the tools/applications required is Homebrew.
- If you use Windows, then you will have to download a few things from a few different websites and install/configure them.
You will need to have:
- Git - One can always download the project as a zip file and unzip it into the location, but I expect most people to already have git installed/available.
- The standard version from https://git-scm.com/
- The desktop application from GitHub https://desktop.github.com/
- Embedded into another application/IDE (such as Egit for eclipse)
- python - https://www.python.org/
- DuckDB - The
dbt-duckdbpackage installs everything dbt needs for DuckDB, but if you wish to view the database you will need either DuckDB CLI or something which can browse the database file.
- Create a working directory - so the code, python installation and database all site in a self-contained location.
- Clone the code from git - This can be using:
- The usual git ssh integration/desktop application.
- Use
HTTPSfor a git pull if you haven't configured git ssh locally. - If you aren't signed into a git account, download the code in a zip file and extract it.
- Create a python virtual environment - Keeping all the libraries and dependencies nicely independent.
- Start the virtual environment, and install
dbt-duckdb - Run the project
Use this at your own peril!
You should start with the section: 'Step by step' Checkout & Execution
I've only tested it on Mac and Linux (ubuntu).
As a proof of concept, a script can be called to configure everything ready for a dbt run. The script in this repository is:
git_demo_install.bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/d-roman-halliday/DuckDB-DBT-Example/main/install/git_demo_install.bash)"
So all you need to do is
source venv/bin/activate
cd DuckDB-DBT-Example
cd demo
dbt run
These commands have been run/tested using Linux, Mac OS and Windows (both git bash and cmd). I've put comments for the variations.
# Create a working directory
mkdir dbt_demo
cd dbt_demo
# Clone the code from git (use one of)
git clone git@github.com:d-roman-halliday/DuckDB-DBT-Example.git
git clone https://github.com/d-roman-halliday/DuckDB-DBT-Example.git
# Create a python virtual environment (the second option shown as a Windows example, because python isn't always configured in the path on windows)
python -m venv venv
"C:\Program Files\Python310\python.exe" -m venv venv
# Start the virtual environment (The second line is for cmd under windows)...
source venv/bin/activate
venv\Scripts\activate.bat
# ...and install `dbt-duckdb`
pip install --upgrade pip
pip install dbt-DuckDB
# Optional DuckDB CLI (Command Line Interface)
pip install duckcli
#Run the project (changing into directory, showing both options for run and build (which includes tests))
cd DuckDB-DBT-Example
cd demo
dbt run
dbt build
Now it's time to see the database created (the local DuckDB database). This can be done on the command line:
duckcli ../local_duckdb_project_db.duckdb
Testing the output by querying the database should look like the below:
Version: 0.2.1
GitHub: https://github.com/dbcli/duckcli
../local_duckdb_project_db.duckdb> .tables
+---------------------+
| table_name |
+---------------------+
| my_first_dbt_model |
| my_second_dbt_model |
+---------------------+
Time: 0.016s
../local_duckdb_project_db.duckdb> SELECT * FROM duckdb_demo.my_first_dbt_model;
+----+
| id |
+----+
| 1 |
| 2 |
+----+
2 rows in set
Time: 0.009s
../local_duckdb_project_db.duckdb> SELECT * FROM duckdb_demo.my_second_dbt_model;
+----+
| id |
+----+
| 1 |
+----+
1 row in set
Time: 0.008s
This is how I created and configured the project, if you wish to replicate my process for your own project, then these are the steps I took.
mkdir demo_dbt_duckdb
cd demo_dbt_duckdb
For setting up the project/repository it needs some configuration.
git clone git@github.com:d-roman-halliday/DuckDB-DBT-Example.git
# Edit this file
vi README.md
This ensures that all we do is self-contained from a software installation and management perspective. For the demonstration, we will stick with whatever default python version is on the system at hand.
# From the 'demo_dbt_duckdb' directory (so the venv is independent of the codebase)
python -m venv venv
# Start the virtual environment
source venv/bin/activate
# Upgrade pip
pip install --upgrade pip
# Install 'dbt-duckdb', which depends on (so includes) 'dbt-core' and other libraries
pip install dbt-duckdb
# Optional DuckDB CLI (Command Line Interface)
pip install duckcli
This checks the dbt installation and creates the demo project (built into the dbt application).
# Test dbt instalation
dbt --version
# Craete 'demo' project which includes the 'example' models
cd DuckDB-DBT-Example
dbt init demo
See the dbt docs for DuckDB setup.
Since more recent versions of dbt, the profiles.yml file doesn't need to be in the ~/.dbt location, instead, it can be local to a project which is handy for this sort of standalone situation.
# Modify the profiles
vi demo/profiles.yml
Note there are changes from the original docs:
- The configuration name has been changed to
local_duckdb_project, this will need to be reflected in thedbt_project.yml. - The database file location has been given as
../local_duckdb_project_db.duckdbthis places the file outside of the rest of the dbt code but within the project filesystem, it will need to be removed from the code by the.gitignorefile. - The
extensionssection is commented out (according to the docs this is required for crypto and snapshots). This will be enabled later.
local_duckdb_project:
target: dev
outputs:
dev:
type: duckdb
path: '../local_duckdb_project_db.duckdb'
#optional fields
schema: duckdb_demo
This change is to reflect the name in the above profiles.yml.
# This setting configures which "profile" dbt uses for this project.
profile: 'local_duckdb_project'
Now it's time to test the Execution:
dbt run
The project should run successfully, if it doesn't check the output for useful error messages (dbt is quite good with them). Then try:
dbt build
If you are familiar with the demo/example project there is a bug in one of the models which needs fixing.
vi models/example/my_first_dbt_model.sql
Now it's time to see the database created (the local DuckDB database). Note: Requires duckcli to be installed
duckcli ../local_duckdb_project_db.duckdb
Testing the output by querying the database should look like the below:
Version: 0.2.1
GitHub: https://github.com/dbcli/duckcli
../local_duckdb_project_db.duckdb> .tables
+---------------------+
| table_name |
+---------------------+
| my_first_dbt_model |
| my_second_dbt_model |
+---------------------+
Time: 0.016s
../local_duckdb_project_db.duckdb> SELECT * FROM duckdb_demo.my_first_dbt_model;
+----+
| id |
+----+
| 1 |
| 2 |
+----+
2 rows in set
Time: 0.009s
../local_duckdb_project_db.duckdb> SELECT * FROM duckdb_demo.my_second_dbt_model;
+----+
| id |
+----+
| 1 |
+----+
1 row in set
Time: 0.008s