Skip to content

d-roman-halliday/DuckDB-DBT-Example

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 

Repository files navigation

DuckDB-DBT-Example

This is an example dbt project, for dbt-core (refeerd to as the CLI/Community/Open Source edition) of dbt (which can be used in any environment).

Note If you are new to dbt, this isn't a place to start. Go and read there great documentation (for this CLI, you want dbt core): https://docs.getdbt.com/docs/get-started/getting-started/overview

My intention is for this example (and anything built like it/extending on it) to be self-contained, so it can be replaced and deleted with minimal impact on the computer it's being run on. I hope that it paves the way for self-contained demo projects, useful for tutorials and sharing solutions within the community.

This project is using the dbt-DuckDB database connector, with a local (to the project) profiles.yml configuration.

Using DuckDB as a database is the lightest and most self-contained option. Developers using dbt use a wide array of databases, and I wanted to create something anyone in the community could use for dbt demos (focusing on dbt rather than a specific database) with minimal overhead.

I was inspired by the recent changes to allow for a local (to the project) profiles.yml configuration file.

This can have someone running a pre-built dbt project without installing any large database software, with everything self-contained in a python virtual environment and local directory structure (checked out from GitHub).

'Out of the box' Install & Execute

The instructions are the fastest way to checkout, install dependencies and run this project in your environment.

General

Prerequisites

How software is installed on your system is a personal choice and beyond this document (but as I intend this to be for tutorials I will give some pointers).

  • If you use Linux, chances are you will already know the package manager on your distribution.
  • If you use Mac OS, then the best way to get all the tools/applications required is Homebrew.
  • If you use Windows, then you will have to download a few things from a few different websites and install/configure them.

You will need to have:

  • Git - One can always download the project as a zip file and unzip it into the location, but I expect most people to already have git installed/available.
  • python - https://www.python.org/
  • DuckDB - The dbt-duckdb package installs everything dbt needs for DuckDB, but if you wish to view the database you will need either DuckDB CLI or something which can browse the database file.

Overview of Steps

  1. Create a working directory - so the code, python installation and database all site in a self-contained location.
  2. Clone the code from git - This can be using:
    1. The usual git ssh integration/desktop application.
    2. Use HTTPS for a git pull if you haven't configured git ssh locally.
    3. If you aren't signed into a git account, download the code in a zip file and extract it.
  3. Create a python virtual environment - Keeping all the libraries and dependencies nicely independent.
  4. Start the virtual environment, and install dbt-duckdb
  5. Run the project

Setup Script

Use this at your own peril!

You should start with the section: 'Step by step' Checkout & Execution

I've only tested it on Mac and Linux (ubuntu).

As a proof of concept, a script can be called to configure everything ready for a dbt run. The script in this repository is: git_demo_install.bash

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/d-roman-halliday/DuckDB-DBT-Example/main/install/git_demo_install.bash)"

So all you need to do is

source venv/bin/activate

cd DuckDB-DBT-Example
cd demo

dbt run

'Step by step' Checkout & Execution

These commands have been run/tested using Linux, Mac OS and Windows (both git bash and cmd). I've put comments for the variations.

# Create a working directory
mkdir dbt_demo
cd dbt_demo

# Clone the code from git (use one of)
git clone git@github.com:d-roman-halliday/DuckDB-DBT-Example.git
git clone https://github.com/d-roman-halliday/DuckDB-DBT-Example.git

# Create a python virtual environment (the second option shown as a Windows example, because python isn't always configured in the path on windows)
python -m venv venv
"C:\Program Files\Python310\python.exe" -m venv venv

# Start the virtual environment (The second line is for cmd under windows)...
source venv/bin/activate
venv\Scripts\activate.bat

# ...and install `dbt-duckdb`
pip install --upgrade pip
pip install dbt-DuckDB

# Optional DuckDB CLI (Command Line Interface)
pip install duckcli
    
#Run the project (changing into directory, showing both options for run and build (which includes tests))
cd DuckDB-DBT-Example
cd demo

dbt run

dbt build

View models created

Now it's time to see the database created (the local DuckDB database). This can be done on the command line:

duckcli ../local_duckdb_project_db.duckdb

DuckDB commands & output

Testing the output by querying the database should look like the below:

Version: 0.2.1
GitHub: https://github.com/dbcli/duckcli
../local_duckdb_project_db.duckdb> .tables
+---------------------+
| table_name          |
+---------------------+
| my_first_dbt_model  |
| my_second_dbt_model |
+---------------------+
Time: 0.016s
../local_duckdb_project_db.duckdb> SELECT * FROM duckdb_demo.my_first_dbt_model;
+----+
| id |
+----+
| 1  |
| 2  |
+----+
2 rows in set
Time: 0.009s
../local_duckdb_project_db.duckdb> SELECT * FROM duckdb_demo.my_second_dbt_model;
+----+
| id |
+----+
| 1  |
+----+
1 row in set
Time: 0.008s

Installation & Configuration Command History

This is how I created and configured the project, if you wish to replicate my process for your own project, then these are the steps I took.

Create a directory for this self-contained project

mkdir demo_dbt_duckdb
cd demo_dbt_duckdb

Check out the empty repo

For setting up the project/repository it needs some configuration.

git clone git@github.com:d-roman-halliday/DuckDB-DBT-Example.git
# Edit this file
vi README.md

Create & configure python venv (Virtual Environment)

This ensures that all we do is self-contained from a software installation and management perspective. For the demonstration, we will stick with whatever default python version is on the system at hand.

# From the 'demo_dbt_duckdb' directory (so the venv is independent of the codebase)
python -m venv venv

# Start the virtual environment
source venv/bin/activate

# Upgrade pip
pip install --upgrade pip
# Install 'dbt-duckdb', which depends on (so includes) 'dbt-core' and other libraries
pip install dbt-duckdb

# Optional DuckDB CLI (Command Line Interface)
pip install duckcli

Create demo project

This checks the dbt installation and creates the demo project (built into the dbt application).

# Test dbt instalation
dbt --version

# Craete 'demo' project which includes the 'example' models
cd DuckDB-DBT-Example
dbt init demo

Configure project & DuckDB

See the dbt docs for DuckDB setup. Since more recent versions of dbt, the profiles.yml file doesn't need to be in the ~/.dbt location, instead, it can be local to a project which is handy for this sort of standalone situation.

# Modify the profiles
vi demo/profiles.yml

profiles.yml

Note there are changes from the original docs:

  1. The configuration name has been changed to local_duckdb_project, this will need to be reflected in the dbt_project.yml.
  2. The database file location has been given as ../local_duckdb_project_db.duckdb this places the file outside of the rest of the dbt code but within the project filesystem, it will need to be removed from the code by the .gitignore file.
  3. The extensions section is commented out (according to the docs this is required for crypto and snapshots). This will be enabled later.
local_duckdb_project:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: '../local_duckdb_project_db.duckdb'
      #optional fields
      schema: duckdb_demo

dbt_project.yml

This change is to reflect the name in the above profiles.yml.

# This setting configures which "profile" dbt uses for this project.
profile: 'local_duckdb_project'

Execution & Testing

Now it's time to test the Execution:

dbt run

The project should run successfully, if it doesn't check the output for useful error messages (dbt is quite good with them). Then try:

dbt build

If you are familiar with the demo/example project there is a bug in one of the models which needs fixing.

vi models/example/my_first_dbt_model.sql

Now it's time to see the database created (the local DuckDB database). Note: Requires duckcli to be installed

duckcli ../local_duckdb_project_db.duckdb

DuckDB commands & output

Testing the output by querying the database should look like the below:

Version: 0.2.1
GitHub: https://github.com/dbcli/duckcli
../local_duckdb_project_db.duckdb> .tables
+---------------------+
| table_name          |
+---------------------+
| my_first_dbt_model  |
| my_second_dbt_model |
+---------------------+
Time: 0.016s
../local_duckdb_project_db.duckdb> SELECT * FROM duckdb_demo.my_first_dbt_model;
+----+
| id |
+----+
| 1  |
| 2  |
+----+
2 rows in set
Time: 0.009s
../local_duckdb_project_db.duckdb> SELECT * FROM duckdb_demo.my_second_dbt_model;
+----+
| id |
+----+
| 1  |
+----+
1 row in set
Time: 0.008s

About

An example project using dbt-duckdb with local configuration

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages