Creating a Postgres docker container and connecting dbt

Introduction

I am currently researching dbt, a data transformation tool that allows you to transform data in your warehouse more effectively. One of the things that is really interesting about it is that dbt has a set of adapters that allow you to switch between different database engines.

It is also really easy to get started with local development. I initially tried to get the SQL Server adapter up and running because I can run SQL Server locally on my Mac. However, the SQL Server adapter is a community adapter and it's not as quick to get started with, so for now I am going to work with Postgres.

I have chosen to use the Postgres adapter because I can spin up a Postgres database locally in a Docker container in a few minutes. This means I can develop entirely offline without having to sign up for any cloud services. Here is how I set things up.

Pre-requisites

Before you can complete the steps below, you need to install and configure Docker by following the Docker docs.

Create a container instance

The command I am going to run to create a Postgres container is:

# make sure that Docker is running when you run this command
docker run -d --name dbt-quickstart -p 5432:5432 -e POSTGRES_USER=justin -e POSTGRES_PASSWORD=justin postgres:15-alpine

The parameters are as follows:

  • --name dbt-quickstart sets the name of the container.
  • -p 5432:5432 maps port 5432 on the host to port 5432 in the container.
  • -e POSTGRES_USER=justin sets the user name.
  • -e POSTGRES_PASSWORD=justin sets the password (look away, security fans).
  • postgres:15-alpine is the image that will be used to create the container.

If you haven't run a Postgres container before, Docker will pull the image first and then create the container.

Unable to find image 'postgres:15-alpine' locally
15-alpine: Pulling from library/postgres
bca4290a9639: Pull complete 
73755f5e6a3e: Pull complete 
3a60b4a71875: Pull complete 
56089f613fb2: Pull complete 
3972f5d4acdc: Pull complete 
edbb275221bf: Pull complete 
fe375eefc236: Pull complete 
f8ac95c0dcf9: Pull complete 
f89251e2b9f9: Pull complete 
Digest: sha256:8a8d55343d6fc456cb183453e3094ff140b984157b36b48f817dd581654f2aec
Status: Downloaded newer image for postgres:15-alpine
8b122e2d8b752c8ef7a32094a897c826193415151bcf0fa9371edafa46e3196a

You can check that the container is running by running docker ps or by reviewing the Docker dashboard.

[Screenshot: the Docker dashboard showing a running container called dbt-quickstart, using the postgres:15-alpine image and listening on port 5432]
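
Beyond docker ps, you can also check that Postgres itself is accepting connections by running psql inside the container. This is just a quick sanity check; note that the image creates a default database named after POSTGRES_USER, which is what psql connects to here:

# run psql inside the container and print the server version
docker exec -it dbt-quickstart psql -U justin -c "SELECT version();"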

Install and configure dbt-postgres

Next I need to install the dbt adapter for Postgres:

# installs the dbt Postgres adapter
pip install dbt-postgres
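
To confirm the install worked, dbt --version lists the core version along with any registered adapter plugins:

# shows dbt core version and installed adapter plugins
dbt --version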

Run dbt

I am using the jaffle-shop sandbox project to review dbt. All I need to do from here is set up my profile in the profiles.yml file. I want this project to be built in my new Postgres database, so the profiles.yml file looks like this:

quickstart-postgres:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      user: justin
      password: justin
      port: 5432
      dbname: jaffle
      schema: dbo
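
One thing to catch before connecting: the container only creates a default database named after POSTGRES_USER, so the jaffle database referenced in dbname above needs to exist before dbt can use it. A quick way to create it from the host (this assumes the container name and credentials from earlier):

# create the jaffle database inside the running container
docker exec -it dbt-quickstart psql -U justin -c "CREATE DATABASE jaffle;"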

By updating the profile in the dbt_project.yml file to quickstart-postgres, I can now run dbt commands to build and test the project against my Postgres database.
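
For reference, that is a one-line change at the top level of dbt_project.yml (a minimal sketch; the rest of the file stays as it is):

# dbt_project.yml
profile: quickstart-postgres

Before building anything, dbt debug is a quick way to confirm that dbt can actually reach the database with the profile above:

# checks the profile, dependencies and database connection
dbt debug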

Accessing the database in Azure Data Studio

I can also connect to the database using Azure Data Studio. I can see the tables that have been created by dbt as a result of running the dbt seed command so I know that everything is working.

[Screenshot: Azure Data Studio connected to the local Postgres database, with the jaffle database expanded to show the tables created by the dbt seed command]
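
For reference, loading those seeds is a single command run from the project root:

# loads the project's CSV seed files into the target database
dbt seed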

And so from there I can run dbt run to build the project:

dbt run

# output
Running with dbt=1.7.14
Registered adapter: postgres=1.7.14
Found 13 models, 6 seeds, 6 sources, 0 exposures, 19 metrics, 651 macros, 0 groups, 6 semantic models
Concurrency: 1 threads (target='dev')

1 of 13 START sql table model dbo.metricflow_time_spine ........................ [RUN]
1 of 13 OK created sql table model dbo.metricflow_time_spine ................... [SELECT 3651 in 0.14s]
2 of 13 START sql view model dbo.stg_customers ................................. [RUN]
2 of 13 OK created sql view model dbo.stg_customers ............................ [CREATE VIEW in 0.06s]
3 of 13 START sql view model dbo.stg_locations ................................. [RUN]
3 of 13 OK created sql view model dbo.stg_locations ............................ [CREATE VIEW in 0.05s]
4 of 13 START sql view model dbo.stg_order_items ............................... [RUN]
4 of 13 OK created sql view model dbo.stg_order_items .......................... [CREATE VIEW in 0.04s]
5 of 13 START sql view model dbo.stg_orders .................................... [RUN]
5 of 13 OK created sql view model dbo.stg_orders ............................... [CREATE VIEW in 0.05s]
6 of 13 START sql view model dbo.stg_products .................................. [RUN]
6 of 13 OK created sql view model dbo.stg_products ............................. [CREATE VIEW in 0.05s]
7 of 13 START sql view model dbo.stg_supplies .................................. [RUN]
7 of 13 OK created sql view model dbo.stg_supplies ............................. [CREATE VIEW in 0.05s]
8 of 13 START sql table model dbo.locations .................................... [RUN]
8 of 13 OK created sql table model dbo.locations ............................... [SELECT 6 in 0.04s]
9 of 13 START sql table model dbo.products ..................................... [RUN]
9 of 13 OK created sql table model dbo.products ................................ [SELECT 10 in 0.04s]
10 of 13 START sql table model dbo.order_items ................................. [RUN]
10 of 13 OK created sql table model dbo.order_items ............................ [SELECT 90900 in 0.15s]
11 of 13 START sql table model dbo.supplies .................................... [RUN]
11 of 13 OK created sql table model dbo.supplies ............................... [SELECT 65 in 0.03s]
12 of 13 START sql table model dbo.orders ...................................... [RUN]
12 of 13 OK created sql table model dbo.orders ................................. [SELECT 61948 in 0.33s]
13 of 13 START sql table model dbo.customers ................................... [RUN]
13 of 13 OK created sql table model dbo.customers .............................. [SELECT 935 in 0.09s]
Finished running 7 table models, 6 view models in 0 hours 0 minutes and 1.31 seconds (1.31s).
Completed successfully

Which will build the project in the Postgres database:

[Screenshot: Azure Data Studio connected to the local Postgres database, with the jaffle database expanded to show the tables created by the dbt run command]
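
And with the models built, the project's data tests can be run against the same target:

# runs the project's data tests
dbt test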

Conclusion

If you want to explore dbt and you don't want to go through the hassle of setting up a cloud database resource, then this is a pretty quick way to get started. I do like the idea of being able to develop and test the entire solution offline. There is also the option to connect to DuckDB, but I haven't tried that yet.

#mtfbwy


