The Ins and Outs of the DBT Package

Chaim Turkel
Published in Israeli Tech Radar
3 min read · Mar 9, 2023

DBT is an excellent tool, but “with great power comes great responsibility”.

Any self-respecting build system needs a way to import dependencies, and DBT is no different.

So what are the basic options that DBT gives you for defining dependencies?

All dependencies are defined in the packages.yml file, for example:

packages:
  - package: dbt-labs/snowplow
    version: 0.7.0

This is the classic way to include packages published on the DBT package hub, such as the ones DBT Labs releases.

If you want to use your own dependencies, DBT has you covered:

packages:
  # use this format when accessing your repository via a github application token
  - git: "https://{{env_var('DBT_ENV_SECRET_GIT_CREDENTIAL')}}@github.com/dbt-labs/awesome_repo.git" # git HTTPS URL

This format has many variations: you can pin a branch, a tag, or even a specific git commit through the revision key. If you have a monorepo and your DBT project is nested inside it, you can also point at a subdirectory:

packages:
  - git: "https://github.com/dbt-labs/dbt-labs-experimental-features" # git URL
    subdirectory: "materialized-views" # name of subdirectory containing `dbt_project.yml`
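
The branch, tag, and commit variations all go through the revision key of a git package. Here is a minimal sketch that reuses the dbt-labs repository names from the examples above. The tag and branch names are placeholders, not real references in those repositories:

```yaml
packages:
  # pin to a tag (placeholder tag name)
  - git: "https://github.com/dbt-labs/awesome_repo.git"
    revision: "1.2.0"

  # follow a branch inside a monorepo (placeholder branch name);
  # revision can also hold a full commit hash
  - git: "https://github.com/dbt-labs/dbt-labs-experimental-features"
    subdirectory: "materialized-views"
    revision: "main"
```

Pinning to a tag or a commit keeps dbt deps reproducible, while following a branch means the package can change underneath you between runs.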

Warning

If you thought you could keep several projects and all of their dependencies in one repo, think again. You cannot list the same repo twice in packages.yml, even with different subdirectories.
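
To make the warning concrete, this is the shape of configuration it refers to. The repository URL and the subdirectory names are hypothetical. Going by the limitation described above, this file is not expected to work:

```yaml
packages:
  # hypothetical monorepo that holds two DBT projects
  - git: "https://github.com/example-org/data-monorepo.git"
    subdirectory: "platform"      # first project in the repo
  # the same repo listed a second time: this is the case that is rejected
  - git: "https://github.com/example-org/data-monorepo.git"
    subdirectory: "data_science"  # second project in the repo
```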

Project Dependencies

Once your project grows very large, you realize that a monolithic project is no longer workable and that you need to move to something closer to a microservice architecture.

In DBT, you split your models by domain, and each domain gets its own git repo. Very quickly you will find that the domains overlap and that you need dependencies between them. To connect them, you use the same package mechanism. But this is where things get complicated.

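Here is a simple example, where Platform and Data Science are two domain projects and DBT is the name of an in-house utilities package: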

Platform -> 2023.3.1
    DBT -> 2023.2.2
    dbt-labs/dbt_utils -> 1.0.0

Data Science -> 2023.3.5
    Platform -> 2023.3.1
    DBT -> 2023.2.2
    dbt-labs/dbt_utils -> 1.0.0

Running each project works fine. But now suppose you have fixed a bug in your DBT utilities package, or added something new to it. You now have DBT version 2023.4.1, and you want to update your projects to use it.

If you try to update the Data Science project with DBT 2023.4.1, for instance:

Data Science -> 2023.3.5
    Platform -> 2023.3.1
    DBT -> 2023.4.1
    dbt-labs/dbt_utils -> 1.0.0

When running dbt deps, you will get the following error:

git dependencies should contain exactly one version. DBT contains: 
{'2023.2.2', '2023.4.1'}

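So what is the error? When DBT pulls in the Platform package, it also pulls in all of Platform's dependencies, including DBT 2023.2.2. Data Science now asks for DBT 2023.4.1 directly and for DBT 2023.2.2 through Platform, and a git dependency must resolve to exactly one version.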
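
A sketch of the two packages.yml files behind this conflict may help. The git URLs under example-org are hypothetical, and the sketch assumes that each version is published as a git tag and that dbt_utils comes from the package hub. Only the package names and version numbers are taken from the example above.

```yaml
# Platform 2023.3.1: packages.yml (hypothetical URLs)
packages:
  - git: "https://github.com/example-org/dbt.git"      # in-house DBT utilities
    revision: "2023.2.2"
  - package: dbt-labs/dbt_utils
    version: 1.0.0
---
# Data Science 2023.3.5: packages.yml after the attempted upgrade
packages:
  - git: "https://github.com/example-org/platform.git"
    revision: "2023.3.1"      # still brings in DBT 2023.2.2 transitively
  - git: "https://github.com/example-org/dbt.git"
    revision: "2023.4.1"      # direct pin that conflicts with the one above
  - package: dbt-labs/dbt_utils
    version: 1.0.0
```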

The only way to upgrade the DBT package in Data Science is to first release a new version of Platform that uses the new DBT version. Only after that release can you update Data Science to the new Platform and the new DBT version.
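
The release order could look roughly like this. The new Platform version number (2023.4.2) and the example-org URLs are hypothetical, as is the assumption that versions are git tags. The example above only fixes the new DBT version at 2023.4.1.

```yaml
# Step 1: release a new Platform (hypothetical version 2023.4.2) on the new DBT
packages:
  - git: "https://github.com/example-org/dbt.git"
    revision: "2023.4.1"
  - package: dbt-labs/dbt_utils
    version: 1.0.0
---
# Step 2: only then move Data Science to the new Platform and the new DBT
packages:
  - git: "https://github.com/example-org/platform.git"
    revision: "2023.4.2"      # hypothetical release from step 1
  - git: "https://github.com/example-org/dbt.git"
    revision: "2023.4.1"      # now matches what Platform brings in
  - package: dbt-labs/dbt_utils
    version: 1.0.0
```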

DBT Limitations

This is a very big limitation in my opinion, since once you have a large dependency graph, you need to stage many releases just to bump the DBT package to a new version.

Let's hope that DBT comes up with a solution for this. It does not look like one is coming soon; you can see the GitHub issue where this is discussed: Conflict with dbt-utils package.

Another big limitation is the case where two projects reference each other. Let's say a model in Platform uses a model from Data Science, and another model in Data Science references a model from Platform. DBT cannot resolve this scenario: because the reference is circular, dbt deps goes into an endless loop.
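
The circular case can be sketched as two packages.yml files that point at each other. The example-org URLs are hypothetical, and the revisions assume the versions from the earlier example are git tags:

```yaml
# Platform: packages.yml (hypothetical URL)
packages:
  - git: "https://github.com/example-org/data_science.git"
    revision: "2023.3.5"
---
# Data Science: packages.yml (hypothetical URL)
packages:
  - git: "https://github.com/example-org/platform.git"
    revision: "2023.3.1"
```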

The only solution is to extract the shared models into a new domain, which might not be a good solution at the company level.
