DBT — Database Schemas

Chaim Turkel
Israeli Tech Radar
Nov 17, 2022

As you know from my previous stories, I am a DBT advocate. DBT has done a lot of good in the ELT space. If you are new to DBT, see DBT get started.

There are, however, areas where DBT has not yet taken a stand on what the best practice is and how to do it.

When creating a new model, the name of the model is the name of the file, and the model is built in the default target schema. If you want to move your model to a different schema, you can do this either in the model configuration or at the folder level, for all models in the folder:
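For example (a minimal sketch; the project, folder, and schema names here are hypothetical), the model-level override is set in the model file itself:

{{ config(schema='staging') }}

select * from {{ ref('some_upstream_model') }}

while the folder-level override goes in dbt_project.yml:

models:
  my_project:
    staging:
      +schema: staging

Note that with dbt's built-in generate_schema_name, a custom schema is appended to the target schema rather than replacing it; the macros below change exactly this behavior.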

In a previous blog, I described the folder structure that in my opinion is the best for a multi-domain dbt project.
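As a reminder, it has roughly this shape (using the loyalty domain that also appears in the macro examples below):

models/
  loyalty/
    staging/
      loyalty_stg__daily_campaigns.sql
    marts/
      loyalty__fact_campaigns.sql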

Based on this structure, it is very natural to want to distribute the models into databases named after each domain. On the other hand, you don't want the staging models to end up together with the marts models in the same database.

So what we came up with is the following mapping per model for production:

model_name -> db_name / table_name
domain_stg__daily_campaigns -> protected_domain / daily_campaigns
domain__campaign_analysis_campaigns -> public_domain / campaign_analysis_campaigns

In dev mode, on the other hand, we want all the tables to be created in a single local database, so the mapping would be:

model_name -> db_name / table_name
domain_stg__daily_campaigns -> dev_db / domain_stg__daily_campaigns
domain__campaign_analysis_campaigns -> dev_db / domain__campaign_analysis_campaigns

So how do we do this magic in DBT, such that it automatically knows where to write each table?

DBT gives you the option to write your own custom schema mapping, by letting you override two macros in your project:

generate_schema_name.sql

get_custom_alias.sql

My personal opinion is that macros should live in a separate git repository, not in the same repository as the models. This way the infra team can update them without cluttering the models' git history, and duplication between repos is avoided. But in order to override these macros in DBT you must implement them in the macros directory of your project; the packages directory is not scanned for overrides. So my solution was to create skeleton macros that just delegate to the code from the package.

get_custom_alias.sql

{% macro generate_alias_name(custom_alias_name=none, node=none) -%}
    {{ dbt_data_applications.generate_alias_name_env(custom_alias_name, node) }}
{%- endmacro %}

generate_schema_name.sql

{% macro generate_schema_name(custom_schema_name, node) -%}
    {{ dbt_data_applications.generate_schema_name_env(custom_schema_name, node) }}
{%- endmacro %}

Now you can implement whatever you want in your private git repository and deploy new versions as things change.
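For example, assuming the shared macros are published as a dbt package named dbt_data_applications (the git URL and revision here are hypothetical), each project pulls them in via packages.yml:

packages:
  - git: "https://github.com/your-org/dbt-data-applications.git"
    revision: 1.0.0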

The code that generates the data schema based on the mapping above is:

generate_schema_name_env

{% macro generate_schema_name_env(custom_schema_name, node) -%}
    {{ return(adapter.dispatch('generate_schema_name_env', 'dbt_data_applications')(custom_schema_name, node)) }}
{%- endmacro %}

{% macro default__generate_schema_name_env(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {# example path: loyalty/marts/loyalty__fact_campaigns.sql #}
    {# example fqn: ['project', 'loyalty', 'marts', 'loyalty__fact_campaigns'] #}
    {%- if target.name == "prod" and node["resource_type"] == "model" -%}
        {% set dir_path = node["path"].split('/') %}
        {# dir_path[1] is the layer folder (staging / marts); fqn[1] is the domain folder #}
        {% set model_type = dir_path[1] %}
        {% set domain = node["fqn"][1] %}
        {%- if model_type == "marts" -%}
            {%- set default_schema = "public_" ~ domain -%}
        {% elif model_type == "staging" -%}
            {%- set default_schema = "protected_" ~ domain -%}
        {%- else -%}
            {{ exceptions.raise_compiler_error("No such model type (marts, staging): " ~ model_type) }}
        {%- endif -%}
        {{ default_schema }}
    {%- else -%}
        {%- if custom_schema_name is none -%}
            {{ default_schema }}
        {%- else -%}
            {{ default_schema }}_{{ custom_schema_name | trim }}
        {%- endif -%}
    {% endif %}
{%- endmacro %}
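To make this concrete: with the macro above, in prod the model at loyalty/marts/loyalty__fact_campaigns.sql lands in the public_loyalty schema, and a model under loyalty/staging lands in protected_loyalty. For any other target the macro falls through to the default behavior, so in dev everything is created in the target schema, exactly as in the dev mapping above.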

generate_alias_name_env

{#
    Renders an alias name given a custom alias name. If the custom
    alias name is none, then the resulting alias is just the filename of the
    model. If an alias override is specified, then that is used.

    This macro can be overridden in projects to define different semantics
    for rendering an alias name.

    Arguments:
    custom_alias_name: The custom alias name specified for a model, or none
    node: The available node that an alias is being generated for, or none
#}

{% macro generate_alias_name_env(custom_alias_name=none, node=none) -%}
    {% do return(adapter.dispatch('generate_alias_name_env', 'dbt_data_applications')(custom_alias_name, node)) %}
{%- endmacro %}

{% macro default__generate_alias_name_env(custom_alias_name=none, node=none) -%}
    {%- if target.name == "prod" and node["resource_type"] == "model" -%}
        {%- if custom_alias_name is none -%}
            {# model names follow the <domain>__<table_name> convention, so keep only the part after the double underscore #}
            {% set table_parts = node.name.split('__') %}
            {% set node_name = table_parts[1] %}
            {{ node_name }}
        {%- else -%}
            {{ custom_alias_name | trim }}
        {%- endif -%}
    {%- else -%}
        {%- if custom_alias_name is none -%}
            {{ node.name }}
        {%- else -%}
            {{ custom_alias_name | trim }}
        {%- endif -%}
    {%- endif -%}
{%- endmacro %}
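Continuing the example: in prod the model loyalty__fact_campaigns is written to a table named fact_campaigns (only the part after the double underscore is kept), while in dev the full model name is used as the table name, so all the models can share the single dev database without collisions.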

So, as you can see, DBT does not force its opinion on how to deploy your models into schemas, but it does give you a framework that lets you do as you please.

Note: a side effect of using these macros is that in production the target schema is not used at all, so any value can be put there.
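For example, in a profiles.yml like the following (a minimal sketch; the connection details are hypothetical), the schema value for the prod target is effectively a placeholder:

my_project:
  target: prod
  outputs:
    prod:
      type: postgres
      host: prod-db.internal
      port: 5432
      user: dbt
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: analytics
      schema: placeholder_schema  # ignored by the custom generate_schema_name in prod
      threads: 4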

For more information on dbt, follow me on Twitter.
