MDS FEST 3.0

From Manual to Magical: Smarter Warehouse Organization With dbt Macro Overrides

Mariah Rogers, Senior Analytics Engineer, Arcadia

Improve organization, discovery, and tidiness in dbt projects while reducing conflicts as they scale. Learn a methodology that helps manage increasing numbers of models and contributors, ensuring sustainable growth without sacrificing quality or coordination.

Talk overview

This talk showcases a method for dbt users to improve the organization, data discovery, and tidiness of the models in their project while overall reducing the types and costs of conflicts that arise as a project scales in magnitude with increasing numbers of models and contributors over time.The types of issues of scale within a growing dbt project that this talk will discuss include model naming collisions, increasingly complex model names and naming conventions that try to get around the limitations of a single namespace, long configuration files or hard-to-debug configuration discrepancies that arise from the presence of configs in models, profiles, and project yaml files where it isn't clear which one should apply, and others.--As a dbt project (and a data warehouse) grows, eventually an organization will outgrow the single namespace available with dbt model names, also known as node names. The tool provides a method to expand the namespaces available via custom schema, database, and alias name configurations at the node level or at the folder-level, provided in yaml configuration blocks in each individual model or in the project configuration file. At some level of scale, however, you may end up with either a 1000 line-long config file that is hard to parse or 1000 custom schema and database overrides that are hard to track down in individual model files. Additionally, the models directory and the data warehouse may become cluttered and disorganized with models becoming hard to find, users confused about how a model in the dbt docs ends up in a particular schema, and poor discoverability of data overall. And last but not least, the ability to name only a single model with a particular common table name such as "dim_customer" is very limiting and forces you to adopt poor naming conventions or even more complicating table alias configurations to get around the requirement for unique node names.This talk will introduce a deterministic table materialization pattern via overrides to the dbt default macros "generate_database_name", "generate_schema_name", and "generate_alias_name" which will allow dbt users to automate exactly where a model ends up in the data warehouse with custom names for all three layers of the namespace using only the node name itself. With this pattern users will no longer be limited to just one table called "dim_customer"--you can have a "dim_customer" in every schema in your warehouse if you so choose without having to alias each model individually in their config blocks.In this talk I'll also discuss some patterns for project organization to go along with the macro automations that will improve data discoverability within the data warehouse as well as the tidiness of the dbt project itself.

Currently a Metaplane or Monte Carlo user?

Switch to Secoda and get those exact same features for free. Get everything you love now, and your budget back.

Meet us at Snowflake Summit

Unlock the blueprint for enterprise data governance

Benchmarks and actionable strategies to scale governance frameworks effectively.

Get the report