Your DAG is Headless

Integrating a goals framework (such as OKRs) with the BI and data creation layer is critical to regulate this natural bias for creation and to uncover how data assets align to company goals.

Last updated: May 2, 2024

This article was co-authored by Michael Böhm and Lindsay Murphy

tl;dr

Your data-to-value processes are probably not scalable. Here's why.

Frequently requesting new data assets is in the best interest of stakeholders. Rarely are they incentivised to limit data requests, or to deprecate old data assets.

Integrating a goals framework (such as OKRs) with the BI and data creation layer is critical to regulate this natural bias for creation and to uncover how data assets align to company goals.

The dreaded data team request backlog

As with any team, data teams don’t have an unlimited budget for team growth and tooling. To drive the most business value for their company, data teams must focus on what supports the most pressing business goals. While that seems obvious, the reality for data teams is that asset creation often does not stem from well-defined and documented company goals. Much more likely, assets are created in response to requests from various teams and individuals, whose day-to-day work may not always (read: rarely) be driven by said goals. When the process of company goal setting is detached from the process of data asset creation, bad things tend to happen:

  • The rate of data asset growth exceeds that of the resources needed to maintain and interpret them, leading to a reduced focus on the most important business questions (if not outright neglect).
  • Constant change within the business tends to make assets go stale quickly, creating ever more “zombie” dashboards and reports that sink time and resources without generating relevant value.
  • Creators of data assets (i.e. the data team) are disconnected from the value their work generates. While they may know “this is needed for dashboard X”, it is often unclear exactly how that dashboard (and therefore their work) contributes to company goals.

It’s partly a people problem (again)

A root cause of this problem is often that data teams sit as centralized functions in a cross-functional business environment, where priorities are often misaligned (or, even worse, conflicting). When company department heads are not aligned on the strategic direction of the company, the data team tends to get pulled in many different directions trying to support decision makers who aren’t all rowing in the same direction. In addition, if the request process is open to anyone in the company, folks in more tactically focused roles can contribute to a growing backlog of requests that aren’t aligned with the critical priorities of their department.

But tools are letting us down too

Adding a metrics layer to the modern data stack (MDS) is not a new concept. LookML (perhaps the “OG” of metrics layers) and, more recently, dbt Metrics (deprecated as of dbt v1.6 and replaced by MetricFlow) are two common metrics layer tools. But the features of both LookML and MetricFlow are focused specifically on metrics and, as a result, don’t easily ladder up into common goal-setting frameworks such as Objectives and Key Results (OKRs). We’ve seen a lot more hype in the industry lately about metrics trees and the standardization of metrics, which are steps in the right direction, but all of it seems far off from being a well-oiled technical solution connecting data assets, metrics, and company goals.

So, how can data teams solve this, today?

DAGs are awesome… but

Instead of surveying stakeholders on Slack with “does anyone need dataset X?”, data teams can leverage some of the tools in our existing toolbox to avoid reaching a state of misaligned and low-value data assets. Most of the following suggestions are already well supported by vendors in the MDS. And the rest isn't rocket science.

Data teams could chart the usage of various data assets in their BI tool, but that still leaves a lot of questions. Since every problem in life can be solved with graph theory, let's see what is available. The first thing we might consider is dbt's DAG, sprawling with relationship details of various models, sources, tests, etc. Then there are other tools that provide column-level lineage to understand how downstream visualization/BI layers connect to upstream parent models. A few even attribute usage stats (e.g. views or number of queries) to parts of those lineages. Although these resources and DAGs can be insightful, they still cannot provide a complete picture of how these assets connect to business goals or objectives. If we define our ideal DAG from bottom → top as sources → business goals, we can easily see that the DAG in its current state is incomplete: the “head” (AKA company goals) is still missing!
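As a concrete (if simplified) illustration, dbt already encodes this lineage in the manifest.json artifact it writes on compile/run. The following is a minimal sketch, assuming a compiled dbt project with target/manifest.json available: it walks child_map to see which models eventually reach an exposure (a dashboard, notebook, etc.) and which don't. Note that exposures are the topmost node type in this graph; there is nothing above them representing a company goal.

```python
import json

# Load the artifact dbt writes on compile/run (path may differ per project).
with open("target/manifest.json") as f:
    manifest = json.load(f)

children = manifest["child_map"]  # unique_id -> list of downstream unique_ids
model_ids = [uid for uid in manifest["nodes"] if uid.startswith("model.")]

def reaches_exposure(uid, seen=None):
    """Return True if any downstream path from `uid` ends in an exposure node."""
    seen = seen or set()
    for child in children.get(uid, []):
        if child in seen:
            continue
        seen.add(child)
        if child.startswith("exposure."):
            return True
        if reaches_exposure(child, seen):
            return True
    return False

# Models that never feed a dashboard or report. Even the ones that do stop at an
# exposure: there is no node type above it that represents a company goal.
headless = [uid for uid in model_ids if not reaches_exposure(uid)]
print(f"{len(headless)} of {len(model_ids)} models never reach an exposure:")
for uid in headless:
    print("  -", uid)
```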

To get a complete picture of how data across a company connects back to business goals, the MDS needs to incorporate a goal-setting framework. Today, this type of goal-setting function is usually managed by tools that sit outside of the MDS (such as monday.com or Lattice).

What would a modern data stack solution that incorporates goal setting as another component of the metrics layer look like? Maybe something like this (a rough code sketch follows the list):

  • Company leadership/C-level defines Os (objectives, represented as exposures in the DAG)
  • Department heads, with the help of the data team, define KRs for each objective (key results, represented as metric nodes with specific target values)
  • Infobox/note: metrics “nodes” were introduced with the dbt metrics package, but are being deprecated in dbt 1.6 with the move towards MetricFlow and the dbt Semantic Layer.
  • Infobox/note: Binary (or qualitative) KRs are difficult to translate into metric nodes. While many may debate whether or not these even fit the definition of a goal, the authors consider these more as “deliverables”.
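dbt has no native notion of an objective today, so the following is only a hypothetical sketch of the idea. The OKR structure lives in a plain mapping (here a dict the data team maintains by hand, or syncs from a goal-setting tool), key results point at metric nodes from manifest.json, and a small check flags metrics that don't ladder up to any objective. The okrs mapping, its target values, and the metric names are all invented for illustration; target values are not part of dbt's metric spec and live only in this hypothetical layer.

```python
import json

with open("target/manifest.json") as f:
    manifest = json.load(f)

# Metric nodes defined in the dbt project (unique_ids look like "metric.<project>.<name>").
defined_metrics = {uid.split(".")[-1] for uid in manifest.get("metrics", {})}

# Hypothetical OKR layer (not a dbt feature). In practice this could be a YAML file
# versioned with the project, or synced from a goal-setting tool's API.
okrs = {
    "O1: Grow self-serve revenue": {
        "KR1": {"metric": "self_serve_mrr", "target": 250_000},
        "KR2": {"metric": "trial_to_paid_rate", "target": 0.18},
    },
    "O2: Improve activation": {
        "KR1": {"metric": "week_1_activation_rate", "target": 0.45},
    },
}

# Which metric nodes are claimed by a key result, and which are "headless"?
claimed = {kr["metric"] for krs in okrs.values() for kr in krs.values()}
headless_metrics = defined_metrics - claimed
missing_metrics = claimed - defined_metrics  # KRs pointing at metrics that don't exist yet

print("Metrics with no objective:", sorted(headless_metrics) or "none")
print("KRs without a backing metric node:", sorted(missing_metrics) or "none")
```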

This capability would enable a more robust process for connecting data to specific business goals. It would drive alignment across the company on critical KPIs and metrics, help prevent stray metrics and inconsistent definitions from cropping up (everything should be tied to an official KR or company goal), and help everyone understand how specific initiatives are driving business impact.

While this model works for OKR-related data assets, companies will also have some top-line metrics that need permanent reporting, irrespective of changing goals (e.g. plots for investor slides, service/app reliability). To enable this, data teams should consider creating a dedicated objective with a limited set of KR nodes for these assets. This should not become a catch-all for random reports, but a short list of relatively static company reporting metrics.

Leveraging usage stats across the DAG

If tools in the MDS provided usage stats on exposure nodes, this would help teams identify reports that stakeholders across the organization are paying attention to but that, for some reason, have not been translated into company goals.
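No MDS tool exposes this out of the box today, so here is a hypothetical sketch of what it could look like: usage stats per exposure (views over the last 90 days, pulled from a BI tool's API or audit logs) are checked against the objective mapping from the previous sketch, surfacing both well-used exposures with no goal behind them and objectives whose exposures nobody looks at. All names and numbers below are invented.

```python
# Hypothetical usage stats per exposure, e.g. pulled from a BI tool's audit logs.
exposure_views_90d = {
    "exposure.analytics.revenue_dashboard": 412,
    "exposure.analytics.churn_deep_dive": 3,
    "exposure.analytics.ops_capacity_report": 157,
}

# Hypothetical mapping of exposures to the objective they support (the "head" of the DAG).
exposure_objective = {
    "exposure.analytics.revenue_dashboard": "O1: Grow self-serve revenue",
    "exposure.analytics.churn_deep_dive": "O2: Improve activation",
    # ops_capacity_report intentionally unmapped
}

HOT = 100  # views threshold; tune to your org's size

# Heavily used but not tied to any goal: candidates for a new or updated objective.
unaligned_hits = [e for e, v in exposure_views_90d.items()
                  if v >= HOT and e not in exposure_objective]

# Tied to a goal but rarely viewed: is the goal (or the report) still relevant?
ignored_goals = sorted({exposure_objective[e] for e, v in exposure_views_90d.items()
                        if v < HOT and e in exposure_objective})

print("Popular but goal-less exposures:", unaligned_hits)
print("Objectives nobody seems to look at:", ignored_goals)
```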

And that's it. By forcing the company's goal setting to take place within the technological framework of its data asset creation and management, data teams would be able to link every node/asset to the objective(s) it contributes to (i.e. optimizing high-value data pipelines critical to measuring business impact). Just as easily, teams would be better equipped to weed out the data assets that do not align with company goals (i.e. by deprecating assets or saying no to misaligned stakeholder requests). In addition, by leveraging usage stats (total queries, last queried, etc.) alongside the DAG, teams would be able to identify goals that, weirdly enough, nobody cares about.
