Analytics Teams Need More Than Traditional Documentation
Documentation is function specific, with clear approaches to documentation in business and tech teams. In computer science courses, engineers are taught about docstrings. Less technical employees find writing more natural and tend to document processes and projects in Confluence instead. Analytics teams rely on both types of documentation. They write code and develop business processes, but they also document something else: data. Data itself is hard to document because it is ever-changing and requires context for the numbers to make sense.
Let's not leave data teams in the dust: do they use docstrings, Confluence, both, or neither?
They have been stuck in limbo between technical and business approaches to documentation. Documenting data requires some knowledge of, or connection to, an underlying database, a feature neither docstrings nor Confluence are made for.
Documentation is like metrics: there needs to be a single source of truth.
Relying on tool-specific workspaces is bound to create definitional and process drift.
Let's go back to the purpose of documentation and why it's so important.
Business teams use documentation to create repeatable processes. For instance:
- The way customer success teams interact with customers should be consistent within a growing team.
- Creating product requirements documents for each feature launch shouldn't be limited to the Head of Product, as long as the appropriate expectations and context are communicated to others.
In both cases, documentation serves the purpose of delivering consistent results in a growing team.
On the engineering front, I'm putting my foot down here: documentation through code is not documentation. Code is documented because it can be complex, especially at an early-stage company where patterns don't yet exist to force consistency. Reading through code is prone to error, as it relies on the reader's understanding of the logic being coded.
At this early stage, documentation seems like a complete afterthought. However, it is precisely here that documenting the context behind coding decisions matters most for the next engineer who looks at the code weeks, months, or years later. Code is like a second language, with the documentation being a translation into the language the reader knows best.
In their infancy, analytics teams also write code that needs to be documented, with even more business context. Analytics teams release new metrics that go through stakeholder sign-off. There's a method to the madness of building out reports in a hurry to help product teams understand how new features are performing.
The underlying data is what ties together code and business context.
Metadata, including column and metric definitions, provides a skeleton for documentation that helps analytics teams ramp up quickly. Documentation answers the question: how do repeatable processes and existing code explain the data being queried? Fundamentally, data teams need to document what the data in a particular table or metric means for a business user.
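To make the idea of a metadata skeleton concrete, here is a minimal sketch in Python. The `TableDoc` and `ColumnDoc` classes, the `orders` table, and its columns are all hypothetical names invented for illustration; they are not taken from any particular tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnDoc:
    """Business-facing description of one column (hypothetical structure)."""
    name: str
    description: str = ""


@dataclass
class TableDoc:
    """Business-facing description of one table and its columns."""
    name: str
    description: str = ""
    columns: List[ColumnDoc] = field(default_factory=list)

    def undocumented(self) -> List[str]:
        # Columns whose description is empty still need business context.
        return [c.name for c in self.columns if not c.description.strip()]


# Hypothetical example table, not a real schema.
orders = TableDoc(
    name="orders",
    description="One row per completed checkout.",
    columns=[
        ColumnDoc("order_id", "Primary key."),
        ColumnDoc("amount_usd", "Order total in US dollars, after discounts."),
        ColumnDoc("channel"),  # no description written yet
    ],
)

missing = orders.undocumented()
```

Here `missing` lists the columns that still lack a definition, which gives a new analyst a short checklist of what has not been explained yet.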
So what tools are out there to do the job?
After reading the header you must be thinking about Confluence, right? Since you're already there let's dive into it.
Confluence's features include tables, attachments, and document nesting: an organizational paradise. Confluence fits a general need for documentation that is horizontal across many teams. It doesn't, however, work well for documenting specific functions, like data.
Templates enable repeatable processes. Team workspaces enable inter-team process collaboration. Although Confluence has plugins for both GitHub and database connections, the plugins are cringe-worthy, as they don't let the user document code within GitHub or data within a database.
Software engineers document in the tool they live in most: their code editor. If formatted correctly, strings within functions and classes can be compiled and extracted as global documentation for a code base.
This is a great place to add business context or the reasoning behind why a function does what it does. However, this kind of documentation usually says nothing about what values the input data could take or what range they fall in. That is outside the scope of the particular object.
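As a small illustration of the docstring approach described above, the sketch below documents a function and then pulls the text back out with Python's standard `inspect.getdoc`. The function `monthly_active_users`, its event format, and the business note in its docstring are hypothetical examples.

```python
import inspect


def monthly_active_users(events, month):
    """Count distinct users with at least one event in the given month.

    Business context (hypothetical): a user counts as active after any
    event, including logins, because that is the definition stakeholders
    signed off on.
    """
    return len({e["user_id"] for e in events if e["month"] == month})


# The docstring travels with the code and can be extracted by tooling.
doc = inspect.getdoc(monthly_active_users)
summary = doc.splitlines()[0]

# Hypothetical sample data to show the function in use.
sample_events = [
    {"user_id": 1, "month": "2021-11"},
    {"user_id": 1, "month": "2021-11"},
    {"user_id": 2, "month": "2021-11"},
    {"user_id": 3, "month": "2021-10"},
]
active = monthly_active_users(sample_events, "2021-11")
```

Note that the docstring explains the logic and the reasoning, but nothing in it tells the reader what the `events` data actually looks like in production.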
These types of documentation resources don't exist for SQL, and good SQL linters are only starting to become a thing. With metrics stores gaining popularity, how to approach documentation becomes even less clear.
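One way a team might approximate a docstring for SQL is to treat the leading `--` comment block of a query as its documentation. This is only a convention assumed here for illustration, not an established standard; the helper `sql_header_doc` and the sample query are hypothetical.

```python
def sql_header_doc(sql: str) -> str:
    """Return the leading block of `--` comments from a SQL string."""
    lines = []
    for raw in sql.strip().splitlines():
        line = raw.strip()
        if not line.startswith("--"):
            break  # the header ends at the first non-comment line
        lines.append(line[2:].strip())
    return "\n".join(lines)


# Hypothetical query with a documentation header.
QUERY = """
-- Weekly signups by acquisition channel.
-- Excludes internal test accounts.
SELECT channel, COUNT(*) AS signups
FROM signups
GROUP BY channel
"""

header = sql_header_doc(QUERY)
```

A convention like this keeps the explanation next to the query, but it still depends on every analyst following it by hand.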
Data teams need several, all very different, things in documentation:
- Documenting SQL code, not just traditional object-oriented languages
- Explaining business context surrounding tables and columns
- Understanding common usage patterns and popularity across data resources
- Describing metrics, giving examples from both data and business
All four of these needs can be combined into one approach: knowledge sharing. Knowledge sharing in analytics is sharing business context surrounding particular code, metrics or data.
Confluence doesn't support documenting technical resources, while software documentation best practices don't support procedural and business documentation easily, let alone documenting data. So what now?
Approaching analytics documentation as a knowledge repository
The best analytics teams can speak the language of software engineers and product managers. They understand how the underlying data is generated and are able to tell the story behind what it means for the business.
The trojan horse of data teams is how they combine code and business context to output insights.
Documentation allows them to rinse and repeat this approach within the team. A knowledge repository lets teams enable other parts of the organization to self-serve without losing the understanding data teams have worked so hard to build.
This right here is where data democratization fails: you can't democratize something no one understands. Without sharing the knowledge built up by analytics teams, the teams themselves simply can't scale like the rest of the organization can.
Documenting processes allows business teams to scale. Documenting code allows software engineering teams to scale. Similarly, documenting data enables the same scale for analytics teams.
To build an analytics knowledge repository of metadata, metrics, queries, code, lineage and documents, try out Secoda.
November 17, 2021