8 Signs You Should Reconsider Your Data Documentation Tools
March 29, 2021
Small teams are collecting and storing more data than ever before. This, alongside the growing desire to use data, has created an enormous demand for data scientists and data engineers. Many data engineers find it difficult to communicate with teams outside of the data organization. It’s hard to set the right expectations around data. It’s hard to align teams around the same KPIs. And it’s hard to help employees ask the right questions.
We love how eager and willing people outside of the data team are to use data to make decisions. Unfortunately, the tools that exist today aren’t built for the entire organization. This is because key data information is still kept in the heads of the data team. Documentation is a dreadful problem for most teams. It gets outdated and is difficult to search. All employees should have access to the data documentation created by the data team. This can help teams get on the same page about the core data dictionary definitions.
Everyone has a desire to work with data and to make the right business decisions. Unfortunately, many people suppress this desire because they believe they are not smart enough to use the right tools or don't understand the data. The gap between data-savvy and non-data-savvy employees can narrow with the right technology. It starts by having the right data documentation toolkit that’s available to everyone. Below are 8 signs that your team should reconsider their data documentation tools:
1. You still get pinged on Slack about simple questions
Few things are as distracting as a Slack ping in the middle of the workday - especially now that we’re all working from home. At the last company I worked at, the Slack questions to the data team were constant. Things like:
- “Is our retention trending the right way?”
- “Do people like our product more now than they did a year ago?”
- “What’s the difference between the column ‘new_user’ and ‘active_user’?”
These questions took our data team away from their primary job. To solve this problem, our data team implemented a request process on GitHub. They hoped this process would save them the trouble of answering questions on Slack. The problem was that after a few weeks, employees started using Slack again to ask the same questions. If you and your team are spending time answering these questions in Slack, it’s a sign that the documentation is not accessible to other employees. Adopting a data documentation tool that allows employees to ask these basic questions in their existing workflow can help relieve this pain point.
2. Your core data team has been at the company for over one year
Keeping information undocumented doesn’t feel like a problem until someone with lots of tribal knowledge leaves the team. When this happens with a key employee and at an early stage, it can take months to recover the lost information. To make things more painful, turnover is a common problem in the data industry. The average tenure of someone working as a data analyst is less than two years. This means that once your data team has been at the company for over a year, you can likely expect a few of them to leave the company within the next 12 months.
If your team has been together for over a year, it might also be a good idea to speak with your team members about what they would look for in their next role. Understanding what they are looking for can help you think about ways that you can keep the employee before they find a job elsewhere. Good data teams should prepare for turnover and realize that it’s inevitable. Great data teams document their data to ensure that if someone were to leave, it would not interfere with the day-to-day operation. Amazing data teams view data documentation as a feature of their product.
3. Your organization is growing
Growing teams face a challenge when trying to transfer tribal knowledge to new hires. The reason for this is that people usually enter the business with different skills and understanding of data. If an employee has onboarded onto a company's data in the past, they likely have their own process for learning about the data. If they haven't, there is a considerable amount of work to get the employee up to speed on all the right information.
Data discovery tools can help solve this problem. One common issue teams run into is finding out that the data discovery tool is not intuitive to non-technical employees. Good data tools should offer an intuitive UI that the least technical employee can use. This way, if someone is trying to access and understand data, they have a clear and simple way of figuring out exactly what the information means.
4. Documentation is inconsistent
Getting teams to buy in to documenting their work can be an uphill battle. Different teammates might have different ideas about what they should document or when they should document it. Some teams that we've spoken to have created a standard for what needs documentation, but this does not cover older documentation written before the standard existed. Inconsistent documentation is most visible at organizations with cross-functional data teams, but it can become a problem even at an early stage. Teams who suffer from inconsistent documentation should consider the benefits of automatically documenting the data through one central repository. If teams are insistent on using their existing documentation tools, having a template for everyone to follow is important. Below is some recommended content and a standard template you can copy to use in your READMEs:
- Dataset name
- Date of data collection
- Keywords used to describe the data type
Data and file overview
- A short description of what data it contains
- The date that the file was created
- Business terms related to the data
- Date(s) that the file(s) was updated (versioned) and the nature of the update(s), if applicable
- Any important "gotchas" we should know about this data
Sharing and access information
- Restrictions placed on the data
- Links to any sources that cite or use the data
- Team most affected by the data
The most important part of the README is to make sure that the description of the data source is legible to anyone trying to use the data. This requires teams to write their README documents in plain English. Teams should focus on keeping things simple so that nothing is overcomplicated for non-technical users.
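For teams that want every README to follow the same structure, the template above can be turned into a small script. The following is only a minimal sketch in Python: the function names, the shortened field labels, and the "Dataset basics" heading (the template leaves its first group untitled) are assumptions made for illustration, not part of any particular tool.

```python
# Hypothetical layout mirroring the README template above.
# "Dataset basics" is an assumed heading; the original first group has none.
README_SECTIONS = [
    ("Dataset basics", [
        "Dataset name",
        "Date of data collection",
        "Keywords used to describe the data type",
    ]),
    ("Data and file overview", [
        "Short description of the data",
        "Date the file was created",
        "Business terms related to the data",
        "Dates and nature of updates",
        "Important gotchas",
    ]),
    ("Sharing and access information", [
        "Restrictions placed on the data",
        "Links to sources that cite or use the data",
        "Team most affected by the data",
    ]),
]


def render_readme(values):
    """Render the template as plain text, marking unfilled fields as TODO."""
    lines = []
    for heading, fields in README_SECTIONS:
        lines.append(heading)
        for field in fields:
            lines.append(f"- {field}: {values.get(field) or 'TODO'}")
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip() + "\n"


def missing_fields(values):
    """List every template field that has no value yet."""
    return [
        field
        for _, fields in README_SECTIONS
        for field in fields
        if not values.get(field)
    ]


if __name__ == "__main__":
    # Example values are made up for illustration.
    draft = {"Dataset name": "orders", "Team most affected by the data": "Sales"}
    print(render_readme(draft))
    print("Still missing:", missing_fields(draft))
```

Running a check like missing_fields before a README is merged is one way to keep documentation consistent without asking anyone to memorize the template.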
5. Business leaders are redefining their core KPIs
There are inflection points in a traditional business lifecycle. It's common for business leaders to reconsider core KPIs at these inflection points. If your business is about to go through an inflection point, it is a good time to reconsider the business definitions and documentation.
At the previous company I worked at, there was constant experimentation with the business model and the core KPIs. This inconsistency made it difficult to agree on a set of common definitions. As a member of the data team, you should try to involve yourself in these conversations. As you take part in the conversations, reference your data dictionary to make sure that your documentation stays consistent. A good, updated data dictionary can be the difference between having everyone on the same page and losing weeks as you try to find the right information.
6. Teams outside of the data organization want to start working with data
There's an appetite for people outside of the data team to start working with data. The problem for most of the people outside of the data organization is that it's difficult to understand documentation. They feel discouraged when trying to use analytics because they don't know where to start. Tools like Looker can help a ton. They can help the non-technical employee feel confident about the data. That said, Looker alone is not enough.
Even with the best self-service tools, many non-technical folks still struggle to make sense of the data. Having a repository of updated, searchable information can help these employees feel more confident about their data analysis abilities. That way, business users can search for business definitions and find the right way to use the data.
7. You’re manually updating documentation whenever new changes happen
Teams who use Confluence, Notion or Google Docs to document their data have to update all of that documentation by hand. Teams that rely on 100% manual entry risk forgetting to update the documentation and losing that knowledge forever. Instead, teams should adopt tools that document a majority of their data automatically. Your data documentation tool should remind you when a field needs documenting and should automatically extract metadata so you don't have to manually update all your documentation.
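To make "automatically extract metadata" concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. It reads column names and types straight from the database and flags any column that has no written description yet. The table, the column descriptions, and the function names are hypothetical; a real warehouse would expose this metadata through its own catalog or information schema instead.

```python
import sqlite3


def extract_columns(conn, table):
    """Return {column_name: declared_type} using SQLite's table_info pragma.

    The table name is interpolated directly, so it must come from a
    trusted source (this is a sketch, not production code).
    """
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: row[2] for row in rows}


def undocumented_columns(conn, table, docs):
    """List columns that exist in the table but have no description in docs."""
    return sorted(col for col in extract_columns(conn, table) if not docs.get(col))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, new_user INTEGER, active_user INTEGER)"
    )
    # Hypothetical hand-written descriptions; active_user was never documented.
    docs = {
        "id": "Unique user identifier",
        "new_user": "1 if the user signed up recently",
    }
    print(extract_columns(conn, "users"))
    print("Needs documentation:", undocumented_columns(conn, "users", docs))
```

The point of the design is that the schema, not a person's memory, decides what needs documenting: when someone adds a column, it shows up in the undocumented list on the next run.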
8. Your team has no central data dictionary
We've heard about teams with many different data dictionaries across marketing, sales and product. Teams who have many data dictionaries throughout the company should consider centralizing the documentation into one place. Only a few employees should be able to edit the data dictionary, but every employee should be able to access the information in it. Today, data dictionaries often live in Excel sheets or Notion tables that are not available to everyone at the company. Data teams that work together need to embrace collaborative data documentation and dictionary tools that are accessible to every employee. We created a guide for teams that are putting together a data dictionary here.
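As a rough illustration of what centralizing can look like, the sketch below merges several per-team dictionaries into one and surfaces terms that teams define differently. It is written in Python under simple assumptions: each team's dictionary is a plain mapping of term to definition, and the team names, terms, and definitions shown are made up.

```python
from collections import defaultdict


def merge_dictionaries(team_dictionaries):
    """Combine per-team dictionaries into (central, conflicts).

    central:   {term: definition} for terms every team defines the same way
    conflicts: {term: {definition: [teams]}} for terms with competing definitions
    """
    seen = defaultdict(dict)  # term -> {definition: [teams using it]}
    for team, entries in team_dictionaries.items():
        for term, definition in entries.items():
            key = term.strip().lower()  # treat "Active_User" and "active_user" alike
            seen[key].setdefault(definition.strip(), []).append(team)

    central, conflicts = {}, {}
    for term, definitions in seen.items():
        if len(definitions) == 1:
            central[term] = next(iter(definitions))
        else:
            conflicts[term] = definitions
    return central, conflicts


if __name__ == "__main__":
    # Hypothetical team dictionaries for illustration only.
    central, conflicts = merge_dictionaries({
        "marketing": {"Active_User": "Logged in within 30 days"},
        "product": {"active_user": "Performed a key action within 7 days"},
        "sales": {"lead": "Contact who requested a demo"},
    })
    print("Agreed terms:", central)
    print("Needs a decision:", conflicts)
```

The conflicts list is the useful output here: it is the agenda for the conversation where teams agree on one definition.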
If your team is starting to consider a new data documentation or discovery tool, we would love to be a part of the conversation. You can start automatically cataloguing your data with Secoda by signing up to our free tier. Our vision is to make searching your data feel like a superpower, and we're excited about what we have in store to make that vision a reality.