Rethinking the data catalog
September 8, 2021
Data teams need a better knowledge management tool for their data knowledge
Sturgeon's law states that 90% of everything is crap.
Nowhere is that more true than the data world. Most dashboards are wrong, most tables are trash, most ETL is broken, and most metrics are incorrect.
For some reason, most metadata management tools choose to solve this problem by trying to catalog this encyclopedia; they're building a library. But in reality, most of the books are just bound-up junk mail and shouldn't be cataloged in the first place. To solve this problem, opinionated data discovery powered by search is the key.
Over the last six months, our team at Secoda has been making data accessible to people using the modern data stack. We believe that every company will need a data discovery tool in the future. Existing tools have been able to deliver value to data teams, but have not cracked the elusive data discovery problem because of their focus on technical cataloging, instead of cleaning the overwhelming mess and fragmentation in our current data stack. After speaking with tons of data teams, a few things became clear to us:
- Existing tools are not comprehensive in the data knowledge that they capture. They traditionally focus on only one area of data knowledge, which is technical metadata.
- Data discovery tools should reduce the number of questions data teams get from people trying to use company data.
- Existing data discovery tools are built for technical users, but aren’t intuitive for non-technical users.
- Existing tools combine curation and consumption into one view.
We found that teams are looking for a tool that captures all data knowledge in one place and reduces the number of questions data teams get from non-technical teams. The tool should focus on the business users because they are the source of many discovery questions and have few options to find the answer for themselves.
Since coming to this understanding, we’ve made some significant changes to our product. The first significant change is that we don't think of Secoda as a data catalog. Instead, we think of Secoda as a knowledge management tool that helps data teams share metadata, charts, queries, and documentation with any employee. Data teams are creating more data at an earlier stage, and are looking for a way to organize this knowledge and share it with employees. The tool we’re working towards will address the insights we’ve gathered from data teams in the following ways:
Data discovery tools should be comprehensive in the data knowledge that they capture
Existing tools have traditionally focused on only one area of data knowledge, which is technical metadata. We strongly believe the best solution in this space will unite all metadata, rather than just handling a few layers of the stack. A lot of tools working in this space are built as layers on top of the data warehouse and don't connect or communicate with the other layers of the data stack (dbt / Segment / Airflow / Looker / Tableau / Mode / Bigeye / Monte Carlo / Hightouch). We believe that swallowing the whale and working on a solution that is comprehensive in its approach to metadata makes much more sense. By connecting a variety of data tools to the central data discovery tool, data teams should be able to see end-to-end lineage from Segment to Looker and everything in between. A good data discovery tool should be the glue that connects all your different data tools into one central place.
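To make the idea of end-to-end lineage concrete, here is a minimal sketch in Python. It is not how Secoda works internally; the asset names, the `LINEAGE` mapping, and the `downstream` and `path` helpers are all hypothetical and exist only for illustration. It assumes each connected tool reports which assets feed which, and that those edges are merged into one graph.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets listed in its value.
# The asset names are invented for illustration only.
LINEAGE = {
    "segment.page_views": ["warehouse.raw_page_views"],
    "warehouse.raw_page_views": ["dbt.stg_page_views"],
    "dbt.stg_page_views": ["dbt.fct_sessions"],
    "dbt.fct_sessions": ["looker.traffic_dashboard", "mode.weekly_report"],
}


def downstream(asset, edges):
    """Return every asset reachable from `asset`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in edges.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def path(source, target, edges):
    """Return one shortest path from `source` to `target`, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            # Walk back through the recorded parents to rebuild the path.
            hops = []
            while current is not None:
                hops.append(current)
                current = parents[current]
            return hops[::-1]
        for child in edges.get(current, []):
            if child not in parents:
                parents[child] = current
                queue.append(child)
    return None
```

Calling `path("segment.page_views", "looker.traffic_dashboard", LINEAGE)` walks from the event source to the dashboard across every layer in between, which is the kind of view a tool that only sees the warehouse cannot produce.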
Additionally, we believe that data knowledge goes beyond metadata. Data discovery tools should also allow users to document common queries, definitions, and even ad hoc questions all from one place. Having a very intuitive data knowledge base that is built on top of existing metadata should be the end goal.
Data discovery tools should reduce the number of questions data teams get from people trying to use company data
We’ve built functionality into Secoda that lets business users ask questions when they can’t find what they are looking for through the data discovery tool. Teams can work together to answer these questions in a Stack Overflow-like workflow, which allows editors to mark the correct answer and upvote any questions. After a question has been answered using Secoda, it is indexed in the central search bar so any employee who asks this question in the future can see the correct answer from a central place. Additionally, dashboards, tables, queries, charts, metadata and all other data knowledge are also indexed in the central search bar.
We found that on average, 50% of the questions that a data team receives are repetitive. Answering these questions over and over again takes away from important tasks and can become overwhelming quickly. With a functional and comprehensive data discovery tool, a team should never have to answer the same question twice.
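The question workflow described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not Secoda's implementation: the `Question` and `KnowledgeSearch` names are hypothetical, and matching is a naive word overlap rather than a real search engine. The point it shows is that a question only becomes searchable once an editor has marked an answer as correct.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Question:
    text: str
    accepted_answer: Optional[str] = None  # set when an editor marks an answer correct
    upvotes: int = 0


class KnowledgeSearch:
    """Hypothetical central search over answered questions."""

    def __init__(self):
        self.questions = []

    def ask(self, text):
        question = Question(text)
        self.questions.append(question)
        return question

    def accept(self, question, answer):
        question.accepted_answer = answer

    def search(self, query):
        """Return answered questions that share words with the query, best match first."""
        terms = set(query.lower().split())
        scored = []
        for question in self.questions:
            if question.accepted_answer is None:
                continue  # unanswered questions are not surfaced yet
            overlap = len(terms & set(question.text.lower().split()))
            if overlap:
                scored.append((overlap, question.upvotes, question))
        # Rank by word overlap first, then by upvotes.
        scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
        return [question for _, _, question in scored]
```

Once `accept` has been called on a question, a later `search` with similar wording returns it along with its accepted answer, so the data team does not have to answer it a second time.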
Existing data discovery tools are built for technical users, but aren’t extremely intuitive for non-technical users
A lot of our existing data tools are divided along technical/non-technical lines. This is true of data catalogs, which are mostly built for technical users. We believe that we can architect a much better experience if we consider the ways that both technical and non-technical employees need to use the tool.
Data discovery needs to be powered by intuitive search and simple UI. Existing tools use a catalog-style approach, which can be overwhelming for someone who doesn’t know where to even begin. Secoda stands for Searchable Company Data. Our goal is to make it intuitive and easy for anyone to find what they need without having to rely on the data team. Having an opinionated search is important because it helps you find the few accurate tables, the dashboards that are used, and the metrics that matter to your role. Making the tool intuitive will be an ongoing focus for our team as we continue to build the easiest way to search for data knowledge.
In addition to this, data discovery tools should be easy to adopt and understand. With Secoda, anyone can sign up, connect their data for free (and without a sales call), and start to share the tool with people across the company. Teams shouldn’t need to wait weeks or struggle to set up the tool. Data discovery should be extremely intuitive and quick to adopt.
Tools should separate curation and consumption views
This means that data teams are able to use the data discovery tool to curate knowledge for specific team members and groups. It also means that data consumers are not overwhelmed by the amount of information that they find in the data discovery tool.
This insight is a little unintuitive at first, but we believe it’s extremely important for a data discovery tool. Right now, all the curation and consumption for data discovery tools happens in the same dashboard. By doing this, existing tools are solving two problems poorly: they aren’t giving technical users a comprehensive way to curate knowledge, and they aren’t giving data consumers an easy way to understand what they need to see.
We’re building the infrastructure for Secoda in a way that separates the curation and consumption view, to make the best curation and consumption experiences. On the backend, customer support teams can use this tool to choose who can see which knowledge document and can curate the right information into the knowledge document. End users can see all knowledge documents that have been published in a clean and intuitive interface that is separate from the admin dashboard.
We believe a similar approach would benefit the data discovery space greatly. Imagine a world where you could share specific analyses, tables, charts, and questions with specific functions of the organization, so that only what you choose to make viewable shows up in their feed. Creating this separation is going to change the way that employees find and understand data. We’re extremely excited to build this into the product as we continue to grow.
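As a rough sketch of what separating curation from consumption could look like, consider the Python below. Everything in it is hypothetical (the `KnowledgeDoc` and `Workspace` names, the team labels, the `feed` method) and it is not a description of Secoda's architecture. It assumes curators draft documents and publish them to named teams, while consumers only ever read a filtered feed.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeDoc:
    title: str
    published: bool = False
    audiences: set = field(default_factory=set)  # teams allowed to view this doc


class Workspace:
    """Hypothetical store with separate curation and consumption entry points."""

    def __init__(self):
        self.docs = []

    # Curation side: used by the data team.
    def draft(self, title):
        doc = KnowledgeDoc(title)
        self.docs.append(doc)
        return doc

    def publish(self, doc, audiences):
        doc.audiences = set(audiences)
        doc.published = True

    # Consumption side: used by everyone else.
    def feed(self, team):
        """Return titles of published docs shared with `team`, in creation order."""
        return [
            doc.title
            for doc in self.docs
            if doc.published and team in doc.audiences
        ]
```

In this sketch a draft never appears in anyone's feed, and a published document appears only for the teams it was shared with, so consumers see a short, relevant list instead of the full curation workspace.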
We think of Secoda as the instruction manual that brings together all pieces of the data stack into one centralized place. We think about a data discovery tool as a layer on top of the existing stack that helps users access and understand the stack without actually changing any pieces of the puzzle. If we draw a comparison to building a chair: dbt, Fivetran, Snowflake, and Looker are the nuts, bolts, legs, and support for the chair. Secoda is the instruction manual that tells you all about the different pieces, where to find them, how they fit together, and which ones are the most important. With an intuitive manual, just like with data discovery, you should understand how to build the chair without calling customer support for more help.