When to Buy and When to Build When It Comes to Data Discovery

June 14, 2021

At Secoda, we’re very fortunate to spend time with many different-sized data teams and to see the spectrum of challenges that each faces at their unique phase of growth. 

Depending on where your team is along the spectrum, you may find yourselves weighing whether to buy or build a tool internally for data discovery and data governance. There have been fantastic innovations in the build space, which enable teams to adopt open source tooling for data discovery. But these open source solutions may not make sense for all data teams.

Naturally, we want to help as many teams as possible with Secoda, but we’re also aware that for some teams, only an in-house tool can provide the level of support and insight that they need from data discovery.

Our experience has helped us develop a good understanding of what companies should consider when they are debating if they should build or buy a tool in the data discovery space. In this post, we will share some of the ways we think about this conundrum to help teams who are trying to decide whether to build, buy or adopt an open source data discovery tool.

There are pros and cons to both approaches

Many teams choose to purchase data tools because they want to start using them as quickly as possible, without much concern for the tradeoffs between speed and flexibility. Similarly, many teams choose to build their solution on top of an open-source tool because they believe their use case is so unique that nothing that exists can fulfil their requirements. No matter which approach your team chooses, there will always be benefits and drawbacks in terms of speed, customization, support and reliability.

When making this decision, teams should be aware of the advantages and challenges that come with each approach. They should also develop a deep understanding of the business's technical requirements before jumping into building the tool. While some open source tools may seem like the way to get something running quickly and affordably, the cost of maintaining them could outweigh the cost of a purchased software solution. Teams should evaluate tools with this long-term view in mind by asking questions about scale, reliability, customizability and support.
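To make the long-term view concrete, here is a rough sketch of how a team might compare the cost of each path over time. Every name and number in it (the BuildOption and BuyOption classes, the hours, rates and fees) is a made-up assumption for illustration, not a benchmark or real pricing, so swap in your own estimates.

```python
from dataclasses import dataclass


@dataclass
class BuildOption:
    """Hypothetical cost inputs for running an open-source tool in-house."""
    setup_hours: float                  # one-time engineering effort to deploy
    maintenance_hours_per_month: float  # ongoing upgrades, fixes, requests
    hourly_rate: float                  # fully loaded engineering cost
    hosting_per_month: float            # infrastructure to run the tool


@dataclass
class BuyOption:
    """Hypothetical cost inputs for a vendor subscription."""
    setup_hours: float                  # internal effort to roll the tool out
    hourly_rate: float
    subscription_per_month: float


def build_cost(option: BuildOption, months: int) -> float:
    """One-time setup plus recurring maintenance and hosting."""
    one_time = option.setup_hours * option.hourly_rate
    monthly = (option.maintenance_hours_per_month * option.hourly_rate
               + option.hosting_per_month)
    return one_time + monthly * months


def buy_cost(option: BuyOption, months: int) -> float:
    """One-time setup plus the recurring subscription fee."""
    one_time = option.setup_hours * option.hourly_rate
    return one_time + option.subscription_per_month * months


# Illustrative numbers only; replace them with your own estimates.
build = BuildOption(setup_hours=200, maintenance_hours_per_month=40,
                    hourly_rate=100, hosting_per_month=500)
buy = BuyOption(setup_hours=20, hourly_rate=100, subscription_per_month=3000)

for months in (6, 12, 36):
    print(f"{months} months: build={build_cost(build, months)}, "
          f"buy={buy_cost(buy, months)}")
```

The point of the sketch is the shape of the comparison rather than the result: maintenance hours recur every month, so a small change in that estimate can flip the outcome over a multi-year horizon.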

Below are some questions that are important to answer as you evaluate your alternatives:

  • What job do these tools need to accomplish? 
  • How will it be managed and iterated on over time?
  • Who is affected by this solution internally? If a large majority of your team is using this solution consistently, they will need the solution to be reliable and consistent.
  • Can this tool scale with our needs? 
  • How much work will it take to set up? 

Let’s compare the approaches to see how they stack up against one another.

Building your own / using an open-source tool

At first glance, building off an open-source tool appears to be a good option because it allows you to create a tool that is the perfect fit for your specific business model. But teams who choose to build using open-source products can run into unique challenges, which usually end up requiring even more data engineering effort and make the result just as expensive. Teams should consider that it is highly unlikely that they will get the resources to build their vision of the perfect tool internally. One reason is that investing in a tool that is not part of your core differentiation might not be a great use of company resources.

This is especially true at the beginning, and with a tool that is used by a variety of stakeholders. When data teams decide to manage open source tools that are used by a variety of stakeholders, they risk having to meet the demands of those stakeholders for future iterations of the product. The management of an internal open-source tool can very easily start to consume a data team that is not prepared to manage a product that is built for the entire organization. That being said, there are a lot of open-source tools that can work well for data teams. Tools that are used internally by the data team and perform a very isolated role are a great use case for open source tooling. 

Here are some of the pros and cons of building data discovery tools from an open-source library or from scratch:

Pros:

  • Your team has complete control of the product and where you want to take it in the future
  • Your team benefits from contributions made by other members of the open-source community 
  • You can fit the tool to your exact use case

Cons:

  • It takes a longer time to see value 
  • Building the initial proof-of-concept version is relatively easy, but generating deep features and making sure they are accurate gets increasingly complex and challenging.
  • Any additional functionalities and integrations need to be built on a custom basis by your team.

Over time, the volume, complexity and scope of the tools might change as the needs of your business and technical requirements change. When you’re planning your product, you need to think about how the tool may change as things become more complex, and be prepared to build support for that future iteration of the product.

Buying a third-party data discovery tool

The primary reason to purchase instead of building software is to save time, money and resources. Additionally, teams should consider what building the tool internally adds to the organization's core competency. If it adds little, purchasing lets teams spend their effort configuring the data discovery tool to their exact data stack and specific needs rather than building it.

By buying from a vendor, you can expect to see continuous changes and developments to the tool, regardless of your company's resources. If it takes time for your team to develop new features and for the open-source community to innovate on the product, it might make more sense to consider purchasing a solution from a vendor instead of building your own tool.

As with building, there are still tradeoffs to purchasing your tool. Below are some of those tradeoffs.

Pros:

  • You can get started with the product right away without a large or complex setup period
  • You can trust the results that you see on the product because they are vetted by multiple engineers that work on the product as a part of the vendor's team.
  • Most integrations are built by the vendor and work out of the box
  • The product continuously improves to match the latest trends

Cons:

  • The vendor might not customize the product to match your team's exact needs. 
  • Setting up processes, documentation and tools might still require resources. 

Once you have a good understanding of the cost associated with buying, you should try to understand what it takes to build your own tool or manage one internally.

How to choose between building and buying

Once it’s time to make a decision, we believe that it’s important to choose a solution based on the end goals your team has with the product as well as the dependency that other stakeholders will have on the product. 

Suppose you’re looking to implement data discovery and start using the tool to align teams on what terms mean, how to access data and what data to trust. In that case, it could make more sense to buy from a vendor who has already built and manages a product that can achieve this function.

However, if you are more interested in deep data governance and your data infrastructure requires unique features that are not covered by traditional vendors, you should consider building a tool for your specific use case.

As the data stack becomes more fragmented with tools like Reverse ETL, data quality, data observability, data catalogues and headless BI tools, teams will have to pick which of these tools they want to maintain internally vs. buy from a vendor. We believe that in the future, teams who make the right decisions about which products to manage vs. purchase will be able to leverage their data teams' core competency and provide much more value to the business.

In the case of data discovery, data teams should ask themselves if they are well equipped to build a user-friendly data discovery and governance tool, which requires a mix of user experience, product management and data engineering abilities. From our experience building Secoda, we can tell you it’s not an easy task. 

Make your choice and ride the course

Whichever path your team chooses, we believe making a decision is way better than taking too long to evaluate alternatives. The earlier teams adopt data governance and data discovery tools, the sooner they will be able to trust their data. Of course, there’s a lot at stake when making decisions about data and the impact it can have on your business.

If you are currently facing challenges deciding whether to build or buy a data discovery tool and would like to get a second opinion (we promise it won’t be biased), our team can help you define your goals and technical requirements so you avoid any serious roadblocks along the way.