As a new team in the data engineering space, we often stumble upon new tools in the industry that are solving some pressing problems. We’re also constantly chatting with data engineers looking for new tools.
We started keeping track of some of our team's favourite new tools in the data engineering space and thought we would share the ones that have stood out to us during the past few months. Our team tried to pick tools that solve different problems and don't overlap with one another. Below are the tools we highlight in this article.
Airbyte - ETL Tool
Airbyte is an open-source ETL solution that is disrupting incumbents such as Fivetran and Stitch. Although the incumbent data pipeline platforms have integrations with many destinations, there are a lot of small services that aren't supported. If you can’t use your ETL tool to import all your data, you will only have a partial view of the data associated with the business. Because the community builds connectors and shares them with everyone else, Airbyte is able to provide customers with all the integrations they require. This use of open source creates a strong network effect through a community-driven approach to development. Unlike traditional solutions, Airbyte is focused on the long tail of connectors, which makes it much more flexible than existing solutions. This flexibility is further emphasized by their feature-based pricing: the price paid depends on the number of connectors (but not on the volume of data), the number of seats needed, and which premium features are activated. We like this pricing model because it can be managed and forecasted while keeping costs low.
Hightouch - Reverse ETL
Hightouch is a reverse ETL tool that uses SQL to sync data from a data warehouse to any SaaS tool. In the past, if sales, marketing or support teams needed operational data in their SaaS tools, data teams wrote API connectors from the data warehouse to the SaaS product. This can be difficult because most APIs were not built to handle real-time data transfer. Reverse ETL tools offer easy-to-use connectors to SaaS products, so teams don't need to write and maintain these processes. With a tool like Hightouch, teams can set up multiple connections to SaaS tools with SQL. We also appreciate their simple pricing with a generous free tier.
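To make the reverse ETL idea concrete, here is a minimal sketch of the pattern: a SQL query selects rows from the warehouse, and each row is pushed to a SaaS tool. This is not Hightouch's API. The `sync_to_saas` function, the `customers` table, and the `crm_records` list standing in for a CRM are all hypothetical, and an in-memory SQLite database plays the role of the warehouse.

```python
import sqlite3


def sync_to_saas(conn, query, push):
    """Run `query` against the warehouse and hand each row to `push` as a dict.

    `push` is a hypothetical stand-in for a call to a SaaS tool's API.
    Returns the number of rows synced.
    """
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    rows = conn.execute(query).fetchall()
    for row in rows:
        push(dict(row))
    return len(rows)


# Hypothetical warehouse: an in-memory SQLite database with one table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (email TEXT, lifetime_value REAL)")
warehouse.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("a@example.com", 1200.0), ("b@example.com", 80.0)],
)

# Hypothetical SaaS destination: a plain list instead of a real CRM API.
crm_records = []
synced = sync_to_saas(
    warehouse,
    "SELECT email, lifetime_value FROM customers WHERE lifetime_value > 100",
    crm_records.append,
)
print(synced, crm_records)
```

In a real deployment the `push` callable would have to deal with authentication, rate limits and retries, which is the maintenance burden a reverse ETL tool takes off the team.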
Prefect - Data workflow
Prefect is a new kind of data workflow tool that's able to take your code and turn it into a distributed pipeline easily. Prefect offers two products at the moment. The first product is Prefect Core, which is used to run data applications with Python. The second product is Prefect Cloud, which is a central place to manage user workflows using a Cloud UI. Prefect never receives its customers' code or data, which has helped companies in healthcare and financial services adopt the tool. With Prefect, you can continue to use your existing tools, languages, infrastructure, and scripts, meaning you don't have to change your workflow to get value from the tool.
Datafold - Data observability
Datafold automates data quality monitoring through an easy-to-use plugin with all major SQL data warehouses and ETL tools. Our team likes how easily the tool allows you to adjust the sensitivity for anomaly detection notifications and add custom thresholds. With a tool like Datafold, data teams can be proactive about changes and errors in datasets. Datafold can identify issues and summarize the key insights into an easy-to-read feed. This is another tool with a generous free tier that allows new teams to get value out of it before they commit to payments.
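As a rough illustration of what an adjustable sensitivity means for anomaly detection, the sketch below flags a metric when it drifts too many standard deviations from its recent history. This is a generic z-score check, not Datafold's actual method or API. The `is_anomalous` function, its `sensitivity` parameter and the sample row counts are all assumptions made for the example.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, sensitivity=3.0):
    """Return True if `latest` lies more than `sensitivity` standard
    deviations away from the mean of `history` (a simple z-score check)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all counts as an anomaly.
        return latest != mu
    return abs(latest - mu) / sigma > sensitivity


# Hypothetical daily row counts for one table.
daily_row_counts = [1000, 1020, 980, 1010, 990]

normal_day = is_anomalous(daily_row_counts, 1005)
broken_load = is_anomalous(daily_row_counts, 400)
# A very high sensitivity value acts as a loose custom threshold.
relaxed = is_anomalous(daily_row_counts, 400, sensitivity=50.0)
print(normal_day, broken_load, relaxed)
```

Lowering `sensitivity` produces more notifications and raising it produces fewer, which is the trade-off a team tunes when adjusting alert thresholds.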
Firebolt - Data Warehouse
The data warehouse industry is extremely interesting to us. Snowflake has become one of the most anticipated IPOs and has intrigued many data teams. But the problem with data warehouses is that they can become expensive as data scales. A terabyte of storage and 100,000 queries per month can cost up to $450,000. Firebolt is built to try to reduce the cost-to-compute ratio of data warehouses. The company claims to be the first to figure out how to deliver a “sub-second” analytics experience. In a recent comparison, the tool ingested 39 billion rows of data in 2.5 hours for $20 per hour. By contrast, the company showed that a well-known data warehouse ingested the same data in six hours at $300 per hour. These improvements in performance and price are extremely promising. We're looking forward to trying out this product in the future.
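The figures from that comparison can be turned into total ingestion costs with a little arithmetic. The sketch below uses only the hours and hourly rates quoted above, and it assumes the hourly rate applies for the full duration of each run. The variable names are ours.

```python
# Figures quoted in the vendor comparison; everything else is arithmetic.
firebolt_hours, firebolt_rate = 2.5, 20  # hours, USD per hour
other_hours, other_rate = 6, 300         # hours, USD per hour

firebolt_cost = firebolt_hours * firebolt_rate  # total USD for the run
other_cost = other_hours * other_rate           # total USD for the run
cost_ratio = other_cost / firebolt_cost

print(f"Firebolt: ${firebolt_cost:.0f}")
print(f"Other warehouse: ${other_cost:.0f}")
print(f"Cost ratio: {cost_ratio:.0f}x")
```

Under that assumption the run comes to $50 on Firebolt versus $1,800 on the other warehouse, a 36x difference. These are vendor-reported numbers, so treat the ratio as a claim to verify rather than a benchmark.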
Preset - Data Visualization
Preset is a fully hosted, hassle-free cloud service for Apache Superset. Superset is a widely used open-source data visualization and analytics platform that gives any team the ability to visualize, collaborate on and share data. Preset builds on this mission and provides the full functionality of Apache Superset without any of the hassles of managing the solution yourself. This means that you can easily manage security, scalability and the newest release without taking up valuable resources. Like us, the team at Preset believes that the future of data is collaborative. Tools built for the next generation of data teams need to focus on collaboration-first approaches to data, which are more reflective of the way modern teams operate.
Spectacles - Looker Maintenance
Spectacles helps data teams working with Looker find SQL errors faster. The built-in validator uncovers any syntax errors in your LookML. This tool is like Datafold for your LookML model. The founding team started the company to solve the problem of manually testing dimensions and measures from the Explore page. You can set up Spectacles to test your Looker instance on a regular cadence or while you're developing. Setting the tool up is easy and only requires a set of API keys and adjusting a few permissions on your LookML. Lastly, the tool also allows you to set up Slack notifications so you can keep an eye on changes from a place you already visit often.
Count - Collaborative Notebook
Count is a collaborative BI notebook. The tool combines the flexibility of notebooks and presentation tools in one central place. We love how Count allows analysts to collaborate with others on the team without having to switch tools. Having every step of the analysis in one place is much simpler than switching between PowerPoint, DataGrip and Slack. We also like the "Notion-like" design of the tool, which allows team members to explore older notebooks. Count integrates with many of the major data warehouses and takes just a couple of minutes to set up. Lastly, the pricing is friendly for those looking to try the product!
Secoda - Data Discovery
We're biased here, but we're excited about what we're building at Secoda. Even with great data practices, many organizations still struggle to get value out of data. Up to 73% of all enterprise data goes unused; we call this data debt. One of the big contributors to this problem is that organizations create data silos from day one by not documenting and centralizing their data in a place where everyone can access the information. Secoda is the place to centralize company data. We’ve built Secoda as a single place for all incoming data and metadata, a single source of truth. The goal of Secoda is to help employees find and understand the right information as quickly as possible. Teams can start using Secoda for free and scale with the tool through feature-based pricing.
Hex - Data Notebook
Hex is a collaborative data analysis platform that enables data people to ingest, explore, and visualize data using both Python and SQL. It may not appear that different from tools like Jupyter Notebook, RStudio's excellent IDE, or even Shiny apps, but the more time we spend with it, the more we think there's exciting potential here.