Evolving data pipelines at scale | Marisa Smith

May 2, 2024

Previously streamed live at MDS Fest

Evolving data pipelines at scale, a talk by Marisa Smith, Developer Advocate, Tobiko Data


Creating new data pipelines has never been easier. Tools like dbt have commoditized pipeline creation and made it more accessible than ever before. The landscape shifts, however, when it comes to changing and evolving existing data pipelines: serious problems can emerge and block your team from succeeding. These problems include costly data outages, inaccurate reporting or metrics that affect business outcomes, and delayed deliverables caused by complex communication chains between engineering and stakeholders. Because of these problems, engineers and data practitioners often resort to building new pipelines and datasets instead of changing existing ones, which increases technical debt, operational costs, and complexity.

At the heart of the open-source project SQLMesh is a novel data pipeline development workflow that stems from our combined experience at companies like Airbnb, Apple, Google and Netflix. This data transformation tool is designed to handle the complexities of evolving data pipelines at internet scale.

In this session, we will talk about the challenges data practitioners face today, cover the core concepts of SQLMesh (virtual data environments, automatic detection of breaking changes, continuous testing, and accurate previews of changes), and end with a demo.

The intended audience for this talk is anyone in the modern data stack space. Starting at a beginner level, attendees can expect to learn about the daily problems that DevOps and Data Engineers face, hear about a new solution in the ecosystem, and see some of the developer experience improvements and productivity gains they can get with this open-source tool.

SQLMesh is a new open-source project gaining traction in the data engineering community because of its ease of use and scalability. Few people know about it yet, and we want to share the project and its community with others.