May 2, 2024

Previously streamed live at MDS Fest

How an Open-source Data Platform Taught 100s About Data Engineering, a talk by Tim Castillo, Data Engineer, Dagster Labs

Talk Description

When engineering teams deliver a great project or library, the question "Should we open source this?" often arises. Data engineering is no exception, but data and analytics projects can be difficult to open source because of the risk of leaking sensitive information or bespoke business logic. Open-source data platforms are the rising tide that lifts all ships: they share invaluable wisdom, build goodwill, and accelerate the growth of the data community. This talk covers the prior art of open-source data teams, and the processes and tooling it took to open source our own data platform. We’ll also cover our motivations to empower and educate the Python community on data engineering best practices. The resulting repository shows how a real B2B SaaS business does billing, marketing attribution, and telemetry analytics, along with explanations of why we chose certain design patterns. As data engineering continues to adopt more software engineering best practices, an open-source repository from an organization that embraces them is invaluable for the community.