MDS Fest '23: Day 4 Recap and Recordings

Find the recordings from Day 4 of MDS Fest '23 here
Last updated: January 24, 2024

Join the MDS Fest Slack community to access recordings, engage with speakers, and get updates on MDS Fest ’24.

Going from event data to business activities ft. Timo Dechau

We are starting to see a sheer abundance of event data. Events can be collected from frontends, backends, databases, streams, and webhooks. But events are technical and usually don’t explain important business aspects. An activity layer handles this translation: it is defined from a business perspective, and events are mapped to business activities.
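To make the idea of an activity layer concrete, here is a minimal Python sketch, assuming raw events arrive as simple records with a name, a user ID, and a property bag. The event names (`user_insert`, `stripe_webhook`), the activity names, and the rule functions are all hypothetical illustrations, not taken from the talk; in a real warehouse this mapping would more likely live in SQL models.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Event:
    # A raw technical event, e.g. from a frontend, backend, or webhook.
    name: str
    user_id: str
    properties: dict


@dataclass(frozen=True)
class Activity:
    # A business-level activity derived from a raw event.
    activity: str
    entity_id: str


# A rule inspects one raw event and returns the business activity it
# represents, or None if it does not match (hypothetical rules below).
Rule = Callable[[Event], Optional[str]]


def signup_rule(e: Event) -> Optional[str]:
    return "account_created" if e.name == "user_insert" else None


def payment_rule(e: Event) -> Optional[str]:
    if e.name == "stripe_webhook" and e.properties.get("type") == "invoice.paid":
        return "subscription_paid"
    return None


RULES: list = [signup_rule, payment_rule]


def to_activities(events: list, rules: list = RULES) -> list:
    """Map raw events to business activities; unmapped events are dropped."""
    activities = []
    for e in events:
        for rule in rules:
            label = rule(e)
            if label is not None:
                activities.append(Activity(label, e.user_id))
                break  # first matching rule wins
    return activities
```

For example, a `page_view` event matches no rule and is dropped, while a `user_insert` event becomes an `account_created` activity. Keeping the rules as small, separate functions means the business definitions can be reviewed and changed without touching the event collection code.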

Lessons collected from an enterprise rollout of dbt & Lightdash ft. Rohan Thakur

In this talk, Rohan builds on his recent blog post about how the Collectors data team evolved its data platform and highlights key lessons learned along the way.

Data engineering best practices: automated testing ft. Gleb Mezhanskiy

How do you test your data? dbt tests? Maybe some custom macros and dbt packages? Or not at all?

In this judgement-free session, we’re unpacking why we test data in the first place, the different methods we may take to test, and what we think data testing really looks like when we adopt software engineering best practices. Specifically, I’ll walk through how testing applies to the analytics engineering workflow, the individual dbt tests I’ve found helpful for ensuring data quality, and the role data diffing plays in proactively finding data issues, and I’ll share principles for implementing data testing effectively at scale.
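Data diffing compares the version of a table produced by a code change against the current version, row by row, so unexpected changes surface before a merge. The sketch below is a simplified, in-memory illustration in Python, assuming both versions fit in memory and share a primary key column; it is not the implementation of any particular diffing tool, which would typically compare tables inside the warehouse.

```python
from typing import Any


def diff_tables(before: list, after: list, key: str) -> dict:
    """Compare two versions of a table (lists of row dicts) by primary key."""
    old = {row[key]: row for row in before}
    new = {row[key]: row for row in after}

    # Keys present in only one version.
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)

    # For keys present in both, record which columns changed value.
    changed = []
    for k in sorted(old.keys() & new.keys()):
        cols = sorted(
            c for c in old[k].keys() | new[k].keys()
            if old[k].get(c) != new[k].get(c)
        )
        if cols:
            changed.append((k, cols))

    return {"added": added, "removed": removed, "changed": changed}
```

Running it on a production table and a development build of the same model shows at a glance which rows a change added, dropped, or modified, which complements assertion-style dbt tests that only check rules you thought to write down.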

How a design-thinking approach creates smoother, faster modelling practices ft. Chris Hloros

For analysts starting with dbt, the concepts of version control and PRs can be daunting, and even a turn-off! They feel like a significant slowdown from the speed we're used to operating at, and incompatible with stakeholder demands.

At least, that's what we experienced as a lean analytics team at a data-hungry, early-stage startup. Yet we knew our existing build-in-isolation approach wasn't sustainable either, given how often things broke or data was unreliable.

So how did we get from dbt-weary to dbt success? We adopted a design-thinking Double Diamond process (divergence and convergence) for our model building. This shift away from old patterns has led to early and frequent collaboration, DRY-er models, and faster PRs. In this discussion, we'll explore how we arrived at this process, the benefits it has delivered, and ways you too can adopt it.

dbt + Hamilton: Enabling you to maintain complex Python within dbt models ft. Stefan Krawczyk

dbt's recent Python extensions are a great way to get data science work into the dbt ecosystem. However, unlike what dbt does for SQL, they don't help you structure or maintain the Python code you put into a Python model. This is where Hamilton, an open source library that came out of Stitch Fix, can help. It prescribes a paradigm for Python functions similar to the one dbt prescribes for SQL. This means the code is always unit testable and documentation friendly, and it comes with lineage visualization. Paired with dbt, it lets you wrangle both your SQL and the Python code connected to it. In this talk we'll cover what Hamilton is, how it fits with dbt, and how one might use both together.
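The core idea behind Hamilton is that a function's name defines an output and its parameter names declare which outputs or inputs it depends on, so a set of plain functions forms a DAG. The sketch below imitates that idea in plain Python to show why such code stays unit testable function by function. It is not Hamilton's API (the real library ships its own driver); `execute`, `spend_mean`, and `spend_zero_mean` are hypothetical names used only for illustration.

```python
import inspect
from typing import Any, Callable


# Each function's parameters name its dependencies: `spend_zero_mean`
# depends on the input `spend` and on the output of `spend_mean`.
def spend_mean(spend: list) -> float:
    """Average marketing spend."""
    return sum(spend) / len(spend)


def spend_zero_mean(spend: list, spend_mean: float) -> list:
    """Spend centered on its mean."""
    return [s - spend_mean for s in spend]


def execute(funcs: list, inputs: dict, target: str) -> Any:
    """Compute `target` by recursively resolving the functions it depends on."""
    registry: dict = {f.__name__: f for f in funcs}
    cache: dict = dict(inputs)  # provided inputs are resolved up front

    def resolve(name: str) -> Any:
        if name not in cache:
            fn: Callable = registry[name]  # KeyError if neither input nor function
            params = inspect.signature(fn).parameters
            cache[name] = fn(**{p: resolve(p) for p in params})
        return cache[name]

    return resolve(target)
```

Because `spend_mean` and `spend_zero_mean` are ordinary functions, each can be unit tested directly without the runner, and the dependency graph can be read straight from the function signatures, which is what makes lineage visualization possible.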

Move over, LLMs; doing more with less using process optimization ft. Lindsay Murphy

This talk is for data team leads and members facing hiring freezes, lay-offs, and budget cuts. There is increasing pressure on data teams to “do more with less”. But before jumping to shiny new AI tools and LLMs to solve all of your team’s challenges, consider the fundamentals of process optimization to help you identify and improve your team’s value-draining practices.


From overgrown to thriving: scaling your dbt project like a gardener ft. Nicholas Yager

Imagine inheriting a beautiful, yet overgrown garden. The once well-designed space has become a sprawling tangle of weeds, vines, and overgrown shrubs. After a thorough inspection, you decide that rehabilitation is necessary for it to really thrive. Similarly, even the most well-loved dbt projects can slowly become a tangle of dependencies and unmaintained models without proper care.

This talk covers the five steps gardeners follow when taming an overgrown garden and shows how you can apply those principles to your dbt projects. Attendees will learn how to survey their project, remove unnecessary models, use tooling to identify and rectify antipatterns for improved efficiency, divide a project into separate groups and projects to create well-defined interfaces, and adopt tooling to establish a lasting culture of data gardeners who ensure the long-term health and success of their projects.

Create a data-driven culture with business writing ft. Zack Ober

Business writing can solve almost all problems. Clear and concise writing is the solution to many frustrating data challenges, including data product adoption, communicating the value of data work, and driving business impact. I'll explain what business writing is and how I've used it to create a data-driven culture.


The state of the MDS: The reports of my death have been greatly exaggerated ft. Peter Fishman

Maybe the real MDS was the open source tools we found along the way ft. Pedram Navid

There are many misconceptions about the MDS, and it's easy to forget the real pain it solved and the value it unlocked. While some people still view the MDS as marketing fluff, I'd like to demonstrate how powerful it can be by reclaiming the Modern Data Stack using open source tooling. With tools like alto, dbt, duckdb, dagster, and more, we can spin up cost-effective, fast, and powerful data stacks that would have taken months to implement in the past.


The why and how of saying no ft. Will Sweet

Data people get a lot of requests. Saying “yes” to everything puts us in a services trap. Saying “no” to everything would likely put us out of a job.

Like a lot of things, it’s a balance. In this talk we’ll review why it’s important to say “no,” how to say no, and how to balance “no” and “yes.” As part of the discussion, we’ll be walking through real-life examples.

You’ll walk away with a better understanding of how to collaborate with your peers and leadership.


The Future of Analytics Engineering ft. Jonathan Talmi

The field of analytics engineering is undergoing rapid transformation, fueled by advancements in generative AI and a burgeoning landscape of startups and open-source projects. In this talk, I will delve into three emerging trends: code generation, templated data models, and adaptive data pipelines. Code generation in analytics engineering has become powerful, largely due to the incorporation of contextual metadata such as column-level lineage into generative AI tools. Templated data models are gaining traction as industry-specific standards for metric definitions gain wider adoption, enabling organizations to go from zero to one with minimal investment. Lastly, data platforms are becoming increasingly autonomous, automatically adapting data pipelines and models in response to environmental changes.

Full Conference Videos

Day 1

Day 2

Day 3

Day 4

Day 5
