dbt Coalesce 2023 Day 1 Sessions Recap

A deeper dive into the session content from day 1 of dbt Coalesce 2023.
Last updated: May 2, 2024

It’s taken me a bit longer than I had anticipated to get myself back on EST and back to the realities of my day job (aka: finally accepting that my job can’t be playing with puppies and hanging with data peeps all year long).

If you’re interested in the overall Coalesce experience, I shared a post about this last week, but I didn’t dive too far into the insights I got from the sessions I attended. So let’s do that now!

A great start to the day

After a much-needed good night's sleep, we were ready for day one and the official kick-off of the conference. With the activation hall opening at 8 am, the Secoda team was up bright and early, ready to show off our memorable fits (white Dickies overalls, Converse high tops, Pit Vipers, and Tristan Coalesce rap tees).

Thanks for being a great sport, Tristan!

We grabbed some breakfast on the outdoor terrace at the Hilton and started hyping folks up for the arrival of the Secoda puppies at our booth later that afternoon. Having the option to eat on the rooftop terrace (just outside the activation hall) was a really nice upgrade from last year. While Tuesday morning started a bit gloomy and chilly, the fog quickly burned off, so it was great to sit outside a few times throughout the day and enjoy the sun (especially for us Canadians, as we’re full steam ahead into “dark and cold all the time” season 😂 🆘).

The Opening Keynote

Next, I caught the dbt Opening Keynote on the main stage. Similar to last year’s, it was another action-packed hour, with inspiring stats about how the community continues to grow.

Beyond announcing a lot of cool new features to expect from dbt (Tristan breaks these down in detail in this recap blog post), the keynote put much of its focus on one key challenge dbt is looking to solve: the complexity that comes from scaling with dbt. Complexity not just at the individual developer level, but also at the team and organization level. Tristan showed a screenshot of the DAG for dbt Labs’ internal dbt project, and it got some audible laughs from the audience.

Some Tristan quotes that made me laugh and were very relatable: 

  • “holy shit, there are a lot more models than there used to be” 
  • “getting up and running in this new code base was kind of hard; in a lot of ways, writing code in this large, complicated code base was kind of hard.”
  • “In a lot of ways writing code in this project felt the opposite of what it felt like using dbt in the early days: it was kinda slow, ungainly, unpleasant”

He used this example to highlight how dbt users (and dbt Labs themselves) are dealing with this problem of growing complexity, a problem that tends to emerge after using dbt for a few years.

Tristan: “How do we flatten this line of complexity vs. amount of code?”

After hooking us into the problems of managing complexity, Tristan handed things off to Luis Maldonado, dbt’s new VP of Product, who recently joined from AWS. Luis broke down several new features and best practices that dbt is launching (dbt Explorer, Semantic Layer updates, the dbt Cloud CLI, and most notably, “dbt Mesh”), all of which are aimed at helping manage this complexity. He noted that data mesh is not something a company can “just buy”, but rather a set of best practices, organizational ways of working, and supporting dbt features. A customer story followed from Fifth Third Bank, who explained how they’re using dbt to implement data mesh in a real-world, highly governed environment.
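To make the dbt Mesh idea a bit more concrete, here’s a minimal sketch of the cross-project references that underpin it. The project name (analytics_platform) and model (dim_customers) are hypothetical; this assumes the upstream project marks the model as public and the consuming project lists it in its dependencies.yml:

```sql
-- Hypothetical consumer model in a downstream dbt project.
-- Assumes the upstream project `analytics_platform` exposes
-- `dim_customers` with `access: public` and is declared in this
-- project's dependencies.yml. The two-argument ref() resolves
-- across project boundaries (dbt Cloud, dbt 1.6+).

select
    customer_id,
    customer_segment,
    lifetime_value
from {{ ref('analytics_platform', 'dim_customers') }}
where customer_segment = 'enterprise'
```

The upshot is that teams can own their own projects while consuming each other’s vetted, public models, instead of duplicating logic inside one giant DAG.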

Overall, I left this year’s keynote feeling excited about the future of dbt and how dbt users can continue to manage problems of scale across the modern data stack. This topic is something near and dear to my heart, having lived through the pains of a growing dbt project that starts to feel slow, clunky, and difficult to manage. This was the inspiration for me to build a course on this very topic with Uplimit. I’m excited to bring more of these advanced dbt topics, best practices, and features into my next course cohort (launching in January) to help more data professionals overcome these types of challenges. If you want to learn more about the course, feel free to reach out to me on LinkedIn.

Other Tuesday sessions also focused on managing complexity

After the keynote, I headed down to the 3rd floor for a panel discussion, Fixing the data eng lifecycle, with Mehdi Ouazza (MotherDuck), Matt Housley (Ternary Data), and Sung Won Chung (Datafold), hosted by Louise de Leyritz (The Data Couch Podcast). This was a meaty panel that covered a wide range of topics, spanning both the soft and technical sides of data engineering.

The key topics discussed were gaps in the existing data engineering lifecycle, career progression and skill development, and what we can hope to see in the future of the profession. My key takeaways from the panel were:

  • Invest in metadata (data about data, such as pipeline health, operational workflows, or observability) as this will be critical to enhancing the developer experience, driving better outcomes, and making data processes faster, more predictable, and more interoperable.
  • We should not be paying to develop locally: companies like MotherDuck and DuckDB Labs are thinking differently about the warehousing costs incurred when developing locally. If you’re looking for ways to reduce your Snowflake bill, this is a big one (see the sketch after this list).
  • Defining data engineering roles is really hard: job descriptions and expectations differ widely across companies. We should be using more specific job titles (think “data platform engineer” instead of just “data engineer”) to better reflect evolving responsibilities.
  • When it comes to data quality and contracts, moving DevOps practices earlier in the development process will help foster more collaboration between data engineers and data scientists to improve data access and usability.
  • With the field of data engineering changing so rapidly, there needs to be more focus on training and best practices that transcend specific tools. Learning is an important part of the job, and teams should be investing in it (with internal sharing sessions or “guilds” as mechanisms for assessing performance and addressing criticisms).
  • There is still high demand for these skill sets, so as data engineers it is critical to stay informed and embrace formal education and best practices as they become more standardized over time. Look for training that covers the end-to-end data lifecycle, starts with technology-agnostic fundamentals before introducing specific tools, and includes enough academic theory to understand the basics of database functionality (beyond just technical and coding skills).
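On the local development point, here’s a rough sketch of what cheap local iteration can look like with DuckDB: it queries files on disk directly, so you’re not burning warehouse credits while developing transformation logic. The file path and column names are hypothetical:

```sql
-- Hypothetical local iteration with DuckDB: query raw files on disk
-- instead of a cloud warehouse while developing transformation logic.
-- The path and column names are illustrative only.

select
    date_trunc('day', event_timestamp) as event_date,
    event_name,
    count(*) as event_count
from read_parquet('data/raw/events/*.parquet')
group by 1, 2
order by 1, 2;
```

Once the logic looks right, the same SQL can be promoted to run against the production warehouse.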

Overall, the panel was really valuable, and I highly recommend watching the recording back.

The next session I caught was titled Warehouse-first Data Strategy at ClickUp, presented by Marc Stone (Head of Data at ClickUp). Marc provided some great insights about how the data team has developed at ClickUp over the past three years, through a serious growth period (from 50 to over 950 people).

He broke down his presentation into four highly valuable sections, sharing his data team’s Strategic Principles, Team Structure, Warehouse Design/Architecture, and finally Business Impact. While the entire presentation was great, I think my favourite section was the strategic principles Marc shared.

Setting fundamental guiding principles like this for your data team is so valuable and helps to set expectations both within the data team and externally with stakeholders.

The final session I caught live on Tuesday was titled Lazy devs unite! Building a data ecosystem that spoils data engineers with Ryan Dolley and Jan Soubusta of GoodData. They went into a lot more technical detail in their presentation, but at a high level, these were my key takeaways for improving developer experience (DevEx):

  • Invest in good onboarding and documentation–this is critically important for your team’s collective knowledge
  • Don’t allow documentation to be an afterthought (a downfall of many data teams in my experience)
  • Leverage metadata from GitHub workflows to understand developer productivity and success (e.g. DORA metrics, sketched below)
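As a rough illustration of that last point, deployment frequency (one of the four DORA metrics) can be derived from GitHub Actions run metadata once it has been loaded into your warehouse. The staging model name, workflow name, and columns below are all hypothetical:

```sql
-- Hypothetical dbt model computing weekly deployment frequency
-- (a DORA metric) from GitHub Actions run metadata. Assumes workflow
-- runs have been extracted into a staging model with these columns.

select
    date_trunc('week', completed_at) as deploy_week,
    count(*) as successful_deploys
from {{ ref('stg_github__workflow_runs') }}
where workflow_name = 'deploy'
  and conclusion = 'success'
group by 1
order by 1
```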

The puppies arrived at our Secoda booth around 1 p.m. on Tuesday, so I was quite distracted from sessions after that 🙂.

There are plenty more sessions from Day 1 that I hope to catch up on soon.
