This article has been co-written with Mikkel Dengsøe from Synq.
Joining a new company can be an exciting yet challenging experience. As a new member of the data team, you’re not only navigating the intricacies of a new workplace, but also immersing yourself in a new (and potentially complex) data ecosystem.
With the median tenure at scaleups being just two years (Carta), and people frequently moving around within larger companies, onboarding is an important topic that often doesn't get the attention it deserves. Companies with strong onboarding processes have better employee outcomes and retention, and individuals who take ownership of their own onboarding ramp up faster and increase their chances of success (if you’re looking for tips on improving your onboarding, read this great article by Jerrie Kumalah).
In this article, we will explore some common problems faced by those joining a new data team, look at how these challenges are amplified as the team grows, and offer guidance for teams that want a more “onboarding friendly” setup that gets new team members up to speed faster.
How it feels to be the “Data Team Newbie”
Imagine this: You've just started as the newest data team member at your company. Hopefully, your manager has prepared a solid onboarding plan for you. However, this is often not the case.
“According to Gallup, only 12% of employees agree that their organization has a good onboarding process” – Gallup
Regardless of onboarding processes, you're probably surrounded by a multitude of data sources, databases, analytics tools, and pipelines. It's overwhelming, to say the least. The onboarding process often consists of a whirlwind of information, introductions to new colleagues, and an avalanche of technical documentation. This initial phase sets the stage for the difficulties that lie ahead.
"When we were 30 people in the data team, new joiners would be on the “data floater” rotation within their first 30 days of joining. At that point they would be expected to be up to speed on how to understand, debug and triage issues. As we've grown to 100 people only experienced analytics engineers can realistically do this creating a bottleneck for us" – Fintech scaleup
Common Challenges When Joining a Data Team
Even with the best onboarding plan, the sheer volume of information you will consume during onboarding can be staggering. It's often challenging to decipher what's important and where to focus your efforts. Thoughts like "Where do I start?" and "What data models, reports, and other assets are most important?" will be top of mind during this initial phase.
“When I joined the data team at Maple, we had over 300 dbt models, 3,000 dbt tests, and many dashboards in Looker. Beyond skimming through the “core” folder of dbt models (which meant reading 1,000s of lines of code…🆘), I wasn’t sure how to really dig in and understand the importance and complexity of each model” – Lindsay Murphy
Fear of “Breaking Stuff”
The fear of making a mistake that has business implications can paralyze even the most experienced data professionals. The lack of confidence and familiarity with a new stack makes it difficult to contribute effectively. You may find yourself thinking, "I'm afraid to touch anything" or "What breaks if I make this change?"
Learning New Tools
If you’re lucky, you will have used a few of the same tools in your last role (heyo, dbt!). But with so many data tools on the market, tech stacks are often like a unique fingerprint of each data team, and the way different teams use the same tools can also vary drastically. The odds are pretty high that you’ll not only be learning a new tool, but also working to understand how the tool fits into a broader workflow or pipeline. This can be especially tricky if there is a lack of documentation.
Lack of Proper Documentation and Lineage
For many data teams, documentation and lineage tooling are not prioritized, which leads to data sprawl and a lack of clarity about how the data stack fits together. A lot of critical information might live in siloed Google Docs or Notion pages, or simply be inside someone's head (aka: not written down anywhere ☹️). Documentation that does exist is often stale, so when you are assigned to fix an issue or need to make code changes, you may find yourself asking, “This thing broke, what do I do?” or “I need to update this model, but I don’t know if it has important downstream dependencies… Who do I contact?” Without clear documentation and lineage, even minor issues or changes can turn into an “all hands on deck” situation just to identify the best person to help you resolve the problem.
Growth of Problems with Team Size and Data Stack Complexity
As if these problems weren’t challenging enough for newbies, they become even more pronounced as the data team and the data stack grow in size and complexity. A larger team means more interdependencies, increased communication challenges, and potential for knowledge silos. The absence of proper documentation and observability exacerbates these issues, making it difficult to navigate the data stack and collaborate effectively.
Imagine a scenario where multiple team members are working on different parts of the data stack simultaneously. Each person's changes may unknowingly impact other parts of the system, leading to unforeseen issues and data inconsistencies. The lack of clear ownership and accountability further complicates matters, leaving new joiners feeling lost and hesitant to make changes. It's a recipe for chaos and an out-of-control system.
Solutions To Improve the Data Onboarding Journey
1. Tame your Data Stack
No amount of documentation or tooling will help if your data stack is built on shaky foundations. You may have started out with good intentions, but as your team and data use cases grow, you may find yourself with thousands of data models, tests, and dashboards, which makes onboarding and core workflows such as debugging and developing new models more difficult for new joiners. There’s no quick fix, but the issue is often best addressed from three angles:
- Addressing issues with upstream data – if you’ve got inconsistent or duplicate events coming into the base layer of your warehouse, it’s harder to build a reliable modelling layer on top. Start by tracing back the events that feed your handful of most critical metrics or models, then work with your engineering teams to understand whether those events are well maintained.
- Deleting unused data models and columns – using column-level lineage, you can remove unused columns from your core data models. You can also use lineage to identify data models with no downstream dependencies or usage. If you’ve established an ownership model, you may want to use model access (introduced in dbt 1.5) and its private, protected and public settings to make it explicit what’s internal to a domain and what’s publicly available.
- Creating a dashboard deprecation process – monitor which dashboards are actively used and set expectations with data consumers that unused dashboards and reports will be deleted regularly. Avoid falling into the trap of saying yes to every request for a new dashboard.
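To make the "no downstream dependencies or usage" check a little more concrete, here is a minimal sketch in Python. It assumes you can export lineage as a simple mapping from each model to the models it selects from, plus a set of models that dashboards read. The model names, the `exposed` set, and the `find_unused_models` helper are all hypothetical and are not part of dbt or any lineage tool.

```python
from typing import Dict, List, Set


def find_unused_models(lineage: Dict[str, List[str]], exposed: Set[str]) -> List[str]:
    """Return models that no other model selects from and no dashboard reads."""
    # Every model that appears as a parent of some other model has a dependent.
    has_dependents = {parent for parents in lineage.values() for parent in parents}
    return sorted(
        model
        for model in lineage
        if model not in has_dependents and model not in exposed
    )


# Hypothetical lineage export: each model maps to the models it selects from.
lineage = {
    "stg_orders": [],
    "stg_customers": [],
    "fct_orders": ["stg_orders", "stg_customers"],
    "tmp_orders_backup": ["stg_orders"],
}

# Hypothetical set of models that dashboards or reports read directly.
exposed = {"fct_orders"}

print(find_unused_models(lineage, exposed))
```

Treat the result as a list of deletion candidates to review with the owning team rather than something to drop automatically, since usage outside the lineage export (ad hoc queries, for example) would not show up here.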
"We're considering switching from Looker to Tableau just so we can start over without having 1,000s of dashboards to maintain" – Marketplace scaleup
2. Proactive Documentation and Lineage
To mitigate these challenges, organizations should proactively invest in clear documentation and full-stack lineage observability as early as possible. This investment serves as a solid foundation for a data team to operate smoothly and efficiently. By adopting good documentation processes for data models, reports, and other critical assets, new joiners can quickly grasp the essential components and their relationships within the data stack. Clear and accessible documentation serves as a roadmap, easing the transition for new team members.
Additionally, comprehensive lineage ensures that changes made to the data stack can be more easily tracked, allowing for a better understanding of the impact and potential issues that may arise. It empowers new team members to make informed decisions and minimizes the fear of breaking something. With lineage in place, the question of "What breaks if I make this change?" can be answered with confidence.
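As a rough illustration of how lineage answers "What breaks if I make this change?", the sketch below walks a lineage graph to list everything downstream of a model. It assumes lineage is available as a mapping from each asset to its direct parents. The asset names and the `downstream_of` helper are hypothetical, and real lineage tools expose this in their own ways.

```python
from collections import deque
from typing import Dict, List, Set


def downstream_of(asset: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every asset that directly or indirectly depends on `asset`."""
    # Invert the parent map so we can walk from an asset to its children.
    children: Dict[str, List[str]] = {}
    for child, direct_parents in parents.items():
        for parent in direct_parents:
            children.setdefault(parent, []).append(child)

    # Breadth-first walk over the children, visiting each asset once.
    seen: Set[str] = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical lineage: each asset maps to the assets it reads from.
parents = {
    "stg_orders": [],
    "stg_customers": [],
    "fct_orders": ["stg_orders"],
    "dim_customers": ["stg_customers"],
    "revenue_dashboard": ["fct_orders"],
}

print(sorted(downstream_of("stg_orders", parents)))
```

A new joiner about to change `stg_orders` would see that `fct_orders` and `revenue_dashboard` are affected, while `dim_customers` is not.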
If you have already started building a data team without prioritizing documentation and lineage, don’t keep putting it off.
"The best time to plant a tree was 20 years ago. The second best time is now."
Even if you don’t have great documentation today, investing in it now will start to pay dividends in a short period of time. Making documentation a priority, rather than an afterthought, will provide benefits that extend far beyond the onboarding of new joiners.
3. Build Newbie-Friendly Processes as You Scale
As the team size and complexity of the data stack increase, it becomes crucial to establish workflows that make the stack newbie-friendly. Leveraging automated workflows, documenting human processes, and fostering a culture of collaboration and knowledge sharing can go a long way in creating a welcoming environment for new joiners. Encouraging mentorship from more senior/experienced team members and providing avenues for asking questions and seeking guidance (like a daily standup) further eases the process of integrating into the team.
4. Set Clear Expectations
It can be a daunting task for new joiners to be on the hook for pipelines and dashboards that are used for business-critical decisions. Creating well-designed severity levels to set clear expectations about what should happen when there is an issue will ensure that everyone is on the same page. This might include:
- Defining clear severity levels for data issues to make it explicit what constitutes a critical issue (see the guide Designing severity levels for data issues)
- Setting clear expectations for different types of severity levels. For example, a P1 issue should be addressed within two hours while a P2 issue can wait until the end of the week
- Automating your severity levels by bringing them directly into your data alerts
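The points above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `CRITICAL_ASSETS` set, the `classify` and `format_alert` helpers, and the asset names are hypothetical, and "the end of the week" for a P2 is approximated as five days.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Severity:
    name: str
    respond_within: timedelta


# Response times from the example above: P1 within two hours, P2 by the end of the week.
P1 = Severity("P1", timedelta(hours=2))
P2 = Severity("P2", timedelta(days=5))  # assumption: "end of week" approximated as five days

# Hypothetical list of business-critical assets agreed on by the team.
CRITICAL_ASSETS = {"fct_orders", "revenue_dashboard"}


def classify(asset: str) -> Severity:
    """Issues on business-critical assets are P1; everything else is P2."""
    return P1 if asset in CRITICAL_ASSETS else P2


def format_alert(asset: str, message: str) -> str:
    """Attach the severity and expected response time to an alert message."""
    severity = classify(asset)
    hours = int(severity.respond_within.total_seconds() // 3600)
    return f"[{severity.name}] {asset}: {message} (respond within {hours}h)"


print(format_alert("fct_orders", "freshness check failed"))
print(format_alert("tmp_orders_backup", "row count dropped"))
```

Putting the severity and the expected response time in the alert itself means a new joiner on rotation does not have to guess how urgent an issue is.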
The life of a new data team member can be challenging, but some changes to your team’s approach to onboarding can help make it less overwhelming. By recognizing the common problems faced during onboarding, understanding the amplification of these challenges as the team and data stack grow, and proactively investing in documentation and lineage observability, organizations can set their new data team members up for success. Remember, it's never too late to prioritize good documentation, lineage, and a foundational data stack. By doing so, you'll pave the way for a cohesive and efficient data team where every member can thrive.
Mikkel is a data enthusiast and the co-founder of Synq. Before that, he led data teams at Monzo and saw the challenges of onboarding new joiners firsthand as the data team scaled from 20 to 100 people.