Like so many things in the data space, data onboarding as we currently know it is fundamentally flawed. Put yourself in the shoes of a new data engineer, analyst, or scientist: it can take months to get a thorough understanding of an existing data stack and setup. It doesn’t help that so many data teams only get around to hiring once the demand on the existing team is already too much. That new hire was needed a week ago, and won’t be able to start contributing for another month or so.

Drawing on why many Secoda customers use the platform, we’ll break down the current state of data onboarding for the modern data team, why it’s broken, and how to set your team up for success by making onboarding seamless.

The Problems with Data Onboarding 

To put it lightly, data teams are very busy. The responsibilities of data engineers, analysts, and scientists become more critical as time goes on, since the volume of data they have to organize and store keeps growing. And with such high demand for data folks and not enough people in the job market with that skill set, hiring, onboarding, and ramping up are incredibly tedious and time-consuming.

Data teams are in charge of building the data stack that their entire company works with, making sure it can grow alongside the company, maintaining it, taking on larger projects to keep it optimized, and keeping up with data requests from business and product teams. Realistically, data documentation and discovery are key pieces of this puzzle, but they are usually the first things to fall by the wayside once a data team becomes too busy. Sometimes this means new hires must come on board and make sense of minimal or outdated documentation, but oftentimes it means there’s little to no documentation at all.

So, instead of a new data hire coming onto the team and relieving the growing workload that the team is already facing, the current analysts and engineers have to dedicate a lot of time to filling in the gaps of context that this new hire requires to be successful. 

Onboarding Data Engineers, Data Scientists, and Data Analysts 

What does onboarding to a data team as a new hire actually look like? Whether you’ve added a data engineer, data scientist, or data analyst (or all of the above), these new hires are responsible for: 

  • Learning about the existing tooling, how the tools connect to one another, and how the data currently being collected is stored and organized
  • Going through documentation on this data and within these various tools (if it exists), or reconciling data documentation on a completely separate platform from the data tools (like Confluence or Google Docs)
  • Making sense of conflicting, missing, or dated documentation, usually relying on manual knowledge transfer from more experienced members of the data team to fill in the blanks
  • Contributing to the data team’s workload 

It’s a costly endeavor to onboard new members to the data team, and one that can set the team back in terms of productivity and output if they’re not ready for it. “Manual knowledge transfer” includes pinging the team on Slack or email with questions or, if you’re in the office, tapping someone on the shoulder. This knowledge transfer is rarely comprehensive, and it is rarely recorded for future new hires to reference.

The Best Tool for Data Team Onboarding

The clear solution for making onboarding easier for a new data hire would be some sort of process and tool that streamlines data knowledge, grows alongside the team, and scales well. While there are many data catalogs and data discovery tools, if these aren’t rolled out properly, they only add to the growing number of tools that a data team must keep up with (and in turn, add to the workload of those who are onboarding). Secoda aims to provide a clear space for data documentation, data collaboration, and data discovery, right alongside your other integral data tools.

  • Integration, embedding, and smart documentation alongside existing tools: this means no more jumping back and forth between your data tools and a completely isolated data catalog or documentation site. It also means that getting a full picture of the data and its accompanying context is easier, and even pleasant.
  • Automated data documentation: Secoda automatically pulls metadata from your other data tools, creating trustworthy and consistent documentation. No more conflicting documentation or wondering if it’s up to date.
  • A searchable repository for questions and requests: not only are data documentation and data resources searchable within Secoda, but so are all of the past questions and requests that have been asked of the data team. This means less pinging current data team members with questions they’ve answered time and time again, and no bouncing between Slack and email to get these answers. Better yet, it gives the data team insight into how they can improve documentation so that they’re relied on to answer questions just a little less.

7shifts, which recently raised a Series C led by SoftBank, almost doubled its data team over the last year. Doing so wouldn’t have been nearly as easy without having their documentation and data questions within Secoda. In fact, they estimate that they’ve been able to cut down onboarding time by over a week per new hire.

An example of how to use Secoda for data onboarding

Don’t Wait for It to Break to Fix It

Denying the need for a seamless onboarding process for your data team will only become more detrimental as time goes on: data requests from business and product teams will grow, and data engineers, analysts, and scientists will become increasingly frustrated trying to work with a fragmented data stack. Instead of waiting for the data team to grow, companies should be thinking about the onboarding process and data knowledge transfer well before scaling the team. Frankly, all of the features and best practices of a seamless data onboarding process benefit the current team, no matter what size it is.