What to do as the first data hire at an early-stage startup?

In an early-stage startup, the first data hire faces numerous challenges: an infrastructure designed to manage the business but not necessarily to analyze it, an organization hungry for answers but lacking a system for requesting or prioritizing them, and little documentation of how things are done or defined. Prioritizing, planning and setting the foundation for a successful data organization at a startup is a lot of work and requires focus and dedication.

In this post, I share some ideas on tools and processes that have been helpful for first-time data leaders at startups as they scaled their teams. Of course, you will also have to think about key topics such as defining metrics, measuring product impact and creating accurate financial reports.

As a first hire, you will also need to think about the infrastructure and tools that you will grow into: data warehouses, ETL, BI tools, etc. These tool-related topics are best left for other posts on our blog.

Instead, in this post, I will focus on the foundations for building a data organization at any early-stage team. The processes you develop early on will become the routines that the organization grows accustomed to as it matures, so thinking them through now will set your team up for success as you scale.

Below are some high-level themes that run through the suggestions:

  • Work quickly and do things that work well for your current stage. 
  • Think about how things will scale, but don’t overengineer them too early.
  • Get into good habits early. Documentation, transparency, and reproducibility let you scale beyond your current size, and they are easier to adopt the sooner you start.

1. Map the company’s data infrastructure

Begin by getting an understanding of the places where data is stored, processed and used. What are the primary sources of data? Does the company use multiple databases? Is there an ETL built already? How are people visualizing and accessing this data today? 

In the future, you’re going to want to make a lineage graph that shows the relationship between these different sources of data. For now, writing them down in a document is sufficient. The goal is to wrap your head around the state of things today and the areas where you will need to spend the majority of your time. It also helps to write down what is missing, what should change and what the future infrastructure should look like. These ideas can go in the same document and will shape your roadmap and future planning.
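The written inventory can be as simple as a small structure that records each source and what it feeds. The sketch below is one possible shape, not a prescribed format; every system name in it is a made-up placeholder. Recording the "feeds" relationships now gives you the raw material for a lineage graph later.

```python
from collections import deque

# Hypothetical inventory: each entry is a place where data lives or is used.
# All names below are illustrative placeholders, not real systems.
SOURCES = {
    "app_postgres": {"kind": "database", "feeds": ["nightly_etl"]},
    "google_analytics": {"kind": "tracking", "feeds": ["marketing_sheet"]},
    "nightly_etl": {"kind": "etl", "feeds": ["warehouse"]},
    "warehouse": {"kind": "database", "feeds": ["revenue_dashboard"]},
    "marketing_sheet": {"kind": "spreadsheet", "feeds": []},
    "revenue_dashboard": {"kind": "dashboard", "feeds": []},
}


def downstream(name, sources=SOURCES):
    """Return everything that directly or indirectly depends on `name`."""
    seen = []
    queue = deque(sources[name]["feeds"])
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.append(node)
            queue.extend(sources[node]["feeds"])
    return seen


# What would be affected if the application database changed?
affected = downstream("app_postgres")
```

Even at this size, a question like "what breaks if this table changes?" becomes a lookup rather than a guess.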

2. Plan your roadmap

Balancing short-term and long-term goals is one of the challenges new data teams face. The initial months of your new job will be hectic and will pull you in several directions, but you should still take the time to develop opinions about the future. The data engineering space itself is exciting and likely to change rapidly over the next five years, so revisit those opinions as it does.

Think about what is missing, what can be improved, and how these changes align with the organization’s priorities. Are you using Google Analytics for tracking? That might be fine for now, but many businesses will eventually need something more powerful. How are reports being run? If data is primarily accessed and explored in spreadsheets, you should think about options for BI tools. Who should be hired next? A data analyst? A data engineer?

Form opinions about your roadmap now so you will be prepared to advocate for it whenever it comes up. What will you say when someone in marketing decides they want to integrate a new service, when an engineer starts to build a data tool in-house, or when the founders raise more funding and ask you how to spend it?

3. Create a data dictionary

As you learn things, write them down. This is something that can be started on day one and can pay dividends. We’ve created a guide for creating your data dictionary. You should think about making onboarding onto data as frictionless as possible for the next hire and for other employees. By getting alignment on terms like active users, you can avoid the confusion that is created whenever someone new joins. Secoda is a great tool for building this repository on top of your metadata.

Simple data discovery starts with good organization. A data dictionary is a list of key terms and metrics with their definitions, in other words a business glossary. Although this seems like a simple exercise, it’s very difficult to align business departments on the same definitions. Most companies we've spoken with keep the data dictionary in Google Sheets or Confluence documents, or do not keep one at all.

The definitions should be understandable to anyone in the company, not just the data team. Additionally, the definitions should be adopted by all teams and by leadership. Secoda helps here by making the definitions easy for every employee to find and understand.
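To make the idea concrete, here is a minimal sketch of what a single dictionary entry might hold. The fields, the example definition of "active user", the table name and the date are all assumptions for illustration; your own entries should reflect whatever your teams actually agree on, and a spreadsheet with the same columns works just as well.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Term:
    name: str
    definition: str     # plain language, readable by anyone in the company
    owner: str          # team that signs off on changes to the definition
    source: str         # where the number is computed (hypothetical table name)
    last_updated: date


# Hypothetical dictionary with one illustrative entry.
DICTIONARY = {
    "active user": Term(
        name="active user",
        definition="A user who logged in at least once in the last 30 days.",
        owner="product",
        source="warehouse.events",
        last_updated=date(2024, 1, 15),
    ),
}


def lookup(name):
    """Case-insensitive lookup; returns None if the term is not defined yet."""
    return DICTIONARY.get(name.strip().lower())


entry = lookup(" Active User ")
```

Recording an owner and a last-updated date alongside each definition makes it clear who to ask when a term is disputed and how stale an entry might be.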

One piece of advice: when there are big changes in the business, treat the corresponding changes to the data dictionary as a key part of the product release. Communicate the changed definitions to the rest of the company and make sure all team leads are informed. One strategy that we used was to create a Slack channel dedicated to updates to data dictionary terms or data documentation (which you can easily do with Secoda). This way, all business stakeholders stay informed about the key business metrics.

4. Create communication channels for data

Strive to make all your communication transparent. Transparency, documentation, and reproducibility go hand in hand.

Create a channel for data if you are on Slack. Most likely, people will contact you via email or Slack, but if at all possible, document those discussions in open channels on Slack or in your Trello board to maintain a history of the conversation. You will be able to learn about your business from others as a result of this. You can use Secoda’s Slack integration to capture this communication in the same place as the rest of your documentation. 

5. Create a code repository or query document

Set up a repository on GitHub and use it liberally. Your future self and future colleagues will appreciate comments on your queries and code. Make sure your commits appear in your Trello cards. It is even possible to integrate GitHub into Trello so that it handles permalinks for you. Where things go in the repository is not important at first; as it grows, you will have to tweak its organization anyway. Committing everything and making sure the commits are linked to your work documentation are the two keys to success.
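As an illustration of what a commented, committed query might look like, here is a small sketch. The table, columns and rows are made up, the card link is left as a placeholder, and SQLite is used only so the example runs on its own; your queries would run against your real database.

```python
import sqlite3

# Card: <link to the Trello card or ticket that requested this number>
# Question: how many distinct users were active on each day?
# Caveat: "active" here means "has at least one event"; adjust it to match
#         the definition in your data dictionary.
DAILY_ACTIVE_USERS = """
    SELECT event_date, COUNT(DISTINCT user_id) AS active_users
    FROM events
    GROUP BY event_date
    ORDER BY event_date
"""


def daily_active_users(conn):
    """Run the query and return a list of (event_date, active_users) rows."""
    return conn.execute(DAILY_ACTIVE_USERS).fetchall()


# Usage against a throwaway in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_date TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("2024-01-01", 1), ("2024-01-01", 1), ("2024-01-01", 2), ("2024-01-02", 1)],
)
rows = daily_active_users(conn)
```

The header comments carry the context that is hardest to reconstruct later: who asked, what the question was, and which definition the query assumes.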

6. Talk to the team about data

Build time into your schedule to understand how different team members use data. Identifying the data sources they use and finding out if they are running reports is an easy way to start adding value and understanding the business. 

While doing this exercise, you can also identify the data-savvy employees who can later become champions for the data team. You can think of these folks as your scrappy data council. In the past, we’ve found that finance and operations have many data-savvy folks who see the value of data. These folks will be your partners in determining how data is generated, processed, stored, and accessed.

7. Go with the flow

Working at a startup can be challenging. Working with data at a startup can be even more challenging. As the central hub for data and decision-making at the startup, you have to be ready to move quickly, discard old work and not get attached to anything that you create. Because startups move quickly, you are bound to find yourself overworked and stretched in many different directions. You may also feel like the work you’re doing is not creating the kind of impact you want.

Remind yourself that there isn’t one right way to do “data”. Be flexible and willing to go where your key stakeholders take you. Learn about the business and try to show quick, tangible wins. By embracing a culture of curiosity, you can set yourself and future data team members up for success. Remember that if things go well you won’t always be the only data hire, so your job is to set the best foundation for what’s to come. But if you don’t get to everything, that’s alright too. You can only do so much on your own. Our next post will focus on the future of the data analytics space.