How will data engineering change over the next 5 years?

April 27, 2021

The future of the data analytics space is exciting. Traditionally, companies have focused on collecting and visualizing data. Now that teams have figured out these problems, they are starting to think about better ways to transform, manage and track their data. This next phase of the data analytics journey requires companies to redefine their goals and organizational needs. Efficiency, flexibility and accessibility are the key pillars driving most of the data engineers we interviewed.

What was particularly interesting in this exercise was how differently those interviewed thought about the future of the space. We've heard everything from streaming to cataloguing to monitoring as future areas that teams believe will become front and centre over the next five years. Below are the top three takeaways we had from the interviews presented in the report.

Specialization will grow within the data team

Most data engineers and data analysts are wearing many hats today. This is because the investment into the data team has only recently increased. As the value of data teams becomes more evident and more investment is placed in this department, data teams will specialize to focus on a particular function. This could mean having a reliability data engineer, a visualization lead and a separation between backend and frontend data engineering teams. We believe these kinds of organizational changes will begin to take shape over the next 5 years.

The "data gap" between data producers and consumers will shrink

As more investment is directed towards self-service analytics, the gap between data consumers and data producers will continue to shrink. Tools that help teams centralize an understanding of data will become mandatory across all data teams. We've solved storing, moving and visualizing data. When we look at the challenges that teams face today, self-serve analytics and a shared understanding of data is the next largest issue.

Data will become a product

More data teams will adopt practices that help them measure, manage and develop data like a product team. On the surface, this might mean a transition towards agile project management. At a more intricate level, this might mean transitioning towards data tools that enable cross-organization collaboration, version control and monitoring. We believe that innovation in this area of data analytics will be interesting.

Below are snippets from all our interviews. If you're interested in the future of data analytics and want to see the full transcript, you can read the entire report here.

What is the data trend you’re most excited about?

Dylan Baker, Co-founder at Spectacles
I’m excited about more maturity in the testing space. The explosion of data over the last decade is a net positive, but there’s a ton of projects that are under-tested and I’m excited to see what interesting things people do there.
Andrew McEwen, Co-founder at Secoda
The data trend that I’m most excited about is the idea that companies are adopting a modern data stack earlier on in their lifecycle. Making well-informed decisions is something that companies should be doing no matter what stage they are in, and data is an important input for making decisions. With affordable and easy-to-adopt tools, including cloud warehouses (Snowflake & Redshift), ETL tools (Fivetran and Airbyte), event management tools (Segment), and BI tools (Mixpanel and Mode), there’s no reason a company shouldn’t have a proper data stack set up to inform their decision-making processes.
Braun Reyes, Data Engineer at Clearcover
I am very excited by the recent innovation in Data Operations. These are trends like ‘Data Reliability/Observability’, ‘Governance and Auditing’, and ‘Discovery and Visibility’. We have all this great tooling for data processing, storage, and movement, but we do not have a ‘Datadog’ for Data Operations just yet. SLO-based monitoring for data is not easy with current tools. Unified metrics collection and observability for dbt flows, Fivetran connectors, and other disparate pipelines is not a problem that has been fully solved just yet. It is interesting to me that we also do not seem to have great tooling around monitoring for query and warehouse performance in Snowflake. I also do not feel like there are ‘no brainer’ solutions for Governance, Auditing, and Discovery just yet. I am excited to see who will be the next ‘Snowflake’, ‘dbt’, ‘Prefect’ of these spaces.
Miriah Peterson, Data Engineer at Weave HQ
I have personally done a lot of research into the concept of Data Mesh. I like the ideas and concepts it prescribes for data ownership as well as its relevance when it comes to cloud-native practices and microservice architecture we see more and more in the software landscape.

How do you see the role of a data engineer evolving over the next 5 years?

Ben Cohen, Director of Data at Dott
Changing from model builder to enabling model building by analysts, scientists and more.
Ben Dundee, Director and Head of Data Science at Dekeo
The easy answer here is that the tooling will improve, increasing efficiency. It has to, as it will be driven by efficiency gains in tools for data scientists and data analysts. But I think “data engineer” will seep out of its current silo. In my career, some of the most frustrating obstacles I’ve faced came because the data engineering team reported up through a different VP than Analytics/DS and because software engineers don’t see data as a product. I believe there will be lots of opportunities for hybrid engineers, software engineers who can moonlight as data engineers, or vice versa. Most teams already have a few people like this (the person who always fixes the Spark jobs when the data engineer is busy, or the engineer who always talks to the data science team).
Dylan Baker, Co-founder at Spectacles
This depends a lot on what we’re calling a data engineer and what size business we’re talking about. At one end, there are lots of ‘data engineers’ who do mostly data modelling in tools like dbt. I expect their roles to stay largely the same, possibly with an increased focus on tooling to enable the analysts and analytics engineers they work closely with. At the other end, there are a lot of data engineers who do complex things with real-time data, streaming, etc. I expect their roles to stay reasonably similar as well. I think it’s the people in the middle whose roles will likely move more to one extreme or the other. I think the ‘get data from the source into warehouse’ work is likely to diminish a little in importance.

What’s one data analytics problem that you think will be solved in the next 5 years?

Sarah Krasnik, Data Engineer at PerPay
I believe the data documentation and data cataloging problem will be solved in the next 5 years. We have gotten to this place by generating so much data with the decrease in the cost of data storage, but now maybe the person who built the data pipeline left the company, or the product has evolved since the data pipeline was built. Either way, the undocumented, uncleaned, and un-tested data assets are quickly outnumbering the data that's actually used and understood. This disorganization and lack of documentation aren't sustainable at the exponential rate at which we're generating new data. Personally, I believe that the team at Secoda is the one to solve it!
Akash Bachu, Data Engineer at Drift
The data scalability problem. We want to retrieve the data in less time and be able to be efficient with data processing. We struggle to load that data in less time. We have to constantly look for new algorithms to move data faster. Because of the delay in the jobs, the business doesn’t get the data at the right time. I think that this will only improve over time.
Arpit Choudhury, Founder at Data-led Academy
In my opinion, the biggest problem today is that no one analytics tool or technology can serve all the analytics needs of an organization. Web analytics doesn’t play well with Product Analytics and BI is often a data silo that doesn’t meet the needs of product teams looking to understand user behaviour. And then there’s the attribution challenge and there are specific tools just to understand attribution. Now, larger companies with dedicated data teams have the resources to invest in purpose-built tools and implement them properly. But early-stage startups and SMBs that don’t have in-house data expertise are facing analysis paralysis when it comes to choosing the right tools to solve their data analytics woes and there is no one analytics tool that takes care of the specific needs of these companies. So, I believe that over the next 5 years, this problem will be solved and at the same time, industry-specific vertical solutions will emerge that might end up displacing many of the established analytics tools that we all know about.
Braun Reyes, Data Engineer at Clearcover
I do not see it so much as an evolution but as an increased understanding of value. Maybe that is the same thing as evolution, but I truly think the true role of the Data Engineer is enablement. The adoption of modern data stack technologies and this wonderful mix of build and buy available tooling makes the role less about writing pipelines and being a data warehouse DBA and more about enabling secure, reliable data access across the organization. This enablement takes so many forms, so maybe that is the ‘evolution’, because you may start to see a splintering of the DE role into other roles that sit along the Data Spectrum. For example, you are seeing the rise in Analytics Engineering over the last couple of years, which sits between Data Analysts and Data Engineers, which is similar to what we saw with Machine Learning Engineers. You can have DEs that focus primarily on streaming, and others focusing totally on batch and replication, with others focusing on reliability and quality. Are these Data Streaming Engineers, Data Movement Engineers, and Data Reliability Engineers?
James Densmore at Hubspot once tweeted out “Do what you love, but if I were breaking into software engineering right now I’d be all in on data engineering. Many of the same fun challenges of more general backend development, but a front-row seat to how the analytics space evolves.” I think he is right. I kind of accidentally fell into this field in my mid-30s. It’s exciting to see folks coming more junior into this space.

What’s the largest challenge that data teams face today?

Annika Lewis, Associate at Vanedge Capital
The biggest challenges I see with data teams today are organizational — people & process stuff, rather than technical challenges. Data teams face tall orders — they typically need to stand up & maintain data infra, respond to reporting & ad-hoc data needs of all business stakeholders, all the while making sure nothing breaks. Most data teams I talk to feel under-resourced and budget-constrained, given the increasing data needs their companies are facing.
Sarah Krasnik, Data Engineer at PerPay
Data teams are changing so rapidly whether it's the tools they use or the people they're composed of. However, what is abundantly clear is the growing need for efficient and scalable data workflows in organizations. DataOps is a term that's just starting to gain momentum because of this need but will be a space that's the backbone of any analytics organization in the next 5 years. How can a data analyst do their job successfully if the warehouse is underperforming due to infrastructure issues? How can analytics engineers write modelling workflows if the orchestrator is down due to other infrastructure issues? Just like software engineering teams need DevOps, analytics teams will be in the hands of DataOps.
Braun Reyes, Data Engineer at Clearcover
As Data Engineers work more as a service organization to enable other teams to own their data stack components, there needs to be a way to have a single pane of glass view of all this activity to maintain trust in the platform. Are there dbt models that are killing your Snowflake credits? What downstream dependencies are affected by raw and prepared data quality issues and how are we communicating this and enabling action to resolve it? Are we hitting our SLAs for delivery? What is this data used for? Wrangling these KPIs into a unified view to enable quick action is a tough and constant challenge that Data Engineers are always looking to improve on.
Andrew McEwen, Co-founder at Secoda
The largest challenge that data teams face today is the disconnect that occurs between data producers and data consumers. Based on our ~200 conversations to date with data engineers and analysts, most people have reported that 20–30% of their time in a week is taken up answering ad hoc questions, most of which are trivial to answer. This is compounded by the constant distraction from the more meaningful work that data engineers and data analysts were hired to do. To me, this means that there is information that lives inside the data team’s heads that should be accessible and interpretable in a central place, which can provide answers to data consumers in a self-service manner.
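
One of the questions raised above, "Are we hitting our SLAs for delivery?", can be made concrete with a small check. The following is only a minimal sketch in Python, not a tool any interviewee described: the `TableStatus` record, the `find_sla_breaches` helper, the table names and the SLA windows are all hypothetical, and a real setup would read load timestamps from the warehouse or orchestrator rather than hard-code them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TableStatus:
    """Hypothetical record: when a table was last loaded and its agreed SLA."""
    name: str
    last_loaded_at: datetime
    sla: timedelta  # maximum acceptable age of the latest load


def find_sla_breaches(tables, now):
    """Return the names of tables whose latest load is older than their SLA."""
    return [t.name for t in tables if now - t.last_loaded_at > t.sla]


# Illustrative data only; the timestamps are fixed so the result is repeatable.
now = datetime(2021, 4, 27, 12, 0, tzinfo=timezone.utc)
tables = [
    TableStatus("orders", now - timedelta(hours=1), timedelta(hours=6)),
    TableStatus("events", now - timedelta(hours=30), timedelta(hours=24)),
]

print(find_sla_breaches(tables, now))  # prints ['events']
```

Running a check like this on a schedule and sending the breaches to one shared place is one simple way to approach the unified view described above, although it covers only freshness and not cost, lineage or usage.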

What are the similarities and differences that will emerge between data teams and traditional software engineering teams?

Akash Bachu, Data Engineer at Drift
Software backend engineers and data engineers are very similar. Most software backend engineers know DevOps and continuous deployment, but this is lacking on the data side. I think this will change over the next few years as demand for data engineers increases.
Kevin Liu, Data Engineer at Smarter Sorting
I find that data engineering teams are much more beholden to the whims of data science teams and must operate in a more ad-hoc manner than traditional sprint/product cycles.
Arpit Choudhury, Founder at Data-led Academy
Based on what I am seeing and reading, data teams are going to adopt software-engineering best practices and this is already happening with Data Ops emerging as a crucial function just like Dev Ops did in the last decade. And in terms of differences, I believe software engineers will become customers of data teams who will provide the data infra that software teams need to build data-powered products.

What do you believe is the biggest hurdle to adopting a “data-driven culture” today?

Ben Dundee, Director and Head of Data Science at Dekeo
I see “data-driven culture” as a bit of a misnomer. At the end of the day, “data-driven culture” means “I trust what the data people are telling me enough so that I’m comfortable making decisions”. So I’d reframe “data-driven culture” as “a culture built around trust in the data”. For this to happen, you have to believe that the data being produced is representative of the Universe you’re trying to understand. Data must be a first-class citizen. You have to have end-to-end responsibility for data quality: engineers must believe that there is no difference between code quality and data quality. Data is a product; data errors are always customer-facing. You have to have a common understanding of what numbers are telling you. You have to have an analytics team that understands what questions business leaders are asking. You have to have analytics leaders who can translate the numbers into a narrative that business leaders can understand and identify with. You have to build a culture of trust between executives and analysts. In my mind, all of these are significant barriers to having a data-driven culture. As a leader, you have to grade yourself on how much progress you’re making along these dimensions. The extent to which you worry about these problems, and to which they drive your prioritization, will tell you how much your organization desires the culture vs. the label. For what it’s worth, the hardest problem to solve is trust. At the end of the day, no one ever trusts a number, they trust a person. It doesn’t matter how right you are. That should rub a lot of data scientists and analysts the wrong way, but it’s simply not enough to be right.
Dylan Baker, Co-founder at Spectacles
I think a lack of investment in training. So many organizations spend so much time building high-quality data sets and spend no time training their organizations on how to be data literate. I think there’s a Field of Dreams problem here. Data teams increasingly should think of their roles as advocates for data, and part of that is helping people become more comfortable with data and helping them develop solid mental models around data.
Ben Cohen, Director of Data at Dott
Mindset — to be data-driven, one has to be humble in their convictions. Allowing data to change your mind on what you thought to be true. This is a mindset not easily crafted.

What’s the one tool you’re excited to incorporate into your data stack?

Kevin Liu, Data Engineer at Smarter Sorting
I'm pretty stoked to scale our data infrastructure and look at data tools like Hadoop and Kafka, maybe even Spark.
Miriah Peterson, Data Engineer at Weave HQ
I would honestly love to have a better reporting tool. We have such problems with the data integrity of our current stack. There are so many options, but something should be better.
Annika Lewis, Associate at Vanedge Capital
On this one, I’ve got to give a shoutout to two of my portfolio companies, Plotly & OmniSci. Plotly helps Data Scientists build analytic web apps using Python, and is proving instrumental in getting Data Science out of the lab and operationalizing it across the organization. OmniSci is a GPU-accelerated database that allows you to query massive datasets with low latency and offers pretty amazing browser-based geospatial visualizations.
Andrew McEwen, Co-founder at Secoda
I’m excited to incorporate Avo into our data stack. The problem Avo solves is creating a central source of truth for events that are defined in your applications. I’ve seen this become a problem at other places I’ve worked, and it’s typically solved through a manually managed Google Sheet. I think even for small companies like Secoda, this is an important problem to jump on early so it doesn’t become a big problem later.
