Why there isn't an LLM for data (...yet)

🎥 Recap from Etai Mizrahi's talk at MDS Fest 3.0
The data industry has seen remarkable technological shifts over the past three years.
First came the expansion of specialized tools, then consolidation through acquisitions, and now we're entering the third wave: AI integration. But despite the clear demand for AI-powered data tools, no universal "LLM for data" has emerged to transform analytics the way tools like Cursor have revolutionized coding or Harvey has transformed legal work.
"A tool could be something like what we're seeing Cursor do for coding. A tool could be something like Harvey for legal. The question is really, where is that for data and analytics, and why haven't we seen it just yet?" - Etai Mizrahi, CEO, Secoda
At MDS Fest 3.0, Secoda CEO and co-founder Etai Mizrahi explored this shift, breaking down exactly why building effective AI for data is uniquely challenging—and what it takes to get it right.
Before getting into the challenges, it's worth establishing what success would look like for a data team using an AI tool.
An ideal AI tool for data should handle five core capabilities, from understanding context to producing trustworthy answers. The vision is compelling, but the execution has proven elusive. Here's why.
"It really comes down to two fundamental challenges. The first challenge is the data fragmentation challenge, but I think the bigger underlying problem is the lack of interoperability around how we communicate with those different tools." - Etai Mizrahi, CEO, Secoda
The most obvious problem is that data context is scattered across disconnected tools.
Your Snowflake warehouse stores the actual data, your dbt YAML files contain transformation logic, and your Looker LookML defines business metrics. Each represents a critical piece of context, but they don't communicate effectively with each other.
The deeper issue is semantic inconsistency across tools. What Power BI calls a "chart," Looker might refer to as an "embedded data source." These aren't just naming differences—they represent fundamentally different approaches to organizing and presenting data that an AI system must learn to navigate.
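One way to picture the semantic-inconsistency problem is as a vocabulary-normalization layer that an AI system would need before it can reason across tools. The mapping below is a minimal sketch with illustrative entries, not an actual product schema:

```python
# Minimal sketch: mapping tool-specific asset vocabulary onto one canonical
# term set. The mappings below are illustrative, not a real catalog schema.
CANONICAL_TERMS = {
    ("power_bi", "chart"): "visualization",
    ("looker", "embedded data source"): "visualization",
    ("tableau", "worksheet"): "visualization",
    ("looker", "explore"): "dataset",
    ("power_bi", "dataset"): "dataset",
}

def normalize(tool: str, term: str) -> str:
    """Map a tool-specific asset type to a canonical type, falling back
    to the raw term when no mapping is known (e.g. a newer tool like Sigma)."""
    return CANONICAL_TERMS.get((tool, term.lower()), term.lower())

print(normalize("power_bi", "chart"))               # visualization
print(normalize("looker", "embedded data source"))  # visualization
print(normalize("sigma", "workbook"))               # workbook (no mapping yet)
```

The fallback branch is where the documentation-gap problem shows up: for tools without established mappings or training data, the system simply has no canonical concept to reach for.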
This problem compounds when working with newer tools that lack extensive public documentation. A platform like Sigma might not have the domain context available for LLM training that more established tools possess, creating knowledge gaps that are difficult to fill.
Schema evolution adds another layer of complexity. Unlike static documents, data environments are fluid. What's true on day one might be completely different by day ninety as schemas evolve, lineage changes, and pipelines are rebuilt. Any AI system must account for this constant state of change.
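The day-one-versus-day-ninety problem can be made concrete with a simple drift check between two schema snapshots, the kind of signal an AI system would need to invalidate stale context. Table and column names here are hypothetical:

```python
# Minimal sketch: detecting schema drift between two snapshots of the same
# table so stale context can be invalidated. Column maps are illustrative.
def schema_drift(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {column: type} snapshots and report what changed."""
    return {
        "added":   sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "retyped": sorted(c for c in old.keys() & new.keys() if old[c] != new[c]),
    }

day_1  = {"id": "int", "amount": "float",   "region": "text"}
day_90 = {"id": "int", "amount": "decimal", "segment": "text"}

print(schema_drift(day_1, day_90))
# {'added': ['segment'], 'removed': ['region'], 'retyped': ['amount']}
```

Any answer the system cached against the day-one schema is suspect once a drift report like this is non-empty.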
"MRR definitions may vary across organizations. We could, in theory, try to define them with a central definition, but there's likely some nuance, and there needs to be some sort of definitional logic that requires us to understand that institutional and legacy knowledge." - Etai Mizrahi, CEO, Secoda
The second major challenge involves the institutional knowledge that makes data meaningful. MRR (Monthly Recurring Revenue) might seem like a standard metric, but its calculation can vary significantly between organizations or even departments. Some include one-time fees, others exclude them. Some count annual contracts monthly, others don't. See where this is going?
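To make the ambiguity concrete, here is an illustrative example (the contract data and both rule sets are invented) of two organizations computing "MRR" from the same contracts and getting different numbers:

```python
# Illustrative only: two organizations computing "MRR" from the same
# contracts, with different rules about one-time fees and annual deals.
contracts = [
    {"type": "monthly", "amount": 100,  "one_time_fee": 50},
    {"type": "annual",  "amount": 1200, "one_time_fee": 0},
]

def mrr_org_a(contracts):
    """Org A: annual contracts count at 1/12, one-time fees excluded."""
    total = 0.0
    for c in contracts:
        total += c["amount"] / 12 if c["type"] == "annual" else c["amount"]
    return total

def mrr_org_b(contracts):
    """Org B: one-time fees land in the current month, annual contracts excluded."""
    return sum(c["amount"] + c["one_time_fee"]
               for c in contracts if c["type"] == "monthly")

print(mrr_org_a(contracts))  # 200.0
print(mrr_org_b(contracts))  # 150
```

Both functions are defensible, and neither is wrong; an AI system asked "what's our MRR?" has no way to pick between them without the institutional knowledge of which rules this organization actually follows.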
This contextual ambiguity extends beyond metrics to fundamental questions about data interpretation. Understanding which tables are authoritative, which joins are appropriate, and which metrics are trusted requires deep organizational knowledge that exists in people's heads, not in documentation.
"Obviously, with something like data and analytics we want consistency to be very right even if that means speed is a little bit slower. There's obviously an issue around staleness of architecture. We might have metrics or lineage that is outdated." - Etai Mizrahi, CEO, Secoda
These fundamental challenges create several technical constraints that current LLM architectures struggle to handle.
"There are some things that are working. There are some things we're seeing that indicate that this is the right direction for data to be going in. The first is we're seeing some narrow copilots that work really well for certain use cases." - Etai Mizrahi
While no universal solution has emerged, several approaches are showing promise in specific contexts, chief among them narrow copilots scoped to well-defined use cases.
These solutions work because they operate within defined boundaries with clear context. The challenge is scaling this approach across the full complexity of modern data stacks.
"What do we need to actually get to the next rung of great tooling for data and AI? I think it comes down to four things. The first is a unified knowledge graph." - Etai Mizrahi
Based on current progress and technical requirements, four key components seem necessary for a breakthrough:

1. A unified knowledge graph. Moving beyond traditional data catalogs to create queryable knowledge graphs that include tables, dashboards, metrics, owners, data quality scores, and governance rules. This graph must update automatically as lineage changes, track usage patterns, and reflect schema evolution in real time.

2. Multimodal reasoning. AI systems need to combine text, structured metadata, and visual context to understand data relationships. That means parsing not just table schemas but also dashboard configurations, transformation logic, and business context, and validating outputs across different formats.

3. Centralized security integration. Rather than trying to replicate RBAC logic for each tool, successful AI systems will need to integrate with centralized security models that can propagate appropriate access controls to AI interactions.

4. Human-in-the-loop feedback. The most promising approaches involve data teams actively training AI systems through validation and correction. This creates organizational memory where correct reasoning paths are reinforced and incorrect ones are avoided.
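The queryable knowledge graph described above can be sketched as a simple node-and-edge store that answers lineage and ownership questions. Asset names, fields, and scores here are all hypothetical:

```python
# Minimal sketch of a queryable knowledge graph: assets as nodes, lineage
# as edges. All names, owners, and quality scores are hypothetical.
nodes = {
    "fct_orders":    {"kind": "table",     "owner": "data-eng", "quality": 0.97},
    "dim_customers": {"kind": "table",     "owner": "data-eng", "quality": 0.91},
    "revenue_dash":  {"kind": "dashboard", "owner": "finance",  "quality": None},
}
edges = [  # (upstream, downstream) lineage pairs
    ("fct_orders", "revenue_dash"),
    ("dim_customers", "revenue_dash"),
]

def upstream_of(asset: str) -> list[str]:
    """Answer a lineage question: which assets feed this one?"""
    return sorted(u for u, d in edges if d == asset)

print(upstream_of("revenue_dash"))  # ['dim_customers', 'fct_orders']
```

The hard part, per the talk, is not the data structure but keeping it current: the edges and quality scores must be rewritten automatically as pipelines and schemas change.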
"We've been modeling workflows by giving different agents a specific task. For example, an agent could be really good at lineage parsing. An agent could be really good at writing queries or searching for the information. And all of those agents work in tandem." - Etai Mizrahi, CEO, Secoda
One of the most interesting developments is the emergence of multi-agent systems that mirror actual data team workflows. Instead of trying to solve everything with a single large model, these systems assign specialized agents to specific tasks such as lineage parsing, query writing, and information retrieval.
These agents communicate with each other, plan multi-step workflows, and validate their work before presenting results. When they lack context, they can request additional information from other agents, creating an iterative problem-solving approach that resembles how experienced data teams tackle complex questions.
This approach has shown promise because it can handle the multi-step exploration that characterizes real analytics work, while maintaining accuracy through validation and cross-checking between specialized components.
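The specialized-agent pattern described above can be sketched as a pipeline of single-purpose functions that pass shared context forward and validate before returning. Everything here (agent names, the hardcoded lineage result, the read-only check) is illustrative:

```python
# Minimal sketch of the specialized-agent pattern: each "agent" handles one
# task and a planner runs them in sequence, passing context forward.
# Agent behaviors are stubbed for illustration.
def lineage_agent(question: str, ctx: dict) -> dict:
    ctx["tables"] = ["fct_orders"]  # stub: pretend we resolved relevant lineage
    return ctx

def query_agent(question: str, ctx: dict) -> dict:
    ctx["sql"] = f"SELECT count(*) FROM {ctx['tables'][0]}"  # stub query writer
    return ctx

def validation_agent(question: str, ctx: dict) -> dict:
    # Validate before presenting: here, a trivial read-only check.
    ctx["validated"] = ctx["sql"].upper().startswith("SELECT")
    return ctx

def run_pipeline(question: str) -> dict:
    ctx: dict = {}
    for agent in (lineage_agent, query_agent, validation_agent):
        ctx = agent(question, ctx)
    return ctx

result = run_pipeline("How many orders did we ship?")
print(result["sql"], result["validated"])
```

In a real system each agent would be model-backed and could hand the question sideways to another agent when it lacks context, but the control flow, specialize, coordinate, validate, is the same shape.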
"I really do believe this is a game-changing inflection point for how we think about data stacks. How do we think about a data coordination platform? Something that really allows embedded agents, memories, enforced rules, and different data sources to come into one place." - Etai Mizrahi, CEO, Secoda
Looking forward, several areas need development to realize the vision of universal AI for data.
"The best way to launch something like this is to really think about your modeled data that has context around it. These models deteriorate in quality as soon as they are working with extremely messy data and extremely undocumented environments." - Etai Mizrahi, CEO, Secoda
Perhaps the most practical insight from current implementations is the importance of building trust through controlled deployments. Rather than launching AI tools across entire data warehouses, successful teams start with 25-30 well-modeled, well-documented tables, letting the system prove itself on data with clear context before facing messier environments.
As trust builds and the system proves reliable in constrained scenarios, scope can expand to include more complex data sources and use cases.
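One lightweight way to enforce that controlled scope is a table allowlist that the assistant checks before answering. This is a sketch with invented table names, not a description of any particular product's guardrail:

```python
# Minimal sketch of a controlled rollout: the assistant only answers
# questions that resolve entirely to an allowlisted, well-documented set
# of tables. Names are invented; a real list would hold ~25-30 tables.
ALLOWED_TABLES = {"fct_orders", "dim_customers", "fct_revenue"}

def in_scope(tables_referenced: set[str]) -> bool:
    """Refuse when any referenced table falls outside the trusted scope."""
    return tables_referenced <= ALLOWED_TABLES

print(in_scope({"fct_orders"}))                # True
print(in_scope({"fct_orders", "raw_events"}))  # False: out-of-scope table
```

Expanding the rollout then becomes an explicit act, adding a table to the allowlist once it is modeled and documented, rather than an accident of what the model happens to reach for.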
The question isn't whether AI will transform data work—it's how quickly we can solve the unique challenges that make data different from other domains where AI has already succeeded.
Data comes with complex relationships, evolving schemas, organizational context, and strict security requirements. The solutions that emerge will need to account for all of these factors while remaining practical for everyday use.
The most promising approaches combine specialized AI agents, comprehensive metadata management, and iterative human feedback to create systems that understand not just data, but how organizations actually use data. While we're not there yet, the foundation is being built for AI tools that can finally deliver on the promise of universal, reliable, context-aware data assistance.
For data teams exploring AI today, the key is to start with strong metadata, clear documentation, and well-defined scope—then expand gradually as trust and capability grow. The universal LLM for data may not exist yet, but the path to building it is becoming clearer.