What is Airbnb's Midas Data Process?

The Midas Data Process is a certification protocol that Airbnb introduced in 2020 to ensure the quality and timeliness of its critical data sets and metrics. Certification involves four reviews: Spec Review, Data Review, Code Review, and Minerva Review. Each review covers a different aspect of the data pipeline, from design specs to code quality and metric definitions. The process improves data quality, but it requires significant investment in design, development, validation, and maintenance.

  • Spec Review: This review evaluates the proposed design specifications for the data model before its implementation. It ensures that the design meets the required standards and is feasible for development.
  • Data Review: This stage involves checking the data quality through validation reports and quality checks. It ensures that the data pipeline produces accurate and reliable data.
  • Code Review: This review assesses the code used to generate the data pipeline. It checks for code quality, efficiency, and adherence to best practices.
  • Minerva Review: This final review verifies the source of truth metric definitions implemented in Minerva, Airbnb's metrics service. It ensures that the metrics are correctly defined and implemented.
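
The four reviews can be read as sequential gates that a data asset passes on its way to certification. The following is a minimal sketch of that idea, not Airbnb's tooling: the names `Review`, `CertificationRecord`, `approve`, `next_review`, and `is_certified` are hypothetical, and the strict ordering is an assumption based on the order in which the reviews are listed above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Review(Enum):
    SPEC = "Spec Review"
    DATA = "Data Review"
    CODE = "Code Review"
    MINERVA = "Minerva Review"


# Assumed order: the order in which the reviews are listed in this article.
REVIEW_ORDER = [Review.SPEC, Review.DATA, Review.CODE, Review.MINERVA]


@dataclass
class CertificationRecord:
    """Hypothetical record of which reviews a data asset has passed."""

    asset_name: str
    passed: set = field(default_factory=set)

    def approve(self, review: Review) -> None:
        """Mark a review as passed, refusing to skip an earlier gate."""
        for earlier in REVIEW_ORDER[: REVIEW_ORDER.index(review)]:
            if earlier not in self.passed:
                raise ValueError(
                    f"{earlier.value} must pass before {review.value}"
                )
        self.passed.add(review)

    def next_review(self):
        """Return the next pending review, or None when all have passed."""
        for review in REVIEW_ORDER:
            if review not in self.passed:
                return review
        return None

    def is_certified(self) -> bool:
        """An asset counts as certified once every review has passed."""
        return self.next_review() is None
```

A record starts with `next_review()` returning `Review.SPEC`, and `is_certified()` only becomes true after all four reviews have been approved in order.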

How does Airbnb's Data Quality Score (DQ Score) work?

The Data Quality Score (DQ Score) is a high-level metric, ranging from 0 to 100, that assesses the quality of data assets at Airbnb. The score is calculated by averaging the quality scores of the columns in a data asset. It is meant to motivate data producers to work with data consumers on improving data quality. Scores fall into four categories ("Poor," "Okay," "Good," and "Great"), and dimensional scores allow a more detailed assessment across individual quality dimensions.

  • Categorical Thresholds: Each score maps to one of the four categories, so the overall quality of a data asset can be identified at a glance.
  • Dimensional Scores: These scores allow an asset to be evaluated on multiple dimensions, such as accuracy and reliability, providing a more nuanced view of data quality.
  • Collaboration Incentive: The DQ Score encourages data producers and consumers to work together to improve data quality, fostering a collaborative environment.
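
The scoring described above can be sketched in a few lines of Python. The sketch follows this article's description (an asset score on a 0 to 100 scale, computed as the average of its column scores). The category cut-offs of 40, 70, and 90 and the function names `asset_score`, `categorize`, and `dimensional_report` are illustrative assumptions, not Airbnb's published thresholds or code.

```python
from statistics import mean

# Hypothetical cut-offs: (minimum score, label), checked from highest to lowest.
CATEGORY_THRESHOLDS = [(90, "Great"), (70, "Good"), (40, "Okay"), (0, "Poor")]


def asset_score(column_scores):
    """Average per-column scores (each 0-100) into one asset-level score."""
    if not column_scores:
        raise ValueError("an asset needs at least one scored column")
    for name, score in column_scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"score for {name!r} is outside 0-100: {score}")
    return mean(list(column_scores.values()))


def categorize(score):
    """Map a 0-100 score onto one of the four categories."""
    for minimum, label in CATEGORY_THRESHOLDS:
        if score >= minimum:
            return label
    raise ValueError(f"score below 0: {score}")


def dimensional_report(dimension_scores):
    """Label each quality dimension separately for a more detailed view."""
    return {dim: categorize(score) for dim, score in dimension_scores.items()}
```

Under these assumed cut-offs, an asset whose columns score 100, 80, and 60 averages 80 and lands in "Good", while a dimensional report can still show that the same asset is "Great" on accuracy and "Poor" on reliability.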

What challenges does the Midas process face?

While the Midas process has significantly improved data quality at Airbnb, it faces several challenges. The certification strategy has proven difficult to scale and has met resistance from the frontline data team. Designing, developing, validating, and maintaining the required data assets and documentation takes substantial time and resources. These challenges have raised concerns about the long-term sustainability of the Midas process.

  • Non-Scalability: The process is resource-intensive and may not scale well as the volume of data and number of data assets increase.
  • Frontline Resistance: The significant investment required has led to resistance from the frontline data team, who may find the process burdensome.
  • Resource Investment: Meeting the full data quality criteria necessitates a considerable investment in design, development, validation, and maintenance, which can be challenging to sustain.

How can you use Secoda to enhance data quality in your organization?

Secoda is the only data platform that incorporates Airbnb's Data Quality Score (DQ Score) and the Midas process, offering a comprehensive solution for data quality management. By integrating these methodologies, Secoda helps organizations ensure data accuracy, completeness, and consistency. It automates data monitoring and lineage, preventing issues like inconsistencies and errors. Additionally, Secoda's AI assistant allows users to interact with their data using natural language, making data management more accessible.

  • Automated Data Monitoring: Secoda continuously monitors data quality, identifying and alerting users to any inconsistencies, errors, or anomalies that could affect data accuracy and completeness.
  • Data Lineage: The platform tracks the flow of data through various systems, providing a clear lineage that helps in understanding data origins and transformations, thereby ensuring data integrity.
  • AI Assistant: Secoda's AI assistant enables users to ask questions about their data using natural language, making it easier to access and understand complex data sets without needing advanced technical skills.
