What is Airbnb's Midas Data Process?

The Midas Data Process is a certification protocol initiated by Airbnb in 2020 to ensure the quality and timeliness of critical data sets and metrics. It involves four key reviews: Spec Review, Data Review, Code Review, and Minerva Review. Each review assesses different aspects of the data pipeline, from design specs to code quality and metric definitions. The process aims to improve data quality but requires significant investment in design, development, validation, and maintenance.

  • Spec Review: This review evaluates the proposed design specifications for the data model before its implementation. It ensures that the design meets the required standards and is feasible for development.
  • Data Review: This stage involves checking the data quality through validation reports and quality checks. It ensures that the data pipeline produces accurate and reliable data.
  • Code Review: This review assesses the code used to generate the data pipeline. It checks for code quality, efficiency, and adherence to best practices.
  • Minerva Review: This final review verifies the source of truth metric definitions implemented in Minerva, Airbnb's metrics service. It ensures that the metrics are correctly defined and implemented.
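As a rough illustration, the four reviews can be modeled as an ordered checklist for a single data asset. This is only a sketch, not Airbnb's tooling: the `CertificationStatus` class, its method names, and the rule that an asset counts as certified once all four reviews have passed are assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Review(Enum):
    # The four reviews named above, in the order the text lists them.
    SPEC = "Spec Review"
    DATA = "Data Review"
    CODE = "Code Review"
    MINERVA = "Minerva Review"


@dataclass
class CertificationStatus:
    """Hypothetical tracker for one data asset's progress through the reviews."""

    asset_name: str
    passed: set = field(default_factory=set)

    def mark_passed(self, review: Review) -> None:
        self.passed.add(review)

    def next_pending(self):
        # Return the first review (in listed order) not yet passed, or None.
        for review in Review:
            if review not in self.passed:
                return review
        return None

    def is_certified(self) -> bool:
        # Assumption: certification requires all four reviews to have passed.
        return self.next_pending() is None


status = CertificationStatus("example_table")  # hypothetical asset name
status.mark_passed(Review.SPEC)
print(status.next_pending().value)
print(status.is_certified())
```

Keeping the reviews in one enum makes the listed order explicit, so a team can see at a glance which review is still blocking an asset.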

How does Airbnb's Data Quality Score (DQ Score) work?

The Data Quality Score (DQ Score) is a high-level metric ranging from 0 to 100 that assesses the quality of data assets at Airbnb. The score is calculated by averaging the quality scores of each column in a data asset. The DQ Score aims to motivate data producers to collaborate with data consumers to enhance data quality. Scores are categorized into four levels: "Poor," "Okay," "Good," and "Great," and dimensional scores allow for detailed assessment across different quality dimensions.

  • Categorical Thresholds: Each score falls into one of the four categories, which makes the overall quality of a data asset quick to identify.
  • Dimensional Scores: These scores allow an asset to be evaluated on multiple dimensions, such as accuracy and reliability, providing a more nuanced view of data quality.
  • Collaboration Incentive: The DQ Score encourages data producers and consumers to work together to improve data quality, fostering a collaborative environment.
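The scoring described above can be sketched in a few lines of Python. The sketch follows this page's description (an average of per-column scores on a 0 to 100 scale) rather than any published Airbnb code. The function names and the category cut-offs of 40, 60, and 80 are hypothetical, since the text names the four categories but not their boundaries.

```python
from statistics import mean

# Hypothetical cut-offs: the four labels come from the text, the numbers do not.
CATEGORY_THRESHOLDS = [(80, "Great"), (60, "Good"), (40, "Okay"), (0, "Poor")]


def dq_score(column_scores):
    """Average per-column quality scores (each 0-100) into one asset score."""
    if not column_scores:
        raise ValueError("need at least one column score")
    for name, score in column_scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"score for {name!r} is outside 0-100: {score}")
    return mean(column_scores.values())


def dq_category(score):
    """Map a 0-100 score to one of the four categories."""
    if not 0 <= score <= 100:
        raise ValueError(f"score is outside 0-100: {score}")
    # Thresholds are sorted from highest to lowest, so the first match wins.
    for lower_bound, label in CATEGORY_THRESHOLDS:
        if score >= lower_bound:
            return label


# Hypothetical column names and scores for a single data asset.
scores = {"listing_id": 90, "price": 70, "booked_at": 80}
asset_score = dq_score(scores)
print(asset_score, dq_category(asset_score))
```

A dimensional score could be computed the same way by averaging only the checks that belong to one dimension, such as accuracy or reliability.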

What challenges does the Midas process face?

While the Midas process has significantly improved data quality at Airbnb, it faces several challenges. The certification strategy has proven to be non-scalable, encountering resistance from the frontline data team. The process requires substantial investment in time and resources to design, develop, validate, and maintain the necessary data assets and documentation. These challenges have raised concerns about the long-term sustainability and scalability of the Midas process.

  • Non-Scalability: The process is resource-intensive and may not scale well as the volume of data and number of data assets increase.
  • Frontline Resistance: The significant investment required has led to resistance from the frontline data team, who may find the process burdensome.
  • Resource Investment: Meeting the full data quality criteria necessitates a considerable investment in design, development, validation, and maintenance, which can be challenging to sustain.

How can you use Secoda to enhance data quality in your organization?

Secoda is the only data platform that incorporates Airbnb's Data Quality Score (DQ Score) and the Midas process, offering a comprehensive solution for data quality management. By integrating these methodologies, Secoda helps organizations ensure data accuracy, completeness, and consistency. It automates data monitoring and lineage, preventing issues like inconsistencies and errors. Additionally, Secoda's AI assistant allows users to interact with their data using natural language, making data management more accessible.

  • Automated Data Monitoring: Secoda continuously monitors data quality, identifying and alerting users to any inconsistencies, errors, or anomalies that could affect data accuracy and completeness.
  • Data Lineage: The platform tracks the flow of data through various systems, providing a clear lineage that helps in understanding data origins and transformations, thereby ensuring data integrity.
  • AI Assistant: Secoda's AI assistant enables users to ask questions about their data using natural language, making it easier to access and understand complex data sets without needing advanced technical skills.

Related terms

Data governance for Tableau

Data governance is becoming more important as organizations move away from standard models of data management and toward individualized projects with multiple teams in multiple locations. To ensure accuracy and consistency, organizations need to set up data governance properly. Tableau and Secoda are two tools that simplify this process. Tableau can be used to visually monitor, identify, explore, and analyze data discrepancies, while Secoda offers automated data lineage tracking that provides lineage visibility and helps detect errors quickly. Setup for both tools starts with properly cataloging your data sources, data model objects, and transformations. This can be done via Tableau's Data Visualization Paths feature or Secoda's Graph feature. Tableau then allows for data profiling and provides data-level security. It also lets you automate routines such as data summarization and instance updates with scripts. Secoda lets users track data lineage and impact by capturing dependencies between tables, and detect anomalies between source and target. This helps ensure data accuracy and makes data governance easier. Both Tableau and Secoda support a variety of governance strategies, such as data privacy, data quality monitoring, and data access control. By properly setting up data governance with Tableau and Secoda, organizations can ensure accuracy and consistency while still having access to reliable sources of data. This increases the value and improves the ROI of data-driven projects.