Find, understand and get familiar with data terms.
A data mesh is a decentralized architecture that organizes data around domain-oriented ownership. Learn more about a data mesh and its benefits.
A data product manager is responsible for the success of a data product and often needs working fluency in the programming languages the team uses. Learn more.
Relational database management systems (RDBMS) are the most common type of database management system. Learn more about RDBMS and how they are used here.
A recommendation engine, also referred to as a "suggestion engine", is used to predict the rating or preference that a user would give to an item. Learn more.
DataOps is a data management method that emphasizes communication, collaboration and integration for data teams. Learn everything you need to know here.
Data stewardship is the set of people, processes and tools that ensure consistent and reliable access to data. Learn everything you need to know here.
Data validation is a key aspect of software quality assurance: the process of testing data to ensure it meets defined criteria.
Data silos create duplicate work, waste time and money, and prevent organizations from learning from their own experiences.
A data fabric is a software architecture for data management that unifies and integrates data across multiple systems. Learn more about data fabric here.
Data confidentiality is a set of rules or a promise that limits access or places restrictions on any information that is being shared. Data confidentiality is a component of information security and privacy.
Anonymized data is data that has been stripped of personally identifiable information, also known as PII. Learn more about anonymized data here.
ETL is the process of moving data from source systems to a data warehouse. ETL pipelines take raw data and make it possible to provide analytics and insights.
Master data management (MDM) is a way to bring together critical business data in one place and help ensure that it's accurate, consistent, complete and trusted across the enterprise.
Data preparation is the process of gathering, cleansing, transforming and modelling data to prepare for analysis. Learn more about data preparation here.
Data deduplication is a process that identifies and removes duplicate copies of repeating data segments to free up storage capacity and reduce costs.
Data wrangling is the process of transforming and mapping data from a "raw" form into another format to make it more appropriate and valuable for a variety of downstream purposes.
Data ingestion is the transportation of data from assorted sources to a storage medium where it can be accessed and analyzed by an enterprise. Learn more here.
Data architecture is the design of how data is structured and used: it defines the target state and the planning needed to achieve it. Learn more here.
Apache Airflow is a platform to programmatically author, schedule and monitor workflows. Learn about the history of Apache Airflow and more here.
Data analytics is an umbrella term for a number of different ways that data can be analyzed.
Data privacy is the right of people to control their own personal data. Learn about the two major types of data privacy and the future of data privacy here.
A relational database is a type of database that stores and provides access to data points that are related to one another. Learn more here.
A metrics layer is a centralized layer where business metrics are defined once, so that every downstream tool reports consistent values. Learn more here.
Data modeling is the process of creating a visual representation of data structures and their relationships within a system. Learn more here.
A data silo is a repository of information that is controlled by one group and isolated from the rest of the organization. A data silo serves a single purpose and can be problematic. Learn more.
Data discovery tools help non-technical users access and analyze complex data sets within their organization. Learn more about data discovery here.
Data platforms provide the infrastructure to bring together all the needed data points in one place. Learn more about a data platform here.
Data analysts are the people who take data and use it to help companies make better business decisions. Learn about the responsibilities of a data analyst.
Business intelligence (BI) technologies provide historical, current and predictive views of business operations.
A data governance framework defines who is responsible for managing and protecting the information collected by a company or organization. Learn more here.
A data catalog allows organizations to discover and collaborate on data, as well as find and understand the meaning of specific data elements. Learn more.
A query is a request for data or information from a database table or combination of tables. Learn more about using a database query tool here.
In the context of a database, a primary key is simply a unique identifier for a row. Learn more about a primary key and how to use it in a table here.
Metabase is an open source tool that allows for powerful data instrumentation, visualization, and querying. Learn more about Metabase and its features here.
Data obfuscation or data masking is the process of intentionally hiding or obscuring sensitive or confidential information. Learn more about data privacy here.
Data engineers design, build and maintain the architecture for collecting, storing and processing data to support analytics and decision-making. Learn more.
Redash features a simple interface that allows you to create new queries, visualize the data and share dashboards. Learn more here.
Data curation is the practice of organizing data to ensure that it is accurate, relevant and accessible for research. Learn more about data curation here.
Metadata management is the process by which data managers collect, organize and maintain metadata, or data about other data. Learn more here.
Structured Query Language (SQL) is used to communicate with a database. Relational database management systems that use SQL include Oracle and Microsoft SQL Server.
A data lake is a repository that holds a vast amount of raw data in its native format until it is needed. Learn about data lakes and what they are used for.
Data profiling is a set of processes and tools used to understand the contents of a dataset. Learn everything you need to know about data profiling here.
An Entity Relationship Diagram (ERD) is a visual representation of data entities that uses standard conventions to describe how they are related to each other.
Data lineage enables you to trace the path of a specific piece of data as it moves throughout your data ecosystem.
Metadata is data that provides some description of another dataset. Learn more about the different types of metadata and why it is important here.
Data governance is the practice and concept of managing data throughout its lifecycle. Learn everything you need to know about data governance here.
Monthly recurring revenue (MRR) is the amount of money a business expects to receive from its subscribers each month. Learn more here.
Reverse ETL is the opposite of ETL: it moves data out of a data warehouse and back into the operational systems that business teams use.
A data glossary is a collection of data definitions. A data glossary is created when there is a need for a common vocabulary to talk about data. Learn more.
A database is a tool to collect, store, sort, and manage your data.
A data warehouse is a central database that consolidates data from across an organization for reporting and analysis.
Snowflake Data Warehouse is a cloud-based analytics platform designed for scalable data storage, processing, and analysis. Learn more here.
Data build tool (dbt) is an open-source command line tool that enables data analysts and engineers to transform data in their warehouse more effectively.
Extract, Transform, and Load (ETL) is a three-step process used to transfer data from one location to another. Learn more about ETL and how it works here.
A data dictionary is a central source of information about the data in your organization, business or enterprise. Learn more here.
A/B testing is a method of comparing two versions of a webpage against each other to determine which one performs better. Learn more about A/B testing here.
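As a rough illustration of how two versions might be compared, the sketch below applies a two-proportion z-test to conversion counts. The function name `ab_test`, the sample counts and the choice of test are assumptions made for this example; real experiments may call for a different statistical method.

```python
from math import erf, sqrt


def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test on conversion counts (illustrative only).

    conv_a / conv_b: number of conversions for variants A and B (hypothetical inputs)
    n_a / n_b: number of visitors shown each variant
    Returns (rate_a, rate_b, z, p_value).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal CDF, computed via erf.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return rate_a, rate_b, z, p_value


# Made-up numbers: 200 of 5000 visitors convert on A, 260 of 5000 on B.
rate_a, rate_b, z, p_value = ab_test(200, 5000, 260, 5000)
print(f"A: {rate_a:.3f}  B: {rate_b:.3f}  z: {z:.2f}  p: {p_value:.4f}")
```

A small p-value suggests the difference in conversion rates is unlikely to be due to chance alone; the threshold to act on (0.05 is a common convention) is a decision for the team running the test.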