What is a Data Glossary?

A data glossary, sometimes also referred to as a data dictionary, is a collection of data definitions. It defines the different types of data used in an organization so that everyone can refer to common definitions, and it provides a full description of all the data-related terms used in your company.

The definitions are usually accompanied by short explanatory notes on usage, written by the organization's data stewards. Data glossaries improve the legibility of existing data and empower people within and outside of your data organization to better understand the data warehouse or database they're looking in. They also create consistency: everyone has one common understanding of the terms, information, and fields in the data.
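To make the shape of a glossary entry concrete, here is a minimal sketch in Python. The class names, field names, and the "Active Customer" example are all hypothetical and chosen only for illustration; a real glossary would more likely live in a catalog tool or a shared document than in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryEntry:
    """One agreed definition of a data term (all field names are illustrative)."""
    term: str
    definition: str
    usage_notes: str = ""        # short explanatory note on usage
    steward: str = ""            # who owns and maintains the definition
    related_fields: tuple = ()   # hypothetical warehouse columns the term maps to


class DataGlossary:
    """A tiny in-memory glossary keyed by a normalized term name."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _key(term):
        # Ignore case and extra whitespace so "Active  customer" and
        # "active customer" resolve to the same entry.
        return " ".join(term.lower().split())

    def add(self, entry):
        key = self._key(entry.term)
        if key in self._entries:
            raise ValueError(f"'{entry.term}' is already defined")
        self._entries[key] = entry

    def lookup(self, term):
        # Returns None when the term has not been defined yet.
        return self._entries.get(self._key(term))


glossary = DataGlossary()
glossary.add(GlossaryEntry(
    term="Active Customer",
    definition="A customer with at least one completed order in the last 90 days.",
    usage_notes="Excludes test accounts.",
    steward="analytics team",
    related_fields=("orders.customer_id", "orders.completed_at"),
))

entry = glossary.lookup("active  customer")
print(entry.definition)
```

Rejecting a second definition for an existing term is a deliberate choice in this sketch: it forces a conversation with the steward instead of silently overwriting the agreed wording.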

How is a data glossary used?

A data glossary is created when there is a need for a common vocabulary to talk about data and the concepts underlying it. It is important to choose meaningful, easy-to-understand terms and definitions that are used consistently across the organization. It's necessary to consult the stakeholders involved when creating a data glossary, and if your organization doesn't already have a data governance council, the creation of the glossary may be a good time to start one. This way, experts from different areas and organizational functions can be consulted to ensure that the descriptions and terms cover all major areas that use data.

Historically, data glossaries have been used to define and clarify terms used in the data world. These terms can be specific to a field or subject, such as mathematics, computers, or sociology, or specific to an organization or network of organizations, such as the World Bank or an insurance company. Now, the term data glossary is commonly associated with subject-specific glossaries in computer science. Many data sets are tied to specific areas such as bioinformatics and financial services, and these areas need shared terms that allow for data sharing, integration, and reuse across organizational boundaries when necessary.

Why is it helpful to have a data glossary?

When one person handles data analysis and management for an entire organization, there is only one source of truth. However, most businesses need to grow and scale, and it's not sustainable for the source of truth to be one person's memory. The benefits of a data glossary include:

  • Single source of truth. There are no conflicting definitions of data terms within an organization, since everyone has agreed on the definition recorded in the data glossary.
  • Scalability. Anyone with access to the data glossary is able to understand the data without having to consult the data stewards (typically data analysts or engineers).
  • Empowerment. People within and outside of the data organization are able to understand information from the data by consulting the glossary, and can draw their own conclusions from it.
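
The single-source-of-truth benefit can be illustrated with a small check that surfaces terms different teams define differently, which is the kind of conflict a shared glossary is meant to resolve. This is only a sketch: the team names, terms, and definitions are made up, and `find_conflicts` is a hypothetical helper rather than part of any tool.

```python
from collections import defaultdict


def find_conflicts(team_definitions):
    """Return terms that have more than one distinct definition.

    team_definitions maps a team name to a {term: definition} dict.
    The result maps each conflicting term to {team: definition}.
    """
    seen = defaultdict(dict)
    for team, definitions in team_definitions.items():
        for term, definition in definitions.items():
            # Compare terms case-insensitively so "Revenue" and "revenue" match.
            seen[term.strip().lower()][team] = definition.strip()
    return {
        term: by_team
        for term, by_team in seen.items()
        if len(set(by_team.values())) > 1
    }


# Made-up definitions collected from two teams before a glossary exists.
team_definitions = {
    "finance": {
        "Revenue": "Invoiced amount net of refunds.",
        "Churn": "Customers lost in a month.",
    },
    "sales": {
        "revenue": "Total value of signed contracts.",
        "Churn": "Customers lost in a month.",
    },
}

conflicts = find_conflicts(team_definitions)
for term, by_team in conflicts.items():
    print(term)
    for team, definition in by_team.items():
        print(f"  {team}: {definition}")
```

Each term this check reports is a candidate for the data stewards to settle and record once in the glossary.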