What's Involved in Data Integrity?

Data integrity, in the context of information technology, means maintaining and assuring the accuracy and completeness of data over its entire lifecycle. In practice, data must not be modified in an unauthorized or undetected manner. Data integrity is not the same as referential integrity in databases, although it can be viewed as a special case of consistency as understood in the classic ACID model of transaction processing.
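One common way to make modifications detectable is to store a cryptographic tag alongside the data and recompute it whenever the data is read. The sketch below uses HMAC-SHA256 from Python's standard `hmac` and `hashlib` modules; the key, the record format, and the function names `compute_tag` and `verify_tag` are hypothetical. A keyed tag is used instead of a plain hash because anyone able to change the data could also recompute a plain hash, whereas producing a valid HMAC tag requires the secret key.

```python
import hashlib
import hmac

# Hypothetical secret key for illustration only. In practice it would come
# from a key-management system and never be stored next to the data.
SECRET_KEY = b"example-key-do-not-use"


def compute_tag(data: bytes, key: bytes = SECRET_KEY) -> str:
    """Compute a hex-encoded HMAC-SHA256 tag for data."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_tag(data: bytes, expected_tag: str, key: bytes = SECRET_KEY) -> bool:
    """Return True if data still matches the tag recorded when it was written."""
    # compare_digest compares in constant time, avoiding timing leaks.
    return hmac.compare_digest(compute_tag(data, key), expected_tag)


if __name__ == "__main__":
    record = b"customer_id=42,balance=100.00"
    stored_tag = compute_tag(record)  # saved at write time

    print(verify_tag(record, stored_tag))                           # True
    print(verify_tag(b"customer_id=42,balance=999.00", stored_tag))  # False
```

A failed check only tells you that the data changed, not how; recovering the correct value still depends on backups or an audit trail.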

Data integrity involves data security, which governs who is authorized to access data, but it goes beyond access limitations by ensuring that the data itself stays intact and trustworthy. It includes preventing both unauthorized access to data and unauthorized alteration of it.

Data integrity can be threatened by hardware, software, people, and processes, as well as by natural phenomena such as fire, floods, and earthquakes. The term has a slightly different meaning in fields such as cryptography and data compression, where it refers to the correctness of the result of a function or algorithm over its inputs.

Why Is Data Integrity Important?

The importance of data integrity increases as organizations generate, store and process more sensitive information about their customers, employees, supply chain and financials. In addition to maintaining consumer trust and minimizing liability, many companies must comply with strict industry regulations regarding data security and privacy.

Data integrity means that every piece of data in your system is trustworthy: it is accurate, and it is consistent across applications and across every place it appears in the database. Everyone who works with the data knows which actions they need to take to maintain its integrity.

The importance of data integrity is especially evident when you're first getting started with a project or process. When you begin working with a dataset, you may not yet know what questions to ask or what kinds of analyses to perform, and it's tempting to just dive in. However, if the quality of your data isn't up to snuff, you can make a total mess of your results, or end up with no usable results at all.
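Before diving into analysis, a few basic checks can reveal whether the data is usable. The sketch below is one possible approach, assuming records arrive as Python dictionaries; the field names (`id`, `email`, `age`), the function name `find_quality_issues`, and the plausible age range are all hypothetical. It covers three common checks: completeness (required fields present), uniqueness (no duplicate IDs), and validity (values within an expected range).

```python
from typing import Any

# Hypothetical schema for illustration: every record needs these fields.
REQUIRED_FIELDS = ("id", "email", "age")


def find_quality_issues(records: list[dict[str, Any]]) -> list[str]:
    """Return human-readable descriptions of basic data-quality problems."""
    issues: list[str] = []
    seen_ids: set[Any] = set()
    for i, rec in enumerate(records):
        # Completeness: required fields must be present and non-empty.
        for name in REQUIRED_FIELDS:
            if rec.get(name) in (None, ""):
                issues.append(f"row {i}: missing {name}")
        # Uniqueness: each id should identify exactly one record.
        rec_id = rec.get("id")
        if rec_id is not None:
            if rec_id in seen_ids:
                issues.append(f"row {i}: duplicate id {rec_id}")
            seen_ids.add(rec_id)
        # Validity: an assumed plausibility range for age.
        age = rec.get("age")
        if isinstance(age, (int, float)) and not 0 <= age <= 130:
            issues.append(f"row {i}: age {age} out of range")
    return issues


if __name__ == "__main__":
    rows = [
        {"id": 1, "email": "a@example.com", "age": 34},
        {"id": 1, "email": "", "age": 29},
        {"id": 2, "email": "c@example.com", "age": -5},
    ]
    for issue in find_quality_issues(rows):
        print(issue)
    # row 1: missing email
    # row 1: duplicate id 1
    # row 2: age -5 out of range
```

Real datasets usually need rules specific to their domain, but even checks this simple can catch problems before they distort an analysis.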

How to Maintain Data Integrity

  • Keep your team informed. Everyone on the data team, and anyone outside the team who interacts with the data, should know which practices to follow, how to work with the data correctly, and what the consequences of each of their actions are.
  • Create a process for backing up data. Backup tools can automatically save copies of your data at set checkpoints. If someone makes an incorrect or unauthorized change, you can reverse it by restoring the data from a backup taken before the change was made.
  • Keep an audit trail that can't be altered. Closely related to backups, there are tools that record every change and interaction within your database or warehouse. This creates accountability and transparency about who is interacting with your data, and how.
  • Control who has access to your data. Only members of the data organization who have accepted the responsibilities that come with full access should be able to edit and change it. People who only need to read and understand the data should be given read-only access.
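The audit-trail and access-control practices can be sketched together in a few lines. The Python example below is a minimal illustration, not a production design: the `AuditedStore` class, the `editor` and `viewer` roles, and the in-memory storage are all hypothetical, and stored values are assumed to be JSON-serializable. Each write is checked against the caller's role, then recorded with its old and new values (which is also what makes the change reversible) and chained to the previous log entry by a SHA-256 hash, so editing an earlier entry causes verification to fail.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any

# Hypothetical roles for illustration: editors may write, viewers may only read.
WRITE_ROLES = {"editor"}
READ_ROLES = {"editor", "viewer"}
GENESIS_HASH = "0" * 64  # stand-in "previous hash" for the first entry


def _entry_hash(prev_hash: str, payload: dict[str, Any]) -> str:
    """Hash an audit entry together with the hash of the entry before it."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + encoded).hexdigest()


@dataclass
class AuditedStore:
    data: dict[str, Any] = field(default_factory=dict)
    log: list[dict[str, Any]] = field(default_factory=list)

    def read(self, user: str, role: str, key: str) -> Any:
        if role not in READ_ROLES:
            raise PermissionError(f"{user} ({role}) may not read")
        return self.data.get(key)

    def write(self, user: str, role: str, key: str, value: Any) -> None:
        if role not in WRITE_ROLES:
            raise PermissionError(f"{user} ({role}) may not edit")
        # Record who changed what, including the old value for reversal.
        payload = {"user": user, "key": key,
                   "old": self.data.get(key), "new": value}
        prev = self.log[-1]["hash"] if self.log else GENESIS_HASH
        self.log.append({**payload, "hash": _entry_hash(prev, payload)})
        self.data[key] = value

    def verify_log(self) -> bool:
        """Recompute the hash chain; an edited entry makes this return False.

        Removing the newest entries is not detected unless the latest hash
        is also kept somewhere the writers cannot modify.
        """
        prev = GENESIS_HASH
        for entry in self.log:
            payload = {k: v for k, v in entry.items() if k != "hash"}
            if entry["hash"] != _entry_hash(prev, payload):
                return False
            prev = entry["hash"]
        return True


if __name__ == "__main__":
    store = AuditedStore()
    store.write("alice", "editor", "price", 10)
    store.write("alice", "editor", "price", 12)
    print(store.read("bob", "viewer", "price"))  # 12
    try:
        store.write("bob", "viewer", "price", 0)
    except PermissionError as err:
        print(err)  # bob (viewer) may not edit
    print(store.verify_log())  # True
    store.log[0]["new"] = 99   # tamper with history
    print(store.verify_log())  # False
```

A hash chain like this is tamper-evident rather than tamper-proof: someone who can rewrite the entire log can recompute every hash. Systems that need stronger guarantees typically keep a copy of the latest hash outside the writers' reach, or use write-once storage for the log.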