What is Data Validation?

Data validation is a key part of software quality assurance: a process in which data is tested to ensure it meets defined criteria. It can be implemented in the UI layer, by validating the values an end user enters into text-input controls, or in the business layer, with business rules that validate data arriving from other layers.
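As a minimal sketch of the two layers, the Python below separates a UI-layer check on raw text input from a business-layer rule. The function names and the minimum-age rule are hypothetical, chosen only for illustration.

```python
MIN_AGE = 18  # hypothetical business rule, not taken from any real system


def parse_age_input(raw: str) -> int:
    """UI-layer check: the text typed by the user must be a whole number."""
    try:
        return int(raw.strip())
    except ValueError:
        raise ValueError(f"age must be a whole number, got {raw!r}") from None


def check_age_rule(age: int) -> None:
    """Business-layer check: the parsed value must satisfy the business rule."""
    if age < MIN_AGE:
        raise ValueError(f"age must be at least {MIN_AGE}, got {age}")


def accept_age(raw: str) -> int:
    """Run both layers in order and return the validated value."""
    age = parse_age_input(raw)
    check_age_rule(age)
    return age
```

Keeping the two checks in separate functions means the business rule can also be applied to data that never passed through the UI, such as values arriving from another service.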

While data validation can help improve data quality and integrity, it cannot prevent all forms of invalid data from entering a system. For example, validation routines might not catch every incorrect or illogical combination of data. Some malicious input, such as an SQL injection payload, may pass application-layer validation routines and only take effect at the database layer.
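To illustrate an illogical combination that slips past field-level checks, here is a small Python sketch. The booking record and both helper functions are hypothetical: each field is valid on its own, and only a cross-field check exposes the problem.

```python
from datetime import date


def field_checks(record: dict) -> list:
    """Check each field in isolation: both values must be dates."""
    errors = []
    for name in ("start", "end"):
        if not isinstance(record.get(name), date):
            errors.append(f"{name} must be a date")
    return errors


def cross_field_checks(record: dict) -> list:
    """Check the combination: the end date must not precede the start date."""
    errors = []
    if record["end"] < record["start"]:
        errors.append("end must not be before start")
    return errors


# Each field is a valid date, but together they describe an impossible booking.
booking = {"start": date(2024, 5, 10), "end": date(2024, 5, 1)}
print(field_checks(booking))        # no field-level errors
print(cross_field_checks(booking))  # the combination is rejected
```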

Data validation is intended to provide well-defined guarantees of fitness, accuracy, and consistency for the various kinds of input into an application or automated system. Within the limits described above, proper data validation should keep most bad data out of a system. It can mean the difference between an application that works well and one that doesn't work at all.

Purpose of Data Validation

The ultimate purpose of validating data is to ensure data integrity. Data validation and data integrity are both closely related to data governance, in that they help ensure that data is collected, stored, analyzed, and used soundly throughout its lifecycle within an organization. Data validation is the part of the technical process through which data integrity is achieved. This means that all data collected meets the standards set out by the organization's data stewards and stakeholders (e.g. the Data Council), and that it is usable and scalable.

Methods of Data Validation

  • Attribute constraints. Attribute constraints specify the acceptable values for a set of attributes in a relation or relation schema. The range and domain are examples of attribute constraints (although a range may be defined across multiple attributes). Attribute constraints may also be defined to reflect relationships among two or more attributes.
  • Entity integrity. Entity integrity requires that each record in a table be uniquely identifiable. This constraint can be enforced with primary key constraints, unique constraints, or identity columns.
  • Referential integrity. Referential integrity is a property that keeps the relationships between tables consistent when one of the tables is updated. When a user deletes data from one table that is related to data in another table, referential integrity prevents the deletion if it would create orphaned records in the second table.
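The three methods above can be sketched with Python's built-in sqlite3 module and an in-memory database. The customers and orders tables, their columns, and the rejected helper are hypothetical, and SQLite is used only as a convenient stand-in for whatever database a system actually runs on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,          -- entity integrity
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),  -- referential integrity
    quantity    INTEGER NOT NULL CHECK (quantity BETWEEN 1 AND 100)  -- attribute constraint
);
INSERT INTO customers (id, name) VALUES (1, 'Ada');
INSERT INTO orders (id, customer_id, quantity) VALUES (1, 1, 5);
""")


def rejected(sql: str) -> bool:
    """Return True if the database refuses the statement as a constraint violation."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Attribute constraint: quantity is outside the allowed range.
print(rejected("INSERT INTO orders (id, customer_id, quantity) VALUES (2, 1, 500)"))
# Entity integrity: a second customer with the same primary key.
print(rejected("INSERT INTO customers (id, name) VALUES (1, 'Grace')"))
# Referential integrity: deleting the customer would orphan order 1.
print(rejected("DELETE FROM customers WHERE id = 1"))
```

In this sketch every rule lives in the schema, so it applies no matter which application writes to the tables.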

How to Validate Data

1. Settings in a database or software application that automatically validate data as it is entered into the system. For example, a field may be configured so that it only accepts numbers between 1 and 100. An error message is then displayed if someone tries to enter a number outside this range.

2. Rules that are applied during batch processing, sometimes using specialized software tools, to check large amounts of historical data for issues. For example, a rule could check that all customer names contain at least three characters.
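Both approaches above can be sketched in a few lines of Python. The function names and the sample records are hypothetical; the limits (1 to 100, at least three characters) come from the examples in the list.

```python
def validate_on_entry(value: int) -> int:
    """Entry-time check: accept only numbers between 1 and 100."""
    if not 1 <= value <= 100:
        raise ValueError(f"enter a number between 1 and 100, got {value}")
    return value


def find_short_names(records, min_length=3):
    """Batch rule: return the records whose customer name is too short."""
    return [r for r in records if len(r["name"].strip()) < min_length]


# Hypothetical historical data, made up for this illustration.
historical = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Al"},
    {"id": 3, "name": "  "},
]

for record in find_short_names(historical):
    print(f"record {record['id']}: name {record['name']!r} is too short")
```

The entry-time check raises an error so the caller can show a message immediately, while the batch rule collects every failing record so they can be reviewed together.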