What is Data Curation?
Data curation is the practice of sifting through and organizing data to ensure that it is accurate, relevant and accessible for research.
The amount of data created every day is staggering. By one widely cited estimate, the world generates 2.5 quintillion bytes (about 2.5 exabytes) of data daily, and that volume keeps growing rapidly. Another often-repeated figure holds that up to 90 percent of all data was created in just the preceding two years.
Most of this data is unstructured and unorganized, and a large portion of it is inaccurate or irrelevant. As a result, researchers, engineers, and anyone else, whether inside or outside your data organization, struggle to extract the information they need in a timely manner.
In practice, data curation spans the entire data lifecycle. It includes activities such as access management, identification, description, preservation, transformation, and usage of data. Data curation services help ensure that data is findable, accessible, interoperable, reusable, and trustworthy.
What's involved in data curation?
Data curation covers all of the activities involved in preparing data for analysis and preservation. These include both manual and automated processes for tasks such as indexing, cleaning, and normalizing data, checking its quality, adding metadata, and ensuring compliance with standards or policies.
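The automated side of these tasks can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed method: the record fields (`id`, `email`, `country`), the quality rule, the `curate` function, and the source file name are all hypothetical. The sketch normalizes each record, drops duplicates, separates records that fail a quality check, and attaches basic metadata to the result.

```python
from datetime import datetime, timezone

# Hypothetical raw records; the field names are illustrative assumptions.
RAW = [
    {"id": "1", "email": " Alice@Example.com ", "country": "us"},
    {"id": "2", "email": "bob@example.com", "country": "US "},
    {"id": "1", "email": "alice@example.com", "country": "US"},  # duplicate id
    {"id": "3", "email": "", "country": "de"},                    # fails quality check
]


def normalize(record: dict) -> dict:
    """Clean and normalize: trim whitespace, lowercase emails, uppercase country codes."""
    return {
        "id": record["id"].strip(),
        "email": record["email"].strip().lower(),
        "country": record["country"].strip().upper(),
    }


def is_valid(record: dict) -> bool:
    """Minimal quality rule (an assumption): the email must contain '@'."""
    return "@" in record["email"]


def curate(records: list[dict], source: str) -> tuple[list[dict], list[dict], dict]:
    """Normalize, deduplicate on id (first occurrence wins), validate, and add metadata."""
    seen = set()
    clean, rejected = [], []
    for raw in records:
        rec = normalize(raw)
        if rec["id"] in seen:
            continue  # skip later records that reuse an id already processed
        seen.add(rec["id"])
        (clean if is_valid(rec) else rejected).append(rec)
    metadata = {
        "source": source,
        "curated_at": datetime.now(timezone.utc).isoformat(),
        "record_count": len(clean),
        "rejected_count": len(rejected),
    }
    return clean, rejected, metadata


clean, rejected, meta = curate(RAW, source="crm_export.csv")  # hypothetical file name
print(meta["record_count"], meta["rejected_count"])  # → 2 1
```

Rejected records are returned rather than silently discarded so that a person can review them, which is where the manual side of curation comes in.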
Here are just some of the ways that data curation can benefit your company:
1. Reduces storage costs
2. Helps you meet compliance requirements
3. Enhances organizational productivity
4. Protects valuable data assets
5. Improves customer relationships and increases customer loyalty
Data Curator Responsibilities
In a smaller team, the data curator is typically a data analyst, data engineer, or data scientist. Ultimately, the data team as a whole must decide who is accountable for curating the data. Curation can take the form of day-to-day best practices that keep the data well maintained, or it can be a large, one-time undertaking that makes a significant change.
The data curator decides what data is relevant, how it should be stored, how it is defined, and how it is accessed. The role also includes managing the metadata that describes the data in a consistent, reliable way.
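One way to make these decisions explicit is to record them in a catalog entry for each dataset. The Python sketch below assumes a simple in-house catalog; the `CatalogEntry` class, its fields, the `search` helper, the role names, and the storage path are hypothetical and do not correspond to any particular catalog product. Each field maps to one of the curator's decisions: what the data means, where it lives, who owns it, and who may access it.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical metadata record a curator might maintain for one dataset."""
    name: str
    description: str        # how the data is defined
    storage_location: str   # where the data is stored
    owner: str              # who is accountable for curating it
    allowed_roles: set[str] = field(default_factory=set)  # who may access it
    schema: dict[str, str] = field(default_factory=dict)  # column name -> meaning

    def can_access(self, role: str) -> bool:
        """Return True if the given role is allowed to read this dataset."""
        return role in self.allowed_roles


def search(catalog: list[CatalogEntry], keyword: str) -> list[str]:
    """Make data findable: return names of entries whose name or description mentions keyword."""
    kw = keyword.lower()
    return [e.name for e in catalog if kw in e.name.lower() or kw in e.description.lower()]


orders = CatalogEntry(
    name="orders_daily",
    description="One row per customer order, deduplicated and refreshed daily.",
    storage_location="s3://example-bucket/curated/orders/",  # hypothetical path
    owner="data-engineering",
    allowed_roles={"analyst", "data-engineer"},
    schema={"order_id": "unique order identifier", "amount_usd": "order total in US dollars"},
)

print(orders.can_access("analyst"))    # → True
print(orders.can_access("marketing"))  # → False
print(search([orders], "customer"))    # → ['orders_daily']
```

Keeping definitions, ownership, and access rules together in one record means a user who finds a dataset also finds out what it means and whether they may use it.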
How is data curation different from data management?
Data curation differs from data management in its focus. Data management is the general practice of handling data throughout its lifecycle; curation is the narrower practice of treating data as a valuable resource and ensuring it is well maintained and accessible. Curated data is kept clean and organized so it can be found easily in the future.
Data curation encompasses both the technical and organizational aspects of handling data: it requires well-defined procedures for ingesting, storing, documenting and sharing the resource.
Curation allows users to find the right data at the right time and ensures that it remains usable in the future.
Curation is crucial for big data because large volumes of varied data are difficult to manage without a well-organized system in place. Still, data curation matters for any organization that depends on consistent access to reliable information.