Data preparation is the process of gathering, cleansing, transforming and modelling data with the goal of making it ready for analysis as part of data visualization or business intelligence.
Data preparation is an important step in data analytics as well as in business intelligence. It's also a core function of business analysts.
In its most basic terms, preparation means making sure that data from multiple sources can be combined seamlessly into a single useful source. That usually requires unifying and normalizing data so that it can be used for further analysis and reporting.
A data preparation platform can help you find, prepare, and use your data faster than doing the same work by hand.
Data preparation is one of the most important steps in data mining, and it’s also the most tedious part. It involves such tasks as collecting data, cleaning it up for further processing, transforming it into a suitable format, and so forth.
The idea behind data preparation is to transform raw data into a form that you can use for analysis.
Raw data is usually collected from multiple sources. For example, if you want to analyze customer behavior, your raw data might come from your company’s sales database and customer relationship management (CRM) system. In this case, sales records would be stored in a sales table while customer information would be stored in a customer table. These two tables would probably share at least one field (or column), such as a customer identifier, but would otherwise hold different information. It’s your job as a data scientist to combine these two tables into one big table so you can determine which customers bought which products and how much they paid for them.
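To make the table-combining step concrete, here is a minimal sketch in Python. It assumes both tables carry a shared customer_id field; that field, the sample rows, and the function name join_sales_with_customers are all hypothetical and chosen only for illustration.

```python
# Hypothetical sample rows; table and field names are illustrative only.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]
sales = [
    {"customer_id": 1, "product": "Laptop", "price": 1200.00},
    {"customer_id": 2, "product": "Mouse", "price": 25.00},
    {"customer_id": 1, "product": "Monitor", "price": 300.00},
    {"customer_id": 9, "product": "Cable", "price": 5.00},  # no matching customer
]


def join_sales_with_customers(sales, customers):
    """Attach customer fields to each sale by matching on customer_id."""
    # Index customers by their ID so each lookup is a single dict access.
    by_id = {c["customer_id"]: c for c in customers}
    combined = []
    for sale in sales:
        customer = by_id.get(sale["customer_id"])
        if customer is None:
            # Skip sales that have no matching customer (inner-join behavior).
            continue
        combined.append({**customer, **sale})
    return combined


combined = join_sales_with_customers(sales, customers)
for row in combined:
    print(row["name"], row["product"], row["price"])
```

Each row of the combined table now says who bought what and for how much. Dropping unmatched sales is one possible choice; a real pipeline might instead keep them and flag the missing customer.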
The data preparation process is often overlooked in the rush to find insights in new data sources. It's a tedious, time-consuming task, but it's essential to implementing a data-driven business strategy.
Data preparation is the process of collecting, transforming and enriching raw data to make it suitable for analysis. This can include cleaning, normalizing and consolidating raw data sets, as well as adding new dimensions or attributes to the data.
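The cleaning, normalizing, consolidating, and enriching steps described above can be sketched in a few lines of Python. The field names, the sample rows, and the 1000 threshold for the added is_large_order attribute are assumptions made up for this example, not part of any particular tool.

```python
# Hypothetical raw rows, as they might arrive from an export.
raw_rows = [
    {"email": " Ana@Example.com ", "amount": "1200.00"},
    {"email": "ana@example.com", "amount": "1200.00"},  # duplicate once normalized
    {"email": "ben@example.com", "amount": ""},         # missing amount
    {"email": "cy@example.com", "amount": "25.50"},
]


def prepare(rows):
    """Clean, normalize, de-duplicate, and enrich a list of raw rows."""
    seen = set()
    prepared = []
    for row in rows:
        # Normalize: trim whitespace and lowercase the email address.
        email = row["email"].strip().lower()
        # Clean: drop rows where the amount is missing.
        if not row["amount"]:
            continue
        amount = float(row["amount"])
        # Consolidate: keep only the first copy of each (email, amount) pair.
        key = (email, amount)
        if key in seen:
            continue
        seen.add(key)
        # Enrich: add a new attribute derived from existing data.
        prepared.append(
            {"email": email, "amount": amount, "is_large_order": amount >= 1000}
        )
    return prepared


prepared = prepare(raw_rows)
print(prepared)
```

Whether to drop incomplete rows or fill them with a default is a judgment call that depends on the analysis; this sketch simply drops them.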
Data preparation is sometimes referred to as "data wrangling." It's the process of taking raw data from a variety of sources and getting it ready for analysis. Data preparation is a critical aspect of data science: an analysis is only as good and as reliable as the data that goes into it.
An example of data preparation is ensuring that values are stored consistently within a table or warehouse. Take the value of "time": some people may enter a time as "2:30 PM", while others may enter it as "14:30". Whether it's automatic or manual, someone has to make sure that all times are recorded on either a 24-hour clock or a 12-hour clock, but not both. That way, anyone looking back on the data can compare the values cleanly.
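As a rough illustration of the time example above, the following Python sketch converts mixed 12-hour and 24-hour entries to a single 24-hour format. The list of accepted formats and the function name to_24_hour are assumptions for this example; real data may need more formats.

```python
from datetime import datetime

# Formats we assume appear in the raw data; extend this tuple as needed.
KNOWN_FORMATS = ("%I:%M %p", "%H:%M")


def to_24_hour(raw):
    """Return the time as 'HH:MM' on a 24-hour clock, or raise ValueError."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%H:%M")
        except ValueError:
            # This format did not match; try the next one.
            continue
    raise ValueError(f"Unrecognized time format: {raw!r}")


cleaned = [to_24_hour(t) for t in ["2:30 PM", "14:30", "9:05 am"]]
print(cleaned)
```

Raising an error on unrecognized input, rather than guessing, keeps bad entries from slipping silently into the cleaned data.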