Introduction to Data Preprocessing – Feature Engineering and Feature Selection in Data Mining
In this article, I will discuss:
- Motivation for Data Preprocessing
- Steps in Data Preprocessing
Motivation for Data Preprocessing
Real-world datasets are heavily affected by negative factors such as noise, missing values, redundancy, outliers, and inconsistencies. A low-quality dataset leads to poor performance, or even failure, of a machine learning or deep learning project.
Nowadays, a large number of machine learning, deep learning, and transfer learning algorithms have been designed, but the success or failure of these models largely depends on the quality of the dataset used and the features selected.
Hence, Data Preprocessing, also known as Feature Engineering and Feature Selection, is a very important stage in building a usable machine learning or deep learning project.
Steps in Data Preprocessing
There are two main steps in data preprocessing:
- Data Preparation
- Data Reduction
The following are the forms of Data Preparation:
Data Cleaning is the process of correcting bad data, filtering incorrect data out of the dataset, and reducing unnecessary detail in the data.
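As a rough illustration of data cleaning, the sketch below corrects inconsistent formatting, filters out an impossible value, and drops a duplicate. The records, the field names, and the 0 to 120 age range are assumptions made up for this example.

```python
# Hypothetical raw records; field names and validity rules are assumptions.
raw_records = [
    {"name": " Alice ", "age": 34},
    {"name": "Bob", "age": -5},      # impossible age, will be filtered out
    {"name": "alice", "age": 34},    # duplicate of the first record once cleaned
    {"name": "Carol", "age": 41},
]


def clean(records):
    """Correct fixable values, drop invalid rows, and remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        name = rec["name"].strip().title()  # correct inconsistent formatting
        age = rec["age"]
        if not 0 <= age <= 120:             # filter out incorrect data
            continue
        key = (name, age)
        if key in seen:                     # skip records already kept
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


cleaned_records = clean(raw_records)
print(cleaned_records)
```

In practice the validity rules come from domain knowledge about each variable rather than from a fixed range like the one assumed here.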
Data Transformation is the process of converting or consolidating data into a form in which the mining process can be applied, or can be applied more efficiently.
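One simple kind of consolidation is aggregation. The sketch below rolls hypothetical daily sales up into monthly totals; the data and the `monthly_totals` helper are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical (date, amount) rows; ISO dates are assumed.
daily_sales = [
    ("2023-01-03", 120.0),
    ("2023-01-17", 80.0),
    ("2023-02-02", 50.0),
]


def monthly_totals(rows):
    """Consolidate daily rows into one total per month."""
    totals = defaultdict(float)
    for date, amount in rows:
        month = date[:7]          # "YYYY-MM" prefix of an ISO date
        totals[month] += amount
    return dict(totals)


print(monthly_totals(daily_sales))
```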
Data Integration is the process of collecting and merging data from multiple data stores.
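Merging data from two stores can be sketched as a join on a shared key. The two in-memory "stores" and the `customer_id` key below are hypothetical; real stores would usually be databases or files.

```python
# Two hypothetical data stores that share a customer identifier.
customers = {
    1: {"name": "Alice"},
    2: {"name": "Bob"},
}
orders = [
    {"customer_id": 1, "total": 30.0},
    {"customer_id": 2, "total": 12.5},
    {"customer_id": 3, "total": 9.0},   # no matching customer record
]


def integrate(customers, orders):
    """Inner join: keep only orders whose customer exists in both stores."""
    merged = []
    for order in orders:
        customer = customers.get(order["customer_id"])
        if customer is None:
            continue
        merged.append({**order, **customer})
    return merged


merged_rows = integrate(customers, orders)
print(merged_rows)
```

Whether unmatched rows should be dropped, as here, or kept with missing fields is a design choice that depends on the application.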
Data Normalization is the process of expressing data in the same measurement terms, such as units, scale, or range.
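Two common ways to bring values onto a shared scale are min-max scaling and z-score standardization. The sketch below shows both on made-up height values; the function names are chosen for this example.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
print(min_max_scale(heights_cm))
print(z_score(heights_cm))
```

Min-max scaling is sensitive to extreme values, so z-scores are often preferred when the data contain outliers.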
Missing Data Imputation
The collected data may contain missing values. An imputation method fills in the variables that contain missing values with reasonable estimates.
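A minimal sketch of imputation, assuming missing entries are represented as `None` and that the mean of the observed values is an acceptable fill value. Mean imputation is only one of several possible strategies.

```python
import statistics


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


ages = [20.0, None, 30.0, 40.0, None]   # hypothetical column with gaps
print(impute_mean(ages))
```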
Noise Identification aims to detect random errors or variance in a measured variable.
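One way to flag suspicious measurements is a robust score based on the median and the median absolute deviation (MAD). The sketch below uses the commonly quoted constant 0.6745 and cutoff 3.5 as rule-of-thumb assumptions; the sensor readings are made up.

```python
import statistics


def flag_noisy(values, cutoff=3.5):
    """Return indices of values whose modified z-score exceeds the cutoff."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:                      # no spread to compare against
        return []
    flagged = []
    for index, v in enumerate(values):
        score = 0.6745 * abs(v - median) / mad
        if score > cutoff:
            flagged.append(index)
    return flagged


readings = [10.1, 9.9, 10.0, 10.2, 9.8, 25.0, 10.0]   # hypothetical sensor data
print(flag_noisy(readings))
```

A flagged value is only a candidate for noise; whether to correct, remove, or keep it still requires judgment about the variable being measured.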
The following are the forms of Data Reduction:
Feature Selection reduces the dataset by removing irrelevant or redundant features (or dimensions).
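As a very simple illustration of removing uninformative features, the sketch below drops columns whose variance does not exceed a threshold. The dataset and the `select_by_variance` helper are hypothetical, and a variance filter is only one basic technique among many.

```python
import statistics

# Hypothetical dataset stored column by column.
dataset = {
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "country_code": [1.0, 1.0, 1.0, 1.0],   # constant, so it carries no information
    "weight_kg": [50.0, 62.0, 70.0, 81.0],
}


def select_by_variance(columns, threshold=0.0):
    """Keep only the columns whose population variance exceeds the threshold."""
    return {
        name: values
        for name, values in columns.items()
        if statistics.pvariance(values) > threshold
    }


selected = select_by_variance(dataset)
print(list(selected))
```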
Instance Selection consists of choosing a subset of the total available data that achieves the original purpose of the application as if the whole dataset had been used.
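The simplest way to pick a subset of the data is sampling. The sketch below uses stratified random sampling, which keeps the class proportions of the full dataset; it stands in for more elaborate selection methods, and the data, labels, and fixed seed are assumptions for this example.

```python
import random
from collections import defaultdict


def stratified_sample(instances, label_of, fraction, seed=0):
    """Sample the same fraction of instances from every class."""
    rng = random.Random(seed)           # fixed seed for reproducibility
    by_label = defaultdict(list)
    for inst in instances:
        by_label[label_of(inst)].append(inst)
    sample = []
    for group in by_label.values():
        k = max(1, round(len(group) * fraction))   # keep at least one per class
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical dataset: 10 "spam" and 30 "ham" instances.
instances = [(i, "spam" if i % 4 == 0 else "ham") for i in range(40)]
subset = stratified_sample(instances, label_of=lambda inst: inst[1], fraction=0.2)
print(len(subset))
```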
Discretization transforms quantitative data into qualitative data, that is, numerical attributes into nominal attributes with a finite number of intervals.
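Turning a numerical attribute into a nominal one can be sketched with equal-width binning, where the value range is split into intervals of the same size. The ages and the interval labels below are made up for illustration.

```python
def discretize_equal_width(values, labels):
    """Map each number to the label of its equal-width interval."""
    n_bins = len(labels)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    result = []
    for v in values:
        if width == 0:
            index = 0                   # all values identical: single interval
        else:
            # the maximum value would land in bin n_bins, so clamp it
            index = min(int((v - lo) / width), n_bins - 1)
        result.append(labels[index])
    return result


ages = [18, 25, 33, 47, 52, 78]         # hypothetical numerical attribute
print(discretize_equal_width(ages, ["young", "middle", "senior"]))
```

Equal-width bins are easy to compute but can end up very unevenly populated; equal-frequency binning is a common alternative.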
Feature Extraction/Instance Generation extends both feature selection and instance selection by allowing modification of the internal values that represent each example or attribute.
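A hand-crafted example of deriving a new feature from existing ones is combining height and weight into a body mass index. The records below are hypothetical, and this sketch covers only the feature side; it is not a general extraction method such as principal component analysis.

```python
# Hypothetical records with two raw attributes.
people = [
    {"height_cm": 175.0, "weight_kg": 70.0},
    {"height_cm": 160.0, "weight_kg": 64.0},
]


def add_bmi(records):
    """Return new records with a derived 'bmi' feature (kg per square metre)."""
    extended = []
    for rec in records:
        height_m = rec["height_cm"] / 100.0
        bmi = rec["weight_kg"] / (height_m ** 2)
        extended.append({**rec, "bmi": bmi})
    return extended


with_bmi = add_bmi(people)
print([round(rec["bmi"], 1) for rec in with_bmi])
```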
This article introduced Data Preprocessing, covering Feature Engineering and Feature Selection in Data Mining.