In this lesson, you will discover the importance of data preparation in predictive modeling with machine learning.
Predictive modeling projects involve learning from data.
Data refers to examples or cases from the domain that characterize the problem you want to solve.
On a predictive modeling project, such as classification or regression, raw data typically cannot be used directly.
There are four main reasons why this is the case:
Data Types: Machine learning algorithms require data to be numbers.
Data Requirements: Some machine learning algorithms impose requirements on the data.
Data Errors: Statistical noise and errors in the data may need to be corrected.
Data Complexity: Complex nonlinear relationships may need to be teased out of the data.
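To make the first reason concrete, here is a minimal sketch of turning a column of string labels into numbers with a one-hot encoding. The function name one_hot_encode and the sample colors column are hypothetical, chosen only for illustration; in practice a library routine would normally do this job.

```python
def one_hot_encode(values):
    """Map each categorical label to a binary indicator vector."""
    # Sort the distinct labels so the column order is deterministic.
    categories = sorted(set(values))
    index = {label: i for i, label in enumerate(categories)}
    encoded = []
    for label in values:
        row = [0] * len(categories)
        row[index[label]] = 1  # mark the column that matches this label
        encoded.append(row)
    return categories, encoded


# Hypothetical raw column of string labels (not from a real dataset).
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)
for row in encoded:
    print(row)
```

Each row now contains only zeros and ones, so it can be passed to an algorithm that expects numeric input.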
The raw data must be pre-processed prior to being used to fit and evaluate a machine learning model. This step in a predictive modeling project is referred to as “data preparation.”
There are common or standard tasks that you may use or explore during the data preparation step in a machine learning project.
These tasks include:
Data Cleaning: Identifying and correcting mistakes or errors in the data.
Feature Selection: Identifying those input variables that are most relevant to the task.
Data Transforms: Changing the scale or distribution of variables.
Feature Engineering: Deriving new variables from available data.
Dimensionality Reduction: Creating compact projections of the data.
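As a rough illustration of the first two tasks, the sketch below removes duplicate rows (a simple form of data cleaning) and then drops columns that hold a single value (a crude form of feature selection, since such columns carry no information). The helper names and the sample data are assumptions made up for this example, not a standard API.

```python
def drop_duplicate_rows(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(list(row))
    return unique


def drop_constant_columns(rows):
    """Remove columns that hold a single value across all rows."""
    if not rows:
        return []
    n_cols = len(rows[0])
    # Keep only the column indexes with more than one distinct value.
    keep = [j for j in range(n_cols) if len({row[j] for row in rows}) > 1]
    return [[row[j] for j in keep] for row in rows]


# Hypothetical data: row 3 repeats row 1, and the last column is constant.
data = [
    [1.0, 5.0, 0.0],
    [2.0, 3.0, 0.0],
    [1.0, 5.0, 0.0],
    [4.0, 1.0, 0.0],
]
cleaned = drop_constant_columns(drop_duplicate_rows(data))
print(cleaned)
```

The order matters in general: duplicates are removed first so that they do not influence later decisions about which columns to keep.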
Each of these tasks is a whole field of study with specialized algorithms.
For this lesson, you must list three data preparation algorithms that you know of or may have used before and give a one-line summary of the purpose of each.
One example of a data preparation algorithm is data normalization, which scales numerical variables to the range between zero and one.
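A minimal sketch of normalization, assuming the common min-max form in which each value x becomes (x - min) / (max - min). The function name min_max_normalize and the sample ages are hypothetical, and mapping a constant column to zeros is just one possible convention.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; return zeros by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical numeric column (not from a real dataset).
ages = [18, 30, 42, 66]
scaled = min_max_normalize(ages)
print(scaled)
```

The smallest value maps to zero and the largest to one, with the rest spaced in proportion between them.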
Post your answer in the comments below. I would love to see what you come up with.
In the next lesson, you will discover how to fix data that has missing values, called data imputation.