From: Predicting academic success in higher education: literature review and best practices
| Strategies | Methods | Cases | Advantages | Disadvantages |
|---|---|---|---|---|
| Missing data | Listwise deletion: instance/row deletion | Records contain few missing values | Does not affect the ability of the prediction model if the data set is large | Affects the ability of the prediction model if the data set is small |
| Missing data | Listwise deletion: feature/column deletion | Column contains too many missing values | Does not affect the ability of the prediction model if the data set is large | Affects the ability of the prediction model if the number of attributes is small |
| Missing data | Imputation (replacement): median or mean of the student for numeric values, mode of the student for nominal values | Missing data such as grades or marks | Preserves the data | Can introduce bias in the analysis |
| Missing data | Imputation (replacement): median or mean of the feature for numeric values, mode of the feature for nominal values | Other missing data | Preserves the data | Can introduce bias in the analysis |
| Outlier data | Remove the outliers | Incorrectly entered values, or outliers outside the population of interest | Does not affect the ability of the prediction model if the data set is large | Affects the ability of the prediction model if the data set is small |
| Outlier data | Bin the data | Extreme outliers that remain outliers after transformation | Easier to understand and handle; improves the ability of the prediction model | – |
| Outlier data | Leave the outliers | Outliers are from the population of interest | Preserves the data | Affects the ability of the prediction model |
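
The strategies in the table can be illustrated with a minimal Python sketch using only the standard library. Everything in it is hypothetical: the toy records, the column names (`math`, `physics`, `chemistry`, `track`), the `max_missing` threshold, and the bin edges are invented for illustration. The table does not say how outliers are detected, so the 1.5 × IQR rule used in `remove_outliers` is an assumption, not the paper's method.

```python
from statistics import median, mode, quantiles

GRADES = ["math", "physics", "chemistry"]  # hypothetical grade columns

# Hypothetical toy data: one dict per student, None marks a missing value.
records = [
    {"math": 14.0, "physics": 12.0, "chemistry": None, "track": "science"},
    {"math": 9.0, "physics": 11.0, "chemistry": 10.0, "track": None},
    {"math": 16.0, "physics": 15.0, "chemistry": 17.0, "track": "science"},
    {"math": 11.0, "physics": 13.0, "chemistry": 12.0, "track": "arts"},
]

def drop_rows(rows):
    """Instance/row deletion: keep only fully observed records."""
    return [r for r in rows if all(v is not None for v in r.values())]

def drop_columns(rows, max_missing=0.5):
    """Feature/column deletion: drop columns whose missing share exceeds max_missing."""
    keep = [c for c in rows[0]
            if sum(r[c] is None for r in rows) / len(rows) <= max_missing]
    return [{c: r[c] for c in keep} for r in rows]

def impute_by_student(row, fields=GRADES):
    """Fill a student's missing grades with the median of that student's known grades."""
    known = [row[f] for f in fields if row[f] is not None]
    return {**row, **{f: median(known) for f in fields if row[f] is None and known}}

def impute_by_feature(rows, field, numeric=True):
    """Fill missing values of one feature with its median (numeric) or mode (nominal)."""
    known = [r[field] for r in rows if r[field] is not None]
    fill = median(known) if numeric else mode(known)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]

def remove_outliers(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed detection rule)."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if low <= v <= high]

def bin_value(value, edges=(10.0, 14.0), labels=("low", "medium", "high")):
    """Binning: map a numeric value to a coarse ordered category."""
    return labels[sum(value >= e for e in edges)]
```

Leaving the outliers in place needs no code. Which of these functions to apply depends on the cases listed in the table: deletion is safest when the data set is large, while imputation and binning keep every record at the cost of possible bias or lost detail.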