## Advantages and Disadvantages of Regression Model – Data Mining – Machine Learning

In this tutorial, we will understand the Advantages and Disadvantages of the Regression Model.

Advantages:

1. Regression models are easy to understand because they are built on basic statistical principles, such as correlation and least-squares error.

2. The output of a regression model is an algebraic equation that is easy to understand and to use for prediction.

3. The strength (goodness of fit) of a regression model is measured by correlation coefficients and related statistical parameters, such as the coefficient of determination (R²), that are well understood.

4. The predictive power of regression models is comparable to that of other predictive models, and they sometimes outperform competing models.

5. Regression models can accommodate as many input variables as one wants to include.

6. Regression modeling tools are pervasive. Almost all data mining and statistical packages include regression tools, and MS Excel spreadsheets also provide simple regression modeling capabilities.
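
To make the "algebraic equation" and "goodness of fit" points concrete, here is a minimal sketch of a one-variable least-squares fit using only the Python standard library. The data values and the function names `fit_line` and `r_squared` are made up for illustration; a real project would more likely use a statistics package.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data that lies close to a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
r2 = r_squared(xs, ys, intercept, slope)

# The fitted model is just an equation that can be used to predict new values.
print(f"y = {intercept:.2f} + {slope:.2f} * x   (R^2 = {r2:.4f})")
print("prediction for x = 6:", round(intercept + slope * 6, 2))
```

The whole model is two numbers, an intercept and a slope, which is why the result is easy to read and reuse. An R² close to 1 indicates that the line explains most of the variation in `ys`.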

Disadvantages:

1. Regression models do not work properly if the input data has errors (that is, if the data is of poor quality). If data preprocessing does not adequately handle missing values, redundant data, outliers, or an imbalanced data distribution, the validity of the regression model suffers.

2. Regression models are susceptible to collinearity problems (that is, a strong linear correlation among the independent variables). If the independent variables are strongly correlated, they eat into each other’s predictive power and the regression coefficients lose their robustness.

3. As the number of variables increases, the reliability of regression models decreases. Regression models work better with a small number of variables.

4. Regression models do not automatically account for nonlinearity. The user must work out what additional terms (for example, squared or interaction terms) might need to be added to the model to improve its fit.

5. Regression models work with numeric data, not directly with categorical variables. Categorical variables can still be handled, though, by creating multiple new variables that each take a yes/no value.
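
Three of the limitations above (collinearity, nonlinearity, and categorical inputs) can be illustrated with a short standard-library sketch. All data values and helper names (`pearson_r`, `vif_two_predictors`, `one_hot`) are hypothetical. The variance inflation factor shown here uses the two-predictor shortcut 1 / (1 - r²), and dropping the first category level is an assumed convention that avoids a dummy column that is perfectly collinear with the intercept.

```python
from statistics import mean


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def vif_two_predictors(a, b):
    """Variance inflation factor for a model with exactly two predictors."""
    r = pearson_r(a, b)
    return 1 / (1 - r ** 2)


def one_hot(values, drop_first=True):
    """Turn one categorical column into yes/no (1/0) columns, one per level."""
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]  # the dropped level becomes the baseline
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Collinearity: the same height recorded in two units is almost perfectly
# correlated, so the VIF is very large (a common rule of thumb flags VIF > 10).
height_cm = [150, 160, 170, 180, 190]
height_in = [59.1, 63.0, 66.9, 70.9, 74.8]
vif = vif_two_predictors(height_cm, height_in)
print("VIF:", round(vif))

# Nonlinearity: y = x^2 shows no linear relationship with x on symmetric data,
# but a perfect one with the added squared term.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
r_linear = pearson_r(xs, ys)
r_squared_term = pearson_r([x ** 2 for x in xs], ys)
print("r with x:", r_linear, "| r with x^2:", r_squared_term)

# Categorical input: "blue" is the baseline level; the others become 1/0 columns.
colors = ["red", "green", "blue", "green"]
dummies = one_hot(colors)
print(dummies)
```

For a single predictor, R² equals the squared correlation, so the jump in `r` from the raw `x` to the squared term corresponds directly to an improved fit.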

Summary: