With this exercise, you can learn more about regression. You can try out different variants of linear regression on a data set and compare their performance using different metrics. You should also try to gain insights into the models through their coefficients.
Libraries and Data
Your task in this exercise is to try out different regression models, evaluate their goodness of fit, and interpret their coefficients. You can find everything you need in sklearn; statsmodels is another popular library for this kind of analysis.
We use data about house prices in California in this exercise.
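One possible way to obtain training and test data is sketched below. It assumes that the data set meant here is the one returned by fetch_california_housing in sklearn. The helper name load_and_split, the 80/20 split, and the fixed random seed are illustrative choices, not part of the exercise.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split


def load_and_split(test_size=0.2, random_state=42):
    """Load the California housing data and split it into training and test sets.

    Hypothetical helper: the split ratio and the seed are assumptions.
    """
    # Downloads the data on the first call and caches it locally afterwards.
    housing = fetch_california_housing()
    X_train, X_test, y_train, y_test = train_test_split(
        housing.data,
        housing.target,
        test_size=test_size,
        random_state=random_state,
    )
    # The feature names are useful later when interpreting the coefficients.
    return X_train, X_test, y_train, y_test, housing.feature_names
```

Calling X_train, X_test, y_train, y_test, feature_names = load_and_split() would then provide everything needed for the next part. Holding back a test set matters because the goodness of fit should be measured on data the model has not seen during training.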
Train, Test, Evaluate
Now that training and test data are available, you can try out the different variants of linear regression. What happens when you use OLS/Ridge/Lasso/Elastic Net? How does the goodness of fit measured with $R^2$ on the test data change? Additionally, perform a visual evaluation of the results. How do the coefficients change?
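The comparison asked for above could be set up roughly as follows. This is a minimal sketch, not a reference solution: the helper compare_linear_models is hypothetical, the regularization strength alpha=0.1 and l1_ratio=0.5 are arbitrary starting values, and the synthetic data from make_regression is only a stand-in so that the block runs without a download. Features are standardized first, which is an added assumption, because the Ridge, Lasso, and Elastic Net penalties depend on the scale of the features.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_linear_models(X_train, X_test, y_train, y_test, alpha=0.1):
    """Fit four linear models; return {name: (test R^2, coefficients)}."""
    models = {
        "OLS": LinearRegression(),
        "Ridge": Ridge(alpha=alpha),
        "Lasso": Lasso(alpha=alpha),
        "Elastic Net": ElasticNet(alpha=alpha, l1_ratio=0.5),
    }
    results = {}
    for name, model in models.items():
        # Standardize the features, then fit the regression model.
        pipeline = make_pipeline(StandardScaler(), model)
        pipeline.fit(X_train, y_train)
        r2 = r2_score(y_test, pipeline.predict(X_test))
        # The last pipeline step is the fitted regression model.
        results[name] = (r2, pipeline[-1].coef_)
    return results


# Synthetic stand-in data (assumption); replace with the housing data split.
X, y = make_regression(
    n_samples=500, n_features=8, n_informative=4, noise=1.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

results = compare_linear_models(X_train, X_test, y_train, y_test)
for name, (r2, coef) in results.items():
    print(f"{name:12s} R^2 = {r2:.3f}  coefficients = {np.round(coef, 2)}")
```

Repeating the call with several values of alpha shows how the penalties act: Ridge shrinks all coefficients towards zero, while Lasso and Elastic Net can set some of them exactly to zero.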
Bonus Task (will not be discussed during the exercise)
Regression does not have to be linear. There is also non-linear regression, and even decision trees can be used for regression. In recent years, random forests (and, of course, neural networks, which we politely ignore here) have been remarkably successful for all kinds of regression tasks. Try out random forest regression on this data set and, ideally, also find out how it works.
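A starting point for the bonus task might look like the sketch below. It uses RandomForestRegressor from sklearn with 100 trees and a fixed seed, both of which are arbitrary choices, and again relies on synthetic stand-in data from make_regression as an assumption so that it runs without a download.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption); replace with the housing data split.
X, y = make_regression(
    n_samples=500, n_features=8, n_informative=4, noise=1.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# A random forest averages the predictions of many decision trees, each
# trained on a bootstrap sample of the training data.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

forest_r2 = r2_score(y_test, forest.predict(X_test))
print(f"Random forest R^2 on the test data: {forest_r2:.3f}")

# A forest has no coefficients; feature importances are the closest analogue.
print("Feature importances:", forest.feature_importances_.round(3))
```

Unlike the coefficients of a linear model, feature importances are non-negative and sum to one, so they indicate how much a feature contributes but not the direction of its effect.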