Regression

With this exercise, you can learn more about regression. You can try out different variants of linear regression on a data set and compare their performance using different metrics. You should also try to gain insights into the models through their coefficients.

Libraries and Data

Your task in this exercise is to try out different regression models, evaluate their goodness of fit, and interpret their coefficients. You can find everything you need in sklearn; statsmodels is another popular library for this kind of analysis.

We use data about house prices in California in this exercise.

Generating training and test data

Before you can start building regression models, you need to separate the data into training and test data. Please use 50% of the data for training, and 50% of the data for testing.
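
A minimal sketch of the 50/50 split, assuming sklearn's `train_test_split` is used. The helper name `split_half`, the seed, and the synthetic stand-in data are illustrative assumptions and not part of the exercise; the stand-in data only keeps the sketch runnable without a download.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_half(X, y, seed=42):
    """Split features and target into 50% training and 50% test data.

    The name `split_half` and the default seed are illustrative choices.
    """
    return train_test_split(X, y, test_size=0.5, random_state=seed)


# Synthetic stand-in data (assumption): 100 samples with 8 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 8))
y_demo = X_demo @ rng.normal(size=8) + rng.normal(scale=0.1, size=100)

X_train, X_test, y_train, y_test = split_half(X_demo, y_demo)
print(X_train.shape, X_test.shape)
```

For the exercise itself, one option is to obtain `X` and `y` from `sklearn.datasets.fetch_california_housing(return_X_y=True)`, assuming that is the intended California housing data, and pass them to the same helper.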

Train, Test, Evaluate

Now that training and test data are available, you can try out the different variants of linear regression. What happens when you use OLS/Ridge/Lasso/Elastic Net? How does the goodness of fit measured with $R^2$ on the test data change? Additionally, perform a visual evaluation of the results. How do the coefficients change?
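
One possible way to organize this comparison is sketched below. The helper `compare_linear_models`, the regularization strength `alpha=1.0`, the `l1_ratio`, and the synthetic data are all assumptions made for illustration; sensible values for the housing data may differ.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def compare_linear_models(X_train, X_test, y_train, y_test, alpha=1.0):
    """Fit four linear models and return {name: (test R^2, coefficients)}.

    Hypothetical helper; alpha and l1_ratio are illustrative defaults.
    """
    models = {
        "OLS": LinearRegression(),
        "Ridge": Ridge(alpha=alpha),
        "Lasso": Lasso(alpha=alpha),
        "Elastic Net": ElasticNet(alpha=alpha, l1_ratio=0.5),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        r2 = r2_score(y_test, model.predict(X_test))
        results[name] = (r2, model.coef_)
    return results


# Synthetic stand-in data (assumption): only the first two features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)

results = compare_linear_models(X_train, X_test, y_train, y_test)
for name, (r2, coef) in results.items():
    print(f"{name:12s} R^2 = {r2:.3f}  coef = {np.round(coef, 2)}")
```

Printing the coefficients side by side makes the effect of regularization visible: Ridge shrinks coefficients toward zero, while Lasso and Elastic Net can set some of them exactly to zero. For the visual evaluation, a scatter plot of predicted against actual test values per model is one common choice.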

Bonus Task (will not be discussed during the exercise)

Regression does not have to be linear. There is also non-linear regression, and even decision trees can be used for regression. In recent years, random forests (and, of course, neural networks, which we politely ignore here) have been remarkably successful for all kinds of regression tasks. Try out random forest regression on this data set and, ideally, also find out how it works.
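
As a starting point, here is a small sketch using sklearn's `RandomForestRegressor`. The synthetic non-linear target, the number of trees, and the seeds are assumptions chosen for illustration; on the housing data the same calls apply to the training and test split created earlier.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): a target that is non-linear in the features.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 2))
y = np.sin(X[:, 0]) * X[:, 1] + rng.normal(scale=0.1, size=400)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)

# Linear baseline for comparison.
linear = LinearRegression().fit(X_train, y_train)
r2_linear = r2_score(y_test, linear.predict(X_test))

# Random forest: an ensemble of decision trees, each fit on a bootstrap sample.
forest = RandomForestRegressor(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
forest_pred = forest.predict(X_test)
r2_forest = r2_score(y_test, forest_pred)

# The forest prediction is the average of the individual tree predictions.
tree_mean = np.mean([tree.predict(X_test) for tree in forest.estimators_], axis=0)

print(f"linear R^2 = {r2_linear:.3f}, forest R^2 = {r2_forest:.3f}")
print("feature importances:", np.round(forest.feature_importances_, 3))
```

Averaging many trees trained on different bootstrap samples reduces the variance of a single deep tree, which is a large part of why the ensemble generalizes better. A forest has no coefficients to inspect, so `feature_importances_` serves as a rough counterpart to the coefficients of the linear models.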