Data Exploration with Descriptive Statistics and Visualizations

This exercise covers data exploration. Your task is to understand a data set from its description, its descriptive statistics, and visualizations.

Data for this exercise

We use the Boston house price data in this exercise. The data has been available as part of sklearn for Python through load_boston; this function was removed in scikit-learn 1.2, so an older version may be needed. The description that is provided together with the actual data should be the starting point for your analysis.

Descriptive statistics of the Boston data

Explore the Boston data using descriptive statistics. Calculate the central tendency with the mean and the median, the variability with the standard deviation and the IQR, and the range of the data. The real task is to learn something about the data from these results. For example, what can you learn about the CRIM feature by comparing its mean and its median?
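The statistics named above could be computed along the following lines. This is only a sketch: the data frame df and the helper describe_features are hypothetical names, and the data is a synthetic stand-in (one right-skewed and one roughly symmetric column), not the Boston data. The same function should work on any numeric pandas data frame.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (NOT the Boston data): one right-skewed and one
# roughly symmetric column, so the effect on mean vs. median is visible.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "skewed": rng.lognormal(mean=0.0, sigma=1.0, size=500),
    "symmetric": rng.normal(loc=10.0, scale=2.0, size=500),
})


def describe_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Return mean, median, standard deviation, IQR, and range per column."""
    q1 = frame.quantile(0.25)
    q3 = frame.quantile(0.75)
    return pd.DataFrame({
        "mean": frame.mean(),
        "median": frame.median(),
        "std": frame.std(),  # sample standard deviation (ddof=1)
        "iqr": q3 - q1,      # interquartile range
        "range": frame.max() - frame.min(),
    })


summary = describe_features(df)
print(summary)
```

In the synthetic example, the mean of the skewed column lies clearly above its median because a few large values pull the mean upwards, while both values nearly coincide for the symmetric column. Comparing the two statistics in this way is one possible starting point for interpreting a feature.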

Visualizations

The Python library matplotlib is great for creating all kinds of visualizations. Libraries built on top of matplotlib, such as seaborn, even allow relatively complex visualizations in a single line of code.

Analyze single features of the Boston data

Visually analyze the features ZN and INDUS of the Boston data. Use the techniques described in Chapter 3, i.e., histograms and density plots (with and without rugs). What can you learn about these features from these plots? What are the advantages and drawbacks of the different plots for this data?

Analyze the pair-wise relationships between the features of the Boston data

Next, analyze the pair-wise relationships between all fourteen features of the Boston data. First, analyze their relationships through scatter plots. Then, create a heatmap of the correlations between the features. What did you learn about the data?