Association Rule Mining

In this exercise, you will gain a deeper understanding of association rule mining and learn how to identify good rules for a sample data set.

Libraries and Data

The first part of the exercise covers association rule mining. In Python, you can use the mlxtend library to mine association rules.

This exercise uses shopping-basket data from a store. You can load it with the following code, which creates a list of records in which each record is the list of items in one transaction.

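import csv

# Each line of the file is one transaction; its items are separated by commas.
with open('store_data.csv', newline='') as f:
    records = []
    for row in csv.reader(f):
        # Strip whitespace and drop empty fields (e.g. from trailing commas).
        records.append([item.strip() for item in row if item.strip()])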

Finding frequent itemsets

Once you have the transactional records, use the Apriori algorithm to find frequent itemsets in this data. Choose the minimum support threshold deliberately, so that you can state a clear reason for the value you picked.

Mining rules from the frequent itemsets

From the frequent itemsets, derive association rules and determine which of them are good rules for this data. Use lift and confidence as the metrics for your evaluation.

Validation of the rules

Randomly split your records into two sets, each containing roughly 50% of the data. Then use the Apriori algorithm to determine rules on each set separately. Do you find similar rules on both sets? What do the similarities or differences indicate?
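A minimal sketch of the validation step in plain Python: `split_records` shuffles the transactions with a fixed seed and cuts them in half, `rule_set` reduces a rules table to a set of (antecedents, consequents) pairs, and `jaccard` measures the overlap between two such sets. All three names are hypothetical; `rule_set` assumes the table has `antecedents` and `consequents` columns, as the output of mlxtend's `association_rules` does.

```python
import random


def split_records(records, seed=42):
    """Randomly split transactions into two halves of (almost) equal size."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def rule_set(rules):
    """Reduce a rules table to a set of (antecedents, consequents) pairs."""
    return set(zip(rules["antecedents"], rules["consequents"]))


def jaccard(a, b):
    """Share of rules found on both halves among rules found on either half."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

After mining rules on each half with the same thresholds, `jaccard(rule_set(rules_a), rule_set(rules_b))` summarizes their agreement. A value near 1 suggests the rules reflect stable patterns in the data; a low value suggests that many rules depend on the particular sample, for example because the minimum support is low enough for rare itemsets to pass by chance. Comparing the lift and confidence of the shared rules across both halves gives a further check.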