Association Rule Mining

This is a sample solution for the association rule mining exercise. It is not the only way to solve the exercise: as with any programming task, and with most data analysis tasks, there are multiple solutions to the same problem.

Libraries and Data

The first part of the exercise is about association rule mining. In Python, you can use the mlxtend library for the mining of association rules.

We use data about store baskets in this exercise. You can use the following code to load the data. The code creates a list of records, where each record is a list of the items that are part of the transaction.

with open('data/store_data.csv') as f:
    records = []
    for line in f:
        records.append(line.strip().split(','))

Finding frequent itemsets

Once you have the transactional records, use the apriori algorithm to find frequent itemsets with a suitable threshold for support for this data. Try to find a suitable threshold for the minimal support such that you can state a clear reason why you picked this threshold.

import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.preprocessing import TransactionEncoder

# we first need to create a one-hot encoding of our transactions
te = TransactionEncoder()
te_ary = te.fit_transform(records)
data_df = pd.DataFrame(te_ary, columns=te.columns_)

# use a support of 0.005: such a low threshold may include too many candidates,
# so the rules must be selected carefully based on other metrics;
# we compensate for this with a higher confidence threshold when mining rules
frequent_itemsets = apriori(data_df, min_support=0.005, use_colnames=True)
frequent_itemsets
support itemsets
0 0.020397 frozenset({almonds})
1 0.008932 frozenset({antioxydant juice})
2 0.033329 frozenset({avocado})
3 0.008666 frozenset({bacon})
4 0.010799 frozenset({barbecue sauce})
... ... ...
720 0.007466 frozenset({spaghetti, soup, mineral water})
721 0.009332 frozenset({spaghetti, tomatoes, mineral water})
722 0.006399 frozenset({spaghetti, mineral water, turkey})
723 0.006266 frozenset({spaghetti, mineral water, whole whe...
724 0.005066 frozenset({pancakes, olive oil, spaghetti})

725 rows × 2 columns
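
One way to argue for a support threshold is to translate it into an absolute number of baskets and to check how many itemsets survive at different thresholds. The following is a minimal sketch of that idea in plain Python. The toy baskets and the helper names (`support`, `brute_force_itemsets`, `min_count`) are assumptions made for illustration and are not part of the exercise data or of mlxtend. The enumeration is brute force, not the Apriori algorithm.

```python
from itertools import combinations
from math import ceil

# Hypothetical toy baskets, NOT taken from store_data.csv.
transactions = [
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread", "eggs", "milk"],
    ["bread"],
    ["milk", "bread", "butter"],
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def brute_force_itemsets(transactions, min_support, max_len=2):
    """Enumerate all itemsets with up to `max_len` items and keep the frequent ones.

    This is a brute-force illustration without candidate pruning,
    so it is not the Apriori algorithm itself.
    """
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, max_len + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


def min_count(n_transactions, min_support):
    """Approximate number of baskets an itemset needs to reach the threshold."""
    return ceil(n_transactions * min_support)


# Show how quickly the number of frequent itemsets shrinks with the threshold.
for threshold in (0.2, 0.4, 0.6):
    found = brute_force_itemsets(transactions, threshold)
    print(f"min_support={threshold}: {len(found)} frequent itemsets")
```

Applied to the real data, `min_count(len(records), 0.005)` would give the number of baskets that an itemset must appear in, which is often an easier number to justify than a relative threshold.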

Mining rules from the frequent itemsets

Determine good rules from the results for this data. Use lift and confidence as metrics for your evaluations.

from mlxtend.frequent_patterns import association_rules

# use a high confidence to counterbalance the low support threshold
# order by lift to have the "best" rules at the top
association_rules(frequent_itemsets, metric="confidence",
                  min_threshold=0.5).sort_values('lift', ascending=False)
antecedents consequents antecedent support consequent support support confidence lift representativity leverage conviction zhangs_metric jaccard certainty kulczynski
13 frozenset({shrimp, ground beef}) frozenset({spaghetti}) 0.011465 0.174110 0.005999 0.523256 3.005315 1.0 0.004003 1.732354 0.674995 0.033408 0.422751 0.278856
6 frozenset({frozen vegetables, ground beef}) frozenset({spaghetti}) 0.016931 0.174110 0.008666 0.511811 2.939582 1.0 0.005718 1.691742 0.671179 0.047515 0.408893 0.280791
9 frozenset({olive oil, frozen vegetables}) frozenset({spaghetti}) 0.011332 0.174110 0.005733 0.505882 2.905531 1.0 0.003760 1.671444 0.663346 0.031899 0.401715 0.269404
8 frozenset({soup, frozen vegetables}) frozenset({mineral water}) 0.007999 0.238368 0.005066 0.633333 2.656954 1.0 0.003159 2.077178 0.628658 0.020994 0.518578 0.327293
17 frozenset({olive oil, soup}) frozenset({mineral water}) 0.008932 0.238368 0.005199 0.582090 2.441976 1.0 0.003070 1.822476 0.595818 0.021476 0.451296 0.301951
7 frozenset({olive oil, frozen vegetables}) frozenset({mineral water}) 0.011332 0.238368 0.006532 0.576471 2.418404 1.0 0.003831 1.798297 0.593226 0.026864 0.443918 0.301938
15 frozenset({milk, soup}) frozenset({mineral water}) 0.015198 0.238368 0.008532 0.561404 2.355194 1.0 0.004909 1.736520 0.584287 0.034820 0.424136 0.298599
2 frozenset({soup, chocolate}) frozenset({mineral water}) 0.010132 0.238368 0.005599 0.552632 2.318395 1.0 0.003184 1.702471 0.574488 0.023052 0.412618 0.288061
3 frozenset({eggs, cooking oil}) frozenset({mineral water}) 0.011732 0.238368 0.006399 0.545455 2.288286 1.0 0.003603 1.675590 0.569675 0.026258 0.403195 0.286150
5 frozenset({frozen vegetables, ground beef}) frozenset({mineral water}) 0.016931 0.238368 0.009199 0.543307 2.279277 1.0 0.005163 1.667711 0.570931 0.037378 0.400376 0.290949
16 frozenset({milk, turkey}) frozenset({mineral water}) 0.011332 0.238368 0.006133 0.541176 2.270338 1.0 0.003431 1.659967 0.565950 0.025178 0.397578 0.283452
19 frozenset({spaghetti, soup}) frozenset({mineral water}) 0.014265 0.238368 0.007466 0.523364 2.195614 1.0 0.004065 1.597933 0.552427 0.030451 0.374192 0.277342
12 frozenset({soup, ground beef}) frozenset({mineral water}) 0.009732 0.238368 0.005066 0.520548 2.183798 1.0 0.002746 1.588546 0.547410 0.020845 0.370494 0.270900
0 frozenset({chocolate, chicken}) frozenset({mineral water}) 0.014665 0.238368 0.007599 0.518182 2.173871 1.0 0.004103 1.580745 0.548028 0.030961 0.367387 0.275031
11 frozenset({pancakes, ground beef}) frozenset({mineral water}) 0.014531 0.238368 0.007466 0.513761 2.155327 1.0 0.004002 1.566375 0.543937 0.030418 0.361583 0.272541
4 frozenset({eggs, ground beef}) frozenset({mineral water}) 0.019997 0.238368 0.010132 0.506667 2.125563 1.0 0.005365 1.543848 0.540342 0.040816 0.352268 0.274586
18 frozenset({spaghetti, salmon}) frozenset({mineral water}) 0.013465 0.238368 0.006799 0.504950 2.118363 1.0 0.003589 1.538496 0.535143 0.027748 0.350015 0.266737
1 frozenset({olive oil, chocolate}) frozenset({mineral water}) 0.016398 0.238368 0.008266 0.504065 2.114649 1.0 0.004357 1.535749 0.535896 0.033532 0.348852 0.269370
10 frozenset({milk, ground beef}) frozenset({mineral water}) 0.021997 0.238368 0.011065 0.503030 2.110308 1.0 0.005822 1.532552 0.537969 0.044385 0.347493 0.274725
14 frozenset({milk, olive oil}) frozenset({mineral water}) 0.017064 0.238368 0.008532 0.500000 2.097595 1.0 0.004465 1.523264 0.532348 0.034557 0.343515 0.267897

The rules make sense, but are only helpful in a limited way, because only two different items appear as consequents: spaghetti and mineral water. The first three rules indicate that if a person buys products that are often used to make pasta sauce (e.g., ground beef, frozen vegetables, olive oil), they also often buy spaghetti. While this makes sense, these rules may also appear simply because people buy spaghetti a lot. However, the lift of about 3 indicates that the antecedent and the consequent occur together about three times as often as would be expected if they were independent. The shrimp in the first rule does not really match the pasta sauce pattern.
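
To make the metrics in the table more concrete, the following sketch recomputes confidence, lift, leverage, and conviction for the first rule from its three support values. It uses the standard textbook definitions of these metrics. The function names are chosen for illustration, and the inputs are the rounded values from the table, so the results only match the table up to rounding.

```python
# Values copied from the first row of the rule table above:
# {shrimp, ground beef} -> {spaghetti}
antecedent_support = 0.011465
consequent_support = 0.174110
rule_support = 0.005999


def confidence(rule_support, antecedent_support):
    """Share of baskets with the antecedent that also contain the consequent."""
    return rule_support / antecedent_support


def lift(rule_support, antecedent_support, consequent_support):
    """Observed co-occurrence relative to what independence would predict."""
    return rule_support / (antecedent_support * consequent_support)


def leverage(rule_support, antecedent_support, consequent_support):
    """Difference between observed and expected co-occurrence."""
    return rule_support - antecedent_support * consequent_support


def conviction(rule_support, antecedent_support, consequent_support):
    """Ratio of the expected to the observed rate of wrong predictions."""
    conf = confidence(rule_support, antecedent_support)
    if conf == 1.0:
        return float("inf")  # the rule never fails
    return (1 - consequent_support) / (1 - conf)


print("confidence:", round(confidence(rule_support, antecedent_support), 3))
print("lift:      ", round(lift(rule_support, antecedent_support, consequent_support), 3))
print("leverage:  ", round(leverage(rule_support, antecedent_support, consequent_support), 4))
print("conviction:", round(conviction(rule_support, antecedent_support, consequent_support), 3))
```

The lift of about 3 comes from a confidence of about 0.52 divided by a consequent support of about 0.17.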

The other rules are likely effective, but may be a random artifact: the problem is that mineral water is part of many transactions:

# the mean of a one-hot encoded column is the fraction of transactions that contain the item
data_df['mineral water'].mean()
np.float64(0.23836821757099053)

Thus, any rule that predicts mineral water looks good almost regardless of the antecedent, simply because mineral water is part of about 24% of all transactions.
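
The effect of a frequent consequent can also be seen in the numbers. Because lift is confidence divided by the support of the consequent, the confidence filter of 0.5 already implies a minimum lift for each consequent. The sketch below computes this floor from the consequent supports in the rule table. The helper name `lift_floor` is an assumption for illustration.

```python
# Confidence threshold used when mining the rules above.
min_confidence = 0.5

# Consequent supports copied from the rule table above.
consequent_supports = {
    "mineral water": 0.238368,
    "spaghetti": 0.174110,
}


def lift_floor(min_confidence, consequent_support):
    """Smallest lift a rule can have if its confidence is at least `min_confidence`.

    Follows directly from lift = confidence / consequent support.
    """
    return min_confidence / consequent_support


for item, item_support in consequent_supports.items():
    print(f"{item}: lift >= {lift_floor(min_confidence, item_support):.3f}")
```

Under this filter, every mineral water rule has a lift of roughly 2.1 or more, and every spaghetti rule a lift of roughly 2.9 or more. Sorting by lift therefore places the spaghetti rules at the top partly because spaghetti is the less frequent of the two consequents.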

Validation of the rules

Randomly split your records into two sets with roughly 50% of data each. Now use the Apriori algorithm to determine rules on both of these sets. Do you find similar rules on both sets? What does the similarity/the differences indicate?

from sklearn.model_selection import train_test_split

# randomly split the data into two sets with 50% of the transactions each
X_train, X_test = train_test_split(data_df, test_size=0.5, random_state=42)

# create the frequent itemsets separately for each set
frequent_itemsets_train = apriori(X_train, min_support=0.005, use_colnames=True)
frequent_itemsets_test = apriori(X_test, min_support=0.005, use_colnames=True)
# rules for first set
association_rules(frequent_itemsets_train, metric="confidence",
                  min_threshold=0.5).sort_values('lift', ascending=False)
antecedents consequents antecedent support consequent support support confidence lift representativity leverage conviction zhangs_metric jaccard certainty kulczynski
27 frozenset({spaghetti, whole wheat pasta}) frozenset({milk}) 0.010133 0.135733 0.005067 0.500000 3.683694 1.0 0.003691 1.728533 0.735991 0.035985 0.421475 0.268664
5 frozenset({eggs, grated cheese}) frozenset({spaghetti}) 0.009600 0.181333 0.005867 0.611111 3.370098 1.0 0.004126 2.105143 0.710090 0.031700 0.524973 0.321732
15 frozenset({soup, frozen vegetables}) frozenset({spaghetti}) 0.008533 0.181333 0.005067 0.593750 3.274357 1.0 0.003519 2.015179 0.700575 0.027417 0.503766 0.310846
10 frozenset({frozen vegetables, ground beef}) frozenset({spaghetti}) 0.016800 0.181333 0.009600 0.571429 3.151261 1.0 0.006554 1.910222 0.694331 0.050919 0.476501 0.312185
21 frozenset({shrimp, ground beef}) frozenset({spaghetti}) 0.012267 0.181333 0.006933 0.565217 3.117008 1.0 0.004709 1.882933 0.687614 0.037143 0.468914 0.301726
34 frozenset({milk, frozen vegetables, mineral wa... frozenset({spaghetti}) 0.011200 0.181333 0.006133 0.547619 3.019958 1.0 0.004102 1.809684 0.676446 0.032904 0.447417 0.290721
14 frozenset({olive oil, frozen vegetables}) frozenset({spaghetti}) 0.011200 0.181333 0.005867 0.523810 2.888655 1.0 0.003836 1.719200 0.661224 0.031429 0.418334 0.278081
0 frozenset({cake, chocolate}) frozenset({spaghetti}) 0.013333 0.181333 0.006933 0.520000 2.867647 1.0 0.004516 1.705556 0.660083 0.036932 0.413681 0.279118
6 frozenset({herb & pepper, eggs}) frozenset({spaghetti}) 0.012800 0.181333 0.006400 0.500000 2.757353 1.0 0.004079 1.637333 0.645597 0.034091 0.389251 0.267647
32 frozenset({spaghetti, eggs, milk}) frozenset({mineral water}) 0.009067 0.249600 0.006133 0.676471 2.710219 1.0 0.003870 2.319418 0.636800 0.024287 0.568857 0.350522
12 frozenset({soup, frozen vegetables}) frozenset({mineral water}) 0.008533 0.249600 0.005600 0.656250 2.629207 1.0 0.003470 2.182982 0.624990 0.022175 0.541911 0.339343
2 frozenset({pancakes, cooking oil}) frozenset({mineral water}) 0.008267 0.249600 0.005333 0.645161 2.584781 1.0 0.003270 2.114764 0.618231 0.021119 0.527134 0.333264
23 frozenset({pancakes, low fat yogurt}) frozenset({mineral water}) 0.010133 0.249600 0.006133 0.605263 2.424933 1.0 0.003604 1.901013 0.593633 0.024185 0.473965 0.314918
33 frozenset({spaghetti, milk, frozen vegetables}) frozenset({mineral water}) 0.010133 0.249600 0.006133 0.605263 2.424933 1.0 0.003604 1.901013 0.593633 0.024185 0.473965 0.314918
16 frozenset({grated cheese, ground beef}) frozenset({mineral water}) 0.011467 0.249600 0.006667 0.581395 2.329308 1.0 0.003805 1.792622 0.577308 0.026205 0.442158 0.304052
11 frozenset({olive oil, frozen vegetables}) frozenset({mineral water}) 0.011200 0.249600 0.006400 0.571429 2.289377 1.0 0.003604 1.750933 0.569579 0.025157 0.428876 0.298535
19 frozenset({pancakes, ground beef}) frozenset({mineral water}) 0.014667 0.249600 0.008267 0.563636 2.258159 1.0 0.004606 1.719667 0.565455 0.032292 0.418492 0.298378
3 frozenset({spaghetti, cooking oil}) frozenset({mineral water}) 0.016267 0.249600 0.009067 0.557377 2.233081 1.0 0.005007 1.695348 0.561319 0.035306 0.410151 0.296851
9 frozenset({frozen vegetables, ground beef}) frozenset({mineral water}) 0.016800 0.249600 0.009333 0.555556 2.225783 1.0 0.005140 1.688400 0.560130 0.036307 0.407723 0.296474
28 frozenset({olive oil, soup}) frozenset({mineral water}) 0.009333 0.249600 0.005067 0.542857 2.174908 1.0 0.002737 1.641500 0.545300 0.019958 0.390801 0.281578
26 frozenset({milk, soup}) frozenset({mineral water}) 0.016267 0.249600 0.008800 0.540984 2.167402 1.0 0.004740 1.634800 0.547525 0.034232 0.388304 0.288120
17 frozenset({low fat yogurt, ground beef}) frozenset({mineral water}) 0.009867 0.249600 0.005333 0.540541 2.165627 1.0 0.002871 1.633224 0.543604 0.020986 0.387714 0.280954
18 frozenset({milk, ground beef}) frozenset({mineral water}) 0.020267 0.249600 0.010933 0.539474 2.161353 1.0 0.005875 1.629440 0.548442 0.042225 0.386292 0.291639
1 frozenset({eggs, cooking oil}) frozenset({mineral water}) 0.012533 0.249600 0.006667 0.531915 2.131069 1.0 0.003538 1.603127 0.537489 0.026096 0.376219 0.279312
25 frozenset({milk, pancakes}) frozenset({mineral water}) 0.018667 0.249600 0.009867 0.528571 2.117674 1.0 0.005207 1.591758 0.537823 0.038184 0.371764 0.284051
4 frozenset({eggs, grated cheese}) frozenset({mineral water}) 0.009600 0.249600 0.005067 0.527778 2.114494 1.0 0.002671 1.589082 0.532183 0.019937 0.370706 0.274038
20 frozenset({soup, ground beef}) frozenset({mineral water}) 0.010133 0.249600 0.005333 0.526316 2.108637 1.0 0.002804 1.584178 0.531142 0.020964 0.368758 0.273842
30 frozenset({spaghetti, soup}) frozenset({mineral water}) 0.015467 0.249600 0.008000 0.517241 2.072281 1.0 0.004140 1.554400 0.525569 0.031120 0.356665 0.274646
24 frozenset({milk, olive oil}) frozenset({mineral water}) 0.018667 0.249600 0.009600 0.514286 2.060440 1.0 0.004941 1.544941 0.524457 0.037113 0.352726 0.276374
8 frozenset({red wine, eggs}) frozenset({mineral water}) 0.009867 0.249600 0.005067 0.513514 2.057346 1.0 0.002604 1.542489 0.519058 0.019916 0.351697 0.266906
31 frozenset({spaghetti, chocolate, ground beef}) frozenset({mineral water}) 0.009867 0.249600 0.005067 0.513514 2.057346 1.0 0.002604 1.542489 0.519058 0.019916 0.351697 0.266906
22 frozenset({milk, herb & pepper}) frozenset({mineral water}) 0.010400 0.249600 0.005333 0.512821 2.054569 1.0 0.002737 1.540295 0.518674 0.020942 0.350774 0.267094
13 frozenset({frozen vegetables, whole wheat rice}) frozenset({mineral water}) 0.010933 0.249600 0.005600 0.512195 2.052064 1.0 0.002871 1.538320 0.518353 0.021967 0.349940 0.267316
29 frozenset({spaghetti, salmon}) frozenset({mineral water}) 0.015733 0.249600 0.008000 0.508475 2.037158 1.0 0.004073 1.526676 0.517258 0.031088 0.344982 0.270263
7 frozenset({olive oil, eggs}) frozenset({mineral water}) 0.013333 0.249600 0.006667 0.500000 2.003205 1.0 0.003339 1.500800 0.507568 0.026015 0.333689 0.263355
# rules for second set
association_rules(frequent_itemsets_test, metric="confidence",
                  min_threshold=0.5).sort_values('lift', ascending=False)
antecedents consequents antecedent support consequent support support confidence lift representativity leverage conviction zhangs_metric jaccard certainty kulczynski
20 frozenset({pancakes, olive oil}) frozenset({spaghetti}) 0.010930 0.166889 0.005599 0.512195 3.069080 1.0 0.003774 1.707878 0.681620 0.032508 0.414478 0.272871
16 frozenset({olive oil, tomatoes}) frozenset({mineral water}) 0.007465 0.227139 0.005065 0.678571 2.987466 1.0 0.003370 2.404455 0.670272 0.022067 0.584105 0.350436
4 frozenset({soup, chocolate}) frozenset({mineral water}) 0.008798 0.227139 0.005865 0.666667 2.935055 1.0 0.003867 2.318582 0.665143 0.025492 0.568702 0.346244
15 frozenset({olive oil, soup}) frozenset({mineral water}) 0.008531 0.227139 0.005332 0.625000 2.751614 1.0 0.003394 2.060962 0.642054 0.023148 0.514790 0.324237
13 frozenset({milk, turkey}) frozenset({mineral water}) 0.011464 0.227139 0.006931 0.604651 2.662026 1.0 0.004328 1.954883 0.631587 0.029919 0.488460 0.317584
14 frozenset({olive oil, shrimp}) frozenset({mineral water}) 0.009064 0.227139 0.005332 0.588235 2.589754 1.0 0.003273 1.876947 0.619478 0.023095 0.467220 0.305855
12 frozenset({milk, soup}) frozenset({mineral water}) 0.014130 0.227139 0.008264 0.584906 2.575095 1.0 0.005055 1.861891 0.620431 0.035469 0.462912 0.310645
9 frozenset({olive oil, frozen vegetables}) frozenset({mineral water}) 0.011464 0.227139 0.006665 0.581395 2.559641 1.0 0.004061 1.846278 0.616386 0.028736 0.458370 0.305369
1 frozenset({chocolate, chicken}) frozenset({mineral water}) 0.012797 0.227139 0.007198 0.562500 2.476452 1.0 0.004291 1.766538 0.603925 0.030928 0.433921 0.297095
5 frozenset({eggs, cooking oil}) frozenset({mineral water}) 0.010930 0.227139 0.006132 0.560976 2.469741 1.0 0.003649 1.760405 0.601676 0.026437 0.431949 0.293985
7 frozenset({soup, eggs}) frozenset({mineral water}) 0.009064 0.227139 0.005065 0.558824 2.460267 1.0 0.003006 1.751817 0.598969 0.021915 0.429164 0.290562
2 frozenset({pancakes, chicken}) frozenset({mineral water}) 0.009064 0.227139 0.005065 0.558824 2.460267 1.0 0.003006 1.751817 0.598969 0.021915 0.429164 0.290562
3 frozenset({olive oil, chocolate}) frozenset({mineral water}) 0.014396 0.227139 0.007998 0.555556 2.445879 1.0 0.004728 1.738936 0.599784 0.034247 0.424936 0.295383
8 frozenset({frozen vegetables, ground beef}) frozenset({mineral water}) 0.017062 0.227139 0.009064 0.531250 2.338872 1.0 0.005189 1.648769 0.582380 0.038549 0.393487 0.285578
18 frozenset({spaghetti, soup}) frozenset({mineral water}) 0.013063 0.227139 0.006931 0.530612 2.336064 1.0 0.003964 1.646529 0.579500 0.029714 0.392662 0.280564
6 frozenset({eggs, ground beef}) frozenset({mineral water}) 0.018662 0.227139 0.009864 0.528571 2.327079 1.0 0.005625 1.639401 0.581121 0.041808 0.390021 0.285999
11 frozenset({turkey, ground beef}) frozenset({mineral water}) 0.010397 0.227139 0.005332 0.512821 2.257734 1.0 0.002970 1.586398 0.562931 0.022962 0.369641 0.268147
0 frozenset({light cream}) frozenset({mineral water}) 0.012263 0.227139 0.006132 0.500000 2.201291 1.0 0.003346 1.545721 0.552497 0.026286 0.353053 0.263498
10 frozenset({tomatoes, ground beef}) frozenset({mineral water}) 0.012263 0.227139 0.006132 0.500000 2.201291 1.0 0.003346 1.545721 0.552497 0.026286 0.353053 0.263498
17 frozenset({spaghetti, salmon}) frozenset({mineral water}) 0.011197 0.227139 0.005599 0.500000 2.201291 1.0 0.003055 1.545721 0.551901 0.024055 0.353053 0.262324
19 frozenset({spaghetti, whole wheat rice}) frozenset({mineral water}) 0.013863 0.227139 0.006931 0.500000 2.201291 1.0 0.003783 1.545721 0.553393 0.029613 0.353053 0.265258

The "mineral water" rules seem to be fairly stable: many of them are present in both splits. The "spaghetti" rules seem to be more random. While both splits contain such rules, they are not the same rules. In general, it seems that spaghetti, like mineral water, may simply be something that is bought very often. The rules may still be good suggestions for cross-selling, but there is probably not a strong association between the antecedents and the consequents for all rules, i.e., there does not seem to be causality.
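
The comparison of the two splits can also be done programmatically instead of by reading the tables. The following sketch treats each rule as a pair of antecedent and consequent and reports which rules are shared. The function names are assumptions for illustration. The example input is only a small hand-copied subset of the rules from the two tables above, so the resulting numbers do not describe the full rule sets.

```python
def rule_keys(rules):
    """Turn (antecedent, consequent) pairs into a set of hashable keys."""
    return {(frozenset(a), frozenset(c)) for a, c in rules}


def compare_rule_sets(rules_a, rules_b):
    """Report shared and unique rules and the Jaccard similarity of two rule sets."""
    a, b = rule_keys(rules_a), rule_keys(rules_b)
    shared, union = a & b, a | b
    return {
        "shared": shared,
        "only_a": a - b,
        "only_b": b - a,
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Hand-copied SUBSET of the rules from the two tables above (illustration only).
rules_first = [
    ({"olive oil", "soup"}, {"mineral water"}),
    ({"milk", "soup"}, {"mineral water"}),
    ({"spaghetti", "salmon"}, {"mineral water"}),
    ({"eggs", "grated cheese"}, {"spaghetti"}),
    ({"shrimp", "ground beef"}, {"spaghetti"}),
]
rules_second = [
    ({"olive oil", "soup"}, {"mineral water"}),
    ({"milk", "soup"}, {"mineral water"}),
    ({"spaghetti", "salmon"}, {"mineral water"}),
    ({"pancakes", "olive oil"}, {"spaghetti"}),
    ({"olive oil", "tomatoes"}, {"mineral water"}),
]

result = compare_rule_sets(rules_first, rules_second)
print("shared rules:", len(result["shared"]))
for antecedent, consequent in sorted(result["shared"], key=lambda r: sorted(r[0])):
    print(" ", sorted(antecedent), "->", sorted(consequent))
print("jaccard similarity:", round(result["jaccard"], 3))
```

With the mlxtend results, the input pairs could be built with `zip(rules['antecedents'], rules['consequents'])` for each of the two rule data frames.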