# Association Rules Homework

last updated 23-Jun-2021

## Objectives

Work through basic exercises from chapters 5 and 6 to show how association rules are constructed.

Build comfort with Weka, a data mining tool with a graphical user interface. Do not use Python or R.

Explore decision rule generation with a relatively simple, clean data set.

Experiment with the different support and confidence thresholds of the association rule algorithm to identify a set of rules.

Experiment with the Apriori algorithm.

### Based on homework problem 5.2:

Consider the following data set:

| Customer ID | Transaction ID | Items Bought |
|---|---|---|
| 1 | 0001 | {a, d, e} |
| 1 | 0024 | {a, b, c, e} |
| 2 | 0012 | {a, b, d, e} |
| 2 | 0031 | {a, c, d, e} |
| 3 | 0015 | {b, c, e} |
| 3 | 0022 | {b, d, e} |
| 4 | 0029 | {c, d} |
| 4 | 0040 | {a, b, c} |
| 5 | 0033 | {a, d, e} |
| 5 | 0038 | {a, b, e} |

a) Compute the support for itemsets {a}, {b, c}, and {a, b, e} by treating each transaction ID as a market basket. You have 10 transactions.

b) Use the results in part (a) to compute the confidence for the association rules {b, c} → {a} and {a} → {b, c}.

c) Repeat part (a) by treating each customer ID as a market basket. You have 5 customers. Each item should be treated as a binary variable (1 if the item appears in at least one transaction bought by the customer, and 0 otherwise).

d) Use the results in part (c) to compute the confidence for the association rules {b, c} → {a} and {a} → {b, c}.
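As a reminder of the quantities used in parts (a) through (d), the definitions can be written as follows. The notation is an assumption for this sketch, following the common textbook convention: $N$ is the number of baskets and $\sigma(X)$ is the number of baskets that contain every item in $X$.

$$
s(X) = \frac{\sigma(X)}{N},
\qquad
c(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)} = \frac{s(X \cup Y)}{s(X)}
$$

For instance, with a hypothetical set of 4 baskets {p, q}, {p, r}, {p, q, r}, {q}, the support of {p, q} is 2/4 = 0.5 and the confidence of {p} → {q} is 2/3. Changing what counts as a basket (transaction or customer) changes both $N$ and every $\sigma(X)$.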

### Based on homework problem 5.8:

Consider the following set of frequent 3-itemsets:

{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 3, 4}, {1, 3, 5}, {1, 3, 6}, {1, 5, 6}, {2, 3, 4}, {2, 3, 5}, {2, 3, 6}, {3, 4, 5}, {3, 5, 6}.

Assume that there are only six items in the data set.

a) List all candidate 4-itemsets obtained by a candidate generation procedure using the $F_{k-1} \times F_{k-1}$ merging strategy.

b) List all candidate 4-itemsets that survive the candidate pruning step of the Apriori algorithm.
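The two steps in parts (a) and (b) can be stated compactly. This is a sketch in assumed notation, not a quotation from the textbook: $A = \{a_1, \dots, a_{k-1}\}$ and $B = \{b_1, \dots, b_{k-1}\}$ are frequent $(k-1)$-itemsets with items in sorted order, and $F_{k-1}$ is the set of all frequent $(k-1)$-itemsets.

$$
\text{Merge } A \text{ and } B \text{ if } a_i = b_i \ \text{for } i = 1, \dots, k-2 \ \text{and } a_{k-1} \neq b_{k-1},
\quad \text{giving } C = \{a_1, \dots, a_{k-2}, a_{k-1}, b_{k-1}\}
$$

$$
\text{Keep } C \text{ only if } C \setminus \{c\} \in F_{k-1} \ \text{for every } c \in C
$$

For instance, with hypothetical frequent 2-itemsets {p, q}, {p, r}, {p, s}, {q, r}, merging yields the candidates {p, q, r}, {p, q, s}, and {p, r, s}. Only {p, q, r} survives pruning, because {q, s} and {r, s} are not frequent.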

### Based on homework problem 6.1:

| Weather Condition | Driver’s Condition | Traffic Violation | Seat Belt | Crash Severity |
|---|---|---|---|---|
| Good | Alcohol-impaired | Exceed speed limit | No | Major |
| Bad | Sober | None | Yes | Minor |
| Good | Sober | Disobey stop sign | Yes | Minor |
| Good | Sober | Exceed speed limit | Yes | Major |
| Bad | Sober | Disobey traffic signal | No | Major |
| Good | Alcohol-impaired | Disobey stop sign | Yes | Minor |
| Bad | Alcohol-impaired | None | Yes | Major |
| Good | Sober | Disobey traffic signal | Yes | Major |
| Good | Alcohol-impaired | None | No | Major |
| Bad | Sober | Disobey traffic signal | No | Major |

a) Show a binarized version of the data set.

b) What is the maximum width of each transaction now that it is binarized?

c) Assuming the support threshold is 30%, how many candidate and frequent itemsets will be generated?

d) Create a data set that contains only the following asymmetric binary attributes: (Weather = Bad, Driver’s Condition = Alcohol-impaired, Traffic Violation = Yes, Seat Belt = No, Crash Severity = Major). For Traffic Violation, only None has a value of 0. All other values of that attribute are assigned 1.

e) Assuming the support threshold is 30%, how many candidate and frequent itemsets will be generated for the data set from part (d)?

### Based on homework problem 6.2:

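| TID | Temperature | Pressure | Alarm 1 | Alarm 2 | Alarm 3 |
|---|---|---|---|---|---|
| 1 | 95 | 1105 | 0 | 0 | 1 |
| 2 | 85 | 1040 | 1 | 1 | 0 |
| 3 | 103 | 1090 | 1 | 1 | 1 |
| 4 | 97 | 1084 | 1 | 0 | 0 |
| 5 | 80 | 1038 | 0 | 1 | 1 |
| 6 | 100 | 1080 | 1 | 1 | 0 |
| 7 | 83 | 1025 | 1 | 0 | 1 |
| 8 | 86 | 1030 | 1 | 0 | 0 |
| 9 | 101 | 1100 | 1 | 1 | 1 |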

a) Partition the temperature range into 3 bins of equal width. Show those ranges.

b) If the support threshold is 30%, which ranges from (a) meet the support threshold?

c) If you partition the range so that each bin has the same number of transactions, show those ranges.

d) If the support threshold is 30%, which ranges from (c) meet the support threshold?
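The two discretization schemes in parts (a) through (d) can be sketched as follows. The symbols are assumptions for illustration: $x_{\min}$ and $x_{\max}$ are the smallest and largest observed values, $k$ is the number of bins, and $N$ is the number of transactions.

$$
w = \frac{x_{\max} - x_{\min}}{k},
\qquad
\text{bin}_i = \bigl[\, x_{\min} + (i-1)\,w,\ x_{\min} + i\,w \,\bigr), \quad i = 1, \dots, k
$$

The last bin is usually closed on the right so that it includes $x_{\max}$. For equal-frequency binning, sort the values and place the cut points so that each bin holds about $N/k$ of them. In either scheme, the support of a bin is the fraction of transactions whose value falls inside it:

$$
s(\text{bin}_i) = \frac{\lvert \{\, t : x_t \in \text{bin}_i \,\} \rvert}{N}
$$

For instance, the hypothetical values 10, 12, 20, 40 with $k = 3$ give $w = 10$ and the equal-width bins [10, 20), [20, 30), [30, 40], with supports 2/4, 1/4, and 1/4.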

### Based on homework problem 6.5:

For the attributes given below, describe how you would convert each one into a binary transaction data set appropriate for association analysis.

Specifically, indicate for each attribute:

1. How many binary attributes it would correspond to in the transaction data set?
2. How the values of the original attribute would be mapped to values of the binary attributes?
3. If there is any hierarchical structure in the data values of an attribute that could be useful for grouping the data into fewer binary attributes, what are they?

Zip code: the zip code of the home address for a U.S. student, or the zip code of the local address for a non-U.S. student.

Languages: Each of the following is a separate attribute that has a value of 1 if the person speaks the language and 0 otherwise.

• Arabic
• Bengali
• Chinese Mandarin
• English
• Portuguese
• Russian
• Spanish

Experiment with different parameters to create your rule set.

Experiment with the parameters in the FP-Growth and Create Associations processes to establish your best set of rules.

From your final rule set, which rules would you consider valuable? Why?

## Write Up

In a single Word document with your name:

For the data set from Weka:

1. Capture your best rule set.
• What parameters did you settle on?
• Capture the measures (confidence, lift, etc.).
2. Explain, as best you can, why you chose your final rule set.
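
When reporting the measures, it may help to recall how lift relates to support and confidence. The notation below is an assumption for this sketch, with $s(\cdot)$ for support and $c(\cdot)$ for confidence:

$$
\text{lift}(X \rightarrow Y) = \frac{c(X \rightarrow Y)}{s(Y)} = \frac{s(X \cup Y)}{s(X)\, s(Y)}
$$

A lift above 1 suggests that $X$ and $Y$ occur together more often than they would if they were independent, a lift of 1 is consistent with independence, and a lift below 1 suggests a negative association. A rule with high confidence but a lift near 1 may simply reflect a consequent that is common in the data.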