Association Rules Homework

DM 352 Syllabus | DM 552 Syllabus

last updated 23-Jun-2021


Objectives

Work basic exercises from chapters 5 and 6 that show how association rules are constructed.

Build comfort using Weka, a data mining tool with a graphical user interface. Do not use Python or R.

Explore association rule generation with a relatively simple, clean dataset.

Experiment with different support and confidence thresholds for the association rule algorithm to identify a set of rules.

Experiment with the Apriori algorithm.

 


Book chapter tasks

Based on homework problem 5.2: Consider the following data set:

Customer ID   Transaction ID   Items Bought
1             0001             {a, d, e}
1             0024             {a, b, c, e}
2             0012             {a, b, d, e}
2             0031             {a, c, d, e}
3             0015             {b, c, e}
3             0022             {b, d, e}
4             0029             {c, d}
4             0040             {a, b, c}
5             0033             {a, d, e}
5             0038             {a, b, e}

a) Compute the support for itemsets {a}, {b, c}, and {a, b, e} by treating each transaction ID as a market basket. There are 10 transactions.

b) Use the results in part (a) to compute the confidence for the association rules {b, c} → {a} and {a} → {b, c}.

c) Repeat part (a) by treating each customer ID as a market basket. There are 5 customers. Each item should be treated as a binary variable (1 if the item appears in at least one transaction bought by the customer, and 0 otherwise).

d) Use the results in part (c) to compute the confidence for the association rules {b, c} → {a} and {a} → {b, c}.
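The support and confidence calculations in parts (a) through (d) can be cross-checked with a short Python sketch, intended only for verifying hand calculations. It assumes the textbook definitions: support s(X) is the fraction of baskets that contain every item in X, and confidence c(X → Y) = s(X ∪ Y) / s(X). The names DATA, support, and confidence are hypothetical helpers, not part of any library; exact fractions are used so no rounding creeps in.

```python
from fractions import Fraction

# (customer ID, transaction ID, items bought), copied from the table above
DATA = [
    (1, "0001", {"a", "d", "e"}),
    (1, "0024", {"a", "b", "c", "e"}),
    (2, "0012", {"a", "b", "d", "e"}),
    (2, "0031", {"a", "c", "d", "e"}),
    (3, "0015", {"b", "c", "e"}),
    (3, "0022", {"b", "d", "e"}),
    (4, "0029", {"c", "d"}),
    (4, "0040", {"a", "b", "c"}),
    (5, "0033", {"a", "d", "e"}),
    (5, "0038", {"a", "b", "e"}),
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in baskets if itemset <= basket)
    return Fraction(hits, len(baskets))


def confidence(antecedent, consequent, baskets):
    """c(X -> Y) = s(X union Y) / s(X)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


# Parts (a)/(b): every transaction is its own market basket.
by_transaction = [items for _, _, items in DATA]

# Parts (c)/(d): merge each customer's transactions, so an item is present (1)
# if the customer bought it in at least one transaction, absent (0) otherwise.
merged = {}
for customer, _, items in DATA:
    merged.setdefault(customer, set()).update(items)
by_customer = list(merged.values())

if __name__ == "__main__":
    bc, a = {"b", "c"}, {"a"}
    for label, baskets in [("transaction", by_transaction), ("customer", by_customer)]:
        print(f"Baskets = {label} ({len(baskets)} baskets)")
        for itemset in ({"a"}, {"b", "c"}, {"a", "b", "e"}):
            print(f"  s({sorted(itemset)}) = {support(itemset, baskets)}")
        print(f"  c(b,c -> a) = {confidence(bc, a, baskets)}")
        print(f"  c(a -> b,c) = {confidence(a, bc, baskets)}")
```

Merging transactions per customer with a set union is exactly the "1 if the item appears in at least one transaction" rule from part (c).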

Based on homework problem 5.8: Consider the following set of frequent 3-itemsets:

{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 3, 4}, {1, 3, 5}, {1, 3, 6}, {1, 5, 6}, {2, 3, 4}, {2, 3, 5}, {2, 3, 6}, {3, 4, 5}, {3, 5, 6}.
Assume that there are only six items in the data set.

a) List all candidate 4-itemsets obtained by a candidate generation procedure using the $F_{k-1} \times F_{k-1}$ merging strategy.

b) List all candidate 4-itemsets that survive the candidate pruning step of the Apriori algorithm.
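The two steps in parts (a) and (b) can be sketched in Python as a way to check a hand-built list. It assumes each itemset is stored as a sorted tuple: the merge joins two frequent (k-1)-itemsets whose first k-2 items are identical, and the Apriori pruning step then discards any candidate that has an infrequent (k-1)-subset. The function names merge_candidates and prune are hypothetical.

```python
from itertools import combinations

# Frequent 3-itemsets from the problem statement, each stored as a sorted tuple
FREQUENT_3 = [
    (1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5), (1, 3, 6),
    (1, 5, 6), (2, 3, 4), (2, 3, 5), (2, 3, 6), (3, 4, 5), (3, 5, 6),
]


def merge_candidates(frequent):
    """F(k-1) x F(k-1) merge: join two sorted (k-1)-itemsets whose first k-2 items match."""
    frequent = sorted(tuple(sorted(s)) for s in frequent)
    candidates = []
    for i, x in enumerate(frequent):
        for y in frequent[i + 1:]:
            if x[:-1] == y[:-1]:
                # x < y with equal prefixes, so x[-1] < y[-1] and the result stays sorted
                candidates.append(x + (y[-1],))
    return candidates


def prune(candidates, frequent):
    """Apriori pruning: keep a candidate only if all of its (k-1)-subsets are frequent."""
    frequent = {tuple(sorted(s)) for s in frequent}
    return [
        c for c in candidates
        if all(sub in frequent for sub in combinations(c, len(c) - 1))
    ]


if __name__ == "__main__":
    generated = merge_candidates(FREQUENT_3)
    print("(a) candidates:", generated)
    print("(b) after pruning:", prune(generated, FREQUENT_3))
```

Because combinations() preserves the order of its input, every subset of a sorted candidate is itself a sorted tuple and can be looked up directly in the set of frequent itemsets.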

Based on homework problem 6.1:

Weather Condition   Driver’s Condition   Traffic Violation        Seat Belt   Crash Severity
Good                Alcohol-impaired     Exceed speed limit       No          Major
Bad                 Sober                None                     Yes         Minor
Good                Sober                Disobey stop sign        Yes         Minor
Good                Sober                Exceed speed limit       Yes         Major
Bad                 Sober                Disobey traffic signal   No          Major
Good                Alcohol-impaired     Disobey stop sign        Yes         Minor
Bad                 Alcohol-impaired     None                     Yes         Major
Good                Sober                Disobey traffic signal   Yes         Major
Good                Alcohol-impaired     None                     No          Major
Bad                 Sober                Disobey traffic signal   No          Major

a) Show a binarized version of the data set.

b) What is the maximum width of each transaction now that it is binarized?

c) Assuming that the support threshold is 30%, how many candidate and frequent itemsets will be generated?

d) Create a data set that contains only the following asymmetric binary attributes: (Weather = Bad, Driver’s condition = Alcohol-impaired, Traffic violation = Yes, Seat Belt = No, Crash Severity = Major). For Traffic violation, only None has a value of 0; every other value is assigned 1.

e) Assuming that the support threshold is 30%, how many candidate and frequent itemsets will be generated?
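Parts (a) through (e) can be cross-checked with a Python sketch of binarization followed by a level-wise Apriori count. Assumptions, all labeled here because they are choices rather than facts from the assignment: items are named "Attribute=value" (hypothetical naming), a 30% threshold means a count of at least 3 of the 10 records, every distinct item counts as a candidate 1-itemset, and candidates at size k ≥ 2 are counted after the $F_{k-1} \times F_{k-1}$ merge and the subset-pruning step. The textbook's solution may tally candidates differently (for example, before pruning), so treat the output as a check on your own stated convention, not as an answer key. The helper names binarize, asymmetric, and apriori_counts are hypothetical.

```python
from itertools import combinations

COLUMNS = ["Weather", "Driver", "Violation", "SeatBelt", "Severity"]
# The ten records from the table above
ROWS = [
    ("Good", "Alcohol-impaired", "Exceed speed limit", "No", "Major"),
    ("Bad", "Sober", "None", "Yes", "Minor"),
    ("Good", "Sober", "Disobey stop sign", "Yes", "Minor"),
    ("Good", "Sober", "Exceed speed limit", "Yes", "Major"),
    ("Bad", "Sober", "Disobey traffic signal", "No", "Major"),
    ("Good", "Alcohol-impaired", "Disobey stop sign", "Yes", "Minor"),
    ("Bad", "Alcohol-impaired", "None", "Yes", "Major"),
    ("Good", "Sober", "Disobey traffic signal", "Yes", "Major"),
    ("Good", "Alcohol-impaired", "None", "No", "Major"),
    ("Bad", "Sober", "Disobey traffic signal", "No", "Major"),
]


def binarize(rows):
    """Part (a): one binary item per attribute=value pair; a record holds the items that are 1."""
    return [frozenset(f"{col}={val}" for col, val in zip(COLUMNS, row)) for row in rows]


def asymmetric(rows):
    """Part (d): keep only the five asymmetric items; Violation is 1 unless it is None."""
    out = []
    for weather, driver, violation, belt, severity in rows:
        flags = {
            "Weather=Bad": weather == "Bad",
            "Driver=Alcohol-impaired": driver == "Alcohol-impaired",
            "Violation=Yes": violation != "None",
            "SeatBelt=No": belt == "No",
            "Severity=Major": severity == "Major",
        }
        out.append(frozenset(item for item, on in flags.items() if on))
    return out


def apriori_counts(transactions, min_support_pct):
    """Return (candidate count, frequent count) summed over all itemset sizes."""
    n = len(transactions)

    def is_frequent(itemset):
        hits = sum(1 for t in transactions if set(itemset) <= t)
        return hits * 100 >= min_support_pct * n  # integer test, no float rounding

    level = [(item,) for item in sorted(set().union(*transactions))]
    n_candidates = n_frequent = 0
    while level:
        frequent = [c for c in level if is_frequent(c)]
        n_candidates += len(level)
        n_frequent += len(frequent)
        known = set(frequent)
        # F(k-1) x F(k-1) merge, then drop candidates with an infrequent (k-1)-subset
        level = [
            x + (y[-1],)
            for i, x in enumerate(frequent)
            for y in frequent[i + 1:]
            if x[:-1] == y[:-1]
            and all(s in known for s in combinations(x + (y[-1],), len(x)))
        ]
    return n_candidates, n_frequent


if __name__ == "__main__":
    symmetric = binarize(ROWS)
    print("binary attributes:", len(set().union(*symmetric)))
    print("max transaction width:", max(len(t) for t in symmetric))
    print("(c) candidates, frequent:", apriori_counts(symmetric, 30))
    print("(e) candidates, frequent:", apriori_counts(asymmetric(ROWS), 30))
```

Note that the symmetric encoding generates candidates such as {Weather=Good, Weather=Bad} that can never occur together; dropping them is one reason asymmetric attributes shrink the search space.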

Based on homework problem 6.2:

TID   Temperature   Pressure   Alarm 1   Alarm 2   Alarm 3
1     95            1105       0         0         1
2     85            1040       1         1         0
3     103           1090       1         1         1
4     97            1084       1         0         0
5     80            1038       0         1         1
6     100           1080       1         1         0
7     83            1025       1         0         1
8     86            1030       1         0         0
9     101           1100       1         1         1

a) Partition the temperature range into 3 bins of equal width. Show those ranges.

b) If the support threshold is 30%, which ranges from (a) meet the support threshold?

c) Partition the range so that each bin contains the same number of transactions. Show those ranges.

d) If the support threshold is 30%, which ranges in (c) meet the support threshold?
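Equal-width binning (part a) and equal-frequency binning (part c) can be sketched in Python to check the hand-built ranges. Assumptions: equal-width bins are half-open [lower, upper) with the maximum value placed in the last bin, equal-frequency bins are formed by sorting and cutting into groups of equal size, and "meets 30% support" means count / 9 ≥ 0.30. The function names are hypothetical; Fraction keeps the bin edges exact.

```python
from fractions import Fraction

# Temperature column from the table above (TIDs 1-9)
TEMPS = [95, 85, 103, 97, 80, 100, 83, 86, 101]


def equal_width_bins(values, k):
    """Split [min, max] into k bins of equal width; return the edges and per-bin counts."""
    lo, hi = min(values), max(values)
    edges = [lo + Fraction(i * (hi - lo), k) for i in range(k + 1)]
    counts = [0] * k
    for v in values:
        # Bins are half-open [edge_i, edge_{i+1}); the maximum goes in the last bin.
        idx = min((v - lo) * k // (hi - lo), k - 1)
        counts[idx] += 1
    return edges, counts


def equal_frequency_bins(values, k):
    """Sort the values and cut them into k groups of (nearly) equal size."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]


def meets_support(count, n, min_support_pct):
    """True if count / n >= min_support_pct / 100 (integer comparison)."""
    return count * 100 >= min_support_pct * n


if __name__ == "__main__":
    n = len(TEMPS)
    edges, counts = equal_width_bins(TEMPS, 3)
    for i, c in enumerate(counts):
        print(f"[{float(edges[i]):.2f}, {float(edges[i + 1]):.2f}]  count={c}  "
              f"support ok: {meets_support(c, n, 30)}")
    for group in equal_frequency_bins(TEMPS, 3):
        print(f"{group[0]}-{group[-1]}  count={len(group)}  "
              f"support ok: {meets_support(len(group), n, 30)}")
```

With 9 transactions, a 30% threshold requires at least 3 transactions per bin, since 2 / 9 is about 22%.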

 

Based on homework problem 6.5:

For each of the 3 attributes given below, describe how you would convert it into a binary transaction data set appropriate for association analysis.

Specifically, indicate for each attribute:

      1. How many binary attributes would it correspond to in the transaction data set?
      2. How would the values of the original attribute be mapped to values of the binary attributes?
      3. If there is any hierarchical structure in the values of an attribute that could be useful for grouping the data into fewer binary attributes, what is it?

Year: Freshman, Sophomore, Junior, Senior, Graduate:Masters, Graduate:PhD, Professional

Zip code: zip code for the home address of a U.S. student, zip code for the local address of a non-U.S. student

Languages: Each of the following is a separate attribute that has a value of 1 if the person speaks the language and a value of 0 otherwise.
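For the Year attribute, questions 1 and 2 can be illustrated with a minimal Python sketch of one-hot (asymmetric binary) encoding, where each value becomes its own binary item. For question 3, the sketch also rolls Year up to a coarser level; the Undergraduate / Graduate / Professional grouping is an assumption for illustration only, suggested by the "Graduate:" prefix in the values, and is not given by the problem. The names one_hot, one_hot_grouped, and YEAR_GROUP are hypothetical.

```python
# Values of the Year attribute, as listed in the problem
YEAR_VALUES = [
    "Freshman", "Sophomore", "Junior", "Senior",
    "Graduate:Masters", "Graduate:PhD", "Professional",
]

# Hypothetical coarser level; the "Graduate:" prefix already hints at one grouping.
YEAR_GROUP = {
    "Freshman": "Undergraduate",
    "Sophomore": "Undergraduate",
    "Junior": "Undergraduate",
    "Senior": "Undergraduate",
    "Graduate:Masters": "Graduate",
    "Graduate:PhD": "Graduate",
    "Professional": "Professional",
}


def one_hot(value, vocabulary, prefix):
    """Map one categorical value to binary items: 1 for the matching value, 0 elsewhere."""
    if value not in vocabulary:
        raise ValueError(f"unknown value {value!r}")
    return {f"{prefix}={v}": int(v == value) for v in vocabulary}


def one_hot_grouped(value):
    """Same mapping after rolling Year up to the coarser (assumed) hierarchy level."""
    groups = sorted(set(YEAR_GROUP.values()))
    return one_hot(YEAR_GROUP[value], groups, "YearGroup")


if __name__ == "__main__":
    print(one_hot("Graduate:PhD", YEAR_VALUES, "Year"))
    print(one_hot_grouped("Graduate:PhD"))
```

Grouping trades detail for support: three coarse items each cover more students than seven fine-grained ones, so they are more likely to clear a support threshold.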

 

 


Weka tasks

Download the following data file to your local computer:

Start Weka and open this file.

Experiment with different parameters to create your rule set.

Experiment with the parameters of Weka's association rule learners (for example, Apriori and FP-Growth) to establish your best set of rules.

From your final rule set, which rules would you consider valuable, and why?

Write Up

Submit a single Word document with your name on it, containing the following.

For the Weka data set:

  1. Capture your best rule set.
    • What parameters did you settle on?
    • Capture the measures (confidence, lift, etc.).
  2. Explain, as best you can, why you chose your final rule set.

Upload your Word document to Moodle.