Rule-Based Classifiers


last updated 23-Jun-2021


We want to classify records using a collection of simple “if…then…” rules

Rule notation: (Condition) → y


Examples of classification rules:

R1: (Give Birth = no) & (Can Fly = yes) → Birds

R2: (Give Birth = no) & (Live in Water = yes) → Fishes

R3: (Give Birth = yes) & (Blood Type = warm) → Mammals

R4: (Give Birth = no) & (Can Fly = no) → Reptiles

R5: (Live in Water = sometimes) → Amphibians

Application of Rule-Based Classifier

A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule

The rule R1 above covers a hawk => Bird

The rule R3 covers the grizzly bear => Mammal

A lemur triggers rule R3, so it is classified => Mammal

A turtle triggers both R4 and R5

A dogfish shark matches none of the rules

Measures of Coverage and Accuracy

Coverage of a rule: the fraction of all records that satisfy the rule's antecedent.

Accuracy of a rule: the fraction of the records covered by the rule whose class matches the rule's consequent.
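As a concrete illustration, both measures can be computed over a toy set of records. The attribute names, record layout, and helper functions below are made up for this sketch:

```python
# Hypothetical toy data: each record is a dict of attributes plus a 'class' label.
records = [
    {"give_birth": "no",  "can_fly": "yes", "class": "Birds"},
    {"give_birth": "no",  "can_fly": "yes", "class": "Birds"},
    {"give_birth": "yes", "can_fly": "no",  "class": "Mammals"},
    {"give_birth": "no",  "can_fly": "no",  "class": "Reptiles"},
]

def covers(condition, record):
    """True if the record satisfies every attribute test in the condition."""
    return all(record.get(attr) == value for attr, value in condition.items())

def coverage(condition, records):
    """Fraction of all records that satisfy the rule's antecedent."""
    return sum(covers(condition, r) for r in records) / len(records)

def accuracy(condition, label, records):
    """Fraction of covered records whose class matches the rule's consequent."""
    covered = [r for r in records if covers(condition, r)]
    return sum(r["class"] == label for r in covered) / len(covered)

rule = {"give_birth": "no", "can_fly": "yes"}     # antecedent of R1
print(coverage(rule, records))                    # 2 of 4 records: 0.5
print(accuracy(rule, "Birds", records))           # both covered records are Birds: 1.0
```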



Characteristics of Rule Sets

Mutually exclusive rules: no record triggers more than one rule, so every record is covered by at most one rule.

Exhaustive rules: every possible record is covered by at least one rule.

Handling rules that are not mutually exclusive: order the rules and use the first one triggered, or let all triggered rules vote.

Handling rules that are not exhaustive: add a default rule that assigns a default class to any record no other rule covers.


Ordered Rule Set

Rules are rank ordered according to their priority (a rule set ordered this way is known as a decision list).

When a test record is presented to the classifier, it is assigned to the class of the highest-ranked rule it triggers; if no rule fires, a default class is assigned.

Example from above

  1. R1: (Give Birth = no) & (Can Fly = yes) → Birds
  2. R2: (Give Birth = no) & (Live in Water = yes) → Fishes
  3. R3: (Give Birth = yes) & (Blood Type = warm) → Mammals
  4. R4: (Give Birth = no) & (Can Fly = no) → Reptiles
  5. R5: (Live in Water = sometimes) → Amphibians

A turtle triggers both R4 and R5, but by the order, the conclusion is => Reptile

Note that we did not define a default rule, so a record triggering none of R1–R5 would remain unclassified.
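A minimal decision-list classifier for the five rules above might look like the sketch below; the attribute keys and the `unknown` fallback are illustrative assumptions, not from a specific library:

```python
# Ordered rule set (decision list): first triggered rule wins.
RULES = [
    ({"give_birth": "no",  "can_fly": "yes"},       "Birds"),       # R1
    ({"give_birth": "no",  "live_in_water": "yes"}, "Fishes"),      # R2
    ({"give_birth": "yes", "blood_type": "warm"},   "Mammals"),     # R3
    ({"give_birth": "no",  "can_fly": "no"},        "Reptiles"),    # R4
    ({"live_in_water": "sometimes"},                "Amphibians"),  # R5
]

def classify(record, rules, default="unknown"):
    """Return the class of the first (highest-priority) rule triggered."""
    for condition, label in rules:
        if all(record.get(attr) == value for attr, value in condition.items()):
            return label
    return default  # reached only when no rule covers the record

turtle = {"give_birth": "no", "can_fly": "no", "live_in_water": "sometimes"}
print(classify(turtle, RULES))  # R4 outranks R5, so: Reptiles
```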

Rule ordering schemes

Rule-based ordering: individual rules are ranked based on their quality

Class-based ordering: Rules that belong to the same class appear together, ordered by class sizes

In this example the ordering criterion is not clear.

Building Classification Rules

Direct Method: extract rules directly from the data (e.g., RIPPER).

Indirect Method: extract rules from another classification model, such as a decision tree (e.g., C4.5rules).

Direct Method: Sequential Covering

  1. Start from an empty rule
  2. Grow a rule using the Learn-One-Rule function
  3. Remove training records covered by the rule
  4. Repeat Steps (2) and (3) until the stopping criterion is met

More formally

Sequential-Covering(Target_attribute, Attributes, Examples, Threshold)

However, because Sequential Covering does not backtrack, the algorithm is not guaranteed to find the smallest or best set of rules;
thus the Learn-One-Rule function must be very effective.
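The four numbered steps can be sketched as a greedy loop. The `Rule` class and the `learn_one_rule` callback below are hypothetical placeholders for this sketch, not part of a standard library:

```python
from dataclasses import dataclass

@dataclass
class Rule:
    condition: dict   # attribute -> required value
    label: str
    quality: float    # e.g. accuracy of the rule on the data it was grown from

    def covers(self, record):
        return all(record.get(a) == v for a, v in self.condition.items())

def sequential_covering(examples, learn_one_rule, threshold=0.5):
    """Greedy rule-set induction: grow a rule, strip covered records, repeat."""
    rules = []
    while examples:
        rule = learn_one_rule(examples)            # step 2: grow one rule
        if rule is None or rule.quality < threshold:
            break                                  # stopping criterion met
        rules.append(rule)
        # step 3: remove the training records this rule covers
        examples = [e for e in examples if not rule.covers(e)]
    return rules  # note: no backtracking, so the set need not be minimal
```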

The rectangular boundaries represent the simple conditional tests in the rule antecedents; as the figure shows, the regions sometimes enclose wrongly classified records.

Instance Elimination

Why do we need to eliminate instances? Otherwise the next rule grown would be identical to the previous one.

Why do we remove positive instances? To ensure that the next rule learned is different, focusing on the positive cases still uncovered.

Why do we remove negative instances? To prevent underestimating the accuracy of subsequent rules.

Rule Growing

Top-down -- General-to-specific

The example on the left shows a specific rule covering 3 cases, all classified as 'No'; this candidate is chosen because of its coverage and accuracy.

Bottom-up -- Specific-to-general

The example on the right starts from two specific rules (top); the income conjunct does not discriminate between them (85K vs. 90K), so it can be dropped and the rule simplified.
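A top-down (general-to-specific) grower can be sketched as a greedy loop that keeps adding whichever conjunct most improves rule accuracy. The data layout and helper names below are assumptions for illustration:

```python
def rule_accuracy(condition, target, records):
    """Accuracy of (condition => target) over the given records."""
    covered = [r for r in records
               if all(r.get(a) == v for a, v in condition.items())]
    if not covered:
        return 0.0
    return sum(r["class"] == target for r in covered) / len(covered)

def grow_rule(records, target, attrs):
    """Start from the empty rule {} => target and greedily add conjuncts."""
    condition = {}
    while True:
        best, best_acc = None, rule_accuracy(condition, target, records)
        for attr in attrs:
            if attr in condition:
                continue
            for value in {r[attr] for r in records}:
                candidate = dict(condition, **{attr: value})
                acc = rule_accuracy(candidate, target, records)
                if acc > best_acc:
                    best, best_acc = candidate, acc
        if best is None:
            return condition       # no conjunct improves accuracy: stop
        condition = best
```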

Rule Evaluation

How do we determine the best next rule?

FOIL’s Information Gain

[ FOIL: First Order Inductive Learner – an early rule-based learning algorithm]

R0: {} => class (initial rule)
R1: {A} => class (rule after adding a conjunct, i.e., one condition component in the antecedent)

Information Gain(R0, R1) = t * [ log2( p1 / (p1 + n1) ) – log2( p0 / (p0 + n0) ) ]

where
t = number of positive instances covered by both R0 and R1
p0 = number of positive instances covered by R0
n0 = number of negative instances covered by R0
p1 = number of positive instances covered by R1
n1 = number of negative instances covered by R1

Gain(R0,R1) is similar to the entropy gain calculations used in decision trees.
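Using the definitions above, the gain can be computed directly. Note that when R1 only adds a conjunct to R0, every instance covered by R1 is also covered by R0, so t = p1. The function below is an illustrative sketch, not FOIL's actual implementation:

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL information gain for extending rule R0 into R1.

    p0/n0: positive/negative instances covered by R0;
    p1/n1: positive/negative instances covered by R1.
    """
    t = p1  # R1 narrows R0, so positives covered by both = p1
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# R0 covers 100 positives and 400 negatives (precision 0.2);
# adding a conjunct leaves 30 positives and 10 negatives (precision 0.75).
print(foil_gain(100, 400, 30, 10))  # positive gain: the conjunct helps
```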

Direct Method: RIPPER

Repeated Incremental Pruning to Produce Error Reduction (Weka: Classifiers -> Rules -> JRip)

For a 2-class problem, choose one class as the positive class and the other as the negative class; typically the class with fewer cases is chosen as positive.

Generalize for multi-class problems: order the classes by increasing size, learn rules for the smallest class first (treating the rest as negative), then repeat with the next smallest class.

Growing a rule: start from an empty rule and greedily add the conjunct that maximizes FOIL's informationation gain is replaced here: greedily add the conjunct that maximizes FOIL's information gain; stop when the rule no longer covers negative examples, then immediately prune the rule (incremental reduced-error pruning).

Building a Rule Set:

Uses the sequential covering algorithm

Each time a rule is added to the rule set, compute the new description length; stop adding rules when the new description length is more than d bits longer than the smallest description length obtained so far (RIPPER uses d = 64).

Optimize the rule set: for each rule r, consider two alternatives (a replacement rule grown from scratch and a revised rule grown by extending r) and keep whichever version minimizes the description length.

Indirect Methods

Convert from the decision tree: each path from the root to a leaf becomes one classification rule.



(Weka: Classifiers -> Rules -> PART)

Extract rules from an unpruned decision tree

For each rule, r: A → y, consider pruning it by removing a conjunct from A whenever doing so lowers the rule's estimated error rate.

Instead of ordering the rules, order subsets of rules (class ordering)
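Extracting one rule per root-to-leaf path can be sketched as a simple tree walk; the nested-dict tree encoding below is an assumption for illustration, not the C4.5 data structure:

```python
# Hypothetical decision tree: internal nodes test one attribute,
# leaves carry a class label.
tree = {"attr": "give_birth",
        "branches": {
            "yes": {"leaf": "Mammals"},
            "no":  {"attr": "can_fly",
                    "branches": {"yes": {"leaf": "Birds"},
                                 "no":  {"leaf": "Reptiles"}}}}}

def tree_to_rules(node, condition=()):
    """Depth-first walk; each leaf yields one (condition, class) rule."""
    if "leaf" in node:
        return [(condition, node["leaf"])]
    rules = []
    for value, child in node["branches"].items():
        rules += tree_to_rules(child, condition + ((node["attr"], value),))
    return rules

for cond, label in tree_to_rules(tree):
    print(" & ".join(f"{a}={v}" for a, v in cond) or "( )", "->", label)
```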

Example Comparison

C4.5 Decision Tree versus C4.5rules versus RIPPER


C4.5rules:

(Give Birth=No, Can Fly=Yes) → Birds
(Give Birth=No, Live in Water=Yes) → Fishes
(Give Birth=Yes) → Mammals
(Give Birth=No, Can Fly=No, Live in Water=No) → Reptiles
( ) → Amphibians



RIPPER:

(Live in Water=Yes) → Fishes
(Have Legs=No) → Reptiles
(Give Birth=No, Can Fly=No, Live In Water=No) → Reptiles
(Can Fly=Yes, Give Birth=No) → Birds
( ) → Mammals


Confusion Matrix for C4.5 versus RIPPER

C4.5 accuracy = 16/20 = 80%

RIPPER accuracy = 12/20 = 60%

Advantages and Limitations of Rule-Based Classifiers

Has characteristics quite similar to decision trees

Better suited for handling imbalanced classes

One drawback: missing values in the test set are harder to handle than with a decision tree.