# Nearest Neighbor Classifiers

last updated 23-Jun-2021

## Introduction

Instance-based classifiers delay model processing/creation until a test observation is ready to be classified.

Examples

• Rote-learner: "Memorizes" entire training data and performs classification only if attributes of record match one of the training examples exactly
• Nearest neighbor: Uses k “closest” points (nearest neighbors) for performing classification

If it walks like a duck and quacks like a duck, then it’s probably a duck.

## Nearest-Neighbor Classifiers

A nearest-neighbor classifier requires three things:

1. Training set: the set of labeled records
2. A Distance Metric: to compute distance between records
3. Some value of k, the number of nearest neighbors to retrieve / consider

To classify an unknown record:

• Compute distance to other training records
• Identify k nearest neighbors
• Use class labels of nearest neighbors to determine the class label of unknown record (e.g., by taking majority vote)
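The three steps above can be sketched in plain Python. This is a minimal illustration using Euclidean distance and a simple majority vote, not an optimized implementation:

```python
import math
from collections import Counter

def euclidean(a, b):
    # Distance between two numeric records
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, unknown, k=3):
    # train: list of (record, label) pairs; unknown: record to classify
    # 1. Compute the distance to every training record
    ranked = sorted(train, key=lambda pair: euclidean(pair[0], unknown))
    # 2. Identify the k nearest neighbors
    neighbors = ranked[:k]
    # 3. Take the majority vote of the neighbors' class labels
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

Note that all the work happens at classification time: there is no training phase beyond storing the records, which is exactly the "lazy" behavior described later.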

The k-nearest neighbors of a record x are the data points that have the k smallest distances to x. The decision regions of a 1-nearest-neighbor classifier can be visualized with a Voronoi diagram.

## Nearest Neighbor Classification

Compute the distance between two points using any of the standard distance (or similarity) measures, for example:

• Euclidean distance

### Considerations

Determining the class from the nearest neighbors list

• Take the majority vote of class labels among the k-nearest neighbors.
This gives each instance equal weight.
• Might weigh each neighbor's vote according to its distance, using a weight factor w = 1/d²
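Distance-weighted voting could be sketched as follows. The small epsilon guarding against a zero distance is an assumption added here, not part of the notes:

```python
from collections import defaultdict

def weighted_vote(neighbors):
    # neighbors: list of (distance, label) for the k nearest neighbors.
    # Each vote is weighted by w = 1/d^2, so closer neighbors count more.
    scores = defaultdict(float)
    for d, label in neighbors:
        scores[label] += 1.0 / (d ** 2 + 1e-9)  # epsilon avoids division by zero
    return max(scores, key=scores.get)
```

With neighbors at distances 0.5 ('a'), 2.0 ('b'), and 2.1 ('b'), an unweighted majority vote would pick 'b', but the weighted vote picks 'a' because the single close neighbor dominates.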

Balancing the choice of k:

• If k is too small, sensitive to noise points
• If k is too large, neighborhood may include points from other classes

Scaling issues

Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes

Example:

• height of a person may vary from 1.5m to 1.8m
• weight of a person may vary from 90lb to 300lb
• income of a person may vary from \$10K to \$1M
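One common fix, sketched here, is min-max scaling of each attribute to [0, 1] before computing distances, so that income (spanning hundreds of thousands) does not swamp height (spanning fractions of a meter):

```python
def min_max_scale(records):
    # Rescale each attribute (column) to [0, 1] so that no single attribute
    # dominates the distance computation.
    cols = list(zip(*records))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0
              for v, l, h in zip(row, lo, hi))
        for row in records
    ]
```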

Selection of the best similarity measure is critical.

### Other Issues with k-NN Classification

k-NN classifiers are lazy learners since they do not build models explicitly, a priori

Classifying an unknown record is relatively expensive, since it requires searching the training set for the k nearest neighbors

Variable interactions are easy to handle, since decisions are based on local information

Selection of right proximity measure is essential

Superfluous or redundant attributes can create problems

Missing attributes are hard to handle

## Improving k-NN Efficiency

Avoid having to compute distance to all objects in the training set. Possible techniques include

• Multi-dimensional access methods (k-d trees), see below
• Fast approximate similarity searching using partitions
• Locality Sensitive Hashing (LSH), a probabilistic approach to compute approximate neighbors
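Random-hyperplane LSH is one such probabilistic scheme: each hash bit records which side of a random hyperplane a point falls on, so nearby points tend to land in the same bucket and only bucket-mates need exact distance checks. A toy sketch (real implementations use multiple hash tables and tuned parameters):

```python
import random

def lsh_signature(point, hyperplanes):
    # One bit per hyperplane: which side of the hyperplane the point lies on.
    return tuple(
        1 if sum(p * h for p, h in zip(point, plane)) >= 0 else 0
        for plane in hyperplanes
    )

def build_lsh_index(points, n_planes=8, dim=2, seed=0):
    # Hash every training point into a bucket keyed by its bit signature.
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
    buckets = {}
    for i, p in enumerate(points):
        buckets.setdefault(lsh_signature(p, planes), []).append(i)
    return planes, buckets
```

At query time, only the points sharing the query's bucket are candidate neighbors, which avoids computing distances to the whole training set at the cost of occasionally missing a true neighbor.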

Condensing: Determine a smaller set of objects that give the same performance

Editing: Remove objects to improve efficiency

Proximity graphs:

• a graph in which two vertices are connected by an edge if and only if the vertices satisfy particular geometric requirements
• examples: nearest neighbor graphs,
• minimum spanning trees

### k-d Tree

From Wikipedia: a 3-dimensional k-d tree. The first split (the red vertical plane) cuts the root cell (white) into two subcells, each of which is then split (by the green horizontal planes) into two subcells. Finally, those four cells are split (by the four blue vertical planes) into two subcells. Since there is no more splitting, these final eight cells are called leaf cells.

A k-d tree is a generalized binary search tree. The training data are preprocessed into the tree, and the tree is then searched to find the nearest neighbors of the observation to be classified.