@[toc]
Instance Based Classifiers
Examples:
-
Rote-learner
Memorizes entire training data and performs classification only if attributes of record match one of the training examples exactly.
-
Nearest neighbor
Uses k "closest" points(nearest neighbors) for performing classification.
Nearest Neighbor Classifiers
Requires three things
- The set of stored records.
- Distance Metric to compute distance between records.
- The value of k, the number of nearest neighbors to retrieve.
To classify an unknown record
- Compute distance to other training records.
- Identify k nearest neighbors.
- Use class labels of nearest neighbors to determine the class label of unknown record(e.g., taking majority vote)
Definition of Nearest Neighbor
K-nearest neighbors of a record x are data points that have the k smallest distance to x.
Nearest Neighbor Classification
Compute distance between two points:
Euclidean distance :
Determine the class from nearest neighbor list
- take the majority vote of class labels among the k-nearest neighbors.
- weigh the vote according to distance(加权)
Choosing the value of k:
- If k is too small, sensitive to noise points.
- If k is too large, neighborhood may include points from other classes.
Scaling issues
- Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes.
Problem with Euclidean measure:
- High dimensional data: curse of dimensionality
- Can produce counter-intuitive results.
- Solution: Normalize the vectors to unit length
k-NN classifiers are lazy learners
- It does not build models explicitly.
- Unlike eager learners such as decision tree induction and rule-based systems.
- Classifying unknown records are relatively expensive.
Example: PEBLS
PEBLS: Parallel Examplar-Based Learning System(Cost & Salzberg)
-
Works with both continuous and nominal features.
- For nominal features, distance between two nominal features.
Each record is assigned a weight factor.
Number of nearest neighbor, k=1
Distance between record X and record Y:
where: = Number of times X is used for prediction/(Number of times X predicts correctly)
if X makes accurate prediction most of the time.
if X is not reliable for making predicti