
Data Mining: Instance-Based Classifiers - k-Nearest Neighbors (KNN)

Author: Cache_wood · 2022-04-12


Instance-Based Classifiers

Examples:

  • Rote-learner

    Memorizes the entire training data and performs classification only if the attributes of a record exactly match one of the training examples.

  • Nearest neighbor

    Uses k "closest" points(nearest neighbors) for performing classification.

Nearest Neighbor Classifiers

Requires three things:

  • The set of stored records.
  • Distance Metric to compute distance between records.
  • The value of k, the number of nearest neighbors to retrieve.

To classify an unknown record:

  • Compute its distance to all training records.
  • Identify the k nearest neighbors.
  • Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote; a sketch follows this list).
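
These three steps fit in a few lines. A minimal sketch in plain NumPy (the function name `knn_classify` and the toy data are illustrative, not from the original notes):

```python
from collections import Counter
import numpy as np

def knn_classify(X_train, y_train, x, k=3):
    """Classify record x by majority vote among its k nearest training records."""
    # Step 1: compute the Euclidean distance from x to every training record.
    dists = np.linalg.norm(X_train - x, axis=1)
    # Step 2: identify the k nearest neighbors.
    nearest = np.argsort(dists)[:k]
    # Step 3: majority vote over their class labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage with made-up data
X_train = np.array([[1.0, 2.0], [2.0, 3.0], [8.0, 8.0], [9.0, 7.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_classify(X_train, y_train, np.array([1.5, 2.5]), k=3))  # -> 0
```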

Definition of Nearest Neighbor

The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.

Nearest Neighbor Classification

Compute the distance between two points, e.g. with the Euclidean distance:

$$d(p, q) = \sqrt{\sum_{i} (p_i - q_i)^2}$$

Determine the class from the nearest-neighbor list

  • Take the majority vote of the class labels among the k nearest neighbors.
  • Weight the vote according to distance (weighted voting; see the sketch below).
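
For the weighted variant, one conventional choice (an assumption here, not specified in the notes) is to weight each vote by the inverse squared distance, w = 1/d². A sketch reusing the imports and arrays from the previous example:

```python
def knn_classify_weighted(X_train, y_train, x, k=3, eps=1e-12):
    """Distance-weighted k-NN: closer neighbors cast larger votes (w = 1/d^2)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = {}
    for i in nearest:
        # eps guards against division by zero when x coincides with a training record
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + 1.0 / (dists[i] ** 2 + eps)
    return max(votes, key=votes.get)
```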

Choosing the value of k:

  • If k is too small, the classifier is sensitive to noise points.
  • If k is too large, the neighborhood may include points from other classes.
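
In practice k is usually chosen by evaluating a few candidate values on held-out data. A minimal sketch using scikit-learn's KNeighborsClassifier and cross-validation (the dataset and candidate grid are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 3, 5, 7, 11, 21):
    # 5-fold cross-validated accuracy for each candidate k
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"k={k:2d}  mean CV accuracy = {acc:.3f}")
```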

Scaling issues

  • Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes.
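
For example, if one attribute ranges over tens of thousands (income) and another over tens (age), the first dominates the Euclidean distance unless both are rescaled. A z-score standardization sketch (one common scaling choice, assuming NumPy arrays):

```python
import numpy as np

def standardize(X):
    """Rescale each attribute (column) to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # leave constant attributes unchanged
    return (X - mu) / sigma
```

Note that mu and sigma should be computed on the training set and then applied unchanged to any record being classified.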

Problem with Euclidean measure:

  • High dimensional data: curse of dimensionality
  • Can produce counter-intuitive results.
  • Solution: Normalize the vectors to unit length
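
The normalization mentioned above divides each record by its Euclidean length so that all points lie on the unit sphere. A one-function sketch, again assuming NumPy arrays:

```python
def normalize_rows(X, eps=1e-12):
    """Scale each record (row) of X to unit Euclidean length."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)  # eps avoids dividing an all-zero row by 0
```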

k-NN classifiers are lazy learners

  • They do not build models explicitly.
  • Unlike eager learners such as decision tree induction and rule-based systems.
  • Classifying unknown records is relatively expensive.

Example: PEBLS

PEBLS: Parallel Exemplar-Based Learning System (Cost & Salzberg)

  • Works with both continuous and nominal features.

    • For nominal features, the distance between two nominal values $V_1$ and $V_2$ is computed with the Modified Value Difference Metric (MVDM): $d(V_1, V_2) = \sum_i \left| \frac{n_{1i}}{n_1} - \frac{n_{2i}}{n_2} \right|$, where $n_{1i}$ counts the records with value $V_1$ and class $i$, and $n_1$ counts all records with value $V_1$ (a sketch follows this list).
  • Each record is assigned a weight factor.

  • Number of nearest neighbors: k = 1.
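
A sketch of the MVDM computation from class-conditional counts (the count tables in the usage line are made up for illustration):

```python
def mvdm(counts_v1, counts_v2):
    """Modified Value Difference Metric between two nominal values.

    counts_v1[c] = number of training records with the first value and class c.
    """
    n1 = sum(counts_v1.values())
    n2 = sum(counts_v2.values())
    classes = set(counts_v1) | set(counts_v2)
    return sum(abs(counts_v1.get(c, 0) / n1 - counts_v2.get(c, 0) / n2)
               for c in classes)

# Illustrative counts for two values of a nominal attribute, per class
print(mvdm({"Yes": 2, "No": 2}, {"Yes": 0, "No": 4}))  # -> 1.0
```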

Distance between record X and record Y:

$$\Delta(X, Y) = w_X\, w_Y \sum_{i=1}^{d} d(X_i, Y_i)^2$$

where the weight factor is

$$w_X = \frac{\text{Number of times } X \text{ is used for prediction}}{\text{Number of times } X \text{ predicts correctly}}$$

so $w_X \approx 1$ if X makes accurate predictions most of the time, and $w_X > 1$ if X is not reliable for making predictions.
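
Putting the pieces together, a sketch of the PEBLS record distance (the function and parameter names are illustrative; `attr_dist` would be MVDM for nominal attributes, and the weights come from the prediction-history ratio defined above):

```python
def pebls_distance(x, y, w_x, w_y, attr_dist):
    """PEBLS distance: Delta(X, Y) = w_X * w_Y * sum_i d(X_i, Y_i)^2.

    attr_dist(i, xi, yi) returns the distance between the values of
    attribute i, e.g. mvdm(...) for nominal attributes.
    """
    total = sum(attr_dist(i, xi, yi) ** 2
                for i, (xi, yi) in enumerate(zip(x, y)))
    return w_x * w_y * total
```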
