Limitations of K-Nearest Neighbor Classification

K-Nearest Neighbor (KNN) is one of the most popular algorithms for pattern recognition, and many researchers have found that it achieves very good performance in experiments on different datasets. The traditional KNN classification algorithm has three limitations: (i) high computational complexity, because all training samples are used for every classification; (ii) performance that depends entirely on the training set and on the selection of k; and (iii) no weighting among samples.

Nearest neighbor finding is one of the most popular learning and classification techniques. It was introduced by Fix and Hodges and has proved to be a simple and powerful recognition algorithm. Cover and Hart showed that the decision rule performs well even when no explicit knowledge of the data is available. A simple generalization of this method is the K-NN rule, in which a new sample is assigned to the class most represented among its K nearest neighbors; a minimal sketch of this rule is given at the end of this essay.

Traditional KNN text classification has three limitations:

1. High computational complexity: to find the k nearest neighbors, the similarities between the test sample and all training samples must be calculated. When the number of training samples is small, the KNN classifier is no longer optimal, but when the training set contains a large number of samples, the classifier needs considerable time to compute the similarities. This problem can be addressed in three ways: reducing the dimensionality of the feature space; using smaller training sets; or using an improved algorithm that speeds up the neighbor search (see the KD-tree sketch below).

2. Dependence on the training set: the classifier is built from the training samples alone and uses no additional data. This makes the algorithm overly dependent on the training set, and it must be recomputed even after a small change in the training set.

3. No weight difference between samples: all training samples are treated identically; there is no distinction between samples from classes with few examples and samples from classes with many. This does not reflect real data, where samples are generally unevenly distributed.

The efficiency of the k-nearest-neighbor classifier (kNNC) depends largely on how well the k nearest neighbors are selected. A limitation of conventional kNNC is that once the neighbor-selection criteria are chosen, they remain unchanged, and this rigidity is unsuitable for many real-life prediction and classification tasks. An instance is described in the database by a number of attributes and their corresponding values, so the similarity between two instances is determined by the similarity of their attribute values. In real data, however, the similarities in different attributes do not carry the same weight with respect to a particular classification. Moreover, as more and more training data arrives, the similarity of a particular attribute value may come to matter more or less than before; the attribute-weighted sketch at the end of this essay illustrates this idea. For example, suppose we are trying to predict the outcome of a football match based on the results…
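
To make the K-NN rule concrete, here is a minimal sketch of the plain decision rule: a new sample is assigned to the class most represented among its k nearest training samples. The function name, the toy data, and the use of NumPy are my own illustration, not part of the discussion above.

```python
# Minimal sketch of the plain K-NN rule: majority vote among the k closest
# training samples. Names and toy data are illustrative only.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    # Euclidean distance from the query point to every training sample
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training samples
    nearest = np.argsort(distances)[:k]
    # Majority vote among the k nearest neighbors
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

# Toy example: two classes in a 2-D feature space
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [4.1, 3.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3))  # -> 0
```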
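The computational-complexity limitation follows directly from this brute-force rule: every query is compared against every training sample. One of the remedies listed above, an improved search algorithm, can be sketched with a KD-tree index. This assumes scikit-learn is available and is only meant to show the idea of avoiding a full pass over the training set per query.

```python
# Sketch of the "improved algorithm" speed-up: a KD-tree index avoids
# computing the similarity to every training sample for each query.
# Assumes scikit-learn is installed; dataset sizes are arbitrary.
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
X_train = rng.normal(size=(10_000, 8))   # 10,000 training samples, 8 features
queries = rng.normal(size=(100, 8))

# Brute force: one full pass over the training set per query, O(n * d) each
def brute_force_knn(X, q, k):
    d = np.linalg.norm(X - q, axis=1)
    return np.argsort(d)[:k]

# KD-tree: build the index once, then each query visits only part of the tree
tree = KDTree(X_train)
dist, ind = tree.query(queries, k=5)     # indices of the 5 nearest neighbors

# Both approaches return the same neighbors for the first query
assert set(ind[0]) == set(brute_force_knn(X_train, queries[0], 5))
```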
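The remaining two limitations, equal treatment of all samples and equal treatment of all attributes, are commonly addressed with weighting. The sketch below is my own illustration of that general idea, not a method taken from the essay's sources: each neighbor's vote is scaled by the inverse of its distance, and each attribute contributes to the distance according to its own (here hand-picked) weight.

```python
# Illustrative sketch of two weighting ideas discussed above:
#  - each neighbor's vote counts in proportion to 1 / distance (closer = stronger)
#  - each attribute contributes to the distance according to its own weight,
#    so attributes that matter more for the classification dominate the similarity.
# The weights here are hand-picked for illustration; in practice they could be
# estimated from the data.
import numpy as np
from collections import defaultdict

def weighted_knn_predict(X_train, y_train, x_query, k=3, attr_weights=None):
    if attr_weights is None:
        attr_weights = np.ones(X_train.shape[1])
    # Attribute-weighted Euclidean distance
    diffs = (X_train - x_query) * np.sqrt(attr_weights)
    distances = np.linalg.norm(diffs, axis=1)
    nearest = np.argsort(distances)[:k]
    # Distance-weighted voting: closer neighbors get larger votes
    scores = defaultdict(float)
    for i in nearest:
        scores[y_train[i]] += 1.0 / (distances[i] + 1e-9)
    return max(scores, key=scores.get)

X_train = np.array([[0.9, 0.3],    # class 'A'
                    [0.3, 1.0]])   # class 'B'
y_train = np.array(['A', 'B'])
q = np.array([0.0, 0.0])

# With equal attribute weights the 'A' sample is slightly closer, so the
# distance-weighted vote picks 'A'.
print(weighted_knn_predict(X_train, y_train, q, k=2))                      # 'A'
# If the first attribute is deemed much more important, the 'B' sample
# becomes the closer one and the prediction flips.
print(weighted_knn_predict(X_train, y_train, q, k=2,
                           attr_weights=np.array([10.0, 0.1])))            # 'B'
```

In practice the attribute weights could be re-estimated as new training data arrives, which speaks to the observation above that the importance of a particular attribute may change over time.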