Implementation of K-Nearest Neighbors (K-NN) in Python – Machine Learning

In this tutorial, we will walk through an implementation of the K-Nearest Neighbors (K-NN) classification algorithm in Python.

Importing the libraries

To begin the implementation, we first import the necessary libraries, NumPy and pandas.

import numpy as np
import pandas as pd

Importing the dataset

Next, we read the dataset. The breast cancer dataset used in this implementation has the following features: Sample code number, Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, Mitosis, and Class.
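If your copy of the CSV lacks a header row, the feature names listed above can be supplied when reading it. Here is a hedged sketch; the header=None assumption and the file name are mine, not confirmed by the original dataset file:

import pandas as pd

# Assign the column names explicitly; assumes the file has no header row
columns = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
           'Uniformity of Cell Shape', 'Marginal Adhesion',
           'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
           'Normal Nucleoli', 'Mitosis', 'Class']
dataset = pd.read_csv('Data.csv', header=None, names=columns)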

After reading the dataset, divide it into features and target: store the feature columns in X and the class labels in y.

dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
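Before moving on, a quick sanity check on what was loaded can save debugging later; a minimal sketch inspecting the dataframe and the resulting arrays:

# First few rows of the raw dataframe, and the shapes of features and target
print(dataset.head())
print(X.shape, y.shape)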

Splitting the dataset into the Training set and Test set

Once the dataset is read, divide it into two parts, training and testing, using the train_test_split function from sklearn. The test_size and random_state arguments are set to 0.30 and 42 respectively; you can change these as per your requirements.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state = 42)
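Because medical datasets are often imbalanced, you may also want the class proportions preserved in both splits. A hedged variant using train_test_split's stratify argument (an addition of mine, not part of the original tutorial):

from sklearn.model_selection import train_test_split

# Stratified variant: keeps the class ratio similar in train and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)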

Feature Scaling

Feature scaling is the process of bringing all features onto a comparable scale. In this case, the StandardScaler technique is used, which standardizes each feature to zero mean and unit variance. Note that the scaler is fitted on the training set only and then applied to the test set, so that no information leaks from the test data.

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
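StandardScaler computes z = (x - mean) / std per feature, with the statistics taken from the training set. A minimal sketch verifying this on the scaled training data:

import numpy as np

# After fit_transform, each training column should have mean ~0 and std ~1
print(np.round(X_train.mean(axis=0), 6))   # approximately all zeros
print(np.round(X_train.std(axis=0), 6))    # approximately all ones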

Training the K-Nearest Neighbors (K-NN) Classification model on the Training set

Once the dataset is scaled, the K-Nearest Neighbors (K-NN) classifier algorithm is used to create a model. The hyperparameters n_neighbors, metric, and p are set to 5, 'minkowski', and 2 respectively; the remaining hyperparameters are left at their default values.

from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)

classifier.fit(X_train, y_train)

K-Nearest Neighbors (K-NN) classifier model

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=None, n_neighbors=5, p=2, weights='uniform')
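With metric = 'minkowski' and p = 2, the classifier is measuring Euclidean distance between points; a minimal sketch illustrating that equivalence on two arbitrary vectors:

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

# Minkowski distance of order p: (sum(|a_i - b_i|^p))^(1/p)
p = 2
minkowski = np.sum(np.abs(a - b) ** p) ** (1 / p)

# For p = 2 this is exactly the Euclidean norm of the difference
euclidean = np.linalg.norm(a - b)
print(minkowski, euclidean)   # both ~3.6056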

Display the results (confusion matrix and accuracy)

Here, evaluation metrics such as the confusion matrix and accuracy are used to evaluate the performance of the model built using the K-Nearest Neighbors (K-NN) classifier.

from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
print("Accuracy:", accuracy_score(y_test, y_pred))

Output

[[125   2]
 [  5  73]]

Accuracy: 0.9658536585365853
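The value k = 5 was fixed up front; in practice you may want to tune it. A minimal sketch using scikit-learn's cross_val_score on the scaled training set (the candidate range of k values is an assumption, not part of the original tutorial):

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# 5-fold cross-validated accuracy for a few candidate values of k
for k in [1, 3, 5, 7, 9, 11]:
    knn = KNeighborsClassifier(n_neighbors=k, metric='minkowski', p=2)
    scores = cross_val_score(knn, X_train, y_train, cv=5)
    print(k, scores.mean())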

Summary:

In this tutorial, we understood the implementation of K-Nearest Neighbors (K-NN) in Python. If you like the tutorial, share it with your friends. Like the Facebook page for regular updates and the YouTube channel for video tutorials.
