Creating a custom KNN algorithm with the Breast Cancer dataset


chriz-ty/K-Nearest-Neighbors


K-Nearest-Neighbors Algorithm


INTRODUCTION

  • K-Nearest Neighbors (KNN) is a simple and intuitive supervised machine learning algorithm used for classification and regression tasks.
  • It is based on the idea that objects (data points) that are close to each other in a feature space are likely to belong to the same class or have similar numeric values.
  • In KNN, the "K" represents the number of nearest neighbors used to make predictions.
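The three ideas above (closeness in feature space, a fixed K, and a prediction made from the neighbors) can be sketched as a tiny classifier. This is a minimal illustration and not the repository's own implementation. The function name `knn_predict`, the toy points, and the use of a plain majority vote are assumptions made for this example.

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)


def knn_predict(train_points, train_labels, query, k=3):
    """Hypothetical helper: classify `query` by majority vote of its k nearest points."""
    # Pair each training point's Euclidean distance to the query with its label
    distances = [
        (dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort by distance only and keep the labels of the k closest points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    nearest_labels = [label for _, label in nearest]
    # The most common label among the k neighbors is the prediction
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy 2D data: two loose clusters labelled "a" and "b"
points = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, [2, 2], k=3))  # a
print(knn_predict(points, labels, [6, 5], k=3))  # b
```

`Counter.most_common` breaks ties by the order in which labels were first seen, so for two-class problems an odd K is a common choice to avoid tied votes.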

The Calculation of Distance

KNN most commonly uses Euclidean distance, which measures the straight-line distance between two points in a multidimensional space.

In the context of KNN, the Euclidean distance is used to quantify the similarity (or dissimilarity) between data points when determining the K nearest neighbors.

The Euclidean distance between two points, A and B, in a two-dimensional space (2D) with coordinates (x1, y1) and (x2, y2) can be calculated as:
$$d(A, B) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$
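As a worked example, take the two points used in the Python snippets of this README, A = (1, 3) and B = (2, 5), so that x1 = 1, y1 = 3, x2 = 2 and y2 = 5:

```latex
d(A, B) = \sqrt{(2 - 1)^2 + (5 - 3)^2} = \sqrt{1 + 4} = \sqrt{5} \approx 2.236
```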

In a multidimensional space (nD), where each data point consists of n features (attributes), the Euclidean distance between two data points $A = (a_1, \dots, a_n)$ and $B = (b_1, \dots, b_n)$ can be calculated as:
$$d(A, B) = \sqrt{\sum_{i=1}^{n} (b_i - a_i)^2}$$
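The n-dimensional formula translates directly into a sum over the features. The sketch below assumes both points are equal-length sequences of numbers; the function name `euclidean_distance_nd` is hypothetical and used only for illustration.

```python
from math import sqrt


def euclidean_distance_nd(a, b):
    """Hypothetical helper: Euclidean distance between two points with n features."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of features")
    # Sum the squared difference of each feature, then take the square root
    return sqrt(sum((b_i - a_i) ** 2 for a_i, b_i in zip(a, b)))


# 2D case: sqrt((2 - 1)^2 + (5 - 3)^2) = sqrt(5)
print(euclidean_distance_nd([1, 3], [2, 5]))
# 3D case: sqrt(1 + 4 + 4) = 3.0
print(euclidean_distance_nd([0, 0, 0], [1, 2, 2]))  # 3.0
```

The same function works unchanged for data with many features, such as the rows of a tabular dataset, as long as every row has the same length.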

Calculating the Euclidean distance in Python

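The first method (direct equation):
from math import sqrt
# Two points in 2D space, written as [x, y]
plot1 = [1, 3]
plot2 = [2, 5]
# Square each coordinate difference before summing, then take the square root
euclidean_distance = sqrt((plot1[0] - plot2[0]) ** 2 + (plot1[1] - plot2[1]) ** 2)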
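The second method (using NumPy, suited to large datasets):
import numpy as np
# The same two points as above
plot1 = [1, 3]
plot2 = [2, 5]
# The norm of the difference vector is the Euclidean (L2) distance
euclidean_distance = np.linalg.norm(np.array(plot1) - np.array(plot2))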

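The second method is the one typically used in KNN implementations and is preferred for large datasets: NumPy performs the arithmetic in compiled, vectorized code, so computing distances over many points or many features is generally faster than the pure-Python expression of the first method. For a single pair of 2D points the difference is negligible.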
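Inside a KNN algorithm, the distance is needed between one query point and every training point, and this is where NumPy pays off: one vectorized call computes all of those distances at once instead of looping in Python. The sketch below assumes the training data is a 2D array with one row per sample; the function name `k_nearest_indices` and the sample values are hypothetical.

```python
import numpy as np


def k_nearest_indices(train, query, k):
    """Hypothetical helper: row indices of the k training points closest to `query`."""
    # Broadcasting subtracts `query` from every row; axis=1 takes each row's norm
    distances = np.linalg.norm(train - query, axis=1)
    # argsort orders the row indices by distance; keep the first k
    return np.argsort(distances)[:k]


train = np.array([[1.0, 3.0], [2.0, 5.0], [8.0, 8.0], [1.5, 3.5]])
query = np.array([1.0, 3.0])

print(k_nearest_indices(train, query, k=2))  # [0 3]
```

`np.argsort` sorts all n distances. For very large training sets, `np.argpartition(distances, k)[:k]` returns the k smallest (in arbitrary order) without a full sort, provided k is smaller than the number of rows.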
