Machine Learning Classification & Clustering Projects

Overview

These projects were part of my data science coursework, where I implemented fundamental machine learning algorithms both from scratch and with libraries. The focus was on understanding the underlying mathematics and evaluation metrics rather than just using pre-built tools.

Projects Included

K-Nearest Neighbors (KNN)

Implemented the KNN algorithm by hand to understand distance metrics and classification logic
Worked with various distance metrics (Euclidean, Manhattan, etc.)
Analyzed how the choice of k affects classification performance
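
To illustrate the KNN logic summarized above, here is a minimal sketch in plain Python. It is not the coursework code: the function names (euclidean, manhattan, knn_predict), the majority-vote tie handling, and the toy data are all assumptions made up for this example. The data includes one "b" point placed close to the "a" group so that the effect of k is visible.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line (L2) distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block (L1) distance between two equal-length points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_points, train_labels, query, k=3, distance=euclidean):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Pair each training point's distance to the query with that point's label.
    distances = [
        (distance(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort on distance only; sorted() is stable, so equal distances keep training order.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three "a" points, one "b" point near them, two distant "b" points.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (3.0, 2.0), (6.0, 6.0), (7.0, 7.5)]
labels = ["a", "a", "a", "b", "b", "b"]
query = (2.6, 1.9)

for k in (1, 3, 5):
    print(k, knn_predict(points, labels, query, k=k))
```

With k=1 the single closest neighbor is the nearby "b" point, so the query is labeled "b"; with k=3 or k=5 the surrounding "a" points outvote it. Passing distance=manhattan swaps the metric without changing the voting logic.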

K-Means Clustering

Implemented the K-Means clustering algorithm from scratch
Experimented with different initialization methods
Analyzed convergence behavior and cluster quality
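
The K-Means steps above can be sketched as follows. This is an illustrative version of the standard Lloyd iteration, not the coursework implementation. It assumes one particular initialization (picking k distinct data points at random), uses the sum of squared distances to assigned centroids (inertia) as the quality measure, and stops when assignments no longer change. The function names, the seed parameter, and the toy data are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, max_iters=100, seed=0):
    """Return (centroids, assignments, inertia, iterations); assumes max_iters >= 1."""
    rng = random.Random(seed)          # fixed seed keeps a run reproducible
    centroids = rng.sample(points, k)  # assumed init: k randomly chosen data points
    assignments = None
    iterations = 0
    for iterations in range(1, max_iters + 1):
        # Assignment step: attach every point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: this pass changed nothing
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    inertia = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignments))
    return centroids, assignments, inertia, iterations


# Hypothetical toy data: two well-separated groups of three points.
data = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]

for seed in range(3):
    centroids, assignments, inertia, iterations = kmeans(data, k=2, seed=seed)
    print(seed, assignments, round(inertia, 3), iterations)
```

Running the loop with several seeds is a simple way to see how the starting centroids affect the number of passes and the final inertia. The iteration count includes the final pass that confirms no assignment changed.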

Evaluation Metrics

Built confusion matrices to analyze classification performance
Calculated precision, recall, and F-scores manually
Compared different evaluation approaches for various problem types
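
As a sketch of the manual metric calculations described above, the following computes the four confusion-matrix counts for a binary problem and derives precision, recall, and the F-beta score from them. The function names and example labels are made up for illustration, and returning 0.0 when a denominator is zero is an assumed convention rather than something taken from the coursework.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if truth == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Precision, recall, and F-beta; 0.0 is returned where a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta


# Hypothetical labels: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_fbeta(tp, fp, fn)
print("tp, fp, fn, tn:", tp, fp, fn, tn)
print("precision, recall, F1:", precision, recall, round(f1, 3))
```

Here precision is tp / (tp + fp) and recall is tp / (tp + fn). Setting beta above 1 weights recall more heavily, and setting it below 1 favors precision, which is one way to match the metric to the problem type.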

Technical Implementation

All projects were built in Python, with pandas for data manipulation and scikit-learn as a reference for comparison and validation. The emphasis was on building reproducible experiments and understanding the mathematical foundations of each algorithm.

Key Learnings

Deep understanding of distance metrics and their impact on algorithm performance
Hands-on experience with evaluation metrics and when to use each
Ability to implement algorithms from scratch as well as with libraries
Experience building reproducible experiments and applying sound experimental methodology