$ cat projects/ml-classification-clustering/README.md
Machine Learning Classification & Clustering Projects
Data science coursework projects implementing K-Nearest Neighbors, K-Means clustering, and evaluation metrics by hand and in code. Worked with confusion matrices, precision/recall, F-scores, and distance metrics.
Machine Learning, Data Science, Classification
Overview
These projects were part of my data science coursework, where I implemented fundamental machine learning algorithms both from scratch and with libraries. The focus was on understanding the underlying mathematics and evaluation metrics rather than only calling pre-built tools.
Projects Included
K-Nearest Neighbors (KNN)
- Implemented KNN algorithm by hand to understand distance metrics and classification logic
- Worked with several distance metrics, including Euclidean and Manhattan
- Analyzed how k-value affects classification performance
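As an illustration of the ideas above, here is a minimal KNN sketch using only the standard library. It is not the coursework code: the function names (`euclidean`, `manhattan`, `knn_predict`) and the toy data are assumptions made for this example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_points, train_labels, query, k=3, distance=euclidean):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training indices by distance to the query point.
    ranked = sorted(
        range(len(train_points)),
        key=lambda i: distance(train_points[i], query),
    )
    nearest_labels = [train_labels[i] for i in ranked[:k]]
    # Majority vote over the k nearest labels.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two loose groups labeled "a" and "b".
points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
labels = ["a", "a", "b", "b"]

prediction = knn_predict(points, labels, (1.2, 1.4), k=3)
print(prediction)
```

Passing `distance=manhattan` or changing `k` makes it easy to see how the metric and the neighborhood size change the vote. An even `k` can produce ties, which this sketch resolves in favor of the label seen first among the nearest neighbors.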
K-Means Clustering
- Implemented K-Means clustering algorithm from scratch
- Experimented with different initialization methods
- Analyzed convergence behavior and cluster quality
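The K-Means loop described above can be sketched roughly as follows. This is an illustrative version, not the original coursework code: it assumes random initialization from the data points (one of several possible initialization methods), and the names `kmeans`, `squared_distance`, and `mean_point` are made up for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance; enough for comparing which centroid is closer."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iters=100, seed=0):
    """Return (centroids, assignments) after the assignments stop changing."""
    rng = random.Random(seed)
    # Assumed initialization: pick k distinct data points as starting centroids.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Convergence check: stop once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = mean_point(members)
    return centroids, assignments


# Hypothetical toy data: two well-separated groups.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignments = kmeans(data, k=2, seed=0)
print(centroids, assignments)
```

Fixing `seed` keeps runs repeatable, and varying it is a simple way to observe how sensitive the final clusters are to the starting centroids.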
Evaluation Metrics
- Built confusion matrices to analyze classification performance
- Calculated precision, recall, and F-scores manually
- Compared different evaluation approaches for various problem types
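For reference, the manual calculations above follow the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F-beta = (1 + beta^2) * precision * recall / (beta^2 * precision + recall). A small sketch for the binary case, with hypothetical function names and made-up labels:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem with the given positive label."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Precision, recall, and F-beta; each is 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_fbeta(tp, fp, fn)
print(tp, fp, fn, tn, precision, recall, f1)
```

Setting `beta` above 1 weights recall more heavily and below 1 weights precision more heavily, which is one way to match the metric to the problem type.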
Technical Implementation
All projects were built using Python with pandas for data manipulation and scikit-learn for comparison and validation. The emphasis was on building reproducible experiments and understanding the mathematical foundations of each algorithm.
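One common ingredient of a reproducible experiment is a seeded train/test split, so that every run evaluates on the same held-out rows. The sketch below shows the idea with the standard library only; the helper name `train_test_split_seeded` and its defaults are assumptions for this example. scikit-learn's `train_test_split` with a fixed `random_state` serves the same purpose.

```python
import random


def train_test_split_seeded(rows, test_fraction=0.25, seed=42):
    """Shuffle row indices with a fixed seed and split them into train and test."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = int(round(len(rows) * test_fraction))
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


# Placeholder rows; in practice these would be records from a dataset.
rows = list(range(20))
train_rows, test_rows = train_test_split_seeded(rows, test_fraction=0.25, seed=42)
print(len(train_rows), len(test_rows))
```

Because the seed is an explicit argument, the same split can be reused when comparing a hand-written implementation against a library one.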
Key Learnings
- Deep understanding of distance metrics and their impact on algorithm performance
- Hands-on experience with evaluation metrics and when to use each
- Ability to implement algorithms from scratch as well as with libraries
- Building reproducible experiments and proper experimental methodology
$ cat package.json
Tech stack
Python, pandas, scikit-learn