$ cat projects/ml-classification-clustering/README.md
Sales & Economic Impact Analysis — ML Classification (School)
Academic ML and data science project at Cal State Fullerton — classification and regression on sales/estimate data to analyze geographic demand and macroeconomic drivers.
School Project, Data Science, Machine Learning, Geospatial Analysis
Overview
An applied data science and machine learning coursework project at Cal State Fullerton. It analyzes sales and estimate data to understand geographic demand patterns and to evaluate whether macroeconomic conditions contributed to a downturn in sales.
The project involved end-to-end data ownership: sourcing raw business data, cleaning and transforming it, enriching it with geospatial and economic data, and applying statistical and machine learning techniques to explore potential drivers of sales performance.
Data Acquisition & Preparation
- Obtained raw sales and estimate data directly from business records
- Cleaned and normalized inconsistent real-world data (missing values, formatting issues, incomplete addresses)
- Converted customer addresses into latitude and longitude coordinates for spatial analysis
- Structured datasets to support geographic, temporal, and economic comparisons
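The cleaning steps above can be illustrated with a minimal pandas sketch. The column names (`customer`, `address`, `amount`, `status`) and the sample rows are hypothetical, chosen only to show the kinds of fixes described; the project's actual schema is not documented here. Geocoding is left out because it depends on an external service.

```python
import pandas as pd

# Hypothetical raw records; column names and values are illustrative only.
raw = pd.DataFrame({
    "customer": [" Acme Co ", "beta llc", None, "Gamma Inc"],
    "address": ["12 Main St, Fullerton, CA", "", "98 Oak Ave, Anaheim, CA", None],
    "amount": ["$1,200.50", "850", "n/a", "2,000"],
    "status": ["Sold", "lost", "SOLD ", "Lost"],
})


def clean_records(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize text fields, parse amounts, and flag usable addresses."""
    out = df.copy()
    # Trim stray whitespace and normalize casing in text fields.
    out["customer"] = out["customer"].str.strip().str.title()
    out["status"] = out["status"].str.strip().str.lower()
    # Remove currency symbols and thousands separators, then coerce to numbers;
    # unparseable entries such as "n/a" become NaN.
    out["amount"] = pd.to_numeric(
        out["amount"].str.replace(r"[$,]", "", regex=True), errors="coerce"
    )
    # Treat empty address strings as missing, then flag rows that could be geocoded.
    out["address"] = out["address"].mask(out["address"] == "")
    out["has_address"] = out["address"].notna()
    return out


cleaned = clean_records(raw)
print(cleaned)
```

Rows where `has_address` is false would need manual correction or exclusion before any address-to-coordinate conversion.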
Geospatial Analysis
- Visualized sales and estimate activity using latitude/longitude mappings
- Explored geographic patterns in sales volume and conversion rates
- Identified regions with higher concentrations of successful sales versus lost estimates
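One simple way to compare regions, sketched below, is to bin geocoded estimates into a coarse latitude/longitude grid and compute a conversion rate per cell. The grid approach, the `cell_size` value, the column names, and the sample points are all assumptions for illustration, not the project's documented method.

```python
import numpy as np
import pandas as pd

# Hypothetical geocoded estimates; coordinates and outcomes are made up.
estimates = pd.DataFrame({
    "lat": [33.871, 33.874, 33.836, 33.832, 33.839, 33.878],
    "lon": [-117.924, -117.921, -117.914, -117.918, -117.911, -117.926],
    "sold": [1, 1, 0, 0, 1, 0],  # 1 = converted to a sale, 0 = lost estimate
})


def conversion_by_cell(df: pd.DataFrame, cell_size: float = 0.05) -> pd.DataFrame:
    """Count estimates and compute the conversion rate per grid cell."""
    out = df.copy()
    # Integer grid indices avoid floating-point noise in the group keys.
    out["cell_y"] = np.floor(out["lat"] / cell_size).astype(int)
    out["cell_x"] = np.floor(out["lon"] / cell_size).astype(int)
    return (
        out.groupby(["cell_y", "cell_x"])["sold"]
        .agg(estimates="count", conversion_rate="mean")
        .reset_index()
    )


summary = conversion_by_cell(estimates)
print(summary)
```

Cells with few estimates give unstable rates, so a minimum count per cell is worth applying before reading anything into the differences.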
Economic Analysis
- Integrated macroeconomic indicators to investigate external influences on sales performance
- Applied LASSO regression to analyze relationships between economic variables and observed sales declines
- Evaluated whether broader economic conditions aligned with changes in customer conversion behavior
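The LASSO step can be sketched with scikit-learn on synthetic data. The indicator names, the `alpha` value, and the generated numbers below are assumptions for illustration only; they are not the project's data or results. The point is that the L1 penalty shrinks weak predictors toward zero, which helps show which indicators move with sales.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical indicator names; the real project's variables are not listed here.
indicators = ["interest_rate", "unemployment", "cpi", "housing_starts"]

# Synthetic monthly data: only the first two indicators drive the target.
rng = np.random.default_rng(0)
n_months = 60
X = rng.normal(size=(n_months, len(indicators)))
y = -3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n_months)

# Standardize first so the L1 penalty treats every indicator on the same scale.
model = make_pipeline(StandardScaler(), Lasso(alpha=0.3))
model.fit(X, y)

coefs = dict(zip(indicators, model.named_steps["lasso"].coef_))
for name, value in coefs.items():
    print(f"{name:>15}: {value: .3f}")
```

In practice `alpha` would usually be chosen by cross-validation (for example with `LassoCV`) rather than fixed by hand, and a non-zero coefficient indicates association, not causation.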
Exploratory Machine Learning
- Applied K-Means clustering to explore geographic groupings within the sales data
- Used clustering results as an exploratory tool to visualize regional structure rather than as a final predictive model
- Treated clustering as a hypothesis-generation step to guide further analysis
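A minimal sketch of the exploratory clustering step, using scikit-learn's `KMeans` on synthetic coordinates. The three centers, the spread, and `n_clusters=3` are assumptions chosen to make the example self-contained; the project's actual data and cluster count are not documented here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic customer locations scattered around three made-up centers.
rng = np.random.default_rng(42)
centers = np.array([[33.87, -117.92], [33.68, -117.83], [34.05, -118.24]])
points = np.vstack([c + rng.normal(scale=0.02, size=(50, 2)) for c in centers])

# Treating latitude/longitude as Euclidean coordinates is an approximation
# that is reasonable only over a small region.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

# Summarize each cluster: size and centroid, as a starting point for mapping.
for k in range(kmeans.n_clusters):
    lat, lon = kmeans.cluster_centers_[k]
    print(f"cluster {k}: n={np.sum(labels == k)}, centroid=({lat:.3f}, {lon:.3f})")
```

Consistent with the exploratory framing above, the labels are best used to color a map or to split later summaries by region, not as a prediction target.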
Key Takeaways
- Demonstrated how messy, real business data can be transformed into structured analytical datasets
- Showed how geospatial visualization can reveal regional demand patterns not visible in raw tables
- Explored the role of macroeconomic indicators in business performance using regularized regression
- Highlighted the difference between exploratory machine learning and production modeling
$ cat package.json
Tech stack
Python, pandas, NumPy, scikit-learn, Geospatial mapping, Regression analysis