House Price Predictor

A straightforward tabular ML project to sharpen fundamentals and round out the portfolio.

⚡ Why This Project?

House price prediction is a classic regression problem: clean tabular data, sensible preprocessing, a progression from baseline to stronger models, and clear evaluation with reproducible results. It is not a passion project, just a tidy way to demonstrate end-to-end ML hygiene.

🎯 Objectives

Data cleaning and feature engineering on a well-known housing dataset.
Baseline models (Linear/ElasticNet) → tree-based models (RandomForest/XGBoost/LightGBM).
Cross-validation, simple hyperparameter tuning, and holdout test evaluation.
Interpretability: feature importances/SHAP summary for top drivers.
Lightweight dashboard to show predictions and metric snapshots.
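
As one illustration of the interpretability objective, the sketch below uses permutation importance, a model-agnostic alternative to the tree importances and SHAP summaries named above (it is a stand-in here, not the planned method). It measures how much the error grows when a single feature column is shuffled. The `predict` function, the three unnamed features, and the synthetic rows are all hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Return the increase in MSE caused by shuffling each feature column."""
    rng = random.Random(seed)
    baseline = mse(targets, [predict(r) for r in rows])
    importances = []
    for j in range(n_features):
        # Shuffle only column j, leaving every other column untouched.
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mse(targets, [predict(r) for r in shuffled]) - baseline)
    return importances


# Hypothetical fitted model: strong dependence on feature 0, weak on
# feature 1, none on feature 2.
def predict(row):
    return 1000.0 * row[0] + 10.0 * row[1]


data_rng = random.Random(1)
rows = [
    (data_rng.uniform(50, 250), data_rng.uniform(0, 50), data_rng.uniform(0, 1))
    for _ in range(200)
]
targets = [predict(r) for r in rows]

importances = permutation_importance(predict, rows, targets, n_features=3)
# Feature indices ordered from most to least important.
ranking = sorted(range(3), key=lambda j: importances[j], reverse=True)
```

In a real run the importances would normally be computed on held-out rows rather than on the training data, so that the ranking reflects generalization rather than memorization.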

🧪 MVP Scope

Notebook + scripts for ingest, preprocess, train, evaluate.
2–3 baseline models with k-fold CV; pick a champion by RMSE.
Simple FastAPI endpoint that wraps predict() and accepts CSV uploads as input.
Mini Next.js page to display RMSE/MAE/R² and top features.
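
The k-fold comparison and champion selection in the scope above could look roughly like this. It is a minimal standard-library sketch, not the project's pipeline: the two candidate models (a mean predictor and a one-feature least-squares line), the synthetic `area` and `price` data, and all function names are assumptions chosen for illustration.

```python
import math
import random


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices once and split them into k near-equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_mean(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """One-feature ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def cross_val_rmse(fit, xs, ys, k=5, seed=0):
    """Fit on k-1 folds, score on the held-out fold, and repeat for each fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [model(xs[i]) for i in held_out]
        scores.append(rmse([ys[i] for i in held_out], preds))
    return scores


# Synthetic data (hypothetical): price grows linearly with area, plus noise.
data_rng = random.Random(42)
area = [data_rng.uniform(50, 250) for _ in range(100)]
price = [20000 + 1000 * a + data_rng.gauss(0, 5000) for a in area]

candidates = {"mean_baseline": fit_mean, "linear": fit_line}
cv_scores = {name: cross_val_rmse(fit, area, price) for name, fit in candidates.items()}
mean_rmse = {name: sum(s) / len(s) for name, s in cv_scores.items()}

# The champion is the candidate with the lowest mean RMSE across folds.
champion = min(mean_rmse, key=mean_rmse.get)
```

Using the same seed for every candidate keeps the fold assignment identical across models, so differences in RMSE come from the models rather than from the split.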

🛠️ Tools & Stack (Planned)

Python: pandas, scikit-learn, (optionally) xgboost/lightgbm
FastAPI for a lightweight prediction service
SQLite/PostgreSQL (optional) for experiment logs
Next.js mini dashboard for metrics

📡 Data Notes

Use a well-known public housing dataset to keep the scope predictable and the results comparable. The focus is on clean preprocessing (missing values, categorical encoding, skew correction) and reproducible evaluation.
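
The three preprocessing steps named above can be sketched with the standard library alone. The column names and values below are hypothetical toy data, and the specific choices (median imputation, one-hot encoding, `log1p` for skew) are assumptions about how each step might be handled, not a settled design.

```python
import math
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Encode a categorical column as one 0/1 column per category."""
    categories = sorted(set(values))
    return {f"is_{c}": [1 if v == c else 0 for v in values] for c in categories}


def log1p_column(values):
    """Compress a right-skewed, non-negative column with log(1 + x)."""
    return [math.log1p(v) for v in values]


# Hypothetical toy columns; names and values are illustrative only.
lot_area = [8450.0, None, 11250.0, 9550.0]
district = ["north", "south", "north", "east"]
sale_price = [208500.0, 181500.0, 223500.0, 140000.0]

lot_area_filled = impute_median(lot_area)   # missing value handling
district_encoded = one_hot(district)        # categorical encoding
log_price = log1p_column(sale_price)        # skew correction on the target
```

When this runs inside cross-validation, the median and the category list should be learned from the training fold only and then applied to the validation fold, which avoids leaking validation data into preprocessing. Predictions made on the log scale are mapped back to prices with `math.expm1`.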

📈 Metrics

RMSE (primary), MAE, and R²
Train vs validation gap (over/underfit check)
K-fold stability (variance across folds)
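
For reference, the metrics listed above can be written out directly. This is a small standard-library sketch; the function names `fit_gap` and `fold_stability` and the sample numbers are hypothetical, and the population standard deviation is one assumed way to summarize variance across folds.

```python
import math
from statistics import mean, pstdev


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large misses more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error: the average size of a miss."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def fit_gap(train_rmse, val_rmse):
    """Validation minus training RMSE; a large positive gap suggests overfitting."""
    return val_rmse - train_rmse


def fold_stability(fold_scores):
    """Mean and population standard deviation of a metric across folds."""
    return mean(fold_scores), pstdev(fold_scores)


# Hypothetical values, in thousands, purely for illustration.
y_true = [200.0, 150.0, 300.0, 250.0]
y_pred = [210.0, 140.0, 290.0, 260.0]

scores = {"rmse": rmse(y_true, y_pred), "mae": mae(y_true, y_pred), "r2": r2(y_true, y_pred)}
gap = fit_gap(train_rmse=8.0, val_rmse=12.5)
fold_mean, fold_std = fold_stability([10.0, 12.0, 11.0, 13.0, 14.0])
```

A high training error alongside a small gap points toward underfitting, while a low training error with a large gap points toward overfitting. A fold standard deviation that is large relative to the fold mean suggests the score depends heavily on the split.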

🗺️ Roadmap (High Level)

Dataset setup + EDA
Preprocessing + baselines
Tree models + simple tuning
Interpretability (importances/SHAP)
FastAPI + mini metrics page
