House Price Predictor
A straightforward tabular ML project to sharpen fundamentals and round out the portfolio.
⚡ Why This Project?
House price prediction is a classic regression problem: clean tabular data, sensible preprocessing, a progression from baseline to stronger models, and clear evaluation with reproducible results. It is not a passion project, just a tidy way to demonstrate end-to-end ML hygiene.
🎯 Objectives
- Data cleaning and feature engineering on a well-known housing dataset.
- Baseline models (Linear/ElasticNet) → tree-based models (RandomForest/XGBoost/LightGBM).
- Cross-validation, simple hyperparameter tuning, and holdout test evaluation.
- Interpretability: feature importances/SHAP summary for top drivers.
- Lightweight dashboard to show predictions and metric snapshots.
🧪 MVP Scope
- Notebook + scripts for ingest, preprocess, train, evaluate.
- 2–3 baseline models with k-fold CV; pick a champion by RMSE.
- Simple FastAPI endpoint that wraps predict() and accepts a CSV upload as input.
- Mini Next.js page to display RMSE/MAE/R² and top features.
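The "k-fold CV, pick a champion by RMSE" step could look roughly like the sketch below. It assumes scikit-learn is installed. Since no dataset is fixed yet, it uses synthetic data from make_regression as a stand-in, and the candidate names and hyperparameters are illustrative rather than tuned.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the housing data (assumption: real dataset not chosen yet).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Illustrative candidates; hyperparameters are placeholders, not tuned values.
candidates = {
    "linear": LinearRegression(),
    "elasticnet": ElasticNet(alpha=0.1, l1_ratio=0.5),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Fixed seed so every model sees the same folds and runs are reproducible.
cv = KFold(n_splits=5, shuffle=True, random_state=0)


def cv_rmse(model, X, y, cv):
    """Return per-fold RMSE (scikit-learn reports it negated, so flip the sign)."""
    scores = cross_val_score(
        model, X, y, cv=cv, scoring="neg_root_mean_squared_error"
    )
    return -scores


results = {name: cv_rmse(model, X, y, cv) for name, model in candidates.items()}

# Champion = lowest mean RMSE across folds.
champion = min(results, key=lambda name: results[name].mean())

for name, fold_rmse in results.items():
    print(f"{name}: mean RMSE={fold_rmse.mean():.2f}, std={fold_rmse.std():.2f}")
print("champion:", champion)
```

Sharing one KFold object across candidates keeps the comparison fair, because every model is scored on identical splits.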
🛠️ Tools & Stack (Planned)
- Python: pandas, scikit-learn, (optionally) xgboost/lightgbm
- FastAPI for a lightweight prediction service
- SQLite/PostgreSQL (optional) for experiment logs
- Next.js mini dashboard for metrics
📡 Data Notes
Pick a well-known public housing dataset to keep the scope predictable and the results comparable. The focus is on clean preprocessing steps (missing values, categorical encoding, skew fixes) and reproducible evaluation.
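The three preprocessing steps named above could be wired together roughly as follows. This is a sketch assuming pandas and scikit-learn. The toy frame and its column names (lot_area, year_built, neighborhood) are hypothetical, and log1p is just one common choice for reducing right skew.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder

# Hypothetical toy frame; real column names depend on the dataset chosen.
df = pd.DataFrame({
    "lot_area": [8450.0, 9600.0, np.nan, 11250.0],
    "year_built": [2003.0, 1976.0, 2001.0, np.nan],
    "neighborhood": pd.Series(["A", "B", np.nan, "A"], dtype=object),
})

skewed_cols = ["lot_area"]          # right-skewed numeric features
numeric_cols = ["year_built"]       # other numeric features
categorical_cols = ["neighborhood"]

# Missing values -> median, then log1p to compress the long right tail.
skewed_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("log1p", FunctionTransformer(np.log1p)),
])

# Missing values -> median only.
numeric_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
])

# Missing values -> most frequent category, then one-hot encoding.
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer(
    [
        ("skewed", skewed_pipe, skewed_cols),
        ("num", numeric_pipe, numeric_cols),
        ("cat", categorical_pipe, categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

X_ready = preprocess.fit_transform(df)
print(X_ready.shape)
```

Keeping the steps inside one ColumnTransformer means the same fitted object can be reused at prediction time, so imputation values and categories are learned from training data only.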
📈 Metrics
- RMSE (primary), MAE, and R²
- Train vs validation gap (over/underfit check)
- K-fold stability (variance across folds)
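For reference, the metrics above can be written out in a few lines. This sketch assumes NumPy. The helper names (fit_gap, fold_stability) and the sample numbers are made up for illustration and are not results from this project.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large misses more heavily."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error: average miss in the target's own units."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def fit_gap(train_rmse, val_rmse):
    """Validation minus train RMSE; a large positive gap hints at overfitting."""
    return val_rmse - train_rmse


def fold_stability(fold_rmses):
    """Mean and sample standard deviation of RMSE across folds."""
    arr = np.asarray(fold_rmses, float)
    return {"mean": float(arr.mean()), "std": float(arr.std(ddof=1))}


# Made-up values, purely to exercise the helpers.
y_true = [200.0, 150.0, 320.0, 275.0]
y_pred = [210.0, 140.0, 300.0, 280.0]

print(rmse(y_true, y_pred))  # 12.5
print(mae(y_true, y_pred))   # 11.25
print(round(r2(y_true, y_pred), 4))
print(fit_gap(train_rmse=10.0, val_rmse=12.5))  # 2.5
print(fold_stability([12.0, 13.0, 11.0, 12.5, 11.5]))
```

In practice scikit-learn's own metric functions would do the same job; writing them out mainly makes the definitions explicit.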
🗺️ Roadmap (High Level)
- Dataset setup + EDA
- Preprocessing + baselines
- Tree models + simple tuning
- Interpretability (importances/SHAP)
- FastAPI + mini metrics page