House Price Predictor
A straightforward tabular ML project to sharpen fundamentals and round out the portfolio.
Python · pandas / scikit-learn · XGBoost / LightGBM · SHAP · FastAPI
Planned Project
Why This Project?
It's a classic: clean tabular data, sensible preprocessing, a progression from baseline to stronger models, and clear evaluation with reproducible results. Not a passion project, just a tidy way to demonstrate end-to-end ML hygiene.
Objectives
- Data cleaning and feature engineering on a well-known housing dataset.
- Baseline models (Linear/ElasticNet) → tree-based models (RandomForest/XGBoost/LightGBM).
- Cross-validation, simple hyperparameter tuning, and holdout test evaluation.
- Interpretability: feature importances and a SHAP summary of the top price drivers.
- Lightweight dashboard to show predictions and metric snapshots.
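For the interpretability objective, scikit-learn's permutation importance is a dependency-light place to start. The plan calls for SHAP, so treat this as a swapped-in stand-in until `shap` is wired up. The data is synthetic and the column names (`sqft`, `rooms`, `noise`) are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption, not the real housing set):
# price depends strongly on "sqft", moderately on "rooms", not at all on "noise".
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
feature_names = ["sqft", "rooms", "noise"]
y = 5.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle one column at a time on the validation split and measure how much
# the model's default score (R^2) drops; bigger drop = more important feature.
result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)
ranking = sorted(zip(feature_names, result.importances_mean), key=lambda p: p[1], reverse=True)
for name, score in ranking:
    print(f"{name:>6}: {score:.3f}")
```

Computing importance on held-out data rather than the training split keeps the ranking from rewarding features the forest merely memorised.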
MVP Scope
- Notebook + scripts for ingest, preprocess, train, evaluate.
- 2–3 baseline models with k-fold CV; pick a champion by RMSE.
- Simple FastAPI endpoint for predict(), taking a CSV upload as input.
- Mini Next.js page to display RMSE/MAE/R² and top features.
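The model-selection step could look roughly like this: score each candidate with 5-fold CV on a development split, keep the one with the lowest mean RMSE as champion, then touch the holdout set exactly once. This is a minimal sketch, assuming synthetic data from `make_regression` in place of the real housing set and illustrative hyperparameters. XGBoost's `XGBRegressor` or LightGBM's `LGBMRegressor` should slot into the same dict, since both follow the scikit-learn estimator API.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic stand-in data (assumption); replace with the preprocessed housing features.
X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=42)
# The holdout test set plays no part in model selection.
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

candidates = {
    "linear": LinearRegression(),
    "elasticnet": ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10_000),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
}
cv = KFold(n_splits=5, shuffle=True, random_state=42)

cv_rmse = {}
for name, model in candidates.items():
    # scikit-learn maximises scores, so RMSE is exposed as a negated scorer.
    scores = -cross_val_score(model, X_dev, y_dev, cv=cv, scoring="neg_root_mean_squared_error")
    cv_rmse[name] = (scores.mean(), scores.std())
    print(f"{name:>13}: RMSE {scores.mean():.2f} ± {scores.std():.2f}")

# Champion = lowest mean CV RMSE; refit on the full dev split, then score the holdout once.
champion_name = min(cv_rmse, key=lambda k: cv_rmse[k][0])
champion = candidates[champion_name].fit(X_dev, y_dev)
test_rmse = float(np.sqrt(mean_squared_error(y_test, champion.predict(X_test))))
print(f"champion={champion_name}, holdout RMSE={test_rmse:.2f}")
```

Reporting the fold standard deviation next to the mean also covers the k-fold stability check listed under Metrics.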
Tools & Stack (Planned)
- Python: pandas, scikit-learn, (optionally) xgboost/lightgbm
- FastAPI for a lightweight prediction service
- SQLite/PostgreSQL (optional) for experiment logs
- Next.js mini dashboard for metrics
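If the optional experiment log lands in SQLite, Python's built-in `sqlite3` module is enough for a first version. The table name, columns, and helper functions below are hypothetical, and the logged numbers are placeholders to exercise the code, not results.

```python
import json
import sqlite3
from datetime import datetime, timezone


def init_log(conn: sqlite3.Connection) -> None:
    # One row per training run (hypothetical schema).
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS runs (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            created_at TEXT NOT NULL,
            model TEXT NOT NULL,
            params TEXT NOT NULL,
            cv_rmse_mean REAL NOT NULL,
            cv_rmse_std REAL NOT NULL
        )
        """
    )


def log_run(conn: sqlite3.Connection, model: str, params: dict,
            rmse_mean: float, rmse_std: float) -> int:
    # Params are stored as sorted JSON so identical configs compare equal as text.
    cur = conn.execute(
        "INSERT INTO runs (created_at, model, params, cv_rmse_mean, cv_rmse_std) "
        "VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), model,
         json.dumps(params, sort_keys=True), rmse_mean, rmse_std),
    )
    conn.commit()
    return cur.lastrowid


def best_run(conn: sqlite3.Connection):
    # Lowest mean CV RMSE wins, matching the champion rule in the MVP scope.
    return conn.execute(
        "SELECT model, params, cv_rmse_mean FROM runs ORDER BY cv_rmse_mean ASC LIMIT 1"
    ).fetchone()


conn = sqlite3.connect(":memory:")  # use a file path to persist across sessions
init_log(conn)
log_run(conn, "elasticnet", {"alpha": 0.1, "l1_ratio": 0.5}, 31.2, 2.4)  # placeholder numbers
log_run(conn, "random_forest", {"n_estimators": 300}, 27.9, 1.8)         # placeholder numbers
print(best_run(conn))
```

Swapping to PostgreSQL would mostly mean changing the driver and the `?` placeholders; the schema idea carries over.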
Data Notes
Use a well-known public housing dataset to keep the scope predictable and the results comparable with other work. The focus is on clean preprocessing (missing values, categorical encoding, skew fixes) and reproducible evaluation.
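The three preprocessing concerns map fairly directly onto scikit-learn building blocks: `SimpleImputer` for missing values, `OneHotEncoder` for categoricals, and a log transform of the (typically right-skewed) price target via `TransformedTargetRegressor`. A minimal sketch follows; the column names and the eight-row frame are made up for illustration and stand in for the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; swap in the real dataset's columns.
numeric_cols = ["living_area", "lot_area", "year_built"]
categorical_cols = ["neighborhood"]

preprocess = ColumnTransformer([
    # Numeric: fill gaps with the median, then standardise.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    # Categorical: fill gaps with the most frequent level, then one-hot encode;
    # unseen categories at predict time become all-zero rows instead of errors.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

# Fit on log1p(price) to tame skew; predictions are mapped back with expm1.
model = TransformedTargetRegressor(
    regressor=Pipeline([("prep", preprocess), ("reg", ElasticNet(alpha=0.01, max_iter=10_000))]),
    func=np.log1p,
    inverse_func=np.expm1,
)

# Toy rows (made up) with deliberate gaps to exercise the imputers.
df = pd.DataFrame({
    "living_area": [850, 1200, np.nan, 2100, 1600, 950, 3000, 1400],
    "lot_area": [4000, 6500, 5200, np.nan, 8000, 3000, 12000, 7000],
    "year_built": [1955, 1978, 1990, 2005, 1999, 1962, 2015, 1985],
    "neighborhood": ["A", "B", "B", "C", np.nan, "A", "C", "B"],
})
price = np.array([120_000, 180_000, 165_000, 310_000, 240_000, 130_000, 450_000, 205_000])

model.fit(df, price)
preds = model.predict(df.head(2))
print(preds)
```

Keeping every step inside one pipeline means cross-validation refits the imputers and encoder per fold, which avoids leaking validation statistics into training.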
Metrics
- RMSE (primary), MAE, and R²
- Train vs validation gap (over/underfit check)
- K-fold stability (variance across folds)
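The metrics above are simple enough to write out directly, which also makes their definitions explicit. This is a dependency-free sketch; `fold_summary` and its gap definition (mean validation RMSE minus mean train RMSE) are an assumed convention, not a standard API.

```python
import math
from statistics import mean, pstdev


def rmse(y_true, y_pred):
    # Root of the mean squared error; penalises large misses more than MAE.
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def mae(y_true, y_pred):
    # Mean absolute error, in the same units as the price.
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    # 1 - SS_res / SS_tot: share of target variance explained by the model.
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fold_summary(train_rmses, val_rmses):
    # Hypothetical helper: a large positive gap hints at overfitting,
    # a large std across folds hints at unstable results.
    return {
        "val_rmse_mean": mean(val_rmses),
        "val_rmse_std": pstdev(val_rmses),
        "gap": mean(val_rmses) - mean(train_rmses),
    }


y_true, y_pred = [3.0, 5.0, 7.0], [2.0, 5.0, 9.0]
print(f"rmse={rmse(y_true, y_pred):.3f} mae={mae(y_true, y_pred):.3f} r2={r2(y_true, y_pred):.3f}")
# → rmse=1.291 mae=1.000 r2=0.375
```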
Roadmap (High Level)
- Dataset setup + EDA
- Preprocessing + baselines
- Tree models + simple tuning
- Interpretability (importances/SHAP)
- FastAPI + mini metrics page
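For the "simple tuning" step on the roadmap, a small grid search over a couple of tree parameters is probably enough. The grid values below are illustrative guesses rather than recommended settings, and `make_regression` stands in for the real data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in data (assumption).
X, y = make_regression(n_samples=300, n_features=6, noise=15.0, random_state=0)

# Small illustrative grid: 2 x 2 = 4 combinations, each scored with 5-fold CV.
param_grid = {
    "max_depth": [None, 8],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # same primary metric as model selection
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

# best_score_ is the negated RMSE, so flip the sign for reporting.
print(search.best_params_, f"CV RMSE={-search.best_score_:.2f}")
```

Because the scorer matches the champion rule, the tuned estimator (`search.best_estimator_`) can drop straight into the candidate comparison.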