House Price Predictor

A straightforward tabular ML project to sharpen fundamentals and round out the portfolio.

⚡ Why This Project?

It’s a classic: clean tabular data, sensible preprocessing, a baseline-to-better model progression, and clear evaluation with reproducible results. Not a passion project, just a tidy way to demonstrate end-to-end ML hygiene.

🎯 Objectives

  • Data cleaning and feature engineering on a well-known housing dataset.
  • Baseline models (Linear/ElasticNet) → tree-based models (RandomForest/XGBoost/LightGBM).
  • Cross-validation, simple hyperparameter tuning, and holdout test evaluation.
  • Interpretability: feature importances/SHAP summary for top drivers.
  • Lightweight dashboard to show predictions and metric snapshots.
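
For the interpretability objective, one possible starting point is permutation importance, which scikit-learn ships out of the box (SHAP would be a separate dependency). The sketch below is only an illustration: it uses synthetic data from `make_regression` in place of the housing table, and the `feature_N` names are placeholders.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing table: 5 features, only 2 informative.
X, y = make_regression(
    n_samples=300, n_features=5, n_informative=2, noise=5.0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle one column at a time on held-out data and record the drop in R².
result = permutation_importance(
    model, X_val, y_val, n_repeats=10, random_state=0
)

# Sort features from most to least important.
ranking = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Computing importances on the validation split rather than the training split keeps the ranking tied to generalization, not memorization.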

🧪 MVP Scope

  • Notebook + scripts for ingest, preprocess, train, evaluate.
  • 2–3 baseline models with k-fold CV; select the champion by lowest mean CV RMSE.
  • Simple FastAPI prediction endpoint that accepts a CSV upload.
  • Mini Next.js page to display RMSE/MAE/R² and top features.
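
The champion-selection step could look roughly like the sketch below. It assumes synthetic data in place of the real dataset, and the candidate models and hyperparameters are illustrative choices, not tuned settings.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the preprocessed housing features.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=42)

# Illustrative candidates; hyperparameters are untuned placeholders.
candidates = {
    "linear": LinearRegression(),
    "elasticnet": ElasticNet(alpha=0.1, random_state=42),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

# Reuse the same folds for every model so the comparison is like-for-like.
cv = KFold(n_splits=5, shuffle=True, random_state=42)

mean_rmse = {}
for name, model in candidates.items():
    # scikit-learn maximizes scores, so RMSE is reported negated.
    scores = cross_val_score(
        model, X, y, cv=cv, scoring="neg_root_mean_squared_error"
    )
    mean_rmse[name] = -scores.mean()

champion = min(mean_rmse, key=mean_rmse.get)
print(f"champion: {champion} (mean CV RMSE {mean_rmse[champion]:.2f})")
```

The holdout test set stays out of this loop entirely; it is scored once, after the champion is chosen.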

🛠️ Tools & Stack (Planned)

  • Python: pandas, scikit-learn, (optionally) xgboost/lightgbm
  • FastAPI for a lightweight prediction service
  • SQLite/PostgreSQL (optional) for experiment logs
  • Next.js mini dashboard for metrics
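
For the prediction service, the CSV handling can be kept separate from the web framework so it is easy to test. The helper below is a hypothetical sketch: `parse_prediction_csv` and the column names in `REQUIRED_COLUMNS` are assumptions, and the FastAPI route that would read the uploaded bytes and call it is left out.

```python
import io

import pandas as pd

# Hypothetical feature columns the trained model expects.
REQUIRED_COLUMNS = ["living_area", "year_built", "neighborhood"]


def parse_prediction_csv(raw: bytes) -> pd.DataFrame:
    """Decode uploaded CSV bytes and return only the expected feature columns."""
    frame = pd.read_csv(io.BytesIO(raw))
    missing = [col for col in REQUIRED_COLUMNS if col not in frame.columns]
    if missing:
        # The web layer can translate this into a 4xx response.
        raise ValueError(f"CSV is missing required columns: {missing}")
    # Drop any extra columns and enforce a stable column order.
    return frame[REQUIRED_COLUMNS]


sample = (
    b"living_area,year_built,neighborhood,extra\n"
    b"1200,1990,A,x\n"
    b"1500,2005,B,y\n"
)
features = parse_prediction_csv(sample)
print(features.shape)
```

Validating columns before prediction turns a confusing model error into a clear message about the upload.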

📡 Data Notes

Use a well-known public housing dataset to keep the scope predictable and the results comparable. The focus is on clean preprocessing (missing values, categorical encoding, skew correction) and reproducible evaluation.
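
The preprocessing steps named above might be wired together as in the following sketch. The column names and the tiny in-line table are invented for illustration, and `Ridge` stands in for whichever baseline is chosen. The skew fix is applied to the target with a log transform, which is one common option among several.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical schema; replace with the real dataset's columns.
NUMERIC = ["living_area", "year_built"]
CATEGORICAL = ["neighborhood"]


def build_model() -> TransformedTargetRegressor:
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill missing numbers
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),  # tolerate new categories
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, NUMERIC),
        ("cat", categorical, CATEGORICAL),
    ])
    regressor = Pipeline([("preprocess", preprocess), ("model", Ridge(alpha=1.0))])
    # Fit on log1p(price) to reduce right skew; predictions are mapped back with expm1.
    return TransformedTargetRegressor(
        regressor=regressor, func=np.log1p, inverse_func=np.expm1
    )


# Made-up rows with deliberate gaps, purely to exercise the pipeline.
df = pd.DataFrame({
    "living_area": [1200.0, 1500.0, np.nan, 2000.0, 900.0, 1750.0],
    "year_built": [1990, 2005, 1978, np.nan, 1965, 2012],
    "neighborhood": pd.Series(["A", "B", "A", np.nan, "C", "B"], dtype=object),
    "price": [210000.0, 275000.0, 180000.0, 340000.0, 150000.0, 310000.0],
})

model = build_model().fit(df[NUMERIC + CATEGORICAL], df["price"])
preds = model.predict(df[NUMERIC + CATEGORICAL])
print(preds.round(0))
```

Keeping imputation, encoding, and scaling inside the pipeline means they are refit on each training fold during cross-validation, which avoids leaking validation statistics into training.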

📈 Metrics

  • RMSE (primary), MAE, and R²
  • Train vs validation gap (over/underfit check)
  • K-fold stability (variance across folds)
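
All three checks can come out of a single `cross_validate` call, as in this sketch. It runs on synthetic data, and the `report` keys are names chosen here for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in for the preprocessed housing features.
X, y = make_regression(n_samples=300, n_features=6, noise=15.0, random_state=7)

scoring = {
    "rmse": "neg_root_mean_squared_error",
    "mae": "neg_mean_absolute_error",
    "r2": "r2",
}
cv = KFold(n_splits=5, shuffle=True, random_state=7)
results = cross_validate(
    RandomForestRegressor(n_estimators=50, random_state=7),
    X,
    y,
    cv=cv,
    scoring=scoring,
    return_train_score=True,  # needed for the train vs validation gap
)

# Error metrics are returned negated; flip the sign back.
train_rmse = -results["train_rmse"]
val_rmse = -results["test_rmse"]  # scikit-learn labels validation folds "test"

report = {
    "val_rmse_mean": val_rmse.mean(),
    "val_rmse_std": val_rmse.std(),  # fold-to-fold stability
    "val_mae_mean": -results["test_mae"].mean(),
    "val_r2_mean": results["test_r2"].mean(),
    # A large positive gap suggests overfitting.
    "rmse_gap": val_rmse.mean() - train_rmse.mean(),
}
for key, value in report.items():
    print(f"{key}: {value:.3f}")
```

A small gap combined with high error on both sides points toward underfitting instead, and a high `val_rmse_std` relative to the mean suggests the estimate depends heavily on how the folds were drawn.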

🗺️ Roadmap (High Level)

  1. Dataset setup + EDA
  2. Preprocessing + baselines
  3. Tree models + simple tuning
  4. Interpretability (importances/SHAP)
  5. FastAPI + mini metrics page