HousePrice Analytics

Live Australian residential property market intelligence. Growth hotspot detection, XGBoost + LightGBM price predictions, and composite investment scoring — powered by ABS 6416.0 data.

🐍 Python · 📊 Streamlit · 🌲 XGBoost / LightGBM · 📈 Plotly · 🗄️ SQLite · 🐳 Docker · 🏛️ ABS 6416.0
8 Capital Cities · 160 Data Points · 3 ML Models · 5y Historical Data

What It Does

🔥 Growth Hotspot Detection

Statistical outlier analysis across 8 Australian capitals. Flags cities whose YoY growth sits above the market average and whose momentum has accelerated over the last three quarters. Scatter plots reveal acceleration vs year-on-year movement at a glance.
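The hotspot rule can be sketched in a few lines of pandas. This is an illustrative reconstruction, not the app's actual code: `flag_hotspots` and the long-format `city` / `quarter` / `price_index` schema are assumptions, and the "accelerating" test (three consecutive rises in QoQ growth) is one plausible reading of the momentum criterion.

```python
import pandas as pd

def flag_hotspots(df: pd.DataFrame) -> pd.DataFrame:
    """Flag cities whose latest YoY growth beats the market average
    and whose QoQ growth has risen for three straight quarters."""
    df = df.sort_values(["city", "quarter"]).copy()
    df["yoy_growth"] = df.groupby("city")["price_index"].pct_change(4)
    df["qoq_growth"] = df.groupby("city")["price_index"].pct_change(1)
    # Accelerating = the last three changes in QoQ growth were all positive.
    df["accelerating"] = df.groupby("city")["qoq_growth"].transform(
        lambda s: s.diff().rolling(3).min() > 0
    )
    latest = df.groupby("city").tail(1)          # most recent quarter per city
    market_avg = latest["yoy_growth"].mean()
    return latest.assign(
        hotspot=(latest["yoy_growth"] > market_avg) & latest["accelerating"]
    )[["city", "yoy_growth", "hotspot"]]

# Toy data: "Sydney" accelerates, "Hobart" stays flat.
quarters = list(range(8))
toy = pd.DataFrame(
    [("Sydney", q, v) for q, v in zip(quarters, [100, 101, 102.5, 105, 109, 115, 124, 137])]
    + [("Hobart", q, 100.0) for q in quarters],
    columns=["city", "quarter", "price_index"],
)
result = flag_hotspots(toy).set_index("city")
```

Comparing each city against the cross-sectional mean (rather than a fixed threshold) keeps the flag meaningful whether the whole market is booming or flat.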

🤖 ML Price Prediction

XGBoost + LightGBM ensemble trained on 5 years of ABS quarterly index data. Features include 4-quarter lag values, rolling averages, and acceleration metrics. Time-series cross-validated for real predictive validity — not just in-sample fitting.

📊 Investment Scoring

Composite 0–100 score per city combining: growth momentum (40%), ML prediction confidence (30%), volatility-adjusted stability (15%), and market position signals (15%). Ranked table with radar chart breakdown for top cities.
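The weighted blend above can be written as a one-liner once each signal is normalized. This is a hypothetical sketch: the function name and the assumption that each sub-score arrives pre-normalized to [0, 1] are mine; only the 40/30/15/15 weights come from the description.

```python
# Weights mirroring the 40/30/15/15 split described above.
WEIGHTS = {"momentum": 0.40, "ml_confidence": 0.30, "stability": 0.15, "position": 0.15}

def investment_score(signals: dict) -> float:
    """Blend normalized 0-1 sub-scores into a single 0-100 composite."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9   # weights must stay a partition of 1
    clamp = lambda v: max(0.0, min(1.0, v))          # keep each signal in [0, 1]
    return 100 * sum(w * clamp(signals[k]) for k, w in WEIGHTS.items())

# A city strong on momentum but volatile:
score = investment_score(
    {"momentum": 0.9, "ml_confidence": 0.7, "stability": 0.4, "position": 0.6}
)
```

Asserting the weights sum to 1 guards against the weight-tweaking trap noted below: any adjustment to one component forces an explicit, visible trade-off in the others.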

📈 Trend Analysis

Interactive price index trends with city comparison, QoQ change bars, and YoY rankings. Powered by ABS 6416.0 Residential Property Price Indexes — the same source RBA and Treasury use.
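The QoQ and YoY series behind those charts are direct percentage changes on the quarterly index. A minimal sketch with made-up index values (the real app reads ABS 6416.0 from SQLite):

```python
import pandas as pd

# Hypothetical quarterly index values for one city, in ABS 6416.0 style.
idx = pd.Series(
    [100.0, 102.0, 103.5, 106.0, 110.0],
    index=pd.period_range("2023Q1", periods=5, freq="Q"),
)
qoq = idx.pct_change(1) * 100  # quarter-on-quarter % change
yoy = idx.pct_change(4) * 100  # year-on-year % change (needs 4 quarters of history)
```

The first YoY value only appears once four quarters of history exist, which is why the app's rankings start a year into the series.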

Stack

🏗️ Infrastructure
  • Docker (containerized app)
  • DigitalOcean droplet
  • Nginx reverse proxy
  • Daily ETL cron (6am UTC)
🧠 ML Pipeline
  • scikit-learn (preprocessing, CV)
  • XGBoost + LightGBM ensemble
  • TimeSeriesSplit cross-validation
  • 14-feature lag engineering
📡 Data
  • ABS Cat. 6416.0 (primary)
  • Domain API (suburb stats)
  • SQLite database
  • Plotly interactive charts

What I Learned

  • Time-series CV matters. Standard k-fold leaks future data into training — TimeSeriesSplit prevents this and gives a realistic performance estimate.
  • ABS data is surprisingly usable. The 6416.0 RPPI dataset is clean, consistent, and goes back decades. More reliable than scraped sources for trend analysis.
  • Ensemble prediction improves stability. XGBoost and LightGBM tend to diverge on edge cases — averaging their predictions reduces individual model bias.
  • Composite scoring needs weighting discipline. Easy to game a score by tweaking weights. Each weight needs a defensible reason — documented in the methodology.
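The first lesson above can be demonstrated in a few lines: with time-ordered rows, plain k-fold puts future indices in the training set while testing on the past, whereas `TimeSeriesSplit` never does.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)  # 20 quarters in time order

# Standard k-fold happily trains on the future and tests on the past:
kf_train, kf_test = next(iter(KFold(n_splits=4).split(X)))
kfold_leaks = kf_train.max() > kf_test.min()

# TimeSeriesSplit keeps every training index strictly before every test index:
ts_safe = all(tr.max() < te.min() for tr, te in TimeSeriesSplit(n_splits=4).split(X))
```

Here `kfold_leaks` is true (the first k-fold test block is the earliest quarters, trained on everything after them) and `ts_safe` is true, which is exactly the leakage distinction the bullet describes.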