Project
Ontario Real Estate Lakehouse
Production-grade Medallion pipeline on 1.2M civic records.
The problem
Toronto and the Ontario government publish valuable real estate data (property boundaries, building permits, rental evaluations, housing price indices), but it is fragmented across portals, inconsistently shaped, and updated on different cadences. No single place gives decision-makers a trustworthy view of what’s happening in the market.
What I built
A production-style Medallion (Bronze/Silver/Gold) lakehouse on Databricks + Delta Lake:
- Bronze (8 tables): Raw API landing from Toronto Open Data + StatsCan — idempotent ingestion with audit columns.
- Silver (4 tables): Type-cast, deduplicated, enriched with geospatial keys; schema-enforced Delta writes.
- Gold (6 tables): Business-ready aggregates powering the dashboard — permit trends, construction investment, apartment quality scores, price indices.
A Streamlit dashboard on top surfaces six analytical tabs tied directly to the Gold layer.
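To make the layering concrete, here is a minimal, library-free sketch of the Bronze/Silver/Gold flow in plain Python. It is not the project's Databricks or Delta Lake code: in-memory dicts stand in for the tables, and the field names (`permit_id`, `year`, `est_cost`), the audit column names, the source label, and the sample rows are all hypothetical. It only illustrates the ideas named above: idempotent landing with audit columns, type-casting and deduplication, and a Gold aggregate with one row per year.

```python
import hashlib
import json
from collections import defaultdict
from datetime import datetime, timezone


def record_key(payload: dict) -> str:
    """Stable content hash, so re-ingesting the same payload is a no-op."""
    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def ingest_bronze(bronze: dict, raw_records: list, source: str) -> int:
    """Land raw records keyed by content hash, adding audit columns.

    Returns the number of rows actually added.
    """
    added = 0
    for payload in raw_records:
        key = record_key(payload)
        if key in bronze:
            continue  # already landed: this is what makes reruns idempotent
        bronze[key] = {
            "payload": payload,
            "_source": source,  # hypothetical audit column
            "_ingested_at": datetime.now(timezone.utc).isoformat(),
        }
        added += 1
    return added


def build_silver(bronze: dict) -> list:
    """Type-cast each row and deduplicate on permit_id (last landed wins)."""
    latest = {}
    for row in bronze.values():
        payload = row["payload"]
        try:
            typed = {
                "permit_id": str(payload["permit_id"]),
                "year": int(payload["year"]),
                "est_cost": float(payload["est_cost"]),
            }
        except (KeyError, TypeError, ValueError):
            continue  # rows that fail the casts never reach Silver
        latest[typed["permit_id"]] = typed
    return list(latest.values())


def build_gold(silver: list) -> dict:
    """Aggregate Silver to one row per year (the grain of this Gold table)."""
    totals = defaultdict(lambda: {"permits": 0, "investment": 0.0})
    for row in silver:
        totals[row["year"]]["permits"] += 1
        totals[row["year"]]["investment"] += row["est_cost"]
    return dict(totals)


# Made-up sample batch; the third row has an uncastable cost.
batch = [
    {"permit_id": "P-1", "year": "2023", "est_cost": "250000"},
    {"permit_id": "P-2", "year": "2023", "est_cost": "100000.5"},
    {"permit_id": "P-3", "year": "2024", "est_cost": "not-a-number"},
]

bronze = {}
first_run = ingest_bronze(bronze, batch, source="example-source")
second_run = ingest_bronze(bronze, batch, source="example-source")
gold = build_gold(build_silver(bronze))
```

Running the same batch twice adds rows only on the first pass, and the malformed row stays in Bronze for auditing but is excluded from Silver and Gold. On Delta Lake the same intent would more likely be expressed with a keyed merge and schema-enforced writes rather than hand-rolled dict lookups.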
Why it matters
This was my playground for putting my day-job patterns into a public, portable repo: incremental Delta, schema enforcement, layered refinement, and a thin but real UI. It also supports full CI-style reruns, keeps audit trails, and documents the grain of each Gold table.