F1 Azure Databricks Lakehouse
End-to-end Azure lakehouse with Unity Catalog governance.
Azure Databricks · PySpark · Spark SQL · ADLS Gen2 · Azure Data Factory · Power BI · Unity Catalog
Cloud: Azure
Governance: Unity Catalog
Load modes: Incremental + Full
The problem
F1 race data is rich but heavily normalized across many feeds (results, lap times, pit stops, standings). Turning it into analyst-ready tables requires the full lakehouse pattern — not just raw storage.
What I built
A full Azure lakehouse demonstrating incremental + full-load patterns with enterprise governance:
- Storage: ADLS Gen2 with raw/cleansed/curated zones.
- Compute: Azure Databricks — PySpark ingestion, Spark SQL transformations, Delta Lake writes with schema enforcement.
- Orchestration: Azure Data Factory pipelines trigger Databricks notebooks.
- Governance: Unity Catalog with row-level access and lineage.
- Serving: Power BI dashboards for race performance, driver stats, and constructor standings.
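The incremental-load half of the pattern above typically lands as a Delta Lake MERGE in Spark SQL: new or corrected rows from a staging view are upserted into the curated table while schema enforcement rejects drift. A minimal sketch, assuming a hypothetical staging view `results_upsert` and curated table `f1_curated.race_results` (names and keys are illustrative, not the project's actual schema):

```sql
-- Upsert the latest batch of race results into the curated Delta table.
-- Matched rows (re-stated results) are updated; unseen rows are inserted.
MERGE INTO f1_curated.race_results AS tgt
USING results_upsert AS src
  ON tgt.race_id = src.race_id
 AND tgt.driver_id = src.driver_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```

A full load, by contrast, is a plain overwrite of the target; keeping both paths behind the same curated table is what lets downstream Power BI models stay unchanged regardless of load mode.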
Why it matters
Complements the AWS/Databricks work in my day job by demonstrating the parallel Azure stack: ADF + Unity Catalog + Power BI. It shows I can move between clouds without losing the shape of a good pipeline.