
Project

F1 Azure Databricks Lakehouse

End-to-end Azure lakehouse with Unity Catalog governance.

Azure Databricks · PySpark · Spark SQL · ADLS Gen2 · Azure Data Factory · Power BI · Unity Catalog
Cloud: Azure · Governance: Unity Catalog · Load modes: Incremental + Full

The problem

F1 race data is rich but heavily normalized across many feeds (results, lap times, pit stops, standings). Turning it into analyst-ready tables requires the full lakehouse pattern — not just raw storage.

What I built

A full Azure lakehouse demonstrating incremental + full-load patterns with enterprise governance:

  • Storage: ADLS Gen2 with raw/cleansed/curated zones.
  • Compute: Azure Databricks — PySpark ingestion, Spark SQL transformations, Delta Lake writes with schema enforcement.
  • Orchestration: Azure Data Factory pipelines trigger Databricks notebooks.
  • Governance: Unity Catalog with row-level access and lineage.
  • Serving: Power BI dashboards for race performance, driver stats, and constructor standings.
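Row-level access in Unity Catalog works by binding a boolean filter function to a table. A sketch of the pattern in Spark SQL; the catalog, schema, function, and group names here are assumptions for illustration, not the project's actual names:

```sql
-- Admins see all rows; other users only see rows for constructors
-- whose account group they belong to.
CREATE OR REPLACE FUNCTION f1.governance.team_filter(constructor STRING)
RETURN is_account_group_member('f1_admins')
    OR is_account_group_member(constructor);

ALTER TABLE f1.curated.race_results
  SET ROW FILTER f1.governance.team_filter ON (constructor);
```

Because the filter lives in the catalog rather than in each notebook or dashboard, Power BI queries against the curated tables inherit it automatically.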
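The core of the pipeline is the incremental-vs-full load decision followed by a Delta-style merge into the curated zone. A minimal sketch of that logic in plain Python, with no Spark cluster required; the column names (`race_id`, `ingest_ts`) and in-memory rows are illustrative assumptions, not the project's actual schema:

```python
# Sketch of the incremental-load pattern the Databricks notebooks follow,
# expressed in plain Python so the logic is visible without Spark.

def new_rows(rows, last_watermark, ts_key="ingest_ts"):
    """Incremental mode: keep only rows newer than the stored watermark.
    Full mode: pass last_watermark=None to reprocess everything."""
    if last_watermark is None:
        return list(rows)
    return [r for r in rows if r[ts_key] > last_watermark]

def merge_upsert(target, updates, key="race_id"):
    """Delta-MERGE-style upsert: update matched keys, insert new ones."""
    merged = {row[key]: row for row in target}
    for row in updates:
        merged[row[key]] = row
    return sorted(merged.values(), key=lambda r: r[key])

# Example: one late correction and one new race arrive in the feed.
curated = [{"race_id": 1, "winner": "VER", "ingest_ts": 100}]
feed = [
    {"race_id": 1, "winner": "HAM", "ingest_ts": 150},  # late correction
    {"race_id": 2, "winner": "LEC", "ingest_ts": 160},  # new race
]

incremental = new_rows(feed, last_watermark=100)
curated = merge_upsert(curated, incremental)
print(curated)
```

In the real pipeline the same two steps are a filtered read plus a Delta Lake `MERGE INTO`, with the watermark persisted between runs; schema enforcement on the Delta write catches feed drift before it reaches the curated zone.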

Why it matters

This project complements the AWS/Databricks work in my day job by demonstrating the parallel Azure stack: ADF for orchestration, Unity Catalog for governance, and Power BI for serving. It shows I can move between clouds without losing the shape of a good pipeline.
