
Hello, I'm

Pratik Bhikadiya.

> Data Engineer

I build production-grade lakehouse pipelines — Bronze/Silver/Gold on Databricks, real-time clickstream on Kafka+EMR, and SLA-monitored analytics for Intuit TurboTax. 5+ years turning raw events into trusted data products.

Certified

Databricks Data Engineer · Azure DP-203 · Power BI PL-300
5+ Years · 1PB+ Data · 3 Certs

Daily driver stack

PySpark · Databricks · Delta Lake · AWS · Azure · Airflow · Kafka · SQL · Python · dbt

How I work

Reading the driver logs

Most of my day is spent reading Spark driver logs, chasing skewed joins, and tuning shuffle partitions.

This is a simulated feed showing the kind of events I see on a real ingestion job: executor starts, stage completions, shuffle spills, skew warnings, and Delta commits.
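As a rough illustration of what chasing skew and tuning shuffle partitions involves, here is a minimal plain-Python sketch. It is not taken from any production job: the function names, the 128 MiB target partition size, the skew factor of 5, and the 8 salt buckets are all assumptions chosen for illustration. In Spark itself, the resulting partition count would typically be applied through the `spark.sql.shuffle.partitions` setting.

```python
import math
from collections import Counter


def suggest_shuffle_partitions(shuffle_bytes, target_partition_bytes=128 * 1024**2):
    """Rule-of-thumb partition count: shuffle size divided by a target partition size.

    The 128 MiB default is an assumed starting point, not a fixed rule.
    """
    return max(1, math.ceil(shuffle_bytes / target_partition_bytes))


def find_skewed_keys(keys, factor=5.0):
    """Return {key: count} for keys whose count exceeds `factor` times the median count."""
    counts = Counter(keys)
    ordered = sorted(counts.values())
    median = ordered[len(ordered) // 2]
    return {key: n for key, n in counts.items() if n > factor * median}


def salt_key(key, row_index, buckets=8):
    """Spread a hot join key across `buckets` sub-keys so no single task owns it.

    The other side of the join must be replicated once per bucket to match.
    """
    return f"{key}#{row_index % buckets}"


if __name__ == "__main__":
    # Hypothetical join keys: one hot key and three small ones.
    join_keys = ["hot"] * 100 + ["a", "a", "b", "b", "c", "c"]
    print(find_skewed_keys(join_keys))
    print(suggest_shuffle_partitions(10 * 1024**3))
    print(salt_key("hot", row_index=9))
```

The median-based check is only one possible heuristic; on a real cluster the stage-level task duration and shuffle read sizes in the Spark UI are usually the first signal.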

Avg job: 7.4 min (▼ 35% after tuning)
SLA hit rate: 99.5% (last 90 days)
Peak throughput: 2.4 GB/s (shuffle read)
Cluster: EMR 6.x, Spark 3.4
spark-driver@emr-cluster:~$ tail -f ingest.log
job: streaming-bronze-to-silver
Exec: 8 · Cores: 32

* simulated driver log · events generated client-side for demo

01 · About

A little about me

I'm a Data & Analytics Engineer who enjoys the craft of turning messy source data into reliable, well-modeled data products that stakeholders actually trust.

Currently at Intuit TurboTax, I architect DataMart pipelines for Marketing, Finance, Product and Sales — Bronze/Silver/Gold on Databricks, Spark SQL on EMR, and SLA monitoring on Databricks Workflows & CloudWatch.

Before Intuit: BI Engineering at ArcelorMittal (Azure Databricks / ADF / Synapse) and BI Analytics at Adani Ports & SEZ — 20+ executive dashboards powering port operations, logistics, and finance.

"Good data engineering is invisible. You only notice it when it fails."

At a glance

  • Role: Data / Analytics Engineer
  • Current: Intuit TurboTax
  • Based in: Brampton, ON 🇨🇦
  • Education: M.M., Business Data Analytics
  • Domains: Finance · Tax · Supply Chain
  • Stack: Databricks · PySpark · AWS · Azure
5+ Years Experience · 50+ Dashboards Delivered · 15+ Pipelines Shipped · 99.5% SLA Hit rate

02 · Experience

Where I've shipped

A career timeline across three industries — fintech, heavy industry, and logistics. Different domains, same commitment to trustworthy data.

  1. Data Engineer / Analytics Engineer

    Intuit TurboTax

    Apr 2024 – Present

    Brampton, ON

    • Architected enterprise DataMart pipelines (Marketing, Finance, Product, Sales) on Databricks + PySpark, reducing end-to-end latency by 35%.
    • Designed scalable clickstream ingestion (Kafka → EMR/Databricks) processing millions of daily events.
    • Implemented Bronze-Silver-Gold medallion architecture with incremental Delta processing and schema enforcement.
    • Built automated SLA monitoring (Databricks Workflows + CloudWatch), improving MTTR by 40%.
    Databricks · PySpark · Delta Lake · Kafka · EMR · AWS
  2. BI Engineer

    ArcelorMittal Nippon Steel India

    Apr 2022 – Dec 2022

    India

    • Developed Spark transformation pipelines on Azure Databricks integrating Azure SQL, Blob Storage & Synapse.
    • Orchestrated enterprise ETL workflows in Azure Data Factory, cutting manual intervention by 40%.
    • Engineered curated data mart layers with referential integrity controls for monthly executive reporting.
    Azure Databricks · ADF · Synapse · Spark SQL
  3. BI Analyst

    Adani Ports and SEZ Ltd.

    Jun 2016 – Mar 2022

    India

    • Built large-scale ETL workflows and warehouse schemas supporting port operations, logistics & finance.
    • Improved data processing efficiency by 20% via SQL optimization while maintaining 99.5% SLA compliance.
    • Delivered 20+ executive dashboards (Power BI / Tableau) tracking operational KPIs, revenue & cargo throughput.
    SQL · Power BI · Tableau · ETL
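The Bronze-Silver-Gold pattern with incremental processing and schema enforcement, mentioned under the Intuit role above, can be sketched in plain Python. This is a toy model of the idea only, not Delta Lake's API and not the production pipeline: the schema, the column names, the `ingested_at` watermark, and the per-user Gold aggregate are all hypothetical, and the sketch assumes Bronze always stamps `ingested_at` at ingestion time.

```python
from datetime import datetime

# Hypothetical Silver schema: column name -> required Python type.
SILVER_SCHEMA = {"event_id": str, "user_id": str, "amount": float, "ingested_at": datetime}


def enforce_schema(record, schema=SILVER_SCHEMA):
    """Reject records with missing, extra, or wrongly typed columns."""
    if set(record) != set(schema):
        raise ValueError(f"column mismatch: {sorted(set(record) ^ set(schema))}")
    for column, expected in schema.items():
        if not isinstance(record[column], expected):
            raise ValueError(f"{column} should be {expected.__name__}")
    return record


def bronze_to_silver(bronze, watermark):
    """Incrementally promote Bronze rows newer than `watermark`.

    Returns (silver_rows, rejected_rows, new_watermark).
    """
    silver, rejects = [], []
    new_watermark = watermark
    for record in bronze:
        if record["ingested_at"] <= watermark:
            continue  # already handled by an earlier run
        new_watermark = max(new_watermark, record["ingested_at"])
        try:
            silver.append(enforce_schema(record))
        except ValueError as err:
            rejects.append((record, str(err)))
    return silver, rejects, new_watermark


def silver_to_gold(silver):
    """Aggregate Silver rows into a Gold table: total amount per user."""
    totals = {}
    for row in silver:
        totals[row["user_id"]] = totals.get(row["user_id"], 0.0) + row["amount"]
    return totals
```

Keeping rejected rows alongside the reason, rather than dropping them silently, is a design choice that makes schema drift visible to whoever monitors the pipeline.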

Education

Degrees · Honors
2023 – 2024

Master of Management (Honors) — Business Data Analytics

University of Windsor, Canada

2020 – 2021

PG Diploma (Honors) — Marketing Management

Maharaja Sayajirao University

2012 – 2016

B.Tech (Honors) — Engineering

Maharaja Sayajirao University

03 · Projects

Selected work

Real, end-to-end data products, each with its own architecture, metrics, and GitHub link.

All repos on GitHub
medallion

Ontario Real Estate Lakehouse

/01

Production-grade Medallion pipeline on 1.2M civic records.

Bronze/Silver/Gold lakehouse pipeline for Ontario real estate analytics, ingesting 1.2M+ authentic records from Toronto Open Data and Statistics Canada into a Streamlit dashboard.

Records processed: 1.2M+
Data volume: 540 MB
Gold tables: 6
PySpark · Delta Lake · Databricks · Streamlit · Python
Sole Engineer: design, build, deploy
rag

AI Stock Intelligence — CAN/US Markets

/02

Multi-source ingestion + RAG on 40 stocks across TSX/NYSE/NASDAQ.

End-to-end GenAI-powered analytics platform: ingests 8+ financial and social sources, scores stocks via a 5-factor composite model, and answers natural-language questions through a RAG-powered Streamlit dashboard.

Data sources: 8+
Stocks tracked: 40
Markets: TSX · NYSE · NASDAQ
Python · Delta Lake · Airflow · ChromaDB · Google Gemini
Sole Engineer: architecture, pipelines, UI
azure

F1 Azure Databricks Lakehouse

/03

End-to-end Azure lakehouse with Unity Catalog governance.

Formula 1 analytics pipeline on Azure Databricks: ADF orchestration, ADLS Gen2 storage, Delta Lake lakehouse, Unity Catalog governance, and Power BI reporting.

Cloud: Azure
Governance: Unity Catalog
Load modes: Incremental + Full
Azure Databricks · PySpark · Spark SQL · ADLS Gen2 · Azure Data Factory
Sole Engineer
analytics

Multi-Channel Marketing Analytics

/04

Unified KPI model across Facebook, Google Ads, and TikTok.

Analytics engineering project: a single standardized data model and interactive dashboard unifying Facebook / Google / TikTok ad spend, with built-in data-quality checks and ROI-driven KPIs.

Channels: 3 (FB / Google / TikTok)
KPIs modeled: CTR · CPC · CPA · ROAS · CPM
Deploy target: Streamlit Cloud
Python · Pandas · Streamlit · Plotly
Sole Engineer: model, UI, QC
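For readers unfamiliar with the KPIs named in this project, the sketch below shows their standard definitions with a couple of basic data-quality checks. It is an illustration under assumptions, not the project's actual model: the `ChannelTotals` fields, the specific checks, and the convention of returning `None` for an undefined ratio are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class ChannelTotals:
    """Hypothetical standardized row: one ad channel's totals for a period."""
    channel: str
    spend: float
    impressions: int
    clicks: int
    conversions: int
    revenue: float


def safe_div(numerator, denominator):
    """Return None instead of raising when a ratio is undefined."""
    return numerator / denominator if denominator else None


def quality_issues(t):
    """Basic sanity checks to run before computing KPIs."""
    issues = []
    if min(t.spend, t.impressions, t.clicks, t.conversions, t.revenue) < 0:
        issues.append("negative value")
    if t.clicks > t.impressions:
        issues.append("clicks exceed impressions")
    return issues


def kpis(t):
    """Standard ad KPIs computed from channel totals."""
    return {
        "CTR": safe_div(t.clicks, t.impressions),         # click-through rate
        "CPC": safe_div(t.spend, t.clicks),               # cost per click
        "CPA": safe_div(t.spend, t.conversions),          # cost per acquisition
        "ROAS": safe_div(t.revenue, t.spend),             # return on ad spend
        "CPM": safe_div(t.spend * 1000, t.impressions),   # cost per 1,000 impressions
    }
```

Computing every channel's KPIs from the same five totals is what lets them be compared in one model; the numbers fed into it here would come from each platform's export after column names are standardized.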

04 · Skills

Tools I build with

Grouped by where they live in the stack — with the tools I reach for most.

Data Engineering

01
PySpark · Spark SQL · Delta Lake · Databricks · Airflow · Kafka · EMR · Unity Catalog

Cloud

02
AWS S3 · AWS EMR · AWS Athena · CloudWatch · Azure Databricks · Azure Data Factory · Azure Synapse · ADLS Gen2

Languages & Warehousing

03
Python · SQL · T-SQL · Scala · Snowflake · BigQuery

BI & Viz

04
Power BI · Tableau · Streamlit · Plotly · dbt

Ops & Governance

05
SLA Monitoring · Data Quality · CI/CD · Git · Docker · Medallion Architecture

AI / GenAI

06
RAG · ChromaDB · LLM APIs · Gemini · Claude · Sentiment / NLP

05 · Contact

Let's build something

Open to Data Engineering / Analytics Engineering opportunities, consulting, and interesting collaborations. Fastest reply is email.