Project

AI Stock Intelligence — CAN/US Markets

Multi-source ingestion + RAG on 40 stocks across TSX/NYSE/NASDAQ.

Python · Delta Lake · Airflow · ChromaDB · Google Gemini · Streamlit · Docker
Data sources: 8+
Stocks tracked: 40
Markets: TSX · NYSE · NASDAQ
Orchestration: Airflow / Docker

The problem

Retail investors drown in fragmented signals — price ticks, news, Reddit chatter, analyst notes — with no way to unify them into a single, explainable recommendation.

What I built

A containerized analytics platform that merges financial, news, and social data into a unified Delta Lake store, applies NLP sentiment, and surfaces recommendations via a RAG-powered Q&A interface:

  • Ingestion: yfinance, NewsAPI, Reddit (via PRAW), and 5+ supplementary sources.
  • Processing: TextBlob sentiment, a 5-factor composite scoring model, feature enrichment.
  • Storage: Delta Lake for structured history + ChromaDB vector store for unstructured embeddings.
  • AI Layer: Google Gemini + RAG for natural-language questions grounded in ingested data.
  • Orchestration: Airflow DAGs, full Docker/Podman infra, deployable on Streamlit Cloud.
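
To make the processing step concrete, here is a minimal sketch of how a weighted composite score could combine a sentiment polarity with other signals. The page does not list the five factors or their weights, so the factor names (`momentum`, `sentiment`, `volume`, `volatility`, `fundamentals`), the weights, and the helper names below are all hypothetical; the only detail taken from a real library is that TextBlob reports polarity in the range [-1, 1].

```python
# Hypothetical factor names and weights: the project only states that the
# model has five factors, not what they are or how they are weighted.
WEIGHTS = {
    "momentum": 0.30,
    "sentiment": 0.25,
    "volume": 0.15,
    "volatility": 0.15,
    "fundamentals": 0.15,
}


def clamp01(value: float) -> float:
    """Clip a value into the [0, 1] range."""
    return max(0.0, min(1.0, value))


def polarity_to_unit(polarity: float) -> float:
    """Map a sentiment polarity in [-1, 1] onto [0, 1]."""
    return clamp01((polarity + 1.0) / 2.0)


def composite_score(factors: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of factor values, each expected in [0, 1]."""
    missing = set(weights) - set(factors)
    if missing:
        raise ValueError(f"missing factors: {sorted(missing)}")
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * clamp01(factors[name]) for name in weights)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Illustrative inputs only, not real market data.
    example = {
        "momentum": 0.7,
        "sentiment": polarity_to_unit(0.2),
        "volume": 0.5,
        "volatility": 0.4,
        "fundamentals": 0.6,
    }
    print(f"composite score: {composite_score(example):.3f}")
```

Normalizing every factor to [0, 1] before weighting keeps the final score on the same scale regardless of how many factors are used, which makes the per-factor contribution easy to show next to a recommendation.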

Why it matters

This project shows I can reach beyond classic batch ETL — modern data engineering now includes vector stores, LLM APIs, and grounded retrieval. It’s an honest answer to “what does a 2026 data platform look like?”
