Project
AI Stock Intelligence — CAN/US Markets
Multi-source ingestion + RAG on 40 stocks across TSX/NYSE/NASDAQ.
Python · Delta Lake · Airflow · ChromaDB · Google Gemini · Streamlit · Docker
Data sources: 8+
Stocks tracked: 40
Markets: TSX · NYSE · NASDAQ
Orchestration: Airflow / Docker
The problem
Retail investors drown in fragmented signals — price ticks, news, Reddit chatter, analyst notes — with no way to unify them into a single, explainable recommendation.
What I built
A containerized analytics platform that merges financial, news, and social data into a unified Delta Lake store, applies NLP sentiment analysis, and surfaces recommendations through a RAG-powered Q&A interface:
- Ingestion: yfinance, NewsAPI, Reddit (via PRAW), and 5+ supplementary sources.
- Processing: TextBlob sentiment scoring, a 5-factor composite scoring model, and feature enrichment.
- Storage: Delta Lake for structured history + ChromaDB vector store for unstructured embeddings.
- AI Layer: Google Gemini + RAG to answer natural-language questions grounded in the ingested data.
- Orchestration: Airflow DAGs and full Docker/Podman infrastructure; deployable on Streamlit Cloud.
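The Processing bullet mentions a 5-factor composite scoring model without naming the factors. The sketch below shows one common way such a score can be built: min-max normalize each factor across the tracked stocks, invert factors where lower is better, and take a weighted sum. The factor names, weights, and BUY/SELL thresholds are hypothetical placeholders, not the project's actual model, and the inputs are made-up numbers for fictional tickers.

```python
# Minimal sketch of a 5-factor composite score. The factor names, weights,
# and BUY/SELL thresholds are hypothetical placeholders, not the project's.
WEIGHTS = {
    "momentum": 0.25,          # e.g. recent price return
    "news_sentiment": 0.20,    # e.g. mean TextBlob polarity of news headlines
    "social_sentiment": 0.15,  # e.g. mean TextBlob polarity of Reddit posts
    "analyst_rating": 0.20,    # e.g. consensus rating on a 1-5 scale
    "volatility": 0.20,        # lower is better, so it is inverted below
}
LOWER_IS_BETTER = {"volatility"}


def min_max(values):
    """Scale values to [0, 1]; a constant column maps to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(raw):
    """raw: {ticker: {factor: value}} -> {ticker: weighted score in [0, 1]}."""
    tickers = sorted(raw)
    scores = {t: 0.0 for t in tickers}
    for factor, weight in WEIGHTS.items():
        # Normalize across tickers so factors on different scales are comparable.
        normed = min_max([raw[t][factor] for t in tickers])
        for t, n in zip(tickers, normed):
            if factor in LOWER_IS_BETTER:
                n = 1.0 - n
            scores[t] += weight * n
    return scores


def recommend(score, buy=0.65, sell=0.35):
    """Map a composite score to a label using hypothetical cut-offs."""
    if score >= buy:
        return "BUY"
    if score <= sell:
        return "SELL"
    return "HOLD"


# Made-up inputs for three fictional tickers, for illustration only.
raw = {
    "AAA": {"momentum": 0.10, "news_sentiment": 0.30, "social_sentiment": 0.20,
            "analyst_rating": 4.0, "volatility": 0.15},
    "BBB": {"momentum": -0.05, "news_sentiment": -0.10, "social_sentiment": 0.05,
            "analyst_rating": 2.5, "volatility": 0.40},
    "CCC": {"momentum": 0.02, "news_sentiment": 0.05, "social_sentiment": 0.50,
            "analyst_rating": 3.0, "volatility": 0.25},
}
scores = composite_scores(raw)
for ticker, score in scores.items():
    print(ticker, round(score, 3), recommend(score))
```

With these invented inputs, AAA scores about 0.90 (BUY), BBB 0.0 (SELL), and CCC about 0.53 (HOLD). Because min-max scaling is relative to the tracked universe, adding or removing a stock can shift everyone else's score; a z-score or fixed-range scaling would be alternatives if that matters.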
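The AI Layer bullet describes RAG: retrieve the stored text most relevant to a question, then ask the LLM to answer only from that text. The sketch below illustrates that retrieve-then-ground flow with standard-library stand-ins. A bag-of-words cosine similarity replaces ChromaDB's learned embeddings, and the final prompt is only printed rather than sent to Gemini, since no Gemini client call is shown. The function names and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a learned embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, docs, k=2):
    """Return the k (doc_id, text) pairs most similar to the question."""
    q = embed(question)
    ranked = sorted(docs.items(), key=lambda item: cosine(q, embed(item[1])), reverse=True)
    return ranked[:k]


def build_prompt(question, hits):
    """Assemble a grounded prompt: answer only from retrieved, cited context."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in hits)
    return (
        "Answer using only the context below and cite the [ids] you relied on. "
        "If the context is not enough, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Made-up snippets about fictional tickers, standing in for ingested news/Reddit text.
docs = {
    "news-1": "AAA shares rose after strong quarterly revenue growth",
    "reddit-7": "Forum users are bearish on BBB ahead of earnings",
    "news-4": "CCC announced a dividend increase",
}
question = "Why did AAA shares rise?"
hits = retrieve(question, docs, k=1)
prompt = build_prompt(question, hits)  # this string would be sent to the LLM
print(prompt)
```

Asking the model to cite the bracketed ids is what makes the answer explainable: each claim can be traced back to a specific ingested document.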
Why it matters
This project shows I can go beyond classic batch ETL: modern data engineering now includes vector stores, LLM APIs, and grounded retrieval. It’s an honest answer to “what does a 2026 data platform look like?”