Business Analyst · Data Scientist · Data Engineer
MS Applied Data Intelligence @ SJSU · MSc Computing Science @ Imperial College London
📍 San Jose, CA · ✉️ janeheng.ic@gmail.com · LinkedIn · LeetCode
- 🎓 MS in Applied Data Intelligence @ San Jose State University (graduating Dec 2026)
- 🎓 MSc in Computing Science @ Imperial College London
- 🎓 BEng in eCommerce Engineering @ Queen Mary University of London
- 💼 4 years as a Software PM leading 9 enterprise SaaS platforms; returned to school to build hands-on ML & data engineering skills
- 🛠️ Built 8 end-to-end projects across deep learning, distributed systems, pipeline engineering, and analytics
- 🤖 Exploring agentic AI systems: MRKL architecture, semantic caching, LLM-powered concierge agents
- 📊 Daily practice in SQL, Pandas, and Python — 81 problems solved
- 🔓 No sponsorship needed · Open to BA / DA / DE / DS roles · Available Dec 2026

| Project | Description | Tech | Highlights |
|---|---|---|---|
| GNN-RAG Threat Detection | CVE knowledge base with semantic threat retrieval that links network attacks to MITRE ATT&CK tactics | Python · GNN · RAG · Pinecone | 17,014 CVE vectors · 16.48 ms latency |
| DualBranchSER: Speech Emotion Recognition | Dual-branch CNN + Bi-LSTM that classifies emotions from audio, outperforming the baseline on IEMOCAP | PyTorch · CNN · Bi-LSTM | +15.8% over baseline · 55.9% acc |
| Food Desert & Public Health Analysis | Identifies socioeconomic drivers of food insecurity across 2,275 US counties, presented in an interactive dashboard | Scikit-learn · Streamlit · DBSCAN | 55%→68% acc · live dashboard |
| Deep Learning & Computer Vision | Progression from a neural network built from scratch to super-resolution: LeNet-5, AlexNet, autoencoder, and NPU deployment | PyTorch · CNN · Super-Resolution | SR PSNR 23.98 dB · NPU deployment |

| Project | Description | Tech | Highlights |
|---|---|---|---|
| KayakClone AI Agent | AI concierge for a distributed travel platform: MRKL agent, semantic cache, Kafka pipeline, WebSocket alerts, and React frontend | FastAPI · OpenAI · Redis · Kafka · React | 6 MRKL tools · 40% cache hit rate · 85 ms p50 latency |
| Airbnb Clone: Distributed Booking Platform | Co-built a distributed booking platform with a Kafka event pipeline, MongoDB-backed sessions, Redux state management, Docker/Kubernetes deployment, and JMeter load testing at up to 500 concurrent users | React · Redux · Node.js · Kafka · MySQL · MongoDB · Docker · Kubernetes | HPA 2–5 replicas · 7-service docker-compose · JMeter @ 500 users |
| Distributed Spatial Join: Blue Brain Project | Master's thesis at Imperial College London: a data-modeling sub-project of the Blue Brain Project that uses a distributed spatial-join pipeline to support large-scale simulation of mouse neurons and synapses | C++ · Hadoop · MapReduce | TB-scale neuroscience data · MapReduce optimization |

| Project | Description | Tech | Highlights |
|---|---|---|---|
| Automated Stock Analytics Pipeline | End-to-end ELT pipeline for NVIDIA stock data with automated quality checks | Airflow · dbt · Snowflake | ELT pipeline · data quality tests |
| TSLA Sentiment Pipeline | Round-the-clock sentiment monitoring of financial news about Tesla stock | Airflow · Snowflake · NLP | 24/7 monitoring (coming soon) |

| Project | Description | Tech | Highlights |
|---|---|---|---|
| Netflix Content Popularity Prediction | NLP on titles and descriptions to predict popularity, plus a Power BI dashboard to inform content strategy | Python · Power BI · TF-IDF | 81% acc · AUC 0.75 (coming soon) |

| Category | Platform | Completed | Topics Covered |
|---|---|---|---|
| SQL | StrataScratch | 53 problems | Filtering · Aggregations · Joins · Window Functions · Date & Time · Pattern Matching · CTE · Set Operations |
| Pandas | LeetCode | 18 problems | DataFrame Basics · Reshape (concat/pivot/melt) · Method Chaining |
| Python | LeetCode | 10 problems | Array · Hash Map |

💼 LinkedIn · 📧 janeheng.ic@gmail.com · 🔓 No sponsorship needed