Hi there, I'm Jie (Jane) Heng 👋

Business Analyst · Data Scientist · Data Engineer
MS Applied Data Intelligence @ SJSU · MSc Computing Science @ Imperial College London

📍 San Jose, CA  ·  ✉️ janeheng.ic@gmail.com  ·  LinkedIn  ·  LeetCode


💎 About Me

  • 🎓 MS in Applied Data Intelligence @ San Jose State University (graduating Dec 2026)
  • 🎓 MSc in Computing Science @ Imperial College London
  • 🎓 BEng in eCommerce Engineering @ Queen Mary University of London
  • 💼 4 years as a Software PM leading 9 enterprise SaaS platforms; returned to school to build hands-on ML & data engineering skills
  • 🛠️ Built 8 end-to-end projects across deep learning, distributed systems, pipeline engineering, and analytics
  • 🤖 Exploring agentic AI systems: MRKL architecture, semantic caching, LLM-powered concierge agents
  • 📊 Daily practice in SQL, Pandas, and Python — 81 problems solved
  • 🔓 No sponsorship needed · Open to BA / DA / DE / DS roles · Available Dec 2026

⚡ Tech Stack

Programming & Data Science: Python · Java · JavaScript · SQL · Pandas

Machine Learning & AI: PyTorch · TensorFlow · scikit-learn · OpenAI

Big Data & Cloud: Spark · Kafka · Hadoop · HDFS · AWS · Redis · Docker

Web & API Development: FastAPI · React · Pydantic

Data Engineering & Warehouse: Airflow · Snowflake · dbt · ClickHouse · Pinecone · MongoDB

Data Visualization & BI: Power BI · Tableau · Apache Superset · Matplotlib · Seaborn


🚀 Projects

Deep Learning & AI

| Project | Description | Tech | Highlights |
|---|---|---|---|
| GNN-RAG Threat Detection | CVE knowledge base with semantic threat retrieval that links network attacks to MITRE ATT&CK tactics | Python · GNN · RAG · Pinecone | 17,014 CVE vectors · 16.48 ms latency |
| DualBranchSER: Speech Emotion Recognition | Dual-branch CNN + Bi-LSTM that classifies emotions from audio, beating the baseline on IEMOCAP | PyTorch · CNN · Bi-LSTM | +15.8% over baseline · 55.9% accuracy |
| Food Desert & Public Health Analysis | Socioeconomic drivers of food insecurity across 2,275 US counties, with an interactive dashboard | scikit-learn · Streamlit · DBSCAN | 55% → 68% accuracy · live dashboard |
| Deep Learning & Computer Vision | Neural networks from scratch through super-resolution: LeNet-5, AlexNet, Autoencoder, NPU deployment | PyTorch · CNN · Super-Resolution | SR PSNR 23.98 dB · NPU deployment |

Distributed Systems & Backend

| Project | Description | Tech | Highlights |
|---|---|---|---|
| KayakClone AI Agent | AI concierge for a distributed travel platform: MRKL agent, semantic cache, Kafka pipeline, WebSocket alerts, React frontend | FastAPI · OpenAI · Redis · Kafka · React | 6 MRKL tools · 40% cache hit rate · 85 ms p50 latency |
| Airbnb Clone: Distributed Booking Platform | Co-built a distributed booking platform with a Kafka event pipeline, MongoDB sessions, Redux state management, Docker/K8s deployment, and JMeter load testing up to 500 concurrent users | React · Redux · Node.js · Kafka · MySQL · MongoDB · Docker · Kubernetes | HPA 2–5 replicas · 7-service docker-compose · JMeter @ 500 users |
| Distributed Spatial Join: Blue Brain Project | Master's thesis at Imperial College London: a data modeling sub-project within the Blue Brain Project, simulating mouse neurons and synapses at scale via a distributed spatial-join pipeline | C++ · Hadoop · MapReduce | TB-scale neuroscience data · MapReduce optimization |

Data Engineering & Warehouse

| Project | Description | Tech | Highlights |
|---|---|---|---|
| Automated Stock Analytics Pipeline | End-to-end ELT pipeline for NVIDIA stock data with automated quality checks | Airflow · dbt · Snowflake | ELT pipeline · data quality tests |
| TSLA Sentiment Pipeline | 24/7 sentiment monitoring of financial news about Tesla stock | Airflow · Snowflake · NLP | 24/7 monitoring (coming soon) |

Data Visualization & Analytics

| Project | Description | Tech | Highlights |
|---|---|---|---|
| Netflix Content Popularity Prediction | NLP on titles and descriptions to predict popularity, plus a Power BI dashboard for content strategy | Python · Power BI · TF-IDF | 81% accuracy · AUC 0.75 (coming soon) |

📈 Daily Practice

🔗 LeetCode Profile

| Category | Platform | Completed | Topics Covered |
|---|---|---|---|
| SQL | StrataScratch | 53 problems | Filtering · Aggregations · Joins · Window Functions · Date & Time · Pattern Matching · CTE · Set Operations |
| Pandas | LeetCode | 18 problems | DataFrame Basics · Reshape (concat/pivot/melt) · Method Chaining |
| Python | LeetCode | 10 problems | Array · Hash Map |

💼 LinkedIn  ·  📧 janeheng.ic@gmail.com  ·  🔓 No sponsorship needed

Pinned

  1. MLCountyHealth (Public)

    Forked from IamSavitha/MLCountyHealth

    A comprehensive Machine Learning study on county health factors. Comparative analysis of Random Forest, SVM, and Linear Regression to predict community health outcomes using socioeconomic data.

    Jupyter Notebook

  2. DualBranchSER (Public)

    Lightweight dual-branch CNN + Bi-LSTM architecture for real-time Speech Emotion Recognition · 55.9% accuracy · +15.8% over baseline · IEMOCAP · PyTorch

    Jupyter Notebook

  3. GNN-RAG-Detection (Public)

    GNN and RAG-based threat detection · CVE data pipeline · Pinecone · MITRE ATT&CK

    Python

  4. DeepLearning-ComputerVision-PyTorch (Public)

    Neural networks from scratch to super-resolution: LeNet-5, AlexNet, Autoencoder, custom SR models · PSNR 23.98dB · NPU deployment · PyTorch

    Python

  5. KayakClone-AI-Agent (Public)

    AI concierge agent for travel planning — MRKL architecture, semantic cache (Redis + embeddings), Kafka pipeline, WebSocket notifications, FastAPI + React

    Python

  6. Airbnb-Clone-AWS (Public)

    Distributed Airbnb booking platform: React + Redux frontend, Node.js backend, Kafka event-driven booking flow, MongoDB session storage, Kubernetes with HPA (2-5 replicas), JMeter load tested up to …

    Python