Data Scientist · AI Engineer · CMU Heinz '26
6+ years building ML systems that ship: churn models, recommendation engines, ETL pipelines, and LLM-powered NLP. Now at CMU, studying how to do it at scale.
Data Scientist with 6+ years of production experience, currently pursuing a Master's at Carnegie Mellon University (Heinz College).
My background spans the full ML lifecycle, from ETL pipeline engineering and feature design to shipping churn models and NLP systems used by teams across APAC and the US. I prioritize data quality above all else, because clean, reliable data is the foundation everything else is built on.
At CMU, I'm deepening my work in machine learning, A/B testing, time series forecasting, and unstructured data analytics, and applying the coursework directly to hands-on projects.
Native Korean speaker, fluent in English. Based in Pittsburgh, PA.
Replaced a third-party vendor costing 500M KRW/year with an internal NLP pipeline for social sentiment and topic classification. Upgraded the model from RoBERTa to GPT models fine-tuned (via API) on proprietary labeled data, raising accuracy from 60% to 92% through better handling of Korean slang and multilingual context. Adopted by APAC and US teams as a regional standard.
Users were leaving Kimberly-Clark's e-commerce app for Coupang's Rocket Delivery. Through feature engineering on shipping addresses, identified delivery zone as the key churn signal. Proposed a pilot cutting shipping time from 2 days to 1 day in high-risk regions and validated it with a 2-month A/B test.
Sales dropped sharply when customers moved from size 3 to size 4 diapers. Built a recommendation model using child age, previous size, and order recency to predict which parents were approaching size 4. Sent coupons to 5,000 users flagged by the model and to 5,000 randomly selected users; the targeted cohort converted at 2x the rate of the random group.
Engineered end-to-end ETL pipelines using Azure Data Factory and Snowflake, integrating raw e-commerce logs and demographic data into a unified analytical dataset. Automated daily workflows with ADF triggers and transformed data in Databricks (PySpark/Pandas), keeping Power BI dashboards up to date without manual intervention.
Lending platform integrating real-time weather data and LLM-based interview transcription to improve credit risk assessment in climate-vulnerable regions. An XGBoost risk-scoring engine correlates extreme weather patterns with loan defaults.
Proximity-based networking app using Bluetooth Low Energy GAP advertising packets for real-time, privacy-first connections. A mutual digital handshake protocol lets both users choose what data to share.
Validation framework on Databricks that identifies cost-efficiency outliers by correlating humanitarian project budgets with population data. Interactive dashboards support auditing of beneficiary-targeting performance.
Applied K-Means clustering and logistic regression to identify high-potential subscription segments. Designed a targeted A/B testing framework to optimize the Super Duolingo upsell funnel.
Covers 6+ years across data science, AI engineering, and analytics, including all credentials, coursework, and additional roles.
Open to full-time roles, internships, and interesting collaborations in data science and AI.