Data Scientist · AI Engineer · CMU Heinz '26

Daniel "Heedong" Jang

6+ years building ML systems that ship: churn models, recommendation engines, ETL pipelines, and LLM-powered NLP. Now at CMU, studying how to do it at scale.

Currently
  • Status: Open to opportunities
  • Based: Pittsburgh, PA
  • Study: CMU Heinz, MISM-BIDA
  • Grad: December 2026
  • Focus: ML Engineering · LLM Systems
01

About

Data Scientist with 6+ years of production experience, currently pursuing a Master's at Carnegie Mellon University (Heinz College).

My background spans the full ML lifecycle: from ETL pipeline engineering and feature design to shipping churn models and NLP systems used by global teams. I prioritize data quality above all else, because clean, reliable data is the foundation everything else is built on.

At CMU, I'm expanding into machine learning, A/B testing, time series forecasting, and unstructured data analytics, bringing coursework directly into applied projects.

Fluent in English, native Korean. Based in Pittsburgh, PA.

Core Stack
Python · PyTorch · Scikit-learn · LangChain · TensorFlow · SQL · R · Java
Infrastructure
Azure Data Factory · Snowflake · Databricks · AWS · Google Colab · MSSQL · Power BI · Tableau
Methods
LLM Applications · Time Series Forecasting · A/B Testing · NLP · Machine Modeling · Deep Learning · Prediction Modeling · Statistics
02

Education

Aug 2025 — Dec 2026
Carnegie Mellon University
Pittsburgh, PA
M.S. Information Systems Management
  • Heinz College of Information Systems and Public Policy
  • Concentration: Business Intelligence and Data Analytics
Aug 2010 — May 2016
Syracuse University
Syracuse, NY
B.S. Information Management and Technology
  • School of Information Studies
  • Graduated Cum Laude
  • Completed mandatory military service in the Republic of Korea Army, 2012–2014
03

Experience

Oct 2021 — Jun 2025
Kimberly-Clark
Seoul, South Korea
Data Scientist — AI Labs
  • Upgraded a RoBERTa-based NLP pipeline to a fine-tuned GPT system, improving accuracy from 60% to 92% and cutting vendor costs by 90%; adopted by APAC and US teams
  • Engineered scalable ETL pipelines using Azure Data Factory and Snowflake, reducing data processing latency from 24 hours to real-time
  • Built a LightGBM churn model for the MomQ e-commerce platform; delivery-zone feature engineering drove a 20% retention improvement among high-risk users
  • Developed size-transition recommendation model, A/B tested across 10,000 users, achieving 2x conversion rate in targeted cohort vs. random group
  • Collaborated with marketing, sales, and product stakeholders to align data science solutions with business objectives across global teams
Mar 2020 — Oct 2021
Codestates
Seoul, South Korea
AI Bootcamp — Code States
  • Completed an intensive AI/ML curriculum covering machine learning, deep learning, statistics, and data engineering
  • Built hands-on projects across EDA, feature engineering, hypothesis testing, clustering, and linear algebra
  • Transitioned from data analysis and consulting into full data science, leading directly to a Data Scientist role at Kimberly-Clark
Nov 2018 — Sep 2019
Unico Search
Seoul, South Korea
Consultant — ICT
  • Analyzed candidate response rates and recruitment metrics to optimize sourcing strategies and improve hiring efficiency
  • Consulted with tech industry clients to understand technical requirements and match them with qualified ICT professionals
  • Monitored industry trends and leveraged market data to provide strategic talent acquisition insights
Oct 2017 — Current
Saerona Solar Systems
Seoul, South Korea
Entrepreneur
  • Managed end-to-end solar energy generation projects including site research, hardware procurement, and infrastructure installation
  • Structured and negotiated a 20-year Power Purchase Agreement with Korea Electric Power Corporation (KEPCO) to commercialize energy output
Aug 2016 — Jul 2017
Cavtil, Inc.
New York, NY
Data Analyst
  • Conducted market and competitive research to evaluate viability of K-12 educational content services, analyzing user demographics and purchasing data
  • Designed interactive Tableau dashboards to translate complex data into actionable business strategies
  • Delivered data-driven insights and strategic recommendations directly to CEO and COO to guide product development
Dec 2015 — Jan 2016
Jeju Air
Seoul, South Korea
Data Analyst Intern
  • Analyzed qualitative user feedback from customer service channels, translating pain points into actionable service quality improvements
  • Evaluated digital platform UI/UX to identify user friction points and recommend data-driven usability enhancements
03

Projects

Professional
Kimberly-Clark · 2021–2024
NLP-powered Risk Management Dashboard
Accuracy: 60% to 92% · Cost reduction: 90%

Replaced a 500M KRW/year third-party vendor with an internal NLP pipeline for social sentiment and topic classification. Upgraded from RoBERTa to fine-tuned GPT APIs on proprietary labeled data, improving accuracy from 60% to 92% with better handling of Korean slang and multilingual context. Adopted by APAC and US teams as a regional standard.

RoBERTa · GPT Fine-tuning · NLP · Risk Management · Multilingual · Power BI
Kimberly-Clark · 2022–2023
MomQ Churn Prediction
20% Retention improvement among high-risk users

Users were leaving Kimberly-Clark's e-commerce app for Coupang's Rocket Delivery. Identified delivery zone as the key churn signal through feature engineering on shipping addresses. Proposed a pilot cutting shipping from 2 days to 1 day in high-risk regions and validated it with a 2-month A/B test.

LightGBM · Feature Engineering · A/B Testing · EDA
Kimberly-Clark · 2023
Size 4 Diaper Recommendation Model
2x Conversion vs. random group (10K user A/B test)

Sales dropped sharply at the diaper size 3 to 4 transition. Built a recommendation model using child age, previous size, and order recency to predict which parents were approaching size 4. Sent targeted coupons to 5,000 predicted users and 5,000 random users. Targeted cohort showed 2x the conversion rate.

Recommendation Model · Scikit-learn · A/B Testing · CRM
Kimberly-Clark · 2022–2024
Scalable ETL Pipeline for ML Consumption
Data processing latency: 24h to real-time

Engineered end-to-end ETL pipelines using Azure Data Factory and Snowflake, integrating raw e-commerce logs and demographic data into a unified analytical dataset. Automated daily workflows via ADF triggers and transformed data using Databricks (PySpark/Pandas), enabling real-time insights via Power BI without manual intervention.

Azure Data Factory · Snowflake · Databricks · PySpark · Power BI
Academic
Feb 2026 · CMU
Cedar: Climate-Informed Micro-finance Platform

Lending platform integrating real-time weather data and LLM-based interview transcription to optimize credit risk in climate-vulnerable regions. XGBoost risk scoring engine correlates extreme weather patterns with loan defaults.

XGBoost · LLM · Python · Risk Modeling
Feb 2026 · ProdHacks
Orbit: BLE Proximity Networking App

Proximity-based networking app using Bluetooth Low Energy GAP advertising packets for real-time, privacy-first connections. Mutual digital handshake protocol for selective data sharing.

BLE · Mobile · Privacy-First UX · 🏆 ProdHacks 2026 Finalist
Jan 2026 · CMU x Databricks x UN
UN Humanitarian Data Validation Framework

Validation framework on Databricks identifying cost-efficiency outliers by correlating humanitarian project budgets with population data. Interactive dashboards for beneficiary targeting performance auditing.

Databricks · Python · Data Visualization
Oct 2025 · CMU
Duolingo User Segmentation and Monetization

K-Means clustering and Logistic Regression to identify high-potential subscription segments. Designed targeted A/B testing framework to optimize the Super Duolingo upsell funnel.

Scikit-learn · K-Means · A/B Testing · Logistic Regression
Machine Learning · Unstructured Data Analytics · A/B Testing and Design · Time Series Forecasting · Data Science and Big Data · Applied Econometrics · Database Management · Data-Focused Python · Stats for IT Managers · Advanced Business Analytics · Java Programming
04

Resume

Full CV available for download.

Covers 6+ years across data science, AI engineering, and analytics, including all credentials, coursework, and additional roles.

Selected Credentials
Java Programming, UCSC Silicon Valley Extension · 2025
Deep Learning Time Series, Learning Spoons · 2024
DP-203: Azure Data Engineering, Microsoft · 2022
AI Boot Camp, Codestates · 2021
Data Analysis with Python, Fastcampus · 2019

Let's build
something good.

Open to full-time roles, internships, and interesting collaborations in data science and AI.