Daniel Jang — Data Scientist

About

Data Scientist with 6+ years of production experience, currently pursuing a Master's at Carnegie Mellon University (Heinz College).

My background spans the full ML lifecycle: from ETL pipeline engineering and feature design to shipping churn models and NLP systems used by global teams. I prioritize data quality above all else, because clean, reliable data is the foundation everything else is built on.

At CMU, I'm expanding into machine learning, A/B testing, time series forecasting, and unstructured data analytics, bringing coursework directly into applied projects.

Fluent in English, native Korean. Based in Pittsburgh, PA.

Core Stack

Python · Pandas · PyTorch · Scikit-learn · LangChain · TensorFlow · SQL · R · Java

Infrastructure

Azure Data Factory · Snowflake · Databricks · AWS · Google Colab · MSSQL · Power BI · Tableau

Methods

LLM Applications · Time Series Forecasting · A/B Testing · NLP · Machine Learning · Deep Learning · Predictive Modeling · Statistics

Experience

Oct 2021 — Jun 2025

Kimberly-Clark

Seoul, South Korea

Data Scientist — AI Labs

Migrated a RoBERTa-based NLP pipeline to a fine-tuned GPT system, improving accuracy from 60% to 92% and cutting vendor costs by 90%; adopted by APAC and US teams
Engineered scalable ETL pipelines using Azure Data Factory and Snowflake, reducing data processing latency from 24 hours to real time
Built a LightGBM churn model for the MomQ e-commerce platform; delivery-zone feature engineering drove a 20% retention improvement among high-risk users
Developed a size-transition recommendation model and A/B tested it across 10,000 users, achieving 2x the conversion rate in the targeted cohort vs. the random group
Collaborated with marketing, sales, and product stakeholders to align data science solutions with business objectives across global teams

Mar 2020 — Oct 2021

Code States

Seoul, South Korea

AI Bootcamp — Code States

Completed an intensive AI/ML curriculum covering machine learning, deep learning, statistics, and data engineering
Built hands-on projects across EDA, feature engineering, hypothesis testing, clustering, and linear algebra
Transitioned from data analysis and consulting into full data science, leading directly to a Data Scientist role at Kimberly-Clark

Nov 2018 — Sep 2019

Unico Search

Seoul, South Korea

Consultant — ICT

Analyzed candidate response rates and recruitment metrics to optimize sourcing strategies and improve hiring efficiency
Consulted with tech industry clients to understand technical requirements and match them with qualified ICT professionals
Monitored industry trends and leveraged market data to provide strategic talent acquisition insights

Oct 2017 — Current

Saerona Solar Systems

Seoul, South Korea

Entrepreneur

Managed end-to-end solar energy generation projects including site research, hardware procurement, and infrastructure installation
Structured and negotiated a 20-year Power Purchase Agreement with Korea Electric Power Corporation (KEPCO) to commercialize energy output

Aug 2016 — Jul 2017

Cavtil, Inc.

New York, NY

Data Analyst

Conducted market and competitive research to evaluate viability of K-12 educational content services, analyzing user demographics and purchasing data
Designed interactive Tableau dashboards to translate complex data into actionable business strategies
Delivered data-driven insights and strategic recommendations directly to CEO and COO to guide product development

Dec 2015 — Jan 2016

Jeju Air

Seoul, South Korea

Data Analyst Intern

Analyzed qualitative user feedback from customer service channels, translating pain points into actionable service quality improvements
Evaluated digital platform UI/UX to identify user friction points and recommend data-driven usability enhancements

Projects

Professional

Kimberly-Clark · 2021–2024

NLP-powered Risk Management Dashboard

60% to 92% Accuracy · 90% Cost reduction

Replaced a 500M KRW/year third-party vendor with an internal NLP pipeline for social sentiment and topic classification. Upgraded from RoBERTa to fine-tuned GPT APIs on proprietary labeled data, improving accuracy from 60% to 92% with better handling of Korean slang and multilingual context. Adopted by APAC and US teams as a regional standard.

RoBERTa · GPT Fine-tuning · NLP · Risk Management · Multilingual · Power BI

Kimberly-Clark · 2022–2023

MomQ Churn Prediction

20% Retention improvement among high-risk users

Users were leaving Kimberly-Clark's e-commerce app for Coupang's Rocket Delivery. Identified delivery zone as the key churn signal through feature engineering on shipping addresses. Proposed a pilot that cut shipping from 2 days to 1 day in high-risk regions and validated it with a 2-month A/B test.

LightGBM · Feature Engineering · A/B Testing · EDA

Kimberly-Clark · 2023

Size 4 Diaper Recommendation Model

2x Conversion vs. random group (10K user A/B test)

Sales dropped sharply at the diaper size 3 to 4 transition. Built a recommendation model using child age, previous size, and order recency to predict which parents were approaching size 4. Sent targeted coupons to 5,000 predicted users and 5,000 random users. Targeted cohort showed 2x the conversion rate.

Recommendation Model · Scikit-learn · A/B Testing · CRM

Kimberly-Clark · 2022–2024

Scalable ETL Pipeline for ML Consumption

24h to Real-time Data processing latency

Engineered end-to-end ETL pipelines using Azure Data Factory and Snowflake, integrating raw e-commerce logs and demographic data into a unified analytical dataset. Automated daily workflows via ADF triggers and transformed data using Databricks (PySpark/Pandas), enabling real-time insights via Power BI without manual intervention.

Azure Data Factory · Snowflake · Databricks · PySpark · Power BI

Academic

Feb 2026 · CMU

Cedar: Climate-Informed Micro-finance Platform

Lending platform integrating real-time weather data and LLM-based interview transcription to improve credit risk assessment in climate-vulnerable regions. An XGBoost risk-scoring engine correlates extreme weather patterns with loan defaults.

XGBoost · LLM · Python · Risk Modeling

Feb 2026 · ProdHacks

Orbit: BLE Proximity Networking App

Proximity-based networking app using Bluetooth Low Energy GAP advertising packets for real-time, privacy-first connections. Mutual digital handshake protocol for selective data sharing.

BLE · Mobile · Privacy-First UX · 🏆 ProdHacks 2026 Finalist

Jan 2026 · CMU x Databricks x UN

UN Humanitarian Data Validation Framework

Validation framework on Databricks identifying cost-efficiency outliers by correlating humanitarian project budgets with population data. Interactive dashboards for beneficiary targeting performance auditing.

Databricks · Python · Data Visualization

Oct 2025 · CMU

Duolingo User Segmentation and Monetization

K-Means clustering and Logistic Regression to identify high-potential subscription segments. Designed targeted A/B testing framework to optimize the Super Duolingo upsell funnel.

Scikit-learn · K-Means · A/B Testing · Logistic Regression

Coursework

Machine Learning · Unstructured Data Analytics · A/B Testing and Design · Time Series Forecasting · Data Science and Big Data · Applied Econometrics · Database Management · Data-Focused Python · Stats for IT Managers · Advanced Business Analytics · Java Programming

Daniel "Heedong" Jang

About

Education

Experience

Projects

Resume

Full CV available for download.

Let's build
something good.
