Minkey Chang

Data Scientist / AI Engineer


Capstone Project

NICE Credit Bureau

Oct 2024 – Jan 2025

The project used large-scale consumer credit bureau data to produce synthetic datasets that could be shared and analyzed without exposing individuals. The goal was to preserve analytical utility while making re-identification infeasible. I worked on a sequential synthesis approach (Synthpop-style: variables are generated one at a time, each conditioned on the variables already synthesized) and contributed the implementation as a module in the open-source library Synthcity.

  • Worked with large-scale consumer credit bureau data to produce synthetic data that resists re-identification while preserving utility for analysis.
  • Implemented sequential synthesis (Synthpop-style) in Python and contributed it as a module in Synthcity.
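
As a rough illustration of the sequential idea (not the Synthcity module itself), the sketch below synthesizes categorical columns one at a time, drawing each value from its empirical distribution given the most recently synthesized columns. Synthpop typically fits a model such as CART at each step; the plain frequency tables here are a simplification I am swapping in to keep the example short. The function names, the `max_parents` parameter, and the toy rows are all hypothetical and are not taken from the project or from real bureau data.

```python
import random
from collections import Counter, defaultdict


def fit_sequential(rows, max_parents=1):
    """Build one frequency table per column, keyed by the preceding columns.

    Column j is conditioned on up to `max_parents` columns immediately
    before it. Column 0 has an empty context, so its table is the marginal.
    """
    n_cols = len(rows[0])
    tables = []
    for j in range(n_cols):
        table = defaultdict(Counter)
        for row in rows:
            context = tuple(row[max(0, j - max_parents):j])
            table[context][row[j]] += 1
        tables.append(table)
    return {"max_parents": max_parents, "tables": tables}


def sample_sequential(model, n, seed=0):
    """Generate n synthetic records, one column at a time."""
    rng = random.Random(seed)
    max_parents = model["max_parents"]
    records = []
    for _ in range(n):
        record = []
        for j, table in enumerate(model["tables"]):
            # Condition on values already synthesized for this record.
            context = tuple(record[max(0, j - max_parents):j])
            counts = table[context]
            values = list(counts.keys())
            weights = list(counts.values())
            record.append(rng.choices(values, weights=weights, k=1)[0])
        records.append(tuple(record))
    return records


# Made-up toy rows: (age_band, region, delinquent)
rows = [
    ("20s", "Seoul", "no"),
    ("30s", "Busan", "yes"),
    ("20s", "Busan", "no"),
    ("40s", "Seoul", "no"),
]

model = fit_sequential(rows, max_parents=1)
for record in sample_sequential(model, 5, seed=1):
    print(record)
```

Keeping `max_parents` small matters in this simplified version: conditioning every column on the full set of earlier columns with exact frequency tables would only ever reproduce rows that already exist in the source data, which defeats the purpose of synthesis. A fitted model at each step generalizes instead of memorizing, which is one reason the Synthpop approach uses one.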

The project gave me hands-on experience synthesizing private credit bureau data with anonymization as a design constraint. It was also my first contribution to an open-source project, and it helped me get comfortable with the contribution workflow, general Python code structure, and collaborative development.

Related: Creating and Evaluating Synthetic Tabular Data · Synthcity (GitHub)