Capstone Project
NICE Credit Bureau
Oct 2024 – Jan 2025
The project used large-scale consumer credit bureau data to produce synthetic datasets that could be shared and analyzed without exposing individuals. The goal was to preserve analytical utility while making re-identification infeasible. I worked on a sequential synthesis approach (Synthpop-style: variables are generated one at a time, each conditioned on the variables already synthesized) and contributed the implementation as a module to the open-source library Synthcity.
- Worked with large-scale consumer credit bureau data to produce synthetic data that protects individuals from re-identification while preserving its utility for analysis.
- Implemented sequential synthesis (Synthpop-style) in Python and contributed it as a module in Synthcity.
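To make "one variable at a time" concrete, here is a minimal Python sketch of the idea. It is not the Synthcity module. It assumes all columns are numeric, and at each step it fits a linear regression with Gaussian noise, similar in spirit to synthpop's `norm` method, rather than synthpop's default CART models. The function name `synthesize_sequential` and the example columns `income` and `debt` are hypothetical.

```python
import numpy as np


def synthesize_sequential(data, visit_sequence, n, seed=0):
    """Hypothetical helper: sequentially synthesize numeric columns.

    data: dict mapping column name -> 1-D array of original values.
    visit_sequence: column order; each column is modeled on the columns before it.
    n: number of synthetic rows to generate.
    """
    rng = np.random.default_rng(seed)
    synth = {}

    # First variable: resample from its observed marginal distribution.
    first = visit_sequence[0]
    synth[first] = rng.choice(np.asarray(data[first], dtype=float), size=n, replace=True)

    for j in range(1, len(visit_sequence)):
        target = visit_sequence[j]
        predictors = visit_sequence[:j]

        # Fit on the ORIGINAL data: intercept + all previously visited columns.
        y = np.asarray(data[target], dtype=float)
        X = np.column_stack(
            [np.ones(len(y))] + [np.asarray(data[p], dtype=float) for p in predictors]
        )
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        # Residual standard deviation, corrected for the number of fitted parameters.
        sigma = (y - X @ beta).std(ddof=X.shape[1])

        # Apply the fitted model to the SYNTHETIC predictors, then add Gaussian noise.
        X_syn = np.column_stack([np.ones(n)] + [synth[p] for p in predictors])
        synth[target] = X_syn @ beta + rng.normal(0.0, sigma, size=n)

    return synth


# Usage with made-up data (hypothetical columns): debt depends linearly on income.
rng = np.random.default_rng(42)
income = rng.normal(50, 10, size=1000)
debt = 0.4 * income + rng.normal(0, 3, size=1000)
original = {"income": income, "debt": debt}

fake = synthesize_sequential(original, ["income", "debt"], n=1000, seed=1)
print(np.corrcoef(fake["income"], fake["debt"])[0, 1])
```

The key design choice is that each model is fitted on the original data but applied to the columns already synthesized, so dependencies propagate down the visit sequence. The added noise keeps later columns from simply reproducing original values, which is where the privacy/utility trade-off gets tuned.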
The project gave me hands-on experience synthesizing private credit bureau data with anonymization as a core requirement. It was also my first contribution to an open-source project, which helped me get comfortable with the contribution workflow, structuring Python code within an existing library, and collaborating with other developers.
Related: Creating and Evaluating Synthetic Tabular Data · Synthcity (GitHub)