Nubank fine-tunes customer transaction foundation models with joint fusion for improved benchmark AUC
Nubank needed to tailor pre-trained transaction-based foundation models to specific downstream tasks, and to combine tabular features such as bureau information with sequential transaction embeddings in a single jointly optimised model rather than training the two separately.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Pre-train on transaction data
Self-supervised learning on transaction data produces general-purpose embeddings, learned without labels, that represent a customer's behavior.
Tools used
DCNv2, XGBoost, LightGBM
Outcome
Supervised fine-tuning achieved a 1.68% relative improvement in AUC across benchmark tasks. Joint fusion showed a further advantage over late fusion, and the DCNv2-based model consistently beat GBT baselines once numerical embeddings and regularisation were incorporated.
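As a reading aid, the short sketch below shows how a relative AUC improvement is computed. The two AUC values are hypothetical, chosen only so the arithmetic lands on 1.68%; the source does not report absolute AUCs, and the helper name relative_improvement is an assumption made for illustration.

```python
def relative_improvement(baseline_auc, new_auc):
    """Relative change of new_auc over baseline_auc, in percent."""
    return (new_auc - baseline_auc) / baseline_auc * 100.0


# Hypothetical AUCs, used only to illustrate the arithmetic.
baseline = 0.7500
fine_tuned = 0.7626

print(f"{relative_improvement(baseline, fine_tuned):.2f}%")  # prints 1.68%
```

A relative figure depends on the baseline: the same absolute gain of 0.0126 AUC would be a smaller relative improvement on a stronger baseline.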
What failed first
Late fusion trained the embeddings separately from the tabular features, which yielded suboptimal performance. Initial DNN-based DCNv2 models scored 0.40% below the GBT baselines. GBTs themselves could not be used for joint fusion because they are not differentiable, so no gradient can flow back through them into the embedding model.
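The difference between late and joint fusion can be sketched with a toy model. This is not Nubank's architecture: the one-weight encoder, the logistic head, the synthetic data, and every name below are assumptions made for illustration. The point is only that in joint fusion the loss gradient reaches the encoder through a differentiable head, while in late fusion the encoder stays frozen.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(w_enc, seq):
    """Toy transaction encoder: one weight applied to the mean amount."""
    return math.tanh(w_enc * sum(seq) / len(seq))


def train(data, joint, epochs=50, lr=0.1):
    """Fit logit = a*embedding + b*tabular + c with SGD on log loss.

    joint=True  updates the encoder weight as well (joint fusion).
    joint=False leaves the encoder frozen (late fusion on fixed embeddings).
    """
    w_enc, a, b, c = 0.5, 0.0, 0.0, 0.0
    for _ in range(epochs):
        for seq, tab, y in data:
            mean = sum(seq) / len(seq)
            emb = encode(w_enc, seq)
            g = sigmoid(a * emb + b * tab + c) - y  # d(log loss)/d(logit)
            # Chain rule through the head back into the encoder weight.
            grad_w_enc = g * a * (1.0 - emb * emb) * mean
            a -= lr * g * emb
            b -= lr * g * tab
            c -= lr * g
            if joint:
                w_enc -= lr * grad_w_enc
    return w_enc, a, b, c


# Synthetic data: the label depends on both the sequence and the tabular feature.
rng = random.Random(0)
data = []
for _ in range(200):
    seq = [rng.uniform(-1, 1) for _ in range(5)]
    tab = rng.uniform(-1, 1)
    y = 1 if 3.0 * sum(seq) / len(seq) + tab > 0 else 0
    data.append((seq, tab, y))

late = train(data, joint=False)
joint = train(data, joint=True)
print("late fusion encoder weight: ", late[0])   # unchanged from its initial value
print("joint fusion encoder weight:", joint[0])  # moved by the task gradient
```

The grad_w_enc line is the step a tree ensemble cannot provide: it needs the derivative of the head's output with respect to the embedding, which exists for a neural head such as DCNv2 but not for the piecewise-constant output of a GBT.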