284,807 real transactions. XGBoost + PyTorch Autoencoder ensemble with full SHAP explainability. 93% Precision · 82% Recall on a highly imbalanced dataset.
From raw imbalanced data to an explainable ensemble model with threshold tuning.
All models evaluated on the same held-out test set (56,962 transactions). Metric: PR-AUC, not accuracy.
| Model | ROC-AUC | PR-AUC | F1 | Notes |
|---|---|---|---|---|
| Logistic Regression | | | 0.1036 | |
| Random Forest | | | 0.8333 | |
| LightGBM | | | 0.8137 | |
| PyTorch Autoencoder | - | | 0.7136 | |
| XGBoost (base) | | | 0.8195 | |
| XGBoost (tuned) | | | 0.8696 | Primary |
| Ensemble (XGB + AE) | - | | 0.8743 | Best F1 |
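To make the choice of PR-AUC concrete, here is a minimal pure-Python sketch of step-wise average precision, a common estimator of the area under the precision-recall curve. The function name, the toy labels, and the toy scores are all invented for illustration; how the notebook itself computes PR-AUC is not stated here, so treat this as an assumption rather than the project's code.

```python
def average_precision(labels, scores):
    """Step-wise average precision: sum of (recall gain) * precision
    over each distinct score threshold, from highest score to lowest."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    ap, tp, seen, prev_recall = 0.0, 0, 0, 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every item tied at this score so ties share one threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            seen += 1
            i += 1
        precision = tp / seen
        recall = tp / n_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Toy data (hypothetical): 2 frauds among 10 transactions.
labels = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
good_scores = [0.1, 0.2, 0.9, 0.1, 0.3, 0.2, 0.1, 0.8, 0.2, 0.1]
flat_scores = [0.0] * 10  # an uninformative model: same score for everyone

ap_good = average_precision(labels, good_scores)
ap_flat = average_precision(labels, flat_scores)

# Accuracy of predicting "legit" (0) for every transaction.
accuracy_all_legit = sum(1 for y in labels if y == 0) / len(labels)

print(f"AP (informative scores):   {ap_good:.2f}")
print(f"AP (uninformative scores): {ap_flat:.2f}")
print(f"Accuracy (always legit):   {accuracy_all_legit:.2f}")
```

On this toy set the uninformative scorer looks strong on accuracy, while its average precision collapses to the fraud rate. That gap is the reason to rank models by PR-AUC on imbalanced data.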
⚠️ Why accuracy is misleading here
Predicting "Legit" for every transaction gives 99.83% accuracy, but catches zero frauds. PR-AUC and F1 are the right metrics. The tuned model catches 80 out of 98 fraud cases in the test set at 93% precision.
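The figures above can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text, plus one assumption: a false-positive count of 6, which is the value that makes 80 / (80 + 6) round to the reported 93% precision. The variable names are illustrative.

```python
# Counts stated in the text.
n_test = 56_962   # held-out test transactions
n_fraud = 98      # fraud cases in the test set
tp = 80           # frauds caught by the tuned model

# Assumption (not stated in the text): 6 false positives,
# chosen because 80 / 86 rounds to the reported 93% precision.
fp = 6
fn = n_fraud - tp  # frauds missed

# Accuracy of a model that labels every transaction "Legit".
baseline_accuracy = (n_test - n_fraud) / n_test

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * tp / (2 * tp + fp + fn)

print(f"All-legit accuracy: {baseline_accuracy:.2%}")
print(f"Precision: {precision:.1%}  Recall: {recall:.1%}  F1: {f1:.4f}")
```

Under that assumption the F1 works out to 160 / 184, which agrees with the 0.8696 listed for XGBoost (tuned) in the table, and the recall of 80 / 98 rounds to the 82% quoted in the summary.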
Critical design decisions that determined whether the model actually catches fraud.
Complete EDA, SMOTE pipeline, model training, SHAP plots, and autoencoder, all in one Jupyter notebook.