โš ๏ธ IMBALANCED BINARY CLASSIFICATION ยท 0.17% FRAUD

Credit Card Fraud Detection

284,807 real transactions. XGBoost + PyTorch Autoencoder ensemble with full SHAP explainability. 93% precision · 82% recall (tuned XGBoost) on a highly imbalanced dataset.

284K transactions
0.17% fraud rate
0.9857 ROC-AUC
87% F1 score

Pipeline

End-to-End Pipeline

From raw imbalanced data to an explainable ensemble model with threshold tuning.

📊 EDA: 284K transactions
→
🔧 Feature Eng.: Hour from Time
→
⚖️ SMOTE: train data only
→
🏋️ Training: 4 models + AE
→
🎯 Threshold: 0.50 → 0.9837
→
🧠 Ensemble: XGB + AutoEnc
→
🔍 SHAP: explainability

XGBoost Classifier

n_estimators: 400
max_depth: 5
learning_rate: 0.2
subsample: 0.8
Optimal threshold: 0.9837
Tuning method: Random Search (40 trials)
ROC-AUC: 0.9857
F1 (tuned): 0.8696
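Picking a cut-off such as 0.9837 amounts to scanning candidate thresholds on held-out fraud probabilities and keeping the one with the best F1. A minimal sketch with scikit-learn's `precision_recall_curve`, assuming `y_true` and `y_proba` are held-out labels and XGBoost probabilities; the helper name `best_f1_threshold` is hypothetical and the notebook's exact search may differ:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def best_f1_threshold(y_true, y_proba):
    """Return (threshold, f1) maximising F1 over the PR-curve cut-offs."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_proba)
    # precision/recall have one more entry than thresholds; drop the final
    # (precision=1, recall=0) point so the arrays line up with thresholds.
    p, r = precision[:-1], recall[:-1]
    denom = p + r
    f1 = np.divide(2 * p * r, denom, out=np.zeros_like(denom), where=denom > 0)
    best = int(np.argmax(f1))
    return float(thresholds[best]), float(f1[best])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.r_[np.zeros(990, dtype=int), np.ones(10, dtype=int)]
    # Synthetic scores: legit mostly low, fraud mostly high.
    scores = np.r_[rng.beta(1, 20, 990), rng.beta(20, 1, 10)]
    thr, f1 = best_f1_threshold(y, scores)
    y_pred = (scores >= thr).astype(int)  # apply the tuned cut-off
    print(f"threshold={thr:.4f}  F1={f1:.4f}  flagged={y_pred.sum()}")
```

Because only the decision rule changes, the same trained model is reused; choosing the threshold on a validation split rather than the final test set keeps the reported test metrics unbiased.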

PyTorch Autoencoder

Architecture: 29→64→32→16→32→64→29
Parameters: 9,453
Trained on: legit transactions only
Legit recon. error: 0.082
Fraud recon. error: 14.769 (180× higher)
PR-AUC: 0.6330
Ensemble weight: 0.6 XGB + 0.4 AE
GPU support: CUDA / MPS / CPU
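The ensemble is listed as 0.6 XGB + 0.4 AE, but the page does not say how reconstruction error is put on the same 0-1 scale as a probability. A minimal sketch, assuming min-max scaling of the autoencoder error with bounds fitted on training data and clipping to [0, 1]; that scaling step and the helper name `blend_scores` are assumptions, not the notebook's confirmed method:

```python
import numpy as np


def blend_scores(xgb_proba, ae_error, err_min, err_max, w_xgb=0.6, w_ae=0.4):
    """Weighted blend of XGBoost probabilities and autoencoder errors.

    ae_error is min-max scaled with bounds fitted on training data
    (err_min, err_max) and clipped to [0, 1] so unseen extremes stay in range.
    """
    xgb_proba = np.asarray(xgb_proba, dtype=float)
    scaled = (np.asarray(ae_error, dtype=float) - err_min) / (err_max - err_min)
    scaled = np.clip(scaled, 0.0, 1.0)
    return w_xgb * xgb_proba + w_ae * scaled


if __name__ == "__main__":
    # Hypothetical scores for three transactions.
    print(blend_scores([0.02, 0.50, 0.99], [0.08, 5.0, 14.8], err_min=0.0, err_max=10.0))
```

The blended score can then be thresholded exactly like the XGBoost probability.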
Benchmarks

Model Comparison

All models were evaluated on the same held-out test set (56,962 transactions). Metric: PR-AUC, not accuracy.

Model                  ROC-AUC   PR-AUC   F1
Logistic Regression    0.9705    0.7173   0.1036
Random Forest          0.9692    0.8543   0.8333
LightGBM               0.9417    0.7253   0.8137
PyTorch Autoencoder    n/a       0.6330   0.7136
XGBoost (base)         0.9835    0.8658   0.8195
XGBoost (tuned)        0.9857    0.8614   0.8696   ✓ Primary
Ensemble (XGB + AE)    n/a       0.8320   0.8743   ✓ Best F1
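The ROC-AUC and PR-AUC columns can be computed from predicted scores with scikit-learn. A minimal sketch on synthetic data with a 0.17% positive rate, assuming PR-AUC is computed as average precision (`average_precision_score`); the notebook may instead integrate the PR curve with `auc`, which gives slightly different values:

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(42)
n, fraud_rate = 56_962, 0.0017
y_true = (rng.random(n) < fraud_rate).astype(int)

# Synthetic scores: frauds are shifted upward but still overlap with legit ones.
scores = rng.normal(loc=0.0, scale=1.0, size=n) + 2.5 * y_true

roc = roc_auc_score(y_true, scores)
pr = average_precision_score(y_true, scores)  # PR-AUC as average precision
baseline_pr = y_true.mean()  # a random scorer's PR-AUC is roughly the fraud rate

print(f"frauds={y_true.sum()}  ROC-AUC={roc:.4f}  PR-AUC={pr:.4f}  "
      f"random PR baseline={baseline_pr:.4f}")
```

With positives this rare, a scorer can post a high ROC-AUC while its PR-AUC stays far lower, because even a small false-positive rate on 56,000+ legit rows swamps the few frauds. That gap is why the table is read by PR-AUC.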

โš ๏ธ Why accuracy is misleading here

Predicting "Legit" for every transaction gives 99.83% accuracy but catches zero frauds. PR-AUC and F1 are the right metrics. The tuned XGBoost model catches 80 of the 98 fraud cases in the test set at 93% precision.
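The figures in this paragraph are mutually consistent, and the arithmetic is short enough to spell out. With 98 frauds among 56,962 test transactions, the sketch below scores the all-legit baseline and the tuned model. The tuned model's false-positive count (6) is inferred, not reported: it is the only count consistent with both 93% precision and the F1 of 0.8696 (the helper name `metrics` is hypothetical):

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision,
            "recall": recall, "f1": f1}


N_TEST, N_FRAUD = 56_962, 98
N_LEGIT = N_TEST - N_FRAUD

# Baseline: label everything legit, so there are no positives at all.
baseline = metrics(tp=0, fp=0, fn=N_FRAUD, tn=N_LEGIT)

# Tuned XGBoost: 80 of 98 frauds caught; fp=6 is inferred from 93% precision.
tuned = metrics(tp=80, fp=6, fn=N_FRAUD - 80, tn=N_LEGIT - 6)

for name, m in [("all-legit", baseline), ("tuned XGBoost", tuned)]:
    print(name, {k: round(v, 4) for k, v in m.items()})
```

The baseline comes out at 0.9983 accuracy with zero recall, while the tuned model gives precision 0.9302, recall 0.8163 and F1 0.8696, matching the 93% / 82% / 0.8696 reported above.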

Key Learnings

What We Learned

Critical design decisions that determined whether the model actually catches fraud.

📏
Accuracy ≠ Performance
A model that labels every transaction legit scores 99.83% accuracy. At a 0.17% fraud rate, PR-AUC and F1 are the meaningful metrics.
🚰
No Data Leakage
SMOTE and StandardScaler are fit on the training data only. Applying them to the full dataset would leak test-set information into training and inflate the metrics.
🎯
Threshold Tuning
Raising the decision threshold from the default 0.50 to the tuned 0.9837 improved F1 from 0.79 to 0.87, roughly a 10% relative gain (+0.08 absolute), with no retraining needed.
🔬
Autoencoder Gap
Fraudulent transactions have 180× higher reconstruction error than legit ones (14.77 vs 0.08), a powerful anomaly signal.
🧩
SHAP Explainability
V14, V17, and V12 are the strongest fraud indicators. SHAP makes the model interpretable for real-world deployment and auditing.
⚡
GPU Auto-detection
Autoencoder training auto-detects CUDA → Apple MPS → CPU. The same notebook runs on Colab, a MacBook, or any local machine.
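The autoencoder, its training on legit transactions only, and the CUDA → MPS → CPU fallback can be sketched in PyTorch as below. This is a minimal sketch assuming plain `nn.Linear` layers with ReLU and a per-row mean-squared-error anomaly score. A Linear-only stack of this shape has 9,069 parameters, so the 9,453 listed suggests the notebook adds further layers (for example normalisation). The names `pick_device`, `FraudAutoencoder`, `reconstruction_error` and `train_on_legit` are hypothetical:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def pick_device() -> torch.device:
    """Prefer CUDA, then Apple MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


class FraudAutoencoder(nn.Module):
    """29 -> 64 -> 32 -> 16 -> 32 -> 64 -> 29 fully connected autoencoder."""

    def __init__(self, n_features: int = 29) -> None:
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 16), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(16, 32), nn.ReLU(),
            nn.Linear(32, 64), nn.ReLU(),
            nn.Linear(64, n_features),  # linear output for standardised inputs
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


@torch.no_grad()
def reconstruction_error(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Per-transaction mean squared reconstruction error (the anomaly score)."""
    model.eval()
    return ((model(x) - x) ** 2).mean(dim=1)


def train_on_legit(model, x_legit, epochs=5, lr=1e-3, batch_size=256):
    """Fit on legit rows only, so fraud rows reconstruct poorly later."""
    device = next(model.parameters()).device
    loader = DataLoader(TensorDataset(x_legit), batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    model.train()
    for _ in range(epochs):
        for (batch,) in loader:
            batch = batch.to(device)
            opt.zero_grad()
            loss = loss_fn(model(batch), batch)
            loss.backward()
            opt.step()
    return model


if __name__ == "__main__":
    device = pick_device()
    model = FraudAutoencoder().to(device)
    x_legit = torch.randn(2048, 29)  # stand-in for scaled legit transactions
    train_on_legit(model, x_legit, epochs=2)
    errors = reconstruction_error(model, x_legit[:8].to(device))
    print(device, errors.shape)
```

Training only on legit rows is what makes the error useful: the network learns to reproduce normal behaviour, so a large per-row error flags a transaction as unlike anything it was fit on.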
Dataset & Stack

Built With

Dataset

Source: Kaggle · ULB ML Group
Transactions: 284,807
Fraud cases: 492 (0.172%)
Features: 30 (Time, Amount, V1–V28)
V1–V28: PCA-anonymised
Split: 80/20 stratified
SMOTE after split: train only (no leakage)
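The split-then-resample order can be sketched as follows. The notebook uses imbalanced-learn's `SMOTE`. To keep this example dependency-light, it swaps in a minimal hand-written SMOTE-style oversampler that places each synthetic row at a random point between a minority sample and one of its k nearest minority neighbours. Treat `smote_like` and `leakage_safe_prep` as hypothetical helpers. What matters is the order: split first, fit `StandardScaler` on the training fold, and oversample only the training fold.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler


def smote_like(X, y, k=5, seed=0):
    """Oversample class 1 until it matches class 0 (SMOTE-style interpolation)."""
    rng = np.random.default_rng(seed)
    X_min = X[y == 1]
    n_new = int((y == 0).sum() - len(X_min))
    if n_new <= 0:
        return X, y
    k = min(k, len(X_min) - 1)
    # Neighbours among minority rows; column 0 is normally the row itself.
    nn_idx = NearestNeighbors(n_neighbors=k + 1).fit(X_min).kneighbors(X_min)[1][:, 1:]
    base = rng.integers(0, len(X_min), n_new)
    neigh = nn_idx[base, rng.integers(0, k, n_new)]
    gap = rng.random((n_new, 1))
    synthetic = X_min[base] + gap * (X_min[neigh] - X_min[base])
    return np.vstack([X, synthetic]), np.r_[y, np.ones(n_new, dtype=y.dtype)]


def leakage_safe_prep(X, y, test_size=0.2, seed=42):
    # 1) Stratified 80/20 split keeps the fraud rate equal in both folds.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    # 2) Scaler statistics come from the training fold only.
    scaler = StandardScaler().fit(X_tr)
    X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)
    # 3) Oversample the training fold only; the test fold stays untouched.
    X_tr, y_tr = smote_like(X_tr, y_tr, seed=seed)
    return X_tr, X_te, y_tr, y_te


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5_000, 4))
    y = (rng.random(5_000) < 0.02).astype(int)
    X_tr, X_te, y_tr, y_te = leakage_safe_prep(X, y)
    print("train:", np.bincount(y_tr), "test:", np.bincount(y_te))
```

The test fold keeps its natural fraud rate, so the metrics above are measured on the same class balance a deployed model would face.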

Tech Stack

Boosting: XGBoost, LightGBM
Classical ML: scikit-learn, Random Forest
Deep learning: PyTorch (Autoencoder)
Imbalance: imbalanced-learn (SMOTE)
Explainability: SHAP (TreeExplainer)
Visualization: Matplotlib, Seaborn
Notebook: Jupyter (ccfd.ipynb)
XGBoost · LightGBM · PyTorch · scikit-learn · SMOTE · SHAP · Random Forest · Logistic Regression · Pandas · NumPy · Matplotlib · Seaborn

Explore the Full Notebook

Complete EDA, SMOTE pipeline, model training, SHAP plots, and the autoencoder, all in one Jupyter notebook.
