⚠️ IMBALANCED BINARY CLASSIFICATION · 0.17% FRAUD

Credit Card Fraud Detection

284,807 real transactions. XGBoost + PyTorch Autoencoder ensemble with full SHAP explainability. 93% Precision · 82% Recall (tuned XGBoost) on a highly imbalanced dataset.


Pipeline

End-to-End Pipeline

From raw imbalanced data to an explainable ensemble model with threshold tuning.

📊 EDA (284K transactions) → 🔧 Feature Eng. (Hour from Time) → ⚖️ SMOTE (train data only) → 🏋️ Training (4 models + AE) → 🎯 Threshold (0.50 to 0.9837) → 🧠 Ensemble (XGB + AutoEnc) → 🔍 SHAP (explainability)
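The Feature Eng. step derives an hour feature from `Time`. The dataset description defines `Time` as seconds elapsed since the first transaction, so here is a minimal sketch, assuming the feature is `floor(Time / 3600) mod 24`. The notebook's exact formula is not shown on this page, and `time_to_hour` is a hypothetical helper name.

```python
import numpy as np


def time_to_hour(time_seconds):
    """Map seconds since the first transaction to an hour bucket in [0, 23].

    Assumed formula: Hour = floor(Time / 3600) mod 24. Because Time is relative
    to the first transaction, this is an offset hour, not a wall-clock hour.
    """
    t = np.asarray(time_seconds, dtype=np.float64)
    return (np.floor(t / 3600.0) % 24).astype(np.int64)


# With a DataFrame `df` loaded from the Kaggle CSV:
#   df["Hour"] = time_to_hour(df["Time"])
print(time_to_hour([0, 3599, 3600, 90000]))  # [0 0 1 1]
```

The mod-24 wrap folds the dataset's two days of transactions onto a single daily cycle, so both days contribute to the same hour buckets.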

XGBoost Classifier

n_estimators: 400
max_depth: 5
learning_rate: 0.2
subsample: 0.8
Optimal threshold: 0.9837
Tuning method: Random Search (40 trials)
ROC-AUC: 0.9857
F1 (tuned): 0.8696
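Random search draws a fixed number of configurations at random from a search space and keeps the best-scoring one. The sketch below shows the 40-trial loop. The search space is an assumption (the card only reports the winning values), and `random_search` and the toy objective are hypothetical stand-ins for the notebook's training-and-scoring code.

```python
import random

# Hypothetical search space: the card only reports the winning values
# (n_estimators=400, max_depth=5, learning_rate=0.2, subsample=0.8), so these
# candidate lists are assumptions that simply include them.
SEARCH_SPACE = {
    "n_estimators": [100, 200, 300, 400, 500],
    "max_depth": [3, 4, 5, 6, 7],
    "learning_rate": [0.01, 0.05, 0.1, 0.2, 0.3],
    "subsample": [0.6, 0.7, 0.8, 0.9, 1.0],
}


def random_search(space, objective, n_trials=40, seed=42):
    """Evaluate n_trials randomly sampled configurations; return the best one.

    `objective` maps a parameter dict to a validation score (higher is better).
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# A real objective would train and score a model, for example:
#   def objective(params):
#       model = XGBClassifier(**params)
#       model.fit(X_train_res, y_train_res)
#       return average_precision_score(y_val, model.predict_proba(X_val)[:, 1])
# A toy objective keeps this example runnable without data:
best, score = random_search(SEARCH_SPACE, lambda p: -abs(p["max_depth"] - 5))
print(best, score)
```

scikit-learn's `RandomizedSearchCV(estimator, param_distributions, n_iter=40)` packages the same idea with cross-validation. This page does not say which route the notebook took.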

PyTorch Autoencoder

Architecture: 29→64→32→16→32→64→29
Parameters: 9,453
Trained on: Legit transactions only
Legit recon. error: 0.082
Fraud recon. error: 14.769 (180× higher)
PR-AUC: 0.6330
Ensemble weight: 0.6 XGB + 0.4 AE
GPU support: CUDA / MPS / CPU
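The 9,453 parameter count can be sanity-checked from the layer widths. A plain stack of fully connected layers with these widths has 9,069 trainable parameters. Adding a `BatchNorm1d` after each of the four non-bottleneck hidden layers contributes 384 more and lands exactly on 9,453. That BatchNorm placement is an assumption inferred from the arithmetic, not something the card states.

```python
def linear_params(widths):
    """Weights + biases of a chain of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


def batchnorm_params(sizes):
    """BatchNorm1d learns a scale and a shift per feature: 2 * n parameters."""
    return sum(2 * n for n in sizes)


WIDTHS = [29, 64, 32, 16, 32, 64, 29]  # architecture from the card

dense = linear_params(WIDTHS)
# Assumption: BatchNorm after the 64- and 32-unit hidden layers, not the 16-unit bottleneck.
norm = batchnorm_params([64, 32, 32, 64])
print(dense, norm, dense + norm)  # 9069 384 9453
```

For a built PyTorch model, `sum(p.numel() for p in model.parameters())` returns the same kind of count directly. Activation layers such as ReLU add no parameters.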

Benchmarks

Model Comparison

All models evaluated on the same held-out test set (56,962 transactions). Metric: PR-AUC — not accuracy.

Model	ROC-AUC	PR-AUC	F1	Note
Logistic Regression	0.9705	0.7173	0.1036	
Random Forest	0.9692	0.8543	0.8333	
LightGBM	0.9417	0.7253	0.8137	
PyTorch Autoencoder	n/a	0.6330	0.7136	
XGBoost (base)	0.9835	0.8658	0.8195	
XGBoost (tuned)	0.9857	0.8614	0.8696	✓ Primary
Ensemble (XGB + AE)	n/a	0.8320	0.8743	✓ Best F1
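The ensemble combines two scores on different scales with the 0.6 XGB + 0.4 AE weights from the autoencoder card. XGBoost's fraud probability lies in [0, 1], while reconstruction error is unbounded, so the error has to be mapped into [0, 1] first. This minimal sketch assumes min-max scaling with bounds fitted on training data, clipped to [0, 1]. The bounds `lo`/`hi`, the function names, and the input numbers are made up for illustration, and the notebook's actual normalization is not described on this page.

```python
import numpy as np


def minmax_scale(errors, lo, hi):
    """Map reconstruction errors to [0, 1] using bounds fitted on training data."""
    scaled = (np.asarray(errors, dtype=np.float64) - lo) / (hi - lo)
    return np.clip(scaled, 0.0, 1.0)


def ensemble_score(p_xgb, ae_errors, lo, hi, w_xgb=0.6, w_ae=0.4):
    """Weighted blend of the XGBoost fraud probability and the scaled AE error."""
    p = np.asarray(p_xgb, dtype=np.float64)
    return w_xgb * p + w_ae * minmax_scale(ae_errors, lo, hi)


p_xgb = np.array([0.02, 0.99, 0.60])  # hypothetical XGBoost probabilities
ae_err = np.array([0.05, 20.0, 3.0])  # hypothetical reconstruction errors
print(ensemble_score(p_xgb, ae_err, lo=0.0, hi=15.0))
```

A decision threshold is then applied to the blended score, just as with the single model.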

⚠️ Why accuracy is misleading here

Predicting "Legit" for every transaction gives 99.83% accuracy but catches zero frauds. PR-AUC and F1 are the right metrics. The tuned XGBoost model catches 80 of the 98 fraud cases in the test set at 93% precision.
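The callout's numbers can be checked with plain arithmetic. The test set has 98 frauds among 56,962 transactions. The false-positive count of 6 below is inferred rather than stated: it is the only integer that gives 93% precision with 80 true positives, and it reproduces the tuned F1 of 0.8696 in the table.

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy


N_TEST, N_FRAUD = 56_962, 98
N_LEGIT = N_TEST - N_FRAUD

# Baseline: label everything "Legit", so every fraud is missed.
baseline_acc = N_LEGIT / N_TEST
print(f"all-legit accuracy {baseline_acc:.2%}, frauds caught 0/{N_FRAUD}")

# Tuned XGBoost: 80 of 98 frauds caught; fp = 6 is inferred from 93% precision.
tp, fn, fp = 80, N_FRAUD - 80, 6
tn = N_LEGIT - fp
p, r, f1, acc = metrics(tp, fp, fn, tn)
print(f"precision {p:.3f}  recall {r:.3f}  F1 {f1:.4f}")  # precision 0.930  recall 0.816  F1 0.8696
```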

Key Learnings

What We Learned

Critical design decisions that determined whether the model actually catches fraud.

📏

Accuracy ≠ Performance

Predicting every transaction as legit already scores 99.83% accuracy. At a 0.17% fraud rate, PR-AUC and F1 are the meaningful metrics, not accuracy.
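PR-AUC is commonly estimated as average precision, AP = Σₙ (Rₙ − Rₙ₋₁) Pₙ, summed over score thresholds. This is the definition scikit-learn's `average_precision_score` uses. Whether the notebook used that function or `auc` over `precision_recall_curve` is not stated. The dependency-free sketch below also shows why the metric resists the all-legit trick: a constant score earns an AP equal to the positive rate.

```python
def average_precision(y_true, scores):
    """AP = sum over thresholds of (R_n - R_{n-1}) * P_n.

    Tied scores are processed together as one threshold. Assumes at least one
    positive label.
    """
    pairs = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(y_true)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample whose score equals the current threshold.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        precision, recall = tp / (tp + fp), tp / total_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


print(average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # about 0.833
# A constant scorer on 0.2% positives: AP equals the positive rate.
print(average_precision([0] * 998 + [1] * 2, [0.5] * 1000))  # 0.002
```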

🚰

No Data Leakage

SMOTE and StandardScaler are fit on the training split only. Fitting them on the full dataset would leak test-set statistics into training and inflate the metrics.
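The ordering is what prevents leakage: split first, then fit the scaler and the oversampler on the training rows only. The notebook uses scikit-learn's StandardScaler and imbalanced-learn's SMOTE. This numpy sketch re-implements simplified versions only to make that order explicit. The function names are hypothetical, scaling every column is an assumption, and the SMOTE here is the basic interpolation variant.

```python
import numpy as np


def stratified_split(y, test_size=0.2, seed=42):
    """Index arrays for a split that keeps each class's share in both parts."""
    rng = np.random.default_rng(seed)
    train_parts, test_parts = [], []
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_test = int(round(len(idx) * test_size))
        test_parts.append(idx[:n_test])
        train_parts.append(idx[n_test:])
    return np.concatenate(train_parts), np.concatenate(test_parts)


def smote(X_min, n_new, k=5, seed=42):
    """Basic SMOTE: interpolate between minority samples and minority neighbours."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def prepare(X, y, seed=42):
    """Split first, then fit scaling and oversampling on the training part only."""
    train_idx, test_idx = stratified_split(y, 0.2, seed)
    X_tr, X_te = X[train_idx], X[test_idx]
    y_tr, y_te = y[train_idx], y[test_idx]

    # Scaler statistics come from training rows only (StandardScaler-style, ddof=0).
    mu, sigma = X_tr.mean(axis=0), X_tr.std(axis=0)
    sigma[sigma == 0] = 1.0
    X_tr, X_te = (X_tr - mu) / sigma, (X_te - mu) / sigma

    # Oversample fraud in the training split only, up to a 1:1 ratio;
    # the test split keeps its real class ratio.
    n_new = int((y_tr == 0).sum() - (y_tr == 1).sum())
    X_syn = smote(X_tr[y_tr == 1], n_new, seed=seed)
    X_tr = np.vstack([X_tr, X_syn])
    y_tr = np.concatenate([y_tr, np.ones(n_new, dtype=y_tr.dtype)])
    return X_tr, X_te, y_tr, y_te
```

With the libraries on this page, the same order is roughly `train_test_split(X, y, test_size=0.2, stratify=y)`, then `StandardScaler().fit(X_train)`, then `SMOTE().fit_resample(X_train_scaled, y_train)`. imbalanced-learn's default `sampling_strategy="auto"` likewise oversamples the minority class to match the majority.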

🎯

Threshold Tuning

Moving from the default 0.50 threshold to the tuned 0.9837 raised F1 from 0.79 to 0.87, roughly a 10% relative gain (about 8 points), with no retraining needed.
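Threshold tuning only needs held-out labels and the model's scores. Sort the scores once, accumulate true and false positives, and pick the cut-off with the highest F1. This is a minimal sketch; `best_f1_threshold` is a hypothetical helper and the toy scores are invented. Run on the tuned XGBoost probabilities, a sweep of this kind is how a cut-off such as 0.9837 would be chosen.

```python
import numpy as np


def best_f1_threshold(y_true, scores):
    """Return (threshold, F1) maximizing F1 when predicting fraud for score >= threshold."""
    y = np.asarray(y_true)
    s = np.asarray(scores, dtype=np.float64)
    n_pos = int((y == 1).sum())
    order = np.argsort(-s, kind="stable")  # highest scores first
    s, y = s[order], y[order]
    tp = np.cumsum(y == 1)  # true positives if we cut right after each position
    fp = np.cumsum(y == 0)
    # Keep only the last position of each run of tied scores.
    last = np.append(s[1:] != s[:-1], True)
    tp, fp, thr = tp[last], fp[last], s[last]
    fn = n_pos - tp
    f1 = 2 * tp / (2 * tp + fp + fn)
    i = int(np.argmax(f1))
    return float(thr[i]), float(f1[i])


y_val = [0, 0, 0, 1, 1]
p_val = [0.1, 0.6, 0.95, 0.97, 0.99]
print(best_f1_threshold(y_val, p_val))  # (0.97, 1.0); F1 at 0.50 would be 0.667
```

scikit-learn's `precision_recall_curve(y_true, scores)` returns the same precision/recall pairs per threshold if you prefer the library route.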

🔬

Autoencoder Gap

Fraud transactions have 180× higher reconstruction error than legit ones (14.77 vs 0.08) — a powerful anomaly signal.
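Reconstruction error is the per-transaction distance between an input and the autoencoder's output. This sketch assumes mean squared error over the input features and an illustrative anomaly cut-off at the 99th percentile of legit errors. Neither the error metric nor the cut-off rule is confirmed by this page, and both helper names are hypothetical.

```python
import numpy as np


def reconstruction_error(x, x_hat):
    """Per-transaction mean squared error across the input features."""
    x = np.asarray(x, dtype=np.float64)
    x_hat = np.asarray(x_hat, dtype=np.float64)
    return np.mean((x - x_hat) ** 2, axis=1)


def flag_anomalies(errors, legit_errors, percentile=99.0):
    """Flag transactions whose error exceeds a percentile of legit errors (assumed rule)."""
    cutoff = np.percentile(legit_errors, percentile)
    return np.asarray(errors) > cutoff


# In PyTorch the same error is ((x - model(x)) ** 2).mean(dim=1).
# The class means on the card give the quoted gap:
print(round(14.769 / 0.082))  # 180
```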

🧩

SHAP Explainability

V14, V17, and V12 are the strongest fraud indicators. SHAP makes the model interpretable for real-world deployment and auditing.
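A global feature ranking like "V14, V17, V12" is usually read off the mean absolute SHAP value per feature. In the notebook the SHAP matrix would come from something like `shap.TreeExplainer(model).shap_values(X_test)`. Here a tiny invented matrix stands in so the sketch runs without a model, and `rank_features` is a hypothetical helper.

```python
import numpy as np


def rank_features(shap_values, feature_names, top_k=3):
    """Rank features by mean absolute SHAP value (global importance)."""
    importance = np.abs(np.asarray(shap_values, dtype=np.float64)).mean(axis=0)
    order = np.argsort(importance)[::-1][:top_k]
    return [(feature_names[i], float(importance[i])) for i in order]


# Invented values: rows are transactions, columns are features.
names = ["V12", "V14", "V17", "Amount"]
toy = np.array([
    [-0.8, -2.0, 0.9, 0.10],
    [0.6, 1.5, -1.1, -0.20],
    [-0.7, -1.8, 1.0, 0.05],
])
for name, value in rank_features(toy, names):
    print(f"{name}: {value:.3f}")
```

Taking absolute values matters: a feature that pushes some predictions strongly toward fraud and others strongly away would otherwise average out to near zero. `shap.summary_plot(shap_values, X_test)` draws the same ranking as a beeswarm plot.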

⚡

GPU Auto-detection

Autoencoder training auto-detects CUDA → Apple MPS → CPU. Same notebook runs on Colab, MacBook, or any local machine.
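Device auto-detection is a short preference chain over PyTorch's availability checks. This sketch returns a device name. The `ImportError` fallback is an addition so the snippet also runs where PyTorch is missing, and `pick_device` is a hypothetical helper name.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", in that order of preference."""
    try:
        import torch
    except ImportError:  # lets the sketch run where PyTorch is not installed
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
# Typical use: device = torch.device(pick_device()); model.to(device)
```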

Dataset & Stack

Built With

Dataset

Source: Kaggle · ULB ML Group
Transactions: 284,807
Fraud cases: 492 (0.172%)
Features: 30 (Time, Amount, V1–V28)
V1–V28: PCA-anonymised
Split: 80/20 stratified
SMOTE after split: Train only (no leakage)

Tech Stack

Boosting: XGBoost, LightGBM
Classical ML: scikit-learn, Random Forest
Deep learning: PyTorch (Autoencoder)
Imbalance: imbalanced-learn (SMOTE)
Explainability: SHAP (TreeExplainer)
Visualization: Matplotlib, Seaborn
Notebook: Jupyter (ccfd.ipynb)

XGBoost · LightGBM · PyTorch · scikit-learn · SMOTE · SHAP · Random Forest · Logistic Regression · Pandas · NumPy · Matplotlib · Seaborn

Explore the Full Notebook

Complete EDA, SMOTE pipeline, model training, SHAP plots, and autoencoder — all in one Jupyter notebook.