🎸

Multi-Modal Instrument Recommender

A hybrid neural recommender system trained on the Amazon Musical Instruments dataset, fusing collaborative-filtering ID embeddings with multi-modal item attributes — product text (TF-IDF LSA), price, category, and brand.

PyTorch 2.x · CUDA GPU · BPR Loss · TF-IDF + LSA · Real Amazon Data · 99.2% Sparse · 2.8M Parameters · Adam + CosineAnnealingLR


Key Results

Evaluated on held-out test set (20% split) · Amazon Musical Instruments · 99.2% sparsity · 30-epoch Adam + BPR (CosineAnnealingLR)

Hit Rate @10: 0.0367 (3.3× random baseline)

NDCG @10: 0.0155 (↑ 2× vs previous run)

MRR @10: 0.0204 (↑ 2× vs previous run)

BPR Final Loss: 0.1147 (30-epoch convergence)

Users: 1,429 (9,022 interactions)

Item Embeddings: 84,901 (full catalog coverage)
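
The ranking metrics reported here follow the usual top-K definitions. The sketch below is a minimal illustration, assuming one held-out positive item per user (the page does not state the exact evaluation protocol). The names `ranked_items`, `target`, and `evaluate` are illustrative and not taken from the project.

```python
import math

def hit_rate_at_k(ranked_items, target, k=10):
    """1.0 if the held-out item appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k=10):
    """With a single relevant item the ideal DCG is 1, so NDCG = 1 / log2(rank + 1)."""
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item == target:
            return 1.0 / math.log2(rank + 1)
    return 0.0

def mrr_at_k(ranked_items, target, k=10):
    """Reciprocal rank of the held-out item, or 0.0 if it is outside the top k."""
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item == target:
            return 1.0 / rank
    return 0.0

def evaluate(rankings, targets, k=10):
    """Average each metric over users; rankings[i] is the ranked list for user i."""
    n = len(targets)
    pairs = list(zip(rankings, targets))
    return {
        "hr": sum(hit_rate_at_k(r, t, k) for r, t in pairs) / n,
        "ndcg": sum(ndcg_at_k(r, t, k) for r, t in pairs) / n,
        "mrr": sum(mrr_at_k(r, t, k) for r, t in pairs) / n,
    }

# Toy example: the first user's target is ranked 2nd, the second user's is missing.
scores = evaluate([["a", "b", "c"], ["x", "y", "z"]], ["b", "q"], k=10)
```

Because a miss contributes zero to all three metrics, NDCG@K and MRR@K can never exceed HR@K under this single-positive assumption, which matches the ordering of the reported values.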

Training Analysis

After diagnosing a plateau under SGD with MSE loss, training was switched to Adam + BPR + CosineAnnealingLR, which drives the BPR loss from 0.717 → 0.115 over 30 epochs

Charts: SGD MSE (plateau after epoch 5) · Adam + BPR loss (0.717 → 0.115 in 30 epochs) · Evaluation metrics @ K · SGD gradient plateau (% near-zero tensors)
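
For reference, the BPR objective and the cosine learning-rate schedule can be written in a few lines of plain Python. This is a sketch of the standard definitions, not the project's training code. The base learning rate of 1e-3 and `eta_min=0.0` are assumptions, since the page does not state them.

```python
import math

def bpr_loss(pos_scores, neg_scores):
    """Mean of -log(sigmoid(s_pos - s_neg)) over (positive, negative) score pairs."""
    losses = []
    for s_pos, s_neg in zip(pos_scores, neg_scores):
        diff = s_pos - s_neg
        # -log(sigmoid(diff)) equals log(1 + exp(-diff)); this form avoids overflow.
        losses.append(max(-diff, 0.0) + math.log1p(math.exp(-abs(diff))))
    return sum(losses) / len(losses)

def cosine_annealing_lr(epoch, base_lr, t_max, eta_min=0.0):
    """Closed-form cosine annealing: base_lr at epoch 0, eta_min at epoch t_max."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))

# Assumed values for illustration: base_lr=1e-3, 30 epochs.
schedule = [cosine_annealing_lr(e, base_lr=1e-3, t_max=30) for e in range(31)]
untrained = bpr_loss([0.0], [0.0])  # equals ln 2 when positive and negative scores tie
```

When positive and negative scores are indistinguishable the BPR loss equals ln 2 ≈ 0.693, which is close to the reported starting value of 0.717. The final value of 0.115 therefore corresponds to positives being scored clearly above sampled negatives on the training pairs.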

Ablation Study

Three variants trained with Adam + BPR to isolate the contribution of each signal. The ID-Only and Attr-Only variants ran for 20 epochs; the Full Hybrid row reports the 30-epoch run.

Variant	ID Embed	Attr Enc	HR@10	NDCG@10	Params
ID-Only	✅	❌	0.0100	0.0063	2,781,697
Attr-Only	❌	✅	0.0000	0.0000	28,353
Full Hybrid ★	✅	✅	0.0367	0.0155	2,799,105

Chart: Ablation NDCG@10 comparison

Key finding: Full Hybrid (30 epochs, CosineAnnealingLR) achieves HR@10 = 0.0367 and NDCG@10 = 0.0155, the best of the three variants. Attr-Only scores zero on both metrics, which suggests that the collaborative-filtering signal (ID embeddings) is essential on this dataset. The ablation variants ran for 20 epochs while the full model ran for 30, so the gap over ID-Only (HR@10 = 0.0100) reflects both longer training and the combined signals, and this comparison cannot separate the two. The item embedding table, which accounts for most of the model's 2.8M parameters, covers the full Amazon catalog (84,901 items), so items with no interactions can still be scored, with the attribute encoder presumably carrying most of the signal for them.
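
To make the three ablation variants concrete, here is one way such a hybrid scorer could be structured in PyTorch. It is a hypothetical sketch, not the author's architecture: the class name, the layer sizes (`emb_dim=32`, `attr_dim=16`), the concatenation fusion, the dot-product scoring, and the learning rate are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridRecommender(nn.Module):
    """Illustrative scorer: ID embeddings and/or encoded attributes, fused by concatenation."""

    def __init__(self, n_users, n_items, user_feat_dim, item_feat_dim,
                 emb_dim=32, attr_dim=16, use_ids=True, use_attrs=True):
        super().__init__()
        assert use_ids or use_attrs, "at least one signal must be enabled"
        self.use_ids, self.use_attrs = use_ids, use_attrs
        if use_ids:
            self.user_emb = nn.Embedding(n_users, emb_dim)
            self.item_emb = nn.Embedding(n_items, emb_dim)
        if use_attrs:
            self.user_enc = nn.Sequential(nn.Linear(user_feat_dim, attr_dim), nn.ReLU())
            self.item_enc = nn.Sequential(nn.Linear(item_feat_dim, attr_dim), nn.ReLU())

    def forward(self, users, items, user_feats, item_feats):
        u_parts, v_parts = [], []
        if self.use_ids:
            u_parts.append(self.user_emb(users))
            v_parts.append(self.item_emb(items))
        if self.use_attrs:
            u_parts.append(self.user_enc(user_feats))
            v_parts.append(self.item_enc(item_feats))
        u, v = torch.cat(u_parts, dim=-1), torch.cat(v_parts, dim=-1)
        return (u * v).sum(dim=-1)  # dot-product score per (user, item) pair

torch.manual_seed(0)
n_users, n_items = 100, 500                      # toy sizes, not the real dataset
user_feats, item_feats = torch.randn(n_users, 4), torch.randn(n_items, 73)
model = HybridRecommender(n_users, n_items, user_feat_dim=4, item_feat_dim=73)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)                 # lr is an assumption
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=30)

# One BPR step on a random batch; negatives are sampled uniformly for illustration.
users = torch.randint(0, n_users, (8,))
pos, neg = torch.randint(0, n_items, (8,)), torch.randint(0, n_items, (8,))
pos_s = model(users, pos, user_feats[users], item_feats[pos])
neg_s = model(users, neg, user_feats[users], item_feats[neg])
loss = -F.logsigmoid(pos_s - neg_s).mean()
opt.zero_grad()
loss.backward()
opt.step()
sched.step()
```

Setting `use_attrs=False` or `use_ids=False` gives structural analogues of the ID-Only and Attr-Only rows. The parameter counts of this sketch will not match the table, because the real layer sizes are not given on this page.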

Feature Engineering

73-dimensional item vectors combining structured metadata with text-derived semantic signals

📦 Item Features (73-dim)

Multi-modal product attributes extracted from metadata

price_norm · cat_Musical Instruments · cat_Accessories · brand_Fender · brand_Yamaha · brand_Gibson · ... (15 categories + 30 brands) · tfidf_lsa_0..31 (×32)

TF-IDF: 1,000-vocab bigrams on product titles → TruncatedSVD(32) — 27.6% variance explained
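
As an illustration of how such an item vector might be assembled, the sketch below concatenates a normalized price, one-hot category and brand indicators, and the LSA text components. The helper names and the min-max price normalization are assumptions, since the page does not say how `price_norm` is computed. The vocabularies are toy-sized rather than the 15 categories and 30 brands listed here.

```python
def one_hot(value, vocabulary):
    """Indicator vector over a fixed vocabulary; all zeros if the value is unknown."""
    return [1.0 if value == v else 0.0 for v in vocabulary]

def item_vector(item, categories, brands, price_min, price_max, text_lsa):
    """Concatenate price, category one-hot, brand one-hot, and text LSA components."""
    span = (price_max - price_min) or 1.0          # guard against a zero price range
    price_norm = (item["price"] - price_min) / span  # assumption: min-max scaling
    return (
        [price_norm]
        + one_hot(item["category"], categories)
        + one_hot(item["brand"], brands)
        + list(text_lsa)
    )

categories = ["Musical Instruments", "Accessories"]   # toy vocabulary
brands = ["Fender", "Yamaha", "Gibson"]               # toy vocabulary
item = {"price": 50.0, "category": "Accessories", "brand": "Fender"}
vec = item_vector(item, categories, brands, price_min=0.0, price_max=100.0,
                  text_lsa=[0.1, -0.2])
```

The vector length is 1 + len(categories) + len(brands) + len(text_lsa). Brands outside the kept vocabulary map to an all-zero brand block.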

👤 User Features (4-dim)

Interaction-derived behavioral signals

avg_rating_norm · log_num_reviews · recency (last_ts) · verified_purchase_ratio

log_num_reviews and last_ts are standardized with StandardScaler. Positive interactions are those with rating ≥ 4.
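
A minimal sketch of how these four user signals could be derived from raw interactions, using only the standard library. The record fields (`rating`, `timestamp`, `verified`), the division by 5 for `avg_rating_norm`, and the `log(1 + count)` form are assumptions. `standardize` mirrors what StandardScaler does for one column (zero mean, unit population variance).

```python
import math
from statistics import mean, pstdev

def user_features(interactions):
    """interactions: list of dicts with 'rating', 'timestamp', 'verified' for one user."""
    ratings = [r["rating"] for r in interactions]
    verified = [1.0 if r["verified"] else 0.0 for r in interactions]
    return {
        "avg_rating_norm": mean(ratings) / 5.0,             # assumption: scaled by 5 stars
        "log_num_reviews": math.log1p(len(interactions)),   # assumption: log(1 + count)
        "last_ts": max(r["timestamp"] for r in interactions),
        "verified_purchase_ratio": mean(verified),
    }

def standardize(values):
    """Column-wise z-score, as StandardScaler would apply across users."""
    mu, sigma = mean(values), pstdev(values)
    sigma = sigma or 1.0  # constant columns are left centered but unscaled
    return [(v - mu) / sigma for v in values]

def positives(interactions, threshold=4):
    """Keep interactions that count as positive feedback (rating >= threshold)."""
    return [r for r in interactions if r["rating"] >= threshold]

history = [
    {"rating": 5, "timestamp": 100, "verified": True},
    {"rating": 3, "timestamp": 200, "verified": False},
]
feats = user_features(history)
pos = positives(history)
```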

🔤 TF-IDF LSA Pipeline

① Tokenize: bigrams, stop words removed

② TF-IDF: 1,000 features, log(1+tf)

③ SVD: 32 components (LSA)

④ Normalize: z-score per dimension
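
The four steps above can be sketched with scikit-learn as a single pipeline. This is an illustration on a toy corpus, not the project's code. The titles are invented, `ngram_range=(1, 2)` assumes unigrams are kept alongside bigrams, and `n_components=3` stands in for the 32 components used on the real data. scikit-learn's `sublinear_tf=True` applies 1 + log(tf), which differs slightly from the log(1+tf) stated here.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Invented product titles, for illustration only.
titles = [
    "Fender electric guitar strings nickel",
    "Yamaha acoustic guitar with case",
    "Gibson electric guitar humbucker pickup",
    "Guitar strings acoustic phosphor bronze",
    "Microphone stand with boom arm",
    "Condenser microphone USB recording",
]

lsa = make_pipeline(
    # Steps 1-2: tokenize into uni/bigrams, drop stop words, cap the vocabulary at 1,000.
    TfidfVectorizer(ngram_range=(1, 2), stop_words="english",
                    max_features=1000, sublinear_tf=True),
    # Step 3: LSA via truncated SVD (32 components on the real data; 3 for this toy corpus).
    TruncatedSVD(n_components=3, random_state=0),
    # Step 4: z-score each LSA dimension.
    StandardScaler(),
)

text_feats = lsa.fit_transform(titles)
explained = lsa.named_steps["truncatedsvd"].explained_variance_ratio_.sum()
```

`explained` is the quantity reported as "variance explained" for the SVD step. On short product titles a low value such as the reported 27.6% is not unusual, since 32 components compress a sparse 1,000-term space.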

🏗️ Tech Stack

ML Framework: PyTorch 2.x · CUDA GPU

Data: pandas · numpy

Features: scikit-learn · TF-IDF + SVD

Optimizer: Adam + BPR · CosineAnnealingLR