🎸

Multi-Modal Instrument Recommender

A hybrid neural recommender system trained on the Amazon Musical Instruments dataset, fusing collaborative-filtering ID embeddings with multi-modal item attributes — product text (TF-IDF LSA), price, category, and brand.

PyTorch 2.x · CUDA GPU · BPR Loss · TF-IDF + LSA · Real Amazon Data · 99.2% Sparse · 2.8M Parameters · Adam + CosineAnnealingLR

Key Results

Evaluated on held-out test set (20% split) · Amazon Musical Instruments · 99.2% sparsity · 30-epoch Adam + BPR (CosineAnnealingLR)

Hit Rate @10: 0.0367 (3.3× random baseline)
NDCG @10: 0.0155 (↑ 2× vs previous run)
MRR @10: 0.0204 (↑ 2× vs previous run)
BPR Final Loss: 0.1147 (30-epoch convergence)
Users: 1,429 (9,022 interactions)
Item Embeddings: 84,901 (full catalog coverage)
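Sparsity is the fraction of empty cells in the user-item matrix, 1 − interactions / (users × items). The minimal sketch below applies it to the counts above. Which item count the 99.2% headline uses is not stated; with the full 84,901-item catalog as the denominator the same formula gives roughly 99.99%, so the headline figure presumably counts only items that appear in the interaction data (an assumption).

```python
def sparsity(n_interactions: int, n_users: int, n_items: int) -> float:
    """Fraction of user-item cells with no observed interaction."""
    return 1.0 - n_interactions / (n_users * n_items)

# Counts from Key Results, using the full catalog as the item denominator.
full_catalog = sparsity(9_022, 1_429, 84_901)
print(f"{full_catalog:.4%}")  # prints 99.9926%
```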

Model Architecture

Symmetrical Hybrid Engine — two mirrored towers each fusing ID embeddings with attribute MLPs

USER TOWER
  user_id_emb: Embedding(1,429 × 32)
  user_attr_enc: 4 → 64 → 32 (ReLU)
  u_fused = cat([id, attr]), dim = 64 (2E)
ITEM TOWER
  item_id_emb: Embedding(84,901 × 32)
  item_attr_enc: 73 → 64 → 32 (ReLU), inputs: price · cat · brand · TF-IDF(32)
  i_fused = cat([id, attr]), dim = 64 (2E)
INTERACTION MLP
  input: cat([u_fused, i_fused]), dim = 128 (4E)
  128 → 64 → 32 → 1 (BatchNorm + Dropout) → sigmoid → score
BPR LOSS
  −log σ(score⁺ − score⁻)
(E = 32, the per-branch embedding size)
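The two towers and the BPR objective can be sketched as a NumPy forward pass (NumPy rather than PyTorch so the example runs anywhere). This is an inference-only illustration with random weights, not the trained model. ReLU after both encoder layers, interaction-MLP hidden widths of 128, 64 and 32, and omitting BatchNorm and Dropout (Dropout is the identity at inference; eval-mode BatchNorm is a fixed affine map) are assumptions. `HybridScorer` and `bpr_loss` are hypothetical names.

```python
import numpy as np

def init_linear(rng, d_in, d_out):
    """Random (W, b) for a dense layer; illustrative init, not trained weights."""
    return rng.normal(0.0, 0.1, size=(d_in, d_out)), np.zeros(d_out)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class HybridScorer:
    """Two mirrored towers (ID embedding + attribute MLP) feeding an interaction MLP."""

    def __init__(self, n_users, n_items, user_attr_dim=4, item_attr_dim=73, emb=32, seed=0):
        rng = np.random.default_rng(seed)
        self.user_emb = rng.normal(0.0, 0.1, size=(n_users, emb))
        self.item_emb = rng.normal(0.0, 0.1, size=(n_items, emb))
        # Attribute encoders: attr_dim -> 64 -> E.
        self.user_enc = [init_linear(rng, user_attr_dim, 64), init_linear(rng, 64, emb)]
        self.item_enc = [init_linear(rng, item_attr_dim, 64), init_linear(rng, 64, emb)]
        # Interaction MLP on cat([u_fused, i_fused]) of width 4E.
        widths = [4 * emb, 128, 64, 32, 1]
        self.head = [init_linear(rng, a, b) for a, b in zip(widths[:-1], widths[1:])]

    @staticmethod
    def _encode(x, layers):
        for W, b in layers:
            x = relu(x @ W + b)
        return x

    def score(self, users, items, user_attrs, item_attrs):
        u = np.concatenate([self.user_emb[users], self._encode(user_attrs, self.user_enc)], axis=1)
        i = np.concatenate([self.item_emb[items], self._encode(item_attrs, self.item_enc)], axis=1)
        h = np.concatenate([u, i], axis=1)          # (batch, 4E)
        for W, b in self.head[:-1]:
            h = relu(h @ W + b)
        W, b = self.head[-1]
        return sigmoid(h @ W + b).ravel()           # one score in (0, 1) per pair

def bpr_loss(pos_scores, neg_scores):
    # -log sigmoid(x) == log(1 + exp(-x)), computed stably with logaddexp.
    return float(np.mean(np.logaddexp(0.0, -(pos_scores - neg_scores))))

# Toy usage: 3 users, each with one positive and one sampled negative item.
model = HybridScorer(n_users=10, n_items=50)
rng = np.random.default_rng(1)
users = np.array([0, 1, 2])
pos, neg = np.array([3, 4, 5]), np.array([10, 20, 30])
user_attrs = rng.normal(size=(3, 4))
s_pos = model.score(users, pos, user_attrs, rng.normal(size=(3, 73)))
s_neg = model.score(users, neg, user_attrs, rng.normal(size=(3, 73)))
print(s_pos.shape, bpr_loss(s_pos, s_neg))
```

Writing −log σ(x) as `logaddexp(0, -x)` avoids overflow in `exp` when the score margin is large and negative.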

Training Analysis

SGD plateau diagnosis → switch to Adam + BPR + CosineAnnealingLR, which drives the loss from 0.717 to 0.115 over 30 epochs
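CosineAnnealingLR follows the closed form documented for `torch.optim.lr_scheduler.CosineAnnealingLR`: η_t = η_min + (η_max − η_min)(1 + cos(π t / T_max)) / 2. The sketch below evaluates it once per epoch. The base learning rate of 1e-3, `eta_min = 0` and `T_max = 30` are assumptions, since the actual values are not given.

```python
import math

def cosine_annealing_lr(epoch: int, base_lr: float = 1e-3,
                        eta_min: float = 0.0, t_max: int = 30) -> float:
    """Learning rate after `epoch` scheduler steps, decaying from base_lr to eta_min."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2

schedule = [cosine_annealing_lr(e) for e in range(31)]
print(f"epoch 0: {schedule[0]:.2e}, epoch 15: {schedule[15]:.2e}, epoch 30: {schedule[30]:.2e}")
# prints: epoch 0: 1.00e-03, epoch 15: 5.00e-04, epoch 30: 0.00e+00
```

The schedule holds the learning rate near its peak early, halves it at mid-training, and anneals smoothly to `eta_min` by the final epoch.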

📉 SGD MSE — Plateau After Epoch 5

📊 Adam + BPR — 0.717 → 0.115 in 30 Epochs

🎯 Evaluation Metrics @ K
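The evaluation protocol (full-catalog ranking versus sampled negatives, exclusion of training items) is not described, so this is a sketch under assumptions: each user's candidate list is already ranked with training items removed, relevance is binary, and a user may have several held-out items from the 20% split. With several relevant items, NDCG is divided by the ideal DCG, which is how NDCG@10 can end up below MRR@10, as in the results above. `metrics_at_k` and `evaluate` are hypothetical helpers.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

def metrics_at_k(ranked: Sequence[int], relevant: Set[int], k: int = 10) -> Tuple[float, float, float]:
    """HR@k, NDCG@k and MRR@k for one user with binary relevance."""
    hit_ranks = [r for r, item in enumerate(ranked[:k], start=1) if item in relevant]
    hr = 1.0 if hit_ranks else 0.0
    mrr = 1.0 / hit_ranks[0] if hit_ranks else 0.0      # reciprocal rank of first hit
    dcg = sum(1.0 / math.log2(r + 1) for r in hit_ranks)
    ideal_hits = min(len(relevant), k)                  # best case: all relevant items on top
    idcg = sum(1.0 / math.log2(r + 1) for r in range(1, ideal_hits + 1))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return hr, ndcg, mrr

def evaluate(rankings: Dict[int, List[int]], test_items: Dict[int, Set[int]], k: int = 10) -> Dict[str, float]:
    """Average per-user metrics over users with at least one held-out item."""
    users = [u for u in rankings if test_items.get(u)]
    totals = [0.0, 0.0, 0.0]
    for u in users:
        for j, value in enumerate(metrics_at_k(rankings[u], test_items[u], k)):
            totals[j] += value
    n = max(len(users), 1)
    return {"HR": totals[0] / n, "NDCG": totals[1] / n, "MRR": totals[2] / n}

rankings = {0: [5, 3, 9, 1], 1: [2, 8, 7, 6]}
test_items = {0: {3}, 1: {4}}
res = evaluate(rankings, test_items, k=3)
print(res)
```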

⚡ SGD Gradient Plateau (% near-zero tensors)
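A sketch of the plateau statistic, assuming "near-zero" means a gradient tensor whose largest absolute entry falls below a tolerance. Both that criterion and the default `tol=1e-6` are hypothetical. In PyTorch, the inputs would be `p.grad.detach().cpu().numpy()` for each parameter whose gradient is not None.

```python
import numpy as np

def near_zero_fraction(grads, tol=1e-6):
    """Share of gradient tensors whose largest absolute entry is below tol."""
    flags = [float(np.max(np.abs(g))) < tol for g in grads if g is not None and g.size > 0]
    return sum(flags) / len(flags) if flags else 0.0

grads = [np.zeros((4, 4)), np.full(3, 1e-9), np.array([0.5, -0.2])]
print(near_zero_fraction(grads))
```

Tracking this fraction per epoch turns a flat loss curve into a direct check on whether the optimizer is still receiving usable gradient signal.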


Ablation Study

Three variants trained with Adam + BPR to isolate the contribution of each signal: ID-Only and Attr-Only for 20 epochs, Full Hybrid for 30 epochs with CosineAnnealingLR (the main run)

Variant        ID Embed  Attr Enc  HR@10   NDCG@10  Params
ID-Only        ✓         ✗         0.0100  0.0063   2,781,697
Attr-Only      ✗         ✓         0.0000  0.0000   28,353
Full Hybrid ★  ✓         ✓         0.0367  0.0155   2,799,105
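The Params column can be reproduced by counting layer by layer. This sketch assumes the interaction MLP has hidden widths 128, 64 and 32 with a BatchNorm after each hidden layer, and that ID-Only and Attr-Only feed a 2E = 64-wide input (one 32-dim branch per side) into the same MLP. Those layer shapes are inferred rather than stated, but the arithmetic reproduces all three counts exactly.

```python
def linear(d_in: int, d_out: int) -> int:
    return d_in * d_out + d_out          # weights + bias

def batchnorm(d: int) -> int:
    return 2 * d                          # learnable gamma and beta (running stats are buffers)

def encoder(d_in: int, hidden: int = 64, out: int = 32) -> int:
    return linear(d_in, hidden) + linear(hidden, out)

def interaction_mlp(d_in: int, widths=(128, 64, 32)) -> int:
    total, prev = 0, d_in
    for w in widths:
        total += linear(prev, w) + batchnorm(w)   # assumed: Linear -> BatchNorm -> ReLU -> Dropout
        prev = w
    return total + linear(prev, 1)                # final score unit

E, N_USERS, N_ITEMS = 32, 1_429, 84_901
id_tables = (N_USERS + N_ITEMS) * E               # user + item embedding tables
attr_encoders = encoder(4) + encoder(73)          # user (4-dim) and item (73-dim) encoders

params = {
    "ID-Only": id_tables + interaction_mlp(2 * E),
    "Attr-Only": attr_encoders + interaction_mlp(2 * E),
    "Full Hybrid": id_tables + attr_encoders + interaction_mlp(4 * E),
}
for name, n in params.items():
    print(f"{name:12s} {n:,}")
```

Most of the budget is the item ID table: 84,901 × 32 = 2,716,832 parameters.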

🔬 Ablation NDCG@10 Comparison

Key finding: Full Hybrid (30 epochs, CosineAnnealingLR) reaches HR@10 = 0.0367 and NDCG@10 = 0.0155, well ahead of both ablation variants. Attr-Only scores zero, indicating that the collaborative-filtering signal from ID embeddings is essential on this data. The ablation variants ran for only 20 epochs versus 30 for the full model, so the gap reflects both the combined signals and the longer training schedule. The item embedding table covers the full Amazon catalog (84,901 items × 32 dimensions ≈ 2.72M of the 2.8M parameters); for items with no interactions, it is the attribute encoder (price, category, brand, TF-IDF) that supplies a content signal for cold-start scoring.


Feature Engineering

73-dimensional item vectors combining structured metadata with text-derived semantic signals

📦 Item Features (73-dim)

Multi-modal product attributes extracted from metadata

price_norm · cat_Musical Instruments · cat_Accessories · brand_Fender · brand_Yamaha · brand_Gibson · ... (15 categories + 30 brands) · tfidf_lsa_0..31 (×32)

TF-IDF: 1,000-term bigram vocabulary over product titles → TruncatedSVD(32), explaining 27.6% of the variance
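Assembling one item vector amounts to concatenating the normalized price, one-hot category, one-hot brand, and the 32 LSA components. In this minimal sketch the segment order and the all-zero encoding for brands outside the kept vocabulary are assumptions, and the short category and brand lists are hypothetical stand-ins for the full vocabularies.

```python
import numpy as np

def item_vector(price_norm, category, brand, lsa, categories, brands):
    """Concatenate [price_norm | one-hot category | one-hot brand | LSA components]."""
    cat_onehot = np.zeros(len(categories))
    if category in categories:
        cat_onehot[categories.index(category)] = 1.0
    brand_onehot = np.zeros(len(brands))
    if brand in brands:
        brand_onehot[brands.index(brand)] = 1.0   # brands outside the vocabulary stay all-zero
    return np.concatenate([[price_norm], cat_onehot, brand_onehot, np.asarray(lsa, dtype=float)])

categories = ["Musical Instruments", "Accessories"]   # hypothetical subset
brands = ["Fender", "Yamaha", "Gibson"]               # hypothetical subset
v = item_vector(0.4, "Accessories", "Gibson", np.zeros(32), categories, brands)
unseen = item_vector(0.1, "Accessories", "Ibanez", np.zeros(32), categories, brands)
print(v.shape)
```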

👤 User Features (4-dim)

Interaction-derived behavioral signals

avg_rating_norm · log_num_reviews · recency (last_ts) · verified_purchase_ratio

log_num_reviews and last_ts are standardized with StandardScaler. Positive interactions are those with rating ≥ 4.
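The four user features can be derived from the interaction log as sketched below. Scaling avg_rating_norm as mean rating / 5 is an assumption, since the normalization is not specified. The z-scoring of log_num_reviews and last_ts mirrors StandardScaler, which uses the population standard deviation. `user_features` and `positive_mask` are hypothetical helpers.

```python
import numpy as np

def user_features(user_ids, ratings, timestamps, verified):
    """Per-user [avg_rating_norm, log_num_reviews, last_ts, verified_purchase_ratio]."""
    user_ids = np.asarray(user_ids)
    ratings = np.asarray(ratings, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)
    verified = np.asarray(verified, dtype=float)
    users = np.unique(user_ids)
    feats = np.zeros((len(users), 4))
    for row, u in enumerate(users):
        m = user_ids == u
        feats[row, 0] = ratings[m].mean() / 5.0     # assumed 1-5 rating scale
        feats[row, 1] = np.log1p(m.sum())           # log(1 + number of reviews)
        feats[row, 2] = timestamps[m].max()         # recency: most recent interaction
        feats[row, 3] = verified[m].mean()
    # Standardize columns 1 and 2 like StandardScaler (population std, ddof=0).
    for col in (1, 2):
        std = feats[:, col].std()
        feats[:, col] = (feats[:, col] - feats[:, col].mean()) / (std if std > 0 else 1.0)
    return users, feats

def positive_mask(ratings):
    """Interactions treated as positives for BPR: rating >= 4."""
    return np.asarray(ratings) >= 4

users, feats = user_features([7, 7, 9], [5, 3, 4], [100, 200, 150], [1, 0, 1])
print(users, feats.round(3))
```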

🔤 TF-IDF LSA Pipeline

Tokenize (bigrams, stop words removed) → TF-IDF (1,000 features, log(1+tf)) → SVD (32 components, LSA) → Normalize (z-score per dimension)
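The four pipeline stages can be sketched with NumPy alone. This is a minimal illustration, not the project's scikit-learn code. The tiny stop-word list, keeping unigrams alongside bigrams, selecting the vocabulary by corpus frequency, smoothed IDF, and L2 row normalization are assumptions. Note that scikit-learn's `TfidfVectorizer(sublinear_tf=True)` applies 1 + log(tf) rather than log(1 + tf); this sketch uses log(1 + tf) as stated above. The uncentered SVD and the explained-variance ratio follow the same definitions as scikit-learn's `TruncatedSVD`.

```python
import re
from collections import Counter

import numpy as np

STOP_WORDS = {"the", "a", "an", "and", "for", "with", "of", "to", "in"}  # tiny illustrative list

def tokenize(text):
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]   # unigrams + bigrams

def tfidf_lsa(docs, max_features=1000, n_components=32):
    tokens = [tokenize(d) for d in docs]
    corpus_counts = Counter(t for doc in tokens for t in doc)
    vocab = [t for t, _ in corpus_counts.most_common(max_features)]
    index = {t: j for j, t in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(tokens):
        for t, c in Counter(doc).items():
            if t in index:
                tf[i, index[t]] = c
    df = (tf > 0).sum(axis=0)
    idf = np.log((1 + len(docs)) / (1 + df)) + 1.0        # smoothed idf
    X = np.log1p(tf) * idf                                  # log(1 + tf) weighting
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(norms > 0, norms, 1.0)                 # L2-normalize each title
    U, S, _ = np.linalg.svd(X, full_matrices=False)         # uncentered SVD, as in LSA
    k = min(n_components, len(S))
    Z = U[:, :k] * S[:k]                                    # document coordinates X @ V_k
    explained = Z.var(axis=0).sum() / X.var(axis=0).sum()   # explained-variance ratio
    std = Z.std(axis=0)
    Z = (Z - Z.mean(axis=0)) / np.where(std > 0, std, 1.0)  # z-score per dimension
    return Z, explained

docs = ["Fender electric guitar strings", "Yamaha acoustic guitar",
        "Gibson electric guitar pickups", "guitar strings for acoustic guitar"]
Z, explained = tfidf_lsa(docs, max_features=50, n_components=2)
print(Z.shape, round(float(explained), 3))
```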

🏗️ Tech Stack

ML Framework: PyTorch 2.x · CUDA GPU

Data: pandas · numpy

Features: scikit-learn (TF-IDF + SVD)

Training: Adam optimizer · BPR loss · CosineAnnealingLR