Course Project · MLDS · IISc Bengaluru

PCA · Classification
Segmentation · Statistical Testing

End-to-end computer vision assignment — from PCA from scratch to deep multi-label classification and semantic segmentation on PASCAL VOC, validated with Wilcoxon signed-rank and Bootstrap CI.

📐 PCA from Scratch · 🏷️ 20-class Multi-label Clf · 🖼️ Semantic Segmentation · 📊 Wilcoxon + Bootstrap · IISc · MLDS · 2026

Results at a Glance

Key Numbers

~100
PCA components for 90% variance
20
VOC classes · Multi-label
0.7872
Model A mIoU (ResNet-50 + ASPP)
6.2e-34
Wilcoxon p-value (A vs B)

Project Structure

Four Parts

Part 1
PCA from Scratch
NumPy-only PCA via covariance matrix + eigendecomposition. Variance explained curve, reconstruction at k = 25 / 50 / 100 / 200.
Part 2A
Multi-Label Classification
ResNet-50 + custom head. 20 VOC classes, BCELoss, sigmoid outputs. Discriminative learning rates + cosine schedule.
Part 2B
Semantic Segmentation
Two models: ResNet-50 + ASPP + U-Net decoder (Model A, mIoU 0.7872) and ResNet-18 U-Net (Model B, mIoU 0.6848).
Part 3
Statistical Testing
Wilcoxon signed-rank test (p = 6.19e-34) + Bootstrap 95% CI [0.7642, 0.8068] for Model A's mIoU. Model A is significantly better than Model B on the hold-out split.

Part 1

PCA from Scratch

NumPy-only implementation — no sklearn.decomposition.PCA. Covariance matrix → eigendecomposition → top-k projection → reconstruction.
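The four steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function names `pca_fit`, `pca_project`, and `pca_reconstruct` are hypothetical, and the input is synthetic low-rank data standing in for flattened images.

```python
import numpy as np

def pca_fit(X, k):
    """Fit PCA on X of shape (n_samples, n_features); keep the top-k components."""
    mean = X.mean(axis=0)
    Xc = X - mean                               # centre each feature
    cov = np.cov(Xc, rowvar=False)              # (d, d) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh: symmetric input, ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return mean, eigvecs[:, :k], eigvals

def pca_project(X, mean, components):
    """Map samples to their k-dimensional PCA coordinates."""
    return (X - mean) @ components

def pca_reconstruct(Z, mean, components):
    """Map PCA coordinates back to the original feature space."""
    return Z @ components.T + mean

# Synthetic stand-in (assumption): rank-5 data embedded in 64 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 64))

mean, W, eigvals = pca_fit(X, k=5)
X_hat = pca_reconstruct(pca_project(X, mean, W), mean, W)
max_err = np.abs(X - X_hat).max()               # near zero: 5 components span rank-5 data
```

`np.linalg.eigh` is used rather than `np.linalg.eig` because the covariance matrix is symmetric, which gives real eigenvalues and orthonormal eigenvectors.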

📊
90% Variance at k ≈ 100
The scree plot shows a clear elbow near k = 100, confirming that natural images are highly compressible. Strong diminishing returns beyond that.
🖼️
Reconstruction Quality
k = 25 retains coarse shape/colour. k = 100–200 is visually near-identical to original. High-frequency textures and edges are lost first — expected from linear PCA.
📉
MSE vs Components
Reconstruction MSE drops rapidly from k = 1 to k = 100 then plateaus. The elbow matches the 90% explained variance threshold.
⚠️
PCA Limitation
PCA is linear — it cannot capture non-linear structure in image data. Textures, fine edges, and complex objects require far more components than smooth regions.
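The link between the explained-variance threshold and the MSE elbow can be checked numerically. The sketch below is an illustration on synthetic data with decaying variance (an assumption, not the VOC images); the names `k90` and `recon_mse` are hypothetical. With the covariance normalised by 1/n, the mean squared reconstruction error per sample equals the sum of the discarded eigenvalues, which is why both curves flatten at the same k.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in (assumption): 50 features whose variance decays with index.
scales = 1.0 / np.arange(1, 51)
X = rng.normal(size=(500, 50)) * scales

Xc = X - X.mean(axis=0)
n = Xc.shape[0]
cov = Xc.T @ Xc / n                              # 1/n normalisation
eigvals, eigvecs = np.linalg.eigh(cov)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order

# Cumulative explained-variance ratio and the smallest k reaching 90%.
cum_ratio = np.cumsum(eigvals) / eigvals.sum()
k90 = int(np.searchsorted(cum_ratio, 0.90) + 1)

def recon_mse(k):
    """Mean squared reconstruction error per sample using the top-k components."""
    W = eigvecs[:, :k]
    X_hat = Xc @ W @ W.T
    return np.mean(np.sum((Xc - X_hat) ** 2, axis=1))

# The error at k components matches the variance left in the discarded ones.
mse_10 = recon_mse(10)
discarded_10 = eigvals[10:].sum()
```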

Part 2A

Multi-Label Classification

Architecture — ResNet-50 + Custom Head
ClfModel
ResNet-50 (ImageNet pretrained, IMAGENET1K_V2)
→ GlobalAvgPool → [2048]
→ Dropout(0.5) → Linear(2048 → 512) → BatchNorm1d → ReLU
→ Dropout(0.3) → Linear(512 → 20)
→ Sigmoid()   ← per-class probabilities in [0,1]
BCELoss · backbone lr: 5e-5 · head lr: 1e-3 · Cosine Schedule · Discriminative LR · RandomHorizontalFlip · ColorJitter · 224×224 input

20 PASCAL VOC Classes

aeroplane · bicycle · bird · boat · bottle · bus · car · cat · chair · cow · diningtable · dog · horse · motorbike · person · pottedplant · sheep · sofa · train · tvmonitor

Part 2B

Semantic Segmentation

Two models trained on the same dataset, evaluated on the same 15% hold-out split (seed 42) for the statistical comparison in Part 3.

Model A · Primary
ResNet-50 + ASPP + U-Net Decoder · mIoU 0.7872
ResNet-50 encoder (ImageNet pretrained)
→ layer1[256] → layer2[512] → layer3[1024] → layer4[2048]
→ ASPP (rates = 1, 6, 12, 18 + GlobalAvgPool) → [256]
→ UpBlock×4 (bilinear upsample + skip connections)
→ Final Conv → [21 classes]
+ Aux head from layer3 (CE + Dice loss)
384×384 input · CE + Dice loss · Class-balanced sampler · Multi-scale TTA (384, 480) · Horizontal flip TTA · 4 forward passes · albumentations aug
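To make the "CE + Dice" objective concrete, here is a NumPy sketch of one way to combine the two terms for a single image. It illustrates the formula only: the equal weighting of the two terms, the smoothing constant `eps`, and the function name `ce_dice_loss` are assumptions, and the sketch assumes no void pixels (VOC masks mark void/border pixels with 255, which a real loss would need to mask out).

```python
import numpy as np

def softmax(logits, axis=0):
    """Numerically stable softmax along the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ce_dice_loss(logits, target, num_classes=21, eps=1e-6):
    """Cross-entropy plus soft Dice loss.

    logits: (C, H, W) float array; target: (H, W) int labels in [0, C).
    """
    probs = softmax(logits, axis=0)
    onehot = np.eye(num_classes)[target].transpose(2, 0, 1)   # (C, H, W)

    # Cross-entropy: negative log-probability of the true class, averaged over pixels.
    p_true = np.sum(probs * onehot, axis=0)
    ce = -np.mean(np.log(np.clip(p_true, eps, 1.0)))

    # Soft Dice per class, then averaged; eps keeps empty classes well defined.
    inter = np.sum(probs * onehot, axis=(1, 2))
    denom = np.sum(probs, axis=(1, 2)) + np.sum(onehot, axis=(1, 2))
    dice = (2.0 * inter + eps) / (denom + eps)

    return ce + (1.0 - dice.mean())          # equal weighting is an assumption

# Toy 8x8 mask: background on the left, class 15 on the right.
target = np.zeros((8, 8), dtype=int)
target[:, 4:] = 15

confident = np.zeros((21, 8, 8))
confident[0, :, :4] = 20.0                   # strongly predicts the right classes
confident[15, :, 4:] = 20.0
uniform = np.zeros((21, 8, 8))               # no preference between classes

loss_good = ce_dice_loss(confident, target)
loss_bad = ce_dice_loss(uniform, target)
```

Cross-entropy scores each pixel independently, while Dice scores region overlap per class, so the combination is often used when small classes would otherwise be dominated by background pixels.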
Model B · Comparison Baseline
ResNet-18 + Simple U-Net · mIoU 0.6848
ResNet-18 encoder (lighter backbone)
→ U-Net decoder (no ASPP, no auxiliary head)
→ Final Conv → [21 classes]
No ASPP · No aux head · Lighter model · segmenter_B.pth
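Both models are compared by mIoU. The page does not say exactly how the score is aggregated, so the following is only one plausible per-image definition: IoU is averaged over the classes present in either the prediction or the ground truth, and pixels labelled 255 (the VOC void label) are ignored. The function name `image_miou` is hypothetical.

```python
import numpy as np

def image_miou(pred, target, num_classes=21, ignore_index=255):
    """Mean IoU for one image over classes present in pred or target."""
    valid = target != ignore_index           # drop void/border pixels
    ious = []
    for c in range(num_classes):
        p = (pred == c) & valid
        t = (target == c) & valid
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue                         # class absent in both: not counted
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")

# Tiny worked example: one of four pixels is mislabelled.
target = np.array([[0, 0],
                   [1, 1]])
pred = np.array([[0, 1],
                 [1, 1]])
score = image_miou(pred, target)             # class 0: 1/2, class 1: 2/3
perfect = image_miou(target, target)
```

A per-image score like this yields one number per hold-out image for each model, which is the paired format a signed-rank test needs.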

Part 3

Statistical Testing

Why Wilcoxon, not t-test?

IoU scores are bounded to [0,1] and right-skewed. The Shapiro–Wilk test rejects normality of the paired differences (p = 2.895e-13), so the normality assumption of the paired t-test does not hold. The Wilcoxon signed-rank test is non-parametric: it works on the ranks of the paired differences and assumes only that they are roughly symmetric about their median, not that they are normal, which makes it less sensitive than the t-test to outliers in IoU data.
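The test sequence described above (normality check on the paired differences, then the signed-rank test) might look like this with SciPy. The scores here are synthetic and generated only for illustration; they are not the project's results, and the variable names are hypothetical.

```python
import numpy as np
from scipy import stats

# Synthetic paired scores (assumption): 330 images, model A about 0.10 better.
rng = np.random.default_rng(42)
n = 330
iou_b = rng.beta(5, 3, size=n)
iou_a = np.clip(iou_b + rng.normal(0.10, 0.05, size=n), 0.0, 1.0)

# Step 1: Shapiro-Wilk on the paired differences (H0: differences are normal).
diff = iou_a - iou_b
sw_stat, sw_p = stats.shapiro(diff)

# Step 2: two-sided Wilcoxon signed-rank test on the pairs
# (H0: the differences are symmetric about zero).
w_stat, w_p = stats.wilcoxon(iou_a, iou_b, alternative="two-sided")

reject_h0 = w_p < 0.05
```

For the two-sided alternative, `scipy.stats.wilcoxon` reports the smaller of the positive and negative rank sums as its statistic, so a small W alongside a tiny p-value indicates that almost all differences share one sign.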

Metric | Value | Interpretation
Paired samples (n) | 330 | 15% hold-out split, seed 42
Model A mean mIoU | 0.7872 | ResNet-50 + ASPP (primary)
Model B mean mIoU | 0.6848 | ResNet-18 U-Net (baseline)
Difference (A − B) | +0.1024 | Model A is better by ~10 pp
Shapiro–Wilk p | 2.895e-13 | Non-normal → Wilcoxon justified
Wilcoxon W | 6106.0 | Signed-rank statistic
Wilcoxon p-value | 6.1936e-34 | Reject H₀; models differ significantly
Bootstrap CI (95%) | [0.7642, 0.8068] | B = 1000, percentile method, Model A
Bootstrap SE | 0.0108 | Tight; mIoU estimate is stable
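The bootstrap rows in the table correspond to a percentile bootstrap with B = 1000 resamples. A minimal sketch of that procedure follows; the function name `bootstrap_ci`, the resampling seed, and the synthetic scores are assumptions made for illustration.

```python
import numpy as np

def bootstrap_ci(scores, n_boot=1000, alpha=0.05, seed=42):
    """Percentile bootstrap CI and standard error for the mean of `scores`."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    # Resample n scores with replacement, n_boot times, and take each mean.
    idx = rng.integers(0, n, size=(n_boot, n))
    means = scores[idx].mean(axis=1)
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi, means.std(ddof=1)         # SE = spread of the bootstrap means

# Synthetic per-image scores (assumption), same sample size as the hold-out split.
rng = np.random.default_rng(0)
scores = rng.beta(8, 2, size=330)
lo, hi, se = bootstrap_ci(scores)
```

The percentile method reads the interval directly off the 2.5th and 97.5th percentiles of the resampled means, so it needs no normality assumption about the scores themselves.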

Conclusion

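p = 6.19e-34 ≪ 0.05, so we reject H₀ (no systematic difference between the paired scores of the two models). Model A (ResNet-50 + ASPP) scores significantly higher than Model B (ResNet-18 U-Net) on the hold-out split, by about 10 percentage points of mIoU. The bootstrap 95% CI for Model A's mIoU, [0.7642, 0.8068], is narrow (SE 0.0108), which indicates the estimate is stable on this split.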

Run Locally

Getting Started

# 1. Clone and install
git clone https://github.com/rajneeshbabu/image-classification-and-segmentation.git
cd image-classification-and-segmentation
pip install -r requirements.txt

# 2. Training notebook (Parts 1, 2A, 2B): run top to bottom
jupyter notebook training_notebook.ipynb

# 3. Statistical tests (Part 3): requires weights from step 2
jupyter notebook statistical_tests.ipynb

Using the evaluator API

from classification.model import ClassificationModel
from segmentation.model import SegmentationModel
import numpy as np

# Classification: returns {class_name: probability}
clf = ClassificationModel(weights_dir="classification/weights")
probs = clf.predict(image)   # image: H×W×3 uint8 np.ndarray

# Segmentation: returns H×W uint8 label map (0–20)
seg = SegmentationModel(weights_dir="segmentation/weights")
mask = seg.predict(image)    # multi-scale + flip TTA applied automatically
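Since the classifier returns a `{class_name: probability}` dictionary, turning it into a set of predicted labels is a matter of thresholding. The helper below is a hypothetical convenience, not part of the repository, and the 0.5 cut-off is an assumption; the example probabilities are made up.

```python
def labels_from_probs(probs, threshold=0.5):
    """Return the class names whose probability meets the threshold, sorted."""
    return sorted(name for name, p in probs.items() if p >= threshold)

# Made-up output for illustration only.
example_probs = {"person": 0.93, "dog": 0.71, "sofa": 0.12}
predicted = labels_from_probs(example_probs)   # ['dog', 'person']
```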

Tech Stack

Libraries & Tools

Python 3.10+ · PyTorch 2.0+ · torchvision · NumPy · albumentations · scikit-learn · scipy · pandas · Pillow · scikit-image · matplotlib · tqdm