Course Project · MLDS · IISc Bengaluru

PCA · Classification
Segmentation · Statistical Testing

End-to-end computer vision assignment — from PCA from scratch to deep multi-label classification and semantic segmentation on PASCAL VOC, validated with Wilcoxon signed-rank and Bootstrap CI.

📐 PCA from Scratch · 🏷️ 20-class Multi-label Classification · 🖼️ Semantic Segmentation · 📊 Wilcoxon + Bootstrap · IISc · MLDS · 2026


Results at a Glance

Key Numbers

~100

PCA components for 90% variance

20

VOC classes · Multi-label

0.7872

Model A mIoU (ResNet-50 + ASPP)

6.2e-34

Wilcoxon p-value (A vs B)

Project Structure

Four Parts

Part 1

PCA from Scratch

NumPy-only PCA via covariance matrix + eigendecomposition. Variance explained curve, reconstruction at k = 25 / 50 / 100 / 200.

Part 2A

Multi-Label Classification

ResNet-50 + custom head. 20 VOC classes, BCELoss, sigmoid outputs. Discriminative learning rates + cosine schedule.

Part 2B

Semantic Segmentation

Two models: ResNet-50 + ASPP + U-Net decoder (Model A, mIoU 0.7872) and ResNet-18 U-Net (Model B, mIoU 0.6848).

Part 3

Statistical Testing

Wilcoxon signed-rank test on 330 paired hold-out samples (p = 6.19e-34), plus a bootstrap 95% CI of [0.7642, 0.8068] for Model A's mIoU. Model A is significantly better than Model B.

Part 1

PCA from Scratch

NumPy-only implementation — no sklearn.decomposition.PCA. Covariance matrix → eigendecomposition → top-k projection → reconstruction.
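The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the project's code: the function names `pca_fit`, `pca_project` and `pca_reconstruct` are hypothetical, and random data stands in for flattened images.

```python
import numpy as np

def pca_fit(X):
    """Return the mean, eigenvalues (descending) and eigenvectors (columns) of X's covariance."""
    mean = X.mean(axis=0)
    Xc = X - mean                           # centre each feature
    cov = (Xc.T @ Xc) / (X.shape[0] - 1)    # d x d sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to descending
    return mean, eigvals[order], eigvecs[:, order]

def pca_project(X, mean, eigvecs, k):
    """Project centred data onto the top-k principal directions."""
    return (X - mean) @ eigvecs[:, :k]

def pca_reconstruct(Z, mean, eigvecs, k):
    """Map k-dimensional codes back to the original feature space."""
    return Z @ eigvecs[:, :k].T + mean

# Hypothetical stand-in for flattened images: 200 samples, 64 features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))
mean, eigvals, eigvecs = pca_fit(X)
Z = pca_project(X, mean, eigvecs, k=16)
X_hat = pca_reconstruct(Z, mean, eigvecs, k=16)
print(Z.shape, X_hat.shape)
```

`np.linalg.eigh` is used instead of `np.linalg.eig` because the covariance matrix is symmetric, which keeps the eigenvalues real and the eigenvectors orthonormal.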

📊

90% Variance at k ≈ 100

The scree plot shows a clear elbow near k = 100, confirming that natural images are highly compressible. Beyond that point, each additional component adds little explained variance.
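One way to read a threshold such as 90% off the eigenvalue spectrum is to take the cumulative explained-variance ratio and find the first k that reaches it. The sketch below uses a toy spectrum, not the project's eigenvalues, and the helper name `components_for_variance` is hypothetical.

```python
import numpy as np

def components_for_variance(eigvals, threshold=0.90):
    """Smallest k whose cumulative explained-variance ratio reaches `threshold`."""
    ratios = eigvals / eigvals.sum()
    cumulative = np.cumsum(ratios)
    # searchsorted returns the first index where cumulative >= threshold
    return int(np.searchsorted(cumulative, threshold) + 1)

# Toy spectrum sorted in descending order (not the project's eigenvalues)
eigvals = np.array([5.0, 2.0, 1.5, 1.0, 0.5])
k = components_for_variance(eigvals, threshold=0.90)
print(k)
```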

🖼️

Reconstruction Quality

k = 25 retains coarse shape and colour. k = 100–200 is visually near-identical to the original. High-frequency textures and edges are lost first, as expected from linear PCA.

📉

MSE vs Components

Reconstruction MSE drops rapidly from k = 1 to k = 100 then plateaus. The elbow matches the 90% explained variance threshold.
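The link between the MSE curve and the explained-variance curve is exact for linear PCA: the squared error left after keeping k components equals the sum of the discarded eigenvalues, up to a normalisation constant. The sketch below checks this on synthetic data with a decaying spectrum; the data and the helper `reconstruction_mse` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data with a decaying spectrum (a stand-in for flattened images)
n, d = 500, 32
X = rng.normal(size=(n, d)) * np.linspace(3.0, 0.1, d)

mean = X.mean(axis=0)
Xc = X - mean
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc / (n - 1))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order

def reconstruction_mse(k):
    """Mean squared error per element after keeping the top-k components."""
    V = eigvecs[:, :k]
    X_hat = Xc @ V @ V.T + mean
    return float(np.mean((X - X_hat) ** 2))

for k in (1, 4, 8, 16, 32):
    # Closed form: discarded eigenvalues, rescaled from the (n - 1) covariance normalisation
    predicted = eigvals[k:].sum() * (n - 1) / (n * d)
    print(k, round(reconstruction_mse(k), 4), round(predicted, 4))
```

The two printed columns agree at every k, so a plateau in one curve is a plateau in the other.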

⚠️

PCA Limitation

PCA is linear — it cannot capture non-linear structure in image data. Textures, fine edges, and complex objects require far more components than smooth regions.

Part 2A

Multi-Label Classification

Architecture — ResNet-50 + Custom Head

ClfModel

ResNet-50 (ImageNet pretrained, IMAGENET1K_V2) → GlobalAvgPool → [2048] → Dropout(0.5) → Linear(2048 → 512) → BatchNorm1d → ReLU → Dropout(0.3) → Linear(512 → 20) → Sigmoid() ← per-class probabilities in [0,1]

BCELoss · backbone lr: 5e-5 · head lr: 1e-3 · Cosine Schedule · Discriminative LR · RandomHorizontalFlip · ColorJitter · 224×224 input
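The two learning rates and the cosine schedule combine as follows: each parameter group starts from its own base rate and both decay along the same cosine curve. The sketch below computes that curve directly. The epoch count and the minimum learning rate of 0 are assumptions, since neither is stated here.

```python
import math

def cosine_lr(base_lr, epoch, total_epochs, min_lr=0.0):
    """Cosine annealing from base_lr (epoch 0) down to min_lr (epoch == total_epochs)."""
    cos_term = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return min_lr + (base_lr - min_lr) * cos_term

BASE_LRS = {"backbone": 5e-5, "head": 1e-3}   # values from the tag row above
TOTAL_EPOCHS = 30                             # assumption: not stated in the write-up

for epoch in (0, 15, 30):
    lrs = {name: cosine_lr(lr, epoch, TOTAL_EPOCHS) for name, lr in BASE_LRS.items()}
    print(epoch, lrs)
```

In PyTorch the same effect is usually obtained with two optimizer parameter groups and `torch.optim.lr_scheduler.CosineAnnealingLR`. The head-to-backbone ratio of 20 stays constant throughout, so the pretrained backbone always moves more slowly than the freshly initialised head.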

20 PASCAL VOC Classes

aeroplane · bicycle · bird · boat · bottle · bus · car · cat · chair · cow · diningtable · dog · horse · motorbike · person · pottedplant · sheep · sofa · train · tvmonitor
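In a multi-label setup each image gets a 20-dimensional 0/1 target, and binary cross-entropy treats every class as an independent yes/no decision on the sigmoid output. The NumPy sketch below illustrates the encoding and the loss. The helper names and the logits are made up for the example and are not taken from the project code.

```python
import numpy as np

VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

def multi_hot(labels):
    """Encode a list of class names as a 20-dimensional 0/1 target vector."""
    target = np.zeros(len(VOC_CLASSES), dtype=np.float32)
    for name in labels:
        target[VOC_CLASSES.index(name)] = 1.0
    return target

def sigmoid(logits):
    return 1.0 / (1.0 + np.exp(-logits))

def bce(probs, target, eps=1e-7):
    """Binary cross-entropy averaged over classes."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(probs) + (1.0 - target) * np.log(1.0 - probs)))

target = multi_hot(["person", "dog"])         # one image can carry several labels
logits = np.where(target == 1.0, 4.0, -4.0)   # made-up logits that agree with the target
print(bce(sigmoid(logits), target))
```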

Part 2B

Semantic Segmentation

Two models trained on the same dataset, evaluated on the same 15% hold-out split (seed 42) for the statistical comparison in Part 3.
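A fixed seed is what makes the two models comparable sample by sample: both see exactly the same hold-out images. The sketch below shows one way to build such a split with NumPy. The helper `holdout_split` and the dataset size are hypothetical, and the project may use a different mechanism.

```python
import numpy as np

def holdout_split(n_samples, holdout_frac=0.15, seed=42):
    """Shuffle indices once with a fixed seed and carve off the hold-out fraction."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_holdout = int(round(n_samples * holdout_frac))
    return indices[n_holdout:], indices[:n_holdout]   # (train, hold-out)

train_idx, holdout_idx = holdout_split(n_samples=1000)   # 1000 is illustrative
print(len(train_idx), len(holdout_idx))
```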

Model A · Primary

ResNet-50 + ASPP + U-Net Decoder · mIoU 0.7872

ResNet-50 encoder (ImageNet pretrained) → layer1[256] → layer2[512] → layer3[1024] → layer4[2048] → ASPP (rates = 1, 6, 12, 18 + GlobalAvgPool) → [256] → UpBlock×4 (bilinear upsample + skip connections) → Final Conv → [21 classes] + Aux head from layer3 (CE + Dice loss)

384×384 input · CE + Dice loss · Class-balanced sampler · Multi-scale TTA (384, 480) · Horizontal flip TTA · 4 forward passes · albumentations aug
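The four forward passes come from two scales times two orientations (original and horizontally flipped). The NumPy sketch below shows how such predictions could be averaged. `dummy_model` is a stand-in for the network, the nearest-neighbour resize replaces the bilinear interpolation a real pipeline would likely use, and all function names are hypothetical.

```python
import numpy as np

def resize_nearest(arr, size):
    """Nearest-neighbour resize of the last two axes to (size, size)."""
    h, w = arr.shape[-2:]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return arr[..., rows[:, None], cols[None, :]]

def predict_tta(model_fn, image, scales=(384, 480), out_size=384):
    """Average class probabilities over scales x {original, horizontal flip}."""
    accumulated, passes = 0.0, 0
    for scale in scales:
        scaled = resize_nearest(image, scale)
        for flip in (False, True):
            inp = scaled[..., ::-1] if flip else scaled
            probs = model_fn(inp)                  # (C, scale, scale)
            if flip:
                probs = probs[..., ::-1]           # undo the flip before averaging
            accumulated = accumulated + resize_nearest(probs, out_size)
            passes += 1
    return (accumulated / passes).argmax(axis=0).astype(np.uint8), passes

def dummy_model(image):
    """Stand-in for the network: class 1 wherever the first channel is bright, else class 0."""
    _, h, w = image.shape
    probs = np.zeros((21, h, w), dtype=np.float32)
    bright = image[0] > 0.5
    probs[1][bright] = 1.0
    probs[0][~bright] = 1.0
    return probs

image = np.zeros((3, 384, 384), dtype=np.float32)
image[0, :, :192] = 1.0                            # left half is "bright"
mask, passes = predict_tta(dummy_model, image)
print(mask.shape, passes)
```

Undoing the flip before averaging is the step that is easiest to forget: without it the flipped predictions would be added to the wrong pixels.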

Model B · Comparison Baseline

ResNet-18 + Simple U-Net · mIoU 0.6848

ResNet-18 encoder (lighter backbone) → U-Net decoder (no ASPP, no auxiliary head) → Final Conv → [21 classes]

No ASPP · No aux head · Lighter model · segmenter_B.pth
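Both models are scored by mIoU. The sketch below shows one common per-image definition, averaging IoU over the classes that appear in either the prediction or the ground truth. Whether the project averages per image or over a dataset-wide confusion matrix, and how it treats ignored pixels, is not stated here, so treat those choices as assumptions.

```python
import numpy as np

def per_image_miou(pred, target, num_classes=21):
    """Mean IoU over the classes present in either the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        pred_c, target_c = pred == c, target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue                      # class absent from both maps: skip it
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))

# Toy 4x4 label maps: the prediction gets one column of class 1 wrong
target = np.zeros((4, 4), dtype=np.uint8)
target[:, 2:] = 1
pred = target.copy()
pred[:, 2] = 0
print(per_image_miou(pred, target))
```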

Part 3

Statistical Testing

Why Wilcoxon, not t-test?

IoU scores are bounded to [0,1] and right-skewed. Shapiro–Wilk rejects normality of the paired differences (p = 2.895e-13), so the t-test's normality assumption does not hold. The Wilcoxon signed-rank test is non-parametric: it works on the ranks of the paired differences and assumes only that they are roughly symmetric about their median, which makes it robust to non-normality and outliers in IoU data.
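The check-then-test procedure described above can be sketched with `scipy.stats.shapiro` and `scipy.stats.wilcoxon`. The scores below are synthetic and only mimic the shape of the problem (330 paired values in [0, 1]); they are not the project's data, so the printed numbers will not match the reported ones.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic per-sample IoU scores (NOT the project's data): 330 paired samples
iou_b = rng.beta(a=5, b=3, size=330)
iou_a = np.clip(iou_b + rng.exponential(scale=0.1, size=330) - 0.02, 0.0, 1.0)

diff = iou_a - iou_b
shapiro_stat, shapiro_p = stats.shapiro(diff)   # H0: differences are normally distributed
wilcoxon_res = stats.wilcoxon(iou_a, iou_b)     # H0: paired differences are centred on zero

print(f"Shapiro-Wilk p = {shapiro_p:.3e}")
print(f"Wilcoxon W = {wilcoxon_res.statistic:.1f}, p = {wilcoxon_res.pvalue:.3e}")
```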

Metric	Value	Interpretation
Paired samples (n)	330	15% hold-out split, seed 42
Model A mean mIoU	0.7872	ResNet-50 + ASPP (primary)
Model B mean mIoU	0.6848	ResNet-18 U-Net (baseline)
Difference (A − B)	+0.1024	Model A is better by ~10pp
Shapiro–Wilk p	2.895e-13	Non-normal → Wilcoxon justified
Wilcoxon W	6106.0	Signed-rank statistic
Wilcoxon p-value	6.1936e-34	Reject H₀ — models differ significantly
Bootstrap CI (95%)	[0.7642, 0.8068]	B = 1000, percentile method, Model A
Bootstrap SE	0.0108	Tight — mIoU estimate is stable
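The percentile bootstrap in the table resamples the hold-out scores with replacement B = 1000 times and reads the interval off the 2.5th and 97.5th percentiles of the resampled means. A minimal NumPy sketch follows. The scores are synthetic, and the helper name `bootstrap_ci` and the bootstrap seed are assumptions.

```python
import numpy as np

def bootstrap_ci(scores, n_boot=1000, alpha=0.05, seed=42):
    """Percentile bootstrap CI and standard error for the mean of `scores`."""
    rng = np.random.default_rng(seed)
    n = len(scores)
    # Resample indices with replacement; one row per bootstrap replicate
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = scores[idx].mean(axis=1)
    lo, hi = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi), float(boot_means.std(ddof=1))

# Synthetic per-sample scores (not the project's data)
scores = np.random.default_rng(0).beta(a=6, b=2, size=330)
lo, hi, se = bootstrap_ci(scores)
print(f"95% CI = [{lo:.4f}, {hi:.4f}], SE = {se:.4f}")
```

The bootstrap SE is simply the standard deviation of the resampled means, which is why a narrow interval and a small SE go together in the table.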

Conclusion

p = 6.19e-34 ≪ 0.05, so we reject H₀: Model A (ResNet-50 + ASPP) scores significantly higher than Model B (ResNet-18 U-Net) on the hold-out split. The bootstrap 95% CI for Model A's mIoU, [0.7642, 0.8068], is narrow and lies entirely above Model B's mean of 0.6848, which indicates the estimate is stable.

Run Locally

Getting Started

# 1. Clone and install
git clone https://github.com/rajneeshbabu/image-classification-and-segmentation.git
cd image-classification-and-segmentation
pip install -r requirements.txt

# 2. Training notebook (Parts 1, 2A, 2B) — run top to bottom
jupyter notebook training_notebook.ipynb

# 3. Statistical tests (Part 3) — requires weights from step 2
jupyter notebook statistical_tests.ipynb
    

Using the evaluator API

from classification.model import ClassificationModel
from segmentation.model   import SegmentationModel
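import numpy as np

# Placeholder input so the snippet runs as written; swap in a real RGB image
image = np.zeros((224, 224, 3), dtype=np.uint8)
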
# Classification — returns {class_name: probability}
clf   = ClassificationModel(weights_dir="classification/weights")
probs = clf.predict(image)   # image: H×W×3 uint8 np.ndarray

# Segmentation — returns H×W uint8 label map (0–20)
seg   = SegmentationModel(weights_dir="segmentation/weights")
mask  = seg.predict(image)   # multi-scale + flip TTA applied automatically
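
The classifier returns one probability per class, so turning it into a label set needs a decision threshold. The sketch below uses 0.5, which is an assumption rather than a value taken from the project. The helper `predicted_labels` is hypothetical and the input dictionary is made up in the documented `{class_name: probability}` shape.

```python
def predicted_labels(probs, threshold=0.5):
    """Class names whose probability clears the threshold, highest first."""
    kept = [(name, p) for name, p in probs.items() if p >= threshold]
    return [name for name, _ in sorted(kept, key=lambda item: item[1], reverse=True)]

# Made-up output in the documented {class_name: probability} shape
probs = {"person": 0.97, "dog": 0.81, "sofa": 0.12, "cat": 0.05}
print(predicted_labels(probs))
```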
    
