End-to-end computer vision assignment: a from-scratch PCA implementation, deep multi-label classification, and semantic segmentation on PASCAL VOC, with the model comparison validated by a Wilcoxon signed-rank test and a bootstrap confidence interval.
Results at a Glance
Project Structure
Part 1
NumPy-only implementation — no sklearn.decomposition.PCA. Covariance matrix → eigendecomposition → top-k projection → reconstruction.
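
The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the assignment's actual code: the function names (`pca_fit`, `pca_project`, `pca_reconstruct`) and the random input matrix are assumptions made for the example.

```python
import numpy as np

def pca_fit(X, k):
    """Fit PCA on X of shape (n_samples, n_features); keep the top-k directions."""
    mean = X.mean(axis=0)
    Xc = X - mean                              # centre each feature
    cov = Xc.T @ Xc / (X.shape[0] - 1)         # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues
    return mean, eigvecs[:, order], eigvals[order]

def pca_project(X, mean, components):
    """Map samples to the k-dimensional principal subspace."""
    return (X - mean) @ components

def pca_reconstruct(Z, mean, components):
    """Map projected samples back to the original feature space."""
    return Z @ components.T + mean

# Hypothetical data: 200 samples with 5 correlated features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

mean, W, var = pca_fit(X, k=2)
Z = pca_project(X, mean, W)
X_hat = pca_reconstruct(Z, mean, W)
print("reconstruction MSE with k=2:", np.mean((X - X_hat) ** 2))
```

`np.linalg.eigh` is used instead of `np.linalg.eig` because the covariance matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.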
Part 2A
20 PASCAL VOC Classes
Part 2B
Two models trained on the same dataset, evaluated on the same 15% hold-out split (seed 42) for the statistical comparison in Part 3.
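
A seeded hold-out split of this kind could look like the sketch below. The helper name `holdout_split` and the dataset size of 2200 are hypothetical (2200 is chosen only because 15% of it gives 330 samples); the project's own splitting code may differ.

```python
import numpy as np

def holdout_split(n_samples, holdout_frac=0.15, seed=42):
    """Return (train_idx, holdout_idx) from a seeded random permutation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_holdout = int(round(n_samples * holdout_frac))
    return perm[n_holdout:], perm[:n_holdout]

# Hypothetical dataset size, for illustration only.
train_idx, holdout_idx = holdout_split(2200)
print(len(train_idx), len(holdout_idx))
```

Fixing the seed means both models are scored on exactly the same images, which is what makes a paired test valid.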
Part 3
Why Wilcoxon, not t-test?
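IoU scores are bounded to [0, 1] and right-skewed. Shapiro–Wilk rejects normality of the paired differences (p = 2.895e-13), so the normality assumption of the paired t-test does not hold. The Wilcoxon signed-rank test is non-parametric: it works on the signed ranks of the paired differences rather than their raw values, and assumes only that the differences can be meaningfully ranked and are roughly symmetric about their median. That makes it far less sensitive to outliers and heavy tails in IoU data.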
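
The normality check and the paired test described above can be run with `scipy.stats`. The sketch below uses synthetic scores, so its numbers will not match the project's results; the arrays `iou_a` and `iou_b` are stand-ins for the two models' paired scores on the hold-out split.

```python
import numpy as np
from scipy import stats

# Synthetic paired scores for illustration only (not the project's data).
rng = np.random.default_rng(42)
n = 330
iou_a = rng.beta(8, 2, size=n)                          # stand-in for Model A
noise = rng.exponential(0.15, size=n) - 0.05            # skewed, mean 0.10
iou_b = np.clip(iou_a - noise, 0.0, 1.0)                # stand-in for Model B

diff = iou_a - iou_b

# Step 1: test the paired differences for normality.
sw = stats.shapiro(diff)
print(f"Shapiro-Wilk p = {sw.pvalue:.3e}")

# Step 2: if normality is rejected, use the paired non-parametric test.
wx = stats.wilcoxon(iou_a, iou_b)                       # two-sided by default
print(f"Wilcoxon W = {wx.statistic:.1f}, p = {wx.pvalue:.3e}")
```

With the default two-sided alternative, `wilcoxon` reports the smaller of the positive and negative rank sums as its statistic.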
| Metric | Value | Interpretation |
|---|---|---|
| Paired samples (n) | 330 | 15% hold-out split, seed 42 |
| Model A mean mIoU | 0.7872 | ResNet-50 + ASPP (primary) |
| Model B mean mIoU | 0.6848 | ResNet-18 U-Net (baseline) |
| Difference (A − B) | +0.1024 | Model A is better by ~10pp |
| Shapiro–Wilk p | 2.895e-13 | Non-normal → Wilcoxon justified |
| Wilcoxon W | 6106.0 | Signed-rank statistic |
| Wilcoxon p-value | 6.1936e-34 | Reject H₀ — models differ significantly |
| Bootstrap CI (95%) | [0.7642, 0.8068] | Model A mIoU, 1000 resamples, percentile method |
| Bootstrap SE | 0.0108 | Small: the mIoU estimate is stable |
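
A percentile bootstrap for the mean, as reported in the last two rows, can be sketched as follows. The function name `bootstrap_ci` and the synthetic `scores` array are assumptions for illustration; the project's evaluator may implement this differently.

```python
import numpy as np

def bootstrap_ci(values, n_boot=1000, alpha=0.05, seed=42):
    """Percentile bootstrap CI and standard error for the mean of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    n = values.size
    # Each row is one resample of n indices drawn with replacement.
    idx = rng.integers(0, n, size=(n_boot, n))
    means = values[idx].mean(axis=1)
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi, means.std(ddof=1)

# Synthetic scores for illustration only (not the project's data).
scores = np.random.default_rng(0).beta(8, 2, size=330)
lo, hi, se = bootstrap_ci(scores)
print(f"95% CI = [{lo:.4f}, {hi:.4f}], SE = {se:.4f}")
```

The standard error here is simply the standard deviation of the resampled means, so a narrow interval and a small SE express the same thing.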
Conclusion
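p = 6.19e-34 ≪ 0.05, so we reject H₀ (no systematic difference between the paired scores). Model A (ResNet-50 + ASPP) outperforms Model B (ResNet-18 U-Net) by about 0.10 mIoU on the hold-out split, and the difference is statistically significant. The bootstrap 95% CI for Model A's mIoU, [0.7642, 0.8068], is narrow (SE 0.0108), indicating that the estimate is stable on this split.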
Run Locally
Using the evaluator API
Tech Stack