Machine Learning · Signal Processing · Real-Time Mobile

Human Activity Recognition

Classifying 6 daily activities from smartphone accelerometer and gyroscope signals. UCI HAR Dataset — 30 subjects, Samsung Galaxy S II, 50 Hz, 561 pre-extracted features. Best model: 95.52% accuracy. Live phone demo runs the ML model entirely in your browser.

🚀 Live Demo: Try on Your Phone · GitHub Repo · ↗ UCI Dataset

30

Subjects

561

Features

10K

Samples

95.5%

Accuracy

Dataset

Experiment Overview

30 volunteers aged 19–48 performed 6 activities wearing a Samsung Galaxy S II at the waist. Signals captured at 50 Hz, segmented into 2.56 s windows with 50% overlap.

🚶

WALKING

Train: 1,226 · Test: 496

⬆️

WALKING UPSTAIRS

Train: 1,073 · Test: 471

⬇️

WALKING DOWNSTAIRS

Train: 986 · Test: 420

🪑

SITTING

Train: 1,286 · Test: 491

🧍

STANDING

Train: 1,374 · Test: 532

🛌

LAYING

Train: 1,407 · Test: 537

Signal Channels

Body acceleration X / Y / Z
Body gyroscope X / Y / Z
Total acceleration X / Y / Z
128 timesteps per window · 50 Hz
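A minimal sketch of the windowing described above, assuming a single channel held in a plain Python list. The function name `sliding_windows` and the toy signal are illustrative and not taken from the project code; the figures 50 Hz, 2.56 s and 50% overlap come from the dataset description.

```python
def sliding_windows(signal, window_size=128, overlap=0.5):
    """Split a 1-D sequence into fixed-size windows with fractional overlap."""
    step = int(window_size * (1 - overlap))  # 64 samples for 50% overlap
    if step <= 0:
        raise ValueError("overlap must be less than 1")
    return [
        signal[start:start + window_size]
        for start in range(0, len(signal) - window_size + 1, step)
    ]


SAMPLE_RATE_HZ = 50
WINDOW_SECONDS = 2.56
window_size = round(SAMPLE_RATE_HZ * WINDOW_SECONDS)  # 128 timesteps

# Ten seconds of a made-up single-channel signal (500 samples).
signal = [i / SAMPLE_RATE_HZ for i in range(10 * SAMPLE_RATE_HZ)]
windows = sliding_windows(signal, window_size=window_size, overlap=0.5)
```

Each window starts 64 samples (1.28 s) after the previous one, so consecutive windows share half of their samples. A trailing partial window is dropped in this sketch.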

Feature Engineering (561)

Time-domain: mean, std, MAD, energy, IQR, entropy, AR coefficients
Frequency-domain (FFT): meanFreq, skewness, kurtosis, bandEnergy
17 signal types × multiple statistics
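To make a few of the listed time-domain statistics concrete, here is a hedged sketch that computes them for one window with the Python standard library. The helper `window_features` is hypothetical. The definitions used (median absolute deviation, energy as the mean of squares, quartiles from `statistics.quantiles`) are reasonable assumptions and may differ in detail from the dataset's own extraction code.

```python
import math
import statistics


def window_features(window):
    """Compute a handful of time-domain statistics for one signal window."""
    median = statistics.median(window)
    q1, _, q3 = statistics.quantiles(window, n=4)  # quartile cut points
    return {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),
        # Median absolute deviation around the window median.
        "mad": statistics.median(abs(x - median) for x in window),
        # Energy: sum of squares divided by the number of samples.
        "energy": sum(x * x for x in window) / len(window),
        "iqr": q3 - q1,
    }


# Made-up 128-sample window: a 2 Hz sine sampled at 50 Hz.
window = [math.sin(2 * math.pi * 2 * i / 50) for i in range(128)]
features = window_features(window)
```

Applying the same statistics to every signal type is what multiplies a small set of formulas into hundreds of features per window.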

Model Evaluation

Results — 5 Models Compared

All models evaluated on held-out test subjects (9 of 30 subjects never seen during training).
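The sketch below illustrates what a subject-wise hold-out means: every window from a held-out subject goes to the test set, so no person appears on both sides. It is an illustration only. The UCI dataset ships with its own predefined train/test partition, and `split_by_subject`, the random seed and the toy subject IDs are assumptions made here.

```python
import random


def split_by_subject(subject_ids, n_test_subjects=9, seed=0):
    """Return train/test row indices so that no subject appears in both sets."""
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    test_subjects = set(rng.sample(subjects, n_test_subjects))
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx, test_subjects


# Toy data: 30 subjects with 10 windows each (made-up counts).
subject_ids = [s for s in range(1, 31) for _ in range(10)]
train_idx, test_idx, test_subjects = split_by_subject(subject_ids)
```

Splitting by row instead of by subject would let windows from the same person leak into both sets and would likely inflate the reported accuracy.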

Logistic Regression

95.52% · BEST

Linear baseline — strong thanks to hand-crafted 561 features

SVM (RBF, C=10)

95.49%

Near-identical to LR; RBF kernel captures non-linear boundaries

LightGBM

93.45%

200 estimators, learning rate 0.1; fast gradient boosting

XGBoost

93.42%

200 trees, max depth=6

Random Forest

92.57%

200 trees — best for feature importance analysis; gravity features dominate

Per-Activity Performance (Best Model — Logistic Regression)

🚶

WALKING

96%

⬆️

WALKING UPSTAIRS

95%

⬇️

WALKING DOWNSTAIRS

94%

🪑

SITTING

93%

🧍

STANDING

94%

🛌

LAYING

100%

Key Findings

Analysis & Insights

What the data reveals about activity recognition using inertial sensors.

🎯 LAYING — Perfect Classification

Gravity acts along a completely different axis when horizontal. The gravity acceleration along Z becomes dominant — completely separating LAYING from all other activities. 100% precision and recall on the test set.

⚠️ SITTING vs STANDING — Hardest Pair

Both are static postures with similar gravity magnitude. The only difference is a subtle change in trunk angle. ~48 SITTING windows misclassified as STANDING in the confusion matrix — the hardest confusion pair.
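A confusion pair such as SITTING predicted as STANDING can be counted directly from true and predicted labels. The following is a small sketch with made-up labels, not the project's predictions; the helper names are hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))


def recall(counts, label):
    """Share of windows with this true label that were predicted correctly."""
    actual = sum(n for (t, _), n in counts.items() if t == label)
    return counts[(label, label)] / actual if actual else 0.0


def precision(counts, label):
    """Share of windows predicted as this label that truly carry it."""
    predicted = sum(n for (_, p), n in counts.items() if p == label)
    return counts[(label, label)] / predicted if predicted else 0.0


# Made-up labels for illustration only.
y_true = ["SITTING"] * 4 + ["STANDING"] * 4
y_pred = ["SITTING", "SITTING", "SITTING", "STANDING",
          "STANDING", "STANDING", "STANDING", "STANDING"]
counts = confusion_counts(y_true, y_pred)
sitting_as_standing = counts[("SITTING", "STANDING")]
```

In this toy case one SITTING window is predicted as STANDING, which lowers SITTING recall and STANDING precision at the same time.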

📊 Top 10 Features — Random Forest Importance

tGravityAcc-mean()-X

4.18%

tGravityAcc-min()-X

3.19%

tGravityAcc-mean()-Y

2.98%

angle(X,gravityMean)

2.60%

tGravityAcc-energy()-X

2.41%

angle(Y,gravityMean)

2.32%

tGravityAcc-max()-X

2.25%

tGravityAcc-min()-Y

2.21%

tGravityAcc-max()-Y

1.91%

tGravityAcc-energy()-Y

1.62%

Gravity features dominate — they encode body orientation and effectively separate static from dynamic activities.
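As a rough illustration of why gravity features encode orientation, the sketch below computes the angle between a mean gravity vector and the X axis, in the spirit of the angle(X,gravityMean) feature. The axis convention (gravity mostly along X when upright), the sample vectors and the function `angle_to_axis_deg` are assumptions for illustration. The dataset's own scaling of this feature is not reproduced here.

```python
import math


def angle_to_axis_deg(gravity_mean, axis):
    """Angle in degrees between a mean gravity vector and a reference axis."""
    dot = sum(g * a for g, a in zip(gravity_mean, axis))
    norms = math.hypot(*gravity_mean) * math.hypot(*axis)
    cosine = max(-1.0, min(1.0, dot / norms))  # clamp against rounding error
    return math.degrees(math.acos(cosine))


X_AXIS = (1.0, 0.0, 0.0)

# Made-up mean gravity vectors in units of g.
upright = (0.97, -0.15, 0.10)  # gravity mostly along X (assumed upright posture)
lying = (0.05, 0.20, 0.97)     # gravity mostly along Z (assumed horizontal posture)

upright_angle = angle_to_axis_deg(upright, X_AXIS)
lying_angle = angle_to_axis_deg(lying, X_AXIS)
```

Under these assumptions the upright vector sits close to the X axis while the lying vector is nearly perpendicular to it, which is the kind of gap a classifier can separate with a single threshold.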

Why linear models win: The 561 features were hand-crafted by domain experts: means, standard deviations, FFT coefficients, and so on. By the time data reaches the classifier, the activities are already nearly linearly separable. Logistic Regression (95.52%) matches SVM (95.49%) because the feature engineering does the heavy lifting. The tree ensembles (LightGBM, XGBoost, Random Forest) reach 92.6–93.5%, still strong but about two points lower, so the extra non-linearity buys nothing on this feature set.
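To put the accuracy gaps in perspective, the reported percentages can be converted back into counts of correctly classified windows. This sketch assumes every accuracy was computed over all test windows, whose total is taken as the sum of the per-activity test counts listed in the dataset overview.

```python
# Per-activity test window counts from the dataset overview.
test_counts = {
    "WALKING": 496,
    "WALKING UPSTAIRS": 471,
    "WALKING DOWNSTAIRS": 420,
    "SITTING": 491,
    "STANDING": 532,
    "LAYING": 537,
}
n_test = sum(test_counts.values())

# Reported test accuracies in percent.
reported = {
    "Logistic Regression": 95.52,
    "SVM (RBF, C=10)": 95.49,
    "LightGBM": 93.45,
    "XGBoost": 93.42,
    "Random Forest": 92.57,
}

# Implied number of correctly classified windows per model.
implied_correct = {
    name: round(acc / 100 * n_test) for name, acc in reported.items()
}
gap = implied_correct["Logistic Regression"] - implied_correct["SVM (RBF, C=10)"]
```

Under that assumption the difference between Logistic Regression and SVM comes down to a single test window, so the two are best read as tied.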

Implementation

ML Pipeline

End-to-end flow from raw sensor signals to activity prediction.

📡 Raw Sensor Signals
50 Hz · 9 channels

→

🔧 Pre-processing
Butterworth filter · 2.56 s windows

→

📐 Feature Extraction
561 features · time + freq domain

→

⚖️ StandardScaler
zero mean · unit variance

→

🤖 Classifier
LR / SVM / RF / XGB / LGB

→

🎯 Activity Label
95.52% accuracy
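The scaling step in the flow above can be sketched in plain Python as follows. This is a dependency-free stand-in for scikit-learn's StandardScaler, not the project's code: the helpers `fit_scaler` and `transform` and the toy rows are hypothetical. The point it shows is that means and standard deviations are learned from the training rows only and then reused on the test rows.

```python
import statistics


def fit_scaler(rows):
    """Learn per-feature mean and population std from training rows only."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Scale rows to zero mean and unit variance using fitted statistics."""
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Made-up feature rows (two features per window).
train_rows = [[1.0, 10.0], [3.0, 10.0]]
test_rows = [[5.0, 12.0]]

means, stds = fit_scaler(train_rows)
train_scaled = transform(train_rows, means, stds)
test_scaled = transform(test_rows, means, stds)  # reuses training statistics
```

Fitting the scaler on the test rows as well would leak information from the held-out subjects into preprocessing.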

📓 Notebook Contents

① Data loading & class distribution
② PCA 2D projection (coloured by activity)
③ t-SNE embedding (2,000 samples)
④ Raw body acceleration signal plots
⑤ Top feature mean comparison
⑥ 5 classifiers trained & evaluated
⑦ Confusion matrices (SVM + LR)
⑧ Feature importance (Random Forest)
⑨ Static vs dynamic activity scatter
⑩ Model save (joblib)

⚙️ Tech Stack

Python 3.10 · scikit-learn · XGBoost · LightGBM · NumPy · Pandas · Matplotlib · Seaborn · Streamlit · t-SNE · PCA · StandardScaler · joblib

📁 Repository

har_notebook.ipynb
app.py (Streamlit)
UCI HAR Dataset/
requirements.txt

Reference

Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. A Public Domain Dataset for Human Activity Recognition Using Smartphones. ESANN 2013, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning.

↗ UCI ML Repository — Human Activity Recognition

📱

Try It Live on Your Phone

Open the Streamlit app on your phone. The ML model runs entirely in your browser — no data leaves your device. Tap Start, move naturally, see your activity predicted in real time.

🚀 Open Live Demo · View Source Code

✅ iPhone Safari

✅ Android Chrome

⚡ No install needed

🔒 No data sent to server