SceneClarity: A Unified Framework for Scene Reliability Estimation and Classification in Autonomous Vehicle Perception

UTMIST Machine Learning Project | ML Research Project | August 2025 – April 2026

PyTorch, TensorFlow, scikit-learn, Pandas, NumPy, Jupyter Notebook, Google Colab, YOLOv11, MobileNetV3, ResNet-50, LaneNet, Docker, REST APIs, Next.js, React, GitHub, Visual Studio Code, Jira


UTMIST SceneClarity Presentation | SceneClarity Research Paper Abstract

Led development of SceneClarity, a modular framework for estimating scene-level reliability in autonomous vehicle perception. The project targets perception degradation under adverse conditions such as fog, rain, snow, and glare, where failures often co-occur and are difficult to diagnose at the system level.
The architecture separates perception, environmental inference, and aggregation modules through a fixed interface, allowing components to be replaced without redesigning the aggregation logic.
The framework aggregates perception outputs and environmental signals into a global reliability score with attribution to likely degradation factors. Unlike per-prediction uncertainty methods, it represents reliability as a decomposition over semantically interpretable scene-level components.
Implemented as a real-time system producing structured outputs and visualizations to support failure analysis, safety monitoring, and debugging.
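
The aggregation idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the project's actual design: the names `ComponentReport`, `SceneModule`, and `aggregate`, the weighted-average scoring rule, and the way each component's reliability shortfall is split across degradation factors are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class ComponentReport:
    """Hypothetical fixed interface that every module returns."""
    name: str
    reliability: float  # 0.0 (unusable) to 1.0 (fully reliable)
    # Relative strength of suspected degradation factors, e.g. {"fog": 0.7}
    factors: dict[str, float] = field(default_factory=dict)


class SceneModule(Protocol):
    """Any perception or environment module that can emit a report."""
    def report(self, frame: object) -> ComponentReport: ...


def aggregate(reports: list[ComponentReport],
              weights: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Assumed rule: weighted mean of component reliabilities.

    Each component's weighted shortfall (1 - reliability) is shared among
    its factors in proportion to their reported strength.
    """
    total = sum(weights[r.name] for r in reports)
    score = sum(weights[r.name] * r.reliability for r in reports) / total

    attribution: dict[str, float] = {}
    for r in reports:
        shortfall = weights[r.name] * (1.0 - r.reliability) / total
        factor_sum = sum(r.factors.values())
        if factor_sum <= 0:
            continue  # nothing to attribute this component's shortfall to
        for factor, strength in r.factors.items():
            share = shortfall * strength / factor_sum
            attribution[factor] = attribution.get(factor, 0.0) + share
    return score, attribution


# Example with made-up numbers: detection is degraded, lane finding is not.
reports = [
    ComponentReport("detection", 0.5, {"fog": 1.0}),
    ComponentReport("lanes", 1.0),
]
score, attribution = aggregate(reports, {"detection": 1.0, "lanes": 1.0})
print(score, attribution)
```

Because the aggregator only depends on `ComponentReport`, a module can be swapped for another that emits the same structure without touching `aggregate`, which is the property the fixed interface is meant to provide.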

Vision Transformer (ViT-B/16) Architecture Implementation

Independent Research | December 2025 – January 2026


Python, PyTorch, Torchvision, Torchinfo, NumPy, Matplotlib, PIL, Kagglehub, Jupyter Notebook, Google Colab, Git, GitHub


GitHub Repository with Code and Documentation

Implemented the Vision Transformer (ViT-B/16) architecture from scratch in PyTorch, following the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale." Manually built all core components, including convolutional patch embeddings, class and positional embeddings, Multi-Head Self-Attention (MSA) and MLP blocks with Layer Normalization (LN) and residual connections, as well as the final classification head.
Used the equations and architectural definitions from the original paper to reason about data flow and tensor transformations throughout the model, explicitly tracking tensor shapes step by step from input image to output classification to verify correctness and deepen understanding of the model structure.
Validated the implementation end-to-end by training the model from scratch on a 5-class weather image classification dataset sourced from Kaggle. Documented training simplifications relative to the paper and compared the custom implementation with PyTorch's built-in ViT.
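
The shape tracking described above can be illustrated with a short PyTorch sketch. It is not the repository's code: the class names are hypothetical, and `nn.MultiheadAttention` stands in for the manually built MSA block to keep the example short. The dimensions are the standard ViT-B/16 values (224x224 input, 16x16 patches, 768-dimensional embeddings, 12 heads, MLP width 3072), and the 5-class head matches the weather dataset mentioned above.

```python
import torch
from torch import nn


class PatchEmbedding(nn.Module):
    """Image batch -> sequence of patch tokens with class token and positions."""

    def __init__(self, in_channels=3, patch_size=16, embed_dim=768, img_size=224):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2  # 14 * 14 = 196
        # A conv with kernel == stride == patch size embeds each patch once.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, embed_dim))

    def forward(self, x):                        # (B, 3, 224, 224)
        batch = x.shape[0]
        x = self.proj(x)                         # (B, 768, 14, 14)
        x = x.flatten(2)                         # (B, 768, 196)
        x = x.transpose(1, 2)                    # (B, 196, 768)
        cls = self.cls_token.expand(batch, -1, -1)  # (B, 1, 768)
        x = torch.cat([cls, x], dim=1)           # (B, 197, 768)
        return x + self.pos_embed                # (B, 197, 768)


class EncoderBlock(nn.Module):
    """Pre-norm Transformer encoder block: LN -> MSA -> add, LN -> MLP -> add."""

    def __init__(self, embed_dim=768, num_heads=12, mlp_dim=3072):
        super().__init__()
        self.ln1 = nn.LayerNorm(embed_dim)
        # Built-in attention used here as a stand-in for a hand-written MSA.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, mlp_dim),
            nn.GELU(),
            nn.Linear(mlp_dim, embed_dim),
        )

    def forward(self, x):                        # (B, 197, 768)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out                         # residual around attention
        return x + self.mlp(self.ln2(x))         # residual around MLP


embed = PatchEmbedding()
block = EncoderBlock()
head = nn.Sequential(nn.LayerNorm(768), nn.Linear(768, 5))  # 5 weather classes

images = torch.randn(2, 3, 224, 224)
tokens = embed(images)
encoded = block(tokens)            # ViT-B/16 stacks 12 such blocks
logits = head(encoded[:, 0])       # classify from the class token only
print(tokens.shape, encoded.shape, logits.shape)
```

Only one encoder block is applied here; the sequence shape `(B, 197, 768)` is unchanged by each block, so stacking the full 12 layers does not alter the shapes shown.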

Research

SceneClarity: A Unified Framework for Scene Reliability Estimation and Classification in Autonomous Vehicle Perception

Autonomous Vehicle Path Planning, Deep Learning & Ethics