Author Identifier
Sangay Tenzin: http://orcid.org/0000-0001-5257-0302
Date of Award
2026
Document Type
Thesis
Publisher
Edith Cowan University
Degree Name
Master of Engineering Science
School
School of Engineering
First Supervisor
Alexander Rassau
Second Supervisor
Douglas Chai
Abstract
Effective Simultaneous Localisation and Mapping (SLAM) solutions are pivotal across a wide range of applications, including autonomous vehicles, industrial robotics, and mobile service platforms, where accurate localisation and environmental perception are essential. Visual SLAM (VSLAM) has emerged as a popular approach due to its cost-effectiveness and ease of deployment. However, state-of-the-art VSLAM systems using conventional cameras face significant limitations, including high computational requirements, sensitivity to motion blur, restricted dynamic range, and poor performance under variable lighting conditions.
Event cameras present a promising alternative by producing asynchronous, high-temporal-resolution data with low latency and low power consumption. These characteristics make them ideal for use in dynamic and resource-constrained environments. Complementing this, neuromorphic processors designed for efficient event-driven computation are inherently compatible with sparse temporal data. Despite their synergy, the adoption of event cameras and neuromorphic computing in SLAM remains limited due to the scarcity of public datasets, underdeveloped algorithmic tools, and challenges in multimodal sensor fusion.
This thesis develops integrated Visual Odometry (VO) and Loop Closure (LC) models that leverage neuromorphic sensing, spiking neural networks (SNNs), and probabilistic factor-graph optimisation as a pathway towards full event camera-based SLAM. A synchronised multimodal dataset, captured with a Prophesee STM32-GENx320 event camera, Livox MID-360 LiDAR, and Pixhawk 6C Mini IMU, spans indoor and outdoor scenarios over 2,285 s. Raw events are aggregated into voxel-grid tensors using 100 ms windows with 20 temporal bins. LiDAR odometry from point clouds is refined with inertial constraints in Georgia Tech Smoothing and Mapping (GTSAM) to produce pseudo-ground-truth trajectories for supervised learning.
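As a minimal illustration (not the thesis implementation), the event-to-voxel aggregation described above might be sketched in Python as follows; the function name, the 320×320 sensor resolution, and the ±1 polarity encoding are assumptions:

```python
import numpy as np

def events_to_voxel_grid(events, num_bins=20, height=320, width=320):
    """Aggregate one 100 ms window of asynchronous events into a
    (num_bins, H, W) voxel grid.

    `events` is an (N, 4) array of [t, x, y, polarity] rows, with
    polarity encoded as +1/-1. Bin edges are uniform in time.
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    if len(events) == 0:
        return grid
    t = events[:, 0]
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    p = events[:, 3]
    # Normalise timestamps to [0, num_bins) within the window.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (num_bins - 1e-6)
    b = t_norm.astype(int)
    # Accumulate signed polarity counts per (bin, y, x) cell.
    np.add.at(grid, (b, y, x), p)
    return grid
```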
Two SNN models are developed using the SpikingJelly framework: a spiking VO network that predicts six-degree-of-freedom (6-DOF) pose increments from voxel grids, and an LC network that estimates inter-frame similarity scores for global trajectory correction. Both models employ Leaky Integrate-and-Fire neurons and are trained with surrogate gradient learning. The VO model uses a hybrid loss function that combines Root Mean Square Error (RMSE) for translation with a geodesic loss on the Special Orthogonal Group in 3D, SO(3), for rotation prediction. The LC model is optimised using a joint loss comprising a triplet margin loss for learning discriminative embeddings and a cross-entropy loss for binary classification.
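A minimal sketch (not taken from the thesis) of how these two losses might be expressed in PyTorch, which underpins SpikingJelly; the function names, the weighting factors alpha and beta, and the 0.5 margin are assumptions:

```python
import torch
import torch.nn.functional as F

def vo_hybrid_loss(t_pred, t_gt, R_pred, R_gt, alpha=1.0):
    """Hybrid VO loss: RMSE on translation plus a geodesic distance on SO(3).

    t_pred, t_gt: (B, 3) translation increments; R_pred, R_gt: (B, 3, 3)
    rotation matrices. `alpha` weights the rotation term (an assumption).
    """
    trans_loss = torch.sqrt(F.mse_loss(t_pred, t_gt))
    # Geodesic angle of the relative rotation R_pred^T R_gt, from its trace.
    trace = (R_pred.transpose(1, 2) @ R_gt).diagonal(dim1=1, dim2=2).sum(-1)
    cos_angle = ((trace - 1.0) / 2.0).clamp(-1.0 + 1e-7, 1.0 - 1e-7)
    rot_loss = torch.arccos(cos_angle).mean()
    return trans_loss + alpha * rot_loss

def lc_joint_loss(anchor, positive, negative, logits, labels,
                  margin=0.5, beta=1.0):
    """Joint LC loss: triplet margin loss on embeddings plus cross-entropy
    on the binary same-place classification."""
    triplet = F.triplet_margin_loss(anchor, positive, negative, margin=margin)
    ce = F.cross_entropy(logits, labels)
    return triplet + beta * ce
```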
These frontend models are integrated into a modular backend system based on a sliding-window factor graph. The backend fuses VO predictions with IMU pre-integration and LC constraints and performs real-time optimisation using GTSAM. Empirical evaluation on kilometre-scale sequences demonstrates robust performance in diverse indoor and outdoor environments, achieving sub-metre Absolute Trajectory Error and competitive Relative Pose Error. Additionally, hardware benchmarking across conventional and neuromorphic processors, such as the BrainChip Akida platform, reveals up to a fourfold reduction in latency and an order-of-magnitude gain in energy efficiency on neuromorphic hardware.
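For orientation only, a heavily simplified sketch of this kind of factor-graph backend in GTSAM's Python API, fusing VO between-factors with one loop-closure constraint and optimising incrementally with iSAM2; the noise sigmas and placeholder measurements are assumptions, and the thesis's IMU pre-integration factors are omitted for brevity:

```python
import numpy as np
import gtsam
from gtsam.symbol_shorthand import X

isam = gtsam.ISAM2()
graph = gtsam.NonlinearFactorGraph()
initial = gtsam.Values()

# Diagonal sigmas for Pose3 factors: rotation (rad) then translation (m).
prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 1e-3))
vo_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.05] * 3 + [0.10] * 3))
lc_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.02] * 3 + [0.05] * 3))

# Anchor the first pose at the origin.
graph.add(gtsam.PriorFactorPose3(X(0), gtsam.Pose3(), prior_noise))
initial.insert(X(0), gtsam.Pose3())

# Each predicted VO increment (placeholder values here) becomes a
# between-factor linking consecutive poses.
pose = gtsam.Pose3()
for k in range(1, 5):
    delta = gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(0.1, 0.0, 0.0))
    graph.add(gtsam.BetweenFactorPose3(X(k - 1), X(k), delta, vo_noise))
    pose = pose.compose(delta)
    initial.insert(X(k), pose)

# A detected loop closure adds a long-range constraint (placeholder measurement).
lc_delta = gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(-0.4, 0.0, 0.0))
graph.add(gtsam.BetweenFactorPose3(X(4), X(0), lc_delta, lc_noise))

# Incremental optimisation step.
isam.update(graph, initial)
estimate = isam.calculateEstimate()
print(estimate.atPose3(X(4)))
```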
The main contributions of this work include: a pipeline towards a full VSLAM architecture combining SNNs, event-based vision, and multimodal sensor fusion for 6-DOF pose estimation and LC; a novel training pipeline using voxelised asynchronous event data and GTSAM-refined pseudo-ground-truth; a modular backend architecture that performs drift-resilient optimisation using VO, IMU, and LC constraints; a cross-platform benchmarking study that highlights the advantages of neuromorphic hardware; and a synchronised multimodal dataset supporting the above components.
Overall, this thesis provides a pipeline towards a scalable and energy-efficient SLAM solution that bridges neuromorphic sensing, spiking computation, and probabilistic inference, contributing substantially to the advancement of real-time robotic perception and autonomy and laying a strong foundation for next-generation lightweight, intelligent robotic systems. However, the system's performance is sensitive to sensor calibration and timestamp alignment, and the dataset's specificity may limit generalisation across broader deployment scenarios without further adaptation.
Access Note
Access to this thesis is embargoed until 10th February 2027
DOI
10.25958/tfkv-5t55
Recommended Citation
Tenzin, S. (2026). Towards neuromorphic visual SLAM: A spiking neural network for efficient pose estimation and loop closure based on event camera data. Edith Cowan University. https://doi.org/10.25958/tfkv-5t55