A hybrid deep learning framework for daily living human activity recognition with cluster-based video summarization

Document Type

Journal Article

Publication Title

Multimedia Tools and Applications

Publisher

Springer

School

School of Science / Centre for Securing Digital Futures

Comments

Hossain, S., Deb, K., Sakib, S., & Sarker, I. H. (2024). A hybrid deep learning framework for daily living human activity recognition with cluster-based video summarization. Multimedia Tools and Applications. Advance online publication. https://doi.org/10.1007/s11042-024-19022-0

Abstract

In assisted living facilities and nursing homes, Human Activity Recognition (HAR) can monitor residents' movements and actions, helping ensure they receive proper care and attention. HAR is also valuable for reviewing and updating emergency response plans to address unusual behavior patterns of individuals in the context of daily living activities. Recognizing activity from video data entails extracting spatial features and then modeling the temporal variations across those features. Analyzing video requires sampling a specified number of frames so that the semantic associations across sequential frames can be recognized. Although sampled frames play an essential role, they are often selected at random or skipped at fixed intervals, resulting in a loss of temporal information. A proper video summary that preserves the essence of the video while presenting its most important details can address this problem. To that end, we propose a cluster-based keyframe selection approach that generates video summaries by extracting the most relevant frames. We also explore two deep learning strategies for activity recognition to determine the more effective one: (a) a pose-based activity recognition model and (b) a single hybrid pre-trained CNN-LSTM model. The experimental findings demonstrate the efficacy of the single hybrid CNN-LSTM technique. Our proposed model yields a mean accuracy of 95.56% on the RGB video modality, surpassing several recent multimodal works on the MSRDailyActivity3D dataset, and achieves 95.12% precision, 95.11% recall, and a 95.03% F1-score on that dataset. The proposed model is further evaluated on two challenging datasets: PRECIS HAR and UCF11.
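To make the two stages described in the abstract concrete, the sketch below illustrates one plausible reading of the pipeline: frames are clustered and the frame nearest each cluster center is kept as a keyframe, and the selected frames are then classified by a per-frame pre-trained CNN followed by an LSTM. The function names, the use of K-means, and MobileNetV2 as the CNN backbone are our illustrative assumptions, not the authors' exact design.

```python
# Hypothetical sketch of the pipeline sketched in the abstract:
# (1) cluster-based keyframe selection for video summarization, and
# (2) a hybrid pre-trained CNN + LSTM classifier over the selected frames.
# K-means and the MobileNetV2 backbone are assumptions for illustration.

import numpy as np
import tensorflow as tf
from sklearn.cluster import KMeans


def select_keyframes(frames: np.ndarray, n_keyframes: int = 16) -> np.ndarray:
    """Cluster frames and keep the frame closest to each cluster center.

    frames: array of shape (num_frames, H, W, 3), pixel values in [0, 255].
    Returns the selected keyframes in their original temporal order.
    """
    flat = frames.reshape(len(frames), -1).astype(np.float32) / 255.0
    km = KMeans(n_clusters=n_keyframes, n_init=10, random_state=0).fit(flat)
    keyframe_ids = []
    for c in range(n_keyframes):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(flat[members] - km.cluster_centers_[c], axis=1)
        keyframe_ids.append(members[np.argmin(dists)])
    return frames[np.sort(keyframe_ids)]  # preserve temporal order


def build_cnn_lstm(n_classes: int, n_keyframes: int = 16) -> tf.keras.Model:
    """Pre-trained CNN applied per frame, followed by an LSTM over time."""
    backbone = tf.keras.applications.MobileNetV2(
        include_top=False, pooling="avg", input_shape=(224, 224, 3))
    backbone.trainable = False  # frozen pre-trained spatial feature extractor
    inputs = tf.keras.Input(shape=(n_keyframes, 224, 224, 3))
    x = tf.keras.layers.TimeDistributed(backbone)(inputs)  # spatial features
    x = tf.keras.layers.LSTM(128)(x)                       # temporal modeling
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Selecting centroid-nearest frames (rather than sampling at random or at fixed strides) is what lets the summary retain one representative frame per visually distinct segment, which is the temporal-information argument the abstract makes.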

DOI

10.1007/s11042-024-19022-0

Access Rights

subscription content
