Research outputs 2022 to 2026

MHAiR: A dataset of audio-image representations for multimodal human actions

Abstract

The MHAiR dataset (audio-image representations for multimodal human actions) contains six different image representations of audio signals that capture the temporal dynamics of human actions in a compact and informative way. The dataset was derived from audio recordings extracted from an existing video dataset, UCF101. Each sample is approximately 10 s long, and the dataset is split into 4893 training samples and 1944 testing samples. Feature sequences computed from the audio were converted into images, which can be used for human action recognition and related tasks and can serve as a benchmark for evaluating machine learning models on those tasks. These audio-image representations could suit a wide range of applications, such as surveillance, healthcare monitoring, and robotics. The dataset can also be used for transfer learning, where pre-trained models are fine-tuned on a specific task using the relevant audio images. The dataset can thus facilitate the development of new techniques for improving the accuracy of human action-related tasks.
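
As an illustration of the general idea of turning an audio signal into an image, the following minimal sketch computes a log-magnitude spectrogram and scales it to 8-bit grayscale values. The abstract does not name the six representations, so the spectrogram here is an assumed example and not necessarily one of the authors' representations. The function names `stft_magnitude` and `to_grayscale`, the frame length, the hop size, and the synthetic 500 Hz tone standing in for a UCF101 audio track are all hypothetical choices made for this sketch.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a 2D list (frequency bins x time frames) of spectral magnitudes.

    Illustrative only: uses a naive DFT so that no third-party library is needed.
    """
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    # Transpose so that rows are frequency bins and columns are time frames.
    return [list(row) for row in zip(*frames)]


def to_grayscale(matrix):
    """Log-compress the magnitudes and scale them to integers in 0..255."""
    logged = [[math.log1p(v) for v in row] for row in matrix]
    lo = min(min(row) for row in logged)
    hi = max(max(row) for row in logged)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant input
    return [[round(255 * (v - lo) / span) for v in row] for row in logged]


# Assumption: a synthetic 500 Hz tone sampled at 8 kHz stands in for real audio.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 500 * t / sample_rate) for t in range(1024)]
image = to_grayscale(stft_magnitude(signal))
```

Each row of `image` corresponds to a frequency bin and each column to a time frame, so the 2D array can be written out with any image library and passed to an image classifier.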

Keywords

human action recognition, image representations, multimodal dataset, computer vision

Document Type

Journal Article

Date of Publication

2024

Publication Title

Data

Publisher

MDPI

School

School of Engineering / School of Science

RAS ID

62441

Funding Information

Edith Cowan University / Australia and Higher Education Commission (HEC), Pakistan / Australian Government

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Related Publications

Shaikh, M. (2023). Multimodal human action recognition using deep learning. Edith Cowan University. https://doi.org/10.25958/v1j4-6h36

Comments

Shaikh, M. B., Chai, D., Islam, S. M. S., & Akhtar, N. (2024). MHAiR: A dataset of audio-image representations for multimodal human actions. Data, 9(2), article 21. https://doi.org/10.3390/data9020021

Included in

Data Science Commons, Engineering Commons

Link to publisher version (DOI)

10.3390/data9020021
