Research outputs 2022 to 2026

M3T: Multi-class multi-instance multi-view object tracking for embodied AI tasks

Mariia Khan, Edith Cowan University
Jumana Abu-Khalaf, Edith Cowan University
David Suter, Edith Cowan University
Bodo Rosenhahn

Author Identifier

Mariia Khan: https://orcid.org/0000-0001-6662-4607

Jumana Abu-Khalaf: https://orcid.org/0000-0002-6651-2880

David Suter: https://orcid.org/0000-0001-6306-3023

Document Type

Conference Proceeding

Publication Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume

13836 LNCS

First Page

246

Last Page

261

Publisher

Springer

School

School of Science / School of Engineering

RAS ID

57991

Comments

Khan, M., Abu-Khalaf, J., Suter, D., & Rosenhahn, B. (2023, February). M3T: Multi-class multi-instance multi-view object tracking for embodied AI tasks. In Image and Vision Computing: 37th International Conference, IVCNZ 2022, Auckland, New Zealand, November 24–25, 2022. Selected Papers (pp. 246-261). Cham: Springer Nature Switzerland.

https://doi.org/10.1007/978-3-031-25825-1_18

Abstract

In this paper, we propose an extended multiple object tracking (MOT) task definition for embodied AI visual exploration research: multi-class, multi-instance and multi-view object tracking (M3T). The aim of the proposed M3T task is to identify the number of unique objects in the environment that the agent observes along its way, whether they are seen from far away or close up, from different angles, or only partially. Classic MOT algorithms are not applicable to the M3T task, as they typically target multiple moving instances of a single object class in one video and track objects visible from only one angle or camera viewpoint. Thus, we present the M3T-Round algorithm, designed for a simple scenario in which an agent takes 12 image frames while rotating 360° from its initial position in a scene. We first detect each object in all image frames and then track objects (without any training), using the cosine similarity metric to associate object tracks. The detector part of our M3T-Round algorithm is comparable with the baseline YOLOv4 algorithm [1] in terms of detection accuracy: a 5.26 point improvement in AP75. The tracker part of our M3T-Round algorithm shows a 4.6 point improvement in HOTA over the GMOTv2 algorithm [2], a recent, high-performance tracking method. Moreover, we have collected a new challenging tracking dataset from the AI2-Thor [3] simulator for training and evaluation of the proposed M3T-Round algorithm.
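
The detect-then-associate idea in the abstract can be illustrated with a minimal sketch. It is not the M3T-Round implementation: the abstract only states that detections are associated with a cosine similarity metric and that no training is involved. The greedy matching strategy, the `threshold` value, the same-class restriction, the one-detection-per-track-per-frame rule, and the toy feature vectors below are all assumptions made for illustration, and the names `cosine_similarity` and `associate` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def associate(frames, threshold=0.8):
    """Greedily group detections from successive frames into tracks.

    frames: list of frames; each frame is a list of (class_label, feature)
    pairs. The threshold is an assumed value, not one from the paper.
    Returns a list of tracks, each a dict with "label", "feature" and
    "members", where members are (frame_index, detection_index) pairs.
    """
    tracks = []
    for f_idx, detections in enumerate(frames):
        used = set()  # assumption: a track takes at most one detection per frame
        for d_idx, (label, feat) in enumerate(detections):
            best, best_sim = None, threshold
            for t_idx, track in enumerate(tracks):
                # assumption: only detections of the same class may be merged
                if t_idx in used or track["label"] != label:
                    continue
                sim = cosine_similarity(feat, track["feature"])
                if sim >= best_sim:
                    best, best_sim = t_idx, sim
            if best is None:
                # no sufficiently similar track: start a new one
                tracks.append({"label": label, "feature": feat, "members": []})
                best = len(tracks) - 1
            tracks[best]["feature"] = feat  # keep the most recent appearance
            tracks[best]["members"].append((f_idx, d_idx))
            used.add(best)
    return tracks


# Toy input: three frames with hand-made 3-dimensional features.
frames = [
    [("mug", [1.0, 0.0, 0.0]), ("mug", [0.0, 1.0, 0.0])],
    [("mug", [0.9, 0.1, 0.0]), ("book", [0.9, 0.1, 0.0])],
    [("mug", [0.1, 0.9, 0.0])],
]
tracks = associate(frames)
unique_objects = len(tracks)  # the quantity the M3T task asks for
```

In this toy input the two mugs in the first frame stay separate tracks because they appear in the same frame, and the book is never merged with a mug even though its feature vector is identical, because the sketch only compares detections of the same class. In the 12-frame rotation scenario, `frames` would hold 12 lists of detections instead of three.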

DOI

10.1007/978-3-031-25825-1_18

Related Publications

Khan, M., Abu-Khalaf, J., Suter, D., Rosenhahn, B., Qiu, Y., & Cong, Y. (2025). M3T. Edith Cowan University. https://doi.org/10.25958/yq3n-fy41

Access Rights

subscription content

