Author Identifiers
Mariia Khan: https://orcid.org/0000-0001-6662-4607
Jumana Abu-Khalaf: https://orcid.org/0000-0002-6651-2880
David Suter: https://orcid.org/0000-0001-6306-3023
Publication Date
2025
Document Type
Dataset
Publisher
Edith Cowan University
School or Research Centre
School of Science
Description
For embodied agents, such as robots, tracking objects in their surroundings through visual observation is essential, a task referred to as Visual Object Tracking (VOT). For instance, during a rearrangement task, a robot may need to track objects as part of the scene change understanding process in order to accurately restore them to their original states. Classic Multiple Object Tracking (MOT) datasets typically focus on tracking moving, single-class object instances in a video from a fixed viewpoint, limiting their applicability to embodied AI tasks, where objects belong to multiple classes, are often static, and are observed from continuously changing viewpoints, as the camera is mounted on a moving robotic agent. The ego-centric perspective of the M3T dataset introduces unique challenges: frequent attention shifts; large camera motions, causing frequent object disappearances; and object manipulations, leading to occlusions and rapid changes in object scale, pose, and appearance. The proposed M3T dataset is specifically designed for the 2D scene understanding stage of embodied AI tasks. M3T expands the scope of traditional MOT datasets to accommodate the complexities of ego-centric visual exploration and static, multi-class object tracking in dynamic environments.
Additional Information
The M3T dataset includes the largest number of scenes (1,048 tracking sequences), generated using the AI2-THOR simulator, and offers the highest class diversity, featuring 42 indoor object types. Unlike other datasets, which primarily track vehicles or pedestrians in mostly static scenes, 41 of M3T's 42 classes are interactable objects that can change location or state, enabling tracking in dynamic, constantly evolving indoor environments.
DOI
10.25958/yq3n-fy41
Methodology
The AI2-THOR embodied AI simulator was used to create the dataset.
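For readers unfamiliar with AI2-THOR, the sketch below illustrates how ego-centric frames of the kind described above can be captured with its public Python API. This is an illustration only, not the authors' generation pipeline: the scene name, action sequence, image resolution, and output folder are placeholder assumptions.

```python
# Minimal sketch of capturing an ego-centric sequence with AI2-THOR.
# Illustrative only: scene, actions, and output layout are placeholder
# choices, not the M3T generation pipeline.
from pathlib import Path

from PIL import Image
from ai2thor.controller import Controller

out_dir = Path("sequence_0001")  # hypothetical output folder
out_dir.mkdir(exist_ok=True)

controller = Controller(
    scene="FloorPlan1",               # one of AI2-THOR's indoor scenes
    width=640,
    height=480,
    renderInstanceSegmentation=True,  # enables event.instance_masks
)

# A short ego-centric trajectory: the camera moves with the agent,
# so viewpoints change continuously, as described above.
actions = ["MoveAhead", "RotateRight", "MoveAhead", "LookDown", "RotateLeft"]

for frame_idx, action in enumerate(actions):
    event = controller.step(action=action)

    # RGB observation from the agent's viewpoint (H x W x 3 numpy array).
    Image.fromarray(event.frame).save(out_dir / f"{frame_idx:06d}.png")

    # Per-instance boolean masks keyed by objectId (requires the flag above).
    masks = event.instance_masks

    # Object-level metadata: which instances are currently in view.
    visible = [o["objectId"] for o in event.metadata["objects"] if o["visible"]]
    print(frame_idx, action, len(visible), "visible objects")

controller.stop()
```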
Start of data collection time period
2021
End of data collection time period
2023
File Format(s)
png, txt
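Since M3T extends MOT-style datasets, the txt files plausibly hold per-frame tracking annotations. The loader below assumes the common MOT-Challenge convention of one comma-separated line per object per frame (frame, track id, bounding box left/top/width/height, then extra fields); M3T's actual field order and file names are not documented here, so verify against the data before use.

```python
# Minimal sketch of loading MOT-Challenge-style txt annotations.
# Assumption: each line starts "frame,track_id,x,y,w,h,..."; this field
# order is an assumption about M3T, not a documented specific.
import csv
from collections import defaultdict

def load_annotations(path):
    """Group (track_id, x, y, w, h) bounding boxes by frame index."""
    boxes_by_frame = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.reader(f):
            frame, track_id = int(row[0]), int(row[1])
            x, y, w, h = map(float, row[2:6])
            boxes_by_frame[frame].append((track_id, x, y, w, h))
    return boxes_by_frame

# Usage (hypothetical file name):
# gt = load_annotations("sequence_0001/gt.txt")
# print(len(gt), "annotated frames")
```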
File Size
1.37 GB
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Contact
mariia.khan@ecu.edu.au
Citation
Khan, M., Abu-Khalaf, J., Suter, D., Rosenhahn, B., Qiu, Y., & Cong, Y. (2025). M3T. Edith Cowan University. https://doi.org/10.25958/yq3n-fy41