Author Identifier

Mariia Khan: http://orcid.org/0000-0001-6662-4607

Date of Award

2025

Document Type

Thesis

Publisher

Edith Cowan University

Degree Name

Doctor of Philosophy (Joint-ECU home)

School

School of Science

First Supervisor

Jumana Abu-Khalaf

Second Supervisor

David Suter

Third Supervisor

Bodo Rosenhahn

Fourth Supervisor

Qiu Yue

Abstract

Embodied AI explores intelligent agents that learn through interaction with their environment, aiming to replicate human-like learning processes. Achieving this requires agents capable of understanding a scene through various sensors, reasoning about their actions, and reacting accordingly. These abilities are necessary for domestic service robots to assist humans in their day-to-day activities. Embodied AI tasks include, but are not limited to, visual exploration, visual navigation, instruction following, and embodied question answering, all of which typically assume static (unchanging) environments where objects do not move over time. This thesis addresses one of the most challenging Embodied AI tasks, visual room rearrangement, focusing on its Walkthrough (Scene Understanding) and Scene Change Detection stages.

The main purpose of the rearrangement task is for the agent to change the location or state of one or more objects in the environment from an initial state to a desired goal state [1]. In this task, the environment undergoes continual change as a result of the agent's actions over time. The rearrangement task presents several challenges, including understanding dynamic scenes through object recognition, localization, and tracking in constantly changing embodied environments, as well as detecting and describing scene changes across the different stages of the task. To address these challenges, novel methods are proposed and evaluated on data collected in the Ai2Thor simulator.

First, this work introduces four datasets (SAOM, M3T, EmbSCU, and PanoSCU) to support eight embodied research tasks: single-view and panoramic object detection, single-view and panoramic segmentation, single-view and panoramic change understanding, embodied object tracking, and change reversal.

For the Scene Understanding stage of the rearrangement task, this work proposes a real-to-simulation fine-tuning strategy for the Segment Anything Model (SAM). This includes the development of SAOMv1 and SAOMv2 for single-view object segmentation and the PanoSAM model for panoramic object segmentation. Furthermore, the M3T-Round method is proposed, enabling multi-class, multi-instance, and multi-view object tracking in embodied AI scenes.

In the Scene Change Detection stage of the rearrangement task, this thesis proposes methods for both single-view and panoramic Scene Change Understanding (SCU) tasks. The EmbSCU method not only detects changes but also describes them and generates language rearrangement instructions that allow robotic agents to revert the changes. The panoramic SCU task extends these capabilities to full-scene panoramas, capturing a broader range of changes in the scene. Through the experiments, the challenges and limitations of current methods for panoramic change captioning are highlighted.

This work advances embodied AI by enhancing a robot's perception, memory, and planning abilities, providing a foundation for intelligent agents to interact with dynamic embodied environments.
