The STVchrono dataset: Towards continuous change recognition in time

Author Identifier

Mariia Khan: https://orcid.org/0000-0001-6662-4607

Document Type

Conference Proceeding

Publication Title

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

First Page

14111

Last Page

14120

Publisher

IEEE

School

School of Science

RAS ID

76849

Comments

Sun, Y., Qiu, Y., Khan, M., Matsuzawa, F., & Iwata, K. (2024). The STVchrono dataset: Towards continuous change recognition in time. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 14111-14120). https://doi.org/10.1109/CVPR52733.2024.01338

Abstract

Recognizing continuous changes offers valuable insights into past historical events, supports current trend analysis, and facilitates future planning. This knowledge is crucial for a variety of fields, such as meteorology and agriculture, environmental science, urban planning and construction, tourism, and cultural preservation. Currently available datasets in the field of scene change understanding primarily concentrate on two main tasks: the detection of changed regions within a scene and the linguistic description of the change content. Existing datasets focus on recognizing discrete changes, such as adding or deleting an object between two images, and largely rely on artificially generated images. Consequently, the existing change understanding methods primarily focus on identifying distinct object differences, overlooking the importance of continuous, gradual changes occurring over extended time intervals. To address the above issues, we propose a novel benchmark dataset, STVchrono, targeting the localization and description of long-term continuous changes in real-world scenes. The dataset consists of 71,900 photographs from the Google Street View API taken over an 18-year span across 50 cities all over the world. Our STVchrono dataset is designed to support real-world continuous change recognition and description in both image pairs and extended image sequences, while also enabling the segmentation of changed regions. We conduct experiments to evaluate state-of-the-art methods on continuous change description and segmentation, as well as multimodal Large Language Models for describing changes. Our findings reveal that even the most advanced methods lag behind human performance, emphasizing the need to adapt them to continuously changing real-world scenarios. We hope that our benchmark dataset will further facilitate the research of temporal change recognition in a dynamic world. The STVchrono dataset is available at STVchrono Dataset.

DOI

10.1109/CVPR52733.2024.01338

Access Rights

subscription content
