Infrared ship video target tracking based on cross-connection and spatial transformer network
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
School of Engineering
National Natural Science Foundation of China (61806013, 61906005) / General project of Science and Technology Plan of Beijing Municipal Education Commission (KM202110005028) / International Research Cooperation Seed Fund of Beijing University of Technology (2021A01)
In this paper, to improve the efficiency of infrared (IR) ship target tracking, we propose an efficient SiamRPN++ method based on AlexNet with cross connections and a spatial transformer network. The cross-connection method fuses features from the shallow and deep layers to enrich the spatial information of the output features. To reduce the influence of target rotation and scaling on tracking accuracy, we introduce a spatial transformer network that explicitly learns rotation invariance, supplementing the implicit rotation invariance learned by the convolutional neural network. Moreover, to train and evaluate the model more appropriately and to address the lack of IR ship video tracking data sets, we construct an IR ship video tracking data set containing 6725 frames. Experimental results show that the proposed method raises the tracking speed to 63.9 FPS, which is 9.7 times faster than the SiamRPN++ method, while maintaining accuracy and mean intersection over union (mIoU).
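The mIoU figure cited in the abstract is commonly computed by averaging the per-frame overlap between the predicted and ground-truth bounding boxes. The following is a minimal sketch of that computation, not the authors' evaluation code. The `(x, y, w, h)` box format with a top-left origin and the function names `iou` and `mean_iou` are assumptions made for illustration.

```python
from typing import Sequence, Tuple

# Assumed box format: (x, y, w, h) with (x, y) the top-left corner.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlap region, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Box], truths: Sequence[Box]) -> float:
    """Average per-frame IoU over a tracked sequence."""
    if len(preds) != len(truths):
        raise ValueError("preds and truths must have the same length")
    if not preds:
        return 0.0
    return sum(iou(p, t) for p, t in zip(preds, truths)) / len(preds)
```

For instance, a two-frame sequence in which the tracker matches the ground truth exactly in one frame and misses it entirely in the other would score a mean IoU of 0.5 under this definition.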