Semantic plagiarism detection of figures in scholarly documents: A conceptual framework
Author Identifier (ORCID)
Syed Mohammed Shamsul Islam: https://orcid.org/0000-0002-3200-2903
Abstract
Scholarly full-text documents on machine learning typically include numerous result figures that convey valuable information, such as experimental outcomes, assessments, and comparisons between models. However, research work carries a considerable risk of plagiarism, which can affect figures as well as text. The existing literature largely addresses textual plagiarism, that is, any degree of similarity between the texts of scholarly documents, and thus ignores figures. This study builds on the previous literature by proposing a conceptual framework for a system that detects plagiarism in the result figures of scholarly documents. The framework would generate semantically enriched summaries specific to result figures by extracting relevant information from the figures themselves, including the area under the curve (AUC), and from their associated captions in the full-text documents. To accomplish this, the study proposes to classify the extracted figures and analyze them by parsing the figure text, legends, and data plots, using a convolutional neural network classification model such as ResNet50 pretrained on 1.2 million images from ImageNet. The specialized candidate figure summaries would then be evaluated against the specialized actual figure summaries using Jaccard similarity and edit distance metrics, thereby addressing the challenging task of detecting plagiarism in figures.
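As an illustration of the final comparison step named in the abstract, the following is a minimal sketch of Jaccard similarity and edit distance applied to two figure summaries. The function names, the whitespace tokenization, and the example summary strings are assumptions made here for illustration; the abstract does not specify how the authors tokenize summaries or combine the two metrics.

```python
def jaccard_similarity(summary_a: str, summary_b: str) -> float:
    """Jaccard similarity over lowercase whitespace tokens (tokenization is an assumption)."""
    tokens_a = set(summary_a.lower().split())
    tokens_b = set(summary_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty summaries are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical figure summaries, invented for this example only.
candidate = "roc curve auc 0.91"
actual = "roc curve auc 0.87"

print(jaccard_similarity(candidate, actual))
print(edit_distance(candidate, actual))
```

A high Jaccard score together with a low edit distance would suggest that two summaries describe very similar figures; any decision threshold would have to be chosen empirically.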
Document Type
Conference Proceeding
Date of Publication
2024
Volume
2024
Publication Title
2024 IEEE International Conference on Future Machine Learning and Data Science (FMLDS)
Publisher
IEEE
School
School of Science
RAS ID
82332
Event Dates
20-23 November 2024
Additional Information
Subscription content
ISBN
979-8-3503-9121-3
First Page
69
Last Page
74
Comments
Batool, H., Islam, S., Janjua, N. (2024). Semantic plagiarism detection of figures in scholarly documents: A conceptual framework. Proceedings - 2024 IEEE International Conference on Future Machine Learning and Data Science, FMLDS 2024 (69-74). IEEE. https://doi.org/10.1109/FMLDS63805.2024.00023.