Semantic plagiarism detection of figures in scholarly documents: A conceptual framework

Author Identifier (ORCID)

Syed Mohammed Shamsul Islam: https://orcid.org/0000-0002-3200-2903

Abstract

Scholarly full-text documents on machine learning typically include numerous result figures that convey valuable information, such as experimental outcomes, assessments, and comparisons between models. However, research work carries a considerable risk of plagiarism, which can affect figures as well as text. The existing literature largely addresses textual plagiarism, that is, any degree of similarity between the texts of scholarly documents, and thus ignores figures. This study builds on the previous literature by proposing a conceptual framework for a system that detects plagiarism in the result figures of scholarly documents. The framework would generate semantically enriched summaries specific to result figures by extracting relevant information from the figures themselves, including the area under the curve (AUC), and from their associated captions in the full-text documents. To accomplish this, the study proposes to classify the extracted figures and analyze them by parsing the figure text, legends, and data plots, using a convolutional neural network classification model such as ResNet50 that is pretrained on 1.2 million images from ImageNet. The specialized candidate figure summaries would then be evaluated against the specialized actual figure summaries using Jaccard similarity and edit distance metrics, thereby addressing the challenging task of detecting plagiarism in figures.
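The comparison step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenization, the function names, and the two example summaries are assumptions made purely for illustration. It computes Jaccard similarity over word sets and a character-level Levenshtein edit distance between a candidate figure summary and an actual one.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the lowercase word sets of two summaries."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty summaries are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical figure summaries, invented for this example only.
candidate = "ROC curve comparing model A and model B with AUC values"
actual = "ROC curve comparing model A and model C with AUC values"

print(jaccard_similarity(candidate, actual))
print(edit_distance(candidate, actual))
```

A high Jaccard score combined with a small edit distance would flag a pair of summaries for closer inspection; any decision thresholds would need to be chosen empirically and are not specified in the abstract.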

Document Type

Conference Proceeding

Date of Publication

2024

Volume

2024

Publication Title

2024 IEEE International Conference on Future Machine Learning and Data Science (FMLDS)

Publisher

IEEE

School

School of Science

RAS ID

82332

Event Dates

20-23 November 2024

Additional Information

Subscription content

ISBN

979-8-3503-9121-3

Comments

Batool, H., Islam, S., Janjua, N. (2024). Semantic plagiarism detection of figures in scholarly documents: A conceptual framework. Proceedings - 2024 IEEE International Conference on Future Machine Learning and Data Science, FMLDS 2024 (69-74). IEEE. https://doi.org/10.1109/FMLDS63805.2024.00023.

First Page

69

Last Page

74

Link to publisher version (DOI)

10.1109/FMLDS63805.2024.00023