A framework to support autonomous construction of knowledge graphs from unstructured text

Author Identifier

Muhammad Ali Hur

https://orcid.org/0000-0001-9537-2648

Date of Award

2024

Document Type

Thesis

Publisher

Edith Cowan University

Degree Name

Doctor of Philosophy

School

School of Science

First Supervisor

Dr Mohiuddin Ahmed

Second Supervisor

Dr Naeem Janjua

Abstract

This thesis delves into the automated construction of Knowledge Graphs (KGs) from unstructured text data, aiming to overcome the challenges inherent in extracting and representing contextual information with the necessary semantic depth and breadth. While various approaches exist for extracting semantic relations such as temporal, causal, and rhetorical from unstructured text, they often focus on one type of relation at the expense of others, resulting in a fragmented contextual representation. Consequently, these approaches fail to produce KGs with rich semantic information, hindering the development of advanced AI applications like recommendation engines and semantic search. Moreover, existing approaches prioritize data integration over quality improvement, neglecting critical aspects such as accuracy, consistency, and completeness, which can lead to errors and inconsistencies in the resulting KGs. To address these issues, this research proposes a comprehensive framework that integrates sophisticated semantic models and linguistic analysis techniques to enhance the depth and precision of semantic representations within KGs. By leveraging a graph-based approach, the proposed framework captures diverse semantic relationships and contextual cues present within unstructured text, providing a structured foundation for the seamless integration and interpretation of textual information. The contributions of this research include the development of semantic enrichment techniques, unified context representation frameworks, and advanced semantic analysis models. These contributions enable the extraction and representation of semantic relations at various linguistic levels, including morphological, syntactic, and semantic aspects. Furthermore, the research explores the practical implications of the proposed framework, demonstrating its utility in various domains such as natural language processing, information retrieval, and knowledge management. 
The framework undergoes validation utilizing a gold-standard MEANTIME dataset to ensure its efficacy in domain-agnostic text representation. This validation assesses the accuracy of semantic elements including events, entities, event participants, temporal relations, and coreference links. Additionally, for the validation of domain-specific KGs, a carefully crafted domain-specific stock market ontology and a set of competency questions serve as benchmarks against which the domain-specific KG is rigorously evaluated and validated. Overall, this thesis contributes to advancing the state-of-the-art in automated knowledge extraction from unstructured text data, paving the way for more informed decision-making and sophisticated information processing systems.

Comments

Author also known as Ali Hur

DOI

10.25958/r2qt-b933

Access Note

Access to this thesis is embargoed until 6th September 2025.

Access to this thesis is restricted. Please see the Access Note above for access details.
