Author Identifier

Joseph Kane: https://orcid.org/0000-0001-9728-4529

Date of Award

2025

Document Type

Thesis - ECU Access Only

Publisher

Edith Cowan University

Degree Name

Doctor of Philosophy (Integrated)

School

School of Science

First Supervisor

Mike Johnstone

Second Supervisor

Patryk Szewczyk

Abstract

Text-to-speech conversion has been extensively researched and developed since the advent of integrated circuits in computers in 1958. Over sixty years later, most computer-generated voices remained easily identifiable as robotic. The aim of this study was to enhance the realism of computer-generated text-to-speech systems. Increased realism improves artificial voices for individuals reliant on assistive technologies. This research demonstrated that variably modulating the timing of syllables was the most effective way of making a robotic-sounding voice sound more naturally human. The variable timings reflected the human need to draw breath, with faster speech and longer breaks between words for longer sentences. The research identified classification engines designed for prosody and emotional capture, and examined studies capturing paralinguistic elements capable of conveying more meaning than the literal interpretation of spoken words. Emotive text-to-speech engines were also analysed to draw on established techniques for modifying pitch, timbre, and tempo, thereby creating a richer audio experience within the text-to-speech algorithm. Through laboratory experiments, this research created a modular platform for digital speech enhancement. The study filled gaps in academic knowledge, contributing to the development of a flexible and scalable approach to text-to-speech enhancement. Applications of this algorithm include improving high-definition audio codecs for telephony, restoring old recordings, and enhancing human-computer interfaces. Such advancements have the potential to lower barriers to computing and improve accessibility for a wide range of users.

DOI

10.25958/gx2w-rk61

Access Note

Access to this thesis is embargoed until 7th August 2026

