Analysing small-sample-sized methylation data to identify biomarkers for congenital heart defects

Author Identifiers

Kan Yu


Date of Award


Degree Type


Degree Name

Master of Science (Computer Science) Research


School of Science

First Advisor

Jitian Xiao

Second Advisor

Leisa Armstrong

Third Advisor

Brad (Guicheng) Zhang


Congenital Heart Defects (CHDs) are the most common type of human congenital anomaly, affecting 0.8% to 1.2% of infants at birth and accounting for over 40% of prenatal deaths. Although determining their exact aetiology remains a significant challenge, epigenetic modifications such as Deoxyribonucleic Acid (DNA) methylation are thought to contribute to the pathogenesis of CHDs.

We aimed to investigate the value of machine learning (ML) in enhancing CHD diagnosis, particularly for identifying susceptibility genes, by exploring high-throughput DNA methylation data. The Illumina Human Methylation EPIC BeadChip was used to screen the genome-wide DNA methylation profiles of 24 infants diagnosed with CHDs and 24 healthy infants without heart disease. Preprocessing was conducted using the RnBeads and limma packages. Significantly differentially methylated CpG sites in the top 660 genes with the lowest p-values were selected and further investigated using a random forest (RF) algorithm.
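The feature-selection step above (keeping the sites with the lowest p-values before training the RF) can be illustrated with a minimal sketch. It assumes a plain two-sample Student's t-test with pooled variance, which is swapped in here for illustration only: limma actually uses moderated t-statistics with empirical Bayes variance shrinkage. It also ranks individual CpG sites rather than genes. Because the pooled test has a fixed number of degrees of freedom (n1 + n2 - 2, i.e. 46 for 24 vs 24 infants), ranking by |t| gives the same order as ranking by p-value, so no t-distribution is needed. The function names and the dictionary layout of beta values are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(cases, controls):
    """Two-sample Student's t-statistic with pooled variance (not limma's moderated t)."""
    n1, n2 = len(cases), len(controls)
    pooled_var = ((n1 - 1) * variance(cases) + (n2 - 1) * variance(controls)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean(cases) - mean(controls)
    if se == 0:
        # No spread in either group: treat as no difference unless the means differ.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def top_k_sites(beta_cases, beta_controls, k):
    """Return the k CpG site IDs with the largest |t| (equivalently, smallest p-value).

    beta_cases / beta_controls: hypothetical dicts mapping a site ID to the list
    of beta values observed in the CHD group and the control group.
    """
    scores = {site: abs(pooled_t(beta_cases[site], beta_controls[site])) for site in beta_cases}
    # Sort by descending |t|; with a fixed df this matches ascending p-value.
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In a full pipeline the selected sites would then become the input features of the RF classifier; that step is omitted here.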

After training, the RF classifiers were applied to a validation dataset of testing samples and achieved an accuracy of 100%. The RF model identified three genes (MIR663, FGF3 and FAM64A) that were useful not only for diagnosing but also for predicting CHDs, with an average sensitivity and specificity of 85% and 95%, respectively. This finding highlights that aberrant DNA methylation plays a significant role in the pathogenesis of CHDs and may provide a potential approach to understanding the condition. However, the sample size of the current study is limited, and future research may consider replicating and refining these key findings in large-scale studies.
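The sensitivity and specificity figures quoted above follow the standard definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), with CHD as the positive class. The minimal sketch below computes both from true and predicted labels. The label strings "CHD" and "control" are hypothetical, and the sketch does not reproduce how the thesis averaged these values across models or runs.

```python
def sensitivity_specificity(y_true, y_pred, positive="CHD"):
    """Return (sensitivity, specificity) for a binary classification.

    Sensitivity is the fraction of true positives (CHD cases) predicted as positive;
    specificity is the fraction of true negatives (controls) predicted as negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Example: 4 hypothetical CHD cases and 4 controls; one case is missed.
y_true = ["CHD"] * 4 + ["control"] * 4
y_pred = ["CHD", "CHD", "CHD", "control", "control", "control", "control", "control"]
sens, spec = sensitivity_specificity(y_true, y_pred)
```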

Access Note

Access to this thesis is embargoed until 14 Oct 2022. At the expiration of the embargo period, access to the thesis will be restricted to current ECU staff and students. Email queries to library@ecu.edu.au

Access to this thesis is restricted. Please see the Access Note above for access details.