Author Identifier (ORCID)
Liang Wang: https://orcid.org/0000-0001-5339-7484
Abstract
Thalassemia is an inherited blood disorder. Current diagnostic methods rely mainly on sophisticated equipment and specially trained technicians. This study aims to identify and genotype thalassemia by applying machine learning (ML) algorithms to routine blood parameters. A derivation cohort of 31,311 individuals was recruited from four independent hospitals, and eight ML models were developed for this purpose. The performance of these models was compared using a set of evaluation metrics. An additional cohort of 2000 patients was recruited for external validation to assess how well the models generalize. The results demonstrated that the categorical boosting (CatBoost) model exhibited the best discriminative ability in both the training and external validation cohorts. The model was then integrated into an online platform, which could serve as an auxiliary tool for identifying and genotyping thalassemia through automatic analysis of routine blood test parameters.
Document Type
Journal Article
Date of Publication
12-1-2025
Volume
8
Issue
1
Publisher
Nature
School
Centre for Precision Health / School of Medical and Health Sciences
Funders
Research Foundation for Advanced Talents of Guangdong Provincial People's Hospital (KY012023293)
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Comments
Lai, J., Tang, J., Gong, S., Qin, M., Zhang, Y., Liang, Q., Li, L., Cai, Z., & Wang, L. (2025). Development and validation of an interpretable risk prediction model for the early classification of thalassemia. npj Digital Medicine, 8(1). https://doi.org/10.1038/s41746-025-01766-0