Author Identifier (ORCID)
Liang Wang: https://orcid.org/0000-0001-5339-7484
Abstract
Objectives: Rapid discrimination of infections caused by Mycobacterium tuberculosis (MTB) and non-tuberculous mycobacteria (NTM) is crucial in clinical settings. Despite overlapping clinical and radiological features, the two require markedly different therapeutic approaches and public health responses. Current laboratory methods are time-consuming and complex, underscoring the urgent need for a simple and efficient diagnostic tool to inform public health decision-making.
Methods: Demographic, haematological and biochemical data were collected from two hospitals in Jiangsu province, China, between December 2018 and October 2024. A total of 400 patients were included in the training cohort, with 66 patients used for external validation. Six machine learning models were developed using routine laboratory features, and their performance was evaluated using multiple metrics.
Results: The random forest (RF) model, using 49 routine laboratory features, outperformed the other models, achieving 82.71% accuracy in the internal cohort and 87.69% in external validation. SHapley Additive exPlanations (SHAP) analysis identified the 10 features with the greatest influence on model decisions: chloride, sodium, gender, prealbumin, high-density lipoprotein, procalcitonin, albumin, globulin, total protein and creatine. Based on these indicators, an interactive web-based tool was developed (https://mtb-ntm.streamlit.app).
Discussion: The features identified by the model align with established clinical parameters and existing studies. Certain previously underestimated variables, such as chloride (Cl) and sodium (Na), were of substantial importance in distinguishing between MTB and NTM, offering valuable insights for the development of decision-support tools.
Conclusion: Routine laboratory indicators coupled with the RF model showed potential as an auxiliary diagnostic tool for discriminating MTB and NTM disease, offering effective medical support in resource-limited and remote settings.
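The abstract refers to SHAP for ranking features by their influence on model decisions. As a rough illustration of the underlying idea only, the sketch below computes exact Shapley values for a single sample in pure Python. It is not the authors' pipeline: the function `toy_score`, its three generic inputs and the all-zero baseline are hypothetical stand-ins, and the SHAP library uses faster tree-specific algorithms for random forests rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample `x` relative to `baseline`.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n, so this is only practical for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_i = [x[j] if (j in coalition or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in coalition else baseline[j]
                             for j in range(n)]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_score(features):
    """Hypothetical scoring function standing in for a trained classifier."""
    a, b, c = features
    return 2 * a + b * c


sample = [1.0, 2.0, 3.0]      # hypothetical feature values
reference = [0.0, 0.0, 0.0]   # hypothetical baseline
contributions = shapley_values(toy_score, sample, reference)

# Rank feature indices by absolute contribution, largest first
ranking = sorted(range(len(sample)), key=lambda i: -abs(contributions[i]))
```

The contributions sum to the difference between the score for the sample and the score for the baseline, which is the property that makes Shapley values usable as an additive explanation. A global ranking like the top 10 list in the abstract is usually derived by aggregating such per-sample values, typically as mean absolute values, over a cohort.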
Document Type
Journal Article
Date of Publication
10-1-2025
Volume
32
Issue
1
PubMed ID
41106844
Publication Title
BMJ Health and Care Informatics
Publisher
BMJ Publishing Group
School
School of Medical and Health Sciences
RAS ID
87991
Funders
Research Foundation for Advanced Talents of Guangdong Provincial People’s Hospital (KY012023293) / Research Training Program Australian Commonwealth Government
Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License
Comments
Tang, J., Xiong, X., Huang, T., Zhang, Y., Yao, L., Zhang, W., Xie, Y., Liang, Q., Tan, Z., Jiang, K., Liu, X., & Wang, L. (2025). Rapid discrimination of Mycobacterium tuberculosis and non-tuberculous mycobacteria disease via interpretive machine learning analysis of routine laboratory tests. BMJ Health & Care Informatics, 32(1), e101575. https://doi.org/10.1136/bmjhci-2025-101575