Research outputs 2014 to 2021

A comparative evaluation of the generalised predictive ability of eight machine learning algorithms across ten clinical metabolomics data sets for binary classification

Kevin M. Mendez, Edith Cowan University
Stacey N. Reinke, Edith Cowan University
David I. Broadhurst, Edith Cowan University

Author Identifier (ORCID)

Kevin M. Mendez
https://orcid.org/0000-0002-8832-2607
Stacey Reinke
https://orcid.org/0000-0002-0758-0330
David Broadhurst
https://orcid.org/0000-0003-0775-9581

Abstract

Introduction:

Metabolomics is increasingly being used in the clinical setting for disease diagnosis, prognosis and risk prediction. Machine learning algorithms are particularly important in the construction of multivariate metabolite prediction models. Historically, partial least squares (PLS) regression has been the gold standard for binary classification. Nonlinear machine learning methods such as random forests (RF), kernel support vector machines (SVM) and artificial neural networks (ANN) may be more suited to modelling possible nonlinear metabolite covariance, and thus provide better predictive models.

Objectives:

We hypothesise that for binary classification using metabolomics data, nonlinear machine learning methods will provide superior generalised predictive ability when compared to linear alternatives, in particular when compared with the current gold standard PLS discriminant analysis.

Methods:

We compared the general predictive performance of eight archetypal machine learning algorithms across ten publicly available clinical metabolomics data sets. The algorithms were implemented in the Python programming language. All code and results have been made publicly available as Jupyter notebooks.

Results:

There was only marginal improvement in predictive ability for SVM and ANN over PLS across all data sets. RF performance was comparatively poor. The use of out-of-bag bootstrap confidence intervals provided a measure of uncertainty of model prediction such that the quality of metabolomics data was observed to be a bigger influence on generalised performance than model choice.
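
The out-of-bag bootstrap idea mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation (their notebooks are the authoritative source): the nearest-centroid classifier, the synthetic data, the percentile interval and all function names here are assumptions chosen only to keep the example self-contained. Each bootstrap round trains on a resample drawn with replacement and scores the samples left out of that resample, and the spread of the resulting AUC values gives an uncertainty estimate.

```python
import random
import statistics


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined unless both classes are present
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroids(X, y):
    """Stand-in classifier (an assumption): store the per-class mean of each feature."""
    def mean_vec(rows):
        return [statistics.fmean(col) for col in zip(*rows)]
    return (mean_vec([x for x, t in zip(X, y) if t == 0]),
            mean_vec([x for x, t in zip(X, y) if t == 1]))


def score(model, x):
    """Higher score means x is closer to the class-1 centroid than to class 0."""
    c0, c1 = model
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    return d0 - d1


def oob_bootstrap_auc(X, y, n_boot=200, alpha=0.05, seed=0):
    """Return (median, lower, upper) of the out-of-bag AUC over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(X)
    aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]       # sample with replacement
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]   # samples never drawn
        y_train = [y[i] for i in idx]
        if len(set(y_train)) < 2:
            continue  # cannot fit both centroids from a single class
        model = fit_centroids([X[i] for i in idx], y_train)
        a = auc([score(model, X[i]) for i in oob], [y[i] for i in oob])
        if a is not None:
            aucs.append(a)
    aucs.sort()
    lower = aucs[int((alpha / 2) * (len(aucs) - 1))]      # percentile interval
    upper = aucs[int((1 - alpha / 2) * (len(aucs) - 1))]
    return statistics.median(aucs), lower, upper


# Synthetic two-class data (hypothetical): 60 samples, 5 features, shifted means.
data_rng = random.Random(42)
y = [i % 2 for i in range(60)]
X = [[data_rng.gauss(1.0 * t, 1.0) for _ in range(5)] for t in y]

median_auc, lower, upper = oob_bootstrap_auc(X, y)
print(f"OOB AUC: {median_auc:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A wide interval from a procedure of this kind indicates that the data, rather than the classifier, limit how precisely generalised performance can be estimated, which is the comparison the results above rely on.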

Conclusion:

The size of the data set, and choice of performance metric, had a greater influence on generalised predictive performance than the choice of machine learning algorithm.

Keywords

Artificial neural network, Jupyter, Machine learning, Metabolomics, Open source, Partial least squares, Random forest, Support vector machines

Document Type

Journal Article

Date of Publication

1-1-2019

Publication Title

Metabolomics

Publisher

Springer

School

Centre for Metabolomics and Computational Biology / School of Science

RAS ID

30637

Funders

Australian Research Council

Grant Number

ARC Number: LE170100021

Grant Link

http://purl.org/au-research/grants/arc/LE170100021

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Related Publications

Mendez, K. M. (2020). Deriving statistical inference from the application of artificial neural networks to clinical metabolomics data. https://ro.ecu.edu.au/theses/2296

Comments

Mendez, K. M., Reinke, S. N., & Broadhurst, D. I. (2019). A comparative evaluation of the generalised predictive ability of eight machine learning algorithms across ten clinical metabolomics data sets for binary classification. Metabolomics, 15(12).

https://doi.org/10.1007/s11306-019-1612-4

Included in

Computational Biology Commons

Link to publisher version (DOI)

10.1007/s11306-019-1612-4
