Abstract

The k-means clustering algorithm is considered one of the most powerful and popular data mining algorithms in the research community. Despite its popularity, the algorithm has certain limitations. Random initialization of the centroids can lead to unexpected convergence. The number of clusters must be defined beforehand, which influences the resulting cluster shapes and the effect of outliers. A further fundamental problem of the k-means algorithm is its inability to handle various data types. This paper provides a structured and synoptic overview of research conducted on the k-means algorithm to overcome these shortcomings. Variants of the k-means algorithm, including recent developments, are discussed, and their effectiveness is investigated through experimental analysis on a variety of datasets. The detailed experimental analysis, along with a thorough comparison among different k-means clustering algorithms, differentiates this work from existing survey papers. Furthermore, the paper offers a clear and thorough understanding of the k-means algorithm and its research directions.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland.

Keywords

Categorical attributes, Clustering, Cyber security, Healthcare, Initialization, K-means, Unsupervised learning

Document Type

Journal Article

Date of Publication

8-1-2020

Volume

9

Issue

8

Publication Title

Electronics

Publisher

MDPI

School

School of Science

RAS ID

32030

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Comments

Ahmed, M., Seraj, R., & Islam, S. M. S. (2020). The k-means Algorithm: A Comprehensive Survey and Performance Evaluation. Electronics, 9(8), 1295. https://doi.org/10.3390/electronics9081295

Included in

Computer Sciences Commons

Link to publisher version (DOI)

10.3390/electronics9081295

Research outputs 2014 to 2021

The k-means algorithm: A comprehensive survey and performance evaluation
