A unified form of fuzzy c-means and k-means algorithms and its partitional implementation
School of Engineering
Executive Unit for Financing Higher Education, Research, Development and Innovation (UEFISCDI) of Romania
© 2021 Elsevier B.V. This paper makes two novel contributions. The first is the Unified Form (UF) clustering algorithm, which treats the Fuzzy C-Means (FCM) and K-Means (KM) algorithms as a single configurable algorithm. UF is designed to simplify the software implementation of FCM and KM: a single algorithm is implemented and then configured to work as either FCM or KM. The second contribution is the Partitional Implementation of Unified Form (PIUF) algorithm, which builds on UF to address the challenges of processing large datasets sequentially and of scaling UF to datasets of any size. PIUF overcomes the hardware limitations that can arise when a large volume of data must be stored, loaded into memory and processed by a given computational system. It can run on a single machine when the dataset is too large to be loaded entirely into memory, and it can also be scaled out to multiple processing nodes to reduce the time required to find the optimal solution. The UF and PIUF algorithms are implemented and validated in BigTim, a distributed platform developed by the authors that supports processing various datasets in parallel; they can also be implemented on any other data processing platform. The Iris dataset is modified to obtain datasets of different sizes, which are used to test the algorithm implementations in BigTim under different configurations. The analysis of the PIUF algorithm and its comparison with the FCM, KM and DBSCAN clustering algorithms are carried out using two performance indices; three performance indices are employed to evaluate the quality of the obtained clusters.
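The abstract does not give the UF or PIUF equations, so the following Python sketch is only one plausible reading of the idea, not the authors' formulation. It assumes the textbook FCM membership and center updates, treats KM as the special case in which memberships are one-hot, and accumulates the center-update sums one partition at a time so that only a single partition needs to be in memory. The names `memberships`, `update_centers`, `fit`, `mode` and `m` are hypothetical.

```python
import math


def memberships(point, centers, mode="fcm", m=2.0):
    """Membership of one point in every cluster.

    mode="km":  hard (one-hot) assignment to the nearest center.
    mode="fcm": textbook fuzzy c-means memberships, fuzzifier m > 1.
    """
    # Squared Euclidean distance from the point to each center.
    d2 = [sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers]
    if mode == "km" or min(d2) == 0.0:
        # K-means, or a point that coincides with a center: one-hot membership.
        nearest = d2.index(min(d2))
        return [1.0 if j == nearest else 0.0 for j in range(len(centers))]
    # FCM: u_j is proportional to d_j^(-2/(m-1)), i.e. (d_j^2)^(-1/(m-1)).
    inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


def update_centers(partitions, centers, mode="fcm", m=2.0):
    """One iteration: c_j = sum_i u_ij^m x_i / sum_i u_ij^m.

    The two sums are accumulated partition by partition, so the result
    is the same as if the whole dataset had been processed at once.
    """
    k, dim = len(centers), len(centers[0])
    num = [[0.0] * dim for _ in range(k)]
    den = [0.0] * k
    for part in partitions:  # only one partition is touched at a time
        for x in part:
            u = memberships(x, centers, mode, m)
            for j in range(k):
                w = u[j] ** m  # for one-hot memberships, u^m == u
                den[j] += w
                for t in range(dim):
                    num[j][t] += w * x[t]
    # An empty cluster keeps its previous center.
    return [
        [num[j][t] / den[j] for t in range(dim)] if den[j] > 0 else list(centers[j])
        for j in range(k)
    ]


def fit(partitions, centers, mode="fcm", m=2.0, tol=1e-6, max_iter=100):
    """Iterate until no center moves by more than tol.

    `partitions` must be re-iterable (for example a list of lists),
    because every iteration makes a full pass over all partitions.
    """
    for _ in range(max_iter):
        new = update_centers(partitions, centers, mode, m)
        shift = max(math.dist(a, b) for a, b in zip(new, centers))
        centers = new
        if shift < tol:
            break
    return centers
```

Under these assumptions, calling `fit(parts, init, mode="km")` and `fit(parts, init, mode="fcm")` runs the same code path with a different membership rule. Because the numerator and denominator sums are additive, the per-partition accumulators could in principle be computed on separate nodes and then summed, which is the property a distributed variant would rely on.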