Improving Efficiency of K-Means Algorithm for Large Datasets

Clustering is the process of grouping objects into classes based on their similarities. K-means is a widely studied partition-based clustering algorithm. It is reported to work efficiently on small datasets; on large datasets, however, its computation time becomes a limitation. Researchers have proposed several modifications to address this issue. This paper proposes a novel way of handling large datasets by applying K-means in a distributed manner. It exploits parallel processing by dividing the dataset into a number of baskets and then applying K-means to each basket in parallel. The proposed BasketK-means delivers very competitive performance with considerably less computation time. Simulation results on various real and synthetic datasets presented in the work demonstrate the effectiveness of the proposed approach.
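
The basket idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the per-basket results are combined, so the merge step here (running K-means once more on the pooled basket centroids) is an assumption, and the names `kmeans`, `basket_kmeans`, and `n_baskets` are hypothetical. A thread pool keeps the sketch self-contained; a process pool would likely be needed for a real speedup on CPU-bound work.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def _dist2(p, q):
    # Squared Euclidean distance between two points.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def _mean(points):
    # Coordinate-wise mean of a non-empty list of points.
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # requires len(points) >= k
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: _dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid when a cluster ends up empty.
        updated = [_mean(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def basket_kmeans(points, k, n_baskets=4, seed=0):
    """Hypothetical sketch: cluster each basket independently, then merge."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    # Strided split so every basket gets a random sample of the data.
    baskets = [shuffled[i::n_baskets] for i in range(n_baskets)]
    with ThreadPoolExecutor(max_workers=n_baskets) as pool:
        per_basket = list(pool.map(lambda b: kmeans(b, k, seed=seed), baskets))
    # Assumed merge step: run K-means again on the pooled basket centroids.
    pooled = [c for centroids in per_basket for c in centroids]
    return kmeans(pooled, k, seed=seed)
```

Calling `basket_kmeans(points, k=2)` on a list of 2-D tuples returns two centroids. Each basket must hold at least `k` points, and shuffling before the split is a design choice meant to keep every basket representative of the whole dataset.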

Year of publication:	2016
Authors:	Swapna, Ch. Swetha ; Kumar, V. Vijaya ; Murthy, J.V.R
Published in:	International Journal of Rough Sets and Data Analysis (IJRSDA). - IGI Global, ISSN 2334-4601, ZDB-ID 2798043-1. - Vol. 3.2016, 2 (01.04.), p. 1-9
Publisher:	IGI Global
Subject:	K-Means | Large Datasets | Parallel Clustering | Performance Measures

Type of publication:	Article
Language:	English
Other identifiers:	10.4018/IJRSDA.2016040101 [DOI]
Source:	Other ZBW resources

Persistent link: https://www.econbiz.de/10012047351