Improving Efficiency of K-Means Algorithm for Large Datasets
Clustering is the process of grouping objects into classes based on their similarities. K-means is a widely studied partition-based clustering algorithm. It is reported to work efficiently for small datasets; however, its computation time is less satisfactory for large datasets. Researchers have proposed several modifications to address this issue. This paper proposes a novel way of handling large datasets with K-means in a distributed manner to improve efficiency. The concept of parallel processing is exploited by dividing the dataset into a number of baskets and then applying K-means to each basket in parallel. The proposed BasketK-means provides very competitive performance with considerably less computation time. The simulation results on various real and synthetic datasets presented in the work demonstrate the effectiveness of the proposed approach.
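The basket idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' BasketK-means implementation: the abstract does not say how baskets are formed or how the per-basket results are combined, so the random split into equal baskets and the merge step (one more K-means pass over the pooled basket centroids) are assumptions made here for illustration. The names `kmeans`, `basket_kmeans`, and `n_baskets` are likewise hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster went empty.
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids


def basket_kmeans(points, k, n_baskets=4, seed=0):
    """Hypothetical sketch: cluster each basket in parallel, then merge."""
    rng = np.random.default_rng(seed)
    # Assumption: baskets are a random, near-equal split of the data.
    shuffled = points[rng.permutation(len(points))]
    baskets = np.array_split(shuffled, n_baskets)
    # Run K-means on every basket concurrently.
    with ThreadPoolExecutor(max_workers=n_baskets) as pool:
        local = list(pool.map(lambda b: kmeans(b, k, seed=seed), baskets))
    # Assumption: merge by running K-means once more on the pooled centroids.
    centroids = kmeans(np.vstack(local), k, seed=seed)
    # Label the full dataset against the merged centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)
```

A call such as `basket_kmeans(data, k=3, n_baskets=4)` returns the merged centroids and a label for every row of `data`. A thread pool keeps the sketch short; for CPU-bound work on genuinely large data, a process pool or a distributed framework would likely be the more suitable choice.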
Year of publication: | 2016 |
---|---|
Authors: | Swapna, Ch. Swetha ; Kumar, V. Vijaya ; Murthy, J.V.R |
Published in: | International Journal of Rough Sets and Data Analysis (IJRSDA). - IGI Global, ISSN 2334-4601, ZDB-ID 2798043-1. - Vol. 3.2016, 2 (01.04.), p. 1-9 |
Publisher: | IGI Global |
Subject: | K-Means ; Large Datasets ; Parallel Clustering ; Performance Measures |
Similar items by subject
-
Building Daily Economic Sentiment Indicators
Rey del Castillo, Pilar, (2022)
-
Nonlinearities in macroeconomic tail risk through the lens of big data quantile regressions
Prüser, Jan, (2023)
-
Buchen, Teresa, (2013)