Data repair of density-based data cleaning approach using conditional functional dependencies
Purpose: Data quality is a major challenge in data management. For organizations, the cleanliness of data is a significant problem that affects many business activities. Errors in data occur for different reasons, such as violation of business rules. Because of the huge amount of data, manual cleaning alone is infeasible, so methods are required that detect dirty data automatically and then repair and clean it. The purpose of this work is to extend the density-based data cleaning approach using conditional functional dependencies to achieve better data repair.

Design/methodology/approach: A set of conditional functional dependencies is introduced as an input to the density-based data cleaning algorithm. The algorithm repairs inconsistent data using this set.

Findings: This new approach was evaluated through experiments on real-world as well as synthetic datasets. The repair quality was determined using the F-measure. The results showed that the quality and scalability of the density-based data cleaning approach improved when conditional functional dependencies were introduced.

Originality/value: Conditional functional dependencies capture semantic errors among data values. This work demonstrates that the density-based data cleaning approach can be improved in terms of repairing inconsistent data by using conditional functional dependencies.
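As a rough illustration of the two technical notions named in the abstract, the following sketch shows how a conditional functional dependency can flag inconsistent rows and how an F-measure can score the result. It is not the authors' algorithm: the density-based repair step is not shown, and the table, the attribute names (`country`, `zip`, `city`), the wildcard symbol `_`, and the helpers `cfd_violations` and `f_measure` are all assumptions made for this example. The F-measure here is the usual harmonic mean of precision and recall over flagged rows, which may differ from the exact definition used in the article.

```python
from typing import Dict, List, Set

Row = Dict[str, str]
WILDCARD = "_"  # assumed symbol meaning "any value" in a CFD pattern


def matches(row: Row, pattern: Dict[str, str]) -> bool:
    """True if the row agrees with every constant in the pattern."""
    return all(v == WILDCARD or row[a] == v for a, v in pattern.items())


def cfd_violations(rows: List[Row], lhs_pattern: Dict[str, str],
                   rhs_attr: str, rhs_value: str) -> Set[int]:
    """Indices of rows that violate the CFD (lhs -> rhs_attr) under the pattern."""
    violating: Set[int] = set()
    groups: Dict[tuple, List[int]] = {}
    for i, row in enumerate(rows):
        if not matches(row, lhs_pattern):
            continue  # the condition does not apply to this row
        if rhs_value != WILDCARD:
            # Constant right-hand side: a single row can violate on its own.
            if row[rhs_attr] != rhs_value:
                violating.add(i)
        else:
            # Variable right-hand side: rows equal on the LHS must agree on the RHS.
            key = tuple(row[a] for a in lhs_pattern)
            groups.setdefault(key, []).append(i)
    for idxs in groups.values():
        if len({rows[i][rhs_attr] for i in idxs}) > 1:
            violating.update(idxs)
    return violating


def f_measure(flagged: Set[int], actual: Set[int]) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing is correct."""
    correct = len(flagged & actual)
    if correct == 0:
        return 0.0
    precision = correct / len(flagged)
    recall = correct / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical table: row 1 carries the wrong city for a UK zip code.
ROWS: List[Row] = [
    {"country": "UK", "zip": "EH4", "city": "Edinburgh"},
    {"country": "UK", "zip": "EH4", "city": "London"},
    {"country": "US", "zip": "EH4", "city": "Boston"},
    {"country": "UK", "zip": "W1", "city": "London"},
]

# "Within the UK, zip determines city": rows 0 and 1 conflict with each other.
variable_hits = cfd_violations(ROWS, {"country": "UK", "zip": "_"}, "city", "_")
# "UK zip EH4 implies city Edinburgh": only row 1 is flagged.
constant_hits = cfd_violations(ROWS, {"country": "UK", "zip": "EH4"}, "city", "Edinburgh")
```

In this toy table the variable pattern can only say that rows 0 and 1 disagree, whereas the constant pattern pins the error to row 1. That difference is one way to read the abstract's claim that conditional functional dependencies capture semantic errors among data values.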
Year of publication: | 2021 |
---|---|
Authors: | Al-Janabi, Samir ; Janicki, Ryszard |
Published in: | Data Technologies and Applications. - Emerald Publishing Limited, ISSN 2514-9318, ZDB-ID 2935212-5. - Vol. 56.2021, 3, p. 429-446 |
Publisher: | Emerald Publishing Limited |
Subject: | Data management ; Information systems ; Data repair ; Integrity constraints |
Online Resource
Similar items by subject
- Beverungen, Daniel (2025)
- Beverungen, Daniel (2025)
- Shankaranarayanan, Ganesan (2015)
Similar items by person
- Finding consistent weights assignment with combined pairwise comparisons. Janicki, Ryszard (2018)