Improving the t-Digest Data Structure to Accurately Estimate Quantiles in Online Streaming Data
Quantile estimation is a common and important task in data analysis. The naive approach, sorting the data and then selecting the element at the corresponding position in the sorted list, is impractical for very large datasets and for online streaming data. The t-digest data structure reduces the running time of quantile estimation while maintaining an acceptable level of accuracy. In the t-digest approach, the choice of scale function is key to the speed of the computation. In this paper, we discuss several scale functions and propose two new ones. The evaluation results show that the two new functions reduce the actual running time while keeping the accuracy at the same level.
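
For concreteness, the sort-and-select baseline and the role of a scale function can be sketched as follows. This is a minimal illustration under stated assumptions, not the method evaluated in this paper: the names `naive_quantile` and `k1_scale` are hypothetical, the nearest-rank convention is one of several possible choices, and the scale function shown is the arcsine form commonly used in the t-digest literature, not either of the two new functions proposed here.

```python
import math


def naive_quantile(values, q):
    """Sort-and-select baseline: O(n log n) time and O(n) memory.

    Needs the whole dataset in memory, which is what makes it
    unsuitable for very large or streaming data.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    ordered = sorted(values)
    # Nearest-rank convention (an assumption; other conventions interpolate).
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def k1_scale(q, delta):
    """Arcsine scale function k(q) = delta / (2*pi) * asin(2q - 1).

    `delta` is the compression parameter (assumed name). The function is
    steepest near q = 0 and q = 1, so a fixed step in k covers a narrower
    range of q near the tails than near the median.
    """
    return delta / (2.0 * math.pi) * math.asin(2.0 * q - 1.0)


if __name__ == "__main__":
    data = [5, 1, 4, 2, 3]
    print(naive_quantile(data, 0.5))
    print(k1_scale(0.5, 100.0))
```

In a t-digest, a scale function of this kind bounds how much of the quantile range a single cluster may cover, so its shape affects both the accuracy near the tails and the cost of maintaining the clusters.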