Vangumalli, Dinesh Reddy, Nikolopoulos, Konstantinos and Litsiou, Konstantia (2021) Aggregate selection, individual selection, and cluster selection: an empirical evaluation and implications for systems research. Cybernetics and Systems: An International Journal, 52 (7). pp. 553-578. ISSN 0196-9722
Accepted Version. Available under License Creative Commons Attribution Non-commercial.
Abstract
When forecasting a large number of time series, data analysts regularly employ one of two methodological approaches: either select a single forecasting method for the entire dataset (aggregate selection), or use the best forecasting method for each individual time series (individual selection). There is evidence in the predictive analytics literature that the former is more robust than the latter, as individual selection tends to overfit models to the data. A third approach is to first identify homogeneous clusters within the dataset and then select a single forecasting method for each cluster (cluster selection). To that end, we examine three machine learning clustering methods: k-medoids, k-NN and random forests. The evaluation is performed on the 645 yearly series of the M3 competition. The empirical evidence suggests that: (a) random forests provide the best clusters for the sequential forecasting task, and (b) cluster selection has the potential to outperform aggregate selection.
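A minimal sketch of the cluster-selection idea described above, assuming simple summary features per series and scikit-learn's KMeans as a stand-in for the paper's k-medoids / k-NN / random-forest clustering; the candidate forecasting methods (naive and drift) and the synthetic series are illustrative, not the paper's setup.

```python
# Hedged sketch: cluster series on summary features, then pick one forecasting
# method per cluster based on holdout accuracy. KMeans, the two candidate
# methods, and the synthetic data are assumptions for illustration only.
import numpy as np
from sklearn.cluster import KMeans

def naive_forecast(y):   # forecast = last observed value
    return y[-1]

def drift_forecast(y):   # forecast = last value plus average historical change
    return y[-1] + (y[-1] - y[0]) / (len(y) - 1)

CANDIDATE_METHODS = {"naive": naive_forecast, "drift": drift_forecast}

def series_features(y):
    """Simple summary features (mean, std, trend slope) of one series."""
    t = np.arange(len(y))
    slope = np.polyfit(t, y, 1)[0]
    return [np.mean(y), np.std(y), slope]

def cluster_selection(series_list, n_clusters=3):
    """Cluster the series, then choose the best method per cluster on a held-out point."""
    X = np.array([series_features(y[:-1]) for y in series_list])
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)

    chosen = {}
    for c in range(n_clusters):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        # Select the method with the lowest mean absolute error on each series' last point.
        errors = {
            name: np.mean([abs(f(series_list[i][:-1]) - series_list[i][-1]) for i in idx])
            for name, f in CANDIDATE_METHODS.items()
        }
        chosen[c] = min(errors, key=errors.get)
    return labels, chosen

# Usage example: 20 synthetic yearly series of length 15.
rng = np.random.default_rng(0)
series = [rng.normal(0, 1, 15).cumsum() + rng.uniform(0, 5) * np.arange(15)
          for _ in range(20)]
labels, chosen = cluster_selection(series)
print(chosen)   # e.g. {0: 'drift', 1: 'naive', 2: 'drift'}
```

Aggregate selection would instead apply one method to every series, and individual selection would pick a method per series; the per-cluster choice above sits between those two extremes.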