e-space
Manchester Metropolitan University's Research Repository

    Clustering, Forecasting and Forecasting clusters: using k-medoids, k-NNs and random forests to improve forecasting

    Vangumalli, Dinesh, Nikolopoulos, Konstantinos and Litsiou, Konstantia (2019) Clustering, Forecasting and Forecasting clusters: using k-medoids, k-NNs and random forests to improve forecasting. Working Paper. Bangor Business School, United Kingdom.


    Abstract

    When facing a forecasting task involving a large number of time series, data analysts regularly employ one of two methodological approaches: either select a single forecasting method for the entire dataset (aggregate selection), or use the best forecasting method for each time series (individual selection). There is evidence in the predictive analytics literature that the former is more robust than the latter, as individual selection tends to overfit models to the data. A third approach is to first identify homogeneous clusters within the dataset, and then select a single forecasting method for each cluster (cluster selection). This research examines the performance of three well-established machine learning methods used for clustering: k-medoids, k-NN and random forests. We then forecast every cluster with the best possible method and compare the performance to that of aggregate selection. These methods are very often used for classification tasks, but since in our case there is no set of predefined classes, they are used for pure clustering. The evaluation is performed on the 645 yearly series of the M3 competition. The empirical evidence suggests that: a) random forests provide the best clusters for the sequential forecasting task, and b) cluster selection has the potential to outperform aggregate selection.
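The difference between aggregate selection and cluster selection can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two candidate methods (`naive` and `drift`), the mean absolute error measured on a short holdout, the toy series, and the hand-assigned `labels`, which stand in for the output of a clustering step such as k-medoids.

```python
from statistics import mean

def naive(history, h):
    """Repeat the last observed value h times."""
    return [history[-1]] * h

def drift(history, h):
    """Extend the straight line through the first and last observations."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (i + 1) for i in range(h)]

# Hypothetical pool of candidate forecasting methods.
METHODS = {"naive": naive, "drift": drift}

def mae(actual, forecast):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(a - f) for a, f in zip(actual, forecast))

def best_method(series_list, h):
    """Name of the method with the lowest average holdout MAE over series_list.

    The last h points of each series are held out for evaluation.
    """
    def score(name):
        return mean(mae(s[-h:], METHODS[name](s[:-h], h)) for s in series_list)
    return min(METHODS, key=score)

def aggregate_selection(series_list, h):
    """One method for the whole dataset."""
    chosen = best_method(series_list, h)
    return [chosen] * len(series_list)

def cluster_selection(series_list, labels, h):
    """One method per cluster; labels[i] is the cluster of series_list[i]."""
    chosen = {}
    for cluster in set(labels):
        members = [s for s, l in zip(series_list, labels) if l == cluster]
        chosen[cluster] = best_method(members, h)
    return [chosen[l] for l in labels]

# Toy data: two trending series and two roughly flat ones.
series = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [2, 4, 6, 8, 10, 12, 14, 16],
    [10, 11, 10, 11, 10, 11, 10, 11],
    [5, 5, 6, 5, 5, 6, 5, 5],
]
labels = [0, 0, 1, 1]  # assumed cluster assignments, not computed here

print(aggregate_selection(series, h=2))
print(cluster_selection(series, labels, h=2))
```

In this toy setting aggregate selection assigns one method to all four series, while cluster selection can assign a different method to the flat series than to the trending ones. Individual selection would correspond to calling `best_method` on each series separately, which is where the overfitting risk mentioned in the abstract arises.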

