e-space
Manchester Metropolitan University's Research Repository

    Towards Automatic Memory Tuning for In-Memory Big Data Analytics in Clusters

    Koliopoulos, A, Yiapanis, P, Tekiner, T, Nenadic, G and Keane, J (2016) Towards Automatic Memory Tuning for In-Memory Big Data Analytics in Clusters. In: 5th IEEE International Congress on Big Data.


    Available under License In Copyright.


    Abstract

    Hadoop provides a scalable solution on traditional cluster-based Big Data platforms but imposes performance overheads because it supports only on-disk data. Data analytics algorithms usually require multiple iterations over a dataset and therefore multiple slow disk accesses. In contrast, modern clusters possess increasing amounts of main memory, which can provide performance benefits when main-memory caching mechanisms are used efficiently. Apache Spark is an innovative distributed computing framework that supports in-memory computation. Even though this type of computation is very fast, memory is a scarce resource, and this can cause execution bottlenecks or, even worse, lead to failures. Spark offers various choices for memory tuning, but using them requires in-depth systems-level knowledge, and the best choices differ across workloads and cluster settings. Generally, the optimal choice is found by trial and error. This work describes a first step towards an automated selection mechanism for memory optimization that assesses workload and cluster characteristics and selects an appropriate caching scheme. The proposed caching mechanism decreases execution times by up to 25% compared to the default strategy and reduces the risk of main memory exceptions.
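
The idea of assessing workload and cluster characteristics to pick a caching scheme can be illustrated with a minimal sketch. This is not the authors' mechanism, which the abstract does not detail. The `ClusterProfile` and `WorkloadProfile` types, the `choose_storage_level` function, and the decision rule itself are hypothetical and chosen only for illustration. The returned strings are names of storage levels in Apache Spark's `StorageLevel` API.

```python
from dataclasses import dataclass
from typing import Optional

# Names of storage levels defined by Apache Spark's StorageLevel API.
MEMORY_ONLY = "MEMORY_ONLY"
MEMORY_ONLY_SER = "MEMORY_ONLY_SER"
MEMORY_AND_DISK_SER = "MEMORY_AND_DISK_SER"


@dataclass
class ClusterProfile:
    """Hypothetical summary of the cluster (not from the paper)."""
    executors: int
    storage_memory_per_executor_mb: float


@dataclass
class WorkloadProfile:
    """Hypothetical summary of the workload (not from the paper)."""
    dataset_size_mb: float    # estimated deserialized in-memory size
    serialized_ratio: float   # assumed serialized size / deserialized size
    iterations: int           # number of passes over the dataset


def choose_storage_level(cluster: ClusterProfile,
                         workload: WorkloadProfile) -> Optional[str]:
    """Pick a storage level name using an illustrative heuristic.

    Returns None when caching is not expected to help.
    """
    # A single pass never re-reads the data, so caching brings no benefit.
    if workload.iterations <= 1:
        return None

    capacity_mb = cluster.executors * cluster.storage_memory_per_executor_mb

    # Deserialized objects are fastest to access if they fit in memory.
    if workload.dataset_size_mb <= capacity_mb:
        return MEMORY_ONLY

    # Serialized caching trades CPU time for a smaller memory footprint.
    serialized_mb = workload.dataset_size_mb * workload.serialized_ratio
    if serialized_mb <= capacity_mb:
        return MEMORY_ONLY_SER

    # Otherwise spill partitions that do not fit to disk rather than
    # recomputing them or risking memory errors.
    return MEMORY_AND_DISK_SER
```

In a Spark application, the chosen name would typically be mapped to the corresponding `StorageLevel` constant and passed to `persist()`. Sizes and the serialization ratio would have to be estimated or measured for a given workload.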

