e-space
Manchester Metropolitan University's Research Repository

Towards Automatic Memory Tuning for In-Memory Big Data Analytics in Clusters

Koliopoulos, A and Yiapanis, P and Tekiner, T and Nenadic, G and Keane, J (2016) Towards Automatic Memory Tuning for In-Memory Big Data Analytics in Clusters. In: 5th IEEE International Congress on Big Data.

Abstract

Hadoop provides a scalable solution on traditional cluster-based Big Data platforms but imposes performance overheads because it supports only on-disk data. Data analytics algorithms usually require multiple iterations over a dataset and thus multiple slow disk accesses. In contrast, modern clusters possess increasing amounts of main memory, which can provide performance benefits when main-memory caching mechanisms are used efficiently. Apache Spark is an innovative distributed computing framework that supports in-memory computations. Even though this type of computation is very fast, memory is a scarce resource, and this can cause execution bottlenecks or, even worse, lead to failures. Spark offers various choices for memory tuning, but using them requires in-depth systems-level knowledge, and the right choices differ across workloads and cluster settings. Generally, the optimal choice is found by trial and error. This work describes a first step towards an automated selection mechanism for memory optimization that assesses workload and cluster characteristics and selects an appropriate caching scheme. The proposed caching mechanism decreases execution times by up to 25% compared to the default strategy and reduces the risk of main memory exceptions.
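
To make the idea of selecting a caching scheme from workload and cluster characteristics more concrete, the following is a minimal illustrative sketch only. It is not the mechanism described in the paper, whose details are not given in this abstract. The class names, the `headroom` parameter, and the decision rule are all assumptions made for illustration; the returned strings are names of storage levels defined in Spark's Scala/Java `StorageLevel` API.

```python
from dataclasses import dataclass


@dataclass
class ClusterProfile:
    """Hypothetical summary of cluster memory available for caching."""
    executors: int
    storage_memory_per_executor_mb: float


@dataclass
class WorkloadProfile:
    """Hypothetical size estimates for the dataset to be cached."""
    deserialized_size_mb: float  # size as plain in-memory objects
    serialized_size_mb: float    # size after serialization


def choose_storage_level(workload: WorkloadProfile,
                         cluster: ClusterProfile,
                         headroom: float = 0.9) -> str:
    """Pick a Spark storage level name with a simple fit-in-memory rule.

    The rule and the `headroom` safety factor are assumptions, not
    values taken from the paper.
    """
    usable_mb = (cluster.executors
                 * cluster.storage_memory_per_executor_mb
                 * headroom)
    if workload.deserialized_size_mb <= usable_mb:
        # Everything fits as deserialized objects: fastest access.
        return "MEMORY_ONLY"
    if workload.serialized_size_mb <= usable_mb:
        # Fits only in compact serialized form: trades CPU for space.
        return "MEMORY_ONLY_SER"
    # Does not fit in memory at all: let partitions spill to disk.
    return "MEMORY_AND_DISK_SER"


if __name__ == "__main__":
    cluster = ClusterProfile(executors=4, storage_memory_per_executor_mb=1000)
    workload = WorkloadProfile(deserialized_size_mb=5000, serialized_size_mb=2000)
    print(choose_storage_level(workload, cluster))
```

A rule of this kind replaces manual trial and error with a decision based on measured sizes; a real selector would also need to estimate those sizes and account for memory used by execution, which this sketch leaves out.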
