Towards Automatic Memory Tuning for In-Memory Big Data Analytics in Clusters

Koliopoulos, A, Yiapanis, P, Tekiner, T, Nenadic, G and Keane, J (2016) Towards Automatic Memory Tuning for In-Memory Big Data Analytics in Clusters. In: 5th IEEE International Congress on Big Data.

Available under License In Copyright.

Abstract

Hadoop provides a scalable solution on traditional cluster-based Big Data platforms but imposes performance overheads because it supports only on-disk data. Data analytics algorithms usually require multiple iterations over a dataset and thus multiple slow disk accesses. In contrast, modern clusters possess increasing amounts of main memory that can deliver performance benefits through efficient main-memory caching. Apache Spark is an innovative distributed computing framework that supports in-memory computation. Although such computation is very fast, memory is a scarce resource, and this scarcity can cause execution bottlenecks or, worse, lead to failures. Spark offers various memory-tuning options, but choosing among them requires in-depth systems-level knowledge, and the best choice differs across workloads and cluster settings. In practice, the optimal configuration is usually found by trial and error. This work describes a first step towards an automated selection mechanism for memory optimization that assesses workload and cluster characteristics and selects an appropriate caching scheme. The proposed caching mechanism decreases execution times by up to 25% compared to the default strategy and reduces the risk of main memory exceptions.
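To make the idea of choosing a caching scheme from workload and cluster characteristics concrete, here is a minimal, hypothetical Python sketch. It is not the selection mechanism described in the paper. It assumes that an estimate of the cached dataset's in-memory size, a serialized-to-deserialized size ratio, and the storage memory available per executor are already known. The names `WorkloadProfile`, `ClusterProfile`, `choose_storage_level`, and the 0.9 headroom factor are illustrative assumptions. The returned strings are names of real Spark `StorageLevel` constants in the Scala/Java API: `MEMORY_ONLY` (the default used by `cache()` on RDDs), `MEMORY_ONLY_SER`, and `MEMORY_AND_DISK_SER`.

```python
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    # Hypothetical workload description; how these values are measured is an assumption.
    dataset_gb: float        # estimated deserialized (object) size of the cached data
    serialized_ratio: float  # estimated serialized size / deserialized size, in (0, 1]


@dataclass
class ClusterProfile:
    # Hypothetical cluster description.
    executors: int
    storage_gb_per_executor: float  # memory each executor can devote to caching


def choose_storage_level(w: WorkloadProfile, c: ClusterProfile, headroom: float = 0.9) -> str:
    """Pick a Spark StorageLevel name using a simple capacity heuristic (illustrative only)."""
    # Keep some slack so cached blocks are less likely to be evicted or to cause memory errors.
    capacity_gb = c.executors * c.storage_gb_per_executor * headroom
    if w.dataset_gb <= capacity_gb:
        # Deserialized objects fit: fastest access, no deserialization cost.
        return "MEMORY_ONLY"
    if w.dataset_gb * w.serialized_ratio <= capacity_gb:
        # Only the more compact serialized form fits: trade CPU time for memory.
        return "MEMORY_ONLY_SER"
    # Even serialized data does not fit: spill the remaining partitions to disk
    # instead of recomputing them on every iteration.
    return "MEMORY_AND_DISK_SER"


if __name__ == "__main__":
    cluster = ClusterProfile(executors=4, storage_gb_per_executor=4.0)  # about 14.4 GB usable
    for size in (10.0, 20.0, 40.0):
        print(size, choose_storage_level(WorkloadProfile(size, 0.5), cluster))
```

With these assumed numbers the script selects `MEMORY_ONLY` for 10 GB, `MEMORY_ONLY_SER` for 20 GB, and `MEMORY_AND_DISK_SER` for 40 GB. In a Scala Spark job the chosen level would then be applied with, for example, `rdd.persist(StorageLevel.MEMORY_ONLY_SER)`.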

Item Type:	Conference or Workshop Item
Peer-reviewed:	No
Date Deposited:	01 Jul 2016 11:40
Publisher:	IEEE
Additional Information:	This is an author final copy of a paper presented at the 5th IEEE International Congress on Big Data, copyright IEEE/International Congress on Big Data.
URI:	https://e-space.mmu.ac.uk/id/eprint/615364