Layer-wise partitioning and merging for efficient and scalable deep learning

Akintoye, Samson B, Han, Liangxiu ORCID: https://orcid.org/0000-0003-2491-7473, Lloyd, Huw ORCID: https://orcid.org/0000-0001-6537-4036, Zhang, Xin ORCID: https://orcid.org/0000-0001-7844-593X, Dancey, Darren, Chen, Haoming and Zhang, Daoqiang (2023) Layer-wise partitioning and merging for efficient and scalable deep learning. Future Generation Computer Systems, 149. pp. 432-444. ISSN 0167-739X

Published Version
Available under License Creative Commons Attribution.

Official URL: http://dx.doi.org/10.1016/j.future.2023.07.043

Abstract

Deep Neural Network (DNN) models are usually trained sequentially, one layer after another, which causes forward, backward and update locking problems and leads to long training times. Existing parallel strategies that mitigate these problems deliver suboptimal runtime performance. In this work, we propose a novel parallel framework that combines layer-wise partitioning and merging with forward and backward pass parallelisation to improve training performance. The novelty of the proposed work consists of (1) a layer-wise partitioning and merging model that can minimise communication overhead between devices during training without the memory cost of existing strategies; and (2) forward pass and backward pass parallelisation that addresses the update locking problem and minimises the total training cost. Experimental evaluation on real use cases shows that the proposed method outperforms state-of-the-art approaches in training speed and achieves almost linear speedup without compromising the accuracy of the non-parallel approach.
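
To make the idea of layer-wise partitioning concrete, the following is a minimal sketch of one generic way to assign contiguous groups of layers to devices so that per-device compute is roughly balanced. It is not the algorithm from the article: the greedy heuristic, the function name `partition_layers`, and the parameters `costs` (an estimated cost per layer) and `num_devices` are all assumptions introduced for illustration only.

```python
def partition_layers(costs, num_devices):
    """Split per-layer costs into contiguous groups, one group per device.

    Illustrative greedy heuristic (an assumption, not the article's method):
    close the current group once its cost reaches the average per-device
    share, while always leaving at least one layer for each remaining device.
    Returns a list of groups, each a list of layer indices.
    """
    if not 1 <= num_devices <= len(costs):
        raise ValueError("need 1 <= num_devices <= number of layers")

    target = sum(costs) / num_devices  # average cost share per device
    groups, current, current_cost = [], [], 0.0

    for index, cost in enumerate(costs):
        current.append(index)
        current_cost += cost
        groups_left = num_devices - len(groups) - 1  # groups still to open after this one
        layers_left = len(costs) - index - 1         # layers not yet assigned
        # If remaining layers equal remaining groups, each needs its own group.
        must_close = layers_left == groups_left
        if groups_left > 0 and (current_cost >= target or must_close):
            groups.append(current)
            current, current_cost = [], 0.0

    groups.append(current)  # the last device takes whatever remains
    return groups


if __name__ == "__main__":
    # Five layers with hypothetical relative costs, split across two devices.
    print(partition_layers([4, 2, 2, 4, 4], 2))
```

Contiguous groups keep the number of device boundaries, and hence the activations that must cross between devices, to `num_devices - 1`. A real partitioner would also need to weigh activation sizes and memory limits, which this sketch ignores.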

Item Type:	Article
Peer-reviewed:	Yes
Date Deposited:	31 Aug 2023 13:16
Publisher:	Elsevier BV
Additional Information:	This is an Open Access article which appeared in Future Generation Computer Systems
Divisions:	Faculties > Science and Engineering
Subject terms:	0803 Computer Software, 0805 Distributed Computing, 0806 Information Systems, Distributed Computing
Data Access Statement:	The authors do not have permission to share data.
URI:	https://e-space.mmu.ac.uk/id/eprint/632478
DOI:	https://doi.org/10.1016/j.future.2023.07.043
ISSN:	0167-739X
