On Regularisation Methods for Analysis of High Dimensional Data

Sirimongkolkasem, Tanin and Drikvandi, Reza ORCID: https://orcid.org/0000-0002-7245-9713 (2019) On Regularisation Methods for Analysis of High Dimensional Data. Annals of Data Science, 6 (4). pp. 737-763. ISSN 2198-5804

Published Version
Available under License Creative Commons Attribution.

Official URL: https://link.springer.com/article/10.1007/s40745-0...

Abstract

High dimensional data are growing rapidly in many domains, as technological advances make it possible to collect data with a large number of variables to better understand a given phenomenon of interest. Particular examples appear in genomics, fMRI data analysis, large-scale healthcare analytics, text/image analysis and astronomy. In the last two decades, regularisation approaches have become the methods of choice for analysing such high dimensional data. This paper studies the performance of regularisation methods, including the recently proposed de-biased lasso, for the analysis of high dimensional data under different sparse and non-sparse situations. Our investigation concerns prediction, parameter estimation and variable selection. We particularly study the effects of correlated variables, covariate location and effect size, which have not been well investigated. We find that correlated data, when associated with important variables, improve those common regularisation methods in all aspects, and that the level of sparsity can be reflected not only in the number of important variables but also in their overall effect size and locations. The latter may be seen under a non-sparse data structure. We demonstrate that the de-biased lasso performs well, especially in low dimensional data; however, it still suffers from issues such as multicollinearity and multiple hypothesis testing, similar to the classical regression methods.
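
As an illustration of the kind of regularised estimator the abstract refers to, the following is a minimal sketch of the standard lasso fitted by cyclic coordinate descent on simulated sparse data with more variables than observations. It is not the authors' code or simulation design: the function names (`soft_threshold`, `lasso_cd`, `lam_max`, `simulate`), the sample sizes, the effect size and the penalty value are all assumptions chosen for illustration, and the de-biased lasso is not implemented here.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z towards zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for the lasso objective
    (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1  (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual,
            # i.e. the residual with variable j's contribution added back.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n
            rho += col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new_bj
    return beta


def lam_max(X, y):
    """Smallest penalty at which every lasso coefficient is zero."""
    n, p = len(X), len(X[0])
    return max(abs(sum(X[i][j] * y[i] for i in range(n))) / n
               for j in range(p))


def simulate(n=40, p=80, k=3, effect=3.0, noise_sd=0.5, seed=0):
    """Hypothetical sparse design: only the first k variables matter."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    beta_true = [effect] * k + [0.0] * (p - k)
    y = [sum(X[i][j] * beta_true[j] for j in range(k))
         + rng.gauss(0.0, noise_sd) for i in range(n)]
    return X, y, beta_true


# Illustrative run; every setting here is an assumption, not from the paper.
X, y, beta_true = simulate()
beta_hat = lasso_cd(X, y, lam=0.3)
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
print("selected variables:", selected)
```

Increasing `lam` sets more coefficients to exactly zero, which is how the lasso performs variable selection; at `lam_max(X, y)` or above, no variable is selected.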

Item Type:	Article
Peer-reviewed:	Yes
Date Deposited:	15 Apr 2019 11:03
Publisher:	Springer Nature
Additional Information:	This is an Open Access article published in Annals of Data Science, published by Springer, copyright The Author(s).
Divisions:	Organisation > Science and Engineering
Subject terms:	De-biased lasso, High dimensional data, Lasso, Linear regression model, Regularisation, Sparsity
URI:	https://e-space.mmu.ac.uk/id/eprint/622792
DOI:	https://doi.org/10.1007/s40745-019-00209-4
ISSN:	2198-5804
e-ISSN:	2198-5812
