Evans, Lewis, Owda, Majdi, Crockett, Keeley ORCID: https://orcid.org/0000-0003-1941-6201 and Vilas, Anna (2018) Big Data Fusion Model for Heterogeneous Financial Market Data (FinDF). In: Intelligent Systems Conference (IntelliSys) 2018, 06 September 2018 - 07 September 2018, London, UK.
Accepted Version
Available under License In Copyright.
Abstract
The dawn of big data has seen the volume, variety, and velocity of data sources increase dramatically. Enormous amounts of structured, semi-structured and unstructured heterogeneous data can be garnered at a rapid rate, making analysis of such big data a herculean task. This has never been truer for data relating to financial stock markets, the biggest challenge being the 7 Vs of big data which relate to the collection, pre-processing, storage and real-time processing of such huge quantities of disparate data sources. Data fusion techniques have been adopted in a wide number of fields to cope with such vast amounts of heterogeneous data from multiple sources and fuse them together in order to produce a more comprehensive view of the data and its underlying relationships. Research into the fusing of heterogeneous financial data is scant within the literature, with existing work only taking into consideration the fusing of text-based financial documents. The lack of integration between financial stock market data, social media comments, financial discussion board posts and broker agencies means that the benefits of data fusion are not being realised to their full potential. This paper proposes a novel data fusion model, inspired by the data fusion model introduced by the Joint Directors of Laboratories, for the fusing of disparate data sources relating to financial stocks. Data with a diverse set of features from different data sources will supplement each other in order to obtain a Smart Data Layer, which will assist in scenarios such as irregularity detection and prediction of stock prices.