Kleerekoper, Anthony, Lujan, Mikel and Brown, Gavin (2013) Exploring Sketches for Probability Estimation with Sublinear Memory. In: 2013 IEEE International Conference on Big Data, 06 October 2013 - 09 October 2013, Silicon Valley, CA, USA.
Accepted Version
Available under License In Copyright.
Abstract
As data sets become ever larger, it becomes increasingly difficult to apply traditional machine learning techniques to them. Feature selection can greatly reduce the computational requirements of machine learning, but it too can be memory intensive. In this paper we explore the use of succinct data structures called sketches for probability estimation as a component of information theoretic feature selection. These data structures use memory that is sublinear in the number of items, but they were designed only for estimating the frequencies of the most frequent items. To the best of our knowledge, this is the first time they have been examined for estimating the frequencies of all items, and we find that some information theoretic measures can often be estimated to within a few percent of their correct values.
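To illustrate the idea of estimating item probabilities with a sketch, here is a minimal Python sketch under stated assumptions. The abstract does not say which sketches the paper studies, so this uses the Count-Min sketch as one well-known example. The class and function names (`CountMinSketch`, `sketch_entropy`), the blake2b-based row hashing, and the choice to renormalise over a known alphabet are all illustrative assumptions, not the authors' method. Memory is fixed at `depth * width` counters regardless of how many distinct items arrive, and each estimated count can only be too high, never too low.

```python
import hashlib
import math
from collections import Counter


class CountMinSketch:
    """Hypothetical Count-Min sketch: a depth x width table of counters."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]
        self.total = 0  # number of items inserted (N)

    def _index(self, row: int, item: str) -> int:
        # Assumption: one hash function per row, derived by prefixing the row number.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count
        self.total += count

    def estimate(self, item: str) -> int:
        # Collisions only add to counters, so the minimum over rows is an
        # upper bound on the true count (it never underestimates).
        return min(self.table[row][self._index(row, item)] for row in range(self.depth))

    def probability(self, item: str) -> float:
        # Estimated p(item) = estimated count / N.
        return self.estimate(item) / self.total if self.total else 0.0


def entropy(probs) -> float:
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def sketch_entropy(sketch: CountMinSketch, alphabet) -> float:
    # Assumption: the feature's alphabet is known. Estimates are renormalised
    # because overestimated counts can make raw probabilities sum above 1.
    counts = [sketch.estimate(x) for x in alphabet]
    total = sum(counts)
    if total == 0:
        return 0.0
    return entropy(c / total for c in counts)


# Usage: compare the sketch-based entropy with the exact value.
data = ["a"] * 50 + ["b"] * 30 + ["c"] * 15 + ["d"] * 5
cms = CountMinSketch(width=256, depth=4)
for x in data:
    cms.add(x)

exact = Counter(data)
true_h = entropy(n / len(data) for n in exact.values())
est_h = sketch_entropy(cms, exact.keys())
print(f"exact H = {true_h:.4f} bits, sketch H = {est_h:.4f} bits")
```

Information theoretic feature selection typically needs mutual information as well as entropy. The same sketch could count joint (feature, class) pairs by inserting a combined key, but that extension is likewise an assumption rather than something the abstract describes.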