e-space
Manchester Metropolitan University's Research Repository

    Machine learning for understanding complex, interlinked social data

    Little, Claire (2018) Machine learning for understanding complex, interlinked social data. Doctoral thesis (PhD), Manchester Metropolitan University.


    Available under License Creative Commons Attribution Non-commercial No Derivatives.


    Abstract

    With the growing availability of ‘big’ data, increasing computing power, and improved data storage capacity, machine learning techniques are now frequently used to make sense of data. Yet the social sciences have been slow to adopt these techniques, and there is little evidence of their use in some academic fields. This thesis examines the methods most commonly used in social science research, namely linear regression and null hypothesis significance testing, in order to identify how machine learning methods might complement these more established approaches. A case study of the Troubled Families (TF) programme provides a practical example of how machine learning techniques can be applied to complex, interlinked social data to gain a deeper understanding of, and more insight into, the data. Cluster analysis identified eleven different types of families, and further analysis examined how the families' lives changed after joining the TF programme compared with before. The analysis provided insight into the various types of families that existed and the problems they had. It also showed that, had the data been analysed only at the overall (global) level, the results would have been prone to an averaging effect that masked many of the changes that occurred; analysis at the cluster level revealed cluster-specific patterns and gave a greater understanding of the data. This thesis demonstrates that machine learning techniques, such as cluster analysis and decision tree learning, can be used effectively on complex ‘real-life’ social science datasets. These methods can identify hidden groups, relationships, and important predictors in a dataset, provide a better understanding of the structure of the data, and aid in generating research questions and hypotheses.
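    The averaging effect described in the abstract can be illustrated with a small worked example. The Python sketch below is purely illustrative: the cluster labels and before/after scores are invented (they are not data from the thesis), and the cluster assignments are assumed to have already been produced by some earlier clustering step. Two hypothetical clusters change by equal amounts in opposite directions, so the global mean change is zero even though each of those clusters changed substantially.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (cluster_label, score_before, score_after).
# These values are invented for illustration; they are NOT data from the thesis,
# and the cluster labels are assumed to come from a prior cluster analysis.
records = [
    ("A", 10, 6), ("A", 12, 8), ("A", 11, 7),  # cluster A: each score falls by 4
    ("B", 5, 9), ("B", 6, 10), ("B", 4, 8),    # cluster B: each score rises by 4
    ("C", 8, 8), ("C", 9, 9),                  # cluster C: no change
]


def mean_change(rows):
    """Average of (after - before) over the given rows."""
    return mean(after - before for _, before, after in rows)


def change_by_cluster(rows):
    """Group rows by cluster label and return each cluster's mean change."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    return {label: mean_change(group) for label, group in sorted(groups.items())}


global_change = mean_change(records)        # the opposite changes cancel out
cluster_changes = change_by_cluster(records)

print(f"global mean change: {global_change}")
for label, change in cluster_changes.items():
    print(f"cluster {label}: {change}")
```

    Here the global mean change is 0, while the per-cluster mean changes are -4, 4 and 0: a global average hides exactly the kind of pattern that a cluster-level analysis can reveal.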

