Kendrick, Connah (2018) Markerless facial motion capture: deep learning approaches on RGBD data. Doctoral thesis (PhD), Manchester Metropolitan University.
|
Available under License Creative Commons Attribution Non-commercial No Derivatives. Download (7MB) | Preview |
Abstract
Facial expressions are a series of fast, complex and interconnected movement that causes an array of deformations, such as stretching, compressing and folding of the skin. Identifying expression is a natural process in human vision, but due to the diversity of faces, it has many challenges for computer vision. Research in markerless facial motion capture using single Red Green Blue (RGB) camera has gained popularity due to the wide access of the data, such as from mobile phones. The motivation behind this work is much of the existing work attempts to infer the 3-Dimensional (3D) data from 2-Dimensional (2D) images, such as in motion capture multiple 2D cameras are calibration to allow some depth prediction. Whereas, the inclusion of Red Green Blue Depth (RGBD) sensors that give ground truth depth data could gain a better understanding of the human face and how expressions are visualised. The aim of this thesis is to investigate and develop novel methods of markerless facial motion capture, where the focus is on the inclusions of RGBD data to provide 3D data. The contributions are: A tool to aid in the annotation of 3D facial landmarks; A novel neural network that demonstrate the ability of predicting 2D and 3D landmarks by merging RGBD data; Working application that demonstrates complex deep learning network on portable handheld devices; A review of existing methods of denoising fine detail in depth maps using neural networks; A network for the complete analysis of facial landmarks and expressions in 3D. The 3D annotator was developed to overcome the issues of relying on existing 3D modelling software, which made feature identification difficult. The technique of predicting 2D and 3D with auxiliary information, allowed high accuracy 3D landmarking, without the need for full model generation. Also, it outperformed other recent techniques of landmarking. The networks running on the handheld devices show as a proof of concept that even without much optimisation, a complex task can be performed in near real-time. Denoising Time of Flight (ToF) depth maps, showed much more complexity than the tradition RGB denoising, where we reviewed and applied an array of techniques to the task. The full facial analysis showed that when neural networks perform on a wide range of related task for auxiliary information allow for deep understanding of the overall task. The research for facial processing is vast, but still with many new problems and challenges to face and improve upon. While RGB cameras are used widely, we see the inclusion of high accuracy and cost-effective depth sensing device available. The new devices allow better understanding of facial features and expression. By using and merging RGB data, the area of facial landmarking, and expression intensity recognition can be improved.
Impact and Reach
Statistics
Additional statistics for this dataset are available via IRStats2.