Kaleem, Mohammed (2015) Methodology and algorithms for Urdu language processing in a conversational agent. Doctoral thesis (PhD), Manchester Metropolitan University.
Available under License Creative Commons Attribution Non-commercial No Derivatives.
Abstract
This thesis presents the research and development of a novel text-based, goal-orientated conversational agent (CA) for the Urdu language called UMAIR (Urdu Machine for Artificially Intelligent Recourse). A CA is a computer program that emulates a human in order to hold a conversation with the user. The aim is to investigate the Urdu language and its lexical and grammatical features in order to design a novel engine that handles the language-specific features of Urdu. The weakness of current CA engines is that they are not suited to languages whose grammar rules and structure differ fundamentally from those of English. Historically, CAs, including their scripting engines, scripting methodologies, resources and implementation procedures, have been implemented for the most part in English and other Western languages (e.g. German and Spanish). The development of an Urdu conversational agent has required the research and development of a new CA framework which incorporates methodologies and components to overcome the language-specific features of Urdu, such as free word order, inconsistent use of space, diacritical marks and spelling. The new CA framework was used to implement UMAIR. UMAIR is a customer service agent for the National Database and Registration Authority (NADRA), designed to answer user queries related to ID card and passport applications. UMAIR answers queries within this domain through discourse with the user: it leads the conversation by asking questions and offering appropriate advice, with the intention of steering the discourse to a pre-determined goal. The research and development of UMAIR led to the creation of several novel CA components, namely a new rule-based Urdu CA engine which combines pattern matching and sentence/string similarity techniques, along with new algorithms to process user utterances.
Furthermore, a CA evaluation framework has been researched and tested, which addresses a gap in research on the evaluation of natural language systems in general. Empirical end-user evaluation has validated the new algorithms and components implemented in UMAIR. The results show that UMAIR is effective as an Urdu CA, with the majority of conversations reaching their intended goal. Moreover, the results also revealed that the components of the framework work well to mitigate the challenges of free word order and inconsistent word segmentation.
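The abstract does not give the details of UMAIR's algorithms, so the following is only an illustrative sketch of how a rule-based engine might combine scripted patterns with string similarity while tolerating diacritics, free word order and inconsistent spacing. All function names, the similarity measures (Jaccard token overlap and `difflib` character ratio) and the 0.8 threshold are assumptions made for illustration; they are not taken from the thesis.

```python
import unicodedata
from difflib import SequenceMatcher


def normalise(text: str) -> str:
    """Compose to NFC, then drop combining marks (Unicode category Mn).

    In Arabic-script text this removes optional diacritics such as zabar,
    zer and pesh, so spellings with and without them compare equal.
    """
    composed = unicodedata.normalize("NFC", text)
    stripped = "".join(ch for ch in composed if unicodedata.category(ch) != "Mn")
    # Collapse runs of whitespace into single spaces.
    return " ".join(stripped.split())


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the two word sets; insensitive to word order."""
    ta, tb = set(a.split()), set(b.split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def spaceless_ratio(a: str, b: str) -> float:
    """Character-level similarity after removing every space."""
    return SequenceMatcher(None, a.replace(" ", ""), b.replace(" ", "")).ratio()


def similarity(a: str, b: str) -> float:
    """Hypothetical combined score: the better of the two measures."""
    a, b = normalise(a), normalise(b)
    return max(token_overlap(a, b), spaceless_ratio(a, b))


def best_match(utterance: str, patterns: list[str], threshold: float = 0.8):
    """Return the highest-scoring scripted pattern, or None below threshold."""
    scored = [(similarity(utterance, p), p) for p in patterns]
    score, pattern = max(scored, key=lambda sp: sp[0], default=(0.0, None))
    return pattern if score >= threshold else None
```

Taking the maximum of the two scores means an utterance matches a pattern if either its words agree regardless of order or its characters agree regardless of where spaces fall. A real engine would probably need stricter handling to avoid false positives.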