Tabassam, Muhammad Rauf ORCID: https://orcid.org/0009-0007-7423-3033, Waheed, Hajra ORCID: https://orcid.org/0000-0003-0168-0063, Safder, Iqra ORCID: https://orcid.org/0000-0001-9818-4693, Sarwar, Raheem ORCID: https://orcid.org/0000-0002-0640-807X, Aljohani, Naif Radi ORCID: https://orcid.org/0000-0001-9153-1293, Nawaz, Raheel ORCID: https://orcid.org/0000-0001-9588-0052, Hassan, Saeed-Ul ORCID: https://orcid.org/0000-0002-6509-9190, Zaman, Farooq ORCID: https://orcid.org/0000-0002-9861-4013 and Ahsan, Ahtazaz ORCID: https://orcid.org/0000-0001-7772-5462 (2024) UPON: Urdu Poetry Generation Using Deep Learning: A Novel Approach and Evaluation. ACM Transactions on Asian and Low-Resource Language Information Processing. ISSN 2375-4702
Accepted Version
Available under License Creative Commons Attribution.
Abstract
Poetry is among the oldest and most esteemed literary forms, allowing poets to convey ideas while carefully attending to meaning, coherence, poetic quality, and fluency. Good poetry also requires attention to rhyme and meter. With the advent of artificial intelligence (AI), automatic text generation has advanced significantly, primarily for languages such as English and Chinese. Generating Urdu poetry, however, presents a distinct challenge because of the language's inherent ambiguity, its cultural and historical nuances, and the demand for creativity. The existing literature has only marginally explored Urdu prose generation and has almost entirely overlooked Urdu poetry generation, primarily because comprehensive training data is scarce. This research addresses that gap. It first introduces a specialized Urdu poetry dataset restricted to a single meter, 'behr-e-khafeef', comprising approximately 20,000 couplets from the Rekhta repository. It then proposes a character-based encoding that transforms these couplets into a numerical representation by assigning a distinct identifier to each character. Generation starts with a character-level LSTM that produces the first verse; a machine translation technique, specifically sequence-to-sequence learning, then produces the second verse conditioned on the first. The generated poetry is evaluated with automatic metrics, including BLEU scores. In addition, an expert panel of Urdu poets assesses the generated couplets on meaning, coherence, poetic quality, and fluency.
Our results are compared with existing poetry generation systems and show an advance over the state of the art, as evidenced by a BLEU score of 0.23. The paper concludes with directions for future work, intended to encourage further research on poetry generation.
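The character-based encoding described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`build_vocab`, `encode`, `decode`, `pad`), the reserved padding identifier 0, and the two sample verses are all assumptions made for illustration, and the sample verses are not taken from the Rekhta dataset.

```python
# Illustrative sketch only: maps each distinct character to an integer id,
# as the abstract describes. All names and the padding scheme are assumptions.

def build_vocab(verses):
    """Assign a distinct integer id to every character seen in the verses."""
    chars = sorted({ch for verse in verses for ch in verse})
    # Id 0 is reserved for padding (an assumption, not stated in the abstract).
    char_to_id = {ch: i + 1 for i, ch in enumerate(chars)}
    id_to_char = {i: ch for ch, i in char_to_id.items()}
    return char_to_id, id_to_char


def encode(text, char_to_id):
    """Turn a verse into a list of character ids."""
    return [char_to_id[ch] for ch in text]


def decode(ids, id_to_char):
    """Turn a list of ids back into text, skipping padding (id 0)."""
    return "".join(id_to_char[i] for i in ids if i != 0)


def pad(ids, length):
    """Truncate or right-pad a sequence with 0 so it has a fixed length."""
    return ids[:length] + [0] * max(0, length - len(ids))


# Made-up sample verses, used only to exercise the functions.
verses = ["دل کی بات", "شب کا راز"]
char_to_id, id_to_char = build_vocab(verses)

first_ids = encode(verses[0], char_to_id)
padded = pad(first_ids, 20)
restored = decode(padded, id_to_char)
```

Fixed-length integer sequences of this kind are the usual input to a character-level LSTM or a sequence-to-sequence model, where the first verse would serve as the source sequence and the second verse as the target.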