StyleBabel: artistic style tagging and captioning

Ruta, Dan, Gilbert, Andrew, Aggarwal, Pranav, Marri, Naveen, Kale, Ajinkya, Briggs, Jo (ORCID: https://orcid.org/0000-0002-4041-1918), Speed, Chris, Jin, Hailin, Faieta, Baldo, Filipkowski, Alex, Lin, Zhe and Collomosse, John (2022) StyleBabel: artistic style tagging and captioning. In: ECCV 2022: 17th European Conference on Computer Vision, 23–27 October 2022, Tel Aviv, Israel.

Accepted Version

Official URL: https://doi.org/10.1007/978-3-031-20074-8_13

Abstract

We present StyleBabel, a unique open access dataset of natural language captions and free-form tags describing the artistic style of over 135K digital artworks, collected via a novel participatory method from experts studying at specialist art and design schools. StyleBabel was collected via an iterative method, inspired by ‘Grounded Theory’: a qualitative approach that enables annotation while co-evolving a shared language for fine-grained artistic style attribute description. We demonstrate several downstream tasks for StyleBabel, adapting the recent ALADIN architecture for fine-grained style similarity, to train cross-modal embeddings for: 1) free-form tag generation; 2) natural language description of artistic style; 3) fine-grained text search of style. To do so, we extend ALADIN with recent advances in Vision Transformer (ViT) and cross-modal representation learning, achieving state-of-the-art accuracy in fine-grained style retrieval.

Item Type:	Conference or Workshop Item (Paper)
Published Proceedings:	Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part VIII
Peer-reviewed:	Yes
Date Deposited:	12 Jun 2023 08:01
Publisher:	Springer
Additional Information:	This version of the conference paper has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: http://dx.doi.org/10.1007/978-3-031-20074-8_13
Divisions:	Faculties > Arts and Humanities
Subject terms:	Datasets and evaluation, Image and video retrieval, Vision + language, Vision applications and systems, Artificial Intelligence & Image Processing
URI:	https://e-space.mmu.ac.uk/id/eprint/632095
DOI:	https://doi.org/10.1007/978-3-031-20074-8_13
ISSN:	0302-9743
e-ISSN:	1611-3349
