Wang, Peng ORCID: https://orcid.org/0000-0001-9895-394X, Guo, Zhihao, Sait, Abdul Latheef and Pham, Minh Huy (2024) Robot Shape and Location Retention in Video Generation Using Diffusion Models. In: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 14 October 2024 - 18 October 2024, Abu Dhabi, United Arab Emirates.
Accepted Version. Available under License In Copyright.
Abstract
Diffusion models have marked a significant milestone in the enhancement of image and video generation technologies. However, generating videos that precisely retain the shape and location of moving objects such as robots remains a challenge. This paper presents diffusion models specifically tailored to generate videos that accurately maintain the shape and location of mobile robots. The proposed models incorporate techniques such as embedding accessible robot pose information and applying semantic mask regulation within the scalable and efficient ConvNeXt backbone network. These techniques are designed to refine intermediate outputs, thereby improving the retention of shape and location. Through extensive experimentation, our models demonstrate notable improvements in maintaining the shape and location of different robots, as well as enhanced overall video generation quality, compared to the benchmark diffusion model. Code will be open-sourced at: https://github.com/PengPaulWang/diffusion-robots.
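The abstract mentions two conditioning techniques: embedding robot pose information and applying a semantic mask to regulate intermediate outputs. The sketch below is a minimal, purely illustrative NumPy rendition of that idea, not the paper's actual implementation; the sinusoidal encoding, the function names, and the masked additive-bias scheme are all assumptions made for demonstration.

```python
import numpy as np

def sinusoidal_pose_embedding(pose, dim=8):
    """Hypothetical encoding of a robot pose (x, y, theta) into a feature
    vector via sinusoids at geometric frequencies (assumption, not the
    paper's scheme)."""
    freqs = 2.0 ** np.arange(dim // 2)            # (dim/2,) frequencies
    angles = np.outer(pose, freqs)                # (3, dim/2)
    return np.concatenate([np.sin(angles).ravel(),
                           np.cos(angles).ravel()])  # (3 * dim,)

def regulate_intermediate(features, mask, pose_emb, weight=0.5):
    """Inject pose information into an intermediate feature map, but only
    inside the robot's semantic mask region (illustrative sketch).

    features: (C, H, W) intermediate network output
    mask:     (H, W) binary semantic mask of the robot
    pose_emb: 1-D pose embedding, at least C entries long
    """
    C, H, W = features.shape
    bias = pose_emb[:C].reshape(C, 1, 1)          # per-channel pose bias
    # Add the pose-derived bias only where the mask is active.
    return features + weight * mask[None, :, :] * bias
```

Under this toy scheme, pixels outside the robot mask are left untouched, so the regulation concentrates conditioning signal on the region whose shape and location must be retained.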