Learning disentangled representation of video for pallet decomposition in industrial warehouses
13 June 2023
Abstract
Over the past decade, several approaches have been proposed to learn disentangled representations for video prediction. However, reported experiments are mostly based on standard benchmark datasets such as Moving MNIST and Bouncing Balls. In this work, we address the problem of learning disentangled representations for video prediction in an industrial environment. To this end, we use a decompositional disentangled variational auto-encoder, a deep generative model that decomposes and recognizes overlapping boxes on a pallet. Specifically, this approach disentangles each frame into a time-invariant component (box appearance) and a temporally varying component (box location). We evaluate this approach on a new dataset of 40,000 video sequences. The experimental results demonstrate that the model learns both the decomposition of the bounding boxes and their reconstruction without explicit supervision.
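For intuition, the sketch below shows one common way such a content/pose split can be wired into a variational auto-encoder: a shared encoder produces per-frame features, the appearance (content) latent is averaged over time so it is forced to be constant across the clip, while the location (pose) latent is sampled per frame. This is a minimal PyTorch illustration of the general technique, not the authors' architecture; all names (DisentangledVAE, content_mu, pose_mu, and so on) and the time-averaging mechanism are assumptions for the sake of the example.

import torch
import torch.nn as nn

class DisentangledVAE(nn.Module):
    """Minimal VAE sketch that splits each frame's latent code into a
    time-invariant content part (e.g. box appearance) and a
    time-varying pose part (e.g. box location)."""

    def __init__(self, frame_dim=64 * 64, content_dim=32, pose_dim=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(frame_dim, 256), nn.ReLU())
        # One (mean, log-variance) head per latent factor.
        self.content_mu = nn.Linear(256, content_dim)
        self.content_logvar = nn.Linear(256, content_dim)
        self.pose_mu = nn.Linear(256, pose_dim)
        self.pose_logvar = nn.Linear(256, pose_dim)
        self.dec = nn.Sequential(
            nn.Linear(content_dim + pose_dim, 256), nn.ReLU(),
            nn.Linear(256, frame_dim), nn.Sigmoid())

    @staticmethod
    def reparameterize(mu, logvar):
        # Standard VAE reparameterization trick.
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, frames):
        # frames: (batch, time, frame_dim)
        h = self.enc(frames)
        # Content: averaged over time, so one code is shared by the clip.
        c_mu = self.content_mu(h).mean(dim=1, keepdim=True)
        c_logvar = self.content_logvar(h).mean(dim=1, keepdim=True)
        content = self.reparameterize(c_mu, c_logvar)
        content = content.expand(-1, frames.size(1), -1)
        # Pose: one code per frame, free to vary over time.
        p_mu, p_logvar = self.pose_mu(h), self.pose_logvar(h)
        pose = self.reparameterize(p_mu, p_logvar)
        recon = self.dec(torch.cat([content, pose], dim=-1))
        return recon, (c_mu, c_logvar), (p_mu, p_logvar)

# Usage: reconstruct a batch of 4 clips of 10 flattened 64x64 frames.
# model = DisentangledVAE()
# recon, _, _ = model(torch.rand(4, 10, 64 * 64))

Training such a model with a reconstruction term plus KL penalties on both latents is what lets the content/pose factorization emerge without explicit supervision, which matches the unsupervised decomposition result the abstract reports.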
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Ikram Eddahmani, Chi-Hieu Pham, Thibault Napoléon, Isabelle Badoc, and Marwa El-Bouz "Learning disentangled representation of video for pallet decomposition in industrial warehouses", Proc. SPIE 12527, Pattern Recognition and Tracking XXXIV, 1252704 (13 June 2023); https://doi.org/10.1117/12.2663161
KEYWORDS
Video, Machine learning, Data modeling, Education and training, Performance modeling, Video coding, Monochromatic aberrations