Presentation + Paper
7 June 2024
Multi-modal knowledge distillation for domain-adaptive action recognition
Abstract
Effectively recognizing human actions from varying viewpoints is crucial for successful collaboration between humans and robots. Deep learning approaches have achieved promising performance in action recognition given sufficient well-annotated data from the real world. However, collecting and annotating real-world videos can be challenging, particularly for rare or violent actions. Synthetic data, on the other hand, can be easily obtained from simulators with fine-grained annotations and multiple modalities. To learn domain-invariant feature representations, we propose a novel method that distills pseudo labels from a strong mesh-based action recognition model into a lightweight I3D model. In this way, the model can leverage robust 3D representations while maintaining real-time inference speed. We empirically evaluate our model on the Mixamo→Kinetics dataset. The proposed model achieves state-of-the-art performance compared to existing video domain adaptation methods.
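The distillation step described in the abstract can be illustrated with a minimal sketch: a frozen mesh-based teacher produces soft pseudo labels on unlabeled real (target-domain) clips, and a lightweight I3D student is trained on labeled synthetic (source-domain) clips plus the teacher's pseudo labels. All names below (student_i3d, teacher_mesh_model, distillation_loss), the temperature, and the equal loss weighting are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of pseudo-label distillation for domain-adaptive
# action recognition (illustrative only; not the authors' code).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher pseudo labels and student predictions."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

def train_step(student_i3d, teacher_mesh_model,
               source_clip, source_label, target_clip, optimizer):
    # Supervised loss on labeled synthetic (source) videos.
    src_logits = student_i3d(source_clip)
    loss_src = F.cross_entropy(src_logits, source_label)

    # Pseudo labels from the frozen mesh-based teacher on real (target) videos.
    with torch.no_grad():
        pseudo_logits = teacher_mesh_model(target_clip)

    # Distill the teacher's soft predictions into the lightweight I3D student.
    tgt_logits = student_i3d(target_clip)
    loss_kd = distillation_loss(tgt_logits, pseudo_logits)

    loss = loss_src + loss_kd  # equal weighting assumed for illustration
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```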
Conference Presentation
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Xiaoyu Zhu, Wenhe Liu, Celso M. de Mello, and Alexander Hauptmann "Multi-modal knowledge distillation for domain-adaptive action recognition", Proc. SPIE 13035, Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350E (7 June 2024); https://doi.org/10.1117/12.3013318
KEYWORDS
3D vision
Video processing