Presentation + Paper
7 June 2024
Multi-modal knowledge distillation for domain-adaptive action recognition
Abstract
Effectively recognizing human actions from varying viewpoints is crucial for successful collaboration between humans and robots. Deep learning approaches have achieved promising performance in action recognition given sufficient well-annotated data from the real world. However, collecting and annotating real-world videos can be challenging, particularly for rare or violent actions. Synthetic data, on the other hand, can be easily obtained from simulators with fine-grained annotations and multiple modalities. To learn domain-invariant feature representations, we propose a novel method that distills pseudo labels from a strong mesh-based action recognition model into a lightweight I3D model. In this way, the model can leverage robust 3D representations while maintaining real-time inference speed. We empirically evaluate our model on the Mixamo→Kinetics dataset. The proposed model achieves state-of-the-art performance compared to existing video domain adaptation methods.
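The distillation step described in the abstract can be illustrated with a minimal sketch: a frozen mesh-based teacher produces soft pseudo labels on unlabeled real (target-domain) clips, and a lightweight I3D student is trained on labeled synthetic (source-domain) clips plus the teacher's pseudo labels. All names below (student_i3d, teacher_mesh_model, distillation_loss), the temperature, and the equal loss weighting are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of pseudo-label distillation for domain-adaptive
# action recognition (illustrative only; not the authors' code).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher pseudo labels and student predictions."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

def train_step(student_i3d, teacher_mesh_model,
               source_clip, source_label, target_clip, optimizer):
    # Supervised loss on labeled synthetic (source) videos.
    src_logits = student_i3d(source_clip)
    loss_src = F.cross_entropy(src_logits, source_label)

    # Pseudo labels from the frozen mesh-based teacher on real (target) videos.
    with torch.no_grad():
        pseudo_logits = teacher_mesh_model(target_clip)

    # Distill the teacher's soft predictions into the lightweight I3D student.
    tgt_logits = student_i3d(target_clip)
    loss_kd = distillation_loss(tgt_logits, pseudo_logits)

    loss = loss_src + loss_kd  # equal weighting assumed for illustration
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```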
Conference Presentation
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Xiaoyu Zhu, Wenhe Liu, Celso M. de Mello, and Alexander Hauptmann "Multi-modal knowledge distillation for domain-adaptive action recognition", Proc. SPIE 13035, Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350E (7 June 2024); https://doi.org/10.1117/12.3013318
KEYWORDS
3D vision
Video processing