Sequential and causal relationships among actions are critical for accurate video interpretation, so capturing both short-term and long-term temporal information is essential for effective action recognition. Current research, however, focuses primarily on fusing spatial features from diverse modalities for short-term action recognition. It models the complex temporal dependencies in videos inadequately, which leads to suboptimal performance. To address this limitation, we propose a skeleton-weighted and multi-scale temporal-driven action recognition network that integrates RGB and skeleton modalities to capture both short-term and long-term temporal information. First, we introduce a temporal-enhanced adaptive graph convolutional network. This network derives motion attention masks from the skeletal joints and transfers them to the RGB videos to generate visually salient regions, yielding a concise and effective input representation. Next, we develop a multi-scale local–global temporal modeling network driven by a self-attention mechanism, which captures fine-grained local details of individual actions along with global temporal relationships among actions across multiple temporal resolutions. We also design a multi-level adaptive temporal scale mixer module that integrates the multi-scale features into a unified temporal feature representation to ensure temporal consistency. Finally, we conduct extensive experiments on the NTU-RGBD-60, NTU-RGBD-120, NW-UCLA, and Kinetics datasets to validate the effectiveness of the proposed method.
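The idea of transferring a skeleton-derived motion mask to RGB frames can be illustrated with a minimal sketch. This is not the temporal-enhanced adaptive graph convolutional network described above: it replaces the learned attention with a hand-crafted stand-in that weights each joint by its frame-to-frame displacement and spreads that weight with a Gaussian. The function names `motion_attention_mask` and `apply_mask`, the parameter `sigma`, and the assumption that joints are given as 2D pixel coordinates are all hypothetical choices made for illustration.

```python
import numpy as np


def motion_attention_mask(joints, height, width, sigma=8.0):
    """Build per-frame saliency masks from 2D joint trajectories.

    joints: array of shape (T, J, 2) holding (x, y) pixel coordinates.
    Returns an array of shape (T, height, width) with values in [0, 1].
    Hand-crafted stand-in for a learned attention mask (assumption).
    """
    joints = np.asarray(joints, dtype=np.float64)
    T, J, _ = joints.shape

    # Motion magnitude per joint: displacement from the previous frame.
    disp = np.zeros((T, J))
    disp[1:] = np.linalg.norm(joints[1:] - joints[:-1], axis=-1)

    # Normalize to per-frame weights; use uniform weights when nothing moves.
    total = disp.sum(axis=1, keepdims=True)
    safe_total = np.where(total > 0, total, 1.0)
    weights = np.where(total > 0, disp / safe_total, 1.0 / J)

    ys, xs = np.mgrid[0:height, 0:width]
    masks = np.zeros((T, height, width))
    for t in range(T):
        for j in range(J):
            x, y = joints[t, j]
            # Gaussian bump centred on the joint, scaled by its motion weight.
            bump = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
            masks[t] += weights[t, j] * bump
        peak = masks[t].max()
        if peak > 0:
            masks[t] /= peak  # rescale so the most salient pixel is 1
    return masks


def apply_mask(frames, masks):
    """Weight RGB frames of shape (T, H, W, 3) by masks of shape (T, H, W)."""
    return np.asarray(frames, dtype=np.float64) * masks[..., None]
```

In this sketch, regions around fast-moving joints keep their RGB intensity while static regions are attenuated, which is one simple way to obtain a more concise input for a downstream temporal model.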
Graph-based Approximate Nearest Neighbor Search (ANNS) algorithms have attracted growing attention because of their strong performance, and researchers have proposed numerous ANNS optimization methods. However, current graph-based ANNS algorithms still cannot index billion-scale datasets on a single server with 256 GB of RAM. Although several researchers have investigated this problem, we believe there is still room for improvement in reducing disk accesses. In this paper, we provide a fast navigation layer that decreases the number of disk accesses by helping query points swiftly reach the range of strongly connected components. Compared with state-of-the-art ANNS algorithms, the resulting method, NDANN, reduces the mean latency by about 20% at the same recall.
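A minimal sketch can show why a navigation layer reduces disk accesses. It does not reproduce NDANN: it assumes a small set of pivot vectors held in memory (`pivot_ids`, hypothetical), picks the pivot nearest to the query as the entry point, and then runs a plain greedy walk on the graph. Each expanded node is counted as one simulated disk read, which is also an assumption about the storage layout.

```python
import numpy as np


def navigate(query, vectors, pivot_ids):
    """In-memory navigation step: return the pivot id closest to the query."""
    dists = np.linalg.norm(vectors[pivot_ids] - query, axis=1)
    return int(pivot_ids[int(np.argmin(dists))])


def greedy_search(query, vectors, neighbors, entry):
    """Greedy walk on a proximity graph.

    neighbors maps a node id to a list of adjacent node ids.
    Returns (closest_node_found, simulated_disk_reads), where one read
    is charged for every node whose adjacency list is expanded.
    """
    current = entry
    current_dist = np.linalg.norm(vectors[current] - query)
    reads = 0
    while True:
        reads += 1  # fetch the current node's record from "disk"
        best, best_dist = current, current_dist
        for nb in neighbors[current]:
            d = np.linalg.norm(vectors[nb] - query)
            if d < best_dist:
                best, best_dist = nb, d
        if best == current:
            return current, reads  # local minimum: no neighbor is closer
        current, current_dist = best, best_dist
```

Starting the walk from `navigate(query, vectors, pivot_ids)` instead of a fixed entry node shortens the path to the query's neighborhood, so fewer node records are read for the same answer. A production index would use a beam search rather than a single greedy walk.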