Video anomaly detection (VAD) aims to identify events and behaviors in video sequences that deviate from established normal patterns. Traditionally, unsupervised VAD has been framed as a one-class classification (OCC) task, trained on data that contain only normal events and no anomaly samples. This approach can recognize previously unseen anomaly patterns, but it may misclassify unfamiliar normal patterns as anomalies. In addition, because anomalous samples are absent from OCC training, the classification boundary remains unclear, which reduces the model's generalization ability. To address these challenges, we introduce a self-supervised memory-guided and attention feature fusion method, which models normal events using optimized memory modules and attention feature fusion modules. The method not only generates pseudo-normal and pseudo-anomaly data but also strengthens the model's ability to identify and exploit key features, improving how complex data relationships are captured and generalized. Experiments on three benchmark datasets (UCSD Ped2, CUHK Avenue, and ShanghaiTech) show that our method achieves AUROCs of 99.5%, 90.9%, and 81.8%, respectively, demonstrating the efficacy of our approach. Our code is available at https://github.com/jzt-dongli/Self-Supervised-Memory-guided-and-Attention-Feature-Fusion-for-Video-Anomaly-Detection.
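
The abstract names two architectural components, a memory module and an attention feature fusion module. The following is a minimal PyTorch sketch of one common realization: a MemAE-style memory read fused with encoder features through a learned channel gate. All class names, dimensions, and the gating design here are illustrative assumptions for exposition, not the authors' implementation; see the linked repository for the actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MemoryModule(nn.Module):
    """Memory bank of prototypical normal patterns (MemAE-style sketch).

    Queries attend over learned memory slots; the read vector is a
    softmax-weighted sum of slots, biasing reconstructions toward
    stored normal patterns so anomalies reconstruct poorly.
    """

    def __init__(self, num_slots: int = 100, feat_dim: int = 256):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(num_slots, feat_dim))

    def forward(self, query: torch.Tensor) -> torch.Tensor:
        # query: (B, feat_dim); cosine similarity against every slot
        attn = F.softmax(
            F.normalize(query, dim=-1) @ F.normalize(self.memory, dim=-1).t(),
            dim=-1,
        )  # (B, num_slots) addressing weights
        return attn @ self.memory  # (B, feat_dim) memory read


class AttentionFeatureFusion(nn.Module):
    """Channel-attention fusion of encoder features and the memory read
    (squeeze-and-excitation-style gate; an assumed design)."""

    def __init__(self, feat_dim: int = 256, reduction: int = 8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim // reduction, feat_dim),
            nn.Sigmoid(),
        )

    def forward(self, encoder_feat: torch.Tensor,
                memory_read: torch.Tensor) -> torch.Tensor:
        # Gate decides, per channel, how much to trust the memory read
        # versus the raw encoder feature.
        g = self.gate(encoder_feat + memory_read)
        return g * memory_read + (1.0 - g) * encoder_feat


if __name__ == "__main__":
    # Usage: fuse a batch of encoder features with their memory reads
    # before passing the result to a decoder.
    mem = MemoryModule(num_slots=100, feat_dim=256)
    fuse = AttentionFeatureFusion(feat_dim=256)
    feats = torch.randn(8, 256)       # hypothetical encoder output
    fused = fuse(feats, mem(feats))   # (8, 256) fused representation
    print(fused.shape)
```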