The advancement of video forgery technologies has made it increasingly easy to create highly realistic yet forged video content, raising significant concerns about its potential misuse. While existing forgery detection methods primarily focus on single-step forgeries, user-friendly video editing tools now enable multiple sequential forgeries. Detecting these sequential alterations is essential both for identifying forged videos and for understanding the forgery process. We propose a novel approach for detecting sequential video forgery: unlike traditional binary classification, our method predicts the sequence of forgery steps. Our framework employs the Swin Transformer for spatiotemporal relation extraction, enhanced by a spatiotemporal relationships (STR) attention module and hierarchical attention for sequential relation modeling. Experiments show that our method outperforms state-of-the-art techniques, achieving significant improvements in the fixed-Acc and adaptive-Acc metrics. This research sets a new benchmark for sequential video forgery detection, providing an effective framework and a valuable dataset for the community. Our results demonstrate the method's capability to detect complex, multi-step forgeries in video content.
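The abstract describes a pipeline of backbone feature extraction, attention over spatiotemporal tokens, and prediction of a forgery-step sequence rather than a single binary label. The following is a minimal sketch of that idea; all shapes, module names, and the five-operation label space are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product self-attention over the token dimension
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
T, D = 8, 16                        # T spatiotemporal tokens, D-dim features (assumed)
feats = rng.normal(size=(T, D))     # stand-in for Swin Transformer backbone features
str_out = attention(feats, feats, feats)  # attention over spatiotemporal relations

# Hypothetical step-classification head: one predicted editing operation per token,
# decoded greedily into a forgery-step sequence (5 candidate operations assumed).
W = rng.normal(size=(D, 5))
step_logits = str_out @ W
pred_sequence = step_logits.argmax(axis=-1)
print(pred_sequence.shape)  # (8,)
```

A sequence-accuracy metric in the spirit of fixed-Acc would then compare `pred_sequence` element-wise against an annotated ground-truth sequence of the same length.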
Keywords: Video, Counterfeit detection, Transformers, Video coding, Windows, Semantic video, Data modeling