We introduce a novel method for deepfake video detection that leverages pixel‑wise temporal frequency spectra. Unlike previous approaches that stack 2D frame‑wise spatial frequency spectra, we extract pixel‑wise temporal frequency spectra by applying a 1D Fourier transform along the time axis for each pixel, effectively exposing temporal artifacts. We also propose an Attention Proposal Module (APM) that extracts regions of interest for detecting these artifacts. Our method demonstrates strong generalizability and robustness across a variety of challenging deepfake video detection scenarios.
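The core idea, a 1D FFT along the time axis for every pixel rather than a 2D FFT per frame, can be sketched as follows. This is a minimal illustration in NumPy, not the paper's implementation; the function name, grayscale input, and magnitude-spectrum output are assumptions for clarity.

```python
import numpy as np

def pixelwise_temporal_spectrum(video: np.ndarray) -> np.ndarray:
    """Illustrative sketch (not the paper's code).

    video: (T, H, W) grayscale clip.
    Returns (T//2 + 1, H, W) temporal magnitude spectra, one per pixel.
    """
    # 1D real FFT over the time axis (axis=0): each pixel's intensity
    # trace becomes a temporal frequency spectrum.
    spectrum = np.fft.rfft(video, axis=0)
    # Magnitudes highlight periodic temporal artifacts (e.g. flicker)
    # that frame-wise spatial spectra would miss.
    return np.abs(spectrum)

# Toy usage: a 16-frame 4x4 clip with a sinusoidal flicker at one pixel.
t = np.arange(16)
clip = np.zeros((16, 4, 4), dtype=np.float32)
clip[:, 2, 2] = np.sin(2 * np.pi * 4 * t / 16)  # 4-cycle temporal flicker

mag = pixelwise_temporal_spectrum(clip)
print(mag.shape)                      # (9, 4, 4)
print(np.argmax(mag[1:, 2, 2]) + 1)  # dominant temporal bin: 4
```

The flickering pixel's spectrum peaks exactly at its oscillation frequency, while static pixels carry energy only in the DC bin, which is the kind of per-pixel temporal signature the method exploits.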
We will investigate temporal‑frequency regularization to mitigate compression‑induced degradation.
@misc{kim2025spatialfrequencypixelwisetemporal,
  title         = {Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection},
  author        = {Taehoon Kim and Jongwook Choi and Yonghyun Jeong and Haeun Noh and Jaejun Yoo and Seungryul Baek and Jongwon Choi},
  year          = {2025},
  eprint        = {2507.02398},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2507.02398}
}
This work was partly supported by IITP grants funded by the Korea government (MSIT): No. RS-2025-02263841 (Development of a Real-time Multimodal Framework for Comprehensive Deepfake Detection Incorporating Common Sense Error Analysis), No. RS-2021-II211341 (Artificial Intelligence Graduate School Program, Chung-Ang University), and No. RS-2020-II201336 (AIGS Program, UNIST).