We introduce a novel method for deepfake video detection that utilizes pixel-wise temporal frequency spectra. Unlike previous approaches that stack 2D frame-wise spatial frequency spectra, we extract pixel-wise temporal frequency by performing a 1D Fourier transform on the time axis per pixel, effectively identifying temporal artifacts. We also propose an Attention Proposal Module (APM) to extract regions of interest for detecting these artifacts. Our method demonstrates outstanding generalizability and robustness in various challenging deepfake video detection scenarios.
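The contrast between frame-wise spatial spectra and pixel-wise temporal spectra can be illustrated with a minimal NumPy sketch. It assumes a grayscale clip of shape (T, H, W), such as a stack of aligned face crops. The function names `pixelwise_temporal_spectrum` and `framewise_spatial_spectrum` are hypothetical. Removing each pixel's temporal mean and keeping only the magnitude are also illustrative assumptions, not the preprocessing used in our pipeline. The key point is the choice of axis: the temporal variant applies a 1D FFT along time for each pixel independently, while the frame-wise variant applies a 2D FFT over space within each frame.

```python
import numpy as np


def pixelwise_temporal_spectrum(frames: np.ndarray) -> np.ndarray:
    """Magnitude of a 1D FFT taken along the time axis for every pixel.

    frames: array of shape (T, H, W). Returns shape (T // 2 + 1, H, W).
    Hypothetical helper; mean removal is an illustrative assumption.
    """
    frames = np.asarray(frames, dtype=np.float64)
    # Remove each pixel's temporal mean so the DC bin does not dominate.
    centered = frames - frames.mean(axis=0, keepdims=True)
    # 1D real FFT over time (axis 0) only; spatial axes are left untouched.
    return np.abs(np.fft.rfft(centered, axis=0))


def framewise_spatial_spectrum(frames: np.ndarray) -> np.ndarray:
    """For contrast: a 2D FFT over (H, W) per frame, stacked along time."""
    return np.abs(np.fft.fft2(np.asarray(frames, dtype=np.float64), axes=(1, 2)))


# Toy clip: low-level noise plus a flicker at pixel (3, 4) with a 4-frame period.
rng = np.random.default_rng(0)
T, H, W = 32, 8, 8
t = np.arange(T)
video = rng.normal(scale=0.01, size=(T, H, W))
video[:, 3, 4] += np.sin(2 * np.pi * t / 4)  # 8 cycles over 32 frames

spec = pixelwise_temporal_spectrum(video)
print(spec.shape)               # (17, 8, 8)
print(spec[:, 3, 4].argmax())   # 8, the bin for 8 cycles per clip
```

In this sketch the flicker shows up as a sharp peak in a single temporal-frequency bin at exactly one spatial location. A frame-wise spatial spectrum spreads that same pixel's contribution across every spatial frequency of each frame, so the temporal artifact is much harder to isolate.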
Our method's sensitivity to heavy compression remains a limitation. In future work, we will investigate temporal-frequency regularization techniques to mitigate the resulting performance degradation.
Cite our paper using the following BibTeX entry: