ICCV 2025

Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection

Taehoon Kim, Jongwook Choi, Yonghyun Jeong, Haeun Noh, Jaejun Yoo, Seungryul Baek, Jongwon Choi
Chung-Ang University, NAVER Cloud, UNIST (Korea)
GitHub · arXiv · ICCV Paper · Hugging Face Demo

Abstract

We introduce a novel method for deepfake video detection that utilizes pixel-wise temporal frequency spectra. Unlike previous approaches that stack 2D frame-wise spatial frequency spectra, we extract pixel-wise temporal frequency by performing a 1D Fourier transform on the time axis per pixel, effectively identifying temporal artifacts. We also propose an Attention Proposal Module (APM) to extract regions of interest for detecting these artifacts. Our method demonstrates outstanding generalizability and robustness in various challenging deepfake video detection scenarios.

Method & Architecture

Frequency Extraction
Temporal artifacts and frequency extraction method
Our method captures subtle temporal artifacts in deepfake videos by applying a 1D Fourier transform to each pixel over time, unlike previous methods that rely on spatial frequency stacking.
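The per-pixel transform described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the clip shape, grayscale input, and lack of normalization are our assumptions.

```python
# Sketch of pixel-wise temporal frequency extraction: a 1D Fourier
# transform along the time axis, applied to every pixel independently.
# Shapes and preprocessing here are illustrative assumptions.
import numpy as np

def pixelwise_temporal_spectrum(video: np.ndarray) -> np.ndarray:
    """video: (T, H, W) grayscale frames -> (T//2 + 1, H, W) magnitude spectra."""
    spectrum = np.fft.rfft(video, axis=0)  # 1D real FFT over the time axis
    return np.abs(spectrum)

# Usage: a 16-frame clip at 64x64 yields 9 temporal frequency bins per pixel.
clip = np.random.rand(16, 64, 64).astype(np.float32)
spec = pixelwise_temporal_spectrum(clip)
print(spec.shape)
```

Note the contrast with frame-wise spatial spectra: a 2D FFT per frame describes texture within each frame, while this transform describes how each individual pixel flickers over time.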
Proposed Architecture
Frequency Feature Extractor and Joint Transformer Module
The pipeline consists of a Frequency Feature Extractor (with pixel-wise temporal Fourier transform and Attention Proposal Module) and a Joint Transformer Module for robust deepfake detection.

Experiments & Results

Attention Proposal Module (APM) Visualization
Visualization of APM proposed regions over time
The APM automatically focuses on regions (e.g., eyes, mouth) where temporal incoherence is most likely, enabling more precise detection of deepfake artifacts.
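The actual APM is learned end-to-end, but a crude hand-crafted stand-in conveys the intuition: regions with the most high-frequency temporal energy (flicker) are the natural candidates for temporal incoherence. The patch size, frequency cutoff, and top-k below are illustrative assumptions, not the paper's design.

```python
# Hand-crafted proxy for the Attention Proposal Module (APM): rank image
# patches by upper-band pixel-wise temporal frequency energy. The real
# APM is learned; this sketch only illustrates the selection intuition.
import numpy as np

def propose_regions(video: np.ndarray, patch: int = 8, top_k: int = 4):
    """Return (row, col) grid indices of the top_k patches with the most
    high-frequency temporal energy in a (T, H, W) clip."""
    spec = np.abs(np.fft.rfft(video, axis=0))   # (T//2 + 1, H, W)
    hi = spec[spec.shape[0] // 2:].sum(axis=0)  # upper-half-band energy map
    H, W = video.shape[1:]
    scores = hi.reshape(H // patch, patch, W // patch, patch).sum(axis=(1, 3))
    flat = np.argsort(scores.ravel())[::-1][:top_k]
    return [(int(i) // scores.shape[1], int(i) % scores.shape[1]) for i in flat]

rng = np.random.default_rng(1)
clip = rng.random((16, 64, 64))
# Inject frame-to-frame flicker into one patch to mimic a temporal artifact.
clip[::2, 8:16, 24:32] += 1.0
print(propose_regions(clip)[0])  # the patch containing the injected flicker
```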
Performance Comparison
Video-level AUC and method comparison
Our method achieves state-of-the-art video-level AUC across multiple datasets, demonstrating superior generalization and robustness compared to previous approaches.

Key Contributions

  • Pixel-wise temporal frequency: a 1D Fourier transform along the time axis per pixel exposes temporal artifacts that stacked frame-wise spatial spectra miss.
  • Attention Proposal Module (APM): automatically proposes regions of interest (e.g., eyes, mouth) where temporal incoherence is most likely.
  • Strong generalization and robustness across challenging deepfake video detection scenarios, with state-of-the-art video-level AUC on multiple datasets.

Limitations & Future Work

Limitations:
  • Heavy compression (H.264, JPEG, WebP) merges neighboring pixels and diminishes pixel-level motion, inducing domain shifts in the temporal frequency spectrum.
  • Compressed videos closely match the raw signal at low frequencies but diverge significantly in the high-frequency range.
Limitation Illustration
We measured the average pixel-wise temporal frequency under various compression schemes (H.264, JPEG, WebP).
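This compression effect can be reproduced in miniature. In the sketch below, spatial box-blurring stands in for compression (which similarly merges neighboring pixels); this is our simplification, not the paper's measurement pipeline.

```python
# Illustrative sketch of the limitation: compare the average pixel-wise
# temporal spectrum of a raw clip against a spatially smoothed one.
# Box-blurring is a stand-in for H.264/JPEG/WebP pixel merging.
import numpy as np

def avg_temporal_spectrum(video: np.ndarray) -> np.ndarray:
    """Average magnitude of the per-pixel temporal FFT, one value per bin."""
    spec = np.abs(np.fft.rfft(video, axis=0))  # (T//2 + 1, H, W)
    return spec.mean(axis=(1, 2))

def box_blur(video: np.ndarray, k: int = 3) -> np.ndarray:
    """Crude spatial smoothing: average each pixel with its k*k neighborhood."""
    pad = k // 2
    padded = np.pad(video, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.zeros_like(video)
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + video.shape[1], dx:dx + video.shape[2]]
    return out / (k * k)

rng = np.random.default_rng(0)
raw = rng.random((32, 32, 32))
raw_spec = avg_temporal_spectrum(raw)
blur_spec = avg_temporal_spectrum(box_blur(raw))

# Merging pixels attenuates pixel-level motion: high-frequency temporal
# energy drops sharply, while the low-frequency bins stay close to raw.
print(blur_spec[-1] < raw_spec[-1])
```

This mirrors the observation above: the low-frequency portion of the spectrum survives compression, while the high-frequency portion, where the detector's discriminative signal lives, is the part that shifts.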
Future Work:

This sensitivity to heavy compression remains a limitation. In future work, we will investigate temporal-frequency regularization techniques to mitigate performance degradation.

📚 Citation

Cite our paper using the following BibTeX entry:

@misc{kim2025spatialfrequencypixelwisetemporal,
  title={Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection},
  author={Taehoon Kim and Jongwook Choi and Yonghyun Jeong and Haeun Noh and Jaejun Yoo and Seungryul Baek and Jongwon Choi},
  year={2025},
  eprint={2507.02398},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2507.02398},
}

🙏 Acknowledgement

This work was partly supported by Institute of Information & Communication Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT):
  • No. RS-2025-02263841: Development of a Real-time Multimodal Framework for Comprehensive Deepfake Detection Incorporating Common Sense Error Analysis
  • No. RS-2021-II211341: Artificial Intelligence Graduate School Program (Chung-Ang University)
  • No. RS-2020-II201336: AIGS program (UNIST)