Beyond Spatial Frequency: Pixel‑wise Temporal Frequency‑based Deepfake Video Detection

Taehoon Kim, Jongwook Choi, Yonghyun Jeong, Haeun Noh, Jaejun Yoo, Seungryul Baek, Jongwon Choi*
Chung-Ang University · NAVER Cloud · UNIST
GitHub · arXiv · ICCV Paper
⭐ Selected as a Highlight at ICCV!

Interactive Demos

Abstract

We introduce a novel method for deepfake video detection that utilizes pixel‑wise temporal frequency spectra. Unlike previous approaches that stack 2D frame‑wise spatial frequency spectra, we extract pixel‑wise temporal frequency by performing a 1D Fourier transform on the time axis per pixel, effectively identifying temporal artifacts. We also propose an Attention Proposal Module (APM) to extract regions of interest for detecting these artifacts. Our method demonstrates outstanding generalizability and robustness in various challenging deepfake video detection scenarios.

Method & Architecture

Frequency extraction: a 1D FFT per pixel along the time axis captures subtle temporal artifacts.
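For clarity, here is a minimal sketch of this per-pixel temporal frequency extraction, not the authors' released code: a 1D real FFT along the time axis, applied independently at every pixel of a grayscale clip. The use of PyTorch and the tensor shapes are assumptions for illustration.

```python
# Minimal sketch (not the released implementation): per-pixel temporal FFT.
import torch

def pixelwise_temporal_spectrum(frames: torch.Tensor) -> torch.Tensor:
    """frames: (T, H, W) grayscale clip, float in [0, 1].
    Returns the (T//2 + 1, H, W) magnitude spectrum per pixel."""
    # 1D real FFT along the time dimension (dim=0), one transform per pixel.
    spectrum = torch.fft.rfft(frames, dim=0)
    return spectrum.abs()

# Example: a 32-frame, 224x224 grayscale clip.
clip = torch.rand(32, 224, 224)
spec = pixelwise_temporal_spectrum(clip)   # shape: (17, 224, 224)
```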
Architecture: the Frequency Feature Extractor and APM feed a Joint Transformer Module for robust detection.
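As a rough illustration of how the named components might fit together, here is a skeleton in PyTorch; all layer sizes, module internals, and the pooling scheme are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DeepfakeDetectorSketch(nn.Module):
    """Hypothetical composition of the named components (illustrative only)."""
    def __init__(self, d_model: int = 256, n_bins: int = 17):
        super().__init__()
        # Frequency Feature Extractor: encodes the pixel-wise temporal spectrum.
        self.freq_extractor = nn.Sequential(
            nn.Conv2d(n_bins, d_model, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(7),
        )
        # Attention Proposal Module (APM): proposes a spatial map of regions
        # likely to show temporal incoherence (e.g. eyes, mouth).
        self.apm = nn.Sequential(
            nn.Conv2d(n_bins, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        # Joint Transformer: reasons jointly over the attended frequency tokens.
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)  # real / fake logit

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (B, n_bins, H, W) pixel-wise temporal magnitude spectrum.
        attn = self.apm(spec)                      # (B, 1, H, W)
        feats = self.freq_extractor(spec * attn)   # (B, d_model, 7, 7)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, 49, d_model)
        pooled = self.transformer(tokens).mean(dim=1)
        return self.head(pooled)                   # (B, 1)
```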

Experiments & Results

APM visualization: attention concentrates on regions (eyes, mouth) where temporal incoherence is most likely.
Video-level AUC comparison: consistently state-of-the-art results across datasets demonstrate strong generalization.

Limitations & Future Work

Known Limitations
  • Heavy compression (H.264/JPEG/WebP) merges neighboring pixels, weakening pixel-level motion and shifting the temporal-frequency spectrum (see the toy check after this list).
  • After compression, low-frequency components still align with the raw signal, while high-frequency components diverge.
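The toy check below is not the paper's experiment: it uses a simple spatial box blur as a crude stand-in for compression's pixel-merging effect and shows how merging neighbors suppresses the high-frequency part of the per-pixel temporal spectrum. The clip shapes and blur kernel are illustrative assumptions.

```python
# Toy check: spatial averaging (a crude stand-in for compression's
# pixel-merging) attenuates high temporal-frequency energy per pixel.
import torch
import torch.nn.functional as F

def high_freq_energy(frames: torch.Tensor) -> float:
    """Mean magnitude of the upper half of the per-pixel temporal spectrum."""
    spec = torch.fft.rfft(frames, dim=0).abs()   # (T//2 + 1, H, W)
    return spec[spec.shape[0] // 2:].mean().item()

clip = torch.rand(32, 1, 128, 128)               # (T, C, H, W) toy clip
# 5x5 box blur applied per frame merges neighboring pixels.
kernel = torch.full((1, 1, 5, 5), 1 / 25)
blurred = F.conv2d(clip, kernel, padding=2)

print(high_freq_energy(clip.squeeze(1)))     # higher
print(high_freq_energy(blurred.squeeze(1)))  # noticeably lower
```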
Future Work

We will investigate temporal‑frequency regularization to mitigate compression‑induced degradation.
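Purely as an illustration of what such a regularizer could look like (the paper does not define one), the sketch below penalizes the gap between the per-pixel temporal spectra of a clip and its compressed counterpart; the loss form and names are assumptions.

```python
# Hypothetical temporal-frequency consistency penalty (illustrative only).
import torch

def temporal_freq_consistency_loss(raw: torch.Tensor, compressed: torch.Tensor) -> torch.Tensor:
    """raw, compressed: aligned (T, H, W) clips of the same video."""
    spec_raw = torch.fft.rfft(raw, dim=0).abs().clamp_min(1e-6)
    spec_comp = torch.fft.rfft(compressed, dim=0).abs().clamp_min(1e-6)
    # L1 gap between log-magnitude spectra; the log compresses dynamic range.
    return (spec_raw.log() - spec_comp.log()).abs().mean()
```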

Average pixel-wise temporal frequency under H.264/JPEG/WebP compression.

📚 Citation

@misc{kim2025spatialfrequencypixelwisetemporal,
  title        = {Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection},
  author       = {Taehoon Kim and Jongwook Choi and Yonghyun Jeong and Haeun Noh and Jaejun Yoo and Seungryul Baek and Jongwon Choi},
  year         = {2025},
  eprint       = {2507.02398},
  archivePrefix= {arXiv},
  primaryClass = {cs.CV},
  url          = {https://arxiv.org/abs/2507.02398}
}

🙏 Acknowledgement

This work was partly supported by IITP grants funded by the Korea government (MSIT):
  • No. RS-2025-02263841: Development of a Real-time Multimodal Framework for Comprehensive Deepfake Detection Incorporating Common Sense Error Analysis
  • No. RS-2021-II211341: Artificial Intelligence Graduate School Program (Chung-Ang University)
  • No. RS-2020-II201336: AIGS Program (UNIST)