Gabor-Visual-Microphone
Visual Microphone: Phase-Based Audio Recovery
Overview
This repository contains a C++ implementation of the Visual Microphone algorithm. The project explores the passive recovery of acoustic signals by analyzing microscopic vibrations of objects in high-speed video recordings.
Unlike classic computer vision methods based on optical flow (intensity analysis), this approach uses local phase analysis with a bank of complex Gabor filters. By leveraging the Fourier Shift Theorem, it detects sub-pixel motions that are invisible to the naked eye.
Methodology
The Physics
Sound waves cause physical objects to vibrate. These vibrations appear on the video sensor as minute spatial displacements of the object's image.
Why Phase?
Traditional intensity-based methods fail when the displacement is smaller than the quantization noise level. We therefore move to the frequency domain using the Fourier Shift Theorem:
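In one dimension the theorem can be written as follows. The symbols $f$ (image profile), $\delta$ (displacement), and $\omega$ (spatial frequency) are notation assumed here for illustration; the project's own notation may differ.

```latex
g(x) = f(x - \delta)
\quad\Longleftrightarrow\quad
G(\omega) = e^{-i\omega\delta}\, F(\omega),
\qquad
F(\omega) = \int_{-\infty}^{\infty} f(x)\, e^{-i\omega x}\, dx
```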
Global motion translates to a linear phase shift in the frequency domain. We extract this motion using a bank of complex Gabor filters:
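One common form of a complex Gabor kernel is a Gaussian envelope multiplied by a complex sinusoid. The parameter names $\sigma$ (scale), $\omega_0$ (center frequency), and $\theta$ (orientation) are assumptions of this sketch; the repository's exact parameterization may differ.

```latex
h_{\sigma,\theta}(x, y) =
\exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right)
\exp\!\bigl(i\,\omega_0\,(x\cos\theta + y\sin\theta)\bigr)
```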
The local phase of the filter response is linearly related to the object's displacement:
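Writing the filter response as an amplitude times a complex exponential, a small displacement $\delta(t)$ along the filter orientation changes the local phase approximately as shown here. The notation ($A$, $\phi$, $\omega_0$, $\delta$) and the sign convention, which assumes the image moves as $f(x - \delta)$, are assumptions of this sketch.

```latex
r(x, t) = A(x, t)\, e^{i\phi(x, t)},
\qquad
\phi(x, t) - \phi(x, 0) \approx -\omega_0\, \delta(t)
\quad\Longrightarrow\quad
\delta(t) \approx -\frac{\phi(x, t) - \phi(x, 0)}{\omega_0}
```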
Results
Experimental setup: high-speed camera (2200 FPS), foil bag target, distance 2 m.
1. Phase vs. Intensity
Comparison of raw pixel intensity analysis versus our Gabor phase demodulation method. The phase signal clearly reconstructs the acoustic wave, while intensity is lost in quantization noise.
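The phase demodulation side of this comparison can be illustrated on a synthetic 1D signal. The following is a minimal sketch, not the repository's code: the function names (`gaborResponse`, `estimateShift`, `makePattern`), the 4-sigma kernel truncation, and the cosine test pattern are all hypothetical choices made for illustration. It filters two frames with a complex Gabor kernel and converts the phase difference into a displacement in pixels.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Complex Gabor response of a 1D signal: convolution with
// exp(-u^2 / (2 sigma^2)) * exp(i * omega0 * u), truncated at +/- radius.
// Only positions where the kernel fits entirely inside the signal are returned.
std::vector<std::complex<double>> gaborResponse(const std::vector<double>& signal,
                                                double omega0, double sigma, int radius) {
    assert(sigma > 0.0 && radius > 0);
    const int n = static_cast<int>(signal.size());
    std::vector<std::complex<double>> out;
    for (int x = radius; x < n - radius; ++x) {
        std::complex<double> acc(0.0, 0.0);
        for (int u = -radius; u <= radius; ++u) {
            const double envelope = std::exp(-(u * u) / (2.0 * sigma * sigma));
            acc += signal[x - u] * std::polar(envelope, omega0 * u);
        }
        out.push_back(acc);
    }
    return out;
}

// Estimate the shift (in pixels) of `current` relative to `reference`.
// Summing r1 * conj(r0) over all positions weights each phase difference
// by the product of the local amplitudes; the shift is -phase / omega0.
double estimateShift(const std::vector<double>& reference,
                     const std::vector<double>& current,
                     double omega0, double sigma) {
    assert(reference.size() == current.size());
    const int radius = static_cast<int>(std::ceil(4.0 * sigma));  // assumed truncation
    const auto r0 = gaborResponse(reference, omega0, sigma, radius);
    const auto r1 = gaborResponse(current, omega0, sigma, radius);
    std::complex<double> acc(0.0, 0.0);
    for (std::size_t i = 0; i < r0.size(); ++i) acc += r1[i] * std::conj(r0[i]);
    return -std::arg(acc) / omega0;
}

// Hypothetical test pattern: a cosine texture translated by `shift` pixels.
std::vector<double> makePattern(int n, double omega, double shift) {
    std::vector<double> s(n);
    for (int x = 0; x < n; ++x) s[x] = std::cos(omega * (x - shift));
    return s;
}
```

For example, calling `estimateShift(makePattern(256, 0.5, 0.0), makePattern(256, 0.5, 0.01), 0.5, 6.0)` should return a value close to the 0.01 pixel shift that was applied. Real footage adds sensor noise and quantization, which this sketch does not model.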
2. Signal Localization (Heatmap)
The algorithm automatically weights regions with high texture contrast. Red areas contribute most to the recovered sound.
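One way such weighting can be implemented is to average the per-pixel phase changes with weights proportional to the squared local filter amplitude, so that flat, low-contrast regions with unreliable phase contribute little. This is a sketch under that assumption; the function name `weightedPhaseAverage` is hypothetical and the repository's actual weighting scheme may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Combine per-pixel phase changes into a single audio sample.
// Each pixel is weighted by its squared filter amplitude (assumed scheme),
// so strongly textured regions dominate the result.
double weightedPhaseAverage(const std::vector<double>& phaseDelta,
                            const std::vector<double>& amplitude) {
    assert(phaseDelta.size() == amplitude.size());
    double num = 0.0;
    double den = 0.0;
    for (std::size_t i = 0; i < phaseDelta.size(); ++i) {
        const double w = amplitude[i] * amplitude[i];
        num += w * phaseDelta[i];
        den += w;
    }
    // With no usable texture (all amplitudes zero) return silence.
    return den > 0.0 ? num / den : 0.0;
}
```

The same per-pixel weights, rendered as an image, would give a heatmap of which regions drive the recovered signal.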
3. Recovered Spectrogram
Spectrogram of a simple melody recovered from the video. Its harmonic structure is clearly visible.
Installation & Build
Prerequisites
- C++ Compiler (supporting C++17)
- CMake (>= 3.10)
- OpenCV (4.x)
- OpenMP (usually included with GCC/Clang)
Building via CMake (Recommended)
On macOS with Homebrew OpenMP, you might need specific flags (see CMakeLists.txt).
Manual Compilation (Linux/macOS)
Usage
Run the executable with the input video path. Optional arguments include output filename and sampling rate.
Example:
The program will:
- Analyze the video using parallel batch processing.
- Extract phase variations across multiple scales and orientations.
- Reconstruct the audio signal.
- Apply a high-pass Butterworth filter.
- Save the result as a .wav file.
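The high-pass filtering step in the list above could look like the following sketch. It assumes a second-order Butterworth design implemented as a single biquad section (bilinear transform, Q = 1/sqrt(2)). The filter order, the function name `butterworthHighPass`, and the cutoff passed by the caller are all assumptions for illustration; the repository may use a different order or cutoff.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Second-order Butterworth high-pass filter as one biquad section.
// Removes slow drift (e.g. lighting changes, camera sway) below cutoffHz.
std::vector<double> butterworthHighPass(const std::vector<double>& input,
                                        double cutoffHz, double sampleRateHz) {
    assert(cutoffHz > 0.0 && cutoffHz < 0.5 * sampleRateHz);
    const double pi = std::acos(-1.0);
    const double w0 = 2.0 * pi * cutoffHz / sampleRateHz;
    const double cosw = std::cos(w0);
    const double alpha = std::sin(w0) / std::sqrt(2.0);  // sin(w0) / (2Q), Q = 1/sqrt(2)

    // Coefficients normalized by a0 = 1 + alpha.
    const double a0 = 1.0 + alpha;
    const double b0 = 0.5 * (1.0 + cosw) / a0;
    const double b1 = -(1.0 + cosw) / a0;
    const double b2 = b0;
    const double a1 = -2.0 * cosw / a0;
    const double a2 = (1.0 - alpha) / a0;

    std::vector<double> out(input.size());
    double x1 = 0.0, x2 = 0.0, y1 = 0.0, y2 = 0.0;  // previous inputs and outputs
    for (std::size_t n = 0; n < input.size(); ++n) {
        const double x = input[n];
        const double y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        x2 = x1; x1 = x;
        y2 = y1; y1 = y;
        out[n] = y;
    }
    return out;
}
```

Here the audio sample rate equals the video frame rate, so for a 2200 FPS recording a call such as `butterworthHighPass(signal, 20.0, 2200.0)` would suppress content below roughly 20 Hz; the 20 Hz value is only an example.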
Authors
- Alexander Spiridonov - Research & Development - spiridon-2005@mail.ru
- Maxim Beskonchin - Research & Development - mabeskonchin@gmail.com
Saint Petersburg Electrotechnical University "LETI"
Faculty of Computer Science and Technology