
Tuning Pulse

Notes from turning a webcam heart-rate experiment into a more honest signal pipeline.

Summary written by AI, of course, for note-taking.

Pulse started as a small rPPG experiment: point a camera at skin, watch tiny color changes over time, and estimate heart rate from the dominant frequency. The first version worked well enough on desktop when the manual focus window was placed carefully, but mobile made the weak spots obvious. A full-frame average was too noisy, indoor lighting looked suspiciously heartbeat-shaped, and the app was too eager to show a plausible BPM even when a wall could produce a similar peak.

We tightened the app in layers. First we made the sampled region explicit, then added a separate control region so the app could see what the room and camera were contributing. Then we moved away from single-window guesses and started smoothing the spectrum itself. Finally, we added regression-based common-mode rejection and a POS-style RGB signal, which is better suited to pulse-from-color than raw brightness.

The honest update: the app is not reliable yet. After all of that instrumentation, the readings still look too random. One test against a pulse oximeter at about 130 BPM still landed around the resting-rate area in Pulse, which means the pipeline is probably finding a convenient spectral peak rather than the person’s actual pulse. The current build is therefore best treated as a signal-processing sandbox with better visibility into its failure modes, not as a working heart-rate monitor.

  • Camera Frames: actual video-frame timing when available
  • Skin ROI: visible red sample box
  • Control ROI: green background sample
  • RGB Means: one averaged color per frame
  • POS Signal: combine RGB to reduce lighting noise
  • Common-Mode Rejection: remove the part explained by control
  • Detrend + RMS Normalize: reject flat/drift-only windows
  • Window + FFT: Hanning window, power spectrum
  • Spectrum EMA: average evidence before picking BPM
  • BPM + Confidence: compare against the control readout
  • Diagnostics: FFT chart, BPM bins, control BPM, fresh runs
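
The "RGB Means" stage is the simplest one to pin down in code. Here is a minimal Python sketch of averaging one ROI into a single color per frame. It assumes the frame arrives as a flat RGBA byte buffer (the layout a browser canvas hands back); the function name `roi_mean_rgb` and its parameters are hypothetical, not taken from the app.

```python
def roi_mean_rgb(rgba, frame_width, x, y, w, h):
    """Average R, G, B over a rectangular ROI of a flat RGBA byte buffer.

    Assumption: pixels are stored row by row, 4 bytes each (R, G, B, A).
    """
    if w <= 0 or h <= 0:
        raise ValueError("ROI must have a positive width and height")
    r_sum = g_sum = b_sum = 0
    for row in range(y, y + h):
        for col in range(x, x + w):
            i = (row * frame_width + col) * 4  # start of this pixel's 4 bytes
            r_sum += rgba[i]
            g_sum += rgba[i + 1]
            b_sum += rgba[i + 2]  # alpha at i + 3 is ignored
    count = w * h
    return (r_sum / count, g_sum / count, b_sum / count)
```

The same function would be called twice per frame, once for the skin box and once for the control box, so both regions go through identical averaging.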

What We Tried

  • Whole-frame averaging: too sensitive to background, motion, and exposure shifts.
  • Manual ROI: better, but still vulnerable to room lighting and camera artifacts.
  • Fixed auto ROI: made mobile usable by sampling a visible upper-face patch instead of the whole frame.
  • Control region: exposed the uncomfortable truth that indoor lighting/camera timing can produce believable BPM peaks.
  • Simple subtraction: helped sometimes, but assumed the control and skin regions had the same artifact strength.
  • Regression subtraction: scales the control signal before removal, which is a better cheap common-mode rejection step.
  • Spectrum averaging: reduced one-window FFT jitter by smoothing the power spectrum before peak selection.
  • POS rPPG: replaced brightness-first detection with an RGB method designed to suppress illumination changes.
  • Fresh camera starts: reset buffers when the camera starts, not when it stops, so a new run does not inherit stale averages while the old run remains inspectable after stopping.
  • Chart labels: added BPM/frequency labels to the FFT and power charts so suspicious peaks can be traced to actual bins instead of vibes.
  • Slider controls: moved buffer size, slide step, and BPM bounds to sliders because these are tuning parameters, not form-entry chores.
  • Wider BPM search: kept the lower bound open enough for real resting heart rates and the upper bound high enough for stress/exertion tests, even though a wider band also lets more false peaks into the search.

Where It Landed

The current pipeline is intentionally more skeptical. It draws both the skin sample and the control sample, shows both BPM estimates, labels the charts, and uses the control signal as a diagnostic instead of pretending every clean-looking peak is a heartbeat. The main readout now comes from a POS-derived signal after common-mode rejection, detrending, normalization, FFT, and spectrum smoothing.

The next likely step is not more smoothing. It is proving where the signal is getting lost. Right now the app can still settle near a plausible low-BPM answer when the real heart rate is much higher, so something fundamental is flawed: the ROI, color transform, normalization, frequency selection, camera timing assumptions, or some combination of them.

The current debugging posture is:

  • If the control region has a strong peak near the main peak, the reading is contaminated.
  • If the main region reads similarly with a face and with a wall, the app is measuring the environment or camera pipeline.
  • If a known 130 BPM subject reads near 56 BPM, the peak picker is probably locking onto a low-frequency artifact, not pulse.
  • If the FFT bins jump too coarsely, buffer size and actual camera frame rate need to be made more explicit.
  • If everything looks plausible but wrong, the next useful change is better ground-truth testing and ROI splitting, not prettier smoothing.
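
The first check in that list can be written down as a small predicate. This is only a sketch: the function name, the 6 BPM tolerance, and the 0.5 power ratio are hypothetical placeholders, and it assumes the main and control spectra went through the same normalization so their peak powers are comparable.

```python
def control_contaminates(main_bpm, main_power, control_bpm, control_power,
                         bpm_tolerance=6.0, power_ratio=0.5):
    """True when the control region has a comparably strong peak near the main peak.

    bpm_tolerance and power_ratio are illustrative guesses, not tuned values.
    """
    if control_bpm is None or main_power <= 0:
        # No control peak (e.g. a rejected flat window) or no main peak to compare.
        return False
    near = abs(main_bpm - control_bpm) <= bpm_tolerance
    strong = control_power >= power_ratio * main_power
    return near and strong
```

A `True` result does not say what the contamination is, only that the reading should not be shown as a confident heart rate.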

Relearning The Theory

Here is the mental model I wanted back in my head while working on this.

Remote PPG

Remote photoplethysmography, or rPPG, is the camera version of the optical pulse sensors in watches and fingertip monitors. Blood volume changes slightly alter how skin reflects light. A camera cannot see the pulse directly, but it can see tiny periodic color changes if the signal is stable enough.

The paper that made POS click for this build is Algorithmic Principles of Remote PPG by Wang, den Brinker, Stuijk, and de Haan. Their key idea is that skin color, lighting variation, and pulse variation occupy different directions in RGB space. POS, short for “Plane-Orthogonal-to-Skin,” projects the RGB signal onto a plane that tries to suppress skin-tone/illumination changes while preserving pulse-like color variation.

In code terms, that is why Pulse now keeps the rolling red, green, and blue means instead of only storing one brightness value.
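
For reference, here is a minimal Python sketch of the core POS projection applied to one window of RGB means. It follows the per-window steps in the Wang et al. paper (temporal normalization, two projection axes, alpha tuning), but it skips the paper's short overlapping windows and overlap-add, and it is not the app's actual code. The name `pos_signal` is hypothetical.

```python
from statistics import mean, pstdev

def pos_signal(rgb_means):
    """Combine a window of per-frame (R, G, B) means into one POS pulse trace."""
    if len(rgb_means) < 2:
        raise ValueError("need at least two frames")
    # Temporal normalization: divide each channel by its mean over the window,
    # so overall brightness and skin tone become a constant near 1.0.
    channel_means = [mean(frame[c] for frame in rgb_means) for c in range(3)]
    if min(channel_means) <= 0:
        raise ValueError("each channel needs a positive mean")
    norm = [[frame[c] / channel_means[c] for c in range(3)] for frame in rgb_means]

    # Project onto two axes spanning the plane orthogonal to (1, 1, 1):
    # s1 = G - B and s2 = -2R + G + B.
    s1 = [g - b for _, g, b in norm]
    s2 = [-2.0 * r + g + b for r, g, b in norm]

    # Alpha tuning: scale s2 so both projections have comparable amplitude.
    sd2 = pstdev(s2)
    alpha = pstdev(s1) / sd2 if sd2 > 0 else 0.0
    h = [a + alpha * b for a, b in zip(s1, s2)]

    # Center the trace on zero.
    h_mean = mean(h)
    return [v - h_mean for v in h]
```

One property worth noticing: if every channel is scaled by the same flicker factor, all three normalized channels are identical, so both projections are zero. That is the sense in which POS suppresses pure intensity changes.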

Control Signal

The green control box is a background sample. If the control region shows the same dominant frequency as the face region, the app is probably seeing the room, camera, or lighting, not a heartbeat.

The first attempt was direct subtraction:

clean = roi - control

That assumes both regions receive the artifact at the same scale. They do not. A wall, a forehead, and a shadow can all respond to lighting flicker differently.

The current version uses simple linear regression:

beta = covariance(roi, control) / variance(control)
clean = roi - beta * control

That removes the part of the ROI signal that is linearly explained by the control region. This is the same basic shape as common-mode rejection: keep what is unique to the measurement, remove what is shared by the environment.
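
A minimal Python sketch of that regression step, assuming `roi` and `control` are equal-length lists of samples from the same frames. The function name is hypothetical. Unlike the two-line formula above, this version also removes each series' mean, so the result is centered on zero.

```python
from statistics import mean

def reject_common_mode(roi, control):
    """Remove the part of `roi` that is linearly explained by `control`.

    Returns (clean, beta). Both inputs are mean-centered first.
    """
    if len(roi) != len(control) or len(roi) < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    roi_mean, ctl_mean = mean(roi), mean(control)
    roi_c = [v - roi_mean for v in roi]
    ctl_c = [v - ctl_mean for v in control]
    # Unnormalized sums: the 1/n factors cancel in covariance / variance.
    variance = sum(c * c for c in ctl_c)
    if variance == 0:
        # A perfectly flat control explains nothing; keep the centered ROI.
        return roi_c, 0.0
    covariance = sum(r * c for r, c in zip(roi_c, ctl_c))
    beta = covariance / variance
    clean = [r - beta * c for r, c in zip(roi_c, ctl_c)]
    return clean, beta
```

If the ROI contains 2.5 times the control's flicker plus an unrelated pulse, `beta` comes out near 2.5 and `clean` is close to the pulse alone, which is exactly the case plain subtraction gets wrong.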

Detrending And Normalization

The old normalizer was z-score normalization: subtract the mean and divide by standard deviation. That is fine for many signals, but dangerous here because a nearly flat wall signal can be scaled up until it looks important.

Pulse now removes a linear trend, measures the RMS of the remaining signal, rejects tiny residuals, and only then normalizes:

detrended = signal - bestFitLine(signal)
rms = sqrt(mean(detrended^2))
normalized = detrended / rms

The goal is to avoid turning “almost nothing” into a confident frequency peak.
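
Here is a minimal Python sketch of those steps, including the rejection gate. The function name and the `min_rms` default are hypothetical; a sensible threshold depends on the scale of whatever signal feeds this stage.

```python
import math

def detrend_and_normalize(signal, min_rms=1e-4):
    """Remove a least-squares line, then scale the residual to unit RMS.

    Returns None when the residual RMS is below `min_rms`, so a flat or
    drift-only window is rejected instead of being amplified.
    """
    n = len(signal)
    if n < 3:
        return None
    # Least-squares line through the points (0, signal[0]), (1, signal[1]), ...
    x_mean = (n - 1) / 2.0
    y_mean = sum(signal) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(signal))
    slope = sxy / sxx
    detrended = [y - (y_mean + slope * (i - x_mean)) for i, y in enumerate(signal)]
    rms = math.sqrt(sum(d * d for d in detrended) / n)
    if rms < min_rms:
        return None  # almost nothing left after detrending: refuse to normalize
    return [d / rms for d in detrended]
```

The caller has to treat `None` as "no reading for this window" rather than falling back to the previous BPM, otherwise the gate does not change what the user sees.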

Windowing

Before the FFT, Pulse applies a Hann window. A finite buffer rarely begins and ends at exactly the same point in the waveform, so the FFT sees a hard edge. That hard edge smears energy into nearby frequencies. Windowing tapers the ends of the signal to reduce that smear.

SciPy’s Hann window docs are a compact refresher. One vocabulary note: “Hanning” is commonly said, but the formal name is Hann.
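
For a quick feel of the taper, here is a small stdlib-only Python sketch of the symmetric Hann window, w[k] = 0.5 - 0.5 * cos(2 * pi * k / (n - 1)). The helper names are hypothetical, and the app may use a slightly different (periodic) variant.

```python
import math

def hann_window(n):
    """Symmetric Hann window of length n: zero at both ends, one in the middle."""
    if n < 1:
        return []
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]

def apply_window(signal):
    """Multiply a buffer by a Hann window of the same length before the FFT."""
    window = hann_window(len(signal))
    return [s * w for s, w in zip(signal, window)]
```

For a five-sample buffer the weights are approximately 0, 0.5, 1, 0.5, 0, so the samples at the edges contribute nothing and the hard edge disappears.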

FFT And Power Spectrum

The FFT is how Pulse asks: “Which repeating rates are present in this time signal?” SciPy’s Fourier transform tutorial describes Fourier analysis as expressing a signal as a sum of periodic components.

Pulse converts the windowed signal into a power spectrum, then looks only inside the heart-rate band. Each bin corresponds to a possible BPM:

bpm = bin * 60 * frameRate / fftSize

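This is why camera frame rate and buffer size matter. At 30 fps with a 300-frame buffer padded to a 512-point FFT, bins are about 3.5 BPM apart (60 * 30 / 512). Padding only interpolates the spectrum, though: 300 frames at 30 fps is 10 seconds of data, so two peaks closer than roughly 6 BPM (60 / 10) still blur together. Shorter buffers blur more and produce chunkier BPM jumps.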

Spectrum Smoothing

Instead of picking the peak from a single FFT window, Pulse keeps an exponential moving average of the power spectrum:

smoothed = alpha * latest + (1 - alpha) * previous

That smooths the evidence before choosing a BPM. It is usually better than smoothing only the displayed BPM, because the peak picker sees a steadier spectrum.
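
A minimal Python sketch of that bin-wise average. The function name and the default `alpha` of 0.2 are hypothetical; passing `None` as the previous spectrum models a fresh camera start with no stale history.

```python
def smooth_spectrum(previous, latest, alpha=0.2):
    """Bin-wise exponential moving average of the power spectrum.

    Higher alpha trusts the latest window more; lower alpha averages longer.
    """
    if previous is None or len(previous) != len(latest):
        # First window of a run, or the FFT size changed: nothing to blend with.
        return list(latest)
    return [alpha * new + (1.0 - alpha) * old
            for new, old in zip(latest, previous)]
```

The trade-off is responsiveness: a small `alpha` steadies the peak but also makes the readout slow to follow a heart rate that is genuinely changing.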

Eulerian Video Magnification

The thing I half-remembered from school was Eulerian Video Magnification, from MIT CSAIL. It reveals tiny temporal changes in video by filtering and amplifying the right frequency band. Their examples include pulse color changes and subtle motion. A later overview, Eulerian Video Magnification and Analysis, is a good bridge between the demo and the signal-processing idea.

Pulse is not doing full video magnification. It is borrowing the same instinct: isolate the tiny temporal component first, then analyze it. POS is the cheaper, rPPG-specific version of that idea.