
New AI Detects Deepfakes By Analyzing Speaker Behavior

The latest generative AI can perfectly mimic vocal biomarkers, rendering traditional deepfake detection software obsolete

The arms race against deepfake technology has reached a critical turning point, with experts warning that state-of-the-art synthetic speech can now consistently replicate the subtle vocal biomarkers used by leading software tools to authenticate human voices.

Rana Gujral, CEO of Behavioral Signals, revealed in a recent interview that the only reliable future defence lies not in analyzing the physical characteristics of the voice, but in the realm of behavioral biometrics. This approach focuses on sentiment, intonation, and linguistic habits (the “how” of human communication) to identify AI-generated or manipulated audio.

The challenge is severe. Most current deepfake detection algorithms rely on identifying tiny, naturally occurring physical characteristics—such as micro-pauses, slight jitters, and throat resonance. Gujral warns that advanced AI systems can now synthesize speech and then insert these biomarkers convincingly enough to fool the best-performing voice liveness and deepfake detection systems on the market.

“That’s the big problem, and nobody’s talking about it,” Gujral said. “The fact is that the vast majority of these tools do not work.”

Moving Beyond Rudimentary AI

Behavioral Signals, which spun out of the Signal Analysis and Interpretation Labs (SAIL), developed its approach after critiquing existing sentiment AI tools. Gujral argues that these older tools, which often simply convert speech to text and parse the words, are too rudimentary.

His company’s behavioral signal processing (BSP) engines take a different route, analyzing non-lexical signals such as tone of voice, pitch (prosody), and intonation in real time. They have further refined their models by adding layers of behavioral analysis, including engagement, empathy, and politeness.
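To make the idea of non-lexical analysis concrete, here is a minimal, hypothetical sketch of extracting two such signals, frame energy (a loudness proxy) and a crude pitch estimate, from a raw waveform. This is an illustration of the general technique only, not Behavioral Signals' actual BSP engine; the function name and parameters are invented for the example.

```python
import numpy as np

def prosodic_features(signal, sr=16000, frame_ms=25, hop_ms=10):
    """Toy per-frame extraction of non-lexical features: RMS energy
    and an autocorrelation-based pitch estimate (80-400 Hz band).
    Illustrative only; production systems are far more sophisticated."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    feats = []
    for start in range(0, len(signal) - frame, hop):
        w = signal[start:start + frame]
        energy = float(np.sqrt(np.mean(w ** 2)))  # loudness proxy
        # autocorrelation peak within the plausible pitch lag range
        ac = np.correlate(w, w, mode="full")[frame - 1:]
        lo, hi = sr // 400, sr // 80              # lags for 400 Hz .. 80 Hz
        lag = lo + int(np.argmax(ac[lo:hi]))
        feats.append((energy, sr / lag))
    return np.array(feats)
```

On a clean 200 Hz test tone, the pitch column converges on roughly 200 Hz; real speech would of course show the moment-to-moment variation that behavioral models feed on.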

This advanced technology led to early applications in optimizing call centre interactions and, notably, attracted investment from the CIA’s venture arm, In-Q-Tel, culminating in years of classified work with U.S. agencies.

The Power Of A Personalised Fingerprint

Instead of looking for generic human vocal patterns, the Behavioral Signals solution builds a “personalised fingerprint” of a speaker, mapping their unique communication style. This includes tracking preferred spacing, specific pause patterns, and how their tone shifts mid-sentence.

The system utilizes dynamic models like transformer encoders and temporal attention networks, moving beyond the typical, less effective Convolutional Neural Networks (CNNs) often repurposed from older voice biometrics models. With accuracy in the mid-nineties, the model has surpassed human-level performance in voice tone recognition.

Any significant departure from this established behavioral profile triggers an alert, identifying potential synthetic interference.
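The enrol-then-alert pattern described above can be sketched in a few lines: store per-speaker statistics over behavioral feature vectors, then flag any utterance that departs too far from them. This is a deliberately simplified z-score illustration under invented names and thresholds, not the company's actual models.

```python
import numpy as np

class BehavioralProfile:
    """Toy 'personalised fingerprint': per-speaker mean and spread of
    behavioral feature vectors (e.g. pause lengths, mid-sentence tone
    shifts). Hypothetical sketch, not Behavioral Signals' system."""

    def __init__(self, enrollment):
        # enrollment: (n_samples, n_features) array of genuine utterances
        self.mu = enrollment.mean(axis=0)
        self.sigma = enrollment.std(axis=0) + 1e-8  # avoid divide-by-zero

    def alert(self, utterance, threshold=3.0):
        """Return True when the mean absolute z-score of a new
        utterance's features departs from the established profile."""
        z = np.abs((utterance - self.mu) / self.sigma)
        return float(z.mean()) > threshold
```

In use, an utterance matching the enrolled statistics passes quietly, while a feature vector several deviations away from the speaker's habits raises the alert, which is the behavioral analogue of the anomaly triggers the article describes.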

Gujral confirms that the behavioral analysis component is robust: “This part is foolproof because there are no known real generators that can create an audio that can fool this.”

The implications for digital trust are vast, especially as forecasts suggest voice deepfake checks will triple, surpassing 4.8 billion by 2027. The application of this technology extends beyond simple fraud detection to protecting the reputations of high-value clients, such as celebrities or executives, by providing full forensics and enabling mass analysis of content for authenticity.

Gujral concludes that the problem is fundamentally “bigger than deepfakes,” representing a larger necessity for understanding and authenticating human interactions in the digital age.
