Researchers Take on the Growing Risk of Fake Audio

Romit Barua, Gautham Koorma, and Sarah Barrington, all from the MIMS ’23 program, first presented their research on voice cloning as their final project for the Master of Information Management and Systems degree; Barrington has since become a Ph.D. student at the I School. Teaming up with Professor Hany Farid, they investigated techniques for distinguishing a real voice from a cloned voice designed to mimic a specific person.


Professor Farid initially underestimated the capabilities of AI-powered voice cloning but acknowledged how rapidly the technology has evolved. The team's research began with perceptual features of audio samples, looking for telltale patterns in the audio waveform. Real human voices, they observed, tend to contain more pauses and more variation in volume, a byproduct of filler words and of speakers moving around during recording. Pause patterns and amplitude (the consistency and variation of the voice) therefore became key indicators of authenticity.
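
These perceptual cues can be measured directly from a waveform. The following is a minimal sketch, not the team's actual pipeline, assuming the librosa library and a placeholder file name: it estimates how much of a clip is silence and how much its loudness fluctuates, the kinds of signals that tend to be flatter in cloned speech.

```python
import librosa
import numpy as np

# Load the recording (the file name is a placeholder).
y, sr = librosa.load("clip.wav", sr=None)

# Pause analysis: find non-silent intervals, then compute the
# fraction of the clip spent in silence between them.
intervals = librosa.effects.split(y, top_db=30)  # [start, end] sample indices
voiced_samples = sum(end - start for start, end in intervals)
pause_fraction = 1.0 - voiced_samples / len(y)

# Amplitude analysis: short-time RMS energy and its relative variation.
rms = librosa.feature.rms(y=y)[0]
amplitude_variation = np.std(rms) / (np.mean(rms) + 1e-8)

print(f"pause fraction: {pause_fraction:.2f}")
print(f"amplitude variation: {amplitude_variation:.2f}")
```

On this view, a genuine recording would typically show a higher pause fraction and more amplitude variation than a cloned one; any decision thresholds would have to be tuned on labeled data rather than taken from this sketch.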


Taking a more detailed approach, the team then employed an off-the-shelf audio feature-extraction package, computing more than six thousand features per clip and narrowing them down to the twenty most important ones. This method, however, proved less accurate than their final approach: learned features, obtained by training a deep-learning model. The model processes raw audio and extracts multi-dimensional representations (embeddings) that separate real from synthetic speech, achieving error rates as low as 0% in laboratory settings.
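
To make the feature-based approach concrete, here is a hedged sketch of a pipeline in that spirit. It uses the openSMILE toolkit, whose ComParE 2016 set produces roughly 6,300 summary features per clip, in the same ballpark as the "over six thousand" mentioned above, though whether this matches the team's exact toolchain is an assumption; the file lists and labels are placeholders.

```python
import numpy as np
import opensmile
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Placeholder lists of labeled audio clips (real vs. cloned).
real_files = ["real/clip_001.wav", "real/clip_002.wav"]
fake_files = ["cloned/clip_001.wav", "cloned/clip_002.wav"]

# Extract ~6,300 summary features per clip with an off-the-shelf package.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)
X = smile.process_files(real_files + fake_files)      # one row per clip
y = np.array([0] * len(real_files) + [1] * len(fake_files))

# Rank the features with a random forest and keep the twenty most important.
forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
top20 = np.argsort(forest.feature_importances_)[-20:]

# Train a simple classifier on the reduced feature set.
clf = LogisticRegression(max_iter=1000).fit(X.iloc[:, top20], y)
```

The learned-feature approach works differently: the team trained their own deep network directly on raw audio, and its architecture is not reproduced here. As a rough stand-in, one can pool embeddings from a publicly available pretrained speech encoder (wav2vec 2.0 via torchaudio in this sketch) and fit a lightweight classifier on top of them.

```python
import torch
import torchaudio
from sklearn.linear_model import LogisticRegression

bundle = torchaudio.pipelines.WAV2VEC2_BASE
encoder = bundle.get_model().eval()

def embed(path: str) -> torch.Tensor:
    """Return one fixed-length embedding per clip by mean-pooling over time."""
    waveform, sr = torchaudio.load(path)
    waveform = waveform.mean(dim=0, keepdim=True)           # force mono
    if sr != bundle.sample_rate:
        waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)
    with torch.no_grad():
        features, _ = encoder.extract_features(waveform)
    return features[-1].mean(dim=1).squeeze(0)               # one vector per clip

# 'real_files', 'fake_files', and 'y' are the placeholders defined above.
X_emb = torch.stack([embed(p) for p in real_files + fake_files]).numpy()
clf_emb = LogisticRegression(max_iter=1000).fit(X_emb, y)
```

A linear classifier on frozen embeddings is only a stand-in for an end-to-end model, but it illustrates the idea of letting a network, rather than handcrafted rules, decide which properties of the audio matter.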


While highly accurate, this learned-feature method, the team acknowledged, can be difficult to interpret without proper context. Their research aims to address concerns about the malicious use of voice cloning and deepfakes, technologies that are already being exploited for real-world harm, such as bypassing voice-based biometric verification or placing fraudulent calls to solicit money.


The team believes their work is crucial in developing robust and scalable detection systems to protect the general public from deepfake threats. Following the online publication of their research, Barrington, Barua, and Koorma were invited to present their findings at prestigious conferences, including the Nobel Prize Summit and the IEEE WIFS conference in Nuremberg, Germany.


Reflecting on their experience at WIFS, Koorma emphasized the forum's value in engaging with researchers in digital forensics and deepening their knowledge. Barua added that it provided an excellent opportunity to explore collaboration possibilities in the field of deepfake detection.


As society grapples with the implications of deepfakes affecting not only prominent figures but also everyday individuals, the team's research offers a promising and scalable approach to safeguarding the public. By examining perceptual features, applying spectral analysis, and leveraging deep-learning models, their work represents a significant step toward restoring trust in online audio content and mitigating the risks posed by advancing technology.
