AI-Made Sounds Are Real Enough To Trick Humans
This computer is smart enough to supply audio for silent video clips. Welcome to the Turing Test for sound.
A drumstick tapping a book, splashing through water, and swatting at foliage make distinct sounds, but now it's nearly impossible to tell whether they were produced by a human -- or a very smart computer.
Researchers in MIT's Computer Science and Artificial Intelligence Lab created an algorithm to fill in the audio for silent video clips. The AI works so well that human listeners thought the noises were original recordings. The Turing Test for sound has arrived.
To develop and train their algorithm, the MIT team recorded video and audio of a drumstick producing all kinds of sounds. They picked a drumstick for consistency, using it to do things like rustle ivy, tap the ground, scrape rocks, and thump chairs. The resulting 1,000 videos and 46,000 sounds resemble an avant-garde art project.
"When you run your finger across a wine glass, the sound it makes reflects how much liquid is in it," PhD student Andrew Owens told MIT News. "An algorithm that simulates such sounds can reveal key information about objects' shapes and material types, as well as the force and motion of their interactions with the world."
After compiling the dataset, the team fed it into an algorithm that relies on deep learning, a technique in which computers teach themselves to find patterns. To predict the sound for a silent video clip, the algorithm examines the sound properties of each video frame, matches them to the most similar sounds in the database, and then stitches the audio together, the researchers said in a press release.
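The match-and-stitch step can be pictured with a toy sketch. This is not the MIT team's code: it skips the deep learning part entirely and assumes each video frame has already been reduced to a small vector of sound features. The function names, the two-number feature vectors, and the four-sample audio clips are all invented for illustration.

```python
import numpy as np

def nearest_sound_indices(frame_features, db_features):
    """For each frame, return the index of the closest database sound.

    Closeness is plain Euclidean distance between feature vectors,
    which is an assumption of this sketch, not a detail from the article.
    """
    # Pairwise squared distances, shape (n_frames, n_db_sounds).
    diffs = frame_features[:, None, :] - db_features[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    return np.argmin(dists, axis=1)

def stitch_audio(frame_features, db_features, db_clips):
    """Look up the best-matching clip per frame and join them end to end."""
    indices = nearest_sound_indices(frame_features, db_features)
    return np.concatenate([db_clips[i] for i in indices])

# Hypothetical database: three sounds, each with a 2-number feature
# vector and a 4-sample audio clip (constant values, for readability).
db_features = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
db_clips = [np.zeros(4), np.ones(4), np.full(4, 2.0)]

# Hypothetical predicted sound features for three silent video frames.
frame_features = np.array([[0.9, 1.2], [4.8, 5.1], [0.1, -0.2]])

audio = stitch_audio(frame_features, db_features, db_clips)
print(audio)
```

Here the three frames land closest to the second, third, and first database sounds, so the stitched track is those three clips played back to back. A real system would work with far richer features and would need to smooth the joins between clips.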
The computer did a bang-up job. When the team ran an online study asking people to say which of two videos had the original recorded sound, participants picked the AI version over the real one twice as often as they did for a baseline algorithm. The algorithm isn't without hiccups, though. It has trouble with erratic movement, which can produce false hits, the researchers explain in their forthcoming paper.
Beyond tripping up human listeners, the new algorithm has potential applications for sound effect design for movies and TV. Even bigger than that, similar algorithms could help robots make better predictions about how a material or object will respond to touch, improving physical interactions.
The researchers plan to present their algorithm at the conference on Computer Vision and Pattern Recognition in Las Vegas, which starts June 26. Watch the MIT team literally beat the bushes for the sake of AI, and see if you can tell the difference between man and machine: