Seeing Isn't Believing: This New AI System Can Create “Deep Fake” Videos
Sophisticated image processing technology threatens to swamp the internet with next-generation fake news.
What happens when you can’t trust what your eyes see on screen? As the post-truth era collides with fast-evolving digital technology, online misinformation campaigns are threatening to trigger a catastrophic Information Age crisis. Just ask the Federal Election Commission.
Here’s the latest development: Scientists have developed an automated AI-powered software system that can be used to both create and detect “deep fake” videos — manipulated video images that can spread misinformation and malicious hoaxes. Similar technology has already been used to generate fake celebrity videos and even revenge porn.
Deep fake videos, also simply called deepfakes, rely on image synthesis techniques that can convert the content of one video into the style of another virtually flawlessly. In a deepfake, footage of a politician or public figure can be altered so that he or she convincingly appears to be saying things that were never actually said.
Researchers at Carnegie Mellon University have developed a new technique that can generate deepfakes automatically, with no need for human intervention. Powered by artificial intelligence and machine learning, the system can copy the facial expressions of a subject in one video and then map the data onto images in another. Barack Obama can be easily transformed into Donald Trump, or John Oliver can suddenly become Stephen Colbert.
The system can also convert black-and-white movies to color, or manipulate imagery so that a hibiscus flower is converted to appear as a daffodil. Because this new tool can transform large amounts of video automatically, it could be particularly useful for filmmakers or game designers looking to create detailed digital environments.
But the CMU team is also acutely aware that the technology could be used to create deepfakes. The researchers demonstrated how the system can match footage of Barack Obama and Donald Trump, making it appear that one is speaking words that are actually being spoken by the other.
"It was an eye-opener to all of us in the field that such fakes would be created and have such an impact," Bansal noted. "Finding ways to detect them will be important moving forward."
Bansal and his colleagues presented their technique this week at the European Conference on Computer Vision, in Munich, Germany. By revealing details on the technique to fellow developers and researchers, the CMU team hopes to make deepfakes easier to identify, even as the technology grows more sophisticated.
The CMU technique employs a class of algorithms called generative adversarial networks, or GANs. These algorithms are separated into two models, then basically cut loose to compete against one another. The “discriminator” model learns to tell genuine images or video frames in a given style from fakes, while the “generator” learns how to create images that match that style closely enough to fool the discriminator.
By putting these two models in competition with each other, the AI system essentially teaches itself how to create the most realistic-looking images. The generator tries to trick the discriminator, and the discriminator scores how effectively the generator is doing so.
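The adversarial game described above can be sketched in miniature. The toy example below (not the CMU system, which operates on video) trains a one-parameter “generator” against a logistic-regression “discriminator” to mimic a simple stream of numbers standing in for genuine data; the names, network shapes, and learning rates are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def real_batch(n):
    # Toy "real" data: samples near 3.0, standing in for genuine frames.
    return rng.normal(3.0, 0.5, size=(n, 1))

# Generator: maps noise z to a sample, x = a*z + b (two learnable scalars).
a, b = 1.0, 0.0
# Discriminator: logistic regression, D(x) = sigmoid(w*x + c).
w, c = 0.1, 0.0

lr = 0.05
for step in range(2000):
    z = rng.normal(size=(64, 1))
    fake = a * z + b
    real = real_batch(64)

    # Discriminator update: push D(real) toward 1 and D(fake) toward 0.
    d_real = sigmoid(w * real + c)
    d_fake = sigmoid(w * fake + c)
    # Gradients of the binary cross-entropy loss w.r.t. w and c.
    grad_w = np.mean((d_real - 1) * real) + np.mean(d_fake * fake)
    grad_c = np.mean(d_real - 1) + np.mean(d_fake)
    w -= lr * grad_w
    c -= lr * grad_c

    # Generator update: push D(fake) toward 1, i.e. fool the discriminator.
    d_fake = sigmoid(w * fake + c)
    g_grad = (d_fake - 1) * w      # gradient of -log D(fake) w.r.t. fake
    a -= lr * np.mean(g_grad * z)
    b -= lr * np.mean(g_grad)

# Since z has mean 0, the generator's average output is simply b,
# which should drift toward the real data's mean of 3.0 during training.
print("generator offset b:", b)
```

Each loop iteration is one round of the competition: the discriminator sharpens its test, then the generator adjusts to pass it, and over many rounds the fakes drift toward the real distribution.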
The system is a kind of visual version of similarly AI-powered language translation software, in which English is translated into Spanish and then back into English. By correcting itself over and over, the system can learn new processes on its own and quickly improve its results.
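The round-trip idea behind that translation analogy is often called cycle consistency: if mapping into another style and back reproduces the original, the two mappings are consistent. The toy sketch below (a pair of hypothetical linear maps, not the CMU system) measures that round-trip error; a training process would drive this error toward zero.

```python
import numpy as np

# Hypothetical "translators" between two styles:
# F maps style X to style Y, G maps style Y back to style X.
def F(x):
    return 2.0 * x + 1.0

def G(y):
    return (y - 1.0) / 2.0   # here, an exact inverse of F

x = np.array([0.5, 1.0, -2.0])

# Cycle-consistency error: how far the round trip X -> Y -> X
# lands from the original input.
cycle_error = np.mean(np.abs(G(F(x)) - x))
print(cycle_error)  # → 0.0, since G exactly inverts F
```

In a real system the maps are neural networks and are only approximate inverses, so this error is added to the training loss to keep the translation faithful to the source content.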
This imaging software could potentially have other applications as well, Bansal said. For instance, it could be applied to help self-driving cars operate more safely in low-light conditions. More details on the technique, including code for developers, are available on the dedicated project page.