David Edelman, former special adviser to Presidents Bush and Obama on technology and cyber security, was working in the White House at the time of the Orlando nightclub shooting, two years ago this week.
“I was there the night that it happened and we were doing the grim ritual that happens all too often in the White House, of preparing a statement to assure the nation,” Edelman recalls.
Giving a presentation at AMP’s Amplify technology event in Sydney yesterday, Edelman played a clip of Obama’s speech given in the days following the tragedy.
The sadness in Obama’s eyes and his deflated posture are plain to see. Except it wasn’t really Obama – the footage of the speech was entirely fabricated.
The clip was the output of a recurrent neural network model developed by researchers at the University of Washington, who last year created a ‘photorealistic talking head model’ of Obama that can ‘speak’ any given audio input.
“That was a fake. It was 100 per cent computer generated. Not super-imposing his face on someone else, completely generated from scratch. He did give that speech but he gave it from a different place entirely, the East Room of the White House, a totally different place,” Edelman, now director of a technology policy project at MIT, said.
The so-called ‘Deep Fake’ video was so good it had Edelman fooled.
“Now I wrote speeches for the guy. I know him. I cannot tell the difference between the real and the fake. That’s the power of the technology we’re talking about,” he said.
The AI techniques behind ‘Deep Fake’ videos are advancing rapidly. In a paper due to be published in the journal ACM Transactions on Graphics next month, researchers from Stanford University, the University of Bath and others, as well as Technicolor, describe a “generative neural network with a novel space-time architecture”.
The results are scarily impressive. The work – dubbed ‘Deep Video Portraits’ – allows a video of someone speaking to be mapped onto a ‘portrait video’ of someone else. And not just the lip movements and basic facial expressions, but the source actor’s full 3D head position, head rotation, blinking and eye gaze.
As well as ‘re-animating’ footage of themselves, the researchers also demonstrate their words and actions being replicated by UK prime minister Theresa May, the late US president Ronald Reagan and Russian president Vladimir Putin.
The researchers are aware of their work’s potential for harm.
“Unfortunately, besides the many positive use cases, such technology can also be misused,” writes the paper’s co-author, Stanford University visiting professor Michael Zollhofer, on his blog.
“For example, the combination of photo-real synthesis of facial imagery with a voice impersonator or a voice synthesis system, would enable the generation of made-up video content that could potentially be used to defame people or to spread so-called fake-news,” he adds.
Imagine the damage
At present, the creation of advanced ‘Deep Fake’ videos is limited to those with the necessary computer science skills, but not for long, Edelman predicts.
“Today, right now, this is pretty hard to pull off…but the truth is it is increasingly moving to the realm of the possible,” he said. “And in a year or two it’s going to be as easy as Microsoft Paint and as available.”
Nor will the outputs be limited to politicians. Public figures are obvious initial targets, due to the sheer amount of publicly available, high quality footage of them, but Deep Fakes will inevitably be created depicting business leaders, newsreaders and anyone who has posted a video of themselves on social media or YouTube.
“What happens when it isn’t just used for political coercion, which it certainly could be? You can imagine trying to drive down the market with a sudden piece of ‘Deep Fake’ news that becomes viral. What happens when this becomes commonplace for all of us? What happens when a CEO is caught on video just moments before the big earnings call?” Edelman said.
“What happens when it’s a video of you doing something you never did? Sexually harassing someone or maybe abusing one of your employees? What would you say? Would your employer believe you that the video has been faked? Imagine the damage that this technology can do,” he added.
The ‘Deep Video Portraits’ researchers suggest that advances in digital forensics will lead to approaches that can automatically prove the authenticity of a clip. They also emphasise the need for sophisticated fraud detection and watermarking algorithms.
“In my personal opinion, most important is that the general public has to be aware of the capabilities of modern technology for video generation and editing,” writes Professor Zollhofer.
“This will enable them to think more critically about the video content they consume every day, especially if there is no proof of origin.”
If the fight against plain old text-based fake news is anything to go by, that may be easier said than done.