Fake news, self-made!
In today’s digital age, “I saw it with my own eyes” can hardly be considered a valid argument. This is especially true for content consumed digitally, such as online news or social media posts. The possibilities for manipulating digital content are simply too great.
A typical example of what is possible is a video of Barack Obama giving a speech he never actually gave.
A new service, www.nuis.tv, has just been announced that automatically converts news texts into video articles with the help of AI. The idea is that smartphone users no longer have to read current news articles but can have them presented by a news anchor.
To implement this, it is not enough to generate artificial images of people; those people also have to read the news aloud. The technologies needed for converting text into speech and for lip synchronization already exist.
In this article, I would like to show how such digital content can be generated using the latest work from artificial intelligence research.
Step 1) Select image or video material or create an artificial one
For this step, a photo or video of the person who is to make the statement can be selected or artificially created. The website https://thispersondoesnotexist.com/ creates photos of people who have never existed and probably never will.
Step 2) Write or generate the text of the message
Let us now turn to generating the text of the message. The website https://bellard.org/textsynth/ offers an implementation of OpenAI’s “GPT-2 language model” (https://openai.com/blog/better-language-models/). Given a piece of text, the model continues it.
The following sentence from the New York Times was given as the starting point:
“But despite the large drop in new infections since early this year, the U.S. death rate remains at nearly 1,500 people every day.”
The model continued the story as follows:
“The CDC estimates that 1.6 million Americans die annually from infections linked to SARS. The virus causes fever, respiratory illness, weakness, and confusion. It can cause diarrhea, kidney and liver failure and is believed to have caused more than 300,000 deaths worldwide.
The Ebola outbreak is the most severe since the devastating 1976 SARS outbreak that killed at least 15,000 people.”
The story reads like a real news report, but the facts do not correspond to reality: it confuses SARS with Ebola, for instance, and the numbers do not hold up. This shows that fact-checking is becoming more and more important. Just because something sounds real doesn’t mean the content is accurate.
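GPT-2 itself is far too large to reproduce here, so the following toy sketch swaps in a much simpler technique, a word-level bigram model, to illustrate the basic idea behind text continuation: generation is autoregressive, meaning each new word is chosen based on the text so far and then appended to it. GPT-2 conditions on a long context with a neural network, whereas this sketch only looks at the last word. The function names are hypothetical and have nothing to do with textsynth or OpenAI’s code.

```python
import random
from collections import defaultdict


def build_bigrams(corpus: str) -> dict[str, list[str]]:
    """Map each word to the list of words that followed it in the corpus."""
    words = corpus.split()
    table: dict[str, list[str]] = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table


def continue_text(prompt: str, table: dict[str, list[str]],
                  n_words: int, seed: int = 0) -> str:
    """Extend the prompt one word at a time (toy autoregressive generation)."""
    rng = random.Random(seed)  # fixed seed makes the output reproducible
    words = prompt.split()
    if not words:
        return ""
    for _ in range(n_words):
        candidates = table.get(words[-1])
        if not candidates:  # the last word never had a successor: stop early
            break
        words.append(rng.choice(candidates))
    return " ".join(words)


if __name__ == "__main__":
    corpus = "the virus causes fever the virus spreads fast"
    print(continue_text("the", build_bigrams(corpus), n_words=3))
```

Even this tiny model produces word sequences that were never in its training text as a whole, which hints at why a far more capable model can produce fluent but false news.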
Step 3) Generating the audio recording from the text
If the person is to speak the text, you need an audio file of the speech, and not in just any voice but in the voice of the desired person. NVIDIA’s Flowtron model (https://github.com/NVIDIA/flowtron) can do this. It generates speech that is hardly distinguishable from real recordings, and the speaker’s voice can be specified by an example.
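As a rough sketch of how this step might be scripted, the following helper builds a command line for the `inference.py` script in the Flowtron repository. The flags (`-c` config, `-f` Flowtron checkpoint, `-w` WaveGlow vocoder checkpoint, `-t` text, `-i` speaker ID) are assumed from the repository’s README example and may differ between versions; the checkpoint paths are placeholders, and which voices a speaker ID selects depends on the checkpoint used. The helper name `flowtron_command` is hypothetical.

```python
import shlex


def flowtron_command(text: str, speaker_id: int, flowtron_ckpt: str,
                     waveglow_ckpt: str, config: str = "config.json") -> list[str]:
    """Build an argument list for Flowtron's inference.py (flags assumed from the README)."""
    return [
        "python", "inference.py",
        "-c", config,           # model configuration file
        "-f", flowtron_ckpt,    # Flowtron checkpoint: text -> mel spectrogram
        "-w", waveglow_ckpt,    # WaveGlow checkpoint: mel spectrogram -> waveform
        "-t", text,             # the text to be spoken
        "-i", str(speaker_id),  # speaker ID selecting the voice
    ]


if __name__ == "__main__":
    cmd = flowtron_command(
        "The U.S. death rate remains at nearly 1,500 people every day.",
        speaker_id=0,
        flowtron_ckpt="models/flowtron_ljs.pt",            # placeholder path
        waveglow_ckpt="models/waveglow_256channels_v4.pt",  # placeholder path
    )
    print(shlex.join(cmd))
    # From inside a Flowtron checkout, this could be run with:
    # subprocess.run(cmd, check=True)
```

Building the argument list as a Python list rather than a single shell string avoids quoting problems when the generated news text contains quotes or commas.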
Step 4) Generating the lip-synced video
Finally, Wav2Lip (https://github.com/Rudrabha/Wav2Lip) solves the problem of lip synchronization. The software generates realistic talking faces for any human speech and any facial identity.
The video https://www.youtube.com/watch?v=0fXaDCZNOJc shows the results of the procedure.
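A similar sketch for this step builds a command line for Wav2Lip’s `inference.py`, combining the face from step 1 with the audio from step 3. The flag names (`--checkpoint_path`, `--face`, `--audio`, `--outfile`) are assumed from the repository’s README and argument parser and may change; the file names are placeholders, and the helper `wav2lip_command` is hypothetical.

```python
import shlex


def wav2lip_command(face: str, audio: str, checkpoint: str,
                    outfile: str = "results/result_voice.mp4") -> list[str]:
    """Build an argument list for Wav2Lip's inference.py (flags assumed from the repo)."""
    return [
        "python", "inference.py",
        "--checkpoint_path", checkpoint,  # pretrained Wav2Lip model weights
        "--face", face,                   # image or video of the person (step 1)
        "--audio", audio,                 # speech to lip-sync to (step 3)
        "--outfile", outfile,             # where the talking-face video is written
    ]


if __name__ == "__main__":
    cmd = wav2lip_command(
        face="portrait.jpg",                       # placeholder file names
        audio="speech.wav",
        checkpoint="checkpoints/wav2lip_gan.pth",
    )
    print(shlex.join(cmd))
```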
If you combine all these technologies, you get a system that not only invents a text according to specifications but also has any person, real or invented, recite it on video.
This means you can no longer be sure that a video on social media in which a person makes a statement is not completely fabricated.
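How the four steps fit together can be sketched as a small pipeline in which each stage is a pluggable function. This is only an illustration of the data flow under the assumption that each step is wrapped in a function (for example, around the tools named above); the class `SyntheticVideoPipeline` and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SyntheticVideoPipeline:
    get_face: Callable[[], str]          # step 1: path to a real or generated face image/video
    write_text: Callable[[str], str]     # step 2: prompt -> continued "news" text
    speak: Callable[[str], str]          # step 3: text -> path to an audio file
    lip_sync: Callable[[str, str], str]  # step 4: (face, audio) -> path to the final video

    def run(self, prompt: str) -> str:
        """Run all four steps in order and return the path of the generated video."""
        face = self.get_face()
        text = self.write_text(prompt)
        audio = self.speak(text)
        return self.lip_sync(face, audio)


if __name__ == "__main__":
    # Stub stages stand in for the real tools, just to show the data flow.
    pipeline = SyntheticVideoPipeline(
        get_face=lambda: "face.jpg",
        write_text=lambda prompt: prompt + " ...",
        speak=lambda text: "speech.wav",
        lip_sync=lambda face, audio: "result.mp4",
    )
    print(pipeline.run("But despite the large drop in new infections"))
```

Because each stage is only a function, any step can be swapped for a different model without touching the rest of the pipeline, which is exactly what makes such systems easy to assemble.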