Detecting Satire and Fake News with Machine Learning

Sometimes it is hard even for humans to tell whether a news article is real, fake, or satire. So I asked myself whether I could train a machine learning model to decide which class (real or satire) a given article belongs to. There are websites that publish satirical news every day, and these can be used together with regular news sites to collect training data for this classification problem.

For training and testing the model, I collected large datasets of German-language news articles from the websites of news agencies and newspapers, as well as from satirical news sites. In total, I collected 63,868 articles from 2008 to 2018 and stored them in a local database.

Database of news articles

To train a classifier, I used the scikit-learn package with a linear Support Vector Classifier (SVC). The news texts were vectorized with a count vectorizer and TF-IDF weighting (see the code below).
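
A minimal sketch of such a pipeline in scikit-learn might look like the following. The toy texts, the labels, and the function name `build_classifier` are made up for illustration, and default hyperparameters are assumed, so this is not necessarily the exact setup behind the numbers reported in this post.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def build_classifier() -> Pipeline:
    """Count vectorizer -> TF-IDF weighting -> linear SVC (default settings assumed)."""
    return Pipeline([
        ("counts", CountVectorizer()),   # raw token counts per article
        ("tfidf", TfidfTransformer()),   # re-weight the counts with TF-IDF
        ("svc", LinearSVC()),            # linear support vector classifier
    ])


# Hypothetical toy examples; the real data are German news articles.
texts = [
    "Parliament passes the new budget after a long debate",
    "Central bank leaves interest rates unchanged",
    "Local man declares his couch an independent republic",
    "Government to tax sighs of relief, experts exhale cautiously",
]
labels = ["real", "real", "satire", "satire"]

clf = build_classifier()
clf.fit(texts, labels)

# Classify an unseen headline; the result is one of the two class labels.
prediction = clf.predict(["Parliament debates interest rates"])
print(prediction[0])
```

Wrapping the three steps in one `Pipeline` means the vocabulary and the TF-IDF weights are learned from the training articles only and then reused unchanged on the test articles.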

80% of the data was used to train the classifier and 20% to test it. On the test set, I achieved an accuracy of 0.996, a precision of 0.986, a recall of 0.952, and an F1 score of 0.969. The confusion matrix below shows the distribution of correct and wrong classifications. Only 11 real news articles were misclassified as satire, while 42 satirical texts were not detected as satire. These are quite good results.
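
The evaluation described above can be sketched roughly as follows. The helper name `report`, the toy label lists, and the `random_state` are assumptions for illustration, and whether the original split was shuffled or stratified is not stated. The last lines only check that the reported F1 score follows from the reported precision and recall.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)
from sklearn.model_selection import train_test_split


def report(y_true, y_pred, positive="satire"):
    """Compute the four scores and the confusion matrix, with satire as the positive class."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, pos_label=positive, average="binary"
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # rows: true class, columns: predicted class, in the order real, satire
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=["real", positive]),
    }


# An 80/20 split, shown here on ten hypothetical article indices.
train_idx, test_idx = train_test_split(list(range(10)), test_size=0.2, random_state=0)

# Hypothetical true and predicted labels, just to exercise the helper.
y_true = ["real"] * 6 + ["satire"] * 4
y_pred = ["real"] * 5 + ["satire"] + ["satire"] * 3 + ["real"]
toy = report(y_true, y_pred)
print(toy["confusion_matrix"])

# Consistency check on the reported scores: F1 = 2PR / (P + R).
reported_f1 = 2 * 0.986 * 0.952 / (0.986 + 0.952)
print(round(reported_f1, 3))  # 0.969, matching the reported F1 score
```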

Confusion matrix

I think the presented method can be applied to other languages, and I would expect results similar to those for the German news.

Are computers better than humans at detecting satire in texts?

More details can be found in the article

University of Applied Sciences Upper Austria / School of Informatics, Communications and Media
