A Data Analysis of Coverage in Austrian Online News

Coverage over the last three months / Image by Author

Due to the extensive coverage of the COVID19 pandemic, it is easy to get the impression that nothing else is being reported. In this article, I use data to examine whether this subjective impression corresponds to reality. I also look at how coverage of different topics has evolved over the last few months, and discuss regional differences and the tone of the news coverage, i.e. whether it has been more positive or negative.

The Data

For the analysis, I collected and evaluated a total of 148,991 articles published online in the Austrian…

Word Vectors for Chess Moves

Similarity map of chess moves / Image by Author

In this article, I want to analyze which moves in a game of chess are close, in the sense that they often occur in similar situations in games. If two moves often occur after or before the same moves, then these moves are similar in a certain sense.

For example, which move is close to the queen's pawn opening “d4”?

Is it possible to recognize a general structure and to represent it visually?
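
The idea that two moves are similar when they appear next to the same moves can be sketched with simple co-occurrence counts. The following is a minimal count-based stand-in, not necessarily the method used in the article: the toy games, the window size, and the helper names `context_vectors`, `cosine`, and `most_similar` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy games in algebraic notation (not taken from the Lichess data).
games = [
    ["d4", "d5", "c4", "e6", "Nc3", "Nf6"],
    ["c4", "e6", "d4", "d5", "Nc3", "Nf6"],
    ["d4", "Nf6", "c4", "e6", "Nc3", "Bb4"],
    ["e4", "e5", "Nf3", "Nc6", "Bb5", "a6"],
    ["e4", "c5", "Nf3", "d6", "d4", "cxd4"],
]


def context_vectors(games, window=2):
    """Count, for every move, the moves played within `window` plies of it."""
    vectors = defaultdict(Counter)
    for moves in games:
        for i, move in enumerate(moves):
            lo, hi = max(0, i - window), min(len(moves), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[move][moves[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar(vectors, move, top_n=3):
    """Rank all other moves by how similar their contexts are to `move`."""
    target = vectors.get(move, Counter())
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != move]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_n]


vectors = context_vectors(games)
for other, score in most_similar(vectors, "d4"):
    print(f"{other}: {score:.2f}")
```

With real game files, a trained embedding model would likely replace the raw counts, but the similarity query would look much the same.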

Data source

My analysis draws on files of games played on the internet chess server Lichess. At https://database.lichess.org/ you can find all the games played…

Data analysis of online news

Length and volume of online news per weekday and time / Image by Author

The news published online by daily newspapers is an important source of information. It contains not only the statements meant to be disseminated but also, implicitly, other information about the publisher and its employees. This flow of information is usually unintended, and the publishers are not even aware of it.

These are not secret messages hidden in individual articles, like those some people believe they find in Beatles songs, but information that only becomes apparent when a large amount of data is viewed together and correctly combined. …

with “BigML”

Distributions and correlations

Machine learning is an important technology for handling data in today’s world. It is used to derive models of reality from data. For example, you can use it to segment customer data in an online store or to optimize a performance marketing campaign.
This usually requires a programming language together with a large number of libraries for that language. Today, “Python” or “R” are very often used, along with libraries such as “scikit-learn” and “TensorFlow”.

But there is another way!

The platform “BigML” takes a different approach, offering a user interface that allows users to control all steps…

A visual approach with different machine learning classifiers

In this article, I would like to show how different machine learning methods can be used to classify customers as buyers or non-buyers using tracking data from an online shop. Forecast models are trained on features aggregated from the raw data, such as number of visits and number of page views, and then visualized.

Special attention is paid to the visual presentation of the forecast models with the help of 2-D plots and coloring of the decision boundaries. The peculiarities of the different methods become apparent, as do situations in which the models underfit or overfit. …
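
To make the idea of a decision boundary over two aggregated features concrete, here is a minimal sketch using logistic regression trained by gradient descent in plain Python. It is only one of many possible classifiers and is not claimed to be one of the methods used in the article; the visitor data, the learning rate, and the function names are hypothetical.

```python
from math import exp, sqrt

# Hypothetical visitors: (visits, page views) and whether they bought (1) or not (0).
X = [(1, 2), (1, 4), (2, 3), (2, 6), (3, 5), (5, 12), (6, 15), (7, 14), (8, 20), (9, 18)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def standardize(rows):
    """Return scaled rows plus the per-feature mean and standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
    return scaled, means, stds


def train(rows, targets, lr=0.5, epochs=1000):
    """Fit logistic regression weights with batch gradient descent."""
    w, b, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, target in zip(rows, targets):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - target
            grad_w = [g + err * xi for g, xi in zip(grad_w, row)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(point, w, b, means, stds):
    """Classify a raw (visits, page views) point as buyer (1) or non-buyer (0)."""
    row = [(v - m) / s for v, m, s in zip(point, means, stds)]
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) >= 0.5 else 0


X_scaled, means, stds = standardize(X)
w, b = train(X_scaled, y)

# Text version of a decision-boundary plot: '#' = predicted buyer, '.' = non-buyer.
# Rows are page views (top = high), columns are visits 1 to 10.
for views in range(20, 0, -4):
    cells = "".join("#" if predict((v, views), w, b, means, stds) else "." for v in range(1, 11))
    print(f"{views:>2} {cells}")
```

A linear model like this always yields a straight boundary; more flexible methods would produce curved or fragmented regions on the same grid, which is where overfitting tends to show up visually.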

Lessons learned from an Eye-Tracking Study

Adapted Pacman Version

In a recent paper for the ETRA ’20 ACM Symposium on Eye Tracking Research and Applications, we took a closer look at the gaze behavior of computer gamers. The gaze behavior of players in situations of differing difficulty is examined in order to gain potential insights for game design.

A comparative study was conducted in which the participants played the game Pac-Man at three difficulty levels while their gaze behavior was recorded with an eye-tracking device. …

The race for larger language models is entering the next round.

Image: www.pexels.com

Progress in NLP applications is driven by larger language models consisting of neural networks using the Transformer architecture. With the recently published results for the currently largest model, GPT-3 from OpenAI, I would like to take a closer look at these advances.

Are the models more than just huge “lookup tables” with intelligent interpolation methods?

On May 28, 2020, OpenAI researchers published a paper (https://arxiv.org/abs/2005.14165) on arXiv about GPT-3, a language model that achieves good results on a number of benchmark language processing tasks, ranging from language translation and news article writing to question answering. …

With Web Analytics Data and k-Means Clustering

Identifying clusters of similar Customers

In this article I will describe how we can segment customers based on web analytics data from an online shop. Based on the results, on-site personalization can be realized and targeted campaigns can be started for the users in the segments.

On the way there, we will first explore the data in more detail (“Exploratory Data Analysis”), then preprocess the data appropriately, calculate the segmentation, and finally visualize the clusters. For the calculations we will use Google Colab.
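
The preprocessing and segmentation steps can be sketched in a few lines of plain Python. This is a minimal illustration of standard k-means (Lloyd's algorithm), not necessarily the implementation used in the article; the customer features, the farthest-first initialisation, and all function names are assumptions.

```python
from math import sqrt

# Hypothetical per-customer web analytics features: (sessions, avg. pages per session).
customers = [
    (1, 2.0), (2, 1.5), (1, 3.0), (2, 2.5),     # occasional visitors
    (10, 8.0), (12, 9.5), (11, 7.5), (9, 8.5),  # frequent, engaged visitors
]


def standardize(rows):
    """Scale every feature to zero mean and unit variance (the preprocessing step)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: dist2(p, centroids[c])) for p in points]


def kmeans(points, k, iterations=20):
    # Deterministic farthest-first initialisation (an illustrative choice).
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(farthest))
    for _ in range(iterations):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centroids), centroids


labels, centroids = kmeans(standardize(customers), k=2)
for customer, label in zip(customers, labels):
    print(customer, "-> segment", label)
```

Standardizing first matters because k-means relies on distances: without it, the feature with the largest numeric range would dominate the segmentation.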


Using the example of current data on COVID19 infections

In my earlier article “Animated Information Graphics” I dealt with how time-dependent data can be displayed as animations using “Python” and “Plotly”. In this article, I want to show how to create animations of information graphics with the new 2020 version of the software Tableau.

For the examples I use data on COVID19 infections in the individual countries, which can be downloaded from the “European Centre for Disease Prevention and Control”.
You can follow the examples with the free version “Tableau Public”.

“Racing Barcharts”

Animated bar charts known as “Racing Barcharts” enjoy great popularity on YouTube. Here…

Information graphics and statistics on COVID19

This article uses information graphics and statistics to show anomalies in the ratio of mortality rates to the number of serious cases for Italy and Spain. In these two countries, too many people die relative to the number of serious cases reported. How can this be explained?

(data source: https://www.worldometers.info, graphic: www.stoeckl.ai)

The danger of the COVID19 virus has been, and still is, much discussed. On the one hand, people try to derive statements from statistical data; on the other hand, media reports and private contacts describe the situation and drama of individuals. So everybody is already familiar…

Andreas Stöckl

University of Applied Sciences Upper Austria / School of Informatics, Communications and Media http://www.stoeckl.ai/profil/
