In this article, I want to analyze which moves in a game of chess are close to one another, in the sense that they often occur in similar situations. If two moves frequently occur before or after the same moves, they are similar in this sense.
For example, which move is close to the queen's pawn opening "d4"?
Is it possible to recognize a general structure and to represent it visually?
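The article does not fix a method at this point, but the basic idea can be sketched in a few lines of Python: treat each game as a sequence of moves, count which moves occur next to each other, and compare moves by the similarity of their neighbour counts. The toy games below are invented for illustration:

```python
# Minimal sketch: moves that share the same neighbours in game
# sequences get similar context vectors (toy data, not Lichess data).
from collections import defaultdict
from math import sqrt

games = [
    ["d4", "d5", "c4", "e6"],
    ["d4", "Nf6", "c4", "e6"],
    ["e4", "e5", "Nf3", "Nc6"],
    ["e4", "c5", "Nf3", "d6"],
]

# Context vector: for every move, count the moves played directly
# before or after it across all games.
context = defaultdict(lambda: defaultdict(int))
for game in games:
    for i, move in enumerate(game):
        for j in (i - 1, i + 1):
            if 0 <= j < len(game):
                context[move][game[j]] += 1

def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "d5" and "Nf6" both follow d4 and precede c4 in the toy games,
# so they come out as close; "d5" and "e5" share no neighbours.
print(round(cosine(context["d5"], context["Nf6"]), 3))  # 1.0
print(round(cosine(context["d5"], context["e5"]), 3))   # 0.0
```

With real Lichess data the same counting scheme, or a proper embedding model trained on move sequences, would produce the structure the article sets out to visualize.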
The sources for my analyses are files of games played on the internet chess server Lichess. At https://database.lichess.org/ all games played on this server are available for download, collected by month, in Portable Game Notation (PGN) format. These are simple text files with a defined structure that can be read and written by almost any chess software, and that can also be easily manipulated as plain text. …
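As a hedged illustration of how such PGN files can be processed as plain text (a real project would more likely use a library such as python-chess; the sample game and its site URL below are invented):

```python
# Sketch: extract the SAN moves from a PGN movetext using only the
# standard library. The game below is a made-up example.
import re

pgn = """[Event "Rated Blitz game"]
[Site "https://lichess.org/abcd1234"]
[Result "1-0"]

1. d4 d5 2. c4 e6 3. Nc3 Nf6 1-0"""

# Headers are bracketed tag pairs; the movetext is everything else.
movetext = " ".join(
    line for line in pgn.splitlines()
    if line and not line.startswith("[")
)

# Drop move numbers ("1.", "1...") and the game result, keep the moves.
tokens = movetext.split()
moves = [t for t in tokens
         if not re.match(r"^\d+\.+$", t)
         and t not in ("1-0", "0-1", "1/2-1/2", "*")]

print(moves)  # ['d4', 'd5', 'c4', 'e6', 'Nc3', 'Nf6']
```

Real Lichess exports also contain comments, clock annotations, and variations, which a robust parser has to handle; this sketch only covers the plain movetext case.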
The news published online by daily newspapers is an important source of information. Articles not only contain the statements intended for dissemination but also, implicitly, other information about the publisher and its employees. This flow of information is usually unintended, and the publishers are often not even aware of it.
These are not secret messages hidden in individual articles, in the way some people believe they have found hidden messages in Beatles songs, but information that only becomes apparent when a large amount of data is viewed together and correctly combined. …
Machine Learning is an important technology for handling data in today’s world. It is used to derive models of reality from data. For example, you can use it to segment customer data in an online store or to optimize a performance marketing campaign.
This usually requires a programming language with a large ecosystem of libraries. Today, "Python" or "R" are most often used, together with libraries such as "scikit-learn" and "TensorFlow".
The platform "BigML" takes a different approach: it offers a user interface that lets users control all steps of a "Machine Learning" project via menus. …
In this article, I would like to show how different machine learning methods can be used to classify customers into buyers and non-buyers using tracking data from an online shop. With features aggregated from the raw data, such as the number of visits and the number of page views, forecast models are trained and visualized.
Special attention is paid to the visual presentation of the forecast models with the help of 2-D plots and coloring of the decision boundaries. The peculiarities of the different methods become apparent, as do situations in which the models underfit or overfit. …
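As a minimal sketch of such a pipeline, with invented data and scikit-learn (the article's actual features and model choices may differ):

```python
# Sketch: two aggregated tracking features (visits, page views) and a
# buyer/non-buyer label, classified with logistic regression.
# All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic features: buyers tend to visit and view more often.
n = 200
visits = np.concatenate([rng.poisson(2, n), rng.poisson(6, n)])
views = np.concatenate([rng.poisson(5, n), rng.poisson(15, n)])
X = np.column_stack([visits, views])
y = np.concatenate([np.zeros(n), np.ones(n)])  # 0 = non-buyer, 1 = buyer

model = LogisticRegression().fit(X, y)
print(model.score(X, y))  # training accuracy

# The decision boundary can be visualized by predicting on a grid of
# feature values (plotting itself would use e.g. matplotlib's contourf).
xx, yy = np.meshgrid(np.arange(0, 12), np.arange(0, 25))
grid_pred = model.predict(np.column_stack([xx.ravel(), yy.ravel()]))
```

Swapping `LogisticRegression` for a tree or kernel model on the same grid is what makes the differences between methods, and their under- or overfitting behavior, visible in the colored plots.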
In a recent paper for the ETRA ’20 ACM Symposium on Eye Tracking Research and Applications, we took a closer look at the gaze behavior of computer gamers. The gaze behavior of players in situations of varying difficulty is examined in order to gain potential insights for game design.
A comparative study was conducted in which participants played the game Pac-Man at three difficulty levels while their gaze behavior was recorded with an eye tracker. …
Progress in NLP applications is driven by ever larger language models consisting of neural networks based on the Transformer architecture. On the occasion of the recently published results of the currently largest model, GPT-3 from OpenAI, I would like to take a closer look at these advances.
On May 28, 2020, OpenAI researchers published a paper (https://arxiv.org/abs/2005.14165) on arXiv about GPT-3, a language model capable of achieving good results on a number of benchmark language processing tasks, ranging from language translation and news article writing to question answering. …
In this article I will describe how we can segment customers based on web analytics data from an online shop. Based on the results, on-site personalization can be implemented and targeted campaigns can be launched for the users in each segment.
On the way there, we will first explore the data in more detail (exploratory data analysis), then preprocess it appropriately, calculate the segmentation, and finally visualize the clusters. For the calculations we will use Google Colab.
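The segmentation step can be sketched as follows, with invented web-analytics features and k-means from scikit-learn (the article's actual preprocessing and feature set may differ):

```python
# Sketch: cluster synthetic per-user features (visits, page views per
# visit) into three segments with k-means.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Three synthetic user groups: casual visitors, frequent visitors,
# and deep browsers.
X = np.vstack([
    rng.normal([2, 3], 0.5, (100, 2)),
    rng.normal([10, 4], 1.0, (100, 2)),
    rng.normal([3, 12], 1.0, (100, 2)),
])

# Standardize first so both features carry equal weight in the
# distance calculation.
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)
print(np.bincount(kmeans.labels_))  # sizes of the three segments
```

On real shop data the number of clusters is not known in advance; methods such as the elbow criterion or silhouette scores help to choose it before the visualization step.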
In my earlier article “Animated Information Graphics” I showed how time-dependent data can be displayed as animations using “Python” and “Plotly”. In this article, I want to show how to create animated information graphics with the new 2020 version of Tableau.
For the examples I use data on COVID-19 infections in individual countries, which can be downloaded from the “European Centre for Disease Prevention and Control”.
You can follow the examples with the free version “Tableau Public”.
An animated variant of bar charts that enjoys great popularity on YouTube is the so-called “racing bar chart”. Here, the animation is created from a sequence of individual bar charts whose bars are re-sorted at each time step, so that they change position (“the race”) over time. …
This article uses infographics and statistics to show anomalies in the ratio of deaths to the number of reported serious cases for Italy and Spain. In these two countries, too many people die relative to the number of serious cases reported. How can this be explained?
Much has been and still is being said and written about the danger of the COVID-19 virus. On the one hand, statements are derived from statistical data; on the other hand, reports in the media and through private contacts describe the situations and fates of individuals. By now everyone has been confronted with such reports, for example that of an Italian doctor. In the private sphere, too, firsthand accounts are accumulating that contradict the still-circulating claims of the kind “it is just another flu”. …
Natural Language Processing (NLP), the ability of a computer program to understand human language, is an application area of artificial intelligence (AI). Language models that are created (“trained”) with large amounts of text are an important basis for NLP. The texts usually originate from articles published on the Internet and therefore reflect the opinions of their authors.
Do modern language models, which use a huge amount of such “training texts”, represent the general opinion?
In this article, I would like to investigate this by looking at a language model regarding opinions about Austria.
The model “GPT-2” from “OpenAI” is used, a powerful system for English that was trained on 40 GB of text data. I use the largest available variant, in which 1.5 billion parameters were optimized. …