In this post I continue the analysis of the Twitter datasets that I started in December. You can read the previous post, “About Iran and IRA Twitter datasets (for fun) – Part I”. This time I have focused on potential Russian propaganda in Europe, so I decided to analyse the tweets written in German, Italian, Spanish, French, Dutch and Danish. I left out English tweets because it would have been difficult to separate the propaganda in the United Kingdom from the propaganda in the United States.

To read the data, my Python code mainly uses these packages: pandas, langid, emoji, matplotlib and wordcloud. The langid module was useful for classifying the languages used in the users’ descriptions; it is not always correct, but it allowed me to compare the languages of the descriptions with the languages of the tweets. You can find a list of Python modules for recognizing the language of a text in this topic on StackOverflow. The wordcloud module makes it easy to draw great-looking plots of words (thanks to Python-Graph-Gallery for the shared code).
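To make the comparison between description languages and tweet languages more concrete, here is a minimal sketch of how it could be done. It is not the exact code I used: the row keys `tweet_language` and `user_profile_description` are assumptions about the dataset's column names, and the helpers `langid_classifier` and `compare_languages` are hypothetical names introduced only for this illustration. The only langid call it relies on is `langid.classify`, which returns a `(language, score)` pair.

```python
from collections import Counter


def langid_classifier():
    """Return a function mapping a text to a language code, backed by langid."""
    import langid  # third-party package: pip install langid

    # langid.classify returns a (language_code, score) tuple; keep the code only.
    return lambda text: langid.classify(text)[0]


def compare_languages(rows, classify):
    """Count (tweet language, detected description language) pairs.

    rows     -- iterable of dicts; the keys 'tweet_language' and
                'user_profile_description' are assumed column names.
    classify -- any function that maps a text to a language code.
    """
    counts = Counter()
    for row in rows:
        description = (row.get("user_profile_description") or "").strip()
        # Empty descriptions cannot be classified, so they get their own label.
        detected = classify(description) if description else "unknown"
        counts[(row["tweet_language"], detected)] += 1
    return counts
```

With langid installed, `compare_languages(rows, langid_classifier())` gives a counter whose keys are pairs such as `("de", "de")`. Pairs where the two codes differ are the interesting ones, keeping in mind that short descriptions are often misclassified.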
...