Analysing IRA tweets
Introduction
Among the enormous volume of tweets exchanged during the 2016 US presidential election campaign, millions were sent by alleged Russian troll accounts. Those accounts have been accused of manipulating the campaign and its outcome. Given how dramatic and dangerous the consequences of one country influencing the democratic process of another can be, we want to gain better insight into the methods and the real effects of the trolls.
On the one hand, we will compare their activity with the evolution of the candidates' popularity over the course of the campaign. We will also look at the major events of the election and determine whether they were influenced by the trolls, or vice versa.
On the other hand, we will study the methods the Russians employed to disrupt the presidential election. We will examine their vocabulary, the subjects they discuss, and the media and people they refer to.
We also keep an eye on the strategy employed by Russian trolls in other countries, mainly Ukraine.
Our analysis focuses on answering the following questions:
- Is there a relation between the candidates' popularity and the activity of the trolls?
- Did the trolls influence the major events of the campaign, or was it the other way around?
- Which subjects are discussed by the trolls?
- Which media do they tend to talk about and link to in their posts?
- Do they tend to show direct support or hatred for specific people?
- Did the strategy of the trolls change over time?
Dataset
The main dataset we use for this project is Twitter's Russian and Iranian troll dataset. It contains over 10 million tweets from accounts associated with the Internet Research Agency (IRA) and with Iranian trolls. It provides information about each account (Twitter handle, number of followers and followings, account preferences...) and about each individual tweet (content, date and time of publication, type of post, number of retweets...).
Regarding the candidates' popularity, we identified a large amount of polling data on realclearpolitics.com.
We also looked at events after the campaign, for example the evolution of Donald Trump's approval rating published by FiveThirtyEight.
Finally, for the timeline of the campaign and its major events, we relied on Wikipedia, which has a lot of information.
NB : Not all the collected tweets are troll content per se. The tweets originate from alleged troll accounts identified by Twitter. However, to avoid being spotted immediately as fake foreign accounts, these accounts also posed as ordinary American or Russian users. The dataset therefore also contains plenty of seemingly innocent tweets.
Global Temporal Analysis
In order to assess the volume of tweets sent by the IRA's identified troll farm, we display the overall frequency distribution of the tweets. In addition, we observe the evolution of the number of troll accounts over time. Interesting trends might emerge that could be explained or supported by the news events of the time.
We notice that the first recorded tweets from troll accounts go back to the 9th of May 2009: coincidentally, Donald Trump's first tweet dates from the 4th of May 2009.
- Period 1 (US 2012 Elections) : In yellow, we can observe a first hill-shaped increase in activity between February 2012 and June 2013, with a noticeable spike in October 2012, the month right before the US 2012 elections.
Looking at the graph of the number of troll accounts over time, we can argue that, since there were so few of them during period 1, they must have been quite active during that campaign.
- Period 2 (Hiring Time) : Starting from mid-2013, and mostly at the beginning of 2014, we notice a sudden increase in both the number of accounts and the trolls' activity, which are naturally correlated. The IRA's Wikipedia page mentions the job advertisement it posted in August 2013:
"Internet operators wanted!"
- Period 3 (Donbass' War, Full Swing) : This time window corresponds to a massive tweet campaign supporting pro-Russian factions in the war in Donbass, Ukraine.
- Period 4 (US 2016 Elections) : This period of high activity covers both the war in Ukraine (2014-present, with its intensity peaking in mid-2014) and the US 2016 presidential election. Afterwards, troll activity seems to decrease monotonically until June 2016, before rising again right before the election.
- Period 5 (Ukraine's war & Presidential Instability) : The activity of troll accounts did not decrease right after the election. The contesting of the election results, the presidential transition, etc. all make great troll topics. We have to keep in mind that until January 2017, Trump's election had not yet been confirmed by Congress (wikipedia/2016 US elections). The trolls stop after roughly 100 months, around the beginning of the 2017-2018 academic year (end of October 2017).
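The two series behind this timeline (tweets per month and troll accounts per month) can be computed in a few lines. This is a minimal, stdlib-only sketch, not the project's code. The record layout (handle, timestamp string) and the sample rows are hypothetical. An account is counted as active in a month if it tweeted during that month, which is only one possible reading of "number of troll accounts". Counting by account creation date would be another.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical sample records: (account handle, publication timestamp).
# The real dataset has many more fields; this layout is an assumption.
TWEETS = [
    ("acc_a", "2012-10-03 14:00"),
    ("acc_a", "2012-10-05 09:30"),
    ("acc_b", "2012-10-20 18:15"),
    ("acc_b", "2014-01-11 08:00"),
    ("acc_c", "2014-01-12 10:45"),
]


def monthly_activity(tweets):
    """Map each month 'YYYY-MM' to (number of tweets, number of active accounts)."""
    tweet_counts = Counter()
    active_accounts = defaultdict(set)
    for handle, stamp in tweets:
        month = datetime.strptime(stamp, "%Y-%m-%d %H:%M").strftime("%Y-%m")
        tweet_counts[month] += 1
        active_accounts[month].add(handle)
    # Sorted keys give a chronological timeline ready for plotting.
    return {m: (tweet_counts[m], len(active_accounts[m])) for m in sorted(tweet_counts)}


for month, (n_tweets, n_accounts) in monthly_activity(TWEETS).items():
    print(month, n_tweets, n_accounts)
```

Months without any tweet are simply absent from the result. A plotting step would need to fill them with zeros.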
Textual analysis
Before going further, let us first see what the tweets actually contain. That is, let us take a look at:
- Languages : Which languages do the trolls use?
- Hashtags : Which hashtags do they use?
- Words : What are the trolls' favourite words?
- Sentiment analysis : What are their opinions on controversial subjects?
- What are the main differences between Russian-speaking and English-speaking trolls?
Languages
Hashtags
Another important aspect of the tweets is the hashtags. They carry a lot of information: they can reveal the subject of the tweet, the opinion of its author, and so on. Let us compare the most popular hashtags in Russian and in English tweets. Several of the most popular Russian ones need some context:
- ВСУ is the "Armed Forces of Ukraine"
- ДНР is the Donetsk People's Republic, a self-proclaimed proto-state in eastern Ukraine
- ПозорWADA ("Shame on WADA") refers to the ban on the Russian team from the Paralympic Games
- АнгелыВсердце ("Angels in the heart") refers to the children killed in the Donbass war
- ЛНР is the Luhansk People's Republic, another self-proclaimed proto-state in eastern Ukraine
2014 | 2015 | 2016 | Ever |
---|---|---|---|
#BringBackOurGirls | #JeSuisParis | #Rio2016 | #BlackLivesMatter |
#TweetLikeJadenSmith | #BlackLivesMatter | #Election2016 | #CupforBen |
#BreakTheInternet | #MarriageEquality | #PokemonGo | #brexit |
#LifetimeMoviesBeLike | #LoveWins | #Euro2016 | #EdBallsDay |
#DonLemonReporting | #IStandWithAhmed | #Oscars | #FollowFriday |
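Rankings like these start from extracting hashtags and counting them per tweet language. The sketch below shows one way to do it. The (text, language) pairs are hypothetical placeholders. Lower-casing the tags, so that #Election2016 and #election2016 merge, is our assumption rather than a documented step of the analysis.

```python
import re
from collections import Counter

# In Python 3, \w matches Unicode letters, so Cyrillic tags such as #ДНР are captured.
HASHTAG = re.compile(r"#(\w+)")

# Hypothetical (text, tweet language) pairs standing in for the real dataset.
TWEETS = [
    ("Новости #ДНР #ВСУ", "ru"),
    ("#ДНР сегодня", "ru"),
    ("Vote! #Election2016 #MAGA", "en"),
    ("#election2016 debate tonight", "en"),
]


def top_hashtags(tweets, language, n=5):
    """Return the n most frequent hashtags (lower-cased) among tweets in `language`."""
    counter = Counter(
        tag.lower()
        for text, lang in tweets
        if lang == language
        for tag in HASHTAG.findall(text)
    )
    return counter.most_common(n)


print(top_hashtags(TWEETS, "ru"))
print(top_hashtags(TWEETS, "en"))
```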
Words
Now that we have analysed the hashtags, let us focus on the words themselves. This shows which words the trolls typically use. Note that, to keep the results as coherent as possible, we stemmed the words and excluded stopwords from the list.
Sentiment analysis
The final part of this textual analysis consists of classifying the relevant tweets as positive or negative, a process known as sentiment analysis. Once again, we analysed the English and Russian tweets separately. Note that two parallel analyses were performed: a fairly naive one inspired by Jeffrey Breen, and a more complex one based on an MIT tool. For the English tweets, we focused on tweets about Trump, Obama and Hillary Clinton respectively.
Detailed Temporal Analysis
Time Window 1 (Vladimir Putin and US 2012 Elections)
The trolls interacted only slightly with the US 2012 elections, while intra-Russian tweets about the elections are easy to spot.
Interesting Tweets
Among the 10 most retweeted tweets directly related to politics in this time period, 3 really caught our attention.
Date | RU : text | EN : text |
---|---|---|
2013-02-04 (after US 2012 elections) | Давайте перенесемся в 1991 год. Вспомним все надежды, на демократию, Ельцина, радость от краха КПСС, и посмотрим на нашу действительность. | Let's go back to 1991. Let us recall all our hopes for democracy, for Yeltsin, the joy at the collapse of the CPSU, and look at our reality. |
2012-03-03 | Когда вы будете вводить демократию с помощью оружия? | When will you introduce democracy with weapons? |
2013-05-11 (after US 2012 elections) | Первые демократические #выборы в #Пакистан'е прошли на фоне взрывов и стрельбы. Власти отмечают высокую явку. | The first democratic #elections in #Pakistan took place amid explosions and gunfire. The authorities report a high turnout. |
Time Window 2 (Hiring Time) and Time Window 3 (Donbass' war)
Recall: the analysis performed above already showed the overall support that IRA tweets tried to convey.
A calm period had been observed before those events; Ukraine's troubles seemed to reactivate the trolls. During the next time window, we spotted three major events (i.e. events that were widely covered by the media) to which IRA trolls reacted.
Among them, we notice the shootdown of a military aircraft:
- 0 : (first graph) (light blue) Hiring Campaign and Start of Crimean armed conflict.
- 1 : (second graph) (blue) Shootdown of a military aircraft (shown to have been carried out from the ground by Russian units).
- 2 : (second graph) (green) Shootdown of a civilian airliner (shown to have been carried out from the ground by Russian units).
- 3 : (second graph) (salmon) The Russian army takes control of a Ukrainian region.
Time Window 4 (US 2016 Elections)
Let's now focus on the period corresponding to the whole presidential campaign (plus some time before it, a period rife with speculation).
- 1 : In green, the observable peak of US political troll tweets is directly correlated with Donald J. Trump's announcement of his candidacy for the US 2016 presidential election (16th of June 2015). The trolls tweeted heavily about it to spread the word.
- 2 : In light blue, the second Republican debate, which Donald J. Trump attended (16th of September 2015).
- 3 : In salmon, Hillary Clinton's email controversy. At the time of the peak, FBI Director James Comey (who would be fired several months later, as we will see) testified about Hillary Clinton's "extremely careless" handling of her emails while she was Secretary of State under President Obama. These revelations gave the trolls a great opportunity to launch a frontal attack on Clinton.
- 4 : In purple, the first presidential debate between Donald J. Trump and Hillary Clinton (26th of September 2016, after the primaries). This event was followed by many people, and the outcome of a debate (winning or losing it) can be crucial in a campaign.
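The peaks above are read off the plots. One simple way to flag such spikes automatically, which is not necessarily what we did, is to mark the weeks whose tweet count rises far above the typical level. The sketch below uses a threshold of the median plus k times the median absolute deviation (MAD). The weekly counts and k = 5 are illustrative assumptions.

```python
from statistics import median


def flag_peaks(weekly_counts, k=5.0):
    """Return the indices of weeks whose count exceeds median + k * MAD.

    The median absolute deviation (MAD) is used instead of the standard
    deviation, because a few huge spikes would otherwise inflate the
    threshold and hide the smaller peaks.
    """
    med = median(weekly_counts)
    mad = median(abs(c - med) for c in weekly_counts)
    return [i for i, c in enumerate(weekly_counts) if c > med + k * mad]


# Hypothetical weekly counts of political troll tweets.
counts = [10, 12, 11, 9, 80, 13, 10, 12, 95, 11]
print(flag_peaks(counts))  # → [4, 8]
```

With a mean plus two standard deviations rule, the same series would only flag week 8, because the two spikes themselves inflate the standard deviation.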
What's more?
We succeeded in spotting several peaks directly related to campaign events! To get a quantitative measure of the trolls' tweets, we display the aggregated sentiment scores per week and per candidate.
We aggregated the number of positive and negative tweets referring to Trump (resp. Clinton), as classified by the sentiment analysis of the previous section, per week during period 4 (the whole presidential campaign). We then computed favorability deltas, defined as the difference between the number of positive and negative tweets about a candidate during a given week. Let's review how these favorability deltas evolved (using the more advanced sentiment analysis):
Statistic | Trump: weekly favorability delta | Clinton: weekly favorability delta |
---|---|---|
min | -127 | -54 |
mean | 1.52 | -3.15 |
max | 362 | 28 |
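The weekly deltas can be illustrated with a minimal sketch. It combines a naive lexicon score in the spirit of Jeffrey Breen's approach (number of positive words minus number of negative words) with the weekly aggregation described above. The word lists, the keyword matching of candidates and the sample tweets are hypothetical placeholders. The figures in the table come from the more advanced analysis, which is not reproduced here.

```python
import re
from collections import defaultdict
from datetime import date

# Toy opinion lexicons (hypothetical); Breen's approach relies on full
# published lists of several thousand positive and negative words.
POSITIVE = {"great", "brilliant", "win", "fantastic"}
NEGATIVE = {"corrupt", "liar", "disgusting", "careless"}
# Hypothetical keyword -> candidate mapping used to decide who a tweet is about.
CANDIDATES = {"trump": "Trump", "hillary": "Clinton", "clinton": "Clinton"}


def lexicon_score(text):
    """Naive score: number of positive words minus number of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def weekly_deltas(tweets):
    """tweets: iterable of (date, text).
    Returns {(candidate, ISO week): positive tweets - negative tweets}."""
    deltas = defaultdict(int)
    for day, text in tweets:
        score = lexicon_score(text)
        if score == 0:
            continue  # neutral tweets leave the delta unchanged
        year, week, _ = day.isocalendar()
        lowered = text.lower()
        mentioned = {name for key, name in CANDIDATES.items() if key in lowered}
        for name in mentioned:
            deltas[(name, f"{year}-W{week:02d}")] += 1 if score > 0 else -1
    return dict(deltas)


def summary(deltas, candidate):
    """(min, mean, max) of a candidate's weekly deltas, as in the table above."""
    values = [v for (name, _), v in deltas.items() if name == candidate]
    return min(values), sum(values) / len(values), max(values)


TWEETS = [
    (date(2016, 10, 11), "Trump rally was great"),
    (date(2016, 10, 12), "Hillary is corrupt"),
    (date(2016, 10, 13), "Clinton is a liar and careless"),
    (date(2016, 10, 20), "Trump is a liar"),
]
deltas = weekly_deltas(TWEETS)
print(deltas)
print(summary(deltas, "Trump"))
```

Substring matching of candidate names is deliberately crude: it would also match words such as "trumpet". Weeks in which a candidate is never mentioned are missing from the result rather than counted as zero.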
We tried to put the voting intentions for Trump (resp. Clinton) side by side with the volume of politically related tweets, to see whether there were any obvious correlations, but we did not find any. Many different social media sources would have to be taken into account, and it is really hard to obtain significant results based only on the given datasets. Most of the interesting statistics about social influence, popularity scores, etc. are not free (you must pay for them). We would also have needed access to studies (for instance the studies used in AllCott-US2016-fakeNews) to get an insight into how people are affected by exposure to well-established troll tweets. The existence of the trolls is undeniable (examples follow below), but measuring their reach among American citizens (or, more generally, among everyone exposed to these topics) is really tough.
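One common way to test for such a relation is the Pearson correlation between two weekly series, for instance a candidate's poll average and the volume of political troll tweets. Optionally, one series can be shifted by a few weeks to check whether tweets lead the polls. The sketch below is illustrative only, and its series are made-up placeholders. Even a strong correlation would not, on its own, establish influence.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)  # undefined if a series is constant


def lagged_pearson(tweet_volume, poll_share, lag):
    """Correlate tweet_volume[t] with poll_share[t + lag] (lag in weeks, lag >= 0)."""
    return pearson(tweet_volume[: len(tweet_volume) - lag], poll_share[lag:])


# Hypothetical weekly series.
volume = [120, 340, 210, 500, 150, 420]
polls = [41.0, 40.5, 42.1, 41.7, 43.0, 42.2]
for lag in range(3):
    print(lag, round(lagged_pearson(volume, polls, lag), 2))
```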
Most of the interesting tweets (with respect to their content) are among the most retweeted ones. Looking more closely at the 1000 most retweeted tweets of period 4 (preceding the election), we spotted a single user profile that produced 56.7% of those 1000 tweets!
Here is a sample of its output:
Date | EN | Retweet Count | Hashtags |
---|---|---|---|
2016-10-11 | OMG, this new Anti-Hillary ad is brilliant!👌 It's fantastic!!!!!! Spread it far & wide! | 10756 | [] |
2016-10-18 | RT the hell out of it: Dem party operatives: 'We've been bussing people in.. for 50 yrs and we're not going to stop now' #EvangelicalTrump | 6772 | [EvangelicalTrump] |
2016-10-20 | 🚨DISGUSTING Watch: Hillary laughing when Trump said gays get thrown off buildings in Muslim counties‼️‼️ #debatenight #trumpwon #debate | 3762 | [debatenight, trumpwon, debate] |
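The 56.7% figure above is the share of the top 1000 tweets written by a single account. It can be reproduced, in principle, by sorting tweets by retweet count and counting authors among the top n. The (handle, retweet count) pairs below are hypothetical.

```python
from collections import Counter


def top_author_share(tweets, n=1000):
    """Return (handle, share) for the account that wrote the largest part
    of the n most retweeted tweets. `tweets` holds (handle, retweet_count)."""
    top = sorted(tweets, key=lambda t: t[1], reverse=True)[:n]
    handle, count = Counter(h for h, _ in top).most_common(1)[0]
    return handle, count / len(top)


# Hypothetical (handle, retweet count) pairs.
sample = [("acc_a", 120), ("acc_b", 90), ("acc_a", 80), ("acc_c", 5), ("acc_a", 70)]
print(top_author_share(sample, n=4))  # → ('acc_a', 0.75)
```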
Time Window 5 (Ukraine's war & Presidential Instability)
In this time window, we observed two main events that drove troll accounts to their keyboards en masse. We only focused on political tweets, since the peaks of activity are much more visible that way.
- 1 : In green, the peak corresponds to the Times' revelations in July 2017. These revelations plunged the US into a big national debate and a lot of political turmoil.
- 2 : In light blue, Trump's victory in the US 2016 election, welcomed by the IRA's trolls.
Interesting Tweets
Among the 10 most retweeted tweets directly related to politics in this time period, 3 really caught our attention.
EN : text |
---|
JULIAN ASSANGE: I investigated both presidential candidates — Hillary was the only one with corrupt ties to Russia https://t.co/MVVPkLORTb |
President Trump calling Jim Acosta/CNN fake news is officially the best thing I've seen this week. Cry baby Jim, cry! |
So much evidence that Comey covered up civil rights abuses. Certainly a bigger story than that Russian #fakenews https://t.co/OqKw5dqU19 |
Looking at the global timeline of events: around the 8th of July 2017, the Times revealed (alleged) that Trump's entourage had been negotiating with Russian contacts at Trump Tower. This was proven true 6 months later, but it immediately attracted a lot of media buzz. As the tweet texts mention, there was also the Comey affair (involving the former FBI director)! Here, the trolls simply use the most basic technique: bad faith!
The most retweeted tweets speak for themselves :
- Comey is a liar and should not investigate Trump's legitimacy.
- Hillary is the only one with corrupt ties to the Russians.
- To Trump, journalist Acosta is "fake news" (which, by the way, launched a new trend).
Retweets
Retweets are one of Twitter's most important features. They often lead to a snowball effect: a post that receives attention gets retweeted several times, which gives it more visibility, thus more attention... Retweets create retweets! They are a really convenient and fast way to propagate a message.
Retweeted by the trolls
A quick glance at the trolls' most retweeted messages shows that Russian is their main language: it appears twice as often as English.
We can see that the exposed trolls present in our database do not represent the majority of the most retweeted handles: only 6 of them are in the top 20 (one of which is anonymized). Besides those, two accounts have been deleted and their handles cannot be retrieved. All the accounts in this top 20 appear to write mainly in Russian (shown in red on the graph). This also holds for the troll accounts: even though some of them picked English as their account language (shown in blue on the graph), they all tweeted mainly in Russian.
Most of the most retweeted accounts are news agencies, several of which are owned by the Russian government. It is also interesting to note that almost all of them underwent major changes in 2013-2014. Here are the most notable ones:
- rianru : Refers to RIA Novosti (Russian News & Information Agency; "Novosti" means "news"), formerly Russia's official international news agency. It was replaced in 2014 by Rossia Segodnia, but the name RIA is still used.
- GazetaRu : Gazeta.ru is one of the most popular Russian online newspapers. It was part of a major merger of companies in 2013 under the name Afisha.Rambler.SUP, later renamed Rambler&co in 2014.
- RT_russian : Previously named Russia Today, now simply known as RT (sheer coincidence?). This news channel was created by RIA Novosti and is widely considered a propaganda outlet. It opened a new centre in 2013, and RT UK launched in 2014.
- vesti_news : A news channel belonging to VGTRK, a media company owned by the Russian government. It underwent a brand refresh in 2014.
- lentaruofficial : Lenta.ru is another popular online newspaper, which was restructured in 2014 and has since been accused of serving propaganda purposes.
- tass_agency : Also a government-owned news agency, whose name changed in 2014.
- lifenews_ru : A pro-Kremlin news website, allegedly spreading fake news. Its TV channel launched in 2013.
- riafanru : This is the most retweeted troll account, with a retweet count exceeding 70 000 (almost three times that of the second most retweeted troll). The corresponding website (Federal News Agency) seems to focus on Ukraine, Syria, Russia, and fearmongering. It was founded in 2014 and is still running in 2018. (Account active between 2015 and 2017.)
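A ranking of retweeted handles, with exposed troll accounts marked, could be built as in the sketch below. It assumes, hypothetically, that retweets can be recognized by the classic "RT @handle:" prefix in the tweet text. The dataset may also provide dedicated retweet fields, which are not used here. The texts and the troll set are placeholders.

```python
import re
from collections import Counter

# Hypothetical convention: retweets start with "RT @handle:".
RT_PREFIX = re.compile(r"^RT @(\w+):")


def most_retweeted_handles(texts, troll_handles, n=20):
    """Return [(handle, times retweeted, is_exposed_troll)] for the top n handles."""
    counts = Counter()
    for text in texts:
        match = RT_PREFIX.match(text)
        if match:
            counts[match.group(1).lower()] += 1
    trolls = {h.lower() for h in troll_handles}
    return [(h, c, h in trolls) for h, c in counts.most_common(n)]


# Hypothetical tweet texts and set of exposed troll handles.
TEXTS = ["RT @rianru: новости", "RT @riafanru: Украина", "RT @rianru: Сирия", "plain tweet"]
print(most_retweeted_handles(TEXTS, {"riafanru"}))
```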
Retweeters
Now that we have seen the most retweeted accounts, let us consider the main "retweeters", focusing again on the non-anonymized accounts. Surely, since Russian accounts are the most retweeted, Russian accounts should also be responsible for most of the retweets?
Trolls retweet network
It is now time to visualize the links between the non-anonymized troll accounts. As above, accounts with English language settings are in blue, while those with Russian settings are in red. The size of a node is proportional to its degree. The color of a link indicates its importance: red links correspond to more interactions between two accounts.
There are clearly two main clusters, divided by language. Again, some accounts with English settings actually tweet in Russian. All the accounts in the "Novosti" cluster (recall that it means "news") are in this position.
The two big groups are however connected! Those are the links between them:
russilanrogov is the only Russian connection of Pamela_Moore13 and TEN_GOP. It also seems to have "right troll" tendencies. Although it writes mostly in Russian, several of its tweets are in English or even in French, and are very focused on politics.
Interestingly, the relations always appear to go in the same direction: the Russian accounts retweet the English speaking ones, and not the other way around.
According to an article by The Washington Post, several close relatives of Donald Trump have engaged with some of the accounts mentioned in this table, one of which is on the RU side.
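The quantities behind the network graph (edge weights for link colour, node degree for node size) and the direction of cross-language links can be computed as sketched below. The account names and retweet pairs are hypothetical. A graph library would typically take care of layout and drawing, which is not shown.

```python
from collections import Counter, defaultdict


def build_network(retweets):
    """retweets: iterable of (retweeter, original_author) pairs.

    Returns (edge_weights, degree): edge_weights maps a directed edge to its
    number of retweets (usable for link colour), degree maps each account to
    its number of distinct neighbours (usable for node size).
    """
    edge_weights = Counter(retweets)
    neighbours = defaultdict(set)
    for src, dst in edge_weights:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    degree = {node: len(nbrs) for node, nbrs in neighbours.items()}
    return edge_weights, degree


def cross_language_flows(edge_weights, language):
    """Sum retweets between language groups, keyed by (retweeter_lang, author_lang)."""
    flows = Counter()
    for (src, dst), weight in edge_weights.items():
        if language[src] != language[dst]:
            flows[(language[src], language[dst])] += weight
    return flows


# Hypothetical accounts and retweets.
RETWEETS = [("ru1", "en1"), ("ru1", "en1"), ("ru2", "en1"), ("ru1", "ru2"), ("en1", "en2")]
LANGUAGE = {"ru1": "ru", "ru2": "ru", "en1": "en", "en2": "en"}

weights, degree = build_network(RETWEETS)
print(degree)
print(cross_language_flows(weights, LANGUAGE))
```

A one-sided result such as `{('ru', 'en'): 3}`, with no `('en', 'ru')` entry, is what the one-way relation described above would look like in this representation.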
In brief
Regarding retweets, the IRA trolls are split into two main groups: Russian-speaking and English-speaking. The Russian trolls tend to interact much more with each other, as we can see both on the bar charts and on the network graph. They also sometimes reuse posts from American trolls, while the reverse does not appear to happen. Furthermore, they seem to focus their activity on a handful of very popular media, most of which are directly linked to the government.
English-speaking trolls, on the other hand, seem to use retweets very actively and to diversify their sources. The graph below shows that they are also much more successful at spreading their messages through retweets: the posts written in English by the trolls got more than twice as many retweets as the posts in Russian.
URLs
Another way to better understand what the trolls are talking about is to analyze the URLs present in their tweets. There are more than 4 million of them, and some tweets contain more than one.
Domains
Let us first get an overview of the most used domains, and compare them based on the language of the tweet in which they appear. Clearly, the popularity of the domains is mostly determined by Russian tweets. We can already see that some websites are used a lot by both communities. bit.ly, ift.tt, goo.gl and j.mp are all URL shorteners; ift.tt also provides social media management services, like dlvr.it. twitter.com and youtu.be are of course very popular social media, and also offer shortened versions of their URLs. Such short URLs are very useful on Twitter due to the character limit, which explains the popularity of those domains.
Splitting the domains between the two languages makes them easier to compare and analyze. The most popular domains in English tweets are rather unsurprising: Twitter, many URL shorteners and popular video platforms like YouTube and Vine. The only odd domain is LoseFatTips.pw. From what we can find on the Wayback Machine, it seems to simply be a shady, now defunct website that used to redirect visitors to Asian websites. Its presence in the top 10 is probably due to one account spamming it constantly.
The popular domains found in Russian tweets are a bit more informative than the English-language ones. The usual URL shorteners and social media management tools are present, but they compete with several news websites. Several of these are recognizable, as they correspond to some of the most retweeted Twitter handles: riafan.ru, gazeta.ru, russian.rt.com, nevnov.ru and vesti.ru.
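Per-language domain counts like the ones discussed here can be obtained with the standard library's URL parser. This is a minimal sketch on hypothetical (language, URLs) pairs. Dropping a leading "www." is our normalization choice.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def domain_counts(tweets):
    """tweets: iterable of (language, list of URLs).
    Returns {language: Counter of domains}, with a leading 'www.' dropped."""
    counts = defaultdict(Counter)
    for lang, urls in tweets:
        for url in urls:
            host = urlparse(url).hostname or ""  # hostname is lower-cased, port removed
            if host.startswith("www."):
                host = host[len("www."):]
            if host:
                counts[lang][host] += 1
    return counts


# Hypothetical tweets with their extracted URLs.
TWEETS = [
    ("en", ["https://bit.ly/abc", "https://www.youtube.com/watch?v=1"]),
    ("ru", ["https://riafan.ru/x", "http://bit.ly/y", "https://riafan.ru/z"]),
]
for lang, counter in domain_counts(TWEETS).items():
    print(lang, counter.most_common(3))
```

Shortener domains such as bit.ly hide the real destination. Resolving them would require following each redirect over the network, which this sketch does not attempt.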
URLs
Going a bit deeper, we observe that some URLs are repeated a few hundred times. They are listed in the table below, according to the language of the associated tweets. Although several links are dead, some of them can be retrieved thanks to the Wayback Machine. Russian sites can be translated with an online tool such as Lexilogos to get an idea of their content.
The English-speaking side, however, is more diverse. There are some political (fake) news websites: U.S. Freedom Army, covfefe.bz, or even The Telegraph, a British newspaper with a notably conservative stance that has been accused of spreading pro-Russian propaganda. The most intriguing part of this ranking is the presence of four web radio stations. They do not appear to be related, and only one of them offers some kind of news reports. Their purpose is unclear.
In brief
The URLs are coherent with what we learned previously. Besides the usual popular websites, we can clearly distinguish the behavior of the trolls depending on their language. Russian-speaking accounts focus on spreading news reports, while English-speaking ones tend to diversify their activities and look more like casual users.
Conclusion
To conclude, our analysis showed that a creative mind can say many things while performing classical data analysis on the IRA tweets dataset. As previously mentioned, we sometimes faced a lack of data, mostly data about the real effects of (over)exposure to fake news and trolls on people's opinions. Such information sometimes exists but is either incomplete or not freely available. After realizing that we could not completely answer all of our initial questions, we started to think differently: we set out to uncover as many pieces of evidence as possible, that is, traces of possible interference by the IRA's trolls in both American and Ukrainian politics.
At the end of the day, we would like to stress that the troll accounts appear to have been correctly identified. Our analysis revealed so many correlations and coinciding events that those accounts clearly set out to spread misinformation on the Internet.
Here are, for each analysis, the lessons we can learn:
- The temporal analysis showed that the trolls' volume of activity closely matches (political) media events, especially the US 2016 presidential election and the Crimean armed conflict (Donbass' war).
- The textual analysis shed light on the fact that trolls tend to use a vocabulary that is, in general, more negative. Their favourite word is Trump!
- The retweet and URL analyses showed that Russian-speaking accounts interact more with each other and focus on spreading messages from their government's news media. Meanwhile, English-speaking accounts look more casual, and manage to get retweeted more frequently.