Word Cloud on the Second Presidential Debate: SSIS and PowerBI

With the recent debates in the US presidential election, we felt there was a good opportunity to do some text analytics on the second presidential debate. We used the links in the reference section below for most of the analysis, and the results are presented as is. The only intention of this post is to show that interesting analytics can be done in the MSBI stack.

First we loaded the transcript into MS SQL Server and then used the SSIS Term Extraction transformation, with its default settings, to extract the terms.

Inside the Data Flow Task (DFT)
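For readers without SSIS at hand, the idea behind the extraction step can be roughly approximated in a few lines of Python. This is only a sketch under our own assumptions: it counts plain words after removing a small, hypothetical stop-word list, whereas the SSIS transformation applies its own linguistic rules. The function name, the stop-word list, the `min_frequency` value and the sample sentence are all made up for illustration and are not taken from the transcript.

```python
import re
from collections import Counter

# Hypothetical stop-word list for this sketch; SSIS uses its own rules instead.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
              "that", "we", "i", "you", "for", "on", "with", "have", "this"}

def extract_terms(text, min_frequency=2):
    """Return {term: frequency} for words seen at least min_frequency times."""
    # Lower-case the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Drop stop words and single-character tokens, then count what is left.
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 1)
    return {term: n for term, n in counts.items() if n >= min_frequency}

# Made-up sentence, not a quote from the debate.
sample = ("We have a problem. It is a big problem for the country "
          "and the people of this country.")
print(extract_terms(sample))
```

The output of a step like this (a term and a score per speaker) is the shape of data that the word cloud visual needs.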

Once the terms were extracted, the rest of the analysis was performed in the PowerBI portal after downloading the Word Cloud app (link in the reference section).

TRUMP’S WORD CLOUD FOR THE SECOND DEBATE

At first glance it’s obvious: “Donald Trump used a lot of words.”

 

CLINTON’S WORD CLOUD FOR THE SECOND DEBATE

 

Most frequently used words for each candidate by Score

Candidate  Term             Score
Clinton    America          12
Clinton    child            14
Clinton    country          24
Clinton    Donald           23
Clinton    lot              29
Clinton    people           42
Clinton    president        18
Clinton    way              12
Clinton    woman            14
Clinton    year             13
Trump      country          31
Trump      disaster         11
Trump      Hillary          14
Trump      Hillary Clinton  12
Trump      ISIS             15
Trump      lot              12
Trump      money            14
Trump      people           46
Trump      problem          11
Trump      Russia           17
Trump      tax              17
Trump      thing            16
Trump      way              11
Trump      word             11
Trump      year             17
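The scores in the table above can also be compared directly. The following is a small sketch, not part of our SSIS or PowerBI workflow, that ranks each candidate's terms and lists the terms both candidates share. The numbers are copied from the table; the helper name `top_terms` is our own.

```python
# Term scores copied from the table above.
clinton = {"America": 12, "child": 14, "country": 24, "Donald": 23, "lot": 29,
           "people": 42, "president": 18, "way": 12, "woman": 14, "year": 13}
trump = {"country": 31, "disaster": 11, "Hillary": 14, "Hillary Clinton": 12,
         "ISIS": 15, "lot": 12, "money": 14, "people": 46, "problem": 11,
         "Russia": 17, "tax": 17, "thing": 16, "way": 11, "word": 11, "year": 17}

def top_terms(scores, n=3):
    """Return the n highest-scoring (term, score) pairs, highest first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Terms that appear in both candidates' lists.
shared = sorted(set(clinton) & set(trump))

print(top_terms(clinton))
print(top_terms(trump, n=2))
print(shared)
```

Both candidates have "people" as their top term, and five terms (country, lot, people, way, year) appear on both lists.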

 

We used lists of positive and negative sentiment words from a Twitter sentiment analysis tutorial (links in the reference section) to see which candidate used which kind of words. Trump is the clear winner for negative words, 14 to 3, while the two are roughly evenly matched for positive words, at 5 to 3 in Clinton’s favor.
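The lexicon matching itself is simple enough to sketch in Python. This is an illustration under stated assumptions rather than the exact steps we ran: it assumes the word lists hold one word per line with header comment lines starting with ";", and it counts distinct matching terms rather than occurrences. The function names, the toy lexicons and the toy term list are hypothetical and are not the real files or the debate data.

```python
def parse_lexicon(text):
    """Parse a one-word-per-line lexicon, skipping blanks and ';' comment lines."""
    words = set()
    for line in text.splitlines():
        line = line.strip().lower()
        if line and not line.startswith(";"):
            words.add(line)
    return words

def sentiment_counts(terms, positive, negative):
    """Count how many distinct terms appear in each lexicon."""
    unique = {t.lower() for t in terms}
    return {"positive": len(unique & positive),
            "negative": len(unique & negative)}

# Toy lexicons for illustration only; the real lists are linked in the references.
positive = parse_lexicon("; toy positive list\ngood\ngreat\n")
negative = parse_lexicon("; toy negative list\nbad\ndisaster\nproblem\n")

# Toy term list, not the extracted debate terms.
terms = ["country", "disaster", "problem", "great", "people"]
print(sentiment_counts(terms, positive, negative))
```

Running the same count once per candidate over the extracted terms gives a pair of positive and negative totals that can be compared side by side.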

There are some other interesting points to note: Donald Trump used a lot of words that revolve around terror and ISIS, while Hillary Clinton used a lot of words relating to healthcare and the Supreme Court.

We hope you found this post interesting. Full disclosure: we haven’t actually listened to the debate, since that wasn’t the point of the exercise. The point was to use a machine to summarize a debate and provide unbiased text analytics.

References

http://www.politico.com/story/2016/10/2016-presidential-debate-transcript-229519

https://github.com/jeffreybreen/twitter-sentiment-analysis-tutorial-201107/blob/master/data/opinion-lexicon-English/positive-words.txt

https://github.com/jeffreybreen/twitter-sentiment-analysis-tutorial-201107/blob/master/data/opinion-lexicon-English/negative-words.txt

https://app.powerbi.com/visuals/