Let's keep digging into tweets. After reviewing how to count positive, negative and neutral tweets in the previous post, I came up with another idea. Suppose a simple positive/negative label is not enough and we want to measure the degree of positivity or negativity. For example, «good» in a tweet might carry a rating of 4 points, while «perfect» carries 6. This way we can try to measure the level of satisfaction or opinion expressed in tweets.
So we need a different dictionary for this task, one that assigns a rating to each word. We can build one ourselves or use the results of existing research on affective word ratings (e.g. here).
And of course, our algorithm should work around the Twitter API's limitations by accumulating historical data. This approach was described in the previous post.
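As a rough illustration of the accumulation idea (not the previous post's actual code), here is a minimal Python sketch, assuming each fetched tweet is a dict with a unique `id` field. The helper names `load_history`, `save_history` and `merge_batch` are hypothetical, and the call to the Twitter API itself is left out.

```python
import json
from pathlib import Path


def load_history(path):
    """Load previously stored tweets as {id: tweet}, or start empty if no file exists."""
    p = Path(path)
    if not p.exists():
        return {}
    return {t["id"]: t for t in json.loads(p.read_text())}


def save_history(path, history):
    """Persist the accumulated tweets as a JSON list so the next run can extend them."""
    Path(path).write_text(json.dumps(list(history.values())))


def merge_batch(history, batch):
    """Add a new batch of tweets to the history, skipping ids we already have."""
    merged = dict(history)
    for tweet in batch:
        merged.setdefault(tweet["id"], tweet)
    return merged


# Two overlapping batches: tweet 2 appears in both but is stored only once.
history = {}
history = merge_batch(history, [{"id": 1, "text": "good"}, {"id": 2, "text": "bad"}])
history = merge_batch(history, [{"id": 2, "text": "bad"}, {"id": 3, "text": "perfect"}])
print(sorted(history))  # [1, 2, 3]
```

Keying by tweet id makes repeated searches over overlapping time windows harmless, since duplicates are simply ignored.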
Note that I use the average rating of the rated words found in a tweet. For example, if a tweet contains «good» (4 points) and «perfect» (6 points), it is scored as (4+6)/2=5. This is better than using the total sum, because every word in the dictionary has a positive rating. For example, one «good» (4 points) should score higher than three «bad» (1.5 points each), yet the sum would give 4.5 for the «bad» tweet versus 4 for the «good» one. The average solves this: 1.5 versus 4.
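The averaging rule can be sketched in Python as follows. This is a minimal sketch, assuming a plain word-to-rating dictionary; the `RATINGS` values are only the example numbers from the text, and `tweet_score` is a hypothetical helper. The simple lowercase regex tokenizer is also an assumption; a real affective-ratings list would be loaded from file instead.

```python
import re

# Hypothetical rating dictionary using the example values from the text.
RATINGS = {"good": 4.0, "perfect": 6.0, "bad": 1.5}


def tweet_score(text, ratings=RATINGS):
    """Average rating of the dictionary words found in a tweet, or None if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    found = [ratings[w] for w in words if w in ratings]
    if not found:
        return None  # no rated words: leave the tweet out of the statistics
    return sum(found) / len(found)


print(tweet_score("Good food and a perfect view"))  # 5.0
print(tweet_score("Good"))  # 4.0
print(tweet_score("Bad, bad, bad"))  # 1.5 (a plain sum would give 4.5, above "good")
```

Returning `None` rather than 0 for tweets without rated words keeps them from dragging the average down.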