Twitter sentiment analysis based on affective lexicons with R

Let's keep digging into tweets. In the previous post we reviewed how to count positive, negative and neutral tweets, and since then I have come across another great idea. Suppose a positive or negative mark is not enough and we want to understand the degree of positivity or negativity. For example, the word “good” might have a rating of 4 points, while “perfect” has 6. In this way we can try to measure the level of satisfaction or opinion in tweets and produce a chart with a trend like the following:


We need another dictionary for this task, specifically a dictionary with word ratings. We can create it ourselves or use the results of existing research on affective ratings (e.g. here).
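To make the dictionary format concrete, here is a minimal sketch of what such a file could look like. The column names Word and Rating are the ones the script in this post assumes; the three words and their ratings are just the illustrative values used in this post, not taken from any real lexicon, and rating_of is a hypothetical helper.

```r
# Toy dictionary: the words and ratings are illustrative values only
valence <- data.frame(
  Word = c("bad", "good", "perfect"),
  Rating = c(1.5, 4, 6),
  stringsAsFactors = FALSE
)

# Round-trip through a CSV file, the format that read.csv() loads
dict_file <- tempfile(fileext = ".csv")
write.csv(valence, dict_file, row.names = FALSE)
valence <- read.csv(dict_file, header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: look up the rating of one word, NA if it is not in the dictionary
rating_of <- function(word) valence$Rating[match(word, valence$Word)]

rating_of("good")   # a word that is in the dictionary
rating_of("table")  # a word that is not
```

A real affective lexicon would have thousands of rows, but the lookup works the same way.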

And of course, our algorithm should work around Twitter’s API limitations by accumulating historical data. This approach was described in the previous post.
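As a reminder of how the accumulation works, here is a small sketch with two made-up extractions instead of real API results. The data frames first_pull and second_pull and their contents are hypothetical; only the rbind-then-deduplicate idea comes from the script.

```r
# Two hypothetical extractions of the same search term, taken at different times
first_pull <- data.frame(
  text = c("good phone", "bad battery"),
  created = c("2014-01-01", "2014-01-01"),
  stringsAsFactors = FALSE
)
second_pull <- data.frame(
  text = c("bad battery", "perfect camera"),
  created = c("2014-01-01", "2014-01-02"),
  stringsAsFactors = FALSE
)

# Append the new extraction to the stored one and keep the first copy of each tweet text
stack <- rbind(first_pull, second_pull)
stack <- stack[!duplicated(stack$text), ]

nrow(stack)  # number of distinct tweets accumulated so far
```

Each run adds only the tweets that were not stored before, so the storage file grows beyond what a single API request can return.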

Note that I will use the average rating to evaluate a tweet, based on the ratings of the words it contains. For example, if we find “good” (4 points) and “perfect” (6 points) in a tweet, it is evaluated as (4+6)/2=5. This avoids the influence of several negative words whose total rating would otherwise be higher: one word “good” (4 points) should be rated higher than three words “bad” (1.5 points each, 4.5 in total).
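The difference between summing and averaging can be checked in a couple of lines. This is just the arithmetic from the paragraph above; the variable names are mine.

```r
one_good         <- c(4)              # a tweet with one "good"
three_bad        <- c(1.5, 1.5, 1.5)  # a tweet with three "bad"
good_and_perfect <- c(4, 6)           # a tweet with "good" and "perfect"

# With a sum, the negative tweet would rank higher than the positive one
sum(one_good)
sum(three_bad)

# With an average, the order matches intuition
mean(one_good)
mean(three_bad)
mean(good_and_perfect)
```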

Let’s start. We need to create a Twitter Application in order to have access to Twitter’s API. Then we will get a Consumer Key and a Consumer Secret. And finally, our code in R:

#connect all libraries
library(twitteR)  #searchTwitter(), twListToDF()
library(ROAuth)   #OAuthFactory
library(plyr)     #laply()
library(stringr)  #str_split()
library(ggplot2)  #ggplot(), ggsave(); 'mean_cl_normal' also needs the Hmisc package installed

#connect to API
download.file(url='', destfile='cacert.pem') #put the URL of the CA certificate bundle
reqURL <- ''    #put the OAuth request token URL
accessURL <- '' #put the OAuth access token URL
authURL <- ''   #put the OAuth authorize URL
consumerKey <- '____________' #put the Consumer Key from Twitter Application
consumerSecret <- '______________'  #put the Consumer Secret from Twitter Application
Cred <- OAuthFactory$new(consumerKey=consumerKey,
                         consumerSecret=consumerSecret,
                         requestURL=reqURL,
                         accessURL=accessURL,
                         authURL=authURL)
Cred$handshake(cainfo = system.file('CurlSSL', 'cacert.pem', package = 'RCurl')) #A URL appears in the Console. Open it, get the code and enter it in the Console
save(Cred, file='twitter authentication.Rdata')
load('twitter authentication.Rdata') #After the first run you can start from this line in the future (libraries should be connected)
registerTwitterOAuth(Cred) #register the credentials for the session (older twitteR releases)

#the function for extracting and analyzing tweets
search <- function(searchterm) {
  #extract tweets and create storage file
  list <- searchTwitter(searchterm, cainfo='cacert.pem', n=1500)
  df <- twListToDF(list)
  df <- df[, order(names(df))]
  df$created <- strftime(df$created, '%Y-%m-%d')
  if (!file.exists(paste(searchterm, '_stack_val.csv'))) write.csv(df, file=paste(searchterm, '_stack_val.csv'), row.names=FALSE)

  #merge the last extraction with storage file and remove duplicates
  stack <- read.csv(file=paste(searchterm, '_stack_val.csv'))
  stack <- rbind(stack, df)
  stack <- subset(stack, !duplicated(stack$text))
  write.csv(stack, file=paste(searchterm, '_stack_val.csv'), row.names=FALSE)

  #tweets evaluation function
  score.sentiment <- function(sentences, valence, .progress='none') {
    scores <- laply(sentences, function(sentence, valence){
      sentence <- gsub('[[:punct:]]', '', sentence) #remove punctuation
      sentence <- gsub('[[:cntrl:]]', '', sentence) #remove control characters
      sentence <- gsub('\\d+', '', sentence) #remove digits
      sentence <- tolower(sentence) #convert to lower case
      word.list <- str_split(sentence, '\\s+') #separating words
      words <- unlist(word.list)
      val.matches <- match(words, valence$Word) #find words from tweet in "Word" column of dictionary
      val.match <- valence$Rating[val.matches] #ratings of the words which were found (suppose rating is in "Rating" column of dictionary)
      val.match <- na.omit(val.match)
      val.match <- as.numeric(val.match)
      score <- sum(val.match)/length(val.match) #rating of tweet (average value of evaluated words); NaN if no word was found
      return(score)
    }, valence, .progress=.progress)
    scores.df <- data.frame(score=scores, text=sentences) #save results to the data frame
    return(scores.df)
  }

  valence <- read.csv('dictionary.csv', sep=',' , header=TRUE) #load dictionary from .csv file
  Dataset <- stack
  Dataset$text <- as.factor(Dataset$text)
  scores <- score.sentiment(Dataset$text, valence, .progress='text') #start score function
  write.csv(scores, file=paste(searchterm, '_scores_val.csv'), row.names=TRUE) #save evaluation results into the file

  #modify evaluation
  stat <- scores
  stat$created <- stack$created
  stat$created <- as.Date(stat$created)
  stat <- na.omit(stat) #delete unvalued tweets
  write.csv(stat, file=paste(searchterm, '_opin_val.csv'), row.names=TRUE)

  #chart: points for single tweets, smoothed mean as a trend
  #(recent ggplot2 versions take mult through fun.args)
  p <- ggplot(stat, aes(created, score)) + geom_point(size=1) +
    stat_summary(fun.data = 'mean_cl_normal', fun.args = list(mult = 1), geom = 'smooth')
  ggsave(filename=paste(searchterm, '_plot_val.jpeg'), plot=p)
}

search("______") #enter keyword
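If you want to try the scoring step without a Twitter connection, the sketch below repeats the same cleaning and averaging logic in base R on a few made-up tweets. The function score_tweet, the toy dictionary and the sample tweets are my assumptions for illustration; it uses strsplit() and sapply() instead of str_split() and laply() so that it needs no extra packages.

```r
# Toy dictionary with the illustrative ratings used in this post
valence <- data.frame(Word = c("bad", "good", "perfect"),
                      Rating = c(1.5, 4, 6),
                      stringsAsFactors = FALSE)

# Hypothetical helper: average rating of the dictionary words found in one tweet
score_tweet <- function(sentence, valence) {
  sentence <- gsub("[[:punct:]]", "", sentence)  # remove punctuation
  sentence <- gsub("[[:cntrl:]]", "", sentence)  # remove control characters
  sentence <- gsub("\\d+", "", sentence)         # remove digits
  words <- unlist(strsplit(tolower(sentence), "\\s+"))
  ratings <- valence$Rating[match(words, valence$Word)]
  ratings <- ratings[!is.na(ratings)]            # keep only words found in the dictionary
  sum(ratings) / length(ratings)                 # 0/0 gives NaN when nothing was found
}

# Made-up tweets: positive, negative, and one without any rated word
tweets <- c("Good screen, PERFECT sound!", "bad bad bad", "just landed in Kyiv")
scores <- data.frame(
  score = sapply(tweets, score_tweet, valence = valence, USE.NAMES = FALSE),
  text = tweets,
  stringsAsFactors = FALSE
)

# na.omit() also drops NaN rows, which is how unvalued tweets are removed
rated <- na.omit(scores)
```

The third tweet contains no dictionary words, so its score is NaN and it disappears from rated. That is the same mechanism the listing above relies on when it deletes unvalued tweets.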


Finally, we will get 4 files:

  • a storage file with the initial data,
  • a file with tweet ratings,
  • a cleaned file (without unvalued tweets) with tweets and dates,
  • a chart showing the density of tweet ratings and the mean as a trend, which looks like:


  • macwanjason

    Nice set of articles. Thanks!

    • AnalyzeCore

      Thank you! Shares are appreciated :)

  • Klescoet

    I have a certificate problem during authentication.
    Is your CA up to date?

    here is the error message:

    Error: SSL certificate problem, verify that the CA cert is OK. Details:
    error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

    • AnalyzeCore

      Hello Klescoet!

      I’ve just checked, and it works for me. Make sure you have registered a Twitter app; they changed the link for that. Now you can register it on .

  • salmi


    I have a problem with a warning message

    here is the error message:
    Saving 11.9 x 7.96 in image
    There were 50 or more warnings (use warnings() to see the first 50)

    • AnalyzeCore

      Try passing some extra parameters to the ggsave() function (e.g. width, height and dpi), which set up the image.

      • Santosh

        Hi sir,
        it’s not the problem with ggplot. Also, could you please update the code for more suitable version of connecting to twitter because afaik, v1.1 is deprecated. I have some concept issues in twitter sentiment analysis, it would be great if I could talk to you personally.

        • AnalyzeCore

          Yes, this warning is probably connected with the values. I’m not sure that I will update the code every time they change something. There are a lot of articles on the internet about how to connect to Twitter’s API. Therefore, you could combine the core idea of the analysis from my article with a current way of connecting.

          • Santosh

            Sorry sir, it was not so subtle of me to ask you in that way, and the intention was purely an academic concern. However, I am unable to trace the actual part of the code to be debugged and I defer to your expertise. In that connection I would like to state that the problem, or at least the first 50 warnings, concerned NA values irrespective of the topic/search term given. In this regard, it would be quite helpful if you could tell us where the mistake might have happened.
            Also, as mentioned above, I have some conceptual issues with Twitter sentiment analysis which can’t be discussed at length here, the topic being too broad and sensitive. Hence I ask your permission to talk with you privately in this regard. With all respect sir, I admire your work and the service it provides. Thank you.

          • AnalyzeCore

            Santosh, you can easily find my contacts on the right side of a blog’s page.