This project was completed as the final research project for INFO-640 Data Analysis in Fall 2020. It explores whether sentiment about wearing masks to prevent COVID-19 infection changed over time in relation to President Donald Trump’s positive diagnosis on October 3, 2020. Using R and the Twitter API, I performed a sentiment analysis on tweets from an 8-week period centered on Trump’s diagnosis. The analysis shows an increase in positive sentiment in tweets during the week of the diagnosis.
Following the event announcing the Supreme Court nomination of Amy Coney Barrett at the White House Rose Garden on September 26, 2020, the President and many attendees were infected with COVID-19. In photos of the event, held both outdoors and indoors, attendees can be seen not wearing masks. Could public sentiment about wearing masks change as a result of this outbreak? Would it be a wake-up call for some, or would attitudes remain unchanged? If attitudes did change, are there patterns to be seen in the Twitter data? I explore these questions with a Twitter sentiment analysis covering the month before and the month after the White House outbreak. Since the outbreak and Trump’s recovery lasted two to three weeks, a window of four weeks on either side makes it possible to measure sentiment before and after the amplified breaking-news cycle and to determine whether attitudes changed.
Using the Twitter API and the rtweet R package, I extracted 10,000 tweets containing the keyword ‘mask’, along with their associated metadata, from September 2 through October 30 (roughly one month before and one month after the White House COVID-19 outbreak).
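A minimal sketch of how such a pull might look with rtweet is below. Note that the standard search endpoint only returns tweets from roughly the past week, so covering this full date range would require repeated pulls or elevated API access; the object name `mask_tweets` and the language filter are my own illustrative choices.

```r
library(rtweet)

# Pull tweets matching the keyword 'mask'; assumes a Twitter API token has
# already been configured for rtweet.
mask_tweets <- search_tweets(
  q = "mask",
  n = 10000,            # number of tweets requested
  include_rts = FALSE,  # retweets are dropped later in any case
  lang = "en"           # illustrative assumption: restrict to English tweets
)

# Save the raw pull so the analysis can be rerun without hitting the API again
saveRDS(mask_tweets, "mask_tweets_raw.rds")
```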
This dataset was cleaned using the tidyverse and tidytext R packages. The first pass over the tweet text removed punctuation, converted all text to lowercase, and stripped extra whitespace. The next steps removed external links, retweets, and stop words. There is an argument for keeping retweets, since they may indicate a sentiment, but I exclude them on the basis that they may fall outside the date range being analyzed for this project.
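A rough sketch of these cleaning steps is below, assuming the raw pull is stored in `mask_tweets` with the rtweet column names `text`, `is_retweet`, `status_id`, and `created_at`.

```r
library(dplyr)
library(stringr)
library(tidytext)

tidy_tweets <- mask_tweets %>%
  filter(!is_retweet) %>%                                # drop retweets
  mutate(text = str_remove_all(text, "https?://\\S+"),   # remove external links
         text = str_to_lower(text),                      # convert to lowercase
         text = str_remove_all(text, "[[:punct:]]"),     # remove punctuation
         text = str_squish(text)) %>%                    # strip extra whitespace
  select(status_id, created_at, text) %>%                # keep what the analysis needs
  unnest_tokens(word, text) %>%                          # one word per row
  anti_join(stop_words, by = "word")                     # remove stop words
```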
Next, I performed a sentiment analysis of the data using the tidytext R package and the Bing and NRC sentiment lexicons to determine the distribution of negative and positive attitudes toward mask-wearing over the month before and after the White House coronavirus outbreak. I visualized the results with ggplot2 plots and wordclouds. Finally, I used the topicmodels R package to topic model tweets containing the words ‘mask’ and ‘trump’ during the week of Trump’s COVID-19 diagnosis.
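A minimal sketch of the Bing scoring step, assuming `tidy_tweets` from the cleaning sketch above:

```r
library(dplyr)
library(tidytext)

# Label each tokenized word as positive or negative with the Bing lexicon;
# words that do not appear in the lexicon are dropped by the inner join.
bing_words <- tidy_tweets %>%
  inner_join(get_sentiments("bing"), by = "word")
```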
As previously mentioned, I omitted retweets because they may fall outside the date range of this analysis. I also excluded variables such as profile URL, profile photo, and follower count, because they do not provide any meaningful information for this text analysis.
After cleaning the tweet data, I used the ggplot2 package to visualize the most frequent words in the tweet dataset.
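A sketch of one way to plot word frequencies with ggplot2 is below, assuming `tidy_tweets` from the cleaning step; the published figure may have used a different chart type.

```r
library(dplyr)
library(ggplot2)

tidy_tweets %>%
  count(word, sort = TRUE) %>%
  slice_max(n, n = 20) %>%                   # keep the 20 most frequent words
  ggplot(aes(x = n, y = reorder(word, n))) +
  geom_col() +
  labs(x = "Count", y = NULL,
       title = "Most frequent words in mask-related tweets")
```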
Not surprisingly, ‘mask’, ‘wear’, and ‘wearing’ are the top three words, followed by ‘people’ and ‘trump’. Most of the other words appear at similar frequencies. To understand this better, the next stage is a sentiment analysis of the mask-related tweets. A sentiment-based wordcloud is a good place to start.
The wordcloud above, built with the Bing sentiment lexicon, looks a bit different from the previous one because it emphasizes the frequency of words carrying positive or negative sentiment rather than simply the most frequently used words. Interestingly, ‘trump’ appears with a positive sentiment in many of the tweets. We will explore this further.
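One common way to build such a sentiment-split wordcloud is comparison.cloud() from the wordcloud package; below is a sketch assuming `bing_words` from the scoring step above, with illustrative color and size choices.

```r
library(dplyr)
library(reshape2)
library(wordcloud)

# Reshape the Bing-labelled words into a word-by-sentiment count matrix and
# draw a comparison cloud of negative vs. positive words.
bing_words %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("firebrick", "steelblue"), max.words = 100)
```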
The visualization above illustrates the breakdown of positive and negative sentiments a bit more clearly. On the negative side, many tweets contain expletives, perhaps expressions of anger or frustration. On the positive side, we again see ‘trump’ ranked at the top of the positive words. Considering the context in which these tweets were created, with the presidential campaigns in their final stretch and a COVID-19 outbreak at the White House, I find this quite surprising.
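A sketch of this kind of breakdown plot, again assuming `bing_words`; the number of words shown per panel is an illustrative choice.

```r
library(dplyr)
library(ggplot2)

bing_words %>%
  count(word, sentiment, sort = TRUE) %>%
  group_by(sentiment) %>%
  slice_max(n, n = 10) %>%                       # top 10 words per sentiment
  ungroup() %>%
  ggplot(aes(x = n, y = reorder(word, n), fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ sentiment, scales = "free_y") +
  labs(x = "Contribution to sentiment", y = NULL)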
To better understand this, I ran a similar sentiment analysis (also with the Bing lexicon) that looks at sentiment week-by-week over the 8-week timeframe covered by this dataset.
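A rough sketch of the weekly breakdown, assuming `tidy_tweets` keeps the `created_at` timestamp; floor_date() bins tweets into weeks, and reorder_within() keeps each facet’s bars sorted. The exact layout of the published plot may differ.

```r
library(dplyr)
library(lubridate)
library(tidytext)
library(ggplot2)

tidy_tweets %>%
  mutate(week = floor_date(as.Date(created_at), unit = "week")) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(week, sentiment, word, sort = TRUE) %>%
  group_by(week) %>%
  slice_max(n, n = 8) %>%                        # top contributing words each week
  ungroup() %>%
  ggplot(aes(x = n, y = reorder_within(word, n, week), fill = sentiment)) +
  geom_col() +
  scale_y_reordered() +
  facet_wrap(~ week, scales = "free_y") +
  labs(x = "Count", y = NULL)
```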
The visualization above shows ‘trump’ at the top in most weeks, but it is highest on October 6, three days after Trump’s diagnosis. What does this mean? Was Trump’s diagnosis positively received? Did an overwhelming number of people wish him well? Or is it possible that the sentiment of surprise is interpreted as positive by the Bing lexicon?
To get a better sense of this, I ran a sentiment analysis using the NRC Word-Emotion Association Lexicon, which associates words with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) as well as with positive and negative sentiment.
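A sketch of the NRC emotion counts over time, assuming `tidy_tweets` with its `created_at` column; note that get_sentiments("nrc") requires the textdata package and acceptance of the lexicon’s terms of use.

```r
library(dplyr)
library(tidytext)
library(ggplot2)

tidy_tweets %>%
  mutate(day = as.Date(created_at)) %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%
  filter(!sentiment %in% c("positive", "negative")) %>%  # keep the eight emotions
  count(day, sentiment) %>%
  ggplot(aes(x = day, y = n, color = sentiment)) +
  geom_line() +
  labs(x = NULL, y = "Word count", color = "Emotion")
```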
The visualization above shows that the only emotion that seems to increase significantly around Trump’s October 3 diagnosis is surprise. This could explain what the Bing lexicon assesses as a spike in positive sentiment at the same time. As the line graph below shows, it is a substantial spike.
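A minimal sketch of how such a line graph could be produced, assuming `bing_words` from the Bing scoring step:

```r
library(dplyr)
library(ggplot2)

# Daily count of positive-sentiment words over the 8-week window
bing_words %>%
  filter(sentiment == "positive") %>%
  mutate(day = as.Date(created_at)) %>%
  count(day) %>%
  ggplot(aes(x = day, y = n)) +
  geom_line() +
  labs(x = NULL, y = "Positive words per day")
```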
To attempt to understand these sentiments in further context, I decided to do some topic modeling of the tweets during the week of Trump’s COVID diagnosis.
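A rough sketch of the topic-modeling step, assuming `tidy_tweets` retains `status_id` and `created_at`; the date window, the filter for tweets mentioning ‘trump’, and the choice of k = 4 topics are illustrative assumptions rather than the project’s exact settings.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Keep tweets from the week of the diagnosis that also mention 'trump'
# (all tweets already contain 'mask' from the original search).
diagnosis_week <- tidy_tweets %>%
  filter(as.Date(created_at) >= as.Date("2020-10-03"),
         as.Date(created_at) <= as.Date("2020-10-09")) %>%
  group_by(status_id) %>%
  filter(any(word == "trump")) %>%
  ungroup() %>%
  count(status_id, word)

# Cast to a document-term matrix and fit an LDA model
tweet_dtm <- cast_dtm(diagnosis_week, document = status_id, term = word, value = n)
tweet_lda <- LDA(tweet_dtm, k = 4, control = list(seed = 1234))

# Inspect the top terms per topic
tidy(tweet_lda, matrix = "beta") %>%
  group_by(topic) %>%
  slice_max(beta, n = 10)
```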
The results of this topic modeling do not provide clear context for the different conversations about mask-wearing and Trump’s diagnosis happening on Twitter at the time. There is significant overlap among the topics and no very clear distinction between them.
At the inception of this project, my question was whether sentiment about the practice of wearing masks would change over time following Trump’s diagnosis. I do not believe the findings of my analysis indicate a meaningful change. It seems, instead, that tweet sentiment is tied to the news cycle and to responses to major events. This raises the question: is Twitter simply reactive to events, rather than a place to examine a nuanced collective thought process? If it is the former, how would one design an analysis that goes deeper than reactions over a set window of time? In the case of sentiment analysis, would a tendency toward reactiveness elicit stronger sentiments? If so, how do we interpret them? Is there a way to go deeper or broader to contextualize them?