Cutting through the noise to motivate people: A comprehensive analysis of COVID-19 social media posts de/motivating vaccination

Ashiqur Rahman, Ehsan Mohammadi, Hamed Alhoori

Abstract

The COVID-19 pandemic exposed significant weaknesses in the healthcare information system. The overwhelming volume of misinformation on social media, together with other socioeconomic factors, made it extraordinarily challenging to motivate people to take proper precautions and get vaccinated. In this context, our work explored a novel direction by analyzing an extensive dataset collected over two years and identifying the topics de/motivating the public about COVID-19 vaccination. We analyzed these topics based on time, geographic location, and political orientation. We noticed that while the motivating topics remain the same over time and geographic location, the demotivating topics change rapidly. We also identified that intrinsic motivation is more effective at inspiring the public than external mandates. This study addresses scientific communication and public motivation in social media. It can help public health officials, policymakers, and social media platforms develop more effective messaging strategies to cut through the noise of misinformation and educate the public about scientific findings.

keywords:

misinformation, motivation, vaccine hesitancy, science communication, social media, social psychology

Journal: Natural Language Processing Journal

Affiliations:

Department of Computer Science, Northern Illinois University, 100 Normal Road, DeKalb, IL 60115, USA

School of Information Science, University of South Carolina, 800 Sumter Street, Columbia, SC 29208, USA

1 Introduction

Social media plays a vital role in modern life by enabling communication, disseminating information, and steering social conversations [99]. These features are even more impactful during a crisis [75]. We have seen this during major social events such as mass shootings, natural disasters, national elections, and even anti-vaccination campaigns [8, 15, 43, 75].

The coronavirus pandemic created significant dependence on social media. While the social web was essential for disseminating healthcare information, making important announcements, and educating the public, misinformation also spread with little oversight. Studies suggest that exposure to information on social media greatly influenced preventative behavior during the COVID-19 pandemic [100, 53]. Hence, the excessive dissemination of misinformation on social media was a significant concern. As soon as the coronavirus emerged, racism, rumor, and fear-mongering started spreading like wildfire on different platforms [27]. Projects such as Poynter [46] and EUvsDisinfo [37] actively monitored and debunked false news, and the World Health Organization (WHO) partnered with major tech companies such as Facebook, Google, LinkedIn, Microsoft, Reddit, and Twitter to fight misinformation [84]. Nevertheless, misinformation remained widely available on these platforms. The WHO director-general called it the fight against ‘trolls and conspiracy theories.’

While vaccination is one of the most effective tools for preventing diseases and keeping communities safe [72, 39], a significant percentage of the population needed to be vaccinated to achieve herd immunity [95] and leave the pandemic behind. For comparison, measles requires 95% and polio requires 80% vaccination coverage to achieve this collective immunity [95]. Research suggests that although herd immunity in the traditional sense might be unattainable against COVID-19 [63], a high vaccination rate reduces the disease’s effect, makes it more manageable [29], and helps return life to normalcy.

Once different vaccines were accessible to the public around the world [94], motivating enough people to get vaccinated quickly became a challenge. Several factors caused vaccine hesitancy, including but not limited to public trust in the development and approval of vaccines, economic disparity, education, and ethnicity [25, 54, 68, 87, 41]. Misinformation on social media played a vital role in reinforcing misconceptions about vaccination. For instance, a survey in the UK revealed that people who relied on social web platforms for information were more reluctant to receive vaccines [69]. Similarly, another study confirmed the connection between clusters of the unconvinced public on Facebook and the networks of anti-vaxxers [50]. The public's understanding of the fairness and transparency of a social media platform also impacts the vaccination decision [92]. Research also suggests that COVID-19 was highly politicized in the mainstream news media [1, 42], contributing to demotivation and distrust of vaccination [86]. The study by Liu et al. [57] shows that the public is less likely to follow directives from authorities when the message is politically polarizing. Delays in achieving the vaccination target had economic impacts as well. A study found that the total monetary harm from “non-vaccination” in the United States was between 50 and 300 million dollars per day [16]. Studies also suggested that disproving misinformation was insufficient and that rebuilding public trust in government institutions and scientific processes was essential [41].

With the advancement and accessibility of Large Language Models (LLMs), which can quickly generate believable but fictional scientific texts without proper references or scientific knowledge, the risk of misleading the public has become even more significant. This makes it essential to find ways to reach the public with factual information, breaking through the clutter of misinformation and disinformation [5, 101].

Therefore, it is crucial to study, identify, and address the factors that demotivate the public and increase hesitancy and distrust towards vaccination. The knowledge gathered from this study can be helpful to policymakers, healthcare workers, and social media platforms to improve the handling of misinformation and alleviate doubts in the community during emergencies.

In this study, we especially focused on the de/motivation of getting vaccines during the COVID-19 pandemic and social media’s role in it. We aim to explore this issue on Twitter with the following research questions:

1. What were the most popular topics on Twitter that were de/motivating people about the COVID-19 vaccine?
2. Which topics are influencing the public stance towards COVID-19 vaccination?
3. Do the motivating and demotivating topics about the COVID-19 vaccine on Twitter change based on time, geographic location, or political landscape within the US?

1.1 Contributions

Throughout the study, we answered the questions above and delivered the following data and machine-learning models. We shared the models and data generated from this work with the research community.

1. A labeled Twitter dataset spanning from January 2020 to December 2021 containing location, motivating status, vaccination stance, and topic label.
2. An analysis of the COVID-19 vaccination topic distribution over time, US states, and political orientation of different states.
3. A machine-learning model to classify tweets as motivating or demotivating about the COVID-19 vaccine.
4. A machine-learning model to identify the COVID-19 vaccination stance of a user based on their tweets.
5. Topic models to identify COVID-19 vaccination-related motivating and demotivating topics on Twitter.

In the next section, we discuss related works in this field, followed by a detailed explanation of the data collection process. Then, we discuss the methods for de/motivation classification of the tweets and topic extraction, followed by models for stance detection. We end the paper with a discussion of our work, limitations, ethical concerns, and conclusion. For reproducibility, we also include all the datasets in Appendix A, models in Appendix B, and environmental parameters in Appendix C.

2 Related Works

2.1 Misinformation and Vaccine

For better or worse, social media has become an increasingly popular method for everyday people to obtain information on various topics like scientific findings, current events, news, political occurrences, and many more. Social media can be an effective way for individuals to stay connected with the outside world and each other. Still, any user can post whatever content they desire, regardless of the validity of the information that the post contains. Thus, a paradox emerges in which everyday people have access to more information at their fingertips than ever before and an increased propensity for exposure to misinformation. This creates a chaotic information landscape characterized by a general inability of people to distinguish between fact and fiction in the pieces of information they encounter. This whirlwind of disseminated information and misinformation dramatically impacts the overall public perception involving specific issues. Many researchers have attempted to analyze the relationship between social media trends and public opinion regarding public health issues, like vaccination and immunization programs [51, 83, 65, 10, 11, 9].

Several researchers have analyzed Twitter content to determine the general public’s opinions on certain vaccines [10, 11, 15, 31, 32, 52]. For example, Becker et al. [10] analyzed the contents of tweets (primarily posted by users in India, Indonesia, and Vietnam) containing sentiments regarding the pediatric pentavalent vaccine (DTP-HepB-Hib) [71] programs in those areas. They found that 37% of the tweets contained negative sentiments, while 63% were positive or neutral; they also indicated that most of the tweets contained links to websites or additional resources and did not add any content or comments of their own. Blankenship et al. [11] took this process a few steps further: they analyzed tweets not only for their sentiments about vaccines, but also for their amount of engagement (retweets), the categories of their content, and the types of curators that posted them. They found no discernible variation in the number of times anti-vaccine tweets were retweeted across content categories. Twitter (12.9%), content curator “Trap It” (3.4%), and the Centers for Disease Control and Prevention (1.9%) were the top three domains among links in pro-vaccine tweets. Additionally, social media sites, including Twitter (14.9%), YouTube (8.4%), and Facebook (3.4%), were the most prevalent among the links in anti-vaccine tweets. The most frequently occurring theme in tweets with the hashtag #vaccineswork was childhood vaccination (40%). According to 29% of tweets, vaccines could reduce outbreaks and deaths; tweets also referenced worldwide immunization efforts and improvement 21% of the time [11]. Similarly, other studies [15] have sought to determine which types of accounts are most problematic in spreading misinformation on Twitter. An analysis of sophisticated bots, Russian trolls, and content polluters found that, regarding tweets about vaccines, Russian trolls can “amplify both sides” to create an online public discourse that can undermine public health; sophisticated bots, which are designed to look like legitimate accounts, can further undermine public health by increasing the number of those who hold apparent anti-vaccine sentiments.

Other studies [31, 32] explored whether users are more likely to post tweets against the HPV (Human Papillomavirus) vaccine after being exposed to such tweets themselves. Dunn et al. found that the probability of tweeting something negative after being exposed to negative tweets was 37.78%, substantially higher than the 10.92% probability after exposure to neutral or positive tweets [31]. In a subsequent study, Dunn et al. [32] expanded these results by attempting to create a model to explain the variance in HPV vaccine coverage. The study used a range of variables such as exposure to HPV vaccine information on Twitter, socioeconomic factors (e.g., poverty, education, insurance), racial and ethnic composition, and geographic location. They found that exposure to opinions about the HPV vaccine on Twitter had more sway in determining vaccine coverage than socioeconomic factors.

Even more shocking, though, is the sheer volume of tweets containing information about vaccines. Most of these tweets are posted by ordinary accounts (lay consumers, i.e., not academic, institutional, or celebrity accounts), and when tweets link to a source, it is generally a post made by another ordinary account [52]. These researchers [52] generally agree that Twitter can be an effective way to monitor opinions about public health issues and disseminate accurate information about them. However, given the polarity and divisiveness of the current Twitter climate and the sheer volume of tweets being sent out, Twitter itself and the overall information landscape must be improved (e.g., through fact-checking, monitoring problematic accounts, and improving overall information and media literacy standards) before that goal can become a reality.

Other researchers chose to explore this dynamic of diverse public opinion on a different platform, YouTube [9, 14, 33]. For instance, Basch et al. [9] viewed and categorized (by poster) 87 YouTube videos containing the phrases “Vaccine Safety” and “Vaccines and Children.” The three most common categories of video posters were ordinary consumers, internet or TV news, and individual health professionals; shockingly, 65.5% of the videos were deemed to be “anti-vaccine.” Similarly, after analyzing 172 YouTube videos related to the HPV vaccine for their tone and sentiment, response and reaction, and video source, Briones et al. [14] found that more than 51% of the videos contained negative HPV vaccination sentiments, compared to 32% with positive ones, and that the “anti-vaccine” videos were far more likely to be liked or shared than the positive videos. Conversely, another study [33] of HPV vaccination sentiments on YouTube found that whether the video was positive or negative did not influence how many shares or views it received. The study also found that most videos could be classified as anti-vaccine. Other researchers [24] have conducted similar analyses in other languages and geographic locations. For example, 123 Italian YouTube videos about vaccines were analyzed, and the researchers discovered that 50% of the videos were positive in nature, 23% were negative, and 27% were neutral. Additionally, the study notes that negative and positive videos alike used a “fear appeal” at a higher rate than any other persuasive strategy, such as solidarity or economic interest. YouTube videos posted regarding vaccines (both positive and negative) are rooted in fear and disdain for those with the opposite opinion. This further emphasizes the detrimental impact that the dire state of online information seeking and sharing has on public perception.

2.2 COVID-19 Misinformation Studies

COVID-19 is the first global pandemic in the social media era. This new experience revealed many nuances of social media and of the fight against misinformation and fake news. Social media platforms have features such as automated bots that can facilitate the spread of misinformation [21]. In particular, malicious activities on social media increased to an unprecedented level during this pandemic [27]. The volume of COVID-19 misinformation led to dire consequences for the public and caused frontline workers to face even more challenges in stymieing the spread of coronavirus. Public health agencies called this unchecked volume of mis/disinformation on social media platforms an “infodemic” [47, 73].

Kim et al. [53] examined the effects of exposure to misinformation during the COVID-19 pandemic and identified that exposure to misinformation reduces the need to seek more preventative and treatment information, making it difficult to curb the spread of the disease.

Researchers have been scrambling to keep up with the dissemination of misinformation. Islam et al. [47] collected articles from various online sources like fact-checking websites, social media, newspapers, and television networks to examine rumors, stigma, and conspiracy theories about COVID-19 and how they potentially impact individuals and communities. Lazer et al. [55] examined tweets by 1.6 million registered voters in the United States to determine who is sharing misinformation and what its sources are. They determined that there is a strong political divide in sharing misinformation and that it is mostly shared by people over 50. They also found that belief in misinformation is more prevalent in the younger population.

Evanega et al. [34] investigated the topics spreading misinformation during the early parts of the COVID-19 pandemic. They found that the majority of the misinformation was driven by “miracle cure” topics and that prominent figures were the driving force in the spread of misinformation. They also noticed that only 16.4% of the overall conversation is about fact-checking or correcting the misinformation.

After analyzing 43.3 million tweets, Ferrara [35] found that automated social bots are used to disseminate misinformation and political conspiracy theories related to COVID-19. Al-Rakhami and Al-Amri [3] proposed a framework to use six different machine-learning algorithms to detect COVID-19 misinformation. They collected the data using Twitter API at the beginning of the pandemic and manually labeled the data to train the models.

Even the vaccination to prevent COVID-19 is being debated, and the misinformation is spread by the opponents of vaccination more frequently compared to the proponents [48]. Although officials are taking steps to handle the misinformation regarding the vaccine [22], the efforts are still falling short [93] to tackle the diverse reasons [59] for the spread of misinformation.

Thelwall et al. [86] found that while the majority of the vaccine hesitancy in the English-language Twitter-sphere is related to right-wing conspiracy, a significant minority (18%) refuse the vaccine for non-political reasons such as fear of being targeted as Black and the speed of development and approval. Their study implies that vaccine hesitancy is not confined to right-wing echo chambers but can reach a wider audience.

Ahammad [2] found that misinformation can also spread using a positive tone and usually promotes alternate medicine, healthy living, and natural remedies. The positive sentiment-based fake news often increases hope and confidence in the public and can, in turn, reduce caution and make it difficult to contain the spread of the virus. The author also found that the prevalence of negative news, usually focused on crime and justice, can reduce public trust in authorities and increase anxiety, skepticism, and vaccine hesitancy.

2.3 COVID-19 Vaccine Sentiment and Stance Detection

Sentiment analysis is one of the major research areas in natural language processing (NLP) and can help us determine the overall perception of the population about any topic. Many researchers performed sentiment analysis on tweets. Some of the sentiment analysis research during the COVID-19 pandemic shines a light on how people are responding to the pandemic [90, 45].

Dubey [30] performed sentiment analysis on tweets from different countries between 11th and 31st March 2020. The researcher used the Syuzhet package [49], which classifies the tweets into eight different emotion categories. Within the data, Germany, France, the USA, and China showed balanced emotions between positive and negative tweets, while other countries showed a more positive attitude.

Manguri et al. [60] used the TextBlob python library, a Naive Bayes sentiment classifier model, on the tweets about COVID-19 for the week of 9th to 15th April 2020. The researchers found that people’s reactions vary from day to day, and the majority of the tweets were neutral.

Liu et al. [57] coded the tweets about COVID-19 from six political leaders using a template analysis technique in five dimensions of populist political communication styles. Their study showed that during a crisis, populist communication styles can influence public adherence to government policies, and a combination of engaging and intimate populist communication styles performs best.

Stance detection is somewhat different than traditional sentiment analysis. While sentiment analysis can detect whether a text is positive, negative, or neutral, stance detection can classify someone’s opinion as in favor or against a given target, which may or may not be present in the text, regardless of the emotion of the text [61].

Augenstein et al. [7] worked on detecting the stance of tweets towards a target topic that is not present in the tweet. They showed that conditional Long Short-Term Memory (LSTM) encoding is a suitable stance detection approach for an unseen target.

Dey et al. [28] proposed a two-phase Support Vector Machine (SVM) approach for stance detection on Twitter data. In the first phase, they classified the tweets into “neutral” and “other” (non-neutral). Then, in the second phase, they classified the non-neutral tweets into “favor” vs. “against.” This method outperformed the state-of-the-art models.

Cotfas et al. [23] worked with tweets between November 9, 2020, and December 8, 2020, the month following the COVID-19 vaccine announcement, and found that the majority of tweets were “neutral” and that tweets in “favor” of the vaccine outnumbered those “against” it.

Poddar et al. [76] extended the work of Cotfas et al. [23] by analyzing tweets from pre-COVID and post-COVID on data ranging from January 2018 to March 2021. They identified the stance of users towards the COVID-19 vaccine and analyzed the topics they are tweeting to find a reason for the change in public stance.

2.4 De/Motivation Studies and Vaccination Intent

While the majority of research considers misinformation to be the primary culprit for vaccine hesitancy and works to identify it [47, 54, 83], many other factors, such as racial fear, stigma, economic constraints, and distrust of government, can discourage people from vaccination [25, 54, 68, 87, 41]. Research suggests people are usually motivated by gain, altruism, or a protective attitude [36]. Protection motivation theory implies that perceived severity and susceptibility increase vaccination intention [6].

Human psychology research proposes two kinds of motivation: intrinsic motivation, where people are motivated by internal realization, and extrinsic motivation, where external forces steer people towards something [74]. Intrinsic motivation, such as earning respect, achieves better results than competition, rewards, or the threat of punishment [26, 82].

Schmitz et al. [80] support the previous results that autonomous or intrinsic motivation works best for vaccine intention and uptake, while controlled motivation (pressure from outside sources) does not work. They also noticed that people are more motivated by infection-related risk perception, where personal health is at risk, than by pandemic-related health concerns, where overall societal health is considered. Finally, they found that distrust towards science impacts vaccine intention.

To acquire intrinsic motivation towards vaccination, understanding of the vaccination and trust in the science are necessary. Lack of understanding of scientific findings, distrust towards politicians and involvement of the federal government, fast-tracking and emergency authorization, concern about financial profits and political motives, and misrepresentation of the severity of COVID-19 are the primary reasons causing the failure to motivate people for vaccination [36, 64]. The efforts to motivate people and increase their vaccine knowledge fall short for several other reasons including, but not limited to, unavailability of insurance reimbursement for consultation, lack of counseling, unavailability of vaccine during a clinic visit, and ease of getting an exemption [89].

Compared with previous studies, we analyzed Twitter data over a longer period of time, which covers both before and after the rollout of major vaccines in the US and around the world. We also extracted the motivating and demotivating topics resonating in the Twitter-sphere regarding the COVID-19 vaccine and also analyzed the spread of the topics based on geographic locations in the US. We then analyzed the public stance toward the COVID-19 vaccine and identified the topics driving those stances. We also grouped the topics based on the political landscape of each state to investigate whether the misinformation tactics differ based on the majority political view of the area.

Existing studies emphasize that social media is essential for disseminating healthcare information. However, the overwhelming prevalence of misinformation makes it difficult to educate the public, and exposure to misinformation alters public opinion and makes the work of healthcare professionals even harder. Our study identifies patterns in the misinformation and topics impacting public opinion. This study offers a path forward to overcome the challenge of disseminating appropriate information to the public.

Our work in this study is novel in that we analyzed people’s stances over time to identify the de/motivational topics influencing their stances towards the COVID-19 vaccine. While different programs and campaigns [91, 44] were launched to educate people about COVID-19 and encourage the general public to vaccinate, our work can help identify the specific topics that affect public motivation. In future emergencies, this knowledge can help cut through the noise of misinformation and reach the public with the most impact.

3 Data Collection

We have built a Twitter Dataset consisting of tweets and author information. Once we prepared the dataset, we used a machine-learning classifier to classify the tweets as motivating or demotivating, identify the stance of the tweets, and extract the most prominent topics in both motivating and demotivating classes. We also prepared several smaller ground truth datasets to train the machine-learning models. Table 1 lists the different datasets and their purposes.

Table 1: Datasets, sources, and their purpose

Dataset	Source	Purpose
Twitter dataset	Chen et al. [18]	Primary dataset containing the tweets and author information. We classified this data and performed analyses on this.
Motivation Training dataset	Cheng et al. [19], Muric et al. [67], Brandwatch [13]	Combination of three sources to build the ground truth dataset to classify de/motivating tweets.
Stance ground truth dataset	Poddar et al. [76], Cotfas et al. [23], manual labeling	Combination of three labeled datasets to build the ground truth dataset for COVID-19 vaccine stance detection.

3.1 Twitter Dataset

We have collected close to 16 million tweets between January 2020 and December 2021 that contain information about COVID-19. We have used the data from Chen et al. [18] by gathering the Tweet IDs from their GitHub repository [17]. Chen et al. used several keywords like ‘Coronavirus,’ ‘Corona,’ ‘COVID-19,’ ‘Pandemic,’ ‘stayathome,’ etc., to search for COVID-19 related tweets. In order to meet the rate limit of Twitter API [88] and collect the data within a reasonable amount of time, we had to reduce the number of tweets. We randomly sampled at least 100,000 IDs each week and made sure that the data was stratified to match the distribution of the source dataset [17]. Then, we used the Hydrator API tool [85] to collect all the tweet information. We collected 15,768,845 tweets using this method. Figure 1 shows the steps of the dataset creation.
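The weekly sampling step can be illustrated with a minimal sketch. It assumes one plausible reading of the description above: the same fraction of IDs is drawn from every week, with the fraction chosen so that the smallest week still contributes a minimum number of IDs. The function names `week_key` and `stratified_weekly_sample`, the ISO-week grouping, and the `min_per_week` parameter are hypothetical and are not taken from the original pipeline.

```python
import random
from collections import defaultdict
from datetime import date


def week_key(day):
    """Map a calendar date to its ISO (year, week) bucket."""
    iso = day.isocalendar()
    return (iso[0], iso[1])


def stratified_weekly_sample(ids_by_week, min_per_week, seed=0):
    """Draw the same fraction of IDs from every week.

    Assumption: every week in `ids_by_week` has at least one ID. The fraction
    is set so that the smallest week contributes `min_per_week` IDs (or all of
    its IDs if it has fewer), which preserves the weekly proportions.
    """
    rng = random.Random(seed)
    smallest = min(len(ids) for ids in ids_by_week.values())
    floor = min(min_per_week, smallest)
    sample = {}
    for week, ids in ids_by_week.items():
        # Integer ceiling of len(ids) * floor / smallest, avoiding float rounding.
        k = (len(ids) * floor + smallest - 1) // smallest
        sample[week] = rng.sample(ids, k)
    return sample


# Toy input: tweet IDs grouped by the week in which they were posted.
ids_by_week = defaultdict(list)
for tweet_id in range(1000):
    ids_by_week[week_key(date(2020, 1, 6))].append(tweet_id)
for tweet_id in range(1000, 4000):
    ids_by_week[week_key(date(2020, 1, 13))].append(tweet_id)

sample = stratified_weekly_sample(ids_by_week, min_per_week=100)
```

Because every week is scaled by the same ratio, a week that is three times larger in the source stays three times larger in the sample.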

Figure 1: Preparing the datasets with COVID-19 related tweets.

After the data collection, we used the GeoPy [38] library to get the geographic location of the users from the OpenStreetMap API [70]. In this step, we only considered tweets in the English language. Based on the location returned by the API, we isolated the tweets from the United States and labeled each tweet with its respective US state or territory. We dropped the tweets whose authors had no geographic location. For a few tweets, we manually corrected mislabeled states with the help of other available information in each tweet, such as zip code or landmark name. At the end of the process, we had $7,772,236$ tweets in our dataset. Figure 2 shows the distribution of tweets in major geographic locations in the US. Finally, we created a stratified set of $466,335$ tweets spread throughout the two years for our experiments. We ensured that the frequency of tweets per week is representative of the original $7.7$ million tweets. This new set of tweets is the primary dataset of the study, used for the classifications and analysis.
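A location lookup of this kind might look like the following sketch. It assumes GeoPy's `Nominatim` geocoder as the interface to OpenStreetMap and a one-second delay between requests; the helper names `make_nominatim_lookup` and `us_state`, the user agent string, and the rule of keeping only results whose `country_code` is `us` are illustrative assumptions, not the authors' code. US territories may be reported under a different country code and could need extra handling.

```python
def make_nominatim_lookup(user_agent):
    """Return a rate-limited lookup function backed by OpenStreetMap's Nominatim.

    GeoPy is imported lazily so that `us_state` below can also be exercised
    with a stand-in lookup when GeoPy or network access is unavailable.
    """
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    # Assumption: a conservative delay of one second between requests.
    limited = RateLimiter(geolocator.geocode, min_delay_seconds=1)
    return lambda text: limited(text, addressdetails=True, language="en")


def us_state(profile_location, lookup):
    """Return the US state name for a free-text location, or None.

    `lookup` is any callable that takes a string and returns an object with a
    `raw` dict in Nominatim's format, or None when nothing is found.
    """
    if not profile_location or not profile_location.strip():
        return None  # authors without a location are dropped
    result = lookup(profile_location)
    if result is None:
        return None
    address = result.raw.get("address", {})
    if address.get("country_code") != "us":
        return None  # keep only locations resolved to the United States
    return address.get("state")
```

With network access, `us_state("DeKalb, IL", make_nominatim_lookup("example-research-app"))` would query the live service; passing the lookup in as an argument keeps the filtering logic testable offline.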

We also cleaned the dataset by removing retweet tags (“RT”), newlines (“\n”), special characters, URLs, and words that contain non-English characters. We analyzed the remaining tweets for duplicates and same-author duplicates and found very few duplicate tweets in the dataset. Only $0.37\%$ of tweets have more than ten duplicates, too few to cause any bias in the data. In the case of duplicate tweets from the same user, we found that only $0.13\%$ had more than three duplicate tweets, and no user had more than six duplicate tweets. We did not remove these duplicates since they are too few to cause any bias, and the duplication of tweets may contain useful signals about society’s emotions.
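The cleanup and duplicate check described above can be sketched as follows. The regular expressions, the use of non-ASCII characters as a stand-in for "non-English characters", the definition of a special character as anything other than a letter, digit, or whitespace, and the way duplicates are counted (occurrences beyond the first) are all assumptions made for illustration.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+|www\.\S+")
RT_RE = re.compile(r"^\s*RT\b\s*")
SPECIAL_RE = re.compile(r"[^A-Za-z0-9\s]")


def clean_tweet(text):
    """Strip URLs, the leading retweet tag, non-ASCII words, and special characters."""
    text = URL_RE.sub(" ", text)
    text = RT_RE.sub("", text)
    # split() also discards newlines; words with any non-ASCII character are dropped.
    words = [word for word in text.split() if word.isascii()]
    text = SPECIAL_RE.sub(" ", " ".join(words))
    return " ".join(text.split())


def share_with_many_duplicates(texts, max_duplicates):
    """Fraction of distinct texts that have more than `max_duplicates` extra copies."""
    counts = Counter(texts)
    if not counts:
        return 0.0
    flagged = sum(1 for n in counts.values() if n - 1 > max_duplicates)
    return flagged / len(counts)


cleaned = clean_tweet("RT @user: Get vaccinated!\nhttps://t.co/abc café now")
# cleaned == "user Get vaccinated now"
```

Note that this sketch keeps the account name of a mention (here `user`) as an ordinary word; whether mentions should be removed entirely is a separate design decision that the description above does not settle.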

3.2 Motivation Training Dataset

We created a dataset to train our machine-learning models to classify the tweets in the Twitter dataset as motivating or demotivating. We combined data from three different sources to build a robust training dataset for our models. We combined the COVID-19 rumor dataset by Cheng et al. [19], the “Avax Tweets” dataset - a COVID-19 vaccine hesitancy dataset from Muric et al. [67], and our own collection of authentic tweets about COVID-19 vaccination.

3.2.1 COVID-19 Rumor Dataset

This labeled dataset contains COVID-19 rumors from both news sources and Twitter. Cheng et al. [19] manually labeled 6,834 data points ($4,129$ rumors from news and $2,705$ rumors from tweets). From this dataset, we used the text of each rumor and the label indicating whether the text is true, false, or unverified. Previous studies [53, 98, 56] suggest that exposure to inaccurate news and misinformation reduces vaccine intent, whereas more accurate information and interpersonal communication motivate people towards vaccination. Following the findings of these studies, we considered the “true” news and tweets as motivating for vaccination and the “unverified” and “false” ones as demotivating.
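The relabeling rule in this subsection amounts to a small lookup table, sketched below. The label strings `true`, `false`, and `unverified` and the function name `motivation_label` are assumptions; the published rumor dataset may encode its labels differently.

```python
# Assumed veracity labels mapped to the de/motivation classes used in this study.
MOTIVATION_BY_VERACITY = {
    "true": "motivating",
    "false": "demotivating",
    "unverified": "demotivating",
}


def motivation_label(veracity):
    """Translate a rumor veracity label into a de/motivation label."""
    key = veracity.strip().lower()
    if key not in MOTIVATION_BY_VERACITY:
        raise ValueError(f"unknown veracity label: {veracity!r}")
    return MOTIVATION_BY_VERACITY[key]
```

Raising an error on an unknown label, rather than defaulting to one class, keeps unexpected label values from silently entering the training data.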

3.2.2 Avax Tweets Dataset

Muric et al. [67] curated a list of tweets that exhibit an anti-vaccine stance. The dataset contains over $1.8$ million tweets posted over roughly one year, from October 2020 to November 2021. We created a stratified sample of $100,000$ tweets from the dataset, ensuring that the frequency of tweets per week is representative of the original dataset. Then, using the Hydrator API tool, we collected $79,093$ tweets from this sample. We considered these anti-vaccine tweets demotivating towards vaccination, so we extracted their text and labeled them as demotivating.

3.2.3 Authentic Vaccination Tweets Dataset

We collected historical tweets regarding COVID-19 vaccines from a curated list [62] of trusted sources from Fortune magazine. The list contained trusted public health officials, epidemiologists, virus experts, family doctors, and prominent health organizations. The authors of these accounts shared their experience in treating patients during the COVID-19 outbreak, offered advice, and refuted misinformation. This curated list gives us a source of authentic tweets regarding the pandemic and vaccination that actively motivate people to vaccinate and advise on the best ways to stay safe. We used the Brandwatch [13] API to collect COVID-19 vaccine-related tweets posted by the users in this list between January 2020 and December 2022. We collected $19,992$ tweets using this process, extracted the tweet texts, and labeled the tweets as motivating.

We merged the three datasets above to create the ground truth dataset for training our models. We labeled the dataset with binary classes indicating whether a tweet contains antivaccination rhetoric (demotivating) or not (motivating). This is our “motivation training dataset”, which contains two features: the text and the label. Before training the machine-learning models, we cleaned the tweets by removing duplicates, retweet tags (“RT”), newlines (“\n”), special characters, URLs, and words that contain non-English characters. Finally, we converted all the tweets to lowercase. After the cleanup, our dataset contained $60,647$ demotivating tweets and $21,235$ motivating tweets. We upsampled the motivating tweets to match the number of demotivating tweets, creating a balanced dataset with $121,294$ entries.
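
The balancing step can be sketched as random oversampling with replacement. This is an assumption: the text does not state which upsampling method was used, and the helper name `upsample_minority` is hypothetical. With $60,647$ demotivating and $21,235$ motivating tweets, the balanced set has $2 \times 60,647 = 121,294$ entries.

```python
import random


def upsample_minority(majority, minority, seed=0):
    """Randomly duplicate minority items (with replacement) until both classes match in size."""
    rng = random.Random(seed)
    missing = max(0, len(majority) - len(minority))
    extra = rng.choices(minority, k=missing) if missing else []
    return list(majority), list(minority) + extra
```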

3.3 Stance Ground Truth Dataset

We extended the work of Poddar et al. [76] and Cotfas et al. [23] to identify the stance of tweets on the topic of vaccination. Although Poddar et al. [76] published their trained model, it did not perform well with newer tweets. After manually checking its results, we found that the stance prediction was correct for only $55$ percent of anti-vaccination tweets and $61$ percent of pro-vaccination tweets. We believe this resulted from the model being trained with data from a smaller timespan. Therefore, we set their model aside and instead combined their labeled data with our own to train a machine-learning model on newer data from a wider timespan. For this purpose, we manually labeled $1,064$ tweets ( $469$ in favor, $195$ against, and $400$ unrelated) and combined them with the data from Cotfas et al. [23] ( $991$ in favor, $791$ against, and $1,010$ unrelated) and the data from Poddar et al. [76] ( $1,364$ in favor, $490$ against, and $1,285$ unrelated). In total, our ground truth data contained $6,995$ entries, with $2,824$ in favor, $1,476$ against, and $2,695$ neutral (unrelated) tweets about the COVID-19 vaccine.

While manually labeling our data, we used two annotators to reduce bias in the labeling. The Cohen’s Kappa score for the two annotators was $0.624$ , meaning the labels of the two annotators agree at a satisfactory level. We also manually checked $100$ random tweets from Cotfas et al. and $200$ random tweets from Poddar et al., and our labeling agreed with theirs more than $80\%$ of the time.
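
For reference, Cohen's Kappa compares the observed agreement $p_o$ between two annotators with the agreement $p_e$ expected by chance: $\kappa = (p_o - p_e)/(1 - p_e)$. The sketch below computes it for two label lists; the function name and the toy labels are illustrative and are not the study's annotations.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of each annotator's marginal label frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a.keys() | counts_b.keys()) / (n * n)
    return (observed - expected) / (1 - expected)  # undefined when expected == 1


annotator_1 = ["favor", "favor", "against", "against", "neutral", "neutral"]
annotator_2 = ["favor", "favor", "against", "neutral", "neutral", "neutral"]
kappa = cohens_kappa(annotator_1, annotator_2)  # p_o = 5/6, p_e = 1/3
```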

We have added a comprehensive list of all the data sources and their purposes in Appendix A.1.

4 De/Motivating Topic Identification

We have used machine-learning models to classify the tweets from our Twitter dataset as motivating or demotivating. Then, we used topic modeling to identify the prominent topics related to vaccines within the classified tweets. Figure 3 shows the steps of our analysis.

4.1 Classifying the Tweets

We fine-tuned DistilBERT [79] and RoBERTa [58], two BERT-based pre-trained Natural Language Processing (NLP) models from the Hugging Face transformers library [97], and found that DistilBERT performed best. We tried different learning rates, batch sizes, and numbers of epochs to measure performance. With a 70-30 train-test split, the accuracy for DistilBERT was $96.3\%$ . Table 2 shows the performance of our trained models.

Table 2: Models trained to identify de/motivating tweets

Model	Accuracy
DistilBERT	96.3%
RoBERTa	73.4%
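
The evaluation protocol behind Table 2 can be sketched independently of the model. The snippet below shows a shuffled 70-30 split and the accuracy metric; it does not fine-tune any transformer, whether the original split was stratified is not stated, and the helper names `split_70_30` and `accuracy` are hypothetical.

```python
import random


def split_70_30(items, seed=0):
    """Shuffle the items and return (train, test) with a 70-30 ratio."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Share of predictions that match the ground truth labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```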

After the training, we used the model to classify the tweets in the Twitter dataset to identify the motivating and demotivating tweets. The DistilBERT model classified $97,736$ as motivating and $368,597$ as demotivating tweets. After we classified the tweets, we used topic modeling to identify the vaccine-related topics from each class.

4.2 Topic Analysis

We used BERTopic [40] to create the topic models from our datasets. Although Latent Dirichlet Allocation (LDA) [12] is one of the most popular algorithms for topic analysis, it takes some effort in hyperparameter tuning to generate meaningful topics. We also reviewed the Top2Vec [4] topic modeling algorithm, which trains document and word vectors jointly in a single semantic space and uses centroid-based topic extraction from document clusters. BERTopic, in contrast, leverages transformers [96] and c-TF-IDF to create dense clusters, allowing for easily interpretable topics. It is also possible to use pre-trained sentence transformer embedding models with BERTopic and to track the prominence of any topic over time. In addition, its interactive visualization techniques make it much easier to investigate topic distributions.
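
To make the c-TF-IDF idea concrete, the following is a minimal sketch of a class-based TF-IDF weighting in the form $W_{t,c} = tf_{t,c} \cdot \log(1 + A / tf_t)$, where $tf_{t,c}$ is the frequency of term $t$ in cluster $c$, $tf_t$ is its frequency across all clusters, and $A$ is the average number of words per cluster. This follows the formulation as we understand it from the BERTopic description; the library itself applies further normalization, and the function name `c_tf_idf` and the toy clusters are hypothetical.

```python
import math
from collections import Counter


def c_tf_idf(class_docs):
    """Class-based TF-IDF.

    `class_docs` maps a cluster name to a list of tokenized documents.
    Returns a mapping: cluster -> {term: weight}.
    """
    tf = {c: Counter(tok for doc in docs for tok in doc) for c, docs in class_docs.items()}
    total_tf = Counter()
    for counts in tf.values():
        total_tf.update(counts)
    avg_words = sum(total_tf.values()) / len(tf)  # A: average word count per cluster
    return {
        c: {term: n * math.log(1 + avg_words / total_tf[term]) for term, n in counts.items()}
        for c, counts in tf.items()
    }


weights = c_tf_idf({
    "cluster_a": [["vaccine", "mandate"], ["mandate"]],
    "cluster_b": [["vaccine", "school"]],
})
```

Terms shared by every cluster, such as "vaccine" in the toy input, receive lower weights than terms concentrated in one cluster, which is what makes the resulting topic words distinctive.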

We generated two separate topic models for the two classes in our dataset. After the training, each topic model returned the list of topics corresponding to the documents (tweets). We then extracted the top $10$ topics related to “vaccine” from each topic model.

4.2.1 Motivating Topics

We fitted the model exclusively with the tweets classified as “motivating” to identify the motivating topics. The top 10 frequent topics from the model are displayed in Table 8 with the number of tweets in that topic and the top 5 words in that topic. We have ignored the most frequent topic that contains stop words and pronouns.

Then, we extracted “vaccine” related topics from the topic model. This process finds the topics similar to the keyword using cosine similarity. Table 9 shows the top 10 “vaccine” related topics from the motivating tweets. The score in the table represents the topic’s semantic similarity with the keyword “vaccine”.
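
The keyword search above amounts to ranking topics by the cosine similarity between a keyword embedding and each topic embedding. The sketch below uses hand-made two-dimensional vectors purely for illustration; real embeddings would come from the sentence-transformer model behind the topic model, and the names `cosine`, `top_topics`, and the topic labels are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_topics(query_vec, topic_vecs, k=10):
    """Return the k topics most similar to the query, as (topic, score) pairs."""
    scored = [(topic, cosine(query_vec, vec)) for topic, vec in topic_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy embeddings, not real model output.
ranked = top_topics(
    [1.0, 0.0],
    {"boosters": [0.9, 0.1], "schools": [0.0, 1.0], "doses": [1.0, 1.0]},
    k=2,
)
```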

The topic hierarchy in Figure 4 shows the relationship between topics in the set, and Figure 5 shows the top 10 topics each year and the frequency of tweets for each topic.

Clustering the motivating topics related to “vaccine” shows that the topics are mostly clustered in two regions as displayed in Figure 6.

4.2.2 Demotivating Topics

We performed the same steps with demotivating tweets. We fitted the topic model with tweets classified as “demotivating.” Table 10 shows the top 10 frequent topics from the model, the number of tweets in each topic, and the top 5 words in that topic. Pronouns and stop words are ignored as before.

Similar to the earlier model, we extracted “vaccine” related topics from the topic model. Table 11 shows the top 10 “vaccine” related topics from the demotivating tweets.

The topic hierarchy in Figure 7 shows the relationship between topics in the set. Figure 8 shows the top 10 topics each year and the frequency of tweets for each topic.

Clustering the demotivating topics related to “vaccine” shows that, unlike the motivating topics, they are scattered across different clusters, as displayed in Figure 9.

5 Stance Detection

The stance of a text tells us whether the author is in favor of or against a topic. To find the stance of users towards the COVID-19 vaccine, we trained machine-learning models for stance detection using the labeled data discussed in section 3.3. We used SimpleTransformer [78] to train models using Huggingface [96] transformers. Table 3 shows the transformer training results, reporting performance as the Matthews Correlation Coefficient (MCC). We used the MCC because it provides a better understanding of performance on an imbalanced dataset [20] than traditional accuracy or F1 score, and our pro-vaccine, anti-vaccine, and neutral classes are not balanced. We trained RoBERTa and COVID-Twitter-BERT (ct-BERT) [66] with different numbers of epochs. The best-performing model is ct-BERT with 10 epochs, which is highlighted in bold in Table 3 and is referred to as ct-BERT-10 hereafter.

Table 3: Transformer training performance

Transformer	MCC
RoBERTa (epochs: 10)	0.490
CT-BERT (epochs: 5)	0.580
CT-BERT (epochs: 10)	0.603
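
To illustrate why the MCC is informative on imbalanced classes, the sketch below implements the multiclass generalization of the MCC computed from label counts: with $s$ samples, $c$ correct predictions, and per-class true and predicted counts $t_k$ and $p_k$, $\mathrm{MCC} = (c \cdot s - \sum_k p_k t_k) / \sqrt{(s^2 - \sum_k p_k^2)(s^2 - \sum_k t_k^2)}$. The function name and the toy labels are illustrative, not the study's data.

```python
import math
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews Correlation Coefficient from label counts."""
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_k, p_k = Counter(y_true), Counter(y_pred)
    labels = set(t_k) | set(p_k)
    numerator = c * s - sum(t_k[label] * p_k[label] for label in labels)
    denominator = math.sqrt(
        (s * s - sum(p_k[label] ** 2 for label in labels))
        * (s * s - sum(t_k[label] ** 2 for label in labels))
    )
    return numerator / denominator if denominator else 0.0


# A classifier that always predicts the majority class: 75% accurate, yet MCC is 0.
truth = ["favor", "favor", "favor", "against"]
always_favor = ["favor"] * 4
baseline_mcc = multiclass_mcc(truth, always_favor)
```

The majority-class baseline scores well on accuracy but receives an MCC of zero, which is the behavior that makes the MCC a stricter measure when class sizes differ.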

Then, we classified the Twitter dataset using the ct-BERT-10 model to identify the stance of the tweets. After classifying the tweets, we found that the “favor” stance outnumbers the “against” stance across different time segments and states. Figure 10 shows the number of tweets in each stance each month, and Figure 11 shows the stance in different states grouped by year and motivation classification. From Figure 10, we can notice some interesting trends, such as spikes in the “favor” stance in the middle and at the end of 2020.

6 Discussion

The COVID-19 pandemic was a disruptive event that changed our lives in more ways than we could have imagined. After over three years, we are still figuring out the impacts and learning to live with the new normal. This pandemic opened up new research horizons and questions about how we express ourselves on social media, how to handle a challenge of this scale, how to motivate a large population towards specific activities, and how to prevent future disasters.

Our research uncovered some fascinating trends. For example, the public stance on vaccination shifted depending on time and geographic location. We also found that using external motivators, like government mandates, could sometimes backfire and actually discourage people. This section goes deeper into these findings, addressing the research questions we outlined earlier. Table 12 in D.2 offers a sample of tweets from different motivating and demotivating categories and vaccination stances, providing a window into the broader dataset we analyzed.

Addressing our first research question (RQ1) necessitates establishing a framework for motivation and demotivation in the context of our study. Motivation is a subjective concept, and a statement that motivates one individual might have the opposite effect on another. This study considers statements that encourage others to vaccinate against COVID-19 as motivating and statements that discourage vaccination as demotivating.

Our analysis in section 4.2 revealed that motivating tweets frequently address topics like schooling, voting, sports, music, and even COVID-19 statistics. Within these topics, concerns related to personal health, protective measures, and vaccine-related news emerged as motivational factors for vaccination.

Conversely, demotivating tweets often centered on themes of conservative political ideology, vaccine mandates, and protective directives from officials. A deeper analysis of vaccine-specific topics within the demotivating category identified hesitancy regarding vaccine efficacy, concerns about potential side effects and immunity, and a sense of distrust towards the vaccine within minority communities.

A critical finding of this study is the considerable variation observed in demotivating topics across factors like political orientation and geographic location. In contrast, motivating topics remained largely constant, irrespective of these variables. This trend is evident from visualizations in Figures 5, 6, 8, 9, 12, and 13.

Our study found minimal overlap between motivating and demotivating topics, suggesting the potential to identify and address specific concerns in hesitant communities. A key observation was that demotivating tweets demonstrated a stronger political leaning, often towards conservative politics in the US. Interestingly, our analysis revealed that a subset of tweets promoting vaccination also demotivated the public.

Figure 4 visualizes two prominent branches of discussion in the motivating tweets. One segment focuses on policy matters, vaccine development, and similar topics, while the other emphasizes the symptoms and the dire physical impacts of the disease to encourage vaccination. This observation aligns with the existing literature about intrinsic motivation and protective instincts associated with vaccine intent [6, 26, 80, 82].

Similarly, Figure 7 illustrates three significant branches of discussion in demotivating tweets: vaccination policy and mandates in conjunction with vaccine promotion efforts, concerns surrounding the impact on immunocompromised individuals, and specific concerns and distrust in minority communities. Human psychology research suggests that extrinsic motivations do not achieve better results [26, 82], which aligns with our observations. Additionally, the identified concerns in minority communities and the effects of vaccines on immunocompromised individuals highlight that factors beyond misinformation contributed to vaccine hesitancy. This observation demonstrates the need for better and more nuanced messaging strategies from policymakers and healthcare workers to address these specific doubts and knowledge gaps within certain population segments.

The branches identified in this study offer the potential to tailor communication efforts to address specific concerns in different communities, encourage them to vaccinate, and dispel misconceptions. This process of identifying and addressing community-specific concerns can be invaluable for policymakers in future emergencies.

We explored our second research question (RQ2) about topics influencing public stance towards vaccination based on the stance detection analysis detailed in section 5. Our findings suggest that a higher level of exposure to demotivating topics increases negative public stance. The visualization in Figure 10 illustrates that the number of anti-vaccination tweets remained relatively stable over time, while tweets favoring vaccination fluctuated significantly. This observed fluctuation warrants further investigation to determine potential correlations with factors such as vaccine innovation news, emergence of new COVID-19 variants, or shifts in the political climate.

Figure 11 also presents an interesting observation. For the majority of tweets, demotivating tweets align with an anti-vaccination stance and motivating tweets with a pro-vaccination stance, as one would expect. However, there are also instances where tweets opposing vaccination motivated the public and tweets favoring vaccination demotivated it. While seemingly counterintuitive, this finding aligns with research in human psychology [26, 82], which suggests that excessive external pressure toward a specific action can lead to a phenomenon known as psychological reactance [81], where individuals resist the pressure and move in the opposite direction. Further study in this direction can give us more insight into the specific topics that are shifting the stance of individual users.
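
The comparison between motivation class and stance can be sketched as a simple cross-tabulation. The label strings, the exclusion of neutral tweets, and the helper names `cross_tab` and `off_diagonal_share` are assumptions made for illustration; the toy inputs are not data from this study.

```python
from collections import Counter


def cross_tab(motivation_labels, stance_labels):
    """Count tweets for every (motivation, stance) combination."""
    return Counter(zip(motivation_labels, stance_labels))


def off_diagonal_share(table):
    """Share of non-neutral tweets whose stance runs against their motivation class."""
    unexpected = table[("motivating", "against")] + table[("demotivating", "favor")]
    aligned = table[("motivating", "favor")] + table[("demotivating", "against")]
    total = unexpected + aligned
    return unexpected / total if total else 0.0


# Toy labels for illustration only.
table = cross_tab(
    ["motivating", "motivating", "demotivating", "demotivating", "demotivating"],
    ["favor", "against", "against", "favor", "neutral"],
)
share = off_diagonal_share(table)
```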

The analysis to answer our third research question (RQ3) about the trends in de/motivating topics over time and geographic locations found some compelling insights. As evident from Figure 5, the top motivating topics related to COVID-19 vaccination remained relatively consistent across time and geographic locations. Conversely, demotivating topics exhibited significant variations over time and geographic locations, as shown in Figure 8. This disparity between motivating and demotivating topics is further evident in Figures 12 and 13, where motivating topics remained primarily consistent in different states regardless of their political orientation, while the demotivating topics exhibited variations.

We can extrapolate from our findings that regional politics played a crucial role in vaccine hesitancy during the COVID-19 pandemic. This warrants finding a way for policymakers and healthcare workers to disseminate their messages without politicizing them in future emergencies. This observation opens up new avenues for research into the impact of critical issues in regional politics during national or global emergencies.

Furthermore, the development of a visual analytics tool based on the results of this study [77] offers an effective way to explore and communicate the results visually and help policymakers understand the trends in social media. There are numerous ways to understand the motivation of the masses and constructively communicate with them to disseminate a message. Our study underscores the importance of recognizing and comprehending the diverse motivations of different communities. Our work indicates that effective communication strategies, moving beyond generic messaging or simply refuting misinformation with brute force, are necessary to communicate with the public. Policymakers and healthcare workers should prioritize recognizing and addressing the unique concerns of specific populations to cultivate intrinsic motivation.

7 Limitations

We worked specifically with tweets related to COVID-19, spanning over two years. We acknowledge that nuanced factors were involved in public reaction during the pandemic. The social distancing requirements, stay-home orders, remote work, loss of jobs, and economic and political factors played important roles. A future event may unfold under different sociopolitical conditions, which must be considered when generalizing the findings of this study.

We also acknowledge that the automated labeling of the Motivation Training Dataset in Section 3.2 required the assumption that “false” news and tweets are demotivating. While the assumption is grounded in previous academic studies, future work with extensive user studies to validate it can strengthen the findings of this work and reveal new research areas.

In this study, we used a smaller sample of 500K tweets from the original 7.7M tweets to reduce resource usage. Implementing the same methods over the 7.7M tweets can yield further exciting observations.

Furthermore, we did not dig into user-specific topics that impacted individual users to change their stance toward vaccination - which can be an interesting direction for future research.

8 Ethical Concerns

There are some ethical concerns that we have taken into account. We anonymized the identifiable information in the tweets used for the machine-learning models and in the published datasets. Since we performed the anonymization programmatically, we acknowledge the possibility that a few tweets remain identifiable based on the location or landmark information posted by the users in their tweets. We also acknowledge that malicious actors can use the proposed method of reaching social media users and motivating them. Nevertheless, that is a predicament shared by all modern inventions, and the outcome depends on who uses them. We believe the benefits of the findings in this study outweigh the risks.

9 Conclusion

While existing research has primarily focused on identifying misinformation in COVID-19 tweets and performing sentiment and topic analysis, a critical gap exists in comprehensively understanding the specific topics that demotivated public opinion toward COVID-19 vaccination during the pandemic. This study addresses this gap by analyzing a large dataset of tweets spanning the early stages of the pandemic through the end of 2021. It was evident during the pandemic that the public trust in institutions and scientific processes had diminished. The study about the relationship of resonating topics on social media with public motivation can provide essential knowledge and tools to cut through the noise of misinformation and communicate with the public. This knowledge is essential to rebuild trust in the institutions and improve public understanding of scientific processes.

Our findings reveal a compelling link between social media topics and public vaccination motivation. This study demonstrates that factors beyond misinformation influence vaccine hesitancy. While misinformation remains a concern, this research highlights the importance of intrinsic motivation and protective instinct in driving vaccine uptake. Our analysis also underscores the critical role of local politics in shaping public opinion.

In the current era of social media, information is at the fingertips of the public. Furthermore, the wide availability of LLMs increases the threat of misinformation and disinformation masquerading as credible information and circulating through the population. When information is interpreted, often by non-specialists, without the knowledge needed to evaluate it critically, it becomes more dangerous and causes adverse outcomes. The politicization of messages diluted with misinformation and misunderstanding also makes it impossible to educate the people during emergencies.

Our work underscores the importance of comprehending human psychology to motivate people during emergencies. External incentives and excessive regulations from institutions often backfire. By contrast, communication strategies addressing the unique concerns of different communities and compelling intrinsic motivations are more effective. This research offers valuable insights that can guide policymakers and healthcare workers in future emergencies to effectively communicate with the public by developing targeted messaging approaches to combat misinformation and educate the people.

References

Abbas [2020] A. H. Abbas. Politicizing the pandemic: A schemata analysis of covid-19 news in two selected newspapers. Int J Semiot Law, pages 1–20, July 2020. ISSN 1572-8722, 0952-8059. doi: 10.1007/s11196-020-09745-2. URL https://dx.doi.org/10.1007/s11196-020-09745-2.
Ahammad [2024] T. Ahammad. Identifying hidden patterns of fake COVID-19 news: An in-depth sentiment analysis and topic modeling approach. Natural Language Processing Journal, 6:100053, Mar. 2024. ISSN 2949-7191. doi: 10.1016/j.nlp.2024.100053. URL https://www.sciencedirect.com/science/article/pii/S2949719124000013.
Al-Rakhami and Al-Amri [2020] M. S. Al-Rakhami and A. M. Al-Amri. Lies kill, facts save: Detecting covid-19 misinformation in twitter. IEEE Access, 8:155961–155970, 2020. ISSN 2169-3536. doi: 10.1109/access.2020.3019600. URL https://dx.doi.org/10.1109/ACCESS.2020.3019600.
Angelov [2020] D. Angelov. Top2vec: Distributed representations of topics. Aug. 2020. URL https://arxiv.org/abs/2008.09470.
Anonymous [2023] Anonymous. The earth is flat because…: Investigating llms’ belief towards misinformation via persuasive conversation. Oct. 2023. URL https://openreview.net/pdf?id=DJXifFF2_M.
Ansari-Moghaddam et al. [2021] A. Ansari-Moghaddam, M. Seraji, Z. Sharafi, M. Mohammadi, and H. Okati-Aliabad. The protection motivation theory for predict intention of covid-19 vaccination in iran: a structural equation modeling approach. BMC Public Health, 21(1):1165, June 2021. ISSN 1471-2458. doi: 10.1186/s12889-021-11134-8. URL https://dx.doi.org/10.1186/s12889-021-11134-8.
Augenstein et al. [2016] I. Augenstein, T. Rocktäschel, A. Vlachos, and K. Bontcheva. Stance detection with bidirectional conditional encoding. June 2016. URL https://arxiv.org/abs/1606.05464.
Badawy et al. [2018] A. Badawy, E. Ferrara, and K. Lerman. Analyzing the digital traces of political manipulation: The 2016 russian interference twitter campaign. In 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pages 258–265, Aug. 2018. doi: 10.1109/asonam.2018.8508646. URL https://dx.doi.org/10.1109/ASONAM.2018.8508646.
Basch et al. [2017] C. H. Basch, P. Zybert, R. Reeves, and C. E. Basch. What do popular YouTube™ videos say about vaccines? Child Care Health Dev., 43(4):499–503, July 2017. ISSN 0305-1862, 1365-2214. doi: 10.1111/cch.12442. URL https://dx.doi.org/10.1111/cch.12442.
Becker et al. [2016] B. F. H. Becker, H. J. Larson, J. Bonhoeffer, E. M. van Mulligen, J. A. Kors, and M. C. J. M. Sturkenboom. Evaluation of a multinational, multilingual vaccine debate on twitter. Vaccine, 34(50):6166–6171, Dec. 2016. ISSN 0264-410x, 1873-2518. doi: 10.1016/j.vaccine.2016.11.007. URL https://dx.doi.org/10.1016/j.vaccine.2016.11.007.
Blankenship et al. [2018] E. B. Blankenship, M. E. Goff, J. Yin, Z. T. H. Tse, K.-W. Fu, H. Liang, N. Saroha, and I. C.-H. Fung. Sentiment, contents, and retweets: A study of two vaccine-related twitter datasets. Perm. J., 22:17–138, 2018. ISSN 1552-5767, 1552-5775. doi: 10.7812/tpp/17-138. URL https://dx.doi.org/10.7812/TPP/17-138.
Blei et al. [2003] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. J. Mach. Learn. Res., 3(Jan):993–1022, 2003. ISSN 1532-4435, 1533-7928. URL https://jmlr.org/papers/v3/blei03a.html.
Brandwatch [2020] Brandwatch. Brandwatch: A new kind of intelligence. https://www.brandwatch.com/, 2020. URL https://www.brandwatch.com/. Accessed: 2021-1-27.
Briones et al. [2012] R. Briones, X. Nan, K. Madden, and L. Waks. When vaccines go viral: an analysis of hpv vaccine coverage on youtube. Health Commun., 27(5):478–485, 2012. ISSN 1041-0236, 1532-7027. doi: 10.1080/10410236.2011.610258. URL https://dx.doi.org/10.1080/10410236.2011.610258.
Broniatowski et al. [2018] D. A. Broniatowski, A. M. Jamison, S. Qi, L. AlKulaib, T. Chen, A. Benton, S. C. Quinn, and M. Dredze. Weaponized health communication: Twitter bots and russian trolls amplify the vaccine debate. Am. J. Public Health, 108(10):1378–1384, Oct. 2018. ISSN 0090-0036, 1541-0048. doi: 10.2105/ajph.2018.304567. URL https://dx.doi.org/10.2105/AJPH.2018.304567.
Bruns et al. [2021] R. Bruns, D. Hosangadi, M. Trotochaud, and T. K. Sell. Covid-19 vaccine misinformation and disinformation costs. The Johns Hopkins Center for Health Security, Oct. 2021. URL https://centerforhealthsecurity.org/sites/default/files/2023-02/20211020-misinformation-disinformation-cost.pdf.
Chen [2020] E. Chen. Covid-19-tweetids. https://github.com/echen102/COVID-19-TweetIDs, May 2020. URL https://github.com/echen102/COVID-19-TweetIDs. Accessed: 2021-9-9.
Chen et al. [2020] E. Chen, K. Lerman, and E. Ferrara. Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set. JMIR Public Health Surveill, 6(2):e19273, May 2020. ISSN 2369-2960. doi: 10.2196/19273. URL https://dx.doi.org/10.2196/19273.
Cheng et al. [2021] M. Cheng, S. Wang, X. Yan, T. Yang, W. Wang, Z. Huang, X. Xiao, S. Nazarian, and P. Bogdan. A covid-19 rumor dataset. Front. Psychol., 12:644801, May 2021. ISSN 1664-1078. doi: 10.3389/fpsyg.2021.644801. URL https://dx.doi.org/10.3389/fpsyg.2021.644801.
Chicco and Jurman [2020] D. Chicco and G. Jurman. The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC Genomics, 21(1):6, Jan. 2020. ISSN 1471-2164. doi: 10.1186/s12864-019-6413-7. URL https://dx.doi.org/10.1186/s12864-019-6413-7.
Chu et al. [2010] Z. Chu, S. Gianvecchio, H. Wang, and S. Jajodia. Who is tweeting on twitter: Human, bot, or cyborg? In Proceedings of the 26th Annual Computer Security Applications Conference, ACSAC ’10, pages 21–30, New York, NY, USA, 2010. ACM. ISBN 9781450301336. doi: 10.1145/1920261.1920265. URL https://doi.acm.org/10.1145/1920261.1920265.
Cornwall [2020] W. Cornwall. Officials gird for a war on vaccine misinformation. Science, 369(6499):14–15, July 2020. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.369.6499.14. URL https://dx.doi.org/10.1126/science.369.6499.14.
Cotfas et al. [2021] L.-A. Cotfas, C. Delcea, I. Roxin, C. Ioanăş, D. S. Gherai, and F. Tajariol. The longest month: Analyzing covid-19 vaccination opinions dynamics from tweets in the month following the first vaccine announcement. IEEE Access, 9:33203–33223, 2021. ISSN 2169-3536. doi: 10.1109/access.2021.3059821. URL https://dx.doi.org/10.1109/ACCESS.2021.3059821.
Covolo et al. [2017] L. Covolo, E. Ceretti, C. Passeri, M. Boletti, and U. Gelatti. What arguments on vaccinations run through youtube videos in italy? a content analysis. Hum. Vaccin. Immunother., 13(7):1693–1699, July 2017. ISSN 2164-5515, 2164-554x. doi: 10.1080/21645515.2017.1306159. URL https://dx.doi.org/10.1080/21645515.2017.1306159.
Daly et al. [2021] M. Daly, A. Jones, and E. Robinson. Public trust and willingness to vaccinate against covid-19 in the us from october 14, 2020, to march 29, 2021. JAMA, 325(23):2397–2399, June 2021. ISSN 0098-7484, 1538-3598. doi: 10.1001/jama.2021.8246. URL https://dx.doi.org/10.1001/jama.2021.8246.
Deci and Flaste [1996] E. L. Deci and R. Flaste. Why We Do What We Do: Understanding Self-Motivation. Penguin, Aug. 1996. ISBN 9780140255263. URL https://play.google.com/store/books/details?id=OoVPEAAAQBAJ.
Depoux et al. [2020] A. Depoux, S. Martin, E. Karafillakis, R. Preet, A. Wilder-Smith, and H. Larson. The pandemic of social media panic travels faster than the covid-19 outbreak. J. Travel Med., 27(3), May 2020. ISSN 1195-1982, 1708-8305. doi: 10.1093/jtm/taaa031. URL https://dx.doi.org/10.1093/jtm/taaa031.
Dey et al. [2017] K. Dey, R. Shrivastava, and S. Kaushik. Twitter stance detection — a subjectivity and sentiment polarity inspired two-phase approach. In 2017 IEEE International Conference on Data Mining Workshops (ICDMW), pages 365–372, Nov. 2017. doi: 10.1109/icdmw.2017.53. URL https://dx.doi.org/10.1109/ICDMW.2017.53.
D’Souza and Dowdy [2021] G. D’Souza and D. Dowdy. Rethinking herd immunity and the covid-19 response end game. https://publichealth.jhu.edu/2021/what-is-herd-immunity-and-how-can-we-achieve-it-with-covid-19, Sept. 2021. URL https://publichealth.jhu.edu/2021/what-is-herd-immunity-and-how-can-we-achieve-it-with-covid-19. Accessed: 2022-4-28.
Dubey [2020] A. D. Dubey. Twitter sentiment analysis during covid-19 outbreak. International Journal of Computer Science Engineering, 8(02), Apr. 2020. doi: 10.2139/ssrn.3572023. URL https://papers.ssrn.com/abstract=3572023.
Dunn et al. [2015] A. G. Dunn, J. Leask, X. Zhou, K. D. Mandl, and E. Coiera. Associations between exposure to and expression of negative opinions about human papillomavirus vaccines on social media: An observational study. J. Med. Internet Res., 17(6):e144, June 2015. ISSN 1439-4456, 1438-8871. doi: 10.2196/jmir.4343. URL https://dx.doi.org/10.2196/jmir.4343.
Dunn et al. [2017] A. G. Dunn, D. Surian, J. Leask, A. Dey, K. D. Mandl, and E. Coiera. Mapping information exposure on social media to explain differences in hpv vaccine coverage in the united states. Vaccine, 35(23):3033–3040, May 2017. ISSN 0264-410x, 1873-2518. doi: 10.1016/j.vaccine.2017.04.060. URL https://dx.doi.org/10.1016/j.vaccine.2017.04.060.
Ekram et al. [2019] S. Ekram, K. E. Debiec, M. A. Pumper, and M. A. Moreno. Content and commentary: Hpv vaccine and youtube. J. Pediatr. Adolesc. Gynecol., 32(2):153–157, Apr. 2019. ISSN 1083-3188, 1873-4332. doi: 10.1016/j.jpag.2018.11.001. URL https://dx.doi.org/10.1016/j.jpag.2018.11.001.
Evanega et al. [2020] S. Evanega, M. Lynas, J. Adams, K. Smolenyak, and C. G. Insights. Coronavirus misinformation: quantifying sources and themes in the covid-19 ‘infodemic’. JMIR Preprints, 19(10):2020, 2020. URL https://preprints.jmir.org/preprint/25143.
Ferrara [2020] E. Ferrara. What types of covid-19 conspiracies are populated by twitter bots? First Monday, May 2020. ISSN 1396-0466, 1396-0466. doi: 10.5210/fm.v25i6.10633. URL https://firstmonday.org/ojs/index.php/fm/article/view/10633.
Finney Rutten et al. [2021] L. J. Finney Rutten, X. Zhu, A. L. Leppin, J. L. Ridgeway, M. D. Swift, J. M. Griffin, J. L. St Sauver, A. Virk, and R. M. Jacobson. Evidence-based strategies for clinical organizations to address covid-19 vaccine hesitancy. Mayo Clin. Proc., 96(3):699–707, Mar. 2021. ISSN 0025-6196, 1942-5546. doi: 10.1016/j.mayocp.2020.12.024. URL https://dx.doi.org/10.1016/j.mayocp.2020.12.024.
East StratCom Task Force [2015] East StratCom Task Force. Eu vs disinformation, 2015. URL https://euvsdisinfo.eu/. Accessed: 2020-8-8.
GeoPy [2020] GeoPy. Welcome to geopy’s documentation! - geopy 2.2.0 documentation, 2020. URL https://geopy.readthedocs.io/en/stable/. Accessed: 2021-7-30.
Gerberding and Haynes [2021] J. L. Gerberding and B. F. Haynes. Vaccine innovations - past and future. N. Engl. J. Med., 384(5):393–396, Feb. 2021. ISSN 0028-4793, 1533-4406. doi: 10.1056/NEJMp2029466. URL https://dx.doi.org/10.1056/NEJMp2029466.
Grootendorst and Reimers [2021] M. Grootendorst and N. Reimers. Bertopic: Leveraging bert and c-tf-idf to create easily interpretable topics, Dec. 2021. URL https://zenodo.org/record/5779238.
Guidry et al. [2021] J. P. D. Guidry, L. I. Laestadius, E. K. Vraga, C. A. Miller, P. B. Perrin, C. W. Burton, M. Ryan, B. F. Fuemmeler, and K. E. Carlyle. Willingness to get the covid-19 vaccine with and without emergency use authorization. Am. J. Infect. Control, 49(2):137–142, Feb. 2021. ISSN 0196-6553, 1527-3296. doi: 10.1016/j.ajic.2020.11.018. URL https://dx.doi.org/10.1016/j.ajic.2020.11.018.
Hart et al. [2020] P. S. Hart, S. Chinn, and S. Soroka. Politicization and polarization in covid-19 news coverage. Sci. Commun., 42(5):679–697, Oct. 2020. ISSN 1075-5470. doi: 10.1177/1075547020950735. URL https://doi.org/10.1177/1075547020950735.
Housholder and LaMarre [2015] E. Housholder and H. L. LaMarre. Political social media engagement: Comparing campaign goals with voter behavior. Public Relat. Rev., 41(1):138–140, Mar. 2015. ISSN 0363-8111. doi: 10.1016/j.pubrev.2014.10.007. URL https://www.sciencedirect.com/science/article/pii/S0363811114001489.
Hunt et al. [2022] I. d. V. Hunt, T. Dunn, M. Mahoney, M. Chen, V. Nava, and E. Linos. A social media-based public health campaign encouraging covid-19 vaccination across the united states. Am. J. Public Health, pages e1–e4, July 2022. ISSN 0090-0036, 1541-0048. doi: 10.2105/ajph.2022.306934. URL https://dx.doi.org/10.2105/AJPH.2022.306934.
Imran et al. [2022] M. Imran, U. Qazi, and F. Ofli. Tbcov: Two billion multilingual covid-19 tweets with sentiment, entity, geo, and gender labels. Data, 7(1):8, Jan. 2022. ISSN 2306-5729. doi: 10.3390/data7010008. URL https://www.mdpi.com/2306-5729/7/1/8/htm.
Poynter Institute [2020] Poynter Institute. Ifcn covid-19 misinformation - poynter, 2020. URL https://www.poynter.org/ifcn-covid-19-misinformation/. Accessed: 2022-7-8.
Islam et al. [2020] M. S. Islam, T. Sarkar, S. H. Khan, A.-H. Mostofa Kamal, S. M. M. Hasan, A. Kabir, D. Yeasmin, M. A. Islam, K. I. Amin Chowdhury, K. S. Anwar, A. A. Chughtai, and H. Seale. Covid-19-related infodemic and its impact on public health: A global social media analysis. Am. J. Trop. Med. Hyg., 103(4):1621–1629, Oct. 2020. ISSN 0002-9637, 1476-1645. doi: 10.4269/ajtmh.20-0812. URL https://dx.doi.org/10.4269/ajtmh.20-0812.
Jamison et al. [2020] A. M. Jamison, D. A. Broniatowski, M. Dredze, A. Sangraula, M. C. Smith, and S. C. Quinn. Not just conspiracy theories: Vaccine opponents and proponents add to the covid-19 ‘infodemic’ on twitter. HKS Misinfo Review, Sept. 2020. doi: 10.37016/mr-2020-38. URL https://misinforeview.hks.harvard.edu/?p=2462.
Jockers [2020] M. Jockers. Introduction to the syuzhet package, Nov. 2020. URL https://cran.r-project.org/web/packages/syuzhet/vignettes/syuzhet-vignette.html. Accessed: 2021-9-3.
Johnson et al. [2020] N. F. Johnson, N. Velásquez, N. J. Restrepo, R. Leahy, N. Gabriel, S. El Oud, M. Zheng, P. Manrique, S. Wuchty, and Y. Lupu. The online competition between pro- and anti-vaccination views. Nature, 582(7811):230–233, June 2020. ISSN 0028-0836, 1476-4687. doi: 10.1038/s41586-020-2281-1. URL https://dx.doi.org/10.1038/s41586-020-2281-1.
Karafillakis et al. [2021] E. Karafillakis, S. Martin, C. Simas, K. Olsson, J. Takacs, S. Dada, and H. J. Larson. Methods for social media monitoring related to vaccination: Systematic scoping review. JMIR Public Health Surveill, 7(2):e17149, Feb. 2021. ISSN 2369-2960. doi: 10.2196/17149. URL https://dx.doi.org/10.2196/17149.
Keim-Malpass et al. [2017] J. Keim-Malpass, E. M. Mitchell, E. Sun, and C. Kennedy. Using twitter to understand public perceptions regarding the #hpv vaccine: Opportunities for public health nurses to engage in social marketing. Public Health Nurs., 34(4):316–323, July 2017. ISSN 0737-1209, 1525-1446. doi: 10.1111/phn.12318. URL https://onlinelibrary.wiley.com/doi/10.1111/phn.12318.
Kim et al. [2020] H. K. Kim, J. Ahn, L. Atkinson, and L. A. Kahlor. Effects of covid-19 misinformation on information seeking, avoidance, and processing: A multicountry comparative study. Sci. Commun., 42(5):586–615, Oct. 2020. ISSN 1075-5470. doi: 10.1177/1075547020959670. URL https://doi.org/10.1177/1075547020959670.
Kricorian et al. [2021] K. Kricorian, R. Civen, and O. Equils. Covid-19 vaccine hesitancy: misinformation and perceptions of vaccine safety. Hum. Vaccin. Immunother., pages 1–8, July 2021. ISSN 2164-5515, 2164-554x. doi: 10.1080/21645515.2021.1950504. URL https://dx.doi.org/10.1080/21645515.2021.1950504.
Lazer et al. [2020] D. Lazer, D. J. Ruck, A. Quintana, S. Shugars, K. Joseph, N. Grinberg, R. J. Gallagher, L. Horgan, A. Gitomer, A. Bajak, M. A. Baum, K. Ognyanova, H. Qu, W. R. Hobbs, S. McCabe, and J. Green. The state of the nation: A 50-state covid-19 survey. Technical Report 18, COVID States Project, Oct. 2020. URL https://news.northeastern.edu/wp-content/uploads/2020/10/COVID19_CONSORTIUM_REPORT_18_FAKE_NEWS_TWITTER_OCT_2020.pdf.
Lee et al. [2022] S. K. Lee, J. Sun, S. Jang, and S. Connelly. Misinformation of COVID-19 vaccines and vaccine hesitancy. Sci. Rep., 12(1):13681, Aug. 2022. ISSN 2045-2322. doi: 10.1038/s41598-022-17430-6. URL https://dx.doi.org/10.1038/s41598-022-17430-6.
Liu et al. [2023] L. Liu, K. Mirkovski, P. B. Lowry, and Q. Vu. “Do as i say but not as i do”: Influence of political leaders’ populist communication styles on public adherence in a crisis using the global case of covid-19 movement restrictions. Data Inf Manag, 7(2):100039, June 2023. ISSN 2543-9251. doi: 10.1016/j.dim.2023.100039. URL https://dx.doi.org/10.1016/j.dim.2023.100039.
Liu et al. [2019] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. Roberta: A robustly optimized bert pretraining approach. July 2019. URL https://arxiv.org/abs/1907.11692.
Loomba et al. [2021] S. Loomba, A. de Figueiredo, S. J. Piatek, K. de Graaf, and H. J. Larson. Measuring the impact of covid-19 vaccine misinformation on vaccination intent in the uk and usa. Nat Hum Behav, 5(3):337–348, Mar. 2021. ISSN 2397-3374. doi: 10.1038/s41562-021-01056-1. URL https://dx.doi.org/10.1038/s41562-021-01056-1.
Manguri et al. [2020] K. H. Manguri, R. N. Ramadhan, and P. R. Mohammed Amin. Twitter sentiment analysis on worldwide covid-19 outbreaks. Kurdistan Journal of Applied Research, pages 54–65, May 2020. ISSN 2411-7706, 2411-7706. doi: 10.24017/covid.8. URL https://kjar.spu.edu.iq/index.php/kjar/article/view/512.
Mohammad et al. [2016] S. Mohammad, S. Kiritchenko, P. Sobhani, X. Zhu, and C. Cherry. SemEval-2016 Task 6: Detecting Stance in Tweets. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pages 31–41, San Diego, California, June 2016. Association for Computational Linguistics. doi: 10.18653/v1/S16-1003. URL https://aclanthology.org/S16-1003/.
Moore [2020] M. Moore. The best twitter accounts to follow for reliable information on the coronavirus outbreak, Mar. 2020. URL https://fortune.com/2020/03/14/coronavirus-updates-twitter-accounts-covid-19-news/. Accessed: 2022-2-11.
Morens et al. [2022] D. M. Morens, G. K. Folkers, and A. S. Fauci. The concept of classical herd immunity may not apply to covid-19. J. Infect. Dis., Mar. 2022. ISSN 0022-1899, 1537-6613. doi: 10.1093/infdis/jiac109. URL https://dx.doi.org/10.1093/infdis/jiac109.
Morris [2021] P. J. Morris. We have met the enemy, and he is us: Falling childhood immunization rates. N. C. Med. J., 82(2):122–125, Mar. 2021. ISSN 0029-2559. doi: 10.18043/ncm.82.2.122. URL https://dx.doi.org/10.18043/ncm.82.2.122.
Mota [2021] F. Mota. Expert opinions on the most promising treatments and vaccine candidates for covid-19: Global cross-sectional survey of virus researchers in the early months of the pandemic. JMIR Public Health and Surveillance, 4, Feb. 2021. doi: 10.2196/22483. URL https://publichealth.jmir.org/2021/2/PDF#page=129.
Müller et al. [2020] M. Müller, M. Salathé, and P. E. Kummervold. Covid-twitter-bert: A natural language processing model to analyse covid-19 content on twitter. May 2020. URL https://arxiv.org/abs/2005.07503.
Muric et al. [2021] G. Muric, Y. Wu, and E. Ferrara. Covid-19 vaccine hesitancy on social media: Building a public twitter data set of antivaccine content, vaccine misinformation, and conspiracies. JMIR Public Health Surveill, 7(11):e30642, Nov. 2021. ISSN 2369-2960. doi: 10.2196/30642. URL https://dx.doi.org/10.2196/30642.
Murphy et al. [2021] J. Murphy, F. Vallières, R. P. Bentall, M. Shevlin, O. McBride, T. K. Hartman, R. McKay, K. Bennett, L. Mason, J. Gibson-Miller, L. Levita, A. P. Martinez, T. V. A. Stocks, T. Karatzias, and P. Hyland. Psychological characteristics associated with covid-19 vaccine hesitancy and resistance in ireland and the united kingdom. Nat. Commun., 12(1):29, Jan. 2021. ISSN 2041-1723. doi: 10.1038/s41467-020-20226-9. URL https://dx.doi.org/10.1038/s41467-020-20226-9.
Nielsen et al. [2020] R. K. Nielsen, R. Fletcher, N. Newman, J. S. Brennen, and P. N. Howard. Navigating the ’infodemic’: How people in six countries access and rate news and information about Coronavirus. Reuters Institute for the Study of Journalism, Oxford, England, Apr. 2020. ISBN 9781907384745. URL https://ora.ox.ac.uk/objects/uuid:8e0d50bc-4b4b-4df5-988c-ca2e2d41e947.
Nominatim [2020] Nominatim. Nominatim demo, 2020. URL https://nominatim.openstreetmap.org/ui/search.html. Accessed: 2021-7-30.
Ministry of Health and Family Welfare [2012] Ministry of Health and Family Welfare, Government of India. Pentavalent vaccine, 2012. URL https://www.who.int/docs/default-source/searo/india/tobacoo/pentavalent-vaccine-guide-for-hws-with-answers-to-faqs.pdf?sfvrsn=903de90_2.
Office of Infectious Disease and HIV/AIDS Policy (OIDP) [2021] Office of Infectious Disease and HIV/AIDS Policy (OIDP). Vaccines protect you, Apr. 2021. URL https://www.hhs.gov/immunization/basics/work/prevention/index.html. Accessed: 2022-3-29.
World Health Organization [2020] World Health Organization. Coronavirus disease 2019 (covid-19): situation report, 73. Institutional Repository for Information Sharing, Situation Report(73), Apr. 2020. URL https://apps.who.int/iris/handle/10665/331686.
Oudeyer and Kaplan [2007] P.-Y. Oudeyer and F. Kaplan. What is intrinsic motivation? a typology of computational approaches. Front. Neurorobot., 1:6, Nov. 2007. ISSN 1662-5218. doi: 10.3389/neuro.12.006.2007. URL https://journal.frontiersin.org/article/10.3389/neuro.12.006.2007/abstract.
Palen [2008] L. Palen. Online social media in crisis events, Aug. 2008. URL https://er.educause.edu/articles/2008/8/online-social-media-in-crisis-events. Accessed: 2020-8-8.
Poddar et al. [2022] S. Poddar, M. Mondal, J. Misra, N. Ganguly, and S. Ghosh. Winds of change: Impact of covid-19 on vaccine-related opinions of twitter users. Proceedings of the International AAAI Conference on Web and Social Media, 16(1):782–793, May 2022. doi: 10.1609/icwsm.v16i1.19334. URL https://ojs.aaai.org/index.php/ICWSM/article/view/19334.
Rahman and Alhoori [2023] A. Rahman and H. Alhoori. Visualizing relation between (de)motivating topics and public stance toward covid-19 vaccine. In 2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL), pages 299–300, Oct. 2023. doi: 10.1109/jcdl57899.2023.00067. URL https://doi.org/10.1109/JCDL57899.2023.00067.
Rajapakse [2021] T. Rajapakse. Simple transformers, 2021. URL https://simpletransformers.ai/. Accessed: 2022-4-30.
Sanh et al. [2019] V. Sanh, L. Debut, J. Chaumond, and T. Wolf. Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter. Oct. 2019. URL https://arxiv.org/abs/1910.01108.
Schmitz et al. [2022] M. Schmitz, O. Luminet, O. Klein, S. Morbée, O. Van den Bergh, P. Van Oost, J. Waterschoot, V. Yzerbyt, and M. Vansteenkiste. Predicting vaccine uptake during covid-19 crisis: A motivational approach. Vaccine, 40(2):288–297, Jan. 2022. ISSN 0264-410x, 1873-2518. doi: 10.1016/j.vaccine.2021.11.068. URL https://dx.doi.org/10.1016/j.vaccine.2021.11.068.
Steindl et al. [2015] C. Steindl, E. Jonas, S. Sittenthaler, E. Traut-Mattausch, and J. Greenberg. Understanding Psychological Reactance: New Developments and Findings. Z. Psychol., 223(4):205–214, 2015. ISSN 2190-8370, 2151-2604. doi: 10.1027/2151-2604/a000222. URL https://dx.doi.org/10.1027/2151-2604/a000222.
Strickler [2006] J. Strickler. What really motivates people? The Journal for Quality and Participation; Cincinnati, 29(1):26–28, 2006. URL https://search.proquest.com/openview/763e6f586b889713c1382cf3815f1a33/1?pq-origsite=gscholar&cbl=37083.
Suarez-Lledo and Alvarez-Galvez [2021] V. Suarez-Lledo and J. Alvarez-Galvez. Prevalence of health misinformation on social media: Systematic review. J. Med. Internet Res., 23(1):e17187, Jan. 2021. ISSN 1439-4456, 1438-8871. doi: 10.2196/17187. URL https://dx.doi.org/10.2196/17187.
Tasnim et al. [2020] S. Tasnim, M. M. Hossain, and H. Mazumder. Impact of rumors or misinformation on coronavirus disease (covid-19) in social media. Mar. 2020. URL https://dx.doi.org/10.31235/osf.io/uf3zn.
Documenting the Now [2020] Documenting the Now. Hydrator, 2020. URL https://github.com/DocNow/hydrator.
Thelwall et al. [2021] M. Thelwall, K. Kousha, and S. Thelwall. Covid-19 vaccine hesitancy on english-language twitter. Profesional de la Información, 30(2), Mar. 2021. ISSN 1699-2407, 1699-2407. doi: 10.3145/epi.2021.mar.12. URL https://revista.profesionaldelainformacion.com/index.php/EPI/article/view/86322.
Truong et al. [2022] J. Truong, S. Bakshi, A. Wasim, M. Ahmad, and U. Majid. What factors promote vaccine hesitancy or acceptance during pandemics? a systematic review and thematic analysis. Health Promot. Int., 37(1), Feb. 2022. ISSN 0957-4824, 1460-2245. doi: 10.1093/heapro/daab105. URL https://dx.doi.org/10.1093/heapro/daab105.
Twitter [2020] Twitter. Twitter developer, 2020. URL https://developer.twitter.com/. Accessed: 2020-1-25.
Vasudevan et al. [2021] L. Vasudevan, E. Walter, and G. Swamy. Vaccine hesitancy in north carolina: The elephant in the room? N. C. Med. J., 82(2):130–137, Mar. 2021. ISSN 0029-2559. doi: 10.18043/ncm.82.2.130. URL https://dx.doi.org/10.18043/ncm.82.2.130.
Venigalla et al. [2020] A. S. M. Venigalla, S. Chimalakonda, and D. Vagavolu. Mood of india during covid-19 - an interactive web portal based on emotion analysis of twitter data. In Conference Companion Publication of the 2020 on Computer Supported Cooperative Work and Social Computing, pages 65–68. Association for Computing Machinery, New York, NY, USA, Oct. 2020. ISBN 9781450380591. doi: 10.1145/3406865.3418567. URL https://doi.org/10.1145/3406865.3418567.
Venigalla et al. [2021] A. S. M. Venigalla, D. Vagavolu, and S. Chimalakonda. Survivecovid-19++ : A collaborative healthcare game towards educating people about safety measures for covid-19. In Companion Publication of the 2021 Conference on Computer Supported Cooperative Work and Social Computing, Cscw ’21, pages 222–225, New York, NY, USA, Oct. 2021. Association for Computing Machinery. ISBN 9781450384797. doi: 10.1145/3462204.3482891. URL https://doi.org/10.1145/3462204.3482891.
Villacis Calderon et al. [2023] E. D. Villacis Calderon, T. L. James, and P. B. Lowry. How facebook’s newsfeed algorithm shapes childhood vaccine hesitancy: An algorithmic fairness, accountability, and transparency (fat) perspective. Data and Information Management, 7(3):100042, Sept. 2023. ISSN 2543-9251. doi: 10.1016/j.dim.2023.100042. URL https://www.sciencedirect.com/science/article/pii/S2543925123000165.
Wardle and Singerman [2021] C. Wardle and E. Singerman. Too little, too late: social media companies’ failure to tackle vaccine misinformation poses a real threat. Bmj, 372:n26, Jan. 2021. ISSN 0959-8138, 1756-1833. doi: 10.1136/bmj.n26. URL https://dx.doi.org/10.1136/bmj.n26.
WHO [2020a] WHO. Coronavirus disease (covid-19): Vaccines, 2020a. URL https://www.who.int/news-room/q-a-detail/coronavirus-disease-(covid-19)-vaccines. Accessed: 2021-4-30.
WHO [2020b] WHO. Coronavirus disease (covid-19): Herd immunity, lockdowns and covid-19, Dec. 2020b. URL https://www.who.int/news-room/q-a-detail/herd-immunity-lockdowns-and-covid-19. Accessed: 2021-3-4.
Wolf et al. [2019] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, and A. M. Rush. Huggingface’s transformers: State-of-the-art natural language processing. Oct. 2019. URL https://arxiv.org/abs/1910.03771.
Wolf et al. [2020] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, and A. Rush. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online, Oct. 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-demos.6. URL https://aclanthology.org/2020.emnlp-demos.6.
Xin et al. [2021] M. Xin, S. Luo, R. She, X. Chen, L. Li, L. Li, X. Chen, and J. T. F. Lau. The Impact of Social Media Exposure and Interpersonal Discussion on Intention of COVID-19 Vaccination among Nurses. Vaccines (Basel), 9(10), Oct. 2021. ISSN 2076-393X. doi: 10.3390/vaccines9101204. URL https://dx.doi.org/10.3390/vaccines9101204.
Ye et al. [2020] X. Ye, B. Zhao, T. H. Nguyen, and S. Wang. Social media and social awareness. In H. Guo, M. F. Goodchild, and A. Annoni, editors, Manual of Digital Earth, pages 425–440. Springer Singapore, Singapore, 2020. ISBN 9789813299153. doi: 10.1007/978-981-32-9915-3_12. URL https://doi.org/10.1007/978-981-32-9915-3_12.
Yoo et al. [2023] W. Yoo, S.-H. Oh, and T. Kim. The effect of social media on preventive behavioural intention during the covid-19 pandemic: Mediating roles of interpersonal communication, social media expression and knowledge. Journal of Creative Communications, 18(2):166–182, July 2023. ISSN 0973-2586. doi: 10.1177/09732586231166115. URL https://doi.org/10.1177/09732586231166115.
Zhou et al. [2023] J. Zhou, Y. Zhang, Q. Luo, A. G. Parker, and M. De Choudhury. Synthetic lies: Understanding ai-generated misinformation and evaluating algorithmic and human solutions. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, number Article 436 in Chi ’23, pages 1–20, New York, NY, USA, Apr. 2023. Association for Computing Machinery. ISBN 9781450394215. doi: 10.1145/3544548.3581318. URL https://doi.org/10.1145/3544548.3581318.

Appendix A Datasets

A.1 Explanation of Datasets Used

We have collected, combined, and analyzed data from different sources. We created a comprehensive list in Table 4 to explain the sources and purpose of the datasets. The items in bold represent the datasets used in the models in this paper.

Table 4: Comprehensive list of datasets used and their purposes

Sl.	Dataset	Source	Purpose
1.	Completed Twitter Dataset	Chen et al. [18]	A list of tweet IDs related to COVID-19. We collected the full tweets for these IDs as explained in Section 3.1, which yielded $15,768,845$ tweets.
2.	Twitter Dataset	Subset of the above	A stratified subset of $466,335$ tweets drawn from the above dataset and used for the experiments in this paper.
3.	COVID-19 Rumor Dataset	[19]	A labeled dataset containing COVID-19 rumors from news sources and Twitter, as explained in Section 3.2.1.
4.	Avax Tweets Dataset	[67]	A list of tweet IDs exhibiting an antivaccine stance. We created a stratified sample of $79,093$ full tweets from $1.8$ million tweets, as described in Section 3.2.2.
5.	Authentic Vaccination Tweets Dataset	[13]	Vaccine-related tweets that we manually collected from trusted sources using Brandwatch, as described in Section 3.2.3.
6.	Motivation Training Dataset	A combination of $3$, $4$, and $5$ above	The three sources combined to build the ground truth dataset for training the de/motivating tweet classifier. The process is explained in Section 3.2.
7.	Stance Tweets from Poddar	[76]	A set of labeled tweets expressing stance towards COVID-19 vaccination, as explained in Section 3.3.
8.	Stance Tweets from Cotfas	[23]	Another tweet dataset labeled for stance towards COVID-19 vaccination, as explained in Section 3.3.
9.	Manually Labeled Stance Dataset	Manually labeled	A stratified sample of $1,064$ tweets that we manually labeled for stance towards COVID-19 vaccination. Details of the manual labeling are available in Section 3.3.
10.	Stance Ground Truth Dataset	A combination of $7$, $8$, and $9$ above	The three sources above combined to train the stance classifier for this paper. This dataset is explained in Section 3.3.
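Several rows of Table 4 rely on stratified sampling. The sketch below shows one common way to draw such a sample: group the records by a stratum key, then sample the same fraction from every group. This appendix does not name the stratification variable, so the `month` key used in the usage note, the `fraction` value, and the function name are illustrative assumptions, not the procedure used in the paper.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_of, fraction, seed=0):
    """Draw roughly `fraction` of the records from every stratum.

    `stratum_of` maps a record to its stratum key (hypothetical here).
    Each non-empty stratum contributes at least one record.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    groups = defaultdict(list)
    for rec in records:
        groups[stratum_of(rec)].append(rec)

    sample = []
    for key in sorted(groups):  # sorted for a deterministic iteration order
        members = groups[key]
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample
```

For example, `stratified_sample(tweets, lambda t: t["month"], 0.03)` would keep about 3% of the tweets from each month, so that no month is over- or under-represented relative to the full collection.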

A.2 Labeled Twitter Dataset

The dataset contains tweet IDs along with the location and timestamp of each tweet. The tweets are labeled by motivating/demotivating status, stance towards the COVID-19 vaccine, and topic of the tweet text. We removed the tweet texts and author information to comply with Twitter guidelines. The full tweets can be rehydrated from their IDs using Hydrator [85].

The anonymized dataset is available at:
https://zenodo.org/record/6842883/files/anonymized_tweets_with_labels.csv
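As a rough illustration of preparing the IDs for hydration, the sketch below copies the tweet-ID column of a labeled CSV into a plain-text file with one ID per line, which is the kind of input a hydration tool typically takes. The column name `tweet_id` and the function name are assumptions; check the header of the downloaded file before running it.

```python
import csv


def write_tweet_ids(csv_path, out_path, id_column="tweet_id"):
    """Write the tweet IDs from a labeled CSV to a one-ID-per-line text file.

    `id_column` is a hypothetical column name; adjust it to the real header.
    Returns the number of IDs written.
    """
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        if id_column not in (reader.fieldnames or []):
            raise KeyError(f"column {id_column!r} not found in {reader.fieldnames}")
        for row in reader:
            tweet_id = row[id_column].strip()
            if tweet_id:  # skip rows with a blank ID
                dst.write(tweet_id + "\n")
                count += 1
    return count
```

IDs are kept as strings throughout, since tweet IDs exceed the range that spreadsheet tools and some parsers represent exactly as numbers.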

Appendix B Machine Learning Models

B.1 De/Motivation Classifier

A pre-trained text classifier model to label the tweet texts as motivating or demotivating.

The model is available at:
https://zenodo.org/record/6842883/files/demotivation-classifier-distilbart-model.7z

B.2 Vaccine Stance Classifier

A pre-trained text classifier model to identify the stance of a tweet text towards the COVID-19 vaccine.

The model is available at:
https://zenodo.org/record/6842883/files/stance-detection-ct-BERT.7z

B.3 Topic Model

A pre-trained topic model based on BERTopic to identify the topics in tweet texts.

The model for demotivating tweets is available at:
https://zenodo.org/record/6842883/files/topic-model-demotivating-bertopic.7z

The model for motivating tweets is available at:
https://zenodo.org/record/6842883/files/topic-model-motivating-bertopic.7z

Appendix C Parameters for Reproducibility

C.1 Hardware Specification

1. The data collection process was performed on a desktop computer with a Windows 10 setup containing 16GB RAM.
2. The models were trained using a Google Colab Pro+ High-RAM (52GB) setup with an A100 GPU.

C.2 De/Motivation Classifier

Parameters for training the de/motivation classifiers are provided in Table 5.

Table 5: Hyper-parameters for training the de/motivating classifiers

Model	Batch Size	Epochs	Learning Rate
DistilBERT (distilbert-base-uncased-finetuned-sst-2-english)	$16$	$3$	$5e-5$
RoBERTa (roberta-base)	$16$	$3$	$1e-3$
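The values in Table 5 could be passed to a training library roughly as follows. This is a minimal sketch that assumes a Simple Transformers workflow; the paper cites that library [Rajapakse 2021], but this appendix does not confirm it was used for this classifier. The names `TABLE_5`, `training_args`, and `build_classifier` are illustrative, `num_labels=2` is an assumption, and all unlisted settings stay at library defaults.

```python
# Hyper-parameters copied from Table 5, keyed by Simple Transformers model type.
TABLE_5 = {
    "distilbert": {
        "model_name": "distilbert-base-uncased-finetuned-sst-2-english",
        "train_batch_size": 16,
        "num_train_epochs": 3,
        "learning_rate": 5e-5,
    },
    "roberta": {
        "model_name": "roberta-base",
        "train_batch_size": 16,
        "num_train_epochs": 3,
        "learning_rate": 1e-3,
    },
}


def training_args(model_type):
    """Return the training-argument dict for one row of Table 5."""
    row = TABLE_5[model_type]
    return {key: value for key, value in row.items() if key != "model_name"}


def build_classifier(model_type, use_cuda=False):
    """Create a classifier; requires the `simpletransformers` package."""
    # Imported lazily so the configuration can be inspected without the library.
    from simpletransformers.classification import ClassificationModel

    return ClassificationModel(
        model_type,
        TABLE_5[model_type]["model_name"],
        num_labels=2,  # assumption: motivating vs. demotivating
        args=training_args(model_type),
        use_cuda=use_cuda,
    )
```

Calling `build_classifier("distilbert").train_model(train_df)` would then fine-tune on a DataFrame with `text` and `labels` columns, which is the input layout Simple Transformers expects.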

C.3 Topic Model

Parameters for topic modeling with BERTopic are provided in Table 6.

Table 6: Parameters for re-training the BERTopic model

Parameter	Value
BERTopic version	$0.9.4$
Language	English
Calculate Probabilities	True
Verbose	True
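The settings in Table 6 correspond to constructor arguments of the `BERTopic` class. The sketch below shows how they might be applied; `BERTOPIC_PARAMS` and `fit_topics` are illustrative names, and every parameter not listed in Table 6 is assumed to keep its library default.

```python
# Constructor arguments taken from Table 6 (BERTopic version 0.9.4).
BERTOPIC_PARAMS = {
    "language": "english",
    "calculate_probabilities": True,
    "verbose": True,
}


def fit_topics(docs):
    """Fit a topic model on a list of tweet texts; requires the `bertopic` package."""
    # Imported lazily so the parameter dict can be inspected without the library.
    from bertopic import BERTopic

    topic_model = BERTopic(**BERTOPIC_PARAMS)
    # fit_transform returns one topic ID per document and, because
    # calculate_probabilities is True, a document-by-topic probability matrix.
    topics, probabilities = topic_model.fit_transform(docs)
    return topic_model, topics, probabilities
```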

C.4 Vaccine Stance Classifier

Parameters for training the stance classifiers are provided in Table 7.

Table 7: Hyper-parameters for training the stance classifiers

Model	Batch Size	Epochs
RoBERTa (roberta-base)	$16$	$10$
CT-BERT (digitalepidemiologylab/covid-twitter-bert-v2)	$8$	$10$

Appendix D Miscellaneous

D.1 Topics from De/Motivating Classes

D.1.1 Motivating Topics

The 10 most frequent topics from the motivating tweets are displayed in Table 8, along with the number of tweets in each topic and its top 5 words.

Table 8: Topics from motivating tweets

Topic	Tweets	Top 5 words
Face mask	1452	mask, masks, wear, wearing, face
School	1290	schools, students, school, learning, teachers
Optimism	684	amp, ov, optimistic, myself, ensure
Sports	480	football, game, players, basketball, games
COVID-19 infection	479	sarscov2, sars, variant, infection, genome
Biden inaugural	435	biden, joe, bidens, inaugural, transition
Music	431	music, album, song, spotify, songs
Voting	423	mail, voting, ballots, vote, mailin
Death statistics	386	died, deaths, excess, 1000, per
Cursing COVID-19	351	corona, fck, sht, beer, btch*

D.1.2 Motivating Topics Related to “Vaccine”

Table 9 shows the top 10 “vaccine” related topics from the motivating tweets along with the similarity score to the keyword “vaccine.”

Table 9: Vaccine topics from motivating tweets

Topic	Similarity	Top 5 words
Vaccine rollout	0.77255	desperately, vaccines, therapeutics, rollout, development
Vaccination speed	0.64514	vaccineto, quicklyi, vaccinebut, bricker, coopting
Flu during COVID-19	0.61227	shot, flu, never, fightflu, twindemic
Misinformation and Trump	0.60793	misinformation, trust, pollquestion, trumpsvaccineisalie, hash
Infection prevention	0.60710	slim, iscovid19, contraceptive, prevent, immune
Vaccination directive	0.60689	vaccinationstha, adults, four, received, leading
Herd immunity	0.58614	herd, immunity, natural, seroprevalence, tcell
Flu season	0.57635	flu, influenza, cold, season, eradicated
Re-infection	0.55869	reinfection, suggesting, immunity, peop, hopef
Promoting vaccination	0.54879	vaccinat, protecting, stated, based, fully
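This appendix does not state how the similarity scores are computed. One plausible reading is that they come from a keyword search such as BERTopic's `find_topics` method, which ranks topics by cosine similarity between an embedding of the search term and the topic embeddings. The sketch below illustrates that ranking step under this assumption, using invented three-dimensional vectors in place of real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_u * norm_v)


def rank_topics(query_vec, topic_vecs, top_n=10):
    """Return the `top_n` (topic, score) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in topic_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy embeddings, invented for illustration only.
query = [1.0, 0.0, 0.0]  # stands in for the embedding of "vaccine"
topics = {
    "vaccine rollout": [0.9, 0.1, 0.0],
    "music": [0.0, 1.0, 0.0],
    "herd immunity": [0.6, 0.0, 0.8],
}
ranking = rank_topics(query, topics, top_n=2)
```

With these toy vectors, "vaccine rollout" ranks first and "herd immunity" second, while "music" is orthogonal to the query and drops out of the top two.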

D.1.3 Demotivating Topics

The 10 most frequent topics from the demotivating tweets are displayed in Table 10, along with the number of tweets in each topic and its top 5 words.

Table 10: Topics from demotivating tweets

Topic	Tweets	Top 5 words
Nevada governor	2723	amp, wheelchairuser, haventwith, readthere, donaldyou
Vaccine mandate	1180	vaccinated, vaccine, vaccines, stacontrolling, rollout
New York governor	1151	cuomo, cuomos, andrew, nursing, homes
NFL COVID-19 restrictions	871	nfl, reservecovid19, rodgers, browns, aaron
COVID-19 in nursing homes	801	nursing, homes, patients, elderly, governors
COVID-19 restrictions for churches	792	church, pastor, churches, worship, easter
Cruise ship COVID-19 outbreak	770	cruise, ship, princess, navy, passengers
Promoting vaccination	763	vaccinated, vaccine, vaccines, minimizes, downleadership
Social distancing	738	distancing, social, practicing, yip, appa
Protective guidance	709	corona, virusmeaning, enveloped, asteroid, dieme

D.1.4 Demotivating Topics Related to “Vaccine”

Table 11 shows the top 10 “vaccine” related topics from the demotivating tweets along with the similarity score to the keyword “vaccine.”

Table 11: Vaccine topics from demotivating tweets

Topic	Similarity	Top 5 words
Vaccine mandate	0.90244	vaccinated, vaccine, vaccines, stacontrolling, rollout
Benefits of vaccination	0.87415	vaccinated, vaccine, vaccines, minimizes, downleadership
Side effects	0.77769	covidvaccine, vaccinesideeffects, vaccinepasspos, fml, vaccinessavelives
Immunity from vaccination	0.66614	immunity, natural, vaccinat, provides, protection
Immunity and virus variants	0.65232	immune, immunity, autoimmune, newslasting, autoummune
Promote vaccination	0.63225	getvaccinated, getvaccinatednow, resurrect, pandemichelp, getvaccinatedasapthe
Immunocompromised	0.58262	immunocompromised, sacrifices, immunodeficient, guidelinescdc, judgment
Vaccination statistics	0.58117	unvaccinated, 992, 995, peopleunvac, foundunvaccinated
Natural immunity	0.57885	immunity, natural, itthat, isbig, covidnatural
Vaccination in black community	0.57702	vaccinesaid, gonna, blacktwitter, cigarette, yall

D.2 Sample Tweets from Different Classes

Table 12 lists a few sample tweets from the different de/motivating classes and vaccination stances. The tweets are presented in their original form, without any data processing or cleanup, so that the text can be read as posted. The only change is that icons (emoticons) were removed.

Table 12: Sample tweets from different de/motivating classes and stances

Tweet	De/Motivating	Vaccination Stance
Can stem cells treat COVID-19? Preliminary pre-clinical results show that lung-specific stem cells significantly reduce inflammation and lung tissue damage.	Motivating	Favor
Two of the last four years on #Thanksgiving and all the other holidays, I was one of them. I’m proud of the work my coworkers and I did, all day, everyday (and night). For god’s sake, thank service industry workers this year and do NOT give them #COVID19.	Motivating	Favor
BEST. VIDEO. ALL. YEAR. Please share with friends how the mRNA vaccine works to fight the coronavirus. NOTA BENE—The mRNA never interacts with your DNA. #vaccinate (Special thanks to the Vaccine Makers Project @vaccinemakers of @ChildrensPhila). #COVID19	Demotivating	Favor
Yes, things are broken but they don’t and won’t change if people don’t fight for it. Anyway, be safe and keep others safe. The only reason we still have covid is because of the selfish fucks going out like nothing’s changed.	Demotivating	Favor
The policies peddled by incompetent health bureaucrats, weak politicians & the group-thinking media of LOCKDOWN, MASK & INJECT - all while suppressing Ivermectin is criminally negligent in its gross stupidity & total insanity.	Demotivating	Against
For the Record, I am immune from ALL Mandatory Vaccinations. Mandatory vaccinations violate my Religious and Constitutional Rights and I do not consent to vaccinations of any kind, including COVID-19, the “flu” and any/all other applicable “diseases.”	Demotivating	Against
Re-infection with COVID is extremely rare. Our immune systems work remarkably well against SARS-Cov-2. Anyone saying otherwise is uninformed (not reading the studies/not seeing covid patients) or lying.	Motivating	Against
Not all 35 million took the flu vaccine. A lot less would catch it if they took it, but the danger of dying with the flu is much less than with Covid-19, so many don’t bother. Once there is a safe vaccine for C19, I hope many, many more people take it than take a flu vaccine.	Motivating	Against