COVID-19 misinformation and the 2020 US presidential election

Voting is the defining act for a democracy. However, voting is only meaningful if public deliberation is grounded in veritable and equitable information. This essay investigates the politicization of public health practices during the Democratic primaries in the context of the 2020 U.S. presidential election, using a dataset of more than 67 million tweets. We find the public sphere on Twitter is politically heterogeneous and the majority, liberal and conservative alike, advocates for wearing masks and vote-by-mail. However, a small but dense group of conservative users pushes anti-mask and voter fraud narratives.

2020 U.S. Democratic primaries: the use of masks and the legitimacy of mail-in ballots. Whereas misinformation can arise from any community, health misinformation is associated with specific communities.
• A large and expansive cluster of politically heterogeneous users (both liberal and conservative) advocate for wearing masks and mail-in voting. A small but dense cluster of conservative users pushes misinformation about the inefficacy of masks and potential for voter fraud.
• This study identifies one of the sources of amplification of misinformation during the pandemic regarding public health practices and election integrity. We also suggest ways politicized health messages have impacted the most recent 2020 Democratic primaries.
• A narrative's potential to be misinformation drives politicization of information just as much as misinformation itself does.

Implications
Voting is the defining act for a democracy. However, this action is only meaningful if public deliberation and decision-making are grounded in veritable and equitable information. Studying the possible effects of misinformation on voting behavior is thus a critical avenue of investigation even as we look back on the recent 2020 US presidential election cycle. This essay examines how health misinformation may be politicized, particularly how political alignment mediates the spread of COVID-19 misinformation.
It is useful to first disambiguate misinformation and disinformation, particularly in the context of politicization, or the use of information for political means. Misinformation is the spread of false information agnostic of intent, while disinformation is the intentional spread of false information. Disinformation campaigns often originate from specific institutions, such as intelligence agencies; however, these campaigns can also emerge spontaneously in online communities.
In this study, we focus on the politicization of COVID-19-related health misinformation and its spread and further analyze two of the most critical narratives during the 2020 US Democratic primary cycle:
1. The legality and fraudulence of voting by mail
2. The efficacy of masks
Although there have been widespread misinformation campaigns to convince the populace otherwise, the practice of mail-in voting has already been adopted by several U.S. states and has not been shown to be prone to or affected by significant fraud (Qiu, 2021; Spencer, 2020). The presence of disputed and disproven anti-mask rhetoric on popular social media platforms may adversely affect voter turnout due to health concerns and accessibility to mail-in ballot resources. Research investigating the interplay between misinformation and voting behavior reports conflicting results: for example, using random dial-in questionnaires, researchers found that both misinformation and factual information increase voter participation (White et al., 2006). Others focused on issues such as immigration, race, unemployment, and abortion. Automated phone calls ('robocalls') in Canada that contained misleading information about the location of polling stations resulted in a 3% average decrease in participation (Kessler et al., 2013).
These examples demonstrate the nuances in the effects of misinformation, as its transmission modality and type (e.g., political) may influence voter behavior. Today, the modality of interest has shifted from phones to social media, due to its ubiquitous presence. Efforts, spearheaded by the Russian Internet Research Agency (IRA) and others, to deliberately manipulate social media discourse have been well documented both in the 2016 U.S. presidential election (Bessi et al., 2016) and the 2017 French presidential election (Ferrara, 2017). The IRA appeared to have identified and targeted non-white voters (Badawy et al., 2019) months before the election with messages promoting racial identity (Dutt et al., 2018) that may have led to voter suppression (Kim et al., 2018), and certainly sowed division and conflict online (DiResta et al., 2019).
Many political scientists believe that an increase in information leads to electoral participation (Carpini et al., 1996). However, those who lack sufficient information tend to align with "opinion leaders" by following perceptions of knowledge or partisanship (Katz et al., 1966). Other factors, such as directionally motivated reasoning (Flynn et al., 2017), selective exposure (Guess et al., 2018), and correction-induced misperceptions (Nyhan et al., 2010), may also play a role in individual perception. With the COVID-19 pandemic characterized by uncertainty about the disease and best welfare practices, the public is vulnerable to partisan-driven misinformation at the intersection of public health and politics.
Recently, Jost and colleagues (Jost et al., 2018) noted that conservatives, in general, maintain more homogeneous social networks that are more conducive to the flow of misinformation, which would not only make them more vulnerable, but also generate dangerous cascade effects to the general public. Furthermore, prior studies have shown that the elderly population engaged more with misinformation during the 2016 U.S. presidential election (Grinberg et al., 2019). As such, the elderly are among the populations most susceptible to both digital misinformation and COVID-19 health complications.
The COVID-19 pandemic presents a novel chance to assess where health misinformation becomes political (Ferrara, 2020). While misinformation is the current label for our "narratives", the importance in our study, beyond the truth value, is its political impact. The 1918 Spanish Flu has been shown to have generated political extremism that led to higher votes for the Nazi party in areas with more pandemic deaths. The term of choice at the time was propaganda, but the meaning is the same: the deliberate spread of (mis)information to influence elections. In fact, the term misinformation has become so prevalent that it has become core to candidates' campaign strategies (such as Donald J. Trump's use of "fake news" to discredit the media). These narratives may be misinformation, and that possibility, rather than factuality itself, is what makes them effective in politics.
In this study, we investigate two major narratives incubated within the COVID-19 discourse and their interplay with the Democratic primary online chatter on Twitter from March 1, 2020 through August 30, 2020. Upon isolating two health-related narratives prone to misinformation, namely the use of masks in public and the issue of mail-in ballots, we show how mask-related discourse grows with discourse about voting. We find that instances of health-related misinformation continue to circulate after their initial reporting, and a common strategy is to use true stories to drive larger misinformation narratives. Topologically, a large and expansive cluster of politically heterogeneous users constitutes the majority of the public sphere on Twitter, and this group, in general, advocates for wearing masks and mail-in voting. In contrast, a small but dense cluster of conservative users pushes misinformation about the inefficacy of masks and voter fraud. We show that while misinformation, in general, can arise from any point in the network, there is a clear division between communities that spread mail-in ballot and mask misinformation and those that do not.

Findings
Finding 1: Four overarching themes regarding health policies and voting procedures emerged in our data set.
1. We first find, as expected, that Coronavirus discourse dominates much of the Democratic primary discussion during our observation period. This ranges from rulings by the United States Supreme Court surrounding religious gatherings to allegations that the Coronavirus is a hoax perpetrated by the Democratic party (Blue dotted line, Figure 1).
2. We then identify a second narrative surrounding mail-in ballots and the role the United States Postal Service (USPS) played in the distribution and collection of these ballots. In August, The Washington Post, along with many other news organizations, reported that Postmaster General Louis DeJoy had restructured the postal office and reallocated funding, leading to slower ballot delivery and returns during the primaries, with ramifications stretching beyond the Democratic primaries into the presidential elections (Red solid line, Figure 1).
3. We also find that there is general discourse surrounding imposed lockdowns, their efficacy, and constitutionality, as the United States faced a second wave during the summer of 2020 (Orange dotted line, Figure 2).
4. Finally, we observe numerous tweets surrounding masks and face coverings, with a large number of tweets perpetuating the messaging that masks are a hoax and are ineffective (Purple solid line, Figure 2).

Figure 1. Mail-in ballots and COVID-19-related tweets within primaries-related tweets, plotted as a 3-day rolling average of the percentage of primary-related tweets.
State abbreviations aligned with the day on which the respective state conducted their Democratic primary.

Figure 2. Lockdown and mask-related tweets within primaries-related tweets, plotted as a 3-day rolling average of the percentage of primary-related tweets. State abbreviations aligned with the day on which the respective state conducted their Democratic primary.
Due to the nature of our dataset and research questions, it is unsurprising that COVID-19 is salient throughout our dataset. Several narratives emerge under the umbrella of COVID-19, with some of the most vocal users believing that COVID-19 is a hoax pushed by the Democratic party or that the threat of COVID-19 had already passed. We also find that narratives about hydroxychloroquine (HCQ) and the injection of household disinfectants began to circulate, largely due to Trump announcing that he was actively taking the former as a preventative measure and suggesting that the latter might be worth further scientific investigation as a potential way to combat COVID-19 (Oprysko, 2020).
The controversy around HCQ, in particular, emphasizes the constant evolution of the factuality of a claim. This also motivates our focus on politicization rather than on only factuality. In March, there had initially been suggestions that HCQ may have been effective against COVID-19, prompting the U.S. Food and Drug Administration (FDA) to issue an emergency use authorization (EUA) for HCQ. However, as more clinical reports and studies were conducted, it became apparent that the drug commonly used to treat malaria was not effective in treating COVID-19. The FDA rescinded its recommendation in April and eventually the EUA in June, and the World Health Organization removed it from their coronavirus treatment trials (Bull-Otterson et al., 2020; Edwards, 2020; World Health Organization, 2020). We note that the initial effectiveness of HCQ against COVID-19 was unclear due to lack of evidence; as later evidence showed that HCQ was in fact ineffective, the factuality of claims about HCQ as a COVID-19 treatment changed over time. Nevertheless, this narrative's political use was evident, regardless of its validity. This demonstrates the dangers of the spread of unverified health-related news stories on social media prior to reaching medical consensus regarding the validity of the story. We also see that the use of these narratives can continue long after their initial reporting.
We also find that the topic of mail-in ballots becomes more prominent throughout our observation period. During the pandemic, to mitigate transmission risks, many voters began to contemplate voting by mail instead of voting in person. However, after DeJoy's changes to the USPS, Democrats began to call for investigations into these policy changes due to the potential implications they had on not only the primaries but also the U.S. presidential elections (Bogage et al., 2020). There were also many campaigns that claimed mail-in voting would increase voter fraud, a claim that has been deemed false by FactCheck multiple times since mid-April (Farley, 2020). This discourse increased in volume and representation in our dataset after Bernie Sanders conceded to Joe Biden on April 8, 2020, as the focus of the Democratic party shifted from the primaries to the upcoming presidential race.
Discourse surrounding social distancing, stay-at-home orders, and masks in the context of voting begins as early as mid-March and continues to attract attention over time. It then builds significant traction right after April, when multiple states held their primaries or decided to postpone them, implying that voting, social distancing, and mask discourse are largely event driven. The U.S. faced a second wave during the summer of 2020, which could explain the spikes in references to lockdowns and stay-at-home orders that had initially begun to relax but were reimposed in response to the summer spike in certain parts of the country (Wilson, 2020; "As U.S. Coronavirus Cases Hit 3.5 Million, Officials Scramble to Add Restrictions," 2020).
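The smoothing used in Figures 1 and 2 (a 3-day rolling average of a topic's share of primary-related tweets) can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the function name, the per-day count inputs, and the choice of a trailing (rather than centered) window are all assumptions.

```python
def rolling_topic_share(topic_counts, total_counts, window=3):
    """Trailing rolling mean of a topic's daily share of tweets, in percent.

    topic_counts[i] and total_counts[i] are hypothetical tweet counts for
    day i. Days with no primary-related tweets are treated as a 0% share.
    """
    if len(topic_counts) != len(total_counts):
        raise ValueError("count lists must have the same length")
    # Daily percentage of primary-related tweets that mention the topic.
    daily = [100.0 * t / n if n else 0.0
             for t, n in zip(topic_counts, total_counts)]
    smoothed = []
    for i in range(len(daily)):
        # Average over the current day and up to (window - 1) earlier days.
        chunk = daily[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

With a trailing window, the first two days are averaged over fewer than three values; a centered window would shift the curve but not change the overall trends.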
Finding 2: We find that there exists a clear political and content polarization in the retweet user network topology.
We consider political polarization and a user's history of spreading misinformation, as shown in Figure 3, below. In this network, we focus our attention on users, represented by nodes, who have tweeted about mail-in voting and mask-wearing. We constructed weighted directed edges between users, based on the number of interactions they had with each other (specifically retweets and original tweets). Figure 3 was generated first using node2vec (Grover and Leskovec, 2016), which represents social networks in high dimensional space. A two-dimensional layout was then extracted using the t-SNE algorithm (Maaten and Hinton, 2008). Figure 3a) shows the political affiliation of Twitter users. Figure 3b) shows users who have tweeted URLs from domains known for posting misinformation. Figure 3c) shows users who have tweeted factual information or misinformation about mask-wearing and mail-in ballots. High levels of polarization are observed.
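The graph construction and the random-walk step that node2vec relies on can be sketched as follows. This is a simplified illustration under stated assumptions: edges are built from retweets only and point from the retweeter to the original author (the convention given in the methods), and the walk is a first-order weighted walk, which corresponds to node2vec with its return and in-out parameters both set to 1. The function names and the input format are hypothetical, and the embedding-learning and t-SNE steps are omitted.

```python
import random
from collections import Counter, defaultdict

def build_retweet_graph(retweets):
    """Aggregate (retweeter, original_author) pairs into weighted directed edges."""
    weights = Counter(retweets)
    graph = defaultdict(dict)
    for (retweeter, author), weight in weights.items():
        graph[retweeter][author] = weight
    return graph

def weighted_walk(graph, start, length, rng):
    """One weighted random walk of at most `length` nodes starting at `start`.

    Simplification: this is node2vec's walk with p = q = 1, so the next
    node depends only on the current node and the outgoing edge weights.
    """
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1])
        if not neighbors:
            break  # dead end: the current user retweeted no one
        nodes = list(neighbors)
        walk.append(rng.choices(nodes, weights=[neighbors[n] for n in nodes])[0])
    return walk

# Toy example with made-up user names.
toy = build_retweet_graph([("a", "b"), ("a", "b"), ("c", "b"), ("b", "a")])
example_walk = weighted_walk(toy, "c", 5, random.Random(0))
```

In a full pipeline, many such walks per node would be passed to a skip-gram model to learn the embeddings, which is why users who interact frequently end up close together in the layout.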

Figure 3. Topological distribution of Twitter users who discuss mask-wearing and voting.
In Figure 3a), we observe a clear topological division between blue and red clusters at the top. By network topology, we refer to how nodes in the network are arranged, and how their embeddings are spaced and clustered (such as when represented in a two-dimensional visualization). In much of the public Twitter sphere, there is a heterogeneous cluster of users that has a well-mixed political news diet. The appearance of multiple, homogeneous clusters indicates the presence of extreme political polarization. Users predominantly identify as center and left leaning, but there is a large cluster of conservative users in the upper right. This cluster is significantly denser and more homogeneous; we refer to it as the dense conservative cluster. Note that two nodes are plotted closer if they have a higher edge-weight (interact more frequently). As a result, groups of users with shared connections will be visualized closer together. While computing exact heterophily scores is possible, this would require labeling through community detection and merits the full scope of a separate study. Figures 3b) and 3c) show this data augmented with misinformation tags. Figure 3b) shows users (green) that have previously shared articles from questionable domains containing misinformation, as defined by Media Bias-Fact Check (Zandt, n.d.). We observe that misinformation is spread in both clusters and across a mixture of political affiliations; however, a significant amount arises from the conservative cluster on the upper right. Figure 3c) further shows the distribution of four narrative positions, best represented by the hashtags in Table A1 (see Appendix Part B, "Tagging public health misinformation").
As we discuss in the methods section, we leverage manual annotation to isolate misinformation and factual tweets, and then find co-occurring hashtags and terms to identify a larger set of tweets that align with these four positions. In the discourse about mask-wearing and voting by mail, we observe a clearer division. Whereas most users are predominantly marked by advocacy for mask-wearing and voting by mail, the denser conservative cluster pushes almost exclusively anti-mask-wearing discourse and equates voting by mail to voter fraud. It is important not to read this as a reductive division along partisan lines. Figure 3b) shows misinformation can be spread by any user; however, the conservative clusters spread significantly more misinformation. Figure 3c) shows the majority of users from across party lines advocate for confirmed public health practices and safety precautions around voting. Interestingly, even within the dense conservative cluster, sub-communities emerge for which anti-mask or voter fraud discourse takes precedence.
In sum, the public sphere of users on Twitter engaged in conversation on COVID-19 and the primaries takes on a specific topology. There is a heterogeneous user base comprising a loosely connected majority. In contrast, a dense network of conservative users emerges, disjoint from the majority, which affirms Jost and colleagues' observation that there exist higher levels of homogeneity amongst certain conservative populations (Jost et al., 2018). This dense group demonstrates a propensity to politicize health-related misinformation.
We find the top COVID-19 narratives, when tweeted during the 2020 Democratic primaries, to be highly politicized. We observe that it is not only the factual basis but also the potential for misinformation that contributes to the politicization of information online. For instance, one of the mask narratives stated that there was an N95 mask shortage in the US because the Obama administration had neglected to maintain the stockpile. This was denied by some left-leaning users but is actually true (Sherman, 2020). On the other hand, mask-related misinformation seemed to be pushed exclusively from the dense group of conservative users, which suggests selective exposure to fake news. Looking back on the Democratic primaries and now the 2020 U.S. presidential elections, this paper provides a bird's-eye view of, and a warning about, how misinformation and the potential to be perceived as misinformation may galvanize further politicization of surrounding public health policies.

Data curation
We leverage our public COVID-19 Twitter dataset (Chen et al., 2020) and U.S. presidential elections Twitter dataset (Chen et al., 2020) for this study, as Twitter provides a platform for users to engage in conversation surrounding events in real-time. Collection for the former dataset began in late January 2020, while the latter began in May 2019. At the time of this writing, we had only processed our elections data from March 2020 onwards, and so we chose to focus on tweets from both datasets that were posted from March 1, 2020 through August 30, 2020. The Democratic National Convention took place from August 17-20, marking the official shift from the primaries to the presidential election. For this study, we utilize release v2.12 from our COVID-19 dataset and release v1.3 from our U.S. presidential elections dataset. We tracked several related keywords and accounts for each dataset's respective topic, a sampling of which can be found in Table 1. We then filtered the general COVID-19 dataset for tweets related to the Democratic Primary using keywords of interest (Table 2). As we are interested in the U.S. Democratic primaries, we utilize user-specified locations included in each tweet's metadata and normalize these locations (Jiang et al., 2020). We require all tweets to contain normalized location data that originates from the United States with an identifiable state attribution and be tagged as an English tweet by Twitter. We used Latent Dirichlet allocation (LDA) to cluster the tweets into 8 topics (the number of topics with the highest coherence score) and tagged each tweet with its most probable topic (Blei et al., 2003). We describe how we construct the final dataset in the Appendix (see Appendix Part A, "Constructing the dataset").
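The three inclusion criteria described above (English language tag, a normalized U.S. state, and a primary-related keyword) can be sketched as a simple predicate. This is an illustrative sketch only: the field names `lang`, `state`, and `text`, the function name, and the substring-matching rule are assumptions, and the actual keyword list is the one given in Table 2.

```python
def keep_tweet(tweet, keywords, us_states):
    """Return True if a tweet dict passes the three inclusion filters.

    Assumed (hypothetical) fields: "lang" (language tag), "state"
    (normalized user location, or missing), and "text".
    """
    if tweet.get("lang") != "en":
        return False
    if tweet.get("state") not in us_states:
        return False
    text = tweet.get("text", "").lower()
    # Case-insensitive substring match against the keyword list.
    return any(keyword.lower() in text for keyword in keywords)
```

Substring matching is the simplest possible rule; token-level matching would avoid accidental hits inside longer words at the cost of more preprocessing.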

Narrative and community detection
We then focused on two narratives: mask-wearing and voting by mail, using tweets that contain mask or mail-in ballot-related keywords, as listed in Table 3. We remove quoted tweets, as we are interested in original content and the amplification of certain viewpoints, and quoted tweets (retweets with comments) may contain contrarian commentary relative to the retweeted tweet. This results in 5,211,071 vote-by-mail tweets and 1,014,751 mask tweets. With this dataset, we found relevant co-occurring hashtags from these tweets (see Table A1 in Part B of the Appendix). Using these hashtags, we extracted tweets from the entire collection of primary-related tweets containing any of these hashtags. We also leverage specific hashtags that are indicative of stance to identify whether a user has shared factual information or misinformation about mask-wearing and voting by mail. Please refer to the Appendix Part B, "Tagging public health misinformation" for a more detailed discussion on how we determined hashtag ideology alignment and its surrounding discourse. To infer a user's political affiliation, we matched user-shared URLs with domains from Media Bias-Fact Check to five categories: left, lean left, center, lean right, and right (Zandt, n.d.). For better accuracy, we only included users with more than 10 politically leaning URLs in our visualization. We find the majority URL political affiliation and tag the users as such; in the case of ties, one of the political classifications was chosen uniformly at random. Finally, we merge these two tags for each tweet based on the posting user and cluster the users into one of four categories describing a user's political affiliation and their tendency to spread misinformation or factual information: 1) Democratic and fact, 2) Republican and fact, 3) Democratic and misinformation and 4) Republican and misinformation. This results in 1,253,022 unique users. The domains and the aggregate bias of the data are shown in Table 4 and Figure 4.
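The majority-vote labeling with a minimum-URL threshold and uniform random tie-breaking can be sketched as follows. This is a minimal sketch, not the code used in the study: the function name and the input format (one bias label per shared URL that matched a rated domain) are assumptions, as is counting every matched URL toward the threshold.

```python
import random
from collections import Counter

def infer_affiliation(url_labels, threshold=10, rng=random):
    """Majority bias label among a user's shared URLs.

    url_labels holds one label per matched URL, e.g. "left", "lean left",
    "center", "lean right", or "right". Returns None for users with
    `threshold` or fewer labeled URLs; ties are broken uniformly at random.
    """
    if len(url_labels) <= threshold:
        return None
    counts = Counter(url_labels)
    top = max(counts.values())
    # Sort the tied labels so the result is reproducible for a seeded rng.
    tied = sorted(label for label, count in counts.items() if count == top)
    return rng.choice(tied)
```

Passing a seeded `random.Random` instance as `rng` makes the tie-breaking reproducible across runs.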
The most frequent political affiliation of domains shared is from sources that are center left (or lean left), which is consistent with the labels the Pew Center assigns to the most reputable media outlets (Jurkowski et al., 2020). However, the most frequently retweeted individual domains include right-leaning media sources, Fox News, Dallas Morning News, and the Daily Caller. This suggests that conservative tweeters tend to have a more concentrated media diet.

Given that there are more than 67 million tweets, visualizing user behavior in a meaningful way is a high-dimensional challenge. A network of social interaction was constructed between Twitter users, where nodes are users and edge weights are the number of retweets. This is a directed graph, for which the original tweeter is the head. There were 1,028,742 unique users and 2,886,004 unique weighted edges. From there, we applied node2vec, which represents the network in Euclidean space (Grover and Leskovec, 2016). The algorithm conducts random walks to explore "neighborhoods," such that in the final representation nodes are preserved near their neighbors. We set the dimensions to 10 and the random walk length to 100; these values were found through experimentation with visualization parameters. Next, we extract a two-dimensional layout using the t-SNE algorithm (t-distributed stochastic neighbor embedding), which maps high-dimensional data to lower dimensions by constructing Student t-distributions over the dataset. We set the dimensionality to two, as we want to visualize our networks in two dimensions. A discussion of the study limitations can be found in Part C of the Appendix.

We find that for the subset of tweets that align with #WearAMask posted by Liberal users, the discourse encourages others to comply with regulations to wear masks. Some of the most frequent bigrams include "social distancing" and "wearing mask."
We then look at tweets from Conservative users and find their conversation revolves around Donald Trump's decision to wear a mask and how this action can be used against the Democrats. However, when we look at the #MasksOff discourse, we find that regardless of party affiliation, both Conservatives and Liberals amplify misinformation messaging claiming that doctors believe that masks are harmful to one's health (an example of one such tweet can be seen in Figure A1). For mail-in ballots, Liberals tweeting #VoteByMail frequently mention "vote safely," "expand votebymail," and "wear mask," all of which suggest that Liberals are encouraging voting by mail as a means to remain safe during the COVID-19 pandemic. Conservatives voice the same concerns, with mentions of "stay home" and "social distancing," but also amplify their unhappiness regarding the Texas Supreme Court's decision to deny Democratic efforts to expand mail-in voting in Texas. On the other side of the spectrum, Liberals and Conservatives posting #VoterFraud-related tweets all reference a testimony given to the House Judiciary subcommittee that supports the notion that a shift to mail-in ballots will increase voter fraud in the upcoming U.S. presidential election.
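The frequent bigrams quoted above (for example "social distancing" and "wearing mask") suggest a count over adjacent tokens after stopword removal. A minimal sketch of such a count follows; the tokenization rule, the stopword handling, and the function name are assumptions rather than the procedure used in the study.

```python
import re
from collections import Counter

def top_bigrams(texts, stopwords=frozenset(), k=10):
    """Return the k most frequent adjacent token pairs across texts.

    Tokens are lowercased runs of letters, digits, '#', and apostrophes
    (an assumed rule). Stopwords are dropped before pairing, so
    "wearing a mask" yields the bigram ("wearing", "mask").
    """
    counts = Counter()
    for text in texts:
        tokens = [token for token in re.findall(r"[a-z0-9#']+", text.lower())
                  if token not in stopwords]
        # Pair each token with its successor within the same tweet.
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(k)
```

Running this separately on each party-and-stance subset would allow the kind of side-by-side comparison of topic coverage described in this section.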

We find that, for tweets supporting factual information, topic coverage varies slightly between users from different parties who take the same information stance (misinformation versus factual information). However, when we examine misinformation content, there is homogeneity in what users from both parties are pushing on Twitter. This suggests that, for both mail-in ballot and mask-related discourse, both the left and right are susceptible to the same kinds of misinformation.

C. Limitations
It is difficult to compare survey-reported political affiliations with political affiliation inferred through social media posts (Deb et al., 2019). Because our data set was filtered for keywords directly related to the 2020 US Democratic primaries, we see a significantly larger volume of tweets from Democratic-tagged users, and a much smaller number of tweets attributed to Republican users. Thus, conclusions regarding Republican and Republican-leaning users' narratives were based on a small sample of users.
We also note that Twitter's free streaming API only returns 1% of the total tweet stream. This means that we are not able to collect all of the tweets that are a part of the COVID-19 and Democratic primary-related discourse. However, the 1% sample still serves as a fairly accurate representation of the discourse. Twitter has also recently removed location data from a tweet's metadata, which means that we have had to infer user location based on the user-reported location. These locations may not be consistently accurate, and we are unable to identify geolocation data for users who do not specify a location or users who fail to list a location from which we are able to extract location data.