The spread of COVID-19 conspiracy theories on social media and the effect of content moderation

We investigate the diffusion of conspiracy theories related to the origin of COVID-19 on social media. By analyzing third-party content on four social media platforms, we show that: (a) In contrast to conventional wisdom, mainstream sources contribute overall more to conspiracy theory diffusion than alternative and other sources; and (b) Platforms’ content moderation practices are able to mitigate the spread of conspiracy theories. Nevertheless, we identify issues regarding the timeliness and magnitude of content moderation, and find that platforms filter significantly fewer conspiracy theories coming from mainstream sources. Given this, we discuss policy steps that can contribute to the containment of conspiracy theories by media sources, platform owners, and users.


Essay summary
• We identified 11,023 unique URLs referring to the origin of COVID-19 appearing in 267,084 Facebook, Twitter, Reddit, and 4chan posts between January and March 2020. We classified them based on their source (mainstream, alternative, other) and their content (supporting conspiracy theories, used as evidence for conspiracy theories, neither). We considered URLs in the first two content categories as stories reinforcing conspiracy theories. We investigated whether posts containing these stories were removed or labeled as such by the platforms. Then, we employed appropriate statistical techniques to quantify conspiracy theory diffusion between social media platforms and measured the impact of content moderation.
• We found that alternative sources generated more stories reinforcing conspiracy theories than mainstream sources. However, similar stories coming from mainstream sources reached significantly more users. We further quantified conspiracy theory dynamics in the social media ecosystem. We found that stories reinforcing conspiracy theories had a higher virality than neutral or debunking stories.
• We measured the amount of moderated content on Reddit, Twitter, and Facebook. We concluded that content moderation on each platform had a significant mitigating effect on the diffusion of conspiracy theories. Nevertheless, we found that a large number of conspiracy theories remained unmoderated. We also detected a moderation bias towards stories coming from alternative and other sources (with other sources comprising personal blogs and social media submissions, e.g. tweets, Facebook posts, Reddit comments, etc.).
• Results suggest that policymakers and platform owners should reflect on further ways that can contain COVID-19-related conspiracy theories. Content moderation is an effective strategy but can be further improved by overcoming issues of timeliness and magnitude. There should also be additional transparency on how and why content moderation takes place, as well as targeted design interventions, which can inform and sensitize users regarding conspiracy theories.

Argument & Implications
The COVID-19 health crisis resulted in the outbreak of an unprecedented misinfodemic on social media: A vast amount of pandemic-related misinformation appeared, which in turn influenced society's response to the virus (Gyenes et al., 2018). Given the absence of exact social and scientific knowledge about the origin, nature, and impact of the coronavirus, many conspiracy theories quickly emerged, seeking to provide explanations. To confront the overwhelming amount of misinformation, social media platforms and fact-checking agencies increased attempts to moderate such content by removing or flagging it, often relying on algorithmic decision-making (ADM) systems (Brennen et al., 2020; Newton, 2020).
In this study, we aim to understand how conspiracy theories spread at the beginning of the COVID-19 health crisis, and based on this, uncover possibilities and issues of fact-checking in the social media ecosystem (Marwick & Lewis, 2017). To achieve this, we measured the appearance of stories reinforcing conspiracy theories on four platforms: Facebook, Twitter, Reddit, and the subsection of 4chan called "politically incorrect" or "/pol/", which is a prominent forum of conspiracy theorists. In contrast to other cases (Cosentino, 2020), 4chan was not the only source of conspiracy theories in the ecosystem (finding 1). We found that stories reinforcing conspiracy theories became more viral than stories either debunking them or having a neutral stance (finding 1). This is consistent with previous findings on misinformation (Vosoughi et al., 2018; Vicario et al., 2016).
Most of the stories reinforcing conspiracy theories originated from alternative sources, personal blogs, and social media posts (83%). However, such content coming from mainstream sources (17%) resulted in higher numbers of Facebook and Twitter shares (60% and 55% of the total respectively). Mainstream sources included high-credibility news outlets, such as the New York Post or Fox News, scientific websites such as biorxiv.org, and other widely credible sites, such as Wikipedia. Alternative sources included untrustworthy and low-credibility outlets, such as Infowars and Breitbart. Although alternative and other sources were the main carriers of conspiracy theories, mainstream sources had a higher impact on the spread of conspiracy theories (finding 2).
We investigated the platforms' moderation practices, which varied to a certain degree. Twitter and YouTube removed stories that supported conspiracy theories (Gadde & Derella, 2020; Binder, 2020), while Reddit and Facebook either removed or flagged them (Reddit content policy, 2020; Jin, 2020). On Reddit, removing or flagging depended on the rules of each sub-community, whereas on Facebook it depended on whether the company reviewed the stories themselves (removed) or relied on third-party fact-checkers (flagged).
We concluded that the platforms' moderation practices strongly reduced the probability of stories reappearing in the total ecosystem. Hence, theoretically, instantly removing or filtering conspiracy theories would contain their spread. However, content moderation is a complex and time-consuming process, with human workers and ADM systems facing obstacles in accuracy and efficiency (Roberts, 2019; Graves, 2018; Gillespie, 2018; Serrano et al., 2020). This is due both to the large amount of content to be fact-checked and to the nature of the content, which is often difficult to categorize as conspiracy theory or not (Krafft et al., 2020; Uscinski et al., 2013; Byford, 2011; Dentith, 2014; Krause et al., 2020). In our study, we found that the platforms managed to fact-check only between 15% and 50% of posts containing stories reinforcing conspiracy theories, with moderation in many cases taking place weeks after they became viral (findings 3 and 4).
We observed that each platform faced different obstacles in content moderation. For example, content moderation on Twitter was less effective than on the other platforms (finding 4). We can probably explain this effect by the timeliness of content removal, as misinformation on Twitter spreads significantly in the first hours after its first appearance (Vosoughi et al., 2018). YouTube also faced issues of timeliness. For instance, a video that stated that the pandemic is a planned conspiracy gathered up to 5 million views in a period of only two days (Wong, 2020), with copies of the video continuously being re-uploaded after its removal. Facebook filtered the fewest stories reinforcing conspiracy theories, while Reddit appeared not to moderate older content. These results illustrate the challenges that platforms and policymakers should overcome. Besides issues of timeliness and moderation magnitude, platforms should investigate whether removing or flagging content is an optimal practice, not only for containing misinformation but also for maintaining a politically inclusive environment. Since Facebook and Reddit have mixed moderation policies, it would be important to quantify different effects between misinformation control and user engagement.
A further implication of the study is related to the existence of a moderation bias on all platforms, with stories reinforcing conspiracy theories and coming from mainstream sources being filtered significantly less. This is an important finding, given that mainstream sources prevailed as a key factor for conspiracy spread in our study, and that many ADM systems for classifying content take a source's credibility level as input (Atananosova et al., 2019). Therefore, platform owners should pay more attention to what they moderate and why, and clearly explain their decisions to the users. Studies show that additional transparency and deliberation in content removal make users more aware of the type of information they are consuming, change the way they interact with it, and build trust between them and the services (Fazio, 2020; Ruzenberg, 2019; Suzor et al., 2018; Krause et al., 2020). Finally, mainstream sources should be aware that the information they produce in the process of reportage could be exploited for the support and general reinforcement of conspiracy theories.
We hope that these recommendations can guide platforms and policymakers towards solutions that can accompany traditional content moderation, which we found to be an effective technique for containing the spread of conspiracy theories.

Findings
This study investigates content moderation practices about conspiracy theories related to the origin of COVID-19. It includes four important findings regarding conspiracy theory dynamics on social media, as well as the possibilities and issues of fact-checking for mitigating the spread of conspiracy theories.
Finding 1: URLs reinforcing conspiracy theories went more viral than URLs being neutral or debunking conspiracy theories. In both cases, URL dissemination followed complex paths in the social media ecosystem.
For the three months under investigation, we quantified the spread of conspiracy theory related URLs on social media (RQ1). Results suggest that paths and intensity varied depending on the type of URL (Figure 1). Neutral or debunking stories primarily spread in the ecosystem after being presented on Twitter, while a significant number of URLs disseminated on other platforms through 4chan. On the other hand, stories reinforcing conspiracy theories followed different routes. URLs present on 4chan spread on Twitter, and stories on Twitter were further distributed on Facebook. Reddit had an impact on stories reinforcing conspiracy theories both on Facebook and 4chan, while Facebook was feeding 4chan with both URL types. Overall, conspiracy theory diffusion models showed that stories reinforcing conspiracy theories became more viral within the ecosystem than the rest. This is consistent with previous research stating that misinformation and provocative content are disseminated more than factual content on social networks (Vosoughi et al., 2018; Vicario et al., 2016). Furthermore, these findings show that information paths between social media are complex and content dependent, and reject the statement that fringe social media are the only contributors to conspiracy theory dissemination (Cosentino, 2020). In contrast, we found that all platforms contributed to the spread of stories reinforcing conspiracy theories.
Finding 2: Mainstream sources played a bigger role in conspiracy theory dissemination than alternative and other sources.
We classified our sample of stories reinforcing conspiracy theories based on their source and quantified their popularity using Twitter and Facebook shares (RQ2). Of the URLs reinforcing conspiracy theories, 83% originated from alternative or other sources, and only 17% came from mainstream sources. However, stories coming from mainstream sources were on average and overall more popular (Figure 2). On average, mainstream URLs supporting conspiracy theories were shared four times more on Facebook and Twitter in comparison to URLs coming from alternative sources. Similarly, mainstream URLs used as evidence for the truthfulness of conspiracy theories were shared two times more. Overall, the 17% of stories reinforcing conspiracy theories that came from mainstream sources resulted in 60% and 55% of the total Facebook and Twitter shares, respectively. These results are explainable since users usually read and share sources they trust (Brennen et al., 2020; Epstein et al., 2020), and mainstream sources have a higher reach and acceptance in society. In Table 1, we provide an exemplary set of URLs, their content type, and the number of Facebook and Twitter shares they evoked. For a more detailed analysis, refer to the appendix.
Finding 3: Moderating content either by removing or flagging it significantly reduced the spread of conspiracy theories in the ecosystem.

Figure 2. Bar plots illustrating median Facebook and Twitter shares for URLs supporting conspiracy theories or used as evi
Information diffusion models quantified the impact of content moderation on the virality of conspiracy theories (RQ3). The models yielded for each case a value α ≤ 1, which denotes the probability that a submission containing a URL will lead to the creation of another submission containing the same URL. Table 2 illustrates the mean difference of that probability when comparing models trained on URLs that were either moderated or not. Results suggest that content moderation significantly decreased the probability that a story reinforcing conspiracy theories would reappear on the same platform, and also the probability that it would diffuse to another platform. For Facebook and Reddit, this probability reduction exceeded 90%. By contrast, moderating content on Twitter had a smaller in-platform effect, which did not exceed 10%. A potential explanation for this is the nature of retweeting on the platform, with users spreading copies of a message in short periods after its initial submission. Thus, information can go viral before moderation mechanisms can trace and remove it. Nonetheless, the models provided evidence that content moderation practices can indeed reduce the spread of conspiracy theories in the social media ecosystem.
Despite the finding that content moderation practices reduced the spread of conspiracy theories, our study also detected open issues when investigating RQ3. First, the majority of stories reinforcing conspiracy theories on the platforms remained unmoderated (between 50% and 85%, depending on the platform), as shown in Figure 4. Especially for Reddit and Facebook, we found that if stories were not removed close to their initial submission, the probability of them being removed later was very low. In contrast, YouTube and Twitter kept filtering content later in time, although many of the stories had already reached peak virality. Second, we calculated the ratios of removed stories for each source type and located a source bias. On all three platforms, submissions with URLs coming from mainstream sources were removed or flagged significantly less by content moderators. This bias in content removal translated into a relative percentage of 10 to 30 percent, depending on the platform.

Methods
We collected social media submissions from Reddit, Facebook, Twitter, and 4chan related to COVID-19 between January 1 and April 1, 2020. We extracted 9.5 million Reddit submissions and comments, 4.2 million Facebook posts, and 83 million tweets matching the query "COVID-19 OR coronavirus." For this, we used the Pushshift Reddit API (Baumgartner, 2018), CrowdTangle's historical data (Silverman, 2019), and the COVID-19 Twitter dataset developed by Chen et al. (2020). For 4chan, we crawled the total "Corona" thread and its sub-threads in 4chan's "politically incorrect" board and collected 1.5 million posts. From the complete dataset, we selected only the submissions that referred to the origin of COVID-19 by using the query "biowarfare OR biological weapon OR bioweapon OR umbrella corp OR man-made OR human origin OR man-made OR biosafety." We selected this query after reading multiple submissions and locating recurring conspiracy theories. As a final preprocessing step, we obtained all URLs from these submissions and created a list of 11,023 unique URLs.
We visited each of the 11,023 URLs and manually coded the stories depending on their relation to conspiracy theories. To develop a coding scheme, we adopted a definition of conspiracies and conspiracy theories based on prior theoretical work. According to Keely (1999), a conspiracy is a secret plot by two or more powerful actors. Conspiracy theories are efforts to explain the ultimate causes of significant sociopolitical events, such as the origin of COVID-19, by claiming the existence of a secret plot, by challenging institutionalized explanations (Byford, 2011), and often by denying science. As Byford (2011) states, conspiracy theories follow a three-point explanatory logic:
(a) There is a conspiracy as the main narrative of a story. For COVID-19, stories argued that a set of powerful individuals or groups, be that governments, institutions, or wealthy actors, developed the virus for their specific interests.
(b) Conspiracy theories generally ground their validity either on indirect evidence or on the absence of evidence. For COVID-19, many stories claimed that there is a conspiracy because there exist patents on engineering coronaviruses and even a book mentioning a virus originating from Wuhan. Similarly, some stories argued that the virus must be man-made because specific research publications could not conclude on the exact animal that carried the virus.
(c) Conspiracy theories are structured in a way that stories become irrefutable, and hence hard to challenge. For example, the statement "A book talked about a virus originating from Wuhan 40 years ago. Therefore, COVID-19 is man-made" is causally oversimplified, and it is thus impossible to provide counterevidence that rejects it.
By using this framework, we defined three labels for classifying URLs:
[1] Supporting conspiracy theories. In this case, URLs supported a conspiracy theory. The authors believed that some actors conspired to create COVID-19 and justified their thesis on the existence or absence of specific evidence.
[2] Evidence used to support a conspiracy theory. This class included URLs that did not directly link to a conspiracy theory, but social media users cited them as evidence for the conspiracy theories. For example, users linked to older articles about bioweapons to prove that specific countries created COVID-19. We considered this category also as reinforcing conspiracy theories because social media submissions containing these URLs were moderated by social media platforms. Furthermore, users grounded conspiracy theories on them in the way mentioned in (b).
[3] Neither. URLs with stories that did not refer to any type of conspiracy, that debunked conspiracy theories, mentioned conspiracy theories without believing them, or cited third parties that did believe in them.
We further labeled the URLs according to their source type. We defined three classes:
[i] Mainstream sources. These included scientific articles, patent repositories, Wikipedia, government websites, and high-credibility, widely accepted media outlets. We used the list generated by Shao et al. (2016) and fact-checking websites (e.g. adfontesmedia.com, newsguardtech.com, allsides.com) to identify credible media outlets.
[ii] Alternative sources. These included media outlets defined as low credibility by Shao et al. (2016), or ranked as untrustworthy by previously mentioned fact-checking websites.
[iii] Other sources. These included social media submissions from Facebook, YouTube, Twitter, and Reddit, or personal websites and blogs.
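The preprocessing described at the start of this section (keyword selection followed by URL extraction) can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the case-insensitive substring matching, the simplified URL pattern, the function name `extract_origin_urls`, and the sample texts are all assumptions made for illustration. Only the query terms come from the text above.

```python
import re

# Query terms quoted in the Methods section (the repeated "man-made" is listed once).
ORIGIN_TERMS = [
    "biowarfare", "biological weapon", "bioweapon", "umbrella corp",
    "man-made", "human origin", "biosafety",
]
# Assumption: terms are matched as case-insensitive substrings.
ORIGIN_PATTERN = re.compile("|".join(re.escape(t) for t in ORIGIN_TERMS), re.IGNORECASE)
# Assumption: a simplified URL pattern, not the extractor used by the authors.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_origin_urls(submissions):
    """Return the sorted unique URLs found in submissions that match the origin query."""
    urls = set()
    for text in submissions:
        if ORIGIN_PATTERN.search(text):
            for url in URL_PATTERN.findall(text):
                # Drop punctuation that trails a URL at the end of a sentence.
                urls.add(url.rstrip(".,;:!?"))
    return sorted(urls)


# Hypothetical sample submissions (not taken from the dataset).
sample = [
    "Is it a bioweapon? Read https://example.org/a.",
    "Stay home and wash your hands https://example.org/b",
    "Man-made virus claims: https://example.org/a and https://example.org/c",
]
print(extract_origin_urls(sample))
```

Collecting the URLs in a set mirrors the deduplication step that yields a list of unique URLs; the manual coding of each URL is a separate step that this sketch does not attempt to automate.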
To validate coding, two additional reviewers labeled a subsample of 300 URLs. Krippendorff's alpha was 0.92, while the pairwise Cohen's kappa values were in all cases equal to or greater than 0.9. These values suggest that there was high intercoder reliability in the labeled dataset. The subsample of 300 URLs and their corresponding reviewers' labels are available at the data repository of the study (see data availability).
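For readers unfamiliar with the agreement statistic, the following sketch shows how a pairwise Cohen's kappa can be computed from two coders' labels. The function name `cohens_kappa` and the toy label lists are illustrative assumptions; they are not the reviewers' labels from the study, and the sketch does not cover Krippendorff's alpha.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


# Toy labels using the three content classes 1, 2, 3 (not data from the study).
coder_1 = [1, 1, 2, 3, 3, 3, 2, 1, 3, 3]
coder_2 = [1, 1, 2, 3, 3, 3, 2, 2, 3, 3]
print(round(cohens_kappa(coder_1, coder_2), 3))
```

Kappa corrects the raw agreement rate for the agreement expected by chance, which matters here because the label classes are unevenly populated.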
For further examples of our coding scheme, please refer to Table 5 in the appendix. After labeling, 4,724 URLs were supporting conspiracy theories (1) or were used as evidence of conspiracy theories (2). We searched for these URLs in the original dataset and identified 267,084 submissions that contained them.
We modeled URL cross-platform diffusion by using a mathematical technique known as a Hawkes process. A Hawkes process is a model that quantifies how specific events influence each other over time in an ecosystem containing multiple components. In our case, the components are the social media platforms, and an event is the appearance of a post containing a specific URL on any platform. The Hawkes process can quantify how likely it is that the posting of a URL on a platform will cause the same URL to be posted again on any platform in the information ecosystem (Zannetou et al., 2017). This is given by a parameter αij, which gives the expected number of times a URL will appear on platform j if it was only posted on platform i, and functions as a proxy for a URL's virality (Rizoiu et al., 2017). We calculated parameters αij to study the flow of conspiracy related content across the four social media platforms, as illustrated in finding 1. For more information refer to the appendix.
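To make the role of αij more concrete, the sketch below evaluates the posting rate (intensity) of one platform given earlier posts on all platforms. It assumes an exponential kernel of the form αij·βij·exp(−βij·t), a normalization under which αij equals the expected number of direct follow-up posts on platform j per post on platform i. The function name `intensity`, the data layout, and all numeric values are illustrative assumptions rather than the authors' implementation or results.

```python
import math


def intensity(j, t, events, mu, alpha, beta):
    """Posting rate on platform j at time t for an exponential-kernel Hawkes process.

    events[i]   -- timestamps of posts containing the URL on platform i
    mu[j]       -- baseline rate of platform j
    alpha[i][j] -- virality: influence of a post on platform i on platform j
    beta[i][j]  -- decay rate of that influence
    """
    rate = mu[j]
    for i, timestamps in enumerate(events):
        for t_k in timestamps:
            if t_k < t:  # only strictly earlier posts can excite the process
                rate += alpha[i][j] * beta[i][j] * math.exp(-beta[i][j] * (t - t_k))
    return rate


# Toy setup with two platforms (values are illustrative, not estimates from the study).
events = [[0.0], []]                  # one post on platform 0 at time 0
mu = [0.1, 0.2]
alpha = [[0.5, 0.3], [0.0, 0.4]]
beta = [[1.0, 1.0], [1.0, 1.0]]
print(intensity(1, 1.0, events, mu, alpha, beta))
```

In this toy setup, the post on platform 0 temporarily raises the rate on platform 1 above its baseline, and the excess decays as time passes; a larger α01 would mean stronger cross-platform spread.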
To investigate the role of mainstream, alternative, and other sources in conspiracy theory dissemination, we used Buzzsumo to obtain the total number of shares each URL evoked on Facebook and Twitter. Buzzsumo provided metrics for 1,850 URLs in our sample (finding 2). We then studied whether submissions containing conspiracy theory reinforcing URLs have been removed or flagged by the platforms. We crawled Facebook, Twitter, YouTube, and Reddit once at the beginning of April 2020 and once at the beginning of May 2020 to understand content moderation (finding 4). With this new information, we ran Hawkes processes separately on moderated and unmoderated content. Finally, we compared virality parameters α for each case to quantify how much content moderation influenced conspiracy diffusion in the ecosystem (finding 3).
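The comparison of virality parameters between moderated and unmoderated content can be sketched as a relative reduction per platform pair. This is one plausible way to express the comparison and is an assumption on our part; the exact statistic reported in Table 2 may differ. The function name `moderation_effect` and the numbers are illustrative only.

```python
def moderation_effect(alpha_unmoderated, alpha_moderated):
    """Relative reduction of each virality parameter alpha[i][j] under moderation."""
    effect = []
    for row_u, row_m in zip(alpha_unmoderated, alpha_moderated):
        effect.append([
            # Undefined (NaN) when the unmoderated virality is zero.
            (a_u - a_m) / a_u if a_u > 0 else float("nan")
            for a_u, a_m in zip(row_u, row_m)
        ])
    return effect


# Toy values for two platforms (illustrative only, not results from the study).
alpha_unmoderated = [[0.40, 0.10], [0.20, 0.50]]
alpha_moderated = [[0.02, 0.01], [0.10, 0.45]]
for row in moderation_effect(alpha_unmoderated, alpha_moderated):
    print([round(value, 2) for value in row])
```

A value close to 1 on the diagonal would indicate that moderation almost eliminates in-platform reappearance, while off-diagonal entries describe the effect on diffusion to other platforms.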

Appendix
Conspiracy theories: definition & coding
To develop a coding scheme, we adopted a definition of conspiracies and conspiracy theories based on prior theoretical work. According to Keely (1999), a conspiracy is a secret plot by two or more powerful actors. Conspiracy theories are efforts to explain the ultimate causes of significant sociopolitical events, such as the origin of COVID-19, by claiming the existence of a secret plot, by challenging institutionalized explanations (Byford, 2011), and often by denying science. As Byford (2011) states, conspiracy theories follow a three-point explanatory logic:
(a) There is a conspiracy as the main narrative of a story. For COVID-19, stories argued that a set of powerful individuals or groups, be that governments, institutions, or wealthy actors, developed the virus for their specific interests.
(b) Conspiracy theories generally ground their validity either on indirect evidence or on the absence of evidence. For COVID-19, many stories claimed that there is a conspiracy because there exist patents on engineering coronaviruses and even a book mentioning a virus originating from Wuhan. Similarly, some stories argued that the virus must be man-made because specific research publications could not conclude on the exact animal that carried the virus. Clarke (2002) explained this type of argumentation in conspiracy theories by the fundamental attribution error bias: the tendency of humans to overstate or understate the relation between events and individuals to confirm personal dispositions.
(c) Conspiracy theories are structured in a way that stories become irrefutable, and hence hard to challenge. This feature is a result of the nature of evidence used to support the conspiracy theories as mentioned in (b). For example, the statement "A book talked about a virus originating from Wuhan 40 years ago. Therefore, COVID-19 is man-made" is causally oversimplified, and it is thus impossible to provide counterevidence that rejects it.
By using this framework, we defined three labels for classifying URLs (Table 3):
[1] Supporting conspiracy theories. In this case, URLs supported a conspiracy theory. The authors believed that some actors conspired to create COVID-19 and justified their thesis on the existence or absence of specific evidence.
[2] Evidence used to support a conspiracy theory. This class included URLs that did not directly link to a conspiracy theory, but social media users cited them as evidence for the conspiracy theories. For example, users linked to older articles about bioweapons to prove that specific countries created COVID-19. They also cited a Wikipedia article about the Wuhan Biosafety lab as proof that the virus leaked from there. We considered this category as reinforcing conspiracy theories because social media submissions containing these URLs were moderated by social media platforms. Furthermore, users grounded conspiracy theories on them in the way mentioned in (b).
[3] Neither. URLs with stories that did not refer to any type of conspiracy, that debunked conspiracy theories, mentioned conspiracy theories without believing them, or cited third parties that did believe in them.
We further labeled the URLs according to their source type. We defined three classes (Table 4):
[i] Mainstream sources. These included scientific articles, patent repositories, Wikipedia, government websites, and high-credibility, widely accepted media outlets. We used the list generated by Shao et al. (2016) and fact-checking websites (e.g. adfontesmedia.com, newsguardtech.com, allsides.com) to identify credible media outlets.
[ii] Alternative sources. These included media outlets defined as low credibility by Shao et al. (2016), or ranked as untrustworthy by previously mentioned fact-checking websites.
[iii] Other sources. These included social media submissions from Facebook, YouTube, Twitter, and Reddit, or personal websites and blogs.

Hawkes processes
We modeled the diffusion of conspiratorial and normal URLs in the social media ecosystem in order to understand cross-platform dynamics and the effect of content moderation practices. We assumed that in a cross-platform setting, users share content on a platform, and other users consume it and sometimes reshare it on the same or on another platform in the ecosystem. The total life-span of a specific piece of content in the ecosystem can be described by a point-process, i.e., a set of points in time, where each point denotes the appearance of the content. This point-process is self-exciting, meaning that the occurrence of previous points makes the occurrence of future points more probable. For example, the appearance of a tweet will trigger the appearance of a set of retweets in the future, which would not have happened without the occurrence of the initial event.
In our case, the appearance of an event (a submission containing a specific URL on a specific social media platform) can trigger the appearance of a new event on any platform in the ecosystem. A mathematical model that can describe such a multi-dimensional self-exciting point-process is the Hawkes process. A Hawkes process is a $D$-dimensional counting process $N(t) = (N_1(t), \dots, N_D(t))$, where each component is a counting process:
$$N_i(t) = \sum_{k \geq 1} \mathbb{1}_{\{t_{i,k} \leq t\}},$$
with $D$ the number of social media platforms under consideration, and $t_{i,1}, t_{i,2}, \dots$ being the timestamps at which an event (the appearance of a specific URL) is observed on platform $i$. The intensity $\lambda$ of such a process is given by the function:
$$\lambda_j(t) = \mu_j + \sum_{i=1}^{D} \sum_{k:\, t_{i,k} < t} \phi_{ij}(t - t_{i,k})$$
for $j = 1, \dots, D$. Such an intensity function describes cross-platform effects induced by events created on platform $i$ on events on platform $j$. The nature of the effects is encoded by the kernel function $\phi_{ij}$, $t_{i,k}$ are the timestamps of all events on platform $i$, and $\mu_j$ is a baseline intensity that gives the magnitude of influence (how much specific content is spread in the ecosystem in general). In our study, we use a kernel function of exponential decay, because it is able to describe content virality (how rapidly and widely specific content is diffused in the network given its previous appearance), and memory over time (for how long it remains prevalent in the ecosystem). This function is described by:
$$\phi_{ij}(t) = \alpha_{ij} \beta_{ij} e^{-\beta_{ij} t}, \quad t > 0,$$
where $\alpha_{ij}$ is the virality parameter that gives how viral content became on platform $j$ given the appearance of content on platform $i$, and $\beta_{ij}$ is the parameter that describes for how long content appeared on platform $j$ given its appearance on platform $i$. To calculate the parameters $\alpha_{ij}, \beta_{ij}$, we performed maximum likelihood estimation after splitting our data into a train and a test set. We fitted various Hawkes processes in order to understand differences in virality between normal and conspiratorial URLs, as well as between content that was moderated or not for each platform.
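As an illustration of the quantity that maximum likelihood estimation maximizes, the sketch below computes the log-likelihood of a multivariate Hawkes process over an observation window [0, T]. It assumes an exponential kernel normalized as α·β·exp(−β·t), with the first index denoting the source platform and the second the target platform. The function name `log_likelihood`, the data layout, and the toy values are assumptions; this is a plain reference computation, not the estimation code used in the study.

```python
import math


def log_likelihood(events, horizon, mu, alpha, beta):
    """Log-likelihood of a multivariate Hawkes process observed on [0, horizon].

    Assumed kernel: phi_ij(t) = alpha[i][j] * beta[i][j] * exp(-beta[i][j] * t),
    where i is the source platform and j the target platform.
    """
    total = 0.0
    for j, target_times in enumerate(events):
        # Sum of log-intensities at the observed events on platform j.
        for t in target_times:
            rate = mu[j]
            for i, source_times in enumerate(events):
                for s in source_times:
                    if s < t:
                        rate += alpha[i][j] * beta[i][j] * math.exp(-beta[i][j] * (t - s))
            total += math.log(rate)
        # Compensator: integral of the intensity of platform j over [0, horizon].
        total -= mu[j] * horizon
        for i, source_times in enumerate(events):
            for s in source_times:
                total -= alpha[i][j] * (1.0 - math.exp(-beta[i][j] * (horizon - s)))
    return total


# Toy data for two platforms (illustrative only).
events = [[1.0, 2.0], [1.5]]
mu = [0.5, 0.2]
beta = [[1.0, 1.0], [1.0, 1.0]]
no_excitation = [[0.0, 0.0], [0.0, 0.0]]
some_excitation = [[0.2, 0.3], [0.1, 0.2]]
print(log_likelihood(events, 3.0, mu, no_excitation, beta))
print(log_likelihood(events, 3.0, mu, some_excitation, beta))
```

In practice, a numerical optimizer would search over μ, α, and β to maximize this value on the training split, and the fitted model would then be scored on the held-out test split. The double loop over events is quadratic in the number of posts, so a recursive formulation would be preferable for large datasets.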