How to Track Online Disinformation Networks

by Nicola Bruno

https://cdn.ttc.io/i/fit/1000/0/sm/0/plain/kit.exposingtheinvisible.org/il/disinformation-cik-illustration.png


IN SHORT: With a clear, easy-to-use 4-step methodology, this guide shows you how to discover, map, and track online disinformation networks across languages and contexts.


Disinformation networks, often invisible yet impactful, are shaping our world in ways that we are only beginning to understand. From foreign influence operations, such as the notorious Russian troll factories operating in countries around the world, to content farms that pollute the information ecosystem with polarizing click-bait content, disinformation is increasingly organized in networked infrastructures designed to manipulate and deceive in a coordinated way.

This guide is not just a manual; it is a call to action for journalists, activists, and researchers to delve into the murky waters of digital deceit and expose networked disinformation and manipulation campaigns. With a clear, easy-to-use 4-step methodology, it empowers you to discover, map, and track these deceptive networks across languages and contexts. The journey goes beyond mere exposure: it is about understanding the multi-dimensional nature of these campaigns and adopting a networked approach to reveal the ‘invisible’ forces at play behind them.

Methodology and case-studies

As defined by the Media Manipulation Casebook by the Shorenstein Center on Media, Politics and Public Policy, “media manipulation is a process where actors leverage specific conditions or features within an information ecosystem in an attempt to generate public attention and influence public discourse through deceptive, creative, or unfair means”. It is a broad term that covers a variety of related terms, such as disinformation, information operations, or influence operations. Most recently, the European Union has introduced the acronym FIMI, which stands for “foreign information manipulation and interference”. FIMI is a form of “strategic, coordinated and most importantly intentional manipulation”, as in the notorious case of the Russian “troll factory”, as well as other less-known manipulation campaigns run in different countries around the world, such as some well-known pro-China or pro-India disinformation networks.

In this guide, I will not focus only on cases of foreign influence with a strong underlying political objective. More and more often, disinformation networks pursue economic goals, monetizing through advertising. This is the case of the network of sites that, during the 2016 US elections, was run by a group of teenagers from a small town in Macedonia, or a case I worked on extensively in Italy that led to the discovery of a network of sites located in a very small town in southern Italy. Other interesting case studies have been exposed by Buzzfeed News (about how one of the biggest alternative media networks in Italy was spreading anti-immigrant misinformation on Facebook) and the Poynter Institute (about a website impersonating a fact-checking outlet to publish fake news stories in Brazil). Nor should we overlook new cases of coordinated disinformation based on extensive use of generative artificial intelligence (GenAI), as in the network of 125 unreliable AI-generated news sites brought to light by NewsGuard.

Even if intentions vary (political influence in some cases, economic in others), these case studies have in common the existence of disinformation networks that have a complex infrastructure based on several websites, acting in a coordinated manner with each other, often exploiting the amplification of social media channels.


Tracking disinformation and fact-checking: what is the difference?

Although such investigations can sometimes take their cue from the analysis of individual instances of misinformation (a photo, a video, a news item), they differ considerably from simple fact-checking in that they go beyond the verification of the individual content (text) and focus instead on everything that surrounds it (context).

As Whitney Phillips and Ryan M. Milner well point out in “You Are Here: A Field Guide for Navigating Polarized Speech, Conspiracy Theories, and Our Polluted Media Landscape” (surely one of the best books to have appeared recently on the subject of disinformation), “the trick is to remember that polluted information never just appears”, and we should learn to “triangulate” all the information we find online in order to better understand where it comes from:

  • “The first task is to triangulate our respective ‘you are here’ stickers on the network map (…) Looking down at the roots beneath our feet helps us trace where polluted information came from and how it got there. What networks has the information traveled through, and what forces have pinged it from tree to tree, grove to grove? This pollution might be economically rooted, grown from profit-driven corporate institutions. It might be interpersonally rooted, grown from long-held assumptions about what’s acceptable to share, or funny to share, or necessary to share. It might be ideologically rooted, grown from the deep memetic frames that support people’s worldviews.” (You Are Here, p. 182-183)

By shifting the investigation from text to context, it is possible to better grasp the multi-dimensional nature of disinformation and thus identify actors that go beyond the author of a single post or content. Such investigations are certainly more time-consuming and complicated, but they are more effective in mapping the networked and participatory dynamics of digital disinformation.

Another important aspect to consider is that precisely because of the multi-dimensional nature of online information, not all actors involved may be aware that they are contributing to a process of disinformation. In such cases, experts speak of misinformation.

In order to better trace how disinformation is born, circulates and spreads in digital environments, in this guide I propose a step-by-step method (four steps) for uncovering disinformation networks in different contexts and languages. For the sake of ease of exposition, the steps will be presented in consecutive order, but, as is often the case in investigations, many steps will be recursive and it will therefore be necessary to move from one step to the next in a non-sequential manner.


Research steps and structure:

These are the four steps:

    1. Identify a disinformation website

    2. Discover who is behind a website

    3. Follow the money!

    4. Monitor social amplification

Each of the four steps of the process includes 3 specific formats (sub-sections) highlighted with different colour backgrounds. The 3 formats are:

  • a. Case Studies - examples to illustrate the specific method/ step

  • b. Tools - resources you can use to gather evidence

  • c. Now It’s Your Turn - suggestions to get you started with practicing.

1. Identify a disinformation website

Increasingly, disinformation campaigns are conducted on social media networks and private messaging apps such as Telegram or WhatsApp. Investigating these platforms is hard, because it is often difficult to reconstruct who runs an account or page.

Websites, on the other hand, give you access to much more useful information: historical data, business connections and network dynamics that are usually not available for social media accounts or messaging apps. This is why this guide recommends that you identify a source website that spreads disinformation and then widen the lens to the sites or social accounts it is connected to.

Usually, disinformation sites focus on two types of topics:

  • 1. Polarizing topics, such as public health, migration, ongoing conflicts, “culture wars”;

  • 2. “Data voids”, especially during breaking news events or when spreading conspiracy theories (example: “Adrenochrome” in the QAnon conspiracy, see details here). Michael Golebiewski and danah boyd used the term “data void” in this 2018 report to describe “unique topics or terms that result in minimal, low-quality, or manipulative information from search engine queries”.

The best way to identify a disinformation site is to find a polarizing topic or data void and conduct a search on a specialized search engine such as Google Fact Check Explorer. This is a Google service that aggregates and searches within debunked content from dozens of fact-checking newsrooms around the world.


Case Study

1. I start from the data void “ivermectin”, a drug falsely promoted as an anti-Covid treatment by manipulators and conspiracy theorists during the Covid-19 pandemic.

2. I conduct a search in the CoronaVirusFacts Alliance Database or Google Fact Check Explorer and pick this AFP fact-check result (archived with the Wayback Machine here).

https://cdn.ttc.io/i/fit/800/0/sm/0/plain/kit.exposingtheinvisible.org/disinformation/Coronafacts-poynter-ivermectin.jpg *Screenshot of search for “ivermectin” in Poynter’s CoronaVirusFacts Alliance Database. Taken on 20 February 2024 by ETI.

3. AFP tells us that one of the sites that spread the fake news is The Gateway Pundit (archived here with Archive.is / and with the Wayback Machine). The domain identified will be used for further investigation in the next steps.


Tools:

In addition to Google Fact Check Explorer, I also recommend exploring other tools and databases that allow one to directly or indirectly reach sites that spread disinformation.

  • Google Fact-Check Explorer – Search engine that aggregates fact-checks conducted by various organizations in several languages and countries. It is possible to search both by keywords and by images.

  • EDMO Repository of Fact-checking articles – A database run by the European Digital Media Observatory that lets you access fact-checking articles published by EDMO and filter content from all EU member states + Norway.

  • List of fake news websites on Wikipedia – A collaboratively compiled list of sites that spread disinformation. Despite the title of the entry (the term ‘fake news’ is problematic and should be avoided), it provides useful insights. Another aspect to be careful of: all such lists can be controversial, as there are no standard methodologies to define a site in its entirety as disinformation (it may have published only a few instances of disinformation).

  • UkraineFacts Database – Database developed by Maldita.es in partnership with signatories of the International Fact-Checking Network. Lists false news stories about the Russia-Ukraine conflict that have since been fact-checked.

  • Covid-19 Misinformation Database – Another database of fact-checks by International Fact-checking Network signatories and other local partners.

  • EuVs.Disinfo Dataset – Over 16,000 disinformation cases originating from media outlets close to the Kremlin, produced in several languages. The database is compiled by EUvsDisinfo and updated weekly.

  • IO Archive – An aggregator of influence-operation datasets released by X (Twitter) and Reddit since 2018.


Now it’s your turn. Identify a disinformation website:

  • 1. Try to identify a controversial issue on a topic you are dealing with, or a ‘data void’ that is often used to misinform on a topic you know well.

  • 2. Conduct a search on the resources listed above.

  • 3. Try to select a website that spreads misinformation. In the next steps we will try to find out whether it is part of a larger disinformation network.

2. Discover who is behind the website

Once you have identified a disinformation website, the first thing to do is to find out who is behind it. You can check the “About” section of the site, or do a general search on a search engine. If you are lucky, you may find a Wikipedia entry or reports from the news, as is the case here with “The Gateway Pundit”.

But you are not always that lucky. On the contrary, the best manipulators tend to be very careful about covering their tracks.

This is where Whois can come to the rescue. Whois is a protocol used for “querying databases that store an Internet resource’s registered users or assignees”, meaning data about who registered and/or administers a website. Usually, Whois offers information such as:

  • registrant name/address/contacts (meaning who registered the domain name)

  • date of registration

  • date of last update

  • historical data (IP, registrar, hosting changes)

Following the implementation of the General Data Protection Regulation (EU GDPR) legislation in Europe, many Whois records do not contain information on sensitive data such as the registrant’s name or contact details. However, Whois can still be useful for finding relevant information, such as when the domain was registered and when it was last updated.
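As a rough sketch of how the fields above can be pulled out of a raw Whois record programmatically, here is a minimal Python example. The sample record is fabricated, and field labels vary between registrars, so real records may need additional patterns:

```python
import re

# Common WHOIS field labels and patterns; labels vary by registrar,
# so these cover frequent variants only.
FIELDS = {
    "registrar": r"Registrar:\s*(.+)",
    "created": r"Creation Date:\s*(.+)",
    "updated": r"Updated Date:\s*(.+)",
    "registrant_org": r"Registrant Organization:\s*(.+)",
}

def parse_whois(raw: str) -> dict:
    """Return whichever of the common WHOIS fields appear in `raw`."""
    result = {}
    for key, pattern in FIELDS.items():
        match = re.search(pattern, raw, re.IGNORECASE)
        if match:
            result[key] = match.group(1).strip()
    return result

# Fabricated sample record for illustration only.
sample = """\
Domain Name: EXAMPLE-DISINFO.COM
Registrar: NameCheap, Inc.
Creation Date: 2020-03-15T09:12:44Z
Updated Date: 2023-11-02T18:30:01Z
Registrant Organization: Privacy service provided by Withheld for Privacy
"""

info = parse_whois(sample)
print(info["registrar"])  # NameCheap, Inc.
print(info["created"])    # 2020-03-15T09:12:44Z
```

In practice you would feed in the output of a Whois lookup (for example from the command-line `whois` client or one of the services listed below) instead of the sample string.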


ICANN’s new Registration Data Request Service (RDRS) functionality:

In December 2023, ICANN introduced a new Registration Data Request Service (RDRS) functionality that allows requesting access to non-public information that has been hidden for privacy reasons. Non-public data include the contact name, home or email address, and phone number related to a domain. The functionality is only active for generic top-level domains (such as .com, .org, .net, .edu, .gov, .mil) and not for national domains (such as, for example, .fr for France or .to for Tonga).

You need to create an ICANN account in order to send and monitor the requests. If the domain is one of the generic top-level domains, you can send a request with several reasons, including ‘Security Research’ or ‘Research (non-security)’, which can be used by investigators, journalists, activists and disinformation researchers. In the form, you need to explain what use you will make of the information obtained. As explained by ICANN, the request doesn’t guarantee access to registration data. At the same time, access to the data is not guaranteed if a privacy or proxy service is used. This video explains the steps for sending and managing the RDRS requests.

Now, let’s return to our attempts to find out who is behind a website.


Case Study

1. Who is behind the pro-Kremlin website sputniknews.com?

2. If you run a search on Whois.com, you will find out very easily (see the screenshot below and this Wayback Machine archive of the page).

https://cdn.ttc.io/i/fit/800/0/sm/0/plain/kit.exposingtheinvisible.org/disinformation/Screenshot-whois-sputniknews-2023-12-18.png Screenshot of search for sputniknews.com registration data on whois.com. Also see full search results in this .pdf file

3. The registrant name is Rossiya Segodnya. If you search for Rossiya Segodnya on a search engine, you will find that it is “a media group owned and operated by the Russian government”.


Whois tools and guides:

  • Whois Domain Tools – One of the best-known Whois record search engines. Simply enter a domain, pass the captcha test and view the information on the site.

  • Whoisology – I like Whoisology especially for its “connected domains” info, which lets you visualize at a glance which other domains are connected to the one you are investigating. Note that ccTLD (country code top-level domain) data is only available to paid/premium members – meaning that searches for websites ending in ‘.eu’, ‘.ro’, ‘.fr’, etc. (e.g. lemonde.fr) aren’t available in the free version.

  • Whoxy – Whoxy visualizes very useful historical data about changes in ownership or servers related to a domain. Other services charge for access to this kind of data.

  • Whois basic guide: “How to See What’s Behind a Website”, Exposing the Invisible: the Kit.


Now it’s your turn. Conduct a Whois search:

  1. Conduct a search for the websites you identified in the previous step / task, and/or try the following domains: Naturalnews.com - Newspunch.com - InfoWars.com – WeloveTrump.com

  2. Conduct the search on different tools: Domain Tools, Whoisology (looking for connected domains) and Whoxy (looking for historical data)

3. Follow the money!

Considering that Whois data can remain obscured due to privacy laws and regulations, unmasking the entities behind disinformation websites becomes a challenging endeavor. However, a key aspect that often remains less protected and more revealing is the trail of advertising and traffic data. These data can provide crucial insights, as many disinformation sites fundamentally rely on the monetization of their content and the influence they wield through viral content.

Tracing the financial pathways of these sites often reveals much about their operational structures and objectives. Unlike the obscured registrant details in Whois records, advertising and traffic data are usually more accessible. This is because these sites, in their pursuit of revenue and impact, leverage advertising platforms and traffic analytics tools to maximize their reach and profitability. By analysing this data, one can identify patterns and connections that are otherwise invisible.

The strategy of ‘following the money’, a time-honoured technique in investigative journalism, proves highly effective in this domain. It involves scrutinizing the advertising models these sites employ, such as which ad networks they are part of, or the kinds of advertisements they display. This analysis can lead to an understanding of their funding sources and, by extension, their potential affiliations and motivations.

Websites usually use different tracking IDs that are publicly visible in the source code of their home pages and of their other webpages.


Brief explanations of key terms mentioned above:

Webpage source code

  • A webpage that you see in your browser is a graphical translation of code. Webpages are often written in plain text using a combination of scripting languages such as HTML (HyperText Markup Language) and JavaScript, among others. Together, these are referred to as a website’s source code, which includes both content and a set of instructions, written by programmers, that makes sure the content is displayed as intended.

  • Your browser processes these instructions behind the scenes and produces the combination of text and images you see when accessing a website. Your browser will let you view the source code of any page you visit with a simple keyboard combination: “ctrl+U” on most Windows and Linux machines or “command+U” on Mac. For additional tips on checking a webpage’s source code, check out this guide on how to read source code.

  • See more in this ETI Kit guide: “How to See What’s Behind a Website”.

Trackers:

  • Trackers are tools or software used by websites to trace and collect data about their visitors and how they interact with the site. This article from the New York Times details an experiment in which the author, after visiting 47 websites, gathered evidence of being tracked by hundreds of trackers.

Tracking IDs identify the specific services active on a website, which serve different purposes:

  • Website traffic tracking tools, such as Google Analytics or Adobe Analytics;

  • Advertising tracking tools, like Google AdSense, Google Tag Manager, Facebook Custom Audiences by Meta, Criteo;

  • Website behaviour tracking tools, services that show where users click, like Hotjar or Mouseflow among many others.

You can have different types of IDs, but the most relevant for our investigation are two Google properties:

  • Google Analytics: ID for traffic monitoring, usually identified by IDs such as UA-xxxxxx or, since GA4, G-xxxxxxx and GTM-xxxxxxx (where “xxxxxx” is replaced by a unique number/code given to the website owner who creates an account on the specific platform)

  • Google AdSense: ID for advertising monitoring, usually appearing as ca-pub-xxxxxxxxxxx

Other useful IDs are:

  • Amazon affiliate

  • Sharethis

  • Email

  • Facebook app ID

These tracking IDs are usually visible in the source code of the website. You don’t need to be a developer to find these tracking IDs. There are two simple ways:

  • 1. View the website source code (“ctrl+U” on most Windows and Linux browsers / “command+U” on Mac) and search for “UA-” or “ca-pub” with “ctrl+F” or “command+F”

  • 2. Use a tool that can automatically detect tracking IDs in a specific domain (see section below)
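Searching the source code by hand works for one site but scales poorly across a whole network. As a minimal Python sketch (the HTML snippet below is fabricated for illustration), the ID formats discussed here can be extracted with regular expressions:

```python
import re

# Patterns for common Google tracking IDs: legacy Analytics (UA-),
# AdSense (ca-pub-), the GA4-era Google Tag (G-) and Tag Manager (GTM-).
TRACKER_PATTERNS = {
    "google_analytics_legacy": r"\bUA-\d{4,10}(?:-\d{1,4})?\b",
    "google_adsense": r"\bca-pub-\d{10,16}\b",
    "google_tag": r"\bG-[A-Z0-9]{6,12}\b",
    "google_tag_manager": r"\bGTM-[A-Z0-9]{4,9}\b",
}

def extract_tracking_ids(html: str) -> dict:
    """Return a mapping of tracker type -> sorted unique IDs found."""
    found = {}
    for name, pattern in TRACKER_PATTERNS.items():
        ids = sorted(set(re.findall(pattern, html)))
        if ids:
            found[name] = ids
    return found

# Fabricated source-code snippet; in practice you would fetch a page
# (e.g. with urllib) and pass the response text here.
page = """
<script>ga('create', 'UA-3742720-3', 'auto');</script>
<script async src="https://www.googletagmanager.com/gtag/js?id=G-AB12CD34EF"></script>
<script data-ad-client="ca-pub-1234567890123456"></script>
"""

print(extract_tracking_ids(page))
```

Running the same function over the home pages of several suspect domains gives you a comparable set of IDs per site, which feeds directly into the clustering work in the next steps.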


Important!

As of July 1, 2023, Google deactivated the familiar “UA-” ID format as part of the launch of Google Analytics 4 (GA4).

Publishers no longer have to add a UA- ID to their site to use Google Analytics.

As explained by investigator Craig Silverman below:

  • “Google does not require a site to remove its existing UA- ID as part of the migration to GA4. A legacy UA- ID remains on a site unless an owner chooses to remove it.

  • Google Analytics 4 now uses the G- ID, known as the Google Tag. This was already in use prior to July 1 and has been collected/tracked by core services like DNSlytics. “We are collecting AW-, DC-, G- and GTM- IDs since Q4 2022,” Paul Schouws, who runs DNSlytics, told me.

  • Google is eliminating the suffix that was part of UA- IDs. This was the number that followed the core ID. For example, the suffix in this ID is “-3”: UA-3742720-3. If the suffix was greater than one, it typically meant an ID was used on multiple sites.

  • I checked a few news sites and my very unscientific sample revealed the old UA- IDs were removed and replaced by a G- or GTM- ID. GTM-ID is linked to Google Tag Manager, a product used to manage various tags/IDs. Don’t be surprised if you see GTM- on a site instead of UA- or G-”. (Source: “What the rollout of Google Analytics 4 means for website investigations”, by Craig Silverman, 11 Jul 2023)

You can also check Google’s clarifications as to what each Google ID means and how they get replaced: https://developers.google.com/tag-platform/devguides/existing (archived with Wayback Machine here)

Let’s return now to our case study and practice.


Case Study

1. Let’s check what tracking IDs we can find on the domain historyofvaccines.org

2. You can view the website source code (“ctrl+U” or “command+U” depending on device) and search for “UA-” or “ca-pub” tracking IDs. You can also run searches for the more recent IDs mentioned above (see Tip re GA4 above).

https://cdn.ttc.io/i/fit/800/0/sm/0/plain/kit.exposingtheinvisible.org/disinformation/image_sourcecode_adsID_historyofvaccine.org.jpg Screenshot: source code search for ads tracking ID on historyofvaccine.org, taken on 21 December 2023 by ETI.

3. Copy the tracking ID and paste it into Builtwith (you need to create a free account and log in to use the functions), then go to the Relationship tab (Link: https://builtwith.com/relationships/historyofvaccines.org / or see screenshot below)

https://cdn.ttc.io/i/fit/800/0/sm/0/plain/kit.exposingtheinvisible.org/disinformation/Screenshot-builtwith.com-historyofvaccines-2023-12-15.png Screenshot: search for “historyofvaccines.org” on builtwith.com, taken on 15 December 2023 by the author.

In most cases, you can also copy and paste the domain name into the tools (see section below) and they will automatically show you all the tracking IDs active on the website.


Tools:

There are different tools for finding tracking IDs. Starting from a single website, they can help you reveal which other websites are using the same IDs, meaning that these websites are in some way connected to each other.

Most tools have a “freemium” policy: you can search for free only a limited number of domains per day. For this reason it is useful to check the same ID on different services, because they have different databases and different pricing options.

The following services are the best options for finding the tracking IDs on a specific website.

  • Builtwith – It’s the best option to start with. The “Relationship profile” tool is very useful for finding shared IDs and trackers. Example: welovetrump.com (Link: https://builtwith.com/relationships/welovetrump.com / or screenshot below). This tool requires you to create a free account in order to find website details (it’s recommended to use a dummy email address for such purposes, unrelated to your personal or work accounts.)

https://cdn.ttc.io/i/fit/800/0/sm/0/plain/kit.exposingtheinvisible.org/disinformation/screenshot-builtwith.com-welovetrump-2022.05.23.png Screenshot: search for “welovetrump.com” on builtwith.com, taken on 23 May 2023 by the author.

  • Dnslytics – It has a very good database and lets you “reverse” different kinds of information present on a website: AdSense, Analytics, IP, mail server, name server (see the screenshot below for an example with thegatewaypundit.com). There is a free version, or a monthly fee.

https://cdn.ttc.io/i/fit/800/0/sm/0/plain/kit.exposingtheinvisible.org/disinformation/DNSlytics.png Screenshot: of search results for thegatewaypundit.com on dnslytics.

  • SpyOnWeb – You can use this tool for confirming or expanding results you get on other services (see link: https://spyonweb.com/thegatewaypundit.com / or screenshot below). It seems to have a smaller database than other services, but sometimes it helps you find new information. It is not necessary to create a user account to use the tool.

https://cdn.ttc.io/i/fit/800/0/sm/0/plain/kit.exposingtheinvisible.org/disinformation/screenshot-spyonweb.com-2022.05.23.png Screenshot: of search results for thegatewaypundit.com on SpyOnWeb.

  • AnalyzeId – A very good tool. It shows results in a table, with many ID connections, including Amazon affiliate, ShareThis, email, Facebook apps, and a confidence rating (see link: https://analyzeid.com/id/thegatewaypundit.com or screenshot below). You can also export data in .csv format. The only problem is that it doesn’t show you all the results; to see everything you need to subscribe to a paid monthly plan.

https://cdn.ttc.io/i/fit/800/0/sm/0/plain/kit.exposingtheinvisible.org/disinformation/screenshot-analyzeid.com-2022.05.23.png Screenshot: of search results for thegatewaypundit.com on AnalyzeId.

https://cdn.ttc.io/i/fit/800/0/sm/0/plain/kit.exposingtheinvisible.org/disinformation/WaybackMachine-The-Gateway-Pundit.png Screenshot: of search results for thegatewaypundit.com on Wayback Machine.


Now it’s your turn. Follow the money:

  1. Conduct a search on the websites identified during Step 2, using the tools introduced before - Builtwith.com; Dnslytics; SpyOnWeb; AnalyzeId; Wayback Machine - and/or you can try with the following domains: Naturalnews.com - Newspunch.com - InfoWars.com – WeloveTrump.com

  2. Collect the available data, highlighting websites that share the same IDs, and start to cluster the domains accordingly.
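The clustering task above can be sketched in a few lines of Python. The domains and IDs below are hypothetical placeholders; in practice you would feed in the IDs you collected with the tools in this step:

```python
from collections import defaultdict

# Hypothetical observations: tracking IDs found per domain.
observations = {
    "site-a.example": {"ca-pub-111", "UA-100-1"},
    "site-b.example": {"ca-pub-111"},
    "site-c.example": {"UA-200-1"},
    "site-d.example": {"UA-200-1", "GTM-XYZ"},
}

def cluster_by_shared_ids(observations: dict) -> list:
    """Group domains so that any two sharing an ID land in one cluster."""
    parent = {d: d for d in observations}

    def find(d):
        # Union-find with path compression.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    def union(a, b):
        parent[find(a)] = find(b)

    # Invert the mapping: which domains use each ID?
    by_id = defaultdict(list)
    for domain, ids in observations.items():
        for tid in ids:
            by_id[tid].append(domain)

    # Merge every group of domains sharing an ID.
    for domains in by_id.values():
        for other in domains[1:]:
            union(domains[0], other)

    clusters = defaultdict(set)
    for domain in observations:
        clusters[find(domain)].add(domain)
    return sorted(sorted(c) for c in clusters.values())

print(cluster_by_shared_ids(observations))
# [['site-a.example', 'site-b.example'], ['site-c.example', 'site-d.example']]
```

Each resulting cluster is a candidate network worth a closer look: domains sharing an AdSense or Analytics ID are usually monetized or administered by the same actor.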

4. Monitor social amplification

By social amplification I mean the process of distributing content to as many people as possible via various channels and strategies. It is a form of distributed amplification, “a tactic whereby a campaign operator explicitly or implicitly directs participants to rapidly and widely disseminate campaign materials, which may include propaganda and misinformation as well as activist messaging or verified information via their personal social media accounts” (Media Manipulation Casebook).

Most disinformation networks exploit social media and messaging apps as a primary method for amplifying their messages and reaching specific audience niches. At this level, the “network effect” is particularly effective, because the algorithmic infrastructure of social media rewards coordinated actions, such as distributing the same content on different social media platforms and on various accounts within the same platform. You can find many other case studies of networked disinformation in the Media Manipulation Casebook, for example copypasta or the distributed amplification used to share the Plandemic documentary.

By tracking how specific content gets distributed and amplified on social media, you can identify the channels and actors that, intentionally or not, contribute to amplifying disinformation campaigns. When analysing social amplification, keep in mind that not all the actors involved in distributing a specific piece of content may be aware that they are spreading disinformation or are part of a bigger media manipulation campaign.
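Once you have exported share data as a .csv file (as the tools below allow), a quick tally shows which accounts amplify content most often. A minimal Python sketch, assuming hypothetical column names (“account”, “link”) and rows that you would adapt to the headers of your actual export:

```python
import csv
import io
from collections import Counter

# Fabricated stand-in for a downloaded .csv export of shares.
# Real exports have different column names; adjust "account" below.
csv_export = """account,link
Vaccine Truth Now,https://example-site.org/story
Vaccine Truth Now,https://example-site.org/other
Daily Patriot Feed,https://example-site.org/story
Health Freedom Hub,https://example-site.org/story
"""

def top_amplifiers(csv_text: str, n: int = 3) -> list:
    """Return the n accounts that shared links most often."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row["account"] for row in reader)
    return counts.most_common(n)

print(top_amplifiers(csv_export)[0])  # ('Vaccine Truth Now', 2)
```

Accounts that repeatedly appear at the top across several of the domains you clustered in step 3 are strong candidates for coordinated amplification.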


Warning:

UPDATE: CrowdTangle tool closing down

In March 2024 it was announced that CrowdTangle will no longer be available after August 14, 2024 – see details here


Case Study

Let’s see what social media accounts are amplifying the content of the website historyofvaccines.org (see screenshot below).

https://cdn.ttc.io/i/fit/800/0/sm/0/plain/kit.exposingtheinvisible.org/disinformation/Screenshot-crowdtangle-2023-12-15.png Screenshot: of Crowdtangle results for historyofvaccines.org, taken on 15 December 2023 by the author.

1. You can go to the home page (or a specific webpage) of historyofvaccines.org and click the CrowdTangle plugin in your browser (see the next section about installing/using CrowdTangle).

2. After logging in with your Meta / Facebook account (it is recommended to create a dummy account for research purposes), you will see the Facebook pages and Instagram accounts that link to the home page of historyofvaccines.org

3. You can also download all the data in .csv format by clicking on “download” (see screenshot below).

https://cdn.ttc.io/i/fit/800/0/sm/0/plain/kit.exposingtheinvisible.org/disinformation/Screenshot-crowdtangle-2-2023-12-15.png Screenshot: of Crowdtangle results for historyofvaccines.org, taken on 15 December 2023 by the author.


Tools:

  • Buzzsumo – There are several “social listening” tools, such as Buzzsumo or Talkwalker, that can be very useful for researching how social media amplifies content coming from specific websites. These services are usually not free and can be very pricey. For a specific investigation you can try the free 30-day demo of Buzzsumo and use its Content Analyzer (no credit card required) to analyze the most shared content published by a specific domain you are working on. This way you can easily discover which social media accounts are contributing to spreading that message. Here is an example of a search for “ivermectin” on Buzzsumo Content Analyzer: you get a list of the most shared stories with this keyword. Alongside trusted results, you can also find controversial sources such as Bitchute.com and Zerohedge.com (see screenshot below).

https://cdn.ttc.io/i/fit/800/0/sm/0/plain/kit.exposingtheinvisible.org/disinformation/Buzzsumo-Screenshot%202024-01-05.png Screenshot of a search for “ivermectin” on Buzzsumo Content Analyzer, taken on 5 January 2024 by the author.

https://cdn.ttc.io/i/fit/800/0/sm/0/plain/kit.exposingtheinvisible.org/disinformation/Buzzsumo-Adrenochrome.png Screenshot of a search for “adrenochrome” on Buzzsumo Content Analyzer, taken on 5 January 2024 by the author.

  • CrowdTangle – A free alternative to Buzzsumo is the CrowdTangle Link Checker plugin for the Chrome and Brave browsers. After you install it in your browser, it lets you check the main drivers of traffic to a specific page of a specific website. CrowdTangle shows you Facebook, Instagram, X (Twitter) and Reddit pages and accounts that are amplifying that content. It is also possible to download the data in .csv format, which can be very useful for more in-depth analysis. – NOTE that this tool requires an active Facebook account in order to function, as you need to be logged in when using it. For safety and privacy reasons, we recommend that you make a dummy/temporary Facebook account to use with this tool.


Now it’s your turn:

  1. Install the CrowdTangle Chrome Extension on your browser (Chrome or Brave + a -dummy- Facebook account), if not already installed.

  2. Run multiple searches on the most interesting website(s) previously identified in step 3 / “Follow the Money” (using the CrowdTangle extension)

  3. Collect data on the linked social accounts.
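Once you have a CSV export of the amplifying accounts, a short script can rank them by how often they share links from the target domain and how many interactions they generate. The column names below (`account_name`, `platform`, `link`, `interactions`) are hypothetical; check the headers of your actual export and adjust the field names accordingly.

```python
import csv
import io
from collections import Counter

# Hypothetical sample of an exported CSV; a real export will have
# different (and more) columns -- adapt the field names below.
sample_csv = """account_name,platform,link,interactions
Health Freedom Page,Facebook,https://example-site.com/article-1,1200
Health Freedom Page,Facebook,https://example-site.com/article-2,800
truth_seeker_77,Reddit,https://example-site.com/article-1,300
daily.wellness,Instagram,https://example-site.com/article-3,450
"""

# Count how many links from the target domain each account shared,
# and sum the interactions those shares generated.
shares = Counter()
interactions = Counter()
for row in csv.DictReader(io.StringIO(sample_csv)):
    shares[row["account_name"]] += 1
    interactions[row["account_name"]] += int(row["interactions"])

# Rank accounts by total interactions to spot the main amplifiers.
for account, total in interactions.most_common():
    print(f"{account}: {shares[account]} share(s), {total} interactions")
```

Ranking by interactions rather than share count helps surface accounts with a small number of very high-impact posts, which are often the most interesting nodes in a network.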

Start your first investigation

The methodology shared in this guide can easily be replicated for the specific languages or topics that interest you. To help you start your first investigation, we provide you with this template (Excel sheet) that allows you to keep track of the information you collect and to analyse the clusters of disinformation that emerge. Each column indicates an aspect to take into account, from details related to a specific domain (Whois, IPs, tracking IDs) to the broader social network disseminating a piece of content.

As a sample, you can check this pre-filled template (Excel sheet) illustrating the 4-step methodology with an example taken from an investigation I conducted in Italy.

As mentioned earlier, the 4-step methodology often requires you to move from one step to the next and then back or forward to connect the dots as they emerge.

You can often get lost in a sea of information that does not seem to be connected and have to start again. But if you do a good job of data collection and analysis, it will be easier to glimpse and expose those disinformation networks that are invisible at first glance.


Now it’s your turn:

  1. Pick the most interesting case from your previous research.

  2. Expand the search for connections between your website and any related social media accounts.

  3. Collect the identified data in the NoD sheet (use the “Notes” column of the NoD sheet for any additional remarks, questions or challenges to address in the Debriefing).

Considerations

In our quest to identify and expose the “suspects” within disinformation networks, it’s important to maintain a balanced perspective. Not every entity or individual we investigate is necessarily engaged in suspicious or malicious activity. Conversely, some seemingly innocent sources may harbour hidden agendas. It is therefore imperative that our approach to these investigations remains ethical and transparent. This includes rigorously verifying information, avoiding assumptions and being aware of our own biases.

By adopting this approach, we ensure that our findings are not only accurate, but also fair and trustworthy, contributing positively to the wider discourse on disinformation and media manipulation.

Below are some key considerations when conducting investigations, or any research for that matter.

Safety First

Prioritize your digital and physical safety, and that of your sources, peers and data. Use secure channels for communication and information collection, and protect your devices, data and communications with strong passwords. Be mindful when choosing digital tools for various purposes, and conduct risk assessments when working in new as well as in seemingly familiar contexts. Consider adopting partial anonymisation tools such as Virtual Private Networks (VPNs) and dummy email/online profile accounts to safeguard your identity, especially when researching sensitive or potentially dangerous topics. Researching and tracking disinformation networks often requires you to create accounts on various social media channels, data analysis platforms and web-tracking tools, so your identity will often be more vulnerable to exposure in such contexts.

You can read more about adopting basic safety measures, conducting risk assessments and choosing tools in context in Tactical Tech’s articles on these topics.

Ethics

Always adhere to journalistic and research ethical standards. This means respecting people’s privacy and choices, avoiding deceptive practices, and ensuring your research methods don’t harm your information sources or the subjects of your investigations. Before exposing the identity of people behind disinformation networks, get in touch with them, try to understand why they do this, and include their viewpoint in your investigation if you can obtain one.

Methodological Rigour

Validate the credibility of your sources and rigorously cross-check the information you obtain. Remember that not every piece of information can stand as evidence, and you may get into legal trouble if you cannot back up your statements. Be cautious of confirmation bias: look for evidence that may disprove your hypotheses as well as prove them.

Evidence Safeguarding

When investigating websites or any other digital content, regularly archive web pages and digital artefacts. Tools like the Wayback Machine, Archive.is, Perma.cc, combined with screen-capture software can help preserve evidence that might later be deleted or altered. In addition, good evidence preservation practices can protect you from possible attempts to discredit your work and can support your claims if you are challenged in court by those whose malicious actions you seek to uncover.

You can read more about methods and tools to archive and safeguard digital evidence in this Exposing the Invisible Kit guide: “Retrieving and Archiving Information from Websites”.
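For the Wayback Machine, archiving and snapshot lookup can also be scripted against its public endpoints: “Save Page Now” accepts a GET request at `https://web.archive.org/save/<url>`, and the Availability API reports the closest existing snapshot of a page. The sketch below builds those request URLs; the `latest_snapshot` function actually fetches data, so it requires network access.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

def save_url(page_url: str) -> str:
    """Build a Wayback Machine 'Save Page Now' request URL.
    Opening this URL (GET) asks the Wayback Machine to archive the page."""
    return "https://web.archive.org/save/" + quote(page_url, safe=":/?=&")

def availability_url(page_url: str) -> str:
    """Build a query against the Wayback Availability API, which reports
    the closest existing snapshot of a page (if any)."""
    return "https://archive.org/wayback/available?" + urlencode({"url": page_url})

def latest_snapshot(page_url: str):
    """Fetch the URL of the closest archived snapshot, or None.
    Requires network access."""
    with urlopen(availability_url(page_url), timeout=30) as resp:
        data = json.load(resp)
    closest = data.get("archived_snapshots", {}).get("closest")
    return closest["url"] if closest else None

print(save_url("https://example.com/article?id=42"))
```

Scripting the archiving step is useful when you are tracking dozens of pages at once: you can archive every URL in your template in one pass, before any of them is deleted or altered.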

Transparency

Tracking online disinformation networks is rarely a straightforward process: you will need to navigate various paths, combine tools and pieces of evidence, and identify connections in a very volatile digital space. Be clear about your methods and information sources when presenting your findings. Trace your own investigative process thoroughly and document your research steps so that others can follow and replicate them. This transparency builds your public’s trust and ensures the credibility of your work.

Wellbeing

Investigating disinformation networks and their influence can be mentally taxing. Be aware that you may experience some level of emotional impact over time, and take preventive steps to maintain your mental health and wellbeing. This can include regular breaks and relaxing activities that help you disconnect, open discussions with peers, or professional support if necessary. You owe this care to yourself. Do not wait until the stress or tiredness you experience leads to burnout. The world needs healthy and sane investigators, now more than ever.

Most importantly, remember that in the field of investigating disinformation, not everything is “black and white”. It’s about navigating those grey areas with a commitment to uncovering the evidence and ultimately revealing the truth, while maintaining the ethical and methodological standards that define responsible research and fact-finding in the public interest.


Published in February 2024

Resources

Research Templates

  • Research Template (Excel sheet) based on this guide, and which allows you to keep track of the information you collect and to analyse the clusters of disinformation that emerge.

  • Sample, pre-filled template (Excel sheet) with the 4-step methodology from the guide, based on an example taken from an investigation conducted in Italy.

Articles and Guides

Glossary


AdSense and Analytics IDs – Unique codes associated with Google’s advertising and analytics services. These IDs are used to monitor website traffic and advertising effectiveness and can be instrumental in tracing a website’s revenue sources and visitor engagement.


Data voids – These occur when searches for specific, often obscure or emerging, topics result in limited, misleading, or low-quality information. Exploited by malicious actors, data voids can be filled with disinformation or conspiracy theories.


FIMI (Foreign Information Manipulation and Interference) – A term used to describe strategic and coordinated efforts by one country to interfere in the domestic affairs of another through the manipulation of information. This includes spreading disinformation to influence public opinion or policy decisions.


Source code – The underlying code, written by computer programmers, that allows software or websites to be created. The source code for a given tool or website will reveal how it works and whether it may be insecure or malicious.


Tor Browser – a browser that keeps your online activities private. It disguises your identity and protects your web traffic from many forms of internet surveillance. It can also be used to bypass internet filters.


Troll factories – Organizations or groups that use multiple online identities and accounts to spread disinformation, manipulate public opinion, or harass individuals. Often state-sponsored or politically motivated, these entities play a significant role in information warfare and disinformation campaigns.


Virtual Private Network (VPN) – software that creates an encrypted “tunnel” from your device to a server run by your VPN service provider. Websites and other online services will receive your requests from - and return their responses to - the IP address of that server rather than your actual IP address.


Webpage – a document that is accessible via the internet, displayed in a web browser.


Web domain – a name commonly used to access a website which translates into an IP address.


Web tracker – a tool or piece of software used by websites to track their visitors and how they interact with the site.


Whois protocol – A protocol used to query databases that store the registered users or assignees of internet resources, like domain names. It’s instrumental in identifying the entities behind websites, although privacy laws can sometimes limit the available information.
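As an illustration of how simple the protocol is, a Whois query (RFC 3912) is just the domain name followed by CRLF sent over TCP port 43; the server replies with a plain-text record. The sketch below implements that exchange with the standard library; `whois.iana.org` can be queried first to discover the authoritative server for a given TLD.

```python
import socket

def build_query(domain: str) -> bytes:
    """A Whois query is the domain name terminated by CRLF."""
    return (domain + "\r\n").encode()

def whois_lookup(domain: str, server: str = "whois.iana.org") -> str:
    """Send a Whois query over TCP port 43 and read the plain-text reply.
    Requires network access."""
    with socket.create_connection((server, 43), timeout=10) as sock:
        sock.sendall(build_query(domain))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")

# Usage (requires network access), e.g. for a .com domain:
# print(whois_lookup("example.com", server="whois.verisign-grs.com"))
```

In practice you may prefer a command-line `whois` client or a web-based lookup service, but scripting the query is handy when checking registration data for many domains from your research template.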