- Investigating Political Advertisements on Facebook

By Manuel Beltrán and Nayantara Ranganathan

IN SHORT: Two investigators trace their process of creating the project, an open-research-style investigation into political advertising on Facebook. This case study traces the project’s evolution - with its revelations, roadblocks, techniques and tools - from a limited endeavour to a growing resource that facilitates investigations on political ads around the world.

This is about how we came to publish, a project uncovering information about the political advertisements (ads) published on Facebook and Instagram.

Much of the advertising on the internet is personalised, a method commonly referred to as targeted advertising. Targeted ads use psychological, behavioral and other kinds of information about people to present each of us with a different message. The difference may be in the content, design, delivery or other characteristics. The tailoring of messages is complemented with information about its effectiveness, allowing for the testing of assumptions created during the psychological or behavioural profiling.

Because politicians today use online platforms to advertise their campaigns and promote their messages, and these platforms deliver the ads on the basis of personal information, the phrase “the personal is political” has gained a whole new dimension. Ads are personalised according to our personal data and behavioural patterns, and we receive these targeted ads on our personal screens. However, understanding these mechanisms is not straightforward. The advertising delivery infrastructure operates invisibly, with platforms going to great lengths to obfuscate their mechanisms and establish many barriers to understanding how they function. Often, even advertisers don’t get the full picture of who their ads reach, or why.

We are Nayantara and Manuel, and we created to understand the new channels, over which propaganda is created and circulates online.

Data on political ads on social media platforms compiles political ads posted on Facebook and Instagram from over 300 political actors in 39 countries. It includes visualised data and stories that allow you to browse through these ads and analyse them individually or as datasets per country, political party and other categories. The project emphasises that in order to understand how political propaganda functions online, we cannot simply study how it looks or what it says. Crucially, we must also pay attention to how it circulates. That means looking at information about how ads cater to different categorisations of people – according to where they live, their age, their perceived gender and so on.

At the time of writing this methodology case at the end of 2019, it has been more than five months since we released the project. The responses to it have been overwhelmingly positive, and journalists covering elections and social media have used it to investigate issues in a number of countries.

“I’ve been an investigative reporter for a long time, and data like this are essential for reporting on campaign activity in the digital age. Only looking at contributions and expenditures in disclosure reports isn’t enough anymore,” posted a user who shared the link to on HackerNews.

The project has been featured in news outlets and tech reporting sites like Vice, Huffington Post and The Hindu. Television media like SkyNews in the UK and ORF in Austria used the data to uncover stories about UK politics and the Austrian elections. It has also fed into ongoing conversations on political ads and platform policies and has been presented in art exhibitions in The Netherlands, Argentina, Norway, the United States and India.

But we did not get to this point easily. The questions we asked ourselves changed over time, and our expertise and skills didn’t always match the tasks we wanted to complete. Trying to understand the operations of such an obscure infrastructure of an exceptionally powerful company was a mammoth task.

Since we did not have a specific goal at the outset, we followed threads that were interesting for us and tried getting around obstacles. We therefore narrate the story of the investigative journey of through different practical, ethical, technical and political considerations.

Making sense of the Indian elections

We were in India in April 2019 when the Indian general elections began. It was supposedly the largest exercise of democracy in history. Because of the size of the country, with its population of over 1.3 billion people and the large number of Parliamentary seats (545), elections were to happen in seven phases over a one-month period from 11 April to 19 May 2019. Different regions had elections on different days.

As we were both curious about technology and politics, we began to observe the role of the internet, data and social media companies in the elections. We became particularly interested in understanding how forms of new media were functioning during the electoral process. The role of TV, newspapers and other traditional media for political campaigning is fairly well understood – it has regulations that apply to it, and has been the subject of popular culture for a while. But there was little public awareness and insight into the relationship between social media and targeted political advertisements and how that would be approached by regulators, by the public and by curious researchers like us.

Social media platforms have grown quickly and in some cases dominate the media ecosystem during elections. Internet phenomena like “trending topics” (topics that are popular at particular times) also influence the direction of reporting and debate on traditional media. Yet these phenomena – and how they are generated and how they spread – are rather opaque. This makes it difficult for lawmakers or the general public to have appropriate responses. This state of affairs was a motivation for us to focus on election propaganda on social media.

Because the Indian electoral law was put in place long before the use of personalised digital ads, we had to wonder how these laws would be enforced on social media platforms, particularly with the staggered nature of elections. One particular detail we were curious about was whether the ‘silence period’ would be observed on social media and how the Election Commission of India would monitor this.


The silence period is a time just before the election days when political parties are not permitted to campaign, organise rallies or publish ads. This is meant to give voters a period where they can reflect and make up their mind without additional influence from campaigning that might affect their voting decision.

Propaganda circulates in India over a vast range of platforms. The usual suspects like WhatsApp, Facebook, Instagram, Google and Twitter were being used in interesting ways, but others like ShareChat and TikTok were also home to vast amounts of organised and unorganised propaganda. We started keeping an eye out for different election-related measures taken by these platforms, as well as by food delivery and fintech (finance technology services) apps like Swiggy, PayTM and others.

We spent some time researching how internet platforms were dealing with the use of their services for political messaging and advertising. We found different approaches: by mid-2019 TikTok had introduced a feature by which search results and popular political hashtags would carry a public service announcement asking people to guard against ‘fake news’; others, like Google, required certification from local authorities before an ad ran on Google’s advertising platforms. After several scandals about its role in the manipulation of elections, Facebook was under a lot of pressure to take preemptive action.

Screenshot: Some hashtags on TikTok, like #Ambedkar, appeared with a “Public Service Announcement” calling for responsible behaviour, while others, like #chowkidar, that aggregated incendiary content directly relevant to the elections did not have the same PSA. Date: 3 May 2019. Source: Nayantara Ranganathan.

There was no single, clear guideline or approach that all the platforms were sharing. Instead, each company was testing the waters to see how it would deal with the Indian elections and the flood of political content.

In order to explore this further, we had to decide which digital platforms to monitor. As we didn’t have the time or resources to investigate them all, we decided to focus the preliminary part of the investigation on Facebook, because we knew that it was one of the most widely used online advertising platforms in India.


With more than 241 million active users every month (as of early 2020), Facebook has more users in India than in any other country. Facebook is one of the most widely used social media platforms in India after WhatsApp, which is also owned by Facebook (at the time of writing, late 2019 to early 2020).

Starting this investigation wasn’t easy. Facebook had recently rolled out tools that aimed to make political ads more transparent in India and elsewhere.

Screenshot: Facebook’s announcement about introducing “Ads disclaimer,” “Ad Library” and enforcement mechanisms. Date: 10 June 2019.

Facebook’s Ad Library provided a way for people to browse political ads through keyword searches and page name searches. But as we started exploring it, we found problems with it: the search queries did not return comprehensive results – just a selection of ads whose logic was unclear to us. Furthermore, there was no way to find the exact copy of the ad you might have noticed in the tool earlier, as the ads were not tagged with any unique identifiers to trace them.

Our intention was to retrieve the data to monitor the silence period across the seven-phase elections, which meant that we needed to filter ads both by region and by the period for which the ads were running. Such an investigation was beyond the scope of what the Ad Library was able to offer. (Months later, the Ad Library tagged every ad campaign with unique numeric codes or ‘identifiers’ for traceability.)

This pushed us to look for alternative ways to monitor the ads for the purposes of our inquiry. We came to the conclusion that we needed to build our own database of ads.

After digging further we found a little-known and barely documented tool called the Facebook Ad Library API (Application Programming Interface), which had just been released when we looked into it in 2019. All the places where one would look for information about the tool back then – Github, Reddit and similar fora – were silent about the use of the API. Facebook made little effort to document it or communicate about it. We hoped that finding our way around the API would allow us to build our own database programmatically (although neither of us had advanced programming skills) and ask more meaningful questions of the data.

Gaining access to the API involved several demanding steps:

  • we had to have a Facebook account

  • this account had to have the exact same name as our official government-issued documents

  • we had to add a phone number to our account and enable two-factor authentication

  • we had to submit a copy of our proof of identity from a selected list of documents that Facebook accepts, such as a driver’s license, etc.

  • and finally, depending on our nationality, we had to follow additional steps

This list differs from country to country.


In the course of this process, we faced a few difficulties while getting verified, including a house visit, but we will come to that later.

As challenging as it was to get verified, it was equally challenging to learn how to use the platform once we gained access to it.

Getting familiar with the API

We finally made it to Facebook’s API, but at this stage, we didn’t know much about how to use it. As you will see, in the course of learning to use it, we encountered questions about how to create queries but also how to re-purpose the results into a form that would be useful.

Screenshot: The Facebook Graph API. Date: 10 June 2019. Source: Nayantara Ranganathan.


I spent many days understanding how to create queries and what it was possible to query in the first place. In the absence of proper documentation of the API, and without much discussion about it on other fora (perhaps because it was newly released), this was all new and uncharted terrain. In order to use the platform, I had to devise a query (a list of coded questions) that asked Facebook for the data I needed.

In the beginning I started querying ads by the &search_terms parameter, which returns all the ads containing the term in question. For example, by querying &search_terms='modi' I received all the ads containing the text ‘modi’ (i.e. Narendra Damodardas Modi – candidate in 2019 Indian elections, and Prime Minister of India since 2014 and as of 2019 when writing this piece) and shown in the countries specified in the mandatory &ad_reached_countries=['IN'] parameter. The countries are specified through a two-letter ISO country code, IN for India, in this case.

By querying the API through the search terms we were able to obtain all the different ads about a particular term. However, this wasn’t suited to our task of comprehensively understanding all the political ads purchased by a particular party in India, as the most extensive keyword search would always leave some ads out of the scope. Besides, sometimes the ads were in languages other than English, or contained unicode characters like emojis that we could not anticipate.

Instead of searching by keyword, we tried using the parameter &search_page_ids=, but we soon learned that searching for pages and searching with keywords were not compatible and could not be used simultaneously. We decided to drop the keyword method and use Page IDs as the entry points. This allowed us to collect all the ads on a Facebook page within a given country. We discovered that up to 10 different Page IDs could be added in one query. This led us to consider collecting all the ads by political actors by manually gathering all their Page IDs and constructing one query for every ten of them. The first step was to identify the political actors whose ads we needed to collect. In the case of India, this was done by Nayantara, since she was familiar with the context. We then verified the list with other people in our networks.

Screenshot: Page of YSR Congress Party on Facebook. Date: 29 November 2019. Source: Nayantara Ranganathan.

Once we had the full list of main political candidates and political parties we were interested in, we needed to search online for their Facebook pages to obtain their Page IDs – a number that uniquely identifies each FB page. This involved a dedicated process for each page.

First, we needed to find the political actors’ pages on Facebook. In some cases, the pages were easy to find; other times we had to discern if they were fan pages or official pages. Some political actors had multiple pages; some pages were constantly changing names because of coalitions or due to rebranding; and sometimes there were no pages for some parties and candidates. This meant we had to do a lot of work to understand whether we had arrived at the right page. Posts of these pages were sometimes in regional languages, and in India alone there are 22 official languages and nine different scripts. And naturally, this challenge grew increasingly complex when the project grew, as we will see later.

Once we collected the Facebook pages, we followed a simple but repetitive task:

  • Right-click on the profile picture and open it in a new tab.

  • Within this URL, find the Page ID number, select and save it. In this way, we mapped, collected and verified the Page IDs.

Once we had this list of Page IDs, we composed our query, which looked like this:


As you can see above, we also included other parameters, which were mentioned in the otherwise sparse Facebook documentation of the API:

  • ad_active_status=ALL - which means we were looking for all the ads, active and inactive.

  • ad_reached_countries=['IN'] - which means that the country where the ads have reached must be India (IN).

  • limit=99 - which means that we only want 99 results. This parameter is extremely important, as we will see later.

  • and the fields we wanted returned, which are listed after the &fields= directive:


Facebook later incorporated the &platform_publisher parameter, which is meant to tell us if the ad appears on Facebook, Instagram, Messenger or Audience Network.


Audience Network allows third party apps to use Facebook’s targeting infrastructure to deliver ads on platforms outside of Facebook and Instagram.

For each ad, the responses we sought were the following:

  • ad_snapshot_url - the ad snapshot URL (the unique identifier of each ad)

  • ad_creation_time - the time at which the ad was created

  • ad_creative_body - the creative body of the ad (content usually in the form of text and sometimes emojis)

  • ad_creative_link_title - links that might be added in the content (often these lead to surveys where more data is asked for)

  • ad_creative_link_caption – captions that can accompany the links (like ‘Learn More’ or ‘Click Here’)

  • ad_delivery_start_time - the time at which the ad was scheduled to be delivered

  • ad_delivery_stop_time - the time at which the ad was scheduled to stop being delivered

  • currency - the currency in which it was paid for

  • demographic_distribution - the ages and genders of the people to whom the ad was delivered

  • funding_entity - the entity that paid for the ad

  • impressions - the number of times the ad might have been viewed

  • page_id - the page identifier (ID)

  • page_name - the name of the page

  • region_distribution - the regions in which the ad was delivered

  • spend - a range of money that was spent on the particular ad

The query shown above meant that we were looking for the ads, both active and inactive, of ten political actors that reached India, up to a limit of 99 ads.

Screenshot: Inputting the query above returns data in this form. Date: 30 November 2019. Source: Nayantara Ranganathan.

This query returned some data, and it was our first step towards gaining an understanding of what the data looked like.
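As an illustration of how such queries could be assembled, the following is a minimal Python sketch rather than the query we actually ran. It splits a list of Page IDs into groups of ten and builds one URL per group from the parameters described above. The endpoint URL, the API version, the `access_token` parameter and the helper names `chunk` and `build_query` are assumptions made for this example, and the Page IDs in the usage note are placeholders.

```python
from urllib.parse import urlencode

# Assumption: endpoint path and version are illustrative, not taken from the text.
BASE_URL = "https://graph.facebook.com/v5.0/ads_archive"

# Field names as listed in the text above.
FIELDS = [
    "ad_snapshot_url", "ad_creation_time", "ad_creative_body",
    "ad_creative_link_title", "ad_creative_link_caption",
    "ad_delivery_start_time", "ad_delivery_stop_time", "currency",
    "demographic_distribution", "funding_entity", "impressions",
    "page_id", "page_name", "region_distribution", "spend",
]


def chunk(items, size=10):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_query(page_ids, access_token, country="IN", limit=99):
    """Build one query URL for up to ten Page IDs (hypothetical helper)."""
    if len(page_ids) > 10:
        raise ValueError("at most 10 Page IDs per query")
    params = {
        "search_page_ids": ",".join(page_ids),
        "ad_active_status": "ALL",                  # active and inactive ads
        "ad_reached_countries": f"['{country}']",   # two-letter ISO code
        "limit": limit,
        "fields": ",".join(FIELDS),
        "access_token": access_token,               # assumed parameter name
    }
    return f"{BASE_URL}?{urlencode(params)}"
```

For example, `[build_query(group, "TOKEN") for group in chunk(page_ids)]` would produce three URLs for a list of 23 placeholder Page IDs.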

Even by this stage, we had only managed to collect data on the 99 most recent ads. Facebook allowed for collection of up to 5,000 ads, but the likelihood of the query crashing increased dramatically as we raised the number of ads requested. Large queries froze the browser tab from which we visited the Ad Library API so often that we stuck to queries that returned between 99 and 200 results. To collect all the ads, we needed a way to get beyond not just the 99 results that were practical to collect, but also the 5,000 that were theoretically on offer.


The API queries would crash unexpectedly and for unknown reasons, sometimes working, sometimes not, and sometimes delivering different errors for the exact same query. By this time, it had become obvious to us that Facebook’s lack of interest in documenting the API was paired with possible deterrents against collecting data systematically. The scale of the barriers made us wonder if the platform was intentionally blocking us from accessing the data, and if the release of the Ad Library and API was a way to whitewash transparency concerns with a purposely unsuitable tool. This also occurred during a period in which Facebook was cracking down on a lot of software companies for collecting data from its regular API. These bans can be seen as a strategic move in the aftermath of the Cambridge Analytica scandal, where Facebook used the scandal to limit transparency under the banner of protecting users’ privacy. Besides, the effectiveness of user targeting and the competitive edge of the Facebook advertising platform logically depend on minimal disclosure about its internal workings.

At the bottom of the page of results for the query, we found a URL with a ‘pagination’ number. This link took us to a different URL where the next page of (99) results appeared, and so on. The key part of the pagination was in the returned parameter &after=. A unique sequence of numbers and characters was given that pointed to the next page of results for the same query. We found that if we included this pagination in our next query, it would lead us to page 2, then 3, etc. It looked like this:


There was no way of knowing how many pages of results there were going to be until we went through all of them. But in this way we could at least manually go through each page of results and copy the data from the browser to a text file. The data was given in JSON format, and every page contained the syntax of the beginning and the end of the file. This meant that if we stored the data as it came, we would have had to create one file for each page of results. We quickly realised that saving the results of every 99 ads in a separate file would be an extremely arduous and inefficient task. So we looked for a way to copy the result of the JSON query without the header and footer.


The header means the portion of the response that indicates that the data begins at that point and is contained below. This is how it looks for the JSON responses by the Ad Library API: { "data": [

And the footer, which indicates that the data ends there, looks like this: ] }

From the first page of a given query, we stored the header and the content, but not the footer. From all the following ones, we saved just the content without the header and footer, and when we reached the last page of the results we only collected the content and the footer. We could identify the last page of the results because on the final page, there would be no more pagination links leading to the next page of results. This way we managed to have data about the 10 political parties and political actors in just one JSON file.

For this system to succeed, we needed to add a comma (,) and a line break in place of each footer that we removed. In this way we used the JSON syntax to indicate that, instead of the end of the file, more ads would follow.
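The same merged file can also be produced without hand-editing headers and footers. The sketch below is not the copy-paste method described above. It uses a different technique: parsing each saved page as JSON and concatenating the contents of its "data" array. The function name `merge_pages` is hypothetical.

```python
import json


def merge_pages(page_texts):
    """Combine several saved JSON result pages into one JSON document.

    Each page is expected to look like {"data": [...]} and may carry extra
    keys (such as pagination details), which are dropped here.
    """
    merged = []
    for text in page_texts:
        page = json.loads(text)
        merged.extend(page.get("data", []))
    # ensure_ascii=False keeps emojis and non-Latin scripts readable.
    return json.dumps({"data": merged}, ensure_ascii=False, indent=2)
```

Parsing instead of editing text means a missing comma or a stray footer cannot silently corrupt the file: a malformed page raises an error as soon as it is read.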

The pagination was useful in getting around the regular crashes of the API, as we used it to restart the data collection process from the page where it crashed. We spent time trying to figure out why these crashes were happening but we could not find clear reasons. Sometimes we’d retry with the same query and it wouldn’t crash. Overall it helped to request a low number of results through the &limit= parameter, and to not make the queries too quickly.
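The idea of following the pagination slowly and retrying from the page that crashed can be sketched in Python as follows. This is an illustration under assumptions, not the script we used: it assumes each response carries the link to the next page under `paging` and then `next`, and the function names, the delay and the retry count are made up for the example.

```python
import json
import time
import urllib.request


def fetch_json(url):
    """Download one page of results and parse it as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def collect_ads(first_url, fetch=fetch_json, delay_seconds=5.0, max_retries=3):
    """Follow pagination links until none is left, pausing between requests."""
    ads = []
    url = first_url
    while url:
        for attempt in range(max_retries):
            try:
                page = fetch(url)
                break
            except OSError:
                # Network and HTTP errors from urllib are OSError subclasses.
                if attempt == max_retries - 1:
                    raise
                time.sleep(delay_seconds)  # wait, then retry the same page
        ads.extend(page.get("data", []))
        # Assumed response shape: the last page has no "next" link.
        url = page.get("paging", {}).get("next")
        if url:
            time.sleep(delay_seconds)
    return ads
```

Because a failed request is retried with the same URL, a crash does not force the collection to start again from the first page.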

We constantly had to walk the line between not getting blocked from ‘overusing’ the API while also avoiding the unpredictable crashes. The trial and error approach also created other problems for us, such as getting blocked by the rate limits of the API. Facebook had not specified the limits of use, but some undefined ones were clearly in place. Our first efforts to automate the collection of data with a Python script were deterred and it seemed as if Facebook was deliberately making it hard to automate the collection, and would give more chances of success when queries were made from smaller, slower, manual interactions.

The challenge at this point therefore became how to automate the collection of ads while still making it seem like a manual, human effort.

We developed a very mechanical and repetitive pattern to collect the data manually. The work flow looked like this:

  • load a query

  • copy the result except the footer

  • click the next page

  • while the next page loads, paste the data in a text file

  • add the comma and line break to keep the continuity of data in the syntax

  • switch tabs

  • copy the data without header and footer

  • paste it in the document

  • add the comma and the line break…


Shortcuts like ctrl+c, ctrl+v, and alt+tab helped make this mechanical process more efficient. To make the job of the API and my own actions efficient and synchronised, I tweaked the &limit= parameter to return data that would take the same amount of time to load in the browser as the time needed to complete my part of the workflow of copy-pasting data into the text file. The randomness of human behaviour circumvented the FB blocks and crashes most of the time. Nevertheless it was an extremely slow, repetitive and meticulous task. It became an endurance routine of spending hours every day just performing this manual collection of data, building megabytes and then gigabytes of JSON data through this method.

I became very efficient at it, to the point where I could do it without thinking, embodying the movements required, like riding a bicycle. The challenge was to keep myself sane while continuing the task. Through this system we eventually collected all the ads’ data on the political actors we were researching in India.

Violations of the silence period

The Indian electoral law imposes a silence period 48 hours prior to the day of elections – which means that all campaign activities, including online and offline advertisements, are banned for the two days prior to election time. If these rules were to be followed on the internet, our query to the Facebook Ad Library API should not have returned advertisements running from the 10th of April 2019 until the end of the day on the 11th of April 2019 (the day of the first phase of elections).

To see whether this was the case, we needed to find a way to make sense of the data we had collected. It was in JSON format, so we needed software that could handle and help analyse JSON data.


While browsing web forums, I learned about Tableau, a data visualisation and analysis tool. I downloaded the trial version of Tableau Public, which is available for computers running Windows OS. Working with Tableau Public was not straightforward from the start. I needed some tips from various fora on the internet and explanations on YouTube to understand what one does after getting hold of data in JSON format.

But Tableau is not available for Linux, which is the Operating System I normally run on my computer. So to be able to use Tableau, I downloaded a virtualisation software called VirtualBox. Virtualisation software allows you to run a different OS than the one installed on your computer without needing to add another OS on your device.

Screenshot: VirtualBox, an application that allows you to use your computer to virtually access another machine. Date: 20 July 2019. Source: Manuel Beltrán.

Once set up and installed, we used Tableau to make sense of the data. We had to play around with Tableau to understand how to make the software read the data we had, how to make it read multiple files of data, what the structure of such combined data would have to be, and generally what features Tableau offered.

We were able to ask questions like:

  • What ads were shown in a certain window of time?

  • Which political parties were showing ads in which regions?

  • What ads were old men in south India seeing?

One parameter that we thought would be helpful in understanding which ads were violating the silence period was "ad_creation_time," which records a date in this format: "2019-04-02T17:22:45+0000."

But after further study we realised that the date and time at which an ad was created did not necessarily coincide with the time when it was delivered to people. To understand when ads ran, the more useful parameters were "ad_delivery_start_time" and "ad_delivery_stop_time." This gave us the range of time within which an ad made its way onto people’s screens. Because of the multi-phase nature of the Indian elections (i.e. voting taking place at different times in different regions) we also had to match the silence period times with the corresponding regions that were going to the polls and therefore subject to these silence periods. To understand that, we used the parameter "region_distribution."

We filtered the data to show ads that were active during the silence period of a specific region and analysed them further. We started to see that the violations were not accidental or exceptional, but rather routine. Using the parameters "ad_delivery_start_time" and "ad_delivery_stop_time," we identified ads that were running during the silence period. On top of this, we used the "region_distribution" parameter to isolate ads that were running in the silence period for particular states.

Screenshot: A step in the process of understanding the number of ads in violation of the silence period using Tableau visualisation software. Date: 20 July 2019. Source: Manuel Beltrán.
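We did this filtering in Tableau, but the underlying logic can be sketched in a few lines of Python. The sketch assumes that timestamps follow the format quoted above, that "region_distribution" is a list of entries each carrying a "region" name, and that an ad without a stop time is still running. The function names and the use of Indian Standard Time for the window are choices made for this example.

```python
from datetime import datetime, timedelta, timezone

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S%z"  # e.g. "2019-04-02T17:22:45+0000"
IST = timezone(timedelta(hours=5, minutes=30))  # Indian Standard Time


def parse_time(value):
    """Parse a timestamp string in the format used by the ad data."""
    return datetime.strptime(value, TIME_FORMAT)


def ran_during(ad, window_start, window_end, now=None):
    """True if the ad's delivery period overlaps the given window."""
    start = parse_time(ad["ad_delivery_start_time"])
    stop_raw = ad.get("ad_delivery_stop_time")
    # Assumption: a missing stop time means the ad is still being delivered.
    stop = parse_time(stop_raw) if stop_raw else (now or datetime.now(timezone.utc))
    return start <= window_end and stop >= window_start


def reached_region(ad, region):
    """True if the ad was delivered in the named region (assumed data shape)."""
    return any(
        entry.get("region") == region
        for entry in ad.get("region_distribution", [])
    )


def silence_violations(ads, region, window_start, window_end):
    """Ads delivered in `region` at any time during the silence window."""
    return [
        ad for ad in ads
        if reached_region(ad, region) and ran_during(ad, window_start, window_end)
    ]
```

For the first phase, the window would run from `datetime(2019, 4, 10, tzinfo=IST)` to `datetime(2019, 4, 11, 23, 59, 59, tzinfo=IST)`, applied to each state that voted in that phase.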

In this way, we started to systematically collect all the ads that violated the silence period. At this point the elections were about to enter their fifth phase, so we collected violations from the first four phases. We learnt that one of the most important metrics – the amount of money spent on ads – was provided only as a range. This complicated our efforts to determine the expenditure of political actors. But since this was such a crucial piece of information, we decided to arrive at a compromise and calculated the average of the lower and upper ends of the range for every ad.
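The compromise can be expressed as a short calculation. In this Python sketch, the shape of the "spend" value (a lower and an upper bound, possibly given as strings) and the handling of a missing upper bound are assumptions, and the function names are hypothetical.

```python
from collections import defaultdict


def spend_midpoint(ad):
    """Estimate an ad's spend as the midpoint of its reported range."""
    spend = ad.get("spend") or {}
    lower = float(spend.get("lower_bound", 0))
    # Assumption: if no upper bound is given, fall back to the lower bound.
    upper = float(spend.get("upper_bound", lower))
    return (lower + upper) / 2


def totals_by_page(ads):
    """Count ads and sum estimated spend for each page name."""
    totals = defaultdict(lambda: {"ads": 0, "estimated_spend": 0.0})
    for ad in ads:
        entry = totals[ad["page_name"]]
        entry["ads"] += 1
        entry["estimated_spend"] += spend_midpoint(ad)
    return dict(totals)
```

Keeping the ad count and the estimated spend side by side makes it easy to compare an actor with many cheap ads against one with few expensive ones. The resulting figures are estimates, since the true spend can lie anywhere within each range.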

As expected, this information led to several interesting discoveries. Some political actors who had created thousands of advertisements had spent less money than political actors who had created fewer ads but with a larger spend. We understood that the number of ads is not always directly proportional to the amount of money spent. For our investigation on the silence period, it meant that even if one party had many more advertisements in violation, it was possible that another party had spent much more money on timed ads breaking the silence period. These were important observations as we deconstructed the meaning of these violations and their scope.

In total, we collected 2,235 advertisements in violation of the silence period in India during the first four phases.

Once we realised that we had concrete evidence of violations, we faced the question that many researchers and investigators struggle with: “What can we do about it?”

At this point, the country was still in the middle of the multi-phase election, with a couple of weeks to go before it concluded. Our options were either to notify Facebook of the violations or to notify the Indian Election Commission (EC). There was of course the third option: to continue documenting it and writing about it. However, without a media affiliation or a publishing platform, we were not sure that this would be the most impactful route.


Even though we did not really expect the Election Commission (EC) to be equipped or inclined to tackle this, we were more hesitant to notify Facebook about the evidence. Reporting these violations to Facebook would have meant that we placed the responsibility of ensuring the integrity of democratic processes on a foreign private company with a terrible record of accountability, instead of on a democratic institution. It would have meant that we would be actively entrusting Facebook with oversight functions that have a significant impact on our electoral processes. Although we imagined that reporting to Facebook would be a quicker way to get the ads taken down, as a matter of principle we decided to report it first to the EC and publish our findings to the public.

We filed a report with evidence of the data on the 2,235 advertisements in violation, categorised by the four phases of the elections along with a short explanation.

As expected, we received no response.

Along with reporting to the EC, we also decided to write about it. HuffingtonPost India agreed to run our story with the findings about the violations of the silence period by political ads on Facebook.

Illustrative image accompanying the reporting. Credits: Manuel Beltrán and Nayantara Ranganathan.

The violations continued to take place in the remaining phases of the election and, to our knowledge, no action has been taken so far.


In the course of finding answers to our specific questions about the silence period, we found other interesting ways of reading the data through Tableau. One of the main challenges here was understanding how to structure the data. As the data we were collecting was in multiple files, we had to make sure that files were “stacking up” the right way.

We designed the data collection in such a way that there were cut-off limits for files on the basis of their size. This meant that if a political actor’s ad data was massive (like Donald Trump’s has been), then the file would contain fewer than the standard number of ten actors’ ad data, and instead just contain one politician’s data. We were able to get a grip on structuring the data through much trial and error, as well as with help from the Tableau community forums and YouTube videos about the tool. Since it is such a popular data analysis and visualisation tool, finding answers to such details was a matter of exploring already available fora.

Apart from the breaches of the silence period, we found other interesting details: the architecture of the information on political ads, the categories of information available as well as the data that was missing, were all revealing. For example, we learned that Facebook profiled people into gender binaries of male and female, despite allowing users to identify amongst a broad range of options on the user side. This was an interesting peek into how differently things work at the user-experience level versus how they work at the back-end infrastructure designed for marketing. Screenshot: Dropdown menu options that appear on Facebook account settings after you choose to set your gender as “Custom”. Date taken 29 November 2019. Source: Nayantara Ranganathan. Screenshot: Segmentation of Facebook users into the gender binary, visible in the process of creating ads or boosting posts. Date taken 29 November 2019. Source: Nayantara Ranganathan.

By the time we managed to uncover the silence period breaches, the Indian elections were well underway, and the 2019 European Parliamentary elections were approaching. On this occasion, we also started digging into data on Spanish parties for the European Parliament elections and the Spanish municipal and national elections of 2019. Slowly, we began broadening the scope of our data collection and research.

Although we were now collecting and exploring Facebook ad data from various contexts, we were uncomfortable writing stories about countries that we had no context or knowledge of. Beyond that, we felt it was somehow incorrect to do so even if we’d had some understanding of the places or found collaborators from different countries.

But while staying in our lane, we also wanted to make this data in other countries accessible. We decided that we would collect the data available in all the countries, but leave the analysis and investigations to whomever might be interested in taking them up.

We connected with colleagues and friends in other parts of the world where elections were going on to see if the data we were collecting might be useful for them directly or if they were interested in advocating for releasing similar data in those regions, too. For example, in the Philippines, Facebook is a crucial channel for the Duterte government’s disinformation campaigns. The Philippines also happens to be one of the labour markets for exploitative and traumatic contract jobs for content moderation, including for Facebook. We spoke to a friend and colleague in the Philippines to share what we had worked on, and heard about the developments and preparations there.

Once we decided to start looking at political ads globally, we had to seriously consider what approach to take, which groups or persons were already working on projects related to political advertising and what gaps we might be able to fill. We knew, for example, that FBTrex is a tool that helps users collect the metadata of posts that show up in their newsfeeds, including political ads, by installing a browser extension.


Facebook Tracking Exposed (FBTrex) is a project whose vision is that individuals should be in control of their algorithms. Among other things, the project offers a browser plugin that collects the metadata of all the public posts on your timeline and allows you to either contribute this data to a public dataset, or keep it private for your own use. FBTrex also allows researchers (and users) to use/reuse a portion of the data through their API.

ProPublica has created a tool called Political Ad Collector, which allows you to install a browser plugin that collects all the political ads you see while browsing and sends them to a database that enables ProPublica to better analyse the nature of political ad targeting.

Who Targets Me is a project that also uses a browser plugin you can install to collect all sponsored posts on your timeline. The plugin sends this data to a crowdsourced global database of political adverts, matches the sponsored posts against categorised lists of political advertisers, and draws conclusions about who is targeting you online.

We had become familiar with fields of information that the Facebook Ad Library API was providing, but we did not know the fields of information that were available for advertisers to target users. That meant that even if we knew that a certain ad was reaching between 1,000 and 10,000 people in Delhi, we did not know whether those were the parameters that the advertiser had selected for targeting the ads. We were interested in knowing the options available to administer the targeting, so to speak.

So we created pages of fictitious political parties and tested buying and targeting ads. We created one page each, and “boosted” posts. The level of detail available from this side of the window was, unsurprisingly, far greater than the information that was being revealed in the name of transparency. We could create “custom audiences” by specifying what kinds of “Demographics, Interests and Behaviours” we wanted to target. We could also input data of people who might have indicated interest in our business or campaign and Facebook would offer to deliver ads to a “lookalike audience.” These are standard advertising methods, but we were about to discover that the fields available for targeting seemed quite problematic and could lend themselves to all kinds of discrimination. For example, we found a category called “Friends of people celebrating Ramadan.” Screenshot: Targeting suggestions when creating an ad. Date: 29 November 2019. Source: Nayantara Ranganathan.

We were aware of developments in the US where practices of discrimination had been recorded and Facebook held to account. We were also aware that Facebook had committed to not allow targeting based on race and gender when it comes to ads on employment, housing and credit. The targeting categories we found confirmed that such problems remained unaddressed in other parts of the world. For example, the above option of “friends of people celebrating Ramadan” is a category that can easily be used to exclude Muslims at a time when anti-Muslim sentiment in India is emerging from both the State and society at large. Screenshot: Suggestions for excluding people from the targeting of an ad. Date: 29 November 2019. Source: Nayantara Ranganathan.


We decided to collect the data of political actors across the world.

As mentioned before, we had been using Tableau for internal experiments and to understand the data. At this point, we discovered that Tableau allows online publishing and, in fact, provides relatively dynamic and sophisticated levels of visualisation.

While we jumped into using Tableau software to help us make sense of the data, once the question of publishing arose, we had to think about whether we wanted to make the entire project dependent on a tool that was proprietary and mostly used by marketing departments to conduct data analysis. We also had to consider whether we were making our tool vulnerable by relying on proprietary software that could decide to revoke access at any time.

We explored other much-loved open-source alternatives like Rawgraphs and Datawrapper. However, given the size and particularities of the data, these were ruled out. Some of these alternatives could not analyse the content of the ads because the ads sometimes contained characters from non-Latin scripts. So we decided to go ahead and use Tableau Public.


It was puzzling that many of these highly acclaimed free and open-source visualisation tools were unable to open even small extracts of the JSON files.

We used a website called JSONLint that helped “validate” the JSON data – that is, helping us spot whether there were formatting errors in the files. Doing this helped us figure out that the syntax errors identified were because the field with the text often contained scripts other than the Latin script, and also emojis and characters that were included in recent versions of Unicode, the standard for encoding scripts into machine-readable characters.
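A similar check can be run locally. The sketch below uses Python's standard `json` module to report where a file stops being valid JSON. It also shows that text in non-Latin scripts or with emojis is valid JSON in itself, and that it can be re-encoded as plain ASCII escapes, which may help with tools that stumble on such characters. The sample strings are invented for illustration, and the field name `ad_creative_body` is an assumption about how the ad text field is labelled.

```python
import json

def validate_json(text):
    """Return (True, None) if text parses, else (False, error description)."""
    try:
        json.loads(text)
        return True, None
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"

# Invented samples: one valid record with Kannada text and an emoji,
# and one record with a genuine syntax error.
good = '{"ad_creative_body": "ನಮಸ್ಕಾರ 🙏"}'
bad = '{"ad_creative_body": "missing closing quote}'

print(validate_json(good))
print(validate_json(bad))

# ensure_ascii=True rewrites every non-ASCII character as a \uXXXX escape,
# producing a file that contains only ASCII but decodes to the same text.
escaped = json.dumps(json.loads(good), ensure_ascii=True)
print(escaped)
```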

So we started designing our own visual interfaces with Tableau to browse the databases. One of the problems we encountered while using Tableau was the amount of computing resources it requires to handle databases of this size, which were surpassing five gigabytes. Our computers running through the virtual machine were very slow or sometimes simply unable to load the data. With the limited financial resources we had, buying a more powerful computer was not an option.


The university where I teach provides teachers access to a remote desktop with an Intel Xeon Processor with 20GB of RAM. This turned out to be a great solution. In the absence of this, we would have had to rent a Virtual Private Server for the same purpose. A remote desktop can be accessed through Remote Desktop Protocol (RDP). This is a protocol that allows you to access and control a system that is not in the same physical location as you.

Using RDP from my laptop allowed us to start loading the databases on Tableau and have the resources on my terminal available for other tasks. It even made it possible to close my laptop while the RDP continued to load.


Tableau allows you to import JSON files of up to 128MB only, so we had to split the files below that limit. We used a simple Python script called json-splitter for that task.
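The splitting step can be sketched as follows. This is not the json-splitter tool itself, only an illustration of the idea: it assumes the file holds a JSON array of ad records and packs them greedily into chunks whose serialised size stays under a byte limit. The function name `split_records` is hypothetical.

```python
import json

def split_records(records, max_bytes):
    """Greedily pack records into chunks whose serialised size stays under max_bytes.

    A single record larger than max_bytes still gets a chunk of its own.
    """
    chunks, current, size = [], [], 2  # 2 bytes for the enclosing "[]"
    for record in records:
        encoded = len(json.dumps(record, ensure_ascii=False).encode("utf-8"))
        extra = encoded + (2 if current else 0)  # ", " separator between items
        if current and size + extra > max_bytes:
            chunks.append(current)
            current, size = [], 2
            extra = encoded
        current.append(record)
        size += extra
    if current:
        chunks.append(current)
    return chunks

# Toy data: ten small records split with a deliberately tiny limit.
records = [{"id": str(i), "text": "x" * 50} for i in range(10)]
chunks = split_records(records, max_bytes=200)
for chunk in chunks:
    print(len(chunk), len(json.dumps(chunk, ensure_ascii=False).encode("utf-8")))
```

For real files the limit would be set to something like `128 * 1024 * 1024`, with each chunk written to its own numbered file using `json.dump`.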

It was a challenge to find the right balance of what we wanted to achieve with the project.

On the one hand, we were sitting on a vast amount of data relevant to the politics of many countries, and we had already found critical issues in the limited number of countries that we analysed ourselves. There was a certain vertigo about the depth of what else might be there, encouraging our need to get it out.

As a project that exposes the mechanisms of a technology that is not widely understood, it was important to present it in a way that lets anyone learn and understand how political ads function. It needed to be powerful enough to allow journalists and researchers to conduct their investigations. And it also needed to problematise how the infrastructure of Facebook ads functions. Merging all these elements took some time and we explored different iterations.

Between making it mobile-friendly or desktop-based, we opted for the desktop as it allows for a more in-depth experience.

While nearing the completion of the interfaces that were about to give light to, the question of how to update the database became more imminent.

If we were simply handing a data dump to whoever needed it, updates might not matter so much, but for a live resource we felt it was important to be up to date at all times. Some weeks before the release of the project online, we participated in a workshop of Tactical Tech’s Exposing the Invisible project (to work, amongst other things, on the very inception of this text). There we got help from Wael Eskandar to develop the initial core of a Python script that would automate the ad collection from the API. One of the challenges in developing it was making the script apply the modifications that we had previously been introducing manually.

The next challenge was making the script understand when the ads from a particular query were finished so that it could jump to the next query. Another was keeping each file below Tableau’s limit of 128 MB. As previously noted, Facebook placed limits on automated queries, so we started mimicking human behaviour by adding randomised time delays between the queries. From its initial version, the script became something of a precious system of data collection, which we were improving day by day. It also became a source of experimentation through which we better understood how the Facebook API works and tried collecting data other than that of the political parties. At a later point we created a system of text files with Page IDs that the script would go through, and a separate file with the two-letter ISO country codes, so it could collect several countries in one go. This new system also enabled us to update the list of political parties more easily. Screenshot: Homepage of website. Date: 29 November 2019. Source: Manuel Beltrán.
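The general shape of such a collection loop can be sketched as below. This is a minimal illustration under stated assumptions, not the script described above: `read_list`, `collect` and `fetch_page` are hypothetical names, the request itself is left to a callable supplied by the caller, and the response is assumed to follow the common pattern of a `data` list plus an optional `paging` entry whose `next` value points to the following page.

```python
import random
import time

def read_list(path):
    """Read one entry per line (e.g. Page IDs or country codes), skipping blanks."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]

def collect(countries, page_ids, fetch_page, sleep=time.sleep,
            min_delay=2.0, max_delay=8.0):
    """Collect all ads for every (country, page) pair.

    fetch_page(country, page_id, cursor) is a hypothetical callable that makes
    one request and returns {"data": [...], "paging": {"next": cursor}}.
    """
    results = {}
    for country in countries:
        for page_id in page_ids:
            ads, cursor = [], None
            while True:
                response = fetch_page(country, page_id, cursor)
                ads.extend(response.get("data", []))
                cursor = response.get("paging", {}).get("next")
                # Randomised pause so that requests are not evenly spaced.
                sleep(random.uniform(min_delay, max_delay))
                if not cursor:
                    break  # no further page: this query is finished
            results[(country, page_id)] = ads
    return results

# Stand-in for the real request, returning two pages of made-up results.
def fake_fetch(country, page_id, cursor):
    pages = {None: {"data": [1, 2], "paging": {"next": "p2"}},
             "p2": {"data": [3]}}
    return pages[cursor]

print(collect(["IN"], ["123"], fake_fetch, sleep=lambda seconds: None))
```

Passing the request function and the sleep function in as arguments keeps the loop testable without touching the network; in real use the lists would come from `read_list` applied to the Page ID and country code files.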

The first release of on July 26, 2019, was done with the manual collection of data, but by then we had also perfected the script to a level that allowed us to regularly update the data, which proved extremely valuable for the effectiveness of the project. We went live with the website and posted on our personal social media networks to spread the word. We received a lot of positive reactions, including from journalists who were using the data to understand Facebook’s much publicised ad transparency efforts, noting the trends in various countries’ elections.

Forks in the road

Verification by Facebook

Back when we were first trying to access the data and needed to go through the identity verification process, we realised that the two of us had to face quite different processes of verification to gain access to Facebook’s ad platform.


As someone whose “Primary country location” was India, I had to undergo an additional step for address verification. This involved choosing between receiving a code delivered to my house via post, or a visit from “someone.” The time difference between the two options was significant: the postal mail would take three weeks, and the visit to my house would happen within a week. Because we wanted to get on with the process as soon as possible and were also struck by the house-visit process, we decided to go with that option. Screenshot: Three-step verification process for accessing the Ad Library API. Date: 20 April 2019. Source: Manuel Beltrán.

What we came to learn in the course of this identification process blew our minds.

A day after making the request for verification, I (Nayantara) received a phone call from a man who was going to conduct the process. He said he was calling about Facebook verification and to ask whether the next day was a good time to visit. Screenshot: Identity information of OnGrid representative who was tasked with the verification. Date: 23 April 2019. Source: Nayantara Ranganathan

This entire episode was interesting for many reasons, including showing how the rise of identification technology and businesses in India interplayed with big tech companies like Facebook looking for eyes and ears in India, a perfect example of identity-verification-as-a-service.

On that occasion, I asked the caller who he was and which company he was from. He hesitated a bit and seemed like he had not expected the question. Once we established that we could both speak Kannada (a language spoken predominantly by people of Karnataka in southwestern India), we established a more familiar and trusting tone of conversation. He said he worked for a company called OnGrid, and that his name was Umesh (name changed). As he was speaking to a woman on the line, I could sense an awkward tone as he asked me about my address and directions to my house. He also became more forthcoming with revealing details about himself.

Umesh assured me that I did not need to be at home when he visited, and as long as there was someone there who could confirm that I live there, it was sufficient. Two days later, Umesh arrived at my place to conduct the verification. He took pictures of my house from the outside, a landmark near it, and took the signature of the person who lives at home with me (since I was not at home during the visit). Screenshot: WhatsApp conversation between Nayantara and OnGrid representative. Date: 23 April 2019. Source: Nayantara Ranganathan.

Naturally, we were curious about the company that was Facebook’s verification service provider. OnGrid is an Indian company that claims to be enabling people in India to “establish trust instantly,” or in other words, engaging in a trade of identification information. They offer everything from “education verification” to a check of court records. Two years ago, they were in the news for a creepy image that sent a collective shiver down people’s spine: an advertisement for non-consensual image recognition using the national biometric identification architecture, Aadhaar. As a separate entity, OnGrid’s terms of service and data policy are different from Facebook’s policies about retention of information submitted for identification purposes. Image: Now-deleted image posted on Twitter by OnGrid. Date: 29 November 2019. Source: Archived page here.*

Facebook’s policies did not mention anything about outsourcing some elements of the verification process or how the data policies might change in that instance. This possibly meant that the commitments made by Facebook towards the deletion of identification data shared with Facebook were not something that OnGrid – while servicing the need for “last-mile verification” – was obliged to respect. Indeed, Facebook claims to permanently delete the data that it collects from users who undergo the verification process. OnGrid, the company that is hired by Facebook in India, on the other hand, retains this data for providing “identification services” to other entities. OnGrid uses people’s data to create their database in order to reuse and offer it as a service to other entities.

After this process, I finally had my address verification process completed, and had access to the Ad Library API.


As a Spanish citizen living in the Netherlands, I (Manuel) went through an identity verification process that was much simpler and involved fewer steps. However, once submitted, the final acceptance took about two days to come through, as compared to the instantaneous response Nayantara received in that step, indicating that mine might have involved a human in the process.

These episodes were helpful in understanding the different data handling processes in different countries, the vulnerabilities created by involving third-party verification actors, and the outrageous fact that any of this was even necessary for obtaining public-interest information that was quite crucial to understanding how social media could be interfering with democratic processes.

To include images and video or not?

We had a plan for visualising data about the ads but the ad “visuals” themselves could not be shown. That is, the images or video content, which is the element that Facebook and Instagram users are meant to see, could not be visualised because of the peculiarities of the data structuring and accessibility.

The data was not provided in a downloadable manner independent of reliance on the API, and therefore it was a challenge to visualise it. But even if we were to find a way to embed the link and present the visuals, there was another problem: the URL with the visuals includes an active “access token.” This is a unique, time-bound token that prevents the link from being useful after an hour. The links would not be of much use without a viewer’s own access token.


This is what an ad snapshot URL looks like:>gyy5YXX8fmZC0hUGcpQMfZCp3uWaSWeX4urEcNPwB8SM01clzJSqRXPjjh8ZBguzXZC9sc9whaz0hE9MGEj889ztZBW2XNxVfitweUSkVrcKGiwePQQZB7uGBOa

An access token is what comes after access_token=, and is the long string of letters and numbers:
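As a rough illustration, the token can be pulled out of such a URL with Python's standard library. The URL below is a made-up placeholder (host, path, ad id and token are all invented), used only to show where the `access_token` parameter sits in the query string.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL: every part of it is a placeholder, not a real ad.
url = "https://example.org/ads/archive/render_ad/?id=123456&access_token=EAAxxPLACEHOLDERxx"

# Split off the query string and parse it into a dict of parameter lists.
query = parse_qs(urlsplit(url).query)
token = query["access_token"][0]
print(token)
```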


We considered scraping images of the Ad Library using one of the Add-Ons for the browser like ‘Download All Images’. There were also many scripts in Python to help one do so. However, Facebook prevents all these scraping techniques from collecting the visuals. Besides, Facebook also forbids users from doing such scraping in general. Screenshot: Facebook’s Terms of Service. Date: 8 December 2019. Source: Manuel Beltrán.

On the one hand, it was important to include the visuals as they have a recall value for people browsing the interfaces. On the other hand, we wondered if it might actually be a good thing to not distract people with visuals, as the most crucial aspect to convey was that the visuals themselves might not be as special as the metadata about them, like the targeting.

Nevertheless we also tried to use the “wget” command to do this, but it did not work.

We eventually found a workaround using features available in Tableau to allow for users of to be able to input their own access tokens and view the contents of the ads.

However, after the release, we found something called an ‘Access Token Debugger’, where it was possible to extend the lifetime or validity of each access token. We then started to use extended access tokens in our system of collecting data. Screenshot: Access Token Debugger. Date: 30 November 2019. Source: Manuel Beltrán.

Working with journalists

While working together with journalists to investigate stories emerging from the data, many wanted to see the raw data. However, sharing the data came with one problem. The data contained the parameter of "ad_snapshot_url", which contained our personal access tokens, which we described earlier.

This access token is unique to each developer account and gets recorded in the data returned by queries. Since we were being careful not to do anything that might get our access to the API revoked, we had to remove it before sharing the data. But removing the access token from each ad’s data had to be automated, as we had millions of ads. The solution was rather easy to find by searching Stack Overflow: we were able to use a tool called “sed” (stream editor) that is already available in the Linux terminal.

sed -i -e 's/EAAjPOWfPqZCgBAAJ0csteVNkFJcyxbQZA7m1xbJ8w3fzFRlm6apQ5cAnzsjrcKGiwePQQZB7uGBOa//g' US_20_1.json

Using these parameters, SED searches in a text file for the content of the access token and deletes it. This tool enabled us to easily clean the raw files of data in an automated and efficient way.
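The same clean-up can be sketched in Python. This is an alternative to the sed command, not a transcription of it: instead of searching for one known token, it blanks whatever value follows `access_token=`, so it does not need to know the token in advance. The sample record, its URL and the function name `strip_tokens` are hypothetical.

```python
import re

# Matches access_token=<value> up to the next "&", double quote or whitespace.
TOKEN_PATTERN = re.compile(r'access_token=[^&"\s]*')

def strip_tokens(text):
    """Blank out every access token value, keeping the parameter name."""
    return TOKEN_PATTERN.sub("access_token=", text)

# Made-up record for illustration.
sample = '{"ad_snapshot_url": "https://example.org/render_ad/?id=1&access_token=EAAxxPLACEHOLDERxx"}'
print(strip_tokens(sample))
```

Applied to a whole file, the function would be run over the file's text before writing the result back, which leaves the JSON structure untouched.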

As the data we gathered was about the ads themselves and did not involve the personal news feeds of users, we didn’t have to deal with the issue of safeguarding personal data.

After publishing

Maintaining, updating and adapting the databases

Once the project was made public, it also became imperative that the data be updated regularly and the database be maintained. This involved questions about the frequency of updates, whether to collect data from the beginning each time, and if not, how to collect data exactly from where we had left off.
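One possible way to handle the "where we left off" question is to key every record by its ad identifier and merge each fresh batch into the existing set, so that re-collected ads overwrite their older copies and only new ones are appended. The sketch below assumes each record carries a unique `id` field; the function name `merge_ads` and the `spend` values are hypothetical.

```python
def merge_ads(existing, fresh, key="id"):
    """Merge freshly collected ads into an existing list, keyed by ad id.

    Records in `fresh` replace records in `existing` that share the same key,
    so updated figures overwrite older ones. Returns (merged list, number added).
    """
    merged = {ad[key]: ad for ad in existing}
    added = 0
    for ad in fresh:
        if ad[key] not in merged:
            added += 1
        merged[ad[key]] = ad
    return list(merged.values()), added

# Made-up records for illustration.
existing = [{"id": "1", "spend": "0-99"}, {"id": "2", "spend": "0-99"}]
fresh = [{"id": "2", "spend": "100-499"}, {"id": "3", "spend": "0-99"}]

merged, added = merge_ads(existing, fresh)
print(added, merged)
```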

We added more countries as their data was added in the Facebook API. Argentina and Sri Lanka were added very shortly before the elections there, and Singapore data was also released. There were no verification procedures for Norway and Switzerland but we were able to collect the data about these countries already.


As time passed after the release, I also worked to tweak the scripts we used for collecting the data, so that the collection was more efficient, with minimal human intervention for certain tasks.

Facebook’s response:

Shortly after the launch, Facebook’s (ex) VP of ads had some words of encouragement for the project. This was awkward, as the project wasn’t intended to be a “visualisation tool”; it was a challenge to Facebook’s failure to provide systematic access to data.

The project is an action to highlight the fundamental conflict between Facebook’s advertising infrastructure and the conditions needed for meaningful democratic participation on social media. The increased individuation enabled by targeted advertising on Facebook, together with the company’s failure to open up this data in all the countries it operates in, meant that these conditions were being defeated time and again.

For this reason, we did not want Facebook to co-opt our project and make it part of their transparency success story.

Twitter censorship

We started to notice something peculiar two days after releasing the website. Some tweets made by others about the project were no longer visible to us. It was unlikely that multiple people had tweeted about the project and then deleted their tweets.

Soon we realised that our own tweets about the project were also missing. We didn’t initially notice this since we could both see our own tweets. However, we eventually realised that we could no longer see each other’s tweets.

Known as “shadowbanning,” this was a phenomenon that we had heard of before.

It is made even trickier to spot because the tweets continue to be visible to their author, raising no suspicions. People had regularly observed this phenomenon with tweets and user accounts related to Kashmir, for example.

We decided to take up the issue immediately via a report with Twitter’s support team. A day later we had still not heard back from them. That was when we decided it might be worth soliciting support and help to understand what was going on. We posted the following call for support. Screenshot: Tweet calling for support about the censorship. Date: 2 August 2019. Source: Nayantara Ranganathan.

By now we had discovered that not only had tweets disappeared, it was also impossible to post a tweet with the project’s URL. We found that others were also experiencing this issue. Screenshots: People on Twitter alerting us to the censorship. Date: 4 August 2019. Source: Nayantara Ranganathan.

It was rather strange that the URL of the project was also blocked in Twitter’s Direct Messages. Screenshot: Censorship within Twitter’s Direct Message window. Date: 3 August 2019. Source: Nayantara Ranganathan.

We received support from many friends, colleagues and strangers, who documented error messages, connected us with people working at Twitter and suggested others to contact.

Meanwhile, we discovered that someone who had tried posting a link to the project on LinkedIn had also received a warning message. We guessed that the URL might have been flagged on some kind of centralised blacklist beyond the platforms. This was indeed the case, as we found that one of the security firms that rate URLs and creates blacklists - - indeed listed as spam on the basis of someone’s report. We filed an appeal with them, and received a response saying they had re-classified it (funnily enough) as advertising. Screenshot: Results from a lookup that had classified as spam. Date: 5 August 2019. Source: Nayantara Ranganathan.

Eventually Twitter removed the shadowbanning but didn’t provide us with any explanation. Our inquiry continued; someone put us in touch with a person working at Twitter who seemed eager to help, but this went nowhere. We also explored the possibility of using different channels to try to get an explanation about the reason for blocking, including through the use of the Right to Explanation in the European GDPR (General Data Protection Regulation). The Right to Explanation theoretically can force Twitter to report on the reason why the censorship took place, if it was algorithmic.

Key takeaways

From the technical glitches in the tools to the earnest promises in the policies, we learnt an immense amount about the underworld of the information economy, that is, the sale of personal data for marketing, and its outsized impact on the socio-political realities of people. Beyond Facebook, the project acquainted us with the ethos and modus operandi of companies that are engaged in the extraction and monetisation of data about people’s lives.

This project was about believing that a small team of two could somehow challenge an entity like Facebook, with its scale of resources and power of narrative. This meant that we constantly ran the risk that our API access might be withdrawn or our methods classified as being against Facebook’s terms of use, and we were even conscious that Facebook had access to so much sensitive information about us that could be used in damaging ways. None of these drastic things came to pass, but while working on the project before its release, we carried all of these possibilities with us.

Overall, many factors came together for the making of the project:

  • understanding the gaps that existed in this milieu

  • trusting that there was a way to hack around limitations when things were not straightforward

  • reaching out and being beneficiaries of generous advice from friends

  • taking the communication of the data seriously

These were some aspects that helped, in hindsight.

As of January 2020, the project is spinning off in new directions: we are working on making information about political advertising on social media more easily digestible for different groups of people in different parts of the world. We are also circling back to our initial goal of devising a method to make the data downloadable for anyone who wishes to go through it themselves.

The maintenance and updates of the databases also keep us busy. As political landscapes globally are changing and new parties and coalitions emerge, we receive contributions from visitors who email us with new pages to be added to

We continue to document and update with newer iterations of the project. Visit us at, or write to us at

*All access tokens in the article have been modified.

Published April 2020


Articles and Guides

Tools and Databases

  • Bash. A language to communicate with the operating systems of Linux distributions.

  • Facebook Ad Library API. Facebook’s API that allows certain users to search ads tagged as related to politics and issues of political importance.

  • Facebook Tracking Exposed. A project that offers a browser plugin, which collects the metadata of all the public posts on your timeline and allows you to either contribute this data to a public dataset, or keep it private.

  • Json splitter. A command line tool for splitting large JSON files into smaller files.

  • Json lint. A tool that helps validate (check) json files and helps reformat them.

  • Political Ad Collector from ProPublica. A tool, which allows you to install a browser plugin that collects all the political ads you see while browsing and sends them to a database that enables ProPublica to better analyse the nature of political ad targeting.

  • Remmina. Application that helps connect to remote computers using protocols like Remote Desktop Protocol.

  • Sed. An editor that allows you to perform basic text transformations within the terminal.

  • Stackoverflow. A web forum where developers ask and answer questions.

  • Tableau. A suite of software applications for data visualisation and analysis.

  • VirtualBox. Application that allows you to run an operating system different from the one installed on your computer.

  • Wget. Software to retrieve content from webservers.

  • Who Targets Me. A project that uses a browser extension to crowdsource political ads. You can install it to collect all sponsored posts on your timeline and see who’s targeting your vote.



API (Application Programming Interface) - a software tool that facilitates communication between a user and a dataset, amongst other things. Facebook’s Ad Library API allows users to query ad data using a particular set of commands developed by Facebook.


Browser extension - also called an add-on, a small piece of software used to extend the functionality of a web browser. Extensions range from tools that take screenshots of the webpages you visit to ones that check and correct your spelling or block unwanted ads on websites.


Browser plugin - a piece of software that can be added to browsers to enhance specific functionalities.


Facebook Page ID - a number that uniquely identifies each Facebook page.


Interface - (in this case) a set of visualisations that are interactive and allow users to have a graphical representation of the data.


ISO country code - the short alphabetic codes created and maintained for each country by the International Organization for Standardization.


JSON - JavaScript Object Notation is a popular format in which data is recorded. According to the JSON website, it is easy for humans to read and write and easy for machines to parse and generate.


Query - a list of coded questions a user can input in programs and apps to obtain data addressing a question or subject of interest.


Python - a programming language that allows developers to write various programs such as web applications, websites and data analysis tools.


Python editor - application to navigate, debug, compile and run scripts in the Python language.


Remote Desktop Protocol (RDP) - a protocol that allows you to access and control a system that is not in the same physical location as you.


Shadowbanning - when some tweets (or accounts) are deprioritised or made to disappear from timelines and from accounts without an explicit censorship notice.


Targeted ads (advertising) - advertising whose content or conditions of delivery are tailored to specific persons or groups based on the data available about them.


Terminal - also called command-line interface (CLI), a means of interacting with a computer through a text-based interface where one can enter commands to a computer program.


Trending topics - topics that are popular at particular times.


Unicode - the standard for encoding scripts into machine-readable characters.


Unique identifiers - (in this case) unique codes that are created and tagged with every ad, so that they can be retraced if needed.


Virtual Private Server (VPS) - a virtual machine, rented out as a service, by an Internet hosting company.


Virtualisation Software - software that allows you to run a different Operating System than the one installed on your computer without needing to add another Operating System on your device.


Wget - a tool by the GNU project to get files from webservers. The tool allows for the transport of files through more than one protocol.
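
Several of the glossary terms above (JSON, Unicode, ISO country code, Facebook Page ID, unique identifiers) can be seen working together in a small Python sketch. It is an illustration only: the sample data and every field name in it ("data", "id", "page_id", "country", "text") are made up for this example and are not taken from Facebook's Ad Library API or from the project's own scripts.

```python
import json

# A made-up JSON document in the general shape of an API reply.
# All field names and values are hypothetical, chosen for illustration.
raw = r'''
{"data": [
  {"id": "ad-001", "page_id": "111", "country": "BR", "text": "Vote amanh\u00e3"},
  {"id": "ad-002", "page_id": "111", "country": "BR", "text": "Join us"},
  {"id": "ad-003", "page_id": "222", "country": "IN", "text": "\u092e\u0924\u0926\u093e\u0928"}
]}
'''


def ads_per_page(payload: str) -> dict:
    """Parse a JSON string and count how many ads each page ID has."""
    records = json.loads(payload)["data"]
    counts = {}
    for ad in records:
        counts[ad["page_id"]] = counts.get(ad["page_id"], 0) + 1
    return counts


ads = json.loads(raw)["data"]   # JSON text becomes Python lists and dicts
counts = ads_per_page(raw)      # ads grouped by (hypothetical) page ID

# Unique identifiers let each ad be retraced; here every "id" is distinct.
unique_ids = {ad["id"] for ad in ads}

# Two-letter country codes in the ISO style, collected without repeats.
countries = sorted({ad["country"] for ad in ads})

print(counts)
print(countries)
print(ads[0]["text"])  # the \u00e3 escape is decoded into a Unicode character
```

The same pattern (load the JSON, pick out the fields of interest, group by an identifier) applies to any JSON file, whatever its actual field names turn out to be.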