ad.watch - Investigating Political Advertisements on Facebook
By Manuel Beltrán and Nayantara Ranganathan
IN SHORT: Two investigators describe the process of creating the project ad.watch, an open-research-style investigation into political advertising on Facebook. This case study traces the project's evolution - with its revelations, roadblocks, techniques and tools - from a limited endeavour to a growing resource that facilitates investigations into political ads around the world.
This is about how we came to publish ad.watch, a project uncovering information about the political advertisements (ads) published on Facebook and Instagram.
Much of the advertising on the internet is personalised, a method commonly referred to as targeted advertising. Targeted ads use psychological, behavioural and other kinds of information about people to present each of us with a different message. The difference may be in the content, design, delivery or other characteristics. The tailoring of messages is complemented with information about their effectiveness, allowing advertisers to test the assumptions made during psychological or behavioural profiling.
Because politicians today use online platforms to advertise their campaigns and promote their messages, and these platforms deliver the ads on the basis of personal information, the phrase “the personal is political” has gained a whole new dimension. Ads are personalised according to our personal data and behavioural patterns, and we receive these targeted ads on our personal screens. However, understanding these mechanisms is not straightforward. The advertising delivery infrastructure operates invisibly, with platforms going to great lengths to obfuscate their mechanisms and establish many barriers to understanding how they function. Often, even advertisers don’t get the full picture of who their ads reach, or why.
We are Nayantara and Manuel, and we created ad.watch to understand the new channels over which propaganda is created and circulated online.
Towards ad.watch
In the course of finding answers to our specific questions about the silence period, we found other interesting ways of reading the data through Tableau. One of the main challenges here was understanding how to structure the data. As the data we were collecting was in multiple files, we had to make sure that files were “stacking up” the right way.
We designed the data collection in such a way that there were cut-off limits for files on the basis of their size. This meant that if a political actor's ad data was massive (as Donald Trump's has been), the file would contain data for fewer than the standard ten actors, sometimes just one politician. We got a grip on structuring the data through much trial and error, as well as with help from the Tableau community forums and YouTube videos about the tool. Since Tableau is such a popular data analysis and visualisation tool, finding answers to such details was a matter of exploring already available fora.
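To give a sense of what this "stacking up" involved, here is a minimal sketch of combining several collected JSON files into one flat table. It is an illustration rather than our actual workflow (which relied on Tableau's own union features), and the file names and fields are assumptions:

# Illustrative sketch: stacking several Ad Library JSON files into one flat table.
# File names are assumptions; the real files followed the per-size cut-offs described above.
import glob
import json
import pandas as pd

rows = []
for path in glob.glob("ads_*.json"):          # e.g. one file per batch of actors
    with open(path, encoding="utf-8") as f:
        payload = json.load(f)
    # The API wraps results in a "data" list; keep track of the source file.
    for ad in payload.get("data", []):
        ad["source_file"] = path
        rows.append(ad)

table = pd.json_normalize(rows)               # flatten nested fields for analysis
table.to_csv("ads_combined.csv", index=False)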
Apart from the breaches of the silence period, we found other interesting details: the architecture of the information on political ads, the categories of information available, as well as the data that was missing, were all revealing. For example, we learned that Facebook profiled people into the gender binary of male and female, despite allowing users to identify themselves with a broad range of options on the user side. This was an interesting peek into how differently things work at the user-experience level versus at the back-end infrastructure designed for marketing.
Screenshot: Dropdown menu options that appear on Facebook account settings after you choose to set your gender as “Custom”. Date taken 29 November 2019. Source: Nayantara Ranganathan.
Screenshot: Segmentation of Facebook users into the gender binary, visible in the process of creating ads or boosting posts. Date taken 29 November 2019. Source: Nayantara Ranganathan.
Once we managed to uncover the silence period breaches, the Indian elections were well underway, and the 2019 European Parliamentary elections were approaching. On this occasion, we also started digging into data on Spanish parties for the European Parliament elections and the Spanish municipal and national elections of 2019. Slowly, we began broadening the scope of our data collection and research.
Although we were now collecting and exploring Facebook ad data from various contexts, we were uncomfortable writing stories about countries that we had no context or knowledge of. Beyond that, we felt it was somehow incorrect to do so even if we’d had some understanding of the places or found collaborators from different countries.
But while staying in our lane, we also wanted to make the data from other countries accessible. We decided that we would collect the data available in all the countries, but leave the analysis and investigations to whoever might be interested in taking them up.
We connected with colleagues and friends in other parts of the world where elections were going on to see if the data we were collecting might be useful for them directly or if they were interested in advocating for releasing similar data in those regions, too. For example in the Philippines, Facebook is a crucial channel for the Duterte government’s disinformation campaigns. The Philippines also happens to be one of the labour markets for exploitative and traumatic contract jobs for content moderation, including for Facebook. We spoke to a friend and colleague in the Philippines to share what we had worked on, and heard about the developments and preparations from there.
Once we decided to start looking at political ads globally, we had to seriously consider what approach to take, which groups or persons were already working on projects related to political advertising and what gaps we might be able to fill. We knew, for example, that FBTrex is a tool that helps users collect metadata of posts, including political ads that show up in their newsfeeds, by installing a browser extension.
Tips:
Facebook Tracking Exposed (FBTrex) is a project whose vision is that individuals should be in control of their algorithms. Among other things, the project offers a browser plugin that collects the metadata of all the public posts on your timeline and allows you to either contribute this data to a public dataset, or keep it private for your own use. FBTrex also allows researchers (and users) to use/reuse a portion of the data through their API.
ProPublica has created a tool called Political Ad Collector, which allows you to install a browser plugin that collects all the political ads you see while browsing and sends them to a database that enables ProPublica to better analyse the nature of political ad targeting.
Who Targets Me is a project that also uses a browser plugin you can install to collect all sponsored posts on your timeline. The plugin sends this data to a crowdsourced global database of political adverts, matches the sponsored posts against categorised lists of political advertisers, and draws conclusions about who is targeting you online.
We had become familiar with fields of information that the Facebook Ad Library API was providing, but we did not know the fields of information that were available for advertisers to target users. That meant that even if we knew that a certain ad was reaching between 1,000 and 10,000 people in Delhi, we did not know whether those were the parameters that the advertiser had selected for targeting the ads. We were interested in knowing the options available to administer the targeting, so to speak.
So we created pages of fictitious political parties and tested buying and targeting ads. We created one page each, and “boosted” posts. The level of detail available from this side of the window was, unsurprisingly, far greater than the information that was being revealed in the name of transparency. We could create “custom audiences” by specifying what kinds of “Demographics, Interests and Behaviours” we wanted to target. We could also input data of people who might have indicated interest in our business or campaign and Facebook would offer to deliver ads to a “lookalike audience.” These are standard advertising methods, but we were about to discover that the fields available for targeting seemed quite problematic and could lend themselves to all kinds of discrimination. For example, we found a category called “Friends of people celebrating Ramadan.”
Screenshot: Targeting suggestions when creating an ad. Date: 29 November 2019. Source: Nayantara Ranganathan.
We were aware of developments in the US where practices of discrimination had been recorded and Facebook held to account. We were also aware that Facebook had committed to not allowing targeting based on race and gender when it comes to ads on employment, housing and credit. These categories confirmed that such problems remained unaddressed in other parts of the world. For example, the above option of "friends of people celebrating Ramadan" is a category that can easily be used to exclude Muslims at a time when anti-Muslim sentiment in India is emerging from both the State and society at large.
Screenshot: Suggestions for excluding people from the targeting of an ad. Date: 29 November 2019. Source: Nayantara Ranganathan.
Publishing ad.watch
We decided to collect the data of political actors across the world.
As mentioned before, we had been using Tableau for internal experiments and to understand the data. At this point, we discovered that Tableau allows online publishing and, in fact, provides relatively dynamic and sophisticated visualisations.
While we had jumped into using Tableau to help us make sense of the data, once the question of publishing arose, we had to think about whether we wanted to make the entire project dependent on a tool that is proprietary and mostly used by marketing departments for data analysis. We also had to consider whether we were making our project vulnerable by relying on proprietary software whose maker could decide to revoke access at any time.
We explored other much-loved open-source alternatives like Rawgraphs and Datawrapper. However, given the size and particularities of the data, these were ruled out. Some of these alternatives could not analyse the content of the ads because the ads sometimes contained characters from non-Latin scripts. So we decided to go ahead and use Tableau Public.
Note:
It was puzzling to understand why many of these highly acclaimed free and open-source visualisation tools were unable to open even small extracts of the JSON files.
We used a website called JSONLint to "validate" the JSON data, that is, to spot whether there were formatting errors in the files. Doing this helped us figure out that the reported syntax errors arose because the text fields often contained scripts other than the Latin script, as well as emojis and characters that were included in recent versions of Unicode, the standard for encoding scripts into machine-readable characters.
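A rough way to run the same kind of check locally, rather than through a website, might look like the following sketch (the file name is an assumption):

# Sketch: validate a JSON file and flag lines containing non-ASCII characters
# (non-Latin scripts, emoji, newer Unicode additions), which is what tripped up
# some visualisation tools in our case.
import json

path = "ads_sample.json"   # illustrative file name

with open(path, encoding="utf-8") as f:
    text = f.read()

try:
    json.loads(text)
    print("JSON is syntactically valid")
except json.JSONDecodeError as err:
    print(f"Syntax error at line {err.lineno}, column {err.colno}: {err.msg}")

for number, line in enumerate(text.splitlines(), start=1):
    if any(ord(char) > 127 for char in line):
        print(f"Line {number} contains non-ASCII characters")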
So we started designing our own visual interfaces with Tableau to browse the databases. One of the problems we encountered while using Tableau was the amount of computing resources it requires to handle databases of this size, which were surpassing five gigabytes. Our computers, running the software through a virtual machine, were very slow or sometimes simply unable to load the data. With the limited financial resources we had, buying a more powerful computer was not an option.
Manuel
The university where I teach provides teachers with access to a remote desktop with an Intel Xeon processor and 20GB of RAM. This turned out to be a great solution; in its absence, we would have had to rent a Virtual Private Server for the same purpose. A remote desktop can be accessed through Remote Desktop Protocol (RDP), a protocol that allows you to access and control a system that is not in the same physical location as you.
Using RDP from my laptop allowed us to start loading the databases into Tableau while keeping my own machine's resources available for other tasks. It even made it possible to close my laptop while the remote desktop continued to load the data.
Note:
Tableau allows you to import JSON files of up to 128 MB only, so we had to split the files to stay below that limit. We used a simple Python script called json-splitter for that task.
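The idea behind the split can be sketched in a few lines of Python. This is a simplified illustration, not the json-splitter tool itself: it assumes the ads sit in a top-level "data" list, and the chunk size is approximate because it is measured on the serialised output:

# Sketch: split a large Ad Library JSON file into chunks that stay under
# Tableau's 128 MB import limit. Assumes the ads are in a top-level "data" list.
import json

LIMIT_BYTES = 120 * 1024 * 1024   # stay safely below 128 MB

with open("ads_big.json", encoding="utf-8") as f:   # illustrative file name
    ads = json.load(f)["data"]

chunk, size, part = [], 0, 1
for ad in ads:
    encoded = json.dumps(ad, ensure_ascii=False).encode("utf-8")
    if size + len(encoded) > LIMIT_BYTES and chunk:
        with open(f"ads_part_{part}.json", "w", encoding="utf-8") as out:
            json.dump({"data": chunk}, out, ensure_ascii=False)
        chunk, size, part = [], 0, part + 1
    chunk.append(ad)
    size += len(encoded)

if chunk:
    with open(f"ads_part_{part}.json", "w", encoding="utf-8") as out:
        json.dump({"data": chunk}, out, ensure_ascii=False)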
It was a challenge to find the right balance of what we wanted to achieve with the project.
On the one hand, we were sitting on a vast amount of data relevant to the politics of many countries, and we had already found critical issues in the limited amount of countries that we analysed ourselves. There was a certain vertigo about the depth of what else might be there, encouraging our need to get it out.
As a project that exposes the mechanisms of a technology that is not widely understood, it was important to present it in a way that would let anyone learn and understand how political ads function. It needed to be powerful enough to allow journalists and researchers to conduct their investigations. And it also needed to problematise how the infrastructure of Facebook ads functions. Merging all these elements took some time, and we explored different iterations.
Choosing between a mobile-friendly and a desktop-based design, we opted for desktop, as it allows for a more in-depth experience.
As we neared completion of the interfaces that would become ad.watch, the question of how to update the database became more pressing.
If we had simply been dumping the data for whoever needed it, updates would perhaps not have mattered so much, but for a live resource we felt it was important to be up-to-date at all times. Some weeks before the release of the project online, we participated in a workshop of the Exposing the Invisible project of Tactical Tech (to work, amongst other things, on the very inception of this text). There we got help from Wael Eskandar to develop the initial core of a Python script that would automate the ad collection from the API. Some of the challenges in developing it included how to make Python incorporate the modifications that we had previously been introducing manually.
The next challenge was making the script understand when the ads from a particular query were finished, so that it could jump to the next query, and how to keep the files below Tableau's limit of 128 MB each. As previously noted, Facebook placed limits on automated queries, so we started mimicking human behaviour by adding randomised time delays between the queries. From its initial version, the script grew into something of a precious data-collection system, which we improved day by day. It also became a source of experimentation through which we better understood how the Facebook API works and through which we experimented with collecting data other than that of the political parties. At a later point we created a system of text files with Page IDs that the script would go through, and a separate file with the two-letter ISO country codes, so it could collect several countries in one go. This new system also enabled us to update the list of political parties more easily. The core loop of such a script is sketched below.
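This sketch is a simplified illustration rather than our actual script: the file names, the field list and the API version are assumptions, while the ads_archive endpoint, its pagination and the access token follow Facebook's documentation at the time:

# Sketch of an automated Ad Library collection loop: read Page IDs and country
# codes from text files, query the ads_archive endpoint, follow pagination and
# pause randomly between queries. Not our actual script; names are illustrative.
import json
import random
import time
import requests

ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"          # placeholder
API_URL = "https://graph.facebook.com/v5.0/ads_archive"
FIELDS = ("page_id,page_name,ad_creation_time,ad_delivery_start_time,"
          "ad_snapshot_url,demographic_distribution,impressions,spend")

page_ids = open("page_ids.txt").read().split()
countries = open("country_codes.txt").read().split()   # two-letter ISO codes

for country in countries:
    for page_id in page_ids:
        params = {
            "access_token": ACCESS_TOKEN,
            "ad_reached_countries": f"['{country}']",
            "search_page_ids": f"[{page_id}]",
            "fields": FIELDS,
            "limit": 250,
        }
        ads, url = [], API_URL
        while url:
            response = requests.get(url, params=params).json()
            ads.extend(response.get("data", []))
            # When "paging.next" is absent, this query is finished.
            url = response.get("paging", {}).get("next")
            params = None                    # the "next" URL already carries them
            time.sleep(random.uniform(2, 8)) # randomised delay between queries
        with open(f"{country}_{page_id}.json", "w", encoding="utf-8") as out:
            json.dump({"data": ads}, out, ensure_ascii=False)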
Screenshot: Homepage of ad.watch website. Date: 29 November 2019. Source: Manuel Beltrán.
The first release of ad.watch on July 26, 2019, was done with the manual collection of data, but by then we had also perfected the script to a level that allowed us to regularly update the data, which proved extremely valuable for the effectiveness of the project. We went live with the website and posted on our personal social media networks to spread the word. We received a lot of positive reactions, including from journalists who were using the data to understand Facebook’s much publicised ad transparency efforts, noting the trends in various countries’ elections.
Forks in the road
Verification by Facebook
Back when we were first trying to access the data and needed to go through the identity verification process, we realised that the two of us had to face quite different processes of verification to gain access to Facebook’s ad platform.
Nayantara
As someone whose “Primary country location” was India, I had to undergo an additional step for address verification. This involved choosing between receiving a code delivered to my house via post, or a visit from “someone.” The time difference between the two options was significant: the postal mail would take three weeks, and the visit to my house would happen within a week. Because we wanted to get on with the process as soon as possible and were also struck by the house-visit process, we decided to go with that option.
Screenshot: Three-step verification process for accessing the Ad Library API. Date: 20 April 2019. Source: Manuel Beltrán.
What we came to learn in the course of this identification process blew our minds.
A day after making the request for verification, I (Nayantara) received a phone call from a man who was going to conduct the process. He said he was calling about Facebook verification and to ask whether the next day was a good time to visit.
Screenshot: Identity information of OnGrid representative who was tasked with the verification. Date: 23 April 2019. Source: Nayantara Ranganathan
This entire episode was interesting for many reasons, including how the rise of identification technology and businesses in India intersects with big tech companies like Facebook looking for eyes and ears in India, a perfect example of identity-verification-as-a-service.
On that occasion, I asked the caller who he was and which company he was from. He hesitated a bit and seemed like he had not expected the question. Once we established that we could both speak Kannada (a language spoken predominantly by people of Karnataka in southwestern India), we established a more familiar and trusting tone of conversation. He said he worked for a company called OnGrid, and that his name was Umesh (name changed). As he was speaking to a woman on the line, I could sense an awkward tone as he asked me about my address and directions to my house. He also became more forthcoming with revealing details about himself.
Umesh assured me that I did not need to be at home when he visited, and as long as there was someone there who could confirm that I live there, it was sufficient. Two days later, Umesh arrived at my place to conduct the verification. He took pictures of my house from the outside, a landmark near it, and took the signature of the person who lives at home with me (since I was not at home during the visit).
Screenshot: WhatsApp conversation between Nayantara and OnGrid representative. Date: 23 April 2019. Source: Nayantara Ranganathan.
Naturally, we were curious about the company that was Facebook's verification service provider. OnGrid is an Indian company that claims to be enabling people in India to "establish trust instantly," or in other words, engaging in a trade of identification information. They offer everything from "education verification" to a check of court records. Two years ago, they were in the news for a creepy image that sent a collective shiver down people's spines: an advertisement for non-consensual image recognition using the national biometric identification architecture, Aadhaar. As a separate entity, OnGrid's terms of service and data policy are different from Facebook's policies about retention of information submitted for identification purposes.
Image: Now-deleted image posted on Twitter by OnGrid. Date: 29 November 2019. Source: https://mashable.com/2017/02/14/india-aadhaar-uidai-privacy-security-debate/. Archived page here.*
Facebook's policies did not mention anything about outsourcing some elements of the verification process or how the data policies might change in that instance. This possibly meant that the commitments made by Facebook towards the deletion of identification data shared with it were not something that OnGrid, while servicing the need for "last-mile verification," was obliged to respect. Indeed, Facebook claims to permanently delete the data that it collects from users who undergo the verification process. OnGrid, the company hired by Facebook in India, on the other hand, retains this data and reuses it to provide "identification services" to other entities.
After this process, I finally had my address verification process completed, and had access to the Ad Library API.
Manuel
As a Spanish citizen living in the Netherlands, my identity verification process was much simpler and involved fewer steps. However, once submitted, the final acceptance took about two days to come through, as compared to the instantaneous response Nayantara received in that step, indicating that mine might have involved a human in the process.
These episodes were helpful in understanding the different data handling processes in different countries, the vulnerabilities created by involving third-party verification actors, and the outrageous fact that any of this was even necessary for obtaining public-interest information that was quite crucial to understanding how social media could be interfering with democratic processes.
To include images and video or not?
We had a plan for visualising data about the ads, but the ad "visuals" themselves could not be shown. That is, the images or video content, the element that Facebook and Instagram users are actually meant to see, could not be visualised because of the peculiarities of how the data was structured and made accessible.
The data was not provided in a downloadable form independent of the API, and therefore it was a challenge to visualise it. But even if we were to find a way to embed the link and present the visuals, there was another problem: the URL with the visuals includes an active "access token." This is a unique, time-bound token, and once it expires, after about an hour, the link stops working. The links would not be of much use without a viewer's own access token.
Note:
This is what an ad snapshot URL looks like:
https://www.facebook.com/ads/archive/render_ad/?id=251563729117836&access_token=EAAjPOWfPqZCgBAAJ0csteVNkFJcyxbQZA7m1xbJ8w3fzFRlm6apQ5cAnzsjBNOOJt4zSEE8IxB4k9HcKydhbcd7P4SnNTBn82G7sgyy5YXX8fmZC0hUGcpQMfZCp3uWaSWeX4urEcNPwB8SM01clzJSqRXPjjh8ZBguzXZC9sc9whaz0hE9MGEj889ztZBW2XNxVfitweUSkVrcKGiwePQQZB7uGBOa
An access token is what comes after access_token= ; it is the long string of letters and numbers: EAAjPOWfPqZCgBAAJ0csteVNkFJcyxbQZA7m1xbJ8w3fzFRlm6apQ5cAnzsjBNOOJt4zSEE8IxB4k9HcKydhbcd7P4SnNTBn82G7sgyy5YXX8fmZC0hUGcpQMfZCp3uWaSWeX4urEcNPwB8SM01clzJSqRXPjjh8ZBguzXZC9sc9whaz0hE9MGEj889ztZBW2XNxVfitweUSkVrcKGiwePQQZB7uGBOa
We considered scraping images off the Ad Library using a browser add-on like 'Download All Images'. There were also many Python scripts to help one do so. However, Facebook prevents all these scraping techniques from collecting the visuals. Besides, Facebook also forbids users from doing such scraping in general.
Screenshot: Facebook’s Terms of Service. Date: 8 December 2019. Source: Manuel Beltrán.
On the one hand, it was important to include the visuals as they have a recall value for people browsing the interfaces. On the other hand, we wondered if it might actually be a good thing to not distract people with visuals, as the most crucial aspect to convey was that the visuals themselves might not be as special as the metadata about them, like the targeting.
Nevertheless we also tried to use the “wget” command to do this, but it did not work.
We eventually found a workaround using features available in Tableau that allows users of ad.watch to input their own access tokens and view the contents of the ads.
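The idea behind that workaround can be illustrated outside Tableau as well; here is a minimal sketch that swaps the recorded token in an ad_snapshot_url for a viewer's own token (both token values are placeholders):

# Sketch: swap the expired access token in an ad_snapshot_url for a viewer's
# own token, so the link renders the ad again. Token values are placeholders.
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode

def with_own_token(snapshot_url: str, own_token: str) -> str:
    parts = urlsplit(snapshot_url)
    query = parse_qs(parts.query)
    query["access_token"] = [own_token]      # replace whatever token was recorded
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))

url = "https://www.facebook.com/ads/archive/render_ad/?id=251563729117836&access_token=OLD_TOKEN"
print(with_own_token(url, "MY_OWN_TOKEN"))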
However, after the release, we found something called an ‘Access Token Debugger’, where it was possible to extend the lifetime or validity of each access token. We then started to use extended access tokens in our system of collecting data.
Screenshot: Access Token Debugger. Date: 30 November 2019. Source: Manuel Beltrán.
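A similar extension can also be done programmatically; this is a sketch that assumes the Graph API's standard token-exchange endpoint, with the app credentials, API version and token as placeholders:

# Sketch: exchange a short-lived token for a longer-lived one via the Graph API.
# App credentials, API version and the short-lived token are placeholders.
import requests

response = requests.get(
    "https://graph.facebook.com/v5.0/oauth/access_token",
    params={
        "grant_type": "fb_exchange_token",
        "client_id": "APP_ID",
        "client_secret": "APP_SECRET",
        "fb_exchange_token": "SHORT_LIVED_TOKEN",
    },
)
print(response.json().get("access_token"))   # the extended token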
Working with journalists
While working together with journalists to investigate stories emerging from the data, many wanted to see the raw data. However, sharing the data came with one problem: the data contained the parameter "ad_snapshot_url", which included our personal access tokens, as described earlier.
This access token is unique to each developer account and gets recorded in the data returned by queries. Since we were being careful not to do anything that might get our access to the API revoked, we had to remove it before sharing the data. But removing the access token from each ad's data had to be automated, as we had millions of ads. The solution was rather easy to find by searching StackOverflow, and we were able to use a tool called sed (stream editor) that is already included in the Linux terminal.
sed -i -e 's/EAAjPOWfPqZCgBAAJ0csteVNkFJcyxbQZA7m1xbJ8w3fzFRlm6apQ5cAnzsj
BNOOJt4zSEE8IxB4k9HcKydhbcd7P4SnNTBn82G7sgyy5YXX8fmZC0hUGcpQMfZCp3uWaSWeX
4urEcNPwB8SM01clzJSqRXPjjh8ZBguzXZC9sc9whaz0hE9MGEj889ztZBW2XNxVfitweUSkV
rcKGiwePQQZB7uGBOa//g' US_20_1.json
With these parameters, sed searches a text file for the access token string and deletes it. This tool enabled us to clean the raw data files in an automated and efficient way.
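An equivalent that does not hard-code the token could also be written in Python; here is a sketch (the file name is taken from the sed example above, and the exact token alphabet is an assumption):

# Sketch: strip any access_token value from a raw data file before sharing,
# without hard-coding the token itself. Token alphabet is an assumption.
import re

path = "US_20_1.json"
with open(path, encoding="utf-8") as f:
    text = f.read()

# Access tokens appear as the value of the access_token query parameter.
cleaned = re.sub(r"access_token=[A-Za-z0-9]+", "access_token=REMOVED", text)

with open(path, "w", encoding="utf-8") as f:
    f.write(cleaned)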
As the data we gathered was about the ads themselves and did not involve the personal news feeds of users, we didn’t have to deal with the issue of safeguarding personal data.
After publishing ad.watch
Maintaining, updating and adapting the databases
Once the project was made public, it also became imperative that the data be updated regularly and the database be maintained. This involved questions about the frequency of updates, whether to collect data from the beginning each time, and if not, how to collect data exactly from where we had left off.
We added more countries as their data became available in the Facebook API. Argentina and Sri Lanka were added very shortly before the elections there, and Singapore data was also released. There were no verification procedures for Norway and Switzerland, but we were already able to collect the data about these countries.
Manuel
As time passed after the release, I also worked to tweak the scripts we used for collecting the data, so that the collection became more efficient, with minimal human intervention for certain tasks.
Facebook's response
Shortly after the launch, Facebook's (ex) VP of ads had some words of encouragement for the project. This was awkward, as the project wasn't intended as a "visualisation tool" but as a challenge to the lack of systematic access to data provided by Facebook.
The project is an action to highlight the fundamental conflict between Facebook's advertising infrastructure and the conditions needed for meaningful democratic participation on social media. The increased individuation enabled by targeted advertising on Facebook, and the company's failure to open up this data in all the countries it operates in, meant that these conditions were being defeated time and again.
For this reason, we did not want Facebook to co-opt our project and make it part of their transparency success story.
Twitter censorship
We started to notice something peculiar two days after releasing the website. Some tweets made by others about the project were no longer visible to us. It was unlikely that multiple people had tweeted about the project and then deleted their tweets.
Soon we realised that our own tweets about the project were also missing. We didn’t initially notice this since we could both see our own tweets. However, we eventually realised that we could no longer see each other’s tweets.
Known as "shadowbanning," this was a phenomenon that we had heard of before.
It is made even trickier to spot because the tweets remain visible to their author, raising no suspicion. People have regularly observed this phenomenon with tweets and user accounts related to Kashmir, for example.
We decided to take up the issue immediately via a report with Twitter’s support team. A day later we had still not heard back from them. That was when we decided it might be worth soliciting support and help to understand what was going on. We posted the following call for support.
Screenshot: Tweet calling for support about the censorship. Date: 2 August 2019. Source: Nayantara Ranganathan.
By now we had discovered that not only had tweets disappeared, it was also impossible to post a tweet with the project’s URL. We found that others were also experiencing this issue.
Screenshots: People on Twitter alerting us to the censorship. Date: 4 August 2019. Source: Nayantara Ranganathan.
It was rather strange that the URL of the project was also blocked in Twitter’s Direct Messages.
Screenshot: Censorship within Twitter’s Direct Message window. Date: 3 August 2019. Source: Nayantara Ranganathan.
We received support from many friends, colleagues and strangers, who documented error messages and connected us with, or suggested, people working at Twitter whom we could contact.
Meanwhile, we discovered that someone who had tried posting a link to the project on LinkedIn had also received a warning message. We guessed that the URL might have been flagged on some kind of centralised blacklist shared beyond the platforms. This was indeed the case: one of the security firms that rate URLs and create blacklists - https://fortiguard.com/webfilter - had listed ad.watch as spam on the basis of someone's report. We filed an appeal with them, and received a response saying they had re-classified it (funnily enough) as advertising.
Screenshot: Results from a lookup that had classified ad.watch as spam. Date: 5 August 2019. Source: Nayantara Ranganathan.
Eventually Twitter removed the shadowbanning but didn't provide us with any explanation. Our inquiry continued; someone put us in touch with a person working at Twitter who seemed eager to help, but this went nowhere. We also explored the possibility of using different channels to try to get an explanation for the blocking, including the Right to Explanation in the European GDPR (General Data Protection Regulation). The Right to Explanation can, in theory, force Twitter to report on the reason why the censorship took place, if it was algorithmic.
Key takeaways
From the technical glitches in the tools to the earnest promises in the policies, we learnt an immense amount about the underworld of the information economy, that is, the sale of personal data for marketing, and its outsized impact on the socio-political realities of people. Beyond Facebook, the project acquainted us with the ethos and modus operandi of companies that are engaged in the extraction and monetisation of data about people's lives.
This project was about believing that a small team of two could somehow challenge an entity like Facebook, with its scale of resources and power of narrative. It meant that we were constantly at risk of our API access being withdrawn or our methods being classified as breaching Facebook's terms of use, and conscious that Facebook had access to a great deal of sensitive information about us that could be used in damaging ways. None of these drastic things came to be, but while working on the project before its release, we carried all of these possibilities with us.
Overall, many factors came together for the making of the project:
understanding the gaps that existed in this milieu
trusting that there was a way to hack around limitations when things were not straightforward
reaching out and being beneficiaries of generous advice from friends
taking the communication of the data seriously
These were some aspects that helped, in hindsight.
As of January 2020, the project is spinning off in new directions: we are working on making information about political advertising on social media more easily digestible for different groups of people in different parts of the world. We are also circling back to our initial goal of devising a method to make the data downloadable for anyone who wishes to go through it themselves.
The maintenance and updates of the databases also keep us busy. As political landscapes globally are changing and new parties and coalitions emerge, we receive contributions from visitors who email us with new pages to be added to ad.watch.
We continue to document and update ad.watch with newer iterations of the project. Visit us at ad.watch, or write to us at info@ad.watch.
*All access tokens in the article have been modified.
Published April 2020
Resources
Articles and Guides
Ad Tool Facebook Built to Fight Disinformation Doesn’t Work as Advertised from The New York Times (archived copy from Wayback Machine available here). An article about Facebook’s ad library.
Cambridge Analytica Files, from The Guardian (archived copy from Wayback Machine available here). An investigation series on the controversial data mining and elections marketing practices of political consulting firm Cambridge Analytica.
New Media, New Violations: Election Campaigning on Facebook Violates Code Of Conduct, article from HuffingtonPost India (archived copy from Wayback Machine available here).
This Tool Lets You See Facebook’s Targeted Political Ads All Over the World, article from Vice (archived copy from Wayback Machine available here).
Teens exposed to highly charged political ads on Facebook and Instagram, article from Sky News (archived copy from Wayback Machine available here).
The Influence Industry, from Tactical Tech (archived copy from Wayback Machine available here). A research project looking at the practices of collecting, processing and using voters' personal data in elections around the world.
Tools and Databases
Bash. A language to communicate with the operating systems of Linux distributions.
Facebook Ad Library API. Facebook’s API that allows certain users to search ads tagged as related to politics and issues of political importance.
Facebook Tracking Exposed. A project that offers a browser plugin, which collects the metadata of all the public posts on your timeline and allows you to either contribute this data to a public dataset, or keep it private.
Json splitter. A command line tool for splitting large JSON files into smaller files.
Json lint. A tool that helps validate (check) json files and helps reformat them.
Political Ad Collector from ProPublica. A tool, which allows you to install a browser plugin that collects all the political ads you see while browsing and sends them to a database that enables ProPublica to better analyse the nature of political ad targeting.
Remmina. Application that helps connect to remote computers using protocols like Remote Desktop Protocol.
Sed. An editor that allows you to perform basic text transformations within the terminal.
Stackoverflow. A web forum where developers ask and answer questions.
Tableau. A suite of software applications for data visualisation and analysis.
VirtualBox. Application that allows you to run an operating system different from the one installed on your computer.
Wget. Software to retrieve content from webservers.
Who Targets Me. A project that uses a browser extension to crowdsource political ads; you can install it to collect all sponsored posts on your timeline and see who's targeting your vote.
Glossary
API (Application Programming Interface) - a software tool that facilitates communication between a user and a dataset, amongst other things. Facebook’s Ad Library API allows users to query ad data using a particular set of commands developed by Facebook.
Browser extension – also called add-ons, they are small pieces of software used to extend the functionalities of a web browser. These can be anything from extensions that allow you to take screenshots of webpages you visit to the ones checking and correcting your spelling or blocking unwanted ads from websites.
Browser plugin - a piece of software that can be added to browsers to enhance specific functionalities.
Facebook Page ID - a number that uniquely identifies each Facebook page.
Interface - (in this case) a set of visualisations that are interactive and allow users to have a graphical representation of the data.
ISO country code - the short alphabetic codes created and maintained for each country by the International Organization for Standardization: https://www.iso.org/obp/ui/#search.
JSON - JavaScript Object Notation is a popular format in which data is recorded. According to the JSON website, it is easy for humans to read and write and easy for machines to parse and generate.
Query - a list of coded questions a user can input in programs and apps to obtain data addressing a question or subject of interest.
Python - a programming language that allows developers to write various programs such as web applications, websites, data analysis tools: https://www.python.org/.
Python editor - application to navigate, debug, compile and run scripts in the Python language.
Remote Desktop Protocol (RDP) - a protocol that allows you to access and control a system that is not in the same physical location as you.
Shadowbanning - when some tweets (or accounts) are deprioritised or made to disappear from timelines and from accounts without an explicit censorship notice.
Targeted ads (advertising) - advertising whose content or conditions of delivery are tailored to specific persons or groups based on the data available about them.
Terminal - also called command-line interface (CLI), a means of interacting with a computer by using a text-based interface where one can enter commands to a computer program.
Trending topics - topics that are popular at particular times.
Unicode - the standard for encoding scripts into machine-readable characters.
Unique identifiers - (in this case) unique codes that are created and tagged with every ad, so that they can be retraced if needed.
Virtual Private Server (VPS) - a virtual machine, rented out as a service, by an Internet hosting company.
Virtualisation Software - software that allows you to run an Operating System different from the one installed on your computer, without replacing it.
Wget - a tool by the GNU project to get files from webservers. The tool allows for the transport of files through more than one protocol.