Extracting Information From Social Apps: A case of exposed financial data

By Hang Do Thi Duc


In Short: A real-world example of an investigation into the data collected by an app. Presented in the researcher’s own words, the case combines data collection, design, storytelling and advocacy. It shows how you can research important issues related to social apps, what you can discover using open-source tools and how you can make the findings public. It’s also a great illustration of the investigative mindset involved in such projects.

What’s going on behind the screen?

“If you’re not paying for it, then you are the product.” This phrase is used quite often, and you might have heard it before. It refers to the fact that instead of paying for online services with money, we give up our personal data, which can help feed the profits of the company behind an app or platform. I think it’s fair to say that we are rarely conscious of the kind of data we share with the apps we use, and how much it is actually worth.

Often, online platforms that we use on a daily basis make it more difficult than it should be to protect our most personal information. Many of these platforms share data (publicly) by default.

As an investigator or just a curious user of online platforms, you can access many tools to help you find out what is going on behind the screen of the apps people use every day. What can be inferred from the information we leave behind when using apps and social networks, and what can be done with it?

Some platforms give you a glimpse into the personal data that they have collected and stored.


Facebook, for instance, lets you, as a user, download an archive that contains the information you have shared both with and on the platform. You can find such details by signing into your Facebook account, going to your “Account Settings”, selecting “Your Facebook Information” and choosing one of the following options: “Access Your Information” / “Download Your Information” / “Activity Log”. Mind that Facebook tends to change its user interface every now and then, so the locations of these functions could change, as could the titles that describe them. If you “Access” or “Download” your Facebook information, you will find a section called “Security and Login Information” that includes details about the devices you used to access Facebook, as well as their IP addresses – the numbers that identify those devices on the Internet.

An IP address provides valuable information because it can be used to approximate your location. In the field of targeted advertising, for example, a company can pay Facebook to serve ads to residents of a region in which it operates or to which it is expanding.

Investigating social platforms – which includes developing an understanding of users’ rights, their options and the platforms’ “terms of use” – and sharing your findings can help you and the people around you become more conscious about the apps and services you use and how much information the companies that run them are able to collect and share about you. While the data traces that users leave behind can be beneficial to investigators looking to find out more about the subject of their research, platforms should be responsible for making sure that users give informed consent on how their data is being made available and to whom.

Public by Default

In May 2015, I registered on Venmo, a mobile payments application owned by PayPal and operating in the United States. About seven million active monthly users send payments through this platform. What makes the app “social” is that other users can react, comment and have long conversations about their financial transactions. Everyone in my network of New York City students was using it.

The platform made it incredibly easy to share payments for dinner and lunches, pay your rent, lend money, or borrow from friends. A couple of months later, however, I realised that all of my transactions had been displayed in a public feed, where anyone in the world could see to whom I sent money and why. Thankfully, the dollar amount was not shown. I immediately made all my transactions private by default.

But that initial “public by default” setting remained stuck in the back of my mind.

Video recording of Venmo’s public feed. Capture by Hang Do Thi Duc

A few years later, I started to examine the platform in more depth. In 2018, many people were still sharing their Venmo transactions publicly. Their real names, profile links (providing access to their past transactions), possibly their Facebook IDs (if they connected their Venmo account with Facebook, as many users do when signing up) and conversations could easily create a map of their real-world social network and habits.

I thought to myself, “This is wrong. Either I need to convince people to change their behaviour or I need to change the system itself.”

My project PublicByDefault.fyi aims to show how much personal data Venmo users – from drug dealers to feuding couples to food cart sellers – share with the world, and to highlight the consequences of not knowing or caring what happens to your personal data and who can observe your online behaviour.

PublicByDefault.fyi webpage

Surprisingly, even shockingly, Venmo provides an easy way for anyone to access its users’ data. Venmo’s public application programming interface (API), the connection through which you retrieve data, is essentially just a public URL, or web address. All you need in order to access it is a browser: no registration is necessary, and apparently the only restriction is a rate limit on how many requests can be made in a given period. Given enough time, or enough computers, one could access and download as much user information as they wanted.

I could change the URL to let me see public transactions from 2016 to the present day. So I used the public API to download all public transactions from 2017, saving a total of 207,984,218 transactions. Just by looking through users and their exchanges, I learned an alarming amount about them.

What seems private might be public

As with any investigation, it all starts by checking previous work and information related to the matter in question. What has been discovered to date? What data is already out there? And, most importantly, what is still left unanswered?

In Venmo’s case, as the app’s feed has long been public by default, I found a few creative projects that had previously brought attention to the issue.

A blog post by Dan Gorelick from October 2016 provided a useful starting point for my investigation. By looking at the “Network” tab of the browser developer console, Dan figured out that venmo.com loads the data for its public feed from one simple URL: venmo.com/api/v5/public. This is Venmo’s public API.


An API, which stands for ‘application programming interface’, is a mechanism by which an online platform makes its data accessible to external developers. APIs let you access data in a number of different formats; two common examples are XML and JSON. Facebook, Google and Twitter, among many other platforms, have their own APIs. Even the US National Aeronautics and Space Administration (NASA) has an API that gives developers access to its data, including satellite imagery.

To prevent abuse of its API, a platform will often have an authentication process that requires its users to grant permission before a third party app or developer can access their information. On Facebook, for instance, a user would have to give the third-party app/developer permission to access her/his information first, then the app/developer receives an access token/key with which they can access the API.

For NASA’s API, since the data it makes available is not personal, apps only need one API key, which they can obtain through registration. Still, even a simple authentication process like this allows NASA to limit the number of data requests per key, which is best practice for platforms that need to keep their web traffic under control.

In Venmo’s case, having a public API meant it required no authentication at all for third-party apps and developers: no permission from its users, no API key and no registration. That means Venmo allowed anybody to access, view and save the public data of its users. It is worth noting that even some public APIs limit the amount of data that a third party can access within a given time frame – to keep web traffic under control, as mentioned above. This is called a rate limit, and was apparently the only type of restriction that Venmo placed on its API.
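To make the idea of a rate limit concrete, here is a minimal sketch of one common scheme, a fixed-window counter kept per client (an API key, or an IP address when there are no keys). The text does not say how Venmo or NASA enforce their limits, so this only illustrates the general technique; the class name, limit and window size are assumptions.

```javascript
// Minimal fixed-window rate limiter: each client (API key or IP address)
// may make at most `limit` requests per window of `windowMs` milliseconds.
// Illustrative sketch only; real platforms may use other schemes.
class FixedWindowRateLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.windows = new Map(); // client id -> { start, count }
  }

  // Returns true if the request is allowed, false if it should be denied.
  allow(clientId, nowMs = Date.now()) {
    const entry = this.windows.get(clientId);
    // Start a new window if this client has none or its window has expired.
    if (!entry || nowMs - entry.start >= this.windowMs) {
      this.windows.set(clientId, { start: nowMs, count: 1 });
      return true;
    }
    if (entry.count < this.limit) {
      entry.count += 1;
      return true;
    }
    return false; // over the limit for the current window
  }
}

// Three requests per second allowed; the fourth within the same second is refused.
const limiter = new FixedWindowRateLimiter(3, 1000);
console.log([0, 10, 20, 30].map((t) => limiter.allow("203.0.113.5", t)));
// [ true, true, true, false ]
```

Because the budget is tracked per client, requests spread across several machines with different IP addresses each draw on their own budget, which is why adding servers, as described later in this article, can get around a limit of this kind.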

How did I use the public API?

My strategy to draw attention to the flaws and potential dangers of Venmo’s system was to exploit, and even abuse, that system.

When I started my data collection at the end of 2017, I was able to access Venmo’s public transactions by visiting this URL: venmo.com/api/v5/public. Every time I loaded that URL, I effectively sent a data request to Venmo’s public API. (Note that even at the time of our publication in early 2019, the public API still shows transaction data and messages of real users, as they happen).

I learned from Dan’s blog post how to add parameters to the URL to request more than 20 data records at a time (by default, a request returned 20 records unless a “limit” parameter was specified), and even how to specify the time frame of the returned transactions.

I aimed to collect all public transactions from 2017. A whole year seemed like it would give me enough data to assemble the intimate user profiles and stories I would need to demonstrate the seriousness of the system’s design and privacy flaws.

My first request was to receive one minute’s worth of transactions, and the request looked like this:


This command requested up to 2,000 public transactions between 1 January 2017 at 00h:00m:00s and 1 January 2017 at 00h:01m:00s – totaling a minute.
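For illustration, a request of this kind can be assembled programmatically. This sketch assumes the endpoint takes the `since`, `until` and `limit` parameters mentioned in this article, with `since` and `until` given as epoch timestamps in seconds and an `https://` scheme; the function name is hypothetical, and the code only builds the URL string without contacting Venmo. (As noted later in this article, Venmo has since disabled the date parameters.)

```javascript
// Build a request URL for one time window of Venmo's public feed.
// Assumption: the endpoint accepts `since`, `until` (epoch seconds) and `limit`.
function buildPublicFeedUrl(sinceEpoch, untilEpoch, limit = 2000) {
  const url = new URL("https://venmo.com/api/v5/public");
  url.searchParams.set("since", String(sinceEpoch));
  url.searchParams.set("until", String(untilEpoch));
  url.searchParams.set("limit", String(limit));
  return url.toString();
}

// 1 January 2017, 00:00:00 to 00:01:00 GMT, as epoch seconds.
console.log(buildPublicFeedUrl(1483228800, 1483228860));
// https://venmo.com/api/v5/public?since=1483228800&until=1483228860&limit=2000
```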

To formulate the command in a way that would return the data I needed, I converted the human-readable times (e.g. 1 January 2017 at 00h:00m:00s, or midnight) to ‘epoch timestamps’. Also known as ‘UNIX epoch time’, this is a system for describing a point in time as the number of seconds elapsed since 00:00:00 UTC on Thursday, 1 January 1970 (an arbitrary reference point shared by all UNIX epoch time conversions).
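The conversion can also be done without a web tool. This sketch relies on JavaScript's built-in `Date.UTC`, which returns milliseconds since the epoch, and divides by 1,000 to get seconds. It assumes times are expressed in GMT/UTC, as in the requests described here; the helper names are hypothetical.

```javascript
// Convert a GMT/UTC calendar time to a UNIX epoch timestamp in seconds.
// Takes a 1-based month (1 = January); Date.UTC expects a zero-based month.
function toEpochSeconds(year, month, day, hours = 0, minutes = 0, seconds = 0) {
  return Date.UTC(year, month - 1, day, hours, minutes, seconds) / 1000;
}

// And back again, as an ISO 8601 string in UTC.
function fromEpochSeconds(epochSeconds) {
  return new Date(epochSeconds * 1000).toISOString();
}

console.log(toEpochSeconds(2017, 1, 1));       // 1483228800
console.log(toEpochSeconds(2017, 1, 1, 0, 1)); // 1483228860
console.log(fromEpochSeconds(1483228860));     // 2017-01-01T00:01:00.000Z
```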

I did this conversion from human-readable time to epoch timestamp by using a tool called Epoch Converter. For added safety and anonymity while researching, this tool can be used via the Tor Browser and it requires no CAPTCHA.

With this conversion, I made one request for each minute of the year 2017. That is, 365 days x 24 hours x 60 minutes, which equals 525,600 minutes, which means 525,600 API requests.

I had no reference for this API other than that blog post from 2016; nothing was documented on Venmo’s website. To be sure that I really scooped up all public transactions for each minute of the year, I implemented a one-second time overlap for each request, using Greenwich Mean Time (GMT) timestamps such as:

00h:00m:00s >>> 00h:01m:00s,

followed by

00h:00m:59s >>> 00h:02m:00s.
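Rather than converting 525,600 pairs of times by hand, the schedule can be generated. This sketch produces one window per minute of 2017 in epoch seconds, with every window after the first starting one second early, matching the overlap above. It is an illustration under those assumptions, not the author's actual script.

```javascript
// Generate one request window per minute between two epoch timestamps (seconds).
// Every window after the first starts `overlapSeconds` early, so a transaction
// on a minute boundary is requested twice rather than missed.
function minuteWindows(startEpoch, endEpoch, overlapSeconds = 1) {
  const windows = [];
  for (let t = startEpoch; t < endEpoch; t += 60) {
    const since = t === startEpoch ? t : t - overlapSeconds;
    windows.push({ since, until: t + 60 });
  }
  return windows;
}

const start2017 = Date.UTC(2017, 0, 1) / 1000; // 1 Jan 2017 00:00:00 GMT
const start2018 = Date.UTC(2018, 0, 1) / 1000; // 1 Jan 2018 00:00:00 GMT
const windows = minuteWindows(start2017, start2018);

console.log(windows.length); // 525600 (365 × 24 × 60)
console.log(windows.slice(0, 2));
```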

Initially, I wrote a Node.js script (a program that makes API requests and sends the resulting data to a database) to save the data locally on my computer.

Soon, however, I realised that I needed a stable Internet connection so I could make continuous requests. I had 525,600 API requests to make after all.

As I was travelling, my laptop was often offline, so I purchased a Virtual Private Server (VPS) for 26 euros per month. I treated it as an always-on remote computer, but after a few days the server’s requests were denied, probably due to the aforementioned rate limit – the restriction on the amount of data you can access from a URL in a given time frame.
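When a server starts refusing requests, a collection script can pause and retry instead of failing outright. This sketch retries with exponentially growing pauses. It assumes Node.js 18 or later for the built-in `fetch`, and that a refusal arrives as HTTP status 429 (Too Many Requests); the article does not say which status Venmo actually returned. The function and option names are hypothetical, and both the fetch and the wait functions can be swapped out, which also allows the logic to be tested offline.

```javascript
// Pause for `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fetch JSON from `url`, retrying with exponential backoff when the server
// answers 429 (Too Many Requests). Assumes Node.js 18+ for the global fetch.
async function fetchJsonWithBackoff(url, {
  fetchFn = fetch,   // injectable for testing
  wait = sleep,      // injectable for testing
  maxRetries = 5,
  baseDelayMs = 1000,
} = {}) {
  for (let attempt = 0; ; attempt++) {
    const response = await fetchFn(url);
    if (response.status === 429 && attempt < maxRetries) {
      // Wait 1 s, 2 s, 4 s, ... (with the default base delay) before retrying.
      await wait(baseDelayMs * 2 ** attempt);
      continue;
    }
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }
    return response.json();
  }
}
```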

The solution: get another server to share the request load. I ended up with three more servers – all smaller, less powerful and cheaper than the first. After about two weeks, I had collected all the public transactions from 2017 – over 200 million exchanges in all.
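Sharing the request load can be as simple as dealing out the work like cards. This sketch assigns minute windows to servers in round-robin order so that each machine receives an equal share. The four-way split is only an example: the setup described above used one larger server and three smaller ones, for which an uneven split might have suited better. The function name is hypothetical.

```javascript
// Split a list of work items (e.g. request windows) across `serverCount`
// machines in round-robin order: item 0 -> server 0, item 1 -> server 1, ...
function roundRobin(items, serverCount) {
  const shares = Array.from({ length: serverCount }, () => []);
  items.forEach((item, i) => shares[i % serverCount].push(item));
  return shares;
}

// Example: the 525,600 minute windows of a year, shared by four servers.
const minuteIndexes = Array.from({ length: 525600 }, (_, i) => i);
const shares = roundRobin(minuteIndexes, 4);
console.log(shares.map((share) => share.length)); // [ 131400, 131400, 131400, 131400 ]
```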

I did encounter some roadblocks; after all, I am neither a professional developer nor a data scientist.

The fact that I was able to get so much data in the first place made me wonder who else must have tapped into Venmo’s API over the years. Just think how valuable this information could be for data brokers, health insurance companies or financial entities.

When creating my virtual servers, I chose Ubuntu (a free and open source Linux distribution) as their operating system because I had worked with it before. In addition, Ubuntu has a big online community, so I was confident that, if I ran into problems, I would be able to find solutions on Stack Overflow, a website where developers ask and answer questions. Stack Overflow also works on the Tor Browser and has no CAPTCHAs.

Then, I installed MongoDB on each server. MongoDB is a document database: rather than storing data in traditional rows and columns, it stores records (called documents) whose structure can vary from one record to the next.

Each record that I saved was a set of linked items, where each item consisted of a ‘key’ (a name that identifies a piece of data within the record) and a ‘value’ (the actual data associated with that key). These are called ‘key-value pairs’. One example might be: ‘Name: John, Age: 15’. See below for more examples.

This was perfect since the Venmo API returns data in the JSON (JavaScript Object Notation) format, which also consists of key-value pairs. By the time I had finished, I had stored all public transactions from 2017 in my database.

If you look at the data returned by the API, you will see something like this:

"data": [{object}, {object}, {object}, …]

In this example, “data” is a key, and the corresponding value after the colon (“:”) is a list (which is enclosed in [ ]) of objects (which are enclosed in { }). Each object represents a public transaction using a set of key-value pairs, as shown below.
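In code, reading a response of this shape means parsing the JSON text and looping over the list stored under `data`. The response below is made up and heavily shortened; apart from `data`, the field names (`id`, `actor`, `target`, `username`, `message`) are assumptions for illustration, not Venmo's documented schema.

```javascript
// A made-up, shortened response in the shape described above:
// a "data" key whose value is a list of transaction objects.
// Field names other than "data" are hypothetical.
const responseText = `{
  "data": [
    { "id": "1001", "actor": { "username": "user_a" }, "target": { "username": "user_b" }, "message": "🍕" },
    { "id": "1002", "actor": { "username": "user_c" }, "target": { "username": "user_a" }, "message": "rent 🏠💸" }
  ]
}`;

const response = JSON.parse(responseText);

// Each element of the list is one transaction made of key-value pairs.
for (const tx of response.data) {
  console.log(`${tx.actor.username} -> ${tx.target.username}: ${tx.message}`);
}
// user_a -> user_b: 🍕
// user_c -> user_a: rent 🏠💸
```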

What is included in a transaction (http://01fa.de/temp/publicapidata). Screenshot by Hang Do Thi Duc

So, each public transaction contains the following information:

  • payment ID

  • permalink (permanent link to each transaction, viewable on the Venmo website)

  • username, first name, last name, link to a profile photo and date of account creation for both the ‘actor’ of a payment and the ‘target’ / receiver of a payment

  • date and time at which the transaction was created or updated

  • type of transaction (‘payment’ or ‘charge’, for example)

  • transaction message (a caption chosen by the ‘actor’ of the transaction)

  • likes and comments

These requests yielded more than 200 million public transactions, initially scattered across different servers, which meant I couldn’t analyse them all at once. So I merged the databases onto the first and most powerful server and cancelled the others. The whole process involved a lot of learning by doing. I used Google and DuckDuckGo (a more privacy-conscious search engine) for research and asked friends for advice.

Uncovering drama in the data


A list of two hundred million transactions may not seem that large, compared to public feeds like those on Twitter or Facebook, but Venmo’s data contains sensitive financial information that can be used to compromise individuals. Even if I had collected fewer transactions, I’m sure I would have been able to find plenty of stories in the dataset.

I could clearly see that numerous Venmo users were sharing the most intimate details about their finances, drug use and romantic lives, among other disclosures, with the world – either because they didn’t know or because they didn’t care.

I was certain that if anything about the system was going to change, I would need to engage as many people as possible. The only way I could imagine doing so, with my skills, was by turning this data into digital media and sharing it far and wide. For the biggest impact, I needed to find “dirt” in this dataset.

On a big-picture level, the project was about finding stories in the data and presenting them so other people could understand what was going on. In doing so, the main challenge was to show the platform’s flaws relating to personal data, without exposing any individual users. I wanted to clearly communicate that the fault lies not with the user but with the platform itself.

So I set out to ask the data some questions:

  • What are the most commonly used words or word sequences, including emojis, in the transaction messages?

  • Who had the most transactions in 2017? What did they spend their money on?

  • Are there people who made multiple transactions of the same nature?

  • Which transactions had the most likes and comments?

  • Are there pairs of people who only sent each other money?
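The first of these questions can be answered with a frequency count. This sketch splits each message on whitespace, so a run of emojis typed without spaces, such as ‘🏠💸’, counts as one token, and lower-cases words so that ‘Rent’ and ‘rent’ are counted together. It works on an in-memory array of message strings, which is a simplification; the function name and sample messages are hypothetical.

```javascript
// Count the most common whitespace-separated tokens (words or emoji runs)
// across a list of transaction messages.
function topTokens(messages, n = 10) {
  const counts = new Map();
  for (const message of messages) {
    for (const token of message.toLowerCase().split(/\s+/)) {
      if (token === "") continue; // skip empty strings from leading/trailing spaces
      counts.set(token, (counts.get(token) || 0) + 1);
    }
  }
  // Sort by descending count and keep the top n [token, count] pairs.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, n);
}

const sample = ["Rent 🏠💸", "pizza 🍕", "rent 🏠💸", "Pizza night"];
console.log(topTokens(sample, 3));
```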

Specifically, I wanted to draw attention to transaction messages that refer to things most people would agree should not be public about an individual: for example, if they spend money on drugs or repay a loan, or always go to the same restaurant on a certain day of the week. I wanted to show the real value of this kind of data, which is connected to the ongoing aggregation of people’s actions. So, I was also looking for patterns of everyday life, and lifestyle, in terms of consumption and location.

Before I could start looking for any human stories in the data, I had to clean the dataset. I removed all duplicates that likely existed because of the one-second overlap I implemented when I made the API requests. Then I had to add a MongoDB index to the ‘message’ key (also referred to as “field”). This helped to speed up further processes involving transaction ‘messages’.
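Because of the one-second overlap, a transaction created right at a minute boundary can be fetched twice. Removing such duplicates amounts to keeping one record per payment ID. This in-memory sketch assumes the payment ID is stored in a field called `id` (a hypothetical name). Within MongoDB itself, an index on the message field would be created with `db.collection.createIndex({ message: 1 })`, and a unique index on the ID field, `createIndex({ id: 1 }, { unique: true })`, would stop duplicates from being inserted in the first place.

```javascript
// Remove duplicate transactions, keeping the first record for each payment ID.
// Assumes every record has its payment ID in a field called `id` (hypothetical).
function dedupeById(records) {
  const seen = new Set();
  const unique = [];
  for (const record of records) {
    if (seen.has(record.id)) continue; // this transaction was already kept
    seen.add(record.id);
    unique.push(record);
  }
  return unique;
}

const fetched = [
  { id: "1001", message: "🍕" },
  { id: "1002", message: "rent 🏠💸" },
  { id: "1002", message: "rent 🏠💸" }, // fetched again by the overlapping window
];
console.log(dedupeById(fetched).length); // 2
```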

Finally, I was ready to start asking the data questions. To do that, I had to write and send MongoDB queries, which are essentially requests for information sent to the database.


A query looks like this:

db.collection.find( { message: /🏠💸/ } )

This query returns a list of items in the data collection that contain the specified emoji: ‘🏠💸’. When you start typing ‘rent’ in Venmo, it suggests an auto-completion to these emojis, so this query essentially means ‘find me messages that refer to rent payment’.

A number of these queries resulted in an array of users or transactions, which called for some good old-fashioned journalistic work: going through records one-by-one until you find something interesting.

There were, for instance, about 350 users who had 1,000 or more transactions. Most of them were small businesses that accept payments through Venmo; some were just really active individuals. Sometimes it was boring going through all their spending and social interactions, but I found some potentially scandalous stories and a lot of personal drama.
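Finding the most active accounts is another counting exercise: tally the transactions each username takes part in and keep those above a threshold. The text does not say whether the 1,000-transaction count covered both sending and receiving, so counting both roles is an assumption, as are the `actor.username` and `target.username` field names. On the full dataset, a database-side aggregation (for example with MongoDB's `$group` and `$match` stages) would likely be more practical than this in-memory sketch.

```javascript
// Count transactions per username (as sender or receiver) and return the
// accounts with at least `threshold` transactions, most active first.
// Field names `actor.username` and `target.username` are hypothetical.
function mostActiveUsers(transactions, threshold) {
  const counts = new Map();
  const bump = (name) => counts.set(name, (counts.get(name) || 0) + 1);
  for (const tx of transactions) {
    bump(tx.actor.username);
    // Count a payment to oneself only once.
    if (tx.target.username !== tx.actor.username) bump(tx.target.username);
  }
  return [...counts.entries()]
    .filter(([, n]) => n >= threshold)
    .sort((a, b) => b[1] - a[1]);
}

const txs = [
  { actor: { username: "food_cart" }, target: { username: "user_a" } },
  { actor: { username: "user_b" }, target: { username: "food_cart" } },
  { actor: { username: "user_a" }, target: { username: "user_b" } },
  { actor: { username: "user_c" }, target: { username: "food_cart" } },
];
// With the real data the threshold would be 1,000; 2 suits this tiny sample.
console.log(mostActiveUsers(txs, 2));
```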

For example, there was a cannabis retailer whose profile and transactions (and those of his customers) helped disclose the area in which he operates. There was a tormented couple whose transactions illustrated the entire saga of their relationship from love to hatred. There was a woman with very unhealthy eating habits (based on the fast food and fizzy drinks she bought on a daily basis), whose profile would raise a red flag for a health insurance company. (That might sound extreme, but insurance companies are known to scan social media to help create accurate profiles of current and potential clients, as well as to make solid risk assessments. Just check out a few articles like these from Huffington Post or NextAdvisor.)

The 2017 data exposed plenty of stories like these. More are depicted (anonymously) on the PublicByDefault website: publicbydefault.fyi.

Among the software I used to make these data queries were Robomongo and Studio 3T, which had good online documentation, tutorials and examples. These tools also allowed me to export the data I eventually needed for my stories website. The data formats I encountered were JSON and CSV (comma-separated values). To create the website, I used HTML, CSS and JavaScript, including libraries like jQuery, PixiJS, GSAP, D3.js, Lodash and Moment.js.

Making an impact

I published PublicByDefault.fyi in July 2018. The project received attention from various press outlets, such as The Guardian, Ars Technica and others, all of which are listed on PublicByDefault.fyi.

Around the same time, someone created a bot on Twitter to make public transactions even more public by tweeting them. A couple of days later, Venmo released a series of pop-up screens reminding users that their transactions are public by default.



Examples of Venmo privacy pop-up messages.

The platform also significantly (but quietly) reduced the rate at which data could be retrieved from its public API, so bulk access – getting hundreds of public transactions every two seconds, like I did – is no longer possible. They also disabled the date parameters (“since” and “until”), so you can no longer request past transactions. This makes it impossible to create a database with the same amount of data that I collected.

In my opinion, this is only a small step toward better security and respect for users. As of early 2019, Venmo’s default setting is still public, and the public API still exists. To me, these changes do not demonstrate the company’s commitment to fully protecting the data and privacy of its users. It still remains possible to create a database of public transactions, and the rate limit can be circumvented by deploying a series of servers with different IP addresses.

On 23 August 2018, a Bloomberg article reported: “In recent weeks, executives at PayPal Holdings Inc., the parent company of Venmo, were weighing whether to remove the option to post and view public transactions, said a person familiar with the deliberations. It’s unclear if those discussions are still ongoing.”

The following month, in September 2018, Mozilla delivered a petition with over 25,000 signatures to push Venmo to change its privacy default settings.

In early 2019, however, a computer science student was able to replicate the process of downloading Venmo’s public transactions. As detailed in this TechCrunch article, Dan Salmon scraped seven million transactions across six months to show that users’ public activity can still be easily obtained from the platform. Although Venmo has lately made mass downloading of the data more difficult, the “public by default” feature is still available and there is little effort to increase users’ privacy (or at least their privacy awareness).

I should note that Venmo is not an exception. There are many other platforms where users’ data is made public or shared with third parties by default. In this particular case, you might assume that, when it comes to a service dealing with private payments, privacy by design would be given a high priority. Unfortunately, that is not the case.

The target of my investigation was the company that was making private data available. But the availability of such data, which is essentially a breach of users’ privacy, also represents an opportunity for researchers to extract information for other types of investigations. For example, if you are investigating bribes, corruption or money laundering, going through this data may uncover otherwise hidden links between individuals and/or companies trying to move money for illicit purposes.

The ethical dilemmas of exposing privacy failures

As a design technologist, I’ve always been interested in data and privacy – specifically, how social media exposes so much of our personal life to so many people, often without us fully understanding how. In 2016 I built Data Selfie (https://dataselfie.it), a browser extension that showed what Facebook might know about you through various data profiling algorithms. The extension, which I no longer maintain, allowed any user to investigate their own data, history, communication and relationships stored in Facebook’s archive.

When dealing with a sensitive topic like people’s privacy, there is a huge ethical dilemma for an investigator looking to use data and evidence to prove a point. You do need to see and process real data after all, otherwise there is no way to build a strong narrative and gain others’ trust.

Moreover, with this project, my goal was to urge people to reconsider their perspective, and to encourage them to make conscious decisions by asking themselves: ‘Am I okay with a platform knowing all that about me? Am I comfortable with everyone knowing that I like to spend money on fast food, that I pay my rent late sometimes or that I can be found at the same place and time every Wednesday? Are other people fine with sharing that information about themselves? Do they even know what is going on?’

Throughout this whole process I was aware of a contradiction: in trying to raise privacy awareness by bringing more attention to transactions that shouldn’t be public, I was making them even more public. With that in mind, my top priority for the final output was always to withhold real names, usernames and profile pictures. This approach also helped keep readers focused on the story and not on the exposed individuals.

From the very beginning of this project, I struggled with the question of what to do with the data after finishing my analysis and publication. It’s something I urge every investigator to think carefully about. For now, the dataset is stored locally and I have not shared it with anyone.

Digital security when working with other people’s data

When you are holding on to sensitive information that, in some sense, belongs to other people, it is your responsibility to protect it. Keeping it on a local device (computer, external hard-drive, USB etc.) – rather than on a remote server or in ‘cloud’ storage – is one important step toward fulfilling this responsibility. However, it is also important to consider the possibility that your computer, phone or storage device might be lost, stolen or confiscated. Data encryption is one way to avoid these risks.

To protect “your” data, you should ask yourself the following three questions:

  1. What tool will you use to encrypt data on your computer or phone?

  2. What tool will you use to encrypt data on an external storage device?

  3. What password will you use to prevent others from decrypting the data?

Below is some advice regarding all three.

1. Encrypting data on your device

Each of the major operating systems provides a way to activate full-disk encryption, which protects all of the data on your device when it is turned off.

  • On a Mac, you can enable FileVault, which is probably the simplest of the three full disk encryption mechanisms mentioned here. Pretty much everyone with a Mac should be using FileVault.

  • On Windows, you can turn on BitLocker, though it is only available on the more expensive versions of the operating system. For this reason alone, you should consider opting for Windows ‘Pro’ the next time you buy a PC.

  • On Linux, you will hopefully have configured LUKS (Linux Unified Key Setup) when you installed the operating system. It is possible to do so after the fact (here is an example for Ubuntu Linux) but it is probably easier just to back up your data and reinstall Linux.

  • On iPhones, encryption is called ‘data protection’ and is turned on by default.

  • On Android phones, it is called ‘Encrypt device’, and you might need to activate it yourself, so it is worth taking a look at your settings.

Finally, VeraCrypt is a free and open-source encryption tool that works on Macs, Windows and Linux computers. VeraCrypt is typically used to encrypt a specific folder of data, rather than everything on your computer, but it might be your only option if you work across multiple operating systems. Have a look at Security-in-a-Box to learn more about how to use VeraCrypt on a Mac, Windows or Linux computer.

2. Encrypting data on an external storage device

Each operating system provides a way to reformat a USB stick or external hard drive so that it will encrypt all of the data subsequently stored on it. Unfortunately, devices that you encrypt this way are only accessible from computers that run the same operating system. So a USB stick you format on a Mac will not work on a Linux computer (yet another reason to become familiar with VeraCrypt!).

  • On a Mac, you can do this formatting with the Disk Utility program.

  • On compatible versions of Windows, you can do so using BitLocker.

  • On Linux, you can use the ‘Disks’ tool.

3. Creating and maintaining strong passwords

Your encryption is only as strong as the password you choose when you set it up. A strong password should have several characteristics:

  • It should be long enough that it cannot be ‘brute forced’ by several fast computers working in tandem. If your password is completely random, and contains symbols, lower-case letters, capital letters and numbers, then 16 characters should be plenty. If it includes easy-to-guess elements, like words or personal information, then it needs to be far longer. (See below regarding ‘diceware’ passwords.)

  • It should not be in any dictionary. Unfortunately, this also refers to specialised ‘password cracking dictionaries’ that contain famous quotes, song lyrics, alt3rn@t1ve sp3ll!ngs of w0rds and, in some cases, personalised phrases that relate to the target, such as birth dates, pet names, particular languages and company mottos.

  • You should never reuse the same password for multiple devices or accounts. Seriously, though. To see why, take a look at the website HaveIBeenPwned.
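The ‘16 characters should be plenty’ rule of thumb can be checked with some arithmetic. A password of L symbols, each picked uniformly at random from a pool of N possible symbols, has L × log2(N) bits of entropy, and every extra bit doubles the number of guesses a brute-force attack may need. The sketch assumes a pool of the 94 printable ASCII characters other than space; that pool size is an assumption, not a figure from this guide.

```javascript
// Entropy (in bits) of a password made of `length` symbols, each chosen
// uniformly at random from a pool of `poolSize` possible symbols.
function entropyBits(poolSize, length) {
  return length * Math.log2(poolSize);
}

// 94 printable ASCII characters (upper/lower-case letters, digits, symbols).
console.log(entropyBits(94, 16).toFixed(1)); // 104.9
// An 8-character password from the same pool, for comparison.
console.log(entropyBits(94, 8).toFixed(1));  // 52.4
```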

One popular technique, which satisfies all of these requirements while still producing relatively memorable passwords, is called “diceware.” To create a diceware password, just select seven words at random and string them together. Depending on the requirements of the software or service for which you are creating the password, examples might include:

  • cake brute tragedy outmost frostlike playroom toaster,

  • QuaintlyFreshResilientSnowstormReworkAbnormalBuilding, or

  • Myst!fyFr0stlikeDisorderChessReversePortalGab.
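A diceware-style passphrase can also be generated in software, provided the randomness comes from a cryptographically secure source rather than `Math.random()`. This sketch uses Node.js's `crypto.randomInt`; the function name is hypothetical, and the tiny word list is a placeholder for illustration only. Classic Diceware lists have 7,776 words (one for every roll of five dice), which gives about 12.9 bits per word, or roughly 90 bits for seven words.

```javascript
const { randomInt } = require("crypto");

// Pick `count` words uniformly at random (with replacement) from `wordList`
// using a cryptographically secure random number generator.
function dicewarePassphrase(wordList, count = 7, separator = " ") {
  const words = [];
  for (let i = 0; i < count; i++) {
    words.push(wordList[randomInt(wordList.length)]);
  }
  return words.join(separator);
}

// Placeholder list for illustration only: use a full 7,776-word list in practice,
// because a short list makes the passphrase easy to brute-force.
const demoList = ["cake", "brute", "tragedy", "outmost", "frostlike", "playroom", "toaster", "snowstorm"];
console.log(dicewarePassphrase(demoList));
```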

Finally, get yourself an encrypted password manager like KeePassXC. Password managers let you get by with memorising far fewer passwords, without reusing any of them across services. You’ll still want to memorise your full-disk encryption password, of course, so you can access your computer to launch KeePassXC.

Published April 2019


Articles and Guides

  • PublicByDefault.fyi. The website created by Hang Do Thi Duc with stories from her Venmo data exposure research.

  • Scraping Venmo, by developer Dan Gorelick. An article detailing how Gorelick figured out the process of extracting Venmo users’ data due to the vulnerabilities of the app.

  • How to install MongoDB on Ubuntu 16.04, from HowtoForge.com. A tutorial.

Tools and Databases

  • D3.js. A JavaScript library for producing dynamic, interactive data visualizations in web browsers.

  • Data Selfie. An application and experiment by Hang Do Thi Duc that aims to provide a personal perspective on data mining, predictive analytics and people’s online data identity.

  • Epoch Converter. A tool to convert time and date from human readable format into an epoch timestamp and back.

  • GSAP. A JavaScript animation library, good for complex, highly customisable animations with high performance.

  • jQuery. A JavaScript library for HTML document manipulation, animation and event handling.

  • Lodash. A JavaScript utility library for transforming and filtering data.

  • Moment.js. A tool used to parse, validate, manipulate and display dates and times in JavaScript.

  • MongoDB. A free and open source document database program.

  • Node.js. A tool allowing you to use JavaScript outside of the browser.

  • PixiJS. A JavaScript library for simulations or for animating a lot of elements quickly.

  • Stackoverflow. A website where developers ask and answer questions.

  • Studio 3T. A platform providing tools for those working with MongoDB.
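When a dataset contains thousands of timestamps, the conversion that Epoch Converter performs in the browser can also be scripted. The following is a minimal Python sketch using only the standard library. It assumes the timestamps are Unix epoch seconds in UTC. Some APIs return milliseconds instead, and those values must be divided by 1,000 first. The function names are hypothetical.

```python
from datetime import datetime, timezone


def epoch_to_iso(seconds: float) -> str:
    """Convert a Unix epoch timestamp (seconds since 1970-01-01 UTC) to ISO 8601."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


def iso_to_epoch(text: str) -> int:
    """Convert an ISO 8601 string with an explicit offset (e.g. +00:00) to epoch seconds."""
    return int(datetime.fromisoformat(text).timestamp())


print(epoch_to_iso(0))                             # 1970-01-01T00:00:00+00:00
print(epoch_to_iso(1554076800))                    # 2019-04-01T00:00:00+00:00
print(iso_to_epoch("2019-04-01T00:00:00+00:00"))   # 1554076800
```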

Glossary

API – stands for application programming interface, a set of rules through which a platform can make its data and functions accessible to external developers, either for free or under certain conditions or fees.


Bot – also called a web robot or internet bot; a software application that runs automated tasks over the internet. For example, a Twitter bot that posts automated messages and news feeds.


Browser extension – also called add-ons; small pieces of software used to extend the functionality of a web browser. They can be anything from extensions that let you take screenshots of the webpages you visit to ones that check and correct your spelling or block unwanted ads on websites.


Brute force – a password-cracking technique that involves trying every possible combination.
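To illustrate why short passwords fall quickly to brute force, here is a minimal Python sketch that recovers a four-digit PIN from its SHA-256 hash by trying all 10,000 combinations. The PIN, the use of unsalted SHA-256 and the function names are hypothetical choices for this example. Well-designed systems store passwords with slow, salted algorithms, which make each guess much more expensive.

```python
import hashlib
import itertools
import string
from typing import Optional


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def brute_force(target_hash: str, alphabet: str, length: int) -> Optional[str]:
    """Try every string of `length` characters from `alphabet` until one matches."""
    for combo in itertools.product(alphabet, repeat=length):
        guess = "".join(combo)
        if sha256_hex(guess) == target_hash:
            return guess
    return None  # no combination of this length produced the hash


leaked_hash = sha256_hex("4071")  # hypothetical PIN, hashed without a salt
print(brute_force(leaked_hash, string.digits, 4))  # 4071
print(len(string.digits) ** 4)                     # 10000 combinations at most
```

The search space grows exponentially with length. A four-digit PIN has 10^4 = 10,000 possibilities, while seven words drawn from a 7,776-word list give 7,776^7, roughly 1.7 × 10^27.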


CAPTCHA – an automated test used by websites and online services to determine whether a user is a human or a bot. For example, a test asking users to identify all traffic lights in a series of nine pictures.


Cloud storage – a data-storage model whereby information is kept on remote servers that users can access via the internet.


Database – a system used to store and organise collections of data with a particular focus or purpose. For example, a database of land and property ownership in country Z.


Data broker – a company or person collecting and processing people’s data to be used as an asset for commercial or political purposes. The data may be gathered by collating database records, polling and social networks, among other sources.


Dataset – a collection of data that share some common attributes and which is usually organised in rows and columns (tables) for easier processing. For example, a dataset of the foreign owners of land and properties in country Z.


Developer console – a space in an app or platform where developers can gain access to its API, tools and data in order to test for bugs, help develop the existing code or use the data to build new applications.


Encryption – a way of using clever mathematics to encode a message or information so that it can only be decoded and read by someone who has a particular password or encryption key.


Full-disk encryption (FDE) – encryption that happens at the device or hardware level. For example, encrypting an entire computer’s disk automatically encrypts all the data saved on it.


Internet Protocol (IP) address – a set of numbers used to identify a computer or data location you are connecting to. For example, an IPv4 address is written as four numbers from 0 to 255 separated by dots, such as 192.0.2.1.
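Python's standard `ipaddress` module shows why an IP address is described as a set of numbers. The four dotted numbers of an IPv4 address together form a single 32-bit integer. The minimal sketch below uses addresses reserved for documentation (192.0.2.0/24 and 2001:db8::/32), so they do not belong to any real device.

```python
import ipaddress

addr = ipaddress.ip_address("192.0.2.1")
print(addr.version)  # 4
print(int(addr))     # 3221225985, i.e. 192*256**3 + 0*256**2 + 2*256 + 1

# The conversion works in both directions.
print(ipaddress.ip_address(3221225985))  # 192.0.2.1

# Check whether an address belongs to a network range (here a /24 block).
print(addr in ipaddress.ip_network("192.0.2.0/24"))  # True

# IPv6 addresses are 128-bit numbers written as hexadecimal groups.
print(ipaddress.ip_address("2001:db8::1").version)  # 6
```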


JSON – stands for JavaScript Object Notation, a data-interchange format.
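JSON is the format many APIs use to return data, and most languages can read it directly. The following is a minimal Python sketch using the standard `json` module. The record is a hypothetical, simplified transaction invented for illustration. Its field names are not the actual structure of any platform's feed.

```python
import json

# Hypothetical, simplified record; the field names are invented for this example.
raw = """
{
  "id": "tx-001",
  "timestamp": 1554076800,
  "note": "pizza night",
  "sender": {"name": "Alice"},
  "recipient": {"name": "Bob"}
}
"""

record = json.loads(raw)  # a JSON object becomes a Python dict
print(record["sender"]["name"], "paid", record["recipient"]["name"])  # Alice paid Bob
print(record["note"])  # pizza night

# Turn the dict back into JSON text, e.g. before saving it to a file or database.
print(json.dumps(record, indent=2))
```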


Node.js – allows you to use JavaScript syntax outside of your browser, for instance to make API requests and interact with a database.


Public API (open API) – an API that is published on the internet and is freely accessible for developers to work with.


Public (web) feed – an online data-providing service that gives updated information on a regular basis to its users or the general public. It can be set up via subscription to the feed of a website or media outlet, or it can be publicly available to everyone.


Rate limit – a cap on how many requests a user or program can make to a network, website or API within a given period of time; it is used to limit incoming and outgoing traffic.
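Rate limits are usually enforced by the server, but a well-behaved script can also throttle itself. The sketch below is a minimal sliding-window limiter in Python. It assumes a hypothetical limit of a fixed number of requests per time window. The class name is invented, and the clock is passed in so the logic can be tested without actually waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` within any `window` seconds."""

    def __init__(self, max_requests: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.timestamps: Deque[float] = deque()

    def allow(self) -> bool:
        """Return True and record the request if it fits within the limit."""
        now = self.clock()
        # Forget requests that have fallen outside the current window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False


# Demo with a fake clock: a hypothetical limit of 3 requests per 60 seconds.
fake_now = [0.0]
limiter = SlidingWindowLimiter(3, 60.0, clock=lambda: fake_now[0])
print([limiter.allow() for _ in range(4)])  # [True, True, True, False]
fake_now[0] = 60.0
print(limiter.allow())  # True, because the earlier requests have left the window
```

A script would call `allow()` before each request and pause whenever it returns False.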


Script – a list of commands that are executed by a certain program to automate processes, e.g. visit a URL every two seconds and save the data that is returned.
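The example in this definition, visiting a URL every two seconds and saving the data that is returned, can be sketched in Python with only the standard library. This is a minimal sketch that assumes the URL returns JSON. The `fetch` and `sleep` parameters are hypothetical additions that let the loop be tested without a network connection. The feed URL in the comment is a placeholder.

```python
import json
import time
import urllib.request
from typing import Any, Callable, List


def fetch_json(url: str) -> Any:
    """Download url and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def poll(url: str, times: int, interval: float = 2.0,
         fetch: Callable[[str], Any] = fetch_json,
         sleep: Callable[[float], None] = time.sleep) -> List[Any]:
    """Request url `times` times, waiting `interval` seconds between requests."""
    results = []
    for i in range(times):
        results.append(fetch(url))
        if i < times - 1:  # no need to wait after the last request
            sleep(interval)
    return results


# Example use (placeholder URL; this makes real network requests):
# data = poll("https://example.com/feed.json", times=5)
# with open("feed.json", "w", encoding="utf-8") as f:
#     json.dump(data, f)
```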


Social network – refers to websites that focus on user-generated content and interactions such as Facebook, Twitter, Instagram, etc.


Targeted advertising – a form of advertising that aims to reach only certain selected groups or individuals with particular characteristics or from specific geographic areas.


Third party – a person or entity that is not directly part of a contract or relationship but may have a function related to it nevertheless.


Tor Browser – a browser that keeps your online activities private. It disguises your identity and protects your web traffic from many forms of internet surveillance. It can also be used to bypass internet filters.


Uniform Resource Locator (URL) – a web address used to retrieve a page or data on a network or the internet.
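A URL is made up of parts that a script often needs to read or change, such as the page number in a query string. The following is a minimal sketch using Python's standard `urllib.parse` module. The address is a hypothetical example on the reserved example.com domain.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

url = "https://example.com/api/transactions?limit=50&page=2"
parts = urlparse(url)
print(parts.scheme)  # https
print(parts.netloc)  # example.com
print(parts.path)    # /api/transactions

query = parse_qs(parts.query)
print(query)  # {'limit': ['50'], 'page': ['2']}

# Build the URL for the next page by changing one query parameter.
query["page"] = ["3"]
next_url = urlunparse(parts._replace(query=urlencode(query, doseq=True)))
print(next_url)  # https://example.com/api/transactions?limit=50&page=3
```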


Userbase – a list of users associated with a particular platform or system.


Virtual private server (VPS) – a virtual machine, rented out as a service by an internet hosting company.


Web domain – a human-readable name, such as example.com, used to access a website; it is translated into an IP address by the Domain Name System (DNS).


Web interface – a graphical user interface in the form of a web page that is accessed through the internet browser.


Webpage – a document that is accessible via the internet, displayed in a web browser.


Web server – also known as an internet server; a system that hosts websites and delivers their content and services to end users over the internet. It includes hardware (physical server machines that store the information) and software that facilitates users’ access to the content.