How to See What’s Behind a Website
By Brad Murray, Wael Eskandar
In Short: A practical overview of tools and techniques to investigate the ownership of websites and uncover hidden information online, as well as essential tips on how to do it securely.
On the surface, websites look like they’re designed to make information available to the public. However, there is plenty of valuable information hiding behind what you are able to see in your web browser.
Sometimes it is important to research hidden data: to identify the individuals or companies that own a domain name or maintain a website, to determine where that site was registered or to dig up content that it once contained but that has since been removed.
Doing so is not always straightforward. For example, people who do not want to be associated with a website’s content, or with the affiliated business, sometimes try to hide their connection to the site by using intermediaries when they register its domain name.
A website investigator is sometimes like a mechanic. Just as a mechanic might need to poke around inside a car’s engine to diagnose a problem, an investigator might need to look into the inner workings of a website to find out who and what is behind it.
Finding hidden content and connections is not an exact science, but a combination of acquired skills, a set of tools and a dose of perseverance. We’ll explore some useful tools and methods, which can help a determined investigator to unearth clues buried within a website – from registration details and metadata to source code and server configurations.
A website and its elements
To investigate a website effectively, you will need to know what goes into one. This includes elements that are immediately apparent to visitors and others that lurk beneath the surface.
Website and webpage
A website is made up of webpages that display information. That information might include the profile of a company, a list of social media posts, a description of a product, a collection of photographs, a database of legal information or just about anything else.
These webpages can typically be viewed by anyone with internet access and a web browser. Considered from another perspective, however, a webpage is really just a digital file that is stored on a disk that is attached to a computer that is plugged into power and connected to a network cable somewhere in the physical world. It is sometimes helpful to keep this in mind when investigating a website.
To visit a website, your device needs to know the Internet Protocol address, or IP address, of the computer that hosts it. Hosting a website means making it available to the world; the computers responsible for doing so are often called servers.
An IP address is typically written as a series of four numbers, separated by periods, each of which ranges from 0 to 255.
For example: 184.108.40.206 is the IP address of one of the servers that hosts the “google.com” website, at which visitors can access Google’s search engine.
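As a rough illustration of that format, the sketch below uses Python's standard ipaddress module to check whether a piece of text is a well-formed address of this kind (known as IPv4) and to split it into its four numbers. The function name and the sample addresses are our own, chosen only for illustration.

```python
import ipaddress


def split_ipv4(text):
    """Return the four numbers of an IPv4 address, or None if it is malformed."""
    try:
        address = ipaddress.IPv4Address(text)
    except ipaddress.AddressValueError:
        return None
    # .packed holds the address as four bytes, one per number.
    return list(address.packed)


print(split_ipv4("192.0.2.17"))   # well-formed: four numbers between 0 and 255
print(split_ipv4("192.0.2.300"))  # malformed: 300 is outside the 0-255 range
```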
At any given time, each device that is directly connected to the internet - be it a webserver, an email service or a home WiFi router - is identified by a particular IP address. This allows other devices to find it, to request access to whatever it is hosting and, in some cases, to send it content like search terms, passwords or email messages.
Many devices, including most mobile phones, laptops and desktop computers, connect to the internet indirectly. They can reach out to websites and other services - and they can receive replies - but most other devices cannot reach out to them. In a sense, they are not listening for connections. Many of these devices have what are called “internal IP addresses.” This means that devices on the same local network can connect to them directly, but others cannot. If you look up the IP address of your phone or laptop, you will likely find an internal IP address, but you will rarely find one associated with a website.
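If you are unsure whether an address you have come across is an internal one, Python's standard ipaddress module can tell you whether it falls in a range reserved for private or otherwise non-public use. This is only a minimal sketch; the function name is our own and the sample addresses are illustrative.

```python
import ipaddress


def is_internal(text):
    """Return True if the address is in a range reserved for non-public use."""
    return ipaddress.ip_address(text).is_private


print(is_internal("192.168.1.10"))  # a range commonly used on home networks
print(is_internal("10.0.0.5"))      # another commonly used internal range
print(is_internal("8.8.8.8"))       # a publicly reachable address
```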
Like most long numbers, IP addresses are difficult to remember, so we tend to use domain names instead. Each domain name points to one or more IP addresses. In the example above, the domain name “google.com” points to the IP address shown and is far easier for most people to remember.
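To see this translation for yourself, the sketch below asks your operating system's resolver which IP addresses a name points to, using Python's standard socket module. The function name is our own. Keep in mind that a lookup like this is visible to whoever runs your DNS resolver, and that results can differ depending on where and when you ask.

```python
import socket


def resolve_addresses(name):
    """Return the sorted, de-duplicated IP addresses that a name points to."""
    results = socket.getaddrinfo(name, None)
    # Each result is a tuple whose last element is the socket address;
    # the first item of that socket address is the IP address as text.
    return sorted({result[4][0] for result in results})
```

On a machine with a working internet connection, calling resolve_addresses("google.com") should return one or more addresses for that domain.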
Domain registrar, domain registrants & domain registration
Domain names are unique. There can only be one “google.com,” for example. The process of purchasing a domain name is called domain registration.
This process ensures that domain names remain unique and makes it more difficult for someone to impersonate a website they do not control. When someone registers a domain name, a record is created to keep track of that domain’s official owner and administrator (or their representatives).
A person who registers a domain is called a domain registrant. That registrant - or someone to whom they give access - can then point their domain to a particular IP address. If a webserver is listening at that IP address, a website is born.
The companies that handle the registration process are called domain registrars, and they almost always charge a fee for their services. Example registrars include GoDaddy.com, Domain.com and Bluehost.com, among many others. These companies are required to keep track of certain information about each of their registrants.
A non-profit organisation called the Internet Corporation for Assigned Names and Numbers (ICANN) governs the domain registration process for every website in the world.
We know that a website has a domain name and that a domain name is translated into an IP address. We also know that every website is actually stored on a computer somewhere in the physical world. The computer that hosts the website is called a web host.
There is an entire industry of companies that store and serve websites. They are called web hosting companies. They have buildings filled with computers that store websites, and they can be located anywhere in the world. While it is most common for websites to be hosted in “data centres” like these, they can actually be hosted from almost any device with an internet connection.
There are many ways to describe using and researching on the internet. Many of these descriptions involve “traveling” somewhere, for example “surfing” the internet or “going to” a website.
The fact is, a better description would be opening a door or dialing a phone number. When you dial a phone number, the person on the other end can see your phone number. When you visit a website’s IP address, the website can see your IP address. When you open a door to look out, someone on the other side can look in. It is important to understand that when you visit a website you are sending hidden information about yourself to that website.
That information includes what kind of device or computer you have (iPhone 6, Samsung Galaxy, MacBook etc.), which operating system you are running (Windows, MacOS, Linux), and even what fonts you have installed.
All of this information can be used to figure out who you are, where you are, and even what other websites you have been on.
There are tools you can use to see some of the data you are sharing with the websites you visit. Using your current web browser, visit the online tools below to see what information you might be leaking to the websites you visit and the companies that own them.
Browser Leaks – displays a list of web-browser security-testing tools that tell you what personal data you may be leaking to others, without your knowledge or permission, when you surf the Internet. This site also works on Tor Browser.
Be sure to check for leaks related to the Web Real-Time Communication (WebRTC) protocol – a technology that supports video and audio chat – and for DNS leaks – which allow third parties like your internet service provider (ISP) to see what websites you visit and what apps you use. The sites above also indicate whether or not your real IP address is visible to the websites you visit.
Having seen some of your weaknesses and formulated some concerns about how your online research might expose your information or threaten your safety, you can now take the next step. In the final section - How to stay safe when investigating websites - we go through a few tools and techniques you can use to protect yourself and your data when investigating online.
Basic WHOIS Look-up
When researching a website, one of the most useful sources of data can be found in its domain registration details.
Over the course of your investigation, it might be relevant to know who – whether it is an organisation or an individual – owns a particular domain, when it was registered and by which registrar, as well as other details. In many cases, this information can be accessed through third-party services that are detailed below.
Yet, as mentioned earlier, sometimes the owner of a domain would not want to appear as linked to the site. Whatever the reason - be it not wanting to be associated with the site’s content or just wishing to maintain a degree of privacy - it’s worth noting that domains can be registered through proxy or intermediary organisations that conceal the full details of the registration.
The information collected from domain registrants is called WHOIS data, and it includes contact details for the technical staff assigned to manage the site, as well as contact details of the actual site owner or their proxy.
This data has long been publicly available on sites like ICANN’s WHOIS Lookup. However, there are currently other free or partially-free services (some have fees for advanced searches and extended results) that also aggregate WHOIS information and which often provide more details and more accurate and up-to-date information than ICANN.
Note that if you are making many requests for information in a short period of time, on most of these sites you may receive an error and need to wait or switch to a different service to continue your searches. Similarly, many of these sites require you to complete CAPTCHAs (selecting various items from images) to make sure you are not a robot.
These are some of the sites providing useful WHOIS data for free:
https://iana.org/whois – works via Tor Browser and doesn’t have CAPTCHA
https://who.is – works via Tor Browser and doesn’t have CAPTCHA
https://www.whois.com/whois/ – works via Tor Browser and has CAPTCHA
https://godaddy.com/whois – works via Tor Browser and has CAPTCHA
https://whois.domaintools.com (limited free search) – works via Tor Browser and doesn’t have CAPTCHA
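Behind these web forms sits a very simple protocol: a client connects to a WHOIS server on TCP port 43, sends the name it is asking about, and reads the plain-text reply. The sketch below shows one way this might look in Python. It assumes IANA's server (whois.iana.org) as a starting point; the function names and the sample reply are our own, and real replies vary widely in layout from one registry or registrar to another. Note that querying a server directly reveals your IP address to it.

```python
import socket


def whois_query(domain, server="whois.iana.org", timeout=10):
    """Send a WHOIS query over TCP port 43 and return the raw text reply."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text):
    """Collect 'key: value' lines into a dict of lists, skipping comment lines."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields.setdefault(key.strip().lower(), []).append(value.strip())
    return fields


# Made-up reply in the general shape of a WHOIS record, for illustration only.
sample = """% comment line
domain:       EXAMPLE
refer:        whois.example-registry.test
created:      1985-01-01
"""
print(parse_whois(sample))
```

Because the parser keeps every value for a repeated key, fields that appear several times in a record (such as name servers) are not lost.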
As mentioned above, many registrars offer the ability to act as proxy contacts on the domain registration forms, a service known as “WHOIS privacy”. In such cases, domains registered with WHOIS privacy will not list the actual names, phone numbers, postal and email addresses of the true registrant and owner of the site, but rather the details of the proxy service. While this can frustrate some WHOIS queries, the lookup tool is nonetheless a powerful resource for investigating a domain.
Just as different search engines return different results for the same query, depending on their indexes and algorithms, different WHOIS query services may return varying amounts of detail about your domain of interest. Checking multiple sources whenever possible is therefore a good way to collect as much information as you can, as is standard in any part of an investigation.
To illustrate this, let’s look at what a search for “usps.com” (the website of the United States Postal Service) on several WHOIS services leads to.
A query for WHOIS data for “usps.com” using the ICANN WHOIS Lookup returns:
ICANN WHOIS data for “usps.com” on 19 February 2019
The information we get about the registrant is limited – we can only see the domain’s creation and expiry dates – and the registrar’s details appear in place of those of the registrant.
To show how the information returned from these services may differ, a search for “usps.com” on https://who.is/ returns more information about the Postal Service, including an address, email contact, and phone number.
Who.is WHOIS data for “usps.com” on 19 February 2019
In addition to the WHOIS search tools above, IntelTechniques – the website of Michael Bazzell, an open source intelligence consultant – provides an aggregated list of domain search tools that allow you to compare search results from several sources of WHOIS data. Just check the Domain Name search menu on the left-hand side. Also note that IntelTechniques has a rich offering of other tools you can use in your investigations, such as image metadata search and social media search tools.
The European Union’s (EU) General Data Protection Regulation (GDPR) has led to a lot of uncertainty for the status of public WHOIS registries in the EU because in theory, WHOIS data of owners and administrators of EU-registered domains should not be collected and published by registrars. Under the GDPR, it is considered to be private information.
However, ICANN has sued several European registrars for deviating from its interpretation of the GDPR, which takes a more relaxed approach to the regulation and permits limited access to WHOIS data. Even after the GDPR’s implementation, ICANN continued to demand that EU registrars at least collect data about site owners and administrators, even if they do not make it publicly available. ICANN’s interpretation has been repeatedly rejected by the courts, but its insistence that its policy for EU registrants is GDPR compliant leaves a lot of questions unanswered. Most likely, collection of and access to WHOIS data for EU-based registrants will be restricted.
Even in these conditions, some researchers are finding ways to work around the restrictions that make some registrants’ data inaccessible at times. This post by GigaLaw - a US law firm specialised in domain name disputes - provides some tips and techniques that can prove successful at times.
Historic data can be a useful tool when investigating websites, because it can track the transfer of a domain’s ownership. It can also help identify owners of websites who have not consistently chosen to obscure their registration data using a WHOIS privacy service.
One example where this historic data proved useful was the investigation of a cybercrime gang known as Carbanak, who were believed to have stolen over a billion dollars from banks. Using the historical data provided by DomainTools, a researcher was able to link multiple sites together by going through their historical records and finding hundreds of domains that were initially registered with the same phone number and Yahoo email address. These contact details were later used to establish a link between Carbanak and a Russian security company.
For your own investigations, several companies offer access to historic WHOIS records, though these records may often be restricted to non-EU countries due to the GDPR, as mentioned above.
DomainTools is perhaps the best-known of the companies that offer historic hosting and WHOIS data. Unfortunately, this data is not free and DomainTools requires you to register for a membership in order to access it.
Whoisology is an alternative to DomainTools that also provides historical WHOIS data. It requires you to create an account for both its basic free services and its advanced fee-based services. There is a limit to the number of free basic searches per day, and this option only provides you with the latest historical data archive of a website (not its full history). The full historical archives require payment, and there are several annual fee rates depending on the number of searches and other features the service provides. Whoisology doesn’t work via the Tor Browser, and it may also use CAPTCHAs to verify that you are a real person searching for information.
If you decide to set up an account with these services, it may be a good idea to create a new email address that you can use for this purpose only. This way you avoid sharing your regular contact data and other personal details.
Reverse WHOIS Look-up
Reverse phone directories, which allowed you to look up a phone number to determine who it belonged to, were a staple of investigative work for years. These directories contained the same information as a phone book, but they organised it differently: entries were sorted by phone numbers rather than by names. This allowed investigators to cross-reference phone numbers back to the names of the people to whom those numbers belonged. While printed reverse directories have long since been replaced by online databases (such as White Pages Reverse Phone), the need to cross-reference information has expanded into many other applications.
Investigators often need to look up residents by home address, to get names from email addresses or find businesses by officer or incorporation agent (a person or business that carries out company formation services on behalf of real owners). Reverse directories should be part of any investigator’s toolkit. The notion of tracing little pieces of information back to their sources is central to the investigative mindset.
When you look up the domain names registered to a certain email address, phone number or name, it is called a “reverse WHOIS lookup”. Several sites offer these kinds of searches.
To identify the owner of a domain – especially when that owner has taken some steps to obscure their identity – you will need to locate all the information about the website that can be reverse searched. The tools available to cross-reference information from a website will change, and the information available will vary for each site, but the general principle is consistent. When trying to locate the owner of a domain name, focus on locating information that can help you “reverse” back to an ultimate owner.
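The idea of “reversing” can be sketched in a few lines of Python. Assuming you have already collected a handful of WHOIS records (the ones below are invented, and the field names are our own), you can turn them into an index that is keyed by email address or phone number rather than by domain, much like a reverse phone directory.

```python
from collections import defaultdict

# Hypothetical WHOIS records, invented for illustration.
records = [
    {"domain": "example-one.test", "email": "admin@example.test", "phone": "+1.5550100"},
    {"domain": "example-two.test", "email": "admin@example.test", "phone": "+1.5550199"},
    {"domain": "example-three.test", "email": "other@example.test", "phone": "+1.5550100"},
]


def reverse_index(records, field):
    """Map each value of `field` to the sorted list of domains that list it."""
    index = defaultdict(set)
    for record in records:
        value = record.get(field)
        if value:  # skip records where the field is missing or empty
            index[value].add(record["domain"])
    return {value: sorted(domains) for value, domains in index.items()}


by_email = reverse_index(records, "email")
by_phone = reverse_index(records, "phone")
print(by_email["admin@example.test"])
print(by_phone["+1.5550100"])
```

Any value that maps to more than one domain is a possible connection worth checking against other sources.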
Here are some tools you can use for reverse searches:
ViewDNSinfo is free and allows searches by email or phone number. It also provides other useful options such as searching by an individual or company, and historical IP address search (a historical list of the IP addresses a given domain name has been hosted on, as well as where each IP address is geographically located). Note that IP address owners are sometimes marked as ‘unknown’, so it helps to use several websites for your searches and combine the results for a fuller picture. It works via Tor Browser and doesn’t have CAPTCHA.
You can register on Domain Eye to get 10 free searches per day. It works via Tor Browser and doesn’t have CAPTCHA.
A paid service with no free demos available for reverse WHOIS at the moment. It works via Tor Browser and doesn’t have CAPTCHA.
ViewDNSinfo example of reverse WHOIS search based on email address firstname.lastname@example.org (used by the Internet Archive), date searched 11 January 2019
Discovering useful information in a webpage’s source code
A webpage that you see in your browser is a graphical translation of code.
This code is referred to as a website’s source code, which includes both content and a set of instructions, written by programmers, that makes sure the content is displayed as intended.
Your browser processes these instructions behind the scenes and produces the combination of text and images you see when accessing a website. With a simple extra step, your browser will let you view the source code of any page you visit.
Give it a try. Open up your browser and take a look at the source code of a website that interests you. You can usually right-click on the page and select “View page source”. On most Windows and Linux browsers, you can also press CTRL+U. For Mac instructions and additional tips, check out this guide on how to read source code (also accessible via Tor Browser).
Part of the source code for the White House website https://www.whitehouse.gov, which you can reveal by right-clicking your cursor and selecting “View source code”, looks like this:
Example of source code
If you’ve never looked at a site’s source code before, you might be struck by how much of the information that is transmitted to your computer does not appear when you view the page in your browser.
For instance, there may be comments left by whoever wrote the source code. These comments are only visible when you view the source – they are never displayed in the rendered page (that is, the page that has been translated into graphics and text). Comments begin with <!--, which indicates that what comes next is a comment and should not be displayed on the page. They end with -->, which signals the end of the comment.
Comments are often written in plain language and sometimes provide hints about who maintains a website. They may also include personal notes or reveal information such as a street address or copyright designation.
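On a long page, picking out comments by eye can be tedious. As a minimal sketch, the Python below uses the standard library's html.parser module to collect every comment from a page's source. The class and function names are our own, and the sample page is invented for illustration.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Gather the text of every <!-- ... --> comment in a page."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # Called by the parser once for each comment it encounters.
        self.comments.append(data.strip())


def extract_comments(html_text):
    collector = CommentCollector()
    collector.feed(html_text)
    collector.close()
    return collector.comments


# Invented page source, for illustration only.
page = """<html><body>
<!-- Maintained by the web team, contact: webmaster@example.test -->
<p>Welcome</p>
<!-- TODO: remove old address -->
</body></html>"""
print(extract_comments(page))
```

You would normally paste or load the source of a saved page in place of the sample text.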
Finding connections with reverse Google Analytics ID
There are numerous things you can uncover from a page’s source code, but one good example is code that helps website owners and administrators monitor the traffic that a website is receiving. One of the most popular such services is Google Analytics - https://analytics.google.com.
Sites that are related often share a Google Analytics ID. Because Google Analytics allows multiple websites to be managed by one traffic-monitoring account, you can use their ID numbers to identify domains that may be connected by a shared ownership or administrator.
Sites that use Google Analytics embed an ID number into their source code. All Google Analytics IDs begin with “UA-”, and are followed by an account number. They look a bit like this: “UA-12345678-2”.
To follow on the White House example above, the Google Analytics ID for www.whitehouse.gov is “UA-12099831-10”. You can find this out yourself by following these steps while on the website:
go to the website’s source code by right-clicking and selecting “View source code”, as indicated above,
open a search box with “Ctrl-F” or “Command-F” while you are on the page’s source code,
search for “UA-” by typing it in the search box; you will find the site’s Google Analytics code “UA-12099831-10”.
Whitehouse Analytics code
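The number after the first dash (12099831) is the White House’s Google Analytics account number. The number at the end (10, in this case) identifies one particular property (typically a website or app) within that account; because these numbers are usually assigned in sequence, it hints at how many different properties rely on that same account to track visitors.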
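If you have saved a page's source code, the same search can be done with a short script. The sketch below uses a regular expression for the “UA-” format described above and splits each ID into its account number and trailing number. The pattern, function name and sample snippet are our own, and IDs in other formats will not be matched.

```python
import re

# Matches IDs of the form UA-<account number>-<trailing number>.
UA_PATTERN = re.compile(r"\bUA-(\d+)-(\d+)\b")


def find_analytics_ids(source):
    """Return (full ID, account number, trailing number) tuples, without repeats."""
    found = []
    for match in UA_PATTERN.finditer(source):
        entry = (match.group(0), match.group(1), match.group(2))
        if entry not in found:
            found.append(entry)
    return found


# Invented snippet standing in for a page's source code.
snippet = "ga('create', 'UA-12099831-10', 'auto'); /* UA-12099831-10 */"
print(find_analytics_ids(snippet))
```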
Because multiple websites can be managed on one Google Analytics account, you can use Google Analytics ID numbers to identify domains that may be connected by a shared ownership or administrator.
There are several reverse search tools that allow you to locate sites that share a given analytics ID. Examples include:
DNSLytics – searchable by domain name, IP address, or Analytics ID. It also works via the Tor Browser.
DomainIQ – searchable by domain name or Analytics ID. It doesn’t work via the Tor Browser.
Moonsearch – searchable by Analytics ID, IP address, etc. It doesn’t work via the Tor Browser.
As usual, it’s advisable to search the same Google Analytics ID on several of these websites, as their results tend to vary.
Sometimes one website may copy the source code of another even if they are not actually related. This will lead to misleading results when looking up the Google Analytics ID. Reverse lookups of Google Analytics IDs must always be treated as a possible lead and not as hard evidence. The technique can be useful, but it is worth repeating how important it is to check multiple sources before drawing conclusions.
For instance, in the case above, searching for the White House ID (UA-12099831-10) with any of these services will return a list of sites sharing the same Google Analytics ID with the White House website. (Also note that results tend to differ from service to service; some will return more sites, others fewer, so search on several to compile a thorough list of findings.) If you do this exercise, you will notice that several websites that are most likely unrelated to the official White House site also appear on the list. Some are parody sites, others are gaming sites, and so on. Although this looks bizarre at first, the explanation is rather simple – the White House source code has been copied and replicated without deleting the Google Analytics ID. Therefore, not all the listed sites are related in this case. It is also worth noting that the unrelated websites are not actually using the Google Analytics ID of the White House and its genuinely related sites; they are merely displaying it.
How can these searches help an investigation?
If a website owner or administrator is obscuring their identity on one site, they may not have taken similar measures on every site they manage or own. Enumerating these sites by reverse searching the Google Analytics IDs can help you locate related websites that may be easier to identify.
In a 2011 article, Wired columnist Andy Baio revealed that out of a sample of 50 anonymous or pseudonymous blogs he researched, 15 percent were sharing their Google Analytics ID with another website. This finding proved fruitful for unmasking anonymous sites. Out of the sample of 50, Baio claimed to have identified seven of the bloggers in 30 minutes of searching. The full story is available here.
Let’s try an exercise and see if the website Our Revolution uses Google Analytics to monitor traffic.
Screenshot of “Ourrevolution.com”
To determine whether “Our Revolution” has a Google Analytics ID we have to view the source code as described above.
Source code of “ourrevolution.com”
We can then use one of the reverse search tools mentioned above to see if other sites are using that same Google Analytics ID. On DNSlytics, for instance, choose Reverse Analytics from the Reverse Tools top navigation menu.
Searching by Google Analytics ID on DNSlytics
In addition to the “Our Revolution” domain where we found the Analytics ID, the search returns another domain name: “Summer for Progress” - https://summerforprogress.com/.
Results of Google Analytics ID search on dnslytics.com
When someone creates a file (such as a document, PDF or spreadsheet) on their computer, the programs they use automatically embed information in that file.
We can consider “data” to be the contents you see in a file: the words in a document, the charts in a PDF, the numbers in a spreadsheet or the elements of a photograph.
On the other hand, the automatically embedded information is called “metadata”.
Examples of metadata might include the size of the file, the date when the file was created, or the date when it was last changed or accessed. Metadata might also include the name of the file’s author or the name of the person who owns the computer used to create it.
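Some of this information, such as size and modification date, is recorded by the file system rather than inside the file itself. As a small sketch, the Python below reads those two values with the standard os module; the function name is our own. Bear in mind that for a file you have downloaded, these dates usually describe your copy (when you saved it), not the original.

```python
import os
import tempfile
from datetime import datetime, timezone


def basic_file_metadata(path):
    """Return the size and last-modified time recorded by the file system."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
    }


# Create a small throwaway file so the example can run anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("hello")
    sample_path = handle.name

metadata = basic_file_metadata(sample_path)
print(metadata)
os.remove(sample_path)
```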
There are many types of metadata. Here, we look at how to find and make sense of several examples that are useful for investigations.
With documents, even if metadata doesn’t always identify the author or creator of a file (if they take steps to keep this identity hidden, for example, by deleting metadata such as name or dates), it often still provides clues to their identity or other significant facts about them or the devices and software they used to work on those files.
A similar situation happens when we take photos: the image files our cameras produce often contain a type of metadata called EXIF (Exchangeable image file format). EXIF metadata can reveal information related to when and where the photo was taken: time, date, GPS (Global Positioning System) location, etc.
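EXIF location data is typically stored as degrees, minutes and seconds together with a hemisphere letter (N, S, E or W), whereas most online maps expect decimal degrees. The short sketch below shows the conversion; the function name and the sample coordinates are our own, chosen only for illustration.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus an N/S/E/W reference to decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative by convention.
    return -value if ref.upper() in ("S", "W") else value


# Invented coordinates, for illustration only.
latitude = dms_to_decimal(52, 30, 0, "N")
longitude = dms_to_decimal(13, 24, 0, "E")
print(round(latitude, 4), round(longitude, 4))
```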
Users can manually remove this potentially identifying information, and many apps and websites clear metadata from uploaded files in order to protect their users. In some cases, however, EXIF metadata that remains in the final version of a photograph may end up revealing clues about the identity of the photographer, locations, dates and other information that can help you connect the missing links in your investigation.
For example, American serial killer Dennis Rader was arrested after mailing a disk containing documents from his church to a news organisation. The documents contained metadata that identified their author. Here is an article in The Atlantic showing how it happened.
With this in mind, if you can’t find the owner of a domain name through the means and tools presented above, it can be useful to download all text documents, spreadsheets, PDFs and other files hosted by the site. From there, you can analyse the documents’ metadata and look for an author name or other identifying details. You can do this by checking the properties of the documents after you download them. Keep in mind, however, that documents like these sometimes contain malware that can put you and those with whom you work at risk. To avoid this, you should not open them with a device that you use for any other purposes (work or personal) or that is connected to the internet.
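For modern Microsoft Office files (.docx, .xlsx, .pptx), the properties you see in the application are stored in a small XML file inside what is really a ZIP archive. As a minimal sketch, the Python below reads the author-related fields from that file using only the standard library. The function name is our own, and the archive it reads is a stand-in built in memory rather than a real document. The same precautions about untrusted files apply to any machine you run this on.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used in the core-properties part of Office Open XML files.
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(file_or_path):
    """Return the creator and last-modified-by fields of a .docx/.xlsx/.pptx file."""
    with zipfile.ZipFile(file_or_path) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    return {
        "creator": root.findtext("dc:creator", namespaces=NAMESPACES),
        "last_modified_by": root.findtext("cp:lastModifiedBy", namespaces=NAMESPACES),
    }


# Build a tiny stand-in archive in memory so the example runs without a real file.
core_xml = (
    '<cp:coreProperties xmlns:cp="%(cp)s" xmlns:dc="%(dc)s">'
    "<dc:creator>Example Author</dc:creator>"
    "<cp:lastModifiedBy>Example Editor</cp:lastModifiedBy>"
    "</cp:coreProperties>"
) % NAMESPACES
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docProps/core.xml", core_xml)
buffer.seek(0)

properties = read_core_properties(buffer)
print(properties)
```

With a real file you would pass its path instead of the in-memory buffer; a field comes back as None when the document does not record it.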
Safety First! - Opening downloaded files from unknown sources
Some investigators maintain a separate laptop that they use only to open untrusted files. These devices are often called ‘air gapped’ computers because, once they are set up, they are never connected to the internet.
As an alternative, you can restart your computer from a USB stick that contains the Tails operating system when you need to analyse suspicious documents. Even if a document contains malware that affects Tails, any damage it might do will become irrelevant once you reboot back into your normal operating system. And the next time you restart into Tails, you will have a clean system once again. Tails is based on the GNU/Linux operating system, however, so it comes with a bit of a learning curve.
To use either of these techniques, you will need a USB stick or an external hard drive so you can transfer the files in question.
Finally, if you are not worried about associating yourself with the documents or about exposing their contents to Google (or to anyone with the authority to access other people’s Google accounts), you can upload them to Google Drive, and search for metadata using Google Docs. Don’t worry, Google is pretty good at protecting their servers from malware!
Not all documents will contain metadata. It’s not always embedded in the first place, and the creator can easily delete or modify it, as can anyone else with the ability to edit the document. Moreover, not all metadata relates to the original author. Documents change hands and are sometimes created on devices that belong to people other than the author.
Again, any piece of information you find needs to be verified and corroborated from multiple sources. Despite that, metadata could provide you with additional leads or help to confirm other evidence you have already found.
In addition to helping you identify the true owner of a document or website, metadata can also provide clues about employment contracts and other affiliations and connections. For example, a Slate writer analysed the PDFs found on a conservative policy website run by former American media personality Campbell Brown and discovered that all of them were written by staff working for a separate right-leaning policy group. The link between these two groups was not known until the metadata analysis was conducted. The full story is available here.
Let’s look at how this finding can be replicated.
The PDF described in this article was originally found at the following web address on the commonsensecontract.com website: http://commonsensecontract.com/assets/downloads/Rewards_for_Great_Teachers.pdf.
It has since been taken down and, indeed, that domain name now points to a completely different website: http://commonsensecontract.com. You can still find the original one archived on the Internet Archive’s Wayback Machine.
To learn more about the Wayback Machine, see our resource on “Retrieving and Archiving Information From Websites”.
Archived webpage from “commonsensecontract.com”
You can follow the steps below to examine the metadata in question. But first:
We recommend using an online document viewer to avoid exposing yourself to any malware that might be lurking within the online documents you download. (We did not find any malware in this document, nor is it particularly sensitive, but it’s best to plan for the worst.)
If you are using an online document viewer that requires you to sign in, such as Google Docs, we recommend creating a separate account on that service. This will help you avoid associating your investigative activities with your personal online profile. In the example below, we will use a simple online service that does not require an account.
Keep in mind that you are showing this document, and its metadata, to whoever runs the service you use. They, in turn, could share or publish it. If that is not acceptable, you might have to use one of the other techniques mentioned in the “Safety First” sections of this Kit.
To view the metadata in this PDF:
Browse to the Wayback Machine - https://archive.org/web/
Search for the original web address: http://commonsensecontract.com/assets/downloads/Rewards_for_Great_Teachers.pdf
Click on the year 2014
Click on one of the blue dots in the calendar (the one in May or one of the two in September)
Click the download link toward the upper, right-hand corner of the screen
Save the PDF somewhere on your device, but do not open it yet
Browse to the Online PDF Reader (it also works on Tor Browser and does not have CAPTCHA)
Click the “Start Online PDF Read” button
Upload the Rewards_for_Great_Teachers.pdf file
Click the “Rewards_for_Great_Teachers.pdf” Document > Properties link toward the upper, left-hand corner of the screen
Note that the author is listed as Elizabeth Vidyarthi.
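If uploading a document to a third-party service is not acceptable, the steps above can be roughly approximated offline. The sketch below is a minimal example, not a full PDF parser: it reads a file as raw bytes, without rendering it, and searches for common entries of the PDF "Info" dictionary such as `/Author` and `/Producer`. The function name `extract_info` is our own. The sketch assumes the metadata is stored uncompressed and unencrypted, treats plain strings as Latin-1 (an approximation of the PDF text encoding) and does not handle nested, unescaped parentheses. Many PDFs will not meet these assumptions, in which case it simply finds nothing.

```python
import re

# Entries commonly present in a PDF document information dictionary.
INFO_KEYS = ("Title", "Author", "Creator", "Producer", "CreationDate", "ModDate")


def _decode(raw: bytes) -> str:
    """Decode a PDF string: UTF-16 if it starts with a byte-order mark,
    otherwise Latin-1 (an approximation of PDFDocEncoding)."""
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be", errors="replace")
    return raw.decode("latin-1")


def extract_info(pdf_bytes: bytes) -> dict:
    """Return the Info entries that can be found with a simple byte search."""
    found = {}
    for key in INFO_KEYS:
        name = re.escape(key.encode("ascii"))
        # Literal string form, e.g. /Author (Jane Doe)
        literal = re.search(
            rb"/" + name + rb"\s*\(((?:\\.|[^\\()])*)\)", pdf_bytes, re.DOTALL
        )
        # Hexadecimal string form, e.g. /Author <FEFF004A0061>
        hexadecimal = re.search(rb"/" + name + rb"\s*<([0-9A-Fa-f\s]+)>", pdf_bytes)
        if literal:
            # Undo the escaping of parentheses and backslashes.
            raw = re.sub(rb"\\([()\\])", rb"\1", literal.group(1))
            found[key] = _decode(raw)
        elif hexadecimal:
            digits = re.sub(rb"\s", b"", hexadecimal.group(1)).decode("ascii")
            if len(digits) % 2:  # an odd final digit is padded with zero
                digits += "0"
            found[key] = _decode(bytes.fromhex(digits))
    return found


if __name__ == "__main__":
    # Hypothetical sample bytes, not taken from the document discussed above.
    sample = b"%PDF-1.4\n1 0 obj\n<< /Author (Jane Doe) /Producer (Example) >>\nendobj\n"
    print(extract_info(sample))
```

To try it on a downloaded file, read the file with `open(path, "rb").read()` and pass the result to `extract_info`. As with any metadata, treat whatever it returns as a lead to verify rather than as proof of authorship.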
Articles and Guides
How to Read Your Website Source Code and Why It’s Important by Neil Patel. A guide with useful tips, techniques and tools for checking out websites’ source code and understanding the information they provide.
Tools and Databases
IntelTechniques by Michael Bazzell. An open source intelligence and digital forensics resource with tools, guides and tips useful for investigating websites and people online.
ICANN Whois, from the Internet Corporation for Assigned Names and Numbers. The official ICANN Whois search tool for websites registered around the world.
Panopticlick, from the Electronic Frontier Foundation. An online tool that analyses how well your browser and add-ons protect you against online tracking techniques.
Algorithm – an established sequence of steps to solve a particular problem.
API – stands for application programming interface, through which a platform can make its data accessible to external developers, either for free or subject to certain conditions or fees.
Bandwidth – in computing, the maximum rate of information transfer per unit of time, across a given path.
Bot – also called web robot or internet bot, is a software application that runs automated tasks over the internet. For example, a Twitter bot that posts automated messages and news feeds.
Browser extension – also called an add-on, a small piece of software used to extend the functionality of a web browser. Extensions can do anything from taking screenshots of the webpages you visit to checking and correcting your spelling or blocking unwanted ads on websites.
Brute force - a password cracking technique that involves trying every possible combination.
CAPTCHA – an automated test used by websites and online services to determine whether a user is human or robot. For example, a test asking users to identify all traffic lights in a series of nine pictures.
Cloud storage – a data storage model whereby information is kept on remote servers that users can access via the internet.
Content Management System (CMS) - software used to manage content that is later rendered into pages on the internet.
Crawler – also called a spider, an internet robot that systematically browses the internet, typically for the purpose of web indexing (Wikipedia).
Database – a system used to store and organize collections of data with a particular focus or purpose. For example, a database of land ownership in country Z.
Dataset – a collection of data sharing some common attributes and that is usually organized in rows and columns for easier processing. For example, a dataset of the foreign owners of land and properties in country Z.
Directory – a container used to categorise files or other containers of files and data.
Domain name - a name that is commonly used to access a website (e.g. tacticaltech.org). Domain names are translated into IP addresses.
Domain Name Service (DNS) – the distributed service that converts domain names into IP addresses like 18.104.22.168.
Domain Name System (DNS) – a naming system used by computers to turn domain names into IP addresses in order to connect to websites.
DNS leak – when requests to visit a certain site or domain are exposed to an internet provider despite efforts to conceal them using a VPN.
DNS query – the process of asking to translate a domain name into an IP address.
Full-disk encryption (FDE) – encryption that happens at a device or hardware level. For example, encrypting an entire computer’s disk would also automatically encrypt all the data saved on it.
Encryption – a way of using clever mathematics to encode a message or information so that it can only be decoded and read by someone who has a particular password or an encryption key.
Internet Protocol (IP) address – a set of numbers used to identify a computer or data location you are connecting to. Example: 22.214.171.124
Metadata – information about information. E.g.: the content of a sound file is the recording, but the duration of the recording is a property of the file that can be described as metadata.
Public (web) feed – an online data providing service that gives updated information on a regular basis to its users or the general public. It can be set up via subscription to the feed of a website/media or it can be publicly available to everyone.
Registrar - a company that provides domain registration services.
Registrant - a person who registers a domain.
Robots.txt – a file on a website that instructs automated programs (bots/robots/crawlers) on how to behave with data on the website.
Root Directory – the topmost level folder or directory, which may or may not contain other subdirectories.
Script – a list of commands that are executed by a certain program to automate processes, e.g. visit a URL every two seconds and save the data that is returned.
Server - a computer that remains on and connected to the Internet in order to provide some service, such as hosting a webpage or sending and receiving email to/from other computers.
Server configuration – a combination of settings that determine the behavior of the server.
Sitemap protocol - a set of guidelines that enables site administrators to inform search engines about pages on their site that are available for crawling.
Subdomain – an extra identifier, typically added before a domain name, that represents a subcategory of content (e.g. google.com is a domain name whereas translate.google.com is a subdomain).
Source code - The underlying code, written by computer programmers, that allows software or websites to be created. The source code for a given tool or website will reveal how it works and whether it may be insecure or malicious.
Targeted advertising – a form of advertising that aims to reach only certain selected groups or individuals with particular characteristics or from specific geographic areas. For example, placing bicycle sale ads on the Facebook accounts of young people in Amsterdam.
Subdirectory – a directory within a directory.
Tor Browser – a browser that keeps your online activities private. It disguises your identity and protects your web traffic from many forms of internet surveillance. It can also be used to bypass internet filters.
Web tracker – tool or software used by websites in order to trace their visitors and how they interact with the site.
Uniform Resource Locator (URL) – a web address used to retrieve a page or data on a network or the internet.
Virtual Private Network (VPN) - software that creates an encrypted “tunnel” from your device to a server run by your VPN service provider. Websites and other online services will receive your requests from - and return their responses to - the IP address of that server rather than your actual IP address.
Virtual private server (VPS) - a virtual machine, rented out as a service, by an Internet hosting company.
Web domain – a name commonly used to access a website which translates into an IP address.
Web interface – a graphical user interface in the form of a web page that is accessed through the internet browser.
Website log – a file that records every view of a website and of the documents, images and other digital objects on that website.
Webpage – a document that is accessible via the internet, displayed in a web browser.
Web server – also known as internet server, is a system that hosts websites and delivers their content and services to end users over the internet. It includes hardware (physical server machines that store the information) and software that facilitates users’ access to the content.
Website – a set of pages or data that is available remotely, typically to visitors with internet or network access.