Search Smarter by Dorking

by Gabi Sobliye



In Short: A look at advanced internet searches with “Google dorking,” how they work across different search engines, and how you can use the technique in your investigations. Mind the tips on protecting your privacy while searching, and safeguarding your personal information from those who might use this technique for malicious purposes.

When investigating, you often need to gather as much information as possible about a topic. Advanced search techniques can help to uncover files or leads that are relevant to the questions you are trying to answer. For example, you may be able to find a company’s tax returns or a local government’s expenditure reports, information that may not appear on their websites or show up when you do a regular web search.

Google dorking (also known as Google hacking) is a technique used by newsrooms, investigative reporting organisations, security auditors and tech-savvy criminals to query search engines in order to find hidden information that might be available on public websites or to identify evidence of digital security vulnerabilities. This technique can be used on most search engines, not just Google’s, so we typically refer to it simply as “dorking.”

Dorking involves using search engines to their full potential to unearth results that are not visible with a regular search. It allows you to refine your searches and dive deeper, and with greater precision, into webpages and documents that are available online. Uncovering hidden files and security flaws by dorking does not require a great deal of technical knowledge. It really boils down to learning just a few search techniques and using them across a number of search engines.

All you need to carry out a Google dork is a computer, an internet connection and a basic understanding of the appropriate search syntax: keywords and symbols (sometimes called “operators” or “filters”) that you can use to refine your search results. To do so effectively, however, you may also need persistence, creativity, patience and luck.

A brief history of dorking

Google dorking has been documented since the early 2000s. Like many other hacks, dorking is not technically sophisticated. It simply requires a small amount of obscure knowledge and some creativity.

Johnny Long, aka j0hnnyhax, was a pioneer of dorking. He first posted his definition of the newly coined term, googleDork, in 2002. Since then, its meaning has evolved to include other usages.

[Image: Johnny Long's 2002 definition of a googleDork]


An ordinary search query relies on a semantic way of asking for information – either by typing an entire question (“What is Google dorking?”) or by selecting important keywords (“Google dorking meaning”).

A dork refines that query, by combining technical and semantic elements, in order to take full advantage of the fact that web content is being constantly scanned and indexed by machines.

In a 2011 interview, Johnny Long said:

“In the years I’ve spent as a professional hacker, I’ve learned that the simplest approach is usually the best. As hackers, we tend to get down into the weeds, focusing on technology, not realising there may be non-technical methods at our disposal that work as well or better than their high-tech counterparts. I always kept an eye out for the simplest solution to advanced challenges.”

To dork or not to dork

By unleashing the full power of search engines, dorking can expose information on websites as well as vulnerabilities within them. This might include information that was supposed to stay in a password-protected folder but that ended up somewhere else. Or, it might include a setup script for a content management system (CMS) that still has the ability to perform administrative functions like adding users and changing passwords.

Dorking can strengthen your investigations by expanding your access to information that is of public interest but that is not, whether by design or by accident, readily available through search engines. It can also help you find digital security flaws in your own online services and publication platforms.


Note:

With great information access comes great ethical responsibility. While you can use these techniques, in a responsible manner, to extend your investigations, others can use them to obtain personal data or exploit vulnerabilities. As is often the case, intentions matter.


Safety First!

If you are thinking about using Google dorking as an investigative technique, there are several precautions to take before you start.

Google dorking demands an awareness of the legal issues involving accessing pages and files, even if they are on a public server. Although you are in most cases free to search at will on search engines, accessing certain webpages or downloading files from them can in some circumstances be a prosecutable offense, especially in the United States, in accordance with the vague and overreaching Computer Fraud and Abuse Act (CFAA). Moreover, since search queries are monitored and stored indefinitely by search providers and even governments, it’s possible that your searches could be recorded, identified as yours, and even used against you in the future.

To protect you in your research, we recommend using the Tor Browser or Tails (an operating system that routes all internet traffic through the Tor anonymity network) when Google dorking on any search engine. Tor masks your internet traffic, separating your computer’s identifying information from the webpages that you are accessing.

Tactical Tech’s Security-in-a-Box website includes detailed guides on how to use the Tor Browser on Linux, Mac and Windows, among others.

While the Tor Browser has become easier to use over the years, it can sometimes make your searches more difficult. Google and other search engines might ask you to solve CAPTCHAs to prove you’re human. Tor connects to the site you want to reach through a series of servers that communicate on your behalf and form what is called a Tor circuit. The last server to process your request and connect to the site you want is called a Tor exit node. If many automated programs (or bots) are using the same exit node, search engines might become suspicious of traffic from that node, even when you’re not the one running the automated program, and might block your searches entirely. If that happens, you can try changing your Tor circuit until you connect to an exit node that’s not blacklisted. To do so, click the site information icon (“ⓘ”) on the left side of the address bar and select “New Circuit for this Site,” as shown below.

[Image: Screenshot of how to request a new circuit in the Tor Browser]

Note that, depending on what country you are in, using Tor might flag your online activity as suspicious. Unless you are specifically targeted by an advanced attack, the Tor Browser is rather effective at preventing the association of your online identity with the websites you visit or the search terms you enter, but Tor does not hide the fact that you are using Tor.

This is a risk you must be willing to take when using Tor, though you can mitigate that risk to some extent by configuring the Tor Browser to use a Bridge with the “obfs4” pluggable transport. Using a Bridge tries to hide the fact that you are connecting to a Tor server, and using “obfs4” tries to make that connection look like something other than Tor traffic.

If you cannot use Tor, another option, though less effective in preserving your anonymity, would be using a VPN (Virtual Private Network).

VPNs work by disguising your IP address, which can be used by websites you visit to map where you are coming from. When using a VPN, rather than seeing your real IP address, sites you visit will see the IP of the VPN provider.

There are many VPN options, and deciding which one to pick can be confusing. To add to the confusion, most VPN reviews and listings are not independent, and some are heavily biased. ThatOnePrivacySite is a VPN review site we can endorse. We recommend choosing a VPN company that claims not to record logs of your traffic. While most free VPNs should be avoided because they often fund their operations by selling their log data (records of what sites users visit via the VPN), there are some reputable ones that we can endorse.

You can also use a privacy-aware search engine, such as DuckDuckGo, which supports some of the advanced search techniques covered below.

If you decide to proceed with an investigation that involves Google dorking, the following methods will help you get started and provide a comparison of supported dorks, as of March 2019, across various search engines.


How dorking works


In everyday use, search engines like Google, Bing, DuckDuckGo and Yahoo accept a search term (a word), or a string of search terms, and return matching results. But most search engines are programmed to accept more advanced “filters” or “prefix operators” as well. A filter is a keyword or phrase that has particular meaning for the search engine. This includes terms like:

  • inurl:
  • intext:
  • site:
  • feed:
  • language:

Note:

Each filter keyword ends with a colon (:) and is followed by the relevant search term or terms - with no space before or after the colon! We’ll show a few examples below.

At the end of the day, whether you call it something pretentious (like “advanced search engine query syntax”) or something silly, a dork is just a search that relies on these and other special keywords to obtain more significant results. Those results might include specific strings of text from the body of a website, for example, or files hosted at a specific web address.

Not all “advanced” search techniques rely on prefix filters like those shown above. Adding quotation marks (“all night pharmacies in Budapest”, for example) tells most search engines to match an exact phrase. Placing an all-caps OR between search terms (like pharmacies OR drugstores in Budapest) tells the search engine to return results with either term.

The following is a simple example of a dork that does rely on a prefix operator. It will search https://tacticaltech.org for all indexed PDF files hosted on that domain.

site:tacticaltech.org filetype:pdf

Another example, which returns all pages under the tacticaltech.org domain that have the word “invisible” in their titles, might look like this:

site:tacticaltech.org intitle:invisible

If you need to use a search term that contains multiple words, you can surround them with quotation marks:

site:tacticaltech.org intext:exposing intitle:"the invisible"

Dorks can also be paired with a general search term. For example:

exposing site:tacticaltech.org, or
exposing site:tacticaltech.org filetype:pdf

Here, ‘exposing’ is the general search term, and the filters site: and filetype: narrow down the results.

Example search results are shown below.

[Image: Example of searching for PDF files on tacticaltech.org in Google]

A similar search on a different domain, exposingtheinvisible.org, turns up no documents, suggesting that no public, indexed PDFs are hosted on that website.

[Image: Example of searching for PDF files on exposingtheinvisible.org in Google]


Tip:

The order of the terms you enter, including filters, does matter on some search engines, so it may be worth your while to try different combinations for more accurate or relevant results.
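The rules above (no space around the colon, quotation marks around multi-word values, filters combined with a general search term) can be sketched in a few lines of Python. The helper names `make_filter`, `build_dork` and `orderings` are hypothetical and exist only for this illustration; they simply assemble query strings that you would then paste into a search engine.

```python
from itertools import permutations


def make_filter(name, value):
    """Return 'name:value', quoting the value if it contains a space."""
    if " " in value:
        value = f'"{value}"'
    # No space before or after the colon.
    return f"{name}:{value}"


def build_dork(term, filters):
    """Join an optional general search term with (name, value) filters."""
    parts = [term] if term else []
    parts += [make_filter(name, value) for name, value in filters]
    return " ".join(parts)


def orderings(term, filters):
    """Return every ordering of the term and filters as query strings."""
    parts = [term] + [make_filter(name, value) for name, value in filters]
    return [" ".join(p) for p in permutations(parts)]


filters = [("site", "tacticaltech.org"), ("filetype", "pdf")]
print(build_dork("exposing", filters))
for variant in orderings("exposing", filters):
    print(variant)
```

Because term order can matter on some search engines, printing every ordering gives you a quick list of variants to try by hand.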

Dorking for Dummies

There are many different dorking operators, and they vary across search engines. To give you a general idea of what can be found, we have included four examples of dorks below. Even if two search engines support the same operators, they often return different results.

Repeating these searches across various search engines is a good way to get a sense of those differences. For a quick comparative reference, see the dorking operators used by Google, DuckDuckGo, Yahoo and Bing in the table below.

Example 1: Finding budgets on the US Homeland Security website

This dork will bring you all public, indexed Excel spreadsheets that contain the word ‘budget’:

budget filetype:xls

The ‘filetype:’ operator does not automatically recognise different versions of similar file formats (e.g. doc vs. odt or xlsx vs. csv), so each of these formats must be dorked separately:

budget filetype:xlsx OR filetype:csv
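If you need to cover many formats, a query like the one above can be generated rather than typed. The following is a minimal Python sketch; the function name `filetype_any` is hypothetical, and the all-caps OR follows the syntax described earlier.

```python
def filetype_any(term, extensions):
    """Build one query that matches the term in any of the given file types."""
    clause = " OR ".join(f"filetype:{ext}" for ext in extensions)
    return f"{term} {clause}"


print(filetype_any("budget", ["xlsx", "csv"]))
print(filetype_any("budget", ["xls", "xlsx", "ods", "csv"]))
```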

This dork will return PDF files on the NASA website:

site:nasa.gov filetype:pdf

And this dork will return .xls spreadsheets containing the word ‘budget’ on the United States Department of Homeland Security website:

budget site:dhs.gov filetype:xls

That final query, performed across various search engines, will return different results, as illustrated below.

Google

On Google, we had to solve a CAPTCHA.

[Image: Google example 1: CAPTCHA]

[Image: Google example 1: Finding budgets on the US Homeland Security website, search results]

Bing

[Image: Bing example 1: Finding budgets on the US Homeland Security website, search results]

Yahoo

[Image: Yahoo example 1: Finding budgets on the US Homeland Security website, search results]

DuckDuckGo

[Image: DuckDuckGo example 1: Finding budgets on the US Homeland Security website, search results]

As you can see, results vary between search engines.
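To repeat one dork across several search engines without retyping it, you can generate a search link for each of them. The Python sketch below assumes the commonly seen search URL formats for Google, Bing, Yahoo and DuckDuckGo; these endpoints and parameter names are assumptions that are not part of this guide and may change, and the function name `search_urls` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed search endpoints and query parameter names; verify before relying on them.
SEARCH_ENDPOINTS = {
    "Google": ("https://www.google.com/search", "q"),
    "Bing": ("https://www.bing.com/search", "q"),
    "Yahoo": ("https://search.yahoo.com/search", "p"),
    "DuckDuckGo": ("https://duckduckgo.com/", "q"),
}


def search_urls(query):
    """Return a dict mapping each engine name to a URL-encoded search link."""
    return {
        engine: f"{base}?{urlencode({param: query})}"
        for engine, (base, param) in SEARCH_ENDPOINTS.items()
    }


for engine, url in search_urls("budget site:dhs.gov filetype:xls").items():
    print(engine, url)
```

The script only builds links; opening them in the Tor Browser, as recommended above, remains a manual step.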

Example 2: London house prices

Another interesting example looks at housing prices in London. Below are the results from the following query, which we submitted to four different search engines:

filetype:xls "house prices" AND "London"

[Image: Google example 2: London house prices search results]

[Image: Bing example 2: London house prices search results]

[Image: Yahoo example 2: London house prices search results]

[Image: DuckDuckGo example 2: London house prices search results]

Example 3: Looking for the Indian government’s security plans

For our final example we will locate documents containing the words ‘security plan’ on Indian government websites. Below are the results from the following query:

filetype:doc "security plan" site:gov.in

[Image: Google example 3: Looking for the Indian government’s security plans, search results]

[Image: Bing example 3: Looking for the Indian government’s security plans, search results]

[Image: Yahoo example 3: Looking for the Indian government’s security plans, search results]

[Image: DuckDuckGo example 3: Looking for the Indian government’s security plans, search results]

Hopefully, after seeing the examples above, you can think of a few websites you’d like to search using similar techniques.

In the following section, we will share a few of the dorks that we have found particularly useful and discuss how they work with different search engines.

Dork It Yourself

Below is a list of the relevant dorks we identified and updated as of March 2019. This list might not be exhaustive, but the operators should help you get started.

We collected and tested these dorks across search engines with the help of the following resources:

[Table: Dorking operators for Google, DuckDuckGo, Yahoo and Bing]

Dork | Description
cache:[url] | Shows the version of the web page from the search engine’s cache.
related:[url] | Finds web pages that are similar to the specified web page.
info:[url] | Presents some information that Google has about a web page, including similar pages, the cached version of the page, and sites linking to the page.
site:[url] | Finds pages only within a particular domain and all its subdomains.
intitle:[text] or allintitle:[text] | Finds pages that include a specific keyword as part of the indexed title tag. You must include a space between the colon and the query for the operator to work in Bing.
allinurl:[text] | Finds pages that include a specific keyword as part of their indexed URLs.
meta:[text] | Finds pages that contain the specific keyword in the meta tags.
filetype:[file extension] | Searches for specific file types.
intext:[text], allintext:[text], inbody:[text] | Searches the text of the page. For Bing and Yahoo the query is inbody:[text]. For DuckDuckGo the query is intext:[text]. For Google either intext:[text] or allintext:[text] can be used.
inanchor:[text] | Searches link anchor text.
location:[iso code] or loc:[iso code], region:[region code] | Searches for a specific region. For Bing use location:[iso code] or loc:[iso code], and for DuckDuckGo use region:[iso code]. An ISO location code is a short code for a country; for example, Egypt is eg and USA is us. See https://en.wikipedia.org/wiki/ISO_3166-1
contains:[text] | Identifies sites that contain links to the file types specified (e.g. contains:pdf).
altloc:[iso code] | Searches for a location in addition to the one specified by the language of the site (e.g. pt-us or en-us).
feed:[feed type, e.g. rss] | Finds RSS feeds related to the search term.
hasfeed:[url] | Finds webpages that contain both the term or terms for which you are querying and one or more RSS or Atom feeds.
ip:[ip address] | Finds sites hosted at a specific IP address.
language:[language code] | Returns websites that match the search term in a specified language.
book:[title] | Searches for book titles related to keywords.
maps:[location] | Searches for maps related to keywords.
linkfromdomain:[url] | Shows websites whose links are mentioned in the specified URL (with errors).
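Because the text-searching filter differs between search engines, it can help to pick the operator automatically. The Python sketch below encodes only what the table states about intext: and inbody:; the function name `text_dork` is hypothetical, and the mapping reflects behaviour tested as of March 2019, so it may be out of date.

```python
# Text-search operator per engine, as described in the table above.
TEXT_OPERATOR = {
    "google": "intext",
    "duckduckgo": "intext",
    "bing": "inbody",
    "yahoo": "inbody",
}


def text_dork(engine, text):
    """Return the page-text filter for the given engine and search text."""
    operator = TEXT_OPERATOR[engine.lower()]
    if " " in text:
        # Multi-word values need quotation marks.
        text = f'"{text}"'
    return f"{operator}:{text}"


print(text_dork("Google", "budget"))
print(text_dork("Bing", "security plan"))
```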

DorkDorkGo

We included the most widely used search engines in the analysis above, but our preferred service is DuckDuckGo, which is a privacy-focused search engine that claims not to collect personal information about its users and that saves search queries in such a way that they cannot be attributed to specific users.

That said, if you are doing sensitive research, it still makes sense to use the Tor Browser, in combination with DuckDuckGo, to further protect your privacy. And fortunately, DuckDuckGo is much less likely than Google to block Tor users or make them solve CAPTCHAs.

DuckDuckGo also has a useful feature called “bang,” which allows you to query other search engines without leaving the DuckDuckGo website. To do so, you start your search with an exclamation mark followed by a qualifier, which is normally an abbreviation for a specific search provider. Note that if DuckDuckGo is your browser’s default search engine, you can use bangs in your address bar as well.

[Image: DuckDuckGo Bangs]

For example, starting your search with the !w bang allows you to search Wikipedia directly, while !twitter, followed by your search terms, will return relevant Twitter posts. You can find thousands of bang shortcuts here: https://duckduckgo.com/bang.

Suppose you wanted to look up the Wikipedia entry for ‘dorking’. The following query will take you to Wikipedia’s search engine.

!w dorking


Safety First!

Note that searches made with bangs are not covered by DuckDuckGo’s privacy policy, because the searches themselves are carried out by other services (Wikipedia, in our example).

Because it’s an exact match, you will end up on the ‘dorking’ Wikipedia entry itself, although that entry uses the word with a different meaning than ours.

[Image: DuckDuckGo Bangs search result for !w dorking]
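A bang query is just an exclamation mark, a qualifier and your search terms, so it is easy to assemble in code. The Python sketch below builds the query and a DuckDuckGo link for it; the URL format `https://duckduckgo.com/?q=` is an assumption that is not taken from this guide, and both function names are hypothetical.

```python
from urllib.parse import urlencode


def bang_query(bang, terms):
    """Return a bang query such as '!w dorking'."""
    # Accept the qualifier with or without its leading exclamation mark.
    return f"!{bang.lstrip('!')} {terms}"


def duckduckgo_url(query):
    """Return a DuckDuckGo search link (assumed URL format)."""
    return "https://duckduckgo.com/?" + urlencode({"q": query})


query = bang_query("w", "dorking")
print(query)
print(duckduckgo_url(query))
```

The same caveat applies as for typing the bang yourself: the search is ultimately carried out by the other service.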

Other privacy-aware search engines

For general searching, we also recommend StartPage, which is a search engine that returns Google results using a privacy filter that reduces the amount of personal information that Google can collect about your searches.

As important as it is to use privacy-aware search engines in your day-to-day browsing, the Tor Browser should offer enough protection to let you dork across other search engines when necessary.

Defensive dorking


You can use dorking to protect your own data and to defend websites for which you are responsible. We call this “defensive dorking,” and it typically takes one of two forms:

  • Checking for security vulnerabilities in an online service, such as a website or an FTP server, that you administer; or
  • Looking for sensitive information about yourself - or about someone else, with their permission - that might be exposed unintentionally on a website, regardless of whether or not you administer that website.

This advice is primarily concerned with the latter type of dorking but we will first introduce a database that might help you or your service administrators with the former.

Checking for security vulnerabilities

The Google Hacking Database (GHDB) suggests various keywords and other terms that you can use, along with the site:yoursite.org filter, to identify certain vulnerabilities.

While these searches may help attackers locate vulnerable services, they also help administrators protect their own. We recommend that you coordinate with the technical administrator of the service you want to test (unless of course that’s you) before trying them out.

Looking for sensitive information

To look for sensitive information, we recommend starting with the following simple commands, along with the site:yoursite.org filter. You can then remove the site: filter to discover which other websites might be exposing information about you or your organisation. Below are a few examples.

You can search for your name in PDF documents with:

<your name> filetype:pdf

You can repeat this search with other potentially relevant filetypes, such as xls, xlsx, doc, docx, ods or odt. You can even look for several different file types in one search:

<your name> filetype:pdf OR filetype:xlsx OR filetype:docx

Or you can search for your name in regular website content with something like the following. (See the table above for information about whether your search engine of choice uses intext: or inbody: as the text-searching filter.)

<your name> intext:"<personal information like a phone number or address>"

Safety First!

Be careful, though. If you search for your name or address and then, say, your social security number, you are essentially giving that information to whoever runs the search engine. Even the Tor Browser cannot protect you from that sort of privacy leak.

You can also search for information associated with the IP address of your servers:

ip:[your server’s IP address] filetype:pdf
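Before running an ip: search, it can be worth checking that the address is well formed, since a typo would silently search for something else. The Python sketch below uses the standard library's `ipaddress` module for that check; the function name `ip_dork` is hypothetical, and 192.0.2.10 is a reserved documentation address used here only as a placeholder.

```python
import ipaddress


def ip_dork(address, extension=None):
    """Build an ip: query, optionally restricted to one file type."""
    # ip_address() raises ValueError if the address is not valid IPv4 or IPv6.
    ip = ipaddress.ip_address(address)
    query = f"ip:{ip}"
    if extension:
        query += f" filetype:{extension}"
    return query


print(ip_dork("192.0.2.10", "pdf"))
```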

For more examples, have a look at Exploit Database’s list of Files Containing Juicy Info.


Example: Finding passwords

Searching for login and password information can be useful as a defensive dork. Passwords are sometimes stored in publicly accessible documents on webservers. Dorking is one way to identify security vulnerabilities like this.

The easiest way to try this out, while leaving your ethics intact, is to restrict your searches to a website that you manage or to one that is managed by someone from whom you can seek permission. Test the following dorks in different search engines:

password filetype:doc site:yoursite.org
password filetype:docx site:yoursite.org
password filetype:pdf site:yoursite.org
password filetype:xls site:yoursite.org
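If you manage several websites, you may prefer to generate this set of queries for each one. Here is a minimal Python sketch, assuming the same four file types as above; the function name `password_dorks` is hypothetical, and yoursite.org is a placeholder for a site you are permitted to test.

```python
def password_dorks(site, extensions=("doc", "docx", "pdf", "xls")):
    """Return one 'password' query per file type, restricted to one site."""
    return [f"password filetype:{ext} site:{site}" for ext in extensions]


for query in password_dorks("yoursite.org"):
    print(query)
```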

In order to avoid calling out any particular company or organisation, we tried this search without the ‘site:’ filter. Doing so put certain responsibilities on us:

  • Not to share any passwords we might view or download,
  • To encrypt any files we might download,
  • Not to test or use any passwords we might learn, and
  • To notify the administrator of any website on which we might find an exposed password list.

Google’s results linked to files that contained actual usernames and passwords for two institutions, including a North American high school. We obscured these results, in the screenshot below, and notified the school that their data was vulnerable. The list of passwords has since been removed.

[Image: Google dorking passwords search results]

[Image: Bing dorking passwords search results]

[Image: Yahoo dorking passwords search results]

[Image: DuckDuckGo dorking passwords search results]

As you can see, the various search engines once again produced different results. Some of them did not include the documents mentioned above in their first few pages of results. Also, both Yahoo and DuckDuckGo returned a few non-document results, including, for whatever reason, a collection of Cajun recipes.

Varying results of this sort are to be expected when dorking; some queries work better than others, and results differ among search engines.


Published April 2019

Glossary

Bang – a nerdy nickname for the exclamation point (“!”).

Blacklist – a list of blocked websites and other internet services that cannot be accessed due to a restrictive filtering policy.

Bot – also called a web robot or internet bot, a software application that runs automated tasks over the internet. For example, a Twitter bot that posts automated messages and news feeds.

CAPTCHA – an automated test used by websites and online services to determine whether a user is a human or a robot. For example, a test asking users to identify all traffic lights in a series of nine pictures.

Content Management System (CMS) – software used to manage content that is later rendered into pages on the internet.

Directory – a container used to categorise files or other containers of files and data.

Domain name – a name that is commonly used to access a website (e.g. tacticaltech.org). Domain names are translated into IP addresses.

Defensive dork – dorking to identify vulnerabilities that might affect your own data or the websites for which you are responsible.

Dorking – a technique of using search engines to their full potential by employing refined searches and prefix operators.

Dork – as in Google dork, the person using the dorking technique.

Filter – in the context of web searches, a keyword or phrase that has particular meaning for the search engine.

FTP server – a software application that runs the File Transfer Protocol (FTP), which is used to transfer files between computers over the internet.

Hack – the practice of interacting with technology in unexpected ways in order to learn more about it. (It has also gained malicious uses and connotations.)

Hacker – traditionally, anyone who interacts with technology in unexpected ways in order to learn more about it. In a negative context, a malicious computer criminal who may be trying to access sensitive information or take control of someone’s computer.

Internet Protocol (IP) address – a set of numbers used to identify a computer or data location you are connecting to. Example: 213.108.108.217

Prefix operator – special text that is added before the searched text in a search bar. For example, “site:worldbank.org filetype:pdf” will look for all the .pdf files on the World Bank site.

Script – a list of commands executed by a program to automate processes, e.g. visit a URL every two seconds and save the data that is returned.

Search Engine Optimisation (SEO) – a method of influencing the organic (not paid) visibility of a website or a webpage in search engines. For example, by using certain ways of constructing titles and content or linking to/from multiple sources.

Search syntax – keywords and symbols, sometimes called “operators” or “filters,” that you can use to refine your internet search results.

Search string – the combination of words, numbers and other characters we use when searching for information in search engines.

Server – a computer that remains on and connected to the internet in order to provide some service, such as hosting a webpage or sending and receiving email, to other computers.

Tor Browser – a browser that keeps your online activities private. It disguises your identity and protects your web traffic from many forms of internet surveillance. It can also be used to bypass internet filters.

Uniform Resource Locator (URL) – a web address used to retrieve a page or data on a network or the internet.

Virtual Private Network (VPN) – software that creates an encrypted “tunnel” from your device to a server run by your VPN service provider. Websites and other online services will receive your requests from, and return their responses to, the IP address of that server rather than your actual IP address.

Web domain – a name commonly used to access a website, which translates into an IP address.

Web interface – a graphical user interface in the form of a web page that is accessed through an internet browser.

Webpage – a document that is accessible via the internet and displayed in a web browser.

Web server – also known as an internet server, a system that hosts websites and delivers their content and services to end users over the internet. It includes hardware (physical server machines that store the information) and software that facilitates users’ access to the content.

Website administrator – the person responsible for managing the systems behind a website. Also called a webmaster.