Search Smarter by Dorking¶
by Gabi Sobliye
In Short: A look at advanced internet searches with “Google dorking,” how they work across different search engines, and how you can use the technique in your investigations. Mind the tips on protecting your privacy while searching, and safeguarding your personal information from those who might use this technique for malicious purposes.
When investigating, you often need to gather as much information as possible about a topic. Advanced search techniques can help to uncover files or leads that are relevant to the questions you are trying to answer. For example, you may be able to find a company’s tax returns or a local government’s expenditure reports, information that may not appear on their websites or show up when you do a regular web search.
Google dorking (also known as Google hacking) is a technique used by newsrooms, investigative reporting organisations, security auditors and tech-savvy criminals to query search engines in order to find hidden information that might be available on public websites or to identify evidence of digital security vulnerabilities. This technique can be used on most search engines, not just Google’s, so we typically refer to it simply as “dorking.”
Dorking involves using search engines to their full potential to unearth results that are not visible with a regular search. It allows you to refine your searches and dive deeper, and with greater precision, into webpages and documents that are available online. Uncovering hidden files and security flaws by dorking does not require a great deal of technical knowledge. It really boils down to learning just a few search techniques and using them across a number of search engines.
All you need to carry out a Google dork is a computer, an internet connection and a basic understanding of the appropriate search syntax: keywords and symbols (sometimes called “operators” or “filters”) that you can use to refine your search results. To do so effectively, however, you may also need persistence, creativity, patience and luck.
A brief history of dorking¶
Google dorking has been documented since the early 2000s. Like many other hacks, dorking is not technically sophisticated. It simply requires a small amount of obscure knowledge and some creativity.
Johnny Long, aka j0hnnyhax, was a pioneer of dorking. He first posted his definition of the newly coined term, googleDork, in 2002. Since then, its meaning has evolved to include other usages.
Johnny Long's 2002 definition of a googleDork
An ordinary search query relies on a semantic way of asking for information – either by typing an entire question (“What is Google dorking?”) or by selecting important keywords (“Google dorking meaning”).
A dork refines that query, by combining technical and semantic elements, in order to take full advantage of the fact that web content is being constantly scanned and indexed by machines.
In a 2011 interview, Johnny Long said:
“In the years I’ve spent as a professional hacker, I’ve learned that the simplest approach is usually the best. As hackers, we tend to get down into the weeds, focusing on technology, not realising there may be non-technical methods at our disposal that work as well or better than their high-tech counterparts. I always kept an eye out for the simplest solution to advanced challenges.”
To dork or not to dork¶
By unleashing the full power of search engines, dorking can expose information on websites as well as vulnerabilities within them. This might include information that was supposed to stay in a password-protected folder but that ended up somewhere else. Or, it might include a setup script for a content management system (CMS) that still has the ability to perform administrative functions like adding users and changing passwords.
Dorking can strengthen your investigations by expanding your access to information that is of public interest but that is not, whether by design or by accident, readily available through search engines. It can also help you find digital security flaws in your own online services and publication platforms.
With great information access comes great ethical responsibility. While you can use these techniques, in a responsible manner, to extend your investigations, others can use them to obtain personal data or exploit vulnerabilities. As is often the case, intentions matter.
If you are thinking about using Google dorking as an investigative technique, there are several precautions to take before you start.
Google dorking demands an awareness of the legal issues involved in accessing pages and files, even if they are on a public server. Although you are in most cases free to search at will on search engines, accessing certain webpages or downloading files from them can in some circumstances be a prosecutable offense, especially in the United States under the vague and overreaching Computer Fraud and Abuse Act (CFAA). Moreover, since search queries are monitored and stored indefinitely by search providers and even governments, it’s possible that your searches could be recorded, identified as yours, and even used against you in the future.
To protect you in your research, we recommend using the Tor Browser or Tails (an operating system that routes all internet traffic through the Tor anonymity network) when Google dorking on any search engine. Tor masks your internet traffic, separating your computer’s identifying information from the webpages that you are accessing.
While the Tor Browser has become easier to use over the years, it can sometimes make your searches more difficult. Google and other search engines might ask you to solve CAPTCHAs to prove you’re human. Tor connects to the site you want to reach through a series of servers that communicate on your behalf and form what is called a Tor circuit. The last server to process your request and connect to the site you want is called a Tor exit node. If many automated programs (or bots) are using the same exit node, search engines might become suspicious of your activities even when you’re not the one running the automated program, and they might block your searches entirely. If that happens, you can try changing your Tor circuit until you connect to an exit node that’s not blacklisted. To do so, click the site information icon (“ⓘ”) on the left side of the address bar and select “New Circuit for this Site,” as shown below.
Screenshot of how to request a new circuit in Tor web browser
Note that, depending on what country you are in, using Tor might flag your online activity as suspicious. Unless you are specifically targeted by an advanced attack, the Tor Browser is rather effective at preventing the association of your online identity with the websites you visit or the search terms you enter, but Tor does not hide the fact that you are using Tor.
This is a risk you must be willing to take when using Tor, though you can mitigate that risk to some extent by configuring the Tor Browser to use a Bridge with the “obfs4” pluggable transport. Using a Bridge tries to hide the fact that you are connecting to a Tor server, and using “obfs4” tries to make that connection look like something other than Tor traffic.
If you cannot use Tor, another option, though less effective in preserving your anonymity, would be using a VPN (Virtual Private Network).
VPNs work by disguising your IP address, which can be used by websites you visit to map where you are coming from. When using a VPN, rather than seeing your real IP address, sites you visit will see the IP of the VPN provider.
There are many VPN options and it can be confusing when deciding which one to pick. To add to the confusion, most VPN reviews and listings are not independent, and some are heavily biased. ThatOnePrivacySite is a VPN review site we can endorse. We recommend that you choose a VPN company that claims not to record logs of your traffic. While most free VPNs should be avoided because they often fund their operation by selling their log data (records of what sites users visit via the VPN), there are some reputable ones we can endorse, such as:
You can also use a privacy-aware search engine, such as DuckDuckGo, which supports some of the advanced search techniques covered below.
If you decide to proceed with an investigation that involves Google dorking, the following methods will help you get started and provide a comparison of supported dorks, as of March 2019, across various search engines.
How dorking works¶
In everyday use, search engines like Google, Bing, DuckDuckGo and Yahoo accept a search term (a word), or a string of search terms, and return matching results. But most search engines are programmed to accept more advanced “filters” or “prefix operators” as well. A filter is a keyword or phrase that has particular meaning for the search engine. This includes terms like:
Each filter keyword ends with a colon (:) and is followed by the relevant search term or terms - with no space before or after the colon! We’ll show a few examples below.
At the end of the day, whether you call it something pretentious (like “advanced search engine query syntax”) or something silly, a dork is just a search that relies on these and other special keywords to obtain more significant results. Those results might include specific strings of text from the body of a website, for example, or files hosted at a specific web address.
Not all “advanced” search techniques rely on prefix filters like those shown above. Adding quotation marks (“all night pharmacies in Budapest”, for example) tells most search engines to match an exact phrase. Placing an all-caps OR between search terms (like pharmacies OR drugstores in Budapest) tells the search engine to return results with either term.
Another example, which returns all websites under the tacticaltech.org domain that have the word “invisible” in their titles, might look like this:
If you need to use a search term that contains multiple words, you can surround them with quotation marks:
site:tacticaltech.org intext:exposing intitle:"the invisible"
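The way filters and quoted phrases combine into one query can be sketched in a few lines of Python. This is only an illustration of the syntax described above, not a tool provided by any search engine; the helper name `build_dork` is hypothetical.

```python
def build_dork(*terms, **filters):
    """Combine plain search terms and filter:value pairs into one query string.

    build_dork is a hypothetical helper used for illustration only.
    """
    parts = list(terms)
    for name, value in filters.items():
        # A value containing spaces must be quoted to be read as one phrase.
        if " " in value:
            value = f'"{value}"'
        # The filter keyword and its value are joined by a colon, with no spaces.
        parts.append(f"{name}:{value}")
    return " ".join(parts)


print(build_dork(site="tacticaltech.org", intext="exposing", intitle="the invisible"))
# prints: site:tacticaltech.org intext:exposing intitle:"the invisible"
```

The function only assembles text; you would still paste the result into the search engine of your choice.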
Dorks can also be paired with a general search term. For example:
exposing site:tacticaltech.org, or
exposing site:tacticaltech.org filetype:pdf
Here, ‘exposing’ is the general search term, and the filters site: and filetype: narrow down the results.
Example search results are shown below.
Example of searching for pdf in TacticalTech.org in Google
A similar search on a different domain, exposingtheinvisible.org, turns up no documents, which suggests that no PDFs hosted on that website have been indexed by the search engine.
Example of searching for pdf in exposingtheinvisible.org in Google
The order of the terms you enter, including filters, does matter on some search engines, so it may be worth your while to try different combinations for more accurate or relevant results.
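If you want to try different orderings systematically, a short script can list them for you. The sketch below is one possible approach using Python's standard library; the function name `query_orderings` is hypothetical, and the example parts are taken from the queries shown earlier.

```python
from itertools import permutations


def query_orderings(parts):
    """Return every ordering of the given query parts as a search string."""
    return [" ".join(ordering) for ordering in permutations(parts)]


# Three parts give six possible orderings to try on each search engine.
for query in query_orderings(["exposing", "site:tacticaltech.org", "filetype:pdf"]):
    print(query)
```

The number of orderings grows quickly with the number of parts, so this is only practical for short queries.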
Dorking for Dummies¶
There are many different dorking operators, and they vary across search engines. To give you a general idea of what can be found, we have included four examples of dorks below. Even if two search engines support the same operators, they often return different results.
Repeating these searches across various search engines is a good way to get a sense of those differences. For a quick comparative reference, see the dorking operators used by Google, DuckDuckGo, Yahoo and Bing in the table below.
Example 1: Finding budgets on the US Homeland Security website¶
This dork will bring you all public, indexed Excel spreadsheets that contain the word ‘budget’:
The ‘filetype:’ operator does not automatically recognise different versions of similar file formats (e.g. doc vs. odt or xlsx vs. csv), so each format must be listed explicitly:
budget filetype:xlsx OR filetype:csv
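Because every format has to be named, it can help to generate the OR-joined filter from a list of extensions. The following is a minimal sketch; `any_filetype` is a hypothetical helper name, and it simply reproduces the query pattern shown above.

```python
def any_filetype(term, extensions):
    """Build one query matching a term in any of several file formats."""
    # Each extension gets its own filetype: filter, joined by an all-caps OR.
    alternatives = " OR ".join(f"filetype:{ext}" for ext in extensions)
    return f"{term} {alternatives}"


print(any_filetype("budget", ["xlsx", "csv"]))
# prints: budget filetype:xlsx OR filetype:csv
```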
This dork will return PDF files on the NASA website:
And this dork will return .xls spreadsheets containing the word ‘budget’ on the United States Department of Homeland Security website:
budget site:dhs.gov filetype:xls
That final query, performed across various search engines, will return different results, as illustrated below.
On Google, we had to solve a CAPTCHA.
Google example 1: Captcha
Google example 1: Finding budgets on the US Homeland Security website search results
Bing example 1: Finding budgets on the US Homeland Security website search results
Yahoo example 1: Finding budgets on the US Homeland Security website search results
DuckDuckGo example 1: Finding budgets on the US Homeland Security website search results
As you can see, results vary between search engines.
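To repeat one dork across several search engines, you can prepare a search link for each of them. The sketch below only builds URLs and does not send any request. The endpoint addresses and query parameter names are assumptions based on how these sites commonly structure their search links, so check them before relying on them.

```python
from urllib.parse import urlencode

# Assumed search endpoints and query parameter names; verify before use.
ENGINES = {
    "Google": ("https://www.google.com/search", "q"),
    "Bing": ("https://www.bing.com/search", "q"),
    "Yahoo": ("https://search.yahoo.com/search", "p"),
    "DuckDuckGo": ("https://duckduckgo.com/", "q"),
}


def search_urls(query):
    """Return one URL-encoded search link per engine for the same dork."""
    return {
        name: f"{base}?{urlencode({param: query})}"
        for name, (base, param) in ENGINES.items()
    }


for name, url in search_urls("budget site:dhs.gov filetype:xls").items():
    print(f"{name}: {url}")
```

You can then open each link in the Tor Browser and compare the results side by side.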
Example 2: London house prices¶
Another interesting example looks at housing prices in London. Below are the results from the following query, which we submitted to four different search engines:
filetype:xls "house prices" AND "London"
Google example 2: London house prices search results
Bing example 2: London house prices search results
Yahoo example 2: London house prices search results
DuckDuckGo example 2: London house prices search results
Example 3: Looking for the Indian government’s security plans¶
For our final example we will locate documents containing the words ‘security plan’ on Indian government websites. Below are the results from the following query:
filetype:doc "security plan" site:gov.in
Google example 3: Looking for the Indian government’s security plans search results
Bing example 3: Looking for the Indian government’s security plans search results
Yahoo example 3: Looking for the Indian government’s security plans search results
DuckDuckGo example 3: Looking for the Indian government’s security plans search results
Hopefully, after seeing the examples above, you can think of a few websites you’d like to search using similar techniques.
In the following section, we will share a few of the dorks that we have found particularly useful and discuss how they work with different search engines.
Dork It Yourself¶
Below is a list of the relevant dorks we identified and updated as of March 2019. This list might not be exhaustive, but the operators should help you get started.
We collected and tested these dorks across search engines with the help of the following resources:
- Advanced Search Operators for Yahoo, Bing and Google, from Bruce Clay inc.
- Google hacking entry from Wikipedia
- DuckDuckGo official search syntax
- Bing Advanced Search Tricks from Microsoft
- Google Search Help, about refining web searches
- Google, Yahoo and Live Search operators
[Table: Dorking operators for Google, DuckDuckGo, Yahoo and Bing]¶
|cache:[url]||Shows the version of the web page from the search engine’s cache.||✓|
|related:[url]||Finds web pages that are similar to the specified web page.||✓|
|info:[url]||Presents some information that Google has about a web page, including similar pages, the cached version of the page, and sites linking to the page.||✓|
|site:[url]||Finds pages only within a particular domain and all its subdomains.||✓||✓||✓||✓|
|intitle:[text] or allintitle:[text]||Finds pages that include a specific keyword as part of the indexed title tag. You must include a space between the colon and the query for the operator to work in Bing.||✓||✓||✓||✓|
|allinurl:[text]||Finds pages that include a specific keyword as part of their indexed URLs.||✓|
|meta:[text]||Finds pages that contain the specific keyword in the meta tags.||✓|
|filetype:[file extension]||Searches for specific file types.||✓||✓||✓||✓|
|intext:[text], allintext:[text], inbody:[text]||Searches text of page. For Bing and Yahoo the query is inbody:[text]. For DuckDuckGo the query is intext:[text]. For Google either intext:[text] or allintext:[text] can be used.||✓||✓||✓||✓|
|inanchor:[text]||Search link anchor text||✓|
|location:[iso code] or loc:[iso code], region:[region code]||Search for specific region. For Bing use location:[iso code] or loc:[iso code] and for DuckDuckGo use region:[iso code]. An ISO location code is a short code for a country; for example, Egypt is eg and USA is us. https://en.wikipedia.org/wiki/ISO_3166-1||✓||✓|
|contains:[text]||Identifies sites that contain links to filetypes specified (e.g. contains:pdf)||✓|
|altloc:[iso code]||Searches for location in addition to one specified by language of site (e.g. pt-us or en-us)||✓|
|feed:[feed type, i.e. rss]||Find RSS feed related to search term||✓||✓||✓|
|hasfeed:[url]||Finds webpages that contain both the term or terms for which you are querying and one or more RSS or Atom feeds.||✓||✓|
|ip:[ip address]||Find sites hosted by a specific ip address||✓||✓|
|language:[language code]||Returns websites that match the search term in a specified language||✓||✓|
|book:[title]||Searches for book titles related to keywords||✓|
|maps:[location]||Searches for maps related to keywords||✓|
|linkfromdomain:[url]||Shows websites whose links are mentioned in the specified url (with errors)||✓|
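Since the filter for searching page text has a different name on different engines, a small lookup can keep your queries consistent. This sketch uses only the engine-to-filter pairs stated in the table above (intext: for Google and DuckDuckGo, inbody: for Bing and Yahoo); the function and dictionary names are hypothetical.

```python
# Body-text filter per search engine, as listed in the table above.
BODY_TEXT_FILTER = {
    "google": "intext",
    "duckduckgo": "intext",
    "bing": "inbody",
    "yahoo": "inbody",
}


def body_text_dork(engine, text, site=None):
    """Build a body-text search using the filter name the engine expects."""
    operator = BODY_TEXT_FILTER[engine.lower()]
    # Quote multi-word text so that it is matched as an exact phrase.
    value = f'"{text}"' if " " in text else text
    query = f"{operator}:{value}"
    if site:
        query += f" site:{site}"
    return query


print(body_text_dork("Bing", "security plan", site="gov.in"))
# prints: inbody:"security plan" site:gov.in
```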
We included the most widely used search engines in the analysis above, but our preferred service is DuckDuckGo, which is a privacy-focused search engine that claims not to collect personal information about its users and that saves search queries in such a way that they cannot be attributed to specific users.
That said, if you are doing sensitive research, it still makes sense to use the Tor Browser, in combination with DuckDuckGo, to further protect your privacy. And fortunately, DuckDuckGo is much less likely than Google to block Tor users or make them solve CAPTCHAs.
DuckDuckGo also has a useful feature called “bang,” which allows you to query other search engines without leaving the DuckDuckGo website. To do so, you start your search with an exclamation mark followed by a qualifier, which is normally an abbreviation for a specific search provider. Note that if DuckDuckGo is your browser’s default search engine, you can use bangs in your address bar as well.
For example, starting your search with the !w bang allows you to search Wikipedia directly.
Suppose you wanted to look up the Wikipedia entry for ‘dorking’. The following query will take you to Wikipedia’s search engine:
!w dorking
And, because it’s an exact match, you will end up on the ‘dorking’ Wikipedia entry itself, although with a different meaning than ours.
DuckDuckGo Bangs search result for !w dorking
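The structure of a bang query (an exclamation mark, a qualifier, then your search terms) is simple enough to express in one line of Python. The helper name `bang_query` is hypothetical, and `w` is the only qualifier taken from this guide.

```python
def bang_query(bang, terms):
    """Prefix search terms with a DuckDuckGo bang such as 'w' for Wikipedia."""
    # Accept the qualifier with or without its leading exclamation mark.
    return f"!{bang.lstrip('!')} {terms}"


print(bang_query("w", "dorking"))
# prints: !w dorking
```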
Other privacy-aware search engines¶
For general searching, we also recommend StartPage, which is a search engine that returns Google results using a privacy filter that reduces the amount of personal information that Google can collect about your searches.
As important as it is to use privacy-aware search engines in your day-to-day browsing, the Tor Browser should offer enough protection to let you dork across other search engines when necessary.
You can use dorking to protect your own data and to defend websites for which you are responsible. We call this “defensive dorking,” and it typically takes one of two forms:
- Checking for security vulnerabilities in an online service, such as a website or an FTP server, that you administer; or
- Looking for sensitive information about yourself - or about someone else, with their permission - that might be exposed unintentionally on a website, regardless of whether or not you administer that website.
This advice is primarily concerned with the latter type of dorking but we will first introduce a database that might help you or your service administrators with the former.
Checking for security vulnerabilities¶
The Google Hacking Database (GHDB) suggests various keywords and other terms that you can use - along with the site:yoursite.org filter - in order to identify certain vulnerabilities.
While these searches may help attackers locate vulnerable services, they also help administrators protect their own. We recommend that you coordinate with the technical administrator of the service you want to test (unless of course that’s you) before trying them out.
Looking for sensitive information¶
To look for sensitive information, we recommend starting with the following simple commands, along with the site:yoursite.org filter. You can then remove the site: filter to discover which other websites might be exposing information about you or your organisation. Below are a few examples.
You can search for your name in PDF documents with:
<your name> filetype:pdf
You can repeat this search with other potentially relevant filetypes, such as xls, xlsx, doc, docx, ods or odt. You can even look for several different file types in one search:
<your name> filetype:pdf OR filetype:xlsx OR filetype:docx
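The two-step approach described in this section (search your own site first, then drop the site: filter) can be sketched as follows. This is an illustration under stated assumptions: `exposure_queries` is a hypothetical helper, "Jane Doe" is a placeholder name, and wrapping the name in quotation marks for an exact-phrase match is our addition rather than part of the queries above.

```python
def exposure_queries(name, extensions, site):
    """Return two document searches for a name: one site only, one web-wide."""
    filetypes = " OR ".join(f"filetype:{ext}" for ext in extensions)
    # Quote the name so that it is matched as an exact phrase (an assumption).
    everywhere = f'"{name}" {filetypes}'
    own_site = f"{everywhere} site:{site}"
    return [own_site, everywhere]


for query in exposure_queries("Jane Doe", ["pdf", "xlsx", "docx"], "yoursite.org"):
    print(query)
# prints:
# "Jane Doe" filetype:pdf OR filetype:xlsx OR filetype:docx site:yoursite.org
# "Jane Doe" filetype:pdf OR filetype:xlsx OR filetype:docx
```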
Or you can search for your name in regular website content with something like the following. (See the table above for information about whether your search engine of choice uses intext: or inbody: as the relevant filter.)
<your name> intext:"<personal information like a phone number or address>"
Be careful, though. If you search for your name or address and then, say, your social security number, you are essentially giving that information to whoever runs the search engine. Even the Tor Browser cannot protect you from that sort of privacy leak.
You can also search for information associated with the IP address of your servers:
ip:[your server’s IP address] filetype:pdf
For more examples, have a look at Exploit Database’s list of Files Containing Juicy Info.
Example: Finding passwords
Searching for login and password information can be useful as a defensive dork. Passwords are sometimes stored in publicly accessible documents on webservers. Dorking is one way to identify security vulnerabilities like this.
The easiest way to try this out, while leaving your ethics intact, is to restrict your searches to a website that you manage or to one that is managed by someone from whom you can seek permission. Test the following dorks in different search engines:
password filetype:doc site:yoursite.org
password filetype:docx site:yoursite.org
password filetype:pdf site:yoursite.org
password filetype:xls site:yoursite.org
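If you administer several sites, you may prefer to generate this set of dorks rather than type each one. The sketch below reproduces the four queries above for any domain you pass in; `password_dorks` is a hypothetical helper name and `yoursite.org` is the same placeholder used in this guide.

```python
def password_dorks(site, filetypes=("doc", "docx", "pdf", "xls")):
    """Return one 'password' dork per file type, restricted to a single site."""
    return [f"password filetype:{ext} site:{site}" for ext in filetypes]


for dork in password_dorks("yoursite.org"):
    print(dork)
```

Keeping the site: filter in every generated query limits the search to services you are responsible for or have permission to test.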
In order to avoid calling out any particular company or organisation, we tried this search without the ‘site:’ filter. Doing so put certain responsibilities on us:
- Not to share any passwords we might view or download,
- To encrypt any files we might download,
- Not to test or use any passwords we might learn, and
- To notify the administrator of any website on which we might find an exposed password list.
Google’s results linked to files that contained actual usernames and passwords for two institutions, including a North American high school. We obscured these results, in the screenshot below, and notified the school that their data was vulnerable. The list of passwords has since been removed.
Google dorking passwords search results
Bing dorking passwords search results
Yahoo dorking passwords search results
DuckDuckGo dorking passwords search results
As you can see, the various search engines once again produced different results. Some of them did not include the documents mentioned above in their first few pages of results. Also, both Yahoo and DuckDuckGo returned a few non-document results, including, for whatever reason, a collection of Cajun recipes.
Such varying results are to be expected when dorking; some queries work better than others, and results differ among search engines.
Published April 2019
Articles and Guides¶
- Bing Query Language Guide (to download from archived page stored by the Internet Archive’s Wayback Machine) and Operators explained, from Microsoft Bing.
- DuckDuckGo Search guide. A set of tips and guidelines on how to conduct advanced searches with the DuckDuckGo search engine.
- Google Searches. A set of tips and guidelines on how to conduct advanced searches with the Google search engine.
- Google hacking, from Wikipedia.
- Investigative Online Search, from The Center for Investigative Journalism. A guide on basic and advanced internet research.
- Search Commands for Google, Yahoo and “Live Search”, from searchcommands.com.
Tools and Databases¶
- Advanced Search Operators for Yahoo, Bing and Google, guide and cheatsheet from Bruce Clay inc.
- Google Advanced Search Operators: The Complete List, by Joshua Hardwick, Ahrefs.
Bang - is a nerdy nickname for the exclamation point (“!”).
Blacklist - a list of blocked websites and other Internet services that cannot be accessed due to a restrictive filtering policy.
Bot – also called web robot or internet bot, is a software application that runs automated tasks over the internet. For example, a Twitter bot that posts automated messages and news feeds.
CAPTCHA – an automated test used by websites and online services to determine whether a user is human or robot. For example, a test asking users to identify all traffic lights in a series of nine pictures.
Content Management System (CMS) - software used to manage content that is later rendered into pages on the internet.
Directory – a container used to categorise files or other containers of files and data.
Domain name - a name that is commonly used to access a website (e.g. tacticaltech.org). Domain names are translated into IP addresses.
Defensive dork – dorking to identify vulnerabilities that might affect your own data or the websites for which you are responsible.
Dorking - a technique of using search engines to their full potential by employing refined searches and prefix operators.
Dork – as in Google dork, the person using the dorking technique.
Filter – in web search context, it is a keyword or phrase that has particular meaning for the search engine.
FTP server - a software application that runs the File Transfer Protocol (FTP), which is used to transfer files between computers over the internet.
Hack – the practice of interacting with technology in unexpected ways in order to learn more about it. (It has also gained malicious uses and connotations.)
Hacker - traditionally, anyone who interacts with technology in unexpected ways in order to learn more about it. In a negative context, a malicious computer criminal who may be trying to access sensitive information or take control of someone’s computer.
Internet Protocol (IP) address – a set of numbers used to identify a computer or data location you are connecting to. Example: 188.8.131.52
Prefix operator - special text that is added before the searched text in a search bar. For example, “site:worldbank.org filetype:pdf” will look for all the indexed .pdf files on the World Bank site.
Script – a list of commands executed by a program to automate processes, e.g. visit a URL every two seconds and save the data that is returned.
Search Engine Optimisation (SEO) – a method of influencing the organic (not paid) visibility of a website or a webpage in search engines. For example, by using certain ways of constructing titles and content or linking to/from multiple sources.
Search syntax - keywords and symbols, sometimes called “operators” or “filters,” that you can use to refine your internet search results.
Search string – the combination of words, numbers and other characters we use when searching for information in search engines.
Server - a computer that remains on and connected to the Internet in order to provide some service, such as hosting a webpage or sending and receiving email, to other computers.
Tor Browser – a browser that keeps your online activities private. It disguises your identity and protects your web traffic from many forms of internet surveillance. It can also be used to bypass internet filters.
Uniform Resource Locator (URL) – a web address used to retrieve a page or data on a network or the internet.
Virtual Private Network (VPN) - software that creates an encrypted “tunnel” from your device to a server run by your VPN service provider. Websites and other online services will receive your requests from - and return their responses to - the IP address of that server rather than your actual IP address.
Web domain – a name commonly used to access a website which translates into an IP address.
Web interface – a graphical user interface in the form of a web page that is accessed through the internet browser.
Webpage – a document that is accessible via the internet, displayed in a web browser.
Web server – also known as an internet server, a system that hosts websites and delivers their content and services to end users over the internet. It includes hardware (physical server machines that store the information) and software that facilitates users’ access to the content.
Website administrator – the person responsible for managing the systems behind a website. Also called a webmaster.