Crowdsourcing Evidence for Investigations
by Tetyana Bohdanova
In Short: This guide provides you with essential techniques, tools, examples and considerations to be aware of when planning and managing a crowdsourcing effort. It also demonstrates the role of crowdsourcing as a method increasingly used by journalists, activists and researchers who engage with communities to collect information, verify it and build evidence that can help expose issues affecting societies everywhere.
Around the world, increasing access to technology and social media makes gathering information directly from those affected by a social issue, a natural disaster, or a conflict easier than ever before. Evidence gathering, verification and investigation are no longer exclusively undertaken by journalists and the media, but also by people and communities determined to expose problems and events that affect their lives, to advocate for change, and to seek justice.
Crowdsourcing is an increasingly valuable method of gathering and corroborating information, as well as of reaching out to sources and to wider communities. This is especially the case when approaching local issues, widespread injustice, or unfolding events where it is difficult to keep track of constant changes without involving those immediately affected. We also approach crowdsourcing as an opportunity for the media and other civil society members to collaborate with communities in a fair, ethical, and safe manner.
What is crowdsourcing?
The term crowdsourcing was coined by Jeff Howe in a 2006 Wired Magazine article, where he defined it as a new way of sourcing labor enabled by the internet. Many types of commercial and non-commercial crowdsourcing have emerged since: for example, Wikipedia and Kickstarter are crowdsourcing projects, drawing on collective knowledge and crowdfunding respectively.
Crowdsourcing is often used by journalists. The Tow Center for Digital Journalism of Columbia University defines journalism crowdsourcing as “the act of specifically inviting a group of people to participate in a reporting task – such as news gathering, data collection, or analysis – through a targeted, open call for input; personal experiences; documents; or other contributions.”
Examples of crowdsourcing initiatives used by journalists:
In 2022, the Bureau of Investigative Journalism and OjoPúblico revealed the extent to which schoolchildren in Lima, Peru, are exposed to cigarette adverts and displays placed on sweet and food stands in shops near their schools. The journalists used crowdsourced images shared by people from across Lima’s Metropolitan Area and mapped schools according to the intensity of this exposure. The investigation went on to show how Big Tobacco companies were driving this approach in Peru and other regions, while playing a different game in more strictly regulated areas such as Europe.
In 2022, the Ukrainian government set up an official platform, WarCrimes, coordinated by the Office of the Prosecutor General of Ukraine, where anyone can submit evidence of crimes committed by the occupying Russian military.
German media outlet CORRECTIV set up its own CrowdNewsroom, which runs collaborative investigation and reporting projects based on suggestions, grievances and data gathered from concerned communities. People can contribute ideas and data in response to calls and surveys, and even take part in collective investigations. In CORRECTIV’s own words, “those affected - e.g. tenants, soccer players, cyclists and many more - share knowledge, important data and insights and thus create the basis for a journalistic analysis” (source: https://crowdnewsroom.org/en/). For example, in 2019, CrowdNewsroom investigated who really owns property and controls rents in Hamburg’s non-transparent real-estate market, using data about rental contracts collected from the city’s inhabitants.
In 2018, news outlet ABC Australia conducted the country’s biggest crowdsourced investigation into aged care by gathering citizens’ personal experiences with the system.
In 2014, Netherlands-based outlet Bellingcat relied heavily on crowdsourced multimedia materials and verification efforts during its flagship investigation into the downing of Malaysia Airlines Flight 17 (MH17) over Ukraine.
For over 15 years, crowdsourcing has been used to map crisis information, enabling so-called “activist mapping”: a type of activism that combines crowdsourcing, citizen journalism, and geospatial information for social change or public accountability. Crowdmapping is frequently used in disaster relief efforts, including in combination with other technologies such as drones or satellite imagery.
Examples of activist mapping initiatives using crowdsourcing:
From 2021 to 2023, an academic and humanitarian research project run in collaboration with the Humanitarian OpenStreetMap Team (HOT) and OpenStreetMap Ethiopia (OSME) crowdmapped food security in Ethiopia.
In 2020, at the height of the COVID-19 pandemic, Harry Machmud from the Humanitarian OpenStreetMap Team used the Ushahidi platform for “Mapping Handwashing Stations in Indonesia”, a project to help prevent the spread of COVID-19.
Since 2014, the collaborative project Missing Maps has used OpenStreetMap to map disaster-affected areas that are “literally ‘missing’ from open and accessible maps”, so that first responders and other humanitarian organizations can better prioritize their relief efforts.
In addition, various initiatives employ crowdsourcing for better governance, accountability, and defending human rights.
Some initiatives using crowdsourcing for governance, accountability, and human rights purposes:
Starting in 2016, Amnesty International crowdsourced the labor of at least 50,000 digital volunteers to help investigate human rights abuses through its Decoders initiative (now archived).
Since 2010, the “I Paid a Bribe” participatory platform has crowdmapped anonymous reports of corruption across India. The initiative has since expanded to five continents.
Launched in 2007, mySociety’s open source FixMyStreet platform collects citizen reports about street problems across the UK, maps them, and forwards them to the councils responsible for fixing them.
In 2020, women activists started adding services often overlooked by men to a popular crowdsourced map of Mexico City.
Crowdsourcing is increasingly used in investigations in combination with other methods, such as Open Source Intelligence (OSINT). For example, Ukrainian activists combine these two frameworks for documenting the ongoing Russian military invasion.
Is crowdsourcing right for you?
Despite its widespread use, crowdsourcing carries notable risks and disadvantages, which should be weighed carefully in each case.
Pros of crowdsourcing | Cons of crowdsourcing |
---|---|
Allows tapping into a vast pool of data otherwise inaccessible to organizers | Carries the danger of data manipulation |
Allows engaging diverse contributors | May require a lot of know-how and resources |
Can help save time and costs | Carries the danger of coming up empty-handed |
Opens new avenues for collaboration with contributors and/or others working in the same space | May carry potential risks to organizers and contributors |
There are ways of mitigating risks such as data manipulation (see the section on verification), but when the cons outweigh the pros, consider alternatives to crowdsourcing.
Steps of crowdsourcing evidence
1. Define purpose
Why you want to crowdsource is the key question to answer before setting up a crowdsourcing effort.
Whether your aim is to tell the complete story of an event, like ProPublica’s Electionland project investigating problems during the 2022 US federal elections; to engage citizens around an important issue, such as monitoring air quality around the world; or to bring perpetrators to justice, as the Ukrainian government does by collecting digital evidence of Russian war crimes, the goal you set is likely to influence the type of data you collect, its format, the extent to which you will attempt to verify it, and how you will present your findings.
The clearer you are when setting your goal(s), the easier it will be to deal with other important aspects of crowdsourcing outlined below.
2. Consider ethics and safety
Before engaging in crowdsourcing, it is important to consider a range of issues connected with ethics and safety. These include the accuracy of the information, the privacy and security of contributors and your team, ownership of the data collected, and the accessibility of your effort. Some of these aspects may also have legal implications, so it is wise to seek legal counsel before launching a crowdsourcing project if you feel you are navigating uncertain conditions.
Any risks and security considerations that may affect your contributors and team need to be carefully assessed before you engage in crowdsourcing. In general, it is highly recommended to conduct a risk assessment and to have a risk mitigation plan in place. For additional safety tips when conducting evidence collection and collaborative projects, read the Safety First guide.
While you may have strategies and tools at your disposal to protect your team, it is important to be clear about any risks to your potential contributors. Your role is to ensure that your target audience(s) are aware of the risks and able to make an informed decision about whether to take part in your crowdsourcing effort. In turn, you should take every measure to protect contributors’ privacy and anonymity where needed, by using data submission platforms and tools that protect people’s identities and communications. At times this may mean taking extra steps to anonymize the data before further processing.
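If submissions reach you with contact details attached, one way to anonymize them before analysis is to drop direct identifiers and replace the phone number or email with a keyed hash, so that repeat submissions from the same person can still be grouped. The Python sketch below assumes submissions arrive as dictionaries with hypothetical field names (`name`, `phone`, `email`); it illustrates one step, not a complete anonymization procedure. Free-text reports and metadata embedded in photos and videos (such as GPS coordinates) can still identify people and need separate review.

```python
import hashlib
import hmac
import secrets

# Hypothetical field names that directly identify a contributor; adapt to your own form.
IDENTIFYING_FIELDS = {"name", "phone", "email"}


def pseudonymize(submission: dict, secret_key: bytes) -> dict:
    """Return a copy of `submission` without direct identifiers.

    The phone number (or email) is replaced by a keyed hash (HMAC-SHA256), so
    repeat submissions from the same person can still be linked during analysis
    without the analysts seeing who sent them.
    """
    contact = submission.get("phone") or submission.get("email")
    cleaned = {k: v for k, v in submission.items() if k not in IDENTIFYING_FIELDS}
    if contact:
        digest = hmac.new(secret_key, contact.encode("utf-8"), hashlib.sha256)
        cleaned["contributor_id"] = digest.hexdigest()[:16]
    else:
        cleaned["contributor_id"] = None  # no contact details were given
    return cleaned


if __name__ == "__main__":
    # Generate the key once and store it separately from the data it protects.
    key = secrets.token_bytes(32)
    raw = {"name": "A. Person", "phone": "+000 0000000", "region": "Region A", "report": "..."}
    print(pseudonymize(raw, key))
```

Because phone numbers are easy to guess, the hash protects contributors only as long as the key stays secret: keep it apart from the data, and destroy it once you no longer need to link submissions.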
Also note that sometimes your contributors may not come from the subset of the population most affected by the issue(s) you are gathering information about, such as in this crowdsourced investigation about homelessness in the UK by the London-based Bureau of Investigative Journalism.
3. Define audience, scope, and duration
Clearly defining your target audience(s) is another key to the success of a crowdsourcing effort.
Think about who your main potential contributors are and whether you are able to identify and reach them (note that they may be different from the end beneficiaries of your crowdsourcing effort):
Consider such demographic characteristics as age, gender, or geographic location of your contributors.
If you do not want your data to be coming from just one subset of the population, ask yourself whether the information about your efforts can reach marginalized groups or whether the tools you are using are equally accessible to everyone or are likely to widen existing digital gaps.
Note that crowdsourced data usually cannot be representative in a sociological sense. However, you may still want it to come from a variety of locations and different groups, to present a more comprehensive picture with your findings.
The goal(s) of your crowdsourcing effort will directly shape its scope and duration. For instance, how long the crowdsourcing runs will depend on whether you organize the effort around a particular event or an ongoing phenomenon. Another consideration is the (sometimes limited) resources available to you.
In short, you must make the best of what you have, while making sure you can still collect enough data to work with. What that “sufficient” volume of data looks like is another thing for you and your team to define.
4. Identify the best method
Choosing the right approach depends on your goals. For example, journalists may set up a secure channel for citizens to send in tips anonymously; rights defenders may encourage victims to submit evidence of abuse in whatever format they have it; and election observers may ask voters to categorize the irregularities they witness according to pre-set criteria.
The extent to which you need the crowdsourced data to fit a strict format for analysis will determine whether you should use what the Crowdsourcing guide by the Tow Center for Digital Journalism of Columbia University calls a “structured” or an “unstructured” call-out for submissions:
In the “structured” type, journalists target certain groups with a specific request for information. The information is usually provided in a predefined format and captured in a searchable database.
In the “unstructured” type, the public is invited through an open call to contact journalists via various channels (email, telephone, SMS, online polling software, etc.) and contribute votes, calls, tips, or any other material they wish to submit to a news organization or journalist. Submissions are usually open-ended rather than following a predefined format.
The benefit of crowdsourcing specific data in a structured way is that evidence of a specific type is aggregated in a unified format, which makes it easier to analyze. However, a stricter format may limit your target audiences’ ability to contribute.
Unstructured call-outs, in turn, let you crowdsource a greater variety of data from potentially larger numbers of contributors, without limiting yourself and your audiences to the types of reports you expect to receive. At the same time, verifying and analyzing data that arrives through a variety of channels in multiple formats can be far more labor- and time-intensive.
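To illustrate what a predefined format can look like in practice, here is a minimal Python sketch of a structured call-out for an election-monitoring effort. The category list, field names and dates are hypothetical. The point is that each submission must match a fixed set of fields and categories, and can then be stored in a searchable database (here, SQLite from Python’s standard library).

```python
import sqlite3
from dataclasses import astuple, dataclass

# Hypothetical pre-set categories for an election-monitoring call-out.
CATEGORIES = {"ballot_stuffing", "voter_intimidation", "procedural_violation", "other"}


@dataclass
class Report:
    """One structured submission; every report has exactly these fields."""
    category: str
    region: str
    reported_at: str  # ISO 8601 date and time, e.g. "2024-03-17T09:30"
    description: str

    def __post_init__(self) -> None:
        # Reject submissions that do not fit the predefined categories.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def store(conn: sqlite3.Connection, reports: list[Report]) -> None:
    """Save reports to a searchable SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reports "
        "(category TEXT, region TEXT, reported_at TEXT, description TEXT)"
    )
    conn.executemany(
        "INSERT INTO reports VALUES (?, ?, ?, ?)", [astuple(r) for r in reports]
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(conn, [
        Report("ballot_stuffing", "Region A", "2024-03-17T09:30", "..."),
        Report("other", "Region B", "2024-03-17T11:05", "..."),
        Report("other", "Region A", "2024-03-17T12:40", "..."),
    ])
    query = "SELECT category, COUNT(*) FROM reports GROUP BY category ORDER BY category"
    print(conn.execute(query).fetchall())
    # prints: [('ballot_stuffing', 1), ('other', 2)]
```

Rejecting unknown categories at submission time is what makes later counting reliable. It is also exactly the kind of restriction that can discourage contributors whose experience does not fit the list. An “other” category with a free-text description is one possible compromise.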
Examples:
Structured crowdsourcing: In 2017, while investigating the alarmingly high number of maternal deaths in the US, NPR and ProPublica correspondents published a questionnaire aimed at women who had experienced life-threatening complications in childbirth.
Unstructured crowdsourcing: In 2016, a reporter at The Correspondent published an open appeal to Shell employees, asking them to share information about whether the company knew it was directly contributing to climate change, or to otherwise help with his investigation.
Sometimes, a mix of approaches may be used, especially in large collaborative projects or where evidence needs to be cross-referenced across multiple streams of data.
5. Identify the right tools
It is easy to feel overwhelmed, or to get overly excited about a particular technical tool for crowdsourcing. However, it is important to choose a tool that aligns with the goals and needs of your crowdsourcing effort, rather than designing the crowdsourcing operation around the tool. When making a choice, consider:
Technical environment: Do most members of your target audience have access to the internet? What kind of connection do they have? Do they have access to mobile devices or smartphones? If so, what types or models of phones are they most likely to use? Also consider their overall level of computer and digital skills.
Privacy and security: Consider whether participating in your crowdsourcing effort carries any risks for your target audience. If so, it is critical to do everything in your power to offer a secure channel of communication and to protect the identity of your contributors. Although people may be more lenient about security than you would expect, you should make sure they are aware of the risks and of the extent to which you are able to mitigate them.
Using existing tools vs. building or introducing something new: Users are often reluctant to change their technology habits. Find out which tools your target audience already uses (e.g. social networks, instant messengers) and consider integrating them into your crowdsourcing effort. If you choose to develop and introduce a dedicated crowdsourcing tool, bear in mind that, despite your best efforts, getting people to use it may take a while, or may not succeed at all.
Some popular communication tools and their pros and cons
There are multiple secure communication tools that are used by journalists and activists. While no technology is 100% secure, there are instruments that attempt to create a safer environment than normal communication channels (such as telephone, social media, email) provide. No one tool is best for everyone, so it is important to carefully consider individual circumstances of your prospective contributor(s).
Examples of secure communication channels that can be used in crowdsourcing:
Tool | Characteristics | Trade-offs |
---|---|---|
Signal: https://signal.org/ | Signal is a free and open source secure messaging app for iOS, Android and desktop, developed by the Signal Foundation (originally by Open Whisper Systems). It encrypts all communication end to end, making message content accessible only to the sender and recipient. | Signal is not nearly as popular as WhatsApp or other end-to-end encrypted messengers. Users must register with a phone number, but they now have the option of not sharing that number with contacts (using a username instead). A major plus is that Signal collects virtually no metadata about your contacts or messages, which makes it very difficult to infer details about your communication from your use of the app. |
Pretty Good Privacy (PGP) email encryption | PGP is an encryption standard popular among journalists for securing email. It uses public key cryptography: each user has a "public key" that others use to encrypt messages to them, and which can be shared with anyone. Each user also has a corresponding "private key", used to decrypt messages encrypted to their public key, which should never be shared. Examples of email encryption software include GPG Suite for Mac, Gpg4win for Windows, GnuPG (available on most Linux distributions), Thunderbird (which has built-in OpenPGP support, replacing the discontinued Enigmail extension), and Mailvelope. | PGP requires a certain level of technical knowledge and training before a regular computer or smartphone owner can use it. Other secure communication channels may offer a comparable level of protection while being more user-friendly. |
Protonmail: https://protonmail.com/ | ProtonMail is an email service with a free tier and fully integrated PGP encryption. This means that with ProtonMail, anybody can use PGP, regardless of their technical knowledge. It also prevents anyone, including ProtonMail itself, from reading or sharing your stored emails, a concept known as zero-access encryption. | While free and easy to use, by default ProtonMail communicates with external email accounts without end-to-end encryption. This means that the external email provider on the other side may have access to emails sent from ProtonMail, so it is a good idea to exchange sensitive information only within the ProtonMail service. |
SecureDrop: https://securedrop.org/ | SecureDrop is an open source whistleblower submission system that news organizations can install to receive documents and tips from sources safely and anonymously. It is available in 20 languages and is used by many news organizations worldwide, including The New York Times, The Washington Post, ProPublica, The Globe and Mail, and The Intercept. | While SecureDrop lets any organization that installs it fully own its servers, minimizes metadata, encrypts data and enforces a series of other strong security practices, it can be costly and difficult to set up yourself. |
Tella: https://tella-app.org/ | Tella is a free open source mobile data collection application designed for environments with limited internet connectivity and high security risks. It is currently available on Android in a number of languages. | While Tella is relatively easy to use and is customizable to the needs of an organization using it, deploying the application still requires training users and some technical skill for backend server setup. |
OnionShare: https://onionshare.org/ | OnionShare is an open source tool that uses the Tor network to share files securely and anonymously, peer-to-peer. It can be used in combination with other communication tools. | You will need a separate secure communication channel to give the recipient the OnionShare address, so it may be better suited to smaller-scale initiatives and work with whistleblowers than to large crowdsourcing projects. |
Tresorit: https://send.tresorit.com/ | Tresorit is a paid cloud storage service, but its file-sending service is free, end-to-end encrypted, and supports up to 5GB of data per link. | Tresorit’s code is proprietary, but it has been audited by third parties. Anyone who intercepts a link can access the data, and links are not password protected by default. |
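The asymmetry behind PGP can be shown with textbook RSA and deliberately tiny numbers: anyone can encrypt with your public key, but only your private key can decrypt. This Python sketch is a teaching toy only. Real PGP implementations use much larger keys, padding and hybrid encryption, so never use code like this to protect actual data; use the tools listed above instead.

```python
# TEACHING TOY ONLY: textbook RSA with tiny primes and no padding.
# Real PGP uses large keys and hybrid encryption; never protect data with this.


def make_keypair(p: int, q: int, e: int = 17) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return (public_key, private_key) built from two primes p and q."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+); requires gcd(e, phi) == 1
    return (e, n), (d, n)


def encrypt(message: int, public_key: tuple[int, int]) -> int:
    """Anyone who has the public key can do this (message must be smaller than n)."""
    e, n = public_key
    return pow(message, e, n)


def decrypt(ciphertext: int, private_key: tuple[int, int]) -> int:
    """Only the holder of the private key can undo the encryption."""
    d, n = private_key
    return pow(ciphertext, d, n)


public_key, private_key = make_keypair(61, 53)
ciphertext = encrypt(65, public_key)
print(ciphertext, decrypt(ciphertext, private_key))  # prints: 2790 65
```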
WhatsApp and Telegram: use and vulnerabilities
Many people are active on commonly used platforms such as WhatsApp and Telegram.
Like Signal, WhatsApp stores users’ phone numbers. It is owned by Meta (formerly Facebook) and shares users’ phone numbers and analytics with the social media company. Meta can also be compelled to share its troves of user data in response to a court order, subpoena or law enforcement request. Due to the large amounts of metadata WhatsApp stores and the company’s eagerness to share data with law enforcement, you need to be extra careful when using this tool, particularly in sensitive investigations. WhatsApp may also be backing up your messages unencrypted to iCloud or Google Drive; this feature can be turned off in the app’s settings. For an extra layer of privacy, you can turn on end-to-end encrypted backup.
Telegram is widely used by communities to share information, but its server-side code is closed source, and regular chats and groups are not end-to-end encrypted by default (only one-to-one “Secret Chats” are). As a result, security experts cannot vouch for its safety.
6. Engage the target audience(s)
Successfully engaging the members of a community that you want to contribute data is half the success of your crowdsourcing effort. In other words, you may do everything else right but if no one participates, all of your work would be in vain. Therefore, it is critical to think about community engagement in advance. Here are a few recommendations:
Be sure to consider the particular social and political conditions in which you operate. If contributing data to your effort involves a certain degree of risk, people would be reluctant to do so if they do not believe that tangible change may come out of it. Think about how you may demonstrate results and close the feedback loop with your audience (including when contributors remain anonymous).
Think about how you can get community members interested in and excited about being part of your crowdsourcing effort. Sometimes, this may mean that crowdsourcing should be preceded by awareness-raising and trust-building work. You may choose to run an information campaign about a particular issue, engage with opinion leaders, build trust with the most active members of the community, and so on.
Examples of audience engagement:
The Global Investigative Journalism Network cites different community outreach methods including community meetings and listening events, run as a part of various audience engagement campaigns.
Internews provides a separate resource on community engagement called Listening Post Collective.
Crowdsourcing through the “snowballing effect”
Try to create a “snowballing effect” by making sure some contributions come in as soon as possible after you launch your effort. When members of the community see others actively participating, they are more likely to get involved themselves, so every new submission makes the initiative more attractive to other potential contributors.
To achieve this, some initiatives start by publishing the data their own members uncovered alongside data (to be) submitted by their target audience. Others mix crowdsourced data with that collected internally via OSINT techniques.
However, if you do combine data collected in different ways, make sure to distinguish between these types of data when publishing.
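One simple way to keep data collected in different ways distinguishable is to attach a provenance label to every record the moment it enters your dataset, and to report the breakdown alongside your findings. The labels in this Python sketch (`crowdsourced`, `osint`, `team_documented`) are hypothetical; use whatever categories match how your project actually gathers data.

```python
from collections import Counter

# Hypothetical provenance labels; use categories that fit your own project.
SOURCES = ("crowdsourced", "osint", "team_documented")


def tag(records: list[dict], source: str) -> list[dict]:
    """Return copies of `records` labelled with where they came from."""
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    return [{**r, "source": source} for r in records]


def source_breakdown(records: list[dict]) -> dict:
    """Count records per source, so the split can be stated alongside the findings."""
    counts = Counter(r["source"] for r in records)
    return {s: counts.get(s, 0) for s in SOURCES}


if __name__ == "__main__":
    team = tag([{"photo": "a.jpg"}, {"photo": "b.jpg"}], "team_documented")
    crowd = tag([{"photo": "c.jpg"}, {"photo": "d.jpg"}, {"photo": "e.jpg"}], "crowdsourced")
    print(source_breakdown(team + crowd))
    # prints: {'crowdsourced': 3, 'osint': 0, 'team_documented': 2}
```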
Examples of crowdsourcing with the “snowballing effect”:
“Wall Evidence” crowdsources photos of inscriptions left by Russian soldiers in Ukraine. The initiative first published photos its members had documented themselves in the Kyiv region and later the Chernihiv region, both de-occupied in early spring 2022. The team then called on residents of these territories to submit photos using simple means such as email and popular messengers.
Documentary filmmakers of “Anyone’s Child: Mexico” have set up a free telephone line in Mexico, so that families who lost a loved one in the drug war are able to share their stories. Anyone calling may also listen to the stories of other contributors.
7. Develop a verification protocol
Verifying crowdsourced data is extremely important. Depending on the type and format of data that you are attempting to collect, think carefully about how much verification you would want and be able to carry out. Here are a few steps to consider:
Before developing a verification process, decide how much verification you deem “enough” for the data to be publicized.
If you are unable to verify the data but want to publicize it, think about “vetting” the data first, by cross-referencing it against other known sources or factors. For example, you may want to ask questions like:
Do received contributions resemble the data we expect to receive?
Do we have any other verified information that directly or indirectly confirms this data?
Is the data coming in at the expected time? (e.g. for a time-specific event such as voting, data about alleged voting rights violations cannot arrive before the polls open at a specific place)
Is the data coming from an expected location?
Provide a clear explanation of the extent to which you are able to verify the data when publicizing your findings. Clearly mark “unverified” data if choosing to publish.
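The vetting questions about timing and location can be partly automated as plausibility checks. The Python sketch below assumes an election-day call-out with a hypothetical polling opening time and a rough bounding box for the area covered. It flags reports that fail these checks and gives each report a label for publication. Passing the checks does not make a report verified; it only means nothing obviously contradicts it, so such reports are still marked “unverified”.

```python
from datetime import datetime

# Hypothetical parameters for an election-day call-out; replace with your own.
POLLS_OPEN = datetime(2024, 3, 17, 8, 0)
AREA = (50.2, 30.2, 50.6, 30.9)  # rough bounding box: min_lat, min_lon, max_lat, max_lon


def vetting_flags(report: dict) -> list[str]:
    """Return reasons a report looks implausible; an empty list means no red flags."""
    flags = []
    if datetime.fromisoformat(report["reported_at"]) < POLLS_OPEN:
        flags.append("reported before the polls opened")
    min_lat, min_lon, max_lat, max_lon = AREA
    if not (min_lat <= report["lat"] <= max_lat and min_lon <= report["lon"] <= max_lon):
        flags.append("location outside the area covered")
    return flags


def status(report: dict, verified_ids: set) -> str:
    """Label a report for publication."""
    if report["id"] in verified_ids:
        return "verified"  # confirmed by your own verification process
    if vetting_flags(report):
        return "flagged"  # fails a basic plausibility check: review before any use
    return "unverified"  # plausible, but must be clearly marked as unverified


if __name__ == "__main__":
    report = {"id": 7, "reported_at": "2024-03-17T07:15", "lat": 50.45, "lon": 30.52}
    print(status(report, verified_ids=set()), vetting_flags(report))
    # prints: flagged ['reported before the polls opened']
```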
Due to its nature, crowdsourcing bears an inherent risk of manipulation, particularly if you are up against malign entities that can utilize bots or organize users to intentionally corrupt your data. If you have indications that data has been purposefully corrupted, consider carefully whether the findings should be publicized at all.
Whether to mitigate the risk of crowdsourced data being corrupted by malign actors or to provide additional avenues for verification, many organizations choose to supplement crowdsourcing with other data collection methods. For instance, in disaster relief work it is not uncommon to combine data generated by social media users with drone footage or satellite imagery. In investigating rights abuses, crowdsourced reports may be further documented / verified by trained personnel on the ground.
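When you suspect bots or coordinated users, a few simple signals can help you decide which submissions to examine more closely: for example, the same text submitted many times, or one contributor sending an unusual number of reports in a short period. The Python sketch below computes both. The thresholds and the `contributor_id` field (assumed to be a pseudonymous identifier assigned on receipt) are hypothetical. These are heuristics, not proof: many witnesses to one event can produce similar reports, and a dedicated observer may legitimately submit many.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def duplicate_texts(reports: list[dict], min_copies: int = 3) -> dict[str, int]:
    """Texts submitted at least `min_copies` times, ignoring case and extra spaces."""
    normalized = Counter(" ".join(r["text"].lower().split()) for r in reports)
    return {text: n for text, n in normalized.items() if n >= min_copies}


def bursts(reports: list[dict], window: timedelta = timedelta(minutes=10),
           max_in_window: int = 5) -> set[str]:
    """IDs of contributors who sent more than `max_in_window` reports within `window`."""
    times = defaultdict(list)
    for r in reports:
        times[r["contributor_id"]].append(datetime.fromisoformat(r["reported_at"]))
    flagged = set()
    for contributor, stamps in times.items():
        stamps.sort()
        start = 0
        for end in range(len(stamps)):
            # Slide the window's start forward until it spans at most `window`.
            while stamps[end] - stamps[start] > window:
                start += 1
            if end - start + 1 > max_in_window:
                flagged.add(contributor)
                break
    return flagged
```

Flagged submissions are candidates for closer review, not for automatic deletion.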
Example of addressing verification of crowdsourced data:
Russian election violation crowdmapping platform “Karta Narusheniy” provides a disclaimer stating that reports of election irregularities are voluntarily contributed by users and are published without additional verification by the website administrators, in order to bring the data promptly to the attention of election administration bodies and law enforcement. When a serious incident is uncovered, a mobile field team attempts to investigate the situation on the ground promptly. The map of violations is a project of the Russian independent election monitoring organization “Golos”, which was disbanded under pressure from the Russian government and now operates as an unregistered civil movement.
8. Analyse data and present collected evidence
Remember that those contributing data and the final beneficiaries of your work may be different. Try to present your findings in a format that will be accessible to the audience(s) that you want to share them with. This may affect the format in which you want to crowdsource data.
Example of analysis of crowdsourced data:
The website of “I Paid a Bribe”, a project tracking corruption in India, highlights the map of the country according to the density of reports coming from each region, publishes individual reports in real time, totals them by category, and provides an overview of resulting news publications.
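Totals like those shown on the “I Paid a Bribe” website can be produced with simple counting. This Python sketch is an illustration, not that project’s code. It counts reports per region and per category, and, as one optional choice, normalizes regional counts by population so that densely populated regions do not dominate a map. The field names, categories and population figures are hypothetical.

```python
from collections import Counter


def summarize(reports: list[dict]) -> tuple[Counter, Counter]:
    """Count reports per region and per category."""
    by_region = Counter(r["region"] for r in reports)
    by_category = Counter(r["category"] for r in reports)
    return by_region, by_category


def per_100k(by_region: Counter, population: dict[str, int]) -> dict[str, float]:
    """Reports per 100,000 inhabitants (one optional way to express density)."""
    return {region: round(n * 100_000 / population[region], 1) for region, n in by_region.items()}


if __name__ == "__main__":
    reports = [  # hypothetical reports
        {"region": "Region A", "category": "police"},
        {"region": "Region A", "category": "licences"},
        {"region": "Region B", "category": "police"},
    ]
    by_region, by_category = summarize(reports)
    print(by_category.most_common(1))  # prints: [('police', 2)]
    print(per_100k(by_region, {"Region A": 1_000_000, "Region B": 50_000}))
    # prints: {'Region A': 0.2, 'Region B': 2.0}
```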
It is extremely important to present crowdsourced data in an honest and truthful manner. To ensure your findings are perceived as credible:
Describe both your methods of data collection and how you arrived at your conclusions.
On your website and any other materials, clearly state whether and to what extent you have been able to verify crowdsourced data.
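A statement of how much of the data you were able to verify is more convincing when it gives concrete numbers. This Python sketch assumes each report already carries a verification status label (the labels are hypothetical) and turns those labels into a sentence you could adapt for a methods section.

```python
from collections import Counter


def verification_note(statuses: list[str]) -> str:
    """Summarize how many reports carry each verification status."""
    if not statuses:
        return "No reports were received."
    counts = Counter(statuses)
    total = len(statuses)
    # Sort by label so the sentence is stable from one run to the next.
    parts = [f"{n} {status} ({n / total:.0%})" for status, n in sorted(counts.items())]
    return f"Of {total} reports received: " + ", ".join(parts) + "."


if __name__ == "__main__":
    print(verification_note(["verified"] * 30 + ["unverified"] * 70))
    # prints: Of 100 reports received: 70 unverified (70%), 30 verified (30%).
```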
Findings need to be presented in an interesting and engaging manner. One way to do it is to think of a “story” you are trying to tell with your data. Take the example of this terrifying “Lost Mothers: Maternal Mortality in the US” series based on the stories crowdsourced by NPR and ProPublica.
Important considerations
Collaboration opportunities
Crowdsourcing often opens up various avenues for cooperation, whether with other groups or with individual activists who are interested in, or already engaged in, similar work.
Ask whether there are any groups with experience in crowdsourcing evidence or whose data you can use to cross-reference your findings. It is usually a good idea to partner with others in order to enhance your efforts or avoid duplication. Additionally, some fun mixed data collection approaches may emerge as a result of your collaboration.
Example of collaboration:
ProPublica’s Electionland project was a collaborative journalism effort covering voting access, cybersecurity, misinformation and election integrity in the 2020 US elections. To document voting impediments in real time, the organization assembled a coalition of over 150 newsrooms around the country and launched a call to voters, poll workers, and election administrators to report any problems they experienced or witnessed during the voting process through a variety of channels.
Crediting contributors
It is important to always give credit where credit is due, even in crowdsourced efforts. This may involve thanking collaborating organizations, listing the tools and software used, and even naming contributors (groups, or the most active individuals in especially large efforts) if they agree to it. Weigh the risks involved in naming contributors, make everyone aware of any potential risks, and make sure that people who need to remain anonymous are not exposed.
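When crediting contributors, anonymity should be the default and naming the exception. The Python sketch below assumes a hypothetical `consented_to_credit` field, recorded when each contributor agreed (or not) to be named. Anyone without an explicit “yes” is counted but never named.

```python
def credits_line(contributors: list[dict]) -> str:
    """Build a credits sentence that names only contributors who explicitly agreed."""
    # Anything other than an explicit True means "do not name": anonymity is the default.
    named = sorted(c["display_name"] for c in contributors if c.get("consented_to_credit") is True)
    anonymous = len(contributors) - len(named)
    parts = []
    if named:
        parts.append(", ".join(named))
    if anonymous:
        parts.append(f"{anonymous} anonymous contributor" + ("s" if anonymous != 1 else ""))
    if not parts:
        return ""
    return "With thanks to " + " and ".join(parts) + "."


if __name__ == "__main__":
    people = [  # hypothetical consent records
        {"display_name": "Residents' Group A", "consented_to_credit": True},
        {"display_name": "B"},
        {"display_name": "C", "consented_to_credit": False},
    ]
    print(credits_line(people))
    # prints: With thanks to Residents' Group A and 2 anonymous contributors.
```

Recorded consent only helps if contributors understood the risks of being named when they gave it.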
Example of crediting contributions:
When publicizing their findings about the civilian deaths during the 2017 bombing of Raqqa, Syria, Amnesty International and Airwars listed and credited all partners, tools, and major contributors providing multiple streams of evidence used in the investigation (See https://raqqa.amnesty.org/ > “Toolkit” > “Credits”).
Crowdsourcing represents an unprecedented democratization of information gathering, which transforms investigations and evidence collection into community-driven endeavors, enabling collaboration between media, civil society, and affected groups. Combined with other methods, such as open source intelligence (OSINT), it provides a valuable way to document dynamic events, to address local concerns and to expose systemic injustice.
At the same time, it is critically important to use crowdsourcing in an ethical, fair, and responsible manner to uphold the integrity of information, protect contributors, and ensure the positive impact on communities and society.
Published in May 2024
Resources
“Guide to Crowdsourcing”, by Mimi Onuoha, Jeanne Pinder, and Jan Schaffer, November 20, 2015, Tow Center for Digital Journalism of Columbia University.
“How Three Reporting Teams Crowdsourced Groundbreaking Investigations”, by Maurice Oniang’o, November 15, 2021, the Global Investigative Journalism Network.
Workshop curriculum “Crowdsourcing Information for Investigations”, by Tetyana Bohdanova, Exposing the Invisible, Tactical Tech.