Not Sure? Ask Everyone

Crowdsourcing is an increasingly common tool for solving scientific challenges both big and small. It is even being put to the test in AIDS vaccine research.

By Andreas von Bubnoff

1974 Josephus. These were the two words that had to be retyped exactly as they appeared on the screen in a recent attempt to sign up for a Facebook account. These images of scanned text are called captchas and are used by many web sites to make sure that whoever signs up is human, and not a computer program written to fill in the form.

But, unbeknownst to them, internet users who retype captchas also help digitize scanned text taken from books or sources such as the New York Times, which were created prior to the computer age. Although the scanned text from these sources can often be recognized automatically by computers, human help is necessary about 30% of the time to decipher words computers can’t identify, according to Luis von Ahn, a computer scientist at Carnegie Mellon University. Von Ahn, who co-invented captchas with his PhD advisor in 2000, started a project over a year ago to use these captchas to help fill in the words from books and periodicals that computers couldn’t identify. He realized that 200 million captchas were filled out every day for internet security, amounting to about 500,000 hours of work that could be used for another purpose. The project now utilizes about 100,000 web sites with captchas, including Facebook, Twitter, and Ticketmaster, which means people help decipher about 30 million words per day, says von Ahn.
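As a rough check on the figures von Ahn cites, the arithmetic can be sketched in a few lines. The two inputs come from the paragraph above; the eight-hour workday used for the second figure is an assumption added here for illustration only.

```python
# Figures quoted in the article.
CAPTCHAS_PER_DAY = 200_000_000   # captchas filled out each day
HOURS_PER_DAY = 500_000          # total human effort those captchas represent

# Assumption for illustration (not from the article): an 8-hour workday.
ASSUMED_WORKDAY_HOURS = 8

# Average time one person spends on one captcha, in seconds.
seconds_per_captcha = HOURS_PER_DAY * 3600 / CAPTCHAS_PER_DAY

# The same daily effort expressed as full workdays.
equivalent_workdays = HOURS_PER_DAY / ASSUMED_WORKDAY_HOURS

print(f"Seconds per captcha: {seconds_per_captcha:.1f}")
print(f"Equivalent 8-hour workdays per day: {equivalent_workdays:,.0f}")
```

The two quoted numbers are consistent with each other: they imply that a person spends about nine seconds on a captcha.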

This project is just one example of what some call “crowdsourcing.” Jeff Howe, a contributing editor at Wired, coined the term crowdsourcing in a 2006 article for the magazine. It describes a phenomenon where an undefined, generally large group of people, or a crowd, takes on tasks once performed by a designated person, usually an employee, in response to an open call. “The labor isn’t always free, but it costs a lot less than paying traditional employees,” he wrote. “It’s not outsourcing; it’s crowdsourcing.” With captchas, millions of people are helping to digitize books and articles for free. Another example, Howe says, is Amazon Mechanical Turk (MTurk.com). With MTurk, people offer small amounts of money to the public for solving simple tasks. Howe used an online service that relies on MTurk to get interviews transcribed for his book on crowdsourcing, which was published last year.

In contrast to such simple tasks, crowdsourcing is also used to solve more complex scientific problems. For example, researchers recently started to use an online game to get help from the general public in solving protein structures. Also, companies such as InnoCentive or NineSigma have sprung up to issue open calls to the public to get their help in tackling scientific or engineering challenges. These companies post challenges on their web sites on behalf of clients, often companies, who then offer a reward to anyone in the general public who can come up with a solution to the problem. The solutions can come from anyone, anywhere, and they often do—the success rate for the challenges posted through these sites is surprisingly high. This approach has even been used recently to address a challenge in AIDS vaccine research. In 2008, IAVI posted a challenge on the InnoCentive website to create a stable version of the HIV Env protein.

Although using the internet to post challenges is a fairly recent phenomenon, the strategy of putting out an open call to the general public in exchange for a reward or prize has existed for a long time. In 1927, Charles Lindbergh won the Orteig prize for flying non-stop across the Atlantic Ocean. Today, some organizations still offer large amounts of money for achieving quite ambitious goals. Google’s “Project 10¹⁰⁰” offers US$10 million to fund up to five “ideas to change the world,” and the X-Prize Foundation has announced a $10 million award to anyone who can sequence the human genome faster than ever before. The X-Prize Foundation may soon get involved in infectious disease research as well: the organization has received a grant from the Bill & Melinda Gates Foundation to explore a future prize for a better tool to diagnose tuberculosis.

Gaming for science

It doesn’t take as much knowledge to retype a captcha as it does to develop a faster way to sequence the human genome. But not all solutions to scientific problems require specialized knowledge. Last year, University of Washington researchers launched the online game Foldit, which allows players to earn points by finding the lowest possible energy structure of proteins. Players use their mouse to move around parts of proteins, which are displayed on the screen, and score points for getting the protein in a conformation closer to its lowest energy state, which usually represents its natural structure. Using a computer to find the lowest-energy state requires a significant amount of time because there are many possible low-energy structures for any protein.
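To see why an automated search for the lowest-energy state can be slow, consider a minimal sketch of a greedy search on a made-up energy landscape. Everything here is hypothetical: `toy_energy` is an invented function with several local minima, not a real protein force field, and `greedy_search` is not the method used by Foldit or rosetta@home. It only illustrates how a search that accepts nothing but downhill moves can settle into one of many low-energy states without any guarantee that it is the lowest.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical rugged energy with several local minima (illustration only)."""
    return sum(math.cos(3 * a) + 0.5 * (a - 1.0) ** 2 for a in angles)


def greedy_search(angles, steps=2000, step_size=0.1, seed=0):
    """Nudge one value at a time; keep the move only if the energy drops."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    current = list(angles)
    best = toy_energy(current)
    for _ in range(steps):
        i = rng.randrange(len(current))
        candidate = list(current)
        candidate[i] += rng.uniform(-step_size, step_size)
        energy = toy_energy(candidate)
        if energy < best:
            current, best = candidate, energy
    return current, best


start = [-2.0, 0.0, 3.0]
start_energy = toy_energy(start)
final, final_energy = greedy_search(start)
print(f"start energy: {start_energy:.3f}, final energy: {final_energy:.3f}")
```

Because the search only ever moves downhill in small steps, where it ends up depends on where it starts. Trying many starting points is one way around this, and it is part of what makes an exhaustive computer search expensive.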

Originally, David Baker, a University of Washington professor of biochemistry, and his team developed a downloadable program called rosetta@home that used the computer’s downtime to sort through protein structures. As a reward, it showed the results of the calculations as a screensaver. The online game Foldit was created because users of rosetta@home wanted to participate, not just watch, Baker says. “They thought they could do better,” he says. And it seems that they can. People see which particular options to try in a more efficient way than computers would, says Zoran Popović, a computer scientist at the University of Washington who developed Foldit with Baker and others. As of May 2009, about a year after Foldit was launched, it had over 100,000 players, Popović says, adding that people seem to be at least as competitive as massive computational efforts in finding low energy protein structures. “They can find solutions that the computers have not found,” he says. Recently, Foldit announced a new feature of the game that allows players to design the HIV Env protein to expose areas that are vulnerable to neutralizing antibodies.

Paying the crowd

Successful Foldit players are rewarded with peer recognition and being able to affect the direction of research, Popović says. “The best performing proteins will be synthesized in the lab.” But whoever solves scientific or engineering challenges posted by companies like InnoCentive or NineSigma is usually eligible for a financial reward.

At InnoCentive, a seeker looking for a solution to a problem pays a fee to InnoCentive to post a specific challenge on its website. Anyone can then submit a solution. Some challenges only require a written proposal of ideas as to how to solve a challenge, others require additional evidence that the solution actually works, such as original data from experiments or even a physical sample. The seeker then pays a cash award to the solver who provides the solution that best meets the requirements of the challenge.

Ed Melcarek, a 60-year-old Canadian engineer and scientist, says he has made over $115,000 for solving seven challenges since 2003. InnoCentive named him one of the most successful solvers of 2007. “[Seven solved challenges] is a lot given the complexity of those problems,” Howe says. Melcarek has submitted solutions to 31 additional challenges which did not get awarded, and he currently has five others pending.

As a postdoctoral student at the University of Chicago, Laurie Parker spent 30 minutes in 2006 solving a $5,000 challenge. She found a new way to synthesize large collections of random peptides without using a traditional biological approach like polymerase chain reaction or cloning, says Parker, now an assistant professor of medicinal chemistry and molecular pharmacology at Purdue University. It didn’t take her long to solve that challenge because she was already working on a chemical reaction that also applied to this challenge, she says. In addition, the seeker only asked for ideas about how to solve the problem, without requiring proof that they worked, says Parker.

On average, it takes two weeks, or 80 hours, for solvers to come up with a solution to an InnoCentive challenge, according to a study of 166 challenges solved through the company’s website between 2001 and 2004. The study also found that the further removed the background of the solver was from the area the challenge pertained to, the more likely it was that the problem got solved, says one of the study’s authors, Karim Lakhani, an assistant professor in the technology and operations management unit at Harvard Business School. “In our analysis, the problem solvers said that the problem that they tried to create a solution to was typically outside their own field of expertise,” he says.

For example, John Davis solved a challenge to help with oil spill recovery. The challenge from the non-profit Oil Spill Recovery Institute was to find a way to liquefy the oil/water slush collected on barges from arctic waters in the case of an oil spill so that it could be pumped from the barges to larger storage tanks on land. Davis says he had once helped a friend whose family owns a small concrete business and remembered that construction workers used a vibrating device to keep the concrete from solidifying at construction sites. He thought the same approach might work on the oil/water slush. After a day of work, and a call to the company asking if they could modify the vibrating device for this purpose, he filed the solution. A few months later, he received $20,000.

InnoCentive says that about a third of its challenges get solved, but Lakhani says it’s hard to know how this compares with the in-house success rate of companies, because most don’t keep track of that or share it publicly. Still, Lakhani says that in his conversations, research and development chiefs at various organizations seem “very surprised” by the high success rate of InnoCentive, especially considering that these challenges likely get posted on the InnoCentive web site because the firms couldn’t solve them in-house.

NineSigma, another company that connects clients with potential solvers, was founded in 2000 by Mehran Mehregany, a professor of electrical engineering and computer science at Case Western Reserve University. Mehregany says he founded the company once he realized that the elaborate system the government uses to issue open calls to academic researchers wasn’t available to industry. “Industry does not have a similar systematic infrastructure to broadcast its science and technology needs to the broader science and technology community,” Mehregany says. Anyone can submit a solution for a NineSigma challenge, but for all challenges, the company also uses a proprietary system to proactively find experts that are likely to be able to solve a challenge. “If you are out there and we think you relate to our challenge, we will do our best to find you,” Mehregany says. That’s why NineSigma calls its approach “expert sourcing” instead of crowdsourcing. For some of its challenges, InnoCentive also tries to identify potential solvers outside of its network of registered solvers.

One issue for the two companies is how to handle intellectual property (IP) rights. At NineSigma, solvers negotiate their IP rights directly with the seeker. At InnoCentive, solvers sometimes transfer their IP rights when they accept the award money. Melcarek doesn’t have a problem with that. “I would much sooner have the cash in the bank than a piece of paper saying that I own property rights,” he says. “[Then] you have to find somebody that’s in that field to buy the patent from you.” But Parker says she might not want to sign away her IP rights if it kept her from patenting future work in her own field. “I would never want to sign away the rights to something that I am interested in pursuing,” she says.

From Crowdsourcing to Crowdfunding

While many organizations are using the principle of crowdsourcing to find solutions to problems, some are using the same principle to find funding. IAVI is sponsoring three projects on globalgiving.com, a web portal where people interested in making a donation can look through hundreds of causes or projects and choose which one to help fund, says GlobalGiving Program Officer Saima Zaman. One project aims to increase the number of HIV testing and counseling outreach teams around AIDS vaccine clinical research centers in Entebbe, Uganda. The goal is to raise US$27,000 for the project. So far, $2,554 has been raised from 75 donors, according to the GlobalGiving website. —AVB

Crowdsourcing for non-profits

InnoCentive typically has companies as clients, but non-profits also post challenges. Prize4Life, for example, a non-profit organization trying to accelerate the discovery of treatments and cures for Amyotrophic Lateral Sclerosis (ALS), posted a challenge to identify a biomarker to measure disease progression in ALS. “We try to make it very appealing for non-profits because we think non-profits have not had access to the same innovation channels that the commercial interests have,” says Dwayne Spradlin, president and CEO of InnoCentive. “We will typically either lower the prices or increase the amount of services we provide to non-profits.”

Also, the Rockefeller Foundation collaborated with InnoCentive from 2006 until 2008 to encourage non-profits to participate. The foundation would typically pay the fee required to post a challenge as well as half of the award money on behalf of the non-profit, according to Amanda Sevareid, a research associate at the Rockefeller Foundation. Once a problem was solved, the foundation would pay the rest of the award money if there was evidence that the solution was successfully implemented. Six non-profits took part in the program, and most of their challenges were solved. In late 2008, the TB Alliance announced two awards of $20,000 each for improving the synthesis of a tuberculosis drug candidate.

IAVI posted a $150,000 challenge late last year as part of the Rockefeller Foundation program. The challenge was to create a protein that mimics the trimeric HIV Env protein and would remain stable in laboratory testing. In its natural state, the Env trimer is unstable and breaks down easily when entering the body, according to Kalpana Gupta, director for new alliances and initiatives at IAVI, who was involved in developing the challenge. As a result, it has been difficult to trigger antibody responses against the trimer.

The solution required showing that neutralizing monoclonal antibodies to Env can bind the new protein in vitro. If the Env structure also turned out to be sufficiently immunogenic in animal testing, the solver would be eligible for a bonus of up to $500,000 and/or the opportunity to pursue their research further with support from IAVI, Gupta says. Because the award money was quite high in this case, Rockefeller agreed to pay just a third of the initial $150,000 of the award money. However, none of the solutions submitted by the deadline met the requirements of the challenge.

Making science more transparent

In addition to solving problems, scientists are also increasingly using the principle of crowdsourcing to share and collect information and data. In one such effort, scientists use the online encyclopedia Wikipedia to compile information about scientific topics. In late 2007, Andrew Su, group leader of bioinformatics and computational biology at the Genomics Institute of the Novartis Research Foundation in San Diego, started the Gene Wiki project by creating thousands of gene-related entries on Wikipedia, initially using information from existing databases (1). The Gene Wiki project has by now added about 9,000 gene pages to the approximately 650 gene-related pages that existed on Wikipedia before the project started. Su says that once the entries are created, people are more likely to add information to them than when they have to create a new entry first.

In his daily work, Su also sometimes asks his fellow scientists for advice using a site called FriendFeed. That’s where he went when asked for this article about how scientists use crowdsourcing. Within a few hours, the answers started to arrive. “I like the recursive nature of crowdsourcing an answer to a question about crowdsourcing,” read one reply.

One example Su mentioned of how scientists use crowdsourcing is the work of Jean-Claude Bradley, an associate professor of chemistry at Drexel University. Bradley coined the term “open notebook science,” which aims to make the scientific process more transparent by making a researcher’s lab notebook public, in real time. Bradley, the members of his lab, and students from around the world post the results of their measurements of the solubility of chemical compounds for anti-malaria drugs on the web. He says such transparency can save time that would otherwise be wasted repeating other people’s mistakes. “You have to see how other people are failing,” Bradley says, adding that transparency also enables collaboration with other researchers.

Still, seekers or solvers of challenges through companies like InnoCentive might need to keep the way a challenge was solved confidential to protect intellectual property rights. For academic researchers, it might be less of a concern as long as they don’t want to patent a finding. However, there is the concern that researchers from large, well-funded labs might take research ideas they find in open access and use their resources to do experiments that turn those ideas into a grant or publication, even though the original research idea wasn’t theirs, says Parker. But Bradley says it should be easy to identify plagiarism because everything is on the web, providing a track record as to who came up with the findings when. “I think it would be really embarrassing if somebody came in and copied stuff that anybody can Google,” Bradley says. “I think we are safer because it is so public.”

1. PLoS Biol. 6, e175, 2008