• Timothy Smith

Why A Supercomputer Failed at Medicine (But Won at Jeopardy!)


Photo Source: Wikimedia Commons


Ten years ago, on February 14-15, 2011, IBM appeared poised to change the world through artificial intelligence. Fans and curious onlookers watched IBM’s supercomputer, Watson, win the question-answering game show Jeopardy! by soundly defeating two of the show’s greatest champions, Brad Rutter and Ken Jennings. Watson earned more than three times as much money as its nearest human competitor (AP News). People watched in amazement as the computer took in the clues and answered them in its not-exactly-human voice. Following Watson’s success, IBM launched Watson Health to do for medical research what Watson had done on Jeopardy! Researchers believed that if Watson could retrieve information so well on a game show, it might make medical breakthroughs that human researchers could not. In practice, however, things did not go as smoothly as anticipated.


Fast forward a decade to February 2021. The Wall Street Journal recently reported that IBM is exploring a sale of Watson Health (wsj.com). In the article, titled “Potential IBM Watson Health Sale Puts Focus on Data Challenges,” author John McCormick notes that using AI to treat disease turned out to be a much more difficult problem than expected. The problem lies in the way Watson takes in information and learns from it.


IBM initially built Watson as a question-answering computer specifically to compete on Jeopardy! Behind the voice and logo visible to the public on the gameshow, IBM computer scientists had built a very complex and powerful computer. Watson represents an important step in the development of artificial intelligence because of the way that it works to answer a question.


Instead of searching for an answer in a set of known responses, Watson generates many possible answers by breaking the question down into parts and looking for connections in the millions of pages of information it has in its memory. For example, here is a clue from Jeopardy! that aired on December 18, 1989: “85% of the cars sold in the U.S. in 1968 had this type of large engine” (the answer is V-8). To answer a clue like this, Watson would search its memory for the top car companies in the U.S. in 1968, such as Ford and GM, and then look up their best-selling models and the types of engines those models had. With that information, Watson identifies candidate answers and ranks them by how likely each one is to be correct. A person then verifies whether or not Watson was right. The way Watson learns is both its biggest strength and its biggest weakness: it learns through humans telling it which answers are correct and which are not. If Watson is trained with good information, it will provide good guesses as to what the answer is. In other words, quality input produces quality output. It can also go the other way. If Watson is not trained with good information, it can’t make very good guesses.
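The ranking step described above can be pictured with a small Python sketch. This is only an illustration and not IBM’s actual code: the candidate answers, the evidence names (`text_match`, `type_match`), the scores, and the weights are all made up for the example. The weights stand in for what Watson would learn from human feedback during training.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    answer: str
    evidence: Dict[str, float]  # hypothetical evidence scores between 0 and 1


def confidence(candidate: Candidate, weights: Dict[str, float]) -> float:
    """Weighted average of the evidence scores for one candidate answer."""
    total = sum(weights.values())
    score = sum(w * candidate.evidence.get(name, 0.0) for name, w in weights.items())
    return score / total


def rank(candidates: List[Candidate], weights: Dict[str, float]) -> List[Candidate]:
    """Order candidates from most to least confident."""
    return sorted(candidates, key=lambda c: confidence(c, weights), reverse=True)


# Made-up candidates for the 1968 engine clue.
# text_match: how often the candidate appears near the clue's keywords.
# type_match: how well the candidate fits "a type of engine".
candidates = [
    Candidate("V-8", {"text_match": 0.80, "type_match": 0.90}),
    Candidate("V-6", {"text_match": 0.30, "type_match": 0.90}),
    Candidate("Ford", {"text_match": 0.95, "type_match": 0.10}),
]

# Weights as they might look after careful training (an assumption).
good_weights = {"text_match": 0.5, "type_match": 0.5}
# Weights as they might look after poor training: answer type is ignored.
bad_weights = {"text_match": 1.0, "type_match": 0.0}

print([c.answer for c in rank(candidates, good_weights)])
print([c.answer for c in rank(candidates, bad_weights)])
```

With the well-chosen weights, “V-8” comes out on top. With the poorly chosen weights, “Ford” wins, simply because the name shows up often in text about 1968 cars, even though it is not a type of engine. The candidates and the evidence are identical in both runs; only the training differs.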


Because Watson needs training, not everything works as well as answering Jeopardy! questions. In a widely publicized move, the MD Anderson Cancer Center in Texas ended its collaboration with IBM Watson (Forbes) after several years and tens of millions of dollars spent trying to get Watson to help find a cure for cancer. In spite of its promise, Watson could not deliver when it came to the complex world of cancer research. The real story centers on the medical information used to train Watson. The information varied in quality, which in some cases led Watson to produce bad answers. Gary N. Smith, the author of the book The AI Delusion, noted in an interview on mindmatters.ai that Watson can consume massive amounts of medical information far faster than any human. However, Smith suggests that the fundamental problem with the medical literature is that it mixes good and bad research. He speculates that the pressure on scientists to focus on quantity over quality in publishing produces rushed conclusions that contaminate the literature with low-quality research. Watson simply cannot distinguish the good from the bad, so when it receives bad information, it gives bad answers.


A decade ago, IBM’s question-answering supercomputer Watson wowed the world when it triumphed on Jeopardy! A system of complex computer programs allows Watson to make educated guesses in answer to questions on any topic in which it has been trained, such as general trivia; however, Watson needs a lot of care and teaching from experts to reach its full potential. It is not all-knowing, and its failure over the past ten years to yield results in cancer research highlights the challenge of deploying artificial intelligence to make medical breakthroughs. In many ways, a remarkable machine such as Watson suffers from the same limitation as many other things: “garbage in, garbage out.” As we use artificial intelligence in more applications, especially those that deal in life-or-death matters, we have to become more careful about the information we give it, much as we are careful about what we teach children. If we take the time to rear artificial intelligence carefully, we can achieve great things, but the opposite is also true.




Dr. Smith’s career in scientific and information research spans the areas of bioinformatics, artificial intelligence, toxicology, and chemistry. He has published a number of peer-reviewed scientific papers. He has worked over the past seventeen years developing advanced analytics, machine learning, and knowledge management tools to enable research and support high-level decision making. Tim completed his Ph.D. in Toxicology at Cornell University and a Bachelor of Science in chemistry from the University of Washington.

You can buy his book on Amazon in paperback and in Kindle format.