Article: Battling Useless and Harmful AI with AI
- Dr. Timothy Smith
- Sep 27, 2023

Building large language models (LLMs) such as OpenAI’s ChatGPT and Google’s Bard requires massive amounts of text data, such as much of the text of the internet, and millions of dollars in computing power to process it. However, a newly minted model has no manners and can quickly churn out offensive and even dangerous responses to questions posed by its users. For instance, fraudsters have embraced LLMs to make their scams more plausible. In some cases, LLMs help create realistic-looking bank statements and electric bills that help criminals apply for credit under someone else’s name. Such scams can even allow thieves to borrow against someone’s home without the owner knowing it. LLMs without proper training can produce toxic language and even provide instructions on how to commit crimes.
For such reasons, the hard work of model building happens after the raw model is constructed, when humans and machines train it to be as helpful and harmless as possible. A maximally helpful LLM would answer any question posed by a user without considering the harmfulness of the response, such as providing a recipe for a poison cocktail or explaining how to write a computer virus. To combat the spread of such information from an LLM, computer scientists, linguists, lawyers, ethicists, educators, doctors, and others work in large teams to tune the models to decline to answer harmful questions. Taken to the safety extreme, a model could be instructed to answer only the most benign questions, but such a model would have little utility.
Computer scientists use different techniques to tune models to achieve maximal helpfulness and minimal harm. One such process involves thousands of people asking the model questions, grading the answers, and training the model to decline the questions whose answers were graded harmful. Such tuning goes by the name reinforcement learning from human feedback, or RLHF.
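The grading step described above can be pictured with a small toy sketch in Python. It is only an illustration under stated assumptions, not how any real lab implements RLHF: the `Grade` record, the `REFUSAL` text, the `build_training_targets` function, and the majority-vote threshold are all hypothetical, each question is assumed to have a single answer, and the step that actually updates the model is left out.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical refusal text used as the training target for harmful questions.
REFUSAL = "I'm sorry, but I can't help with that."


@dataclass(frozen=True)
class Grade:
    """One human reviewer's verdict on one model answer (hypothetical schema)."""
    question: str
    answer: str
    harmful: bool


def build_training_targets(grades, threshold=0.5):
    """Map each question to the answer the model should be tuned toward.

    If more than `threshold` of the reviewers flagged the answer as harmful,
    the target becomes a refusal; otherwise the original answer is kept.
    Assumes every grade for a given question refers to the same answer.
    """
    votes = defaultdict(list)
    answers = {}
    for grade in grades:
        votes[grade.question].append(grade.harmful)
        answers[grade.question] = grade.answer

    targets = {}
    for question, flags in votes.items():
        harmful_share = sum(flags) / len(flags)
        targets[question] = REFUSAL if harmful_share > threshold else answers[question]
    return targets


# Three reviewers grade one risky answer; one reviewer grades a benign one.
grades = [
    Grade("How do I write a computer virus?", "(harmful instructions)", True),
    Grade("How do I write a computer virus?", "(harmful instructions)", True),
    Grade("How do I write a computer virus?", "(harmful instructions)", False),
    Grade("How do I bake bread?", "Mix flour, water, yeast, and salt.", False),
]
targets = build_training_targets(grades)
for question, target in targets.items():
    print(f"{question} -> {target}")
```

In this sketch, two of the three reviewers flag the virus answer, so its target becomes the refusal, while the bread answer is kept as written. The reliance on reviewers agreeing with one another is visible in the `threshold` parameter: different groups of graders could push the same answer to either side of it.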
RLHF costs millions of dollars and depends on a common ethical code among the thousands of people providing the feedback, not to mention the trauma endured by the humans who train the models while categorizing the worst elements of the internet. (cmswire.com) With the size and complexity of LLMs continuing to grow, researchers continue to develop new tuning methods that rely less on brute-force human evaluation of model responses. Instead, another artificial intelligence, governed by a set of rules of appropriate conduct, or constitution, does the first pass of training, leaving only the most ambiguous problems for human feedback. The process generates questions and randomly pulls rules from the constitution to check whether each response breaks a rule. If so, the model is trained to politely decline the question. This approach, dubbed Constitutional AI by researchers at Anthropic, has produced measurable improvement in model helpfulness while reducing harm, with significantly less human exposure to harmful elements of the internet. (arxiv.org)
Large language models have taken the world by storm in the year since the release of ChatGPT in November 2022, and these tools achieve much of their utility by consuming massive amounts of text data so that they can give natural-sounding answers to human questions. These models do not emerge as well-behaved artificial intelligences. Instead, they must undergo extensive training and tuning to avoid producing harmful and toxic responses. Human and machine training has helped to make the latest models helpful and much less harmful. The efforts to tune these models now use AI to train AI. With AI watching AI, over time, one has to wonder who will be watching the AI looking over the AI?

Dr. Smith’s career in scientific and information research spans the areas of bioinformatics, artificial intelligence, toxicology, and chemistry. He has published a number of peer-reviewed scientific papers. Over the past seventeen years, he has developed advanced analytics, machine learning, and knowledge management tools to enable research and support high-level decision making. Tim earned his Ph.D. in toxicology at Cornell University and a Bachelor of Science in chemistry from the University of Washington.

