
Article: How New Technology Is Fighting AI Plagiarism


Photo Source: Pexels


Educators, editors, and publishers have fought plagiarism since the invention of the written word, and artificial intelligence is now reshaping both sides of that battle. Large language models (LLMs) such as OpenAI’s ChatGPT and Google’s Bard can generate stories, poems, articles, essays, and songs at unprecedented speed and volume. Before the internet made oceans of written information accessible and copyable, teachers and publishers relied on skilled readers to detect and verify plagiarism. The internet let students and writers copy someone else’s writing with a mouse click, which made plagiarism easier than ever. Tools such as Grammarly soon emerged that used search algorithms to scan the internet for copied text. LLMs change the picture again. Because they generate novel stories and essays in seconds rather than copying existing ones, they open an entirely new realm of plagiarism that is far harder to detect.
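
For readers curious about how search-based checkers spot copied text, here is a minimal Python sketch of one common idea: break two texts into overlapping three-word phrases and measure how many phrases they share. This is an illustrative assumption, not how Grammarly or any other product actually works. A real checker would first search a large index of web pages for candidate sources, a step this sketch skips by comparing just two strings. The function names `shingles` and `overlap_score` are hypothetical.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Lowercase the text, split it into words, and return every run of n consecutive words."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_score(submission: str, source: str, n: int = 3) -> float:
    """Jaccard similarity of the two phrase sets: 0.0 = no shared phrases, 1.0 = identical."""
    a, b = shingles(submission, n), shingles(source, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


source = "The quick brown fox jumps over the lazy dog"
copied = "the quick brown fox jumps over the lazy dog"
original = "A slow red hen walks under the busy bridge"

print(overlap_score(copied, source))    # 1.0
print(overlap_score(original, source))  # 0.0
```

A score near 1.0 suggests copying, while a score near 0.0 suggests unrelated text. Because an LLM writes new sentences rather than copying existing ones, its output tends to score near zero against every source, which is why this kind of check struggles with AI-generated text.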


Enter a new set of tools designed to distinguish AI-generated text from human-written text. Tools such as Originality AI (originality.ai), Content at Scale AI Detector (contentatscale.ai), and GPTZero (gptzero.me) detect AI-generated content with a training approach similar to the one large language models use. An LLM learns through something like a fill-in-the-blank test. As it works through huge amounts of text, it learns which words tend to appear together by hiding words from itself and guessing which word belongs in each blank. In GPT-style models such as ChatGPT, the blank is always the next word, which the model predicts from the words that come before it. Each guess is checked against the real word, and the model is adjusted so that its future guesses improve. AI detectors use a similar method, but they train on text labeled as machine-written or human-written. Each time a detector’s classification is checked against the label, the detector is adjusted to classify more accurately. The detectors compete mainly on accuracy, and several publications have compared them on measures such as percent accuracy, cost, and the ability to flag documents that mix human and machine text.
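
The idea of learning from labeled examples can be illustrated with a toy detector. The Python sketch below uses naive Bayes, a classic and very simple text classifier, swapped in here purely for illustration. The commercial detectors named above do not publish their internals, and this is not their method. Unlike guess-and-adjust training, naive Bayes learns by counting which words appear in human-labeled versus machine-labeled text. The `ToyDetector` class and the four training sentences are made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


class ToyDetector:
    """Hypothetical toy detector: multinomial naive Bayes over word counts."""

    LABELS = ("human", "machine")

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        """Record the words of one labeled example."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def predict(self, text: str) -> str:
        """Return the label whose word statistics best explain the text."""
        # +1 reserves probability mass for words never seen in training.
        vocab_size = len(set(self.word_counts["human"]) | set(self.word_counts["machine"])) + 1
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = self.LABELS[0], -math.inf
        for label in self.LABELS:
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior, smoothed so a label with no examples never gives log(0).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + len(self.LABELS)))
            for word in tokenize(text):
                # Add-one (Laplace) smoothing of each word's probability under this label.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training examples, for illustration only.
training = [
    ("honestly i kinda loved the ending even if it dragged", "human"),
    ("lol my dog ate the homework again no joke", "human"),
    ("in conclusion it is important to note that there are several key factors", "machine"),
    ("overall this comprehensive overview highlights several key factors to consider", "machine"),
]

detector = ToyDetector()
for text, label in training:
    detector.train(text, label)

print(detector.predict("it is important to note several key factors"))  # machine
print(detector.predict("lol i kinda loved my dog"))                     # human
```

With only four training sentences, this detector is easily fooled. Real detectors learn from far larger labeled collections and from richer signals than simple word counts.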


According to several online publications, Originality AI scores highest in comparison tests for detecting AI-generated text. eWeek reported that Originality AI correctly identified AI-generated ChatGPT text 94% of the time and GPT-4 content 99% of the time (eweek.com). However, another test, reported on Scribbr, found that Originality AI detected AI-generated text with only 76% accuracy (scribbr.com). Different testing protocols produce different results, but one thing remains clear: no tool is 100% accurate. Some tools, such as Sapling and Copyleaks, offer their services for free, while others charge a fee.
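
One reason published accuracy figures can disagree is that accuracy depends on what goes into the test set. Accuracy is simply the fraction of texts a detector labels correctly, so a detector that is good at catching AI text but often mislabels human writing will score very differently depending on the mix of texts it is tested on. The Python sketch below assumes a hypothetical detector with invented hit rates (95% for AI-written text, 60% for human-written text); these numbers are not taken from the eWeek or Scribbr tests.

```python
def expected_accuracy(ai_share: float, ai_hit_rate: float, human_hit_rate: float) -> float:
    """Overall accuracy when a fraction `ai_share` of the test texts is AI-written.

    ai_hit_rate: fraction of AI-written texts the detector correctly flags.
    human_hit_rate: fraction of human-written texts it correctly leaves unflagged.
    """
    return ai_share * ai_hit_rate + (1 - ai_share) * human_hit_rate


# Hypothetical detector: flags 95% of AI text but correctly clears only 60% of human text.
for ai_share in (1.0, 0.5, 0.2):
    acc = expected_accuracy(ai_share, ai_hit_rate=0.95, human_hit_rate=0.60)
    print(f"{ai_share:.0%} AI-written test set: {acc:.1%} accuracy")
```

The same hypothetical detector scores 95% on a test set made entirely of AI text, 77.5% on an even mix, and 67% when most texts are human-written, even though the detector itself never changed.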


The struggle against plagiarism has existed since the beginning of the written word, and teachers, editors, and publishers have developed strategies to detect the wrongful appropriation of someone else’s work. Before the internet, expert editors and educators relied on their memories, their deep knowledge of their fields, and the library to detect plagiarism. The internet set off an arms race between new ways to produce plagiarized text and new detectors to catch people in the act, and tools such as Grammarly and Google use search algorithms to find copied text. The recent development of large language models such as ChatGPT has changed the playing field entirely, because these models generate novel text that a student or writer could claim as their own. In response to easy access to LLMs, companies claiming to detect AI-written text have emerged. Tools such as Originality AI and Sapling use AI similar to LLMs to learn the differences between AI-generated and human-written text. Although no detector is 100% accurate, these tools give teachers and editors a way to flag writing that may not be human. As LLMs get better at producing human-like text, the detectors’ job will become harder.




Dr. Smith’s career in scientific and information research spans bioinformatics, artificial intelligence, toxicology, and chemistry. He has published a number of peer-reviewed scientific papers. Over the past seventeen years, he has developed advanced analytics, machine learning, and knowledge management tools that enable research and support high-level decision making. Tim earned a Ph.D. in toxicology from Cornell University and a Bachelor of Science in chemistry from the University of Washington.


You can buy his book on Amazon in paperback and Kindle formats here.





 
 
 
