
The Hidden Cost in Agentic Tokenomics by Dr. Timothy Smith



The term "tokenomics" refers to the economy of tokens, the pieces of information exchanged in transactions between humans and computers, and between computers themselves. Token consumption adds a hidden cost to the burgeoning use of AI across industries, from telecoms to financial institutions. Large language models (LLMs) such as Claude and Gemini break language into small, computer-readable units called tokens, which represent pieces of words or punctuation that the model can process. LLM providers bill by the token to cover model usage and computation expenses. Tokenomics broadly encompasses the economics of cryptocurrencies as well, but this article focuses on LLMs and the AI agents that use them.

 

Every prompt to and response from an LLM is measured and billed in tokens. The cost is trivial when a person asks ChatGPT a single question. By contrast, a complex system that calls other LLMs in a multi-step process can quickly rack up significant token costs. For example, an AI agent designed to complete a complex task, such as designing an optimal supply chain for the materials needed by a tool factory, must interact with other agents and information sources. Each of those interactions adds tokens that the model must process on every subsequent step. In computer science, the chain of tokens given to the model is called the "context window." A context window (or context length) is the maximum amount of information, measured in tokens, that the model can process and "remember" at one time. AI agents that write and test code may call other AI agents and run through multiple iterations to test and correct their answers, and these cycles can quickly run up bills far higher than a person with a simple prompt. This consumption of tokens, also known as burning context, drives up the cost of running AI agents on LLMs.
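To make per-call billing concrete, here is a minimal sketch of how a single prompt-and-response exchange turns into a dollar figure. The 4-characters-per-token heuristic and the per-million-token prices are illustrative assumptions, not any provider's actual tokenizer or rates.

```python
# Illustrative sketch: estimating what one LLM call costs.
# ASSUMPTIONS: ~4 characters per token (a rough rule of thumb for
# English text) and made-up prices of $3 / $15 per million
# input / output tokens. Real tokenizers and prices differ.

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def call_cost(prompt: str, response: str,
              input_price_per_m: float = 3.00,
              output_price_per_m: float = 15.00) -> float:
    """Dollar cost of one call; input and output are billed separately."""
    cost_in = estimate_tokens(prompt) / 1_000_000 * input_price_per_m
    cost_out = estimate_tokens(response) / 1_000_000 * output_price_per_m
    return cost_in + cost_out

print(f"${call_cost('What is tokenomics?', 'Tokenomics is...'):.6f}")
```

A single question costs a fraction of a cent, which is why casual chatbot use feels free; the trouble begins when an agent makes thousands of such calls, each one carrying the full history of the calls before it.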

 

A recent report by the Stevens Institute of Technology concludes that AI-agentic systems often consume significantly more tokens than human workers on similar problems (online.stevens.edu). Burning context means accumulating tokens on each iteration as the agent takes in information, processes it, calls other agents, and generates output; whenever the output needs to be checked for correctness, another iteration must occur. The process can start in turn 1 with 50 tokens of input and 50 tokens of output, for a total context of 100 tokens. The second turn carries those 100 history tokens plus 50 new input tokens and 50 output tokens, for a context of 200. By the third turn, the context equals 300. This rapid consumption of tokens is called "context burn": an AI agent repeatedly calls itself or other models in a loop, generating long chains of messages that quickly fill the model's context window. A human analogy is assembling a piece of furniture. Instead of following the instructions step by step, the assembler reads the entire manual, completes the first step, copies the manual down, notes the completion of the first step, and checks the result against the manual. Then, instead of simply moving on, they reread and recopy the whole manual before executing the second step, and so on. Performed this way, a simple assembly becomes verbose and slow. When left uncontrolled, context burn wastes tokens, slows down processes, and often produces worse results.
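The turn-by-turn arithmetic above can be sketched in a few lines. Because the full history is resent on every turn, the context grows linearly per turn while the cumulative tokens billed grow quadratically with the number of turns; the function name and the 50-token defaults are illustrative, chosen to mirror the worked example.

```python
# Sketch of context burn: each turn resends the entire conversation
# history, so per-turn context grows linearly and total billed
# tokens grow quadratically in the number of turns.

def context_burn(turns: int, new_input: int = 50, output: int = 50):
    """Return (context size per turn, cumulative tokens billed)."""
    history = 0
    per_turn = []
    for _ in range(turns):
        processed = history + new_input + output  # history resent every turn
        per_turn.append(processed)
        history = processed  # whole exchange becomes next turn's history
    return per_turn, sum(per_turn)

per_turn, total = context_burn(3)
print(per_turn)  # [100, 200, 300] — matches the 100/200/300 example
print(total)     # 600 tokens billed across just three turns
```

Note that while the final context is only 300 tokens, the provider has billed 600: every earlier turn was charged again as history. At 100 turns the same loop bills over half a million tokens, which is how agentic pipelines outspend human prompting so quickly.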

 

In some cases, an agent or a group of agents may consume hundreds of thousands of dollars in tokens without producing any useful output. In a recent article in AI Magazine, Uber reported that massive adoption of AI coding tools such as Claude Code and Cursor by its 5,000+ engineers had consumed the company's entire AI budget as of April 2026 (aimagazine.com). Management encouraged the adoption of AI for coding, but the cost of context burn pushed the transportation tech company's R&D budget up 9% to $3.4 billion.

 

The rapid adoption of LLMs and AI agents across industry has not only increased the impact of AI on processes such as supply chain optimization; it has also introduced a new challenge to corporate budgets through context burn. The next wave of AI deployments in industry will emerge through thoughtful matching of the level of AI to the problem being solved. Consider a firefighter at the station who needs groceries to prepare food for the crew and must decide the best way to get to the store: he could drive the fire engine or walk two blocks. According to the FEMA Schedule of Equipment Rates for 2025, a Type 1 fire engine costs $162 per hour to operate, and a heavier aerial ladder truck runs $261 per hour (fema.gov). Walking to the store costs nothing but time, whereas driving the fire engine costs real money. Likewise, many tasks do not need an LLM or AI agents at all, and applying them amounts to overkill. The next iterations of LLMs and AI agents will address proportional usage to save tokens and, therefore, budget.
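One way to apply the fire-engine lesson in software is a router that sends trivial requests to a cheap single-shot model and reserves the expensive agentic loop for genuinely complex work. The sketch below is purely hypothetical: the tier names, keyword markers, and length threshold are made-up illustrations of the idea, not a published pattern from any vendor.

```python
# Hypothetical proportionality router: pick the cheapest tier that
# plausibly fits the task. Tier names, marker words, and the 12-word
# threshold are illustrative assumptions only.

def route(task: str) -> str:
    """Return which tier of AI (if any heavy machinery) a task warrants."""
    words = [w.lower() for w in task.split()]
    # Words that hint at open-ended, multi-step (agentic) work.
    agentic_markers = {"design", "optimize", "refactor", "plan"}
    if agentic_markers & set(words):
        return "agentic-pipeline"   # roll out the fire engine
    if len(words) <= 12:
        return "small-model"        # walk the two blocks
    return "large-model"            # something in between

print(route("What is a context window?"))                    # small-model
print(route("Design an optimal supply chain for a factory")) # agentic-pipeline
```

Even a crude gate like this caps context burn at the source: the quadratic token growth shown earlier never starts for the requests that a small, stateless call can answer.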






Dr. Smith’s career in scientific and information research spans bioinformatics, artificial intelligence, toxicology, and chemistry. He has published a number of peer-reviewed scientific papers and has spent the past seventeen years developing advanced analytics, machine learning, and knowledge management tools to enable research and support high-level decision making. Tim completed his Ph.D. in Toxicology at Cornell University and holds a Bachelor of Science in Chemistry from the University of Washington.


You can buy his book on Amazon in paperback and Kindle format here.
