Why is my @agent not using tools!
AI Agents unlock new and exciting ways to leverage LLMs to do things for you, as opposed to just replying with text. However, these LLMs are still not fully intelligent, and like any other use of LLMs, this method is not without its "gotchas".
Like other LLM problems, this mostly comes down to the model you are using, and as always, a more powerful and capable model yields better results. When using agents, we recommend using the best model you can run.
Caveat: There are some smaller models specifically trained for JSON/function calling that can be used in lieu of a larger model, but this has its own drawbacks when you then want the final response back as a normal chat. In general, you should use a general-purpose text/instruct model.
What even is an agent?
Without getting too technical, there is some foundational knowledge needed to understand what an "AI Agent" even is. The graphics below describe what LLMs are actually doing and "reasoning" about. As you can see, a tool call is no different from a specifically formatted text response!
So now that we know LLMs are basically doing an extra step between your prompt and their final answer, it follows that any agent implementation usually goes wrong in the JSON-generation part.
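To make that concrete, here is a minimal sketch of the loop an agent framework runs. Everything in it - callLLM, the web-scrape tool, and the exact JSON shape - is a hypothetical stand-in for illustration, not AnythingLLM's actual API:

```ts
// Minimal agent loop. callLLM, the tool list, and the JSON shape are
// hypothetical stand-ins for illustration, not AnythingLLM internals.

type ToolCall = { tool: string; arguments: Record<string, string> };

// Fake LLM for the sketch: on the first turn it "decides" to call a
// tool by emitting JSON; once it sees a tool result it answers in prose.
async function callLLM(prompt: string): Promise<string> {
  if (prompt.startsWith("TOOL RESULT")) {
    return "The page at example.com says: hello world";
  }
  return JSON.stringify({
    tool: "web-scrape",
    arguments: { url: "https://example.com" },
  });
}

// The tools this agent exposes to the model.
const tools: Record<string, (args: Record<string, string>) => Promise<string>> = {
  "web-scrape": async ({ url }) => `<html>hello world from ${url}</html>`,
};

async function runAgent(userPrompt: string): Promise<string> {
  const raw = await callLLM(userPrompt);

  let call: ToolCall;
  try {
    // The "tool call" is nothing but text that happens to parse as JSON.
    call = JSON.parse(raw);
  } catch {
    return raw; // not JSON, so the model answered the user directly
  }

  const tool = tools[call.tool];
  if (!tool) return raw; // the model hallucinated a tool name

  // Run the tool, then hand the output back so the model can write
  // the final human-readable reply.
  const result = await tool(call.arguments);
  return callLLM(`TOOL RESULT: ${result}\nNow answer: ${userPrompt}`);
}

runAgent("What does example.com say?").then(console.log);
```

Notice that the whole pipeline hinges on that JSON.parse step: if the model emits malformed JSON, the tool never runs.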
Okay, so now that we know how this pipeline has to work for an agent to even function, how can we debug and solve issues?
Some LLMs are bad at generating JSON and even worse at following instructions.
💡
Tip: Cloud-based (un-quantized) models are typically dramatically better at following instructions and producing valid JSON that matches the required tool call.
In AnythingLLM, you can use a cloud-based model for just agent calls and an open-source model for normal chatting.
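To see why "valid JSON matching the required tool call" is such a strict bar, here is a small sketch of the kind of check a tool call has to pass. The schema shape and the sample outputs are invented for illustration only:

```ts
// The kind of strict check a tool call must pass. The schema shape and
// sample outputs below are invented for illustration only.

type Schema = { tool: string; required: string[] };
const webScrape: Schema = { tool: "web-scrape", required: ["url"] };

function isValidCall(raw: string, schema: Schema): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return false; // not even parseable JSON, a common quantized-model failure
  }
  if (typeof parsed !== "object" || parsed === null) return false;
  const call = parsed as { tool?: string; arguments?: Record<string, unknown> };
  if (call.tool !== schema.tool) return false;
  // Every required argument must be present under its exact name.
  return schema.required.every((key) => call.arguments?.[key] !== undefined);
}

// A capable model emits an exact match - accepted:
console.log(isValidCall('{"tool":"web-scrape","arguments":{"url":"https://example.com"}}', webScrape)); // true
// A weaker model drifts on a key name ("link" instead of "url") - rejected:
console.log(isValidCall('{"tool":"web-scrape","arguments":{"link":"https://example.com"}}', webScrape)); // false
```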
The main issue we see with agents is people using a small-parameter, heavily quantized model and expecting GPT-level tool interactions. Below are the common ways tool calls go wrong and the usual solutions for each.
Model is hallucinating a tool call.
When a tool is actually called, you will see what we call a "thought" output in the UI. This indicates that the tool was actually invoked. If the LLM responds with information but you don't see a thought chain, it is likely making up the output and pretending to have called a tool.
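If you want to reason about this check programmatically, the sketch below captures the idea; the event shape is hypothetical, not AnythingLLM's real log format:

```ts
// Sketch of the "no thought chain means likely hallucination" check.
// The event shape here is hypothetical, not AnythingLLM's real log format.

type AgentEvent =
  | { kind: "thought"; tool: string } // emitted only when a tool really runs
  | { kind: "answer"; text: string }; // the final reply shown in chat

function likelyHallucinated(events: AgentEvent[]): boolean {
  const calledTool = events.some((e) => e.kind === "thought");
  const answered = events.some((e) => e.kind === "answer");
  // An answer with no preceding thought means the model invented the
  // "tool output" itself instead of actually invoking anything.
  return answered && !calledTool;
}

console.log(likelyHallucinated([{ kind: "answer", text: "The page says hi" }])); // true
console.log(likelyHallucinated([
  { kind: "thought", tool: "web-scrape" },
  { kind: "answer", text: "The page says hi" },
])); // false
```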
Common Solutions
- Swap to a higher-quantization version or larger-parameter model
- /reset chat history and re-ask the prompt
LLM says it cannot call XYZ tool.
Some models are too heavily aligned and will refuse to use certain tools because of their training. This is common for requests like website scraping.
Common Solutions
- Swap to a higher-quantization version, larger-parameter model, or less restricted model
- /reset chat history and re-ask the prompt
- Turn off tools you are not using to reduce prompt window size
LLM is refusing to even detect or call a tool at all.
Open-source models, with their quantization and limited context windows, are susceptible to simply refusing to discover or correctly call a tool.
When tools are injected into the LLM's prompt for discovery and execution, the model can quite often be "overloaded" with information, or, due to quantization, be unable to create valid JSON that exactly matches the schema required for a tool call to succeed. The LLM is simply generating JSON, something lower-parameter and quantized models are particularly bad at!
AnythingLLM, however, does make some significant corrections so that slightly invalid JSON is formatted properly and a call can still succeed, but we can only do so much on this front.
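For a feel of what such corrections look like, here is a simplified sketch - not AnythingLLM's actual repair code - that trims surrounding chatter and trailing commas before parsing:

```ts
// Illustration of lenient JSON clean-up before parsing. This is a
// simplified sketch, not AnythingLLM's actual repair logic.

function repairToolJson(raw: string): unknown | null {
  // Keep only the outermost {...}: this drops markdown fences and any
  // chatty text the model wrapped around its JSON.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  let text = raw.slice(start, end + 1);

  // Strip trailing commas, a frequent mistake in quantized-model output.
  text = text.replace(/,\s*([}\]])/g, "$1");

  try {
    return JSON.parse(text);
  } catch {
    return null; // beyond repair - there is only so much we can fix
  }
}

// A messy but salvageable response:
const raw = 'Sure, calling the tool now! {"tool":"web-scrape","arguments":{"url":"https://example.com",},}';
console.log(repairToolJson(raw)); // { tool: "web-scrape", arguments: { url: "https://example.com" } }
```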
Common Solutions
- Swap to a higher-quantization version, larger-parameter model, or less restricted model
- /reset chat history and re-ask the prompt (chat history can sometimes impact the JSON output)
- Turn off tools you are not using to reduce prompt window size and load on the prompt