Introducing Any-Agent: An abstraction layer between your code and the many agentic frameworks

Ever since ChatGPT burst onto the scene in 2022, generative AI and Large Language Models (LLMs) have become a part of the cultural moment. The widespread and viral adoption of these tools into many parts of our lives was unprecedented, unexpected, and in some ways contentious: The risk of spreading misinformation and introducing automation bias through their use is a hotly debated topic.

Efforts to improve the reliability, explainability, and capability of LLM-powered applications have been an active area of research and have resulted in techniques like Retrieval-Augmented Generation (RAG), structured outputs, and tool usage. These extra capabilities can be used to improve the quality of LLM-powered chat applications (e.g. ChatGPT) and have been integrated into workflow automation, where a workflow is a sequence of pre-defined steps that can be executed with the help of the LLM.
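
As a concrete illustration of the last of these, here is a minimal sketch of tool usage with the OpenAI Python SDK (the get_weather function and its schema are made up for this example): the model is given a JSON description of a tool and can respond with a request to call it instead of a plain text answer.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A hypothetical tool the model may choose to call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model decided to use the tool, the call (name and arguments) comes back here,
# and our own code is responsible for executing it and returning the result to the model.
print(response.choices[0].message.tool_calls)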

In order to grant even greater capabilities to LLMs, they can be integrated into a framework that’s referred to by the community as an “Agent”. The term is quite fuzzy; however, Anthropic offers a helpful definition in their blog post that clearly differentiates between workflows and agents:

“Agent” can be defined in several ways. Some customers define agents as fully autonomous systems that operate independently over extended periods, using various tools to accomplish complex tasks. Others use the term to describe more prescriptive implementations that follow predefined workflows. At Anthropic, we categorize all these variations as agentic systems, but draw an important architectural distinction between workflows and agents:

  • Workflows are systems where LLMs and tools are orchestrated through predefined code paths.
  • Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.
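
To make the distinction concrete, here is a rough, deliberately simplified sketch; call_llm, search, and calculator are placeholders rather than any real framework’s API. The workflow always runs the same predefined steps, while the agent loop lets the model decide which tool to call next and when it is done.

# Illustrative only: these stubs stand in for a real LLM call and real tools.
def call_llm(prompt: str) -> str: ...
def search(query: str) -> str: ...
def calculator(expression: str) -> str: ...

def workflow(question: str) -> str:
    # Workflow: the code path is fixed; the LLM only fills in each predefined step.
    facts = search(question)
    return call_llm(f"Answer '{question}' using these facts:\n{facts}")

def agent(question: str) -> str:
    # Agent: the LLM chooses which tool to use (or to finish) on every turn.
    tools = {"search": search, "calculator": calculator}
    history = question
    while True:
        decision = call_llm(
            f"Given:\n{history}\nReply 'TOOL <name> <input>' or 'FINAL <answer>'."
        )
        if decision.startswith("FINAL"):
            return decision.removeprefix("FINAL").strip()
        _, name, tool_input = decision.split(" ", 2)
        history += f"\n{name} returned: {tools[name](tool_input)}"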

Implementing an agent is no trivial task: The agent must be reliable and have a clearly understood design so that it can be properly evaluated and monitored. To accomplish this goal, there are myriad frameworks available to help build these systems. Some frameworks are specific to the model being used, others are specific to the cloud provider that hosts the LLM, and yet others are agnostic to both model and provider. Although not an exhaustive list, several popular options currently exist, listed in no specific order: OpenAI Agents Python SDK, LangGraph Python SDK, AWS Bedrock Agents SDK, Smolagents Python SDK, CrewAI Python SDK, AutoGen Python SDK, and Agno Python SDK.

Does the framework matter?

With so many options for building an agent, choosing a framework may seem like an arbitrary decision: Would it be enough to read the documentation and pick the one with the friendliest-looking API? After all, it isn’t obvious that the framework itself would have a big impact on the agent’s performance. Although this is a reasonable assumption, it turns out to be more complicated. Each agent framework is opinionated in how it implements agentic logic and routing, even for a well-defined processing pattern like ReAct. For example, the Smolagents library defaults to using a “CodeAgent”, which executes its tasks by generating Python code and ships with hardcoded LLM prompting text. Similarly, LlamaIndex has hardcoded text for error handling, and many of the libraries make opinionated decisions about the default system prompt. Even if configurable parameters like temperature are set to 0, differences in the prompt text provided to the LLM will result in different behavior.
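
To see why this matters, consider a small illustration using the OpenAI SDK directly (no agent framework involved): the same question at temperature 0, sent with two different system prompts of the kind a framework might hardcode, can produce noticeably different answers.

from openai import OpenAI

client = OpenAI()
question = "How fast can a leopard run?"

# Two illustrative system prompts, standing in for the differing defaults each framework ships with.
for system_prompt in [
    "You are a helpful assistant. Answer concisely.",
    "You solve problems by writing out each step of your reasoning before giving a final answer.",
]:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    print(response.choices[0].message.content)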

How should we evaluate which framework works best for our use case?

The world of AI agents is just beginning, and with ever-expanding options, the question of how to choose a framework without locking into a specific API becomes important. Although the semantics and underlying code of each framework are different, many of them aim to accomplish similar things at a high level, and we have found it useful to provide a common language with which to build an agent, regardless of which framework you choose to build with.

To this end, today we’re sharing a library we’ve started working on called Any-Agent! By using Any-Agent, you can build your agent a single time, and when you would like to experiment with the latest and greatest framework, switching to the new architecture can be as simple as changing the “AgentFramework” configuration parameter. In addition, Any-Agent normalizes logging (powered by OpenInference) so that you see consistent outputs regardless of which framework you’ve selected.

We’ve also provided a way to evaluate these agents using a “trace-first” approach, which uses LLM-as-a-judge to help give you confidence that the agent performed the steps you expected!
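
The evaluation API itself is best taken from the Any-Agent documentation; purely to illustrate the LLM-as-a-judge idea, here is a hypothetical sketch (not the library’s actual interface) that asks a model whether a recorded trace satisfies a given checkpoint.

from openai import OpenAI

client = OpenAI()

def judge_trace(trace_text: str, checkpoint: str) -> bool:
    # Ask an LLM to act as the judge: did the agent's trace satisfy the checkpoint?
    prompt = (
        "You are evaluating an AI agent's execution trace.\n"
        f"Checkpoint: {checkpoint}\n"
        f"Trace:\n{trace_text}\n"
        "Answer with exactly PASS or FAIL."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip().upper().startswith("PASS")

# Example checkpoint: did the agent actually look up the leopard's top speed?
# judge_trace(trace_text, "The agent called a web search tool to find the leopard's running speed.")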

Example

Let's see a simple use case: creating an agent in LangChain vs. an agent in smolagents (note that most real agentic use cases will be far more complex than this, involving many tools and perhaps Model Context Protocol (MCP) servers as well).

Here’s the code to load an agent with LangChain:

from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

model = ChatOpenAI(model="gpt-4o")

# create_react_agent requires a list of tools; left empty here for simplicity
graph = create_react_agent(model, tools=[])

def print_stream(stream):
    for s in stream:
        message = s["messages"][-1]
        if isinstance(message, tuple):
            print(message)
        else:
            message.pretty_print()

inputs = {"messages": [("user", "How many seconds would it take for a leopard at full speed to run through Pont des Arts?")]}
print_stream(graph.stream(inputs, stream_mode="values"))

Here’s the code to load an agent with smolagents:

from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

model = HfApiModel()
# CodeAgent requires a list of tools; reuse the imported search tool
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)

agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?")

Now, here’s how you could run either of these using Any-Agent by changing only a single variable.

from any_agent import AgentConfig, AgentFramework, AnyAgent

agent = AnyAgent.create(
    AgentFramework("langchain"),  # Set to "langchain" or "smolagents"
    AgentConfig(model_id="gpt-4o-mini")
)

agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?")
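
Real agents will usually need tools. The snippet below sketches how tools can be attached through the AgentConfig, following the pattern in the Any-Agent README; the built-in tool names (search_web, visit_webpage) and parameter names may differ between versions of the library.

from any_agent import AgentConfig, AgentFramework, AnyAgent
from any_agent.tools import search_web, visit_webpage  # built-in tools; names may vary by version

agent = AnyAgent.create(
    AgentFramework("smolagents"),
    AgentConfig(
        model_id="gpt-4o-mini",
        instructions="Use the available tools to research your answer.",
        tools=[search_web, visit_webpage],
    ),
)

agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?")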

The Any-Agent library is also designed to load and configure MCP Servers for all agent frameworks, avoiding the need for a developer to understand the semantics of loading an MCP server for the agent framework that was selected.

Conclusion

Building and evaluating with agent frameworks can be complex, but the Any-Agent library has made it much easier for our team to compare and test different agentic frameworks. AI Agents have incredible potential to supercharge productivity, and we’ve found Any-Agent to be a valuable tool in exploring that potential.

We hope that you’ll find this library useful in your experiments and projects, and we look forward to hearing your thoughts (https://github.com/mozilla-ai/any-agent).