Introducing any-guardrail: A common interface to test AI safety models

Building AI agents is hard, not just due to LLMs, but also because of tool selection, orchestration frameworks, evaluation, safety, etc. At Mozilla.ai, we’re building tools to facilitate agent development, and we noticed that guardrails for filtering unsafe outputs also need a unified interface.

Daniel Nissani

Sep 9, 2025 — 3 min read

Unai no tomo: Catalogues of Japanese Toys / “Child-raising horse”, a charm for protecting children

Building AI agents is hard, not just because of the LLMs, but because of everything around them: tool selection, orchestration frameworks, evaluation, safety, and more. At Mozilla.ai, we’ve been building tools like any-llm, any-agent, and mcpd to make agent development more composable and flexible. We have noticed that guardrails, tools to filter, flag, or reject unsafe outputs, could also benefit from a unified interface.

Guardrails come in many different forms. Some are based on encoder-only transformer architectures fine-tuned to classify unsafe inputs or outputs from LLMs or agentic systems. Others take the form of an LLM-as-a-Judge, where sophisticated prompting, sometimes combined with fine-tuning, allows generative models to guardrail against unsafe content. Due to the variety of sociotechnical issues, unintended downstream harms that users can face from interacting with technologies, a single guardrail model usually tries to defend against a subset of these harms. Thus, what we have seen is that most guardrail models conform to their own pre-processing, inference, and post-processing logic, making it hard for researchers and practitioners to test which guardrails are best for their use case.

What is any-guardrail

Meet the newest tool in our any-suite: a unified interface for all your AI safety models.

any-guardrail provides a common interface for all guardrail models, implementing the necessary pre-processing, inference, and post-processing logic for each guardrail so you don’t have to. This allows researchers and practitioners to try out guardrail models seamlessly without having to concern themselves with the implementation details. It’s as easy as:

from any_guardrail import AnyGuardrail, GuardrailName, GuardrailOutput

guardrail = AnyGuardrail.create(GuardrailName.DEEPSET)

result: GuardrailOutput = guardrail.validate("All smiles from me!")

print(result.valid)

Just change the guardrail name and provide any other required parameters (available in our documentation), and you’ve got a new guardrail to test out!

Looking Ahead

Right now, any-guardrail has focused on making experimentation of open-source guardrails seamless. In the future, we hope to expand the capabilities of any-guardrail to cover broader issues in the space of guardrail models.

Internal Guardrails for Agents

AI Agents are not completely blackbox objects for companies. The LLMs may be blackboxed, but several other components, like the vector database and functions to allow for agentic interactions, are usually controlled by engineers within a company. Because of this, we are exploring whether current open-source guardrails can guardrail internal agentic communication.

Continuing to Add Guardrails

One of our next steps is to start integrating closed-source providers into any-guardrail. This keeps our ethos of rapid experimentation, while giving practitioners an avenue to production, if they already have access to a closed-source guardrail. Along with that, we recognize that continuing to add guardrails will be an evergreen problem. We expect to continue adding more guardrails to our any-guardrail, so that users have as much choice as possible.

Optimized Inference

Another worry about guardrails is the overhead that they might introduce in production. Most existing open-source guardrails come from a research setting where efficient inference was not the main focus. We are exploring ways to introduce optimizations that can be applied across all the supported guardrails without requiring users to care about API changes.

Dealing with the Complexity of Context

One of the biggest complaints about guardrail models is that they are not effective in a company’s specific deployment context. This can be for a variety of reasons: the taxonomy that the guardrail ascribes to doesn’t match the deployment context, or the text produced in production is out of distribution, thus bypassing the guardrail. Switching between guardrails can be helpful to find the best functioning guardrail for your use case, but eventually, it would be great to provide an easy pathway to fine-tune guardrails further for downstream applications.