Mozilla.ai Blog

Build Your Own Timeline Algorithm: A Blueprint

Timeline algorithms should be useful for people, not for companies. Their quality should not be evaluated in terms of how much time people spend on a platform, but rather in terms of how well they serve their users’ purposes.

Benchmarking DeepSeek R1 Models using Lumigator: A Practical Evaluation for Zero-Shot Clinical Summarization

New state-of-the-art models emerge every few weeks, making it hard to keep up, especially when testing and integrating them. In reality, many available models may already meet our needs. The key question isn’t “Which model is the best?” but rather, “What’s the smallest model that gets the job done?”

Evaluating DeepSeek V3 with Lumigator

A typical user may be building a summarization application for their domain and wondering: “Do I need to go for a model as big as DeepSeek, or can I get away with a smaller model?” This takes us to the key elements: Metrics, Models, and Datasets.

Map Features in OpenStreetMap with Computer Vision

Mozilla.ai developed and released the OpenStreetMap AI Helper Blueprint. If you love maps and are interested in training your own computer vision model, you’ll enjoy diving into this Blueprint.

Deploying DeepSeek V3 on Kubernetes

Previously, we explored how LLMs like Meta’s Llama reshaped AI, offering transparency and control. We discussed open-weight models like DeepSeek and deployment options. Now, we show how to deploy DeepSeek V3, a powerful open-weight model, on a Kubernetes cluster using vLLM.

Running an open-source LLM in 2025

The landscape of LLMs has evolved dramatically since ChatGPT burst onto the scene in late 2022. At Mozilla.ai, we’re focused on improving trust in open-source AI by supporting the use of open-source models in appropriate situations and their proper evaluation.

Structured Question Answering

When deciding on a new Blueprint, we focus on selecting an end application that is both practical and impactful, along with the best techniques to implement it. With endless possible applications of LMs today, selecting one that is actually useful can be challenging.

Lumigator is here!

Lumigator is a developer-first tool designed and built by the community to help engineers evaluate and compare AI models with ease. Lumigator empowers developers to make data-driven choices when integrating AI models into their applications.

Blueprint Deep Dive: Turn Documents into Podcasts Locally with Open-Source AI

Blueprints are customizable workflows that help developers build AI applications using open-source tools and models. In this post, we’ll dive into our first Blueprint: document-to-podcast. We’ll explain how it works, our technical decisions, and how you can use and customize it yourself.

Introducing Blueprints: Customizable AI workflows for Developers

Developers today face many challenges when trying to integrate AI into their apps or build an “AI solution” from scratch. At Mozilla.ai, we’re committed to breaking down these barriers with Blueprints, our initiative to help developers adopt open-source AI tools and models with confidence.

Let’s build an app for evaluating LLMs

Lumigator 🐊 is a self-hosted, open-source Python application for evaluating large language models using offline metrics. It targets common machine learning use cases, starting with summarization, and is extensible at the task and job level.

Taming randomness in ML models with hypothesis testing and marimo

The behavior of ML models is often affected by randomness at different levels, from the initialization of model parameters to the dataset split into training and evaluation. Thus, predictions made by a model (including the answers an LLM gives to your questions) are potentially different every time you run it.

Introducing any-llm: A unified API to access any LLM provider

Wasm-agents: AI agents running in your browser

Smarter Prompts for Better Responses: Exploring Prompt Optimization and Interpretability for LLMs

The Challenge of Choosing the Right LLM

What do you mean by AI testing...?

AIssert: Testing LLM Integrations

Introducing Any-Agent: An abstraction layer between your code and the many agentic frameworks

Evaluating Local LLMs on Translation Use Case with Lumigator

📘+🐊 Blueprint Releases & Lumigator's Latest Features

Open-source AI is hard. Blueprints can help!
