Document version

How Muzzle Works v1

This is the stored snapshot for the approved document version. The diff below shows what changed from the previous version.

Source path

muzzle/HOWITWORKS.md

Source commit

No commit recorded

Created at

Jun 29, 2026, 6:22 AM UTC

Source digest

5050098f45beb6e21ecdee324a972010d2c50ed4a6e0b1d6e5dc3b39f5d931d6

Document snapshot

How Muzzle Works

Muzzle is a transparent inspecting proxy. It takes over the host and port the client already uses, so you configure it once and then use the model provider normally — every request and response flows through Muzzle without any client changes. v1 covers Ollama, OpenAI, and Anthropic.

Request lifecycle

A client sends a request to a Muzzle proxy listener, which is bound to the host and port the client previously used to reach the provider directly.
The provider adapter parses the provider-native request into Muzzle's canonical model.
The policy engine inspects the canonical request (the input phase).
If allowed, Muzzle forwards to the configured upstream and reads the response (buffering streamed responses in full first).
The adapter parses the response into the canonical model and the engine inspects it (the output phase), including any tool calls.
Muzzle renders the (possibly redacted or transformed) result back into the provider-native shape and returns it — re-emitting streamed responses chunk by chunk.

Endpoints that are not part of the inspected proxy surface pass through uninspected, so unrelated provider calls keep working.
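
The lifecycle above can be sketched in a few lines of Python. This is a minimal illustration, not Muzzle's code: `Verdict`, `handle_request`, `rechunk`, and the fixed `chunk_size` are hypothetical names, the provider adapters are left out, and plain strings stand in for the canonical model. It shows why the response is buffered before the output phase: a match that straddles two streamed chunks is only visible in the joined text.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Verdict:
    allowed: bool
    text: str         # the text to pass on, possibly redacted or transformed
    reason: str = ""


# Hypothetical stand-in for one phase of the policy engine.
Inspector = Callable[[str], Verdict]


def rechunk(text: str, size: int) -> List[str]:
    """Split text into pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def handle_request(
    prompt: str,
    inspect_input: Inspector,
    inspect_output: Inspector,
    upstream: Callable[[str], Iterable[str]],
    chunk_size: int = 16,
) -> List[str]:
    """Input phase, upstream call, output phase; returns the chunks to emit."""
    verdict = inspect_input(prompt)
    if not verdict.allowed:
        raise PermissionError(verdict.reason)

    # Buffer the whole response first so a match split across chunks is still seen.
    buffered = "".join(upstream(verdict.text))

    result = inspect_output(buffered)
    if not result.allowed:
        raise PermissionError(result.reason)

    # Re-emit the inspected text as a sequence of chunks.
    return rechunk(result.text, chunk_size)


def allow(text: str) -> Verdict:
    return Verdict(True, text)


def redact_secret(text: str) -> Verdict:
    return Verdict(True, text.replace("secret", "[REDACTED]"))


def fake_upstream(prompt: str) -> Iterable[str]:
    # The word "secret" is split across two streamed chunks.
    return ["the sec", "ret is 42"]


chunks = handle_request("hi", allow, redact_secret, fake_upstream)
print("".join(chunks))
```

Inspecting each chunk on its own would miss the split word here, which is the trade-off the buffering step accepts in exchange for added latency before the first chunk is emitted.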

Inspection

The engine is rules-first, which keeps it fast. These detectors run before any optional model call:

Prompt injection / jailbreak — detects attempts to override instructions.
Secrets — API keys, tokens, private keys, and similar.
PII — emails, card numbers, and other personal identifiers (scanned in linear time to avoid catastrophic backtracking on long input).
Content policy — a configurable denylist of terms.
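
To illustrate what a linear scan for one kind of PII can look like, here is a sketch of a card-number detector that walks the text once and validates candidates with the Luhn checksum. It is an assumption about the general technique, not Muzzle's actual detector; `find_card_numbers`, `luhn_ok`, and the 13 to 19 digit length window are choices made for this example.

```python
from typing import List, Tuple

DIGITS = "0123456789"


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) spans of likely card numbers in one left-to-right pass."""
    spans: List[Tuple[int, int]] = []
    start = None            # index where the current digit run began
    digits: List[str] = []  # digits collected in the current run
    end = 0                 # index just past the last digit seen
    for i, ch in enumerate(text + "\0"):  # the sentinel flushes a trailing run
        if ch in DIGITS:
            if start is None:
                start = i
            digits.append(ch)
            end = i + 1
        elif (ch in " -" and start is not None
              and i + 1 < len(text) and text[i + 1] in DIGITS):
            continue  # a single space or hyphen between digit groups
        elif start is not None:
            if 13 <= len(digits) <= 19 and luhn_ok("".join(digits)):
                spans.append((start, end))
            start, digits = None, []
    return spans


sample = "card 4111 1111 1111 1111 please"
print(find_card_numbers(sample))
```

Each character is visited once and there is no regular-expression backtracking, so the cost grows linearly with input length. The returned spans are the kind of offsets a `redact` action could mask.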

Each detector maps to a category, and each category has a configurable action per direction (input and output):

Action	Effect
`allow`	pass through untouched
`log`	record the match, take no other action
`redact`	mask the matching spans in place
`transform`	rewrite the content
`block`	reject the request with a provider-style error

Tool calls are blocked by default.

Muzzle fails closed by default: if inspection cannot complete, the request is blocked rather than passed through. Set fail_mode: open to invert that.
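
The interaction between per-category actions and `fail_mode` can be sketched as follows. This is a simplified model under stated assumptions: the category keys, the `Finding` tuple, the rule that any `block` match wins, the default of `allow` for unlisted categories, and same-length `*` masking are all hypothetical, and `log` and `transform` are not modeled.

```python
from typing import Callable, Dict, List, Tuple

Finding = Tuple[str, int, int]  # (category, start, end) of one detector match


def apply_actions(
    text: str, findings: List[Finding], actions: Dict[str, str]
) -> Tuple[str, str]:
    """Return (decision, text) for one direction's category -> action map."""
    # Assumption: a single match in a "block" category rejects the request.
    if any(actions.get(cat, "allow") == "block" for cat, _, _ in findings):
        return "block", text
    chars = list(text)
    for cat, start, end in findings:
        if actions.get(cat, "allow") == "redact":
            # Mask the span in place, keeping the text length unchanged.
            chars[start:end] = "*" * (end - start)
    return "allow", "".join(chars)


def inspect_text(
    text: str,
    detect: Callable[[str], List[Finding]],
    actions: Dict[str, str],
    fail_mode: str = "closed",
) -> Tuple[str, str]:
    """Run detection; if it cannot complete, fall back to fail_mode."""
    try:
        findings = detect(text)
    except Exception:
        return ("block" if fail_mode == "closed" else "allow"), text
    return apply_actions(text, findings, actions)


# Hypothetical category keys; the real config keys may differ.
input_actions = {"secrets": "redact", "prompt_injection": "block", "pii": "log"}

print(apply_actions("key=abc123 ok", [("secrets", 4, 10)], input_actions))
```

With `fail_mode` left at `closed`, a detector that raises results in a block; switching to `open` lets the original text through unchanged.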

The LLM judge

An optional LLM judge adds a model-based opinion on top of the rules. It is off by default and configured under llm_judge (enabled, model, base_url).

When it runs. Only after the hard rule gates (content-policy, secrets, plus the other rule detectors) approve a request. If the rules already block, the judge never runs.
What it checks. Both input and output, for prompt injection, PII, and banned subjects (see below).
Direct, unfiltered call. The judge classifies by calling its own model endpoint (llm_judge.base_url) directly, which bypasses Muzzle's inspection. So the text being judged — and the judge's own reply — are never re-filtered by the input/output policies, and there is no recursion. Point base_url at a real model server, never at Muzzle's own listener.
Actions. A judge finding maps to a configured action: prompt-injection and PII findings use the per-direction actions under policies; each subject uses its own action. block rejects the request; log and allow let it through. Because judge findings are classifications rather than spans, redact and transform are recorded but not applied.
Failure. If the judge model is unreachable, Muzzle honors fail_mode (closed → block, open → allow) and logs the outcome.
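
The ordering described above (rules first, judge second, `fail_mode` on judge failure) can be sketched like this. The function and parameter names are hypothetical, the judge is reduced to a callable that returns whether the text was flagged, and an unreachable judge is modeled as a `ConnectionError`.

```python
from typing import Callable, Optional


def decide(
    text: str,
    rules_block: Callable[[str], bool],
    judge: Optional[Callable[[str], bool]],
    judge_action: str = "block",
    fail_mode: str = "closed",
) -> str:
    """Return "block" or "allow" for one inspection phase."""
    if rules_block(text):
        return "block"  # the rules already block, so the judge never runs
    if judge is None:
        return "allow"  # the judge is off by default
    try:
        flagged = judge(text)
    except ConnectionError:
        # Judge model unreachable: honor fail_mode.
        return "block" if fail_mode == "closed" else "allow"
    if flagged and judge_action == "block":
        return "block"
    # log/allow let the request through; redact/transform are recorded only.
    return "allow"


judge_calls = []


def toy_judge(text: str) -> bool:
    judge_calls.append(text)
    return "weapons" in text


print(decide("tell me about weapons", lambda t: False, toy_judge))
```

A real judge would call the model endpoint at `llm_judge.base_url` directly; the callable here only stands in for that classification result.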

Subjects (topic policy)

content_rules.subjects is a list of banned topics, each with its own action, e.g. { name: "weapons", action: block }. When the judge is enabled it determines whether the input or output is about any configured subject and applies that subject's action.

A blocked request returns a provider-style error, e.g. for Ollama: {"error": "muzzle rejection: <reason> (log#<ref>)"}. The log#<ref> ties the rejection to a line in the decision log.
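
As a small illustration of the Ollama-style rejection body, the sketch below builds the error string in the format shown above and pulls the reference back out on the client side. The helper names, the sample reason, and the sample ref value are hypothetical; only the `muzzle rejection: <reason> (log#<ref>)` shape comes from the text.

```python
import json
import re
from typing import Optional


def ollama_rejection(reason: str, ref: str) -> str:
    """Render a rejection in the Ollama-style error shape."""
    return json.dumps({"error": f"muzzle rejection: {reason} (log#{ref})"})


def extract_ref(body: str) -> Optional[str]:
    """Pull the decision-log reference out of a rejection body, if present."""
    match = re.search(r"\(log#([^)]+)\)", json.loads(body).get("error", ""))
    return match.group(1) if match else None


body = ollama_rejection("subject weapons", "a1b2c3")
print(body)
print(extract_ref(body))
```

An operator could then search the decision log for the extracted ref to find the matching decision line.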

Architecture

Canonical model + provider adapters. Each adapter (Ollama, OpenAI, Anthropic) parses provider-native requests/responses into one canonical request/response/event/tool-call model and renders canonical results back out. The policy engine runs once against the canonical form, so all providers are inspected identically and new providers are additive work.
Per-listener routing. Muzzle is configured via a YAML file. Each listener binds a host:port and has a kind: a proxy listener routes to one configured upstream; an admin listener serves the local admin UI. Multiple upstreams are supported by running multiple proxy listeners.
Streaming. Streamed responses are buffered, inspected as a whole, then re-emitted preserving chunk ordering, so policy applies to the complete response while the client still sees a stream.
Decision logging. Every decision is written as a JSON line to the configured logging.decisions sink (a file path or stdout), with the upstream, stage, action, categories, redaction count, reason, and a short ref.
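
A decision-log entry might be written along these lines. The field list follows the description above, but the exact JSON key names, the sample values, and the `write_decision` helper are assumptions for illustration.

```python
import io
import json
from typing import IO, List


def write_decision(
    sink: IO[str],
    *,
    upstream: str,
    stage: str,
    action: str,
    categories: List[str],
    redactions: int,
    reason: str,
    ref: str,
) -> None:
    """Append one decision as a single JSON line to the sink."""
    record = {
        "upstream": upstream,
        "stage": stage,            # for example "input" or "output"
        "action": action,
        "categories": categories,
        "redactions": redactions,  # redaction count
        "reason": reason,
        "ref": ref,                # short ref echoed in rejection errors
    }
    sink.write(json.dumps(record) + "\n")


sink = io.StringIO()  # stands in for a file path or stdout
write_decision(
    sink,
    upstream="ollama",
    stage="output",
    action="redact",
    categories=["secrets"],
    redactions=1,
    reason="secret in response",
    ref="a1b2c3",
)
print(sink.getvalue(), end="")
```

One JSON object per line keeps the log easy to tail and filter, which fits the live-tailing Logs tab and the `logs` CLI filters described later.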

Configuration

The YAML config has these sections:

listeners — list of { bind, kind, upstream }. Proxy listeners require an upstream; admin listeners ignore it.
upstreams — map of name to { provider, base_url }.
policies.default — input and output maps of category → action.
policies.overrides — per-upstream policy sets that replace the default for that upstream.
fail_mode — closed (default) or open.
llm_judge — { enabled, model, base_url }.
secrets — { mode, file }. mode is rules (built-in regex, default), file (exact values from the encrypted file), or both.
content_rules.denylist_terms / subjects — inline lists (apply to both directions), plus file references below.
logging — { level, decisions }.
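
To make the section layout concrete, the sketch below mirrors the YAML structure as a Python dict and checks a few of the constraints stated above. Every value is illustrative: the bind addresses, upstream name, category keys, and the literal `kind` and `provider` strings are assumptions, and `validate` is a hypothetical helper, not the check the `muzzle validate` command performs.

```python
from typing import List

# Illustrative config shaped like the YAML sections; values are made up.
CONFIG = {
    "listeners": [
        {"bind": "127.0.0.1:11434", "kind": "proxy", "upstream": "local-ollama"},
        {"bind": "127.0.0.1:8900", "kind": "admin"},
    ],
    "upstreams": {
        "local-ollama": {"provider": "ollama", "base_url": "http://127.0.0.1:11435"},
    },
    "policies": {
        "default": {
            "input": {"secrets": "block"},
            "output": {"secrets": "redact"},
        },
    },
    "fail_mode": "closed",
    "llm_judge": {"enabled": False, "model": "", "base_url": ""},
    "secrets": {"mode": "rules"},
    "logging": {"level": "info", "decisions": "/var/log/muzzle/decisions.jsonl"},
}


def validate(config: dict) -> List[str]:
    """Return a list of human-readable problems; empty means no issue found."""
    errors: List[str] = []
    upstreams = config.get("upstreams", {})
    for i, listener in enumerate(config.get("listeners", [])):
        kind = listener.get("kind")
        if kind == "proxy":
            name = listener.get("upstream")
            if not name:
                errors.append(f"listeners[{i}]: proxy listener requires an upstream")
            elif name not in upstreams:
                errors.append(f"listeners[{i}]: unknown upstream {name!r}")
        elif kind != "admin":  # admin listeners ignore upstream
            errors.append(f"listeners[{i}]: unknown kind {kind!r}")
    if config.get("fail_mode", "closed") not in ("closed", "open"):
        errors.append("fail_mode must be closed or open")
    if config.get("secrets", {}).get("mode", "rules") not in ("rules", "file", "both"):
        errors.append("secrets.mode must be rules, file, or both")
    return errors


print(validate(CONFIG))
```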

List & secrets files

To keep the YAML small, the bulk lists live in files (one entry per line); the term and subject lists have a separate file per direction. The config holds the paths; the entries live in the files.

secrets.file — encrypted file of literal secret values, one per line. It is encrypted with a generated Fernet key at <file>.key (chmod 600, owned by the service user). Never hand-edit the ciphertext; use the CLI or admin portal, which decrypt in memory and re-encrypt on save.
content_rules.content_policy_files.{input,output} — plain term files for the content-policy detector, per direction.
content_rules.subject_files.{input,output} — plain subject files, per direction; each line is name: action (action defaults to block).

The installer pre-creates all of these under /etc/muzzle/ (empty list files, the secrets key, and an empty encrypted secrets file), wires their paths into the config, and chowns them to the service user, so they exist and are editable from the admin Files tab right after install. The engine also tolerates missing files, treating them as empty.

Edit them with muzzle secrets|terms|subjects … or in the admin portal. Files are loaded when the config reloads, so the CLI restarts the service after an edit.
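
The subject-file format (one `name: action` entry per line, with the action defaulting to `block`, and a missing file treated as empty) could be parsed roughly as follows. The function names are hypothetical, and skipping blank lines is an assumption of this sketch rather than documented behavior.

```python
from typing import Dict, Iterable


def parse_subjects(lines: Iterable[str]) -> Dict[str, str]:
    """Map each subject name to its action; a missing action means block."""
    subjects: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        name, _, action = line.partition(":")
        subjects[name.strip()] = action.strip() or "block"
    return subjects


def load_subjects(path: str) -> Dict[str, str]:
    """Load a subject file, treating a missing file as empty."""
    try:
        with open(path, encoding="utf-8") as handle:
            return parse_subjects(handle)
    except FileNotFoundError:
        return {}


print(parse_subjects(["weapons: block", "gambling: log", "politics"]))
```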

Operating a v1 install

The installed build runs as a systemd service and exposes an admin listener in the same process, so operators can manage it without leaving the VM.

Admin UI (on the admin listener) — a tabbed page: a Configuration form (General, Upstreams, Listeners, Policies with default + per-upstream overrides, Content rules) where rows can be added and removed; an Advanced YAML tab for raw editing; a Logs tab that live-tails the decision log with filters; and a Simulation tab. Saving validates the config and reloads the running service.
muzzle CLI (on PATH) — validate, edit, status, restart, logs (with --follow, --tail, and filters), add (denylist terms), input/output (set default policy actions), upstream list|remove|add (plus a muzzle HOST:PORT NAME shorthand to add one), and simulate. With no subcommand, muzzle runs the proxy. The config path comes from --config/-c or MUZZLE_CONFIG.
Install/uninstall — install.sh writes /opt/muzzle/v1, /etc/muzzle/muzzle.yaml, the systemd unit, /var/log/muzzle/decisions.jsonl, the /usr/local/bin/muzzle wrapper, and a virtualenv with all dependencies. uninstall.sh removes that full footprint.

Simulation

The CLI (muzzle simulate) and the admin UI both dry-run policy decisions against the exact same engine the proxy uses in production.
Simulation accepts provider-native payloads and reports the resulting decision, redactions, and tool-call handling without forwarding anything upstream — so a preview matches what live traffic would do.

Docs workflow

The living files in products/muzzle remain the source of truth.
An approved export step writes a fresh immutable MongoDB snapshot for each document, stamped with time, source path, commit, and content digest.
The API serves the latest approved snapshot for each doc.
The website lets readers browse each document's version history and the diffs between versions.
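
The stamping step could look something like the sketch below, which builds a snapshot record with a timestamp, source path, commit, and content digest. The record keys and the `make_snapshot` helper are hypothetical, and using SHA-256 over the file bytes is an assumption (it is consistent with a 64-character hex digest, but the text does not name the hash). Writing the record to MongoDB is left out.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def make_snapshot(
    source_path: str,
    content: bytes,
    commit: Optional[str] = None,
    now: Optional[datetime] = None,
) -> dict:
    """Build one snapshot record for an approved document version."""
    created = now or datetime.now(timezone.utc)
    return {
        "source_path": source_path,
        "commit": commit,                                # None if no commit recorded
        "created_at": created.isoformat(),
        "digest": hashlib.sha256(content).hexdigest(),   # assumed hash function
        "content": content.decode("utf-8"),
    }


snapshot = make_snapshot("muzzle/HOWITWORKS.md", b"How Muzzle Works\n")
print(snapshot["digest"])
```

Because the digest depends only on the content, two exports of an unchanged file produce the same digest, which makes unchanged versions easy to detect.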

Getting started and v1 scope: products/muzzle/v1/README.md
Full design: docs/plans/2026-06-23-muzzle-v1-design.md
Enterprise proxy core design: docs/plans/2026-06-24-muzzle-enterprise-proxy-core.md

Diff from previous

This is the first approved version, so there is no previous diff.