What is Inference?

The process of running a trained AI model to generate output from a given input.

Definition

Inference is what happens when a trained AI model processes input to produce output: a prediction, a generated response, a classification. It's distinct from training (adjusting model weights over a dataset) and from fine-tuning (additional training of an already trained model, typically on a smaller, task-specific dataset). When you send a prompt to Claude or GPT-4 and get a response, that's inference. Inference cost and latency are key operational concerns for AI agent systems, which can make many model calls per task.
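
As an illustration (a sketch, not part of the original entry), here is what a single inference call looks like using Hugging Face's transformers pipeline; the choice of library is an assumption, and the point is that the model's weights stay fixed throughout:

    from transformers import pipeline

    # Load a model whose weights were already set during training.
    # Nothing below changes those weights; we only run forward passes.
    classifier = pipeline("sentiment-analysis")

    # One inference call: input text in, prediction out.
    result = classifier("The deployment went smoothly.")
    print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]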

Example

Every time an AI agent calls an LLM to reason about a tool result — 'given this GitHub PR list, which ones need review?' — that's one inference call. A complex agent might make 5–20 inference calls to complete a single task.
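
A rough sketch of that loop in Python; call_llm below is a hypothetical stand-in for whatever hosted-model API the agent actually uses:

    def call_llm(prompt: str) -> str:
        # Hypothetical stand-in: in a real agent this would be one
        # network request to a hosted model, i.e. one inference call.
        return "yes" if "fix" in prompt.lower() else "no"

    def triage_pull_requests(pr_list: list[str]) -> list[str]:
        needs_review = []
        for pr in pr_list:
            # Each loop iteration is one inference call; an agent
            # triaging a 10-PR list makes 10 calls here alone.
            verdict = call_llm(f"Does this PR need human review? {pr}")
            if verdict.strip().lower().startswith("yes"):
                needs_review.append(pr)
        return needs_review

    prs = ["#101 Fix auth token refresh", "#102 Update README badges"]
    print(triage_pull_requests(prs))  # ['#101 Fix auth token refresh']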

Inference vs training: What's the difference?

Inference

Inference runs the trained model to generate output from a given input; it's fast relative to training and happens on demand, every time the model is used.

Training

Training adjusts the model's weights over a large dataset; it's slow and expensive, and happens before the model is deployed.
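
To make the contrast concrete in code, here is a minimal sketch using PyTorch (the framework choice is an assumption): the training step computes gradients and updates weights, while inference is a plain forward pass with gradients disabled.

    import torch
    import torch.nn as nn

    model = nn.Linear(4, 2)  # toy model standing in for a real network
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    # Training step: repeated over a large dataset, adjusts weights.
    x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
    loss = loss_fn(model(x), y)
    loss.backward()        # compute gradients
    optimizer.step()       # update the model's weights
    optimizer.zero_grad()

    # Inference: run the trained model on demand, weights unchanged.
    model.eval()
    with torch.no_grad():  # no gradients, no weight updates
        prediction = model(torch.randn(1, 4)).argmax(dim=1)

This asymmetry is why inference dominates run-time cost: training happened before deployment, but every request triggers a fresh forward pass.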
