What is Inference?
The process of running a trained AI model to generate output from a given input.
Definition
Inference is what happens when a trained AI model processes input to produce output: a prediction, a generated response, a classification. It is distinct from training (adjusting model weights over a dataset) and fine-tuning (further training of an already-trained model on narrower data). When you send a prompt to Claude or GPT-4 and get a response, that's inference. Inference cost and latency are key operational concerns for AI agent systems that make many model calls.
Example
Every time an AI agent calls an LLM to reason about a tool result — 'given this GitHub PR list, which ones need review?' — that's one inference call. A complex agent might make 5–20 inference calls to complete a single task.
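The idea of "one inference call per LLM invocation" can be made concrete with a short sketch. The `call_llm` function below is a hypothetical stand-in for a real model API; it is stubbed out so the example runs offline, and the names `review_prs` and `call_llm` are illustrative, not from any specific library.

```python
# Sketch of an agent loop where each LLM invocation is one inference call.
# `call_llm` is a hypothetical stand-in for a hosted model endpoint,
# stubbed here so the example is self-contained.

def call_llm(prompt: str) -> str:
    """Hypothetical inference call: a trained model maps input text to output text."""
    # A real implementation would send `prompt` to a model API and
    # return the generated response.
    return f"response to: {prompt[:40]}"

def review_prs(pr_titles: list[str]) -> tuple[list[str], int]:
    """Ask the model about each PR; return decisions and the inference-call count."""
    calls = 0
    decisions = []
    for title in pr_titles:
        decisions.append(call_llm(f"Does this PR need review? {title}"))
        calls += 1  # each LLM invocation is one inference call
    return decisions, calls

decisions, n_calls = review_prs(["Fix login bug", "Update docs", "Refactor auth"])
print(n_calls)  # one inference call per PR, so 3 here
```

A real agent would also spend inference calls on planning, tool selection, and summarizing results, which is how a single task can add up to 5–20 calls.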
Inference vs training: What's the difference?
Training adjusts the model's weights over a large dataset; it is slow, expensive, and happens before deployment. Inference runs the trained model to generate output; it is comparatively fast and happens on demand, every time the model is used.
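The distinction above can be illustrated with a minimal sketch: training repeatedly updates a weight to fit data, while inference is a forward pass over the frozen weight. The one-parameter linear model and learning rate here are illustrative choices, not a real training setup.

```python
# Minimal sketch contrasting training (weight updates) with inference
# (a forward pass over frozen weights), using a one-parameter linear model.

def forward(w: float, x: float) -> float:
    """Inference: run the model on an input without changing w."""
    return w * x

# --- Training: repeatedly adjust the weight to fit data (slow, offline) ---
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # targets follow y = 2x
w = 0.0
lr = 0.05
for _ in range(200):
    for x, y in data:
        pred = forward(w, x)
        grad = 2 * (pred - y) * x  # d/dw of squared error
        w -= lr * grad             # weight update: this is what training does

# --- Inference: apply the trained model on demand (fast, no updates) ---
print(forward(w, 5.0))  # converges to w = 2, so this prints 10.0
```

The same asymmetry holds at scale: training a large model takes weeks on clusters of accelerators, while a single inference call completes in milliseconds to seconds.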