Defining Stopping Criteria in Large Language Models: A Practical Guide
- Metric Coders
- Mar 29
Large Language Models (LLMs) like GPT-4, Claude, and PaLM have transformed how we generate text, code, and even multi-turn conversations. But there’s one underrated piece of the generation puzzle that deeply affects how well these models perform: stopping criteria.
Whether you're building a chatbot, a code assistant, or a text generation pipeline, you eventually need the model to stop generating tokens — at the right time. If it stops too early, your output might be incomplete. Too late, and you get rambling, repetition, or hallucination.

In this post, we’ll explore the different ways you can define stopping criteria for LLMs, with real-world examples and trade-offs.
🚦 1. Token Count / Max Tokens
🔍 What is it?
Limit the number of tokens the model can generate.
✅ Pros
Simple and effective.
Prevents infinite or excessively long outputs.
⚠️ Cons
Might cut off meaningful output mid-sentence.
Doesn’t adapt based on context or content.
💡 Use case
```python
import openai  # legacy (v0.x) OpenAI SDK interface

openai.ChatCompletion.create(
    model="gpt-4",
    messages=chat_history,
    max_tokens=512,  # hard ceiling on generated tokens
)
```
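One way to soften the mid-sentence-cutoff problem is to check, after the call returns, whether the model actually hit the ceiling. In the legacy OpenAI response shape, `finish_reason == "length"` signals truncation; a small helper (our own, not part of the SDK) makes that check explicit:

```python
def was_truncated(response: dict) -> bool:
    """True if generation stopped because it hit the max_tokens ceiling."""
    # In the legacy OpenAI response shape, finish_reason is "length" when
    # the token budget ran out, and "stop" on a natural finish.
    return response["choices"][0]["finish_reason"] == "length"

# Stubbed response for illustration (no API call made here):
resp = {"choices": [{"finish_reason": "length", "message": {"content": "Once upon a"}}]}
if was_truncated(resp):
    print("Output was cut off; consider retrying with a larger budget.")
```

When truncation is detected, a common pattern is to retry with a higher budget or continue generation from the partial output.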
✋ 2. Special Stop Sequences
🔍 What is it?
Define one or more string sequences that, when generated, immediately stop further output.
✅ Pros
Great for structured outputs like JSON, code blocks, or prompts with delimiters.
Works well with tools/functions integration.
⚠️ Cons
Must know or enforce these sequences in the prompt.
Fragile if the model never emits the exact string.
💡 Use case
```python
import openai  # legacy (v0.x) OpenAI SDK interface

openai.ChatCompletion.create(
    model="gpt-4",
    messages=chat_history,
    stop=["\nHuman:", "<END>"],  # generation halts when either string appears
)
```
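If your backend doesn't support server-side stop sequences (or you want a safety net against the fragility mentioned above), you can enforce them client-side. A minimal sketch, with a helper name of our own choosing:

```python
def truncate_at_stop(text: str, stop_sequences: list[str]) -> str:
    """Cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for seq in stop_sequences:
        idx = text.find(seq)
        if idx != -1:
            cut = min(cut, idx)  # keep only text before the earliest match
    return text[:cut]

print(truncate_at_stop("Sure.\nHuman: next question", ["\nHuman:", "<END>"]))
```

This mirrors what the `stop` parameter does server-side, but works on any raw completion string.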
🧠 3. Semantic Stopping
🔍 What is it?
Stop when the generated content semantically completes the task (e.g., completes a paragraph, finishes a function, etc.).
✅ Pros
Feels natural and human-like.
More flexible for open-ended generation.
⚠️ Cons
Harder to implement — requires post-processing or heuristics.
May need a second model or logic to evaluate “completion.”
💡 Use case
In multi-turn chat, detect whether the assistant has finished with a closing sentence or question:

```python
# continuation_needed() is your own heuristic for "this answer looks unfinished".
if response.endswith((".", "!", "?")) and not continuation_needed(response):
    stop_generation = True
```
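The `continuation_needed` check is where the heuristics live. Here is one possible (and deliberately simple) implementation; the specific rules are illustrative assumptions, not a standard:

```python
import re

def continuation_needed(text: str) -> bool:
    """Heuristic: does this output look unfinished?"""
    stripped = text.rstrip()
    # Ends mid-sentence (no terminal punctuation)?
    if not stripped.endswith((".", "!", "?")):
        return True
    # Unclosed Markdown code fence?
    if stripped.count("```") % 2 == 1:
        return True
    # Trails off on a conjunction?
    words = re.findall(r"[A-Za-z']+", stripped)
    if words and words[-1].lower() in {"and", "but", "or", "because"}:
        return True
    return False
```

Heuristics like these are cheap but brittle; for higher-stakes pipelines, a second model call judging completeness is the more robust (and more expensive) option.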
🧰 4. Function Calling & Tool Use (Structured Interface)
🔍 What is it?
LLMs like GPT-4 and Claude can be guided with a schema — they stop generating when the function/tool-call is formed correctly.
✅ Pros
Great for building agentic systems.
Guarantees structured outputs.
⚠️ Cons
Overhead in defining function schemas.
Not for freeform text generation.
💡 Use case
```json
{
  "functions": [
    {
      "name": "get_weather",
      "parameters": {
        "type": "object",
        "properties": {
          "location": { "type": "string" }
        }
      }
    }
  ]
}
```
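On the response side, the legacy-format assistant message carries the call under a `function_call` key, with the arguments serialized as a JSON string. A small parser (the function name here is our own) might look like:

```python
import json

def parse_tool_call(message: dict):
    """Extract (name, args) from a legacy-format assistant message, or None."""
    call = message.get("function_call")
    if call is None:
        return None  # model produced plain text instead of a tool call
    args = json.loads(call["arguments"])  # arguments arrive as a JSON string
    return call["name"], args

# Stubbed assistant message for illustration:
msg = {
    "role": "assistant",
    "content": None,
    "function_call": {"name": "get_weather", "arguments": '{"location": "Paris"}'},
}
print(parse_tool_call(msg))
```

Generation stops on its own once the call is complete, so the "stopping criterion" here is effectively the schema itself.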
🌀 5. Sampling-based Criteria (e.g., Repetition Penalties, Top-p, Temperature)
🔍 What is it?
Use token sampling strategies to bias the model toward stopping earlier or reducing rambling.
✅ Pros
More natural termination in creative writing or poetry.
Encourages diverse but coherent outputs.
⚠️ Cons
Indirect — you’re guiding rather than explicitly stopping.
Needs fine-tuning and experimentation.
💡 Use case
```python
import openai  # legacy (v0.x) OpenAI SDK interface

openai.ChatCompletion.create(
    model="gpt-4",
    messages=chat_history,
    temperature=0.7,       # lower = more deterministic
    top_p=0.9,             # nucleus sampling cutoff
    presence_penalty=0.6,  # discourages revisiting earlier topics
)
```
🧩 6. Custom Callbacks or Streaming Interruption
🔍 What is it?
If you consume tokens as a stream, you can apply your own logic to interrupt generation midway.
✅ Pros
Extremely flexible.
Real-time control (great for agents, UI, etc.).
⚠️ Cons
Requires more infrastructure.
Must handle partial outputs gracefully.
💡 Use case
```python
class StopGeneration(Exception):
    """Custom signal used to abort a token stream."""

def stop_streaming_callback(token):
    # Called once per streamed token; raising aborts the stream.
    # too_many_tokens_seen() is your own budget check.
    if token == "<END>" or too_many_tokens_seen():
        raise StopGeneration()

# Used in streaming pipelines or async frameworks.
```
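Putting the callback idea together, a minimal stream consumer might look like the sketch below. The `token_stream` iterable and the hard token budget stand in for whatever your streaming client actually yields:

```python
class StopGeneration(Exception):
    """Signal raised to abort the stream (same idea as a stop callback)."""

def consume_stream(token_stream, max_tokens=256, stop_token="<END>"):
    """Collect streamed tokens until a stop condition fires."""
    out = []
    try:
        for i, token in enumerate(token_stream):
            if token == stop_token or i >= max_tokens:
                raise StopGeneration()
            out.append(token)
    except StopGeneration:
        pass  # keep whatever partial output we accumulated
    return "".join(out)

print(consume_stream(iter(["Hi", " there", "<END>", " ignored"])))
```

Note the `except` clause: handling the partial output gracefully is exactly the cost this approach carries.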
🧠 Bonus: Multi-Criteria Hybrid Approaches
In production, you often combine several of the above:
Set a max_tokens limit.
Define a stop sequence.
Monitor output for semantic completion.
Abort via a custom callback if needed.
This gives you both safety and flexibility.
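The hybrid idea can be condensed into a single predicate checked after each chunk of output. Everything below (names, defaults, the opt-in semantic check) is an illustrative sketch, not a library API:

```python
def should_stop(text: str, n_tokens: int, *,
                max_tokens: int = 512,
                stop_sequences: tuple = ("<END>",),
                check_semantic: bool = False) -> bool:
    """Hybrid stopping check, in priority order: budget, stop strings, heuristics."""
    if n_tokens >= max_tokens:          # hard safety limit always wins
        return True
    if any(s in text for s in stop_sequences):
        return True
    if check_semantic and text.rstrip().endswith((".", "!", "?")):
        return True                     # crude "looks finished" heuristic
    return False
```

Keeping the hard budget first means the heuristics can be as fuzzy as you like without risking a runaway generation.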
🧭 Wrapping Up
Stopping criteria aren’t just a technical detail: they shape the user experience, performance, and reliability of your LLM-powered product.
Whether you’re auto-generating code, writing product descriptions, or managing multi-turn dialogue, pick the criteria (or combo) that fits your use case. A thoughtful stopping strategy can elevate the quality and control of your AI systems by a mile.