Self-Reflection Pattern
An agent generates output, critiques its own work, and refines it iteratively until the critique approves the result or the maximum number of rounds is reached.
Best for: code generation, essay writing, and improving the quality of structured output.
LLM calls: 2 to 2R, where R is max_rounds (one generate call and one critique call per round). Stops early when the critique returns APPROVED.
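The loop described above can be sketched without any library. This is a minimal illustration, not the pyagent implementation: `self_reflect`, `ReflectionResult`, and the two stub functions are hypothetical names, and the exact-match check on APPROVED is an assumption about how approval might be detected.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReflectionResult:
    output: str
    rounds: int
    approved: bool
    llm_calls: int


def self_reflect(
    task: str,
    generate: Callable[[str, str], str],  # (task, feedback) -> draft
    critique: Callable[[str, str], str],  # (task, draft) -> "APPROVED" or feedback
    max_rounds: int = 3,
) -> ReflectionResult:
    """Generate, critique, and refine until approved or max_rounds is reached."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback = ""
    draft = ""
    calls = 0
    for round_no in range(1, max_rounds + 1):
        draft = generate(task, feedback)  # feedback is empty in round 1
        calls += 1
        feedback = critique(task, draft)
        calls += 1
        if feedback.strip() == "APPROVED":  # assumed approval check
            return ReflectionResult(draft, round_no, True, calls)
    return ReflectionResult(draft, max_rounds, False, calls)


# Hypothetical stand-ins for the two LLM calls.
def fake_generate(task: str, feedback: str) -> str:
    if not feedback:
        return "def sort(arr): return sorted(arr)"
    return "def sort(arr): return sorted(arr or [])"


def fake_critique(task: str, draft: str) -> str:
    return "APPROVED" if "or []" in draft else "Missing None check."


result = self_reflect("Write a sorting function", fake_generate, fake_critique)
print(result.rounds, result.approved, result.llm_calls)  # 2 True 4
```

With these stubs the run is approved in round 2, so it makes 4 calls. A critique that never approves would use the full 2 × max_rounds calls.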
Sequence Diagram
sequenceDiagram
    participant U as User
    participant A as Agent
    U->>A: "Write a sorting function"
    A->>A: Generate: def sort(arr): return sorted(arr)
    A->>A: Critique: missing null check, empty array case
    A->>A: Refine: added guards
    A->>A: Critique: APPROVED
    A-->>U: Final refined function
Use Case 1: Code Generation with Self-Review
import asyncio

from pyagent_patterns.base import Agent
from pyagent_patterns.resolution import SelfReflection
from pyagent_providers import OpenAILLM

# The same agent writes the code and then reviews it against critic_prompt.
pattern = SelfReflection(
    agent=Agent(
        "coder",
        OpenAILLM("gpt-4o-mini"),
        system_prompt="Write clean, idiomatic Python with type hints and docstrings.",
    ),
    critic_prompt=(
        "Review the code you just wrote. Check for: "
        "1) Missing edge cases (null, empty, negative inputs), "
        "2) Missing error handling, "
        "3) Type hint completeness, "
        "4) Potential off-by-one errors. "
        "If all issues are resolved, respond exactly with APPROVED. "
        "Otherwise list specific problems to fix."
    ),
    max_rounds=3,  # upper bound on generate + critique rounds
)

result = asyncio.run(pattern.run(
    "Write a robust function that merges two sorted lists into a single sorted list. "
    "Handle all edge cases."
))
print(result.output)
print(f"Rounds: {result.metadata['rounds']}, Early stop: {result.metadata['early_stop']}")
print(f"Cost: ${result.cost_estimate:.4f}")
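The critic prompt above asks the model to respond exactly with APPROVED, but models sometimes add punctuation or a trailing remark. The sketch below shows one way a caller could classify a critique as approved or not. `is_approved` is a hypothetical helper, not part of pyagent_patterns, and the source does not show how the library itself matches the token.

```python
import re


def is_approved(critique: str, strict: bool = True) -> bool:
    """Return True if a critique response signals approval.

    strict=True  : the whole response, ignoring surrounding whitespace,
                   must be the token APPROVED.
    strict=False : the first line must start with the word APPROVED.
    """
    text = critique.strip()
    if strict:
        return text == "APPROVED"
    first_line = text.splitlines()[0] if text else ""
    return re.match(r"^APPROVED\b", first_line) is not None


print(is_approved("APPROVED\n"))                                        # True
print(is_approved("APPROVED. Nice work."))                              # False
print(is_approved("APPROVED. Nice work.", strict=False))                # True
print(is_approved("Not APPROVED: missing type hints.", strict=False))   # False
```

Strict matching avoids false positives such as "Not APPROVED", at the cost of extra rounds when the model decorates its answer. Which trade-off fits depends on how reliably the chosen model follows the prompt.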
Use Case 2: Writing Quality Loop (Anthropic)
Use Claude Sonnet to draft an article and self-critique it against journalism standards.
import asyncio

from pyagent_patterns.base import Agent
from pyagent_patterns.resolution import SelfReflection
from pyagent_providers import AnthropicLLM

writer = SelfReflection(
    agent=Agent(
        "journalist",
        AnthropicLLM("claude-sonnet-4-20250514"),
        system_prompt=(
            "Write clear, engaging technical journalism for a developer audience. "
            "Use the inverted pyramid: most important facts first. "
            "Avoid jargon. Every paragraph should add new information."
        ),
    ),
    critic_prompt=(
        "Critique the article you just wrote: "
        "1) Does the opening bury the lede? "
        "2) Are there unsupported claims that need evidence? "
        "3) Is any paragraph redundant? "
        "4) Would a non-expert understand this? "
        "If all four criteria are satisfied, respond with APPROVED. "
        "Otherwise list specific revisions."
    ),
    max_rounds=4,
)

result = asyncio.run(writer.run(
    "Write a 400-word article explaining why LLM agents fail in production "
    "and what engineering teams should do differently."
))
print(result.output)
print(f"Revision rounds: {result.metadata['rounds']}")
Use Case 3: Structured Output Validation (LiteLLM)
import asyncio

from pyagent_patterns.base import Agent
from pyagent_patterns.resolution import SelfReflection
from pyagent_providers import LiteLLM

schema_validator = SelfReflection(
    agent=Agent(
        "schema_builder",
        LiteLLM("gpt-4o"),
        system_prompt=(
            "Generate Pydantic v2 data models from natural language descriptions. "
            "Include field validators, proper types, and docstrings."
        ),
    ),
    critic_prompt=(
        "Review the Pydantic schema: "
        "1) Are all required fields present? "
        "2) Are validators correct (e.g., regex patterns valid)? "
        "3) Are field types the most specific possible? "
        "4) Does the model correctly reflect the described domain? "
        "Respond APPROVED if all pass, or list specific fixes needed."
    ),
    max_rounds=3,
)

result = asyncio.run(schema_validator.run(
    "Create a Pydantic model for a financial transaction: "
    "amount in USD (positive), currency code (ISO 4217), timestamp, "
    "merchant name, and optional category from: food, transport, utilities, other."
))
print(result.output)
OTel Trace Output
Trace: pyagent.pattern.self_reflection (4.2s, $0.009)
├── Round 1
│ ├── pyagent.agent.coder — generate (1.4s, gpt-4o-mini)
│ └── pyagent.agent.coder — critique (1.1s, gpt-4o-mini) → REVISE
├── Round 2
│ ├── pyagent.agent.coder — generate (1.3s, gpt-4o-mini)
│ └── pyagent.agent.coder — critique (0.4s, gpt-4o-mini) → APPROVED
└── early_stop: true (round 2 of 3)
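As a quick consistency check on the trace above, the span durations can be tallied by hand or in a few lines of Python. The tuple layout is a hypothetical representation chosen for illustration, not an OTel or pyagent data structure, and the sum assumes the spans run sequentially with no measurable overhead between them.

```python
# (round, step, seconds) copied from the trace above.
spans = [
    (1, "generate", 1.4),
    (1, "critique", 1.1),
    (2, "generate", 1.3),
    (2, "critique", 0.4),
]

total_seconds = round(sum(seconds for _, _, seconds in spans), 1)
rounds = max(round_no for round_no, _, _ in spans)
llm_calls = len(spans)

print(total_seconds, rounds, llm_calls)  # 4.2 2 4
```

The total matches the 4.2s reported on the root span, and 4 calls over 2 rounds is the 2-calls-per-round figure from the pattern summary.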
When to Use
| Condition | Recommendation |
|---|---|
| A single agent can meaningfully self-critique | ✅ Use Self-Reflection |
| Task has clear quality criteria | ✅ Use Self-Reflection |
| External perspective adds more value than self-review | ❌ Use Cross-Reflection |
| You need a scored quality threshold | ❌ Use Evaluator-Optimizer |
| Speed is more important than quality | ❌ Use a single-shot call |
See Also
- Cross-Reflection — a separate reviewer model critiques the output
- Evaluator-Optimizer — scored quality gate, optimise until threshold
- Debate — adversarial refinement with a judge