How to Build a Multi-Agent Essay Grading System in Python
A single LLM grader is noisy and easy to bias. This recipe uses the Voting pattern: several grader agents score the same essay independently against the same rubric, and a majority vote produces a defensible final grade, much as a panel of human markers reduces individual bias.
Patterns used: Voting
Architecture
flowchart TD
E[Student Essay + Rubric] --> V1[Grader A]
E --> V2[Grader B]
E --> V3[Grader C]
V1 --> T[Majority Vote]
V2 --> T
V3 --> T
T --> G[Final Grade + Rationale]
Implementation
import asyncio

from pyagent_patterns.base import Agent
from pyagent_patterns.resolution import Voting
from pyagent_providers import AnthropicLLM, GeminiLLM, OpenAILLM

# Shared rubric: every grader receives the same instructions and reply format.
RUBRIC = (
    "Grade the essay A, B, C, D, or F using this rubric: thesis clarity, evidence, "
    "structure, and grammar. Reply with the single letter grade on line 1, then a "
    "one-sentence justification on line 2."
)

# Three graders backed by different providers vote independently.
grader = Voting(
    voters=[
        Agent("grader_openai", OpenAILLM("gpt-4o"), system_prompt=RUBRIC),
        Agent("grader_anthropic", AnthropicLLM("claude-sonnet-4-20250514"), system_prompt=RUBRIC),
        Agent("grader_gemini", GeminiLLM("gemini-2.5-pro"), system_prompt=RUBRIC),
    ],
    strategy="majority",
)

# Placeholder essay; substitute the full student submission.
essay = (
    "Title: Why Cities Should Invest in Public Transit\n\n"
    "Public transit reduces traffic, cuts emissions, and connects communities. "
    "When cities fund buses and trains, fewer cars crowd the roads ... "
    "(800-word student submission)"
)

# Run the panel on the essay and print the winning grade plus the vote tally.
result = asyncio.run(grader.run(essay))
print(result.output)
print(f"Tally: {result.metadata['tally']}, winner: {result.metadata['winner']}")
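The rubric asks each grader to put the letter grade on line 1 and a justification on line 2. If you ever need to validate a raw reply yourself, a minimal sketch could look like the following. The names `parse_reply` and `VALID_GRADES` are hypothetical helpers for illustration, not part of `pyagent_patterns`, and this says nothing about how the library parses replies internally.

```python
# Hypothetical helper: validates a reply that follows the rubric's two-line format.
VALID_GRADES = {"A", "B", "C", "D", "F"}


def parse_reply(reply):
    """Split a grader reply into (grade, justification).

    Assumes the letter grade is on the first non-empty line and the
    justification, if present, is on the next non-empty line.
    """
    lines = [line.strip() for line in reply.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty reply")
    grade = lines[0].upper()
    if grade not in VALID_GRADES:
        raise ValueError(f"unexpected grade line: {lines[0]!r}")
    justification = lines[1] if len(lines) > 1 else ""
    return grade, justification


print(parse_reply("B\nClear thesis, but the evidence is thin."))
```

Rejecting anything outside A, B, C, D, F keeps a malformed reply from silently becoming a vote.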
Expected output
Grade: B
Justification: A clear thesis and good structure, but evidence is thin and a few
grammar slips weaken otherwise solid reasoning.
Tally: {'B': 2, 'A': 1}, winner: B
Because a grade needs at least two of the three independent graders to agree, a single over-generous or harsh model can't swing the result on its own, and the tally is an audit trail you can show a student.
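To make the tally concrete, here is a minimal sketch of a strict-majority count in plain Python. It illustrates the idea only and is not the library's implementation; `majority_vote` is a hypothetical name, and returning `None` when no grade has a majority is an assumption about how you might want to handle a three-way split.

```python
from collections import Counter


def majority_vote(votes):
    """Return (winner, tally) for a list of letter grades.

    winner is None when no grade is chosen by more than half the voters.
    """
    if not votes:
        raise ValueError("no votes to count")
    tally = Counter(votes)
    grade, count = tally.most_common(1)[0]
    winner = grade if count > len(votes) / 2 else None
    return winner, dict(tally)


winner, tally = majority_vote(["B", "A", "B"])
print(tally, winner)  # {'B': 2, 'A': 1} B
```

With three graders who each pick a different letter, this sketch returns no winner, which is a reasonable signal to escalate the essay to a human marker.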
Customization
Weighted graders
grader = Voting(
    voters=[Agent("grader_openai", OpenAILLM("gpt-4o"), system_prompt=RUBRIC), ...],
    strategy="weighted",
    weights=[2.0, 1.0, 1.0],  # trust the strongest grader more
)
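As an illustration of what weighting does to the count, the sketch below sums each grader's weight per grade and picks the largest total. It is a plain-Python approximation under assumed semantics, not the library's code; `weighted_vote` is a hypothetical name, and ties here simply go to the grade that was seen first.

```python
from collections import defaultdict


def weighted_vote(votes, weights):
    """Return (winner, totals) where each vote counts for its grader's weight."""
    if not votes or len(votes) != len(weights):
        raise ValueError("votes and weights must be non-empty and the same length")
    totals = defaultdict(float)
    for grade, weight in zip(votes, weights):
        totals[grade] += weight
    # max() returns the first grade with the highest total, so ties favor
    # the grade that appeared earliest in the vote list.
    winner = max(totals, key=totals.get)
    return winner, dict(totals)


print(weighted_vote(["A", "B", "C"], [2.0, 1.0, 1.0]))
# ('A', {'A': 2.0, 'B': 1.0, 'C': 1.0})
```

Note that with weights `[2.0, 1.0, 1.0]`, the heavier grader only ties the other two when they agree with each other, so it breaks three-way splits without being able to outvote a united pair.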
Add a rubric-specific grader
grader.voters.append(
    Agent(
        "grammar_grader",
        OpenAILLM("gpt-4o-mini"),
        system_prompt="Grade ONLY grammar and mechanics A-F. Letter on line 1, reason on line 2.",
    ),
)
Return per-grader transparency
Together, result.metadata['tally'] and result.metadata['votes'] expose each grader's score. Surface them to students
as an appeal trail.
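One way to present that trail is a short plain-text summary. The sketch below assumes `votes` is a mapping of grader name to letter grade and `tally` a mapping of grade to count; the shape of `votes` is a guess for illustration, so check what the library actually returns before relying on it. `format_appeal_trail` is a hypothetical helper.

```python
def format_appeal_trail(tally, votes):
    """Render per-grader votes and the overall tally as readable text.

    Assumed shapes (hypothetical): votes = {grader_name: grade},
    tally = {grade: count}.
    """
    lines = ["Grader votes:"]
    for name, grade in sorted(votes.items()):
        lines.append(f"  {name}: {grade}")
    summary = ", ".join(f"{grade} x{count}" for grade, count in sorted(tally.items()))
    lines.append(f"Tally: {summary}")
    return "\n".join(lines)


example_votes = {"grader_openai": "B", "grader_anthropic": "A", "grader_gemini": "B"}
example_tally = {"B": 2, "A": 1}
print(format_appeal_trail(example_tally, example_votes))
```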
When to Use
| Situation | Use Voting? |
|---|---|
| You need a robust answer that resists single-model bias | ✅ Yes |
| The decision is a discrete choice (grade, label, yes/no) | ✅ Yes — majority/consensus fits |
| You want graders to debate and persuade each other | ❌ Use Debate |
| One agent should critique and improve a draft | ❌ Use Self-Reflection |
Cost Profile
| Query type | Typical model | Avg cost per essay | Monthly cost at 1k essays/day |
|---|---|---|---|
| 3 independent graders | gpt-4o / sonnet / gemini-pro | $0.012 | $360/mo |
| Per essay | mix | ~$0.012 | ~$360/mo |
Use cheaper models (e.g. three *-mini voters) for formative feedback; reserve the premium
panel for high-stakes summative grading.
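The monthly figure in the table follows from simple arithmetic: cost per essay times essays per day times days per month. The sketch below reproduces it, assuming a 30-day month (which is what the table's $360 implies); `monthly_cost` is a hypothetical helper, and you would plug in your own per-essay cost for a cheaper panel.

```python
def monthly_cost(cost_per_essay, essays_per_day, days_per_month=30):
    """Estimate monthly spend; the 30-day month is an assumption."""
    return cost_per_essay * essays_per_day * days_per_month


print(f"${monthly_cost(0.012, 1_000):,.2f}/mo")  # $360.00/mo
```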
See Also
- Voting pattern
- Loan Underwriting Committee — a debating panel for credit decisions
- Browse all recipes