Agent Harness
The agent harness is the core execution engine that processes user messages, manages tool calls, and maintains conversation context.
Overview
The harness is designed for extended, multi-turn research workflows:
- Long-running sessions — Up to 300 tool calls per user message
- Mode enforcement — Restricts tools based on Plan/Execute mode
- Context management — Automatic compaction when approaching model limits
- Doom loop detection — Breaks out of repetitive tool call patterns
- DB-persisted writing — Paper drafts survive across workers and restarts
- Redis interrupt relay — Actually kills running tasks, not just a flag check
- Sub-agent streaming — Research tool spawns nested agents with visible tool calls
Agent Loop
┌─────────────────────────────────────────────────────────┐
│ Agent Loop │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Context │──▶│ LLM │──▶│ Parse │ │
│ │ Manager │ │ Stream │ │ Response │ │
│ └──────────┘ └──────────┘ └────┬─────┘ │
│ ▲ │ │
│ │ ┌────────▼────────┐ │
│ │ │ Tool Router │ │
│ │ │ (mode filtering)│ │
│ │ └────────┬────────┘ │
│ │ │ │
│ │ ┌────────▼────────┐ │
│ │ │ Execute Tools │ │
│ │ └────────┬────────┘ │
│ │ │ │
│ ┌────┴────────────┐ ┌────────▼────────┐ │
│ │ Doom Detection │◀─────│ Add Results │ │
│ │ (break loops) │ │ to Context │ │
│ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────┘

The loop runs for each user message:
- Check if context needs compaction
- Call LLM with streaming (system prompt + history + tools)
- Parse response for tool calls
- Filter tools through mode restrictions
- Execute allowed tools, return errors for blocked ones
- Add results to context
- Check for doom loops
- Repeat until LLM produces no tool calls or max iterations reached
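The steps above can be condensed into a short sketch. This is a hedged illustration, not the real harness code: the names run_turn, call_llm, and execute_tool are hypothetical stand-ins.

```python
# Minimal sketch of the agent loop described above; all names are
# illustrative assumptions, not the real harness API.
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    name: str
    args: tuple  # simplified stand-in for the real argument payload


def run_turn(context: list, call_llm, execute_tool, max_iterations: int = 300) -> list:
    """Run one user turn: loop until the LLM stops requesting tools."""
    for _ in range(max_iterations):
        tool_calls = call_llm(context)      # parse the response for tool calls
        if not tool_calls:                  # no tool calls -> the turn is done
            break
        for call in tool_calls:
            result = execute_tool(call)     # blocked tools return an error string
            context.append({"tool": call.name, "result": result})
    return context
```

Compaction and doom-loop checks would slot in at the top and bottom of each iteration, respectively.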
Mode Enforcement
Tools are restricted based on the current mode at three layers:
- System prompt — Instructs the agent about mode constraints
- Tool filtering — Only mode-allowed tools are sent to the LLM
- Runtime blocking — Blocked calls return an error instead of executing
See Modes for the full breakdown.
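The second and third layers can be sketched as follows. Mode names and the tool list here are illustrative assumptions, not the real configuration:

```python
# Sketch of tool filtering and runtime blocking; the Plan-mode tool set
# below is an assumption for illustration only.
PLAN_MODE_TOOLS = {"read", "grep", "research"}   # assumed read-only tool set


def filter_tools(tools: list[str], mode: str) -> list[str]:
    """Layer 2: only mode-allowed tools are advertised to the LLM."""
    if mode == "plan":
        return [t for t in tools if t in PLAN_MODE_TOOLS]
    return tools                                  # Execute mode: everything


def execute(tool: str, mode: str) -> str:
    """Layer 3: blocked calls return an error instead of executing."""
    if mode == "plan" and tool not in PLAN_MODE_TOOLS:
        return f"Error: tool '{tool}' is not allowed in plan mode"
    return f"ran {tool}"
```

Filtering keeps the LLM from even attempting disallowed calls; the runtime guard catches anything that slips through.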
Context Management
Token tracking uses a character-based estimate (~4 chars per token).
Compaction triggers at 90% of the model's context window:
- Summarizes old messages while preserving recent ones
- Keeps the last N messages untouched (default: 5)
- Preserves completion reports, key decisions, and PLAN.md
- Broadcasts context_usage events for the UI gauge
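A minimal sketch of the estimate and the compaction trigger, assuming the ~4 chars/token heuristic and 90% threshold stated above (the summary string stands in for an LLM-generated summary):

```python
# Character-based token estimate and compaction trigger, per the numbers
# stated above; the summary placeholder stands in for an LLM summary.
def estimate_tokens(messages: list[str]) -> int:
    return sum(len(m) for m in messages) // 4    # ~4 chars per token


def maybe_compact(messages: list[str], context_window: int,
                  untouched: int = 5, ratio: float = 0.90) -> list[str]:
    """Summarize old messages, keeping the last `untouched` intact."""
    if estimate_tokens(messages) < context_window * ratio:
        return messages
    old, recent = messages[:-untouched], messages[-untouched:]
    summary = f"[summary of {len(old)} earlier messages]"
    return [summary] + recent
```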
Doom Loop Detection
Detects when the agent gets stuck in repetitive patterns:
Identical consecutive calls — Same tool + same arguments 3+ times:
bash(ls) → bash(ls) → bash(ls) → DETECTED
Repeating sequences — A-B-A-B patterns:
read(a) → edit(a) → read(a) → edit(a) → DETECTED
When detected, a correction prompt is injected telling the agent to try a different approach.
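Both detectors can be sketched over a history of (tool, args) pairs. The representation is an assumption; the real harness may track calls differently:

```python
# Sketch of the two doom-loop detectors described above. Calls are modeled
# as hashable (tool, args) tuples, which is an assumption for illustration.
def is_doom_loop(history: list[tuple], repeat: int = 3) -> bool:
    # Identical consecutive calls: same tool + args `repeat` times in a row
    if len(history) >= repeat and len(set(history[-repeat:])) == 1:
        return True
    # Repeating sequence: the last four calls form an A-B-A-B pattern
    if len(history) >= 4:
        a, b, c, d = history[-4:]
        if a == c and b == d and a != b:
            return True
    return False
```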
DB-Persisted Writing Projects
Paper writing uses the writing_projects table:
- Outline, sections, and bibliography are stored as structured data
- Every write/update auto-saves to the database immediately
- Writing state survives Celery worker restarts, server redeployments, and browser refreshes
- The Paper tab in the UI reads directly from the database
- Client-side export to Markdown or LaTeX
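The immediate auto-save pattern can be sketched with SQLite standing in for the real database. The column layout of writing_projects here is an assumption:

```python
# Illustrative upsert into a writing_projects table; SQLite stands in for
# the real database and the schema is an assumption.
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE writing_projects (id TEXT PRIMARY KEY, sections TEXT)"
)


def save_sections(project_id: str, sections: dict) -> None:
    """Every write persists immediately, so state survives restarts."""
    db.execute(
        "INSERT INTO writing_projects (id, sections) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET sections = excluded.sections",
        (project_id, json.dumps(sections)),
    )
    db.commit()
```

Because each write commits on the spot, a worker crash between edits loses at most the in-flight write, not the draft.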
Redis Interrupt Relay
When a user clicks Stop:
- Frontend sends POST /api/interrupt
- Web process publishes interrupt signal to Redis channel
- Celery worker receives the signal
- Worker kills the running agent task immediately
- interrupted event is broadcast via SSE
This is a real kill, not a cooperative flag check. The agent stops within seconds regardless of what tool is executing.
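The relay pattern can be sketched with an in-memory channel standing in for Redis pub/sub (the real system publishes on a Redis channel that the Celery worker subscribes to; the signal and field names below are assumptions):

```python
# Interrupt relay sketch: an in-process queue stands in for the Redis
# channel; signal and field names are illustrative assumptions.
import queue

channel: "queue.Queue[tuple[str, str]]" = queue.Queue()  # stand-in for Redis


def publish_interrupt(conversation_id: str) -> None:
    """Web process side: POST /api/interrupt ends up publishing here."""
    channel.put(("interrupt", conversation_id))


def worker_listener(active_tasks: dict) -> None:
    """Worker side: receive the signal and kill the matching task."""
    signal, conv_id = channel.get(timeout=1)
    if signal == "interrupt" and conv_id in active_tasks:
        active_tasks[conv_id]["killed"] = True  # real worker revokes the task
```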
Sub-Agent Streaming
The research tool spawns an independent sub-agent:
- Sub-agent has its own context window and tool set
- Parent agent sees nested tool calls streamed in real-time
- Frontend displays nested tool calls inline within the research tool output
- Useful for deep dives that would consume too much of the main context
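The streaming relationship can be sketched as a nested loop that forwards each sub-agent tool call to the parent's stream. Every name here is a hypothetical stand-in:

```python
# Sketch of the sub-agent pattern: a fresh context plus a callback that
# surfaces nested tool calls to the parent. All names are illustrative.
from typing import Callable


def research(query: str, run_tool: Callable[[str], str],
             on_nested_call: Callable[[str], None]) -> str:
    sub_context: list[str] = []            # sub-agent's own context window
    for tool in ("search", "read"):        # stand-in for the sub-agent's loop
        on_nested_call(tool)               # parent sees the call in real time
        sub_context.append(run_tool(tool))
    return f"report on {query} from {len(sub_context)} tool results"
```

Only the final report lands in the parent's context; the intermediate tool output stays in the sub-agent's window.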
Per-Conversation Processing
Each conversation gets isolated state:
- Own agent session, tool router, and sandbox manager
- Processing state tracked independently (idle/processing/interrupted)
- Multiple conversations can process in parallel
- Interrupting one does not affect others
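The isolation boils down to keying all state by conversation id; a minimal sketch, with field names as assumptions:

```python
# Per-conversation isolation sketch: each conversation id maps to its own
# state record, so interrupting one leaves the others untouched.
def get_state(sessions: dict, conv_id: str) -> dict:
    """Lazily create an isolated state record for a conversation."""
    return sessions.setdefault(
        conv_id, {"status": "idle", "tools": [], "sandbox": None}
    )


def interrupt(sessions: dict, conv_id: str) -> None:
    if conv_id in sessions:
        sessions[conv_id]["status"] = "interrupted"
```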
Configuration
Key settings in AgentConfig:
from dataclasses import dataclass

@dataclass
class AgentConfig:
    model_name: str = ""                   # LLM to use (empty = auto-detect)
    max_iterations: int = 300              # Tool calls per turn
    stream: bool = True                    # Stream responses
    compact_threshold_ratio: float = 0.90  # Compact at 90% of context window
    untouched_messages: int = 5            # Keep last N messages during compaction
    default_max_tokens: int = 200000       # Fallback context size
    yolo_mode: bool = False                # Skip confirmations