Context Compaction: Managing Conversation History¶

Overview¶

Context Compaction is the automatic compression of conversation history to prevent exceeding token budget limits. It allows unlimited conversation length by intelligently summarizing completed work while preserving recent context and task state.

Problem: Token Budget Limits¶

The agent loop operates within a token budget (typically 100k-200k tokens):

Budget: 100,000 tokens
├─ System prompt: ~2,000 tokens
├─ Current iteration API response: ~5,000 tokens
├─ Message history: grows with every iteration
└─ Reserve for next iteration: ~4,000 tokens

After many iterations:
  Iteration 50: Message history is 60,000 tokens
  Iteration 51: Only 20,000 remaining for next iteration
  → Cannot ask Claude substantive questions
  → Loop must terminate soon

Solution: Compress old messages before budget is exhausted.

Token Budget States¶

State Indicator    Token Usage    Behavior
─────────────────────────────────────────────
NORMAL             < 80%          Continue normally
WARNING            80-95%         Auto-compact if enabled
CRITICAL           > 95%          May force truncation
EXHAUSTED          >= 100%        Loop terminates

The system tracks tokens in real-time and adjusts behavior accordingly.

Automatic Compaction¶

Trigger Conditions¶

Automatic compaction occurs when:

Token usage enters WARNING state (80%+ of budget) AND
Auto-compact is enabled (default: yes)

Example:

Budget: 100,000 tokens
Current usage: 84,000 tokens (84%)
→ WARNING state → autoCompact() triggered
→ Compress old messages
→ Reclaim 40,000 tokens
→ Continue with 44,000 tokens available

Compaction Algorithm¶

The compact() function applies the following strategy:

function compact(messages: Message[], config: CompactConfig) {
  // Step 1: Find compaction boundary
  // Skip recent messages, start from older ones
  const boundaryIndex = findCompactionBoundary(
    messages,
    keepRecentMessages: 5,  // Always keep 5 recent
    minCompactSize: 10000   // Need at least 10k tokens to compact
  )

  // Step 2: Build pre-compact messages (unchanged)
  const preCompactMessages = messages.slice(0, boundaryIndex)

  // Step 3: Generate summary
  const summary = await generateCompactSummary(
    preCompactMessages,
    {
      includeToolUse: true,
      includeReasons: true,
      preserveState: ["tasks", "permissions"]
    }
  )

  // Step 4: Create boundary message
  const boundaryMessage = {
    type: "SystemCompactBoundaryMessage",
    content: summary,
    timestamp: Date.now(),
    tokensReclaimed: preCompactTokens - summaryTokens
  }

  // Step 5: Build post-compact messages
  const postCompactMessages = [
    boundaryMessage,  // Synthetic message marking compaction point
    ...messages.slice(boundaryIndex)  // Recent messages unchanged
  ]

  // Step 6: Return compressed messages
  return postCompactMessages
}

Compaction Example¶

Before Compaction¶

Message 1:  User: "Analyze this bug"
            [50 tokens]
Message 2:  Assistant: "I'll investigate..."
            [100 tokens]
Message 3:  ToolUse: Read src/main.ts
            [50 tokens]
Message 4:  ToolResult: [file contents - 500 tokens]
            [500 tokens]
Message 5:  Assistant: "I found the bug..."
            [200 tokens]
Message 6:  ToolUse: Edit src/main.ts (fix)
            [100 tokens]
Message 7:  ToolResult: Success [50 tokens]
            [50 tokens]
... 50+ more messages ...
Message 60: [3000 tokens]
Message 61: [Current user query - 200 tokens]

Total: 87,000 tokens (87% of budget) → WARNING

Compaction Strategy¶

Keep messages 58-61 (recent), summarize 1-57 (old):

Summary:
"The user asked to analyze a bug in src/main.ts.
After investigation, found memory leak in loop variable.
Fixed with variable initialization on line 42.
User then asked for testing. Created 5 new unit tests
covering edge cases. All tests passing."

[300 tokens]

After Compaction¶

SystemCompactBoundaryMessage (generated):
  "Previous conversation: Analyzed bug in src/main.ts,
   found memory leak, applied fix (line 42), wrote tests."
  [300 tokens]

Message 58: User: "..."
            [1000 tokens]
Message 59: Assistant: "..."
            [1200 tokens]
Message 60: ToolUse: "..."
            [100 tokens]
Message 61: User: "..."
            [200 tokens]

Total: 2,800 tokens (2.8% of budget) → NORMAL
Tokens reclaimed: ~84,200 tokens

Preserved Information¶

Compaction preserves critical state:

1. Task State¶

All task information preserved: - Current tasks and status - Dependencies (blockedBy, blocks) - Assigned agents - Task metadata

// Tasks extracted and preserved in summary
"Active tasks:
  - Task 1 (in_progress): Fix API endpoint
  - Task 2 (pending): Write tests (blocked by Task 1)
  - Task 3 (completed): Database migration"

2. Permission Rules¶

All permission context preserved: - Allowed/denied paths - Tool access levels - Permission mode

"Permissions:
  - Can edit: src/**, tests/**
  - Can execute: npm test, npm run build
  - Cannot execute: rm, destructive commands"

3. Key Decisions¶

Important decisions and findings: - Why certain approaches were chosen - Known issues or constraints - API keys or configuration (if appropriate)

"Decided: Use React for UI (not Vue) due to existing
codebase. Constraint: Cannot modify database schema
until migration complete."

4. Recent Context¶

Always kept verbatim: - Last N messages (user-configurable, default: 5) - Current state - Active operations

Manual Compaction¶

Users can manually trigger compaction:

# Interactive command
> /compact

# Initiates:
// 1. Display token usage
// 2. Show what will be summarized
// 3. Execute compaction
// 4. Display tokens reclaimed

Context Collapse (Experimental)¶

Advanced feature that aggressively collapses context:

When enabled: Further compaction by: - Combining redundant messages - Summarizing tool chains - Removing intermediate steps

Example:

Before collapse:
  Tool: Read file A
  Result: contents
  Tool: Read file B
  Result: contents
  Tool: Grep pattern
  Result: matches

After collapse:
  Summary: Searched pattern across files A and B,
           found N matches in file A, 0 in file B

Reactive Compaction (Experimental)¶

Proactive compaction before WARNING state:

Normal operation: < 80% used
→ Reactive compaction triggers at 70%
→ Preemptively compresses
→ Maintains larger buffer for next iteration
→ Smoother operation, fewer stalls

Compaction Events & Hooks¶

System emits events during compaction:

// Hook: pre_compact
// Called before compaction starts
{
  type: "hooks_start",
  hookType: "pre_compact",
  tokensUsed: 84000,
  tokenBudget: 100000
}

// Compaction executes...

// Hook: post_compact
// Called after compaction completes
{
  type: "hooks_end",
  hookType: "post_compact",
  tokensBefore: 84000,
  tokensAfter: 2800,
  tokensReclaimed: 81200
}

Configuration¶

Auto-Compact Settings¶

Control compaction behavior:

// In config
const compactConfig = {
  enabled: true,                      // Enable auto-compact
  triggerThreshold: 0.80,             // 80% budget = trigger
  keepRecentMessages: 5,              // Always preserve 5 recent
  minCompactSize: 10000,              // Need 10k+ to compact
  maxCompactSize: 50000,              // Compact max 50k per round
  enableReactiveCompact: false,       // (experimental)
  enableContextCollapse: false        // (experimental)
}

Disable Auto-Compact¶

For development or specific workflows:

export CLAUDE_AUTO_COMPACT=false

# Now manual /compact only

Performance Characteristics¶

Compaction Time¶

Typical compaction takes 1-5 seconds:

Time Breakdown:
  Parse old messages:        0.5s
  Generate summary (API):    2-3s
  Build new message list:    0.5s
  Write to disk:             0.1s
  ─────────────────────────────
  Total:                     3-4s

Token Reclamation¶

Typical reclamation: 80-90% of compacted messages

Messages to compact: 50,000 tokens
Summary generated: 300-500 tokens
Net reclamation: ~49,500 tokens (99% savings!)

Impact on Loop¶

Compaction pauses the agent loop:

Loop iteration N:
  API call, tool results, etc.
  Check tokens: 85% used → WARNING

→ Trigger compaction (user notified)
→ Pause for 3-4 seconds
→ Reclaim 50,000 tokens

Loop iteration N+1:
  Continue with 32% usage (refreshed!)

Edge Cases¶

Small Conversations¶

Compaction is skipped if too little to compact:

Messages: 5, total tokens: 3,000
→ All under minCompactSize threshold
→ Compaction skipped
→ No benefit from compacting so little

Recent Message-Heavy Conversations¶

If many recent messages can't be compacted:

Recent 10 messages: 70,000 tokens (protected)
Older messages: 18,000 tokens (compactible)

→ Compaction only targets 18,000
→ Maybe not enough to drop below WARNING
→ Might need to terminate loop

Circular Tool Calls¶

Tool calls that repeat (loop):

Iteration 1: Call Grep
Iteration 2: Call Grep (similar query)
Iteration 3: Call Grep (similar query)

Summary: "Executed similar grep queries N times
          with results: ..."

Best Practices¶

1. Clear Task Descriptions¶

Help compaction preserve important context:

✅ Good:

Task: "Fix critical security vulnerability in JWT validation.
       Tokens currently don't verify algorithm field.
       See CVE-2023-1234 for details."

❌ Bad:

Task: "Fix security bug"

2. Checkpoint Completion¶

Mark tasks/milestones as complete:

✅ Good:

// Periodically:
TaskUpdate(task, status="completed")  // Clear checkpoint

// New task:
TaskCreate(task)  // Start fresh

❌ Bad:

// One massive conversation:
User: "Do everything..."
// 200+ messages accumulate
// Hard to compact meaningfully

3. Use Subagents for Long Tasks¶

Delegate to subagents to keep parent loop clean:

✅ Good:

Parent spawns:
  Subagent-1: "Analyze API"
  Subagent-2: "Analyze Database"

Parent compacts between spawns
Each subagent has fresh budget

❌ Bad:

Single agent does everything
Message history grows unbounded
Compaction becomes very lossy

Key Files¶

File	Purpose
`src/services/compact/autoCompact.ts`	Compaction trigger logic
`src/services/compact/compact.ts`	Core compaction algorithm
`src/services/compact/reactiveCompact.ts`	(experimental) proactive compaction
`src/services/contextCollapse/`	(experimental) aggressive collapse
`src/query/tokenBudget.ts`	Token tracking and states
`src/query/transitions.ts`	State transition logic

Context Compaction: Managing Conversation History¶

Overview¶

Problem: Token Budget Limits¶

Token Budget States¶

Automatic Compaction¶

Trigger Conditions¶

Compaction Algorithm¶

Compaction Example¶

Before Compaction¶

Compaction Strategy¶

After Compaction¶

Preserved Information¶

1. Task State¶

2. Permission Rules¶

3. Key Decisions¶

4. Recent Context¶

Manual Compaction¶

Context Collapse (Experimental)¶

Reactive Compaction (Experimental)¶

Compaction Events & Hooks¶

Configuration¶

Auto-Compact Settings¶

Disable Auto-Compact¶

Performance Characteristics¶

Compaction Time¶

Token Reclamation¶

Impact on Loop¶

Edge Cases¶

Small Conversations¶

Recent Message-Heavy Conversations¶

Circular Tool Calls¶

Best Practices¶

1. Clear Task Descriptions¶

2. Checkpoint Completion¶

3. Use Subagents for Long Tasks¶

Key Files¶

See Also¶