Context Compaction: Managing Conversation History

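Overview

Context Compaction automatically compresses conversation history so that it stays within the token budget. By summarizing completed work while preserving recent context and task state, it allows an effectively unbounded conversation length.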

The Problem: Token Budget Limits

The agent loop runs within a token budget (typically 100k-200k tokens):

Budget: 100,000 tokens
├─ System prompt: ~2,000 tokens
├─ Current iteration API response: ~5,000 tokens
├─ Message history: grows with every iteration
└─ Reserve for next iteration: ~4,000 tokens

After many iterations:
  Iteration 50: Message history is 60,000 tokens
  Iteration 51: Only 20,000 remaining for next iteration
  → Cannot ask Claude substantive questions
  → Loop must terminate soon
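
The arithmetic behind the breakdown above can be sketched as follows. This is only an illustration: the `BudgetBreakdown` type and `remainingTokens` function are hypothetical names, not part of the documented system.

```typescript
// Hypothetical breakdown of a token budget; field names are illustrative.
interface BudgetBreakdown {
  budget: number;          // total token budget
  systemPrompt: number;    // tokens used by the system prompt
  currentResponse: number; // tokens used by the current API response
  messageHistory: number;  // tokens used by accumulated messages
  reserve: number;         // tokens held back for the next iteration
}

// Tokens left for new work after every listed cost is subtracted.
function remainingTokens(b: BudgetBreakdown): number {
  const used = b.systemPrompt + b.currentResponse + b.messageHistory + b.reserve;
  return Math.max(0, b.budget - used);
}

const iteration50: BudgetBreakdown = {
  budget: 100_000,
  systemPrompt: 2_000,
  currentResponse: 5_000,
  messageHistory: 60_000,
  reserve: 4_000,
};

console.log(remainingTokens(iteration50)); // 29000
```

Because the message history is the only term that grows, the remaining headroom shrinks with every iteration until compaction or termination.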

Solution: compact old messages before the budget runs out.

Token Budget States

State Indicator    Token Usage    Behavior
─────────────────────────────────────────────
NORMAL             < 80%          Continue normally
WARNING            80-95%         Auto-compact if enabled
CRITICAL           > 95%, < 100%  May force truncation
EXHAUSTED          >= 100%        Loop terminates

The system tracks token usage in real time and adjusts its behavior accordingly.
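
A minimal sketch of how the state table could be expressed in code. The thresholds come from the table; the `BudgetState` type and `classifyBudget` function are assumed names for illustration and may not match what `src/query/tokenBudget.ts` actually does.

```typescript
type BudgetState = "NORMAL" | "WARNING" | "CRITICAL" | "EXHAUSTED";

// Maps token usage to a budget state using the thresholds in the table.
// Hypothetical helper; the real tracking logic is not shown in this document.
function classifyBudget(tokensUsed: number, tokenBudget: number): BudgetState {
  if (tokenBudget <= 0) throw new RangeError("tokenBudget must be positive");
  const ratio = tokensUsed / tokenBudget;
  if (ratio >= 1.0) return "EXHAUSTED"; // >= 100%: loop terminates
  if (ratio > 0.95) return "CRITICAL";  // > 95%: may force truncation
  if (ratio >= 0.8) return "WARNING";   // 80-95%: auto-compact if enabled
  return "NORMAL";                      // < 80%: continue normally
}

console.log(classifyBudget(84_000, 100_000)); // WARNING
```

Checking the highest threshold first keeps each branch a single comparison.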

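Automatic Compaction

Trigger Conditions

Automatic compaction runs when both of the following hold:

- Token usage enters the WARNING state (80%+ of the budget), and
- Auto-compact is enabled (default: yes)

Example: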

Budget: 100,000 tokens
Current usage: 84,000 tokens (84%)
→ WARNING state → autoCompact() triggered
→ Compress old messages
→ Reclaim 40,000 tokens
→ Continue with 44,000 tokens used (56,000 available)
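
The two trigger conditions can be combined into a single check. The sketch below is an assumption about shape only: `AutoCompactInput` and `shouldAutoCompact` are hypothetical names, and the real `autoCompact()` trigger may consider more inputs.

```typescript
interface AutoCompactInput {
  tokensUsed: number;
  tokenBudget: number;
  enabled: boolean;         // auto-compact setting (default: true)
  triggerThreshold: number; // fraction of the budget, e.g. 0.80
}

// True only when auto-compact is enabled AND usage has reached the threshold.
function shouldAutoCompact(input: AutoCompactInput): boolean {
  if (!input.enabled) return false;
  return input.tokensUsed / input.tokenBudget >= input.triggerThreshold;
}

console.log(
  shouldAutoCompact({
    tokensUsed: 84_000,
    tokenBudget: 100_000,
    enabled: true,
    triggerThreshold: 0.8,
  }),
); // true
```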

Compaction Algorithm

The compact() function applies the following strategy:

async function compact(
  messages: Message[],
  config: CompactConfig
): Promise<Message[]> {
  // Step 1: Find the compaction boundary.
  // Recent messages are skipped; everything before boundaryIndex is old
  // enough to be summarized.
  const boundaryIndex = findCompactionBoundary(messages, {
    keepRecentMessages: config.keepRecentMessages,  // Default: keep 5 recent
    minCompactSize: config.minCompactSize           // Default: need 10k+ tokens
  })

  // Nothing large enough to compact: return the history unchanged.
  if (boundaryIndex <= 0) return messages

  // Step 2: Collect the pre-compact messages (the ones to be summarized)
  const preCompactMessages = messages.slice(0, boundaryIndex)

  // Step 3: Generate summary
  const summary = await generateCompactSummary(
    preCompactMessages,
    {
      includeToolUse: true,
      includeReasons: true,
      preserveState: ["tasks", "permissions"]
    }
  )

  // Step 4: Create boundary message
  // countTokens is an assumed helper; the original does not show how
  // these two counts are computed.
  const preCompactTokens = countTokens(preCompactMessages)
  const summaryTokens = countTokens(summary)
  const boundaryMessage = {
    type: "SystemCompactBoundaryMessage",
    content: summary,
    timestamp: Date.now(),
    tokensReclaimed: preCompactTokens - summaryTokens
  }

  // Step 5: Build post-compact messages
  const postCompactMessages = [
    boundaryMessage,  // Synthetic message marking compaction point
    ...messages.slice(boundaryIndex)  // Recent messages unchanged
  ]

  // Step 6: Return compressed messages
  return postCompactMessages
}
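
The boundary search in Step 1 is not shown in this document. One plausible reading, sketched under assumptions: protect the last `keepRecentMessages` entries and compact the rest only if they add up to at least `minCompactSize` tokens. The `SizedMessage` type, the options shape, and the "return 0 to skip" convention are all hypothetical.

```typescript
// Hypothetical message shape carrying a precomputed token count.
interface SizedMessage {
  id: number;
  tokens: number;
}

interface BoundaryOptions {
  keepRecentMessages: number; // messages at the tail that are never compacted
  minCompactSize: number;     // minimum tokens worth compacting
}

// Returns the index of the first message to keep verbatim,
// or 0 when compaction should be skipped.
function findCompactionBoundary(
  messages: SizedMessage[],
  options: BoundaryOptions,
): number {
  const boundary = Math.max(0, messages.length - options.keepRecentMessages);
  const compactable = messages
    .slice(0, boundary)
    .reduce((sum, m) => sum + m.tokens, 0);
  return compactable >= options.minCompactSize ? boundary : 0;
}

// 61 messages of 1,000 tokens each: the first 56 are compactable.
const longConversation = Array.from({ length: 61 }, (_, i) => ({
  id: i + 1,
  tokens: 1_000,
}));
console.log(
  findCompactionBoundary(longConversation, {
    keepRecentMessages: 5,
    minCompactSize: 10_000,
  }),
); // 56
```

Returning 0 for "skip" keeps the caller simple, since slicing at 0 yields an empty pre-compact list.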

Compaction Example

Before Compaction

Message 1:  User: "Analyze this bug"
            [50 tokens]
Message 2:  Assistant: "I'll investigate..."
            [100 tokens]
Message 3:  ToolUse: Read src/main.ts
            [50 tokens]
Message 4:  ToolResult: [file contents]
            [500 tokens]
Message 5:  Assistant: "I found the bug..."
            [200 tokens]
Message 6:  ToolUse: Edit src/main.ts (fix)
            [100 tokens]
Message 7:  ToolResult: Success
            [50 tokens]
... 50+ more messages ...
Message 60: [100 tokens]
Message 61: [Current user query - 200 tokens]

Total: 87,000 tokens (87% of budget) → WARNING

Compaction Strategy

Keep messages 58-61 (the four most recent in this example) and summarize messages 1-57 (the older ones):

Summary:
"The user asked to analyze a bug in src/main.ts.
After investigation, found memory leak in loop variable.
Fixed with variable initialization on line 42.
User then asked for testing. Created 5 new unit tests
covering edge cases. All tests passing."

[300 tokens]

After Compaction

SystemCompactBoundaryMessage (generated):
  "Previous conversation: Analyzed bug in src/main.ts,
   found memory leak, applied fix (line 42), wrote tests."
  [300 tokens]

Message 58: User: "..."
            [1000 tokens]
Message 59: Assistant: "..."
            [1200 tokens]
Message 60: ToolUse: "..."
            [100 tokens]
Message 61: User: "..."
            [200 tokens]

Total: 2,800 tokens (2.8% of budget) → NORMAL
Tokens reclaimed: ~84,200 tokens

Preserved Information

Compaction preserves key state:

1. Task State

All task information is preserved:
- Current tasks and their status
- Dependencies (blockedBy, blocks)
- Assigned agents
- Task metadata

// Tasks extracted and preserved in summary
"Active tasks:
  - Task 1 (in_progress): Fix API endpoint
  - Task 2 (pending): Write tests (blocked by Task 1)
  - Task 3 (completed): Database migration"
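
A sketch of how task state could be rendered into the summary text shown above. The `TaskRecord` shape and `renderTaskSummary` function are hypothetical; only the output format is taken from the example.

```typescript
type TaskStatus = "pending" | "in_progress" | "completed";

// Hypothetical task record; the real task model likely has more fields.
interface TaskRecord {
  id: number;
  title: string;
  status: TaskStatus;
  blockedBy: number[]; // ids of tasks this one is waiting on
}

// Renders tasks in the "Active tasks:" format used by the summary example.
function renderTaskSummary(tasks: TaskRecord[]): string {
  const lines = tasks.map((t) => {
    const blocked =
      t.blockedBy.length > 0
        ? ` (blocked by ${t.blockedBy.map((id) => `Task ${id}`).join(", ")})`
        : "";
    return `  - Task ${t.id} (${t.status}): ${t.title}${blocked}`;
  });
  return ["Active tasks:", ...lines].join("\n");
}

const sampleTasks: TaskRecord[] = [
  { id: 1, title: "Fix API endpoint", status: "in_progress", blockedBy: [] },
  { id: 2, title: "Write tests", status: "pending", blockedBy: [1] },
  { id: 3, title: "Database migration", status: "completed", blockedBy: [] },
];

console.log(renderTaskSummary(sampleTasks));
```

Rendering from structured records rather than from message text means task state survives compaction even if the original messages are summarized away.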

2. Permission Rules

All permission context is preserved:
- Allowed/denied paths
- Tool access levels
- Permission mode

"Permissions:
  - Can edit: src/**, tests/**
  - Can execute: npm test, npm run build
  - Cannot execute: rm, destructive commands"

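3. Key Decisions

Important decisions and findings:
- Why certain approaches were chosen
- Known issues or constraints
- API keys or configuration (if appropriate)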

"Decided: Use React for UI (not Vue) due to existing
codebase. Constraint: Cannot modify database schema
until migration complete."

4. Recent Context

Always kept verbatim:
- The last N messages (user-configurable, default: 5)
- Current state
- Active operations

Manual Compaction

Users can trigger compaction manually:

# Interactive command
> /compact

# This initiates:
#   1. Display token usage
#   2. Show what will be summarized
#   3. Execute compaction
#   4. Display tokens reclaimed

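Context Collapse (Experimental)

An advanced feature that collapses context more aggressively.

When enabled, it compresses further by:
- Merging redundant messages
- Summarizing tool chains
- Removing intermediate steps

Example: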

Before collapse:
  Tool: Read file A
  Result: contents
  Tool: Read file B
  Result: contents
  Tool: Grep pattern
  Result: matches

After collapse:
  Summary: Searched pattern across files A and B,
           found N matches in file A, 0 in file B
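
To make the idea of summarizing a tool chain concrete, here is a deliberately simple sketch that groups steps by tool and drops the raw results. It is not the experimental feature's actual logic: `ToolStep` and `collapseToolChain` are hypothetical, and a real collapse would presumably use a model-generated summary rather than string joining.

```typescript
interface ToolStep {
  tool: string;   // e.g. "Read", "Grep"
  target: string; // file path or search pattern
  result: string; // raw tool output (discarded by the collapse)
}

// Collapses a run of tool steps into one line that names each tool and
// its targets, in first-seen order, without the raw results.
function collapseToolChain(steps: ToolStep[]): string {
  const byTool = new Map<string, string[]>();
  for (const step of steps) {
    const targets = byTool.get(step.tool) ?? [];
    targets.push(step.target);
    byTool.set(step.tool, targets);
  }
  const parts = Array.from(
    byTool,
    ([tool, targets]) => `${tool}: ${targets.join(", ")}`,
  );
  return `Summary: ${parts.join("; ")}`;
}

const chain: ToolStep[] = [
  { tool: "Read", target: "file A", result: "contents" },
  { tool: "Read", target: "file B", result: "contents" },
  { tool: "Grep", target: "pattern", result: "matches" },
];

console.log(collapseToolChain(chain)); // Summary: Read: file A, file B; Grep: pattern
```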

Reactive Compaction (Experimental)

Compacts proactively, before the WARNING state is reached:

Normal operation: < 80% used
→ Reactive compaction triggers at 70%
→ Preemptively compresses
→ Maintains larger buffer for next iteration
→ Smoother operation, fewer stalls

Compaction Events and Hooks

The system emits events during compaction:

// Hook: pre_compact
// Called before compaction starts
{
  type: "hooks_start",
  hookType: "pre_compact",
  tokensUsed: 84000,
  tokenBudget: 100000
}

// Compaction executes...

// Hook: post_compact
// Called after compaction completes
{
  type: "hooks_end",
  hookType: "post_compact",
  tokensBefore: 84000,
  tokensAfter: 2800,
  tokensReclaimed: 81200
}
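
The two payloads above can be modeled as a discriminated union so that a listener handles each one safely. The field names are copied from the examples; the type names and the `describeCompactionEvent` function are assumptions for illustration, and the actual hook registration API is not shown in this document.

```typescript
interface PreCompactEvent {
  type: "hooks_start";
  hookType: "pre_compact";
  tokensUsed: number;
  tokenBudget: number;
}

interface PostCompactEvent {
  type: "hooks_end";
  hookType: "post_compact";
  tokensBefore: number;
  tokensAfter: number;
  tokensReclaimed: number;
}

type CompactionEvent = PreCompactEvent | PostCompactEvent;

// Produces a one-line log message for either event kind.
function describeCompactionEvent(ev: CompactionEvent): string {
  switch (ev.hookType) {
    case "pre_compact": {
      const pct = Math.round((ev.tokensUsed / ev.tokenBudget) * 100);
      return `compaction starting at ${pct}% of budget`;
    }
    case "post_compact":
      return `compaction reclaimed ${ev.tokensReclaimed} tokens (${ev.tokensBefore} -> ${ev.tokensAfter})`;
  }
}

console.log(
  describeCompactionEvent({
    type: "hooks_end",
    hookType: "post_compact",
    tokensBefore: 84_000,
    tokensAfter: 2_800,
    tokensReclaimed: 81_200,
  }),
); // compaction reclaimed 81200 tokens (84000 -> 2800)
```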

Configuration

Auto-Compact Settings

These settings control compaction behavior:

// In config
const compactConfig = {
  enabled: true,                      // Enable auto-compact
  triggerThreshold: 0.80,             // 80% budget = trigger
  keepRecentMessages: 5,              // Always preserve 5 recent
  minCompactSize: 10000,              // Need 10k+ to compact
  maxCompactSize: 50000,              // Compact max 50k per round
  enableReactiveCompact: false,       // (experimental)
  enableContextCollapse: false        // (experimental)
}
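
A sketch of how partial overrides might be merged with the defaults listed above and sanity-checked. The field names and default values are taken from the config example; the `resolveCompactConfig` function and its validation rules are assumptions, not documented behavior.

```typescript
interface CompactConfig {
  enabled: boolean;
  triggerThreshold: number;
  keepRecentMessages: number;
  minCompactSize: number;
  maxCompactSize: number;
  enableReactiveCompact: boolean;
  enableContextCollapse: boolean;
}

const DEFAULT_COMPACT_CONFIG: CompactConfig = {
  enabled: true,
  triggerThreshold: 0.8,
  keepRecentMessages: 5,
  minCompactSize: 10_000,
  maxCompactSize: 50_000,
  enableReactiveCompact: false,
  enableContextCollapse: false,
};

// Merges overrides onto the defaults and rejects inconsistent values.
// The validation rules here are illustrative assumptions.
function resolveCompactConfig(
  overrides: Partial<CompactConfig> = {},
): CompactConfig {
  const cfg: CompactConfig = { ...DEFAULT_COMPACT_CONFIG, ...overrides };
  if (cfg.triggerThreshold <= 0 || cfg.triggerThreshold >= 1) {
    throw new RangeError("triggerThreshold must be between 0 and 1");
  }
  if (cfg.keepRecentMessages < 0) {
    throw new RangeError("keepRecentMessages must not be negative");
  }
  if (cfg.minCompactSize > cfg.maxCompactSize) {
    throw new RangeError("minCompactSize must not exceed maxCompactSize");
  }
  return cfg;
}

console.log(resolveCompactConfig({ keepRecentMessages: 8 }).keepRecentMessages); // 8
```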

Disabling Auto-Compact

For development or specific workflows:

export CLAUDE_AUTO_COMPACT=false

# Now manual /compact only
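
How the variable is parsed is not documented here. The sketch below assumes that only the literal value `false` (case-insensitive) disables auto-compact and that an unset variable means enabled; `isAutoCompactEnabled` is a hypothetical helper that takes the environment as a plain record so it can be tested without touching the real process environment.

```typescript
// Assumed parsing rule: unset means enabled, "false" means disabled.
function isAutoCompactEnabled(
  env: Record<string, string | undefined>,
): boolean {
  const raw = env["CLAUDE_AUTO_COMPACT"];
  if (raw === undefined) return true; // default: auto-compact on
  return raw.trim().toLowerCase() !== "false";
}

console.log(isAutoCompactEnabled({ CLAUDE_AUTO_COMPACT: "false" })); // false
console.log(isAutoCompactEnabled({})); // true
```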

Performance Characteristics

Compaction Time

A typical compaction takes 1-5 seconds:

Time Breakdown:
  Parse old messages:        0.5s
  Generate summary (API):    2-3s
  Build new message list:    0.5s
  Write to disk:             0.1s
  ─────────────────────────────
  Total:                     3-4s

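Token Reclamation

Typical reclamation is 80-90% of the compacted tokens. When the summary is very short, as in this example, savings are higher:

Messages to compact: 50,000 tokens
Summary generated: 300-500 tokens
Net reclamation: ~49,500-49,700 tokens (about 99% savings)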
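
The savings figure is simply the share of compacted tokens that the summary does not give back. A small sketch, with `reclaimedFraction` as a hypothetical helper name:

```typescript
// Fraction of the compacted tokens that is actually freed.
function reclaimedFraction(
  compactedTokens: number,
  summaryTokens: number,
): number {
  if (compactedTokens <= 0) return 0;
  return (compactedTokens - summaryTokens) / compactedTokens;
}

console.log(reclaimedFraction(50_000, 500));   // 0.99
console.log(reclaimedFraction(50_000, 7_500)); // 0.85
```

The second call shows that a longer summary of about 7,500 tokens for the same 50,000 compacted tokens lands in the 80-90% range.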

Impact on the Loop

Compaction pauses the agent loop:

Loop iteration N:
  API call, tool results, etc.
  Check tokens: 85% used → WARNING

→ Trigger compaction (user notified)
→ Pause for 3-4 seconds
→ Reclaim 50,000 tokens

Loop iteration N+1:
  Continue with ~35% usage (refreshed!)

Edge Cases

Small Conversations

If there is too little content to compress, compaction is skipped:

Messages: 5, total tokens: 3,000
→ All under minCompactSize threshold
→ Compaction skipped
→ No benefit from compacting so little

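Conversations with Large Recent Messages

If most of the token usage sits in recent, protected messages that cannot be compacted: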

Recent 10 messages: 70,000 tokens (protected)
Older messages: 18,000 tokens (compactible)

→ Compaction only targets 18,000
→ Maybe not enough to drop below WARNING
→ Might need to terminate loop
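
Whether compaction helps in this situation can be estimated before running it. The sketch below is an assumption-laden illustration: `forecastCompaction` is a hypothetical helper, the summary size must be guessed in advance, and the example assumes a 100,000-token budget, which this edge case does not state.

```typescript
interface CompactionForecast {
  tokensAfter: number;     // estimated usage once compaction finishes
  belowThreshold: boolean; // true if usage would drop under the trigger
}

// Estimates usage after compaction: protected messages stay as they are,
// compactible messages are replaced by a summary of the given size.
function forecastCompaction(
  protectedTokens: number,
  compactibleTokens: number,
  summaryTokens: number,
  tokenBudget: number,
  triggerThreshold: number,
): CompactionForecast {
  // A summary is never counted as larger than what it replaces.
  const tokensAfter =
    protectedTokens + Math.min(summaryTokens, compactibleTokens);
  return {
    tokensAfter,
    belowThreshold: tokensAfter / tokenBudget < triggerThreshold,
  };
}

console.log(forecastCompaction(70_000, 18_000, 500, 100_000, 0.8));
// { tokensAfter: 70500, belowThreshold: true }
console.log(forecastCompaction(85_000, 3_000, 500, 100_000, 0.8));
// { tokensAfter: 85500, belowThreshold: false }
```

Under these assumptions the figures above end near 70% of the budget, under the 80% trigger but with little headroom. A larger protected tail, as in the second call, stays in WARNING no matter how well the older messages compress.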

Looping Tool Calls

Repeated tool calls (loops) are summarized as a group:

Iteration 1: Call Grep
Iteration 2: Call Grep (similar query)
Iteration 3: Call Grep (similar query)

Summary: "Executed similar grep queries N times
          with results: ..."
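
One simple way to detect such a loop is to merge consecutive calls to the same tool into a single run. This is a sketch only: `ToolCall` and `summarizeRepeatedCalls` are hypothetical, and it treats calls as "similar" purely by tool name, which is cruder than comparing queries.

```typescript
interface ToolCall {
  iteration: number;
  tool: string;
  query: string;
}

// Merges consecutive calls to the same tool into one summary line each.
function summarizeRepeatedCalls(calls: ToolCall[]): string[] {
  const runs: { tool: string; count: number }[] = [];
  for (const call of calls) {
    const last = runs[runs.length - 1];
    if (last !== undefined && last.tool === call.tool) {
      last.count += 1; // extend the current run
    } else {
      runs.push({ tool: call.tool, count: 1 }); // start a new run
    }
  }
  return runs.map((r) =>
    r.count > 1
      ? `Executed ${r.tool} ${r.count} times`
      : `Executed ${r.tool} once`,
  );
}

const grepLoop: ToolCall[] = [
  { iteration: 1, tool: "Grep", query: "handleAuth" },
  { iteration: 2, tool: "Grep", query: "handleAuth(" },
  { iteration: 3, tool: "Grep", query: "handle_auth" },
];

console.log(summarizeRepeatedCalls(grepLoop)); // [ 'Executed Grep 3 times' ]
```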

Best Practices

1. Clear Task Descriptions

Help compaction preserve the important context:

✅ Good:

Task: "Fix critical security vulnerability in JWT validation.
       Tokens currently don't verify algorithm field.
       See CVE-2023-1234 for details."

❌ Bad:

Task: "Fix security bug"

2. Checkpoint Completed Work

Mark tasks and milestones as completed:

✅ Good:

// Periodically:
TaskUpdate(task, status="completed")  // Clear checkpoint

// New task:
TaskCreate(task)  // Start fresh

❌ Bad:

// One massive conversation:
User: "Do everything..."
// 200+ messages accumulate
// Hard to compact meaningfully

3. Use Subagents for Long Tasks

Delegate to subagents to keep the parent loop clean:

✅ Good:

Parent spawns:
  Subagent-1: "Analyze API"
  Subagent-2: "Analyze Database"

Parent compacts between spawns
Each subagent has fresh budget

❌ Bad:

Single agent does everything
Message history grows unbounded
Compaction becomes very lossy

Key Files

File	Purpose
`src/services/compact/autoCompact.ts`	Compaction trigger logic
`src/services/compact/compact.ts`	Core compaction algorithm
`src/services/compact/reactiveCompact.ts`	(experimental) Proactive compaction
`src/services/contextCollapse/`	(experimental) Aggressive collapse
`src/query/tokenBudget.ts`	Token tracking and states
`src/query/transitions.ts`	State transition logic

See Also

Agent Loop - where compaction is triggered
Tasks - task preservation during compaction
Permissions - permission preservation
Hooks System - compaction events