Context Compaction Research: Claude Code, Codex CLI, OpenCode, Amp

Context Compaction

Research on how other coding assistants implement context compaction to manage long conversations.

Overview

Context compaction (also called "handoff" or "summarization") is a technique to manage the context window in long coding sessions. When conversations grow too long, performance degrades and costs increase. Compaction summarizes the conversation history into a condensed form, allowing work to continue without hitting context limits.

Claude Code

Manual: /compact command
Auto: Triggers at ~95% context capacity (source)

How it works

  1. Takes entire conversation history
  2. Uses an LLM to generate a summary
  3. Starts a new session with the summary as initial context
  4. User can provide custom instructions with /compact (e.g., "summarize only the TODOs") (source)
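
A minimal sketch of this flow, assuming a hypothetical llm client and message shape (none of these names are Claude Code internals):

type Message = { role: "user" | "assistant"; content: string };

declare const llm: { complete(messages: Message[]): Promise<string> };
declare const SUMMARY_PROMPT: string; // the summarization prompt quoted below

async function compact(history: Message[], instructions?: string): Promise<Message[]> {
  // Optional focus from "/compact <instructions>", e.g. "summarize only the TODOs"
  const prompt = instructions
    ? `${SUMMARY_PROMPT}\n\nAdditional instructions: ${instructions}`
    : SUMMARY_PROMPT;
  // One model call over the full history produces the summary
  const summary = await llm.complete([...history, { role: "user", content: prompt }]);
  // The new session starts with only the summary as initial context
  return [{ role: "user", content: summary }];
}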

Prompt (extracted from community)

From r/ClaudeAI:

Your task is to create a detailed summary of the conversation so far, paying close attention to the user's explicit requests and your previous actions. This summary will be used as context when continuing the conversation, so preserve critical information including:
- What was accomplished
- Current work in progress  
- Files involved
- Next steps
- Key user requests or constraints

Key observations

  • Auto-compact triggers at ~95% capacity but users often recommend manual compaction earlier (source)
  • Quality can degrade with multiple compactions (cumulative information loss) (source)
  • Different from /clear which wipes history completely (source)
  • Users report the model can "go off the rails" if auto-compact happens mid-task (source)

OpenAI Codex CLI

Source: github.com/openai/codex (codex-rs/core/src/compact.rs, codex-rs/core/templates/compact/)

Manual: /compact slash command
Auto: Triggers when token usage exceeds model_auto_compact_token_limit

How it works

  1. Uses a dedicated summarization prompt
  2. Sends entire history with the prompt appended
  3. Collects the summary from the model response
  4. Builds new history: initial context + recent user messages (up to 20k tokens) + summary
  5. Replaces session history with the compacted version
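
A TypeScript sketch of the history rebuild in step 4; countTokens and the Message shape are assumptions, not the actual compact.rs code (which is Rust):

type Message = { role: "system" | "user" | "assistant"; content: string };

declare function countTokens(message: Message): number;

const RECENT_USER_BUDGET = 20_000; // tokens of recent user messages to carry over

function rebuildHistory(
  initialContext: Message[],
  history: Message[],
  summary: string,
): Message[] {
  // Walk backward, keeping the most recent user messages until the budget is spent
  const recent: Message[] = [];
  let budget = RECENT_USER_BUDGET;
  for (let i = history.length - 1; i >= 0 && budget > 0; i--) {
    const message = history[i];
    if (message.role !== "user") continue;
    budget -= countTokens(message);
    if (budget >= 0) recent.unshift(message);
  }
  // New history: initial context + recent user messages + the summary
  // (prefixed with summary_prefix.md in the real implementation)
  return [...initialContext, ...recent, { role: "user", content: summary }];
}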

Prompt

From codex-rs/core/templates/compact/prompt.md:

You are performing a CONTEXT CHECKPOINT COMPACTION. Create a handoff summary for another LLM that will resume the task.

Include:
- Current progress and key decisions made
- Important context, constraints, or user preferences
- What remains to be done (clear next steps)
- Any critical data, examples, or references needed to continue

Be concise, structured, and focused on helping the next LLM seamlessly continue the work.

Summary prefix (prepended to the summary in the new context)

From codex-rs/core/templates/compact/summary_prefix.md:

Another language model started to solve this problem and produced a summary of its thinking process. You also have access to the state of the tools that were used by that language model. Use this to build on the work that has already been done and avoid duplicating work. Here is the summary produced by the other language model, use the information in this summary to assist with your own analysis:

Key observations

  • Uses token-based threshold (model_auto_compact_token_limit) rather than percentage (config/mod.rs)
  • Default thresholds vary by model (e.g., 180k for some models, 244k for others) (config/mod.rs)
  • Preserves recent user messages (last ~20k tokens worth) alongside summary (compact.rs)
  • Warns user: "Long conversations and multiple compactions can cause the model to be less accurate" (compact.rs)
  • Has retry logic with exponential backoff for failed compactions (compact.rs)
  • Uses "effective_context_window_percent" of 95% for safety margin (model_family.rs)

OpenCode (sst/opencode)

Source: github.com/sst/opencode (packages/opencode/src/session/compaction.ts)

Manual: /compact command
Auto: Triggers when isOverflow() returns true (based on token usage vs. model limits)

How it works

  1. Checks if tokens exceed (context_limit - output_limit) (compaction.ts)
  2. Creates a new assistant message marked as "summary"
  3. Uses a compaction system prompt
  4. Streams the summary generation
  5. If auto-compaction, adds a "Continue if you have next steps" message
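
A sketch of the overflow check in step 1; the field names are inferred from the description, not OpenCode's exact API:

interface ModelLimits {
  context: number; // maximum input tokens for the model
  output: number;  // tokens reserved for the model's response
}

function isOverflow(usedTokens: number, limits: ModelLimits): boolean {
  // Compact once the history no longer fits alongside a full-size response
  return usedTokens > limits.context - limits.output;
}

// Example: a 200k-context model reserving 32k for output compacts past 168k tokens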

Prompt

From packages/opencode/src/session/prompt/compaction.txt:

You are a helpful AI assistant tasked with summarizing conversations.

When asked to summarize, provide a detailed but concise summary of the conversation. 
Focus on information that would be helpful for continuing the conversation, including:
- What was done
- What is currently being worked on
- Which files are being modified
- What needs to be done next
- Key user requests, constraints, or preferences that should persist
- Important technical decisions and why they were made

Your summary should be comprehensive enough to provide context but concise enough to be quickly understood.

Final user message

From compaction.ts:

Summarize our conversation above. This summary will be the only context available when the conversation continues, so preserve critical information including: what was accomplished, current work in progress, files involved, next steps, and any key user requests or constraints. Be concise but detailed enough that work can continue seamlessly.

Key observations

  • Has a "prune" mechanism separate from compaction (compaction.ts):
    • Scans backward through tool calls
    • Protects last 40k tokens of tool output (PRUNE_PROTECT constant)
    • Prunes tool outputs beyond that threshold if >20k tokens prunable (PRUNE_MINIMUM constant)
  • Disables auto-compaction via OPENCODE_DISABLE_AUTOCOMPACT env var (flag.ts)
  • Separate summarization for UI display (2 sentences max) vs. compaction (detailed) (summary.ts)
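
A sketch of the prune pass, assuming a simplified tool-call record; the two constants match the description above, everything else is illustrative:

type ToolCall = { output: string; tokens: number };

const PRUNE_PROTECT = 40_000; // most recent tool-output tokens are never pruned
const PRUNE_MINIMUM = 20_000; // skip pruning unless at least this much is reclaimable

function prune(toolCalls: ToolCall[]): void {
  // Scan backward; tool calls inside the 40k protection window are kept intact
  const candidates: ToolCall[] = [];
  let protectedBudget = PRUNE_PROTECT;
  for (let i = toolCalls.length - 1; i >= 0; i--) {
    const call = toolCalls[i];
    if (protectedBudget > 0) {
      protectedBudget -= call.tokens; // still inside the protected window
    } else {
      candidates.push(call); // older than the window: eligible for pruning
    }
  }
  // Only prune when enough tokens would actually be reclaimed
  const prunable = candidates.reduce((sum, call) => sum + call.tokens, 0);
  if (prunable < PRUNE_MINIMUM) return;
  for (const call of candidates) {
    call.output = "[output pruned to save context]";
    call.tokens = 0; // placeholder is negligibly small
  }
}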

Amp (Sourcegraph)

Source: ampcode.com/guides/context-management

Manual: "Handoff" feature Auto: None (manual context management encouraged)

How it works

Amp takes a different approach, providing tools for manual context management rather than automatic compaction:

  1. Handoff: Specify a goal for the next task; Amp analyzes the current thread and extracts the relevant information into a new message for a fresh thread (sketched after this list)
  2. Fork: Duplicate context window at a specific point
  3. Edit/Restore: Edit or restore to previous messages
  4. Thread References: Reference other threads to extract information on-demand
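
Amp's implementation isn't public, but a goal-directed handoff could be approximated like this (every name here is invented):

declare const extractor: { complete(prompt: string): Promise<string> };

async function handoff(thread: string, goal: string): Promise<string> {
  // A secondary model pulls out only the material relevant to the stated goal
  const prompt =
    `Goal for the next task: ${goal}\n\n` +
    `From the thread below, extract only the information needed to pursue this goal ` +
    `(relevant files, decisions, constraints, and open questions).\n\n${thread}`;
  // The result becomes the first message of a fresh thread
  return extractor.complete(prompt);
}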

Key observations

  • Philosophy: "For best results, keep conversations short & focused" (source)
  • Emphasizes that everything in context affects output quality: "everything in the context window has an influence on the output" (source)
  • Uses a secondary model to extract relevant information during handoff (source)
  • Thread references allow selective extraction without full context inclusion (source)
  • No automatic compaction; relies on user discipline and tooling

Implementation Recommendations for pi-coding-agent

/compact Command

// User triggers: /compact [optional custom instructions]
// 1. Generate summary using current conversation
// 2. Create new session with summary as initial context
// 3. Optionally continue with queued user message
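
Fleshed out, the handler might look like this; Session and summarize are assumed interfaces, not existing pi-coding-agent code:

type Message = { role: "user" | "assistant"; content: string };

declare const session: {
  messages: Message[];
  reset(initial: Message[]): void; // the same mechanism /clear uses
};
declare function summarize(history: Message[], instructions?: string): Promise<string>;

async function handleCompact(args?: string): Promise<void> {
  // 1. Generate a summary of the current conversation,
  //    optionally steered by "/compact <instructions>"
  const summary = await summarize(session.messages, args);
  // 2. Start a new session seeded with the summary as initial context
  session.reset([{ role: "user", content: summary }]);
  // 3. Any queued user message is then handled in the fresh session
}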

Auto-compaction

// Threshold-based (e.g., 85-90% of context limit)
// Check after each turn:
if (tokenUsage / contextLimit > 0.85) {
  await compact({ auto: true });
}

Compaction Prompt

Based on the research above, a good compaction prompt looks like this:

Create a detailed summary for continuing this coding session. Include:

1. **Completed work**: What tasks were finished
2. **Current state**: Files modified, their current status
3. **In progress**: What is being worked on now
4. **Next steps**: Clear actions to take
5. **Constraints**: User preferences, project requirements, key decisions made
6. **Critical context**: Any information essential for continuing

Be concise but preserve enough detail that work can continue seamlessly.

Key Design Decisions

  1. Threshold: 85-90% recommended (95% is often too late, per Claude Code user feedback)
  2. Pruning: Consider pruning old tool outputs before full compaction (OpenCode approach)
  3. Warning: Notify users that compaction happened and quality may degrade (Codex approach)
  4. Disable option: Allow users to disable auto-compaction via flag/env (OpenCode approach)
  5. Custom instructions: Support /compact [instructions] for targeted summaries (Claude Code approach)
  6. Session continuity: New session should feel seamless (summary as hidden context)
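
These decisions could surface as a small configuration interface; all of the names below are proposals, not existing settings:

interface CompactionConfig {
  autoCompact: boolean;  // decision 4: disable via flag or env var
  threshold: number;     // decision 1: fraction of context limit, e.g. 0.85
  pruneFirst: boolean;   // decision 2: prune old tool outputs before compacting
  warnOnCompact: boolean; // decision 3: tell the user compaction happened
}

const defaults: CompactionConfig = {
  autoCompact: true,
  threshold: 0.85,
  pruneFirst: true,
  warnOnCompact: true,
};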

Existing Infrastructure

The coding-agent already has:

  • /clear command that resets the session
  • Session management with message history
  • Token counting per turn

For compaction, we need to:

  1. Add /compact command handler (similar to /clear but with summary)
  2. Add token threshold checking after each assistant turn
  3. Create a summarization prompt
  4. Wire it to create a new session with the summary
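
Steps 1, 2, and 4 together might wire up like this; onAssistantTurnEnd is a hypothetical hook into the existing turn loop:

declare function handleCompact(): Promise<void>; // the /compact handler sketched above
declare function onAssistantTurnEnd(
  callback: (usage: { tokens: number; limit: number }) => Promise<void>,
): void;

onAssistantTurnEnd(async ({ tokens, limit }) => {
  // Reuse the manual /compact path so auto and manual compaction stay consistent
  if (tokens / limit > 0.85) {
    await handleCompact();
  }
});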