Claude Code /compact and Auto-Memory: Master Context Window Management

February 2026 • 10 min read

Every Claude Code session has a hard ceiling: 200,000 tokens of context. That sounds like a lot until you are deep in a complex refactor, reading dozens of files, running test suites, and iterating on implementations. At that point, the context window fills up fast, and when it does, the agent's quality degrades -- it forgets earlier decisions, repeats work, or contradicts itself.

Managing context is not optional. It is the difference between a productive 2-hour session and one where you restart three times because the agent lost track of what it was doing. Here is how to do it right.

Understanding the 200K Token Budget

Before optimizing, you need to understand where your tokens go. A typical Claude Code session consumes context in four categories:

Token Consumption Breakdown

System prompt and CLAUDE.md: 2,000-5,000 tokens (loaded once at session start)
File reads: 500-5,000 tokens per file (cumulative -- every file the agent reads stays in context)
Tool use and outputs: 200-2,000 tokens per tool call (bash commands, grep results, file edits)
Conversation history: Everything you type plus every response the agent generates

A session that reads 30 files, runs 20 commands, and has 15 back-and-forth exchanges can easily consume 150,000 tokens. That leaves very little room for the agent to continue working effectively.
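To see how quickly those ranges add up, here is a rough back-of-the-envelope sketch in Python. The per-file, per-tool-call, and system prompt ranges come from the breakdown above; the per-exchange range is an assumption made for illustration, since the article gives no figure for it.

```python
# Rough context-budget estimate. Illustrative only: real usage depends on
# file sizes, command output length, and how verbose the responses are.
CONTEXT_LIMIT = 200_000

# (low, high) token ranges taken from the breakdown above.
SYSTEM_PROMPT = (2_000, 5_000)  # loaded once at session start
PER_FILE_READ = (500, 5_000)
PER_TOOL_CALL = (200, 2_000)
# Assumption (not from the article): tokens per back-and-forth exchange.
PER_EXCHANGE = (500, 3_000)


def estimate_usage(files: int, tool_calls: int, exchanges: int) -> tuple[int, int]:
    """Return a (low, high) estimate of tokens consumed by a session."""
    low = (SYSTEM_PROMPT[0] + files * PER_FILE_READ[0]
           + tool_calls * PER_TOOL_CALL[0] + exchanges * PER_EXCHANGE[0])
    high = (SYSTEM_PROMPT[1] + files * PER_FILE_READ[1]
            + tool_calls * PER_TOOL_CALL[1] + exchanges * PER_EXCHANGE[1])
    return low, high


low, high = estimate_usage(files=30, tool_calls=20, exchanges=15)
print(f"Estimated usage: {low:,} to {high:,} tokens (limit {CONTEXT_LIMIT:,})")
```

Under these assumptions the session lands anywhere between 28,500 and 240,000 tokens, so a 150,000-token session sits comfortably inside the range and the upper end overshoots the window entirely.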

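You can check your usage at any time by running /cost in your Claude Code session. It reports cumulative token consumption and estimated spend for the session -- both useful signals for knowing when to intervene. Keep in mind that cumulative usage is a proxy for how full the context window is right now, not a direct reading of it; recent Claude Code versions also provide a /context command that shows current context occupancy.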

The /compact Command: What It Does and When to Use It

The /compact command is your primary tool for context management. When you run it, Claude Code summarizes the entire conversation history into a compressed representation, typically achieving a 60-70% reduction in token usage. The agent retains the key decisions, file modifications, and current state while discarding the verbose intermediate steps.

Think of it as the agent writing itself a set of notes, then starting fresh with those notes as context. The notes contain what matters -- which files were changed, what the current objective is, what patterns to follow -- without the full transcript of how it got there.
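As a quick worked example of what a 60-70% reduction means in practice, the sketch below applies that range to a 150,000-token session. It assumes the reduction applies to the whole context, which is a simplification: the system prompt and CLAUDE.md are still present after compaction.

```python
CONTEXT_LIMIT = 200_000


def after_compaction(tokens_before: int, reduction_pct: int) -> int:
    """Tokens remaining if compaction removes reduction_pct percent of the context."""
    return tokens_before * (100 - reduction_pct) // 100


before = 150_000
best = after_compaction(before, 70)   # strongest reduction quoted above
worst = after_compaction(before, 60)  # weakest reduction quoted above

print(f"After compaction: {best:,} to {worst:,} tokens")
print(f"Headroom regained: {CONTEXT_LIMIT - worst:,} to {CONTEXT_LIMIT - best:,} tokens")
```

In this simplified model the session drops from 150,000 tokens to somewhere between 45,000 and 60,000, leaving at least 140,000 tokens of headroom.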

When to Compact

After completing a distinct task. You finished implementing a feature and are about to start a new one. Compact before context from the first task pollutes the second.
When you notice quality degradation. If the agent starts forgetting earlier decisions, repeating questions, or producing inconsistent code, your context window is likely saturated.
At regular intervals during long sessions. A good rule of thumb: compact every 45-60 minutes of active work, or whenever /cost shows you are above 120K tokens.
Before a complex operation. If you are about to ask the agent to refactor a large module or implement something that requires reading many files, compact first to give it maximum room.

When Not to Compact

Mid-task. If the agent is in the middle of implementing something and you compact, it may lose important intermediate state. Let the current task complete first.
When you need the agent to reference specific earlier dialogue. After compaction, the exact wording of earlier exchanges is gone. If you need the agent to recall precisely what you said three prompts ago, hold off on compacting until that reference is no longer needed.
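The rules of thumb in the two lists above can be condensed into a small checklist function. This is only a sketch of the article's heuristics, not anything Claude Code exposes: the function name and parameters are hypothetical, and you would fill in the inputs by hand.

```python
def should_compact(minutes_since_compact: int,
                   tokens_used: int,
                   task_in_progress: bool,
                   quality_degraded: bool = False) -> bool:
    """Apply the compaction rules of thumb from the lists above."""
    # "When Not to Compact": let the current task finish first.
    if task_in_progress:
        return False
    # Forgetting decisions or repeating questions suggests a saturated context.
    if quality_degraded:
        return True
    # Regular interval (45-60 minutes) or the 120K-token threshold.
    return minutes_since_compact >= 45 or tokens_used > 120_000


print(should_compact(minutes_since_compact=50, tokens_used=80_000, task_in_progress=False))
print(should_compact(minutes_since_compact=10, tokens_used=130_000, task_in_progress=True))
```

The first call recommends compacting because 50 minutes have passed; the second does not, because a task is still in progress even though usage is above 120K.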

Pro tip: You can give /compact a custom focus by adding instructions after the command, for example: /compact focus on the authentication refactor. This tells the summarizer to prioritize retaining context about that topic, which is useful when you know what the next phase of work will be.

Auto-Memory: Persistence Between Sessions

Compaction solves the within-session problem. But what happens when you close the terminal and come back tomorrow? Without explicit memory, the next session starts from zero -- no knowledge of yesterday's decisions, no awareness of which files were changed, no recollection of the architectural direction you chose.

Claude Code's auto-memory feature addresses this through the CLAUDE.md file system. Here is how it works:

Project-level memory: A CLAUDE.md file in your project root is automatically loaded at session start. It contains project conventions, architecture decisions, and current state.
User-level memory: A ~/.claude/CLAUDE.md file persists across all projects. Use this for personal preferences, common patterns, and cross-project conventions.
Session memory: During a session, the agent can update the CLAUDE.md file with new decisions and state. This is the "auto-save" mechanism -- tell the agent to "save this decision to memory" and it writes to the file.

Effective Memory File Structure

# Project: MyApp

## Architecture
- Next.js 14, App Router, TypeScript
- Prisma ORM with PostgreSQL
- Tailwind CSS for styling

## Conventions
- Named exports only
- API routes return { data, error } shape
- Tests colocated with source files

## Current State (updated 2026-02-28)
- Auth refactor: COMPLETE
- Dashboard redesign: IN PROGRESS
  - Header component done
  - Sidebar navigation done
  - Main content area: next task
- Known issues: #142 (race condition in WebSocket handler)

The "Current State" section is the most important part. Update it at the end of every session. When the next session starts, the agent immediately knows where you left off and what to work on next.
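If you want to glance at where you left off without opening the whole file, a few lines of Python can pull out just that section. This is a minimal sketch that assumes the memory file uses "## " headings as in the example above; the function name and the sample text are illustrative.

```python
def extract_section(markdown: str, title_prefix: str) -> str:
    """Return the body of the first '## ' section whose title starts with title_prefix."""
    body = []
    inside = False
    for line in markdown.splitlines():
        if line.startswith("## "):
            if inside:
                break  # reached the next section, stop collecting
            inside = line[3:].strip().startswith(title_prefix)
            continue
        if inside:
            body.append(line)
    return "\n".join(body).strip()


SAMPLE_MEMORY = """# Project: MyApp

## Conventions
- Named exports only

## Current State (updated 2026-02-28)
- Auth refactor: COMPLETE
- Dashboard redesign: IN PROGRESS

## Notes
- none
"""

print(extract_section(SAMPLE_MEMORY, "Current State"))
```

Matching on a title prefix means the lookup keeps working when the "(updated ...)" date in the heading changes.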

Cost Monitoring: The /cost Command

Context management is also cost management. Every token consumed costs money, and long sessions with bloated context are expensive. The /cost command gives you real-time visibility into your spending.

Develop a habit of checking /cost at regular intervals. Here is a practical monitoring schedule:

Every 30 minutes: Quick check. If you are under $2 for a 30-minute block, you are on track. If you are over $5, consider whether the agent is doing unnecessary file reads or generating overly verbose responses.
Before compaction: Check the cost to establish a baseline. After compacting, check again -- you should see lower per-prompt costs going forward.
End of session: Track daily spend to establish your personal baseline. Most developers using Claude Code actively spend $5-$15 per day. Spikes above that usually indicate context management issues.
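The thresholds in this schedule are easy to encode if you log your /cost readings by hand. The sketch below is one way to do that; the function names are hypothetical, and the "watch" label for the $2-$5 band is an assumption, since the schedule only names the two ends.

```python
def classify_block(dollars: float) -> str:
    """Label the spend for one 30-minute block using the thresholds above."""
    if dollars < 2:
        return "on track"
    if dollars > 5:
        return "investigate"  # look for unnecessary file reads or verbose output
    return "watch"  # assumption: the article does not name the middle band


def is_daily_spike(total_dollars: float) -> bool:
    """Flag a day above the typical $5-$15 range quoted above."""
    return total_dollars > 15


blocks = [1.5, 2.5, 6.0]  # hand-recorded spend per 30-minute block
for cost in blocks:
    print(f"${cost:.2f}: {classify_block(cost)}")
print("Daily spike:", is_daily_spike(sum(blocks)))
```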

Compact vs. Start Fresh: Decision Framework

Sometimes /compact is not enough. Sometimes you need a completely fresh session. Here is how to decide:

Use /compact When:

You are continuing the same project and general task area
The agent has made decisions you want it to remember
You are switching sub-tasks within the same feature
Context usage is between 60-85% of capacity

Start a Fresh Session When:

You are switching to a completely different project or feature
The agent's behavior has degraded significantly despite compaction
You want to change the fundamental approach (compacted context carries forward old assumptions)
Context has exceeded 90% and even compaction will not free enough space for the next task

When starting fresh, make sure your memory file is up to date first. Run /compact one final time, ask the agent to update the CLAUDE.md with current state, then close the session and open a new one. The new session loads the updated memory file and picks up seamlessly.
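The decision framework above can be sketched as a single function. Treat it as an illustration of the criteria rather than a tool: the names are hypothetical, and because the lists leave the 85-90% band unassigned, this sketch assumes compaction is still the right call there.

```python
CONTEXT_LIMIT = 200_000


def choose_action(tokens_used: int,
                  new_project_or_feature: bool = False,
                  degraded_despite_compaction: bool = False,
                  changing_approach: bool = False) -> str:
    """Return 'fresh', 'compact', or 'continue' per the decision framework."""
    # Any of these conditions calls for a fresh session regardless of usage.
    if new_project_or_feature or degraded_despite_compaction or changing_approach:
        return "fresh"
    percent_used = tokens_used * 100 // CONTEXT_LIMIT
    if percent_used > 90:
        return "fresh"
    if percent_used >= 60:
        return "compact"  # assumption: the 85-90% band is treated as compact
    return "continue"


print(choose_action(130_000))                          # compact
print(choose_action(190_000))                          # fresh
print(choose_action(50_000, changing_approach=True))   # fresh
```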

Shared-Context Patterns for Multi-Agent Workflows

When running multiple agents on the same project -- one for frontend, one for backend, one for tests -- context management becomes a coordination problem. Each agent has its own 200K token window, but they need to stay aware of each other's changes.

The solution is shared memory through the file system. All agents read the same CLAUDE.md file. When one agent makes an architectural decision, it writes to the memory file. The other agents pick up the change the next time they read the file: automatically at the start of a new session, or mid-session when you ask them to re-read it.

Shared project CLAUDE.md: Contains architecture, conventions, and current state. All agents read this on startup.
Agent-specific sections: Add sections like "## Frontend Agent Notes" and "## Backend Agent Notes" so each agent can write without overwriting the others.
Coordination signals: Use the memory file to leave messages between agents. "Backend API for /users endpoint is ready, schema is in src/types/user.ts" tells the frontend agent exactly what it needs.
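To make the "write without overwriting the others" idea concrete, here is a minimal sketch of a helper that replaces one agent's section in a shared memory file and leaves every other section untouched. The function name is hypothetical, and it works on a string rather than a file; it also does no locking, so two agents writing at the same moment could still clobber each other.

```python
def upsert_section(markdown: str, heading: str, body: str) -> str:
    """Replace the body under `heading` (a full '## ...' line), or append the section."""
    lines = markdown.splitlines()
    out = []
    i = 0
    replaced = False
    while i < len(lines):
        if lines[i].strip() == heading and not replaced:
            out.append(heading)
            out.extend(body.splitlines())
            out.append("")
            replaced = True
            i += 1
            # Skip the old body up to the next '## ' heading.
            while i < len(lines) and not lines[i].startswith("## "):
                i += 1
            continue
        out.append(lines[i])
        i += 1
    if not replaced:
        if out and out[-1] != "":
            out.append("")
        out.append(heading)
        out.extend(body.splitlines())
    return "\n".join(out).rstrip("\n") + "\n"


shared = (
    "# Project\n\n"
    "## Frontend Agent Notes\n- old note\n\n"
    "## Backend Agent Notes\n- API pending\n"
)
print(upsert_section(shared, "## Frontend Agent Notes", "- Header component done"))
```

Only the frontend section changes; the backend notes pass through as they were. Calling it with a heading that does not exist yet appends a new section at the end of the file.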

Beam's Install/Save Memory Workflow

Managing memory files manually works, but it adds friction. Every session start requires remembering to check the memory file. Every session end requires remembering to update it. Human memory is unreliable -- which is ironic when the goal is to manage AI memory.

Beam automates this with two toolbar buttons. Install Project Memory reads your memory file and installs it as the project's CLAUDE.md, ensuring every new session starts with full context. Save Project Memory captures the current session state and writes it back to the memory file.

The workflow becomes automatic:

1. Open Beam, click Install Memory -- session starts with full project context
2. Work with Claude Code for an hour or two, compacting as needed
3. When you finish, click Save Memory -- session state is preserved
4. Tomorrow, repeat from step 1 -- zero context loss between days

This eliminates the most common context management failure: forgetting to save state before closing a session. When saving is a single click rather than a manual file edit, it actually happens consistently.

Context window management is not glamorous. It is not the exciting part of agentic coding. But it is the part that determines whether your agent sessions are productive or frustrating, whether your costs are predictable or surprising, and whether your multi-day projects maintain coherence or restart from scratch every morning. Master /compact, maintain your memory files, and monitor your costs. Everything else in agentic coding gets easier when context is under control.

