pattern · typescript · Moderate
Context window limits require sliding window or summary compression for long conversations
context-window · conversation-history · summarization · memory · sliding-window · compression
Problem
Long chat conversations accumulate message history that eventually exceeds the model's context window. Simply dropping old messages causes the model to lose important context from earlier in the conversation.
Solution
Implement a hybrid memory strategy. Keep the last N messages verbatim for recency, and summarize older messages into a compact "conversation summary" that is prepended to the system prompt. When the summary itself grows too large, regenerate it. For very long sessions, also store important facts in a structured memory store.
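Below is a minimal TypeScript sketch of the hybrid strategy. It assumes a hypothetical `ChatMessage` shape and an injected `Summarizer` function, which in practice would wrap an LLM call. The names `HybridMemory`, `keepRecent`, `maxSummaryChars`, `remember`, and `compact` are illustrative and do not come from any library. Summary size is measured in characters rather than tokens to keep the example dependency-free. The sketch puts the summary and stored facts inside the system message and sends the last N turns verbatim after it.

```typescript
// Hypothetical message shape; adapt to your chat API's message type.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical summarizer: in practice this would call an LLM with the
// previous summary plus the messages being folded in, returning a new summary.
type Summarizer = (previousSummary: string, messages: ChatMessage[]) => Promise<string>;

class HybridMemory {
  private summary = "";
  private recent: ChatMessage[] = [];
  // Structured store for facts that must survive any amount of summarization.
  private facts = new Map<string, string>();
  private readonly keepRecent: number; // N messages kept verbatim
  private readonly maxSummaryChars: number; // summary size that triggers regeneration

  constructor(keepRecent: number, maxSummaryChars: number) {
    this.keepRecent = keepRecent;
    this.maxSummaryChars = maxSummaryChars;
  }

  add(message: ChatMessage): void {
    this.recent.push(message);
  }

  remember(key: string, value: string): void {
    this.facts.set(key, value);
  }

  // Folds every message older than the last N into the running summary.
  async compact(summarize: Summarizer): Promise<void> {
    const overflow = this.recent.length - this.keepRecent;
    if (overflow <= 0) return;
    const older = this.recent.slice(0, overflow);
    this.recent = this.recent.slice(overflow);
    this.summary = await summarize(this.summary, older);
    // Regenerate when the summary itself is too large: feed it back in as the
    // only input so the summarizer condenses it from scratch.
    if (this.summary.length > this.maxSummaryChars) {
      this.summary = await summarize("", [{ role: "assistant", content: this.summary }]);
    }
  }

  // One system message (base prompt + facts + summary), then recent turns verbatim.
  buildMessages(systemPrompt: string): ChatMessage[] {
    const parts = [systemPrompt];
    if (this.facts.size > 0) {
      const lines = Array.from(this.facts, ([key, value]) => `- ${key}: ${value}`);
      parts.push(`Known facts:\n${lines.join("\n")}`);
    }
    if (this.summary) {
      parts.push(`Summary of earlier conversation:\n${this.summary}`);
    }
    return [{ role: "system", content: parts.join("\n\n") }, ...this.recent];
  }
}
```

Injecting the summarizer keeps the memory logic testable without a model. Call `compact` on a schedule rather than after every message, because each call costs a model round trip.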
Why
Recent messages are most relevant for immediate context. A summary preserves key facts and decisions from earlier without burning tokens on full message history. This balances accuracy and token efficiency.
Gotchas
- Summarization itself costs tokens and latency, so summarize in batches every 10-20 messages rather than on every turn
- User-stated preferences and decisions from early in the conversation are critical to preserve, so extract them explicitly
- Write the summary in third person, past tense, so the model is not confused about its current role
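The three gotchas can be sketched as small helpers. These choices are all hypothetical:

- the helper names `shouldSummarize`, `buildSummaryPrompt`, and `extractPreferences`;
- the default batch size of 15, picked from the 10-20 range above;
- the `PREFERENCES AND DECISIONS` heading.

The prompt wording is an assumption rather than a tested template.

```typescript
// Hypothetical message shape; adapt to your chat API's message type.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Batch trigger: summarize only once enough un-summarized messages have piled up.
function shouldSummarize(unsummarizedCount: number, batchSize = 15): boolean {
  return unsummarizedCount >= batchSize;
}

// Builds the instruction for the summarizer model: third-person past-tense prose,
// plus an explicit list of user preferences and decisions under a fixed heading.
function buildSummaryPrompt(previousSummary: string, messages: ChatMessage[]): string {
  const transcript = messages.map((m) => `${m.role.toUpperCase()}: ${m.content}`).join("\n");
  return [
    "Update the conversation summary below with the new messages.",
    'Write in third person, past tense (e.g. "The user asked...", "The assistant suggested...").',
    'Then, under the heading PREFERENCES AND DECISIONS, list every preference or decision the user stated, one per line, starting with "- ".',
    "",
    `Previous summary:\n${previousSummary || "(none)"}`,
    "",
    `New messages:\n${transcript}`,
  ].join("\n");
}

// Pulls the "- " lines under the PREFERENCES AND DECISIONS heading out of the
// summarizer's reply so they can be stored separately from the prose summary.
function extractPreferences(summaryReply: string): string[] {
  const idx = summaryReply.indexOf("PREFERENCES AND DECISIONS");
  if (idx === -1) return [];
  return summaryReply
    .slice(idx)
    .split("\n")
    .slice(1) // drop the heading line itself
    .map((line) => line.trim())
    .filter((line) => line.startsWith("- "))
    .map((line) => line.slice(2));
}
```

Store the extracted preferences in the structured memory store, not only in the summary text. Otherwise they can be paraphrased away when the summary is later regenerated.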
Context
Long-running conversational AI applications