Long-horizon LLM agents finally get a lifecycle-aware memory primitive

1,278 stars

If you've ever tried building an LLM agent that needs to remember state across hours or days of interaction, you know the pain. Context windows fill up. Old memories get evicted. The agent forgets it already solved a problem, or worse, hallucinates a new one.

Most attempts at long-term memory for LLMs are either too brittle (just dump everything into a vector store) or too manual (hand-write summarization logic). The PaperGuru Benchmark project on GitHub takes a different approach: a lifecycle-aware memory primitive that treats memory as something that lives, grows, and eventually expires.

It's not a hype-driven framework. It's a practical, well-thought-out primitive for agents that need to keep state across long horizons.

What It Does

The PaperGuru Benchmark is a testing ground for evaluating how well LLM agents handle long-horizon tasks with a structured memory system. At its core, it provides a memory primitive that tracks the lifecycle of information:

  • Birth: New information arrives and gets stored with metadata (timestamp, relevance score, source)
  • Life: The memory can be read, updated, or merged with other memories as the agent learns more
  • Death: Old, irrelevant, or contradictory memories are flagged for eviction or compression

This isn't just a vector store with timestamps. The primitive understands that some memories are more important than others, and that the same piece of information can change meaning over time.
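To make the three stages concrete, here is a minimal sketch of what a lifecycle-tracked memory record could look like. The names `MemoryRecord`, `Stage`, `update`, `merge`, and `flag` are hypothetical and chosen for illustration; they are not taken from the project's code.

```python
from dataclasses import dataclass, field
from enum import Enum
import time


class Stage(Enum):
    BORN = "born"        # just stored, never revised
    ALIVE = "alive"      # updated or merged at least once
    FLAGGED = "flagged"  # marked for eviction or compression


@dataclass
class MemoryRecord:
    """Hypothetical record; field names are assumptions, not the project's API."""
    text: str
    source: str
    relevance: float
    created_at: float = field(default_factory=time.time)
    stage: Stage = Stage.BORN
    history: list[str] = field(default_factory=list)

    def update(self, new_text: str) -> None:
        """Life: revise the memory while keeping the earlier wording."""
        self.history.append(self.text)
        self.text = new_text
        self.stage = Stage.ALIVE

    def merge(self, other: "MemoryRecord") -> None:
        """Life: fold another memory into this one, keeping the higher relevance."""
        self.history.append(self.text)
        self.text = f"{self.text} {other.text}"
        self.relevance = max(self.relevance, other.relevance)
        self.stage = Stage.ALIVE

    def flag(self) -> None:
        """Death: mark the memory so a later pass can evict or compress it."""
        self.stage = Stage.FLAGGED
```

Keeping the previous wording in `history` is one way to capture the idea that the same piece of information can change meaning over time without losing what the agent believed earlier.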

Why It's Cool

The design is refreshingly developer-friendly. Instead of forcing you into a specific agent architecture, it gives you a memory interface you can plug into any LLM workflow. A few highlights:

  • Lifecycle hooks let you define custom policies for when memories should be promoted, merged, or deleted
  • Conflict resolution is built in: if two memories contradict each other (e.g., "project deadline is Friday" vs "deadline is Monday"), the system can flag the conflict rather than silently overwriting the older memory
  • Benchmarking is first-class: the repo includes a set of long-horizon tasks specifically designed to test memory fidelity over dozens of turns
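As a rough illustration of the hook and conflict-flagging ideas in the list above, the sketch below stores facts by key and calls a user-supplied callback when a new value contradicts the stored one. `KeyedMemory`, `on_conflict`, `write`, and `read` are hypothetical names, and keying facts by a string is a simplifying assumption; the project's actual interface may look quite different.

```python
from typing import Callable, Optional

# Hypothetical hook signature: (key, existing_value, incoming_value) -> None
ConflictHook = Callable[[str, str, str], None]


class KeyedMemory:
    """Toy store that flags contradictions instead of overwriting them."""

    def __init__(self, on_conflict: Optional[ConflictHook] = None) -> None:
        self._facts: dict[str, str] = {}
        self.conflicts: list[tuple[str, str, str]] = []
        self._on_conflict = on_conflict

    def write(self, key: str, value: str) -> bool:
        """Store a fact. Returns False if it contradicts an existing one."""
        existing = self._facts.get(key)
        if existing is not None and existing != value:
            # Record the contradiction and keep the existing value untouched.
            self.conflicts.append((key, existing, value))
            if self._on_conflict is not None:
                self._on_conflict(key, existing, value)
            return False
        self._facts[key] = value
        return True

    def read(self, key: str) -> Optional[str]:
        return self._facts.get(key)


seen: list[str] = []
store = KeyedMemory(on_conflict=lambda k, old, new: seen.append(f"{k}: {old} vs {new}"))
store.write("project_deadline", "Friday")
accepted = store.write("project_deadline", "Monday")
```

Here the second write is rejected and recorded, so the agent (or a custom policy in the hook) can decide which value wins rather than having the store decide silently.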

What makes this particularly clever is the decay model. Memories don't just sit forever. A memory's relevance drops as it ages, as it goes longer without being accessed, and when newer information supersedes it. This mirrors how human memory works: recent, frequently used information stays sharp, while old, unused information fades.
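The description above does not give a formula for the decay, so the following is only one plausible way to combine the three signals (age, access frequency, and supersession). The function name, the exponential half-life, the logarithmic access boost, and the penalty value are all assumptions made for illustration.

```python
import math


def decayed_relevance(
    base: float,
    age_s: float,
    idle_s: float,
    access_count: int,
    superseded: bool,
    half_life_s: float = 86_400.0,    # assumed: relevance halves per day
    superseded_penalty: float = 0.1,  # assumed: superseded memories keep 10%
) -> float:
    """Hypothetical relevance score; not the project's actual decay model."""
    age_factor = 0.5 ** (age_s / half_life_s)    # older memories fade
    idle_factor = 0.5 ** (idle_s / half_life_s)  # so do ones not read recently
    access_boost = 1.0 + math.log1p(access_count)  # frequent use slows the fade
    score = base * age_factor * idle_factor * access_boost
    if superseded:
        score *= superseded_penalty
    return score
```

A lifecycle policy could then flag any memory whose score falls below a chosen threshold for eviction or compression.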

How to Try It

Getting started is s

Last updated: Jun 7, 2026