An AI memory infrastructure that promises continuity, without magic

Published on 5/14/2026

Supermemory's GitHub repository has been active since April 2024. In October 2025, the company announced a seed round of approximately 3 million dollars, with participation from figures like Jeff Dean (Google) and Logan Kilpatrick (DeepMind). The founder is Dhravya Shah, nineteen years old at the time of the announcement, originally from Mumbai, with a background in AI infrastructure and experience at Cloudflare. A biographical profile lists him as a former AI engineer at Mem0, although this claim has not been independently confirmed.

The project has collected over 18,000 stars on GitHub. It's worth understanding what it does, because it tackles a concrete problem: large language models forget everything between conversations.

The problem Supermemory tries to solve

A classic LLM is stateless: every request starts from scratch. Even with large context windows, the model doesn't retain memory across different sessions. Users must repeat preferences, context, and ongoing projects.

Traditional RAG has tried to mitigate the problem by retrieving documents from a vector database. But classic RAG ranks results only by semantic similarity: it doesn't model temporal relationships, doesn't distinguish between stable facts and obsolete information, and doesn't resolve contradictions.
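
To make the limitation concrete, here is a minimal sketch of similarity-only retrieval. It uses a toy bag-of-words vector in place of real embeddings, and the function names (embed, cosine, retrieve) are illustrative assumptions, not part of any product. Two contradictory statements score identically, and nothing in the ranking says which one is current.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query, by cosine score only."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


docs = [
    "user lives in berlin",   # older statement
    "user lives in lisbon",   # newer, contradictory statement
    "user likes green tea",
]
# Both location statements come back with the same score; the ranking
# carries no notion of recency or of one fact replacing the other.
print(retrieve("where the user lives", docs))
```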

Supermemory positions itself as an intermediate layer: not a vector database, not pure RAG, but a system that extracts facts from conversations, organizes them, handles updates over time, and automatically removes expired information.
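
The idea of facts that get updated and expire can be sketched in a few lines. This is a toy illustration under my own assumptions (the Fact and MemoryStore names, the key format, and the lazy purge are all hypothetical); the real engine is proprietary, so this says nothing about how Supermemory implements it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Fact:
    key: str                               # hypothetical identifier, e.g. "user.city"
    value: str
    recorded_at: datetime
    expires_at: Optional[datetime] = None  # None means the fact never expires


class MemoryStore:
    """Toy store: the most recent fact per key wins, expired facts are dropped."""

    def __init__(self) -> None:
        self._facts: dict[str, Fact] = {}

    def remember(self, fact: Fact) -> None:
        current = self._facts.get(fact.key)
        # Overwrite only when the incoming fact is at least as recent.
        if current is None or fact.recorded_at >= current.recorded_at:
            self._facts[fact.key] = fact

    def recall(self, key: str, now: datetime) -> Optional[str]:
        fact = self._facts.get(key)
        if fact is None:
            return None
        if fact.expires_at is not None and fact.expires_at <= now:
            del self._facts[key]  # purge expired information on access
            return None
        return fact.value


t0 = datetime(2026, 1, 1)
store = MemoryStore()
store.remember(Fact("user.city", "Berlin", t0))
store.remember(Fact("user.city", "Lisbon", t0 + timedelta(days=30)))
print(store.recall("user.city", t0 + timedelta(days=31)))
```

A production system would also need to decide what counts as the "same" fact, which is the hard part this sketch sidesteps by using explicit keys.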

How it's built, according to public documentation

Supermemory describes its architecture as a "five-layer context stack." The structure below follows this official terminology, supplemented with technical details from public documentation.

The system integrates connectors for heterogeneous sources: Google Drive, Gmail, Notion, Slack, GitHub, S3. Synchronization happens via webhooks for real-time updates.

Content is processed with differentiated strategies: OCR for images, transcription for audio and video, AST-aware chunking for code via the code-chunk library to preserve the logical integrity of functions and classes, semantic cleaning for web pages.
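
To show what AST-aware chunking means in practice, here is a minimal sketch using Python's standard ast module as a stand-in. I am not reproducing the code-chunk library's API, which I have not verified; the chunk_source function is a hypothetical helper. The point is that chunk boundaries follow function and class definitions instead of arbitrary character counts.

```python
import ast


def chunk_source(source: str) -> list[str]:
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno is 1-based; end_lineno is inclusive (Python 3.8+).
            start = node.lineno - 1
            if node.decorator_list:
                # Keep decorators attached to the definition they modify.
                start = min(d.lineno for d in node.decorator_list) - 1
            chunks.append("\n".join(lines[start:node.end_lineno]))
    return chunks


sample = (
    "import os\n"
    "\n"
    "def add(a, b):\n"
    "    return a + b\n"
    "\n"
    "class Greeter:\n"
    "    def hello(self):\n"
    "        return 'hi'\n"
)
for chunk in chunk_source(sample):
    print(chunk)
    print("---")
```

A fixed-size splitter could cut the Greeter class in the middle of its method; splitting on AST nodes keeps each definition whole, at the cost of uneven chunk sizes.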

Extracted data is transformed into vector embeddings and enriched with explicit relationships in a knowledge graph. The system distinguishes three relationship types: updates, where new information replaces what came before; extends, where new information enriches an existing memory without replacing it; and derives, where an inference is drawn from patterns across multiple memories.
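
The three relationship types can be modeled as labeled edges between memories. The sketch below is my own illustration: only the names updates, extends, and derives come from the documentation, while the Memory and MemoryGraph classes and the rule that an "updates" edge marks its target as superseded are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Sequence


class Relation(Enum):
    UPDATES = "updates"   # new memory replaces the target
    EXTENDS = "extends"   # new memory adds detail to the target
    DERIVES = "derives"   # new memory is inferred from several targets


@dataclass
class Memory:
    id: str
    text: str
    superseded: bool = False


@dataclass
class MemoryGraph:
    memories: dict[str, Memory] = field(default_factory=dict)
    edges: list[tuple[str, Relation, str]] = field(default_factory=list)

    def add(
        self,
        memory: Memory,
        relation: Optional[Relation] = None,
        targets: Sequence[str] = (),
    ) -> None:
        if targets and relation is None:
            raise ValueError("a relation is required when targets are given")
        self.memories[memory.id] = memory
        for target_id in targets:
            self.edges.append((memory.id, relation, target_id))
            if relation is Relation.UPDATES:
                # Assumption: an update hides the old memory rather than deleting it.
                self.memories[target_id].superseded = True

    def current(self) -> list[str]:
        """Texts of all memories that have not been replaced."""
        return [m.text for m in self.memories.values() if not m.superseded]


graph = MemoryGraph()
graph.add(Memory("m1", "Works at Acme"))
graph.add(Memory("m2", "Works at Globex"), Relation.UPDATES, ["m1"])
graph.add(Memory("m3", "Role at Globex: staff engineer"), Relation.EXTENDS, ["m2"])
graph.add(Memory("m4", "Probably follows infrastructure topics"), Relation.DERIVES, ["m2", "m3"])
print(graph.current())
```

Keeping the superseded memory in the graph, rather than deleting it, preserves history and makes it possible to answer questions about the past.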

Memories are indexed in both a vector database and the graph, enabling hybrid searches. According to vendor-reported figures, retrieval for user profiles takes approximately 50 milliseconds, while generic retrieval stays under 300 milliseconds. These metrics have not been independently verified.

At user query time, the engine combines vector search, BM25 keywords, and contextual reranking. Support for the Model Context Protocol allows sharing the same memory across compatible clients like Claude Desktop, Cursor, and Windsurf. The "meta-MCP" implementation generates a unique URL per user that serves as a personal endpoint, simplifying integration without repeated OAuth configurations.
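
The documentation does not say how the vector and BM25 result lists are merged, so the following is only an illustration of one common approach, reciprocal rank fusion (RRF), and not a description of Supermemory's engine. The function name and the constant k=60 are assumptions, and the contextual reranking step is left out.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in;
    documents ranked well by several retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]   # hypothetical ids from vector search
bm25_hits = ["b", "d", "a"]     # hypothetical ids from keyword search
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
```

RRF works on ranks rather than raw scores, which avoids having to normalize cosine similarities against BM25 scores that live on a different scale.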

Documented use cases

Scira AI. This open-source project, an alternative to Perplexity for structured searches, migrated from Mem0 to Supermemory. According to a case study published on Supermemory's blog, the transition led to reduced latency, a 32% increase in usage, and the acquisition of ten premium customers attracted by the memory feature. The source is the vendor itself; there are no independent reports with the same data.

Montra and Cluely. Both are cited as customers in media coverage of the launch. Montra is an AI video editor, Cluely a desktop assistant backed by a16z. Detailed workflows or operational metrics are not public; their presence indicates adoption, not proof of specific effectiveness.

Benchmarks: what they claim, what's missing

Supermemory claims first place on LongMemEval, LoCoMo, and ConvoMem. However, these results are self-reported and have not been independently peer-reviewed. Recent competitors like Hindsight and Mastra OM report higher scores on LongMemEval in single-pass retrieval configurations.

The project has also created MemoryBench, an open-source framework for evaluating conversational memory systems. It's a useful tool for the community, but results published so far come primarily from the Supermemory team itself.

Limitations worth considering

The core memory engine is proprietary and cloud-hosted. Plugins and clients are open-source, but the retrieval and graph management system is not publicly inspectable. Self-hosting requires an Enterprise agreement not available to the public.

The product is relatively recent compared to alternatives like Zep or Letta. The consumer app is in early access, with known bugs reported by users.

Benchmarks are vendor-declared. For teams with independent validation requirements or in regulated sectors, this lack of transparency can be an obstacle.

In summary

Supermemory tackles a real problem with an architecture that combines a semantic graph, hybrid retrieval, and automatic user profiles. The most solid use cases, like Scira AI, suggest it works for some scenarios. The limitations (closed-source core, self-declared benchmarks, product youth) are significant for those considering adoption in critical contexts.

It's not a magic solution. It's a framework with an interesting technical idea, built by a young team with serious backing. It's worth monitoring if context continuity is a priority for your use case.
