Saad Sultan
March 2026·6 min

Stale AI Memory Across Sessions

The bot sometimes seemed to remember a different conversation. Or a different user’s.

AI · LangChain · State Management · Node.js

The Problem

We had a chat feature that used an LLM with a conversation buffer so the model could see recent messages. Users reported that sometimes the bot seemed to "remember" things from a different conversation or from another user. In one case, a user started a new chat but the first reply from the bot referred to a topic from a previous session. That was a serious privacy and correctness bug.

Investigation

The chat API was built on a library that kept conversation history in an in-memory buffer. The buffer was keyed by a value we assumed was unique per conversation, but in practice it was not: the same key was reused across "new chat" clicks within one browser session, and the key included neither the user ID nor the tenant ID. So when user A started a new chat, the buffer still contained messages from their previous chat, or in the worst case from another user, if the key was only a session or request ID that had been reused or mis-scoped.

I traced where the buffer was created and who passed the key. The key was generated on the client and sent with each request. For "new chat", either the client was not sending a new key, or the backend was not creating a new buffer when it saw one. Either way, the same in-memory buffer was reused across what the user thought were separate conversations.
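The buggy pattern looked roughly like this (a hypothetical sketch, not our actual code; `getBuffer` and the session ID are illustrative):

```javascript
// Anti-pattern: in-memory history keyed only by a client-supplied session ID.
const buffers = new Map();

function getBuffer(sessionId) {
  // Bug: "new chat" in the same browser session reuses the same sessionId,
  // so the old history comes straight back.
  if (!buffers.has(sessionId)) {
    buffers.set(sessionId, []);
  }
  return buffers.get(sessionId);
}

// User sends a message in their first chat…
getBuffer("session-123").push({ role: "user", content: "Tell me about topic A" });

// …then clicks "new chat", but the client sends the same sessionId:
const history = getBuffer("session-123");
// history still contains the message about topic A.
```

Because the key never changes, the model's prompt for the "new" conversation is built from the old conversation's messages.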

Root Cause

Conversation memory lived in a shared, long-lived buffer that was not scoped to (user, conversation) or (tenant, user, conversation). When a new conversation started, the backend either kept using the same buffer, or created a new one without clearing the old one, so the wrong history was attached to the wrong conversation. The root cause was treating the buffer as a singleton, or keying it by a value that was not unique per conversation and per user.

The Fix

We made the memory key explicitly include the tenant ID, user ID, and conversation ID. When the user clicked "new chat", the frontend generated and sent a fresh conversation ID, and the backend created a new, empty buffer for that key instead of reusing any previous one.
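A minimal sketch of the fix, assuming a simple Map-backed store (the composite-key format and IDs are illustrative):

```javascript
// Fix: scope the buffer to (tenant, user, conversation), not just a session.
const buffers = new Map();

function memoryKey(tenantId, userId, conversationId) {
  return `${tenantId}:${userId}:${conversationId}`;
}

function getBuffer(tenantId, userId, conversationId) {
  const key = memoryKey(tenantId, userId, conversationId);
  if (!buffers.has(key)) {
    buffers.set(key, []); // fresh, empty history for a new conversation
  }
  return buffers.get(key);
}

// Messages from an old conversation:
getBuffer("acme", "user-1", "conv-1").push({ role: "user", content: "topic A" });

// "New chat" → the client generates a fresh conversation ID:
const fresh = getBuffer("acme", "user-1", "conv-2");
// fresh is empty; nothing from conv-1 leaks in.
```

Because the conversation ID is part of the key, a new chat can only ever resolve to a new, empty buffer.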

We also added a safeguard: when the API received a message with a conversation ID it had not seen before, it initialised a new, empty buffer for that key rather than attaching any existing one. That way, even if the client sent a stale or wrong ID by mistake, one conversation could not leak into another. We also considered a TTL or eviction policy for old buffers so that in-memory state did not grow forever, but the immediate fix was correct scoping: one new buffer per new conversation.
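The TTL idea we considered could be sketched like this, assuming each buffer tracks a last-used timestamp (the 30-minute value and names are illustrative, not what we shipped):

```javascript
// Sketch: evict conversation buffers that have been idle past a TTL.
const buffers = new Map(); // key -> { history, lastUsed }
const TTL_MS = 30 * 60 * 1000; // e.g. 30 minutes of inactivity

function getBuffer(key, now = Date.now()) {
  let entry = buffers.get(key);
  if (!entry) {
    entry = { history: [], lastUsed: now };
    buffers.set(key, entry);
  }
  entry.lastUsed = now; // touching a buffer keeps it alive
  return entry.history;
}

function evictStale(now = Date.now()) {
  for (const [key, entry] of buffers) {
    if (now - entry.lastUsed > TTL_MS) {
      buffers.delete(key);
    }
  }
}
```

`evictStale` could run on a timer; in a real multi-process deployment you would more likely move the history into an external store (e.g. Redis with its own TTL) instead of relying on per-process maps.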

Lessons Learned

  • In-memory conversation buffers must be keyed by (tenant, user, conversation) or equivalent so that different users and different chats never share the same history.
  • "New chat" must result in a new key on the backend; the client has to send a new conversation ID and the backend has to create a new buffer, not reuse the previous one.
  • When integrating a library that holds state (e.g. LangChain-style memory), always check how that state is keyed and who can share it. Defaults are often process-wide or session-wide and are unsafe for multi-tenant or multi-conversation use.

Have thoughts on this story or questions? Get in touch.