I’ve been trying to get my head around RAG systems and AI, and I wanted to build something that would help me cement my knowledge. I enjoy learning this way: I pick up a basic understanding by reading or watching tutorials, then figure the rest out as I go by building something. I’d been thinking about a fun use case for AI and RAG systems, so I settled on a text-based, AI-driven RPG (role-playing game).

The concept is fairly simple: a D&D-style role-playing game engine, with the LLM taking control of NPCs in battle scenarios and bringing non-combat NPCs to life in the game.

The moving parts were easy to get working

I configured Docker with Ollama, set up a basic interface using Python and Laravel, and got it working within a few hours. Ask a question, get an answer. Perfect. But the moment I tried to have an actual conversation, I realised the default setup of an LLM is fairly basic.
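For reference, that “ask a question, get an answer” loop is little more than a single HTTP call to Ollama. Here is a minimal sketch in Python; the port is Ollama’s default, the ask helper is just for illustration, and the model name is the small Qwen2 model I mention below. Your own interface code will obviously look different.

import requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default port

def ask(question: str) -> str:
    # Send one stateless question to the model and return its reply.
    payload = {
        "model": "qwen2:1.5b",  # the small model I was experimenting with
        "messages": [{"role": "user", "content": question}],
        "stream": False,
    }
    response = requests.post(OLLAMA_URL, json=payload, timeout=120)
    response.raise_for_status()
    return response.json()["message"]["content"]

print(ask("What's for sale today?"))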

What I found is that the AI couldn’t remember the previous prompt at all.

A simple out-of-the-box LLM has no memory; it is only concerned with what you are asking it at that very moment.

I thought I could overcome this by recording all the conversations and providing them as context in the next request. But the smaller models (I was using Qwen2:1.5b) couldn’t filter through previous conversations properly, with all the filler text around the questions and answers. It wasn’t long before it started hallucinating, forgetting the original prompt (“you are an AI LLM roleplaying an NPC in a fantasy world”) and going off script, which was both hilarious and disturbing.
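That naive approach amounted to keeping every turn and resending the whole lot with each request. Building on the sketch above, it looked roughly like this:

SYSTEM_PROMPT = "You are an AI LLM roleplaying an NPC in a fantasy world."
history = [{"role": "system", "content": SYSTEM_PROMPT}]

def chat(question: str) -> str:
    # Append the user's turn, resend the entire history, and record the reply.
    history.append({"role": "user", "content": question})
    payload = {"model": "qwen2:1.5b", "messages": history, "stream": False}
    reply = requests.post(OLLAMA_URL, json=payload, timeout=120).json()["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply

The history list grows without bound, and a small model quickly loses the thread in all that filler.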

One or two questions worked fine, then suddenly the AI started having full two-way conversations with itself, all within a single response. It thought it needed to simulate the entire conversation rather than roleplay and wait for my response. I’d ask “What’s for sale today?” and get back something like:

Shopkeeper: "We have fine weapons and armour today, traveller."
You: "What kind of weapons?"
Shopkeeper: "Swords, axes, bows. What interests you?"
You: "I'll take a sword."
Shopkeeper: "That'll be 50 gold pieces."

All without me saying a word after my initial question.

The key facts approach (better, but not much)

I did some research, and a recommended approach was to cherry-pick key facts and store them to provide as context later. So I evolved the system to extract key facts from conversations using the AI agent in parallel requests, then inject only the relevant highlights when needed. This worked better but revealed new problems: the history kept growing, I was still pulling irrelevant facts into the context, and important, nuanced information got lost along the way.
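The extraction step itself was just another model call running alongside the main one, with a summarisation-style prompt. A rough sketch follows; the prompt wording, the extract_facts helper, and the plain list used for storage are simplified stand-ins for what I actually used.

EXTRACT_PROMPT = (
    "Extract the key facts from this exchange as short bullet points. "
    "Only include facts worth remembering in future conversations.\n\n"
    "Player: {question}\nNPC: {answer}"
)

def extract_facts(question: str, answer: str) -> list[str]:
    # Ask the model, in a separate request, to distil an exchange into key facts.
    raw = ask(EXTRACT_PROMPT.format(question=question, answer=answer))
    return [line.lstrip("-* ").strip() for line in raw.splitlines() if line.strip()]

key_facts: list[str] = []  # grew with every exchange, which became the next problem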

I had a system that remembered my conversations but completely forgot the context of why I was asking certain questions.

It was surprisingly easy once it all clicked together

After some more research, I found what I needed: proper vectorisation of the extracted facts, then semantic search to filter information before constructing the context. Use the RAG system to find relevant information rather than dumping everything in.

Once I implemented vector search using Redis, the system could find conversations with similar meaning — not just matching keywords. When someone asked “How do I defeat dragons?” it would retrieve discussions about dragon weaknesses, combat strategies, and magical defences, even if those conversations used completely different words.
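For anyone wanting to try this, the shape of it looks roughly like the sketch below. This isn’t my exact code: it assumes Redis Stack (which bundles the RediSearch vector index) on the default port and an embedding model pulled into Ollama (nomic-embed-text here, purely as an example), and the index and field names are placeholders.

import numpy as np
import redis
import requests
from redis.commands.search.field import TextField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

r = redis.Redis(host="localhost", port=6379)
DIM = 768  # embedding size of nomic-embed-text; match this to your model

def embed(text: str) -> bytes:
    # Get an embedding from Ollama and pack it as float32 bytes for Redis.
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
        timeout=60,
    )
    return np.array(resp.json()["embedding"], dtype=np.float32).tobytes()

# One-off index creation over hashes stored under the prefix "fact:"
# (this errors if the index already exists, so run it once or wrap it in try/except).
r.ft("facts").create_index(
    [TextField("text"),
     VectorField("embedding", "FLAT",
                 {"TYPE": "FLOAT32", "DIM": DIM, "DISTANCE_METRIC": "COSINE"})],
    definition=IndexDefinition(prefix=["fact:"], index_type=IndexType.HASH),
)

def store_fact(i: int, fact: str) -> None:
    r.hset(f"fact:{i}", mapping={"text": fact, "embedding": embed(fact)})

def relevant_facts(question: str, k: int = 3) -> list[str]:
    # Semantic search: return the k stored facts closest in meaning to the question.
    q = (Query(f"*=>[KNN {k} @embedding $vec AS score]")
         .sort_by("score")
         .return_fields("text", "score")
         .dialect(2))
    res = r.ft("facts").search(q, query_params={"vec": embed(question)})
    return [doc.text for doc in res.docs]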

That breakthrough was the missing piece: finally understanding how the RAG system and the LLM work together

This realisation changed my understanding

Working through this process made me understand that the AI only has the context window: all of the history and facts have to be packed into it alongside your question. I hadn’t fully comprehended this fundamental limitation before.

AI doesn’t actually have memory. Every single AI request is completely isolated. The model can’t “look up” previous conversations, can’t “remember” user preferences, and can’t “access” established context.

It’s like hiring a consultant with complete amnesia who needs a detailed briefing before every meeting. No matter how many times you’ve met, you must start from scratch each time with their role, your company context, this customer’s history, and the current situation.

Miss any of these briefing points, and the consultant gives responses that feel disconnected or completely wrong.

This is exactly how AI works. Every request requires manually constructing and injecting complete context.
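In code terms, the “briefing” is rebuilt from scratch for every single request. Putting the earlier sketches together (again simplified, with names and details that are mine, not a fixed recipe), it looks roughly like this:

def answer(question: str, recent_turns: list[dict]) -> str:
    # Rebuild the full briefing for this one request:
    # role, relevant facts from the vector store, a few recent turns, then the question.
    facts = relevant_facts(question)
    briefing = SYSTEM_PROMPT + "\nKnown facts:\n- " + "\n- ".join(facts)
    messages = [
        {"role": "system", "content": briefing},
        *recent_turns[-6:],  # only the last few raw turns, not the whole history
        {"role": "user", "content": question},
    ]
    payload = {"model": "qwen2:1.5b", "messages": messages, "stream": False}
    return requests.post(OLLAMA_URL, json=payload, timeout=120).json()["message"]["content"]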

Once this clicked, everything else made sense. The AI talking to itself was trying to fill in missing context by simulating the conversation it thought should have happened. The memory problems weren’t bugs — they were the fundamental nature of how these systems work.

Why this matters beyond my RPG

People building AI features might not understand this limitation. They assume AI has some form of persistent memory or background knowledge about their specific context. This misunderstanding is why so many AI implementations feel inconsistent or break down in extended use.

Understanding context construction isn’t just technical — it’s the difference between AI that works and AI that frustrates users.

I would encourage people to try it for themselves

Building for fun means I’m not precious about architecture and design, just focused on getting something working. Most of the time it’s about learning, so I can stumble through it at my own pace. This lets me try and fail while the stakes are low, rather than making these mistakes at work.

There’s something liberating about experimenting when nobody’s watching and nothing critical depends on it working perfectly.

You also really get to lift the lid on these systems, rather than just using an off-the-shelf solution.

I hope this helps your journey

I have a long way to go before this is a fully operational RPG, but I hope this post helps someone else wanting to explore AI and RAG, or someone struggling to put the pieces together like I did.

The most important insight isn’t about vector databases or RAG architectures. It’s understanding that AI has no inherent memory, and everything depends on how you construct the context for each individual request.

Once you grasp that fundamental limitation, you can build AI experiences that actually work consistently. Before that realisation, you’re just fighting against the basic nature of how these systems operate.

I’ll continue to build in the open and write about my journey, so stay tuned for more progress updates!

By Ben
