The Idea & The Stack
It started as a simple idea: As an AI developer, why send a static PDF resume when I could send an interactive AI version of myself? The traditional job application process is inherently dry. You send a piece of paper, wait for a human to parse the bullet points, and hope they understand the nuance of your engineering decisions. I wanted to build DanGPT to see how far I could push a Retrieval-Augmented Generation (RAG) system with my own professional history.
The Foundation:
• FastAPI backend for the core service
• Groq inference engine running llama-3.1-8b-instant
• fastembed for blazing-fast local dense vectors
From day one, the architecture was built for speed. If a recruiter asks a question, the response needs to feel instant. I chose Groq to run Llama 3.1 because the sub-second latency delivers the wow factor, and the API costs are incredibly cheap, which is essential for a side project. I wired this up to a FastAPI backend to handle the requests.
At first, development was smooth. I asked it about my skills, my Master's degree, and my project architectures. It nailed every single question. Thinking I had built a masterpiece, hubris set in. Then I asked my friends to test it out and it didn't work as planned. I proudly handed my phone to my friends to ask it questions.
Not as smooth as I thought
I had tested the system using the exact phrasing and cadence that I use. My friends, however, didn't care about my rigid testing parameters. They asked weird, highly contextual, conversational things. They challenged the bot. They asked multi-part questions spanning different jobs.
DanGPT fell apart. It started hallucinating skills I didn't even have. It lost track of the timeline. It worked perfectly for me, but it failed the reality test. It was a stark reminder of a core engineering truth: as you scale a system up to face real users, the complexity of edge cases scales exponentially. A working script is not a robust system.
Fixing the Core: STAR Chunking & Custom BM25
I went back to the drawing board to figure out why it hallucinated. The culprit was my chunking strategy. I had been using a standard, blind sliding-window approach. This meant that halfway through explaining a complex machine learning project, the chunk would just cut off. The model was receiving the "Action" in one chunk, and the "Result" in another. No wonder it couldn't piece the story together for my friends.
Through sheer trial and error, I threw out the sliding window. I rewrote the chunking logic to split my data explicitly by logical boundaries, targeting the STAR (Situation, Task, Action, Result) format. By ensuring every chunk was a self-contained narrative, the LLM finally had the proper context.
But retrieving those chunks required a better search engine. I wanted a hybrid search (combining dense vector similarity with sparse keyword matching). The standard industry reflex is to haul in heavy infrastructure like Pinecone or Elasticsearch. I rejected that to keep the system lightweight and zero-dependency. Although to improve it, I might add Pinecone, Chroma, or pgvector.
Instead, I implemented a custom Okapi BM25 class entirely in Python. This gave me absolute, surgical control over the tokenisation logic. At this point, I was essentially vibecoding the solution, bouncing between reading research papers and deep-diving Reddit to piece the architecture together. I paired this custom BM25 scorer with the dense vectors generated locally by fastembed. The result is a hyper-efficient hybrid search that runs entirely in a single FastAPI process, with no external vector databases required.
Defending the Clone
My friends proved another point: people will absolutely try to break an AI model if you put it in front of them. Realising that random internet users could bankrupt my API token limits or trick my AI clone into generating embarrassing responses, I had to build a defence mechanism.
I wrote preemptive injection detection logic right into the FastAPI layer. Before the LLM ever sees a prompt, the system scans the input for Base64 encodings and typoglycemia (spaced-out or scrambled trigger words). If you try any funny business with DanGPT, it (hopefully) catches it and shuts down the request.
What's Next?
DanGPT is live, and it's doing what I built it to do. It's a quirky and interactive showcase of my engineering skills. But it's not feature-complete.
My next goal is to enhance it by adding robust context-aware systems. Right now, it answers questions excellently in isolation, but I want to give it deeper agentic memory so it can track the flow of a multi-turn interview conversation over time. I'm working on that architecture right now.
To avoid cold start issues over time, I'm running a cron job that keeps the instance warm at peak hours while shutting the instance down during off-peak hours.
Conclusion
Building DanGPT was an exercise in turning an abstract idea into a robust, production-ready system. This was incredibly fun, 10/10 will do it again. Reach out if you want to build something cool.