It just changed the workflow:
1️⃣ Ask AI to build everything
2️⃣ Hit usage limits
3️⃣ Wait for the reset
4️⃣ Ask AI to fix everything it built
Modern software engineering.
Like many of you, I use a bike-share system pretty regularly. And honestly, it drives me nuts: either there’s not a single bike when I need one, or the station is packed and I can’t return mine. It feels inefficient, but is it really, or is it just bad luck?
The standard RAG stack is: embed your documents, store the embeddings in a vector database, retrieve the top-k most similar chunks at query time, and feed them to an LLM. This works surprisingly well for a lot of use cases.
It also fails completely for an entire class of questions that matter in specialized domains.
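The retrieval step in that stack can be sketched in a few lines. This is a toy illustration, not a production setup: the `embed` function here is a stand-in character-frequency vector, where a real system would call an embedding model and a vector database instead.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder embedding: a normalized character-frequency vector over a-z.
    # A real pipeline would call an embedding model here.
    vec = np.zeros(26)
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve_top_k(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Score each chunk by cosine similarity to the query
    # and return the k most similar chunks.
    q = embed(query)
    scores = [float(q @ embed(d)) for d in docs]
    ranked = sorted(zip(scores, docs), reverse=True)
    return [d for _, d in ranked[:k]]

docs = ["refund policy for orders", "shipping times by region", "api rate limits"]
context = retrieve_top_k("how do refunds work", docs, k=1)
# The retrieved chunks are then placed in the LLM prompt as context.
```

Similarity-based retrieval like this is exactly why the stack fails for certain questions: anything that requires aggregating across many chunks, or reasoning over relations between documents, cannot be answered by fetching the nearest neighbors of a single query vector.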
Chatbots are easy to build. Conversational AI that actually works is hard.
The difference? State management. A real conversation requires remembering what was said, managing context limits, and maintaining coherence across multiple exchanges.
In this article, we'll explore the patterns that make multi-turn conversations work in production C# applications.
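The core of that state management is a conversation buffer that trims itself to fit the model's context limit. Here is a minimal, language-agnostic sketch in Python (the article itself targets C#); the whitespace token count and the `max_tokens` budget are simplifying assumptions, and a real application would use the model's actual tokenizer.

```python
class ConversationState:
    """Keeps a rolling message history within a token budget."""

    def __init__(self, max_tokens: int = 100):
        self.max_tokens = max_tokens
        self.messages: list[tuple[str, str]] = []  # (role, text)

    @staticmethod
    def _count(text: str) -> int:
        # Crude whitespace token count; swap in the model's tokenizer in practice.
        return len(text.split())

    def add(self, role: str, text: str) -> None:
        self.messages.append((role, text))
        self._trim()

    def _trim(self) -> None:
        # Drop the oldest messages until the history fits the context budget,
        # always keeping at least the most recent message.
        while (sum(self._count(t) for _, t in self.messages) > self.max_tokens
               and len(self.messages) > 1):
            self.messages.pop(0)

state = ConversationState(max_tokens=8)
state.add("user", "hi there")
state.add("assistant", "hello how can i help you today")
# Once the budget is exceeded, the oldest message is evicted.
```

Production variants replace the naive eviction with summarization of dropped turns, so older context is compressed rather than lost outright.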
LLMs are impressive text generators, but production applications need more than prose. You need structured data—JSON objects you can deserialize, decisions you can act on, and outputs that fit your domain models.
This is where function calling and structured outputs transform LLMs from chatbots into programmable decision engines.
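The "programmable decision engine" idea boils down to validating the model's JSON against a domain type before acting on it. A minimal sketch in Python follows; the `RefundDecision` model and the raw output string are hypothetical examples, standing in for whatever schema your application enforces.

```python
import json
from dataclasses import dataclass

@dataclass
class RefundDecision:
    approve: bool
    amount: float
    reason: str

def parse_decision(llm_output: str) -> RefundDecision:
    # Parse and coerce the model's JSON into the domain type;
    # a KeyError or ValueError here means the output failed validation.
    data = json.loads(llm_output)
    return RefundDecision(
        approve=bool(data["approve"]),
        amount=float(data["amount"]),
        reason=str(data["reason"]),
    )

# Hypothetical raw model output, e.g. from a structured-output API call.
raw = '{"approve": true, "amount": 42.5, "reason": "item arrived damaged"}'
decision = parse_decision(raw)
```

The point of the boundary is that downstream code only ever sees a typed `RefundDecision`, never free-form text, so a malformed generation fails loudly at parse time instead of silently corrupting business logic.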
Every week I'd open four different dashboards: