Understanding LLM-based AI Agents
What are LLM-based AI Agents?
An LLM-based AI agent is an intelligent system that uses a large language model (such as GPT-4) to perform tasks autonomously: it perceives its environment, reasons about it, makes decisions, and executes actions. Unlike traditional chatbots, these agents can plan tasks, take actions using external tools, and handle multi-step problems. The goal is to let people carry out professional or complex tasks in a highly automated way using natural language, with the LLM doing the heavy lifting, so that as much human effort as possible is freed up.
Core Components of an AI Agent
LLM-based agents usually rely on a modular architecture that gives the LLM the support it needs to operate autonomously. The typical agent is composed of a few key components.
```mermaid
graph LR
    U(User Request) -->|input| A[Agent Core]
    A -->|analyse| B[Planning]
    B -->|action plan| A
    A -->|invoke| C[Tools]
    A <-->|context| D[Memory]
    A -->|response| U
```
- Agent Core: At the heart of the agent is the LLM itself (for example, GPT-4, Claude, or an open-source model like LLaMA 2). This core interprets the user’s input and generates the agent’s responses. On its own, the LLM is very good at understanding language and reasoning, but it is essentially reactive: it responds to prompts without taking additional actions. In an agent setup, the LLM core is therefore augmented with surrounding logic that lets it act more proactively; the other components give it a way to interface with the outside world and handle complex tasks.
- Tools (Action Interfaces): Tools are how an LLM agent interacts with the world beyond its internal knowledge. A tool could be any external function or API the agent can call – for example, a web search engine, a database query, a code execution sandbox, a calculator, or a source code search function. Tools give the agent the ability to take actions like looking up information, running computations, or retrieving specific data. In a software development context, useful tools might include:
  - A code search tool to find snippets in a repository
  - A documentation lookup API to fetch reference docs
  - A ticket database query to pull related support tickets
  - A shell command executor to run test cases (a rough sketch of how such tools might be exposed to the agent appears after this list)
- Memory: Memory gives the agent the ability to remember information across multiple steps or interactions. There are typically two kinds of memory in such agents:
  - Short-term memory – keeps track of the agent’s recent “thoughts” and actions during the current task (often called the context or the transcript of the reasoning so far). This is the agent’s scratchpad for one session – it might include what the agent has already tried and what it observed.
  - Long-term memory – persists information beyond a single session, allowing the agent to recall facts or past conversations from earlier sessions. This could be implemented via a database or vector store that the agent queries as needed (both kinds are sketched in code after this list).
- Planning Module: This component is what makes an agent “agentic” – it allows the system to plan a multi-step strategy rather than just respond immediately. The planning module could be as simple as prompt engineering that encourages the LLM to think step by step, or as elaborate as a separate algorithm or model that breaks tasks into sub-tasks. The idea is to enable task decomposition, decision-making, and iteration. In practice, the planning module does things like:
  - Analyze the user’s request and break it into smaller problems if needed.
  - Decide which tool to use at each step.
  - Loop through actions and observations: the agent might use a tool, see the result, and then decide the next step based on that.
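The sketch below shows one way the tools described above might be exposed to an agent in code. It is a rough illustration, not any particular framework's API: the `Tool` dataclass, the registry, and the stub functions `search_code` and `run_shell` are hypothetical stand-ins for real integrations.

```python
# A minimal sketch of a tool registry (illustrative only; not a specific framework's API).
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Tool:
    name: str
    description: str           # shown to the LLM so it can decide when to call the tool
    run: Callable[[str], str]  # takes the tool input, returns an observation string


def search_code(query: str) -> str:
    """Hypothetical code-search tool; a real one might call grep or a repository index."""
    return f"(stub) matches for '{query}'"


def run_shell(command: str) -> str:
    """Hypothetical shell executor; a real one would sandbox and restrict commands."""
    return f"(stub) output of '{command}'"


TOOLS: Dict[str, Tool] = {
    t.name: t
    for t in [
        Tool("search_code", "Find snippets in the repository by keyword.", search_code),
        Tool("run_shell", "Run a shell command such as a test suite.", run_shell),
    ]
}
```

The `description` field matters because it is what the LLM reads when deciding which tool to invoke for a given step.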
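Similarly, here is a rough sketch of the two memory kinds, using nothing beyond the standard library: short-term memory is just the running transcript for the current task, while the long-term store uses naive keyword overlap where a production agent would use embeddings and a vector store.

```python
# Illustrative memory classes (assumptions: plain lists and keyword overlap stand in
# for a real context window and a real vector store).
from typing import List


class ShortTermMemory:
    def __init__(self) -> None:
        self.transcript: List[str] = []   # the agent's scratchpad for one session

    def add(self, entry: str) -> None:
        self.transcript.append(entry)

    def as_context(self) -> str:
        return "\n".join(self.transcript)


class LongTermMemory:
    def __init__(self) -> None:
        self.records: List[str] = []      # facts or past conversations to recall later

    def store(self, text: str) -> None:
        self.records.append(text)

    def recall(self, query: str, top_k: int = 3) -> List[str]:
        # Placeholder relevance score: count shared words. A real agent would embed
        # the query and the records and rank by vector similarity instead.
        def score(record: str) -> int:
            return len(set(query.lower().split()) & set(record.lower().split()))
        return sorted(self.records, key=score, reverse=True)[:top_k]
```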
These components work together in a cycle. A typical agent loop looks like this: the LLM core (guided by the planning prompt) decides on an action and calls a tool; the tool returns some data, which is stored in memory or fed back to the LLM; the LLM then decides the next step or gives the final answer. This continues until the agent is confident it has solved the user’s request. The sketch below puts the pieces together in one such loop.
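The following is a minimal, ReAct-style sketch of that cycle, with the "simple prompt engineering" form of planning mentioned above. It reuses the `TOOLS` registry and `ShortTermMemory` from the earlier sketches; `call_llm` is a placeholder for whatever model API you actually use, and the `Action:` / `Final Answer:` text format is just one common convention for letting the model choose tools, not a fixed standard.

```python
# A sketch of the plan -> act -> observe loop (assumes TOOLS and ShortTermMemory from above).
import re

PLANNING_PROMPT = """You are an agent that solves tasks step by step.
You may use these tools:
{tool_descriptions}
Respond with either:
Action: <tool name>: <tool input>
or
Final Answer: <answer>"""


def call_llm(prompt: str) -> str:
    # Placeholder: plug in an actual LLM API call that returns the model's text reply.
    raise NotImplementedError("Connect this to a real LLM API.")


def run_agent(user_request: str, max_steps: int = 5) -> str:
    memory = ShortTermMemory()
    tool_descriptions = "\n".join(f"- {t.name}: {t.description}" for t in TOOLS.values())
    system = PLANNING_PROMPT.format(tool_descriptions=tool_descriptions)

    memory.add(f"User request: {user_request}")
    for _ in range(max_steps):
        # 1. The LLM core, guided by the planning prompt and the transcript, picks the next step.
        reply = call_llm(system + "\n\n" + memory.as_context())
        memory.add(reply)

        # 2. If the model proposed an action, invoke the matching tool and record the observation.
        match = re.search(r"Action:\s*(\w+):\s*(.+)", reply)
        if match and match.group(1) in TOOLS:
            observation = TOOLS[match.group(1)].run(match.group(2).strip())
            memory.add(f"Observation: {observation}")
            continue

        # 3. Otherwise, treat the reply as the final answer and stop looping.
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()

    return "Stopped after reaching the step limit without a final answer."
```

The `max_steps` cap is one simple way to keep the loop from running indefinitely when the model never reaches a final answer.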