Agent primer

Published: 2026-04-16
Last modified: 2026-04-16

This article provides a quick primer for developers who are late to the whole AI agent thing.

Disclaimer: since this is a primer, we'll gloss over many technical details.

Large Language Models (LLMs)

For most devs it is sufficient to think of an LLM as a black box that takes some input, runs a computation over a ridiculous number of values, and produces some output that feels like you're talking with a very smart, albeit sometimes silly, person.

Chat models vs agents

The basic way of interacting with models is by providing input and getting output.

Let's use coding as an example. Say you want to add a feature to a function. You would send the entire text of the function along with instructions to the model, the model would reply with a new version of the function containing the feature, and you would then copy the function and paste it back into your file.

This was the state of AI tooling circa 2022. There were some early efforts to streamline the copying and pasting, but the true breakthrough was agents.

What are agents? I'll go deeper into the terminology below, but basically, agents are LLMs that have access to tools to do things, like reading and writing files or running shell commands. It turns out that letting the model act directly, rather than hand-feeding it, makes it a lot more powerful (and more dangerous!).

The LLM API

First, let's set a baseline by looking at the Gemini API (Gemini is simply used as an example).

The important bits are that you pass in some text and a collection of tools, and you get back either some text or a tool call. Tools are specified like functions, with a name, documentation string, and arguments.

This is the foundational API for interacting with "agentic" LLMs. If an LLM doesn't support tool use ("not agentic") then it won't have the tool bits of this API.
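To make that concrete, here is an illustrative sketch of the request/response shapes involved. The field names are hypothetical stand-ins, not the real Gemini API; actual SDKs differ in detail.

```python
# Illustrative request/response shapes for a tool-use ("agentic") LLM API.
# Field names are hypothetical; real APIs (Gemini, etc.) differ in detail.

request = {
    "contents": "What's the weather in Paris?",
    "tools": [
        {
            "name": "get_weather",  # tools look like function signatures
            "description": "Look up the current weather for a city.",
            "parameters": {"city": {"type": "string"}},
        }
    ],
}

# The model replies with either plain text...
text_response = {"text": "It's probably raining."}

# ...or a request to call one of the tools you declared:
tool_call_response = {
    "tool_call": {"name": "get_weather", "args": {"city": "Paris"}}
}
```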

Now that we have that baseline, everything else is just "wrapper logic" that makes using the LLM easier.

The harness

Next, let's move on to the harness. The harness is sort of an abstract concept; it's the thing that manages the tools and the UI for you, so you aren't manually assembling LLM API requests, dispatching tool calls, etc.

A basic harness is just a loop: send the conversation to the model, execute any tool calls it makes, append the results, and repeat until the model replies with plain text.
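Here is a minimal sketch of that loop in Python. `call_llm`, the message format, and the tool set are hypothetical stand-ins for a real SDK, not any particular harness's implementation.

```python
# A minimal, illustrative harness loop. `call_llm` and the tools are
# hypothetical stand-ins for a real LLM SDK and real tool implementations.

def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

TOOLS = {"read_file": read_file}

def run_agent(call_llm, user_prompt: str) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        reply = call_llm(messages, tools=TOOLS)  # send history + tool specs
        if "tool_call" in reply:
            call = reply["tool_call"]
            result = TOOLS[call["name"]](**call["args"])  # dispatch the tool
            messages.append({"role": "tool", "content": result})
        else:
            return reply["text"]  # plain text means the model is done
```

Real harnesses add a lot on top of this (user confirmation for dangerous tools, context management, a UI), but the core shape is the same.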

Tools, skills, MCP

To recap, tools are the basic mechanism for letting LLMs do stuff. These can either be implemented natively in the harness, or the harness could allow configuring (additional) tools.

Now let's move on to the rest of the agent-related terminology. There are many IDEs, editors, LLM providers, and harnesses out there, and the plethora of standards exists to reduce the effort of integrating tools into harnesses.

Model Context Protocol (MCP) was originally a standard for self-documenting APIs for agents to access data sources (e.g. a domain specific database). It has been pretty quickly generalized to provide arbitrary tools to harnesses.

For a basic integration, a harness can be pointed at an MCP server and query it for the tools it exposes, along with their descriptions and arguments, then pass those tools to the LLM. If the LLM requests a tool call, the harness performs the corresponding RPC call against the MCP server.

Thus, MCP is a way to provide tools in a way that can be reused with any harness, without having to do custom integration work between each harness and each set of tools.
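A sketch of that flow, assuming a hypothetical `rpc` helper that sends a JSON-RPC request to the MCP server and returns the result. The `tools/list` and `tools/call` method names follow the MCP spec; everything else here is simplified.

```python
# Illustrative MCP-style flow. `rpc` is a hypothetical helper that sends a
# JSON-RPC request to the MCP server and returns its result.

def discover_tools(rpc):
    # Ask the server what tools it exposes (names, descriptions, schemas),
    # so the harness can pass them along to the LLM.
    return rpc("tools/list", {})["tools"]

def dispatch_tool_call(rpc, name, args):
    # When the LLM requests one of those tools, forward the call to the
    # MCP server and hand the result back to the model.
    return rpc("tools/call", {"name": name, "arguments": args})
```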

Context files like AGENTS.md are a standard way to provide project-specific instructions. For example, you might add instructions on how to run the tests and the build, or conventions you want the LLM to follow, like always noting in commit messages that AI was used.

The harness is responsible for loading the context files and feeding them into the model.
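For example, a hypothetical AGENTS.md might look like this (the commands and conventions are illustrative, not from any real project):

```markdown
# AGENTS.md (illustrative example)

## Build and test
- Build with `make build`.
- Run the test suite with `make test` before committing.

## Conventions
- Note in every commit message that AI assistance was used.
```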

Skills are like context files, but they're more task and/or domain specific rather than project specific. Skills are standardized so they're shareable between different harnesses. They're also loaded on demand.

Agent terminology

Before I move on, let's talk about the term "agent". Above, I said that agents are LLMs that have access to tools to do things. However, in practice people use "agent" to refer to multiple different things. Commonly, "agent" refers to harnesses, and in particular harnesses provided by the incumbent AI providers, like Gemini CLI and Claude Code.

"Agent" can also refer to specific configurations within a harness. Harnesses like Gemini CLI are locked down, but there are also harnesses like goose that can be freely configured to use different models from different providers, extended with custom tools, and more. In these cases, one harness can have multiple different "agent" configurations.

ACP, A2A

The reason for that quick segue is that we now have to talk about agent APIs, starting with ACP.

Agent Client Protocol (ACP) is a protocol for talking to agents. This can be used by IDEs or other tools to integrate into an existing harness, rather than having to implement support themselves. If you are familiar with Language Server Protocol (LSP), this is kind of like that.

An IDE could implement a harness itself. However, ACP makes it possible to "embed" the harness, thus resulting in less work on the IDE side as well as making it easier to swap to different harnesses/agents. The tradeoff is that a native IDE harness can have deeper integration with the IDE, such as allowing the LLM to open files in different window splits, if you want the agent to show you something.
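Like LSP, ACP runs the agent as a subprocess and exchanges JSON-RPC messages with it over stdin/stdout. A heavily simplified sketch of building such a message; the method name and parameters here are illustrative, not the exact ACP spec.

```python
# Illustrative ACP-style message framing: the IDE launches the agent as a
# subprocess and speaks JSON-RPC over its stdin/stdout, much like LSP.
# The method name and params below are hypothetical, not the real spec.

import json

def make_request(request_id, method, params):
    # Standard JSON-RPC 2.0 request framing.
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params,
    })

# The IDE might send the user's prompt to the embedded agent like so:
prompt_request = make_request(1, "session/prompt",
                              {"text": "Add a feature to foo()"})
```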

ACP can also be used by agents to delegate to other agents. For example, you could have Claude Code delegate a task to Gemini CLI, Gemini CLI delegate to another instance of Gemini CLI, or one agent configuration in goose delegate to another agent configuration in goose.

Unfortunately, the AI ecosystem is moving fast and furiously (as of today), and ACP has already been superseded by A2A, the Agent to Agent Protocol. As the name implies, there's a stronger focus on the agent delegation use case than on the embedding use case.

How to start

If you have no experience with agents, I would recommend starting with Gemini CLI because it will have the smallest startup cost for most people. You probably have a Google account, and as of now Gemini CLI has a reasonable amount of free quota.

Install it and get it set up, then try it on some small tasks: ask it to explain an unfamiliar piece of code, summarize a file, write a unit test for an existing function, or find where something is defined in a codebase.

These are all fairly low-risk tasks that LLMs, as of now, can perform fairly reliably. The goal is to start using an agent and get a feel for it.

If you don't like Google or Gemini, you can also use Claude or goose or whatever; the actual agent/harness doesn't matter too much when you're starting off. The example tasks should all work, as all of the "out of the box" harnesses will have some basic built-in tools (read/write file, run shell command) that are sufficient. (Note that local models require a lot of RAM, will run slowly unless you have amazing compute, and will generally perform worse than the big corpo cloud models.)

Once you get a feel for using agents, you can start playing with plugging in different tools and tweaking the prompts and context to do things beyond coding tasks, like summarizing your email, if that's your thing.

Disclaimer (a.k.a. OpenClaw)

It's important to emphasize that giving an LLM tools to do things also opens the door for it to do a lot of damage, if you're not careful. Most harnesses will stop and ask for user confirmation before performing potentially dangerous operations, like running shell commands.

"Most harnesses" does not include OpenClaw. At the risk of oversimplifying a little bit, OpenClaw is basically an agent that provides an aggressively full amount of access to everything on your system with an aggressively zero amount of user confirmation or protection. That allows it to do great things, including great destruction. But at the end of the day, it is just an agent.