In our last post, we talked about what AI agents are and how they work. We ended with a teaser about an emerging protocol that lets agents work together. Let's pick up where we left off.
What is A2A?
A2A stands for Agent2Agent - an open protocol that defines how AI agents discover each other and communicate. Think of it as a shared language that lets agents built by different teams, in different programming languages, running on different platforms talk to each other without any custom integration work.
Why does this matter?
Right now, the AI agent landscape is fragmented. Every agent lives behind its own API, speaks its own dialect, and uses its own data formats. If you want Agent A to work with Agent B, you need to write glue code specific to both. Scale that to dozens or hundreds of agents and you've got a combinatorial nightmare.
A2A solves this by establishing a universal protocol. Any agent that speaks A2A can communicate with any other A2A-compliant agent - just as any web browser can talk to any web server because they both speak HTTP.
A Brief History
A2A was announced by Google at Cloud Next on April 9, 2025, backed by over 50 technology partners including Atlassian, LangChain, MongoDB, PayPal, Salesforce, SAP, and ServiceNow. The draft specification was released on GitHub under the Apache 2.0 license.
Two months later, on June 23, 2025, Google donated A2A to the Linux Foundation, establishing it as a vendor-neutral open standard. Founding members include AWS, Cisco, Google, Microsoft, Salesforce, SAP, and ServiceNow - representing a serious cross-industry commitment.
In August 2025, IBM's Agent Communication Protocol (ACP), a similar effort with overlapping goals, officially merged with A2A. Rather than fragment the ecosystem with competing standards, the two communities consolidated around a single protocol.
The protocol is still young and evolving. The latest stable release is v0.3.0, with a v1.0 release candidate in development. Things are moving fast.
A2A vs MCP - The Distinction Everyone Gets Wrong
If you've been following the AI agent space, you've probably also heard of MCP (Model Context Protocol), developed by Anthropic. A common question: aren't A2A and MCP the same thing?
They're not. They solve different problems, and they're designed to work together.
MCP is agent-to-tool. It standardizes how an agent connects to external tools and data sources - databases, file systems, APIs, code interpreters. MCP gives an agent hands.
A2A is agent-to-agent. It standardizes how agents talk to each other as peers. A2A gives agents the ability to collaborate.

Here's a simple analogy. Imagine a carpenter and an electrician working on a house renovation.
- MCP is like the standard way each worker uses their tools. The carpenter has a hammer, saw, and tape measure. The electrician has a multimeter, wire strippers, and a drill. MCP standardizes how they find and operate their respective tools.
- A2A is the conversation between the carpenter and the electrician. "I've finished framing this wall - you can run the wiring now." A2A standardizes how they coordinate with each other.
An A2A agent can use MCP tools internally. The two protocols live at different layers of the stack and complement each other naturally. Google's recommended architecture: build your agent with any framework, equip it with MCP tools, and communicate with other agents via A2A.
The Agent Card
The foundation of A2A is the agent card - a machine-readable JSON document that describes who an agent is, what it can do, and how to talk to it. If A2A gives agents a shared language, the agent card is the introduction.
Every A2A-compliant agent publishes its card at a well-known URL on the agent's base domain. The original spec placed it at /.well-known/agent.json, though v0.3 of the protocol renamed this to /.well-known/agent-card.json. You'll encounter both in the wild as the ecosystem transitions.
Here's what a simple agent card looks like:
```json
{
  "name": "Weather Agent",
  "description": "Provides real-time weather forecasts and historical weather data for any location worldwide.",
  "url": "https://weather-agent.example.com",
  "version": "1.0.0",
  "capabilities": {
    "streaming": false,
    "pushNotifications": false
  },
  "defaultInputModes": ["text/plain"],
  "defaultOutputModes": ["text/plain"],
  "skills": [
    {
      "id": "forecast",
      "name": "Weather Forecast",
      "description": "Get a weather forecast for a specific location and time range.",
      "tags": ["weather", "forecast", "climate"],
      "examples": [
        "What's the weather in Tokyo this weekend?",
        "Will it rain in London tomorrow?"
      ]
    },
    {
      "id": "historical",
      "name": "Historical Weather",
      "description": "Look up past weather data for a given location and date.",
      "tags": ["weather", "history", "data"],
      "examples": [
        "What was the temperature in New York on July 4, 2024?"
      ]
    }
  ]
}
```
Let's break down the key parts:
- name and description tell you - or another agent - what this agent is and what it does. These are the fields that matter most for discovery.
- url is the agent's service endpoint. This is where you send requests.
- capabilities declares what communication patterns the agent supports. Can it stream responses in real time? Can it send push notifications when a long-running task finishes?
- defaultInputModes and defaultOutputModes define what types of content the agent can accept and produce, using MIME types. text/plain is the most common, but agents can also work with images, audio, structured data, and more.
- skills are the most interesting part. Each skill is a specific capability the agent offers, with its own name, description, tags, and example prompts. This is what lets other agents - or humans - understand exactly what the agent can do and how to ask for it.
Think of the agent card as a combination of a business card and a resume. It tells the world who you are, what you're good at, and how to reach you.
The card can also include authentication requirements (what credentials you need to interact with the agent), provider information (who built it), and in newer versions of the spec, digital signatures for verifying integrity.
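Fetching a card is just an HTTP GET against the well-known path. Since the spec renamed the file in v0.3, a robust client checks both locations. Here's a minimal sketch using only the standard library - the helper names (`card_urls`, `fetch_agent_card`) are illustrative, not part of the spec or any official SDK:

```python
import json
import urllib.request

# Both well-known paths: the v0.3 name first, then the original name
# as a fallback while the ecosystem transitions.
CARD_PATHS = ["/.well-known/agent-card.json", "/.well-known/agent.json"]

def card_urls(base_url: str) -> list[str]:
    """Candidate agent-card URLs for a given base domain."""
    return [base_url.rstrip("/") + path for path in CARD_PATHS]

def fetch_agent_card(base_url: str) -> dict:
    """Try each well-known path and return the first card that parses."""
    for url in card_urls(base_url):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return json.load(resp)
        except Exception:
            continue  # try the next candidate path
    raise LookupError(f"No agent card found at {base_url}")
```

In practice you'd also validate the card against the schema and check its declared capabilities before sending anything.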
How A2A Communication Works
Once you've read an agent's card, you know what it can do and where to reach it. The next question: how do you actually talk to it?
A2A uses JSON-RPC 2.0 over HTTPS as its primary transport (newer versions of the spec also define gRPC and plain HTTP+JSON bindings). If that sounds intimidating, it's simpler than it looks. JSON-RPC is just a convention for structuring request-and-response messages in JSON. You send a JSON object that says "call this method with these parameters," and you get a JSON object back with the result.
The core communication flow:
- Discover - find an agent and read its card.
- Send a message - create a task by sending a message to the agent's endpoint.
- Agent works - the agent processes your request, potentially going through multiple internal steps.
- Receive results - the agent returns artifacts (its outputs) when the task is complete.
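The "send a message" step boils down to constructing a JSON-RPC envelope around your message. A minimal sketch of that construction (the `build_send_message` helper is illustrative, not from any official SDK):

```python
from itertools import count

# JSON-RPC request ids let you match responses to requests;
# a simple counter is enough for a single client session.
_ids = count(1)

def build_send_message(text: str) -> dict:
    """Construct a JSON-RPC 2.0 'message/send' request for a plain-text part."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "message/send",
        "params": {
            "message": {
                "role": "user",
                "parts": [{"text": text}],
            }
        },
    }
```

POSTing this dict as JSON to the `url` from the agent's card is all it takes to create a task.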

The Building Blocks
A2A defines a handful of concepts that fit together:
Task - the fundamental unit of work. When you send a message to an agent, you create a task. Each task has a unique ID and progresses through a lifecycle: submitted → working → completed (or failed, canceled, rejected). There's also an input-required state for when the agent needs more information from you before it can continue.
Message - a single turn in the conversation. Each message has a role - user or agent - and contains one or more parts. You send a message to start or continue a task, and the agent sends messages back as it works.
Part - the atomic unit of content. A part can be text, a file, or structured data (JSON). Messages and artifacts are both made up of parts. This flexibility lets agents exchange anything from plain text to images, documents, or complex data structures.
Artifact - the deliverable output of a task. If you ask an agent to generate a report, the report is the artifact. Artifacts are composed of parts, just like messages.

Here's a simplified example. Say you want to ask the Weather Agent for a forecast:
You send:
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "message/send",
  "params": {
    "message": {
      "role": "user",
      "parts": [
        { "text": "What's the weather in Paris this weekend?" }
      ]
    }
  }
}
```
The agent responds:
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "id": "task-abc-123",
    "status": { "state": "completed" },
    "artifacts": [
      {
        "parts": [
          { "text": "Paris this weekend: Saturday 18°C partly cloudy, Sunday 22°C sunny with light winds." }
        ]
      }
    ]
  }
}
```
That's the basic pattern. The protocol also supports streaming responses via Server-Sent Events (for real-time output as the agent works), push notifications via webhooks (for long-running tasks), and multi-turn conversations where the agent asks follow-up questions before completing the task.
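On the client side, consuming a completed task mostly means walking the artifacts and collecting their text parts. A minimal sketch, assuming a response shaped like the Weather Agent example (the `artifact_text` helper is illustrative, not part of any SDK):

```python
def artifact_text(response: dict) -> str:
    """Collect the text parts from every artifact in a task result."""
    result = response.get("result", {})
    return "\n".join(
        part["text"]
        for artifact in result.get("artifacts", [])
        for part in artifact.get("parts", [])
        if "text" in part
    )

# A response shaped like the Weather Agent example above.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "id": "task-abc-123",
        "status": {"state": "completed"},
        "artifacts": [
            {"parts": [{"text": "Paris this weekend: Saturday 18°C partly cloudy."}]}
        ],
    },
}
```

A real client would also check `status.state` first - an `input-required` task needs a follow-up message, not artifact extraction.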
Discovery - How Agents Find Each Other
Here's the hard problem that A2A doesn't fully solve yet: how do agents find each other in the first place?
The protocol defines what an agent card looks like and where it should live, but it doesn't define a centralized registry or discovery mechanism. If you want to find an agent that can translate documents, you need to somehow know where to look.
This is an intentional design choice. A2A is a communication protocol, not a directory service - just as HTTP tells browsers how to talk to servers but doesn't tell them which websites exist. That's what search engines are for.
In practice, discovery happens through a few mechanisms today:
- Direct URLs - if you know an agent's URL, you can fetch its card directly. This works for agents you already know about, but doesn't help you find new ones.
- Curated lists - community-maintained registries where developers list their agent endpoints.
- Crawling - automated systems that scan the web for agent cards at well-known URLs, similar to how search engine crawlers discover web pages.
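However the cards are collected - direct URLs, curated lists, or crawling - matching a request to an agent ultimately means searching over the skills they advertise. A minimal sketch of tag-based matching over already-fetched cards (the `find_agents_by_tag` helper is hypothetical; real registries will likely use richer semantic matching):

```python
def find_agents_by_tag(cards: list[dict], tag: str) -> list[dict]:
    """Filter fetched agent cards to those advertising a matching skill tag."""
    return [
        card
        for card in cards
        if any(tag in skill.get("tags", []) for skill in card.get("skills", []))
    ]

# Two toy cards: only the first advertises a "forecast" skill tag.
cards = [
    {"name": "Weather Agent",
     "skills": [{"id": "forecast", "tags": ["weather", "forecast"]}]},
    {"name": "Translator Agent",
     "skills": [{"id": "translate", "tags": ["language"]}]},
]
```

This is exactly why the `tags` and `examples` fields on skills matter: they're the searchable surface that discovery systems index.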
This is the gap that projects like Waggle are trying to fill. As the ecosystem matures, more robust discovery mechanisms will likely emerge - whether through the A2A spec itself, a complementary standard, or a combination of approaches.
The Ecosystem Today
Let's be honest about where things stand: A2A is early.
The protocol has serious institutional backing. The Linux Foundation project's founding members - AWS, Cisco, Google, Microsoft, Salesforce, SAP, ServiceNow - represent a significant cross-industry commitment. Over 150 organizations have voiced support. IBM cared enough about convergence to merge its competing protocol into A2A rather than fragment the space.
On the platform side, major cloud providers are adding native support. Google's Agent Development Kit (ADK) was built with A2A from the start. Amazon Bedrock added A2A support in late 2025, enabling agents built with different frameworks to interoperate through a common protocol. Microsoft's agent tooling supports it too.
But the ground-level reality is that most A2A agents in the wild today are experimental. The protocol is less than a year old. Developers are still figuring out best practices, the spec is still evolving toward v1.0, and the tooling is maturing in real time.
This is normal. HTTP was first specified in 1991, and it took years before the web became what we know today. A2A is laying the foundation for how agents will communicate in the future. The interesting question isn't what the ecosystem looks like now - it's what it will look like in two years when the spec stabilizes, the tooling matures, and developers have had time to build real things on top of it.
What's Next
This post covered the what and why of A2A. In Part 2, we'll get our hands dirty with the how - actual code, a working A2A agent built from scratch, and step-by-step implementation examples.
If you want to explore on your own before then, the official A2A specification is well-written and worth reading. Google also maintains tutorials and quickstarts for getting started with the protocol.
