Building a ReAct agent in Go

ReAct agents are not new. Yao and friends published the paper in late 2022, and within a week of every model getting native tool use the pattern was obvious: have the model write a thought, pick a tool, see the tool’s output, then write the next thought. Repeat until it answers or you stop it.

What has changed is that the tooling around the pattern has gotten boring enough that you can actually ship one. What follows is a working Go version: the loop, the tools, the concurrent dispatch, and the event stream that lets you hook a CLI or an HTTP server to it.

If you have built one of these before, skip down to Concurrent tool execution. The earlier sections are the bits I wish someone had handed me on a single page when I started.

The pattern, in one paragraph

ReAct stands for Reasoning and Acting. The model interleaves “what should I do next” with “do it”. A turn looks like this: the model produces a thought plus zero or more tool calls; your code runs the tools; the results go back into the conversation; the model produces the next turn. The loop ends when the model answers without asking for tools, or when your step budget runs out. The reasoning side is essentially chain of thought embedded inside a loop with real-world side effects.

If you have ever watched a senior engineer debug something nasty, this is the same loop they run in their head: form a hypothesis, run a check, look at the result, update the hypothesis. A single step is rarely insightful on its own; the iteration is where the work happens.

Why Go

I keep coming back to Go for agent runtimes for boring reasons.

Tool calls are independent, and Go runs them concurrently with a sync.WaitGroup and a goroutine per call in about ten lines. The whole agent ends up as a single static binary I can scp to any box. The standard library’s http.ResponseWriter handles Server-Sent Events fine once you assert it to http.Flusher and call Flush after each event. There is no framework.

You can do every bit of this in Python. asyncio plus httpx plus structured outputs will get you there. The difference is mostly about how much code you end up writing and how readable that code is when something blows up at 3am. For me, Go wins on that axis. Your taste may differ.

Architecture

A Consumer submits a task and reads back events. The Agent runs the loop. The LLM provider does the thinking. Tools do the work in the world. A Run Log persists everything for replay. The Consumer can be a CLI, an HTTP handler streaming SSE, a cron job, a queue worker, or all of the above. The agent does not care.

The agent loop

Here is the whole loop. The LLM wrapper is left as an exercise because every provider’s SDK is slightly different and they are all boring.

type Agent struct {
    llm   LLM
    tools map[string]Tool
    max   int          // step budget; saved me at least one $40 bill
    out   chan<- Event // events stream out to whoever is consuming
}

func (a *Agent) Run(ctx context.Context, task string) error {
    msgs := []Message{{Role: "user", Content: task}}
    schemas := toolSchemas(a.tools)

    for step := 0; step < a.max; step++ {
        resp, err := a.llm.Chat(ctx, msgs, schemas)
        if err != nil {
            return fmt.Errorf("llm: %w", err)
        }

        if resp.Thinking != "" {
            a.out <- Event{Type: "thinking", Step: step, Content: resp.Thinking}
        }

        if len(resp.ToolCalls) == 0 {
            a.out <- Event{Type: "answer", Step: step, Content: resp.Content}
            return nil
        }

        msgs = append(msgs, resp.Message)
        results := a.runTools(ctx, step, resp.ToolCalls)
        msgs = append(msgs, toolResultsMessage(results))
    }
    return errors.New("step budget exceeded")
}

A few notes on what is load-bearing here.

The step budget. Given enough rope, a model will call tools forever. Twenty steps is plenty for most real tasks and cheap enough that you can ignore the cost. If you hit the budget, log the trace and read it before raising the limit. The model is usually stuck in a loop because a tool returned something it does not know how to interpret.

The out channel. The agent has no idea who is reading the events. It writes them and moves on. This keeps the agent trivially testable without HTTP, which matters a lot once you start writing evals and want to assert on the trace.

The message order. The assistant message goes back into the conversation, followed by a single user-role message containing every tool result, keyed by call ID. Some providers want one message per result, others want a list. Read the docs once and forget about it.

Tools

Tools are the part that does the actual work in the world. A tool is anything you can describe to the model and execute when called: a web search, a SQL query, a shell command, a calendar lookup, a Stripe refund.

The interface I use is small:

type Tool interface {
    Name() string
    Description() string
    Schema() map[string]any        // JSON schema for arguments
    Execute(ctx context.Context, args json.RawMessage) (any, error)
}
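The agent loop hands toolSchemas(a.tools) to the LLM but never shows that function. A possible version follows, assuming the provider wants a list of name, description and parameters objects. The ToolSchema struct and its JSON field names are hypothetical, and the echo tool exists only to exercise the function. Sorting by name is my own choice: Go randomizes map iteration order, and a stable order keeps requests identical between runs.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"sort"
)

// Tool repeats the interface from the article so the sketch compiles alone.
type Tool interface {
	Name() string
	Description() string
	Schema() map[string]any
	Execute(ctx context.Context, args json.RawMessage) (any, error)
}

// ToolSchema is a hypothetical wire shape; each provider defines its own.
type ToolSchema struct {
	Name        string         `json:"name"`
	Description string         `json:"description"`
	Parameters  map[string]any `json:"parameters"`
}

// toolSchemas flattens the registry into a slice sorted by tool name.
func toolSchemas(tools map[string]Tool) []ToolSchema {
	out := make([]ToolSchema, 0, len(tools))
	for _, t := range tools {
		out = append(out, ToolSchema{
			Name:        t.Name(),
			Description: t.Description(),
			Parameters:  t.Schema(),
		})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

// echo is a toy tool used only to exercise toolSchemas.
type echo struct{ name string }

func (e echo) Name() string        { return e.name }
func (e echo) Description() string { return "Echo the arguments back." }
func (e echo) Schema() map[string]any {
	return map[string]any{"type": "object"}
}
func (e echo) Execute(ctx context.Context, args json.RawMessage) (any, error) {
	return string(args), nil
}

func main() {
	tools := map[string]Tool{"b_tool": echo{"b_tool"}, "a_tool": echo{"a_tool"}}
	for _, s := range toolSchemas(tools) {
		fmt.Println(s.Name)
	}
}
```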

A real implementation, for a web fetcher:

type WebFetch struct {
    client *http.Client
    max    int64 // max response bytes; trust nothing on the open web
}

func (w *WebFetch) Name() string { return "web_fetch" }

func (w *WebFetch) Description() string {
    return "Fetch a URL and return the response body as text. " +
        "Use for reading public web pages when you need their current content."
}

func (w *WebFetch) Schema() map[string]any {
    return map[string]any{
        "type": "object",
        "properties": map[string]any{
            "url": map[string]any{
                "type":        "string",
                "description": "Absolute URL to fetch",
            },
        },
        "required": []string{"url"},
    }
}

func (w *WebFetch) Execute(ctx context.Context, args json.RawMessage) (any, error) {
    var p struct {
        URL string `json:"url"`
    }
    if err := json.Unmarshal(args, &p); err != nil {
        return nil, fmt.Errorf("bad args: %w", err)
    }
    req, err := http.NewRequestWithContext(ctx, "GET", p.URL, nil)
    if err != nil {
        return nil, err
    }
    resp, err := w.client.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    body, err := io.ReadAll(io.LimitReader(resp.Body, w.max))
    if err != nil {
        return nil, err
    }
    return map[string]any{
        "status": resp.StatusCode,
        "body":   string(body),
    }, nil
}

Two things worth pointing out.

The description matters more than you think. The model reads it to decide whether to call the tool. “Fetch a URL” is fine. “Fetch a URL when the user asks about a specific website, returning HTML you can search” is better. Write the description the way you would explain the tool to a new hire on their first day. (Tool descriptions are prompts in disguise; the patterns from the prompt engineering post apply directly.)

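Schema validation is your friend. The model will hallucinate parameter names early on, especially smaller models. Note that json.Unmarshal silently ignores unknown fields, so a misnamed parameter only surfaces if you reject unknown fields and check required ones yourself. Returning a clear bad args error teaches the model to fix itself on the next turn; an error you swallow teaches it nothing.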

Concurrent tool execution

This is the section where Go pays for itself. When the model returns three tool calls in one turn, all three should run at the same time. Sequential execution is correct but slow: the step takes the sum of the tool latencies instead of the slowest one, and the user waits for the difference.

func (a *Agent) runTools(ctx context.Context, step int, calls []ToolCall) []ToolResult {
    results := make([]ToolResult, len(calls))
    var wg sync.WaitGroup

    for i, call := range calls {
        wg.Add(1)
        go func(i int, call ToolCall) {
            defer wg.Done()

            tCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
            defer cancel()

            a.out <- Event{
                Type: "tool_call", Step: step,
                ID: call.ID, Name: call.Name, Args: call.Args,
            }

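            // The model can ask for a tool that does not exist. Looking it up
            // first avoids a nil-interface panic and returns a readable error.
            var out any
            var err error
            if tool, ok := a.tools[call.Name]; ok {
                out, err = tool.Execute(tCtx, call.Args)
            } else {
                err = fmt.Errorf("unknown tool %q", call.Name)
            }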

            a.out <- Event{
                Type: "tool_result", Step: step,
                ID: call.ID, Output: out, Err: errString(err),
            }

            results[i] = ToolResult{ID: call.ID, Output: out, Err: err}
        }(i, call)
    }

    wg.Wait()
    return results
}

Three additions you should make in real code: a per-tool timeout (the example does this), cancellation of siblings when one call fails in a way that makes the rest pointless (errgroup.WithContext from golang.org/x/sync does this; cancellation of the request context already reaches every call through ctx), and rate limiting per provider. None of these change the shape, but they decide whether your agent survives contact with a flaky upstream.

Streaming events out

The events channel can be drained by anything. An HTTP handler writing SSE is the common case:

func (s *Server) handleRun(w http.ResponseWriter, r *http.Request) {
    // Decode and validate first, so error responses are not sent with SSE headers.
    var body struct {
        Task string `json:"task"`
    }
    if err := json.NewDecoder(r.Body).Decode(&body); err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }

    flusher, ok := w.(http.Flusher)
    if !ok {
        http.Error(w, "streaming not supported", http.StatusInternalServerError)
        return
    }
    w.Header().Set("Content-Type", "text/event-stream")
    w.Header().Set("Cache-Control", "no-cache")

    events := make(chan Event, 32)
    go func() {
        defer close(events)
        if err := s.newAgent(events).Run(r.Context(), body.Task); err != nil {
            events <- Event{Type: "error", Content: err.Error()}
        }
    }()

    for ev := range events {
        b, _ := json.Marshal(ev)
        fmt.Fprintf(w, "event: %s\ndata: %s\n\n", ev.Type, b)
        flusher.Flush()
    }
    fmt.Fprint(w, "event: done\ndata: {}\n\n")
    flusher.Flush()
}

A CLI consumer is even simpler. Print each event as it arrives, with a bit of color:

func runCLI(ctx context.Context, task string) error {
    events := make(chan Event, 32)
    agent := NewAgent(events)

    go func() {
        defer close(events)
        if err := agent.Run(ctx, task); err != nil {
            events <- Event{Type: "error", Content: err.Error()}
        }
    }()

    var runErr error // remembered so the caller sees a failed run
    for ev := range events {
        switch ev.Type {
        case "thinking":
            fmt.Printf("\033[2m%s\033[0m\n", ev.Content)
        case "tool_call":
            fmt.Printf("\033[36m-> %s(%s)\033[0m\n", ev.Name, prettyArgs(ev.Args))
        case "tool_result":
            fmt.Printf("\033[32m<- %s\033[0m\n", oneLine(ev.Output))
        case "answer":
            fmt.Printf("\n\033[1m%s\033[0m\n", ev.Content)
        case "error":
            fmt.Printf("\033[31merror: %s\033[0m\n", ev.Content)
            runErr = errors.New(ev.Content)
        }
    }
    return runErr
}

The agent is identical in both consumers. Sharing the channel across requests is one of those things that works fine until two people use your app at once, so always create a fresh one per run.

A real run

The task below comes from an edited run of the agent:

task: What was the weather in Bangalore last week, and is that unusual for May?

Notice the first reasoning step issues both data calls in one response. runTools executes them in parallel, so the wall-clock cost of the step is max(weather_history, weather_normals) instead of the sum. The second thinking step then decides to grab one more source after looking at what came back; if the agent had committed to a fixed plan up front it would have missed that the normals series was needed.

Observability

Once you have an agent in production, you will want two things you do not need on day one: a log of every run, and a way to replay one.

The log is a single table:

CREATE TABLE runs (
  id          TEXT PRIMARY KEY,
  task        TEXT NOT NULL,
  status      TEXT NOT NULL,        -- 'running' | 'done' | 'error' | 'budget'
  started_at  DATETIME NOT NULL,
  ended_at    DATETIME,
  events      TEXT NOT NULL         -- JSONL of every Event
);

You write each event to a buffer as it flies past and write the buffer to the row when the run ends. Or write events as they happen if you want live tailing. Both work.
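A sketch of the buffering half, assuming a hypothetical RunBuffer type and a trimmed-down Event with JSON tags I picked for illustration. It produces the JSONL string for the events column; the database write itself is left out.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// Event is a trimmed-down stand-in; the JSON field names are assumptions.
type Event struct {
	Type    string `json:"type"`
	Step    int    `json:"step"`
	Content string `json:"content,omitempty"`
}

// RunBuffer is a hypothetical accumulator for the events column: one JSON
// object per line. The mutex is only needed if Append is called from more
// than one goroutine.
type RunBuffer struct {
	mu  sync.Mutex
	buf bytes.Buffer
}

// Append encodes ev as a single line of JSON.
func (r *RunBuffer) Append(ev Event) error {
	b, err := json.Marshal(ev)
	if err != nil {
		return err
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	r.buf.Write(b)
	r.buf.WriteByte('\n')
	return nil
}

// JSONL returns everything appended so far, ready to store in runs.events.
func (r *RunBuffer) JSONL() string {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.buf.String()
}

func main() {
	var rb RunBuffer
	rb.Append(Event{Type: "thinking", Step: 0, Content: "check the forecast"})
	rb.Append(Event{Type: "answer", Step: 1, Content: "It was hot."})
	fmt.Print(rb.JSONL())
}
```

For live tailing you would write each line to the row (or to a separate events table) inside Append instead of waiting for the end of the run.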

Replay is the part that surprised me. The model never produces the same trace twice. Even with temperature 0, providers do not guarantee determinism across runs, especially when tool use is involved. What you can do is rerun the same task against the same tool implementations and see if the answer’s shape is similar. That is the basis of an eval suite: ten or twenty representative tasks with expected-shape answers, run nightly, alerts when the shape drifts.
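What "expected shape" means is up to you. As a rough sketch, assuming a hypothetical Shape type that checks for required mentions and a per-task step ceiling (both criteria are my own stand-ins, not a standard):

```go
package main

import (
	"fmt"
	"strings"
)

// Shape is a hypothetical description of what an acceptable answer looks like.
type Shape struct {
	MustContain []string // substrings the answer has to mention, case-insensitive
	MaxSteps    int      // upper bound on the steps the run may take
}

// Check returns human-readable problems; an empty result means the run passed.
func (s Shape) Check(answer string, steps int) []string {
	var problems []string
	lower := strings.ToLower(answer)
	for _, want := range s.MustContain {
		if !strings.Contains(lower, strings.ToLower(want)) {
			problems = append(problems, fmt.Sprintf("answer does not mention %q", want))
		}
	}
	if steps > s.MaxSteps {
		problems = append(problems,
			fmt.Sprintf("took %d steps, ceiling for this task is %d", steps, s.MaxSteps))
	}
	return problems
}

func main() {
	shape := Shape{MustContain: []string{"Bangalore", "May"}, MaxSteps: 5}

	// A run that mentions both terms within the step ceiling passes.
	fmt.Println(shape.Check("Bangalore was warmer than a typical May week.", 3))

	// A vague answer from a long run fails on every criterion.
	for _, p := range shape.Check("It was warm.", 7) {
		fmt.Println("-", p)
	}
}
```

A nightly job can run each task, feed the final answer and step count to Check, and alert when the list of problems is not empty.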

Tools you actually need

Most agents end up using a small set of tools. The set that has earned its keep across the agents I have shipped:

web_fetch: pull a URL and return the body. The single most useful tool a model can have.
web_search: call a search API like Brave, Tavily, or Exa. Cheaper and more reliable than scraping.
read_file / write_file: for anything filesystem-shaped.
sql: read-only by default, write only behind a confirmation flag. The model will try to be helpful with DROP TABLE if you let it.
shell: sandboxed exec for the brave. I run mine inside a Docker container with no network and a tmpfs working directory.
calendar / email / slack: boring CRUD tools that get more use than any of the above.

Avoid the temptation to give the model one giant do_anything tool. Smaller tools with explicit schemas give the model fewer ways to get the call wrong and give you a clearer audit trail when something does go wrong.