Context Engineering

Learn how to shape context for AI agents to make them faster, cheaper, and more reliable.

Agents (but really: Context Engineering)

Agents are LLMs + tools + a loop — but what really makes them good is how you shape the context they see. That's context engineering: choosing what goes in, what stays out, and how it's arranged. Done well, it makes systems faster, cheaper, and more reliable.

Idea in one line:
How you feed the model (context) decides how it behaves (latency, cost, recovery, scale).


The Six Big Practices (from Manus, in plain English)

Below are Manus's core practices, simplified, with Vercel AI SDK snippets you can paste into a project.


1) Design for the KV-cache

What & why: Models reuse the unchanged prefix of your prompt. If the beginning changes (even a little), the cache breaks and you pay more. Keep a stable system prompt and append-only logs. Use deterministic formatting (same key order/whitespace). Reported savings can be dramatic on some models.

kv-cache-friendly.ts
import { generateText, type ModelMessage } from "ai"
 
const SYSTEM =
  "You are a reliable assistant. Use tools carefully. JSON is stable."
const transcript: ModelMessage[] = [{ role: "system", content: SYSTEM }]
 
// Deterministic JSON (recursively sorted keys) so tiny diffs don’t break the cache.
// Note: JSON.stringify with an array replacer filters keys at every level
// rather than sorting nested objects, so we sort recursively by hand.
function stableJSON(x: unknown): string {
  if (x === null || typeof x !== "object") return JSON.stringify(x)
  if (Array.isArray(x)) return `[${x.map(stableJSON).join(",")}]`
  return `{${Object.keys(x)
    .sort()
    .map((k) => `${JSON.stringify(k)}:${stableJSON((x as Record<string, unknown>)[k])}`)
    .join(",")}}`
}
 
export async function step(userInput: string, state?: unknown) {
  transcript.push({ role: "user", content: userInput })
  if (state)
    transcript.push({ role: "user", content: `STATE:\n${stableJSON(state)}` })
 
  const res = await generateText({
    model: "openai/gpt-4o",
    messages: transcript,
  })
  transcript.push(...res.response.messages)
  return res.text
}
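To make the cache-breaking failure mode concrete, here is a small sketch (function names are ours, not SDK API): a timestamp in the system prompt invalidates the prefix on every call, while the same data appended at the tail leaves the prefix cacheable.

```typescript
// Anti-pattern: volatile data at the top invalidates the cached prefix every call.
const badSystem = () =>
  `You are a reliable assistant. Current time: ${new Date().toISOString()}`

// Cache-friendly: constant prefix; volatile data rides at the tail, so only
// tokens after the insertion point need to be recomputed.
const SYSTEM_PREFIX = "You are a reliable assistant. Use tools carefully."

function buildMessages(history: string[], now: Date) {
  return [
    { role: "system" as const, content: SYSTEM_PREFIX },
    ...history.map((h) => ({ role: "user" as const, content: h })),
    { role: "user" as const, content: `Current time: ${now.toISOString()}` },
  ]
}
```

Two calls a minute apart now share an identical prefix (system prompt plus history) and differ only in the final message.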

2) Mask, don’t remove (constrain without breaking context)

What & why: As you add tools, the model’s action space grows and choices get messy. Don’t dynamically remove tools (that changes earlier context and hurts cache hits). Keep the tool set stable, but mask which tools are usable at each step. (MarkTechPost)

mask-dont-remove.ts
import { Experimental_Agent as Agent, tool } from "ai"
import { z } from "zod"
 
const tools = {
  search: tool({
    description: "Search knowledge base",
    inputSchema: z.object({ q: z.string() }),
    async execute({ q }) {
      return { hits: [`Result for ${q}`] }
    },
  }),
  writeReport: tool({
    description: "Write a report",
    inputSchema: z.object({ title: z.string(), points: z.array(z.string()) }),
    async execute({ title, points }) {
      return { path: `/reports/${title}.md` }
    },
  }),
}
 
export const agent = new Agent({
  model: "openai/gpt-4o",
  tools,
  // Keep tools defined; switch which ones are active.
  prepareStep: async ({ stepNumber }) => {
    if (stepNumber < 2)
      return { activeTools: ["search"], toolChoice: "required" }
    return { activeTools: ["writeReport"] }
  },
})

3) Use the file system as context (infinite memory)

What & why: Don’t cram huge pages/PDFs into the prompt. Save big artifacts to files and load them when needed — external memory that cuts tokens and keeps details recoverable. Make compression reversible (keep a URL/path so you can reload the full thing). (MarkTechPost)

fs-context-tools.ts
import * as fs from "node:fs/promises"
import { Experimental_Agent as Agent, tool } from "ai"
import { z } from "zod"
 
const readFile = tool({
  description: "Read a UTF-8 file",
  inputSchema: z.object({ path: z.string() }),
  async execute({ path }) {
    return { content: await fs.readFile(path, "utf8") }
  },
})
 
const writeFile = tool({
  description: "Write a UTF-8 file",
  inputSchema: z.object({ path: z.string(), content: z.string() }),
  async execute({ path, content }) {
    await fs.writeFile(path, content, "utf8")
    return { ok: true }
  },
})
 
export const fsAgent = new Agent({
  model: "openai/gpt-4o",
  tools: { readFile, writeFile },
})
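The "reversible compression" idea above can be sketched as a helper that persists the full artifact and keeps only a short stub in context (the helper name, artifact directory, and preview length are our own choices, not Manus's API):

```typescript
import * as fs from "node:fs/promises"
import * as path from "node:path"

// Reversible compression: write the full artifact to disk and return only a
// short preview plus the path. Reading the path restores the full content.
async function compressToFile(id: string, content: string, dir = "./artifacts") {
  await fs.mkdir(dir, { recursive: true })
  const file = path.join(dir, `${id}.txt`)
  await fs.writeFile(file, content, "utf8")
  return {
    path: file,
    preview: content.slice(0, 200),
    bytes: Buffer.byteLength(content, "utf8"),
  }
}
```

The agent's context carries only `preview` and `path`; a `readFile` tool like the one above restores the full artifact on demand.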

4) Steer attention with recitation (keep goals fresh)

What & why: In long runs, models forget the plan. Have the agent rewrite a todo.md with goals/progress every few steps and append it at the end of the context to bias attention toward what matters now. (MarkTechPost)

recitation-todo.ts
import * as fs from "node:fs/promises"
import { Experimental_Agent as Agent, stepCountIs, tool } from "ai"
import { z } from "zod"
 
const updateTodo = tool({
  description: "Update todo.md with goals and next steps",
  inputSchema: z.object({
    goals: z.array(z.string()),
    next: z.array(z.string()),
  }),
  async execute({ goals, next }) {
    const md = `# Goals\n${goals.map((g) => `- ${g}`).join("\n")}\n\n# Next\n${next.map((n) => `- ${n}`).join("\n")}\n`
    await fs.writeFile("./todo.md", md, "utf8")
    return { path: "./todo.md" }
  },
})
 
export const reciting = new Agent({
  model: "openai/gpt-4o",
  tools: { updateTodo },
  stopWhen: stepCountIs(12),
  prepareStep: async ({ stepNumber, messages }) => {
    if (stepNumber > 0 && stepNumber % 3 === 0)
      return { activeTools: ["updateTodo"] }
    try {
      const todo = await fs.readFile("./todo.md", "utf8")
      return {
        messages: [
          ...messages,
          { role: "user", content: `Current TODO:\n${todo}` },
        ],
      }
    } catch {
      return {}
    }
  },
})

5) Keep the wrong stuff in (errors help recovery)

What & why: Don’t clean away failed steps. Seeing its own mistakes helps the model recover and avoid repeats. Leave errors in the trace and ask for a short reflection before retrying. (MarkTechPost)

keep-failures-in.ts
import { generateText, type ModelMessage } from "ai"
 
const messages: ModelMessage[] = [
  {
    role: "system",
    content: "Be concise. If an error appears, reflect and suggest a fix.",
  },
]
 
export async function robustStep(task: string, doWork: () => Promise<unknown>) {
  messages.push({ role: "user", content: task })
  try {
    const ok = await doWork()
    messages.push({ role: "tool", content: `OK ${JSON.stringify(ok)}` })
  } catch (err) {
    messages.push({ role: "tool", content: `ERROR ${String(err)}` })
    const reflection = await generateText({
      model: "openai/gpt-4o",
      messages: [
        ...messages,
        {
          role: "user",
          content: "Analyze the error and propose a safer next step.",
        },
      ],
    })
    messages.push(...reflection.response.messages)
  }
}
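To close the loop, the reflection can drive a bounded retry. A hypothetical wrapper (names and message shapes are ours) that keeps the failure visible in the trace instead of swallowing it:

```typescript
type TraceEntry = { kind: "ok" | "error" | "reflection"; text: string }

// Run a step, keep failures in the trace, and retry once after "reflecting".
// `reflect` stands in for an LLM call that reads the error and suggests a fix.
async function runWithRecovery(
  work: () => Promise<string>,
  reflect: (err: string) => Promise<string>,
  trace: TraceEntry[] = [],
): Promise<{ trace: TraceEntry[]; result?: string }> {
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      const result = await work()
      trace.push({ kind: "ok", text: result })
      return { trace, result }
    } catch (err) {
      trace.push({ kind: "error", text: String(err) }) // the error stays visible
      trace.push({ kind: "reflection", text: await reflect(String(err)) })
    }
  }
  return { trace } // both attempts failed; the trace shows the full history
}
```

Because the error and reflection entries stay in `trace`, a real agent would append them to the message list before the retry, exactly as `robustStep` does.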

6) Don’t few-shot yourself into a rut (add controlled variety)

What & why: If all examples look the same, the model copies blindly. Add small, structured variation to action/observation formatting so behavior stays flexible—without blowing up the cache. (MarkTechPost)

controlled-diversity.ts
const TEMPLATES = [
  (a: string, o: string) => `Action:\n${a}\nObservation:\n${o}`,
  (a: string, o: string) => `>> ACTION\n${a}\n>> OBS\n${o}`,
  (a: string, o: string) => `Do:\n${a}\nSee:\n${o}`,
]
 
// Deterministic rotation: small diversity, cache-friendly
let i = 0
export function renderActionObs(action: string, observation: string) {
  const t = TEMPLATES[i % TEMPLATES.length]
  i++
  return t(action, observation)
}

Practical Checklist

  • Stable prefix: no timestamps/randomness up top
  • Append-only logs: never rewrite the past
  • Deterministic serialization: sorted keys, fixed whitespace
  • Mask tools instead of removing them
  • External memory: use files/URLs for bulky artifacts
  • Recite goals into the tail of context (e.g., todo.md)
  • Keep failures visible and reflect briefly
  • Controlled variety in examples/templates

These choices are simple, but they add up to big wins in latency, cost, focus, and recovery. (Manus)


**Sources & further reading:** Manus, *Context Engineering for AI Agents* (core article and summaries).