
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Tue, 14 Apr 2026 21:25:00 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Scaling MCP adoption: Our reference architecture for simpler, safer and cheaper enterprise deployments of MCP]]></title>
            <link>https://blog.cloudflare.com/enterprise-mcp/</link>
            <pubDate>Tue, 14 Apr 2026 13:00:10 GMT</pubDate>
            <description><![CDATA[ We share Cloudflare's internal strategy for governing MCP using Access, AI Gateway, and MCP server portals. We also launch Code Mode to slash token costs and recommend new rules for detecting Shadow MCP in Cloudflare Gateway.
 ]]></description>
            <content:encoded><![CDATA[ <p>We at Cloudflare have aggressively adopted <a href="https://modelcontextprotocol.io/"><u>Model Context Protocol (MCP)</u></a> as a core part of our AI strategy. This shift has moved well beyond our engineering organization, with employees across product, sales, marketing, and finance teams now using agentic workflows to drive efficiency in their daily tasks. But the adoption of agentic workflows with MCP is not without security risks, including authorization sprawl, <a href="https://www.cloudflare.com/learning/ai/prompt-injection/"><u>prompt injection</u></a>, and <a href="https://www.cloudflare.com/learning/security/what-is-a-supply-chain-attack/"><u>supply chain risks</u></a>. To secure this broad company-wide adoption, we have integrated a suite of security controls from both our <a href="https://www.cloudflare.com/sase/"><u>Cloudflare One (SASE) platform</u></a> and our <a href="https://workers.cloudflare.com/"><u>Cloudflare Developer platform</u></a>, allowing us to govern AI usage with MCP without slowing down our workforce. </p><p>In this blog we’ll walk through our own best practices for securing MCP workflows by putting different parts of our platform together to create a unified security architecture for the era of autonomous AI. We’ll also share two new concepts that support enterprise MCP deployments:</p><ul><li><p>We are launching <a href="https://developers.cloudflare.com/cloudflare-one/access-controls/ai-controls/mcp-portals/#code-mode"><u>Code Mode with MCP server portals</u></a> to drastically reduce the token costs associated with MCP usage; </p></li><li><p>We describe how to use <a href="https://developers.cloudflare.com/cloudflare-wan/zero-trust/cloudflare-gateway/"><u>Cloudflare Gateway</u></a> for shadow MCP detection, to discover use of unauthorized remote MCP servers.</p></li></ul><p>We also talk about how our organization approached deploying MCP, and how we built out our MCP security architecture using Cloudflare products including <a href="https://developers.cloudflare.com/agents/guides/remote-mcp-server/"><u>remote MCP servers</u></a>, <a href="https://www.cloudflare.com/sase/products/access/"><u>Cloudflare Access</u></a>, <a href="https://developers.cloudflare.com/cloudflare-one/access-controls/ai-controls/mcp-portals/"><u>MCP server portals</u></a>, and <a href="https://www.cloudflare.com/developer-platform/products/ai-gateway/"><u>AI Gateway</u></a>. </p>
    <div>
      <h2>Remote MCP servers provide better visibility and control</h2>
      <a href="#remote-mcp-servers-provide-better-visibility-and-control">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/learning/ai/what-is-model-context-protocol-mcp/"><u>MCP</u></a> is an open standard that enables developers to build a two-way connection between AI applications and the data sources they need to access. In this architecture, the MCP client is the integration point with the <a href="https://www.cloudflare.com/learning/ai/what-is-large-language-model/"><u>LLM</u></a> or other <a href="https://www.cloudflare.com/learning/ai/what-is-agentic-ai/"><u>AI agent</u></a>, and the MCP server sits between the <a href="https://www.cloudflare.com/learning/ai/mcp-client-and-server/"><u>MCP client</u></a> and the corporate resources.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/73kpNxOQIlM0UOnty9qGXS/53fe1b92e299b52363ac1870df52f2e9/BLOG-3252_2.png" />
          </figure><p>The separation between MCP clients and MCP servers allows agents to autonomously pursue goals and take actions while maintaining a clear boundary between the AI (integrated at the MCP client) and the credentials and APIs of the corporate resource (integrated at the MCP server). </p><p>Our workforce at Cloudflare is constantly using MCP servers to access information in various internal resources, including our project management platform, our internal wiki, documentation and code management platforms, and more. </p><p>Very early on, we realized that locally-hosted MCP servers were a security liability. Local MCP server deployments may rely on unvetted software sources and versions, which increases the risk of <a href="https://owasp.org/www-project-mcp-top-10/2025/MCP04-2025%E2%80%93Software-Supply-Chain-Attacks&amp;Dependency-Tampering"><u>supply chain attacks</u></a> or <a href="https://owasp.org/www-community/attacks/MCP_Tool_Poisoning"><u>tool injection attacks</u></a>. They also prevent IT and security administrators from administering these servers, leaving it up to individual employees and developers to choose which MCP servers they want to run and how they want to keep them up to date. This is a losing game.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4mngDeTGrsah2DiN7IAnrf/575075096e72d6b967df4327a99c35fc/BLOG-3252_3.png" />
          </figure><p>Instead, we have a centralized team at Cloudflare that manages our MCP server deployment across the enterprise. This team built a shared MCP platform inside our monorepo that provides governed infrastructure out of the box. When an employee wants to expose an internal resource via MCP, they first get approval from our AI governance team, and then they copy a template, write their tool definitions, and deploy, all the while inheriting default-deny write controls with audit logging, auto-generated <a href="https://www.cloudflare.com/learning/serverless/glossary/what-is-ci-cd/"><u>CI/CD pipelines</u></a>, and <a href="https://www.cloudflare.com/learning/security/glossary/secrets-management/"><u>secrets management</u></a> for free. This means standing up a new governed MCP server takes minutes of scaffolding. The governance is baked into the platform itself, which is what allowed adoption to spread so quickly. </p><p>Our CI/CD pipeline deploys them as <a href="https://developers.cloudflare.com/agents/guides/remote-mcp-server/"><u>remote MCP servers</u></a> on custom domains on <a href="https://www.cloudflare.com/developer-platform/"><u>Cloudflare’s developer platform</u></a>. This gives us visibility into which MCP servers are being used by our employees, while maintaining control over software sources. As an added bonus, every remote MCP server on the Cloudflare developer platform is automatically deployed across our global network of data centers, so MCP servers can be accessed by our employees with low latency, regardless of where they might be in the world.</p>
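    <p>As a rough illustration of what such a template produces, here is a minimal remote MCP server on Workers, loosely based on Cloudflare’s public remote MCP server examples. The tool, its name, and the <code>/sse</code> mount path are illustrative; our internal template layers the governance controls described above on top:</p>
            <pre><code>import { McpAgent } from "agents/mcp";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

// Each MCP server runs as a Durable Object; the template wires up the
// Durable Object binding and the CI/CD pipeline for you.
export class WikiMCP extends McpAgent {
 server = new McpServer({ name: "internal-wiki", version: "1.0.0" });

 async init() {
  // A read-only tool definition; write tools are default-deny in our template.
  this.server.tool(
   "search_wiki",
   { query: z.string() },
   async ({ query }) =&gt; ({
    content: [{ type: "text", text: `Results for: ${query}` }],
   })
  );
 }
}

// Expose the server over the SSE transport at /sse.
export default WikiMCP.mount("/sse");
</code></pre>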
    <div>
      <h3>Cloudflare Access provides authentication</h3>
      <a href="#cloudflare-access-provides-authentication">
        
      </a>
    </div>
    <p>Some of our MCP servers sit in front of public resources, like our <a href="https://docs.mcp.cloudflare.com/mcp"><u>Cloudflare documentation MCP server</u></a> or <a href="https://radar.mcp.cloudflare.com/mcp"><u>Cloudflare Radar MCP server</u></a>, and thus we want them to be accessible to anyone. But many of the MCP servers used by our workforce sit in front of our private corporate resources. These MCP servers require user authentication to ensure that they are off limits to everyone but authorized Cloudflare employees. To achieve this, our monorepo template for MCP servers integrates <a href="https://www.cloudflare.com/sase/products/access/"><u>Cloudflare Access</u></a> as the OAuth provider. Cloudflare Access secures login flows and issues access tokens to resources, while acting as an identity aggregator that verifies end user <a href="https://www.cloudflare.com/learning/access-management/what-is-sso/"><u>single sign-on (SSO)</u></a>, <a href="https://www.cloudflare.com/learning/access-management/what-is-multi-factor-authentication/"><u>multifactor authentication (MFA)</u></a>, and a variety of contextual attributes such as IP addresses, location, or device certificates. </p>
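    <p>For MCP servers built outside this template, the same check can be applied by hand. Below is a minimal sketch of verifying the JWT that Cloudflare Access attaches to authenticated requests, using the <code>jose</code> library; the team domain and application audience (AUD) tag are placeholders:</p>
            <pre><code>import { createRemoteJWKSet, jwtVerify } from "jose";

// Placeholders: your Access team domain and the AUD tag of the
// Access application protecting this MCP server.
const TEAM_DOMAIN = "https://myteam.cloudflareaccess.com";
const ACCESS_AUD = "aud-tag-from-access-app";

// Access publishes its signing keys at a well-known certs endpoint.
const JWKS = createRemoteJWKSet(new URL(`${TEAM_DOMAIN}/cdn-cgi/access/certs`));

export async function requireAccessUser(request: Request): Promise&lt;string&gt; {
 // Access injects a signed JWT into this header on every request.
 const token = request.headers.get("Cf-Access-Jwt-Assertion");
 if (!token) throw new Error("missing Access token");

 const { payload } = await jwtVerify(token, JWKS, {
  issuer: TEAM_DOMAIN,
  audience: ACCESS_AUD,
 });
 return payload.email as string; // the authenticated user's identity
}
</code></pre>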
    <div>
      <h2>MCP server portals centralize discovery and governance</h2>
      <a href="#mcp-server-portals-centralize-discovery-and-governance">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/70s5yIwdzoaYQIoRF4L0mH/018dae32529c30639a2af48ee4031bc6/BLOG-3252_4.png" />
          </figure><p><sup><i>MCP server portals unify governance and control for all AI activity.</i></sup></p><p>As the number of our remote MCP servers grew, we hit a new wall: discovery. We wanted to make it easy for every employee (especially those who are new to MCP) to find and work with all the MCP servers that are available to them. Our MCP server portals product provided a convenient solution. The employee simply connects their MCP client to the MCP server portal, and the portal immediately reveals every internal and third-party MCP server they are authorized to use. </p><p>Beyond this, our MCP server portals provide centralized logging, consistent policy enforcement, and <a href="https://www.cloudflare.com/learning/access-management/what-is-dlp/"><u>data loss prevention</u></a> (DLP) guardrails. Our administrators can see who logged into what MCP portal and create DLP rules that prevent certain data, like personally identifiable information (PII), from being shared with certain MCP servers.</p><p>We can also create policies that control who has access to the portal itself, and what tools from each MCP server should be exposed. For example, we could set up one MCP server portal, accessible only to employees in our <i>finance</i> group, that exposes just the read-only tools for the MCP server in front of our internal code repository. Meanwhile, a different MCP server portal, accessible only to employees on our <i>engineering</i> team using their corporate laptops, could expose more powerful read/write tools for our code repository MCP server.</p><p>An overview of our MCP server portal architecture is shown above. The portal supports both remote MCP servers hosted on Cloudflare, and third-party MCP servers hosted anywhere else. What makes this architecture uniquely performant is that all these security and networking components run on the same physical machine within our global network. When an employee's request moves through the MCP server portal, a Cloudflare-hosted remote MCP server, and Cloudflare Access, their traffic never needs to leave that machine. </p>
    <div>
      <h2>Code Mode with MCP server portals reduces costs</h2>
      <a href="#code-mode-with-mcp-server-portals-reduces-costs">
        
      </a>
    </div>
    <p>After months of high-volume MCP deployments, we’ve paid out our fair share of tokens. We’ve also started to think most people are doing MCP wrong.</p><p>The standard approach to MCP requires defining a separate tool for every API operation that is exposed via an MCP server. But this static, exhaustive approach quickly fills an agent’s context window, especially for large platforms with thousands of endpoints.</p><p>We previously wrote about how we used server-side <a href="https://blog.cloudflare.com/code-mode-mcp/"><u>Code Mode to power Cloudflare’s MCP server</u></a>, allowing us to expose <a href="https://developers.cloudflare.com/api/?cf_target_id=C3927C0A6A2E9B823D2DF3F28E5F0D30"><u>the thousands of endpoints in the Cloudflare API</u></a> while reducing token use by 99.9%. The Cloudflare MCP server exposes just two tools: a <code>search</code> tool lets the model write JavaScript to explore what’s available, and an <code>execute</code> tool lets it write JavaScript to call the tools it finds. The model discovers what it needs on demand, rather than receiving everything upfront.</p><p>We like this pattern so much that we had to make it available for everyone. So we have now launched the ability to use the “Code Mode” pattern with <a href="https://developers.cloudflare.com/cloudflare-one/access-controls/ai-controls/mcp-portals/"><u>MCP server portals</u></a>. Now you can front all of your MCP servers with a centralized portal that performs audit controls and progressive tool disclosure to reduce token costs.</p><p>Here is how it works. Instead of exposing every tool definition to a client, all of your underlying MCP servers collapse into just two MCP portal tools: <code>portal_codemode_search</code> and <code>portal_codemode_execute</code>. The <code>search</code> tool gives the model access to a <code>codemode.tools()</code> function that returns all the tool definitions from every connected upstream MCP server. The model then writes JavaScript to filter and explore these definitions, finding exactly the tools it needs without every schema being loaded into context. The <code>execute</code> tool provides a <code>codemode</code> proxy object where each upstream tool is available as a callable function. The model writes JavaScript that calls these tools directly, chaining multiple operations, filtering results, and handling errors in code. All of this runs in a sandboxed environment on the MCP server portal powered by <a href="https://developers.cloudflare.com/dynamic-workers/"><u>Dynamic Workers</u></a>. </p><p>Here is an example of an agent that needs to find a Jira ticket and update it with information from Google Drive. It first searches for the right tools:</p>
            <pre><code>// portal_codemode_search
async () =&gt; {
 const tools = await codemode.tools();
 return tools
  .filter(t =&gt; t.name.includes("jira") || t.name.includes("drive"))
  .map(t =&gt; ({ name: t.name, params: Object.keys(t.inputSchema.properties || {}) }));
}
</code></pre>
            <p> The model now knows the exact tool names and parameters it needs, without the full schemas of tools ever entering its context. It then writes a single <code>execute</code> call to chain the operations together:</p>
            <pre><code>// portal_codemode_execute
async () =&gt; {
 const tickets = await codemode.jira_search_jira_with_jql({
  jql: 'project = BLOG AND status = "In Progress"',
  fields: ["summary", "description"]
 });
 const doc = await codemode.google_workspace_drive_get_content({
  fileId: "1aBcDeFgHiJk"
 });
 await codemode.jira_update_jira_ticket({
  issueKey: tickets[0].key,
  fields: { description: tickets[0].description + "\n\n" + doc.content }
 });
 return { updated: tickets[0].key };
}
</code></pre>
            <p>This is just two tool calls. The first discovers what's available, the second does the work. Without Code Mode, this same workflow would have required the model to receive the full schemas of every tool from both MCP servers upfront, and then make three separate tool invocations.</p><p>Let’s put the savings in perspective: when our internal MCP server portal is connected to just four of our internal MCP servers, it exposes 52 tools that consume approximately 9,400 tokens of context just for their definitions. With Code Mode enabled, those 52 tools collapse into 2 portal tools consuming roughly 600 tokens, a 94% reduction. And critically, this cost stays fixed. As we connect more MCP servers to the portal, the token cost of Code Mode doesn’t grow.</p><p>Code Mode can be activated on an MCP server portal by adding a query parameter to the URL. Instead of connecting to your portal over its usual URL (e.g. <code>https://myportal.example.com/mcp</code>), you attach <code>?codemode=search_and_execute</code> to the URL (e.g. <code>https://myportal.example.com/mcp?codemode=search_and_execute</code>).</p>
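            <p>For example, an MCP client that supports remote servers can point directly at the Code Mode URL. A hypothetical client configuration (the exact <code>mcpServers</code> shape varies slightly between clients) might look like:</p>
            <pre><code>{
  "mcpServers": {
    "portal": {
      "type": "http",
      "url": "https://myportal.example.com/mcp?codemode=search_and_execute"
    }
  }
}
</code></pre>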
    <div>
      <h2>AI Gateway provides extensibility and cost controls</h2>
      <a href="#ai-gateway-provides-extensibility-and-cost-controls">
        
      </a>
    </div>
    <p>We aren’t done yet. We plug <a href="https://www.cloudflare.com/developer-platform/products/ai-gateway/"><u>AI Gateway</u></a> into our architecture by positioning it on the connection between the MCP client and the LLM. This allows us to quickly switch between various LLM providers (to prevent vendor lock-in) and to enforce cost controls (by limiting the number of tokens each employee can burn through). The full architecture is shown below.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3epLuGydMO1YxkdhkcqGPG/74c26deb712d383942e79699b4fb71da/BLOG-3252_5.png" />
          </figure>
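    <p>Because AI Gateway exposes provider-compatible endpoints, routing an MCP client’s model traffic through it is typically a one-line base URL change. Here is a sketch using the OpenAI SDK, with placeholder account and gateway IDs:</p>
            <pre><code>import OpenAI from "openai";

// ACCOUNT_ID and GATEWAY_ID are placeholders for your own AI Gateway.
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/GATEWAY_ID/openai",
});

// Requests now flow through AI Gateway, where logging, caching, and
// per-user token limits can be enforced.
const completion = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Summarize my open Jira tickets" }],
});
</code></pre>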
    <div>
      <h2>Cloudflare Gateway discovers and blocks shadow MCP</h2>
      <a href="#cloudflare-gateway-discovers-and-blocks-shadow-mcp">
        
      </a>
    </div>
    <p>Now that we’ve provided governed access to authorized MCP servers, let’s look into dealing with unauthorized MCP servers. We can perform shadow MCP discovery using <a href="https://developers.cloudflare.com/cloudflare-wan/zero-trust/cloudflare-gateway/"><u>Cloudflare Gateway</u></a>. Cloudflare Gateway is our comprehensive secure web gateway that provides enterprise security teams with visibility and control over their employees’ Internet traffic.</p><p>We can use the Cloudflare Gateway API to perform a multi-layer scan to find remote MCP servers that are not being accessed via an MCP server portal. This is possible using a variety of existing Gateway and Data Loss Prevention (DLP) selectors, including:</p><ul><li><p>Using the Gateway <code>httpHost</code> selector to scan for </p><ul><li><p>known MCP server hostnames (like <a href="http://mcp.stripe.com"><u>mcp.stripe.com</u></a>)</p></li><li><p>mcp.* subdomains using wildcard hostname patterns </p></li></ul></li><li><p>Using the Gateway <code>httpRequestURI</code> selector to scan for MCP-specific URL paths like /mcp and /mcp/sse </p></li><li><p>Using DLP-based body inspection to find MCP traffic, even if that traffic uses URIs that do not contain the telltale mentions of <code>mcp</code> or <code>sse</code>. Specifically, we use the fact that MCP uses JSON-RPC over HTTP, which means every request contains a "method" field with values like "tools/call", "prompts/get", or "initialize." Here are some regex rules that can be used to detect MCP traffic in the HTTP body:</p></li></ul>
            <pre><code>const DLP_REGEX_PATTERNS = [
  {
    name: "MCP Initialize Method",
    regex: '"method"\\s{0,5}:\\s{0,5}"initialize"',
  },
  {
    name: "MCP Tools Call",
    regex: '"method"\\s{0,5}:\\s{0,5}"tools/call"',
  },
  {
    name: "MCP Tools List",
    regex: '"method"\\s{0,5}:\\s{0,5}"tools/list"',
  },
  {
    name: "MCP Resources Read",
    regex: '"method"\\s{0,5}:\\s{0,5}"resources/read"',
  },
  {
    name: "MCP Resources List",
    regex: '"method"\\s{0,5}:\\s{0,5}"resources/list"',
  },
  {
    name: "MCP Prompts List",
    regex: '"method"\\s{0,5}:\\s{0,5}"prompts/(list|get)"',
  },
  {
    name: "MCP Sampling Create Message",
    regex: '"method"\\s{0,5}:\\s{0,5}"sampling/createMessage"',
  },
  {
    name: "MCP Protocol Version",
    regex: '"protocolVersion"\\s{0,5}:\\s{0,5}"202[4-9]',
  },
  {
    name: "MCP Notifications Initialized",
    regex: '"method"\\s{0,5}:\\s{0,5}"notifications/initialized"',
  },
  {
    name: "MCP Roots List",
    regex: '"method"\\s{0,5}:\\s{0,5}"roots/list"',
  },
];
</code></pre>
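            <p>As a quick sanity check, these patterns match the JSON-RPC bodies that MCP clients actually emit:</p>
            <pre><code>// A typical MCP tool invocation body, as serialized by an MCP client.
const body = JSON.stringify({
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: { name: "get_weather", arguments: { city: "Lisbon" } },
});

const pattern = new RegExp('"method"\\s{0,5}:\\s{0,5}"tools/call"');
console.log(pattern.test(body)); // true
</code></pre>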
            <p>The Gateway API supports additional automation. For example, one can use the custom DLP profile defined above to block traffic, redirect it, or simply log and inspect MCP payloads. Put this all together, and Gateway provides comprehensive detection of unauthorized remote MCP servers accessed via an enterprise network. </p><p>For more information on how to build this out, see this <a href="https://developers.cloudflare.com/cloudflare-one/tutorials/detect-mcp-traffic-gateway-logs/"><u>tutorial</u></a>. </p>
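            <p>As a sketch of what that automation can look like, here is a request to the Gateway rules API that blocks MCP-style URL paths. <code>ACCOUNT_ID</code> and <code>API_TOKEN</code> are placeholders, and the traffic expression is an illustrative assumption:</p>
            <pre><code>// Create a Gateway HTTP policy that blocks requests to MCP-style paths.
const resp = await fetch(
  `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/gateway/rules`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${API_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      name: "Block unauthorized MCP paths",
      action: "block",
      enabled: true,
      filters: ["http"],
      traffic: 'http.request.uri matches "/mcp(/sse)?$"',
    }),
  }
);
console.log((await resp.json()).success);
</code></pre>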
    <div>
      <h2>Public-facing MCP Servers are protected with AI Security for Apps</h2>
      <a href="#public-facing-mcp-servers-are-protected-with-ai-security-for-apps">
        
      </a>
    </div>
    <p>So far, we’ve been focused on protecting our workforce’s access to our internal MCP servers. But, like many other organizations, we also have public-facing MCP servers that our customers can use to agentically administer and operate Cloudflare products. These MCP servers are hosted on Cloudflare’s developer platform. (You can find a list of individual MCP servers for specific products <a href="https://developers.cloudflare.com/agents/model-context-protocol/mcp-servers-for-cloudflare/"><u>here</u></a>, or refer back to our new approach for providing more efficient access to the entire Cloudflare API using <a href="https://blog.cloudflare.com/code-mode/"><u>Code Mode</u></a>.)</p><p>We believe that every organization should publish official, first-party MCP servers for their products. The alternative is that your customers source unvetted servers from public repositories where packages may contain <a href="https://www.docker.com/blog/mcp-horror-stories-the-supply-chain-attack/"><u>dangerous trust assumptions</u></a>, undisclosed data collection, and any range of unsanctioned behaviors. By publishing your own MCP servers, you control the code, update cadence, and security posture of the tools your customers use.</p><p>Since every remote MCP server is an HTTP endpoint, we can put it behind the <a href="https://www.cloudflare.com/application-services/products/waf/"><u>Cloudflare Web Application Firewall (WAF)</u></a>. Customers can enable the <a href="https://developers.cloudflare.com/waf/detections/ai-security-for-apps/"><u>AI Security for Apps</u></a> feature within the WAF to automatically inspect inbound MCP traffic for prompt injection attempts, sensitive data leakage, and topic classification. Public-facing MCP servers are protected just like any other web API.</p>
    <div>
      <h2>The future of MCP in the enterprise</h2>
      <a href="#the-future-of-mcp-in-the-enterprise">
        
      </a>
    </div>
    <p>We hope our experience, products, and reference architectures will be useful to other organizations as they continue along their own journey towards broad enterprise-wide adoption of MCP.</p><p>We’ve secured our own MCP workflows by: </p><ul><li><p>Offering our developers a templated framework for building and deploying remote MCP servers on our developer platform using Cloudflare Access for authentication</p></li><li><p>Ensuring secure, identity-based access to authorized MCP servers by connecting our entire workforce to MCP server portals</p></li><li><p>Controlling costs using AI Gateway to mediate access to the LLMs powering our workforce’s MCP clients, and using Code Mode in MCP server portals to reduce token consumption and context bloat</p></li><li><p><a href="https://developers.cloudflare.com/cloudflare-one/tutorials/detect-mcp-traffic-gateway-logs/"><u>Discovering</u></a> shadow MCP usage with Cloudflare Gateway </p></li></ul><p>For organizations advancing on their own enterprise MCP journeys, we recommend starting by putting your existing remote and third-party MCP servers behind <a href="https://developers.cloudflare.com/cloudflare-one/access-controls/ai-controls/mcp-portals/"><u>Cloudflare MCP server portals</u></a> and enabling Code Mode to start benefiting from cheaper, safer, and simpler enterprise deployments of MCP.</p><p><sub><i>Acknowledgements: This reference architecture and blog represent the work of many people across many different roles and business units at Cloudflare. This is just a partial list of contributors: Ann Ming Samborski, Kate Reznykova, Mike Nomitch, James Royal, Liam Reese, Yumna Moazzam, Simon Thorpe, Rian van der Merwe, Rajesh Bhatia, Ayush Thakur, Gonzalo Chavarri, Maddy Onyehara, and Haley Campbell.</i></sub></p>
<p></p> ]]></content:encoded>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Cloudflare One]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[MCP]]></category>
            <category><![CDATA[Cloudflare Access]]></category>
            <category><![CDATA[Cloudflare Gateway]]></category>
            <category><![CDATA[Agents Week]]></category>
            <guid isPermaLink="false">73AaroR7GH8sXdbfIV99Fl</guid>
            <dc:creator>Sharon Goldberg</dc:creator>
            <dc:creator>Matt Carey</dc:creator>
            <dc:creator>Ivan Anguiano</dc:creator>
        </item>
        <item>
            <title><![CDATA[Secure private networking for everyone: users, nodes, agents, Workers — introducing Cloudflare Mesh]]></title>
            <link>https://blog.cloudflare.com/mesh/</link>
            <pubDate>Tue, 14 Apr 2026 13:00:09 GMT</pubDate>
            <description><![CDATA[ Cloudflare Mesh provides secure, private network access for users, nodes, and autonomous AI agents. By integrating with Workers VPC, developers can now grant agents scoped access to private databases and APIs without manual tunnels.
 ]]></description>
            <content:encoded><![CDATA[ <p>AI agents have changed how teams think about private network access. Your coding agent needs to query a staging database. Your production agent needs to call an internal API. Your personal AI assistant needs to reach a service running on your home network. The clients are no longer just humans or services. They're <a href="https://www.cloudflare.com/learning/ai/what-is-agentic-ai/"><u>agents</u></a>, running autonomously, making requests you didn't explicitly approve, against infrastructure you need to keep secure.</p><p>Each of these workflows has the same underlying problem: agents need to reach private resources, but the tools for doing that were built for humans, not autonomous software. VPNs require interactive login. SSH tunnels require manual setup. Exposing services publicly is a security risk. And none of these approaches give you visibility into what the agent is actually doing once it's connected.</p><p>Today, <b>we're introducing Cloudflare Mesh</b> to connect your private networks together and provide secure access for your agents. We're also integrating Mesh with the <a href="https://www.cloudflare.com/developer-platform/"><u>Cloudflare Developer Platform</u></a> so that <a href="https://www.cloudflare.com/developer-platform/products/workers/"><u>Workers</u></a>, <a href="https://www.cloudflare.com/developer-platform/products/durable-objects/"><u>Durable Objects</u></a>, and agents built with the Agents SDK can reach your private infrastructure directly.</p><p>If you’re using <a href="https://www.cloudflare.com/sase/"><u>Cloudflare One’s SASE and Zero Trust suite</u></a>, you already have access to Mesh. You don’t need a new technology paradigm to secure agentic workloads. You need a SASE platform that was built for the agentic era, and that’s Cloudflare One. Cloudflare Mesh is a new experience with a simpler setup that leverages the on-ramps you’re already familiar with: WARP Connector <i>(now called a Cloudflare Mesh node)</i> and WARP Client <i>(now called Cloudflare One Client)</i>. Together, these create a private network for human, developer, and agent traffic. Mesh is directly integrated into your existing Cloudflare One deployment. Your existing Gateway policies, Access rules, and device posture checks apply to Mesh traffic automatically.</p><p>If you're a developer who just wants private networking for your agents, services, and team, Mesh is where you start. Set it up in minutes, connect your networks, and secure your traffic. And because Mesh runs on the <a href="https://developers.cloudflare.com/cloudflare-one/"><u>Cloudflare One</u></a> platform, you can grow into more advanced capabilities over time: <a href="https://www.cloudflare.com/sase/products/gateway/"><u>Gateway</u></a> network, DNS, and HTTP policies for fine-grained traffic control, <a href="https://www.cloudflare.com/sase/use-cases/infrastructure-access/"><u>Access for Infrastructure</u></a> for SSH and RDP session management, <a href="https://www.cloudflare.com/sase/products/browser-isolation/"><u>Browser Isolation</u></a> for safe web access, <a href="https://developers.cloudflare.com/cloudflare-one/data-loss-prevention/"><u>DLP</u></a> to prevent sensitive data from leaving your network, and <a href="https://www.cloudflare.com/sase/products/casb/"><u>CASB</u></a> for SaaS security. You don’t have to plan for all of this on day one, and you won’t have to migrate when you do need it.</p>
    <div>
      <h2>New agentic workflows</h2>
      <a href="#new-agentic-workflows">
        
      </a>
    </div>
    <p>Private networking has always been about connecting clients to resources — SSH into a server, query a database, access an internal API. What's changed is who the clients are. A year ago, the answer was your developers and your services. Today, it's increasingly your agents.</p><p>This isn't theoretical. Look at the ecosystem: the explosion of <a href="https://blog.cloudflare.com/remote-model-context-protocol-servers-mcp/"><u>MCP (Model Context Protocol) servers</u></a> providing tool access, coding agents that need to read from private repos and databases, personal assistants running on home hardware. Each of these patterns assumes the agent can reach the resources it needs. When those resources are isolated in private networks, the agent is stuck. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/12RTZcOBkkKwmaRe5VYn9k/e1b00401bee274a7643a4dfd4139e2df/BLOG-3215_2.png" />
          </figure><p>This creates three workflows that are hard to secure today:</p><ol><li><p>Accessing a personal agent from a mobile device. You're running OpenClaw on a Mac mini at home. You want to reach it from your phone, your laptop at a coffee shop, or your work machine. But exposing it to the public Internet (even behind a password) leaves gaps. Your agent has shell access, file system access, and network access to your home network. One misconfiguration and anyone can reach it.</p></li><li><p>Letting a coding agent access your staging environment. You're using Claude Code, Cursor, or Codex on your laptop. You ask it to check deployment status, query analytics from a staging database, or read from an internal object store. But those services live in a private cloud VPC, so your agent can't reach them without exposing them to the Internet or tunneling your entire laptop into the VPC.</p></li><li><p>Connecting deployed agents to private services. You're building agents into your product using the <a href="https://developers.cloudflare.com/agents/"><u>Agents SDK</u></a> on Cloudflare Workers. Those agents need to call internal APIs, query databases, and access services that aren't on the public Internet. They need private access, but with scoped permissions, audit trails, and no credential leakage.</p></li></ol>
    <div>
      <h2>Cloudflare Mesh: one private network for users, nodes, and agents</h2>
      <a href="#cloudflare-mesh-one-private-network-for-users-nodes-and-agents">
        
      </a>
    </div>
    <p>Cloudflare Mesh is developer-friendly private networking. One lightweight connector, one binary, connects everything: your personal devices, your remote servers, your user endpoints. You don't need to install separate tools for each pattern. One connector on your network, and every access pattern works.</p><p>Once connected, devices in your private network can talk to each other over private IPs, routed through Cloudflare’s global network across 330+ cities, giving you better reliability and control over your network.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/ENfLb0kr1Zesh7PnuCvDK/830afd93d49b658023a9ece55e0e505b/BLOG-3215_3.gif" />
          </figure><p>Now, with Mesh, a single solution can solve all of the agent scenarios we mentioned above:</p><ul><li><p>With <b>Cloudflare One Client for iOS</b> on your phone, you can securely connect your mobile devices to your local Mac mini running OpenClaw via a Mesh private network.</p></li><li><p>With <b>Cloudflare One Client for macOS</b> on your laptop, you can connect your laptop to your private network so your coding agents can reach staging databases or APIs and query them.</p></li><li><p>With <b>Mesh nodes</b> on your Linux servers, you can connect VPCs in external clouds together, letting agents access resources and MCPs in external private networks.</p></li></ul><p>Because Mesh is powered by <a href="https://developers.cloudflare.com/cloudflare-one/team-and-resources/devices/cloudflare-one-client/"><u>Cloudflare One Client</u></a>, every connection inherits the security controls of the Cloudflare One platform. Gateway policies apply to Mesh traffic. Device posture checks validate connecting devices. DNS filtering catches suspicious lookups. You get this without additional configuration: the same policies that protect your human traffic protect your agent traffic.</p>
    <div>
      <h2>Choosing between Mesh and Tunnel</h2>
      <a href="#choosing-between-mesh-and-tunnel">
        
      </a>
    </div>
    <p>With the introduction of Mesh, you might ask: when should I use Mesh instead of Tunnel? Both connect external networks privately to Cloudflare, but they serve different purposes. <a href="https://developers.cloudflare.com/cloudflare-one/networks/connectors/cloudflare-tunnel/"><u>Cloudflare Tunnel</u></a> is the ideal solution for unidirectional traffic, where Cloudflare proxies the traffic from the edge to specific private services (like a web server or a database). </p><p>Cloudflare Mesh, on the other hand, provides a full bidirectional, many-to-many network. Every device and node on your Mesh can access one another using their private IPs. An application or agent running in your network can discover and access any other resource on the Mesh without each resource needing its own Tunnel. </p>
    <div>
      <h2>Using the power of Cloudflare’s network</h2>
      <a href="#using-the-power-of-cloudflares-network">
        
      </a>
    </div>
    <p>Cloudflare Mesh gives you the benefits of a mesh network (resiliency, high scalability, low latency and high performance), but, by routing everything through Cloudflare, it resolves a key challenge of mesh networks: NAT traversal.</p><p>Most of the Internet is behind NAT (Network Address Translation). This mechanism allows an entire local network of devices to share a single public IP address by rewriting packet headers to map between the shared public address and private internal addresses. When two devices are behind NAT, direct connections can fail and traffic has to fall back to relay servers. If your relay infrastructure has limited points of presence, a meaningful fraction of your traffic hits those relays, adding latency and reducing reliability. And while it is possible to self-host your own relay servers to compensate, that means taking on the burden of managing additional infrastructure just to connect your existing network.</p><p>Cloudflare Mesh takes a different approach. All Mesh traffic routes through Cloudflare's global network, the same infrastructure that serves traffic for some of the largest websites on the Internet. For cross-region or multi-cloud traffic, this consistently beats public Internet routing. There's no degraded fallback path, because the Cloudflare edge is the path.</p><p>Routing through Cloudflare also means every packet passes through Cloudflare's security stack. This is the key advantage of building Mesh on the Cloudflare One platform: security isn't a separate product you bolt on later. And by leveraging this same global backbone, we can provide these core pillars to every team from day one:</p><p><b>50 nodes and 50 users free. </b>Your whole team and your whole staging environment on one private network, included with every Cloudflare account. </p><p><b>Global edge routing.</b> 330+ cities, optimized backbone routing. No relay servers with limited points of presence. No degraded fallback paths.</p><p><b>Security controls from day one.</b> Mesh runs on Cloudflare One. Gateway policies, DNS filtering, DLP, traffic inspection, and device posture checks are all available on the same platform. Start with simple private connectivity. Turn on Gateway policies when you need traffic filtering. Enable Access for Infrastructure when you need session-level controls for SSH and RDP. Add DLP when you need to prevent sensitive data from leaving your network. Every capability is one toggle away.</p><p><b>High availability.</b> Create a Mesh node with high availability enabled and spin up multiple connectors using the same token in active-passive mode. They advertise the same IP routes, so if one goes down, traffic fails over automatically.</p>
    <div>
      <h2>Integrated with the Developer Platform with Workers VPC</h2>
      <a href="#integrated-with-the-developer-platform-with-workers-vpc">
        
      </a>
    </div>
    <p>Mesh connects your agents and resources across external clouds, but agents built on Workers with the Agents SDK need to reach those private resources too. To enable this, we’ve extended <a href="https://blog.cloudflare.com/workers-vpc-open-beta/"><u>Workers VPC</u></a> to make your entire Mesh network accessible to Workers and Durable Objects.</p><p>That means that you can connect to your Cloudflare Mesh network from Workers, making the entire network accessible from a single binding’s <code>fetch()</code> call. This complements Workers VPC’s existing support for Cloudflare Tunnel, giving you more choice over how you want to secure your networks. Now, you can specify entire networks that you want to connect to in your <code>wrangler.jsonc</code> file. To bind to your Mesh network, use the <code>cf1:network</code> reserved keyword that binds to the Mesh network of your account:</p>
            <pre><code>"vpc_networks": [
  { "binding": "MESH", "network_id": "cf1:network", "remote": true },
  { "binding": "AWS_VPC", "tunnel_id": "350fd307-...", "remote": true }
]
</code></pre>
            <p>Then, you can use it within your Worker or agent code:</p>
            <pre><code>export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext) {
    // Reach any internal host on your Mesh, no pre-registration required
    const apiResponse = await env.MESH.fetch("http://10.0.1.50/api/data");

    // Internal hostname resolved via tunnel's private DNS resolver
    const dbResponse = await env.AWS_VPC.fetch("http://internal-db.corp.local:5432");

    return new Response(await apiResponse.text());
  },
};
</code></pre>
            <p>By connecting the Developer Platform to your Mesh networks, you can build Workers that have secure access to your private databases, internal APIs, and MCPs, allowing you to build cross-cloud agents and MCPs that provide agentic capabilities to your app. But it also opens up a world where agents can autonomously observe your entire stack end-to-end, cross-reference logs, and suggest optimizations in real time.</p>
    <div>
      <h2>How it all fits together</h2>
      <a href="#how-it-all-fits-together">
        
      </a>
    </div>
    <p>Together, Cloudflare Mesh, Workers VPC, and the Agents SDK provide a unified private network for your agents that spans both Cloudflare and your external clouds. We’ve merged connectivity and compute so your agents can securely reach the resources they need, wherever they live, across the globe.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6tYFghWEN1H3jIwV0Kejc9/04dcaedda5692a9726a7081d19bc413c/BLOG-3215_4.png" />
          </figure><p><b>Mesh nodes</b> are your servers, VMs, and containers. They run a headless version of Cloudflare One Client and get a Mesh IP. Services talk to services over private IPs, bidirectionally, routed through Cloudflare's edge. </p><p><b>Devices</b> are your laptops and phones. They run the Cloudflare One Client and reach Mesh nodes directly: SSH, database queries, API calls, all over private IPs. Your local coding agents use this connection to access private resources. </p><p><b>Agents on Workers</b> reach private services through Workers VPC Network bindings. They get scoped access to entire networks, mediated by MCP. The network enforces what the agent can reach. The MCP server enforces what the agent can do. </p>
    <div>
      <h2>What’s next</h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>The current version of Mesh provides the foundation for secure, unified connectivity. But as agentic workflows become more complex, we’re focused on moving beyond simple connectivity toward a network that is more intuitive to manage and more granularly aware of who, or what, is talking to your services. Here is what we are building for the rest of the year.</p>
    <div>
      <h4>Hostname routing</h4>
      <a href="#hostname-routing">
        
      </a>
    </div>
    <p>We're extending Cloudflare Tunnel's <a href="https://blog.cloudflare.com/tunnel-hostname-routing/"><u>hostname routing</u></a> to Mesh this summer. Your Mesh nodes will be able to attract traffic for private hostnames like <code>wiki.local</code> or <code>api.staging.internal</code>, without you having to manage IP lists or worry about how those hostnames resolve on the Cloudflare edge. Route traffic to services by name, not by IP. If your infrastructure uses dynamic IPs, auto-scaling groups, or ephemeral containers, this removes an entire class of routing headaches.</p>
    <div>
      <h4>Mesh DNS</h4>
      <a href="#mesh-dns">
        
      </a>
    </div>
    <p>Today, you reach Mesh nodes by their Mesh IPs: <code>ssh 100.64.0.5</code>. That works, but it's not how you think about your infrastructure. You think in names: <code>postgres-staging</code>, <code>api-prod</code>, <code>nikitas-openclaw</code>.</p><p>Later this year we're building Mesh DNS so that every node and device that joins your Mesh automatically gets a routable internal hostname. No DNS configuration or manual records. Add a node named <code>postgres-staging</code>, and <code>postgres-staging.mesh</code> resolves to the right Mesh IP from any device on your Mesh.</p><p>Combined with hostname routing, you'll be able to <code>ssh postgres-staging.mesh</code> or <code>curl http://api-prod.mesh:3000/health</code> without ever knowing or managing an IP address.</p>
    <div>
      <h4>Identity-aware routing</h4>
      <a href="#identity-aware-routing">
        
      </a>
    </div>
    <p>Today, Mesh nodes authenticate to the Cloudflare edge, but they share an identity at the network layer. Devices authenticate with user identity via the Cloudflare One Client, but nodes don't yet carry distinct, routable identities that Gateway policies can differentiate.</p><p>We want to change that. The goal is identity-aware routing for Mesh, where each node, each device, and eventually each agent gets a distinct identity that policies can evaluate. Instead of writing rules based on IP ranges, you write rules based on who or what is connecting.</p><p>This matters most for agents. Today, when an agent running on Workers calls a tool through a VPC binding, the target service sees a Worker making a request. It doesn't know which agent is calling, who authorized it, or what scope was granted. On the Mesh side, when a local coding agent on your laptop reaches a staging service, Gateway sees your device identity but not the agent's.</p><p>We're working toward a model where agents carry their own identity through the network:</p><ul><li><p>Principal / Sponsor: The human who authorized the action (Nikita from the platform team)</p></li><li><p>Agent: The AI system performing it (the deployment assistant, session #abc123)</p></li><li><p>Scope: What the agent is allowed to do (read deployments, trigger rollbacks, nothing else)</p></li></ul><p>This would let you write policies like: reads from Nikita's agents are allowed, but writes require Nikita directly. Agent traffic can be filtered independently from human traffic. An agent's network access can be revoked without touching Nikita's.</p><p>The infrastructure for this is in place. Mesh nodes provision with per-node tokens, devices authenticate with per-user identity, and Workers VPC bindings scope per-service access. The missing piece is making these identities visible to the policy layer so Gateway can make routing and access decisions based on them. That's what we're building.</p>
    <div>
      <h4>Mesh in containers</h4>
      <a href="#mesh-in-containers">
        
      </a>
    </div>
    <p>Today, Mesh nodes run on VMs and bare-metal Linux servers. But modern infrastructure increasingly runs in containers: Kubernetes pods, Docker Compose stacks, ephemeral CI/CD runners. We're building a Mesh Docker image that lets you add a Mesh node to any containerized environment.</p><p>This means you'll be able to include a Mesh sidecar in your Docker Compose stack and give every service in that stack private network access. A microservice running in a container in your staging cluster could reach a database in your production VPC over Mesh, without either service needing a public endpoint. </p><p>It is also useful for CI/CD pipelines that can access private infrastructure during builds and tests: your GitHub Actions runner pulls the Mesh container image, joins your network, runs integration tests against your staging environment, and tears down. All without VPN credentials to manage or persistent tunnels to maintain: the node disappears when the container exits.</p><p>We expect the Mesh Docker image to be available later this year.</p>
    <div>
      <h2>Get started</h2>
      <a href="#get-started">
        
      </a>
    </div>
    <p>While we continue to evolve these identity and routing capabilities, the foundation for secure, unified networking is available today. You can start bridging your clouds and securing your agents in just a few minutes.</p><p><b>Get started with Cloudflare Mesh</b>: Head to Networking &gt; Mesh in the <a href="https://dash.cloudflare.com/?to=/:account/mesh"><u>Cloudflare dashboard</u></a>. Free for up to 50 nodes and 50 users.</p><p><b>Build agents with the Agents SDK and Workers VPC: </b>Install the Agents SDK (<code>npm i agents</code>), follow the <a href="https://developers.cloudflare.com/workers-vpc/get-started/"><u>Workers VPC quickstart</u></a>, and build a <a href="https://developers.cloudflare.com/agents/guides/remote-mcp-server/"><u>remote MCP server</u></a> with private backend access.</p><p><b>Already on Cloudflare One?</b> Mesh works with your existing setup. Your Gateway policies, device posture checks, and access rules apply to Mesh traffic automatically. See <a href="https://developers.cloudflare.com/cloudflare-one/networks/connectors/cloudflare-mesh/"><u>the Mesh documentation</u></a> to add your first node.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4KMBRDqq5wBjdLHyhM3LNW/98064f81453b6ff29d0bf4b86cfc251a/image1.png" />
          </figure>

<p></p> ]]></content:encoded>
            <category><![CDATA[Agents Week]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Zero Trust]]></category>
            <category><![CDATA[Cloudflare One]]></category>
            <category><![CDATA[SASE]]></category>
            <guid isPermaLink="false">iAIJH3zvDaXoiDMugrwOv</guid>
            <dc:creator>Nikita Cano</dc:creator>
            <dc:creator>Thomas Gauvin</dc:creator>
        </item>
        <item>
            <title><![CDATA[Building a CLI for all of Cloudflare]]></title>
            <link>https://blog.cloudflare.com/cf-cli-local-explorer/</link>
            <pubDate>Mon, 13 Apr 2026 14:29:45 GMT</pubDate>
            <description><![CDATA[ We’re introducing cf, a new unified CLI designed for consistency across the Cloudflare platform, alongside Local Explorer for debugging local data. These tools simplify how developers and AI agents interact with our nearly 3,000 API operations.
 ]]></description>
            <content:encoded><![CDATA[ <p>Cloudflare has a vast API surface. We have over 100 products and nearly 3,000 HTTP API operations.</p><p>Increasingly, agents are the primary customer of our APIs. Developers bring their coding agents to build and deploy <a href="https://workers.cloudflare.com/solutions/frontends"><u>applications</u></a>, <a href="https://workers.cloudflare.com/solutions/ai"><u>agents</u></a>, and <a href="https://workers.cloudflare.com/solutions/platforms"><u>platforms</u></a> to Cloudflare, configure their account, and query our APIs for analytics and logs.</p><p>We want to make every Cloudflare product available in all of the ways agents need. For example, we now make Cloudflare’s entire API available in a single Code Mode MCP server that uses <a href="https://blog.cloudflare.com/code-mode-mcp/"><u>less than 1,000 tokens</u></a>. There’s a lot more surface area to cover, though: <a href="https://developers.cloudflare.com/workers/wrangler/commands/"><u>CLI commands</u></a>. <a href="https://blog.cloudflare.com/workers-environment-live-object-bindings/"><u>Workers Bindings</u></a> — including APIs for local development and testing. <a href="https://developers.cloudflare.com/fundamentals/api/reference/sdks/"><u>SDKs</u></a> across multiple languages. Our <a href="https://developers.cloudflare.com/workers/wrangler/configuration/"><u>configuration file</u></a>. <a href="https://developers.cloudflare.com/terraform/"><u>Terraform</u></a>. <a href="https://developers.cloudflare.com/"><u>Developer docs</u></a>. <a href="https://developers.cloudflare.com/api/"><u>API docs</u></a> and OpenAPI schemas. <a href="https://github.com/cloudflare/skills"><u>Agent Skills</u></a>.</p><p>Today, many of our products aren’t available across every one of these interfaces. This is particularly true of our CLI — <a href="https://developers.cloudflare.com/workers/wrangler/"><u>Wrangler</u></a>. Many Cloudflare products have no CLI commands in Wrangler. And agents love CLIs.</p><p>So we’ve been rebuilding the Wrangler CLI to make it the CLI for all of Cloudflare. It provides commands for all Cloudflare products, and lets you configure them together using infrastructure-as-code.</p><p>Today we’re sharing an early version of what the next version of Wrangler will look like as a technical preview. It’s very early, but we get the best feedback when we work in public.</p><p>You can try the Technical Preview today by running <code>npx cf</code>. Or you can install it globally by running <code>npm install -g cf</code>.</p><p>Right now, <code>cf</code> provides commands for just a small subset of Cloudflare products. We’re already testing a version of <code>cf</code> that supports the entirety of the Cloudflare API surface — and we will be intentionally reviewing and tuning the commands for each product, to have output that is ergonomic for both agents and humans. To be clear, this Technical Preview is just a small piece of the future Wrangler CLI. Over the coming months we will bring this together with the parts of Wrangler you know and love.</p><p>To build this in a way that stays in sync with the rapid pace of product development at Cloudflare, we had to create a new system that allows us to generate commands, configuration, binding APIs, and more.</p>
    <div>
      <h2>Rethinking schemas and our code generation pipeline from first principles</h2>
      <a href="#rethinking-schemas-and-our-code-generation-pipeline-from-first-principles">
        
      </a>
    </div>
    <p>We already generate the Cloudflare <a href="https://blog.cloudflare.com/lessons-from-building-an-automated-sdk-pipeline/"><u>API SDKs</u></a>, <a href="https://blog.cloudflare.com/automatically-generating-cloudflares-terraform-provider/"><u>Terraform provider</u></a>, and <a href="https://blog.cloudflare.com/code-mode-mcp/"><u>Code Mode MCP server</u></a> based on the OpenAPI schema for the Cloudflare API. But updating our CLI, Workers Bindings, wrangler.jsonc configuration, Agent Skills, dashboard, and docs is still a manual process. This was already error-prone and required too much back and forth, and it wouldn’t scale to support the whole Cloudflare API in the next version of our CLI.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2wjmgUzBkjeI0RtyXKIMXm/f7fc6ce3b7323aacb3babdfb461f383f/BLOG-3224_2.png" />
          </figure><p>To do this, we needed more than could be expressed in an OpenAPI schema. OpenAPI schemas describe REST APIs, but we have interactive CLI commands whose actions combine local development and API requests, Workers bindings expressed as RPC APIs, and Agent Skills and documentation that tie this all together.</p><p>We write a lot of TypeScript at Cloudflare. It’s the lingua franca of software engineering. And we keep finding that it just works better to express APIs in TypeScript — as we do with <a href="https://blog.cloudflare.com/capnweb-javascript-rpc-library/"><u>Cap’n Web</u></a>, <a href="https://blog.cloudflare.com/code-mode/"><u>Code Mode</u></a>, and the <a href="https://developers.cloudflare.com/workers/runtime-apis/rpc/"><u>RPC system</u></a> built into the Workers platform.</p><p>So we introduced a new TypeScript schema that can define the full scope of APIs, CLI commands and arguments, and context needed to generate any interface. The schema format is “just” a set of TypeScript types with conventions, linting, and guardrails to ensure consistency. But because it is our own format, it can easily be adapted to support any interface we need, today or in the future, while still <i>also</i> being able to generate an OpenAPI schema:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4H0xSIPMmrixUWsFL86RUJ/998b93a465d26d856885b4d833ac19d4/BLOG-3224_3.png" />
          </figure><p>To date most of our focus has been at this layer — building the machine we needed, so that we can now start building the CLI and other interfaces we’ve wanted to provide for years. This lets us start to dream bigger about what we could standardize across Cloudflare and make better for agents — especially when it comes to context engineering our CLI.</p>
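    <p>To make this concrete, here is a hypothetical sketch of what an entry in such a schema format could look like. The real internal format isn’t public, and every name below is invented for illustration:</p>
            <pre><code>// Invented illustration of a TypeScript-native operation schema that can
// fan out to multiple generated surfaces (OpenAPI, CLI, bindings, docs).
interface OperationSchema&lt;Input, Output&gt; {
  // REST surface, from which an OpenAPI schema can still be generated.
  http: { method: "GET" | "POST" | "PUT" | "DELETE"; path: string };
  // CLI surface: enforced verbs ("get", never "info") and shared flags.
  cli?: { command: string; flags?: string[] };
  // Workers binding surface, expressed as an RPC method name.
  binding?: { rpcMethod: string };
  // Shared context reused by docs, Agent Skills, and MCP tools.
  description: string;
}

const getBucket: OperationSchema&lt;{ name: string }, { bucket: unknown }&gt; = {
  http: { method: "GET", path: "/accounts/{account_id}/r2/buckets/{name}" },
  cli: { command: "r2 bucket get", flags: ["--json"] },
  binding: { rpcMethod: "get" },
  description: "Fetch metadata for a single R2 bucket.",
};
</code></pre>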
    <div>
      <h2>Agents and CLIs — consistency and context engineering</h2>
      <a href="#agents-and-clis-consistency-and-context-engineering">
        
      </a>
    </div>
    <p>Agents expect CLIs to be consistent. If one command uses <code>&lt;command&gt; info</code> as the syntax for getting information about a resource, and another uses <code>&lt;command&gt; get</code>, the agent will expect one and call a non-existent command for the other. In a large engineering org of hundreds or thousands of people, and with many products, manually enforcing consistency through reviews is Swiss cheese. And you can enforce it at the CLI layer, but then naming differs between the CLI, REST API, and SDKs, making the problem arguably worse.</p><p>One of the first things we’ve done is to start creating rules and guardrails, enforced at the schema layer. It’s always <code>get</code>, never <code>info</code>. Always <code>--force</code>, never <code>--skip-confirmations</code>. Always <code>--json</code>, never <code>--format</code>, and always supported across commands. </p><p>Wrangler CLI is also fairly unique — it provides commands and configuration that can work with either simulated local resources or remote resources, like <a href="https://developers.cloudflare.com/d1/"><u>D1 databases</u></a>, <a href="https://developers.cloudflare.com/r2"><u>R2 storage buckets</u></a>, and <a href="https://developers.cloudflare.com/kv"><u>KV namespaces</u></a>. This means consistent defaults matter even more. If an agent thinks it’s modifying a remote database, but is actually adding records to a local database, and the developer is using remote bindings to develop locally against a remote database, their agent won’t understand why the newly-added records aren’t showing up when the agent makes a request to the local dev server. Consistent defaults, along with output that clearly signals whether commands are applied to remote or local resources, ensure agents have explicit guidance.</p>
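    <p>To illustrate the effect of those rules, here are some hypothetical invocations in the shape the schema enforces; none of these are confirmed, shipped commands:</p>
            <pre><code># Hypothetical: the same verbs and flags apply across products.
cf r2 bucket get my-bucket --json        # always "get", never "info"
cf d1 database get my-db --json          # same verb, different product
cf kv namespace delete my-ns --force     # always --force, never --skip-confirmations
cf d1 database query my-db --local "SELECT 1;"   # explicit local vs. remote
</code></pre>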
    <div>
      <h2>Local Explorer — what you can do remotely, you can now do locally</h2>
      <a href="#local-explorer-what-you-can-do-remotely-you-can-now-do-locally">
        
      </a>
    </div>
    <p>Today we are also releasing Local Explorer, a new feature available in open beta in both Wrangler and the Cloudflare Vite plugin.</p><p>Local Explorer lets you introspect the simulated resources that your Worker uses when you are developing locally, including <a href="https://www.cloudflare.com/developer-platform/products/workers-kv/"><u>KV</u></a>, <a href="https://www.cloudflare.com/developer-platform/products/r2/"><u>R2</u></a>, D1, <a href="https://www.cloudflare.com/developer-platform/products/durable-objects/"><u>Durable Objects</u></a> and <a href="https://www.cloudflare.com/developer-platform/products/workflows/"><u>Workflows</u></a>. The same things you can do via the Cloudflare API and Dashboard with each of these, you can also do entirely locally, powered by the same underlying API structure.</p><p>For years we’ve <a href="https://blog.cloudflare.com/wrangler3/"><u>made a bet on fully local development</u></a> — not just for Cloudflare Workers, but for the entire platform. When you use D1, even though D1 is a hosted, serverless database product, you can run your database and communicate with it via bindings entirely locally, without any extra setup or tooling. Via <a href="https://developers.cloudflare.com/workers/testing/miniflare/"><u>Miniflare</u></a>, our local development platform emulator, the Workers runtime provides the exact same APIs in local dev as in production, and uses a local SQLite database to provide the same functionality. This makes it easy to write and run tests that run fast, need no network access, and work offline.</p><p>But until now, working out what data was stored locally required you to reverse-engineer the contents of the <code>.wrangler/state</code> directory or install third-party tools.</p><p>Now whenever you run an app with Wrangler CLI or the Cloudflare Vite plugin, you will be prompted to open Local Explorer (keyboard shortcut <code>e</code>). This provides you with a simple, local interface to see what bindings your Worker currently has attached, and what data is stored against them.</p><div>
  
</div><p>When you build using Agents, Local Explorer is a great way to understand what the agent is doing with data, making the local development cycle much more interactive. You can turn to Local Explorer anytime you need to verify a schema, seed some test records, or just start over and <code>DROP TABLE</code>.</p><p>Our goal here is to provide a mirror of the Cloudflare API that only modifies local data, so that all of your local resources are available via the same APIs that you use remotely. And by making the API shape match across local and remote, when you run CLI commands in the upcoming version of the CLI and pass a <code>--local</code> flag, the commands just work. The only difference is that the command makes a request to this local mirror of the Cloudflare API instead.</p><p>Starting today, this API is available at <code>/cdn-cgi/explorer/api</code> on any Wrangler- or Vite-plugin-powered application. Point your agent at this address and it will find an OpenAPI specification it can use to manage your local resources for you, just by talking to your agent.</p>
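<p>As a quick sketch of what discovering it looks like from a script (the <code>/cdn-cgi/explorer/api</code> path comes from this post; the port and the exact response shape are assumptions):</p>
            <pre><code>// Sketch: probe the local mirror of the Cloudflare API.
// Assumes Wrangler's default local dev address; adjust the port as needed.
const base = "http://localhost:8787/cdn-cgi/explorer/api";

const spec = await fetch(base).then((res) =&gt; res.json());

// List the operations the local mirror exposes for your bindings.
console.log(Object.keys(spec.paths ?? {}));
</code></pre>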
    <div>
      <h2>Tell us your hopes and dreams for a Cloudflare-wide CLI </h2>
      <a href="#tell-us-your-hopes-and-dreams-for-a-cloudflare-wide-cli">
        
      </a>
    </div>
    <p>Now that we have built the machine, it’s time to take the best parts of Wrangler today, combine them with what’s now possible, and make Wrangler the best CLI possible for using all of Cloudflare.</p><p>You can try the technical preview today by running <code>npx cf</code>. Or you can install it globally by running <code>npm install -g cf</code>.</p><p>With this very early version, we want your feedback — not just about what the technical preview does today, but what you want from a CLI for Cloudflare’s entire platform. Tell us what you wish was an easy one-line CLI command but takes a few clicks in our dashboard today. What you wish you could configure in <code>wrangler.jsonc</code> — like DNS records or Cache Rules. And where you’ve seen your agents get stuck, and what commands you wish our CLI provided for your agent to use.</p><p>Jump into the <a href="https://discord.cloudflare.com/"><u>Cloudflare Developers Discord</u></a> and tell us what you’d like us to add first to the CLI, and stay tuned for many more updates soon.</p><p><i>Thanks to Emily Shen for her valuable contributions to kicking off the Local Explorer project.</i></p> ]]></content:encoded>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[D1]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[Agents Week]]></category>
            <guid isPermaLink="false">5r3Nx1IDtp6B6GRDHqQyWL</guid>
            <dc:creator>Matt “TK” Taylor</dc:creator>
            <dc:creator>Dimitri Mitropoulos</dc:creator>
            <dc:creator>Dan Carter</dc:creator>
        </item>
        <item>
            <title><![CDATA[Durable Objects in Dynamic Workers: Give each AI-generated app its own database]]></title>
            <link>https://blog.cloudflare.com/durable-object-facets-dynamic-workers/</link>
            <pubDate>Mon, 13 Apr 2026 13:08:35 GMT</pubDate>
            <description><![CDATA[ We’re introducing Durable Object Facets, allowing Dynamic Workers to instantiate Durable Objects with their own isolated SQLite databases. This enables developers to build platforms that run persistent, stateful code generated on-the-fly.
 ]]></description>
            <content:encoded><![CDATA[ <p>A few weeks ago, we announced <a href="https://blog.cloudflare.com/dynamic-workers/"><u>Dynamic Workers</u></a>, a new feature of the Workers platform which lets you load Worker code on-the-fly into a secure sandbox. The Dynamic Worker Loader API essentially provides direct access to the basic compute isolation primitive that Workers has been based on all along: isolates, not containers. Isolates are much lighter-weight than containers, and as such, can load 100x faster using 1/10 the memory. They are so efficient, they can be treated as "disposable": start one up to run a few lines of code, then throw it away. Like a secure version of eval(). </p><p>Dynamic Workers have many uses. In the original announcement, we focused on how to use them to run AI-agent-generated code as an alternative to tool calls. In this use case, an AI agent performs actions at the request of a user by writing a few lines of code and executing them. The code is single-use, intended to perform one task one time, and is thrown away immediately after it executes.</p><p>But what if you want an AI to generate more persistent code? What if you want your AI to build a small application with a custom UI the user can interact with? What if you want that application to have long-lived state? But of course, you still want it to run in a secure sandbox.</p><p>One way to do this would be to use Dynamic Workers, and simply provide the Worker with an <a href="https://developers.cloudflare.com/workers/runtime-apis/rpc/"><u>RPC</u></a> API that gives it access to storage. Using <a href="https://developers.cloudflare.com/dynamic-workers/usage/bindings/"><u>bindings</u></a>, you could give the Dynamic Worker an API that points back to your remote SQL database (perhaps backed by <a href="https://developers.cloudflare.com/d1/"><u>Cloudflare D1</u></a>, or a Postgres database you access through <a href="https://developers.cloudflare.com/hyperdrive/"><u>Hyperdrive</u></a> — it's up to you).</p><p>But Workers also has a unique and extremely fast type of storage that may be a perfect fit for this use case: <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a>. A Durable Object is a special kind of Worker that has a unique name, with one instance globally per name. That instance has a SQLite database attached, which lives <i>on local disk</i> on the machine where the Durable Object runs. This makes storage access ridiculously fast: there is effectively <a href="https://blog.cloudflare.com/sqlite-in-durable-objects/"><u>zero latency</u></a>.</p><p>Perhaps, then, what you really want is for your AI to write code for a Durable Object, and then you want to run that code in a Dynamic Worker.</p>
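<p>Before digging into the Durable Object path, here’s a quick sketch of that first option: handing a Dynamic Worker a narrow RPC binding to remote storage, so the dynamic code can query a database without ever holding a credential. The <code>StorageApi</code> entrypoint, the binding names, and the loader options are illustrative, following the patterns shown later in this post rather than a documented recipe:</p>
            <pre><code>import { WorkerEntrypoint } from "cloudflare:workers";

// A deliberately narrow RPC surface: the dynamic code can run queries,
// but never sees connection strings or tokens.
export class StorageApi extends WorkerEntrypoint {
  async query(sql, params = []) {
    return await this.env.DB.prepare(sql).bind(...params).all();
  }
}

export default {
  async fetch(req, env, ctx) {
    let worker = env.LOADER.get("storage-demo", async () =&gt; ({
      compatibilityDate: "2026-04-01",
      mainModule: "agent.js",
      modules: {
        "agent.js": `export default {
          async fetch(req, env) {
            // The dynamic code sees only the RPC stub.
            let rows = await env.STORAGE.query("SELECT 1 AS ok");
            return Response.json(rows);
          }
        }`,
      },
      // Pass an RPC stub for StorageApi as a binding (ctx.exports pattern).
      env: { STORAGE: ctx.exports.StorageApi() },
      globalOutbound: null,  // block direct network access
    }));

    return await worker.getEntrypoint().fetch(req);
  },
};
</code></pre>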
    <div>
      <h2><b>But how?</b></h2>
      <a href="#but-how">
        
      </a>
    </div>
    <p>This presents a weird problem. Normally, to use Durable Objects you have to:</p><ol><li><p>Write a class extending <code>DurableObject</code>.</p></li><li><p>Export it from your Worker's main module.</p></li><li><p><a href="https://developers.cloudflare.com/durable-objects/get-started/#5-configure-durable-object-class-with-sqlite-storage-backend"><u>Specify in your Wrangler config</u></a> that storage should be provisioned for this class. This creates a Durable Object namespace that points at your class for handling incoming requests.</p></li><li><p><a href="https://developers.cloudflare.com/durable-objects/get-started/#4-configure-durable-object-bindings"><u>Declare a Durable Object namespace binding</u></a> pointing at your namespace (or use <a href="https://developers.cloudflare.com/workers/runtime-apis/context/#exports"><u>ctx.exports</u></a>), and use it to make requests to your Durable Object.</p></li></ol><p>This doesn't extend naturally to Dynamic Workers. First, there is the obvious problem: The code is dynamic. You run it without invoking the Cloudflare API at all. But Durable Object storage has to be provisioned through the API, and the namespace has to point at an implementing class. It can't point at your Dynamic Worker.</p><p>But there is a deeper problem: Even if you could somehow configure a Durable Object namespace to point directly at a Dynamic Worker, would you want to? Do you want your agent (or user) to be able to create a whole namespace full of Durable Objects? To use unlimited storage spread around the world?</p><p>You probably don't. You probably want some control. You may want to limit, or at least track, how many objects they create. Maybe you want to limit them to just one object (probably good enough for vibe-coded personal apps). You may want to add logging and other observability. Metrics. Billing. Etc.</p><p>To do all this, what you really want is for requests to these Durable Objects to go to <i>your</i> code <i>first</i>, where you can then do all the "logistics", and <i>then</i> forward the request into the agent's code. You want to write a <i>supervisor</i> that runs as part of every Durable Object.</p>
    <div>
      <h2><b>Solution: Durable Object Facets</b></h2>
      <a href="#solution-durable-object-facets">
        
      </a>
    </div>
    <p>Today we are releasing, in open beta, a feature that solves this problem.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/mUUk7svflWvIp5Ff3npbG/cd2ec9a7111681657c37e3560fd9af58/BLOG-3211_2.png" />
          </figure><p><a href="https://developers.cloudflare.com/dynamic-workers/usage/durable-object-facets/"><u>Durable Object Facets</u></a> allow you to load and instantiate a Durable Object class dynamically, while providing it with a SQLite database to use for storage. With Facets:</p><ul><li><p>First, you create a normal Durable Object namespace, pointing to a class <i>you</i> write.</p></li><li><p>In that class, you load the agent's code as a Dynamic Worker, and call into it.</p></li><li><p>The Dynamic Worker's code can implement a Durable Object class directly. That is, it literally exports a class declared as <code>extends DurableObject</code>.</p></li><li><p>You instantiate that class as a "facet" of your own Durable Object.</p></li><li><p>The facet gets its own SQLite database, which it can use via the normal Durable Object storage APIs. This database is separate from the supervisor's database, but the two are stored together as part of the same overall Durable Object.</p></li></ul>
    <div>
      <h2><b>How it works</b></h2>
      <a href="#how-it-works">
        
      </a>
    </div>
    <p>Here is a simple, complete implementation of an app platform that dynamically loads and runs a Durable Object class:</p>
            <pre><code>import { DurableObject } from "cloudflare:workers";

// For the purpose of this example, we'll use this static
// application code, but in the real world this might be generated
// by AI (or even, perhaps, a human user).
const AGENT_CODE = `
  import { DurableObject } from "cloudflare:workers";

  // Simple app that remembers how many times it has been invoked
  // and returns it.
  export class App extends DurableObject {
    fetch(request) {
      // We use storage.kv here for simplicity, but storage.sql is
      // also available. Both are backed by SQLite.
      let counter = this.ctx.storage.kv.get("counter") || 0;
      ++counter;
      this.ctx.storage.kv.put("counter", counter);

      return new Response("You've made " + counter + " requests.\\n");
    }
  }
`;

// AppRunner is a Durable Object you write that is responsible for
// dynamically loading applications and delivering requests to them.
// Each instance of AppRunner contains a different app.
export class AppRunner extends DurableObject {
  async fetch(request) {
    // We've received an HTTP request, which we want to forward into
    // the app.

    // The app itself runs as a child facet named "app". One Durable
    // Object can have any number of facets (subject to storage limits)
    // with different names, but in this case we have only one. Call
    // this.ctx.facets.get() to get a stub pointing to it.
    let facet = this.ctx.facets.get("app", async () =&gt; {
      // If this callback is called, it means the facet hasn't
      // started yet (or has hibernated). In this callback, we can
      // tell the system what code we want it to load.

      // Load the Dynamic Worker.
      let worker = this.#loadDynamicWorker();

      // Get the exported class we're interested in.
      let appClass = worker.getDurableObjectClass("App");

      return { class: appClass };
    });

    // Forward request to the facet.
    // (Alternatively, you could call RPC methods here.)
    return await facet.fetch(request);
  }

  // RPC method that a client can call to set the dynamic code
  // for this app.
  setCode(code) {
    // Store the code in the AppRunner's SQLite storage.
    // Each unique code must have a unique ID to pass to the
    // Dynamic Worker Loader API, so we generate one randomly.
    this.ctx.storage.kv.put("codeId", crypto.randomUUID());
    this.ctx.storage.kv.put("code", code);
  }

  #loadDynamicWorker() {
    // Use the Dynamic Worker Loader API like normal. Use get()
    // rather than load() since we may load the same Worker many
    // times.
    let codeId = this.ctx.storage.kv.get("codeId");
    return this.env.LOADER.get(codeId, async () =&gt; {
      // This Worker hasn't been loaded yet. Load its code from
      // our own storage.
      let code = this.ctx.storage.kv.get("code");

      return {
        compatibilityDate: "2026-04-01",
        mainModule: "worker.js",
        modules: { "worker.js": code },
        globalOutbound: null,  // block network access
      }
    });
  }
}

// This is a simple Workers HTTP handler that uses AppRunner.
export default {
  async fetch(req, env, ctx) {
    // Get the instance of AppRunner named "my-app".
    // (Each name has exactly one Durable Object instance in the
    // world.)
    let obj = ctx.exports.AppRunner.getByName("my-app");

    // Initialize it with code. (In a real use case, you'd only
    // want to call this once, not on every request.)
    await obj.setCode(AGENT_CODE);

    // Forward the request to it.
    return await obj.fetch(req);
  }
}
</code></pre>
            <p>In this example:</p><ul><li><p><code>AppRunner</code> is a "normal" Durable Object written by the platform developer (you).</p></li><li><p>Each instance of <code>AppRunner</code> manages one application. It stores the app code and loads it on demand.</p></li><li><p>The application itself implements and exports a Durable Object class, which the platform expects to be named <code>App</code>.</p></li><li><p><code>AppRunner</code> loads the application code using Dynamic Workers, and then executes the code as a Durable Object Facet.</p></li><li><p>Each instance of <code>AppRunner</code> is one Durable Object composed of <i>two</i> SQLite databases: one belonging to the parent (<code>AppRunner</code> itself) and one belonging to the facet (<code>App</code>). These databases are isolated: the application cannot read <code>AppRunner</code>'s database, only its own.</p></li></ul><p>To run the example, copy the code above into a file <code>worker.js</code>, pair it with the following <code>wrangler.jsonc</code>, and run it locally with <code>npx wrangler dev</code>.</p>
            <pre><code>// wrangler.jsonc for the above sample worker.
{
  "compatibility_date": "2026-04-01",
  "main": "worker.js",
  "migrations": [
    {
      "tag": "v1",
      "new_sqlite_classes": [
        "AppRunner"
      ]
    }
  ],
  "worker_loaders": [
    {
      "binding": "LOADER",
    },
  ],
}
</code></pre>
            
    <div>
      <h2><b>Start building</b></h2>
      <a href="#start-building">
        
      </a>
    </div>
    <p>Facets are a feature of Dynamic Workers, available in beta immediately to users on the Workers Paid plan.</p><p>Check out the documentation to learn more about <a href="https://developers.cloudflare.com/dynamic-workers/"><u>Dynamic Workers</u></a> and <a href="https://developers.cloudflare.com/dynamic-workers/usage/durable-object-facets/"><u>Facets</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Agents Week]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Durable Objects]]></category>
            <category><![CDATA[Storage]]></category>
            <guid isPermaLink="false">2OYAJUdGLODlCXKKdCZMeG</guid>
            <dc:creator>Kenton Varda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Agents have their own computers with Sandboxes GA]]></title>
            <link>https://blog.cloudflare.com/sandbox-ga/</link>
            <pubDate>Mon, 13 Apr 2026 13:08:35 GMT</pubDate>
            <description><![CDATA[ Cloudflare Sandboxes give AI agents a persistent, isolated environment: a real computer with a shell, a filesystem, and background processes that starts on demand and picks up exactly where it left off. ]]></description>
            <content:encoded><![CDATA[ <p>When we launched <a href="https://github.com/cloudflare/sandbox-sdk"><u>Cloudflare Sandboxes</u></a> last June, the premise was simple: <a href="https://www.cloudflare.com/learning/ai/what-is-agentic-ai/"><u>AI agents</u></a> need to develop and run code, and they need to do it somewhere safe.</p><p>If an agent is acting like a developer, this means cloning repositories, building code in many languages, running development servers, etc. To do these things effectively, they will often need a full computer (and if they don’t, they can <a href="https://blog.cloudflare.com/dynamic-workers/"><u>reach for something lightweight</u></a>!).</p><p>Many developers are stitching together solutions using VMs or existing container solutions, but there are lots of hard problems to solve:</p><ul><li><p><b>Burstiness -</b> With each session needing its own sandbox, you often need to spin up many sandboxes quickly, but you don’t want to pay for idle compute on standby.</p></li><li><p><b>Quick state restoration</b> - Each session should start quickly and re-start quickly, resuming past state.</p></li><li><p><b>Security</b> - Agents need to access services securely, but can’t be trusted with credentials.</p></li><li><p><b>Control</b> - It needs to be simple to programmatically control sandbox lifecycle, execute commands, handle files, and more.</p></li><li><p><b>Ergonomics</b> - You need to give a simple interface for both humans and agents to do common operations.</p></li></ul><p>We’ve spent time solving these issues so you don’t have to. Since our initial launch we’ve made Sandboxes an even better place to run agents at scale. We’ve worked with our initial partners such as Figma, who run agents in containers with <a href="https://www.figma.com/make/"><u>Figma Make</u></a>:</p><blockquote><p><i>“Figma Make is built to help builders and makers of all backgrounds go from idea to production, faster. To deliver on that goal, we needed an infrastructure solution that could provide reliable, highly-scalable sandboxes where we could run untrusted agent- and user-authored code. Cloudflare Containers is that solution.”</i></p><p><i>- </i><b><i>Alex Mullans</i></b><i>, AI and Developer Platforms at Figma</i></p></blockquote><p>We want to bring Sandboxes to even more great organizations, so today we are excited to announce that <b>Sandboxes and Cloudflare Containers are both generally available.</b></p><p>Let’s take a look at some of the recent changes to Sandboxes:</p><ul><li><p><b>Secure credential injection </b>lets you make authenticated calls without the agent ever having credential access  </p></li><li><p><b>PTY support</b> gives you and your agent a real terminal</p></li><li><p><b>Persistent code interpreters</b> give your agent a place to execute stateful Python, JavaScript, and TypeScript out of the box</p></li><li><p><b>Background processes and live preview URLs</b> provide a simple way to interact with development servers and verify in-flight changes</p></li><li><p><b>Filesystem watching</b> improves iteration speed as agents make changes</p></li><li><p><b>Snapshots</b> let you quickly recover an agent's coding session</p></li><li><p><b>Higher limits and Active CPU Pricing</b> let you deploy a fleet of agents at scale without paying for unused CPU cycles </p></li></ul>
    <div>
      <h2>Sandboxes 101</h2>
      <a href="#sandboxes-101">
        
      </a>
    </div>
    <p>Before getting into some of the recent changes, let’s quickly look at the basics.</p><p>A Cloudflare Sandbox is a persistent, isolated environment powered by <a href="https://blog.cloudflare.com/containers-are-available-in-public-beta-for-simple-global-and-programmable/"><u>Cloudflare Containers</u></a>. You ask for a sandbox by name. If it's running, you get it. If it's not, it starts. When it's idle, it sleeps automatically and wakes when it receives a request. It’s easy to programmatically interact with the sandbox using methods like <code>exec</code>, <code>gitClone</code>, <code>writeFile</code> and <a href="https://developers.cloudflare.com/sandbox/api/"><u>more</u></a>.</p>
            <pre><code>import { getSandbox } from "@cloudflare/sandbox";
export { Sandbox } from "@cloudflare/sandbox";

export default {
  async fetch(request: Request, env: Env) {
    // Ask for a sandbox by name. It starts on demand.
    const sandbox = getSandbox(env.Sandbox, "agent-session-47");

    // Clone a repository into it.
    await sandbox.gitCheckout("https://github.com/org/repo", {
      targetDir: "/workspace",
      depth: 1,
    });

    // Run the test suite. Stream output back in real time.
    return sandbox.exec("npm", ["test"], { stream: true });
  },
};
</code></pre>
            <p>As long as you provide the same ID, subsequent requests can get to this same sandbox from anywhere in the world.</p>
    <div>
      <h2>Secure credential injection</h2>
      <a href="#secure-credential-injection">
        
      </a>
    </div>
    <p>One of the hardest problems in agentic workloads is authentication. You often need agents to access private services, but you can't fully trust them with raw credentials. </p><p>Sandboxes solve this by injecting credentials at the network layer using a programmable egress proxy. This means that sandbox agents never have access to credentials and you can fully customize auth logic as you see fit:</p>
            <pre><code>class OpenCodeInABox extends Sandbox {
  static outboundByHost = {
    "my-internal-vcs.dev": (request, env, ctx) =&gt; {
      const headersWithAuth = new Headers(request.headers);
      headersWithAuth.set("x-auth-token", env.SECRET);
      return fetch(request, { headers: headersWithAuth });
    }
  }
}
</code></pre>
            <p>For a deep dive into how this works — including identity-aware credential injection, dynamically modifying rules, and integrating with <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/"><u>Workers bindings</u></a> — read our recent blog post on <a href="https://blog.cloudflare.com/sandbox-auth"><u>Sandbox auth</u></a>.</p>
    <div>
      <h2>A real terminal, not a simulation</h2>
      <a href="#a-real-terminal-not-a-simulation">
        
      </a>
    </div>
    <p>Early agent systems often modeled shell access as a request-response loop: run a command, wait for output, stuff the transcript back into the prompt, repeat. It works, but it is not how developers actually use a terminal.</p><p>Humans run something, watch output stream in, interrupt it, reconnect later, and keep going. Agents benefit from that same feedback loop.</p><p>In February, we shipped PTY support: a pseudo-terminal session in a Sandbox, proxied over WebSocket and compatible with <a href="https://xtermjs.org/"><u>xterm.js</u></a>.</p><p>Just call <code>sandbox.terminal</code> to serve the backend:</p>
            <pre><code>// Worker: upgrade a WebSocket connection into a live terminal session
export default {
  async fetch(request: Request, env: Env) {
    const url = new URL(request.url);
    if (url.pathname === "/terminal") {
      const sandbox = getSandbox(env.Sandbox, "my-session");
      return sandbox.terminal(request, { cols: 80, rows: 24 });
    }
    return new Response("Not found", { status: 404 });
  },
};

</code></pre>
            <p>And use the <code>SandboxAddon</code> for xterm.js to connect from the client:</p>
            <pre><code>// Browser: connect xterm.js to the sandbox shell
import { Terminal } from "xterm";
import { SandboxAddon } from "@cloudflare/sandbox/xterm";

const term = new Terminal();
const addon = new SandboxAddon({
  getWebSocketUrl: ({ origin }) =&gt; `${origin}/terminal`,
});

term.loadAddon(addon);
term.open(document.getElementById("terminal-container")!);
addon.connect({ sandboxId: "my-session" });
</code></pre>
            <p>This allows agents and developers to use a full PTY to debug those sessions live.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3bgyxh8kg3MPfij2v1XXLE/9cff50318ad306b20c3346c3bd3554d9/BLOG-3264_2.gif" />
          </figure><p>Each terminal session gets its own isolated shell, its own working directory, its own environment. Open as many as you need, just like you would on your own machine. Output is buffered server-side, so reconnecting replays what you missed.</p>
    <div>
      <h2>A code interpreter that remembers</h2>
      <a href="#a-code-interpreter-that-remembers">
        
      </a>
    </div>
    <p>For data analysis, scripting, and exploratory workflows, we also ship a higher-level abstraction: a persistent code execution context.</p><p>The key word is “persistent.” Many code interpreter implementations run each snippet in isolation, so state disappears between calls. You can't set a variable in one step and read it in the next.</p><p>Sandboxes allow you to create “contexts” that persist state. Variables and imports persist across calls the same way they would in a Jupyter notebook:</p>
            <pre><code>// Create a Python context. State persists for its lifetime.
const ctx = await sandbox.createCodeContext({ language: "python" });

// First execution: load data
await sandbox.runCode(`
  import pandas as pd
  df = pd.read_csv('/workspace/sales.csv')
  df['margin'] = (df['revenue'] - df['cost']) / df['revenue']
`, { context: ctx });

// Second execution: df is still there
const result = await sandbox.runCode(`
  df.groupby('region')['margin'].mean().sort_values(ascending=False)
`, { context: ctx, onStdout: (line) =&gt; console.log(line.text) });

// result contains matplotlib charts, structured json output, and Pandas tables in HTML
</code></pre>
            
    <div>
      <h2>Start a server. Get a URL. Ship it.</h2>
      <a href="#start-a-server-get-a-url-ship-it">
        
      </a>
    </div>
    <p>Agents are more useful when they can build something and show it to the user immediately. Sandboxes support background processes, readiness checks, and <a href="https://developers.cloudflare.com/sandbox/concepts/preview-urls/"><u>preview URLs</u></a>. This lets an agent start a development server and share a live link without leaving the conversation.</p>
            <pre><code>// Start a dev server as a background process
const server = await sandbox.startProcess("npm run dev", {
  cwd: "/workspace",
});

// Wait until the server is actually ready — don't just sleep and hope
await server.waitForLog(/Local:.*localhost:(\d+)/);

// Expose the running service with a public URL
const { url } = await sandbox.exposePort(3000);

// url is a live public URL the agent can share with the user
console.log(`Preview: ${url}`);
</code></pre>
            <p>With <code>waitForPort()</code> and <code>waitForLog()</code>, agents can sequence work based on real signals from the running program instead of guesswork. This is much nicer than a common alternative, which is usually some version of <code>sleep(2000)</code> followed by hope.</p>
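<p>If the server doesn’t log a predictable line, you can key off the network instead. Here’s a minimal sketch, assuming <code>waitForPort()</code> lives on the same process handle as <code>waitForLog()</code> and takes the port number:</p>
            <pre><code>// Sketch: sequence on the port opening rather than on log output.
// Assumption: waitForPort() is available on the process handle and
// resolves once something is listening on the given port.
const server = await sandbox.startProcess("npm run dev", {
  cwd: "/workspace",
});

await server.waitForPort(3000);

const { url } = await sandbox.exposePort(3000);
console.log(`Preview: ${url}`);
</code></pre>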
    <div>
      <h2>Watch the file system and react immediately</h2>
      <a href="#watch-the-file-system-and-react-immediately">
        
      </a>
    </div>
    <p>Modern development loops are event-driven. Save a file, rerun the build. Edit a config, restart the server. Change a test, rerun the suite.</p><p>We shipped <code>sandbox.watch()</code> in March. It returns an SSE stream backed by native <a href="https://man7.org/linux/man-pages/man7/inotify.7.html"><u>inotify</u></a>, the kernel mechanism Linux uses for filesystem events.</p>
            <pre><code>import { parseSSEStream, type FileWatchSSEEvent } from '@cloudflare/sandbox';

const stream = await sandbox.watch('/workspace/src', {
  recursive: true,
  include: ['*.ts', '*.tsx']
});

for await (const event of parseSSEStream&lt;FileWatchSSEEvent&gt;(stream)) {
  if (event.type === 'modify' &amp;&amp; event.path.endsWith('.ts')) {
    await sandbox.exec('npx tsc --noEmit', { cwd: '/workspace' });
  }
}
</code></pre>
            <p>This is one of those primitives that quietly changes what agents can do. An agent that can observe the filesystem in real time can participate in the same feedback loops as a human developer.</p>
    <div>
      <h2>Waking up quickly with snapshots</h2>
      <a href="#waking-up-quickly-with-snapshots">
        
      </a>
    </div>
    <p>Imagine a (human) developer working on their laptop. They <code>git clone</code> a repo, run <code>npm install</code>, write code, push a PR, then close their laptop while waiting for code review. When it’s time to resume work, they just re-open the laptop and continue where they left off.</p><p>If you want an agent to replicate this workflow on a naive container platform, you run into a snag. How do you resume where you left off quickly? You could keep a sandbox running, but then you pay for idle compute. You could start fresh from the container image, but then you have to wait for a long <code>git clone</code> and <code>npm install</code>.</p><p>Our answer is snapshots, which will be rolling out in the coming weeks.</p><p>A snapshot preserves a container's full disk state: OS config, installed dependencies, modified files, data files, and more. It then lets you quickly restore that state later.</p><p>You can configure a Sandbox to automatically snapshot when it goes to sleep.</p>
            <pre><code>class AgentDevEnvironment extends Sandbox {
  sleepAfter = "5m";
  persistAcrossSessions = {type: "disk"}; // you can also specify individual directories
}
</code></pre>
            <p>You can also programmatically take a snapshot and manually restore it. This is useful for checkpointing work or forking sessions. For instance, if you wanted to run four instances of an agent in parallel, you could easily boot four sandboxes from the same state.</p>
            <pre><code>class AgentDevEnvironment extends Sandbox {}

// Fork one sandbox's state into N parallel sandboxes.
async function forkDevEnvironment(env, baseId, numberOfForks) {
  const baseInstance = getSandbox(env.AgentDevEnvironment, baseId);
  const snapshotId = await baseInstance.snapshot();

  const forks = Array.from({ length: numberOfForks }, (_, i) =&gt; {
    const newInstance = getSandbox(env.AgentDevEnvironment, `${baseId}-fork-${i}`);
    return newInstance.start({ snapshot: snapshotId });
  });

  await Promise.all(forks);
}
</code></pre>
            <p>Snapshots are stored in <a href="https://developers.cloudflare.com/r2/"><u>R2</u></a> within your account, giving you durability and location-independence. R2's <a href="https://developers.cloudflare.com/cache/how-to/tiered-cache/"><u>tiered caching</u></a> system allows for fast restores across all of Region: Earth.</p><p>In future releases, live memory state will also be captured, allowing running processes to resume exactly where they left off. A terminal and an editor will reopen in the exact state they were in when last closed.</p><p>If you are interested in restoring session state before snapshots go live, you can use the <a href="https://developers.cloudflare.com/sandbox/guides/backup-restore/"><code><u>backup and restore</u></code></a> methods today. These also persist and restore directories using R2, and while they are not as performant as true VM-level snapshots, they can still deliver considerable speed improvements over naively recreating session state.</p>
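<p>Here’s a minimal sketch of that flow. The method names come from the linked guide, but the exact signatures and return values are assumptions on our part:</p>
            <pre><code>// Sketch: checkpoint a session with backup(), then hydrate a fresh
// sandbox from it instead of re-cloning and re-installing from scratch.
// Signatures are assumptions; see the backup/restore guide for the real API.
const original = getSandbox(env.Sandbox, "agent-session-47");
const backup = await original.backup();

// Later, possibly from a different request:
const resumed = getSandbox(env.Sandbox, "agent-session-48");
await resumed.restore(backup);
</code></pre>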
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/LzVucBiNvxOh3NFn0ukxj/3b8e6cd9a5ca241b6c6a7c8556c0a529/BLOG-3264_3.gif" />
          </figure><p><sup><i>Booting a sandbox, cloning ‘axios’, and running npm install takes 30 seconds. Restoring from a backup takes two seconds.</i></sup></p><p>Stay tuned for the official snapshot release.</p>
    <div>
      <h2>Higher limits and Active CPU Pricing</h2>
      <a href="#higher-limits-and-active-cpu-pricing">
        
      </a>
    </div>
    <p>Since our initial launch, we’ve been steadily increasing capacity. Users on our standard pricing plan can now run 15,000 concurrent instances of the lite instance type, 6,000 instances of basic, and over 1,000 concurrent larger instances. <a href="https://forms.gle/3vvDvXPECjy6F8v56"><u>Reach out</u></a> to run even more!</p><p>We also changed our pricing model to be more cost-effective at scale. Sandboxes now <a href="https://developers.cloudflare.com/changelog/post/2025-11-21-new-cpu-pricing/"><u>only charge for actively used CPU cycles</u></a>. This means that you aren’t paying for idle CPU while your agent is waiting for an LLM to respond.</p>
    <div>
      <h2>This is what a computer looks like </h2>
      <a href="#this-is-what-a-computer-looks-like">
        
      </a>
    </div>
    <p>Nine months ago, we shipped a sandbox that could run commands and access a filesystem. That was enough to prove the concept.</p><p>What we have now is different in kind. A Sandbox today is a full development environment: a terminal you can connect a browser to, a code interpreter with persistent state, background processes with live preview URLs, a filesystem that emits change events in real time, egress proxies for secure credential injection, and a snapshot mechanism that makes warm starts nearly instant. </p><p>When you build on this, a satisfying pattern emerges: agents that do real engineering work. Clone a repo, install it, run the tests, read the failures, edit the code, run the tests again. The kind of tight feedback loop that makes a human engineer effective — now the agent gets it too.</p><p>We're at version 0.8.9 of the SDK. You can get started today:</p><p><code>npm i @cloudflare/sandbox@latest</code></p><div>
  
</div>
<p></p> ]]></content:encoded>
            <category><![CDATA[Agents Week]]></category>
            <category><![CDATA[Agents]]></category>
            <category><![CDATA[Containers]]></category>
            <category><![CDATA[Sandbox]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">7jXMXMjQUIpjGzJdPadO4a</guid>
            <dc:creator>Kate Reznykova</dc:creator>
            <dc:creator>Mike Nomitch</dc:creator>
            <dc:creator>Naresh Ramesh</dc:creator>
        </item>
        <item>
            <title><![CDATA[Dynamic, identity-aware, and secure Sandbox auth]]></title>
            <link>https://blog.cloudflare.com/sandbox-auth/</link>
            <pubDate>Mon, 13 Apr 2026 13:00:00 GMT</pubDate>
            <description><![CDATA[ Outbound Workers for Sandboxes provide a programmable, zero-trust egress proxy for AI agents. This allows developers to inject credentials and enforce dynamic security policies without exposing sensitive tokens to untrusted code.
 ]]></description>
            <content:encoded><![CDATA[ <p>As <a href="https://www.cloudflare.com/learning/ai/what-is-large-language-model/"><u>AI Large Language Models</u></a> and harnesses like OpenCode and Claude Code become increasingly capable, we see more users kicking off sandboxed agents in response to chat messages, Kanban updates, <a href="https://www.cloudflare.com/learning/ai/ai-vibe-coding/"><u>vibe coding</u></a> UIs, terminal sessions, GitHub comments, and more.</p><p>The sandbox is an important step beyond simple containers, because it gives you a few things:</p><ul><li><p><b>Security</b>: Any untrusted end user (or a rogue LLM) can run in the sandbox and not compromise the host machine or other sandboxes running alongside it. This is traditionally (<a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/worker-loader/"><u>but not always</u></a>) accomplished with a microVM.</p></li><li><p><b>Speed</b>: An end user should be able to pick up a new sandbox quickly <i>and</i> restore the state from a previously used one quickly.</p></li><li><p><b>Control</b>: The <i>trusted</i> platform needs to be able to take actions within the <i>untrusted</i> domain of the sandbox. This might mean mounting files in the sandbox, or controlling which requests access it, or executing specific commands.</p></li></ul><p>Today, we’re excited to add another key component of control to our <a href="https://developers.cloudflare.com/sandbox/"><u>Sandboxes</u></a> and all <a href="https://developers.cloudflare.com/containers/"><u>Containers</u></a>: outbound Workers. These are programmatic egress proxies that allow users running sandboxes to easily connect to different services, add <a href="https://www.cloudflare.com/learning/performance/what-is-observability/"><u>observability</u></a>, and, importantly for agents, add flexible and safe authentication.</p>
    <div>
      <h2>How it works</h2>
      <a href="#how-it-works">
        
      </a>
    </div>
    <p>Here’s a quick look at adding a secret key to a header using an outbound Worker:</p>
            <pre><code>class OpenCodeInABox extends Sandbox {
  static outboundByHost = {
    "github.com": (request, env, ctx) =&gt; {
      const headersWithAuth = new Headers(request.headers);
      headersWithAuth.set("x-auth-token", env.SECRET);
      return fetch(request, { headers: headersWithAuth });
    }
  }
}
</code></pre>
            <p>Any time code running in the sandbox makes a request to <code>github.com</code>, the request is proxied through the handler. This allows you to do anything you want on each request, including logging, modifying, or cancelling it. In this case, we’re safely injecting a secret (more on this later). The proxy runs on the same machine as the sandbox, has access to distributed state, and can be easily modified with simple JavaScript.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/RSbCJMOoqz8xLWMnPyijV/520addf1676d195be6258b427881cdde/BLOG-3199_2.png" />
          </figure><p>We’re excited about all the possibilities this adds to Sandboxes, especially around authentication for agents. Before going into details, let’s back up and take a quick tour of traditional forms of auth, and why we think there’s something better.</p>
    <div>
      <h2>Common auth for agentic workloads</h2>
      <a href="#common-auth-for-agentic-workloads">
        
      </a>
    </div>
    <p>The core issue with agentic auth is that we can’t fully trust the workload. While our LLMs aren’t nefarious (at least not yet), we still need to be able to apply protections to ensure they don’t use data inappropriately or take actions they shouldn’t.</p><p>A few common methods exist to provide auth to agents, and each has downsides:</p><p><b>Standard API tokens</b> are the most basic method of authentication, typically injected into applications via environment variables or in mounted secrets files. This is arguably the simplest method, but the least secure. You have to trust that the sandbox won’t somehow be compromised or accidentally exfiltrate the token while making a request. Since you can’t fully trust the agent, you’ll need to set up token expiry and rotation, which can be a hassle.</p><p><b>Workload identity tokens</b>, such as OIDC tokens, can solve some of this pain. Rather than granting the agent a token with general permissions, you can grant it a token that attests its identity. Now, rather than the agent having direct access to some service with a token, it can exchange an identity token for a very short-lived access token. The OIDC token can be invalidated after a specific agent’s workflow completes, and expiry is easier to manage. One of the biggest downsides of workload identity tokens is the potential inflexibility of integrations. Many services don’t have first-class support for OIDC, so in order to get working integrations with upstream services, platforms will need to roll their own token-exchanging services. This makes adoption difficult.</p><p><b>Custom proxies</b> provide maximum flexibility, and can be paired with workload identity tokens. If you can pass some or all of your sandbox egress through a trusted piece of code, you can insert whatever rules you need. Maybe the upstream service your agent is communicating with has a bad RBAC story, and it can’t provide granular permissions. No problem, just write the controls and permissions yourself! This is a great option for agents that you need to lock down with granular controls. However, how do you intercept all of a sandbox’s traffic? How do you set up a proxy that is dynamic and easily programmable? How do you proxy traffic efficiently? These aren’t easy problems to solve.</p><p>With those imperfect methods in mind, what does an ideal auth mechanism look like?</p><p>Ideally, it is:</p><ul><li><p><b>Zero trust.</b> No token is ever granted to an untrusted user for any amount of time.</p></li><li><p><b>Simple.</b> Easy to author. Doesn’t involve a complex system of minting, rotating, and decrypting tokens.</p></li><li><p><b>Flexible.</b> We don’t rely on the upstream system to provide the granular access we need. We can apply whatever rules we want.</p></li><li><p><b>Identity-aware.</b> We can identify the sandbox making the call and apply specific rules for it.</p></li><li><p><b>Observable.</b> We can easily gather information about what calls are being made.</p></li><li><p><b>Performant.</b> We aren’t round-tripping to a centralized or slow source of truth.</p></li><li><p><b>Transparent.</b> The sandboxed workload doesn’t have to know about it. Things just work.</p></li><li><p><b>Dynamic.</b> We can change rules on the fly.</p></li></ul><p>We believe outbound Workers for Sandboxes fit the bill on all of these. Let’s see how.</p>
    <div>
      <h2>Outbound Workers in practice</h2>
      <a href="#outbound-workers-in-practice">
        
      </a>
    </div>
    
    <div>
      <h3>Basics: restriction and observability</h3>
      <a href="#basics-restriction-and-observability">
        
      </a>
    </div>
    <p>First, we’ll look at a very basic example: logging requests and denying specific actions.</p><p>In this case, we’ll use the <code>outbound</code> function, which intercepts all outgoing HTTP requests from the sandbox. With a few lines of JavaScript, it’s easy to ensure only GETs are made, logging and then denying any disallowed methods.</p>
            <pre><code>class MySandboxedApp extends Sandbox {
  static outbound = (req, env, ctx) =&gt; {
    // Deny any non-GET action and log
    if (req.method !== 'GET') {
      console.log(`Container making ${req.method} request to: ${req.url}`);
      return new Response('Not Allowed', { status: 405, statusText: 'Method Not Allowed'});
    }

    // Proceed if it is a GET request
    return fetch(req);
  };
}
</code></pre>
            <p>This proxy runs on Workers, on the same machine as the sandboxed VM. Workers were built for quick response times, often sitting in front of cached CDN traffic, so the additional latency is minimal.</p><p>Because this is running on Workers, we get observability out of the box. You can view logs and outbound requests <a href="https://developers.cloudflare.com/workers/observability/logs/workers-logs/"><u>in the Workers dashboard</u></a> or <a href="https://developers.cloudflare.com/workers/observability/logs/logpush/"><u>export them</u></a> to your application performance monitoring tool of choice.</p>
    <div>
      <h3>Zero trust credential injection</h3>
      <a href="#zero-trust-credential-injection">
        
      </a>
    </div>
    <p>How would we use this to enforce a <a href="https://www.cloudflare.com/learning/security/glossary/what-is-zero-trust/"><u>zero trust environment</u></a> for our agent? Let’s imagine we want to make a request to a private GitHub instance, but we never want our LLM to access a private token.</p><p>We can use <code>outboundByHost</code> to define functions for specific domains or IPs. In this case, we’ll inject a protected credential if the domain is “my-internal-vcs.dev”. The sandboxed agent <i>never has access</i> to these credentials.</p>
            <pre><code>class OpenCodeInABox extends Sandbox {
  static outboundByHost = {
    "my-internal-vcs.dev": (request, env, ctx) =&gt; {
      const headersWithAuth = new Headers(request.headers);
      headersWithAuth.set("x-auth-token", env.SECRET);
      return fetch(request, { headers: headersWithAuth });
    }
  }
}
</code></pre>
            <p>It is also easy to conditionalize the response based on the identity of the container. You don’t have to inject the same tokens for every sandbox instance.</p>
            <pre><code> static outboundByHost = {
  "my-internal-vcs.dev": (request, env, ctx) =&gt; {
    // note: KV is encrypted at rest and in transit
    const authKey = await env.KEYS.get(ctx.containerId);

    const requestWithAuth = new Request(request);
    requestWithAuth.headers.set("x-auth-token", authKey);
    return fetch(requestWithAuth);
  }
}
</code></pre>
            
    <div>
      <h3>Using the Cloudflare Developer Platform</h3>
      <a href="#using-the-cloudflare-developer-platform">
        
      </a>
    </div>
    <p>As you may have noticed in the last example, another major advantage of outbound Workers is that they make integration into the Workers ecosystem easier. Previously, if a user wanted to access <a href="https://www.cloudflare.com/developer-platform/products/r2/"><u>R2</u></a>, they would have to inject an R2 credential, then make a call from their container to the public R2 API. Same for <a href="https://developers.cloudflare.com/kv/"><u>KV</u></a>, <a href="https://developers.cloudflare.com/agents/"><u>Agents</u></a>, other <a href="https://developers.cloudflare.com/containers/"><u>Containers</u></a>, other <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/service-bindings/"><u>Worker services</u></a>, <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/"><u>etc</u></a>.</p><p>Now, you just call <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/"><u>any binding</u></a> from your outbound Worker.</p>
            <pre><code>class MySandboxedApp extends Sandbox {
  static outboundByHost = {
    "my.kv": async (req, env, ctx) =&gt; {
      const key = keyFromReq(req);
      const myResult = await env.KV.get(key);
      return new Response(myResult);
    },
    "objects.cf": async (req, env, ctx) =&gt; {
      const prefix = ctx.containerId;
      const path = pathFromRequest(req);
      const object = await env.R2.get(`${prefix}/${path}`);
      // Return the R2 object body, or a 404 if the key doesn't exist.
      return new Response(object ? object.body : null, { status: object ? 200 : 404 });
    },
  };
}
</code></pre>
            <p>Rather than parsing tokens and setting up policies, we can easily conditionalize access with code and whatever logic we want. In the R2 example, we were also able to use the sandbox’s ID to scope access further.</p>
    <div>
      <h3>Making controls dynamic</h3>
      <a href="#making-controls-dynamic">
        
      </a>
    </div>
    <p>Networking control should also be dynamic. On many platforms, config for Container and VM networking is static, looking something like this:</p>
            <pre><code>{
  defaultEgress: "block",
  allowedDomains: ["github.com", "npmjs.org"]
}
</code></pre>
            <p>This is better than nothing, but we can do better. For many sandboxes, we might want to apply a policy on start, but then override it with another once specific operations have been performed.</p><p>For instance, we can boot a sandbox, grab our dependencies via npm and GitHub, and then lock down egress after that. This ensures that we open up the network for as little time as possible.</p><p>To achieve this, we can use <code>outboundHandlers</code>, which allows us to define arbitrary outbound handlers that can be applied programmatically using the <code>setOutboundHandler</code> method. Each of these also takes params, allowing you to customize behavior from code. In this case, we will allow some hostnames with the custom “<code>allowHosts</code>” policy, then turn off HTTP. </p>
            <pre><code>class MySandboxedApp extends Sandbox {
  static outboundHandlers = {
    async allowHosts(req, env, { params }) {
      const url = new URL(req.url);
      const allowedHostname = params.allowedHostnames.includes(url.hostname);

      if (allowedHostname) {
        return await fetch(req);
      } else {
        return new Response(null, { status: 403, statusText: "Forbidden" });
      }
    },

    async noHttp(req) {
      return new Response(null, { status: 403, statusText: "Forbidden" });
    }
  }
}

async function setUpSandboxes(userId, userRepoURL, env) {
  const sandbox = await env.SANDBOX.getByName(userId);
  await sandbox.setOutboundHandler("allowHosts", {
    allowedHostnames: ["github.com", "npmjs.org"]
  });
  await sandbox.gitClone(userRepoURL);
  await sandbox.exec("npm install");
  await sandbox.setOutboundHandler("noHttp");
}
</code></pre>
            <p>This could be extended even further. Your agent might ask the end user a question like “Do you want to allow POST requests to <code>cloudflare.com</code>?” based on whatever tools it needs at that time. With dynamic outbound Workers, you can easily modify the sandbox rules on the fly to provide this level of control.</p>
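<p>A sketch of what that could look like, built on the <code>outboundHandlers</code> API above. The handler name and the shape of its params are ours, purely illustrative:</p>
            <pre><code>class MySandboxedApp extends Sandbox {
  static outboundHandlers = {
    // Allow only the methods the user has explicitly approved per host.
    async allowMethods(req, env, { params }) {
      const url = new URL(req.url);
      const allowed = params.rules[url.hostname] || [];
      if (allowed.includes(req.method)) {
        return await fetch(req);
      }
      return new Response(null, { status: 403, statusText: "Forbidden" });
    }
  }
}

// Called when the user answers "yes" to the agent's question.
async function onUserApproval(env, userId) {
  const sandbox = await env.SANDBOX.getByName(userId);
  await sandbox.setOutboundHandler("allowMethods", {
    rules: { "cloudflare.com": ["GET", "POST"] }
  });
}
</code></pre>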
    <div>
      <h2>TLS support with MITM Proxying</h2>
      <a href="#tls-support-with-mitm-proxying">
        
      </a>
    </div>
    <p>To do anything useful with requests beyond allowing or denying them, you need to have access to the content. This means that if you’re making HTTPS requests, they need to be decrypted by the Workers proxy.</p><p>To achieve this, a unique ephemeral certificate authority (CA) and private key are created for each Sandbox instance, and the CA is placed into the sandbox. By default, sandbox instances will trust this CA, while standard container instances can opt into trusting it, for instance by calling <code>sudo update-ca-certificates</code>.</p>
            <pre><code>export class MyContainer extends Container {
  interceptHttps = true;

  static outbound = (req, env, ctx) =&gt; {
    // All HTTP(S) requests will trigger this hook.
    return fetch(req);
  };
}
</code></pre>
    <p>TLS traffic is proxied by an isolated Cloudflare network process that performs the TLS handshake itself. It mints a leaf certificate from the ephemeral, per-instance private key, using the SNI extracted from the ClientHello, and then invokes the configured Worker on the same machine to handle the HTTPS request.</p><p>The ephemeral private key and CA never leave the container runtime sidecar process, and are never shared with other container sidecar processes.</p><p>With this in place, outbound Workers act as a truly transparent proxy. The sandbox doesn't need any awareness of specific protocols or domains — all HTTP and HTTPS traffic flows through the outbound handler for filtering or modification.</p>
    <div>
      <h2>Under the hood</h2>
      <a href="#under-the-hood">
        
      </a>
    </div>
    <p>To enable the functionality shown above in both <a href="https://github.com/cloudflare/containers"><code><u>Container</u></code></a> and <a href="https://github.com/cloudflare/sandbox-sdk"><code><u>Sandbox</u></code></a>, we added new methods to the <a href="https://developers.cloudflare.com/durable-objects/api/container/"><code><u>ctx.container</u></code></a> object: <code>interceptOutboundHttp</code> and <code>interceptOutboundHttps</code>, which intercept outgoing requests to specific hostnames (with basic glob matching) or IP ranges, and can also be used to intercept all outbound requests. These methods are called with a <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/service-bindings/rpc/"><u>WorkerEntrypoint</u></a>, which gets set up as the front door to the outbound Worker.</p>
            <pre><code>export class MyWorker extends WorkerEntrypoint {
 fetch() {
   return new Response(this.ctx.props.message);
 }
}

// ... inside your container DurableObject ...
this.ctx.container.start({ enableInternet: false });
const outboundWorker = this.ctx.exports.MyWorker({ props: { message: 'hello' } });
await this.ctx.container.interceptOutboundHttp('15.0.0.1:80', outboundWorker);

// From now on, all HTTP requests to 15.0.0.1:80 return "hello"
await this.waitForContainerToBeHealthy();

// You can decide to return another message now...
const secondOutboundWorker = this.ctx.exports.MyWorker({ props: { message: 'switcheroo' } });
await this.ctx.container.interceptOutboundHttp('15.0.0.1:80', secondOutboundWorker);
// all HTTP requests to 15.0.0.1 now show "switcheroo", even on connections that were
// open before this interceptOutboundHttp

// You can even set hostnames, CIDRs, for both IPv4 and IPv6
await this.ctx.container.interceptOutboundHttp('example.com', secondOutboundWorker);
await this.ctx.container.interceptOutboundHttp('*.example.com', secondOutboundWorker);
await this.ctx.container.interceptOutboundHttp('123.123.123.123/23', secondOutboundWorker);</code></pre>
            <p>All proxying to Workers happens locally on the same machine that runs the sandbox VM. Even though communication between container and Worker is “authless”, it is secure.</p><p>These methods can be called at any time, before or after starting the container, even while connections are still open. Connections that send multiple HTTP requests will automatically pick up a new entrypoint, so updating outbound Workers will not break existing TCP connections or interrupt HTTP requests.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/21utJcR5UpS7c0YNj8Y385/4a47d83acad42c5708ca23a7b04342f3/BLOG-3199_3.png" />
          </figure><p>Local development with <a href="https://developers.cloudflare.com/workers/wrangler/commands/#dev"><u>wrangler dev</u></a> also has support for egress interception. To make this possible, we automatically spawn a sidecar process inside the local container’s network namespace. We called this sidecar component <a href="https://github.com/cloudflare/proxy-everything"><i><u>proxy-everything</u></i></a>. Once <i>proxy-everything</i> is attached, it applies the appropriate TPROXY nftables rules, routing matching traffic from the local Container to <a href="https://github.com/cloudflare/workerd"><u>workerd</u></a>, Cloudflare’s open source JavaScript runtime, which runs the outbound Worker. This allows the local development experience to mirror what happens in prod, so testing and development remain simple.</p>
    <div>
      <h2>Giving outbound Workers a try</h2>
      <a href="#giving-outbound-workers-a-try">
        
      </a>
    </div>
    <p>If you haven’t tried Cloudflare Sandboxes, check out the <a href="https://developers.cloudflare.com/sandbox/get-started/"><u>Getting Started guide</u></a>. If you are a current user of <a href="https://developers.cloudflare.com/containers/"><u>Containers</u></a> or <a href="https://developers.cloudflare.com/sandbox/"><u>Sandboxes</u></a>, start using outbound Workers now by <a href="https://developers.cloudflare.com/containers/platform-details/outbound-traffic/"><u>reading the documentation</u></a> and upgrading to <code>@cloudflare/containers@0.3.0</code> or <code>@cloudflare/sandbox@0.8.9</code>.</p> ]]></content:encoded>
            <category><![CDATA[Containers]]></category>
            <category><![CDATA[Sandbox]]></category>
            <category><![CDATA[Agents]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <guid isPermaLink="false">3j8ROSAUomMNPrmru3f2U9</guid>
            <dc:creator>Mike Nomitch</dc:creator>
            <dc:creator>Gabi Villalonga Simón</dc:creator>
        </item>
        <item>
            <title><![CDATA[Welcome to Agents Week]]></title>
            <link>https://blog.cloudflare.com/welcome-to-agents-week/</link>
            <pubDate>Sun, 12 Apr 2026 17:01:05 GMT</pubDate>
            <description><![CDATA[ Cloudflare's mission has always been to help build a better Internet. Sometimes that means building for the Internet as it exists. Sometimes it means building for the Internet as it's about to become. 

This week, we're kicking off Agents Week, dedicated to what comes next.
 ]]></description>
            <content:encoded><![CDATA[ <p>Cloudflare's mission has always been to help build a better Internet. Sometimes that means building for the Internet as it exists. Sometimes it means building for the Internet as it's about to become. </p><p>Today, we're kicking off Agents Week, dedicated to building the Internet for what comes next.</p>
    <div>
      <h2>The Internet wasn't built for the age of AI. Neither was the cloud.</h2>
      <a href="#the-internet-wasnt-built-for-the-age-of-ai-neither-was-the-cloud">
        
      </a>
    </div>
    <p>The cloud, as we know it, was a product of the last major technological paradigm shift: smartphones.</p><p>When smartphones put the Internet in everyone's pocket, they didn't just add users — they changed the nature of what it meant to be online. Always connected, always expecting an instant response. Applications had to handle an order of magnitude more users, and the infrastructure powering them had to evolve.</p><p>The approach the industry converged on was straightforward: more users, more copies of your application. As applications grew in complexity, teams broke them into smaller pieces — microservices — so each team could control its own destiny. But the core principle stayed the same: a finite number of applications, each serving many users. Scale meant more copies.</p><p>Kubernetes and containers became the default. They made it easy to spin up instances, load balance, and tear down what you didn't need. Under this one-to-many model, a single instance could serve many users, and even as user counts grew into the billions, the number of things you had to manage stayed finite.</p><p>Agents break this.</p>
    <div>
      <h2>One user, one agent, one task</h2>
      <a href="#one-user-one-agent-one-task">
        
      </a>
    </div>
    <p>Unlike every application that came before them, agents are one-to-one. Each agent is a unique instance. Serving one user, running one task. Where a traditional application follows the same execution path regardless of who's using it, an agent requires its own execution environment: one where the LLM dictates the code path, calls tools dynamically, adjusts its approach, and persists until the task is done.</p><p>Think of it as the difference between a restaurant and a personal chef. A restaurant has a menu — a fixed set of options — and a kitchen optimized to churn them out at volume. That's most applications today. An agent is more like a personal chef who asks: what do you want to eat? They might need entirely different ingredients, utensils, or techniques each time. You can't run a personal-chef service out of the same kitchen setup you'd use for a restaurant.</p><p>Over the past year, we've seen agents take off, with coding agents leading the way — not surprisingly, since developers tend to be early adopters. The way most coding agents work today is by spinning up a container to give the LLM what it needs: a filesystem, git, bash, and the ability to run arbitrary binaries.</p><p>But coding agents are just the beginning. Tools like Claude Cowork are already making agents accessible to less technical users. Once agents move beyond developers and into the hands of everyone — administrative assistants, research analysts, customer service reps, personal planners — the scale math gets sobering fast.</p>
    <div>
      <h2>The math on scaling agents to the masses</h2>
      <a href="#the-math-on-scaling-agents-to-the-masses">
        
      </a>
    </div>
    <p>If the more than 100 million knowledge workers in the US each used an agentic assistant at ~15% concurrency, you'd need capacity for approximately 24 million simultaneous sessions. At 25–50 users per CPU, that's somewhere between 500K and 1M server CPUs — just for the US, with one agent per person.</p><p>Now picture each person running several agents in parallel. Now picture the rest of the world with more than 1 billion knowledge workers. We're not a little short on compute. We're orders of magnitude away.</p><p>So how do we close that gap?</p>
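    <p>For reference, here is that arithmetic spelled out as a runnable sketch. The 160 million workforce figure is implied by the numbers above (24 million sessions at ~15% concurrency); all inputs are rough assumptions, not measurements:</p>
            <pre><code>// Back-of-the-envelope capacity estimate for one agent per US knowledge worker
const knowledgeWorkersUS = 160_000_000; // implied by 24M sessions at 15% concurrency
const concurrency = 0.15;               // fraction of agents active at any moment

const simultaneousSessions = knowledgeWorkersUS * concurrency; // 24,000,000

// Assume a container-backed agent session serves 25–50 users per CPU
const cpusWorstCase = simultaneousSessions / 25; // 960,000 ≈ "1M server CPUs"
const cpusBestCase  = simultaneousSessions / 50; // 480,000 ≈ "500K server CPUs"

console.log({ simultaneousSessions, cpusBestCase, cpusWorstCase });</code></pre>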
    <div>
      <h2>Infrastructure built for agents</h2>
      <a href="#infrastructure-built-for-agents">
        
      </a>
    </div>
    <p>Eight years ago, we launched <a href="https://workers.cloudflare.com/"><u>Workers</u></a> — the beginning of our developer platform, and a bet on containerless, serverless compute. The motivation at the time was practical: we needed lightweight compute without cold-starts for customers who depended on Cloudflare for speed. Built on V8 isolates rather than containers, Workers turned out to be an order of magnitude more efficient — faster to start, cheaper to run, and natively suited to the "spin up, execute, tear down" pattern.</p><p>What we didn't anticipate was how well this model would map to the age of agents.</p><p>Containers give every agent a full commercial kitchen: bolted-down appliances, walk-in fridges, the works, whether the agent needs them or not. Isolates, on the other hand, give the personal chef exactly the counter space, the burner, and the knife they need for this particular meal. Provisioned in milliseconds. Cleaned up the moment the dish is served.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3UM1NukO0Ho4lQAYk5CoU8/30e1376c4fe61e86204a0de92ae4612b/BLOG-3238_2.png" />
          </figure><p>In a world where we need to support not thousands of long-running applications, but billions of ephemeral, single-purpose execution environments — isolates are the right primitive. </p><p>Each one starts in milliseconds. Each one is securely sandboxed. And you can run orders of magnitude more of them on the same hardware compared to containers.</p><p>Just a few weeks ago, we took this further with the <a href="https://blog.cloudflare.com/dynamic-workers/"><u>Dynamic Workers open beta</u></a>: execution environments spun up at runtime, on demand. An isolate takes a few milliseconds to start and uses a few megabytes of memory. That's roughly 100x faster and up to 100x more memory-efficient than a container. </p><p>You can start a new one for every single request, run a snippet of code, and throw it away — at a scale of millions per second.</p><p>For agents to move beyond early adopters and into everyone's hands, they also have to be affordable. Running each agent in its own container is expensive enough that agentic tools today are mostly limited to coding assistants for engineers who can justify the cost. <b>Isolates, by running orders of magnitude more efficiently, are what make per-unit economics viable at the scale agents require.</b></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6iiD7zACxMNEDdvJWM6zbo/2261c320f3be3cd2fa6ef34593a1ee09/BLOG-3238_3.png" />
          </figure>
    <div>
      <h2>The horseless carriage phase</h2>
      <a href="#the-horseless-carriage-phase">
        
      </a>
    </div>
    <p>While it’s critical to build the right foundation for the future, we’re not there yet. And every paradigm shift has a period where we try to make the new thing work within the old model. The first cars were called "horseless carriages." The first websites were digital brochures. The first mobile apps were shrunken desktop UIs. We're in that phase now with agents.</p><p>You can see it everywhere. </p><p>We're giving agents headless browsers to navigate websites designed for human eyes, when what they need are structured protocols like MCP to discover and invoke services directly. </p><p>Many early MCP servers are thin wrappers around existing REST APIs — same CRUD operations, new protocol — when LLMs are actually far better at writing code than making sequential tool calls. </p><p>We're using CAPTCHAs and behavioral fingerprinting to verify the thing on the other end of a request, when increasingly that thing is an agent acting on someone's behalf — and the right question isn't "are you human?" but "which agent are you, who authorized you, and what are you allowed to do?"</p><p>We're spinning up full containers for agents that just need to make a few API calls and return a result.</p><p>These are just a few examples, but none of this is surprising. It's what transitions look like.</p>
    <div>
      <h2>Building for both</h2>
      <a href="#building-for-both">
        
      </a>
    </div>
    <p>The Internet is always somewhere between two eras. IPv6 is objectively better than IPv4, but dropping IPv4 support would break half the Internet. HTTP/2 and HTTP/3 coexist. TLS 1.2 still hasn't fully given way to 1.3. The better technology exists, the old technology persists, and the job of infrastructure is to bridge both.</p><p>Cloudflare has always been in the business of bridging these transitions. The shift to agents is no different.</p><p>Coding agents genuinely need containers — a filesystem, git, bash, arbitrary binary execution. That's not going away. This week, our container-based sandbox environments are going GA, because we're committed to making them the best they can be. We're going deeper on browser rendering for agents, because there will be a long tail of services that don't yet speak MCP, and agents will still need to interact with them. These aren't stopgaps — they're part of a complete platform.</p><p>But we're also building what comes next: the isolates, the protocols, and the identity models that agents actually need. Our job is to make sure you don't have to choose between what works today and what's right for tomorrow.</p>
    <div>
      <h2>Security in the model, not around it</h2>
      <a href="#security-in-the-model-not-around-it">
        
      </a>
    </div>
    <p>If agents are going to handle our professional and personal tasks — reading our email, operating on our code, interacting with our financial services — then security has to be built into the execution model, not layered on after the fact.</p><p>CISOs have been the first to confront this. The productivity gains from putting agents in everyone's hands are real, but today, most agent deployments are fraught with risk: prompt injection, data exfiltration, unauthorized API access, opaque tool usage. </p><p>A developer's vibe-coding agent needs access to repositories and deployment pipelines. An enterprise's customer service agent needs access to internal APIs and user data. In both cases, securing the environment today means stitching together credentials, network policies, and access controls that were never designed for autonomous software.</p><p>Cloudflare has been building two platforms in parallel: our developer platform, for people who build applications, and our zero trust platform, for organizations that need to secure access. For a while, these served distinct audiences. </p><p>But "how do I build this agent?" and "how do I make sure it's safe?" are increasingly the same question. We're bringing these platforms together so that all of this is native to how agents run, not a separate layer you bolt on.</p>
    <div>
      <h2>Agents that follow the rules</h2>
      <a href="#agents-that-follow-the-rules">
        
      </a>
    </div>
    <p>There's another dimension to the agent era that goes beyond compute and security: economics and governance.</p><p>When agents interact with the Internet on our behalf — reading articles, consuming APIs, accessing services — there needs to be a way for the people and organizations who create that content and run those services to set terms and get paid. Today, the web's economic model is built around human attention: ads, paywalls, subscriptions. </p><p>Agents don't have attention (well, not that <a href="https://arxiv.org/abs/1706.03762"><u>kind of attention</u></a>). They don't see ads. They don't click through cookie banners.</p><p>If we want an Internet where agents can operate freely <i>and</i> where publishers, content creators, and service providers are fairly compensated, we need new infrastructure for it. We’re building tools that make it easy for publishers and content owners to set and enforce policies for how agents interact with their content.</p><p>Building a better Internet has always meant making sure it works for everyone — not just the people building the technology, but the people whose work and creativity make the Internet worth using. That doesn't change in the age of agents. It becomes more important.</p>
    <div>
      <h2>The platform for developers and agents</h2>
      <a href="#the-platform-for-developers-and-agents">
        
      </a>
    </div>
    <p>Our vision for the developer platform has always been to provide a comprehensive platform that just works: from experiment, to MVP, to scaling to millions of users. But providing the primitives is only part of the equation. A great platform also has to think about how everything works together, and how it integrates into your development flow.</p><p>That job is evolving. It used to be purely about developer experience, making it easy for humans to build, test, and ship. Increasingly, it's also about helping agents help humans, and making the platform work not just for the people building agents, but for the agents themselves. Can an agent find the most up-to-date best practices? How easily can it discover and invoke the tools and CLIs it needs? How seamlessly can it move from writing code to deploying it?</p><p>This week, we're shipping improvements across both dimensions — making Cloudflare better for the humans building on it and for the agents running on it.</p>
    <div>
      <h2>Building for the future is a team sport</h2>
      <a href="#building-for-the-future-is-a-team-sport">
        
      </a>
    </div>
    <p>Building for the future is not something we can do alone. Every major Internet transition, from HTTP/1.1 to HTTP/2 and HTTP/3, and from TLS 1.2 to 1.3, has required the industry to converge on shared standards. The shift to agents will be no different.</p><p>Cloudflare has a long history of contributing to and helping push forward the standards that make the Internet work. We've been <a href="https://blog.cloudflare.com/tag/ietf/"><u>deeply involved in the IETF</u></a> for over a decade, helping develop and deploy protocols like QUIC, TLS 1.3, and Encrypted Client Hello. We were a founding member of WinterTC, the ECMA technical committee for JavaScript runtime interoperability. We open-sourced the Workers runtime itself, because we believe the foundation should be open.</p><p>We're bringing the same approach to the agentic era. We're excited to be part of the Linux Foundation and AAIF, and to help support and push forward standards like MCP that will be foundational for the agentic future. Since Anthropic introduced MCP, we've worked closely with them to build the infrastructure for remote MCP servers, open-sourced our own implementations, and invested in making the protocol practical at scale. </p><p>Last year, alongside Coinbase, we <a href="https://blog.cloudflare.com/x402/"><u>co-founded the x402 Foundation</u></a>, an open, neutral standard that revives the long-dormant HTTP 402 status code to give agents a native way to pay for the services and content they consume. </p><p>Agent identity, authorization, payment, safety: these all need open standards that no single company can define alone.</p>
    <div>
      <h2>Stay tuned</h2>
      <a href="#stay-tuned">
        
      </a>
    </div>
    <p>This week, we're making announcements across every dimension of the agent stack: compute, connectivity, security, identity, economics, and developer experience.</p><p>The Internet wasn't built for AI. The cloud wasn't built for agents. But Cloudflare has always been about helping build a better Internet — and what "better" means changes with each era. This is the era of agents. This week, <a href="https://blog.cloudflare.com/"><u>follow along</u></a> and we'll show you what we're building for it.</p> ]]></content:encoded>
            <category><![CDATA[Agents Week]]></category>
            <category><![CDATA[Agents]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Serverless]]></category>
            <guid isPermaLink="false">4dZj0C0XnS9BJQnxy2QkzY</guid>
            <dc:creator>Rita Kozlov</dc:creator>
            <dc:creator>Dane Knecht</dc:creator>
        </item>
        <item>
            <title><![CDATA[500 Tbps of capacity: 16 years of scaling our global network]]></title>
            <link>https://blog.cloudflare.com/500-tbps-of-capacity/</link>
            <pubDate>Fri, 10 Apr 2026 18:00:05 GMT</pubDate>
            <description><![CDATA[ Cloudflare’s global network has officially crossed 500 Tbps of external capacity, enough to route more than 20% of the web and absorb the largest DDoS attacks ever recorded. ]]></description>
            <content:encoded><![CDATA[ <p><sup><i>Cloudflare’s global network and backbone in 2026.</i></sup> </p><p>Cloudflare's network recently passed a major milestone: we crossed 500 terabits per second (Tbps) of external capacity.</p><p>When we say 500 Tbps, we mean total provisioned external interconnection capacity: the sum of every port facing a transit provider, private peering partner, Internet exchange, or <a href="https://blog.cloudflare.com/announcing-express-cni/"><u>Cloudflare Network Interconnect</u></a> (CNI) port across all 330+ cities. This is not peak traffic. On any given day, our peak utilization is a fraction of that number. (The rest is our DDoS budget.)</p><p>It’s a long way from where we started. In 2010, we launched from a small office above a nail salon in Palo Alto, with a single transit provider and a reverse proxy you could set up by <a href="https://blog.cloudflare.com/whats-the-story-behind-the-names-of-cloudflares-name-servers/"><u>changing two nameservers</u></a>.</p>
    <div>
      <h3>The early days of transit and peering</h3>
      <a href="#the-early-days-of-transit-and-peering">
        
      </a>
    </div>
    <p>Our first transit provider was nLayer Communications, a network most people now know as GTT. nLayer gave us our first capacity, and our first hands-on experience with peering relationships and the careful balance between cost and performance.</p><p>From there, we grew <a href="https://blog.cloudflare.com/and-then-there-were-threecloudflares-new-data/"><u>city</u></a> by <a href="https://blog.cloudflare.com/luxembourg-chisinau/"><u>city</u></a>: Chicago, Ashburn, San Jose, Amsterdam, Tokyo. Each new data center meant negotiating colocation contracts, pulling fiber, racking servers, and establishing peering through <a href="https://blog.cloudflare.com/think-global-peer-local-peer-with-cloudflare-at-100-internet-exchange-points/"><u>Internet exchanges</u></a>. The Internet isn't actually a cloud, of course. It is a collection of specific rooms full of cables, and we spent years learning the nuances of every one of them.</p><p>Not every city was a straightforward deployment: we dealt with missing hardware, customs strikes, and even <a href="https://blog.cloudflare.com/stories-from-our-recent-global-data-center-upgrade/"><u>dental floss</u></a>. In 2018, we opened up in 31 cities in just 24 days: from <a href="https://blog.cloudflare.com/kathmandu/"><u>Kathmandu</u></a> and <a href="https://blog.cloudflare.com/baghdad/"><u>Baghdad</u></a> to <a href="https://blog.cloudflare.com/reykjavik-cloudflares-northernmost-location/"><u>Reykjavík</u></a> and <a href="https://blog.cloudflare.com/luxembourg-chisinau/"><u>Chișinău</u></a>. When we opened our 127th data center in <a href="https://blog.cloudflare.com/macau/"><u>Macau</u></a>, we were protecting 7 million Internet properties. Today, with data centers in 330+ cities, we protect more than 20% of the web.</p>
    <div>
      <h3>When the network became the security layer </h3>
      <a href="#when-the-network-became-the-security-layer">
        
      </a>
    </div>
    <p>As our footprint grew, customers asked for more than just website caching. They needed to protect employees, replace aging Multiprotocol Label Switching (<a href="https://www.cloudflare.com/learning/network-layer/what-is-mpls/"><u>MPLS</u></a>) circuits, and <a href="https://blog.cloudflare.com/mpls-to-zerotrust/"><u>secure entire enterprise networks</u></a>. Instead of traditional appliances, <a href="https://blog.cloudflare.com/magic-transit-network-functions/"><u>we built systems</u></a> to establish secure tunnels to private subnets and advertise enterprise IP space directly from our global network via BGP.</p><p>The scale of threats grew in parallel. In 2025, we mitigated a 31.4 Tbps <a href="https://blog.cloudflare.com/ddos-threat-report-2025-q4/"><u>DDoS attack</u></a> lasting 35 seconds. The source was the Aisuru-Kimwolf botnet, including many infected Android TVs. It was one of over 5,000 attacks we blocked that day. No engineer was paged.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4TRCTdrET5JTpz0BaxSdF4/4623f0953417f0998f6810dd048773ff/BLOG-3267_2.png" />
          </figure><p>A decade ago, an attack of that magnitude would have required nation-state resources to counter. Today, our network handles it in seconds without human intervention. That is what operating at a 500 Tbps scale requires: moving the intelligence to every server in our network so the network can <a href="https://blog.cloudflare.com/deep-dive-cloudflare-autonomous-edge-ddos-protection/"><u>defend itself</u></a>.</p>
    <div>
      <h3>How our network responds to an attack</h3>
      <a href="#how-our-network-responds-to-an-attack">
        
      </a>
    </div>
    <p>Here is what actually happens when an attack hits our network. Packets arrive at the network interface card (NIC) and immediately enter an eXpress Data Path (<a href="https://en.wikipedia.org/wiki/Express_Data_Path"><u>XDP</u></a>) program chain managed by <i>xdpd</i>, running in driver mode. Among the first programs in that chain is <i>l4drop</i>, which evaluates each packet against mitigation rules in extended Berkeley Packet Filter (eBPF). Those rules are generated by <i>dosd</i>, our denial of service daemon, which runs on every server in our fleet. Each <i>dosd</i> instance samples incoming traffic, builds a table of the heaviest hitters it sees, and broadcasts that table to every other instance in the colo. The result is a shared colo-wide view of traffic, and because every server works from the same data, they reach the same mitigation decision.</p>
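    <p>To make the “same data, same decision” property concrete, here is a toy model of it in TypeScript. Real <i>dosd</i> is far more sophisticated; the types, names, and threshold logic below are invented purely for illustration:</p>
            <pre><code>type Sample = { srcPrefix: string; packets: number };

// Merge every instance's local heavy-hitter table into one shared view.
function mergeTables(tables: Sample[][]): Map&lt;string, number&gt; {
  const merged = new Map&lt;string, number&gt;();
  for (const table of tables) {
    for (const { srcPrefix, packets } of table) {
      merged.set(srcPrefix, (merged.get(srcPrefix) ?? 0) + packets);
    }
  }
  return merged;
}

// Every server applies the same rule to the same merged data, so every
// server independently reaches the same mitigation decision.
function mitigationTargets(merged: Map&lt;string, number&gt;, threshold: number): string[] {
  return [...merged].filter(([, pkts]) =&gt; pkts &gt; threshold).map(([prefix]) =&gt; prefix);
}</code></pre>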
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6tGvgIFrQkO0BqATmbWVst/c0b8dff1379a9b1437ec784f10876a2c/BLOG-3267_3.png" />
          </figure><p>When <i>dosd</i> detects an attack pattern, the resulting rule is applied locally via <i>l4drop</i> and propagates globally via Quicksilver, our distributed key-value (KV) store, reaching every server in every data center within seconds. Only after surviving <i>l4drop</i> do packets reach Unimog, our Layer 4 (L4) load balancer, which distributes them across healthy servers in the data center. For Magic Transit customers routing enterprise network traffic through our edge, <i>flowtrackd</i> adds a further layer of stateful TCP inspection, tracking connection state and dropping packets that don't belong to legitimate flows.</p><p>The 31.4 Tbps attack we mitigated followed exactly this path. No traffic was backhauled to a centralized scrubbing center. No human intervened. Every server in the targeted data centers independently recognized the attack and began dropping malicious packets at line rate, before those packets consumed a single CPU cycle of application processing. The software is only half the story: none of it works if the ports aren't there to absorb the traffic in the first place.</p>
    <div>
      <h3>A distributed developer platform</h3>
      <a href="#a-distributed-developer-platform">
        
      </a>
    </div>
    <p>Running code on every server in our network was a natural consequence of controlling the full stack. If we already ran eBPF programs on every machine to drop attack traffic, we could run customer application code there too. That insight became <a href="https://blog.cloudflare.com/code-everywhere-cloudflare-workers/"><u>Workers</u></a>, and later <a href="https://blog.cloudflare.com/introducing-workers-kv/"><u>KV</u></a> and <a href="https://blog.cloudflare.com/introducing-workers-durable-objects/"><u>Durable Objects</u></a>.</p><p>Our developer platform runs in every city we operate in, not in a handful of cloud regions. In 2025, we added <a href="https://blog.cloudflare.com/cloudflare-containers-coming-2025/"><u>Containers</u></a> to Workers, so heavier workloads can run at the edge too. V8 isolates and custom filesystem layers minimize cold starts. Your code runs where your users are, on the same servers that drop attack traffic at line rate via <i>l4drop</i>, before that traffic ever reaches the network stack. Your application never sees it.</p>
    <div>
      <h3>Forward-looking protocols: IPv6, RPKI, ASPA</h3>
      <a href="#forward-looking-protocols-ipv6-rpki-aspa">
        
      </a>
    </div>
    <p>We were early adopters of <a href="https://blog.cloudflare.com/introducing-cloudflares-automatic-ipv6-gatewa/"><u>IPv6</u></a> and Resource Public Key Infrastructure (<a href="https://blog.cloudflare.com/rpki/"><u>RPKI</u></a>). <a href="https://blog.cloudflare.com/cloudflare-1111-incident-on-june-27-2024/"><u>BGP hijacks</u></a> cause real outages and security breaches. RPKI allows us to drop invalid routes from peers, ensuring traffic goes where it is supposed to. We sign Route Origin Authorizations (ROAs) for our prefixes and enforce Route Origin Validation on ingress. We reject RPKI-invalid routes, even when that occasionally breaks reachability to networks with misconfigured ROAs.</p><p>Autonomous System Provider Authorization (<a href="https://blog.cloudflare.com/aspa-secure-internet/"><u>ASPA</u></a>) is next. RPKI validates who owns a prefix. ASPA validates the path an announcement took to reach you. RPKI is a passport check at the destination, confirming the right owner, while ASPA is a flight manifest check: it verifies every network the traffic passed through. A route leak is like a passenger who boarded in the wrong city; RPKI would not catch it, but ASPA will.</p><p>Current ecosystem adoption for ASPA looks like RPKI did in 2015. We were one of the first networks to deploy RPKI at scale, and today, <a href="https://radar.cloudflare.com/routing"><u>867,000 prefixes</u></a> in the global routing table have valid RPKI certificates, up from near zero a decade ago. At our scale, the protocols we choose have real consequences for the broader Internet. We push for adoption early because waiting means more hijacks and more leaks in the meantime.</p>
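    <p>For the curious, here is a toy sketch of the Route Origin Validation decision in TypeScript. It assumes the ROAs have already been fetched and cryptographically verified, handles IPv4 only, and is illustrative rather than how our validators are implemented:</p>
            <pre><code>type Roa = { prefix: string; maxLength: number; originAsn: number };

// Parse a dotted-quad IPv4 address into a 32-bit unsigned integer.
function toInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) =&gt; ((acc &lt;&lt; 8) | Number(octet)) &gt;&gt;&gt; 0, 0);
}

// True if the ROA prefix covers the announced prefix.
function covers(roaPrefix: string, announced: string): boolean {
  const [roaIp, roaLen] = roaPrefix.split("/");
  const [annIp, annLen] = announced.split("/");
  if (Number(annLen) &lt; Number(roaLen)) return false;
  const mask = Number(roaLen) === 0 ? 0 : (~0 &lt;&lt; (32 - Number(roaLen))) &gt;&gt;&gt; 0;
  return (toInt(roaIp) &amp; mask) === (toInt(annIp) &amp; mask);
}

// The ROV decision: unknown if no ROA covers the route, valid if a covering
// ROA matches the origin ASN and prefix length, invalid otherwise.
function rov(roas: Roa[], announced: string, originAsn: number): "valid" | "invalid" | "unknown" {
  const covering = roas.filter((roa) =&gt; covers(roa.prefix, announced));
  if (covering.length === 0) return "unknown";
  const annLen = Number(announced.split("/")[1]);
  const ok = covering.some((roa) =&gt; roa.originAsn === originAsn &amp;&amp; annLen &lt;= roa.maxLength);
  return ok ? "valid" : "invalid"; // RPKI-invalid routes get dropped on ingress
}</code></pre>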
    <div>
      <h3>AI agents and the evolving Internet</h3>
      <a href="#ai-agents-and-the-evolving-internet">
        
      </a>
    </div>
    <p>AI has changed what it means to have a presence on the web. For most of the Internet’s history, traffic was human-generated, by people clicking links in browsers. Today, AI crawlers, model training pipelines, and autonomous agents <a href="https://blog.cloudflare.com/radar-2025-year-in-review/"><u>now account</u></a> for more than 4% of all HTML requests across our network, comparable to Googlebot itself. "User action" crawling, where an AI visits a page because a human asked it a question, grew over 15x in 2025 alone.</p><p>AI crawlers behave differently than browsers at the infrastructure level. Browsers load a page and stop. Crawlers instead fetch every linked resource at maximum throughput with no pause between requests. At our scale, distinguishing legitimate AI crawling from actual attacks is a real engineering problem. Our <a href="https://blog.cloudflare.com/introducing-ai-crawl-control/"><u>detection systems</u></a> use a combination of verified bot IP ranges, TLS fingerprinting, behavioral analysis, and robots.txt compliance signals to make that distinction, and to give site owners the data they need to <a href="https://blog.cloudflare.com/content-independence-day-no-ai-crawl-without-compensation/"><u>decide which crawlers to allow</u></a>.</p><p>At the TLS layer, for example, a legitimate browser presents a ClientHello with a predictable set of cipher suites, extensions, and ordering that matches its declared User-Agent. A crawler spoofing that User-Agent but using a stripped-down TLS library will present a different fingerprint, and that mismatch is one of the signals our systems use to classify the request before it reaches the origin.</p>
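    <p>As a simplified illustration of that last signal (the fingerprint strings and browser table below are made up; production classification uses real TLS fingerprint databases and many more signals than this):</p>
            <pre><code>// Known-good TLS fingerprints per browser family (invented values)
const expectedFingerprints: Record&lt;string, string&gt; = {
  Chrome: "t13d1516h2_chrome_like",
  Firefox: "t13d1715h2_firefox_like",
};

// One classification signal: does the TLS fingerprint match the User-Agent?
function fingerprintMismatch(userAgent: string, tlsFingerprint: string): boolean {
  const family = Object.keys(expectedFingerprints).find((f) =&gt; userAgent.includes(f));
  if (!family) return false; // unknown browser family: this signal abstains
  return expectedFingerprints[family] !== tlsFingerprint; // mismatch suggests a spoofed User-Agent
}</code></pre>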
    <div>
      <h3>Help us build the next 500 Tbps</h3>
      <a href="#help-us-build-the-next-500-tbps">
        
      </a>
    </div>
    <p>What started above a nail salon in Palo Alto is now a 500 Tbps network in 330+ cities across 125+ countries, where every server runs our developer platform and security services, not just cache. That is sixteen years of architectural decisions compounding, and we owe it to the 13,000+ networks and partners who peer with us. We are not done.</p><p>If you are a network operator, peer with us. Our peering policy and interconnection details are on <a href="https://peeringdb.com/asn/13335"><u>PeeringDB</u></a>. If you are interested in embedding Cloudflare infrastructure directly within your network, reach out to our team at <a href="mailto:epp@cloudflare.com"><u>epp@cloudflare.com</u></a> to join the Edge Partner Program.</p>
            <category><![CDATA[Network Services]]></category>
            <category><![CDATA[Cloudflare Network]]></category>
            <category><![CDATA[Peering]]></category>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[RPKI]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[AI]]></category>
            <guid isPermaLink="false">14NmmkLAEg9hFIbB9bosqh</guid>
            <dc:creator>Tanner Ryan</dc:creator>
        </item>
        <item>
            <title><![CDATA[Introducing EmDash — the spiritual successor to WordPress that solves plugin security]]></title>
            <link>https://blog.cloudflare.com/emdash-wordpress/</link>
            <pubDate>Wed, 01 Apr 2026 13:00:00 GMT</pubDate>
            <description><![CDATA[ Today we are launching the beta of EmDash, a full-stack serverless JavaScript CMS built on Astro 6.0. It combines the features of a traditional CMS with modern security, running plugins in sandboxed Worker isolates. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>The cost of building software has drastically decreased. We recently <a href="https://blog.cloudflare.com/vinext/"><u>rebuilt Next.js in one week</u></a> using AI coding agents. But for the past two months our agents have been working on an even more ambitious project: rebuilding the WordPress open source project from the ground up.</p><p>WordPress powers <a href="https://w3techs.com/technologies/details/cm-wordpress"><u>over 40% of the Internet</u></a>. It is a massive success that has enabled anyone to be a publisher, and created a global community of WordPress developers. But the WordPress open source project will be 24 years old this year. Hosting a website has changed dramatically during that time. When WordPress was born, AWS EC2 didn’t exist. In the intervening years, that task has gone from renting virtual private servers, to uploading a JavaScript bundle to a globally distributed network at virtually no cost. It’s time to upgrade the most popular CMS on the Internet to take advantage of this change.</p><p>Our name for this new CMS is EmDash. We think of it as the spiritual successor to WordPress. It’s written entirely in TypeScript. It is serverless, but you can run it on your own hardware or any platform you choose. Plugins are securely sandboxed and can run in their own <a href="https://developers.cloudflare.com/workers/reference/how-workers-works/"><u>isolate</u></a>, via <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/worker-loader/"><u>Dynamic Workers</u></a>, solving the fundamental security problem with the WordPress plugin architecture. And under the hood, EmDash is powered by <a href="https://astro.build/"><u>Astro</u></a>, the fastest web framework for content-driven websites.</p><p>EmDash is fully open source, MIT licensed, and <a href="https://github.com/emdash-cms/emdash"><u>available on GitHub</u></a>. While EmDash aims to be compatible with WordPress functionality, no WordPress code was used to create EmDash. That allows us to license the open source project under the more permissive MIT license. We hope that allows more developers to adapt, extend, and participate in EmDash’s development.</p><p>You can deploy the EmDash v0.1.0 preview to your own Cloudflare account, or to any Node.js server today as part of our early developer beta:</p><a href="https://deploy.workers.cloudflare.com/?url=https://github.com/emdash-cms/templates/tree/main/blog-cloudflare"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p></p><p>Or you can try out the admin interface here in the <a href="https://emdashcms.com/"><u>EmDash Playground</u></a>:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/50n8mewREzoxOFq2jDzpT9/6a38dbfbaeec2d21040137e574a935ad/CleanShot_2026-04-01_at_07.45.29_2x.png" />
          </figure>
    <div>
      <h3>What WordPress has accomplished</h3>
      <a href="#what-wordpress-has-accomplished">
        
      </a>
    </div>
    <p>The story of WordPress is a triumph of open source that enabled publishing at a scale never before seen. Few projects have had the same recognisable impact on the generation raised on the Internet. The contributors to WordPress’s core and its many thousands of plugin and theme developers have built a platform that democratised publishing for millions; many lives and livelihoods have been transformed by this ubiquitous software.</p><p>There will always be a place for WordPress, but there is also a lot more space for the world of content publishing to grow. A decade ago, people picking up a keyboard universally learned to publish their blogs with WordPress. Today it’s just as likely that person picks up Astro, or another TypeScript framework, to learn and build with. The ecosystem needs an option that empowers a wide audience, in the same way it needed WordPress 23 years ago. </p><p>EmDash is committed to building on what WordPress created: an open source publishing stack that anyone can install and use at little cost, while fixing the core problems that WordPress cannot solve. </p>
    <div>
      <h3>Solving the WordPress plugin security crisis</h3>
      <a href="#solving-the-wordpress-plugin-security-crisis">
        
      </a>
    </div>
    <p>WordPress’ plugin architecture is fundamentally insecure. <a href="https://patchstack.com/whitepaper/state-of-wordpress-security-in-2025/"><u>96% of security issues</u></a> for WordPress sites originate in plugins. In 2025, more high severity vulnerabilities <a href="https://patchstack.com/whitepaper/state-of-wordpress-security-in-2026/"><u>were found in the WordPress ecosystem</u></a> than the previous two years combined.</p><p>Why, after over two decades, is WordPress plugin security so problematic?</p><p>A WordPress plugin is a PHP script that hooks directly into WordPress to add or modify functionality. There is no isolation: a WordPress plugin has direct access to the WordPress site’s database and filesystem. When you install a WordPress plugin, you are trusting it with access to nearly everything, and trusting it to handle every malicious input or edge case perfectly.</p><p>EmDash solves this. In EmDash, each plugin runs in its own isolated sandbox: a <a href="https://developers.cloudflare.com/dynamic-workers/"><u>Dynamic Worker</u></a>. Rather than giving direct access to underlying data, EmDash provides the plugin with <a href="https://blog.cloudflare.com/workers-environment-live-object-bindings/"><u>capabilities via bindings</u></a>, based on what the plugin explicitly declares that it needs in its manifest. This security model has a strict guarantee: an EmDash plugin can only perform the actions explicitly declared in its manifest. You can know and trust upfront, before installing a plugin, exactly what you are granting it permission to do, similar to going through an OAuth flow and granting a 3rd party app a specific set of scoped permissions.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4JDq2oEgwONHL8uUJsrof2/fb2ae5fcacd5371aaab575c35ca2ce2e/image8.png" />
          </figure><p>For example, a plugin that sends an email after a content item gets saved looks like this:</p>
            <pre><code>import { definePlugin } from "emdash";

export default () =&gt;
  definePlugin({
    id: "notify-on-publish",
    version: "1.0.0",
    capabilities: ["read:content", "email:send"],
    hooks: {
      "content:afterSave": async (event, ctx) =&gt; {
        if (event.collection !== "posts" || event.content.status !== "published") return;

        await ctx.email!.send({
          to: "editors@example.com",
          subject: `New post published: ${event.content.title}`,
          text: `"${event.content.title}" is now live.`,
        });

        ctx.log.info(`Notified editors about ${event.content.id}`);
      },
    },
  });</code></pre>
            <p>This plugin hooks into the content lifecycle via <code>content:afterSave</code>, and explicitly requests two capabilities: <code>read:content</code> and <code>email:send</code>, the latter granting access to the <code>ctx.email</code> function. It is impossible for the plugin to do anything other than use these capabilities. It has no external network access. If it does need network access, it can specify the exact hostname it needs to talk to as part of its definition, and be granted only the ability to communicate with that hostname (sketched below).</p><p>And because the plugin’s needs are declared statically, upfront, it is always clear at install time exactly what the plugin is asking permission to do. A platform or administrator could define rules for what plugins are or aren’t allowed to be installed by certain groups of users, based on what permissions they request, rather than maintaining an allowlist of approved or safe plugins.</p>
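            <p>As a sketch of what that hostname-scoped grant might look like (the <code>network</code> field here is hypothetical; check the EmDash plugin docs for the actual manifest shape):</p>
            <pre><code>import { definePlugin } from "emdash";

export default () =&gt;
  definePlugin({
    id: "webhook-on-publish",
    version: "1.0.0",
    capabilities: ["read:content"],
    // Hypothetical field: the plugin asks to talk to exactly one hostname,
    // and is granted nothing beyond that.
    network: ["hooks.example.com"],
    hooks: {
      "content:afterSave": async (event, ctx) =&gt; {
        await fetch("https://hooks.example.com/notify", {
          method: "POST",
          body: JSON.stringify({ id: event.content.id }),
        });
      },
    },
  });</code></pre>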
    <div>
      <h3>Solving plugin security means solving marketplace lock-in</h3>
      <a href="#solving-plugin-security-means-solving-marketplace-lock-in">
        
      </a>
    </div>
    <p>WordPress plugin security is such a real risk that WordPress.org <a href="https://developer.wordpress.org/plugins/wordpress-org/plugin-developer-faq/#where-do-i-submit-my-plugin"><u>manually reviews and approves each plugin</u></a> in its marketplace. At the time of writing, that review queue is over 800 plugins long, and takes at least two weeks to traverse. The vulnerability surface area of WordPress plugins is so wide that in practice, all parties rely on marketplace reputation, ratings and reviews. And because WordPress plugins run in the same execution context as WordPress itself and are so deeply intertwined with WordPress code, some argue they must carry forward WordPress’ GPL license.</p><p>These realities combine to create a chilling effect on developers building plugins, and on platforms hosting WordPress sites.</p><p>Plugin security is the root of this problem. Marketplace businesses provide trust when parties otherwise cannot easily trust each other. In the case of the WordPress marketplace, the plugin security risk is so large and probable that many of your customers can only reasonably trust your plugin via the marketplace. But in order to be part of the marketplace, your code must be licensed in a way that forces you to give it away for free everywhere other than that marketplace. You are locked in.</p><p>EmDash plugins have two important properties that mitigate this marketplace lock-in:</p><ol><li><p><b>Plugins can have any license</b>: they run independently of EmDash and share no code. It’s the plugin author’s choice.</p></li><li><p><b>Plugin code runs independently in a secure sandbox</b>: a plugin can be provided to an EmDash site, and trusted, without the EmDash site ever seeing the code.</p></li></ol><p>The first part is straightforward — as the plugin author, you choose what license you want, the same way you can when publishing to npm, PyPI, Packagist or any other registry. It’s an open ecosystem for all, and it is up to the community, not the EmDash project, what license you use for plugins and themes.</p><p>The second part is where EmDash’s plugin architecture breaks free of the centralized marketplace.</p><p>Developers can rely far less on a third-party marketplace having vetted a plugin when deciding whether to use or trust it. Consider the example plugin above that sends emails after content is saved; the plugin declares three things:</p><ul><li><p>It only runs on the <code>content:afterSave</code> hook</p></li><li><p>It has the <code>read:content</code> capability</p></li><li><p>It has the <code>email:send</code> capability</p></li></ul><p>The plugin can have tens of thousands of lines of code in it, but unlike a WordPress plugin that has access to everything and can talk to the public Internet, the person adding the plugin knows exactly what access they are granting to it. The clearly defined boundaries allow you to make informed decisions about security risks, and to zoom in on the specific risks that relate directly to the capabilities the plugin is given.</p><p>The more that sites and platforms can trust the security model to provide constraints, the more they can trust plugins, and break free of the centralized control of marketplaces and reputation. Put another way: if you trust that food safety is enforced in your city, you’ll be adventurous and try new places. If you have to worry that there might be a staple in your soup, you’ll be consulting Google before every new place you try, and it’s harder for everyone to open new restaurants.</p>
    <div>
      <h3>Every EmDash site has x402 support built in — charge for access to content</h3>
      <a href="#every-emdash-site-has-x402-support-built-in-charge-for-access-to-content">
        
      </a>
    </div>
    <p>The business model of the web <a href="https://blog.cloudflare.com/content-independence-day-no-ai-crawl-without-compensation/"><u>is at risk</u></a>, particularly for content creators and publishers. The old way of making content widely accessible, allowing all clients free access in exchange for traffic, breaks when there is no human looking at a site to advertise to, and the client is instead their agent accessing the web on their behalf. Creators need ways to continue to make money in this new world of agents, and to build new kinds of websites that serve what people’s agents need and will pay for. Decades ago, a new wave of creators created websites that became great businesses (often using WordPress to power them), and a similar opportunity exists today.</p><p><a href="https://www.x402.org/"><u>x402</u></a> is an open, neutral standard for Internet-native payments. It lets anyone on the Internet easily charge, and any client pay on-demand, on a pay-per-use basis. A client, such as an agent, sends an HTTP request and receives an HTTP 402 Payment Required status code. In response, the client pays for access on-demand, and the server can let the client through to the requested content.</p><p>EmDash has built-in support for x402. This means anyone with an EmDash site can charge for access to their content without requiring subscriptions and with zero engineering work. All you need to do is configure which content should require payment, set how much to charge, and provide a wallet address. The request/response flow ends up looking like this:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3IKfYGHF6Pgi3jQf1ERRQC/48815ffec3e204f4f2c6f7a40f232a93/image4.png" />
          </figure><p>Every EmDash site has a built-in business model for the AI era.</p>
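    <p>Here is a minimal sketch of the client side of that flow. The <code>X-PAYMENT</code> header comes from the x402 spec; how a payment payload gets signed is wallet-specific and elided here, so treat the details as illustrative:</p>
            <pre><code>// Fetch a URL, paying on-demand if the server answers 402 Payment Required.
async function fetchWithPayment(
  url: string,
  signPayment: (requirements: unknown) =&gt; Promise&lt;string&gt;
): Promise&lt;Response&gt; {
  const first = await fetch(url);
  if (first.status !== 402) return first; // content is free: done

  // The 402 body lists accepted schemes, the price, and the wallet address.
  const requirements = await first.json();

  // Retry the request with a signed payment payload attached.
  return fetch(url, {
    headers: { "X-PAYMENT": await signPayment(requirements) },
  });
}</code></pre>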
    <div>
      <h3>Solving scale-to-zero for WordPress hosting platforms</h3>
      <a href="#solving-scale-to-zero-for-wordpress-hosting-platforms">
        
      </a>
    </div>
    <p>WordPress is not serverless: it requires provisioning and managing servers, scaling them up and down like a traditional web application. To maximize performance, and to be able to handle traffic spikes, there’s no avoiding the need to pre-provision instances and run some amount of idle compute, or share resources in ways that limit performance. This is particularly true for sites with content that must be server rendered and cannot be cached.</p><p>EmDash is different: it’s built to run on serverless platforms, and make the most out of the <a href="https://developers.cloudflare.com/workers/reference/how-workers-works/"><u>V8 isolate architecture</u></a> of Cloudflare’s open source runtime <a href="https://github.com/cloudflare/workerd"><u>workerd</u></a>. On an incoming request, the Workers runtime instantly spins up an isolate to execute code and serve a response. It scales back down to zero if there are no requests. And it <a href="https://blog.cloudflare.com/workers-pricing-scale-to-zero/"><u>only bills for CPU time</u></a> (time spent doing actual work).</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3yIX0whveiJ7xQ9P20TeyA/84462e6ec58cab27fbd6bf1703efeabc/image7.png" />
          </figure><p>You can run EmDash anywhere, on any Node.js server — but on Cloudflare you can run millions of instances of EmDash using <a href="https://developers.cloudflare.com/cloudflare-for-platforms/"><u>Cloudflare for Platforms</u></a> that each instantly scale fully to zero or up to as many RPS as you need to handle, using the exact same network and runtime that the biggest websites in the world rely on.</p><p>Beyond cost optimizations and performance benefits, we’ve bet on this architecture at Cloudflare in part because we believe in having low cost and free tiers, and that everyone should be able to build websites that scale. We’re excited to help platforms extend the benefits of this architecture to their own customers, both big and small.</p>
    <div>
      <h3>Modern frontend theming and architecture via Astro</h3>
      <a href="#modern-frontend-theming-and-architecture-via-astro">
        
      </a>
    </div>
    <p>EmDash is powered by Astro, the web framework for content-driven websites. To create an EmDash theme, you create an Astro project that includes:</p><ul><li><p><b>Pages</b>: Astro routes for rendering content (homepage, blog posts, archives, etc.)</p></li><li><p><b>Layouts:</b> Shared HTML structure</p></li><li><p><b>Components:</b> Reusable UI elements (navigation, cards, footers)</p></li><li><p><b>Styles:</b> CSS or Tailwind configuration</p></li><li><p><b>A seed file:</b> JSON that tells the CMS what content types and fields to create</p></li></ul><p>This makes creating themes familiar to frontend developers, who are <a href="https://npm-stat.com/charts.html?package=astro&amp;from=2024-01-01&amp;to=2026-03-30"><u>increasingly choosing Astro</u></a>, and to LLMs, which are already trained on Astro.</p><p>WordPress themes, though incredibly flexible, carry a lot of the same security risks as plugins, and the more popular and commonplace your theme, the bigger a target it is. WordPress themes integrate through <code>functions.php</code>, an all-encompassing execution environment that makes a theme both incredibly powerful and potentially dangerous. EmDash themes, like EmDash plugins, turn this expectation on its head: your theme can never perform database operations.</p>
    <div>
      <h3>An AI Native CMS — MCP, CLI, and Skills for EmDash</h3>
      <a href="#an-ai-native-cms-mcp-cli-and-skills-for-emdash">
        
      </a>
    </div>
    <p>The least fun part about working with any CMS is doing the rote migration of content: finding and replacing strings, migrating custom fields from one format to another, renaming, reordering and moving things around. This is either boring, repetitive work, or requires one-off scripts and “single-use” plugins and tools that are usually neither fun to write nor to use.</p><p>EmDash is designed to be managed programmatically by your AI agents. It provides the context and the tools that your agents need, including:</p><ol><li><p><b>Agent Skills:</b> Each EmDash instance includes <a href="https://agentskills.io/home"><u>Agent Skills</u></a> that describe to your agent the capabilities EmDash can provide to plugins, the hooks that can trigger plugins, <a href="https://github.com/emdash-cms/emdash/blob/main/skills/creating-plugins/SKILL.md"><u>guidance on how to structure a plugin</u></a>, and even <a href="https://github.com/emdash-cms/emdash/blob/main/skills/wordpress-theme-to-emdash/SKILL.md"><u>how to port legacy WordPress themes to EmDash natively</u></a>. When you give an agent an EmDash codebase, EmDash provides everything the agent needs to be able to customize your site in the way you need.</p></li><li><p><b>EmDash CLI:</b> The <a href="https://github.com/emdash-cms/emdash/blob/main/docs/src/content/docs/reference/cli.mdx"><u>EmDash CLI</u></a> enables your agent to interact programmatically with your local or remote instance of EmDash. You can <a href="https://github.com/emdash-cms/emdash/blob/main/docs/src/content/docs/reference/cli.mdx#media-upload-file"><u>upload media</u></a>, <a href="https://github.com/emdash-cms/emdash/blob/main/docs/src/content/docs/reference/cli.mdx#emdash-search"><u>search for content</u></a>, <a href="https://github.com/emdash-cms/emdash/blob/main/docs/src/content/docs/reference/cli.mdx#schema-create-collection"><u>create and manage schemas</u></a>, and do the same set of things you can do in the Admin UI.</p></li><li><p><b>Built-in MCP Server:</b> Every EmDash instance provides its own remote Model Context Protocol (MCP) server, allowing you to do the same set of things you can do in the Admin UI.</p></li></ol>
    <div>
      <h3>Pluggable authentication, with Passkeys by default</h3>
      <a href="#pluggable-authentication-with-passkeys-by-default">
        
      </a>
    </div>
    <p>EmDash uses passkey-based authentication by default, meaning there are no passwords to leak and no brute-force vectors to defend against. User management includes familiar role-based access control out of the box: administrators, editors, authors, and contributors, each scoped strictly to the actions they need. Authentication is pluggable, so you can set EmDash up to work with your SSO provider, and automatically provision access based on IdP metadata.</p>
    <div>
      <h3>Import your WordPress sites to EmDash</h3>
      <a href="#import-your-wordpress-sites-to-emdash">
        
      </a>
    </div>
    <p>You can import an existing WordPress site by either going to WordPress admin and exporting a WXR file, or by installing the <a href="https://github.com/emdash-cms/wp-emdash/tree/main/plugins/emdash-exporter"><u>EmDash Exporter plugin</u></a> on a WordPress site, which configures a secure endpoint that is only exposed to you, and protected by a WordPress Application Password you control. Migrating content takes just a few minutes, and automatically brings any attached media into EmDash’s media library.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/SUFaWUIoEFSN2z9rclKZW/28870489d502cff34e35ab3b59f19eae/image1.png" />
          </figure><p>Creating custom content types on WordPress that are not a Post or a Page has meant installing heavy plugins like Advanced Custom Fields, and squeezing the result into a crowded WordPress posts table. EmDash does things differently: you can define a schema directly in the admin panel, which will create entirely new EmDash collections for you, each stored separately in the database. On import, you can use the same capabilities to take any custom post type from WordPress and create an EmDash content type from it.</p><p>For bespoke blocks, you can use the <a href="https://github.com/emdash-cms/emdash/blob/main/skills/creating-plugins/references/block-kit.md"><u>EmDash Block Kit Agent Skill</u></a> to instruct your agent of choice to build them for EmDash.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5xutdF9nvHYMYlN6XfqRGu/1db0e0d73327e926d606f92fdd7aabec/image3.png" />
          </figure>
    <div>
      <h3>Try it</h3>
      <a href="#try-it">
        
      </a>
    </div>
    <p>EmDash is a v0.1.0 preview. We’d love for you to try it and give feedback, and we welcome contributions to the <a href="https://github.com/emdash-cms/emdash/"><u>EmDash GitHub repository</u></a>.</p><p>If you’re just playing around and want to first understand what’s possible — try out the admin interface in the <a href="https://emdashcms.com/"><u>EmDash Playground</u></a>.</p><p>To create a new EmDash site locally, via the CLI, run:</p><p><code>npm create emdash@latest</code></p><p>Or you can do the same via the Cloudflare dashboard below:</p><a href="https://deploy.workers.cloudflare.com/?url=https://github.com/emdash-cms/templates/tree/main/blog-cloudflare"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p></p><p>We’re excited to see what you build, and if you're active in the WordPress community, as a hosting platform, a plugin or theme author, or otherwise — we’d love to hear from you. Email us at emdash@cloudflare.com, and tell us what you’d like to see from the EmDash project.</p><p>If you want to stay up to date with major EmDash developments, you can leave your email address <a href="https://forms.gle/ofE1LYRYxkpAPqjE7"><u>here</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Open Source]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">64rkKr9jewVmxagIFgbwY4</guid>
            <dc:creator>Matt “TK” Taylor</dc:creator>
            <dc:creator>Matt Kane</dc:creator>
        </item>
        <item>
            <title><![CDATA[How we use Abstract Syntax Trees (ASTs) to turn Workflows code into visual diagrams ]]></title>
            <link>https://blog.cloudflare.com/workflow-diagrams/</link>
            <pubDate>Fri, 27 Mar 2026 13:00:00 GMT</pubDate>
            <description><![CDATA[ Workflows are now visualized via step diagrams in the dashboard. Here’s how we translate your TypeScript code into a visual representation of the workflow.  ]]></description>
            <content:encoded><![CDATA[ <p><a href="https://www.cloudflare.com/developer-platform/products/workflows/"><u>Cloudflare Workflows</u></a> is a durable execution engine that lets you chain steps, retry on failure, and persist state across long-running processes. Developers use Workflows to power background agents, manage data pipelines, build human-in-the-loop approval systems, and more.</p><p>Last month, we <a href="https://developers.cloudflare.com/changelog/post/2026-02-03-workflows-visualizer/"><u>announced</u></a> that every workflow deployed to Cloudflare now has a complete visual diagram in the dashboard.</p><p>We built this because being able to visualize your applications is more important now than ever before. Coding agents are writing code that you may or may not be reading. However, the shape of what gets built still matters: how the steps connect, where they branch, and what's actually happening.</p><p>If you've seen diagrams from visual workflow builders before, those usually work from something declarative: JSON configs, YAML, drag-and-drop. However, Cloudflare Workflows are just code. They can include <a href="https://developers.cloudflare.com/workflows/build/workers-api/"><u>Promises, Promise.all, loops, and conditionals</u></a>, and they can be nested in functions or classes. This dynamic execution model makes rendering a diagram a bit more complicated.</p><p>We use Abstract Syntax Trees (ASTs) to statically derive the graph, tracking <code>Promise</code> and <code>await</code> relationships to understand what runs in parallel, what blocks, and how the pieces connect. </p><p>Keep reading to learn how we built these diagrams, or deploy your first workflow and see the diagram for yourself.</p><a href="https://deploy.workers.cloudflare.com/?url=https://github.com/cloudflare/templates/tree/main/workflows-starter-template"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p></p><p>Here’s an example of a diagram generated from Cloudflare Workflows code:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/44NnbqiNda2vgzIEneHQ3W/044856325693fbeb75ed1ab38b4db1c2/image1.png" />
          </figure>
    <div>
      <h3>Dynamic workflow execution</h3>
      <a href="#dynamic-workflow-execution">
        
      </a>
    </div>
    <p>Generally, workflow engines can execute according to either dynamic or sequential (static) execution order. Sequential execution might seem like the more intuitive solution: trigger workflow → step A → step B → step C, where step B starts executing immediately after the engine completes step A, and so forth.</p><p><a href="https://developers.cloudflare.com/workflows/"><u>Cloudflare Workflows</u></a> follow the dynamic execution model. Since workflows are just code, the steps execute as the runtime encounters them. When the runtime discovers a step, that step gets passed over to the workflow engine, which manages its execution. The steps are not inherently sequential unless awaited — the engine executes all unawaited steps in parallel. This way, you can write your workflow code as flow control without additional wrappers or directives. Here’s how the handoff works:</p><ol><li><p>An <i>engine</i>, which is a “supervisor” Durable Object for that instance, spins up. The engine is responsible for the logic of the actual workflow execution. </p></li><li><p>The engine triggers a <a href="https://developers.cloudflare.com/cloudflare-for-platforms/workers-for-platforms/how-workers-for-platforms-works/#user-workers"><u>user worker</u></a> via <a href="https://developers.cloudflare.com/cloudflare-for-platforms/workers-for-platforms/configuration/dynamic-dispatch/"><u>dynamic dispatch</u></a>, passing control over to the Workers runtime.</p></li><li><p>When the runtime encounters a <code>step.do</code>, it passes the execution back to the engine.</p></li><li><p>The engine executes the step, persists the result (or throws an error, if applicable), and triggers the user Worker again.</p></li></ol><p>With this architecture, the engine does not inherently “know” the order of the steps that it is executing — but for a diagram, the order of steps becomes crucial information. The challenge here lies in getting the vast majority of workflows translated accurately into a diagnostically helpful graph; with the diagrams in beta, we will continue to iterate and improve on these representations.</p>
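    <p>As a minimal sketch of this model (using the same Workflows API shown in the examples later in this post), two steps declared without <code>await</code> run in parallel, and awaiting their combined promise is what sequences the rest of the workflow:</p>
            <pre><code>export class ParallelSketch extends WorkflowEntrypoint&lt;Env, Params&gt; {
  async run(event: WorkflowEvent&lt;Params&gt;, step: WorkflowStep) {
    // Neither step is awaited at declaration, so the engine
    // executes both in parallel as the runtime discovers them.
    const a = step.do("fetch a", async () =&gt; "a");
    const b = step.do("fetch b", async () =&gt; "b");

    // Awaiting the combined promise is the sequencing point.
    const [resultA, resultB] = await Promise.all([a, b]);

    // This step only runs once both parallel steps have resolved.
    await step.do("combine", async () =&gt; resultA + resultB);
  }
}</code></pre>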
    <div>
      <h3>Parsing the code</h3>
      <a href="#parsing-the-code">
        
      </a>
    </div>
    <p>Fetching the script at <a href="https://developers.cloudflare.com/workers/get-started/guide/#4-deploy-your-project"><u>deploy time</u></a>, instead of run time, allows us to parse the workflow in its entirety to statically generate the diagram. </p><p>Taking a step back, here is the life of a workflow deployment:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1zoOCYji26ahxzh594VavQ/63ad96ae033653ffc7fd98df01ea6e27/image5.png" />
          </figure><p>To create the diagram, we fetch the script after it has been bundled by the internal configuration service that deploys Workers (step 2 under Workflow deployment). Then, we use a parser to create an abstract syntax tree (AST) representing the workflow, and our internal service generates and traverses an intermediate graph with all WorkflowEntrypoints and calls to workflow steps. We render the diagram based on the final result from our API. </p><p>When a Worker is deployed, the configuration service bundles (using <a href="https://esbuild.github.io/"><u>esbuild</u></a> by default) and minifies the code <a href="https://developers.cloudflare.com/workers/wrangler/configuration/#inheritable-keys"><u>unless specified otherwise</u></a>. This presents another challenge — while Workflows in TypeScript follow an intuitive pattern, their minified JavaScript (JS) can be dense and indigestible. There are also different ways that code can be minified, depending on the bundler. </p><p>Here’s an example of Workflow code that shows <b>agents executing in parallel:</b></p>
            <pre><code>const summaryPromise = step.do(
         `summary agent (loop ${loop})`,
         async () =&gt; {
           return runAgentPrompt(
             this.env,
             SUMMARY_SYSTEM,
             buildReviewPrompt(
               'Summarize this text in 5 bullet points.',
               draft,
               input.context
             )
           );
         }
       );
        const correctnessPromise = step.do(
         `correctness agent (loop ${loop})`,
         async () =&gt; {
           return runAgentPrompt(
             this.env,
             CORRECTNESS_SYSTEM,
             buildReviewPrompt(
               'List correctness issues and suggested fixes.',
               draft,
               input.context
             )
           );
         }
       );
        const clarityPromise = step.do(
         `clarity agent (loop ${loop})`,
         async () =&gt; {
           return runAgentPrompt(
             this.env,
             CLARITY_SYSTEM,
             buildReviewPrompt(
               'List clarity issues and suggested fixes.',
               draft,
               input.context
             )
           );
         }
       );</code></pre>
            <p>Bundled with <a href="https://rspack.rs/"><u>rspack</u></a>, a snippet of the minified code looks like this:</p>
            <pre><code>class pe extends e{async run(e,t){de("workflow.run.start",{instanceId:e.instanceId});const r=await t.do("validate payload",async()=&gt;{if(!e.payload.r2Key)throw new Error("r2Key is required");if(!e.payload.telegramChatId)throw new Error("telegramChatId is required");return{r2Key:e.payload.r2Key,telegramChatId:e.payload.telegramChatId,context:e.payload.context?.trim()}}),s=await t.do("load source document from r2",async()=&gt;{const e=await this.env.REVIEW_DOCUMENTS.get(r.r2Key);if(!e)throw new Error(`R2 object not found: ${r.r2Key}`);const t=(await e.text()).trim();if(!t)throw new Error("R2 object is empty");return t}),n=Number(this.env.MAX_REVIEW_LOOPS??"5"),o=this.env.RESPONSE_TIMEOUT??"7 days",a=async(s,i,c)=&gt;{if(s&gt;n)return le("workflow.loop.max_reached",{instanceId:e.instanceId,maxLoops:n}),await t.do("notify max loop reached",async()=&gt;{await se(this.env,r.telegramChatId,`Review stopped after ${n} loops for ${e.instanceId}. Start again if you still need revisions.`)}),{approved:!1,loops:n,finalText:i};const h=t.do(`summary agent (loop ${s})`,async()=&gt;te(this.env,"You summarize documents. Keep the output short, concrete, and factual.",ue("Summarize this text in 5 bullet points.",i,r.context)))...</code></pre>
            <p>Or, bundled with <a href="https://vite.dev/"><u>vite</u></a>, here is a minified snippet:</p>
            <pre><code>class ht extends pe {
  async run(e, r) {
    b("workflow.run.start", { instanceId: e.instanceId });
    const s = await r.do("validate payload", async () =&gt; {
      if (!e.payload.r2Key)
        throw new Error("r2Key is required");
      if (!e.payload.telegramChatId)
        throw new Error("telegramChatId is required");
      return {
        r2Key: e.payload.r2Key,
        telegramChatId: e.payload.telegramChatId,
        context: e.payload.context?.trim()
      };
    }), n = await r.do(
      "load source document from r2",
      async () =&gt; {
        const i = await this.env.REVIEW_DOCUMENTS.get(s.r2Key);
        if (!i)
          throw new Error(`R2 object not found: ${s.r2Key}`);
        const c = (await i.text()).trim();
        if (!c)
          throw new Error("R2 object is empty");
        return c;
      }
    ), o = Number(this.env.MAX_REVIEW_LOOPS ?? "5"), l = this.env.RESPONSE_TIMEOUT ?? "7 days", a = async (i, c, u) =&gt; {
      if (i &gt; o)
        return H("workflow.loop.max_reached", {
          instanceId: e.instanceId,
          maxLoops: o
        }), await r.do("notify max loop reached", async () =&gt; {
          await J(
            this.env,
            s.telegramChatId,
            `Review stopped after ${o} loops for ${e.instanceId}. Start again if you still need revisions.`
          );
        }), {
          approved: !1,
          loops: o,
          finalText: c
        };
      const h = r.do(
        `summary agent (loop ${i})`,
        async () =&gt; _(
          this.env,
          et,
          K(
            "Summarize this text in 5 bullet points.",
            c,
            s.context
          )
        )
      )...</code></pre>
            <p>Minified code can get pretty gnarly — and depending on the bundler, it can get gnarly in a bunch of different directions.</p><p>We needed a way to parse the various forms of minified code quickly and precisely. We decided <code>oxc-parser</code> from the <a href="https://oxc.rs/"><u>JavaScript Oxidation Compiler</u></a> (OXC) was perfect for the job. We first tested this idea with a container running Rust. Every script ID was sent to a <a href="https://developers.cloudflare.com/queues/"><u>Cloudflare Queue</u></a>, after which messages were popped and sent to the container to process. Once we confirmed this approach worked, we moved to a Worker written in Rust. Workers supports running <a href="https://developers.cloudflare.com/workers/languages/rust/"><u>Rust via WebAssembly</u></a>, and the package was small enough to make this straightforward.</p><p>The Rust Worker is responsible for first converting the minified JS into AST node types, then converting those AST node types into the graphical version of the workflow that is rendered on the dashboard. To do this, we generate a graph of pre-defined <a href="https://developers.cloudflare.com/workflows/build/visualizer/"><u>node types</u></a> for each workflow and translate it into our graph representation through a series of node mappings.</p>
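            <p>To give a flavor of what that node mapping involves, here is an illustrative TypeScript sketch (the real implementation is the Rust Worker described above; the helper and its names are ours, not part of the product). Because minifiers rename the <code>step</code> variable to something like <code>t</code> or <code>r</code>, matching a step call in an ESTree-style AST keys off the method name rather than the receiver:</p>
            <pre><code>// Illustrative sketch only: recognizing Workflows step calls in an AST.
type AstNode = { type: string; [key: string]: any };

const STEP_METHODS = ["do", "sleep", "sleepUntil", "waitForEvent"];

function isStepCall(node: AstNode): boolean {
  // Matches calls shaped like `t.do("name", fn)` regardless of what
  // the bundler renamed the `step` variable to.
  return (
    node.type === "CallExpression" &amp;&amp;
    node.callee?.type === "MemberExpression" &amp;&amp;
    STEP_METHODS.includes(node.callee.property?.name)
  );
}</code></pre>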
    <div>
      <h3>Rendering the diagram</h3>
      <a href="#rendering-the-diagram">
        
      </a>
    </div>
    <p>There were two challenges to rendering a diagram version of the workflow: how to track step and function relationships correctly, and how to define the workflow node types as simply as possible while covering all the surface area.</p><p>To guarantee that step and function relationships are tracked correctly, we needed to collect both the function and step names. As we discussed earlier, the engine only has information about the steps, but a step may be dependent on a function, or vice versa. For example, developers might wrap steps in functions or define functions as steps. They could also call steps within a function that come from different <a href="https://blog.cloudflare.com/workers-javascript-modules/"><u>modules</u></a>, or rename steps.</p><p>Although the parser clears the initial hurdle by giving us the AST, we still have to decide how to traverse it. Some code patterns require additional creativity. Take functions: within a <code>WorkflowEntrypoint</code>, there can be functions that call steps directly, indirectly, or not at all. Consider <code>functionA</code>, which contains <code>console.log(await functionB(), await functionC())</code>, where <code>functionB</code> calls a <code>step.do()</code>. In that case, both <code>functionA</code> and <code>functionB</code> should be included on the workflow diagram; however, <code>functionC</code> should not (a minimal sketch of this case appears after the loop and branching examples below). To catch all functions which include direct or indirect step calls, we create a subgraph for each function and check whether it contains a step call itself or whether it calls another function which might. Those subgraphs are represented by a function node, which contains all of its relevant nodes. If a function node is a leaf of the graph, meaning it has no direct or indirect workflow steps within it, it is trimmed from the final output.</p><p>We check for other patterns as well, including a static list of steps from which we can infer the workflow diagram, and variables defined in up to ten different ways. If your script contains multiple workflows, we follow a similar pattern to the subgraphs created for functions, abstracted one level higher.</p><p>For every AST node type, we had to consider every way it could be used inside of a workflow: loops, branches, promises, parallels, awaits, arrow functions… the list goes on. Even within these paths, there are dozens of possibilities. Consider just a few of the possible ways to loop:</p>
            <pre><code>// for...of
for (const item of items) {
	await step.do(`process ${item}`, async () =&gt; item);
}
// while
while (shouldContinue) {
	await step.do('poll', async () =&gt; getStatus());
}
// map
await Promise.all(
	items.map((item) =&gt; step.do(`map ${item}`, async () =&gt; item)),
);
// forEach
await items.forEach(async (item) =&gt; {
	await step.do(`each ${item}`, async () =&gt; item);
});</code></pre>
            <p>And beyond looping, how to handle branching:</p>
            <pre><code>// switch / case
switch (action.type) {
	case 'create':
		await step.do('handle create', async () =&gt; {});
		break;
	default:
		await step.do('handle unknown', async () =&gt; {});
		break;
}

// if / else if / else
if (status === 'pending') {
	await step.do('pending path', async () =&gt; {});
} else if (status === 'active') {
	await step.do('active path', async () =&gt; {});
} else {
	await step.do('fallback path', async () =&gt; {});
}

// ternary operator
await (cond
	? step.do('ternary true branch', async () =&gt; {})
	: step.do('ternary false branch', async () =&gt; {}));

// nullish coalescing with step on RHS
const myStepResult =
	variableThatCanBeNullUndefined ??
	(await step.do('nullish fallback step', async () =&gt; 'default'));

// try/catch with finally
try {
	await step.do('try step', async () =&gt; {});
} catch (_e) {
	await step.do('catch step', async () =&gt; {});
} finally {
	await step.do('finally step', async () =&gt; {});
}</code></pre>
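            <p>To make the function-tracking case from earlier concrete, here is a minimal sketch (the function names are illustrative): <code>functionB</code> reaches a step directly, <code>functionA</code> reaches one indirectly through <code>functionB</code>, and <code>functionC</code> never does, so its subgraph is trimmed as a leaf:</p>
            <pre><code>export class FunctionTrackingSketch extends WorkflowEntrypoint&lt;Env, Params&gt; {
  async run(event: WorkflowEvent&lt;Params&gt;, step: WorkflowStep) {
    // Direct step call: functionB appears in the diagram.
    const functionB = async () =&gt;
      step.do("inner step", async () =&gt; "from a step");

    // No step call, direct or indirect: functionC is trimmed.
    const functionC = async () =&gt; "plain value";

    // Indirect step call via functionB: functionA appears too.
    const functionA = async () =&gt; {
      console.log(await functionB(), await functionC());
    };

    await functionA();
  }
}</code></pre>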
            <p>Our goal was to create a concise API that communicates what developers need to know without overcomplicating it. But converting a workflow into a diagram meant accounting for every possible pattern (whether it follows best practices or not) and edge case. As we discussed earlier, each step is not explicitly sequential, by default, to any other step. If a workflow does not utilize <code>await</code> and <code>Promise.all()</code>, we assume that the steps will execute in the order in which they are encountered. But if a workflow includes <code>await</code>, <code>Promise</code>, or <code>Promise.all()</code>, we need a way to track those relationships.</p><p>We decided on tracking execution order, where each node has a <code>starts:</code> and <code>resolves:</code> field. The <code>starts</code> and <code>resolves</code> indices tell us when a promise starts executing and when it resolves, relative to the first promise that started without being immediately awaited. This correlates to vertical positioning in the diagram UI (i.e., all steps with <code>starts:1</code> will be inline). If steps are awaited when they are declared, then <code>starts</code> and <code>resolves</code> will be undefined, and the workflow will execute in the order of the steps’ appearance to the runtime.</p><p>While parsing, when we encounter an unawaited <code>Promise</code> or <code>Promise.all()</code>, we mark that node (or nodes) with an entry number, surfaced in the <code>starts</code> field. If we encounter an <code>await</code> on that promise, the entry number is incremented by one and saved as the exit number (which is the value in <code>resolves</code>). This allows us to know which promises run at the same time and when they’ll complete in relation to each other.</p>
            <pre><code>export class ImplicitParallelWorkflow extends WorkflowEntrypoint&lt;Env, Params&gt; {
 async run(event: WorkflowEvent&lt;Params&gt;, step: WorkflowStep) {
   const branchA = async () =&gt; {
     const a = step.do("task a", async () =&gt; "a"); //starts 1
     const b = step.do("task b", async () =&gt; "b"); //starts 1
     const c = await step.waitForEvent("task c", { type: "my-event", timeout: "1 hour" }); //starts 1 resolves 2
     await step.do("task d", async () =&gt; JSON.stringify(c)); //starts 2 resolves 3
     return Promise.all([a, b]); //resolves 3
   };

   const branchB = async () =&gt; {
     const e = step.do("task e", async () =&gt; "e"); //starts 1
     const f = step.do("task f", async () =&gt; "f"); //starts 1
     return Promise.all([e, f]); //resolves 2
   };

   await Promise.all([branchA(), branchB()]);

   await step.sleep("final sleep", 1000);
 }
}</code></pre>
            <p>You can see the steps’ alignment in the diagram:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6EZJ38J3H55yH0OnT11vgg/6dde06725cd842725ee3af134b1505c0/image3.png" />
          </figure><p>After accounting for all of those patterns, we settled on the following list of node types: 	</p>
            <pre><code>| StepSleep
| StepDo
| StepWaitForEvent
| StepSleepUntil
| LoopNode
| ParallelNode
| TryNode
| BlockNode
| IfNode
| SwitchNode
| StartNode
| FunctionCall
| FunctionDef
| BreakNode;</code></pre>
            <p>Here are a few samples of API output for different behaviors: </p><p><code>function</code> call:</p>
            <pre><code>{
  "functions": {
    "runLoop": {
      "name": "runLoop",
      "nodes": []
    }
  }
}</code></pre>
            <p><code>if</code> condition branching to <code>step.do</code>:</p>
            <pre><code>{
  "type": "if",
  "branches": [
    {
      "condition": "loop &gt; maxLoops",
      "nodes": [
        {
          "type": "step_do",
          "name": "notify max loop reached",
          "config": {
            "retries": {
              "limit": 5,
              "delay": 1000,
              "backoff": "exponential"
            },
            "timeout": 10000
          },
          "nodes": []
        }
      ]
    }
  ]
}</code></pre>
            <p><code>parallel</code> with <code>step.do</code> and <code>waitForEvent</code>:</p>
            <pre><code>{
  "type": "parallel",
  "kind": "all",
  "nodes": [
    {
      "type": "step_do",
      "name": "correctness agent (loop ${...})",
      "config": {
        "retries": {
          "limit": 5,
          "delay": 1000,
          "backoff": "exponential"
        },
        "timeout": 10000
      },
      "nodes": [],
      "starts": 1
    },
...
    {
      "type": "step_wait_for_event",
      "name": "wait for user response (loop ${...})",
      "options": {
        "event_type": "user-response",
        "timeout": "unknown"
      },
      "starts": 3,
      "resolves": 4
    }
  ]
}</code></pre>
            
    <div>
      <h3>What’s next</h3>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>Ultimately, the goal of these Workflow diagrams is to serve as a full-service debugging tool. That means you’ll be able to:</p><ul><li><p>Trace an execution through the graph in real time</p></li><li><p>Discover errors, wait for human-in-the-loop approvals, and skip steps for testing</p></li><li><p>Access visualizations in local development</p></li></ul><p>Check out the diagrams on your <a href="https://dash.cloudflare.com/?to=/:account/workers/workflows"><u>Workflow overview pages</u></a>. If you have any feature requests or notice any bugs, share your feedback directly with the Cloudflare team by joining the <a href="https://discord.cloudflare.com/"><u>Cloudflare Developers community on Discord</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Workflows]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">4HOWpzOgT3eVU2wFa4adFU</guid>
            <dc:creator>André Venceslau</dc:creator>
            <dc:creator>Mia Malden</dc:creator>
        </item>
        <item>
            <title><![CDATA[Powering the agents: Workers AI now runs large models, starting with Kimi K2.5]]></title>
            <link>https://blog.cloudflare.com/workers-ai-large-models/</link>
            <pubDate>Thu, 19 Mar 2026 19:53:16 GMT</pubDate>
            <description><![CDATA[ Kimi K2.5 is now on Workers AI, helping you power agents entirely on Cloudflare’s Developer Platform. Learn how we optimized our inference stack and reduced inference costs for internal agent use cases.  ]]></description>
            <content:encoded><![CDATA[ <p>We're making Cloudflare the best place for building and deploying agents. But reliable agents aren't built on prompts alone; they require a robust, coordinated infrastructure of underlying primitives. </p><p>At Cloudflare, we have been building these primitives for years: <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a> for state persistence, <a href="https://developers.cloudflare.com/workflows/"><u>Workflows</u></a> for long running tasks, and <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/worker-loader/"><u>Dynamic Workers</u></a> or <a href="https://developers.cloudflare.com/sandbox/"><u>Sandbox</u></a> containers for secure execution. Powerful abstractions like the <a href="https://developers.cloudflare.com/agents/"><u>Agents SDK</u></a> are designed to help you build agents on top of Cloudflare’s Developer Platform.</p><p>But these primitives only provided the execution environment. The agent still needed a model capable of powering it. </p><p>Starting today, Workers AI is officially in the big models game. We now offer frontier open-source models on our AI inference platform. We’re starting by releasing <a href="https://www.kimi.com/blog/kimi-k2-5"><u>Moonshot AI’s Kimi K2.5</u></a> model <a href="https://developers.cloudflare.com/workers-ai/models/kimi-k2.5"><u>on Workers AI</u></a>. With a full 256k context window and support for multi-turn tool calling, vision inputs, and structured outputs, the Kimi K2.5 model is excellent for all kinds of agentic tasks. By bringing a frontier-scale model directly into the Cloudflare Developer Platform, we’re making it possible to run the entire agent lifecycle on a single, unified platform.</p><p>The heart of an agent is the AI model that powers it, and that model needs to be smart, with high reasoning capabilities and a large context window. Workers AI now runs those models.</p>
    <div>
      <h2>The price-performance sweet spot</h2>
      <a href="#the-price-performance-sweet-spot">
        
      </a>
    </div>
    <p>We spent the last few weeks testing Kimi K2.5 as the engine for our internal development tools. Within our <a href="https://opencode.ai/"><u>OpenCode</u></a> environment, Cloudflare engineers use Kimi as a daily driver for agentic coding tasks. We have also integrated the model into our automated code review pipeline; you can see this in action via our public code review agent, <a href="https://github.com/ask-bonk/ask-bonk"><u>Bonk</u></a>, on Cloudflare GitHub repos. In production, the model has proven to be a fast, efficient alternative to larger proprietary models without sacrificing quality.</p><p>Serving Kimi K2.5 began as an experiment, but it quickly became critical once we saw how the model performs and how cost-efficient it is. As an illustrative example: we have an agent that does security reviews of Cloudflare’s codebases. This agent processes over 7B tokens per day, and using Kimi, it has caught more than 15 confirmed issues in a single codebase. Doing some rough math, if we had run this agent on a mid-tier proprietary model, we would have spent $2.4M a year for this single use case, on a single codebase. Running this agent with Kimi K2.5 cost just a fraction of that: we cut costs by 77% simply by making the switch to Workers AI.</p><p>As AI adoption increases, we are seeing a fundamental shift not only in how engineering teams operate, but also in how individuals operate. It is becoming increasingly common for people to have a personal agent like <a href="https://openclaw.ai/"><u>OpenClaw</u></a> running 24/7. The volume of inference is skyrocketing.</p><p>This rise in personal and coding agents means that cost is no longer a secondary concern; it is the primary blocker to scaling. When every employee has multiple agents processing hundreds of thousands of tokens per hour, the math for proprietary models stops working. Enterprises will look to transition to open-source models that offer frontier-level reasoning without the proprietary price tag. Workers AI is here to facilitate this shift, providing everything from serverless endpoints for a personal agent to dedicated instances powering autonomous agents across an entire organization.</p>
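    <p>Spelling out that rough math from the security-review example (these are illustrative, order-of-magnitude figures derived from the numbers above, not a price sheet):</p>
            <pre><code>// Rough, illustrative math from the example above:
const tokensPerYear = 7e9 * 365;                    // ≈ 2.6T tokens/year
const proprietaryCost = 2_400_000;                  // ≈ $2.4M/year on a mid-tier proprietary model
const workersAiCost = proprietaryCost * (1 - 0.77); // ≈ $552k/year after the 77% cut</code></pre>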
    <div>
      <h2>The large model inference stack</h2>
      <a href="#the-large-model-inference-stack">
        
      </a>
    </div>
    <p>Workers AI has served models, including LLMs, since its launch two years ago, but we’ve historically prioritized smaller models. Part of the reason was that for some time, open-source LLMs fell far behind the models from frontier model labs. This changed with models like Kimi K2.5, but to serve this type of very large LLM, we had to make changes to our inference stack. We wanted to share with you some of what goes on behind the scenes to support a model like Kimi.</p><p>Our serving stack for Kimi K2.5 is built on top of our proprietary <a href="https://blog.cloudflare.com/cloudflares-most-efficient-ai-inference-engine/"><u>Infire inference engine</u></a>, and we’ve been working on custom kernels to optimize how we serve the model. Custom kernels improve the model’s performance and GPU utilization, unlocking gains that would otherwise go unclaimed if you were just running the model out of the box. There are also multiple techniques and hardware configurations that can be leveraged to serve a large model. Developers typically use a combination of data, tensor, and expert parallelization techniques to optimize model performance. Strategies like disaggregated prefill, in which you separate the prefill and generation stages onto different machines for better throughput or higher GPU utilization, are also important. Implementing these techniques and incorporating them into the inference stack takes a lot of dedicated experience to get right.</p><p>Workers AI has already done the experimentation with serving techniques to yield excellent throughput on Kimi K2.5. A lot of this does not come out of the box when you self-host an open-source model. The benefit of using a platform like Workers AI is that you don’t need to be a Machine Learning Engineer, a DevOps expert, or a Site Reliability Engineer to do the optimizations required to host it: we’ve already done the hard part; you just need to call an API.</p>
    <div>
      <h2>Beyond the model — platform improvements for agentic workloads</h2>
      <a href="#beyond-the-model-platform-improvements-for-agentic-workloads">
        
      </a>
    </div>
    <p>In concert with this launch, we’ve also improved our platform and are releasing several new features to help you build better agents.</p>
    <div>
      <h3>Prefix caching and surfacing cached tokens</h3>
      <a href="#prefix-caching-and-surfacing-cached-tokens">
        
      </a>
    </div>
    <p>When you work with agents, you are likely sending a large number of input tokens as part of the context: this could be detailed system prompts, tool definitions, MCP server tools, or entire codebases. Inputs can be as large as the model context window, so in theory, you could be sending requests with almost 256k input tokens. That’s a lot of tokens.</p><p>When an LLM processes a request, the request is broken down into two stages: the prefill stage processes input tokens and the output stage generates output tokens. These stages are usually sequential, where input tokens have to be fully processed before you can generate output tokens. This means that sometimes the GPU is not fully utilized while the model is doing prefill.</p><p>With multi-turn conversations, when you send a new prompt, the client sends all the previous prompts, tools, and context from the session to the model as well. The delta between consecutive requests is usually just a few new lines of input; all the other context has already gone through the prefill stage during a previous request. This is where prefix caching helps. Instead of doing prefill on the entire request, we can cache the input tensors from a previous request, and only do prefill on the new input tokens. This saves a lot of time and compute from the prefill stage, which means a faster Time to First Token (TTFT) and a higher Tokens Per Second (TPS) throughput as you’re not blocked on prefill.</p><p>Workers AI has always done prefix caching, but we are now surfacing cached tokens as a usage metric and offering a discount on cached tokens compared to input tokens. (Pricing can be found on the <a href="https://developers.cloudflare.com/workers-ai/models/kimi-k2.5/"><u>model page</u></a>.) We also have new techniques for you to leverage in order to get a higher prefix cache hit rate, reducing your costs.</p>
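    <p>As an illustration (the variable names and prompt contents here are placeholders), consecutive turns in a session share everything except the newest messages, so only the tail of the second request needs fresh prefill:</p>
            <pre><code>// Turn 1: the full context goes through prefill once.
const turn1 = [
  { role: "system", content: LONG_SYSTEM_PROMPT }, // placeholder
  { role: "user", content: "Summarize this codebase." },
];

// Turn 2: everything before the final message matches what the model
// has already processed, so it can be served from the prefix cache;
// only the new user message needs prefill.
const turn2 = [
  ...turn1,
  { role: "assistant", content: firstAnswer }, // placeholder
  { role: "user", content: "Now list the riskiest modules." },
];</code></pre>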
    <div>
      <h3>New session affinity header for higher cache hit rates</h3>
      <a href="#new-session-affinity-header-for-higher-cache-hit-rates">
        
      </a>
    </div>
    <p>In order to route to the same model instance and take advantage of prefix caching, we use a new <code>x-session-affinity</code> header. When you send this header, you’ll improve your cache hit ratio, leading to more cached tokens and subsequently, faster TTFT, TPS, and lower inference costs.</p><p>You can pass the new header like below, with a unique string per session or per agent. Some clients like OpenCode implement this automatically out of the box. Our <a href="https://github.com/cloudflare/agents-starter"><u>Agents SDK starter</u></a> has already set up the wiring to do this for you, too.</p>
            <pre><code>curl -X POST \
"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/moonshotai/kimi-k2.5" \
  -H "Authorization: Bearer {API_TOKEN}" \
  -H "Content-Type: application/json" \
  -H "x-session-affinity: ses_12345678" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is prefix caching and why does it matter?"
      }
    ],
    "max_tokens": 2400,
    "stream": true
  }'
</code></pre>
            
    <div>
      <h3>Redesigned async APIs</h3>
      <a href="#redesigned-async-apis">
        
      </a>
    </div>
    <p>Serverless inference is really hard. With a pay-per-token business model, it’s cheaper on a single-request basis because you don’t need to pay for entire GPUs to service your requests. But there’s a trade-off: you have to contend with other people’s traffic and capacity constraints, and there’s no strict guarantee that your request will be processed. This is not unique to Workers AI — it’s the case across serverless model providers, as the frequent news reports of overloaded providers and service disruptions show. While we always strive to serve your request and have built-in autoscaling and rebalancing, there are hard limitations (like hardware) that make this a challenge.</p><p>For volumes of requests that would exceed synchronous rate limits, you can submit batches of inferences to be completed asynchronously. We’re introducing a revamped Asynchronous API, which means that for asynchronous use cases, you won’t run into Out of Capacity errors and inference will execute durably at some point. Our async API looks more like flex processing than a batch API: we process requests in the async queue as long as we have headroom in our model instances. In internal testing, our async requests usually execute within 5 minutes, but this will depend on what live traffic looks like. As we bring Kimi to the public, we will tune our scaling accordingly, but the async API is the best way to make sure you don’t run into capacity errors in durable workflows. This is perfect for use cases that are not real-time, such as code scanning agents or research agents.</p><p>Workers AI previously had an asynchronous API, but we’ve recently revamped the systems under the hood. We now rely on a pull-based system versus the historical push-based system, allowing us to pull in queued requests as soon as we have capacity. We’ve also added better controls to tune the throughput of async requests, monitoring GPU utilization in real time and pulling in async requests when utilization is low, so that critical synchronous requests get priority while still processing asynchronous requests efficiently.</p><p>To use the asynchronous API, you would send your requests as seen below. We also have a way to <a href="https://developers.cloudflare.com/workers-ai/platform/event-subscriptions/"><u>set up event notifications</u></a> so that you can know when the inference is complete instead of polling for the result.</p>
            <pre><code>// (1.) Push a batch of requests into the queue
// by passing queueRequest: true
let res = await env.AI.run("@cf/moonshotai/kimi-k2.5", {
  "requests": [{
    "messages": [{
      "role": "user",
      "content": "Tell me a joke"
    }]
  }, {
    "messages": [{
      "role": "user",
      "content": "Explain the Pythagoras theorem"
    }]
  }, ...{&lt;add more requests in a batch&gt;}]
}, {
  queueRequest: true,
});

// (2.) Grab the request id from the queued response
let request_id;
if (res &amp;&amp; res.request_id) {
  request_id = res.request_id;
}

// (3.) Poll the status using the request id
res = await env.AI.run("@cf/moonshotai/kimi-k2.5", {
  request_id: request_id
});

if (res &amp;&amp; (res.status === "queued" || res.status === "running")) {
  // retry by polling again
  ...
} else {
  return Response.json(res); // This will contain the final completed response
}
</code></pre>
            
    <div>
      <h2>Try it out today</h2>
      <a href="#try-it-out-today">
        
      </a>
    </div>
    <p>Get started with Kimi K2.5 on Workers AI today. You can read our developer docs to find <a href="https://developers.cloudflare.com/workers-ai/models/kimi-k2.5/"><u>model information and pricing</u></a>, and learn how to take advantage of <a href="https://developers.cloudflare.com/workers-ai/features/prompt-caching/"><u>prompt caching via session affinity headers</u></a> and the <a href="https://developers.cloudflare.com/workers-ai/features/batch-api/"><u>asynchronous API</u></a>. The <a href="https://github.com/cloudflare/agents-starter"><u>Agents SDK starter</u></a> also now uses Kimi K2.5 as its default model. You can also <a href="https://opencode.ai/docs/providers/"><u>connect to Kimi K2.5 on Workers AI via Opencode</u></a>. For a live demo, try it in our <a href="https://playground.ai.cloudflare.com/"><u>playground</u></a>.</p><p>And if this set of problems around serverless inference, ML optimizations, and GPU infrastructure sounds interesting to you — <a href="https://job-boards.greenhouse.io/cloudflare/jobs/6297179?gh_jid=6297179"><u>we’re hiring</u></a>!</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/36JzF0zePj2z7kZQK8Q2fg/73b0a7206d46f0eef170ffd1494dc4b3/BLOG-3247_2.png" />
          </figure><p></p> ]]></content:encoded>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Agents]]></category>
            <guid isPermaLink="false">1wSO33KRdd5aUPAlSVDiqU</guid>
            <dc:creator>Michelle Chen</dc:creator>
            <dc:creator>Kevin Flansburg</dc:creator>
            <dc:creator>Ashish Datta</dc:creator>
            <dc:creator>Kevin Jain</dc:creator>
        </item>
        <item>
            <title><![CDATA[The truly programmable SASE platform]]></title>
            <link>https://blog.cloudflare.com/programmable-sase/</link>
            <pubDate>Mon, 02 Mar 2026 06:00:00 GMT</pubDate>
            <description><![CDATA[ As the only SASE platform with a native developer stack, we’re giving you the tools to build custom, real-time security logic and integrations directly at the edge. ]]></description>
            <content:encoded><![CDATA[ <p>Every organization approaches security through a unique lens, shaped by their tooling, requirements, and history. No two environments look the same, and none stay static for long. We believe the platforms that protect them shouldn't be static either.</p><p>Cloudflare built our global network to be programmable by design, so we can help organizations unlock this flexibility and freedom. In this post, we’ll go deeper into what programmability means, and how <a href="https://developers.cloudflare.com/cloudflare-one/"><u>Cloudflare One</u></a>, our SASE platform, helps customers architect their security and networking with our building blocks to meet their unique and custom needs.</p>
    <div>
      <h2>What programmability actually means</h2>
      <a href="#what-programmability-actually-means">
        
      </a>
    </div>
    <p>The term programmability has become diluted by the industry. Most security vendors claim programmability because they have public APIs, documented Terraform providers, webhooks, and alerting. That’s great, and Cloudflare offers all of those things too.</p><p>These foundational capabilities provide customization, infrastructure-as-code, and security operations automation, but they're table stakes. With traditional programmability, you can configure a webhook to send an alert to Slack when a policy triggers.</p><p>But the true value of programmability is something different. It is the ability to intercept a security event, enrich it with external context, and act on it in real time. Say a user attempts to access a regulated application containing sensitive financial data. Before the request completes, you query your learning management system to verify the user has completed the required compliance training. If their certification has expired, or they never completed it, access is denied, and they are redirected to the training portal. The policy did not just trigger an alert — it made the decision. </p>
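    <p>Here is a minimal sketch of that pattern as a Worker. The LMS endpoint, its response shape, and the training-portal URL are hypothetical; the <code>Cf-Access-Authenticated-User-Email</code> header is what Cloudflare Access adds to requests it has already authenticated:</p>
            <pre><code>export default {
  async fetch(request, env) {
    // Identity of the authenticated user, added by Cloudflare Access.
    const email = request.headers.get("Cf-Access-Authenticated-User-Email");

    // Hypothetical LMS API call to check compliance-training status.
    const lms = await fetch(
      `https://lms.example.com/api/certifications?user=${encodeURIComponent(email)}`,
      { headers: { Authorization: `Bearer ${env.LMS_API_TOKEN}` } }
    );
    const { compliant } = await lms.json();

    // Certification expired or missing: deny and redirect to training.
    if (!compliant) {
      return Response.redirect("https://training.example.com/compliance", 302);
    }

    // Otherwise, pass the request through to the protected application.
    return fetch(request);
  },
};</code></pre>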
    <div>
      <h2>Building the most programmable SASE platform</h2>
      <a href="#building-the-most-programmable-sase-platform">
        
      </a>
    </div>
    <p>The Cloudflare global network spans more than 330 cities across the globe and operates within approximately 50 milliseconds of 95% of the Internet-connected population. This network runs every service on every server in every data center. That means our <a href="https://blog.cloudflare.com/cloudflare-sase-gartner-magic-quadrant-2025/"><u>industry-leading SASE platform</u></a> and <a href="https://www.cloudflare.com/lp/gartner-magic-quadrant-cnap-2025/"><u>Developer Platform</u></a> run side by side, on the same metal, making our Cloudflare services both composable and programmable. </p><p>When you use Cloudflare to protect your external web properties, you are using the same network, the same tools, and the same primitives as when you secure your users, devices, and private networks with Cloudflare One. Those are also the same primitives you use when you build and deploy full-stack applications on our <a href="https://www.cloudflare.com/developer-platform/products/"><u>Developer Platform</u></a>. They are designed to work together — not because they were integrated after the fact, but because they were never separate to begin with.</p><p>By design, this allows customers to extend policy decisions with custom logic in real time. You can call an external risk API, inject dynamic headers, or validate browser attributes. You can route traffic based on your business logic without adding latency or standing up separate infrastructure. Standalone <a href="https://www.cloudflare.com/learning/access-management/what-is-sase/"><u>SASE</u></a> providers without their own compute platform require you to deploy automation in a separate cloud, manually configure webhooks, and accept the round-trip latency and management overhead of stitching together disconnected systems. With Cloudflare, your <a href="https://workers.cloudflare.com/"><u>Worker</u></a> augments inline SASE services like Access to enforce custom policies, at the edge, in milliseconds.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3PiutZ0tTvG7uFxBiAARwl/1231223aacc84fc635b77450df48a4ec/image2.png" />
          </figure>
    <div>
      <h2>What programmability unlocks</h2>
      <a href="#what-programmability-unlocks">
        
      </a>
    </div>
    <p>At its core, every security gateway operates on the same fundamental model. Traffic flows from sources, through policies, to destinations. The policies are where things get interesting, but in most platforms, your options are limited to predefined actions: allow, block, isolate, or quarantine.</p><p>We think there is a better way. What if you could invoke custom logic instead? </p><p>Rather than predefined actions, you could: </p><ul><li><p>Dynamically inject headers based on user identity claims</p></li><li><p>Call external risk engines for a real-time verdict before allowing access</p></li><li><p>Enforce access controls based on location and working hours</p></li></ul><p>Today, customers can already do many of these things with Cloudflare. And we are strengthening the integration between our <a href="https://www.cloudflare.com/sase/"><u>SASE</u></a> and <a href="https://www.cloudflare.com/developer-platform/"><u>Developer Platform</u></a> to make this even easier. Programmability extensions, like the ones listed above, will be natively integrated into Cloudflare One, enabling customers to build real-time, custom logic into their security and networking policies. Inspect a request and make a decision in milliseconds. Or run a Worker on a schedule to analyze user activity and update policies accordingly, such as adding users to a high-risk list based on signals from an external system.</p><p>We are building this around the concept of actions: both managed and custom. Managed actions will provide templates for common scenarios like IT service management integrations, redirects, and compliance automation. Custom actions allow you to define your own logic entirely. When a Gateway HTTP policy matches, instead of being limited to allow, block, or isolate, you can invoke a Cloudflare Worker directly. Your code runs at the edge, in real time, with full access to the request context. </p>
    <div>
      <h2>How customers are building today</h2>
      <a href="#how-customers-are-building-today">
        
      </a>
    </div>
    <p>While we are improving this experience, many customers are already using Cloudflare One and Developer Platform this way today. Here is a simple example that illustrates what you can do with this programmability. </p>
    <div>
      <h3>Automated device session revocation</h3>
      <a href="#automated-device-session-revocation">
        
      </a>
    </div>
    <p>The problem: A customer wanted to enforce periodic re-authentication for their Cloudflare One Client users, similar to how traditional VPNs require users to re-authenticate every few hours. Cloudflare's pre-defined session controls are designed around per-application policies, not global time-based expiration.</p><p>The solution: A scheduled Cloudflare Worker that queries the Devices API, identifies devices that have been inactive longer than a specified threshold, and revokes their registrations, forcing users to re-authenticate via their identity provider.</p>
            <pre><code>export default {
  async scheduled(event, env, ctx) {
    const API_TOKEN = env.API_TOKEN;
    const ACCOUNT_ID = env.ACCOUNT_ID;
    const REVOKE_INTERVAL_MINUTES = parseInt(env.REVOKE_INTERVAL_MINUTES); // Reuse for inactivity threshold
    const DRY_RUN = env.DRY_RUN === 'true';

    const headers = {
      'Authorization': `Bearer ${API_TOKEN}`,
      'Content-Type': 'application/json'
    };

    let cursor = '';
    let allDevices = [];

    // Fetch all registrations with cursor-based pagination
    while (true) {
      let url = `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/devices/registrations?per_page=100`;
      if (cursor) {
        url += `&amp;cursor=${cursor}`;
      }

      const devicesResponse = await fetch(url, { headers });
      const devicesData = await devicesResponse.json();
      if (!devicesData.success) {
        console.error('Failed to fetch registrations:', devicesData.errors);
        return;
      }

      allDevices = allDevices.concat(devicesData.result);

      // Extract next cursor (adjust if your response uses a different field, e.g., devicesData.result_info.cursor)
      cursor = devicesData.cursor || '';
      if (!cursor) break;
    }

    const now = new Date();

    for (const device of allDevices) {
      const lastSeen = new Date(device.last_seen_at);
      const minutesInactive = (now - lastSeen) / (1000 * 60);

      if (minutesInactive &gt; REVOKE_INTERVAL_MINUTES) {
        console.log(`Registration ${device.id} inactive for ${minutesInactive} minutes.`);

        if (DRY_RUN) {
          console.log(`Dry run: Would delete registration ${device.id}`);
        } else {
          const deleteResponse = await fetch(
            `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/devices/registrations/${device.id}`,
            { method: 'DELETE', headers }
          );
          const deleteData = await deleteResponse.json();
          if (deleteData.success) {
            console.log(`Deleted registration ${device.id}`);
          } else {
            console.error(`Failed to delete ${device.id}:`, deleteData.errors);
          }
        }
      }
    }
  }
};</code></pre>
            <p>Configure the Worker with environment secrets (<code>API_TOKEN</code>, <code>ACCOUNT_ID</code>, <code>REVOKE_INTERVAL_MINUTES</code>) and a cron trigger (<code>0 */4 * * *</code> for every 4 hours), and you have automated session management. Just getting a simple feature like this into a vendor’s roadmap could take months, and even longer to move into a management interface.</p><p>But with automated device session revocation, our technical specialist deployed this policy with the customer in an afternoon. It's been running in production for months.</p><p>We’ve observed countless implementations like this across Cloudflare One deployments. We’ve seen users implement coaching pages and purpose justification workflows by using our existing <a href="https://developers.cloudflare.com/cloudflare-one/traffic-policies/http-policies/#redirect"><u>redirect policies</u></a> and Workers. Other users have built custom logic that evaluates browser attributes before making policy or routing decisions. Each solves a unique problem that would otherwise require waiting for a vendor to build a specific, niche integration with a third-party system. Instead, customers are building exactly what they need, on their timeline, with logic they own.</p>
    <div>
      <h2>A programmable platform that changes the conversation</h2>
      <a href="#a-programmable-platform-that-changes-the-conversation">
        
      </a>
    </div>
    <p>We believe the future of enterprise security isn't a monolithic platform that tries to do everything. It's a composable and programmable platform that gives customers the tools and flexibility to extend it in any direction.</p><p>For security teams, we expect our platform to change the conversation. Instead of filing a feature request and hoping it makes the roadmap, you can build a tailored solution that addresses your exact requirements today. </p><p>For our partners and managed security service providers (MSSPs), our platform opens up their ability to build and deliver solutions for their specific customer base. That means industry-specific solutions, or capabilities for customers in a specific regulatory environment. Custom integrations become a competitive advantage, not a professional services engagement.</p><p>And for our customers, it means you're building on a platform that is easy to deploy and fundamentally adaptable to your most complex and changing needs. Your security platform grows with you — it doesn’t constrain you.</p>
    <div>
      <h2>What's next</h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>We're just getting started. Throughout 2026, you'll see us continue to deepen the integration between Cloudflare One and our Developer Platform. We plan to start by creating custom actions in Cloudflare Gateway that support dynamic policy enforcement. These actions can use auxiliary data stored in your organization's existing databases without the administrative or compliance challenges of migrating that data into Cloudflare. These same custom actions will also support request augmentation to pass along Cloudflare attributes to your internal systems, for better logging and access decisions in your downstream systems.  </p><p>In the meantime, the building blocks are already here. External evaluation rules, custom device posture checks, Gateway redirects, and the full power of Workers are available today. If you're not sure where to start, <a href="https://developers.cloudflare.com/cloudflare-one/"><u>our developer documentation</u></a> has guides and reference architectures for extending Cloudflare One.</p><p>We built Cloudflare on the belief that security should be ridiculously easy to use, but we also know that "easy" doesn't mean "one-size-fits-all." It means giving you the tools to build exactly what you need. We believe that’s the future of SASE. </p> ]]></content:encoded>
            <category><![CDATA[Cloudflare One]]></category>
            <category><![CDATA[Zero Trust]]></category>
            <category><![CDATA[SASE]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <guid isPermaLink="false">5XVjmkVenwJsJX1GQkMC9U</guid>
            <dc:creator>Abe Carryl</dc:creator>
        </item>
        <item>
            <title><![CDATA[We deserve a better streams API for JavaScript]]></title>
            <link>https://blog.cloudflare.com/a-better-web-streams-api/</link>
            <pubDate>Fri, 27 Feb 2026 06:00:00 GMT</pubDate>
            <description><![CDATA[ The Web streams API has become ubiquitous in JavaScript runtimes but was designed for a different era. Here's what a modern streaming API could (should?) look like. ]]></description>
            <content:encoded><![CDATA[ <p>Handling data in streams is fundamental to how we build applications. To make streaming work everywhere, the <a href="https://streams.spec.whatwg.org/"><u>WHATWG Streams Standard</u></a> (informally known as "Web streams") was designed to establish a common API to work across browsers and servers. It shipped in browsers, was adopted by Cloudflare Workers, Node.js, Deno, and Bun, and became the foundation for APIs like <a href="https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API"><u>fetch()</u></a>. It's a significant undertaking, and the people who designed it were solving hard problems with the constraints and tools they had at the time.</p><p>But after years of building on Web streams – implementing them in both Node.js and Cloudflare Workers, debugging production issues for customers and runtimes, and helping developers work through far too many common pitfalls – I've come to believe that the standard API has fundamental usability and performance issues that cannot be fixed easily with incremental improvements alone. The problems aren't bugs; they're consequences of design decisions that may have made sense a decade ago, but don't align with how JavaScript developers write code today.</p><p>This post explores some of the fundamental issues I see with Web streams and presents an alternative approach built around JavaScript language primitives that demonstrates something better is possible.</p><p>In benchmarks, this alternative can run anywhere from 2x to <i>120x</i> faster than Web streams in every runtime I've tested it on (including Cloudflare Workers, Node.js, Deno, Bun, and every major browser). The improvements are not due to clever optimizations, but fundamentally different design choices that more effectively leverage modern JavaScript language features. I'm not here to disparage the work that came before; I'm here to start a conversation about what can potentially come next.</p>
    <div>
      <h2>Where we're coming from</h2>
      <a href="#where-were-coming-from">
        
      </a>
    </div>
    <p>The Streams Standard was developed between 2014 and 2016 with an ambitious goal to provide "APIs for creating, composing, and consuming streams of data that map efficiently to low-level I/O primitives." Before Web streams, the web platform had no standard way to work with streaming data.</p><p>Node.js already had its own <a href="https://nodejs.org/api/stream.html"><u>streaming API</u></a> at the time that was ported to also work in browsers, but WHATWG chose not to use it as a starting point given that it is chartered to only consider the needs of Web browsers. Server-side runtimes only adopted Web streams later, after Cloudflare Workers and Deno each emerged with first-class Web streams support and cross-runtime compatibility became a priority.</p><p>The design of Web streams predates async iteration in JavaScript. The <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/for-await...of"><code><u>for await...of</u></code></a> syntax didn't land until <a href="https://262.ecma-international.org/9.0/"><u>ES2018</u></a>, two years after the Streams Standard was initially finalized. This timing meant the API couldn't initially leverage what would eventually become the idiomatic way to consume asynchronous sequences in JavaScript. Instead, the spec introduced its own reader/writer acquisition model, and that decision rippled through every aspect of the API.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3X0niHShBlgF4LlpWYB7eC/f0bbf35f12ecc98a3888e6e3835acf3a/1.png" />
          </figure>
    <div>
      <h4>Excessive ceremony for common operations</h4>
      <a href="#excessive-ceremony-for-common-operations">
        
      </a>
    </div>
    <p>The most common task with streams is reading them to completion. Here's what that looks like with Web streams:</p>
            <pre><code>// First, we acquire a reader that gives an exclusive lock
// on the stream...
const reader = stream.getReader();
const chunks = [];
try {
  // Second, we repeatedly call read and await on the returned
  // promise to either yield a chunk of data or indicate we're
  // done.
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    chunks.push(value);
  }
} finally {
  // Finally, we release the lock on the stream
  reader.releaseLock();
}</code></pre>
            <p>You might assume this pattern is inherent to streaming. It isn't. The reader acquisition, the lock management, and the <code>{ value, done }</code> protocol are all just design choices, not requirements. They are artifacts of how and when the Web streams spec was written: async iteration exists precisely to handle sequences that arrive over time, but it had not yet landed in the language when the spec was drafted. The complexity here is pure API overhead, not fundamental necessity.</p><p>Consider the alternative approach now that Web streams do support <code>for await...of</code>:</p>
            <pre><code>const chunks = [];
for await (const chunk of stream) {
  chunks.push(chunk);
}</code></pre>
            <p>This is better in that there is far less boilerplate, but it doesn't solve everything. Async iteration was retrofitted onto an API that wasn't designed for it, and it shows. Features like <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamBYOBReader"><u>BYOB (bring your own buffer)</u></a> reads aren't accessible through iteration. The underlying complexity of readers, locks, and controllers is still there, just hidden. When something does go wrong, or when additional features of the API are needed, developers find themselves back in the weeds of the original API: trying to understand why their stream is "locked", why <code>releaseLock()</code> didn't do what they expected, or hunting down bottlenecks in code they don't control.</p>
    <div>
      <h4>The locking problem</h4>
      <a href="#the-locking-problem">
        
      </a>
    </div>
    <p>Web streams use a locking model to prevent multiple consumers from interleaving reads. When you call <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/getReader"><code><u>getReader()</u></code></a>, the stream becomes locked. While locked, nothing else can read from the stream directly, pipe it, or even cancel it – only the code that is actually holding the reader can.</p><p>This sounds reasonable until you see how easily it goes wrong:</p>
            <pre><code>async function peekFirstChunk(stream) {
  const reader = stream.getReader();
  const { value } = await reader.read();
  // Oops — forgot to call reader.releaseLock()
  // And the reader is no longer available when we return
  return value;
}

const first = await peekFirstChunk(stream);
// TypeError: Cannot obtain lock — stream is permanently locked
for await (const chunk of stream) { /* never runs */ }</code></pre>
            <p>Forgetting <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamDefaultReader/releaseLock"><code><u>releaseLock()</u></code></a> permanently breaks the stream. The <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/locked"><code><u>locked</u></code></a> property tells you that a stream is locked, but not why, by whom, or whether the lock is even still usable. <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/pipeTo"><u>Piping</u></a> internally acquires locks, making streams unusable during pipe operations in ways that aren't obvious.</p><p>The semantics around releasing locks with pending reads were also unclear for years. If you called read() but didn't await it, then called releaseLock(), what happened? The spec was recently clarified to cancel pending reads on lock release – but implementations varied, and code that relied on the previous unspecified behavior can break.</p><p>That said, it's important to recognize that locking in itself is not bad. It serves an important purpose: ensuring that applications consume and produce data in an orderly way. The key challenge is the original manual management of locks through APIs like <code>getReader()</code> and <code>releaseLock()</code>. With the arrival of automatic lock and reader management via async iterables, dealing with locks from the user's point of view became a lot easier.</p><p>For implementers, the locking model adds a fair amount of non-trivial internal bookkeeping. Every operation must check lock state, readers must be tracked, and the interplay between locks, cancellation, and error states creates a matrix of edge cases that must all be handled correctly.</p>
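    <p>For completeness, the fix for the <code>peekFirstChunk()</code> example above is to pair every <code>getReader()</code> with a <code>try/finally</code>. A minimal sketch:</p>
            <pre><code>async function peekFirstChunk(stream) {
  const reader = stream.getReader();
  try {
    const { value } = await reader.read();
    return value;
  } finally {
    // Always release the lock, even if read() throws,
    // so the stream remains usable afterward
    reader.releaseLock();
  }
}</code></pre>
            <p>Note that the peeked chunk is still consumed; later iteration resumes at the next chunk. The pattern works, but nothing in the API nudges you toward it, and forgetting the <code>finally</code> fails silently until some other code tries to acquire the lock.</p>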
    <div>
      <h4>BYOB: complexity without payoff</h4>
      <a href="#byob-complexity-without-payoff">
        
      </a>
    </div>
    <p><a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamBYOBReader"><u>BYOB (bring your own buffer)</u></a> reads were designed to let developers reuse memory buffers when reading from streams, an important optimization intended for high-throughput scenarios. The idea is sound: instead of allocating new buffers for each chunk, you provide your own buffer and the stream fills it.</p><p>In practice (and yes, there are always exceptions to be found), BYOB is rarely used to any measurable benefit. The API is substantially more complex than default reads, requiring a separate reader type (<code>ReadableStreamBYOBReader</code>) and other specialized classes (e.g. <code>ReadableStreamBYOBRequest</code>), careful buffer lifecycle management, and understanding of <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer#transferring_arraybuffers"><code><u>ArrayBuffer</u></code><u> detachment</u></a> semantics. When you pass a buffer to a BYOB read, the buffer becomes detached – transferred to the stream – and you get back a different view over potentially different memory. This transfer-based model is error-prone and confusing:</p>
            <pre><code>const reader = stream.getReader({ mode: 'byob' });
const buffer = new ArrayBuffer(1024);
let view = new Uint8Array(buffer);

const result = await reader.read(view);
// 'view' should now be detached and unusable
// (it isn't always in every impl)
// result.value is a NEW view, possibly over different memory
view = result.value; // Must reassign</code></pre>
            <p>BYOB also can't be used with async iteration or TransformStreams, so developers who want zero-copy reads are forced back into the manual reader loop.</p><p>For implementers, BYOB adds significant complexity. The stream must track pending BYOB requests, handle partial fills, manage buffer detachment correctly, and coordinate between the BYOB reader and the underlying source. The <a href="https://github.com/web-platform-tests/wpt/tree/master/streams/readable-byte-streams"><u>Web Platform Tests for readable byte streams</u></a> include dedicated test files just for BYOB edge cases: detached buffers, bad views, response-after-enqueue ordering, and more.</p><p>BYOB ends up being complex for both users and implementers, yet sees little adoption in practice. Most developers stick with default reads and accept the allocation overhead.</p><p>Most userland implementations of custom ReadableStream instances don't bother with all the ceremony required to correctly implement both default and BYOB read support in a single stream – and for good reason. It's difficult to get right, and consuming code typically falls back on the default read path anyway. The example below shows what a "correct" implementation would need to do. It's big, complex, and error-prone – more complexity than the typical developer wants to deal with:</p>
            <pre><code>// Assume a simple in-memory byte source for illustration
let offset = 0;
const totalBytes = 1024 * 1024;

const stream = new ReadableStream({
    type: 'bytes',
    
    async pull(controller: ReadableByteStreamController) {      
      if (offset &gt;= totalBytes) {
        controller.close();
        return;
      }
      
      // Check for BYOB request FIRST
      const byobRequest = controller.byobRequest;
      
      if (byobRequest) {
        // === BYOB PATH ===
        // Consumer provided a buffer - we MUST fill it (or part of it)
        const view = byobRequest.view!;
        const bytesAvailable = totalBytes - offset;
        const bytesToWrite = Math.min(view.byteLength, bytesAvailable);
        
        // Create a view into the consumer's buffer and fill it
        // not critical but safer when bytesToWrite != view.byteLength
        const dest = new Uint8Array(
          view.buffer,
          view.byteOffset,
          bytesToWrite
        );
        
        // Fill with sequential bytes (our "data source")
        // Can be any thing here that writes into the view
        for (let i = 0; i &lt; bytesToWrite; i++) {
          dest[i] = (offset + i) &amp; 0xFF;
        }
        
        offset += bytesToWrite;
        
        // Signal how many bytes we wrote
        byobRequest.respond(bytesToWrite);
        
      } else {
        // === DEFAULT READER PATH ===
        // No BYOB request - allocate and enqueue a chunk
        const bytesAvailable = totalBytes - offset;
        const chunkSize = Math.min(1024, bytesAvailable);
        
        const chunk = new Uint8Array(chunkSize);
        for (let i = 0; i &lt; chunkSize; i++) {
          chunk[i] = (offset + i) &amp; 0xFF;
        }
        
        offset += chunkSize;
        controller.enqueue(chunk);
      }
    },
    
    cancel(reason) {
      console.log('Stream canceled:', reason);
    }
  });</code></pre>
            <p>When a host runtime provides a byte-oriented ReadableStream itself – for instance, as the <code>body</code> of a fetch <code>Response</code> – it can usually supply an optimized BYOB implementation, but even that must be capable of handling both default and BYOB reading patterns, and that requirement brings a fair amount of complexity with it.</p>
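    <p>The consuming side has its own ritual. To actually get the zero-copy benefit, the read loop must recreate the view from the returned buffer on every iteration. A sketch of the canonical pattern (<code>process()</code> stands in for whatever consumes the bytes):</p>
            <pre><code>const reader = stream.getReader({ mode: 'byob' });
let view = new Uint8Array(new ArrayBuffer(4096));

while (true) {
  const { value, done } = await reader.read(view);
  if (done) break;
  // value is a view over the filled portion of the
  // (transferred) buffer
  process(value);
  // Recreate a full-length view over the same underlying
  // memory before the next read
  view = new Uint8Array(value.buffer);
}</code></pre>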
    <div>
      <h4>Backpressure: good in theory, broken in practice</h4>
      <a href="#backpressure-good-in-theory-broken-in-practice">
        
      </a>
    </div>
    <p>Backpressure – the ability for a slow consumer to signal a fast producer to slow down – is a first-class concept in Web streams. In theory. In practice, the model has some serious flaws.</p><p>The primary signal is <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamDefaultController/desiredSize"><code><u>desiredSize</u></code></a> on the controller. It can be positive (wants data), zero (at capacity), negative (over capacity), or null (closed). Producers are supposed to check this value and stop enqueueing when it's not positive. But there's nothing enforcing this: <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamDefaultController/enqueue"><code><u>controller.enqueue()</u></code></a> always succeeds, even when desiredSize is deeply negative.</p>
            <pre><code>new ReadableStream({
  start(controller) {
    // Nothing stops you from doing this
    while (true) {
      controller.enqueue(generateData()); // desiredSize: -999999
    }
  }
});</code></pre>
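            <p>A producer that cooperates moves its work into <code>pull()</code>, which the stream only invokes while the queue has room. A minimal sketch:</p>
            <pre><code>new ReadableStream({
  pull(controller) {
    // pull() is only called while desiredSize is positive, so
    // enqueueing one chunk per call respects the highWaterMark
    controller.enqueue(generateData());
  }
}, { highWaterMark: 16 });</code></pre>
            <p>But nothing requires this shape; as far as the spec is concerned, the fire-and-forget version above is just as valid.</p>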
            <p>Stream implementations can and do ignore backpressure, and some spec-defined features explicitly break backpressure. <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/tee"><code><u>tee()</u></code></a>, for instance, creates two branches from a single stream. If one branch reads faster than the other, data accumulates in an internal buffer with no limit. A fast consumer can cause unbounded memory growth while the slow consumer catches up, and there's no way to configure this or opt out beyond canceling the slower branch.</p><p>Web streams do provide clear mechanisms for tuning backpressure behavior in the form of the <code>highWaterMark</code> option and customizable size calculations, but these are just as easy to ignore as <code>desiredSize</code>, and many applications simply fail to pay attention to them.</p><p>The same issues exist on the <code>WritableStream</code> side. A <code>WritableStream</code> has a <code>highWaterMark</code> and <code>desiredSize</code>. There is a <code>writer.ready</code> promise that producers of data are supposed to pay attention to, but often don't.</p>
            <pre><code>const writable = getWritableStreamSomehow();
const writer = writable.getWriter();

// Producers are supposed to wait for writer.ready.
// It is a promise that, when it resolves, indicates that
// the writable's internal backpressure has cleared and
// it is ok to write more data.
await writer.ready;
await writer.write(...);</code></pre>
            <p>For implementers, backpressure adds complexity without providing guarantees. The machinery to track queue sizes, compute <code>desiredSize</code>, and invoke <code>pull()</code> at the right times must all be implemented correctly. However, since these signals are advisory, all that work doesn't actually prevent the problems backpressure is supposed to solve.</p>
    <div>
      <h4>The hidden cost of promises</h4>
      <a href="#the-hidden-cost-of-promises">
        
      </a>
    </div>
    <p>The Web streams spec requires promise creation at numerous points, often in hot paths and often invisible to users. Each <code>read()</code> call doesn't just return a promise; internally, the implementation creates additional promises for queue management, <code>pull()</code> coordination, and backpressure signaling.</p><p>This overhead is mandated by the spec's reliance on promises for buffer management, completion, and backpressure signals. While some of it is implementation-specific, much of it is unavoidable if you're following the spec as written. For high-frequency streaming – video frames, network packets, real-time data – this overhead is significant.</p><p>The problem compounds in pipelines. Each <code>TransformStream</code> adds another layer of promise machinery between source and sink. The spec doesn't define synchronous fast paths, so even when data is available immediately, the promise machinery still runs.</p><p>For implementers, this promise-heavy design constrains optimization opportunities. The spec mandates specific promise resolution ordering, making it difficult to batch operations or skip unnecessary async boundaries without risking subtle compliance failures. Implementers do make many hidden internal optimizations, but these can be complicated and difficult to get right.</p><p>While I was writing this blog post, Vercel's Malte Ubl published their own <a href="https://vercel.com/blog/we-ralph-wiggumed-webstreams-to-make-them-10x-faster"><u>blog post</u></a> describing some research work Vercel has been doing around improving the performance of Node.js' Web streams implementation. In that post they discuss the same fundamental performance optimization problem that every implementation of Web streams faces:</p><blockquote><p>"Or consider pipeTo(). Each chunk passes through a full Promise chain: read, write, check backpressure, repeat. An {value, done} result object is allocated per read. Error propagation creates additional Promise branches.</p><p>None of this is wrong. These guarantees matter in the browser where streams cross security boundaries, where cancellation semantics need to be airtight, where you do not control both ends of a pipe. But on the server, when you are piping React Server Components through three transforms at 1KB chunks, the cost adds up.</p><p>We benchmarked native WebStream pipeThrough at 630 MB/s for 1KB chunks. Node.js pipeline() with the same passthrough transform: ~7,900 MB/s. That is a 12x gap, and the difference is almost entirely Promise and object allocation overhead." 
- Malte Ubl, <a href="https://vercel.com/blog/we-ralph-wiggumed-webstreams-to-make-them-10x-faster"><u>https://vercel.com/blog/we-ralph-wiggumed-webstreams-to-make-them-10x-faster</u></a></p></blockquote><p>As part of their research, they have put together a set of proposed improvements for Node.js' Web streams implementation that eliminate promises in certain code paths, yielding performance boosts of up to 10x. That only proves the point: promises, while useful, add significant overhead. As one of the core maintainers of Node.js, I am looking forward to helping Malte and the folks at Vercel get their proposed improvements landed!</p><p>In a recent update to Cloudflare Workers, I made similar kinds of modifications to an internal data pipeline, reducing the number of JavaScript promises created in certain application scenarios by up to 200x. The result was a performance improvement of several orders of magnitude in those applications.</p>
    <div>
      <h3>Real-world failures</h3>
      <a href="#real-world-failures">
        
      </a>
    </div>
    
    <div>
      <h4>Exhausting resources with unconsumed bodies</h4>
      <a href="#exhausting-resources-with-unconsumed-bodies">
        
      </a>
    </div>
    <p>When <code>fetch()</code> returns a response, the body is a <a href="https://developer.mozilla.org/en-US/docs/Web/API/Response/body"><code><u>ReadableStream</u></code></a>. If you only check the status and don't consume or cancel the body, what happens? The answer varies by implementation, but a common outcome is resource leakage.</p>
            <pre><code>async function checkEndpoint(url) {
  const response = await fetch(url);
  return response.ok; // Body is never consumed or cancelled
}

// In a loop, this can exhaust connection pools
for (const url of urls) {
  await checkEndpoint(url);
}</code></pre>
            <p>This pattern has caused connection pool exhaustion in Node.js applications using <a href="https://nodejs.org/api/globals.html#fetch"><u>undici</u></a> (the <code>fetch() </code>implementation built into Node.js), and similar issues have appeared in other runtimes. The stream holds a reference to the underlying connection, and without explicit consumption or cancellation, the connection may linger until garbage collection – which may not happen soon enough under load.</p><p>The problem is compounded by APIs that implicitly create stream branches. <a href="https://developer.mozilla.org/en-US/docs/Web/API/Request/clone"><code><u>Request.clone()</u></code></a> and <a href="https://developer.mozilla.org/en-US/docs/Web/API/Response/clone"><code><u>Response.clone()</u></code></a> perform implicit <code>tee()</code> operations on the body stream – a detail that's easy to miss. Code that clones a request for logging or retry logic may unknowingly create branched streams that need independent consumption, multiplying the resource management burden.</p><p>Now, to be certain, these types of issues <i>are</i> implementation bugs. The connection leak was definitely something that undici needed to fix in its own implementation, but the complexity of the specification does not make dealing with these types of issues easy.</p><blockquote><p>"Cloning streams in Node.js's fetch() implementation is harder than it looks. When you clone a request or response body, you're calling tee() - which splits a single stream into two branches that both need to be consumed. If one consumer reads faster than the other, data buffers unbounded in memory waiting for the slow branch. If you don't properly consume both branches, the underlying connection leaks. The coordination required between two readers sharing one source makes it easy to accidentally break the original request or exhaust connection pools. It's a simple API call with complex underlying mechanics that are difficult to get right." - Matteo Collina, Ph.D. - Platformatic Co-Founder &amp; CTO, Node.js Technical Steering Committee Chair</p></blockquote>
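    <p>The defensive pattern is to explicitly cancel any body you don't intend to read, so the connection can be released right away rather than waiting on garbage collection. A small sketch:</p>
            <pre><code>async function checkEndpoint(url) {
  const response = await fetch(url);
  // Cancel the unread body so the underlying connection
  // can be released or reused immediately
  await response.body?.cancel();
  return response.ok;
}</code></pre>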
    <div>
      <h4>Falling headlong off the tee() memory cliff</h4>
      <a href="#falling-headlong-off-the-tee-memory-cliff">
        
      </a>
    </div>
    <p><a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/tee"><code><u>tee()</u></code></a> splits a stream into two branches. It seems straightforward, but the implementation requires buffering: if one branch is read faster than the other, the data must be held somewhere until the slower branch catches up.</p>
            <pre><code>const [forHash, forStorage] = response.body.tee();

// Hash computation is fast
const hash = await computeHash(forHash);

// Storage write is slow — meanwhile, the entire stream
// may be buffered in memory waiting for this branch
await writeToStorage(forStorage);</code></pre>
            <p>The spec does not mandate buffer limits for <code>tee()</code>. And to be fair, the spec allows implementations to implement the actual internal mechanisms for <code>tee()</code> and other APIs in any way they see fit so long as the observable normative requirements of the specification are met. But if an implementation chooses to implement <code>tee()</code> in the specific way described by the streams specification, then <code>tee()</code> will come with a built-in memory management issue that is difficult to work around.</p><p>Implementations have had to develop their own strategies for dealing with this. Firefox initially used a linked-list approach that led to <code>O(n)</code> memory growth proportional to the consumption rate difference. In Cloudflare Workers, we opted to implement a shared buffer model where backpressure is signaled by the slowest consumer rather than the fastest.</p>
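    <p>When you control both consumers, one way to stay off this cliff entirely is to skip <code>tee()</code> and fan each chunk out yourself. A sketch, assuming both sinks expose a per-chunk <code>write()</code>:</p>
            <pre><code>async function fanOut(stream, hashSink, storageSink) {
  for await (const chunk of stream) {
    // The slower of the two writes gates the next read, so
    // buffering is bounded to one in-flight chunk rather than
    // the entire gap between fast and slow consumers
    await Promise.all([
      hashSink.write(chunk),
      storageSink.write(chunk),
    ]);
  }
}</code></pre>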
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5cl4vqYfaHaVXiHjLSXv0a/03a0b9fe4c9c0594e181ffee43b63998/2.png" />
          </figure>
    <div>
      <h4>Transform backpressure gaps</h4>
      <a href="#transform-backpressure-gaps">
        
      </a>
    </div>
    <p><code>TransformStream</code> creates a <code>readable/writable</code> pair with processing logic in between. The <code>transform()</code> function executes on <i>write</i>, not on read. Processing happens eagerly as data arrives, regardless of whether any consumer is ready. This causes unnecessary work when consumers are slow, and the backpressure signaling between the two sides has gaps that can cause unbounded buffering under load. The spec expects the producer of the data being transformed to pay attention to the <code>writer.ready</code> signal on the writable side of the transform, but quite often producers simply ignore it.</p><p>If the transform's <code>transform()</code> operation is synchronous and always enqueues output immediately, it never signals backpressure back to the writable side even when the downstream consumer is slow. This is a consequence of the spec design that many developers completely overlook. In browsers, where there's only a single user and typically only a small number of stream pipelines active at any given time, this type of foot gun is often of no consequence, but it has a major impact on server-side or edge performance in runtimes that serve thousands of concurrent requests.</p>
            <pre><code>const fastTransform = new TransformStream({
  transform(chunk, controller) {
    // Synchronously enqueue — this never applies backpressure
    // Even if the readable side's buffer is full, this succeeds
    controller.enqueue(processChunk(chunk));
  }
});

// Pipe a fast source through the transform to a slow sink
fastSource
  .pipeThrough(fastTransform)
  .pipeTo(slowSink);  // Buffer grows without bound</code></pre>
            <p>What TransformStreams are supposed to do is check for backpressure on the controller and use promises to communicate that back to the writer:</p>
            <pre><code>const fastTransform = new TransformStream({
  async transform(chunk, controller) {
    // The controller exposes desiredSize but no ready promise,
    // so the best a transform can do is poll until the
    // backpressure clears
    while (controller.desiredSize !== null &amp;&amp; controller.desiredSize &lt;= 0) {
      await new Promise((resolve) =&gt; setTimeout(resolve, 1));
    }

    controller.enqueue(processChunk(chunk));
  }
});</code></pre>
            <p>A difficulty here, however, is that the <code>TransformStreamDefaultController</code> does not have a ready promise mechanism like writers do, which is why the sketch above has to resort to polling until <code>controller.desiredSize</code> becomes positive again.</p><p>The problem gets worse in pipelines. When you chain multiple transforms – say, parse, transform, then serialize – each <code>TransformStream</code> has its own internal readable and writable buffers. If implementers follow the spec strictly, data cascades through these buffers in a push-oriented fashion: the source pushes to transform A, which pushes to transform B, which pushes to transform C, each accumulating data in intermediate buffers before the final consumer has even started pulling. With three transforms, you can have six internal buffers filling up simultaneously.</p><p>Developers using the streams API are expected to remember to use options like <code>highWaterMark</code> when creating their sources, transforms, and writable destinations, but often they either forget or simply choose to ignore them.</p>
            <pre><code>source
  .pipeThrough(parse)      // buffers filling...
  .pipeThrough(transform)  // more buffers filling...
  .pipeThrough(serialize)  // even more buffers...
  .pipeTo(destination);    // consumer hasn't started yet</code></pre>
            <p>Implementations have found ways to optimize transform pipelines by collapsing identity transforms, short-circuiting non-observable paths, deferring buffer allocation, or falling back to native code that does not run JavaScript at all. Deno, Bun, and Cloudflare Workers have all successfully implemented "native path" optimizations that can help eliminate much of the overhead, and Vercel's recent <a href="https://vercel.com/blog/we-ralph-wiggumed-webstreams-to-make-them-10x-faster"><u>fast-webstreams</u></a> research is working on similar optimizations for Node.js. But the optimizations themselves add significant complexity and still can't fully escape the inherently push-oriented model that TransformStream uses.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/64FcAUPYrTvOSYOPoT2FkR/cc91e0d32dd47320e8ac9d6f431a2fda/3.png" />
          </figure>
    <div>
      <h4>GC thrashing in server-side rendering</h4>
      <a href="#gc-thrashing-in-server-side-rendering">
        
      </a>
    </div>
    <p>Streaming server-side rendering (SSR) is a particularly painful case. A typical SSR stream might render thousands of small HTML fragments, each passing through the streams machinery:</p>
            <pre><code>// Each component enqueues a small chunk
function renderComponent(controller) {
  controller.enqueue(encoder.encode(`&lt;div&gt;${content}&lt;/div&gt;`));
}

// Hundreds of components = hundreds of enqueue calls
// Each one triggers promise machinery internally
for (const component of components) {
  renderComponent(controller);  // Promises created, objects allocated
}</code></pre>
            <p>Every fragment means promises created for <code>read()</code> calls, promises for backpressure coordination, intermediate buffer allocations, and <code>{ value, done } </code>result objects – most of which become garbage almost immediately.</p><p>Under load, this creates GC pressure that can devastate throughput. The JavaScript engine spends significant time collecting short-lived objects instead of doing useful work. Latency becomes unpredictable as GC pauses interrupt request handling. I've seen SSR workloads where garbage collection accounts for a substantial portion (up to and beyond 50%) of total CPU time per request. That's time that could be spent actually rendering content.</p><p>The irony is that streaming SSR is supposed to improve performance by sending content incrementally. But the overhead of the streams machinery can negate those gains, especially for pages with many small components. Developers sometimes find that buffering the entire response is actually faster than streaming through Web streams, defeating the purpose entirely.</p>
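    <p>A common mitigation – not a cure – is to coalesce small fragments before they ever touch the stream machinery, paying the per-chunk cost once per batch instead of once per component. A sketch (the names and the 16KB threshold are illustrative):</p>
            <pre><code>const encoder = new TextEncoder();
let buffered = [];
let bufferedBytes = 0;

function enqueueFragment(controller, html, flushAt = 16 * 1024) {
  const bytes = encoder.encode(html);
  buffered.push(bytes);
  bufferedBytes += bytes.byteLength;
  if (bufferedBytes &gt;= flushAt) flushFragments(controller);
}

function flushFragments(controller) {
  if (buffered.length === 0) return;
  const out = new Uint8Array(bufferedBytes);
  let offset = 0;
  for (const b of buffered) { out.set(b, offset); offset += b.byteLength; }
  controller.enqueue(out); // one enqueue instead of hundreds
  buffered = [];
  bufferedBytes = 0;
}</code></pre>
            <p>Remember to call <code>flushFragments()</code> one final time before closing the stream, or the tail of the page never leaves the buffer.</p>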
    <div>
      <h3>The optimization treadmill</h3>
      <a href="#the-optimization-treadmill">
        
      </a>
    </div>
    <p>To achieve usable performance, every major runtime has resorted to non-standard internal optimizations for Web streams. Node.js, Deno, Bun, and Cloudflare Workers have all developed their own workarounds. This is particularly true for streams wired up to system-level I/O, where much of the machinery is non-observable and can be short-circuited.</p><p>Finding these optimization opportunities can itself be a significant undertaking. It requires end-to-end understanding of the spec to identify which behaviors are observable and which can safely be elided. Even then, whether a given optimization is actually spec-compliant is often unclear. Implementers must make judgment calls about which semantics they can relax without breaking compatibility. This puts enormous pressure on runtime teams to become spec experts just to achieve acceptable performance.</p><p>These optimizations are difficult to implement, frequently error-prone, and lead to inconsistent behavior across runtimes. Bun's "<a href="https://bun.sh/docs/api/streams#direct-readablestream"><u>Direct Streams</u></a>" optimization takes a deliberately and observably non-standard approach, bypassing much of the spec's machinery entirely. Cloudflare Workers' <a href="https://developers.cloudflare.com/workers/runtime-apis/streams/transformstream/"><code><u>IdentityTransformStream</u></code></a> provides a fast-path for pass-through transforms but is Workers-specific and implements behaviors that are not standard for a <code>TransformStream</code>. Each runtime has its own set of tricks, and the natural tendency is toward non-standard solutions, because that's often the only way to make things fast.</p><p>This fragmentation hurts portability. Code that performs well on one runtime may behave differently (or poorly) on another, even though it's using "standard" APIs. The complexity burden on runtime implementers is substantial, and the subtle behavioral differences create friction for developers trying to write cross-runtime code, particularly those maintaining frameworks that must be able to run efficiently across many runtime environments.</p><p>It is also necessary to emphasize that many optimizations are only possible in parts of the spec that are unobservable to user code. The alternative, like Bun's "Direct Streams", is to intentionally diverge from the spec-defined observable behaviors. This means optimizations often feel "incomplete". They work in some scenarios but not in others, in some runtimes but not others, etc. Every such case adds to the overall unsustainable complexity of the Web streams approach, which is why most runtime implementers rarely put significant effort into further improvements to their streams implementations once the conformance tests are passing.</p><p>Implementers shouldn't need to jump through these hoops. When you find yourself needing to relax or bypass spec semantics just to achieve reasonable performance, that's a sign something is wrong with the spec itself. A well-designed streaming API should be efficient by default, not require each runtime to invent its own escape hatches.</p>
    <div>
      <h3>The compliance burden</h3>
      <a href="#the-compliance-burden">
        
      </a>
    </div>
    <p>A complex spec creates complex edge cases. The <a href="https://github.com/web-platform-tests/wpt/tree/master/streams"><u>Web Platform Tests for streams</u></a> span over 70 test files, and while comprehensive testing is a good thing, what's telling is what needs to be tested.</p><p>Consider some of the more obscure tests that implementations must pass:</p><ul><li><p>Prototype pollution defense: One test patches <code>Object.prototype.then</code> to intercept promise resolutions, then verifies that <code>pipeTo()</code> and <code>tee()</code> operations don't leak internal values through the prototype chain. This tests a security property that only exists because the spec's promise-heavy internals create an attack surface.</p></li><li><p>WebAssembly memory rejection: BYOB reads must explicitly reject ArrayBuffers backed by WebAssembly memory, which look like regular buffers but can't be transferred. This edge case exists because of the spec's buffer detachment model – a simpler API wouldn't need to handle it.</p></li><li><p>Crash regression for state machine conflicts: A test specifically checks that calling <code>byobRequest.respond()</code> after <code>enqueue()</code> doesn't crash the runtime. This sequence creates a conflict in the internal state machine – the <code>enqueue()</code> fulfills the pending read and should invalidate the <code>byobRequest</code>, but implementations must gracefully handle the subsequent <code>respond()</code> rather than corrupting memory, covering the very likely possibility that developers are not using the complex API correctly.</p></li></ul><p>These aren't contrived scenarios invented by test authors in a vacuum. They're consequences of the spec's design, and they reflect real-world bugs.</p><p>For runtime implementers, passing the WPT suite means handling intricate corner cases that most application code will never encounter. The tests encode not just the happy path but the full matrix of interactions between readers, writers, controllers, queues, strategies, and the promise machinery that connects them all.</p><p>A simpler API would mean fewer concepts, fewer interactions between concepts, and fewer edge cases to get right, resulting in more confidence that implementations actually behave consistently.</p>
    <div>
      <h3>The takeaway</h3>
      <a href="#the-takeaway">
        
      </a>
    </div>
    <p>Web streams are complex for users and implementers alike. The problems with the spec aren't bugs; they emerge from using the API exactly as designed, and they can't be fixed solely through incremental improvements. They're consequences of fundamental design choices. To improve things, we need different foundations.</p>
    <div>
      <h2>A better streams API is possible</h2>
      <a href="#a-better-streams-api-is-possible">
        
      </a>
    </div>
    <p>After implementing the Web streams spec multiple times across different runtimes and seeing the pain points firsthand, I decided it was time to explore what a better, alternative streaming API could look like if designed from first principles today.</p><p>What follows is a proof of concept: it's not a finished standard, not a production-ready library, not even necessarily a concrete proposal for something new, but a starting point for discussion that demonstrates the problems with Web streams aren't inherent to streaming itself; they're consequences of specific design choices that could be made differently. Whether this exact API is the right answer is less important than whether it sparks a productive conversation about what we actually need from a streaming primitive.</p>
    <div>
      <h3>What is a stream?</h3>
      <a href="#what-is-a-stream">
        
      </a>
    </div>
    <p>Before diving into API design, it's worth asking: what is a stream?</p><p>At its core, a stream is just a sequence of data that arrives over time. You don't have all of it at once. You process it incrementally as it becomes available.</p><p>Unix pipes are perhaps the purest expression of this idea:</p>
            <pre><code>cat access.log | grep "error" | sort | uniq -c</code></pre>
            <p>
Data flows left to right. Each stage reads input, does its work, writes output. There's no pipe reader to acquire, no controller lock to manage. If a downstream stage is slow, upstream stages naturally slow down as well. Backpressure is implicit in the model, not a separate mechanism to learn (or ignore).</p><p>In JavaScript, the natural primitive for "a sequence of things that arrive over time" is already in the language: the async iterable. You consume it with <code>for await...of</code>. You stop consuming by stopping iteration.</p><p>This is the intuition the new API tries to preserve: streams should feel like iteration, because that's what they are. The complexity of Web streams – readers, writers, controllers, locks, queuing strategies – obscures this fundamental simplicity. A better API should make the simple case simple and only add complexity where it's genuinely needed.</p>
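    <p>To make the parallel concrete, here's the <code>grep</code> stage of that pipeline sketched as a plain async generator (<code>readLines()</code> is an assumed line-by-line source):</p>
            <pre><code>// A transform is just an async generator wrapping its input;
// it only advances when the consumer asks for the next value,
// so backpressure falls out of the model for free
async function* grep(pattern, lines) {
  for await (const line of lines) {
    if (line.includes(pattern)) yield line;
  }
}

// Composition is ordinary function application
for await (const line of grep("error", readLines("access.log"))) {
  console.log(line);
}</code></pre>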
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3AUAA4bitbTOVSQg7Pd7fv/0856b44d78899dcffc4493f4146fb64f/4.png" />
          </figure>
    <div>
      <h3>Design principles</h3>
      <a href="#design-principles">
        
      </a>
    </div>
    <p>I built the proof-of-concept alternative around a different set of principles.</p>
    <div>
      <h4>Streams are iterables</h4>
      <a href="#streams-are-iterables">
        
      </a>
    </div>
    <p>No custom <code>ReadableStream</code> class with hidden internal state. A readable stream is just an <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Iteration_protocols#the_async_iterator_and_async_iterable_protocols"><code><u>AsyncIterable&lt;Uint8Array[]&gt;</u></code></a>. You consume it with <code>for await...of</code>. No readers to acquire, no locks to manage.</p>
    <div>
      <h4>Pull-through transforms</h4>
      <a href="#pull-through-transforms">
        
      </a>
    </div>
    <p>Transforms don't execute until the consumer pulls. There's no eager evaluation, no hidden buffering. Data flows on-demand from source, through transforms, to the consumer. If you stop iterating, processing stops.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4bEXBTEOHBMnCRKGA7odt5/cf51074cce3bb8b2ec1b5158c7560b68/5.png" />
          </figure>
    <div>
      <h4>Explicit backpressure</h4>
      <a href="#explicit-backpressure">
        
      </a>
    </div>
    <p>Backpressure is strict by default. When a buffer is full, writes reject rather than silently accumulating. You can configure alternative policies – block until space is available, drop oldest, drop newest – but you have to choose explicitly. No more silent memory growth.</p>
    <div>
      <h4>Batched chunks</h4>
      <a href="#batched-chunks">
        
      </a>
    </div>
    <p>Instead of yielding one chunk per iteration, streams yield <code>Uint8Array[]</code>: arrays of chunks. This amortizes the async overhead across multiple chunks, reducing promise creation and microtask latency in hot paths.</p>
    <div>
      <h4>Bytes only</h4>
      <a href="#bytes-only">
        
      </a>
    </div>
    <p>The API deals exclusively with bytes (<a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array"><code><u>Uint8Array</u></code></a>). Strings are UTF-8 encoded automatically. There's no "value stream" vs "byte stream" dichotomy. If you want to stream arbitrary JavaScript values, use async iterables directly. While the API uses <code>Uint8Array</code>, it treats chunks as opaque. There is no partial consumption, no BYOB patterns, no byte-level operations within the streaming machinery itself. Chunks go in, chunks come out, unchanged unless a transform explicitly modifies them.</p>
    <div>
      <h4>Synchronous fast paths matter</h4>
      <a href="#synchronous-fast-paths-matter">
        
      </a>
    </div>
    <p>The API recognizes that synchronous data sources are both necessary and common. The application should not be forced to always accept the performance cost of asynchronous scheduling simply because that's the only option provided. At the same time, mixing sync and async processing can be dangerous. Synchronous paths should always be an option and should always be explicit.</p>
    <div>
      <h3>The new API in action</h3>
      <a href="#the-new-api-in-action">
        
      </a>
    </div>
    
    <div>
      <h4>Creating and consuming streams</h4>
      <a href="#creating-and-consuming-streams">
        
      </a>
    </div>
    <p>In Web streams, creating a simple producer/consumer pair requires <code>TransformStream</code>, manual encoding, and careful lock management:</p>
            <pre><code>const { readable, writable } = new TransformStream();
const enc = new TextEncoder();
const writer = writable.getWriter();
await writer.write(enc.encode("Hello, World!"));
await writer.close();
writer.releaseLock();

const dec = new TextDecoder();
let text = '';
for await (const chunk of readable) {
  text += dec.decode(chunk, { stream: true });
}
text += dec.decode();</code></pre>
            <p>Even this relatively clean version requires: a <code>TransformStream</code>, manual <code>TextEncoder</code> and <code>TextDecoder</code>, and explicit lock release.</p><p>Here's the equivalent with the new API:</p>
            <pre><code>import { Stream } from 'new-streams';

// Create a push stream
const { writer, readable } = Stream.push();

// Write data — backpressure is enforced
await writer.write("Hello, World!");
await writer.end();

// Consume as text
const text = await Stream.text(readable);</code></pre>
            <p>The readable is just an async iterable. You can pass it to any function that expects one, including <code>Stream.text()</code>, which collects and decodes the entire stream.</p><p>The writer has a simple interface: <code>write()</code>, <code>writev()</code> for batched writes, <code>end()</code> to signal completion, and <code>abort()</code> for errors. That's essentially it.</p><p>The Writer is not a concrete class. Any object that implements <code>write()</code>, <code>end()</code>, and <code>abort()</code> can be a writer, making it easy to adapt existing APIs or create specialized implementations without subclassing. There's no complex <code>UnderlyingSink</code> protocol with <code>start()</code>, <code>write()</code>, <code>close()</code>, and <code>abort()</code> callbacks that must coordinate through a controller whose lifecycle and state are independent of the <code>WritableStream</code> it is bound to.</p><p>Here's a simple in-memory writer that collects all written data:</p>
            <pre><code>// A minimal writer implementation — just an object with methods
function createBufferWriter() {
  const chunks = [];
  let totalBytes = 0;
  let closed = false;

  const addChunk = (chunk) =&gt; {
    chunks.push(chunk);
    totalBytes += chunk.byteLength;
  };

  return {
    get desiredSize() { return closed ? null : 1; },

    // Async variants
    write(chunk) { addChunk(chunk); },
    writev(batch) { for (const c of batch) addChunk(c); },
    end() { closed = true; return totalBytes; },
    abort(reason) { closed = true; chunks.length = 0; },

    // Sync variants return boolean (true = accepted)
    writeSync(chunk) { addChunk(chunk); return true; },
    writevSync(batch) { for (const c of batch) addChunk(c); return true; },
    endSync() { closed = true; return totalBytes; },
    abortSync(reason) { closed = true; chunks.length = 0; return true; },

    getChunks() { return chunks; }
  };
}

// Use it
const writer = createBufferWriter();
await Stream.pipeTo(source, writer);
const allData = writer.getChunks();</code></pre>
            <p>No base class to extend, no abstract methods to implement, no controller to coordinate with. Just an object with the right shape.</p>
    <div>
      <h4>Pull-through transforms</h4>
      <a href="#pull-through-transforms">
        
      </a>
    </div>
    <p>Under the new API design, transforms should not perform any work until the data is being consumed. This is a fundamental principle.</p>
            <pre><code>// Nothing executes until iteration begins
const output = Stream.pull(source, compress, encrypt);

// Transforms execute as we iterate
for await (const chunks of output) {
  for (const chunk of chunks) {
    process(chunk);
  }
}</code></pre>
            <p><code>Stream.pull()</code> creates a lazy pipeline. The <code>compress</code> and <code>encrypt</code> transforms don't run until you start iterating output. Each iteration pulls data through the pipeline on demand.</p><p>This is fundamentally different from Web streams' <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/pipeThrough"><code><u>pipeThrough()</u></code></a>, which starts actively pumping data from the source to the transform as soon as you set up the pipe. Pull semantics mean you control when processing happens, and stopping iteration stops processing.</p><p>Transforms can be stateless or stateful. A stateless transform is just a function that takes chunks and returns transformed chunks:</p>
            <pre><code>// Stateless transform — a pure function
// Receives chunks or null (flush signal)
// Note: decoding chunk-by-chunk assumes chunks never split a
// multi-byte character; fine for illustration
const dec = new TextDecoder();
const enc = new TextEncoder();

const toUpperCase = (chunks) =&gt; {
  if (chunks === null) return null; // End of stream
  return chunks.map(chunk =&gt; enc.encode(dec.decode(chunk).toUpperCase()));
};

// Use it directly
const output = Stream.pull(source, toUpperCase);</code></pre>
            <p>Stateful transforms are simple objects with member functions that maintain state across calls:</p>
            <pre><code>// Stateful transform — a generator that wraps the source
function createLineParser() {
  // Helper to concatenate Uint8Arrays
  const concat = (...arrays) =&gt; {
    const result = new Uint8Array(arrays.reduce((n, a) =&gt; n + a.length, 0));
    let offset = 0;
    for (const arr of arrays) { result.set(arr, offset); offset += arr.length; }
    return result;
  };

  return {
    async *transform(source) {
      let pending = new Uint8Array(0);
      
      for await (const chunks of source) {
        if (chunks === null) {
          // Flush: yield any remaining data
          if (pending.length &gt; 0) yield [pending];
          continue;
        }
        
        // Concatenate pending data with new chunks
        const combined = concat(pending, ...chunks);
        const lines = [];
        let start = 0;

        for (let i = 0; i &lt; combined.length; i++) {
          if (combined[i] === 0x0a) { // newline
            lines.push(combined.slice(start, i));
            start = i + 1;
          }
        }

        pending = combined.slice(start);
        if (lines.length &gt; 0) yield lines;
      }
    }
  };
}

const output = Stream.pull(source, createLineParser());</code></pre>
            <p>For transforms that need cleanup on abort, add an abort handler:</p>
            <pre><code>// Stateful transform with resource cleanup
function createGzipCompressor() {
  // Hypothetical compression API...
  const deflate = new Deflater({ gzip: true });

  return {
    async *transform(source) {
      for await (const chunks of source) {
        if (chunks === null) {
          // Flush: finalize compression
          deflate.push(new Uint8Array(0), true);
          if (deflate.result) yield [deflate.result];
        } else {
          for (const chunk of chunks) {
            deflate.push(chunk, false);
            if (deflate.result) yield [deflate.result];
          }
        }
      }
    },
    abort(reason) {
      // Clean up compressor resources on error/cancellation
    }
  };
}</code></pre>
            <p>For implementers, there's no Transformer protocol with <code>start()</code>, <code>transform()</code>, <code>flush()</code> methods and controller coordination passed into a <code>TransformStream</code> class that has its own hidden state machine and buffering mechanisms. Transforms are just functions or simple objects: far simpler to implement and test.</p>
    <div>
      <h4>Explicit backpressure policies</h4>
      <a href="#explicit-backpressure-policies">
        
      </a>
    </div>
    <p>When a bounded buffer fills up and a producer wants to write more, there are only a few things you can do:</p><ol><li><p>Reject the write: refuse to accept more data</p></li><li><p>Wait: block until space becomes available</p></li><li><p>Discard old data: evict what's already buffered to make room</p></li><li><p>Discard new data: drop what's incoming</p></li></ol><p>That's it. Any other response is either a variation of these (like "resize the buffer," which is really just deferring the choice) or domain-specific logic that doesn't belong in a general streaming primitive. Web streams currently always choose Wait by default.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/68339c8QsvNmb7JcZ2lSDO/e52a86a9b8f52b52eb9328d5ee58f23a/6.png" />
          </figure><p>The new API makes you choose one of these four explicitly:</p><ul><li><p><code>strict</code> (default): Rejects writes when the buffer is full and too many writes are pending. Catches "fire-and-forget" patterns where producers ignore backpressure.</p></li><li><p><code>block</code>: Writes wait until buffer space is available. Use when you trust the producer to await writes properly.</p></li><li><p><code>drop-oldest</code>: Drops the oldest buffered data to make room. Useful for live feeds where stale data loses value.</p></li><li><p><code>drop-newest</code>: Discards incoming data when full. Useful when you want to process what you have without being overwhelmed.</p></li></ul>
            <pre><code>const { writer, readable } = Stream.push({
  highWaterMark: 10,
  backpressure: 'strict' // or 'block', 'drop-oldest', 'drop-newest'
});</code></pre>
            <p>No more hoping producers cooperate. The policy you choose determines what happens when the buffer fills.</p><p>Here's how each policy behaves when a producer writes faster than the consumer reads:</p>
            <pre><code>// strict: Catches fire-and-forget writes that ignore backpressure
const strict = Stream.push({ highWaterMark: 2, backpressure: 'strict' });
strict.writer.write(chunk1);  // ok (not awaited)
strict.writer.write(chunk2);  // ok (fills slots buffer)
strict.writer.write(chunk3);  // ok (queued in pending)
strict.writer.write(chunk4);  // ok (pending buffer fills)
strict.writer.write(chunk5);  // throws! too many pending writes

// block: Wait for space (unbounded pending queue)
const blocking = Stream.push({ highWaterMark: 2, backpressure: 'block' });
await blocking.writer.write(chunk1);  // ok
await blocking.writer.write(chunk2);  // ok
await blocking.writer.write(chunk3);  // waits until consumer reads
await blocking.writer.write(chunk4);  // waits until consumer reads
await blocking.writer.write(chunk5);  // waits until consumer reads

// drop-oldest: Discard old data to make room
const dropOld = Stream.push({ highWaterMark: 2, backpressure: 'drop-oldest' });
await dropOld.writer.write(chunk1);  // ok
await dropOld.writer.write(chunk2);  // ok
await dropOld.writer.write(chunk3);  // ok, chunk1 discarded

// drop-newest: Discard incoming data when full
const dropNew = Stream.push({ highWaterMark: 2, backpressure: 'drop-newest' });
await dropNew.writer.write(chunk1);  // ok
await dropNew.writer.write(chunk2);  // ok
await dropNew.writer.write(chunk3);  // silently dropped</code></pre>
            
    <div>
      <h4>Explicit multi-consumer patterns</h4>
      <a href="#explicit-multi-consumer-patterns">
        
      </a>
    </div>
    
            <pre><code>// Share with explicit buffer management
const shared = Stream.share(source, {
  highWaterMark: 100,
  backpressure: 'strict'
});

const consumer1 = shared.pull();
const consumer2 = shared.pull(decompress);</code></pre>
            <p>Instead of <code>tee()</code> with its hidden unbounded buffer, you get explicit multi-consumer primitives. <code>Stream.share()</code> is pull-based: consumers pull from a shared source, and you configure the buffer limits and backpressure policy upfront.</p><p>There's also <code>Stream.broadcast()</code> for push-based multi-consumer scenarios. Both require you to think about what happens when consumers run at different speeds, because that's a real concern that shouldn't be hidden.</p>
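    <p>As a rough sketch against the proof-of-concept API (using only the <code>share()</code> surface shown above), two consumers running at different speeds make the policy's job visible:</p>
            <pre><code>const shared = Stream.share(source, {
  highWaterMark: 4,
  backpressure: 'strict' // a full shared buffer rejects rather than grows
});

const fast = (async () =&gt; {
  for await (const chunks of shared.pull()) {
    // consumes as quickly as chunks arrive
  }
})();

const slow = (async () =&gt; {
  for await (const chunks of shared.pull()) {
    await new Promise((r) =&gt; setTimeout(r, 100)); // lags behind
  }
})();

await Promise.all([fast, slow]);</code></pre>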
    <div>
      <h4>Sync/async separation</h4>
      <a href="#sync-async-separation">
        
      </a>
    </div>
    <p>Not all streaming workloads involve I/O. When your source is in-memory and your transforms are pure functions, async machinery adds overhead without benefit: you're paying to coordinate "waiting" when nothing ever needs to wait.</p><p>The new API has complete parallel sync versions: <code>Stream.pullSync()</code>, <code>Stream.bytesSync()</code>, <code>Stream.textSync()</code>, and so on. If your source and transforms are all synchronous, you can process the entire pipeline without a single promise.</p>
            <pre><code>// Async — when source or transforms may be asynchronous
const textAsync = await Stream.text(source);

// Sync — when all components are synchronous
const textSync = Stream.textSync(source);</code></pre>
            <p>Here's a complete synchronous pipeline – compression, transformation, and consumption with zero async overhead:</p>
            <pre><code>// Synchronous source from in-memory data
const source = Stream.fromSync([inputBuffer]);

// Synchronous transforms
const compressed = Stream.pullSync(source, zlibCompressSync);
const encrypted = Stream.pullSync(compressed, aesEncryptSync);

// Synchronous consumption — no promises, no event loop trips
const result = Stream.bytesSync(encrypted);</code></pre>
            <p>The entire pipeline executes in a single call stack. No promises are created, no microtask queue scheduling occurs, and no GC pressure accumulates from short-lived async machinery. For CPU-bound workloads like parsing, compression, or transformation of in-memory data, this can be significantly faster than the equivalent Web streams code – which would force async boundaries even when every component is synchronous.</p><p>Web streams have no synchronous path. Even if your source has data ready and your transform is a pure function, you still pay for promise creation and microtask scheduling on every operation. Promises are fantastic for cases in which waiting is actually necessary, but they aren't always necessary. The new API lets you stay in sync-land when that's what you need.</p>
    <div>
      <h4>Bridging the gap between this and Web streams</h4>
      <a href="#bridging-the-gap-between-this-and-web-streams">
        
      </a>
    </div>
    <p>The async iterator based approach provides a natural bridge between this alternative and Web streams. Because a <code>ReadableStream</code> is itself async iterable, simply passing one in as input works as expected when it is set up to yield bytes:</p>
            <pre><code>const readable = getWebReadableStreamSomehow();
const input = Stream.pull(readable, transform1, transform2);
for await (const chunks of input) {
  // process chunks
}</code></pre>
            <p>When adapting back to a ReadableStream, a bit more work is required since the alternative approach yields batches of chunks, but the adaptation layer is straightforward:</p>
            <pre><code>async function* adapt(input) {
  for await (const chunks of input) {
    for (const chunk of chunks) {
      yield chunk;
    }
  }
}

const input = Stream.pull(source, transform1, transform2);
const readable = ReadableStream.from(adapt(input));</code></pre>
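    <p>Bridging in the other direction – driving a Web <code>WritableStream</code> from the new API – is similarly mechanical. A sketch of a wrapper (the helper name is mine):</p>
            <pre><code>function fromWebWritable(writable) {
  const writer = writable.getWriter();
  return {
    async write(chunk) {
      await writer.ready; // honor the writable's backpressure
      await writer.write(chunk);
    },
    async writev(batch) {
      for (const chunk of batch) await this.write(chunk);
    },
    end() { return writer.close(); },
    abort(reason) { return writer.abort(reason); },
  };
}

const sink = fromWebWritable(getWebWritableSomehow());
await Stream.pipeTo(source, sink);</code></pre>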
            
    <div>
      <h4>How this addresses the real-world failures from earlier</h4>
      <a href="#how-this-addresses-the-real-world-failures-from-earlier">
        
      </a>
    </div>
    <ul><li><p>Unconsumed bodies: Pull semantics mean nothing happens until you iterate. No hidden resource retention. If you don't consume a stream, there's no background machinery holding connections open.</p></li><li><p>The <code>tee()</code> memory cliff: <code>Stream.share()</code> requires explicit buffer configuration. You choose the <code>highWaterMark</code> and backpressure policy upfront: no more silent unbounded growth when consumers run at different speeds.</p></li><li><p>Transform backpressure gaps: Pull-through transforms execute on-demand. Data doesn't cascade through intermediate buffers; it flows only when the consumer pulls. Stop iterating, stop processing.</p></li><li><p>GC thrashing in SSR: Batched chunks (<code>Uint8Array[]</code>) amortize async overhead. Sync pipelines via <code>Stream.pullSync()</code> eliminate promise allocation entirely for CPU-bound workloads.</p></li></ul>
    <div>
      <h3>Performance</h3>
      <a href="#performance">
        
      </a>
    </div>
    <p>The design choices have performance implications. Here are benchmarks from the reference implementation of this possible alternative compared to Web streams (Node.js v24.x, Apple M1 Pro, averaged over 10 runs):</p><table><tr><td><p><b>Scenario</b></p></td><td><p><b>Alternative</b></p></td><td><p><b>Web streams</b></p></td><td><p><b>Difference</b></p></td></tr><tr><td><p>Small chunks (1KB × 5000)</p></td><td><p>~13 GB/s</p></td><td><p>~4 GB/s</p></td><td><p>~3× faster</p></td></tr><tr><td><p>Tiny chunks (100B × 10000)</p></td><td><p>~4 GB/s</p></td><td><p>~450 MB/s</p></td><td><p>~8× faster</p></td></tr><tr><td><p>Async iteration (8KB × 1000)</p></td><td><p>~530 GB/s</p></td><td><p>~35 GB/s</p></td><td><p>~15× faster</p></td></tr><tr><td><p>Chained 3× transforms (8KB × 500)</p></td><td><p>~275 GB/s</p></td><td><p>~3 GB/s</p></td><td><p><b>~80–90× faster</b></p></td></tr><tr><td><p>High-frequency (64B × 20000)</p></td><td><p>~7.5 GB/s</p></td><td><p>~280 MB/s</p></td><td><p>~25× faster</p></td></tr></table><p>The chained transform result is particularly striking: pull-through semantics eliminate the intermediate buffering that plagues Web streams pipelines. Instead of each <code>TransformStream</code> eagerly filling its internal buffers, data flows on-demand from consumer to source.</p><p>Now, to be fair, Node.js has not yet put significant effort into fully optimizing its Web streams implementation, and there is likely substantial room for improvement from focused work on the hot paths. That said, running these benchmarks in Deno and Bun also shows a significant advantage for this alternative iterator-based approach over either of their Web streams implementations.</p><p>Browser benchmarks (Chrome/Blink, averaged over 3 runs) show consistent gains as well:</p><table><tr><td><p><b>Scenario</b></p></td><td><p><b>Alternative</b></p></td><td><p><b>Web streams</b></p></td><td><p><b>Difference</b></p></td></tr><tr><td><p>Push 3KB chunks</p></td><td><p>~135k ops/s</p></td><td><p>~24k ops/s</p></td><td><p>~5–6× faster</p></td></tr><tr><td><p>Push 100KB chunks</p></td><td><p>~24k ops/s</p></td><td><p>~3k ops/s</p></td><td><p>~7–8× faster</p></td></tr><tr><td><p>3 transform chain</p></td><td><p>~4.6k ops/s</p></td><td><p>~880 ops/s</p></td><td><p>~5× faster</p></td></tr><tr><td><p>5 transform chain</p></td><td><p>~2.4k ops/s</p></td><td><p>~550 ops/s</p></td><td><p>~4× faster</p></td></tr><tr><td><p>bytes() consumption</p></td><td><p>~73k ops/s</p></td><td><p>~11k ops/s</p></td><td><p>~6–7× faster</p></td></tr><tr><td><p>Async iteration</p></td><td><p>~1.1M ops/s</p></td><td><p>~10k ops/s</p></td><td><p><b>~40–100× faster</b></p></td></tr></table><p>These benchmarks measure throughput in controlled scenarios; real-world performance depends on your specific use case. The difference between Node.js and browser gains reflects the distinct optimization paths each environment takes for Web streams.</p><p>It's worth noting that these benchmarks compare a pure TypeScript/JavaScript implementation of the new API against the native (JavaScript/C++/Rust) implementations of Web streams in each runtime. The new API's reference implementation has had no performance optimization work; the gains come entirely from the design.
A native implementation would likely show further improvement.</p><p>The gains illustrate how fundamental design choices compound: batching amortizes async overhead, pull semantics eliminate intermediate buffering, and the freedom for implementations to use synchronous fast paths when data is available immediately all contribute.</p><blockquote><p>"We’ve done a lot to improve performance and consistency in Node streams, but there’s something uniquely powerful about starting from scratch. New streams’ approach embraces modern runtime realities without legacy baggage, and that opens the door to a simpler, performant and more coherent streams model." 
- Robert Nagy, Node.js TSC member and Node.js streams contributor</p></blockquote>
    <div>
      <h2>What's next</h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>I'm publishing this to start a conversation. What did I get right? What did I miss? Are there use cases that don't fit this model? What would a migration path for this approach look like? The goal is to gather feedback from developers who've felt the pain of Web streams and have opinions about what a better API should look like.</p>
    <div>
      <h3>Try it yourself</h3>
      <a href="#try-it-yourself">
        
      </a>
    </div>
    <p>A reference implementation for this alternative approach is available now and can be found at <a href="https://github.com/jasnell/new-streams"><u>https://github.com/jasnell/new-streams</u></a>.</p><ul><li><p>API Reference: See the <a href="https://github.com/jasnell/new-streams/blob/main/API.md"><u>API.md</u></a> for complete documentation</p></li><li><p>Examples: The <a href="https://github.com/jasnell/new-streams/tree/main/samples"><u>samples directory</u></a> has working code for common patterns</p></li></ul><p>I welcome issues, discussions, and pull requests. If you've run into Web streams problems I haven't covered, or if you see gaps in this approach, let me know. But again, the idea here is not to say "Let's all use this shiny new object!"; it is to kick off a discussion that looks beyond the current status quo of Web streams and returns to first principles.</p><p>Web streams was an ambitious project that brought streaming to the web platform when nothing else existed. The people who designed it made reasonable choices given the constraints of 2014 – before async iteration, before years of production experience revealed the edge cases.</p><p>But we've learned a lot since then. JavaScript has evolved. A streaming API designed today can be simpler, more aligned with the language, and more explicit about the things that matter, like backpressure and multi-consumer behavior.</p><p>We deserve a better stream API. So let's talk about what that could look like.</p>
            <category><![CDATA[Standards]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[TypeScript]]></category>
            <category><![CDATA[Open Source]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Node.js]]></category>
            <category><![CDATA[Performance]]></category>
            <category><![CDATA[API]]></category>
            <guid isPermaLink="false">37h1uszA2vuOfmXb3oAnZr</guid>
            <dc:creator>James M Snell</dc:creator>
        </item>
        <item>
            <title><![CDATA[How we rebuilt Next.js with AI in one week]]></title>
            <link>https://blog.cloudflare.com/vinext/</link>
            <pubDate>Tue, 24 Feb 2026 20:00:00 GMT</pubDate>
            <description><![CDATA[ One engineer used AI to rebuild Next.js on Vite in a week. vinext builds up to 4x faster, produces 57% smaller bundles, and deploys to Cloudflare Workers with a single command. ]]></description>
            <content:encoded><![CDATA[ <p><sub><i>*This post was updated at 12:35 pm PT to fix a typo in the build time benchmarks.</i></sub></p><p>Last week, one engineer and an AI model rebuilt the most popular front-end framework from scratch. The result, <a href="https://github.com/cloudflare/vinext"><u>vinext</u></a> (pronounced "vee-next"), is a drop-in replacement for Next.js, built on <a href="https://vite.dev/"><u>Vite</u></a>, that deploys to Cloudflare Workers with a single command. In early benchmarks, it builds production apps up to 4x faster and produces client bundles up to 57% smaller. And we already have customers running it in production. </p><p>The whole thing cost about $1,100 in tokens.</p>
    <div>
      <h2>The Next.js deployment problem</h2>
      <a href="#the-next-js-deployment-problem">
        
      </a>
    </div>
    <p><a href="https://nextjs.org/"><u>Next.js</u></a> is the most popular React framework. Millions of developers use it. It powers a huge chunk of the production web, and for good reason. The developer experience is top-notch.</p><p>But Next.js has a deployment problem when used in the broader serverless ecosystem. The tooling is entirely bespoke: Next.js has invested heavily in Turbopack but if you want to deploy it to Cloudflare, Netlify, or AWS Lambda, you have to take that build output and reshape it into something the target platform can actually run.</p><p>If you’re thinking: “Isn’t that what OpenNext does?”, you are correct. </p><p>That is indeed the problem <a href="https://opennext.js.org/"><u>OpenNext</u></a> was built to solve. And a lot of engineering effort has gone into OpenNext from multiple providers, including us at Cloudflare. It works, but quickly runs into limitations and becomes a game of whack-a-mole. </p><p>Building on top of Next.js output as a foundation has proven to be a difficult and fragile approach. Because OpenNext has to reverse-engineer Next.js's build output, this results in unpredictable changes between versions that take a lot of work to correct. </p><p>Next.js has been working on a first-class adapters API, and we've been collaborating with them on it. It's still an early effort but even with adapters, you're still building on the bespoke Turbopack toolchain. And adapters only cover build and deploy. During development, next dev runs exclusively in Node.js with no way to plug in a different runtime. If your application uses platform-specific APIs like Durable Objects, KV, or AI bindings, you can't test that code in dev without workarounds.</p>
    <div>
      <h2>Introducing vinext </h2>
      <a href="#introducing-vinext">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7BCYnb6nCnc9oRBPQnuES5/d217b3582f4fe30597a3b4bf000d9bd7/BLOG-3194_2.png" />
          </figure><p>What if instead of adapting Next.js output, we reimplemented the Next.js API surface on <a href="https://vite.dev/"><u>Vite</u></a> directly? Vite is the build tool used by most of the front-end ecosystem outside of Next.js, powering frameworks like Astro, SvelteKit, Nuxt, and Remix. A clean reimplementation, not merely a wrapper or adapter. We honestly didn't think it would work. But it’s 2026, and the cost of building software has completely changed.</p><p>We got a lot further than we expected.</p>
            <pre><code>npm install vinext</code></pre>
            <p>Replace <code>next</code> with <code>vinext</code> in your scripts and everything else stays the same. Your existing <code>app/</code>, <code>pages/</code>, and <code>next.config.js</code> work as-is.</p>
            <pre><code>vinext dev          # Development server with HMR
vinext build        # Production build
vinext deploy       # Build and deploy to Cloudflare Workers</code></pre>
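            <p>Concretely, the swap in <code>package.json</code> is a one-word change per script (the script names below are just the usual defaults):</p>
            <pre><code>// before
"scripts": {
  "dev": "next dev",
  "build": "next build"
}

// after
"scripts": {
  "dev": "vinext dev",
  "build": "vinext build",
  "deploy": "vinext deploy"
}</code></pre>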
            <p>This is not a wrapper around Next.js and Turbopack output. It's an alternative implementation of the API surface: routing, server rendering, React Server Components, server actions, caching, middleware. All of it built on top of Vite as a plugin. Most importantly, Vite output runs on any platform thanks to the <a href="https://vite.dev/guide/api-environment"><u>Vite Environment API</u></a>.</p>
    <div>
      <h2>The numbers</h2>
      <a href="#the-numbers">
        
      </a>
    </div>
    <p>Early benchmarks are promising. We compared vinext against Next.js 16 using a shared 33-route App Router application.

Both frameworks are doing the same work: compiling, bundling, and preparing server-rendered routes. We disabled TypeScript type checking and ESLint in Next.js's build (Vite doesn't run these during builds), and used force-dynamic so Next.js doesn't spend extra time pre-rendering static routes, which would unfairly slow down its numbers. The goal was to measure only bundler and compilation speed, nothing else. Benchmarks run on GitHub CI on every merge to main. </p><p><b>Production build time:</b></p>
<div><table><colgroup>
<col></col>
<col></col>
<col></col>
</colgroup>
<thead>
  <tr>
    <th><span>Framework</span></th>
    <th><span>Mean</span></th>
    <th><span>vs Next.js</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>Next.js 16.1.6 (Turbopack)</span></td>
    <td><span>7.38s</span></td>
    <td><span>baseline</span></td>
  </tr>
  <tr>
    <td><span>vinext (Vite 7 / Rollup)</span></td>
    <td>4.64s</td>
    <td>1.6x faster</td>
  </tr>
  <tr>
    <td><span>vinext (Vite 8 / Rolldown)</span></td>
    <td>1.67s</td>
    <td>4.4x faster</td>
  </tr>
</tbody></table></div><p><b>Client bundle size (gzipped):</b></p>
<div><table><colgroup>
<col></col>
<col></col>
<col></col>
</colgroup>
<thead>
  <tr>
    <th><span>Framework</span></th>
    <th><span>Gzipped</span></th>
    <th><span>vs Next.js</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>Next.js 16.1.6</span></td>
    <td><span>168.9 KB</span></td>
    <td><span>baseline</span></td>
  </tr>
  <tr>
    <td><span>vinext (Rollup)</span></td>
    <td><span>74.0 KB</span></td>
    <td><span>56% smaller</span></td>
  </tr>
  <tr>
    <td><span>vinext (Rolldown)</span></td>
    <td><span>72.9 KB</span></td>
    <td><span>57% smaller</span></td>
  </tr>
</tbody></table></div><p>These benchmarks measure compilation and bundling speed, not production serving performance. The test fixture is a single 33-route app, not a representative sample of all production applications. We expect these numbers to evolve as all three projects continue to develop. The <a href="https://benchmarks.vinext.workers.dev"><u>full methodology and historical results</u></a> are public. Take them as directional, not definitive.</p><p>The direction is encouraging, though. Vite's architecture, and especially <a href="https://rolldown.rs/"><u>Rolldown</u></a> (the Rust-based bundler coming in Vite 8), has structural advantages for build performance that show up clearly here.</p>
    <div>
      <h2>Deploying to Cloudflare Workers</h2>
      <a href="#deploying-to-cloudflare-workers">
        
      </a>
    </div>
    <p>vinext is built with Cloudflare Workers as the first deployment target. A single command takes you from source code to a running Worker:</p>
            <pre><code>vinext deploy</code></pre>
            <p>This handles everything: builds the application, auto-generates the Worker configuration, and deploys. Both the App Router and Pages Router work on Workers, with full client-side hydration, interactive components, client-side navigation, and React state.</p><p>For production caching, vinext includes a Cloudflare KV cache handler that gives you ISR (Incremental Static Regeneration) out of the box:</p>
            <pre><code>import { KVCacheHandler } from "vinext/cloudflare";
import { setCacheHandler } from "next/cache";

setCacheHandler(new KVCacheHandler(env.MY_KV_NAMESPACE));</code></pre>
            <p><a href="https://developers.cloudflare.com/kv/"><u>KV</u></a> is a good default for most applications, but the caching layer is designed to be pluggable. That setCacheHandler call means you can swap in whatever backend makes sense. <a href="https://developers.cloudflare.com/r2/"><u>R2</u></a> might be a better fit for apps with large cached payloads or different access patterns. We're also working on improvements to our Cache API that should provide a strong caching layer with less configuration. The goal is flexibility: pick the caching strategy that fits your app.</p><p>Live examples running right now:</p><ul><li><p><a href="https://app-router-playground.vinext.workers.dev"><u>App Router Playground</u></a></p></li><li><p><a href="https://hackernews.vinext.workers.dev"><u>Hacker News clone</u></a></p></li><li><p><a href="https://app-router-cloudflare.vinext.workers.dev"><u>App Router minimal</u></a></p></li><li><p><a href="https://pages-router-cloudflare.vinext.workers.dev"><u>Pages Router minimal</u></a></p></li></ul><p>We also have <a href="https://next-agents.threepointone.workers.dev/"><u>a live example</u></a> of Cloudflare Agents running in a Next.js app, without the need for workarounds like <a href="https://developers.cloudflare.com/workers/wrangler/api/#getplatformproxy"><u>getPlatformProxy</u></a>, since the entire app now runs in workerd, during both dev and deploy phases. This means being able to use Durable Objects, AI bindings, and every other Cloudflare-specific service without compromise. <a href="https://github.com/cloudflare/vinext-agents-example"><u>Have a look here.</u></a>   </p>
    <div>
      <h2>Frameworks are a team sport</h2>
      <a href="#frameworks-are-a-team-sport">
        
      </a>
    </div>
    <p>The current deployment target is Cloudflare Workers, but that's a small part of the picture. Something like 95% of vinext is pure Vite. The routing, the module shims, the SSR pipeline, the RSC integration: none of it is Cloudflare-specific.</p><p>Cloudflare is looking to work with other hosting providers about adopting this toolchain for their customers (the lift is minimal — we got a proof-of-concept working on <a href="https://vinext-on-vercel.vercel.app/"><u>Vercel</u></a> in less than 30 minutes!). This is an open-source project, and for its long term success, we believe it’s important we work with partners across the ecosystem to ensure ongoing investment. PRs from other platforms are welcome. If you're interested in adding a deployment target, <a href="https://github.com/cloudflare/vinext/issues"><u>open an issue</u></a> or reach out.</p>
    <div>
      <h2>Status: Experimental</h2>
      <a href="#status-experimental">
        
      </a>
    </div>
    <p>We want to be clear: vinext is experimental. It's not even one week old, and it has not yet been battle-tested with any meaningful traffic at scale. If you're evaluating it for a production application, proceed with appropriate caution.</p><p>That said, the test suite is extensive: over 1,700 Vitest tests and 380 Playwright E2E tests, including tests ported directly from the Next.js test suite and OpenNext's Cloudflare conformance suite. We’ve verified it against the Next.js App Router Playground. Coverage sits at 94% of the Next.js 16 API surface.

Early results from real-world customers are encouraging. We've been working with <a href="https://ndstudio.gov/"><u>National Design Studio</u></a>, a team that's aiming to modernize every government interface, on one of their beta sites, <a href="https://www.cio.gov/"><u>CIO.gov</u></a>. They're already running vinext in production, with meaningful improvements in build times and bundle sizes.</p><p>The README is honest about <a href="https://github.com/cloudflare/vinext#whats-not-supported-and-wont-be"><u>what's not supported and won't be</u></a>, and about <a href="https://github.com/cloudflare/vinext#known-limitations"><u>known limitations</u></a>. We want to be upfront rather than overpromise.</p>
    <div>
      <h2>What about pre-rendering?</h2>
      <a href="#what-about-pre-rendering">
        
      </a>
    </div>
    <p>vinext already supports Incremental Static Regeneration (ISR) out of the box. After the first request to any page, it's cached and revalidated in the background, just like Next.js. That part works today.</p><p>vinext does not yet support static pre-rendering at build time. In Next.js, pages without dynamic data get rendered during <code>next build</code> and served as static HTML. If you have dynamic routes, you use <code>generateStaticParams()</code> to enumerate which pages to build ahead of time. vinext doesn't do that… yet.</p><p>This was an intentional design decision for launch. It's <a href="https://github.com/cloudflare/vinext/issues/9">on the roadmap</a>, but if your site is 100% prebuilt HTML with static content, you probably won't see much benefit from vinext today. That said, if one engineer can spend $1,100 in tokens and rebuild Next.js, you can probably spend $10 and migrate to a Vite-based framework designed specifically for static content, like <a href="https://astro.build/">Astro</a> (which <a href="https://blog.cloudflare.com/astro-joins-cloudflare/">also deploys to Cloudflare Workers</a>).</p><p>For sites that aren't purely static, though, we think we can do something better than pre-rendering everything at build time.</p>
    <div>
      <h2>Introducing Traffic-aware Pre-Rendering</h2>
      <a href="#introducing-traffic-aware-pre-rendering">
        
      </a>
    </div>
    <p>Next.js pre-renders every page listed in <code>generateStaticParams()</code> during the build. A site with 10,000 product pages means 10,000 renders at build time, even though 99% of those pages may never receive a request. Builds scale linearly with page count. This is why large Next.js sites end up with 30-minute builds.</p><p>So we built <b>Traffic-aware Pre-Rendering</b> (TPR). It's experimental today, and we plan to make it the default once we have more real-world testing behind it.</p><p>The idea is simple. Cloudflare is already the reverse proxy for your site. We have your traffic data. We know which pages actually get visited. So instead of pre-rendering everything or pre-rendering nothing, vinext queries Cloudflare's zone analytics at deploy time and pre-renders only the pages that matter.</p>
            <pre><code>vinext deploy --experimental-tpr

  Building...
  Build complete (4.2s)

  TPR (experimental): Analyzing traffic for my-store.com (last 24h)
  TPR: 12,847 unique paths — 184 pages cover 90% of traffic
  TPR: Pre-rendering 184 pages...
  TPR: Pre-rendered 184 pages in 8.3s → KV cache

  Deploying to Cloudflare Workers...
</code></pre>
            <p>For a site with 100,000 product pages, the power law means 90% of traffic usually goes to 50 to 200 pages. Those get pre-rendered in seconds. Everything else falls back to on-demand SSR and gets cached via ISR after the first request. Every new deploy refreshes the set based on current traffic patterns. Pages that go viral get picked up automatically. All of this works without <code>generateStaticParams()</code> and without coupling your build to your production database.</p>
    <div>
      <h2>Taking on the Next.js challenge, but this time with AI</h2>
      <a href="#taking-on-the-next-js-challenge-but-this-time-with-ai">
        
      </a>
    </div>
    <p>A project like this would normally take a team of engineers months, if not years. Several teams at various companies have attempted it, and the scope is just enormous. We tried once at Cloudflare! Two routers, 33+ module shims, server rendering pipelines, RSC streaming, file-system routing, middleware, caching, static export. There's a reason nobody has pulled it off.</p><p>This time we did it in under a week. One engineer (technically engineering manager) directing AI.</p><p>The first commit landed on February 13. By the end of that same evening, both the Pages Router and App Router had basic SSR working, along with middleware, server actions, and streaming. By the next afternoon, <a href="https://app-router-playground.vinext.workers.dev"><u>App Router Playground</u></a> was rendering 10 of 11 routes. By day three, <code>vinext deploy</code> was shipping apps to Cloudflare Workers with full client hydration. The rest of the week was hardening: fixing edge cases, expanding the test suite, bringing API coverage to 94%.</p><p>What changed from those earlier attempts? AI got better. Way better.</p>
    <div>
      <h2>Why this problem is made for AI</h2>
      <a href="#why-this-problem-is-made-for-ai">
        
      </a>
    </div>
    <p>Not every project would go this way. This one did because a few things happened to line up at the right time.</p><p><b>Next.js is well-specified.</b> It has extensive documentation, a massive user base, and years of Stack Overflow answers and tutorials. The API surface is all over the training data. When you ask Claude to implement <code>getServerSideProps</code> or explain how <code>useRouter</code> works, it doesn't hallucinate. It knows how Next works.</p><p><b>Next.js has an elaborate test suite.</b> The <a href="https://github.com/vercel/next.js"><u>Next.js repo</u></a> contains thousands of E2E tests covering every feature and edge case. We ported tests directly from their suite (you can see the attribution in the code). This gave us a specification we could verify against mechanically.</p><p><b>Vite is an excellent foundation.</b> <a href="https://vite.dev/"><u>Vite</u></a> handles the hard parts of front-end tooling: fast HMR, native ESM, a clean plugin API, production bundling. We didn't have to build a bundler. We just had to teach it to speak Next.js. <a href="https://github.com/vitejs/vite-plugin-rsc"><code><u>@vitejs/plugin-rsc</u></code></a> is still early, but it gave us React Server Components support without having to build an RSC implementation from scratch.</p><p><b>The models caught up.</b> We don't think this would have been possible even a few months ago. Earlier models couldn't sustain coherence across a codebase this size. New models can hold the full architecture in context, reason about how modules interact, and produce correct code often enough to keep momentum going. At times, I saw it go into Next, Vite, and React internals to figure out a bug. The state-of-the-art models are impressive, and they seem to keep getting better.</p><p>All of those things had to be true at the same time. Well-documented target API, comprehensive test suite, solid build tool underneath, and a model that could actually handle the complexity. Take any one of them away and this doesn't work nearly as well.</p>
    <div>
      <h2>How we actually built it</h2>
      <a href="#how-we-actually-built-it">
        
      </a>
    </div>
    <p>Almost every line of code in vinext was written by AI. But here's the thing that matters more: every line passes the same quality gates you'd expect from human-written code. The project has 1,700+ Vitest tests, 380 Playwright E2E tests, full TypeScript type checking via tsgo, and linting via oxlint. Continuous integration runs all of it on every pull request. Establishing a set of good guardrails is critical to making AI productive in a codebase.</p><p>The process started with a plan. I spent a couple of hours going back and forth with Claude in <a href="https://opencode.ai"><u>OpenCode</u></a> to define the architecture: what to build, in what order, which abstractions to use. That plan became the north star. From there, the workflow was straightforward:</p><ol><li><p>Define a task ("implement the <code>next/navigation</code> shim with usePathname, <code>useSearchParams</code>, <code>useRouter</code>").</p></li><li><p>Let the AI write the implementation and tests.</p></li><li><p>Run the test suite.</p></li><li><p>If tests pass, merge. If not, give the AI the error output and let it iterate.</p></li><li><p>Repeat.</p></li></ol><p>We wired up AI agents for code review too. When a PR was opened, an agent reviewed it. When review comments came back, another agent addressed them. The feedback loop was mostly automated. </p><p>It didn't work perfectly every time. There were PRs that were just wrong. The AI would confidently implement something that seemed right but didn't match actual Next.js behavior. I had to course-correct regularly. Architecture decisions, prioritization, knowing when the AI was headed down a dead end: that was all me. When you give AI good direction, good context, and good guardrails, it can be very productive. But the human still has to steer.</p><p>For browser-level testing, I used <a href="https://github.com/vercel-labs/agent-browser"><u>agent-browser</u></a> to verify actual rendered output, client-side navigation, and hydration behavior. Unit tests miss a lot of subtle browser issues. This caught them.</p><p>Over the course of the project, we ran over 800 sessions in OpenCode. Total cost: roughly $1,100 in Claude API tokens.</p>
    <div>
      <h2>What this means for software</h2>
      <a href="#what-this-means-for-software">
        
      </a>
    </div>
    <p>Why do we have so many layers in the stack? This project forced me to think deeply about this question. And to consider how AI impacts the answer.</p><p>Most abstractions in software exist because humans need help. We couldn't hold the whole system in our heads, so we built layers to manage the complexity for us. Each layer made the next person's job easier. That's how you end up with frameworks on top of frameworks, wrapper libraries, thousands of lines of glue code.</p><p>AI doesn't have the same limitation. It can hold the whole system in context and just write the code. It doesn't need an intermediate framework to stay organized. It just needs a spec and a foundation to build on.</p><p>It's not clear yet which abstractions are truly foundational and which ones were just crutches for human cognition. That line is going to shift a lot over the next few years. But vinext is a data point. We took an API contract, a build tool, and an AI model, and the AI wrote everything in between. No intermediate framework needed. We think this pattern will repeat across a lot of software. The layers we've built up over the years aren't all going to make it.</p>
    <div>
      <h2>Acknowledgments</h2>
      <a href="#acknowledgments">
        
      </a>
    </div>
    <p>Thanks to the Vite team. <a href="https://vite.dev/"><u>Vite</u></a> is the foundation this whole thing stands on. <a href="https://github.com/vitejs/vite-plugin-rsc"><code><u>@vitejs/plugin-rsc</u></code></a> is still early days, but it gave me RSC support without having to build that from scratch, which would have been a dealbreaker. The Vite maintainers were responsive and helpful as I pushed the plugin into territory it hadn't been tested in before.</p><p>We also want to acknowledge the <a href="https://nextjs.org/"><u>Next.js</u></a> team. They've spent years building a framework that raised the bar for what React development could look like. The fact that their API surface is so well-documented and their test suite so comprehensive is a big part of what made this project possible. vinext wouldn't exist without the standard they set.</p>
    <div>
      <h2>Try it</h2>
      <a href="#try-it">
        
      </a>
    </div>
    <p>vinext includes an <a href="https://agentskills.io"><u>Agent Skill</u></a> that handles migration for you. It works with Claude Code, OpenCode, Cursor, Codex, and dozens of other AI coding tools. Install it, open your Next.js project, and tell the AI to migrate:</p>
            <pre><code>npx skills add cloudflare/vinext</code></pre>
            <p>Then open your Next.js project in any supported tool and say:</p>
            <pre><code>migrate this project to vinext</code></pre>
            <p>The skill handles compatibility checking, dependency installation, config generation, and dev server startup. It knows what vinext supports and will flag anything that needs manual attention.</p><p>Or if you prefer doing it by hand:</p>
            <pre><code>npx vinext init    # Migrate an existing Next.js project
npx vinext dev     # Start the dev server
npx vinext deploy  # Ship to Cloudflare Workers</code></pre>
            <p>The source is at <a href="https://github.com/cloudflare/vinext"><u>github.com/cloudflare/vinext</u></a>. Issues, PRs, and feedback are welcome.</p> ]]></content:encoded>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[Open Source]]></category>
            <category><![CDATA[Performance]]></category>
            <guid isPermaLink="false">2w61xT0J7H7ECzhiABytS</guid>
            <dc:creator>Steve Faulkner</dc:creator>
        </item>
        <item>
            <title><![CDATA[Code Mode: give agents an entire API in 1,000 tokens]]></title>
            <link>https://blog.cloudflare.com/code-mode-mcp/</link>
            <pubDate>Fri, 20 Feb 2026 14:00:00 GMT</pubDate>
            <description><![CDATA[ The Cloudflare API has over 2,500 endpoints. Exposing each one as an MCP tool would consume over 2 million tokens. With Code Mode, we collapsed all of it into two tools and roughly 1,000 tokens of context. ]]></description>
            <content:encoded><![CDATA[ <p><a href="https://www.cloudflare.com/learning/ai/what-is-model-context-protocol-mcp/"><u>Model Context Protocol (MCP)</u></a> has become the standard way for AI agents to use external tools. But there is a tension at its core: agents need many tools to do useful work, yet every tool added fills the model's context window, leaving less room for the actual task. </p><p><a href="https://blog.cloudflare.com/code-mode/"><u>Code Mode</u></a> is a technique we first introduced for reducing context window usage during agent tool use. Instead of describing every operation as a separate tool, let the model write code against a typed SDK and execute the code safely in a <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/worker-loader/"><u>Dynamic Worker Loader</u></a>. The code acts as a compact plan. The model can explore tool operations, compose multiple calls, and return just the data it needs. Anthropic independently explored the same pattern in their <a href="https://www.anthropic.com/engineering/code-execution-with-mcp"><u>Code Execution with MCP</u></a> post.</p><p>Today we are introducing <a href="https://github.com/cloudflare/mcp"><u>a new MCP server</u></a> for the <a href="https://developers.cloudflare.com/api/"><u>entire Cloudflare API</u></a> — from <a href="https://developers.cloudflare.com/dns/"><u>DNS</u></a> and <a href="https://developers.cloudflare.com/cloudflare-one/"><u>Zero Trust</u></a> to <a href="https://workers.cloudflare.com/product/workers/"><u>Workers</u></a> and <a href="https://workers.cloudflare.com/product/r2/"><u>R2</u></a> — that uses Code Mode. With just two tools, search() and execute(), the server is able to provide access to the entire Cloudflare API over MCP, while consuming only around 1,000 tokens. The footprint stays fixed, no matter how many API endpoints exist.</p><p>For a large API like the Cloudflare API, Code Mode reduces the number of input tokens used by 99.9%. An equivalent MCP server without Code Mode would consume 1.17 million tokens — more than the entire context window of the most advanced foundation models.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7KqjQiI09KubtUSe9Dgf0N/6f37896084c7f34abca7dc36ab18d8e0/image2.png" />
          </figure><p><sup><i>Code mode savings vs native MCP, measured with </i></sup><a href="https://github.com/openai/tiktoken"><sup><i><u>tiktoken</u></i></sup></a><sup></sup></p><p>You can start using this new Cloudflare MCP server today. And we are also open-sourcing a new <a href="https://github.com/cloudflare/agents/tree/main/packages/codemode"><u>Code Mode SDK</u></a> in the <a href="https://github.com/cloudflare/agents"><u>Cloudflare Agents SDK</u></a>, so you can use the same approach in your own MCP servers and AI Agents.</p>
    <div>
      <h3>Server‑side Code Mode</h3>
      <a href="#server-side-code-mode">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/ir1KOZHIjVNyqdC9FSuZs/334456a711fb2b5fa612b3fc0b4adc48/images_BLOG-3184_2.png" />
          </figure><p>This new MCP server applies Code Mode server-side. Instead of thousands of tools, the server exports just two: <code>search()</code> and <code>execute()</code>. Both are powered by Code Mode. Here is the full tool surface area that gets loaded into the model context:</p>
            <pre><code>[
  {
    "name": "search",
    "description": "Search the Cloudflare OpenAPI spec. All $refs are pre-resolved inline.",
    "inputSchema": {
      "type": "object",
      "properties": {
        "code": {
          "type": "string",
          "description": "JavaScript async arrow function to search the OpenAPI spec"
        }
      },
      "required": ["code"]
    }
  },
  {
    "name": "execute",
    "description": "Execute JavaScript code against the Cloudflare API.",
    "inputSchema": {
      "type": "object",
      "properties": {
        "code": {
          "type": "string",
          "description": "JavaScript async arrow function to execute"
        }
      },
      "required": ["code"]
    }
  }
]
</code></pre>
            <p>To discover what it can do, the agent calls <code>search()</code>. It writes JavaScript against a typed representation of the OpenAPI spec. The agent can filter endpoints by product, path, tags, or any other metadata and narrow thousands of endpoints to the handful it needs. The full OpenAPI spec never enters the model context. The agent only interacts with it through code.</p><p>When the agent is ready to act, it calls <code>execute()</code>. The agent writes code that can make Cloudflare API requests, handle pagination, check responses, and chain operations together in a single execution.</p><p>Both tools run the generated code inside a <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/worker-loader/"><u>Dynamic Worker</u></a> isolate — a lightweight V8 sandbox with no file system, no environment variables to leak through prompt injection, and external fetches disabled by default. Outbound requests can be explicitly controlled with outbound fetch handlers when needed.</p>
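            <p>For a sense of what that execution model looks like in code, here is a rough sketch of loading generated code into an isolate with outbound fetch disabled. This is a sketch, not the server's actual implementation; the option names follow our reading of the Worker Loader docs and may differ in detail:</p>
            <pre><code>// Sketch only: not the MCP server's actual code.
export default {
  async fetch(request, env) {
    const code = await request.text(); // the model-generated code

    // Load the generated code into an isolated Worker.
    const worker = env.LOADER.get('sandbox-' + crypto.randomUUID(), async () =&gt; ({
      compatibilityDate: '2026-01-01',
      mainModule: 'main.js',
      modules: { 'main.js': code },
      // null disables outbound fetch entirely; pass a Fetcher here
      // instead to explicitly control what the code may reach.
      globalOutbound: null
    }));

    // Invoke the sandboxed Worker and return its result.
    return worker.getEntrypoint().fetch('https://sandbox.internal/run');
  }
};</code></pre>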
    <div>
      <h4>Example: Protecting an origin from DDoS attacks</h4>
      <a href="#example-protecting-an-origin-from-ddos-attacks">
        
      </a>
    </div>
    <p>Suppose a user tells their agent: "protect my origin from DDoS attacks." The agent's first step is to consult documentation. It might call the <a href="https://developers.cloudflare.com/agents/model-context-protocol/mcp-servers-for-cloudflare/"><u>Cloudflare Docs MCP Server</u></a>, use a <a href="https://github.com/cloudflare/skills"><u>Cloudflare Skill</u></a>, or search the web directly. From the docs it learns: put <a href="https://www.cloudflare.com/application-services/products/waf/"><u>Cloudflare WAF</u></a> and <a href="https://www.cloudflare.com/ddos/"><u>DDoS protection</u></a> rules in front of the origin.</p><p><b>Step 1: Search for the right endpoints
</b>The <code>search</code> tool gives the model a <code>spec</code> object: the full Cloudflare OpenAPI spec with all <code>$refs</code> pre-resolved. The model writes JavaScript against it. Here the agent looks for WAF and ruleset endpoints on a zone:</p>
            <pre><code>async () =&gt; {
  const results = [];
  for (const [path, methods] of Object.entries(spec.paths)) {
    if (path.includes('/zones/') &amp;&amp;
        (path.includes('firewall/waf') || path.includes('rulesets'))) {
      for (const [method, op] of Object.entries(methods)) {
        results.push({ method: method.toUpperCase(), path, summary: op.summary });
      }
    }
  }
  return results;
}
</code></pre>
            <p>The server runs this code in a Workers isolate and returns:</p>
            <pre><code>[
  { "method": "GET",    "path": "/zones/{zone_id}/firewall/waf/packages",              "summary": "List WAF packages" },
  { "method": "PATCH",  "path": "/zones/{zone_id}/firewall/waf/packages/{package_id}", "summary": "Update a WAF package" },
  { "method": "GET",    "path": "/zones/{zone_id}/firewall/waf/packages/{package_id}/rules", "summary": "List WAF rules" },
  { "method": "PATCH",  "path": "/zones/{zone_id}/firewall/waf/packages/{package_id}/rules/{rule_id}", "summary": "Update a WAF rule" },
  { "method": "GET",    "path": "/zones/{zone_id}/rulesets",                           "summary": "List zone rulesets" },
  { "method": "POST",   "path": "/zones/{zone_id}/rulesets",                           "summary": "Create a zone ruleset" },
  { "method": "GET",    "path": "/zones/{zone_id}/rulesets/phases/{ruleset_phase}/entrypoint", "summary": "Get a zone entry point ruleset" },
  { "method": "PUT",    "path": "/zones/{zone_id}/rulesets/phases/{ruleset_phase}/entrypoint", "summary": "Update a zone entry point ruleset" },
  { "method": "POST",   "path": "/zones/{zone_id}/rulesets/{ruleset_id}/rules",        "summary": "Create a zone ruleset rule" },
  { "method": "PATCH",  "path": "/zones/{zone_id}/rulesets/{ruleset_id}/rules/{rule_id}", "summary": "Update a zone ruleset rule" }
]
</code></pre>
            <p>The full Cloudflare API spec has over 2,500 endpoints. The model narrowed that to the WAF and ruleset endpoints it needs, without any of the spec entering the context window. </p><p>The model can also drill into a specific endpoint's schema before calling it. Here it inspects what phases are available on zone rulesets:</p>
            <pre><code>async () =&gt; {
  const op = spec.paths['/zones/{zone_id}/rulesets']?.get;
  const items = op?.responses?.['200']?.content?.['application/json']?.schema;
  // Walk the schema to find the phase enum
  const props = items?.allOf?.[1]?.properties?.result?.items?.allOf?.[1]?.properties;
  return { phases: props?.phase?.enum };
}

{
  "phases": [
    "ddos_l4", "ddos_l7",
    "http_request_firewall_custom", "http_request_firewall_managed",
    "http_response_firewall_managed", "http_ratelimit",
    "http_request_redirect", "http_request_transform",
    "magic_transit", "magic_transit_managed"
  ]
}
</code></pre>
            <p>The agent now knows the exact phases it needs: <code>ddos_l7 </code>for DDoS protection and <code>http_request_firewall_managed</code> for WAF.</p><p><b>Step 2: Act on the API
</b>The agent switches to using <code>execute</code>. The sandbox gets a <code>cloudflare.request()</code> client that can make authenticated calls to the Cloudflare API. First the agent checks what rulesets already exist on the zone:</p>
            <pre><code>async () =&gt; {
  const response = await cloudflare.request({
    method: "GET",
    path: `/zones/${zoneId}/rulesets`
  });
  return response.result.map(rs =&gt; ({
    name: rs.name, phase: rs.phase, kind: rs.kind
  }));
}

[
  { "name": "DDoS L7",          "phase": "ddos_l7",                        "kind": "managed" },
  { "name": "Cloudflare Managed","phase": "http_request_firewall_managed", "kind": "managed" },
  { "name": "Custom rules",     "phase": "http_request_firewall_custom",   "kind": "zone" }
]
</code></pre>
            <p>The agent sees that managed DDoS and WAF rulesets already exist. It can now chain calls to inspect their rules and update sensitivity levels in a single execution. Here it starts by fetching both configurations:</p>
            <pre><code>async () =&gt; {
  // Get the current DDoS L7 entrypoint ruleset
  const ddos = await cloudflare.request({
    method: "GET",
    path: `/zones/${zoneId}/rulesets/phases/ddos_l7/entrypoint`
  });

  // Get the WAF managed ruleset
  const waf = await cloudflare.request({
    method: "GET",
    path: `/zones/${zoneId}/rulesets/phases/http_request_firewall_managed/entrypoint`
  });
  // Return both configurations so the agent can inspect them in the result
  return { ddos: ddos.result, waf: waf.result };
}
</code></pre>
            <p>This entire operation, from searching the spec and inspecting a schema to listing rulesets and fetching DDoS and WAF configurations, took four tool calls.</p>
    <div>
      <h3>The Cloudflare MCP server</h3>
      <a href="#the-cloudflare-mcp-server">
        
      </a>
    </div>
    <p>We started with MCP servers for individual products. Want an agent that manages DNS? Add the <a href="https://github.com/cloudflare/mcp-server-cloudflare/tree/main/apps/dns-analytics"><u>DNS MCP server</u></a>. Want Workers logs? Add the <a href="https://developers.cloudflare.com/agents/model-context-protocol/mcp-servers-for-cloudflare/"><u>Workers Observability MCP server</u></a>. Each server exported a fixed set of tools that mapped to API operations. This worked when the tool set was small, but the Cloudflare API has over 2,500 endpoints. No collection of hand-maintained servers could keep up.</p><p>The Cloudflare MCP server simplifies this. Two tools, roughly 1,000 tokens, and coverage of every endpoint in the API. When we add new products, the same <code>search()</code> and <code>execute()</code> code paths discover and call them — no new tool definitions, no new MCP servers. It even has support for the <a href="https://developers.cloudflare.com/analytics/graphql-api/"><u>GraphQL Analytics API</u></a>.</p><p>Our MCP server is built on the latest MCP specifications. It is OAuth 2.1 compliant, using <a href="https://github.com/cloudflare/workers-oauth-provider"><u>Workers OAuth Provider</u></a> to downscope the token to selected permissions approved by the user when connecting. The agent  only gets the capabilities the user explicitly granted. </p><p>For developers, this means you can use a simple agent loop and still give your agent access to the full Cloudflare API with built-in progressive capability discovery.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/60ZoSFdK6t6hR6DpAn6Bub/93b86239cedb06d7fb265859be7590e8/images_BLOG-3184_4.png" />
          </figure>
    <div>
      <h3>Comparing approaches to context reduction</h3>
      <a href="#comparing-approaches-to-context-reduction">
        
      </a>
    </div>
    <p>Several approaches have emerged to reduce how many tokens MCP tools consume:</p><p><b>Client-side Code Mode</b> was our first experiment. The model writes TypeScript against typed SDKs and runs it in a Dynamic Worker Loader on the client. The tradeoff is that it requires the agent to ship with secure sandbox access. Code Mode is implemented in <a href="https://block.github.io/goose/blog/2025/12/15/code-mode-mcp/"><u>Goose</u></a> and in Anthropic's Claude SDK as <a href="https://platform.claude.com/docs/en/agents-and-tools/tool-use/programmatic-tool-calling"><u>Programmatic Tool Calling</u></a>.</p><p><b>Command-line interfaces</b> are another path. CLIs are self-documenting and reveal capabilities as the agent explores. Tools like <a href="https://openclaw.ai/"><u>OpenClaw</u></a> and <a href="https://blog.cloudflare.com/moltworker-self-hosted-ai-agent/"><u>Moltworker</u></a> convert MCP servers into CLIs using <a href="https://github.com/steipete/mcporter"><u>MCPorter</u></a> to give agents progressive disclosure. The limitation is obvious: the agent needs a shell, which not every environment provides and which introduces a much broader attack surface than a sandboxed isolate.</p><p><b>Dynamic tool search</b>, as used by <a href="https://x.com/trq212/status/2011523109871108570"><u>Anthropic in Claude Code</u></a>, surfaces a smaller set of tools that are hopefully relevant to the current task. It shrinks context use but now requires a search function that must be maintained and evaluated, and each matched tool still uses tokens.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5FPxVAuJggv7A08DbPsksb/aacb9087a79d08a1430ea87bb6960ad3/images_BLOG-3184_5.png" />
          </figure><p>Each approach solves a real problem. But for MCP servers specifically, server-side Code Mode combines their strengths: fixed token cost regardless of API size, no modifications needed on the agent side, progressive discovery built in, and safe execution inside a sandboxed isolate. The agent just calls two tools with code. Everything else happens on the server.</p>
    <div>
      <h3>Get started today</h3>
      <a href="#get-started-today">
        
      </a>
    </div>
    <p>The Cloudflare MCP server is available now. Point your MCP client at the server URL and you'll be redirected to Cloudflare to authorize and select the permissions to grant to your agent. Add this config to your MCP client: </p>
            <pre><code>{
  "mcpServers": {
    "cloudflare-api": {
      "url": "https://mcp.cloudflare.com/mcp"
    }
  }
}
</code></pre>
            <p>For CI/CD, automation, or if you prefer managing tokens yourself, create a Cloudflare API token with the permissions you need. Both user tokens and account tokens are supported and can be passed as bearer tokens in the <code>Authorization</code> header.</p><p>More information on different MCP setup configurations can be found at the <a href="https://github.com/cloudflare/mcp"><u>Cloudflare MCP repository</u></a>.</p>
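            <p>For example, with a client that supports custom headers, a token-based configuration might look like the following (the <code>headers</code> field is a common client convention, and the token value is a placeholder):</p>
            <pre><code>{
  "mcpServers": {
    "cloudflare-api": {
      "url": "https://mcp.cloudflare.com/mcp",
      "headers": {
        "Authorization": "Bearer &lt;CLOUDFLARE_API_TOKEN&gt;"
      }
    }
  }
}
</code></pre>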
    <div>
      <h3>Looking forward</h3>
      <a href="#looking-forward">
        
      </a>
    </div>
    <p>Code Mode solves context costs for a single API. But agents rarely talk to one service. A developer's agent might need the Cloudflare API alongside GitHub, a database, and an internal docs server. Each additional MCP server brings the same context window pressure we started with.</p><p><a href="https://blog.cloudflare.com/zero-trust-mcp-server-portals/"><u>Cloudflare MCP Server Portals</u></a> let you compose multiple MCP servers behind a single gateway with unified auth and access control. We are building a first-class Code Mode integration for all your MCP servers, and exposing them to agents with built-in progressive discovery and the same fixed-token footprint, regardless of how many services sit behind the gateway.</p> ]]></content:encoded>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Optimization]]></category>
            <category><![CDATA[Open Source]]></category>
            <guid isPermaLink="false">2lWwgP33VT0NJjZ3pWShsw</guid>
            <dc:creator>Matt Carey</dc:creator>
        </item>
        <item>
            <title><![CDATA[Building vertical microfrontends on Cloudflare’s platform]]></title>
            <link>https://blog.cloudflare.com/vertical-microfrontends/</link>
            <pubDate>Fri, 30 Jan 2026 14:00:00 GMT</pubDate>
            <description><![CDATA[ Deploy multiple Workers under a single domain with the ability to make them feel like single-page applications. We take a look at how service bindings enable URL path routing to multiple projects. ]]></description>
            <content:encoded><![CDATA[ <p><i>Updated at 6:55 a.m. PT</i></p><p>Today, we’re introducing a new Worker template for Vertical Microfrontends (VMFE). <a href="https://dash.cloudflare.com/?to=/:account/workers-and-pages/create?type=vmfe"><u>This template</u></a> allows you to map multiple independent <a href="https://workers.cloudflare.com/"><u>Cloudflare Workers</u></a> to a single domain, enabling teams to work in complete silos — shipping marketing, docs, and dashboards independently — while presenting a single, seamless application to the user.</p><a href="https://dash.cloudflare.com/?to=/:account/workers-and-pages/create?type=vmfe"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p>Most microfrontend architectures are "horizontal", meaning different <i>parts</i> of a single page are fetched from different services. Vertical microfrontends take a different approach by splitting the application by URL path. In this model, a team owning the <code>/blog</code> path doesn't <i>just</i> own a component; they own the entire vertical stack for that route – framework, library choice, CI/CD and more. Owning the entire stack of a path, or set of paths, allows teams to have true ownership of their work and ship with confidence.</p><p>Problems surface as teams grow, because different frameworks serve different use cases. A marketing website might be better served by Astro, for example, while a dashboard might be better with React. Or say you have a monolithic code base where many teams ship as a collective. An update to add new features from several teams can get frustratingly rolled back because a single team introduced a regression. How do we hide these technical implementation details from the user while letting teams ship a cohesive user experience with full autonomy and control of their domains?</p><p>Vertical microfrontends can be the answer. Let’s dive in and explore how they solve developer pain points together.</p>
    <div>
      <h2>What are vertical microfrontends?</h2>
      <a href="#what-are-vertical-microfrontends">
        
      </a>
    </div>
    <p>A vertical microfrontend is an architectural pattern where a single independent team owns an entire slice of the application’s functionality, from the user interface all the way down to the <a href="https://www.cloudflare.com/learning/serverless/glossary/what-is-ci-cd/">CI/CD pipeline</a>. These slices are defined by paths on a domain, with an individual Worker associated with each path:</p>
            <pre><code>/      = Marketing
/docs  = Documentation
/blog  = Blog
/dash  = Dashboard</code></pre>
            <p>We could take it a step further with more granular sub-path Worker associations, too. Within a dashboard, for example, you likely segment out various features or products by adding depth to your URL path (e.g. <code>/dash/product-a</code>), and navigating between two products could mean two entirely different code bases.</p><p>Now with vertical microfrontends, we could also have the following:</p>
            <pre><code>/dash/product-a  = WorkerA
/dash/product-b  = WorkerB</code></pre>
            <p>Each of the above paths is its own frontend project with zero shared code between them. The <code>product-a</code> and <code>product-b</code> routes map to separately deployed frontend applications that have their own frameworks, libraries, and CI/CD pipelines defined and owned by their own teams. FINALLY.</p><p>You can now own your own code from end to end. But now we need to find a way to stitch these separate projects together, and even more so, make them feel as if they are a unified experience.</p><p>We experience this pain point ourselves here at Cloudflare, as the dashboard has many individual teams owning their own products. Teams must contend with the fact that changes made outside their control impact how users experience their product.</p><p>Internally, we are now using a similar strategy for our own dashboard. When users navigate from the core dashboard into our Zero Trust product, in reality these are two entirely separate projects, and the user is simply being routed to that project by its path <code>/:accountId/one</code>.</p>
    <div>
      <h2>Visually unified experiences</h2>
      <a href="#visually-unified-experiences">
        
      </a>
    </div>
    <p>Stitching these individual projects together to make them feel like a unified experience isn’t as difficult as you might think: it only takes a few lines of CSS magic. What we <i>absolutely do not want</i> is to leak our implementation details and internal decisions to our users. If we fail to make this experience feel like one cohesive frontend, then we’ve done a grave injustice to our users.</p><p>To accomplish this sleight of hand, let’s take a little trip through how view transitions and document preloading come into play.</p>
    <div>
      <h3>View transitions</h3>
      <a href="#view-transitions">
        
      </a>
    </div>
    <p>When we want to seamlessly navigate between two distinct pages while making it feel smooth to the end user, <a href="https://developer.mozilla.org/en-US/docs/Web/API/View_Transition_API"><u>view transitions</u></a> are quite useful. Defining specific <a href="https://www.w3schools.com/jsref/dom_obj_all.asp"><u>DOM elements</u></a> on our page to stick around until the next page is visible, and defining how any changes are handled, makes for quite the powerful quilt-stitching tool for multi-page applications.</p><p>There may be, however, instances where making the various vertical microfrontends feel different is more than acceptable. Perhaps our marketing website, documentation, and dashboard are each uniquely defined, for instance. A user would not expect all three of those to feel cohesive as you navigate between the three parts. But… if you decide to introduce vertical slices to an individual experience such as the dashboard (e.g. <code>/dash/product-a</code> &amp; <code>/dash/product-b</code>), then users should <b>never</b> know they are two different repositories/workers/projects underneath.</p><p>Okay, enough talk — let’s get to work. I mentioned it was low-effort to make two separate projects feel as if they were one to a user, and if you have yet to hear about <a href="https://developer.mozilla.org/en-US/docs/Web/API/View_Transition_API"><u>CSS View Transitions</u></a> then I’m about to blow your mind.</p><p>What if I told you that you could animate transitions between different views — single-page app (SPA) or multi-page app (MPA) — so they feel like one application? Before any view transitions are added, if we navigate between pages owned by two different Workers, the interstitial loading state would be a blank white screen in our browser for a few hundred milliseconds until the next page begins rendering. Pages would not feel cohesive, and it certainly would not feel like a single-page application.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4vw1Am7gYUQPtmFFCRcsu1/774b881dff7ce1c26db88f30623dfc13/image3.png" />
          </figure><p><sup>Appears as multiple navigation elements between each site.</sup></p><p>If we want elements to stick around, rather than showing a blank white page, we can achieve that by defining CSS view transitions. With the code below, we opt both documents into cross-document view transitions and tell the browser to keep the <code>nav</code> DOM element on the screen; if any delta in appearance exists between our existing page and our destination page, it is animated with an <code>ease-in-out</code> transition.</p><p>All of a sudden, two different Workers feel like one.</p>
            <pre><code>/* Required for cross-document (MPA) view transitions, on both pages. */
@view-transition {
  navigation: auto;
}

@supports (view-transition-name: none) {
  ::view-transition-old(root),
  ::view-transition-new(root) {
    animation-duration: 0.3s;
    animation-timing-function: ease-in-out;
  }

  /* Keep the nav element visually continuous across navigations. */
  nav { view-transition-name: navigation; }
}</code></pre>
            
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4h6Eh5LSX4552QJDvV1l7o/a5a43ee0e6e011bca58ecc2d74902744/image1.png" />
          </figure><p><sup>Appears as a single navigation element between three distinct sites.</sup></p>
    <div>
      <h3>Preloading</h3>
      <a href="#preloading">
        
      </a>
    </div>
    <p>Transitioning between two pages makes it <i>look</i> seamless — and we also want it to <i>feel</i> as instant as a client-side SPA. Firefox and Safari do not yet support <a href="https://developer.mozilla.org/en-US/docs/Web/API/Speculation_Rules_API"><u>Speculation Rules</u></a>, but Chrome, Edge, and Opera do. The Speculation Rules API is designed to improve performance for future navigations, particularly for document URLs, making multi-page applications feel more like single-page applications.</p><p>In code, we define a <code>speculationrules</code> script that tells supporting browsers how to prefetch the other vertical slices connected to our web application — likely linked through some shared navigation.</p>
            <pre><code>&lt;script type="speculationrules"&gt;
  {
    "prefetch": [
      {
        "urls": ["https://product-a.com", "https://product-b.com"],
        "requires": ["anonymous-client-ip-when-cross-origin"],
        "referrer_policy": "no-referrer"
      }
    ]
  }
&lt;/script&gt;</code></pre>
            <p>With that, our application prefetches our other microfrontends and holds them in an in-memory cache, so navigating to those pages feels nearly instant.</p><p>You likely won’t require this for clearly discernible vertical slices (marketing, docs, dashboard), because users would expect a slight load between them. However, it is highly encouraged when vertical slices are defined within a single visible experience (e.g. within dashboard pages).</p><p>Between <a href="https://developer.mozilla.org/en-US/docs/Web/API/View_Transition_API"><u>View Transitions</u></a> and <a href="https://developer.mozilla.org/en-US/docs/Web/API/Speculation_Rules_API"><u>Speculation Rules</u></a>, we are able to tie together entirely different code repositories to feel as if they were served from a single-page application. Wild if you ask me.</p>
    <div>
      <h2>Zero-config request routing</h2>
      <a href="#zero-config-request-routing">
        
      </a>
    </div>
    <p>Now we need a mechanism to host multiple applications, and a method to stitch them together as requests stream in. Defining a single Cloudflare Worker as the “Router” gives us a single logical point (at the edge) that receives network requests and forwards them to whichever vertical microfrontend is responsible for that URL path. Plus, it doesn’t hurt that we can then map a single domain to that Router Worker and the rest “just works.”</p>
    <div>
      <h3>Service bindings</h3>
      <a href="#service-bindings">
        
      </a>
    </div>
    <p>If you have yet to explore Cloudflare Worker <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/service-bindings/"><u>service bindings</u></a>, then it is worth taking a moment to do so. </p><p>Service bindings allow one Worker to call into another without going through a publicly-accessible URL. A service binding allows Worker A to call a method on Worker B, or to forward a request from Worker A to Worker B. Breaking it down further, the Router Worker can call into each vertical microfrontend Worker that has been defined (e.g. marketing, docs, dashboard), assuming each of them is a Cloudflare Worker.</p><p>Why is this important? This is precisely the mechanism that “stitches” these vertical slices together. We’ll dig into how the request routing handles the traffic split in the next section. But to define each of these microfrontends, we’ll need to update our Router Worker’s wrangler definition, so it knows which frontends it’s allowed to call into.
</p>
            <pre><code>{
  "$schema": "./node_modules/wrangler/config-schema.json",
  "name": "router",
  "main": "./src/router.js",
  "services": [
    {
      "binding": "HOME",
      "service": "worker_marketing"
    },
    {
      "binding": "DOCS",
      "service": "worker_docs"
    },
    {
      "binding": "DASH",
      "service": "worker_dash"
    }
  ]
}</code></pre>
            <p>The sample definition above lives in our Router Worker, and tells the runtime that we are permitted to make requests into three additional Workers (marketing, docs, and dash). Granting permissions is as simple as that, but let’s tumble into some of the more complex logic with request routing and HTML rewriting of network responses.</p>
    <div>
      <h3>Request routing</h3>
      <a href="#request-routing">
        
      </a>
    </div>
    <p>Now that we know which other Workers we are able to call into, we need logic that decides where to direct each network request. Since the Router Worker is assigned to our custom domain, all incoming requests hit it first at the network edge. It then determines which Worker should handle the request and manages the resulting response. </p><p>The first step is to map URL paths to their associated Workers. When a request URL is received, we need to know where to forward it. We do this by defining rules. While we support wildcard routes, dynamic paths, and parameter constraints, we are going to stay focused on the basics — literal path prefixes — as they illustrate the point most clearly. </p><p> In this example, we have three microfrontends:</p>
            <pre><code>/      = Marketing
/docs  = Documentation
/dash  = Dashboard</code></pre>
            <p>Each of the above paths needs to be mapped to an actual Worker (see our wrangler definition for services in the section above). For our Router Worker, we define an additional variable with the following data, so we know which paths map to which service bindings, and therefore where to route users as requests come in. Define a wrangler variable with the name <code>ROUTES</code> and the following contents:</p>
            <pre><code>{
  "routes":[
    {"binding": "HOME", "path": "/"},
    {"binding": "DOCS", "path": "/docs"},
    {"binding": "DASH", "path": "/dash"}
  ]
}</code></pre>
            <p>Let’s envision a user visiting our website path <code>/docs/installation</code>. Under the hood, the request first reaches our Router Worker, which is in charge of understanding which URL paths map to which individual Workers. It sees that the <code>/docs</code> path prefix is mapped to our <code>DOCS</code> service binding, which, per our wrangler file, points at our <code>worker_docs</code> project. Our Router Worker, knowing that <code>/docs</code> is defined as a vertical microfrontend route, removes the <code>/docs</code> prefix from the path, forwards the request to our <code>worker_docs</code> Worker to handle, and finally returns whatever response it gets.</p><p>Why does it drop the <code>/docs</code> path, though? This was an implementation choice we made so that when the Worker is accessed via the Router Worker, the URL is cleaned up and the request is handled <i>as if </i>it were called from outside our Router Worker. Like any Cloudflare Worker, our <code>worker_docs</code> service might have its own individual URL where it can be accessed. We decided we wanted that service URL to continue to work independently. Because the Router Worker automatically removes the prefix, the service is accessible from its own defined URL or through our Router Worker… either place, doesn’t matter.</p>
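            <p>To make this concrete, here is a minimal sketch of the routing logic described above; it is not the template’s actual implementation. The <code>ROUTES</code> variable and service bindings come from the wrangler configuration shown earlier, and matching is simplified to the literal path prefixes discussed:</p>
            <pre><code>// Minimal Router Worker sketch: match the longest path prefix,
// strip it, and forward the request over the service binding.
interface Env {
  HOME: Fetcher;
  DOCS: Fetcher;
  DASH: Fetcher;
  ROUTES: string; // JSON: {"routes":[{"binding":"DOCS","path":"/docs"}, ...]}
}

type Route = { binding: 'HOME' | 'DOCS' | 'DASH'; path: string };

export default {
  async fetch(request: Request, env: Env): Promise&lt;Response&gt; {
    const url = new URL(request.url);
    const { routes } = JSON.parse(env.ROUTES) as { routes: Route[] };

    // Longest prefix wins, so /docs beats / for /docs/installation.
    const match = routes
      .filter((r) =&gt;
        r.path === '/' ||
        url.pathname === r.path ||
        url.pathname.startsWith(r.path + '/'))
      .sort((a, b) =&gt; b.path.length - a.path.length)[0];

    if (!match) return new Response('Not found', { status: 404 });

    // Strip the prefix so the target Worker sees the request as if it
    // had been called directly (/docs/installation -&gt; /installation).
    if (match.path !== '/') {
      url.pathname = url.pathname.slice(match.path.length) || '/';
    }
    return env[match.binding].fetch(new Request(url.toString(), request));
  },
};</code></pre>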
    <div>
      <h3>HTMLRewriter</h3>
      <a href="#htmlrewriter">
        
      </a>
    </div>
    <p>Splitting our various frontend services with URL paths (e.g. <code>/docs</code> or <code>/dash</code>) makes it easy for us to forward a request, but when our response contains HTML that doesn’t know it’s being reverse proxied through a path component… well, that causes problems. </p><p>Say our documentation website has an image tag in the response <code>&lt;img src="/logo.png" /&gt;</code>. If our user was visiting this page at <code>https://website.com/docs/</code>, then loading the <code>logo.png</code> file would fail: the browser would request <code>https://website.com/logo.png</code>, because our <code>/docs</code> path is somewhat artificially defined only by our Router Worker.</p><p>Only when our services are accessed through our Router Worker do we need to do some HTML rewriting of absolute paths, so our returned browser response references valid assets. In practice, when a request passes through our Router Worker, we pass the request to the correct service binding and receive its response. Before we pass that back to the client, we have an opportunity to rewrite the DOM — where we see absolute paths, we prepend the proxied path. Where previously our HTML returned our image tag as <code>&lt;img src="/logo.png" /&gt;</code>, we now modify it to <code>&lt;img src="/docs/logo.png" /&gt;</code> before returning it to the client browser.</p>
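    <p>Here is a hedged sketch of that rewriting step, using the HTMLRewriter API from the Workers runtime. The helper name and the set of rewritten attributes are our own illustration, not the template’s exact code:</p>
            <pre><code>// Prepend the route prefix to root-relative URLs in the proxied response.
function prefixAbsolutePaths(response: Response, prefix: string): Response {
  const rewrite = (attr: string) =&gt; ({
    element(el: Element) {
      const value = el.getAttribute(attr);
      // Only rewrite root-relative paths like "/logo.png"; leave
      // "./relative" and "https://absolute" URLs untouched.
      if (value &amp;&amp; value.startsWith('/') &amp;&amp; !value.startsWith('//')) {
        el.setAttribute(attr, prefix + value);
      }
    },
  });

  return new HTMLRewriter()
    .on('img[src]', rewrite('src'))
    .on('script[src]', rewrite('src'))
    .on('a[href]', rewrite('href'))
    .on('link[href]', rewrite('href'))
    .transform(response);
}

// Usage inside the Router Worker, after calling the service binding:
//   const response = await env.DOCS.fetch(forwarded);
//   return prefixAbsolutePaths(response, '/docs');</code></pre>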
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/10jKx6qt2YcarpDyEsFNYV/3b0f11f56e3c9b2deef59934cf8efa7f/image2.png" />
          </figure><p>Let’s return for a moment to the magic of CSS view transitions and document preloading. We could of course manually place that code into our projects and have it work, but this Router Worker will <i>automatically</i> handle that logic for us by also using <a href="https://developers.cloudflare.com/workers/runtime-apis/html-rewriter/"><u>HTMLRewriter</u></a>. </p><p>In your Router Worker <code>ROUTES</code> variable, if you set <code>smoothTransitions</code> to <code>true</code> at the root level, then the CSS view transitions code will be added automatically. Additionally, if you set the <code>preload</code> key within a route to <code>true</code>, then the speculation rules script for that route will be added automatically as well. </p><p>Below is an example of both in action:</p>
            <pre><code>{
  "smoothTransitions":true, 
  "routes":[
    {"binding": "APP1", "path": "/app1", "preload": true},
    {"binding": "APP2", "path": "/app2", "preload": true}
  ]
}</code></pre>
            
    <div>
      <h2>Get started</h2>
      <a href="#get-started">
        
      </a>
    </div>
    <p>You can start building with the Vertical Microfrontend template today.</p><p>Visit the Cloudflare Dashboard <a href="https://dash.cloudflare.com/?to=/:account/workers-and-pages/create?type=vmfe"><u>deeplink here</u></a> or go to “Workers &amp; Pages” and click the “Create application” button to get started. From there, click “Select a template” and then “Create microfrontend” and you can begin configuring your setup.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1teTcTNHzQH3yCvbTz3xyU/8f9a4b2ef3ec1c6ed13cbdc51d6b13c5/image5.png" />
          </figure><p>
Check out the <a href="https://developers.cloudflare.com/workers/framework-guides/web-apps/microfrontends"><u>documentation</u></a> to see how to map your existing Workers and enable View Transitions. We can't wait to see what complex, multi-team applications you build on the edge!</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Dashboard]]></category>
            <category><![CDATA[Front End]]></category>
            <category><![CDATA[Micro-frontends]]></category>
            <guid isPermaLink="false">2u7SNZ4BZcQYHZYKqmdEaM</guid>
            <dc:creator>Brayden Wilmoth</dc:creator>
        </item>
        <item>
            <title><![CDATA[Introducing Moltworker: a self-hosted personal AI agent, minus the minis]]></title>
            <link>https://blog.cloudflare.com/moltworker-self-hosted-ai-agent/</link>
            <pubDate>Thu, 29 Jan 2026 14:00:00 GMT</pubDate>
            <description><![CDATA[ Moltworker is a middleware Worker and adapted scripts that allows running OpenClaw (formerly Moltbot, formerly Clawdbot) on Cloudflare's Sandbox SDK and our Developer Platform APIs. So you can self-host an AI personal assistant — without any new hardware. ]]></description>
            <content:encoded><![CDATA[ <p><i>Editorial note: As of January 30, 2026, Moltbot has been </i><a href="https://openclaw.ai/blog/introducing-openclaw"><i><u>renamed</u></i></a><i> to OpenClaw.</i></p><p>The Internet woke up this week to a flood of people <a href="https://x.com/AlexFinn/status/2015133627043270750"><u>buying Mac minis</u></a> to run <a href="https://github.com/moltbot/moltbot"><u>Moltbot</u></a> (formerly Clawdbot), an open-source, self-hosted AI agent designed to act as a personal assistant. Moltbot runs in the background on a user's own hardware, has a sizable and growing list of integrations for chat applications, AI models, and other popular tools, and can be controlled remotely. Moltbot can help you manage your finances and social media, and organize your day — all through your favorite messaging app.</p><p>But what if you don’t want to buy new dedicated hardware? And what if you could still run your Moltbot efficiently and securely online? Meet <a href="https://github.com/cloudflare/moltworker"><u>Moltworker</u></a>, a middleware Worker and adapted scripts that allow running Moltbot on Cloudflare's Sandbox SDK and our Developer Platform APIs.</p>
    <div>
      <h2>A personal assistant on Cloudflare — how does that work? </h2>
      <a href="#a-personal-assistant-on-cloudflare-how-does-that-work">
        
      </a>
    </div>
    <p>Cloudflare Workers has never been <a href="https://developers.cloudflare.com/workers/runtime-apis/nodejs/"><u>as compatible</u></a> with Node.js as it is now. Where in the past we had to mock APIs to get some packages running, now those APIs are supported natively by the Workers Runtime.</p><p>This has changed how we can build tools on Cloudflare Workers. When we first implemented <a href="https://developers.cloudflare.com/browser-rendering/playwright/"><u>Playwright</u></a>, a popular framework for web testing and automation that runs on <a href="https://developers.cloudflare.com/browser-rendering/"><u>Browser Rendering</u></a>, we had to rely on <a href="https://www.npmjs.com/package/memfs"><u>memfs</u></a>. This was bad because not only is memfs a hack and an external dependency, but it also forced us to drift away from the official Playwright codebase. Thankfully, with more Node.js compatibility, we were able to start using <a href="https://github.com/cloudflare/playwright/pull/62/changes"><u>node:fs natively</u></a>, reducing complexity and improving maintainability, which makes upgrading to the latest versions of Playwright easy.</p><p>The list of Node.js APIs we support natively keeps growing. The blog post “<a href="https://blog.cloudflare.com/nodejs-workers-2025/"><u>A year of improving Node.js compatibility in Cloudflare Workers</u></a>” provides an overview of where we are and what we’re doing.</p><p>We measure this progress, too. We recently ran an experiment where we took the 1,000 most popular NPM packages, installed them, and let AI loose trying to run them in Cloudflare Workers, <a href="https://ghuntley.com/ralph/"><u>Ralph Wiggum as a "software engineer"</u></a> style, and the results were surprisingly good. Excluding the packages that are build tools, CLI tools, or browser-only and don’t apply, only 15 packages genuinely didn’t work. <b>That's 1.5%</b>.</p><p>Here’s a graphic of our Node.js API support over time:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5GhwKJq2A2wG79I3NdhhDl/e462c30daf46b1b36d3f06bff479596b/image9.png" />
          </figure><p>We put together a page with the results of our internal experiment on npm package support <a href="https://worksonworkers.southpolesteve.workers.dev/"><u>here</u></a>, so you can check for yourself.</p><p>Moltbot doesn’t necessarily require a lot of Workers Node.js compatibility, because most of its code runs in a container anyway, but we thought it was important to highlight how far we’ve come in supporting so many packages using native APIs. When starting a new AI agent application from scratch, we can now run a lot of the logic in Workers, closer to the user.</p><p>The other important part of the story is that the list of <a href="https://developers.cloudflare.com/directory/?product-group=Developer+platform"><u>products and APIs</u></a> on our Developer Platform has grown to the point where anyone can <a href="https://www.cloudflare.com/developer-platform/solutions/hosting/">build and run any kind of application</a> — even the most complex and demanding ones — on Cloudflare. And once launched, every application running on our Developer Platform immediately benefits from our secure and scalable global network.</p><p>Those products and services gave us the ingredients we needed to get started. First, we now have <a href="https://sandbox.cloudflare.com/"><u>Sandboxes</u></a>, where you can run untrusted code securely in isolated environments, providing a place to run the service. Next, we have <a href="https://developers.cloudflare.com/browser-rendering/"><u>Browser Rendering</u></a>, where you can programmatically control and interact with headless browser instances. And finally, <a href="https://developers.cloudflare.com/r2/"><u>R2</u></a>, where you can store objects persistently. With those building blocks available, we could begin work on adapting Moltbot.</p>
    <div>
      <h2>How we adapted Moltbot to run on us</h2>
      <a href="#how-we-adapted-moltbot-to-run-on-us">
        
      </a>
    </div>
    <p>Moltbot on Workers, or Moltworker, is an entrypoint Worker that acts as an API router and as a proxy between our APIs and the isolated environment, with both protected by Cloudflare Access. It also provides an administration UI and connects to the Sandbox container where the standard Moltbot Gateway runtime and its integrations run, using R2 for persistent storage.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3OD2oHgy5ilHpQO2GJvcLU/836a55b67a626d2cd378a654ad47901d/newdiagram.png" />
          </figure><p><sup>High-level architecture diagram of Moltworker.</sup></p><p>Let's dive in more.</p>
    <div>
      <h3>AI Gateway</h3>
      <a href="#ai-gateway">
        
      </a>
    </div>
    <p>Cloudflare AI Gateway acts as a proxy between your AI applications and any popular <a href="https://developers.cloudflare.com/ai-gateway/usage/providers/"><u>AI provider</u></a>, and gives our customers centralized visibility and control over the requests going through.</p><p>Recently we announced support for <a href="https://developers.cloudflare.com/changelog/2025-08-25-secrets-store-ai-gateway/"><u>Bring Your Own Key (BYOK)</u></a>: instead of passing your provider secrets in plain text with every request, we centrally manage the secrets for you and use them with your gateway configuration.</p><p>An even better option, where you don’t have to manage AI providers' secrets at all, is <a href="https://developers.cloudflare.com/ai-gateway/features/unified-billing/"><u>Unified Billing</u></a>. In this case you top up your account with credits and use AI Gateway with any of the supported providers directly; Cloudflare gets charged, and we deduct credits from your account.</p><p>To make Moltbot use AI Gateway, we create a new gateway instance, enable the Anthropic provider for it, either add our Claude key or purchase credits to use Unified Billing, and then set the <code>ANTHROPIC_BASE_URL</code> environment variable so Moltbot uses the AI Gateway endpoint. That’s it, no code changes necessary.</p>
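    <p>For illustration, here’s what that redirection amounts to. The account and gateway IDs below are placeholders, and the request shown is just a plain Anthropic Messages call; the only thing that changes is the base URL:</p>
            <pre><code>// Placeholder IDs: substitute your own account and gateway.
const ANTHROPIC_BASE_URL =
  'https://gateway.ai.cloudflare.com/v1/&lt;ACCOUNT_ID&gt;/&lt;GATEWAY_ID&gt;/anthropic';

// Equivalent to what Moltbot's Anthropic client does once the environment
// variable is set: same path, headers, and body, just a new base URL.
const response = await fetch(`${ANTHROPIC_BASE_URL}/v1/messages`, {
  method: 'POST',
  headers: {
    'x-api-key': '&lt;ANTHROPIC_API_KEY&gt;', // omit when using BYOK or Unified Billing
    'anthropic-version': '2023-06-01',
    'content-type': 'application/json',
  },
  body: JSON.stringify({
    model: 'claude-sonnet-4-5', // any model your gateway allows
    max_tokens: 256,
    messages: [{ role: 'user', content: 'Hello from Moltworker' }],
  }),
});</code></pre>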
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/cMWRXgHR0mFLc5kp74nYk/a47fa09bdbb6acb3deb60fb16537945d/image11.png" />
          </figure><p>Once Moltbot starts using AI Gateway, you’ll have full visibility on costs and have access to logs and analytics that will help you understand how your AI agent is using the AI providers.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5GOrNdgtdwMcU4bE8oLE19/6bc29bcac643125f5332a8ffba9d1322/image1.png" />
          </figure><p>Note that Anthropic is one option; Moltbot supports <a href="https://www.molt.bot/integrations"><u>other</u></a> AI providers and so does <a href="https://developers.cloudflare.com/ai-gateway/usage/providers/"><u>AI Gateway</u></a>. The advantage of using AI Gateway is that if a better model comes along from any provider, you don’t have to swap keys in your AI agent configuration and redeploy — you can simply switch the model in your gateway configuration. What’s more, you can specify model or provider <a href="https://developers.cloudflare.com/ai-gateway/configuration/fallbacks/"><u>fallbacks</u></a> to handle request failures and ensure reliability.</p>
    <div>
      <h3>Sandboxes</h3>
      <a href="#sandboxes">
        
      </a>
    </div>
    <p>Last year we anticipated the growing need for AI agents to run untrusted code securely in isolated environments, and we <a href="https://developers.cloudflare.com/changelog/2025-06-24-announcing-sandboxes/"><u>announced</u></a> the <a href="https://developers.cloudflare.com/sandbox/"><u>Sandbox SDK</u></a>. This SDK is built on top of <a href="https://developers.cloudflare.com/containers/"><u>Cloudflare Containers</u></a>, but it provides a simple API for executing commands, managing files, running background processes, and exposing services — all from your Workers applications.</p><p>In short, instead of having to deal with the lower-level Container APIs, the Sandbox SDK gives you developer-friendly APIs for secure code execution and handles the complexity of container lifecycle, networking, file systems, and process management — letting you focus on building your application logic with just a few lines of TypeScript. Here’s an example:</p>
            <pre><code>import { getSandbox } from '@cloudflare/sandbox';
export { Sandbox } from '@cloudflare/sandbox';

export default {
  async fetch(request: Request, env: Env): Promise&lt;Response&gt; {
    const sandbox = getSandbox(env.Sandbox, 'user-123');

    // Create a project structure
    await sandbox.mkdir('/workspace/project/src', { recursive: true });

    // Check node version
    const version = await sandbox.exec('node -v');

    // Run some python code
    const ctx = await sandbox.createCodeContext({ language: 'python' });
    await sandbox.runCode('import math; radius = 5', { context: ctx });
    const result = await sandbox.runCode('math.pi * radius ** 2', { context: ctx });

    return Response.json({ version, result });
  }
};</code></pre>
            <p>This fits like a glove for Moltbot. Instead of running Docker in your local Mac mini, we run Docker on Containers, use the Sandbox SDK to issue commands into the isolated environment and use callbacks to our entrypoint Worker, effectively establishing a two-way communication channel between the two systems.</p>
    <div>
      <h3>R2 for persistent storage</h3>
      <a href="#r2-for-persistent-storage">
        
      </a>
    </div>
    <p>The good thing about running things on your local computer or a VPS is that you get persistent storage for free. Containers, however, are inherently <a href="https://developers.cloudflare.com/containers/platform-details/architecture/"><u>ephemeral</u></a>, meaning data generated within them is lost upon deletion. Fear not, though — the Sandbox SDK provides <code>sandbox.mountBucket()</code>, which you can use to automatically, well, mount your R2 bucket as a filesystem partition when the container starts.</p><p>Once we have a local directory that is guaranteed to survive the container lifecycle, we can use it for Moltbot to store session memory files, conversations, and other assets that need to persist.</p>
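    <p>Wired up, the bootstrap looks roughly like the sketch below. This is our own illustration rather than Moltworker’s exact code; the bucket name and mount path are placeholders, and the <code>mountBucket</code> signature should be checked against the current Sandbox SDK docs:</p>
            <pre><code>import { getSandbox } from '@cloudflare/sandbox';
export { Sandbox } from '@cloudflare/sandbox';

interface Env {
  Sandbox: DurableObjectNamespace; // Sandbox SDK binding
}

export default {
  async fetch(request: Request, env: Env): Promise&lt;Response&gt; {
    const sandbox = getSandbox(env.Sandbox, 'moltbot');

    // Mount the R2 bucket as a directory inside the container, so anything
    // written under /data survives the container being recycled.
    await sandbox.mountBucket('moltbot-storage', '/data');

    // Moltbot can now treat /data as ordinary persistent disk.
    const ls = await sandbox.exec('ls /data');
    return Response.json({ ls });
  },
};</code></pre>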
    <div>
      <h3>Browser Rendering for browser automation</h3>
      <a href="#browser-rendering-for-browser-automation">
        
      </a>
    </div>
    <p>AI agents rely heavily on browsing the sometimes not-so-structured web. Moltbot utilizes dedicated Chromium instances to perform actions, navigate the web, fill out forms, take snapshots, and handle tasks that require a web browser. Sure, we can run Chromium on Sandboxes too, but what if we could simplify and use an API instead?</p><p>With Cloudflare’s <a href="https://developers.cloudflare.com/browser-rendering/"><u>Browser Rendering</u></a>, you can programmatically control and interact with headless browser instances running at scale in our edge network. We support <a href="https://developers.cloudflare.com/browser-rendering/puppeteer/"><u>Puppeteer</u></a>, <a href="https://developers.cloudflare.com/browser-rendering/stagehand/"><u>Stagehand</u></a>, <a href="https://developers.cloudflare.com/browser-rendering/playwright/"><u>Playwright</u></a> and other popular packages so that developers can onboard with minimal code changes. We even support <a href="https://developers.cloudflare.com/browser-rendering/playwright/playwright-mcp/"><u>MCP</u></a> for AI.</p><p>In order to get Browser Rendering to work with Moltbot we do two things:</p><ul><li><p>First we create a <a href="https://github.com/cloudflare/moltworker/blob/main/src/routes/cdp.ts"><u>thin CDP proxy</u></a> (<a href="https://chromedevtools.github.io/devtools-protocol/"><u>CDP</u></a> is the protocol that allows instrumenting Chromium-based browsers) from the Sandbox container to the Moltbot Worker, back to Browser Rendering using the Puppeteer APIs.</p></li><li><p>Then we inject a <a href="https://github.com/cloudflare/moltworker/pull/20"><u>Browser Rendering skill</u></a> into the runtime when the Sandbox starts.</p></li></ul>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1ZvQa7vS1T9Mm3nywqarQZ/9dec3d8d06870ee575a519440d34c499/image12.png" />
          </figure><p>From the Moltbot runtime’s perspective, it simply has a local CDP port it can connect to for performing browser tasks.</p>
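    <p>On the Worker side of that proxy, driving Browser Rendering looks like standard Puppeteer. A minimal sketch, assuming a Browser Rendering binding named <code>MYBROWSER</code>:</p>
            <pre><code>import puppeteer from '@cloudflare/puppeteer';

interface Env {
  MYBROWSER: Fetcher; // Browser Rendering binding
}

export default {
  async fetch(request: Request, env: Env): Promise&lt;Response&gt; {
    // Launch a remote headless browser; CDP traffic flows through
    // the binding rather than a locally spawned Chromium.
    const browser = await puppeteer.launch(env.MYBROWSER);
    const page = await browser.newPage();
    await page.goto('https://developers.cloudflare.com/');
    const screenshot = await page.screenshot();
    await browser.close();
    return new Response(screenshot, {
      headers: { 'content-type': 'image/png' },
    });
  },
};</code></pre>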
    <div>
      <h3>Zero Trust Access for authentication policies</h3>
      <a href="#zero-trust-access-for-authentication-policies">
        
      </a>
    </div>
    <p>Next up we want to protect our APIs and Admin UI from unauthorized access. Doing authentication from scratch is hard, and is typically the kind of wheel you don’t want to reinvent or have to deal with. Zero Trust Access makes it incredibly easy to protect your application by defining specific policies and login methods for the endpoints. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1MDXXjbMs4PViN3kp9iFBY/a3095f07c986594d0c07d0276dbf22cc/image3.png" />
          </figure><p><sup>Zero Trust Access Login methods configuration for the Moltworker application.</sup></p><p>Once the endpoints are protected, Cloudflare will handle authentication for you and automatically include a <a href="https://developers.cloudflare.com/cloudflare-one/access-controls/applications/http-apps/authorization-cookie/application-token/"><u>JWT token</u></a> with every request to your origin endpoints. You can then <a href="https://developers.cloudflare.com/cloudflare-one/access-controls/applications/http-apps/authorization-cookie/validating-json/"><u>validate</u></a> that JWT for extra protection, to ensure that the request came from Access and not a malicious third party.</p><p>Like with AI Gateway, once all your APIs are behind Access you get great observability on who the users are and what they are doing with your Moltbot instance.</p>
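    <p>For illustration, that validation can be done in a few lines with the <code>jose</code> package. The team domain and audience tag below are placeholders for your own Zero Trust configuration:</p>
            <pre><code>import { createRemoteJWKSet, jwtVerify } from 'jose';

// Placeholders: your Zero Trust team domain and the application's audience tag.
const TEAM_DOMAIN = 'https://yourteam.cloudflareaccess.com';
const JWKS = createRemoteJWKSet(new URL(`${TEAM_DOMAIN}/cdn-cgi/access/certs`));

export async function verifyAccessJwt(request: Request, audience: string) {
  // Access injects this header on every authenticated request to the origin.
  const token = request.headers.get('Cf-Access-Jwt-Assertion');
  if (!token) throw new Error('missing Access JWT');

  const { payload } = await jwtVerify(token, JWKS, {
    issuer: TEAM_DOMAIN,
    audience,
  });
  return payload; // e.g. payload.email identifies the authenticated user
}</code></pre>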
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3BV4eqxKPXTiq18vvVpmZh/e034b7e7ea637a00c73c2ebe4d1400aa/image8.png" />
          </figure>
    <div>
      <h2>Moltworker in action</h2>
      <a href="#moltworker-in-action">
        
      </a>
    </div>
    <p>Demo time. We’ve put up a Slack instance where we can play with our own instance of Moltbot on Workers. Here are some of the fun things we’ve done with it.</p><p>We hate bad news.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4FxN935AgINZ8953WSswKB/e52d3eb268aa0732c5e6aa64a8e2adba/image6.png" />
          </figure><p>Here’s a chat session where we ask Moltbot to find the shortest route between Cloudflare in London and Cloudflare in Lisbon using Google Maps and post a screenshot in a Slack channel. It goes through a sequence of steps using Browser Rendering to navigate Google Maps and does a pretty good job of it. Also note Moltbot’s memory in action when we ask it a second time.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1phWt3cVUwxe9tvCYpuAW3/97f456094ede6ca8fb55bf0dddf65d5b/image10.png" />
          </figure><p>We’re in the mood for some Asian food today, so let’s put Moltbot to work.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6nJY7GOCopGnMy4IY7KMcf/0d57794df524780c3f4b27e65c968e19/image5.png" />
          </figure><p>We eat with our eyes too.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5BzB9pqJhuevRbOSJloeG0/23c2905f0c12c1e7f104aa28fcc1f595/image7.png" />
          </figure><p>Let’s get more creative and ask Moltbot to create a video where it browses our developer documentation. As you can see, it downloads and runs ffmpeg to generate the video out of the frames it captured in the browser.</p>
    <div>
      <h2>Run your own Moltworker</h2>
      <a href="#run-your-own-moltworker">
        
      </a>
    </div>
    <p>We open-sourced our implementation and made it available at<a href="https://github.com/cloudflare/moltworker"> <u>https://github.com/cloudflare/moltworker</u></a>, so you can deploy and run your own Moltbot on top of Workers today.</p><p>The <a href="https://github.com/cloudflare/moltworker/blob/main/README.md">README</a> guides you through the necessary setup steps. You will need a Cloudflare account and a <a href="https://developers.cloudflare.com/workers/platform/pricing/"><u>Workers Paid plan</u></a> to access Sandbox Containers; however, all other products are either entirely free (like <a href="https://developers.cloudflare.com/ai-gateway/reference/pricing/"><u>AI Gateway</u></a>) or include generous <a href="https://developers.cloudflare.com/r2/pricing/#free-tier"><u>free tiers </u></a>that allow you to get started and scale under reasonable limits.</p><p><b>Note that Moltworker is a proof of concept, not a Cloudflare product</b>. Our goal is to showcase some of the most exciting features of our <a href="https://developers.cloudflare.com/learning-paths/workers/devplat/intro-to-devplat/">Developer Platform</a> that can be used to run AI agents and unsupervised code efficiently and securely, and get great observability, while taking advantage of our global network.</p><p>Feel free to contribute to or fork our <a href="https://github.com/cloudflare/moltworker"><u>GitHub</u></a> repository; we’ll keep an eye on it for a while to provide support. We are also considering contributing Cloudflare skills upstream to the official project in parallel.</p>
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>We hope you enjoyed this experiment, and that we’ve convinced you that Cloudflare is the perfect place to run your AI applications and agents. We’ve been working relentlessly to anticipate the future and release features like the <a href="https://developers.cloudflare.com/agents/"><u>Agents SDK</u></a>, which you can use to build your first agent <a href="https://developers.cloudflare.com/agents/guides/slack-agent/"><u>in minutes</u></a>; <a href="https://developers.cloudflare.com/sandbox/"><u>Sandboxes</u></a>, where you can run arbitrary code in an isolated environment without the complications of managing a container lifecycle; and <a href="https://developers.cloudflare.com/ai-search/"><u>AI Search</u></a>, Cloudflare’s managed vector-based search service, to name a few.</p><p>Cloudflare now offers a complete toolkit for AI development: inference, storage APIs, databases, durable execution for stateful workflows, and built-in AI capabilities. Together, these building blocks make it possible to build and run even the most demanding AI applications on our global edge network.</p><p>If you're excited about AI and want to help us build the next generation of products and APIs, we're <a href="https://www.cloudflare.com/en-gb/careers/jobs/?department=Engineering"><u>hiring</u></a>.</p>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Agents]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Containers]]></category>
            <category><![CDATA[Sandbox]]></category>
            <guid isPermaLink="false">45LuZGCXAcs7EMnB64zTQm</guid>
            <dc:creator>Celso Martinho</dc:creator>
            <dc:creator>Brian Brunner</dc:creator>
            <dc:creator>Sid Chatterjee</dc:creator>
            <dc:creator>Andreas Jansson</dc:creator>
        </item>
        <item>
            <title><![CDATA[Building a serverless, post-quantum Matrix homeserver]]></title>
            <link>https://blog.cloudflare.com/serverless-matrix-homeserver-workers/</link>
            <pubDate>Tue, 27 Jan 2026 14:00:00 GMT</pubDate>
            <description><![CDATA[ As a proof of concept, we ported a Matrix homeserver to Cloudflare Workers — delivering encrypted messaging at the edge with automatic post-quantum cryptography. ]]></description>
            <content:encoded><![CDATA[ <p><sup><i>* This post was updated at 11:45 a.m. Pacific time to clarify that the use case described here is a proof of concept and a personal project. Some sections have been updated for clarity.</i></sup></p><p>Matrix is the gold standard for decentralized, end-to-end encrypted communication. It powers government messaging systems, open-source communities, and privacy-focused organizations worldwide. </p><p>For the individual developer, however, the appeal is often closer to home: bridging fragmented chat networks (like Discord and Slack) into a single inbox, or simply ensuring your conversation history lives on infrastructure you control. Functionally, Matrix operates as a decentralized, eventually consistent state machine. Instead of a central server pushing updates, homeservers exchange signed JSON events over HTTP, using a conflict resolution algorithm to merge these streams into a unified view of the room's history.</p><p><b>But there is a "tax" to running it. </b>Traditionally, operating a Matrix <a href="https://matrix.org/homeserver/about/"><u>homeserver</u></a> has meant accepting a heavy operational burden. You have to provision virtual private servers (VPS), tune PostgreSQL for heavy write loads, manage Redis for caching, configure <a href="https://www.cloudflare.com/learning/cdn/glossary/reverse-proxy/"><u>reverse proxies</u></a>, and handle rotation for <a href="https://www.cloudflare.com/application-services/products/ssl/">TLS certificates</a>. It’s a stateful, heavy beast that demands to be fed time and money, whether you’re using it a lot or a little.</p><p>We wanted to see if we could eliminate that tax entirely.</p><p><b>Spoiler: We could.</b> In this post, we’ll explain how we ported a Matrix homeserver to <a href="https://workers.cloudflare.com/"><u>Cloudflare Workers</u></a>. The resulting proof of concept is a serverless architecture where operations disappear, costs scale to zero when idle, and every connection is protected by <a href="https://www.cloudflare.com/learning/ssl/quantum/what-is-post-quantum-cryptography/"><u>post-quantum cryptography</u></a> by default. You can view the source code and <a href="https://github.com/nkuntz1934/matrix-workers"><u>deploy your own instance directly from Github</u></a>.</p><a href="https://deploy.workers.cloudflare.com/?url=https://github.com/nkuntz1934/matrix-workers"><img src="https://deploy.workers.cloudflare.com/button" /></a>
    <div>
      <h2>From Synapse to Workers</h2>
      <a href="#from-synapse-to-workers">
        
      </a>
    </div>
    <p>Our starting point was <a href="https://github.com/matrix-org/synapse"><u>Synapse</u></a>, the Python-based reference Matrix homeserver designed for traditional deployments. PostgreSQL for persistence, Redis for caching, filesystem for media.</p><p>Porting it to Workers meant questioning every storage assumption we’d taken for granted.</p><p>The challenge was storage. Traditional homeservers assume strong consistency via a central SQL database. Cloudflare <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a> offers a powerful alternative. This primitive gives us the strong consistency and atomicity required for Matrix state resolution, while still allowing the application to run at the edge.</p><p>We ported the core Matrix protocol logic — event authorization, room state resolution, cryptographic verification — in TypeScript using the Hono framework. D1 replaces PostgreSQL, KV replaces Redis, R2 replaces the filesystem, and Durable Objects handle real-time coordination.</p><p>Here’s how the mapping worked out:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1JTja38UZRbFygluawrnz1/9bce290e3070155c734e874c17051551/BLOG-3101_2.png" />
          </figure>
    <div>
      <h2>From monolith to serverless</h2>
      <a href="#from-monolith-to-serverless">
        
      </a>
    </div>
    <p>Moving to Cloudflare Workers brings several advantages for a developer: simple deployment, lower costs, low latency, and built-in security.</p><p><b>Easy deployment: </b>A traditional Matrix deployment requires server provisioning, PostgreSQL administration, Redis cluster management, <a href="https://www.cloudflare.com/application-services/solutions/certificate-lifecycle-management/">TLS certificate renewal</a>, load balancer configuration, monitoring infrastructure, and on-call rotations.</p><p>With Workers, deployment is simply <code>wrangler deploy</code>. Workers handles TLS, load balancing, DDoS protection, and global distribution. </p><p><b>Usage-based costs: </b>Traditional homeservers cost money whether anyone is using them or not. Workers pricing is request-based, so you pay when you’re using it, but costs drop to near zero when everyone’s asleep. </p><p><b>Lower latency globally:</b> A traditional Matrix homeserver in us-east-1 adds 200ms+ latency for users in Asia or Europe. Workers, meanwhile, run in 300+ locations worldwide. When a user in Tokyo sends a message, the Worker executes in Tokyo. </p><p><b>Built-in security: </b>Matrix homeservers can be high-value targets: They handle encrypted communications, store message history, and authenticate users. Traditional deployments require careful hardening: firewall configuration, rate limiting, DDoS mitigation, WAF rules, IP reputation filtering.</p><p>Workers provide all of this by default. </p>
    <div>
      <h3>Post-quantum protection </h3>
      <a href="#post-quantum-protection">
        
      </a>
    </div>
    <p>Cloudflare deployed post-quantum hybrid key agreement across all <a href="https://www.cloudflare.com/learning/ssl/why-use-tls-1.3/"><u>TLS 1.3</u></a> connections in <a href="https://blog.cloudflare.com/post-quantum-for-all/"><u>October 2022</u></a>. Every connection to our Worker automatically negotiates X25519MLKEM768 — a hybrid combining classical X25519 with ML-KEM, the post-quantum algorithm standardized by NIST.</p><p>Classical cryptography relies on mathematical problems that are hard for traditional computers but trivial for quantum computers running Shor’s algorithm. ML-KEM is based on lattice problems that remain hard even for quantum computers. The hybrid approach means both algorithms must fail for the connection to be compromised.</p>
    <div>
      <h3>Following a message through the system</h3>
      <a href="#following-a-message-through-the-system">
        
      </a>
    </div>
    <p>Understanding where encryption happens matters for security architecture. When someone sends a message through our homeserver, here’s the actual path:</p><p>The sender’s client takes the plaintext message and encrypts it with Megolm — Matrix’s end-to-end encryption. This encrypted payload then gets wrapped in TLS for transport. On Cloudflare, that TLS connection uses X25519MLKEM768, making it quantum-resistant.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/wGGYZ4LYspufH1c4psmL1/28acad8ab8e6535525dda413669c2d74/BLOG-3101_3.png" />
          </figure><p>The Worker terminates TLS, but what it receives is still encrypted — the Megolm ciphertext. We store that ciphertext in D1, index it by room and timestamp, and deliver it to recipients. But we never see the plaintext. The message “Hello, world” exists only on the sender’s device and the recipient’s device.</p><p>When the recipient syncs, the process reverses. They receive the encrypted payload over another quantum-resistant TLS connection, then decrypt locally with their Megolm session keys.</p>
    <div>
      <h3>Two layers, independent protection</h3>
      <a href="#two-layers-independent-protection">
        
      </a>
    </div>
    <p>Two encryption layers protect messages independently:</p><p>The <a href="https://www.cloudflare.com/learning/ssl/transport-layer-security-tls/"><u>transport layer (TLS)</u></a> protects data in transit. It’s encrypted at the client and decrypted at the Cloudflare edge. With X25519MLKEM768, this layer is now post-quantum.</p><p>The <a href="https://www.cloudflare.com/learning/ddos/what-is-layer-7/"><u>application layer</u></a> (Megolm E2EE) protects message content. It’s encrypted on the sender’s device and decrypted only on recipient devices. This uses classical Curve25519 cryptography.</p>
    <div>
      <h3>Who sees what</h3>
      <a href="#who-sees-what">
        
      </a>
    </div>
    <p>Any Matrix homeserver operator — whether running Synapse on a VPS or this implementation on Workers — can see metadata: which rooms exist, who’s in them, when messages were sent. But no one in the infrastructure chain can see the message content, because the E2EE payload is encrypted on sender devices before it ever hits the network. Cloudflare terminates TLS and passes requests to your Worker, but both see only Megolm ciphertext. Media in encrypted rooms is encrypted client-side before upload, and private keys never leave user devices.</p>
    <div>
      <h3>What traditional deployments would need</h3>
      <a href="#what-traditional-deployments-would-need">
        
      </a>
    </div>
    <p>Achieving post-quantum TLS on a traditional Matrix deployment would require upgrading OpenSSL or BoringSSL to a version supporting ML-KEM, configuring cipher suite preferences correctly, testing client compatibility across all Matrix apps, monitoring for TLS negotiation failures, staying current as PQC standards evolve, and handling clients that don’t support PQC gracefully.</p><p>With Workers, it’s automatic. Chrome, Firefox, and Edge all support X25519MLKEM768. Mobile apps using platform TLS stacks inherit this support. The security posture improves as Cloudflare’s <a href="https://developers.cloudflare.com/ssl/post-quantum-cryptography/"><u>PQC</u></a> deployment expands — no action required on our part.</p>
    <div>
      <h2>The storage architecture that made it work</h2>
      <a href="#the-storage-architecture-that-made-it-work">
        
      </a>
    </div>
    <p>The key insight from porting Tuwunel was that different data needs different consistency guarantees. We use each Cloudflare primitive for what it does best.</p>
    <div>
      <h3>D1 for the data model</h3>
      <a href="#d1-for-the-data-model">
        
      </a>
    </div>
    <p>D1 stores everything that needs to survive restarts and support queries: users, rooms, events, device keys. Over 25 tables covering the full Matrix data model. </p>
            <pre><code>CREATE TABLE events (
	event_id TEXT PRIMARY KEY,
	room_id TEXT NOT NULL,
	sender TEXT NOT NULL,
	event_type TEXT NOT NULL,
	state_key TEXT,
	content TEXT NOT NULL,
	origin_server_ts INTEGER NOT NULL,
	depth INTEGER NOT NULL
);
</code></pre>
            <p><a href="https://www.cloudflare.com/developer-platform/products/d1/">D1’s SQLite foundation</a> meant we could port Tuwunel’s queries with minimal changes. Joins, indexes, and aggregations work as expected.</p><p>We learned one hard lesson: D1’s eventual consistency breaks foreign key constraints. A write to rooms might not be visible when a subsequent write to events checks the foreign key. We removed all foreign keys and enforce referential integrity in application code.</p>
    <div>
      <h3>KV for ephemeral state</h3>
      <a href="#kv-for-ephemeral-state">
        
      </a>
    </div>
    <p>OAuth authorization codes live for 10 minutes, while refresh tokens last for a session.</p>
            <pre><code>// Store OAuth code with 10-minute TTL
kv.put(&amp;format!("oauth_code:{}", code), &amp;token_data)?
	.expiration_ttl(600)
	.execute()
	.await?;</code></pre>
            <p>KV’s global distribution means OAuth flows work fast regardless of where users are located.</p>
    <div>
      <h3>R2 for media</h3>
      <a href="#r2-for-media">
        
      </a>
    </div>
    <p>Matrix media maps directly to R2, so you can upload an image, get back a content-addressed URL – and egress is free.</p>
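    <p>A hedged sketch of what that mapping can look like, assuming an R2 binding named <code>MEDIA</code>: hash the upload, use the digest as the object key, and the media ID is content-addressed by construction:</p>
            <pre><code>export async function storeMedia(
  env: { MEDIA: R2Bucket },
  bytes: ArrayBuffer,
  contentType: string
): Promise&lt;string&gt; {
  // Content-address the upload: the SHA-256 digest becomes the media ID.
  const digest = await crypto.subtle.digest('SHA-256', bytes);
  const mediaId = [...new Uint8Array(digest)]
    .map((b) =&gt; b.toString(16).padStart(2, '0'))
    .join('');

  await env.MEDIA.put(mediaId, bytes, { httpMetadata: { contentType } });

  // Later served from /_matrix/media/v3/download/&lt;server&gt;/&lt;mediaId&gt;
  return mediaId;
}</code></pre>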
    <div>
      <h3>Durable Objects for atomicity</h3>
      <a href="#durable-objects-for-atomicity">
        
      </a>
    </div>
    <p>Some operations can’t tolerate eventual consistency. When a client claims a one-time encryption key, that key must be atomically removed. If two clients claim the same key, encrypted session establishment fails.</p><p>Durable Objects provide single-threaded, strongly consistent storage:</p>
            <pre><code>#[durable_object]
pub struct UserKeysObject {
	state: State,
	env: Env,
}

impl UserKeysObject {
	async fn claim_otk(&amp;self, algorithm: &amp;str) -&gt; Result&lt;Option&lt;Key&gt;&gt; {
    	// Atomic within single DO - no race conditions possible
    	let mut keys: Vec&lt;Key&gt; = self.state.storage()
        	.get("one_time_keys")
        	.await
        	.ok()
        	.flatten()
        	.unwrap_or_default();

    	if let Some(idx) = keys.iter().position(|k| k.algorithm == algorithm) {
        	let key = keys.remove(idx);
        	self.state.storage().put("one_time_keys", &amp;keys).await?;
        	return Ok(Some(key));
    	}
    	Ok(None)
	}
}</code></pre>
            <p>We use UserKeysObject for E2EE key management, RoomObject for real-time room events like typing indicators and read receipts, and UserSyncObject for to-device message queues. The rest flows through D1.</p>
    <div>
      <h3>Complete end-to-end encryption, complete OAuth</h3>
      <a href="#complete-end-to-end-encryption-complete-oauth">
        
      </a>
    </div>
    <p>Our implementation supports the full Matrix E2EE stack: device keys, cross-signing keys, one-time keys, fallback keys, key backup, and dehydrated devices.</p><p>Modern Matrix clients use OAuth 2.0/OIDC instead of legacy password flows. We implemented a complete OAuth provider, with dynamic client registration, PKCE authorization, RS256-signed JWT tokens, token refresh with rotation, and standard OIDC discovery endpoints.
</p>
            <pre><code>curl https://matrix.example.com/.well-known/openid-configuration
{
  "issuer": "https://matrix.example.com",
  "authorization_endpoint": "https://matrix.example.com/oauth/authorize",
  "token_endpoint": "https://matrix.example.com/oauth/token",
  "jwks_uri": "https://matrix.example.com/.well-known/jwks.json"
}
</code></pre>
            <p>Point Element or any Matrix client at the domain, and it discovers everything automatically.</p>
    <div>
      <h2>Sliding Sync for mobile</h2>
      <a href="#sliding-sync-for-mobile">
        
      </a>
    </div>
    <p>Traditional Matrix sync transfers megabytes of data on initial connection, draining mobile battery and data plans.</p><p>Sliding Sync lets clients request exactly what they need. Instead of downloading everything, clients get the 20 most recent rooms with minimal state. As users scroll, they request more ranges. The server tracks position and sends only deltas.</p><p>Combined with edge execution, mobile clients can connect and render their room list in under 500ms, even on slow networks.</p>
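    <p>For a sense of the request shape, here is a hedged sketch of a Sliding Sync list subscription (field names follow the Sliding Sync proposal; the exact shape depends on the protocol revision your client speaks):</p>
            <pre><code>// Ask for only the 20 most recent rooms, with just enough state to
// render a room list, and one timeline event per room for previews.
const slidingSyncRequest = {
  lists: {
    visible_rooms: {
      ranges: [[0, 19]],
      required_state: [
        ['m.room.name', ''],
        ['m.room.avatar', ''],
      ],
      timeline_limit: 1,
    },
  },
};

// Scrolling further simply widens the window on the next request:
// ranges: [[0, 49]]; the server responds with deltas, not a full resync.</code></pre>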
    <div>
      <h2>The comparison</h2>
      <a href="#the-comparison">
        
      </a>
    </div>
    <p>For a homeserver serving a small team:</p><table><tr><th><p> </p></th><th><p><b>Traditional (VPS)</b></p></th><th><p><b>Workers</b></p></th></tr><tr><td><p>Monthly cost (idle)</p></td><td><p>$20-50</p></td><td><p>&lt;$1</p></td></tr><tr><td><p>Monthly cost (active)</p></td><td><p>$20-50</p></td><td><p>$3-10</p></td></tr><tr><td><p>Global latency</p></td><td><p>100-300ms</p></td><td><p>20-50ms</p></td></tr><tr><td><p>Time to deploy</p></td><td><p>Hours</p></td><td><p>Seconds</p></td></tr><tr><td><p>Maintenance</p></td><td><p>Weekly</p></td><td><p>None</p></td></tr><tr><td><p>DDoS protection</p></td><td><p>Additional cost</p></td><td><p>Included</p></td></tr><tr><td><p>Post-quantum TLS</p></td><td><p>Complex setup</p></td><td><p>Automatic</p></td></tr></table><p><sup>*</sup><sup><i>Based on public rates and metrics published by DigitalOcean, AWS Lightsail, and Linode as of January 15, 2026.</i></sup></p><p>The economics improve further at scale. Traditional deployments require capacity planning and over-provisioning. Workers scale automatically.</p>
    <div>
      <h2>The future of decentralized protocols</h2>
      <a href="#the-future-of-decentralized-protocols">
        
      </a>
    </div>
    <p>We started this as an experiment: could Matrix run on Workers? It can—and the approach can work for other stateful protocols, too.</p><p>By mapping traditional stateful components to Cloudflare’s primitives — Postgres to D1, Redis to KV, mutexes to Durable Objects — we can see that complex applications don't need complex infrastructure. We stripped away the operating system, the database management, and the network configuration, leaving only the application logic and the data itself.</p><p>Workers offers the sovereignty of owning your data, without the burden of owning the infrastructure.</p><p>I have been experimenting with the implementation and am excited for any contributions from others interested in this kind of service. </p><p>Ready to build powerful, real-time applications on Workers? Get started with<a href="https://developers.cloudflare.com/workers/"> <u>Cloudflare Workers</u></a> and explore<a href="https://developers.cloudflare.com/durable-objects/"> <u>Durable Objects</u></a> for your own stateful edge applications. Join our<a href="https://discord.cloudflare.com"> <u>Discord community</u></a> to connect with other developers building at the edge.</p>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Durable Objects]]></category>
            <category><![CDATA[D1]]></category>
            <category><![CDATA[Cloudflare Workers KV]]></category>
            <category><![CDATA[R2]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Rust]]></category>
            <category><![CDATA[WebAssembly]]></category>
            <category><![CDATA[Post-Quantum]]></category>
            <category><![CDATA[Encryption]]></category>
            <guid isPermaLink="false">6VOVAMNwIZ18hMaUlC6aqp</guid>
            <dc:creator>Nick Kuntz</dc:creator>
        </item>
        <item>
            <title><![CDATA[Astro is joining Cloudflare]]></title>
            <link>https://blog.cloudflare.com/astro-joins-cloudflare/</link>
            <pubDate>Fri, 16 Jan 2026 14:00:00 GMT</pubDate>
            <description><![CDATA[ The Astro Technology Company team — the creators of the Astro web framework — is joining Cloudflare. We’re doubling down on making Astro the best framework for content-driven websites, today and in the years to come. ]]></description>
            <content:encoded><![CDATA[ <p>The Astro Technology Company, creators of the Astro web framework, is joining Cloudflare.</p><p><a href="https://astro.build/"><u>Astro</u></a> is the web framework for building fast, content-driven websites. Over the past few years, we’ve seen an incredibly diverse range of developers and companies use Astro to build for the web. This ranges from established brands like Porsche and IKEA to fast-growing AI companies like Opencode and OpenAI. Platforms that are built on Cloudflare, like <a href="https://webflow.com/feature/cloud"><u>Webflow Cloud</u></a> and <a href="https://vibe.wix.com/"><u>Wix Vibe</u></a>, have chosen Astro to power the websites their customers build and deploy to their own platforms. At Cloudflare, we use Astro, too — for our <a href="https://developers.cloudflare.com/"><u>developer docs</u></a>, <a href="https://workers.cloudflare.com/"><u>website</u></a>, <a href="https://sandbox.cloudflare.com/"><u>landing pages</u></a>, <a href="https://blog.cloudflare.com/"><u>blog</u></a>, and more. Astro is used almost everywhere there is content on the Internet. </p><p>By joining forces with the Astro team, we are doubling down on making Astro the best framework for content-driven websites for many years to come. The best version of Astro — <a href="https://github.com/withastro/astro/milestone/37"><u>Astro 6</u></a> — is just around the corner, bringing a redesigned development server powered by Vite. The first public beta release of Astro 6 is <a href="https://github.com/withastro/astro/releases/tag/astro%406.0.0-beta.0"><u>now available</u></a>, with GA coming in the weeks ahead.</p><p>We are excited to share this news and even more thrilled about what it means for developers building with Astro. If you haven’t yet tried Astro, give it a spin and run <a href="https://docs.astro.build/en/getting-started/"><u>npm create astro@latest</u></a>.</p>
    <div>
      <h3>What this means for Astro</h3>
      <a href="#what-this-means-for-astro">
        
      </a>
    </div>
    <p>Astro will remain open source, MIT-licensed, and open to contributions, with a public roadmap and open governance. All full-time employees of The Astro Technology Company are now employees of Cloudflare, and will continue to work on Astro. We’re committed to Astro’s long-term success and eager to keep building.</p><p>Astro wouldn’t be what it is today without an incredibly strong community of open-source contributors. Cloudflare is also committed to continuing to support open-source contributions, via the <a href="https://astro.build/blog/astro-ecosystem-fund-update/"><u>Astro Ecosystem Fund</u></a>, alongside industry partners including Webflow, Netlify, Wix, Sentry, Stainless and many more.</p><p>From day one, Astro has been a bet on the web and portability: Astro is built to run anywhere, across clouds and platforms. Nothing changes about that. You can deploy Astro to any platform or cloud, and we’re committed to supporting Astro developers everywhere.</p>
    <div>
      <h3>There are many web frameworks out there — so why are developers choosing Astro?</h3>
      <a href="#there-are-many-web-frameworks-out-there-so-why-are-developers-choosing-astro">
        
      </a>
    </div>
    <p>Astro has been growing rapidly:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6SiPDolNqvmfQmHftQAr2W/b0b0b0c6725203b945d83da9b190c443/BLOG-3112_2.png" />
          </figure><p>Why? Many web frameworks have come and gone trying to be everything to everyone, aiming to serve the needs of both content-driven websites and web applications.</p><p>The key to Astro’s success: Instead of trying to serve every use case, Astro has stayed focused on <a href="https://docs.astro.build/en/concepts/why-astro/#design-principles"><u>five design principles</u></a>. Astro is…</p><ul><li><p><b>Content-driven:</b> Astro was designed to showcase your content.</p></li><li><p><b>Server-first:</b> Websites run faster when they render HTML on the server.</p></li><li><p><b>Fast by default:</b> It should be impossible to build a slow website in Astro.</p></li><li><p><b>Easy to use:</b> You don’t need to be an expert to build something with Astro.</p></li><li><p><b>Developer-focused:</b> You should have the resources you need to be successful.</p></li></ul><p>Astro’s <a href="https://docs.astro.build/en/concepts/islands/"><u>Islands Architecture</u></a> is a core part of what makes all of this possible. The majority of each page can be static HTML — fast and simple to build by default, oriented around rendering content. And when you need it, you can render a specific part of a page as a client island, using any client UI framework. You can even mix and match multiple frameworks on the same page, whether that’s React.js, Vue, Svelte, Solid, or anything else:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1SjrMUpO9xZb0wxlATkrQo/16afe1efdb57da6b8b17cd804d94cfb2/BLOG-3112_3.png" />
          </figure>
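    <p>To make the islands model concrete, here is a minimal, hypothetical sketch of an Astro page (the <code>Counter</code> component and file paths are illustrative). Everything ships as static HTML except the island, which hydrates in the browser only when it scrolls into view:</p>
            <pre><code>---
// src/pages/index.astro: static by default, with one interactive island
import Counter from '../components/Counter.jsx';
---
&lt;html&gt;
  &lt;body&gt;
    &lt;h1&gt;Rendered to plain HTML on the server&lt;/h1&gt;
    &lt;!-- client:visible hydrates this component only when it enters the viewport --&gt;
    &lt;Counter client:visible /&gt;
  &lt;/body&gt;
&lt;/html&gt;</code></pre>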
    <div>
      <h3>Bringing back the joy in building websites</h3>
      <a href="#bringing-back-the-joy-in-building-websites">
        
      </a>
    </div>
    <p>The more Astro and Cloudflare started talking, the clearer it became how much we have in common. Cloudflare’s mission is to help build a better Internet — and part of that is to help build a <i>faster</i> Internet. Almost all of us grew up building websites, and we want a world where people have fun building things on the Internet, where anyone can publish to a site that is truly their own.</p><p>When Astro first <a href="https://astro.build/blog/introducing-astro/"><u>launched</u></a> in 2021, it had become painful to build great websites — it felt like a fight with build tools and frameworks. It sounds strange to say it, with the coding agents and powerful LLMs of 2026, but in 2021 it was very hard to build an excellent and fast website without being a domain expert in JavaScript build tooling. So much has gotten better, both because of Astro and in the broader frontend ecosystem, that we take this almost for granted today.</p><p>The Astro project has spent the past five years working to simplify web development. So as LLMs, then vibe coding, and now true coding agents have come along and made it possible for truly anyone to build — Astro provided a foundation that was simple and fast by default. We’ve all seen how much better and faster agents get when building off the right foundation, in a well-structured codebase. More and more, we’ve seen both builders and platforms choose Astro as that foundation.</p><p>We’ve seen this most clearly through the platforms that both Cloudflare and Astro serve, that extend Cloudflare to their own customers in creative ways using <a href="https://developers.cloudflare.com/cloudflare-for-platforms/"><u>Cloudflare for Platforms</u></a>, and have chosen Astro as the framework that their customers build on. </p><p>When you deploy to <a href="https://webflow.com/feature/cloud"><u>Webflow Cloud</u></a>, your Astro site just works and is deployed across Cloudflare’s network. When you start a new project with <a href="https://vibe.wix.com/"><u>Wix Vibe</u></a>, behind the scenes you’re creating an Astro site, running on Cloudflare. And when you generate a developer docs site using <a href="https://www.stainless.com/"><u>Stainless</u></a>, that generates an Astro project, running on Cloudflare, powered by <a href="https://astro.build/blog/stainless-astro-launch/"><u>Starlight</u></a> — a framework built on Astro.</p><p>Each of these platforms is built for a different audience. But what they have in common — beyond their use of Cloudflare and Astro — is they make it <i>fun</i> to create and publish content to the Internet. In a world where everyone can be both a builder and content creator, we think there are still so many more platforms to build and people to reach.</p>
    <div>
      <h3>Astro 6 — new local dev server, powered by Vite</h3>
      <a href="#astro-6-new-local-dev-server-powered-by-vite">
        
      </a>
    </div>
    <p>Astro 6 is coming, and the first open beta release is <a href="https://astro.build/blog/astro-6-beta/"><u>now available</u></a>. To be one of the first to try it out, run:</p><p><code>npm create astro@latest -- --ref next</code></p><p>Or to upgrade your existing Astro app, run:</p><p><code>npx @astrojs/upgrade beta</code></p><p>Astro 6 brings a brand new development server, built on the <a href="https://vite.dev/guide/api-environment"><u>Vite Environments API</u></a>, that runs your code locally using the same runtime that you deploy to. This means that when you run <code>astro dev</code> with the <a href="https://developers.cloudflare.com/workers/vite-plugin/"><u>Cloudflare Vite plugin</u></a>, your code runs in <a href="https://github.com/cloudflare/workerd"><u>workerd</u></a>, the open-source Cloudflare Workers runtime, and can use <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a>, <a href="https://developers.cloudflare.com/d1/"><u>D1</u></a>, <a href="https://developers.cloudflare.com/kv/"><u>KV</u></a>, <a href="https://developers.cloudflare.com/agents/"><u>Agents</u></a> and <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/"><u>more</u></a>. This isn’t just a Cloudflare feature: Any JavaScript runtime with a plugin that uses the Vite Environments API can benefit from this new support, and ensure local dev runs in the same environment, with the same runtime APIs as production.</p>
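    <p>As a rough sketch (assuming the current <code>@astrojs/cloudflare</code> adapter rather than any new Astro 6 API), pointing an Astro project at the Workers runtime is a small configuration change:</p>
            <pre><code>// astro.config.mjs: a hypothetical minimal setup
import { defineConfig } from 'astro/config';
import cloudflare from '@astrojs/cloudflare';

export default defineConfig({
  // Target the Cloudflare Workers runtime, so server code runs in
  // workerd during local dev and in production alike.
  adapter: cloudflare(),
});</code></pre>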
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4YAgzaSkgUr3gxK5Mkh62V/09847d3f15744b6f049864a6e898a343/BLOG-3112_4.png" />
          </figure><p><a href="https://docs.astro.build/en/reference/experimental-flags/live-content-collections/"><u>Live Content Collections</u></a> are also stable and out of beta in Astro 6. These content collections let you update data in real time, without requiring a rebuild of your site. This makes it easy to bring in content that changes often, such as the current inventory in a storefront, while still benefitting from the built-in validation and caching that come with Astro’s existing support for <a href="https://v6.docs.astro.build/en/guides/content-collections"><u>content collections</u></a>.</p><p>There’s more to Astro 6, including Astro’s most upvoted feature request — first-class support for Content Security Policy (CSP) — as well as simpler APIs, an upgrade to <a href="https://zod.dev/?id=introduction"><u>Zod</u></a> 4, and more.</p>
    <div>
      <h3>Doubling down on Astro</h3>
      <a href="#doubling-down-on-astro">
        
      </a>
    </div>
    <p>We're thrilled to welcome the Astro team to Cloudflare. We’re excited to keep building, keep shipping, and keep making Astro the best way to build content-driven sites. We’re already thinking about what comes next beyond V6, and we’d love to hear from you.</p><p>To keep up with the latest, follow the <a href="https://astro.build/blog/"><u>Astro blog</u></a> and join the <a href="https://astro.build/chat"><u>Astro Discord</u></a>. Tell us what you’re building!</p><p></p> ]]></content:encoded>
            <category><![CDATA[Acquisitions]]></category>
            <category><![CDATA[Application Services]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[AI]]></category>
            <guid isPermaLink="false">6snDEFT5jgryV5wPhY4HEj</guid>
            <dc:creator>Fred Schott</dc:creator>
            <dc:creator>Brendan Irvine-Broque</dc:creator>
        </item>
        <item>
            <title><![CDATA[How Workers powers our internal maintenance scheduling pipeline]]></title>
            <link>https://blog.cloudflare.com/building-our-maintenance-scheduler-on-workers/</link>
            <pubDate>Mon, 22 Dec 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Physical data center maintenance is risky on a global network. We built a maintenance scheduler on Workers to safely plan disruptive operations, while solving scaling challenges by viewing the state of our infrastructure through a graph interface on top of multiple data sources and metrics pipelines. ]]></description>
            <content:encoded><![CDATA[ <p>Cloudflare has data centers in over <a href="https://www.cloudflare.com/network/"><u>330 cities globally</u></a>, so you might think we could take a few offline for planned operations at any time without users noticing. However, the reality is that <a href="https://developers.cloudflare.com/support/disruptive-maintenance/"><u>disruptive maintenance</u></a> requires careful planning, and as Cloudflare grew, managing these complexities through manual coordination between our infrastructure and network operations specialists became nearly impossible.</p><p>It is no longer feasible for a human to track every overlapping maintenance request or account for every customer-specific routing rule in real time. We reached a point where manual oversight alone couldn't guarantee that a routine hardware update in one part of the world wouldn't inadvertently conflict with a critical path in another.</p><p>We realized we needed a centralized, automated "brain" to act as a safeguard — a system that could see the entire state of our network at once. By building this scheduler on <a href="https://workers.cloudflare.com/"><u>Cloudflare Workers</u></a>, we created a way to programmatically enforce safety constraints, ensuring that no matter how fast we move, we never sacrifice the reliability of the services on which our customers depend.</p><p>In this blog post, we’ll explain how we built it, and share the results we’re seeing now.</p>
    <div>
      <h2>Building a system to de-risk critical maintenance operations</h2>
      <a href="#building-a-system-to-de-risk-critical-maintenance-operations">
        
      </a>
    </div>
    <p>Picture an edge router that acts as one of a small, redundant group of gateways that collectively connect the public Internet to the many Cloudflare data centers operating in a metro area. In a populated city, we need to ensure that the multiple data centers sitting behind this small cluster of routers do not get cut off because the routers were all taken offline simultaneously. </p><p>Another maintenance challenge comes from our Zero Trust product, Dedicated CDN Egress IPs, which allows customers to choose specific data centers from which their user traffic will exit Cloudflare and be sent to their geographically close origin servers for low latency. (For the purpose of brevity in this post, we'll refer to the Dedicated CDN Egress IPs product as "Aegis," which was its former name.) If all the data centers a customer chose went offline at once, they would see higher latency and possibly 5xx errors, which we must avoid. </p><p>Our maintenance scheduler solves problems like these. We can make sure that we always have at least one edge router active in a certain area. And when scheduling maintenance, we can see if the combination of multiple scheduled events would cause all the data centers for a customer’s Aegis pools to be offline at the same time.</p><p>Before we created the scheduler, these simultaneous disruptive events could cause downtime for customers. Now, our scheduler notifies internal operators of potential conflicts, allowing us to propose a new time to avoid overlapping with other related data center maintenance events.</p><p>We define these operational scenarios, such as edge router availability and customer rules, as maintenance constraints, which allow us to plan safer, more predictable maintenance.</p>
    <div>
      <h2>Maintenance constraints</h2>
      <a href="#maintenance-constraints">
        
      </a>
    </div>
    <p>Every constraint starts with a set of proposed maintenance items, such as a network router or list of servers. We then find all the maintenance events in the calendar that overlap with the proposed maintenance time window.</p>
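    <p>The overlap check itself is simple interval logic. A minimal sketch in TypeScript (the types and names are illustrative, not our actual scheduler code):</p>
            <pre><code>interface MaintenanceWindow {
  start: Date
  end: Date
}

// Two half-open intervals [start, end) overlap iff each starts before the other ends.
function overlaps(a: MaintenanceWindow, b: MaintenanceWindow): boolean {
  return a.start &lt; b.end &amp;&amp; b.start &lt; a.end
}

// Find every calendar event that overlaps a proposed maintenance window.
function findConflicts(proposed: MaintenanceWindow, calendar: MaintenanceWindow[]): MaintenanceWindow[] {
  return calendar.filter((event) =&gt; overlaps(proposed, event))
}</code></pre>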
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2vHCauxOGRXzhrO6DNDr2S/cf38b93ac9b812e5e064f800e537e549/image4.png" />
          </figure><p>Next, we aggregate data from product APIs, such as the list of Aegis customer IP pools. Aegis returns a set of IP ranges for which a customer requested egress out of specific data center IDs, shown below.</p>
            <pre><code>[
    {
      "cidr": "104.28.0.32/32",
      "pool_name": "customer-9876",
      "port_slots": [
        {
          "dc_id": 21,
          "other_colos_enabled": true
        },
        {
          "dc_id": 45,
          "other_colos_enabled": true
        }
      ],
      "modified_at": "2023-10-22T13:32:47.213767Z"
    }
]</code></pre>
            <p>In this scenario, data center 21 and data center 45 relate to each other because we need at least one data center online for the Aegis customer 9876 to receive egress traffic from Cloudflare. If we tried to take data centers 21 and 45 down simultaneously, our coordinator would alert us that there would be unintended consequences for that customer workload.</p><p>Our initial, naive solution loaded all data into a single Worker: every server relationship, product configuration, and product and infrastructure health metric needed to compute constraints. Even in the proof-of-concept phase, we ran into “out of memory” errors.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1v4Q6bXsZLBXLbrbRrcW3o/00d291ef3db459e99ae9b620965b6bc7/image2.png" />
          </figure><p>We needed to be more cognizant of Workers’ <a href="https://developers.cloudflare.com/workers/platform/limits/"><u>platform limits</u></a>. This required loading only as much data as was absolutely necessary to process the constraint’s business logic. If a maintenance request for a router in Frankfurt, Germany, comes in, we almost certainly do not care what is happening in Australia since there is no overlap across regions. Thus, we should only load data for neighboring data centers in Germany. We needed a more efficient way to process relationships in our dataset.</p>
    <div>
      <h2>Graph processing on Workers</h2>
      <a href="#graph-processing-on-workers">
        
      </a>
    </div>
    <p>As we looked at our constraints, a pattern emerged where each constraint boiled down to two concepts: objects and associations. In graph theory, these components are known as vertices and edges, respectively. An object could be a network router, and an association could link it to the Aegis pools in its data center that require the router to be online. We took inspiration from Facebook’s <a href="https://research.facebook.com/publications/tao-facebooks-distributed-data-store-for-the-social-graph/"><u>TAO</u></a> research paper to establish a graph interface on top of our product and infrastructure data. The API looks like the following:</p>
            <pre><code>type ObjectID = string

interface MainTAOInterface&lt;TObject, TAssoc, TAssocType&gt; {
  object_get(id: ObjectID): Promise&lt;TObject | undefined&gt;

  assoc_get(id1: ObjectID, atype: TAssocType): AsyncIterable&lt;TAssoc&gt;

  assoc_count(id1: ObjectID, atype: TAssocType): Promise&lt;number&gt;
}</code></pre>
            <p>The core insight is that associations are typed. For example, a constraint would call the graph interface to retrieve Aegis product data.</p>
            <pre><code>async function constraint(c: AppContext, aegis: TAOAegisClient, datacenters: string[]): Promise&lt;Record&lt;string, PoolAnalysis&gt;&gt; {
  // For each proposed data center, walk the graph to find the Aegis pools it belongs to.
  const datacenterEntries = await Promise.all(
    datacenters.map(async (dcID) =&gt; {
      const iter = aegis.assoc_get(c, dcID, AegisAssocType.DATACENTER_INSIDE_AEGIS_POOL)
      const pools: string[] = []
      for await (const assoc of iter) {
        pools.push(assoc.id2)
      }
      return [dcID, pools] as const
    }),
  )

  const datacenterToPools = new Map&lt;string, string[]&gt;(datacenterEntries)
  const uniquePools = new Set&lt;string&gt;()
  for (const pools of datacenterToPools.values()) {
    for (const pool of pools) uniquePools.add(pool)
  }

  // Use the inverse association to count how many data centers each pool spans in total.
  const poolTotalsEntries = await Promise.all(
    [...uniquePools].map(async (pool) =&gt; {
      const total = await aegis.assoc_count(c, pool, AegisAssocType.AEGIS_POOL_CONTAINS_DATACENTER)
      return [pool, total] as const
    }),
  )

  const poolTotals = new Map&lt;string, number&gt;(poolTotalsEntries)
  const poolAnalysis: Record&lt;string, PoolAnalysis&gt; = {}
  for (const [dcID, pools] of datacenterToPools.entries()) {
    for (const pool of pools) {
      // Accumulate, rather than overwrite, so a pool touched by several
      // proposed data centers records every affected data center.
      const existing = poolAnalysis[pool]
      if (existing) {
        existing.affectedDatacenters.add(dcID)
      } else {
        poolAnalysis[pool] = {
          affectedDatacenters: new Set([dcID]),
          totalDatacenters: poolTotals.get(pool) ?? 0,
        }
      }
    }
  }

  return poolAnalysis
}</code></pre>
            <p>We use two association types in the code above:</p><ol><li><p><code>DATACENTER_INSIDE_AEGIS_POOL</code>, which retrieves the Aegis customer pools that a data center resides in.</p></li><li><p><code>AEGIS_POOL_CONTAINS_DATACENTER</code>, which retrieves the data centers an Aegis pool needs to serve traffic.</p></li></ol><p>The two association types are inverses of one another. The access pattern is exactly the same as before, but now the graph implementation has much more control over how much data it queries. Before, we needed to load all Aegis pools into memory and filter inside constraint business logic. Now, we can directly fetch only the data that matters to the application, and flagging a violation becomes a simple size comparison, as the sketch below shows.</p>
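            <p>Here is a hypothetical version of that check; the <code>PoolAnalysis</code> shape is inferred from the constraint function above:</p>
            <pre><code>interface PoolAnalysis {
  affectedDatacenters: Set&lt;string&gt;
  totalDatacenters: number
}

// Return the Aegis pools that would lose every one of their data centers
// at once if the proposed maintenance went ahead.
function violatingPools(analysis: Record&lt;string, PoolAnalysis&gt;): string[] {
  return Object.entries(analysis)
    .filter(([, pool]) =&gt; pool.affectedDatacenters.size &gt;= pool.totalDatacenters)
    .map(([name]) =&gt; name)
}</code></pre>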
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4b68YLIHiOPt5EeyTUTeBt/5f624f0d0912e7dfd0e308a3427d194c/unnamed.png" />
          </figure><p>The interface is powerful because our graph implementation can improve performance behind the scenes without complicating the business logic. This lets us use the scalability of Workers and Cloudflare’s CDN to fetch data from our internal systems very quickly.</p>
    <div>
      <h2>Fetch pipeline</h2>
      <a href="#fetch-pipeline">
        
      </a>
    </div>
    <p>We switched to the new graph implementation, sending more targeted API requests. Response sizes dropped by 100x overnight as we moved from a few massive requests to many tiny ones.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/71aDOicyippmUbj4ypXKw/73dacdf16ca0ac422efdfec9e86e9dbf/image5.png" />
          </figure><p>While this solves the issue of loading too much into memory, we now have a subrequest problem: instead of a few large HTTP requests, we make an order of magnitude more small requests. Overnight, we started consistently breaching subrequest limits.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/36KjfOU8xIuUkwF7QOlNkK/e2275a50ff1bef497cdb201c2d3a6249/image3.png" />
          </figure><p>In order to solve this problem, we built a smart middleware layer between our graph implementation and the <code>fetch</code> API.</p>
            <pre><code>export const fetchPipeline = new FetchPipeline()
  .use(requestDeduplicator())
  .use(lruCacher({
    maxItems: 100,
  }))
  .use(cdnCacher())
  .use(backoffRetryer({
    retries: 3,
    baseMs: 100,
    jitter: true,
  }))
  .handler(terminalFetch);</code></pre>
            <p>If you’re familiar with Go, you may have seen the <a href="https://pkg.go.dev/golang.org/x/sync/singleflight"><u>singleflight</u></a> package before. We took inspiration from this idea: the first middleware component in the fetch pipeline deduplicates in-flight HTTP requests, so concurrent callers wait on the same Promise for data instead of producing duplicate requests from the same Worker. Next, we use a lightweight Least Recently Used (LRU) cache to internally cache responses to requests we have already seen.</p><p>Once both of those are complete, we use Cloudflare’s Cache API (<code>caches.default</code>) to cache all GET requests in the region where the Worker is running. Since we have multiple data sources with different performance characteristics, we choose time to live (TTL) values carefully. For example, real-time data is only cached for 1 minute. Relatively static infrastructure data could be cached for 1–24 hours depending on the type of data. Power management data might be changed manually and infrequently, so we can cache it for longer at the edge.</p><p>In addition to those layers, we have the standard exponential backoff, retries, and jitter. This helps reduce wasted <code>fetch</code> calls when a downstream resource is temporarily unavailable. By backing off slightly, we increase the chance that the next request succeeds. Conversely, if the Worker sends requests constantly without backoff, it will easily breach the subrequest limit when the origin starts returning 5xx errors.</p><p>Putting it all together, we saw a ~99% cache hit rate. <a href="https://www.cloudflare.com/learning/cdn/what-is-a-cache-hit-ratio/"><u>Cache hit rate</u></a> is the percentage of HTTP requests served from Cloudflare’s fast cache memory (a "hit") versus slower requests to data sources running in our control plane (a "miss"), calculated as hits / (hits + misses). A high rate means better HTTP request performance and lower costs, because querying data from cache in our Worker is an order of magnitude faster than fetching from an origin server in a different region. After tuning the settings for our in-memory and CDN caches, hit rates increased dramatically. Since much of our workload is real-time, we will never have a 100% hit rate, as we must request fresh data at least once per minute.</p>
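            <p>For illustration, here is a stripped-down sketch of what the deduplication stage could look like; the <code>Middleware</code> type and surrounding plumbing are hypothetical, not our production code:</p>
            <pre><code>type Next = (req: Request) =&gt; Promise&lt;Response&gt;
type Middleware = (req: Request, next: Next) =&gt; Promise&lt;Response&gt;

function requestDeduplicator(): Middleware {
  // Key in-flight GET requests by URL so concurrent callers share one fetch.
  const inflight = new Map&lt;string, Promise&lt;Response&gt;&gt;()
  return async (req, next) =&gt; {
    if (req.method !== 'GET') return next(req)
    let pending = inflight.get(req.url)
    if (!pending) {
      pending = next(req).finally(() =&gt; inflight.delete(req.url))
      inflight.set(req.url, pending)
    }
    // Hand each caller a clone so response bodies can be read independently.
    return (await pending).clone()
  }
}</code></pre>
            <p>The caching layers that sit behind this deduplicator are what drive the hit rate shown below.</p>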
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1jifI33QpBkQPd7tE5Tapi/186a74b922faac3abe091b79f03d640b/image1.png" />
          </figure><p>We have talked about improving the fetching layer, but not about how we made origin HTTP requests faster. Our maintenance coordinator needs to react in real time to network degradation and failure of machines in data centers. We use our distributed <a href="https://blog.cloudflare.com/how-cloudflare-runs-prometheus-at-scale/"><u>Prometheus</u></a> query engine, Thanos, to deliver performant metrics from the edge to the coordinator.</p>
    <div>
      <h2>Thanos in real-time</h2>
      <a href="#thanos-in-real-time">
        
      </a>
    </div>
    <p>To explain how adopting the graph interface affected our real-time queries, let’s walk through an example. To analyze the health of edge routers, we could send the following query:</p>
            <pre><code>sum by (instance) (network_snmp_interface_admin_status{instance=~"edge.*"})</code></pre>
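            <p>A Worker can issue queries like this over the standard Prometheus HTTP API, which Thanos exposes. Here is a minimal sketch; the base URL is hypothetical and error handling is reduced to the basics:</p>
            <pre><code>// Run an instant PromQL query against Thanos via the Prometheus HTTP API.
// In our system, this fetch would flow through the fetch pipeline above.
async function queryThanos(baseUrl: string, promql: string): Promise&lt;unknown&gt; {
  const url = new URL('/api/v1/query', baseUrl)
  url.searchParams.set('query', promql)
  const resp = await fetch(url.toString())
  if (!resp.ok) throw new Error(`Thanos query failed: ${resp.status}`)
  const body = await resp.json() as { status: string; data: unknown }
  if (body.status !== 'success') throw new Error('Thanos returned an error status')
  return body.data
}</code></pre>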
            <p>Originally, we asked our Thanos service, which stores Prometheus metrics, for every edge router’s current health status, then manually filtered for the routers relevant to the maintenance inside the Worker. This was suboptimal for many reasons. Thanos had to encode multi-MB responses, and the Worker had to cache and decode them, only to filter out the majority of the data while processing a specific maintenance request. Since the Workers JavaScript runtime is single-threaded and parsing JSON is CPU-bound, sending two large HTTP requests means that one is blocked waiting for the other to finish parsing.</p><p>Instead, we simply use the graph to find targeted relationships such as the interface links between edge and spine routers, denoted as <code>EDGE_ROUTER_NETWORK_CONNECTS_TO_SPINE</code>.</p>
            <pre><code>sum by (lldp_name) (network_snmp_interface_admin_status{instance=~"edge01.fra03", lldp_name=~"spine.*"})</code></pre>
            <p>The result is about 1 KB on average instead of multiple MB, or approximately 1000x smaller. This also massively reduces the CPU required inside the Worker, because we offload most of the deserialization work to Thanos. As we explained before, this means we need to make a higher number of these smaller fetch requests, but load balancers in front of Thanos can spread the requests evenly to increase throughput for this use case.</p><p>Our graph implementation and fetch pipeline successfully tamed the 'thundering herd' of thousands of tiny real-time requests. However, historical analysis presents a different I/O challenge. Instead of fetching small, specific relationships, we need to scan months of data to find conflicting maintenance windows. In the past, Thanos would issue a massive number of random reads to our object store, <a href="https://www.cloudflare.com/developer-platform/products/r2/">R2</a>. To remove this bandwidth penalty without losing performance, we adopted a new approach the Observability team developed internally this year.</p>
    <div>
      <h2>Historical data analysis</h2>
      <a href="#historical-data-analysis">
        
      </a>
    </div>
    <p>There are enough maintenance use cases that we must rely on historical data to tell us whether our solution is accurate and will scale with the growth of Cloudflare’s network. We do not want to cause incidents, and we also want to avoid blocking proposed physical maintenance unnecessarily. In order to balance these two priorities, we can use time series data about maintenance events that happened two months or even a year ago to tell us how often a maintenance event violates one of our constraints, e.g. edge router availability or Aegis. We blogged earlier this year about using Thanos to <a href="https://blog.cloudflare.com/safe-change-at-any-scale/"><u>automatically release and revert software</u></a> to the edge.</p><p>Thanos primarily fans out to Prometheus, but when Prometheus' retention is not enough to answer the query it has to download data from object storage — R2 in our case. Prometheus TSDB blocks were originally designed for local SSDs, relying on random access patterns that become a bottleneck when moved to object storage. When our scheduler needs to analyze months of historical maintenance data to identify conflicting constraints, random reads from object storage incur a massive I/O penalty. To solve this, we implemented a conversion layer that transforms these blocks into <a href="https://parquet.apache.org/"><u>Apache Parquet</u></a> files. Parquet is a columnar format native to big data analytics that organizes data by column rather than row, which — together with rich statistics — allows us to fetch only what we need.</p><p>Furthermore, since we are rewriting TSDB blocks into Parquet files, we can also store the data in a way that lets us read it in just a few big sequential chunks.</p>
            <pre><code>sum by (instance) (hmd:release_scopes:enabled{dc_id="45"})</code></pre>
            <p>In the example above, we would choose the tuple “(__name__, dc_id)” as the primary sorting key so that metrics with the name “hmd:release_scopes:enabled” and the same value for “dc_id” are sorted close together.</p><p>Our Parquet gateway now issues precise R2 range requests to fetch only the specific columns relevant to the query. This reduces the payload from megabytes to kilobytes. Furthermore, because these file segments are immutable, we can aggressively cache them on the Cloudflare CDN.</p><p>This turns R2 into a low-latency query engine, allowing us to backtest complex maintenance scenarios against long-term trends instantly, avoiding the timeouts and high tail latency we saw with the original TSDB format. The graph below shows a recent load test, in which the Parquet path delivered up to 15x better P90 latency than the old system for the same query pattern.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6lVj6W4W4MMUy6cEsDpk5G/21614b7ac003a86cb5162a2ba75f4c42/image8.png" />
          </figure><p>To get a deeper understanding of how the Parquet implementation works, you can watch this talk at PromCon EU 2025, <a href="https://www.youtube.com/watch?v=wDN2w2xN6bA&amp;list=PLoz-W_CUquUlHOg314_YttjHL0iGTdE3O&amp;index=16"><u>Beyond TSDB: Unlocking Prometheus with Parquet for Modern Scale</u></a>.</p>
    <div>
      <h2>Building for scale</h2>
      <a href="#building-for-scale">
        
      </a>
    </div>
    <p>By leveraging Cloudflare Workers, we moved from a system that ran out of memory to one that intelligently caches data and uses efficient observability tooling to analyze product and infrastructure data in real time. We built a maintenance scheduler that balances network growth with product performance.</p><p>But “balance” is a moving target.</p><p>Every day, we add more hardware around the world, and the logic required to maintain it without disrupting customer traffic gets exponentially harder with more products and types of maintenance operations. We’ve worked through the first set of challenges, but now we’re staring down more subtle, complex ones that only appear at this massive scale.</p><p>We need engineers who aren't afraid of hard problems. Join our <a href="https://www.cloudflare.com/careers/jobs/?department=Infrastructure"><u>Infrastructure team</u></a> and come build with us.</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Prometheus]]></category>
            <category><![CDATA[Infrastructure]]></category>
            <guid isPermaLink="false">5pdspiP2m71MeIoVL8wv1i</guid>
            <dc:creator>Kevin Deems</dc:creator>
            <dc:creator>Michael Hoffmann</dc:creator>
        </item>
    </channel>
</rss>