
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Tue, 14 Apr 2026 23:22:32 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Durable Objects in Dynamic Workers: Give each AI-generated app its own database]]></title>
            <link>https://blog.cloudflare.com/durable-object-facets-dynamic-workers/</link>
            <pubDate>Mon, 13 Apr 2026 13:08:35 GMT</pubDate>
            <description><![CDATA[ We’re introducing Durable Object Facets, allowing Dynamic Workers to instantiate Durable Objects with their own isolated SQLite databases. This enables developers to build platforms that run persistent, stateful code generated on-the-fly.
 ]]></description>
            <content:encoded><![CDATA[ <p>A few weeks ago, we announced <a href="https://blog.cloudflare.com/dynamic-workers/"><u>Dynamic Workers</u></a>, a new feature of the Workers platform which lets you load Worker code on-the-fly into a secure sandbox. The Dynamic Worker Loader API essentially provides direct access to the basic compute isolation primitive that Workers has been based on all along: isolates, not containers. Isolates are much lighter-weight than containers, and as such, can load 100x faster using 1/10 the memory. They are so efficient, they can be treated as "disposable": start one up to run a few lines of code, then throw it away. Like a secure version of eval(). </p><p>Dynamic Workers have many uses. In the original announcement, we focused on how to use them to run AI-agent-generated code as an alternative to tool calls. In this use case, an AI agent performs actions at the request of a user by writing a few lines of code and executing them. The code is single-use, intended to perform one task one time, and is thrown away immediately after it executes.</p><p>But what if you want an AI to generate more persistent code? What if you want your AI to build a small application with a custom UI the user can interact with? What if you want that application to have long-lived state? But of course, you still want it to run in a secure sandbox.</p><p>One way to do this would be to use Dynamic Workers, and simply provide the Worker with an <a href="https://developers.cloudflare.com/workers/runtime-apis/rpc/"><u>RPC</u></a> API that gives it access to storage. 
Using <a href="https://developers.cloudflare.com/dynamic-workers/usage/bindings/"><u>bindings</u></a>, you could give the Dynamic Worker an API that points back to your remote SQL database (perhaps backed by <a href="https://developers.cloudflare.com/d1/"><u>Cloudflare D1</u></a>, or a Postgres database you access through <a href="https://developers.cloudflare.com/hyperdrive/"><u>Hyperdrive</u></a> — it's up to you).</p><p>But Workers also has a unique and extremely fast type of storage that may be a perfect fit for this use case: <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a>. A Durable Object is a special kind of Worker that has a unique name, with one instance globally per name. That instance has a SQLite database attached, which lives <i>on local disk</i> on the machine where the Durable Object runs. This makes storage access ridiculously fast: there is effectively <a href="https://blog.cloudflare.com/sqlite-in-durable-objects/"><u>zero latency</u></a>.</p><p>Perhaps, then, what you really want is for your AI to write code for a Durable Object, and then you want to run that code in a Dynamic Worker.</p>
    <div>
      <h2><b>But how?</b></h2>
      <a href="#but-how">
        
      </a>
    </div>
    <p>This presents a weird problem. Normally, to use Durable Objects you have to:</p><ol><li><p>Write a class extending <code>DurableObject</code>.</p></li><li><p>Export it from your Worker's main module.</p></li><li><p><a href="https://developers.cloudflare.com/durable-objects/get-started/#5-configure-durable-object-class-with-sqlite-storage-backend"><u>Specify in your Wrangler config</u></a> that storage should be provisioned for this class. This creates a Durable Object namespace that points at your class for handling incoming requests.</p></li><li><p><a href="https://developers.cloudflare.com/durable-objects/get-started/#4-configure-durable-object-bindings"><u>Declare a Durable Object namespace binding</u></a> pointing at your namespace (or use <a href="https://developers.cloudflare.com/workers/runtime-apis/context/#exports"><u>ctx.exports</u></a>), and use it to make requests to your Durable Object.</p></li></ol><p>This doesn't extend naturally to Dynamic Workers. First, there is the obvious problem: The code is dynamic. You run it without invoking the Cloudflare API at all. But Durable Object storage has to be provisioned through the API, and the namespace has to point at an implementing class. It can't point at your Dynamic Worker.</p><p>But there is a deeper problem: Even if you could somehow configure a Durable Object namespace to point directly at a Dynamic Worker, would you want to? Do you want your agent (or user) to be able to create a whole namespace full of Durable Objects? To use unlimited storage spread around the world?</p><p>You probably don't. You probably want some control. You may want to limit, or at least track, how many objects they create. Maybe you want to limit them to just one object (probably good enough for vibe-coded personal apps). You may want to add logging and other observability. Metrics. Billing. 
Etc.</p><p>To do all this, what you really want is for requests to these Durable Objects to go to <i>your</i> code <i>first</i>, where you can then do all the "logistics", and <i>then</i> forward the request into the agent's code. You want to write a <i>supervisor</i> that runs as part of every Durable Object.</p>
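<p>The supervisor idea can be sketched in plain JavaScript, independent of any Workers API (the class and limits here are purely illustrative): your code sees every request first, does its bookkeeping, and only then forwards into the untrusted handler.</p>

```javascript
// Illustrative supervisor sketch: intercept every request for
// "logistics" (metrics, limits, billing), then forward it to the
// untrusted app's handler. `app` is anything with a fetch(request)
// method; the maxRequests quota is a made-up example policy.
class Supervisor {
  constructor(app, maxRequests = 1000) {
    this.app = app;
    this.maxRequests = maxRequests;
    this.requestCount = 0; // hook for metrics/billing
  }

  async fetch(request) {
    // Your code runs first...
    this.requestCount++;
    if (this.requestCount > this.maxRequests) {
      return new Response("Quota exceeded", { status: 429 });
    }
    // ...then the request is forwarded into the agent's code.
    return await this.app.fetch(request);
  }
}
```

With Durable Object Facets, this supervisor role is played by the parent Durable Object class you write, and the forwarding target is the dynamically loaded facet.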
    <div>
      <h2><b>Solution: Durable Object Facets</b></h2>
      <a href="#solution-durable-object-facets">
        
      </a>
    </div>
    <p>Today we are releasing, in open beta, a feature that solves this problem.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/mUUk7svflWvIp5Ff3npbG/cd2ec9a7111681657c37e3560fd9af58/BLOG-3211_2.png" />
          </figure><p><a href="https://developers.cloudflare.com/dynamic-workers/usage/durable-object-facets/"><u>Durable Object Facets</u></a> allow you to load and instantiate a Durable Object class dynamically, while providing it with a SQLite database to use for storage. With Facets:</p><ul><li><p>First you create a normal Durable Object namespace, pointing to a class <i>you</i> write.</p></li><li><p>In that class, you load the agent's code as a Dynamic Worker, and call into it.</p></li><li><p>The Dynamic Worker's code can implement a Durable Object class directly. That is, it literally exports a class declared as <code>extends DurableObject</code>.</p></li><li><p>You are instantiating that class as a "facet" of your own Durable Object.</p></li><li><p>The facet gets its own SQLite database, which it can use via the normal Durable Object storage APIs. This database is separate from the supervisor's database, but the two are stored together as part of the same overall Durable Object.</p></li></ul>
    <div>
      <h2><b>How it works</b></h2>
      <a href="#how-it-works">
        
      </a>
    </div>
    <p>Here is a simple, complete implementation of an app platform that dynamically loads and runs a Durable Object class:</p>
            <pre><code>import { DurableObject } from "cloudflare:workers";

// For the purpose of this example, we'll use this static
// application code, but in the real world this might be generated
// by AI (or even, perhaps, a human user).
const AGENT_CODE = `
  import { DurableObject } from "cloudflare:workers";

  // Simple app that remembers how many times it has been invoked
  // and returns it.
  export class App extends DurableObject {
    fetch(request) {
      // We use storage.kv here for simplicity, but storage.sql is
      // also available. Both are backed by SQLite.
      let counter = this.ctx.storage.kv.get("counter") || 0;
      ++counter;
      this.ctx.storage.kv.put("counter", counter);

      return new Response("You've made " + counter + " requests.\\n");
    }
  }
`;

// AppRunner is a Durable Object you write that is responsible for
// dynamically loading applications and delivering requests to them.
// Each instance of AppRunner contains a different app.
export class AppRunner extends DurableObject {
  async fetch(request) {
    // We've received an HTTP request, which we want to forward into
    // the app.

    // The app itself runs as a child facet named "app". One Durable
    // Object can have any number of facets (subject to storage limits)
    // with different names, but in this case we have only one. Call
    // this.ctx.facets.get() to get a stub pointing to it.
    let facet = this.ctx.facets.get("app", async () =&gt; {
      // If this callback is called, it means the facet hasn't
      // started yet (or has hibernated). In this callback, we can
      // tell the system what code we want it to load.

      // Load the Dynamic Worker.
      let worker = this.#loadDynamicWorker();

      // Get the exported class we're interested in.
      let appClass = worker.getDurableObjectClass("App");

      return { class: appClass };
    });

    // Forward request to the facet.
    // (Alternatively, you could call RPC methods here.)
    return await facet.fetch(request);
  }

  // RPC method that a client can call to set the dynamic code
  // for this app.
  setCode(code) {
    // Store the code in the AppRunner's SQLite storage.
    // Each unique code must have a unique ID to pass to the
    // Dynamic Worker Loader API, so we generate one randomly.
    this.ctx.storage.kv.put("codeId", crypto.randomUUID());
    this.ctx.storage.kv.put("code", code);
  }

  #loadDynamicWorker() {
    // Use the Dynamic Worker Loader API like normal. Use get()
    // rather than load() since we may load the same Worker many
    // times.
    let codeId = this.ctx.storage.kv.get("codeId");
    return this.env.LOADER.get(codeId, async () =&gt; {
      // This Worker hasn't been loaded yet. Load its code from
      // our own storage.
      let code = this.ctx.storage.kv.get("code");

      return {
        compatibilityDate: "2026-04-01",
        mainModule: "worker.js",
        modules: { "worker.js": code },
        globalOutbound: null,  // block network access
      }
    });
  }
}

// This is a simple Workers HTTP handler that uses AppRunner.
export default {
  async fetch(req, env, ctx) {
    // Get the instance of AppRunner named "my-app".
    // (Each name has exactly one Durable Object instance in the
    // world.)
    let obj = ctx.exports.AppRunner.getByName("my-app");

    // Initialize it with code. (In a real use case, you'd only
    // want to call this once, not on every request.)
    await obj.setCode(AGENT_CODE);

    // Forward the request to it.
    return await obj.fetch(req);
  }
}
</code></pre>
            <p>In this example:</p><ul><li><p><code>AppRunner</code> is a "normal" Durable Object written by the platform developer (you).</p></li><li><p>Each instance of <code>AppRunner</code> manages one application. It stores the app code and loads it on demand.</p></li><li><p>The application itself implements and exports a Durable Object class, which the platform expects to be named <code>App</code>.</p></li><li><p><code>AppRunner</code> loads the application code using Dynamic Workers, and then executes the code as a Durable Object Facet.</p></li><li><p>Each instance of <code>AppRunner</code> is one Durable Object composed of <i>two</i> SQLite databases: one belonging to the parent (<code>AppRunner</code> itself) and one belonging to the facet (<code>App</code>). These databases are isolated: the application cannot read <code>AppRunner</code>'s database, only its own.</p></li></ul><p>To run the example, copy the code above into a file <code>worker.js</code>, pair it with the following <code>wrangler.jsonc</code>, and run it locally with <code>npx wrangler dev</code>.</p>
            <pre><code>// wrangler.jsonc for the above sample worker.
{
  "compatibility_date": "2026-04-01",
  "main": "worker.js",
  "migrations": [
    {
      "tag": "v1",
      "new_sqlite_classes": [
        "AppRunner"
      ]
    }
  ],
  "worker_loaders": [
    {
      "binding": "LOADER",
    },
  ],
}
</code></pre>
            
    <div>
      <h2><b>Start building</b></h2>
      <a href="#start-building">
        
      </a>
    </div>
    <p>Facets are a feature of Dynamic Workers, available in beta immediately to users on the Workers Paid plan.</p><p>Check out the documentation to learn more about <a href="https://developers.cloudflare.com/dynamic-workers/"><u>Dynamic Workers</u></a> and <a href="https://developers.cloudflare.com/dynamic-workers/usage/durable-object-facets/"><u>Facets</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Agents Week]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Durable Objects]]></category>
            <category><![CDATA[Storage]]></category>
            <guid isPermaLink="false">2OYAJUdGLODlCXKKdCZMeG</guid>
            <dc:creator>Kenton Varda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Sandboxing AI agents, 100x faster]]></title>
            <link>https://blog.cloudflare.com/dynamic-workers/</link>
            <pubDate>Tue, 24 Mar 2026 13:00:00 GMT</pubDate>
            <description><![CDATA[ We’re introducing Dynamic Workers, which allow you to execute AI-generated code in secure, lightweight isolates. This approach is 100 times faster than traditional containers, enabling millisecond startup times for AI agent sandboxing. ]]></description>
            <content:encoded><![CDATA[ <p>Last September we introduced <a href="https://blog.cloudflare.com/code-mode/"><u>Code Mode</u></a>, the idea that agents should perform tasks not by making tool calls, but instead by writing code that calls APIs. We've shown that simply converting an MCP server into a TypeScript API can <a href="https://www.youtube.com/watch?v=L2j3tYTtJwk"><u>cut token usage by 81%</u></a>. We demonstrated that Code Mode can also operate <i>behind</i> an MCP server instead of in front of it, creating the new <a href="https://blog.cloudflare.com/code-mode-mcp/"><u>Cloudflare MCP server that exposes the entire Cloudflare API with just two tools and under 1,000 tokens</u></a>.</p><p>But if an agent (or an MCP server) is going to execute code generated on-the-fly by AI to perform tasks, that code needs to run somewhere, and that somewhere needs to be secure. You can't just <code>eval() </code>AI-generated code directly in your app: a malicious user could trivially prompt the AI to inject vulnerabilities.</p><p>You need a <b>sandbox</b>: a place to execute code that is isolated from your application and from the rest of the world, except for the specific capabilities the code is meant to access.</p><p>Sandboxing is a hot topic in the AI industry. For this task, most people are reaching for containers. Using a Linux-based container, you can start up any sort of code execution environment you want. Cloudflare even offers <a href="https://developers.cloudflare.com/containers/"><u>our container runtime</u></a> and <a href="https://developers.cloudflare.com/sandbox/"><u>our Sandbox SDK</u></a> for this purpose.</p><p>But containers are expensive and slow to start, taking hundreds of milliseconds to boot and hundreds of megabytes of memory to run. 
You probably need to keep them warm to avoid delays, and you may be tempted to reuse existing containers for multiple tasks, compromising security.</p><p><b>If we want to support consumer-scale agents, where every end user has an agent (or many!) and every agent writes code, containers are not enough. We need something lighter.</b></p><h6>And we have it.</h6>
    <div>
      <h2>Dynamic Worker Loader: a lean sandbox</h2>
      <a href="#dynamic-worker-loader-a-lean-sandbox">
        
      </a>
    </div>
    <p>Tucked into our Code Mode post in September was the announcement of a new, experimental feature: the Dynamic Worker Loader API. This API allows a Cloudflare Worker to instantiate a new Worker, in its own sandbox, with code specified at runtime, all on the fly.</p><p><b>Dynamic Worker Loader is now in open beta, available to all paid Workers users.</b></p><p><a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/worker-loader/"><u>Read the docs for full details</u></a>, but here's what it looks like:</p>
            <pre><code>// Have your LLM generate code like this.
let agentCode: string = `
  export default {
    async myAgent(param, env, ctx) {
      // ...
    }
  }
`;

// Get RPC stubs representing APIs the agent should be able
// to access. (This can be any Workers RPC API you define.)
let chatRoomRpcStub = ...;

// Load a worker to run the code, using the worker loader
// binding.
let worker = env.LOADER.load({
  // Specify the code.
  compatibilityDate: "2026-03-01",
  mainModule: "agent.js",
  modules: { "agent.js": agentCode },

  // Give agent access to the chat room API.
  env: { CHAT_ROOM: chatRoomRpcStub },

  // Block internet access. (You can also intercept it.)
  globalOutbound: null,
});

// Call RPC methods exported by the agent code.
await worker.getEntrypoint().myAgent(param);
</code></pre>
            <p>That's it.</p>
    <div>
      <h3>100x faster</h3>
      <a href="#100x-faster">
        
      </a>
    </div>
    <p>Dynamic Workers use the same underlying sandboxing mechanism that the entire Cloudflare Workers platform has been built on since its launch, eight years ago: isolates. An isolate is an instance of the V8 JavaScript execution engine, the same engine used by Google Chrome. They are <a href="https://developers.cloudflare.com/workers/reference/how-workers-works/"><u>how Workers work</u></a>.</p><p>An isolate takes a few milliseconds to start and uses a few megabytes of memory. That's around 100x faster and 10x-100x more memory efficient than a typical container.</p><p><b>That means that if you want to start a new isolate for every user request, on-demand, to run one snippet of code, then throw it away, you can.</b></p>
    <div>
      <h3>Unlimited scalability</h3>
      <a href="#unlimited-scalability">
        
      </a>
    </div>
    <p>Many container-based sandbox providers impose limits on global concurrent sandboxes and rate of sandbox creation. Dynamic Worker Loader has no such limits. It doesn't need to, because it is simply an API to the same technology that has powered our platform all along, which has always allowed Workers to seamlessly scale to millions of requests per second.</p><p>Want to handle a million requests per second, where <i>every single request</i> loads a separate Dynamic Worker sandbox, all running concurrently? No problem!</p>
    <div>
      <h3>Zero latency</h3>
      <a href="#zero-latency">
        
      </a>
    </div>
    <p>One-off Dynamic Workers usually run on the same machine — the same thread, even — as the Worker that created them. No need to communicate around the world to find a warm sandbox. Isolates are so lightweight that we can just run them wherever the request landed. Dynamic Workers are supported in every one of Cloudflare's hundreds of locations around the world.</p>
    <div>
      <h3>It's all JavaScript</h3>
      <a href="#its-all-javascript">
        
      </a>
    </div>
    <p>The only catch, vs. containers, is that your agent needs to write JavaScript.</p><p>Technically, Workers (including dynamic ones) can use Python and WebAssembly, but for small snippets of code — like that written on-demand by an agent — JavaScript will load and run much faster.</p><p>We humans tend to have strong preferences about programming languages: many love JavaScript, while others prefer Python, Rust, or any of countless alternatives.</p><p>But we aren't talking about humans here. We're talking about AI. AI will write any language you want it to. LLMs are experts in every major language. Their training data in JavaScript is immense.</p><p>JavaScript, by its nature on the web, is designed to be sandboxed. It is the correct language for the job.</p>
    <div>
      <h3>Tools defined in TypeScript</h3>
      <a href="#tools-defined-in-typescript">
        
      </a>
    </div>
    <p>If we want our agent to be able to do anything useful, it needs to talk to external APIs. How do we tell it about the APIs it has access to?</p><p>MCP defines schemas for flat tool calls, but not programming APIs. OpenAPI offers a way to express REST APIs, but it is verbose, both in the schema itself and the code you'd have to write to call it.</p><p>For APIs exposed to JavaScript, there is a single, obvious answer: TypeScript.</p><p>Agents know TypeScript. TypeScript is designed to be concise. With very few tokens, you can give your agent a precise understanding of your API.</p>
            <pre><code>// Interface to interact with a chat room.
interface ChatRoom {
  // Get the last `limit` messages of the chat log.
  getHistory(limit: number): Promise&lt;Message[]&gt;;

  // Subscribe to new messages. Dispose the returned object
  // to unsubscribe.
  subscribe(callback: (msg: Message) =&gt; void): Promise&lt;Disposable&gt;;

  // Post a message to chat.
  post(text: string): Promise&lt;void&gt;;
}

type Message = {
  author: string;
  time: Date;
  text: string;
}
</code></pre>
            <p>Compare this with the equivalent OpenAPI spec (which is so long you have to scroll to see it all):</p><pre>
openapi: 3.1.0
info:
  title: ChatRoom API
  description: &gt;
    Interface to interact with a chat room.
  version: 1.0.0

paths:
  /messages:
    get:
      operationId: getHistory
      summary: Get recent chat history
      description: Returns the last `limit` messages from the chat log, newest first.
      parameters:
        - name: limit
          in: query
          required: true
          schema:
            type: integer
            minimum: 1
      responses:
        "200":
          description: A list of messages.
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: "#/components/schemas/Message"

    post:
      operationId: postMessage
      summary: Post a message to the chat room
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - text
              properties:
                text:
                  type: string
      responses:
        "204":
          description: Message posted successfully.

  /messages/stream:
    get:
      operationId: subscribeMessages
      summary: Subscribe to new messages via SSE
      description: &gt;
        Opens a Server-Sent Events stream. Each event carries a JSON-encoded
        Message object. The client unsubscribes by closing the connection.
      responses:
        "200":
          description: An SSE stream of new messages.
          content:
            text/event-stream:
              schema:
                description: &gt;
                  Each SSE `data` field contains a JSON-encoded Message object.
                $ref: "#/components/schemas/Message"

components:
  schemas:
    Message:
      type: object
      required:
        - author
        - time
        - text
      properties:
        author:
          type: string
        time:
          type: string
          format: date-time
        text:
          type: string
</pre><p>We think the TypeScript API is better. It's fewer tokens and much easier to understand (for both agents and humans).</p><p>Dynamic Worker Loader makes it easy to implement a TypeScript API like this in your own Worker and then pass it to the Dynamic Worker either as a method parameter or in the env object. The Workers Runtime will automatically set up a <a href="https://blog.cloudflare.com/capnweb-javascript-rpc-library/"><u>Cap'n Web RPC</u></a> bridge between the sandbox and your harness code, so that the agent can invoke your API across the security boundary without ever realizing that it isn't using a local library.</p><p>That means your agent can write code like this:</p>
            <pre><code>// Thinking: The user asked me to summarize recent chat messages from Alice.
// I will filter the recent message history in code so that I only have to
// read the relevant messages.
let history = await env.CHAT_ROOM.getHistory(1000);
return history.filter(msg =&gt; msg.author == "alice");
</code></pre>
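<p>On the harness side, the <code>ChatRoom</code> interface above can be satisfied by an ordinary class. Here is an in-memory sketch that runs anywhere; in a real Worker you would likely extend <code>RpcTarget</code> from <code>cloudflare:workers</code> so the runtime can expose the instance to the sandbox over the RPC bridge (the <code>author</code> parameter on <code>post()</code> is an illustrative addition, not part of the interface above):</p>

```javascript
// In-memory sketch of the ChatRoom interface from the TypeScript
// definition above. Storage is a plain array; in a real Worker this
// class would likely extend RpcTarget (from "cloudflare:workers")
// and be passed to the Dynamic Worker via env.
class InMemoryChatRoom {
  #messages = [];
  #subscribers = new Set();

  // Get the last `limit` messages of the chat log.
  async getHistory(limit) {
    return this.#messages.slice(-limit);
  }

  // Subscribe to new messages; dispose the returned object
  // to unsubscribe.
  async subscribe(callback) {
    this.#subscribers.add(callback);
    return {
      [Symbol.dispose]: () => this.#subscribers.delete(callback),
    };
  }

  // Post a message to chat. (The `author` parameter is an extra
  // added for this sketch; the interface above takes only `text`.)
  async post(text, author = "system") {
    const msg = { author, time: new Date(), text };
    this.#messages.push(msg);
    for (const cb of this.#subscribers) cb(msg);
  }
}
```

The agent snippet above would then run against an instance of this class passed as <code>env.CHAT_ROOM</code>, never knowing it is calling across a security boundary.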
            
    <div>
      <h3>HTTP filtering and credential injection</h3>
      <a href="#http-filtering-and-credential-injection">
        
      </a>
    </div>
    <p>If you prefer to give your agents HTTP APIs, that's fully supported. Using the <code>globalOutbound</code> option to the worker loader API, you can register a callback to be invoked on every HTTP request, in which you can inspect the request, rewrite it, inject auth keys, respond to it directly, block it, or anything else you might like.</p><p>For example, you can use this to implement <b>credential injection</b> (token injection): When the agent makes an HTTP request to a service that requires authorization, you add credentials to the request on the way out. This way, the agent itself never knows the secret credentials, and therefore cannot leak them.</p><p>Using a plain HTTP interface may be desirable when an agent is talking to a well-known API that is in its training set, or when you want your agent to use a library that is built on a REST API (the library can run inside the agent's sandbox).</p><p>With that said, <b>in the absence of a compatibility requirement, TypeScript RPC interfaces are better than HTTP:</b></p><ul><li><p>As shown above, a TypeScript interface requires far fewer tokens to describe than an HTTP interface.</p></li><li><p>The agent can write code to call TypeScript interfaces using far fewer tokens than equivalent HTTP.</p></li><li><p>With TypeScript interfaces, since you are defining your own wrapper interface anyway, it is easier to narrow the interface to expose exactly the capabilities that you want to provide to your agent, both for simplicity and security. With HTTP, you are more likely implementing <i>filtering</i> of requests made against some existing API. This is hard, because your proxy must fully interpret the meaning of every API call in order to properly decide whether to allow it, and HTTP requests are complicated, with many headers and other parameters that could all be meaningful. It ends up being easier to just write a TypeScript wrapper that only implements the functions you want to allow.</p></li></ul>
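<p>The filtering and credential-injection logic itself can be sketched as a pure function over standard <code>Request</code> objects (the host name and API key below are made-up placeholders). In a real Worker, logic like this would likely live in the <code>fetch()</code> handler of the service you pass as <code>globalOutbound</code>, which would then <code>fetch()</code> the rewritten request or return the blocking response directly:</p>

```javascript
// Sketch of a globalOutbound-style interceptor: allow only known
// hosts, and inject credentials the sandboxed agent never sees.
// "api.example.com" and the bearer token are illustrative placeholders.
const ALLOWED_HOSTS = new Map([
  ["api.example.com", "Bearer secret-api-key"], // hypothetical service
]);

function filterOutbound(request) {
  const url = new URL(request.url);
  const credential = ALLOWED_HOSTS.get(url.hostname);
  if (credential === undefined) {
    // Block requests to anything not explicitly allowed.
    return new Response("Forbidden", { status: 403 });
  }
  // Rewrite the request on the way out, adding the secret credential.
  const headers = new Headers(request.headers);
  headers.set("Authorization", credential);
  return new Request(request, { headers });
}
```

Because the credential is added outside the sandbox, the agent's code never observes it and cannot leak it, even if prompted to.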
    <div>
      <h3>Battle-hardened security</h3>
      <a href="#battle-hardened-security">
        
      </a>
    </div>
    <p>Hardening an isolate-based sandbox is tricky, as it presents a more complicated attack surface than a hardware virtual machine. Although all sandboxing mechanisms have bugs, security bugs in V8 are more common than security bugs in typical hypervisors. When using isolates to sandbox possibly-malicious code, it's important to have additional layers of defense-in-depth. Google Chrome, for example, implemented strict process isolation for this reason, but it is not the only possible solution.</p><p>We have nearly a decade of experience securing our isolate-based platform. Our systems automatically deploy V8 security patches to production within hours — faster than Chrome itself. Our <a href="https://blog.cloudflare.com/mitigating-spectre-and-other-security-threats-the-cloudflare-workers-security-model/"><u>security architecture</u></a> features a custom second-layer sandbox with dynamic cordoning of tenants based on risk assessments. <a href="https://blog.cloudflare.com/safe-in-the-sandbox-security-hardening-for-cloudflare-workers/"><u>We've extended the V8 sandbox itself</u></a> to leverage hardware features like MPK. We've teamed up with (and hired) leading researchers to develop <a href="https://blog.cloudflare.com/spectre-research-with-tu-graz/"><u>novel defenses against Spectre</u></a>. We also have systems that scan code for malicious patterns and automatically block them or apply additional layers of sandboxing. And much more.</p><p>When you use Dynamic Workers on Cloudflare, you get all of this automatically.</p>
    <div>
      <h2>Helper libraries</h2>
      <a href="#helper-libraries">
        
      </a>
    </div>
    <p>We've built a number of libraries that you might find useful when working with Dynamic Workers: </p>
    <div>
      <h3>Code Mode</h3>
      <a href="#code-mode">
        
      </a>
    </div>
    <p><a href="https://www.npmjs.com/package/@cloudflare/codemode"><code>@cloudflare/codemode</code></a> simplifies running model-generated code against AI tools using Dynamic Workers. At its core is <code>DynamicWorkerExecutor()</code>, which constructs a purpose-built sandbox with code normalization to handle common formatting errors, and direct access to a <code>globalOutbound</code> fetcher for controlling <code>fetch()</code> behavior inside the sandbox — set it to <code>null</code> for full isolation, or pass a <code>Fetcher</code> binding to route, intercept, or enrich outbound requests from the sandbox.</p>
            <pre><code>const executor = new DynamicWorkerExecutor({
  loader: env.LOADER,
  globalOutbound: null, // fully isolated 
});

const codemode = createCodeTool({
  tools: myTools,
  executor,
});

return generateText({
  model,
  messages,
  tools: { codemode },
});
</code></pre>
            <p>The Code Mode SDK also provides two server-side utility functions. <code>codeMcpServer({ server, executor })</code> wraps an existing MCP Server, replacing its tool surface with a single <code>code()</code> tool. <code>openApiMcpServer({ spec, executor, request })</code> goes further: given an OpenAPI spec and an executor, it builds a complete MCP Server with the <code>search()</code> and <code>execute()</code> tools used by the Cloudflare MCP Server, a pattern better suited to larger APIs.</p><p>In both cases, the code generated by the model runs inside Dynamic Workers, with calls to external services made over RPC bindings passed to the executor.</p><p><a href="https://www.npmjs.com/package/@cloudflare/codemode"><u>Learn more about the library and how to use it.</u></a></p>
    <div>
      <h3>Bundling</h3>
      <a href="#bundling">
        
      </a>
    </div>
    <p>Dynamic Workers expect pre-bundled modules. <a href="https://www.npmjs.com/package/@cloudflare/worker-bundler"><code>@cloudflare/worker-bundler</code></a> handles that for you: give it source files and a <code>package.json</code>, and it resolves npm dependencies from the registry, bundles everything with <code>esbuild</code>, and returns the module map the Worker Loader expects.</p>
            <pre><code>import { createWorker } from "@cloudflare/worker-bundler";

const worker = env.LOADER.get("my-worker", async () =&gt; {
  const { mainModule, modules } = await createWorker({
    files: {
      "src/index.ts": `
        import { Hono } from 'hono';
        import { cors } from 'hono/cors';

        const app = new Hono();
        app.use('*', cors());
        app.get('/', (c) =&gt; c.text('Hello from Hono!'));
        app.get('/json', (c) =&gt; c.json({ message: 'It works!' }));

        export default app;
      `,
      "package.json": JSON.stringify({
        dependencies: { hono: "^4.0.0" }
      })
    }
  });

  return { mainModule, modules, compatibilityDate: "2026-01-01" };
});

await worker.getEntrypoint().fetch(request);
</code></pre>
            <p>It also supports full-stack apps via <code>createApp</code> — bundle a server Worker, client-side JavaScript, and static assets together, with built-in asset serving that handles content types, ETags, and SPA routing.</p><p><a href="https://www.npmjs.com/package/@cloudflare/worker-bundler"><u>Learn more about the library and how to use it.</u></a></p>
    <div>
      <h3>File manipulation</h3>
      <a href="#file-manipulation">
        
      </a>
    </div>
    <p><a href="https://www.npmjs.com/package/@cloudflare/shell"><code>@cloudflare/shell</code></a> gives your agent a virtual filesystem inside a Dynamic Worker. Agent code calls typed methods on a <code>state</code> object — read, write, search, replace, diff, glob, JSON query/update, archive — with structured inputs and outputs instead of string parsing.</p><p>Storage is backed by a durable <code>Workspace</code> (SQLite + R2), so files persist across executions. Coarse operations like <code>searchFiles</code>, <code>replaceInFiles</code>, and <code>planEdits</code> minimize RPC round-trips — the agent issues one call instead of looping over individual files. Batch writes are transactional by default: if any write fails, earlier writes roll back automatically.</p>
            <pre><code>import { Workspace } from "@cloudflare/shell";
import { stateTools } from "@cloudflare/shell/workers";
import { DynamicWorkerExecutor, resolveProvider } from "@cloudflare/codemode";

const workspace = new Workspace({
  sql: this.ctx.storage.sql, // Works with any DO's SqlStorage, D1, or custom SQL backend
  r2: this.env.MY_BUCKET, // large files spill to R2 automatically
  name: () =&gt; this.name   // lazy — resolved when needed, not at construction
});

// Code runs in an isolated Worker sandbox with no network access
const executor = new DynamicWorkerExecutor({ loader: this.env.LOADER });

// The LLM writes this code; `state.*` calls dispatch back to the host via RPC
const result = await executor.execute(
  `async () =&gt; {
    // Search across all TypeScript files for a pattern
    const hits = await state.searchFiles("src/**/*.ts", "answer");
    // Plan multiple edits as a single transaction
    const plan = await state.planEdits([
      { kind: "replace", path: "/src/app.ts",
        search: "42", replacement: "43" },
      { kind: "writeJson", path: "/src/config.json",
        value: { version: 2 } }
    ]);
    // Apply atomically — rolls back on failure
    return await state.applyEditPlan(plan);
  }`,
  [resolveProvider(stateTools(workspace))]
);</code></pre>
            <p>The package also ships prebuilt TypeScript type declarations and a system prompt template, so you can drop the full <code>state</code> API into your LLM context in a handful of tokens.</p><p><a href="https://www.npmjs.com/package/@cloudflare/shell"><u>Learn more about the library and how to use it.</u></a></p>
    <div>
      <h2>How are people using it?</h2>
      <a href="#how-are-people-using-it">
        
      </a>
    </div>
    
    <div>
      <h4>Code Mode</h4>
      <a href="#code-mode">
        
      </a>
    </div>
    <p>Developers want their agents to write and execute code against tool APIs, rather than making sequential tool calls one at a time. With Dynamic Workers, the LLM generates a single TypeScript function that chains multiple API calls together, runs it in a Dynamic Worker, and returns the final result back to the agent. As a result, only the output, and not every intermediate step, ends up in the context window. This cuts both latency and token usage, and produces better results, especially when the tool surface is large.</p><p>Our own <a href="https://github.com/cloudflare/mcp-server-cloudflare">Cloudflare MCP server</a> is built exactly this way: it exposes the entire Cloudflare API through just two tools — search and execute — in under 1,000 tokens, because the agent writes code against a typed API instead of navigating hundreds of individual tool definitions.</p>
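<p>As a rough sketch of the idea (the tool implementations below are synchronous stand-ins invented for illustration, not real Cloudflare APIs), a single model-generated function might look like this:</p>

```javascript
// Hypothetical, simplified illustration of "code mode". The tool
// implementations are synchronous stand-ins, not real Cloudflare APIs.
const tools = {
  listZones: () => [{ id: "z1", name: "example.com" }],
  getDnsRecords: (zoneId) => [{ type: "A", name: "www", zoneId }],
};

// The kind of function an LLM might generate: chain several tool calls
// and return only the final summary to the agent's context window.
function generatedByModel() {
  const zones = tools.listZones();
  const records = tools.getDnsRecords(zones[0].id);
  return `${records.length} record(s) in ${zones[0].name}`;
}

const summary = generatedByModel();
console.log(summary); // "1 record(s) in example.com"
```

<p>Only <code>summary</code> re-enters the model's context; the intermediate zone and record objects never consume tokens.</p>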
    <div>
      <h4>Building custom automations </h4>
      <a href="#building-custom-automations">
        
      </a>
    </div>
    <p>Developers are using Dynamic Workers to let agents build custom automations on the fly. <a href="https://www.zite.com/"><u>Zite</u></a>, for example, is building an app platform where users interact through a chat interface — the LLM writes TypeScript behind the scenes to build CRUD apps, connect to services like Stripe, Airtable, and Google Calendar, and run backend logic, all without the user ever seeing a line of code. Every automation runs in its own Dynamic Worker, with access to only the specific services and libraries that the endpoint needs.</p><blockquote><p><i>“To enable server-side code for Zite’s LLM-generated apps, we needed an execution layer that was instant, isolated, and secure. Cloudflare’s Dynamic Workers hit the mark on all three, and out-performed all of the other platforms we benchmarked for speed and library support. The NodeJS compatible runtime supported all of Zite’s workflows, allowing hundreds of third party integrations, without sacrificing on startup time. Zite now services millions of execution requests daily thanks to Dynamic Workers.” </i></p><p><i>— </i><b><i>Antony Toron</i></b><i>, CTO and Co-Founder, Zite </i></p></blockquote>
    <div>
      <h4>Running AI-generated applications</h4>
      <a href="#running-ai-generated-applications">
        
      </a>
    </div>
    <p>Developers are building platforms that generate full applications from AI — either for their customers or for internal teams building prototypes. With Dynamic Workers, each app can be spun up on demand, then put back into cold storage until it's invoked again. Fast startup times make it easy to preview changes during active development. Platforms can also block or intercept any network requests the generated code makes, keeping AI-generated apps safe to run.</p>
    <div>
      <h2>Pricing</h2>
      <a href="#pricing">
        
      </a>
    </div>
    <p>Dynamically-loaded Workers are priced at $0.002 per unique Worker loaded per day (as of this post’s publication), in addition to the usual CPU time and invocation pricing of regular Workers.</p><p>For AI-generated "code mode" use cases, where every Worker is a unique one-off, this means the price is $0.002 per Worker loaded (plus CPU and invocations). This cost is typically negligible compared to the inference costs to generate the code.</p><p>During the beta period, the $0.002 charge is waived. As pricing is subject to change, please always check our Dynamic Workers <a href="https://developers.cloudflare.com/dynamic-workers/pricing/"><u>pricing</u></a> for the most current information. </p>
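<p>A back-of-envelope example, using a hypothetical volume of 10,000 unique Workers per day at the published rate (check the pricing page for current numbers):</p>

```javascript
// Rough cost sketch at the published rate of $0.002 per unique
// dynamically-loaded Worker per day. The daily volume is hypothetical,
// and normal CPU and invocation charges are not modeled here.
const uniqueWorkersPerDay = 10_000;
const ratePerWorker = 0.002; // USD per unique Worker loaded per day
const dailyLoadCost = uniqueWorkersPerDay * ratePerWorker;
console.log(dailyLoadCost); // 20 (USD/day, before CPU and invocation charges)
```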
    <div>
      <h2>Get Started</h2>
      <a href="#get-started">
        
      </a>
    </div>
    <p>If you’re on the Workers Paid plan, you can start using <a href="https://developers.cloudflare.com/dynamic-workers/">Dynamic Workers</a> today. </p>
    <div>
      <h4>Dynamic Workers Starter</h4>
      <a href="#dynamic-workers-starter">
        
      </a>
    </div>
    <a href="https://deploy.workers.cloudflare.com/?url=https://github.com/cloudflare/agents/tree/main/examples/dynamic-workers"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p></p>
<p>Use this “hello world” <a href="https://github.com/cloudflare/agents/tree/main/examples/dynamic-workers">starter</a> to get a Worker deployed that can load and execute Dynamic Workers. </p>
    <div>
      <h4>Dynamic Workers Playground</h4>
      <a href="#dynamic-workers-playground">
        
      </a>
    </div>
    <a href="https://deploy.workers.cloudflare.com/?url=https://github.com/cloudflare/agents/tree/main/examples/dynamic-workers-playground"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p></p><p>You can also deploy the <a href="https://github.com/cloudflare/agents/tree/main/examples/dynamic-workers-playground">Dynamic Workers Playground</a>, where you’ll be able to write or import code, bundle it at runtime with <code>@cloudflare/worker-bundler</code>, execute it through a Dynamic Worker, see real-time responses and execution logs. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/32d0ficYALnSneKc4jZPja/0d4d07d747fc14936f16071714b7a8e5/BLOG-3243_2.png" />
          </figure><p>Dynamic Workers are fast, scalable, and lightweight. <a href="https://discord.com/channels/595317990191398933/1460655307255578695"><u>Find us on Discord</u></a> if you have any questions. We’d love to see what you build!</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/mQOJLnMtXULmj6l3DgKZg/ef2ee4cef616bc2d9a7caf35df5834f5/BLOG-3243_3.png" />
          </figure><p></p> ]]></content:encoded>
            <category><![CDATA[MCP]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Agents]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">1tc7f8AggVLw5D8OmaZri5</guid>
            <dc:creator>Kenton Varda</dc:creator>
            <dc:creator>Sunil Pai</dc:creator>
            <dc:creator>Ketan Gupta</dc:creator>
        </item>
        <item>
            <title><![CDATA[Unpacking Cloudflare Workers CPU Performance Benchmarks]]></title>
            <link>https://blog.cloudflare.com/unpacking-cloudflare-workers-cpu-performance-benchmarks/</link>
            <pubDate>Tue, 14 Oct 2025 20:00:25 GMT</pubDate>
            <description><![CDATA[ Cloudflare investigated CPU performance benchmark results for Workers, uncovering and fixing issues in infrastructure, V8 garbage collection, and OpenNext optimizations.  ]]></description>
            <content:encoded><![CDATA[ <p>On October 4, independent developer Theo Browne published <a href="https://github.com/t3dotgg/cf-vs-vercel-bench"><u>a series of benchmarks</u></a> designed to compare server-side JavaScript execution speed between Cloudflare Workers and Vercel, a competing compute platform built on AWS Lambda. The initial results showed Cloudflare Workers performing worse than Node.js on Vercel at a variety of CPU-intensive tasks, by a factor of as much as 3.5x.</p><p>We were surprised by the results. The benchmarks were designed to compare JavaScript execution speed in a CPU-intensive workload that never waits on external services. But, Cloudflare Workers and Node.js both use the same underlying JavaScript engine: <a href="https://en.wikipedia.org/wiki/V8_(JavaScript_engine)"><u>V8, the open source engine from Google Chrome</u></a>. Hence, one would expect the benchmarks to be executing essentially identical code in each environment. Physical CPUs can vary in performance, but modern server CPUs do not vary by anywhere near 3.5x.</p><p>On investigation, we discovered a wide range of small problems that contributed to the disparity, ranging from some bad tuning in our infrastructure, to differences between the JavaScript libraries used on each platform, to some issues with the test itself. We spent the week working on many of these problems, which means over the past week Workers got better and faster for all of our customers. We even fixed some problems that affect other compute providers but not us, such as an issue that made trigonometry functions much slower on Vercel. This post will dig into all the gory details. </p><p>It's important to note that the original benchmark was not representative of billable CPU usage on Cloudflare, nor did the issues involved impact most typical workloads. Most of the disparity was an artifact of the specific benchmark methodology. 
Read on to understand why.</p><p>With our fixes, the results now look much more like we'd expect:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4LbjDIgtezBKTCWKEW5ePJ/1b053a44c90cf6c59dd0da4d9f7d8057/BLOG-3051_2.png" />
          </figure><p>There is still work to do, but we're happy to say that after these changes, Cloudflare now performs on par with Vercel in every benchmark case except the one based on Next.js. On that benchmark, the gap has closed considerably, and we expect to be able to eliminate it with further improvements detailed later in this post.</p><p>We are grateful to Theo for highlighting areas where we could make improvements, which will now benefit all our customers, and even many who aren't our customers.</p>
    <div>
      <h3>Our benchmark methodology</h3>
      <a href="#our-benchmark-methodology">
        
      </a>
    </div>
    <p>We wanted to run Theo's test with no major design changes, in order to keep numbers comparable. Benchmark cases are nearly identical to Theo's original test but we made a couple changes in how we ran the test, in the hopes of making the results more accurate:</p><ul><li><p>Theo ran the test client on a laptop connected by a Webpass internet connection in San Francisco, against Vercel instances running in its sfo1 region. In order to make our results easier to reproduce, we chose instead to run our test client directly in AWS's us-east-1 datacenter, invoking Vercel instances running in its iad1 region (which we understand to be in the same building). We felt this would minimize any impact from network latency. Because of this, Vercel's numbers are slightly better in our results than they were in Theo's.</p></li><li><p>We chose to use Vercel instances with 1 vCPU instead of 2. All of the benchmarks are single-threaded workloads, meaning they cannot take advantage of a second CPU anyway. Vercel's CTO, Malte Ubl, had <a href="https://x.com/cramforce/status/1975656443954274780"><u>stated publicly on X</u></a> that using single-CPU instances would make no difference in this test, and indeed, we found this to be correct. Using 1 vCPU makes it easier to reason about pricing, since both Vercel and Cloudflare charge for CPU time ($0.128/hr for Vercel in iad1, and $0.072/hr for Cloudflare globally).</p></li><li><p>We made some changes to fix bugs in the test, for which <a href="https://github.com/t3dotgg/cf-vs-vercel-bench/pull/5"><u>we submitted a pull request</u></a>. More on this below.</p></li></ul>
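<p>For reference, the quoted hourly CPU rates work out to roughly a 1.8x difference in billed CPU-time cost:</p>

```javascript
// Comparing the hourly CPU rates quoted in this post. Both platforms
// bill CPU time rather than wall time, so the ratio is directly comparable.
const vercelPerHour = 0.128;     // USD/hr, Vercel iad1, 1 vCPU
const cloudflarePerHour = 0.072; // USD/hr, Cloudflare, global
const ratio = vercelPerHour / cloudflarePerHour;
console.log(ratio.toFixed(2)); // "1.78"
```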
    <div>
      <h2>Cloudflare platform improvements</h2>
      <a href="#cloudflare-platform-improvements">
        
      </a>
    </div>
    <p>Theo's benchmarks covered a variety of frameworks, making it clear that no single JavaScript library could be at fault for the general problem. Clearly, we needed to look first at the Workers Runtime itself. And so we did, and we found two problems – not bugs, but tuning and heuristic choices which interacted poorly with the benchmarks as written.</p>
    <div>
      <h3>Sharding and warm isolate routing: A problem of scheduling, not CPU speed</h3>
      <a href="#sharding-and-warm-isolate-routing-a-problem-of-scheduling-not-cpu-speed">
        
      </a>
    </div>
    <p><a href="https://blog.cloudflare.com/eliminating-cold-starts-2-shard-and-conquer/"><u>Over the last year we shipped smarter routing that sends traffic to warm isolates more often</u></a>. That cuts cold starts for large apps, which matters for frameworks with heavy initialization requirements like Next.js. The original policy optimized for latency and throughput across billions of requests, but was less optimal for heavily CPU-bound workloads for the same reason that such workloads cause performance issues in other platforms like Node.js: When the CPU is busy computing an expensive operation for one request, other requests sent to the same isolate must wait for it to finish before they can proceed.</p><p>The system uses heuristics to detect when requests are getting blocked behind each other, and automatically spin up more isolates to compensate. However, these heuristics are not precise, and the particular workload generated by Theo's tests – in which a burst of expensive traffic would come from a single client – played poorly with our existing algorithm. As a result, the benchmarks showed much higher latency (and variability in latency) than would normally be expected.</p><p><b>It's important to understand that, as a result of this problem, the benchmark was not really measuring CPU time.</b> Pricing on the Workers platform is based on CPU time – that is, time spent actually executing JavaScript code, as opposed to time waiting for things. Time spent waiting for the isolate to become available makes the request take longer, but is not billed as CPU time against the waiting request. <b>So, this problem would not have affected your bill.</b></p><p>After analyzing the benchmarks, we updated the algorithm to detect sustained CPU-heavy work earlier, then bias traffic so that new isolates spin up faster. The result is that Workers can more effectively and efficiently autoscale when different workloads are applied. 
I/O-bound workloads coalesce into individual already-warm isolates, while CPU-bound workloads are directed so that they do not block each other. This change has already been rolled out globally and is enabled automatically for everyone. It should be pretty clear from the graph when the change was rolled out:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/Cio8BSY6tH7crMbXdnzYi/bab6314164907375eff2236a2bec21c3/image__7_.png" />
          </figure>
    <div>
      <h3>V8 garbage collector tuning</h3>
      <a href="#v8-garbage-collector-tuning">
        
      </a>
    </div>
    <p>While this scheduling issue accounted for the majority of the disparity in the benchmark, we did find a minor issue affecting code execution performance during our testing.</p><p>The range of issues that we uncovered in the framework code in these benchmarks repeatedly pointed at <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Memory_management#garbage_collection"><u>garbage collection</u></a> and memory management issues as being key contributors to the results. But, we would expect these to be an issue with the same frameworks running in Node.js as well. To see exactly what was going on differently with Workers and why it was causing such a significant degradation in performance, we had to look inwards at our own memory management configuration.</p><p>The <a href="https://v8.dev/blog/trash-talk"><u>V8 garbage collector has a huge number of knobs</u></a> that can be tuned that directly impact performance. One of these is the size of the "young generation". This is where newly created objects go initially. It's a memory area that's less compact, but optimized for short-lived objects. When objects have bounced around the "young space" for a few generations they get moved to the old space, which is more compact, but requires more CPU to reclaim.</p><p>V8 allows the embedding runtime to tune the size of the young generation. And it turns out, we had done so. Way back in June of 2017, just two months after the Workers project kicked off, we – or specifically, I, Kenton, as I was the only engineer on the project at the time – had configured this value according to V8's recommendations at the time for environments with 512MB of memory or less. Since Workers defaults to a limit of 128MB per isolate, this seemed appropriate.</p><p>V8's entire garbage collector has changed dramatically since 2017. 
When analyzing the benchmarks, it became apparent that the setting which made sense in 2017 no longer made sense in 2025, and we were now limiting V8's young space too rigidly. Our configuration was causing V8's garbage collection to work harder and more frequently than it otherwise needed to. As a result, we have backed off on the manual tuning and now allow V8 to pick its young space size more freely, based on its internal heuristics. This is already live on Cloudflare Workers, and it has given an approximately 25% boost to the benchmarks with only a small increase in memory usage. Of course, the benchmarks are not the only Workers that benefit: all Workers should now be faster. That said, for most Workers the difference has been much smaller.</p>
    <div>
      <h2>Tuning OpenNext for performance</h2>
      <a href="#tuning-opennext-for-performance">
        
      </a>
    </div>
    <p>The platform changes solved most of the problem. Following the changes, our testing showed we were now even on all of the benchmarks save one: Next.js.</p><p>Next.js is a popular web application framework which, historically, has not had built-in support for hosting on a wide range of platforms. Recently, a project called <a href="https://opennext.js.org/"><u>OpenNext</u></a> has arisen to fill the gap, making Next.js work well on many platforms, including Cloudflare. On investigation, we found several missing optimizations and other opportunities to improve performance, explaining much of why the benchmark performed poorly on Workers.</p>
    <div>
      <h3>Unnecessary allocations and copies</h3>
      <a href="#unnecessary-allocations-and-copies">
        
      </a>
    </div>
    <p>When profiling the benchmark code, we noticed that garbage collection was dominating the timeline. Between 10% and 25% of the request processing time was being spent reclaiming memory.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7lYdrV1WFzKEsD6qXspQ2K/725225d0d2e01f74057152b0d736868c/BLOG-3051_4.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Ab4DJG0VzETky4t8rtYSe/23e9f0578bf8ac9834f7897d628b7cb2/BLOG-3051_5.png" />
          </figure><p>So we dug in and discovered that OpenNext, and in some cases Next.js and React itself, will often create unnecessary copies of internal data buffers at some of the worst possible times during request handling. For instance, there's one <code>pipeThrough()</code> operation in the rendering pipeline that we saw creating no fewer than 50 2048-byte <code>Buffer</code> instances, whether they are actually used or not.</p><p>We further discovered that on every request, the <a href="https://github.com/opennextjs/opennextjs-cloudflare"><u>Cloudflare OpenNext adapter</u></a> has been needlessly copying every chunk of streamed output data as it’s passed out of the renderer and into the Workers runtime to return to users. Given this benchmark returns a 5 MB result on every request, that's a lot of data being copied!</p><p>In other places, we found that arrays of internal Buffer instances were being copied and concatenated using <a href="https://nodejs.org/docs/latest/api/buffer.html#static-method-bufferconcatlist-totallength"><code><u>Buffer.concat</u></code></a> for no other reason than to get the total number of bytes in the collection. That is, we spotted code of the form <code>getBody().length</code>. The function <code>getBody()</code> would concatenate a large number of buffers into a single buffer and return it, without storing the buffer anywhere. So, all that work was being done just to read the overall length. 
Obviously this was not intended, and fixing it was an easy win.</p><p>We've started opening a series of pull requests in OpenNext to fix these issues, and others in hot paths, removing some unnecessary allocations and copies:</p><ul><li><p><a href="https://github.com/opennextjs/opennextjs-aws/pull/991"><u>Improving streaming response performance</u></a></p></li><li><p><a href="https://github.com/opennextjs/opennextjs-aws/pull/996"><u>Reduce allocations of streams</u></a></p></li><li><p><a href="https://github.com/opennextjs/opennextjs-aws/pull/1005"><u>Optimize readable/writable stream piping</u></a></p></li><li><p><a href="https://github.com/opennextjs/opennextjs-aws/pull/1009"><u>Cache expensive compute on </u></a><a href="http://opennext.js"><u>OpenNext.js</u></a></p></li><li><p><a href="https://github.com/opennextjs/opennextjs-aws/pull/1004"><u>Improve composable-cache performance</u></a></p></li><li><p><a href="https://github.com/opennextjs/opennextjs-aws/pull/1006"><u>Improve performance of OpenNext.js converters</u></a></p></li><li><p><a href="https://github.com/opennextjs/opennextjs-aws/pull/1007"><u>Avoid slow-mode on frequently accessed objects</u></a> </p></li><li><p><a href="https://github.com/opennextjs/opennextjs-aws/pull/1008"><u>Avoid copying/allocation extra header objects</u></a></p></li><li><p><a href="https://github.com/opennextjs/opennextjs-aws/pull/1010"><u>Avoid unnecessary buffer copies on responses</u></a></p></li><li><p><a href="https://github.com/opennextjs/opennextjs-cloudflare/pull/939"><u>Cache regexes to avoid GC pressure</u></a></p></li></ul><p>We're not done. We intend to keep iterating through OpenNext code, making improvements wherever they’re needed – not only in the parts that run on Workers. Many of these improvements apply to other OpenNext platforms. The shared goal of OpenNext is to make NextJS as fast as possible regardless of where you choose to run your code.</p>
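<p>The <code>getBody().length</code> anti-pattern and its fix can be sketched in a few lines of plain Node.js:</p>

```javascript
// Minimal Node.js sketch of the getBody().length anti-pattern and its fix.
const chunks = [Buffer.from("hello"), Buffer.from(", "), Buffer.from("world")];

// Anti-pattern: concatenates (copies) every chunk just to read a number,
// then throws the concatenated buffer away.
const slowLength = Buffer.concat(chunks).length;

// Fix: sum the chunk lengths directly; no allocation, no copying.
const fastLength = chunks.reduce((sum, c) => sum + c.length, 0);

console.log(slowLength, fastLength); // both 12
```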
    <div>
      <h2>Inefficient Streams Adapters</h2>
      <a href="#inefficient-streams-adapters">
        
      </a>
    </div>
    <p>Much of the Next.js code was written to use Node.js's APIs for byte streams. Workers, however, prefers the web-standard <a href="https://developer.mozilla.org/en-US/docs/Web/API/Streams_API"><u>Streams API</u></a>, and uses it to represent HTTP request and response bodies. This necessitates using adapters to convert between the two APIs. When investigating the performance bottlenecks, we found a number of examples where inefficient streams adapters are being needlessly applied. For example:</p>
            <pre><code>const stream = Readable.toWeb(Readable.from(res.getBody()))</code></pre>
            <p><code>res.getBody()</code> was performing a <code>Buffer.concat(chunks)</code> to copy accumulated chunks of data into a new Buffer, which was then passed as an iterable into a Node.js <a href="https://nodejs.org/docs/latest/api/stream.html#readable-streams"><code><u>stream.Readable</u></code></a> that was then wrapped <a href="https://nodejs.org/docs/latest/api/stream.html#streamreadabletowebstreamreadable-options"><u>by an adapter</u></a> that returns a <code>ReadableStream</code>. While these utilities do serve a useful purpose, this becomes a data buffering nightmare since both Node.js streams and Web streams each apply their own internal buffers! Instead we can simply do:</p>
            <pre><code>const stream = ReadableStream.from(chunks);</code></pre>
            <p>This returns a <code>ReadableStream</code> directly from the accumulated chunks without additional copies, extraneous buffering, or passing everything through inefficient adaptation layers.</p><p>In other places we see that Next.js and React make extensive use of <code>ReadableStream</code> to pass bytes through, but the streams being created are value-oriented rather than byte-oriented! For example,</p>
            <pre><code>const readable = new ReadableStream({
  pull(controller) {
    controller.enqueue(chunks.shift());
    if (chunks.length === 0) {
      controller.close();
    }
  }
});  // Default highWaterMark is 1!
</code></pre>
            <p>Seems perfectly reasonable. However, there's an issue here. If the chunks are <code>Buffer</code> or <code>Uint8Array</code> instances, every enqueued instance ends up being delivered as a separate read by default. Whether a chunk holds a single byte or 1000 bytes, two enqueued chunks always mean two reads. By converting this to a byte stream with a reasonable high water mark, we can make it possible to read this stream much more efficiently:</p>
            <pre><code>const readable = new ReadableStream({
  type: 'bytes',
  pull(controller) {
    controller.enqueue(chunks.shift());
    if (chunks.length === 0) {
      controller.close();
    }
  }
}, { highWaterMark: 4096 });
</code></pre>
            <p>Now, the stream can be read as a stream of bytes rather than a stream of distinct JavaScript values, and the individual chunks can be coalesced internally into 4096-byte chunks, making reads much more efficient. Rather than reading each individual enqueued chunk one at a time, the ReadableStream will proactively call <code>pull()</code> repeatedly until the highWaterMark is reached. Reads then do not have to ask the stream for one chunk of data at a time.</p><p>While it would be best for the rendering pipeline to use byte streams and pay closer attention to backpressure signals, our implementation can still be tuned to better handle cases like this.</p><p>The bottom line? We've got some work to do! There are a number of improvements to make in the implementation of OpenNext and the adapters that allow it to work on Cloudflare that we will continue to investigate and iterate on. We've made a handful of these fixes already and we're already seeing improvements. Soon we also plan to start submitting patches to Next.js and React to make further improvements upstream that will ideally benefit the entire ecosystem.</p>
    <div>
      <h3>JSON parsing</h3>
      <a href="#json-parsing">
        
      </a>
    </div>
    <p>Aside from buffer allocations and streams, one additional item stood out like a sore thumb in the profiles: <code>JSON.parse()</code> with a reviver function. It is used in both React and Next.js, and in our profiling it was significantly slower than it should be. We built a microbenchmark and found that JSON.parse with a reviver argument recently got even slower when the standard <a href="https://github.com/tc39/proposal-json-parse-with-source"><u>added a third argument</u></a> to the reviver callback to provide access to the JSON source context.</p><p>For those unfamiliar with the reviver function, it allows an application to effectively customize how JSON is parsed. But it has drawbacks. The function gets called on every key-value pair included in the JSON structure, including every individual element of an Array that gets serialized. In Theo's NextJS benchmark, in any single request, it ends up being called well over 100,000 times!</p><p>Even though this problem affects all platforms, not just ours, we decided that we weren't just going to accept it. After all, we have contributors to V8 on the Workers runtime team! We've upstreamed a <a href="https://chromium-review.googlesource.com/c/v8/v8/+/7027411"><u>V8 patch</u></a> that can speed up <code>JSON.parse()</code> with revivers by roughly 33 percent. That should be in V8 starting with version 14.3 (Chrome 143) and can help everyone using V8, not just Cloudflare: Node.js, Chrome, Deno, the entire ecosystem. Until that patch reaches your runtime, <code>JSON.parse()</code> with a reviver will continue to show up as a hot spot in profiles, whatever platform you run on.</p><p>We will continue to work with framework authors to reduce overhead in hot paths. Some changes belong in the frameworks, some belong in the engine, some in our platform.</p>
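<p>A minimal Node.js illustration of why revivers are expensive (the counts below come from this toy payload, not from the benchmark):</p>

```javascript
// A reviver runs once per key/value pair -- including every array element
// and once for the root -- which is why large payloads make it expensive.
let reviverCalls = 0;
const json = JSON.stringify({ items: Array.from({ length: 1000 }, (_, i) => i) });
JSON.parse(json, (key, value) => {
  reviverCalls++;
  return value; // a pass-through reviver still pays the full call cost
});
console.log(reviverCalls); // 1002: 1000 array elements + "items" + the root
```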
    <div>
      <h2>Node.js's trigonometry problem</h2>
      <a href="#node-jss-trigonometry-problem">
        
      </a>
    </div>
    <p>We are engineers, and we like to solve engineering problems — whether our own, or for the broader community.</p><p>Theo's benchmarks were actually posted in response to a different benchmark by another author which compared Cloudflare Workers against Vercel. The original benchmark focused on calling trigonometry functions (e.g. sine and cosine) in a tight loop. In this benchmark, Cloudflare Workers performed 3x faster than Node.js running on Vercel.</p><p>The author of the original benchmark offered this as evidence that Cloudflare Workers are just faster. Theo disagreed, and so did we. We expect to be faster, but not by 3x! We don't implement math functions ourselves; these come with V8. We weren't happy to just accept the win, so we dug in.</p><p>It turns out that Node.js is not using the latest, fastest path for these functions. Node.js can be built with either the <a href="https://clang.llvm.org/"><u>clang</u></a> or <a href="https://gcc.gnu.org/"><u>gcc</u></a> compilers, and is written to support a broader range of operating systems and architectures than Workers. This means that Node.js' compilation often ends up using a lowest-common denominator for some things in order to provide support for the broadest range of platforms. V8 includes a <a href="https://github.com/search?q=repo%3Av8%2Fv8%20V8_USE_LIBM_TRIG_FUNCTIONS&amp;type=code"><u>compile-time flag</u></a> that, in some configurations, allows it to use a faster implementation of the trig functions. In Workers, mostly by coincidence, that flag is enabled by default. In Node.js, it is not. We've opened a <a href="https://github.com/nodejs/node/pull/60153"><u>pull request</u></a> to enable the flag in Node.js so that everyone benefits, at least on platforms where it can be supported.</p><p>Assuming that lands, and once AWS Lambda and Vercel are able to pick it up, we expect this specific gap to go away, making these operations faster for everyone. 
This change won't benefit our customers, since Cloudflare Workers already uses the faster trig functions, but a bug is a bug and we like making everything faster.</p>
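<p>For illustration, here is a minimal sketch (ours, not the original benchmark's exact code) of the kind of tight trigonometry loop these benchmarks exercise. <code>Math.sin</code> and <code>Math.cos</code> bottom out in V8's built-in math implementations, which is exactly what the compile-time flag discussed above affects:</p>

```javascript
// Minimal sketch of a CPU-bound trig microbenchmark (hypothetical shape;
// the actual benchmark code is Theo's). Math.sin/Math.cos are provided by
// V8, so their speed depends on how the engine was compiled.
function trigLoop(iterations) {
  let acc = 0;
  for (let i = 0; i < iterations; i++) {
    acc += Math.sin(i) * Math.cos(i);
  }
  return acc; // accumulate so the loop can't be optimized away
}

const start = performance.now();
const result = trigLoop(1_000_000);
const elapsedMs = performance.now() - start;
console.log(`result=${result.toFixed(6)} elapsed=${elapsedMs.toFixed(1)}ms`);
```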
    <div>
      <h2>Benchmarks are hard</h2>
      <a href="#benchmarks-are-hard">
        
      </a>
    </div>
    <p>Even the best benchmarks have bias and tradeoffs. It's difficult to create a benchmark that is truly representative of real-world performance, and all too easy to misinterpret the results of benchmarks that are not. <a href="https://planetscale.com/benchmarks"><u>We particularly liked Planetscale's take on this subject.</u></a></p><p>These specific CPU-bound tests are not an ideal choice to represent web applications. Theo even notes this in his video. Most real-world applications on Workers and Vercel are bound by databases, downstream services, network, and page size. End user experience is what matters. CPU is one piece of that picture. That said, if a benchmark shows us slower, we take it seriously.</p><p>While the benchmarks helped us find and fix many real problems, we also found a few problems with the benchmarks themselves, which contributed to the apparent disparity in speed:</p>
    <div>
      <h3>Running locally</h3>
      <a href="#running-locally">
        
      </a>
    </div>
    <p>The benchmark is designed to be run on your laptop, from which it hits Cloudflare's and Vercel's servers over the Internet. It makes the assumption that latency observed from the client is a close enough approximation of server-side CPU time. The reasons are fair: As Theo notes, Cloudflare does not permit an application to measure its own CPU time, in order to prevent timing side channel attacks. Actual CPU time can be seen in logs after the fact, but gathering those may be a lot of work. It's just easier to measure time from the client.</p><p>However, as Cloudflare and Vercel are hosted from different data centers, the network latency to each can be a factor in the benchmark, and this can skew the results. Typically, this effect will favor Cloudflare, because Cloudflare can run your Worker in locations spread across 330+ cities worldwide, and will tend to choose the closest one to you. Vercel, on the other hand, usually places compute in a central location, so latency will vary depending on your distance from that location.</p><p>For our own testing, to minimize this effect, we ran the benchmark client from a VM on AWS located in the same data center as our Vercel instances. Since Cloudflare is well-connected to every AWS location, we think this should have eliminated network latency from the picture. We chose AWS's us-east-1 / Vercel's iad1 for our test as it is widely seen as the default choice; any other choice could draw questions about cherry-picking.</p>
    <div>
      <h3>Not all CPUs are equal</h3>
      <a href="#not-all-cpus-are-equal">
        
      </a>
    </div>
    <p>Cloudflare's servers aren't all identical. Although we refresh them aggressively, there will always be multiple generations of hardware in production at any particular time. Currently, this includes generations <a href="https://blog.cloudflare.com/cloudflares-gen-x-servers-for-an-accelerated-future/"><u>10</u></a>, <a href="https://blog.cloudflare.com/the-epyc-journey-continues-to-milan-in-cloudflares-11th-generation-edge-server/"><u>11</u></a>, and <a href="https://blog.cloudflare.com/gen-12-servers/"><u>12</u></a> of our server hardware.</p><p>Other cloud providers are no different. No cloud provider simply throws away all their old servers every time a new version becomes available.</p><p>Of course, newer CPUs run faster, even for single-threaded workloads. The differences are not as large as they used to be 20-30 years ago, but they are not nothing. As such, an application may get (a little bit) lucky or unlucky depending on what machine it is assigned to.</p><p>In cloud environments, even identical CPUs can yield different performance depending on circumstances, due to multitenancy. The server your application is assigned to is running many others as well. In AWS Lambda, a server may be running hundreds of applications; in Cloudflare, with our ultra-efficient runtime, a server may be running thousands. These "noisy neighbors" won't share the same CPU core as your app, but they may share other resources, such as memory bandwidth. As a result, performance can vary.</p><p>It's important to note that these problems create <i>correlated</i> noise. That is, if you run the test again, the application is likely to remain assigned to the same machines as before – this is true of both Cloudflare and Vercel. So, this noise cannot be corrected by simply running more iterations. 
To correct for this type of noise on Cloudflare, one would need to initiate requests from a variety of geographic locations, in order to hit different Cloudflare data centers and therefore different machines. But, that is admittedly a lot of work. (We are not familiar with how best to get an application to switch machines on Vercel.)</p>
    <div>
      <h3>A Next.js config bug</h3>
      <a href="#a-next-js-config-bug">
        
      </a>
    </div>
    <p>The Cloudflare version of the Next.js benchmark was not configured to use <a href="https://nextjs.org/docs/app/guides/caching#opting-out-2"><u>force-dynamic</u></a> while the Vercel version was. This triggered curious behavior. Our understanding is that pages which are not "dynamic" should normally be rendered statically at build time. With OpenNext, however, it appears the pages are still rendered dynamically, but if multiple requests for the same page are received at the same time, OpenNext will only invoke the rendering once. Before we made the changes to fix our scheduling algorithm to avoid sending too many requests to the same isolate, this behavior may have somewhat counteracted that problem. Theo reports that he had disabled force-dynamic in the Cloudflare version specifically for this reason: with it on, our results were so bad as to appear outright broken, so he intentionally turned it off.</p><p>Ironically, though, once we fixed the scheduling issue, using "static" rendering (i.e. not enabling force-dynamic) hurt Cloudflare's performance for other reasons. It seems that when OpenNext renders a "cacheable" page, streaming of the response body is inhibited. This interacted poorly with a property of the benchmark client: it measured time-to-first-byte (TTFB), rather than total request/response time. When running in dynamic mode – as the test did on Vercel – the first byte would be returned to the client before the full page had been rendered. The rest of the rendering would happen as bytes streamed out. But with OpenNext in non-dynamic mode, the entire payload was rendered into a giant buffer upfront, before any bytes were returned to the client.</p><p>Due to the TTFB behavior of the benchmark client, in dynamic mode, the benchmark actually does not measure the time needed to fully render the page. 
We became suspicious when we noticed that Vercel's observability tools indicated more CPU time had been spent than the benchmark itself had reported.</p><p>One option would have been to change the benchmarks to use TTLB instead – that is, wait until the last byte is received before stopping the timer. However, this would make the benchmark even more affected by network differences: The responses are quite large, ranging from 2MB to 15MB, and so the results could vary depending on the bandwidth to the provider. Indeed, this would tend to favor Cloudflare, but as the point of the test is to measure CPU speed, not bandwidth, it would be an unfair advantage.</p><p>Once we changed the Cloudflare version of the test to use force-dynamic as well, matching the Vercel version, the streaming behavior then matched, making the comparison fair. This means that neither version is actually measuring the cost of rendering the full page to HTML, but at least they are now measuring the same thing.</p><p>As a side note, the original behavior allowed us to spot that OpenNext has a couple of performance bottlenecks in its implementation of the composable cache it uses to deduplicate rendering requests. While fixes to these aren't going to impact the numbers for this particular set of benchmarks, we're working on improving those pieces also.</p>
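<p>The TTFB-versus-TTLB distinction can be made concrete with a small sketch (our illustration, not the benchmark's actual client) that times both points for a streamed <code>Response</code>. With streaming, TTFB arrives well before rendering finishes; with a fully buffered body, the two are nearly equal:</p>

```javascript
// Sketch: measure time-to-first-byte (TTFB) vs time-to-last-byte (TTLB)
// for a Response with a streamed body (illustrative, not the real client).
async function measure(response) {
  const start = performance.now();
  const reader = response.body.getReader();

  await reader.read();                      // first chunk arrives
  const ttfb = performance.now() - start;

  while (!(await reader.read()).done) {}    // drain the rest of the body
  const ttlb = performance.now() - start;

  return { ttfb, ttlb };
}

// Demo with a two-chunk streamed body standing in for a rendered page.
const body = new ReadableStream({
  start(controller) {
    controller.enqueue(new TextEncoder().encode("<html>..."));
    controller.enqueue(new TextEncoder().encode("...</html>"));
    controller.close();
  },
});
measure(new Response(body)).then(({ ttfb, ttlb }) => {
  console.log(`ttfb=${ttfb.toFixed(2)}ms ttlb=${ttlb.toFixed(2)}ms`);
});
```

<p>A client that stops its timer at the first byte never observes the tail of the stream, which is how the non-dynamic (buffered) configuration was penalized.</p>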
    <div>
      <h3>A React SSR config bug</h3>
      <a href="#a-react-ssr-config-bug">
        
      </a>
    </div>
    <p>The React SSR benchmark contained a more basic configuration error. React inspects the environment variable <code>NODE_ENV</code> to decide whether the environment is "production" or a development environment. Many Node.js-based environments, including Vercel, set this variable automatically in production. Many frameworks, such as OpenNext, automatically set this variable for Workers in production as well. However, the React SSR benchmark was written against lower-level React APIs, not using any framework. In this case, the <code>NODE_ENV</code> variable wasn't being set at all.</p><p>And, unfortunately, when <code>NODE_ENV</code> is not set, React defaults to "dev mode", a mode that contains extra debugging checks and is therefore much slower than production mode. As a result, the numbers for Workers were much worse than they should have been.</p><p>Arguably, it may make sense for Workers to set this variable automatically for all deployed workers, particularly when Node.js compatibility is enabled. We are looking into doing this in the future, but for now we've updated the test to set it directly.</p>
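<p>The pitfall is easy to see in miniature. React's convention, shared by many libraries, is that anything other than an explicit <code>"production"</code> falls through to development mode, so leaving the variable unset silently selects the slow path:</p>

```javascript
// Sketch of the convention React and many other libraries follow:
// an unset NODE_ENV falls through to the slower development mode.
function resolveMode(env) {
  return env.NODE_ENV === "production" ? "production" : "development";
}

console.log(resolveMode({ NODE_ENV: "production" })); // "production"
console.log(resolveMode({}));                         // "development" -- the slow default
```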
    <div>
      <h2>What we’re going to do next</h2>
      <a href="#what-were-going-to-do-next">
        
      </a>
    </div>
    <p>Our improvements to the Workers Runtime are already live for all workers, so you do not need to change anything. Many apps will already see lower, steadier tail latency on compute-heavy routes with less jitter during bursts. In places where garbage collection improved, some workloads will also use fewer billed CPU seconds.</p><p>We also sent Theo a <a href="https://github.com/t3dotgg/cf-vs-vercel-bench/pull/5"><u>pull request</u></a> to update OpenNext with our improvements there, and with other test fixes.</p><p>But we're far from done. We still have work to do to close the gap between OpenNext and Next.js on Vercel – but given the other benchmark results, it's clear we can get there. We also have plans for further improvements to our scheduling algorithm, so that requests almost never block each other. We will continue to improve V8, and even Node.js – the Workers team employs multiple core contributors to each project. Our approach is simple: improve open source infrastructure so that everyone gets faster, then make sure our platform makes the most of those improvements.</p><p>And, obviously, we'll be writing more benchmarks, to make sure we're catching these kinds of issues ourselves in the future. If you have a benchmark that shows Workers being slower, send it to us with a repro. We will profile it, fix what we can upstream, and share back what we learn!</p>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">7GhQHTIyNTjaRyYup7T7qr</guid>
            <dc:creator>Kenton Varda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Code Mode: the better way to use MCP]]></title>
            <link>https://blog.cloudflare.com/code-mode/</link>
            <pubDate>Fri, 26 Sep 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ It turns out we've all been using MCP wrong. Most agents today use MCP by exposing the "tools" directly to the LLM. ]]></description>
            <content:encoded><![CDATA[ <p>It turns out we've all been using MCP wrong.</p><p>Most agents today use MCP by directly exposing the "tools" to the <a href="https://www.cloudflare.com/learning/ai/what-is-large-language-model/"><u>LLM</u></a>.</p><p>We tried something different: Convert the MCP tools into a TypeScript API, and then ask an LLM to write code that calls that API.</p><p>The results are striking:</p><ol><li><p>We found agents are able to handle many more tools, and more complex tools, when those tools are presented as a TypeScript API rather than directly. Perhaps this is because LLMs have an enormous amount of real-world TypeScript in their training set, but only a small set of contrived examples of tool calls.</p></li><li><p>The approach really shines when an agent needs to string together multiple calls. With the traditional approach, the output of each tool call must feed into the LLM's neural network, just to be copied over to the inputs of the next call, wasting time, energy, and tokens. When the LLM can write code, it can skip all that, and only read back the final results it needs.</p></li></ol><p>In short, LLMs are better at writing code to call MCP, than at calling MCP directly.</p>
    <div>
      <h2>What's MCP?</h2>
      <a href="#whats-mcp">
        
      </a>
    </div>
    <p>For those that aren't familiar: <a href="https://modelcontextprotocol.io/docs/getting-started/intro"><u>Model Context Protocol</u></a> is a standard protocol for giving AI agents access to external tools, so that they can directly perform work, rather than just chat with you.</p><p>Seen another way, MCP is a uniform way to:</p><ul><li><p>expose an API for doing something,</p></li><li><p>along with documentation needed for an LLM to understand it,</p></li><li><p>with authorization handled out-of-band.</p></li></ul><p>MCP has been making waves throughout 2025 as it has suddenly greatly expanded the capabilities of AI agents.</p><p>The "API" exposed by an MCP server is expressed as a set of "tools". Each tool is essentially a remote procedure call (RPC) function – it is called with some parameters and returns a response. Most modern LLMs have <a href="https://developers.cloudflare.com/workers-ai/features/function-calling/"><u>the capability to use "tools" (sometimes called "function calling")</u></a>, meaning they are trained to output text in a certain format when they want to invoke a tool. The program invoking the LLM sees this format and invokes the tool as specified, then feeds the results back into the LLM as input.</p>
    <div>
      <h3>Anatomy of a tool call</h3>
      <a href="#anatomy-of-a-tool-call">
        
      </a>
    </div>
    <p>Under the hood, an LLM generates a stream of "tokens" representing its output. A token might represent a word, a syllable, some sort of punctuation, or some other component of text.</p><p>A tool call, though, involves a token that does <i>not</i> have any textual equivalent. The LLM is trained (or, more often, fine-tuned) to understand a special token that it can output that means "the following should be interpreted as a tool call," and another special token that means "this is the end of the tool call." Between these two tokens, the LLM will typically write tokens corresponding to some sort of JSON message that describes the call.</p><p>For instance, imagine you have connected an agent to an MCP server that provides weather info, and you then ask the agent what the weather is like in Austin, TX. Under the hood, the LLM might generate output like the following. Note that here we've used words in <code>&lt;|</code> and <code>|&gt;</code> to represent our special tokens, but in fact, these tokens do not represent text at all; this is just for illustration.</p>
            <pre><code>I will use the Weather MCP server to find out the weather in Austin, TX.

&lt;|tool_call|&gt;
{
  "name": "get_current_weather",
  "arguments": {
    "location": "Austin, TX, USA"
  }
}
&lt;|end_tool_call|&gt;</code></pre>
            <p>Upon seeing these special tokens in the output, the LLM's harness will interpret the sequence as a tool call. After seeing the end token, the harness pauses execution of the LLM. It parses the JSON message and returns it as a separate component of the structured API result. The agent calling the LLM API sees the tool call, invokes the relevant MCP server, and then sends the results back to the LLM API. The LLM's harness will then use another set of special tokens to feed the result back into the LLM:</p>
            <pre><code>&lt;|tool_result|&gt;
{
  "location": "Austin, TX, USA",
  "temperature": 93,
  "unit": "fahrenheit",
  "conditions": "sunny"
}
&lt;|end_tool_result|&gt;</code></pre>
            <p>The LLM reads these tokens in exactly the same way it would read input from the user – except that the user cannot produce these special tokens, so the LLM knows it is the result of the tool call. The LLM then continues generating output like normal.</p><p>Different LLMs may use different formats for tool calling, but this is the basic idea.</p>
    <div>
      <h3>What's wrong with this?</h3>
      <a href="#whats-wrong-with-this">
        
      </a>
    </div>
    <p>The special tokens used in tool calls are things LLMs have never seen in the wild. They must be specially trained to use tools, based on synthetic training data. They aren't always that good at it. If you present an LLM with too many tools, or overly complex tools, it may struggle to choose the right one or to use it correctly. As a result, MCP server designers are encouraged to present greatly simplified APIs as compared to the more traditional API they might expose to developers.</p><p>Meanwhile, LLMs are getting really good at writing code. In fact, LLMs asked to write code against the full, complex APIs normally exposed to developers don't seem to have too much trouble with it. Why, then, do MCP interfaces have to "dumb it down"? Writing code and calling tools are almost the same thing, but it seems like LLMs can do one much better than the other?</p><p>The answer is simple: LLMs have seen a lot of code. They have not seen a lot of "tool calls". In fact, the tool calls they have seen are probably limited to a contrived training set constructed by the LLM's own developers, in order to try to train it. Whereas they have seen real-world code from millions of open source projects.</p><p><b><i>Making an LLM perform tasks with tool calling is like putting Shakespeare through a month-long class in Mandarin and then asking him to write a play in it. It's just not going to be his best work.</i></b></p>
    <div>
      <h3>But MCP is still useful, because it is uniform</h3>
      <a href="#but-mcp-is-still-useful-because-it-is-uniform">
        
      </a>
    </div>
    <p>MCP is designed for tool-calling, but it doesn't actually <i>have to</i> be used that way.</p><p>The "tools" that an MCP server exposes are really just an RPC interface with attached documentation. We don't really <i>have to</i> present them as tools. We can take the tools, and turn them into a programming language API instead.</p><p>But why would we do that, when the programming language APIs already exist independently? Almost every MCP server is just a wrapper around an existing traditional API – why not expose those APIs?</p><p>Well, it turns out MCP does something else that's really useful: <b>It provides a uniform way to connect to and learn about an API.</b></p><p>An AI agent can use an MCP server even if the agent's developers never heard of the particular MCP server, and the MCP server's developers never heard of the particular agent. This has rarely been true of traditional APIs in the past. Usually, the client developer always knows exactly what API they are coding for. As a result, every API is able to do things like basic connectivity, authorization, and documentation a little bit differently.</p><p>This uniformity is useful even when the AI agent is writing code. We'd like the AI agent to run in a sandbox such that it can only access the tools we give it. MCP makes it possible for the agentic framework to implement this, by handling connectivity and authorization in a standard way, independent of the AI code. We also don't want the AI to have to search the Internet for documentation; MCP provides it directly in the protocol.</p>
    <div>
      <h2>OK, how does it work?</h2>
      <a href="#ok-how-does-it-work">
        
      </a>
    </div>
    <p>We have already extended the <a href="https://developers.cloudflare.com/agents/"><u>Cloudflare Agents SDK</u></a> to support this new model!</p><p>For example, say you have an app built with ai-sdk that looks like this:</p>
            <pre><code>const stream = streamText({
  model: openai("gpt-5"),
  system: "You are a helpful assistant",
  messages: [
    { role: "user", content: "Write a function that adds two numbers" }
  ],
  tools: {
    // tool definitions 
  }
})</code></pre>
            <p>You can wrap the tools and prompt with the codemode helper, and use them in your app: </p>
            <pre><code>import { codemode } from "agents/codemode/ai";

const {system, tools} = codemode({
  system: "You are a helpful assistant",
  tools: {
    // tool definitions 
  },
  // ...config
})

const stream = streamText({
  model: openai("gpt-5"),
  system,
  tools,
  messages: [
    { role: "user", content: "Write a function that adds two numbers" }
  ]
})</code></pre>
            <p>With this change, your app will now start generating and running code that itself will make calls to the tools you defined, MCP servers included. We will introduce variants for other libraries in the very near future. <a href="https://github.com/cloudflare/agents/blob/main/docs/codemode.md"><u>Read the docs</u></a> for more details and examples. </p>
    <div>
      <h3>Converting MCP to TypeScript</h3>
      <a href="#converting-mcp-to-typescript">
        
      </a>
    </div>
    <p>When you connect to an MCP server in "code mode", the Agents SDK will fetch the MCP server's schema, and then convert it into a TypeScript API, complete with doc comments based on the schema.</p><p>For example, connecting to the MCP server at <a href="https://gitmcp.io/cloudflare/agents"><u>https://gitmcp.io/cloudflare/agents</u></a> will generate a TypeScript definition like this:</p>
            <pre><code>interface FetchAgentsDocumentationInput {
  [k: string]: unknown;
}
interface FetchAgentsDocumentationOutput {
  [key: string]: any;
}

interface SearchAgentsDocumentationInput {
  /**
   * The search query to find relevant documentation
   */
  query: string;
}
interface SearchAgentsDocumentationOutput {
  [key: string]: any;
}

interface SearchAgentsCodeInput {
  /**
   * The search query to find relevant code files
   */
  query: string;
  /**
   * Page number to retrieve (starting from 1). Each page contains 30
   * results.
   */
  page?: number;
}
interface SearchAgentsCodeOutput {
  [key: string]: any;
}

interface FetchGenericUrlContentInput {
  /**
   * The URL of the document or page to fetch
   */
  url: string;
}
interface FetchGenericUrlContentOutput {
  [key: string]: any;
}

declare const codemode: {
  /**
   * Fetch entire documentation file from GitHub repository:
   * cloudflare/agents. Useful for general questions. Always call
   * this tool first if asked about cloudflare/agents.
   */
  fetch_agents_documentation: (
    input: FetchAgentsDocumentationInput
  ) =&gt; Promise&lt;FetchAgentsDocumentationOutput&gt;;

  /**
   * Semantically search within the fetched documentation from
   * GitHub repository: cloudflare/agents. Useful for specific queries.
   */
  search_agents_documentation: (
    input: SearchAgentsDocumentationInput
  ) =&gt; Promise&lt;SearchAgentsDocumentationOutput&gt;;

  /**
   * Search for code within the GitHub repository: "cloudflare/agents"
   * using the GitHub Search API (exact match). Returns matching files
   * for you to query further if relevant.
   */
  search_agents_code: (
    input: SearchAgentsCodeInput
  ) =&gt; Promise&lt;SearchAgentsCodeOutput&gt;;

  /**
   * Generic tool to fetch content from any absolute URL, respecting
   * robots.txt rules. Use this to retrieve referenced urls (absolute
   * urls) that were mentioned in previously fetched documentation.
   */
  fetch_generic_url_content: (
    input: FetchGenericUrlContentInput
  ) =&gt; Promise&lt;FetchGenericUrlContentOutput&gt;;
};</code></pre>
            <p>This TypeScript is then loaded into the agent's context. Currently, the entire API is loaded, but future improvements could allow an agent to search and browse the API more dynamically – much like an agentic coding assistant would.</p>
    <div>
      <h3>Running code in a sandbox</h3>
      <a href="#running-code-in-a-sandbox">
        
      </a>
    </div>
    <p>Instead of being presented with all the tools of all the connected MCP servers, our agent is presented with just one tool, which simply executes some TypeScript code.</p><p>The code is then executed in a secure sandbox. The sandbox is totally isolated from the Internet. Its only access to the outside world is through the TypeScript APIs representing its connected MCP servers.</p><p>These APIs are backed by RPC invocations that call back to the agent loop. There, the Agents SDK dispatches the call to the appropriate MCP server.</p><p>The sandboxed code returns results to the agent in the obvious way: by invoking <code>console.log()</code>. When the script finishes, all the output logs are passed back to the agent.</p>
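<p>As a hypothetical example, here is the kind of script an agent might generate against the <code>codemode</code> API shown earlier: two chained calls whose intermediate result never passes through the LLM. The <code>codemode</code> object is stubbed here for illustration; in the sandbox it is provided as a binding backed by RPC to the real MCP servers:</p>

```javascript
// Hypothetical agent-generated Code Mode script. The `codemode` stub below
// stands in for the real binding, which exists only inside the sandbox.
const codemode = {
  async search_agents_code({ query }) {
    return { files: [`src/${query}.ts`] };     // stubbed result
  },
  async fetch_generic_url_content({ url }) {
    return { content: `fetched ${url}` };      // stubbed result
  },
};

async function run() {
  // The intermediate `files` list stays inside the sandbox...
  const { files } = await codemode.search_agents_code({ query: "scheduler" });
  const { content } = await codemode.fetch_generic_url_content({
    url: `https://example.com/${files[0]}`,
  });
  // ...and only the final answer is logged back to the agent.
  console.log(content);
  return content;
}

run();
```

<p>With traditional tool calling, the file list would have been fed through the model's context just to be copied into the second call; here it is an ordinary local variable.</p>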
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6DRERHP138FSj3GG0QYj3M/99e8c09b352560b7d4547ca299482c27/image2.png" />
          </figure>
    <div>
      <h2>Dynamic Worker loading: no containers here</h2>
      <a href="#dynamic-worker-loading-no-containers-here">
        
      </a>
    </div>
    <p>This new approach requires access to a secure sandbox where arbitrary code can run. So where do we find one? Do we have to run containers? Is that expensive?</p><p>No. There are no containers. We have something much better: isolates.</p><p>The Cloudflare Workers platform has always been based on V8 isolates, that is, isolated JavaScript runtimes powered by the <a href="https://v8.dev/"><u>V8 JavaScript engine</u></a>.</p><p><b>Isolates are far more lightweight than containers.</b> An isolate can start in a handful of milliseconds using only a few megabytes of memory.</p><p>Isolates are so fast that we can just create a new one for every piece of code the agent runs. There's no need to reuse them. There's no need to prewarm them. Just create it, on demand, run the code, and throw it away. It all happens so fast that the overhead is negligible; it's almost as if you were just eval()ing the code directly. But with security.</p>
    <div>
      <h3>The Worker Loader API</h3>
      <a href="#the-worker-loader-api">
        
      </a>
    </div>
    <p>Until now, though, there was no way for a Worker to directly load an isolate containing arbitrary code. All Worker code instead had to be uploaded via the Cloudflare API, which would then deploy it globally, so that it could run anywhere. That's not what we want for Agents! We want the code to just run right where the agent is.</p><p>To that end, we've added a new API to the Workers platform: the <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/worker-loader/"><u>Worker Loader API</u></a>. With it, you can load Worker code on-demand. Here's what it looks like:</p>
            <pre><code>// Gets the Worker with the given ID, creating it if no such Worker exists yet.
let worker = env.LOADER.get(id, async () =&gt; {
  // If the Worker does not already exist, this callback is invoked to fetch
  // its code.

  return {
    compatibilityDate: "2025-06-01",

    // Specify the worker's code (module files).
    mainModule: "foo.js",
    modules: {
      "foo.js":
        "export default {\n" +
        "  fetch(req, env, ctx) { return new Response('Hello'); }\n" +
        "}\n",
    },

    // Specify the dynamic Worker's environment (`env`).
    env: {
      // It can contain basic serializable data types...
      SOME_NUMBER: 123,

      // ... and bindings back to the parent worker's exported RPC
      // interfaces, using the new `ctx.exports` loopback bindings API.
      SOME_RPC_BINDING: ctx.exports.MyBindingImpl({props})
    },

    // Redirect the Worker's `fetch()` and `connect()` to proxy through
    // the parent worker, to monitor or filter all Internet access. You
    // can also block Internet access completely by passing `null`.
    globalOutbound: ctx.exports.OutboundProxy({props}),
  };
});

// Now you can get the Worker's entrypoint and send requests to it.
let defaultEntrypoint = worker.getEntrypoint();
await defaultEntrypoint.fetch("http://example.com");

// You can get non-default entrypoints as well, and specify the
// `ctx.props` value to be delivered to the entrypoint.
let someEntrypoint = worker.getEntrypoint("SomeEntrypointClass", {
  props: {someProp: 123}
});</code></pre>
            <p>You can start playing with this API right now when running <code>workerd</code> locally with Wrangler (<a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/worker-loader/"><u>check out the docs</u></a>), and you can <a href="https://forms.gle/MoeDxE9wNiqdf8ri9"><u>sign up for beta access</u></a> to use it in production.</p>
    <div>
      <h2>Workers are better sandboxes</h2>
      <a href="#workers-are-better-sandboxes">
        
      </a>
    </div>
    <p>The design of Workers makes it unusually good at sandboxing, especially for this use case, for a few reasons:</p>
    <div>
      <h3>Faster, cheaper, disposable sandboxes</h3>
      <a href="#faster-cheaper-disposable-sandboxes">
        
      </a>
    </div>
    <p><a href="https://developers.cloudflare.com/workers/reference/how-workers-works/"><u>The Workers platform uses isolates instead of containers.</u></a> Isolates are much lighter-weight and faster to start up. It takes mere milliseconds to start a fresh isolate, and it's so cheap we can just create a new one for every single code snippet the agent generates. There's no need to worry about pooling isolates for reuse, prewarming, etc.</p><p>We have not yet finalized pricing for the Worker Loader API, but because it is based on isolates, we will be able to offer it at a significantly lower cost than container-based solutions.</p>
    <div>
      <h3>Isolated by default, but connected with bindings</h3>
      <a href="#isolated-by-default-but-connected-with-bindings">
        
      </a>
    </div>
    <p>Workers are just better at handling isolation.</p><p>In Code Mode, we prohibit the sandboxed worker from talking to the Internet. The global <code>fetch()</code> and <code>connect()</code> functions throw errors.</p><p>But on most platforms, this would be a problem. On most platforms, the way you get access to private resources is, you <i>start</i> with general network access. Then, using that network access, you send requests to specific services, passing them some sort of API key to authorize private access.</p><p>But Workers has always had a better answer. In Workers, the "environment" (<code>env</code> object) doesn't just contain strings, <a href="https://blog.cloudflare.com/workers-environment-live-object-bindings/"><u>it contains live objects</u></a>, also known as "bindings". These objects can provide direct access to private resources without involving generic network requests.</p><p>In Code Mode, we give the sandbox access to bindings representing the MCP servers it is connected to. Thus, the agent can specifically access those MCP servers <i>without</i> having network access in general.</p><p>Limiting access via bindings is much cleaner than doing it via, say, network-level filtering or HTTP proxies. Filtering is hard on both the LLM and the supervisor, because the boundaries are often unclear: the supervisor may have a hard time identifying exactly what traffic is legitimately necessary to talk to an API. Meanwhile, the LLM may have difficulty guessing what kinds of requests will be blocked. With the bindings approach, it's well-defined: the binding provides a JavaScript interface, and that interface is allowed to be used. It's just better this way.</p>
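<p>A rough sketch of what this looks like from the sandboxed code's point of view (stubbed for illustration, reusing the weather tool from earlier): generic network access throws, while the binding handed in via <code>env</code> works normally:</p>

```javascript
// Illustrative stub of the Code Mode sandbox model: fetch() is disabled,
// and the only capability available is a binding object on `env`.
// `WEATHER` and its method are hypothetical names for this sketch.
const env = {
  WEATHER: {
    async get_current_weather({ location }) {
      return { location, temperature: 93, conditions: "sunny" }; // stubbed
    },
  },
};
globalThis.fetch = () => {
  throw new Error("Internet access is disabled in this sandbox");
};

async function sandboxed(env) {
  // Allowed: a call through the binding, which RPCs back to the supervisor.
  const weather = await env.WEATHER.get_current_weather({
    location: "Austin, TX, USA",
  });
  // Not allowed: generic network access simply throws.
  let blocked = false;
  try {
    await fetch("https://example.com");
  } catch {
    blocked = true;
  }
  return { weather, blocked };
}
```

<p>The boundary is a JavaScript interface rather than a network filter, so there is no ambiguity about which requests are permitted.</p>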
    <div>
      <h3>No API keys to leak</h3>
      <a href="#no-api-keys-to-leak">
        
      </a>
    </div>
    <p>An additional benefit of bindings is that they hide API keys. The binding itself provides an already-authorized client interface to the MCP server. All calls made on it go to the agent supervisor first, which holds the access tokens and adds them into requests sent on to MCP.</p><p>This means that the AI cannot possibly write code that leaks any keys, solving a common security problem seen in AI-authored code today.</p>
    <div>
      <h2>Try it now!</h2>
      <a href="#try-it-now">
        
      </a>
    </div>
    
    <div>
      <h3>Sign up for the production beta</h3>
      <a href="#sign-up-for-the-production-beta">
        
      </a>
    </div>
    <p>The Dynamic Worker Loader API is in closed beta. To use it in production, <a href="https://forms.gle/MoeDxE9wNiqdf8ri9"><u>sign up today</u></a>.</p>
    <div>
      <h3>Or try it locally</h3>
      <a href="#or-try-it-locally">
        
      </a>
    </div>
    <p>If you just want to play around, though, Dynamic Worker Loading is fully available today when developing locally with Wrangler and <code>workerd</code> – check out the docs for <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/worker-loader/"><u>Dynamic Worker Loading</u></a> and <a href="https://github.com/cloudflare/agents/blob/main/docs/codemode.md"><u>code mode in the Agents SDK</u></a> to get started.</p> ]]></content:encoded>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Agents]]></category>
            <category><![CDATA[MCP]]></category>
            <guid isPermaLink="false">61nEdL3TSdS4diA4x21O5e</guid>
            <dc:creator>Kenton Varda</dc:creator>
            <dc:creator>Sunil Pai</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cap'n Web: a new RPC system for browsers and web servers]]></title>
            <link>https://blog.cloudflare.com/capnweb-javascript-rpc-library/</link>
            <pubDate>Mon, 22 Sep 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ Cap'n Web is a new open source, JavaScript-native RPC protocol for use in browsers and web servers. It provides the expressive power of Cap'n Proto, but with no schemas and no boilerplate. ]]></description>
            <content:encoded><![CDATA[ <p>Allow us to introduce <a href="https://github.com/cloudflare/capnweb"><u>Cap'n Web</u></a>, an RPC protocol and implementation in pure TypeScript.</p><p>Cap'n Web is a spiritual sibling to <a href="https://capnproto.org/"><u>Cap'n Proto</u></a>, an RPC protocol I (Kenton) created a decade ago, but designed to play nice in the web stack. That means:</p><ul><li><p>Like Cap'n Proto, it is an object-capability protocol. ("Cap'n" is short for "capabilities and".) We'll get into this more below, but it's incredibly powerful.</p></li><li><p>Unlike Cap'n Proto, Cap'n Web has <i>no schemas</i>. In fact, it has almost no boilerplate whatsoever. This means it works more like the <a href="https://blog.cloudflare.com/javascript-native-rpc/"><u>JavaScript-native RPC system in Cloudflare Workers</u></a>.</p></li><li><p>That said, it integrates nicely with TypeScript.</p></li><li><p>Also unlike Cap'n Proto, Cap'n Web's underlying serialization is human-readable. In fact, it's just JSON, with a little pre-/post-processing.</p></li><li><p>It works over HTTP, WebSocket, and postMessage() out-of-the-box, with the ability to extend it to other transports easily.</p></li><li><p>It works in all major browsers, Cloudflare Workers, Node.js, and other modern JavaScript runtimes.</p></li><li><p>The whole thing compresses (minify+gzip) to under 10 kB with no dependencies.</p></li><li><p><a href="https://github.com/cloudflare/capnweb"><u>It's open source</u></a> under the MIT license.</p></li></ul><p>Cap'n Web is more expressive than almost every other RPC system, because it implements an <b>object-capability RPC model</b>. That means it:</p><ul><li><p>Supports bidirectional calling. The client can call the server, and the server can also call the client.</p></li><li><p>Supports passing functions by reference: If you pass a function over RPC, the recipient receives a "stub". 
When they call the stub, they actually make an RPC back to you, invoking the function where it was created. This is how bidirectional calling happens: the client passes a callback to the server, and then the server can call it later.</p></li><li><p>Similarly, supports passing objects by reference: If a class extends the special marker type <code>RpcTarget</code>, then instances of that class are passed by reference, with method calls calling back to the location where the object was created.</p></li><li><p>Supports promise pipelining. When you start an RPC, you get back a promise. Instead of awaiting it, you can immediately use the promise in dependent RPCs, thus performing a chain of calls in a single network round trip.</p></li><li><p>Supports capability-based security patterns.</p></li></ul><p>In short, Cap'n Web lets you design RPC interfaces the way you'd design regular JavaScript APIs – while still acknowledging and compensating for network latency.</p><p>The best part is, Cap'n Web is absolutely trivial to set up.</p><p>A client looks like this:</p>
            <pre><code>import { newWebSocketRpcSession } from "capnweb";

// One-line setup.
let api = newWebSocketRpcSession("wss://example.com/api");

// Call a method on the server!
let result = await api.hello("World");

console.log(result);
</code></pre>
            <p>And here's a complete Cloudflare Worker implementing an RPC server:</p>
            <pre><code>import { RpcTarget, newWorkersRpcResponse } from "capnweb";

// This is the server implementation.
class MyApiServer extends RpcTarget {
  hello(name) {
    return `Hello, ${name}!`
  }
}

// Standard Workers HTTP handler.
export default {
  fetch(request, env, ctx) {
    // Parse URL for routing.
    let url = new URL(request.url);

    // Serve API at `/api`.
    if (url.pathname === "/api") {
      return newWorkersRpcResponse(request, new MyApiServer());
    }

    // You could serve other endpoints here...
    return new Response("Not found", {status: 404});
  }
}
</code></pre>
            <p>That's it. That's the app.</p><ul><li><p>You can add more methods to <code>MyApiServer</code>, and call them from the client.</p></li><li><p>You can have the client pass a callback function to the server, and then the server can just call it.</p></li><li><p>You can define a TypeScript interface for your API, and easily apply it to the client and server.</p></li></ul><p>It just works.</p>
    <div>
      <h3>Why RPC? (And what is RPC anyway?)</h3>
      <a href="#why-rpc-and-what-is-rpc-anyway">
        
      </a>
    </div>
    <p>Remote Procedure Calls (RPC) are a way of expressing communications between two programs over a network. Without RPC, you might communicate using a protocol like HTTP. With HTTP, though, you must format and parse your communications as an HTTP request and response, perhaps designed in <a href="https://en.wikipedia.org/wiki/REST"><u>REST</u></a> style. RPC systems try to make communications look like a regular function call instead, as if you were calling a library rather than a remote service. The RPC system provides a "stub" object on the client side which stands in for the real server-side object. When a method is called on the stub, the RPC system figures out how to serialize and transmit the parameters to the server, invoke the method on the server, and then transmit the return value back.</p><p>The merits of RPC have been subject to a great deal of debate. RPC is often accused of committing many of the <a href="https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing"><u>fallacies of distributed computing</u></a>.</p><p>But this reputation is outdated. When RPC was first invented some 40 years ago, async programming barely existed. We did not have Promises, much less async and await. Early RPC was synchronous: calls would block the calling thread waiting for a reply. At best, latency made the program slow. At worst, network failures would hang or crash the program. No wonder it was deemed "broken".</p><p>Things are different today. We have Promise and async and await, and we can throw exceptions on network failures. We even understand how RPCs can be pipelined so that a chain of calls takes only one network round trip. Many large distributed systems you likely use every day are built on RPC. It works.</p><p>The fact is, RPC fits the programming model we're used to. Every programmer is trained to think in terms of APIs composed of function calls, not in terms of byte stream protocols nor even REST. 
Using RPC frees you from the need to constantly translate between mental models, allowing you to move faster.</p>
    <div>
      <h3>When should you use Cap'n Web?</h3>
      <a href="#when-should-you-use-capn-web">
        
      </a>
    </div>
    <p>Cap'n Web is useful anywhere where you have two JavaScript applications speaking to each other over a network, including client-to-server and microservice-to-microservice scenarios. However, it is particularly well-suited to interactive web applications with real-time collaborative features, as well as modeling interactions over complex security boundaries.</p><p>Cap'n Web is still new and experimental, so for now, a willingness to live on the cutting edge may also be required!</p>
    <div>
      <h2>Features, features, features…</h2>
      <a href="#features-features-features">
        
      </a>
    </div>
    <p>Here are some more things you can do with Cap'n Web.</p>
    <div>
      <h3>HTTP batch mode</h3>
      <a href="#http-batch-mode">
        
      </a>
    </div>
    <p>Sometimes a WebSocket connection is a bit too heavyweight. What if you just want to make a quick one-time batch of calls, but don't need an ongoing connection?</p><p>For that, Cap'n Web supports HTTP batch mode:</p>
            <pre><code>import { newHttpBatchRpcSession } from "capnweb";

let batch = newHttpBatchRpcSession("https://example.com/api");

let result = await batch.hello("World");

console.log(result);
</code></pre>
            <p><i>(The server is exactly the same as before.)</i></p><p>Note that once you've awaited an RPC in the batch, the batch is done, and all the remote references received through it become broken. To make more calls, you need to start over with a new batch. However, you can make multiple calls in a single batch:</p>
            <pre><code>let batch = newHttpBatchRpcSession("https://example.com/api");

// We can make multiple calls, as long as we await them all at once.
let promise1 = batch.hello("Alice");
let promise2 = batch.hello("Bob");

let [result1, result2] = await Promise.all([promise1, promise2]);

console.log(result1);
console.log(result2);
</code></pre>
            <p>And that brings us to another feature…</p>
    <div>
      <h3>Chained calls (Promise Pipelining)</h3>
      <a href="#chained-calls-promise-pipelining">
        
      </a>
    </div>
    <p>Here's where things get magical.</p><p>In both batch mode and WebSocket mode, you can make a call that depends on the result of another call, without waiting for the first call to finish. In batch mode, that means you can, in a single batch, call a method, then use its result in another call. The entire batch still requires only one network round trip.</p><p>For example, say your API is:</p>
            <pre><code>class MyApiServer extends RpcTarget {
  getMyName() {
    return "Alice";
  }

  hello(name) {
    return `Hello, ${name}!`
  }
}
</code></pre>
            <p>You can do:</p>
            <pre><code>let namePromise = batch.getMyName();
let result = await batch.hello(namePromise);

console.log(result);
</code></pre>
            <p>Notice the initial call to <code>getMyName()</code> returned a promise, but we used the promise itself as the input to <code>hello()</code>, without awaiting it first. With Cap'n Web, this just works: The client sends a message to the server saying: "Please insert the result of the first call into the parameters of the second."</p><p>Or perhaps the first call returns an object with methods. You can call the methods immediately, without awaiting the first promise, like:</p>
            <pre><code>let batch = newHttpBatchRpcSession("https://example.com/api");

// Authenticate the API key, returning a Session object.
let sessionPromise = batch.authenticate(apiKey);

// Get the user's name.
let name = await sessionPromise.whoami();

console.log(name);
</code></pre>
            <p>This works because the promise returned by a Cap'n Web call is not a regular promise. Instead, it's a JavaScript Proxy object. Any methods you call on it are interpreted as speculative method calls on the eventual result. These calls are sent to the server immediately, telling the server: "When you finish the call I sent earlier, call this method on what it returns."</p>
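<p>A stripped-down version of that trick — our own illustration of the concept, not Cap'n Web's actual implementation — can be built with a <code>Proxy</code> that turns property accesses into speculative calls on the eventual value:</p>

```javascript
// Minimal sketch of a "pipelined promise": method calls on the proxy are
// queued immediately and applied to the eventual result, instead of
// requiring an await between each step.
function pipelined(promise) {
  return new Proxy({}, {
    get(_target, prop) {
      if (prop === "then") {
        // Let the proxy itself be awaited like a promise.
        return (resolve, reject) => promise.then(resolve, reject);
      }
      // Any other property becomes a method call on the eventual result,
      // which itself returns another pipelined proxy.
      return (...args) =>
        pipelined(promise.then((value) => value[prop](...args)));
    },
  });
}

// Example: chain a call on a session object that hasn't resolved yet.
const sessionPromise = Promise.resolve({ whoami: () => "Alice" });

// No await between the two steps — both are queued up front.
const namePromise = pipelined(sessionPromise).whoami();
```

<p>In the real protocol, the queued calls are serialized and sent to the server rather than applied locally, which is what collapses the chain into one round trip.</p>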
    <div>
      <h3>Did you spot the security?</h3>
      <a href="#did-you-spot-the-security">
        
      </a>
    </div>
    <p>This last example shows an important security pattern enabled by Cap'n Web's object-capability model.</p><p>When we call the authenticate() method, after it has verified the provided API key, it returns an authenticated session object. The client can then make further RPCs on the session object to perform operations that require authorization as that user. The server code might look like this:</p>
            <pre><code>class MyApiServer extends RpcTarget {
  async authenticate(apiKey) {
    let username = await checkApiKey(apiKey);
    return new AuthenticatedSession(username);
  }
}

class AuthenticatedSession extends RpcTarget {
  constructor(username) {
    super();
    this.username = username;
  }

  whoami() {
    return this.username;
  }

  // ...other methods requiring auth...
}
</code></pre>
            <p>Here's what makes this work: <b>It is impossible for the client to "forge" a session object. The only way to get one is to call authenticate(), and have it return successfully.</b></p><p>In most RPC systems, it is not possible for one RPC to return a stub pointing at a new RPC object in this way. Instead, all functions are top-level, and can be called by anyone. In such a traditional RPC system, it would be necessary to pass the API key again to every function call, and check it again on the server each time. Or, you'd need to do authorization outside the RPC system entirely.</p><p>This is a common pain point for WebSockets in particular. Due to the design of the web APIs for WebSocket, you generally cannot use headers nor cookies to authorize them. Instead, authorization must happen in-band, by sending a message over the WebSocket itself. But this can be annoying for RPC protocols, as it means the authentication message is "special" and changes the state of the connection itself, affecting later calls. This breaks the abstraction.</p><p>The authenticate() pattern shown above neatly makes authentication fit naturally into the RPC abstraction. It's even type-safe: you can't possibly forget to authenticate before calling a method requiring auth, because you wouldn't have an object on which to make the call. Speaking of type-safety…</p>
    <div>
      <h3>TypeScript</h3>
      <a href="#typescript">
        
      </a>
    </div>
    <p>If you use TypeScript, Cap'n Web plays nicely with it. You can declare your RPC API once as a TypeScript interface, implement it on the server, and call it on the client:</p>
            <pre><code>// Shared interface declaration:
interface MyApi {
  hello(name: string): Promise&lt;string&gt;;
}

// On the client:
let api: RpcStub&lt;MyApi&gt; = newWebSocketRpcSession("wss://example.com/api");

// On the server:
class MyApiServer extends RpcTarget implements MyApi {
  hello(name) {
    return `Hello, ${name}!`
  }
}
</code></pre>
            <p>Now you get end-to-end type checking, auto-completed method names, and so on.</p><p>Note that, as always with TypeScript, no type checks occur at runtime. The RPC system itself does not prevent a malicious client from calling an RPC with parameters of the wrong type. This is, of course, not a problem unique to Cap'n Web – JSON-based APIs have always had this problem. You may wish to use a runtime type-checking system like Zod to solve this. (Meanwhile, we hope to add type checking based directly on TypeScript types in the future.)</p>
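<p>In the meantime, even a hand-rolled guard at the method boundary closes the gap. The sketch below is ours and deliberately library-free (a validator like Zod generalizes the same idea); <code>RpcTarget</code> is omitted so it runs standalone, where a real server would extend it as before:</p>

```javascript
// TypeScript types are erased at runtime, so the server must check
// parameters itself before trusting them.
class MyApiServer {
  hello(name) {
    if (typeof name !== "string") {
      throw new TypeError("hello(): 'name' must be a string");
    }
    return `Hello, ${name}!`;
  }
}

const server = new MyApiServer();
```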
    <div>
      <h2>An alternative to GraphQL?</h2>
      <a href="#an-alternative-to-graphql">
        
      </a>
    </div>
    <p>If you’ve used GraphQL before, you might notice some similarities. One benefit of GraphQL was to solve the “waterfall” problem of traditional REST APIs by allowing clients to ask for multiple pieces of data in one query. For example, instead of making three sequential HTTP calls:</p>
            <pre><code>GET /user
GET /user/friends
GET /user/friends/photos</code></pre>
            <p>…you can write one GraphQL query to fetch it all at once.</p><p>That’s a big improvement over REST, but GraphQL comes with its own tradeoffs:</p><ul><li><p><b>New language and tooling.</b> You have to adopt GraphQL’s schema language, servers, and client libraries. If your team is all-in on JavaScript, that’s a lot of extra machinery.</p></li><li><p><b>Limited composability.</b> GraphQL queries are declarative, which makes them great for fetching data, but awkward for chaining operations or mutations. For example, you can’t easily say: “create a user, then immediately use that new user object to make a friend request, all-in-one round trip.”</p></li><li><p><b>Different abstraction model.</b> GraphQL doesn’t look or feel like the JavaScript APIs you already know. You’re learning a new mental model rather than extending the one you use every day.</p></li></ul>
    <div>
      <h3>How Cap'n Web goes further</h3>
      <a href="#how-capn-web-goes-further">
        
      </a>
    </div>
    <p>Cap'n Web solves the waterfall problem <i>without</i> introducing a new language or ecosystem. It’s just JavaScript. Because Cap'n Web supports promise pipelining and object references, you can write code that looks like this:</p>
            <pre><code>let user = api.createUser({ name: "Alice" });
let friendRequest = await user.sendFriendRequest("Bob");</code></pre>
            <p>What happens under the hood? Both calls are pipelined into a single network round trip:</p><ol><li><p>Create the user.</p></li><li><p>Take the result of that call (a new User object).</p></li><li><p>Immediately invoke sendFriendRequest() on that object.</p></li></ol><p>All of this is expressed naturally in JavaScript, with no schemas, query languages, or special tooling required. You just call methods and pass objects around, like you would in any other JavaScript code.</p><p>In other words, GraphQL gave us a way to flatten REST’s waterfalls. Cap'n Web lets us go even further: it gives you the power to model complex interactions exactly the way you would in a normal program, with no impedance mismatch.</p>
    <div>
      <h3>But how do we solve arrays?</h3>
      <a href="#but-how-do-we-solve-arrays">
        
      </a>
    </div>
    <p>With everything we've presented so far, there's still one critical piece missing before Cap'n Web can be seriously considered an alternative to GraphQL: handling lists. Often, GraphQL is used to say: "Perform this query, and then, for every result, perform this other query." For example: "List the user's friends, and then for each one, fetch their profile photo."</p><p>In short, we need an <code>array.map()</code> operation that can be performed without adding a round trip.</p><p>Cap'n Proto, historically, has never supported such a thing.</p><p>But with Cap'n Web, we've solved it. You can do:</p>
            <pre><code>let user = api.authenticate(token);

// Get the user's list of friends (an array).
let friendsPromise = user.listFriends();

// Do a .map() to annotate each friend record with their photo.
// This operates on the *promise* for the friends list, so does not
// add a round trip.
// (wait WHAT!?!?)
let friendsWithPhotos = friendsPromise.map(friend =&gt; {
  return {friend, photo: api.getUserPhoto(friend.id)};
});

// Await the friends list with attached photos -- one round trip!
let results = await friendsWithPhotos;
</code></pre>
            
    <div>
      <h3>Wait… How!?</h3>
      <a href="#wait-how">
        
      </a>
    </div>
    <p><code>.map()</code> takes a callback function, which needs to be applied to each element in the array. As we described earlier, <i>normally</i> when you pass a function to an RPC, the function is passed "by reference", meaning that the remote side receives a stub, where calling that stub makes an RPC back to the client where the function was created.</p><p>But that is NOT what is happening here. That would defeat the purpose: we don't want the server to have to round-trip to the client to process every member of the array. We want the server to just apply the transformation server-side.</p><p>To that end, <code>.map() </code>is special. It does not send JavaScript code to the server, but it does send something like "code", restricted to a domain-specific, non-Turing-complete language. The "code" is a list of instructions that the server should carry out for each member of the array. In this case, the instructions are:</p><ol><li><p>Invoke <code>api.getUserPhoto(friend.id)</code>.</p></li><li><p>Return an object <code>{friend, photo}</code>, where friend is the original array element and photo is the result of step 1.</p></li></ol><p>But the application code just specified a JavaScript method. How on Earth could we convert this into the narrow DSL?</p><p>The answer is record-replay: On the client side, we execute the callback once, passing in a special placeholder value. The parameter behaves like an RPC promise. However, the callback is required to be synchronous, so it cannot actually await this promise. The only thing it can do is use promise pipelining to make pipelined calls. 
These calls are intercepted by the implementation and recorded as instructions, which can then be sent to the server, where they can be replayed as needed.</p><p>And because the recording is based on promise pipelining, which is what the RPC protocol itself is designed to represent, it turns out that the "DSL" used to represent "instructions" for the map function is <i>just the RPC protocol itself</i>. 🤯</p>
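<p>The record-replay idea can be shown in miniature. This sketch is our own simplification — it records only a single chain of method calls, nothing like the full protocol — but the two phases are the same: run the callback once against a recording placeholder, then replay the captured steps on every real element.</p>

```javascript
// Record: every property access and call on the placeholder is captured
// as an instruction. Replay: the instructions run against each element.
function recordingProxy(steps) {
  return new Proxy(function () {}, {
    get(_target, prop) {
      steps.push({ op: "get", prop });
      return recordingProxy(steps);
    },
    apply(_target, _self, args) {
      steps.push({ op: "call", args });
      return recordingProxy(steps);
    },
  });
}

function mapRecorded(array, callback) {
  // 1. Record: the callback runs exactly once, synchronously.
  //    (The sketch assumes the callback returns the end of its call chain.)
  const steps = [];
  callback(recordingProxy(steps));

  // 2. Replay the recorded steps for each element.
  return array.map((element) => {
    let value = element;
    let self;
    for (const step of steps) {
      if (step.op === "get") {
        self = value;
        value = value[step.prop];
      } else {
        value = value.call(self, ...step.args);
      }
    }
    return value;
  });
}

// Example: transform each element without running the callback per element.
let callbackRuns = 0;
const shouted = mapRecorded(["alice", "bob"], (x) => {
  callbackRuns++; // proves the callback executes only once
  return x.toUpperCase();
});
```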
    <div>
      <h2>Implementation details</h2>
      <a href="#implementation-details">
        
      </a>
    </div>
    
    <div>
      <h3>JSON-based serialization</h3>
      <a href="#json-based-serialization">
        
      </a>
    </div>
    <p>Cap'n Web's underlying protocol is based on JSON – but with a preprocessing step to handle special types. Arrays are treated as "escape sequences" that let us encode other values. For example, JSON does not have an encoding for <code>Date</code> objects, but Cap'n Web does. You might see a message that looks like this:</p>
            <pre><code>{
  event: "Birthday Week",
  timestamp: ["date", 1758499200000]
}
</code></pre>
            <p>To encode a literal array, we simply double-wrap it in <code>[]</code>:</p>
            <pre><code>{
  names: [["Alice", "Bob", "Carol"]]
}
</code></pre>
            <p>In other words, an array with just one element which is itself an array, evaluates to the inner array literally. An array whose first element is a type name, evaluates to an instance of that type, where the remaining elements are parameters to the type.</p><p>Note that only a fixed set of types are supported: essentially, <a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm"><u>"structured clonable" types</u></a>, and RPC stub types.</p><p>On top of this basic encoding, we define an RPC protocol inspired by Cap'n Proto – but greatly simplified.</p>
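<p>The post-processing step described above can be sketched as a small recursive decoder. This is our simplified illustration handling only dates and literal arrays, not the full wire format:</p>

```javascript
// Decode one Cap'n-Web-style value, treating arrays as escape sequences:
//   ["date", ms]   -> a Date object
//   [[a, b, c]]    -> the literal array [a, b, c]
//   anything else  -> decoded structurally
function decodeValue(value) {
  if (Array.isArray(value)) {
    if (value.length === 1 && Array.isArray(value[0])) {
      // Double-wrapped: a literal array.
      return value[0].map(decodeValue);
    }
    if (value[0] === "date") {
      return new Date(value[1]);
    }
    throw new Error(`unknown type tag: ${value[0]}`);
  }
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, decodeValue(v)])
    );
  }
  return value; // primitives pass through unchanged
}

const msg = decodeValue({
  event: "Birthday Week",
  timestamp: ["date", 1758499200000],
  names: [["Alice", "Bob", "Carol"]],
});
```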
    <div>
      <h3>RPC protocol</h3>
      <a href="#rpc-protocol">
        
      </a>
    </div>
    <p>Since Cap'n Web is a symmetric protocol, there is no well-defined "client" or "server" at the protocol level. There are just two parties exchanging messages across a connection. Every kind of interaction can happen in either direction.</p><p>In order to make it easier to describe these interactions, I will refer to the two parties as "Alice" and "Bob".</p><p>Alice and Bob start the connection by establishing some sort of bidirectional message stream. This may be a WebSocket, but Cap'n Web also allows applications to define their own transports. Each message in the stream is JSON-encoded, as described earlier.</p><p>Alice and Bob each maintain some state about the connection. In particular, each maintains an "export table", describing all the pass-by-reference objects they have exposed to the other side, and an "import table", describing the references they have received. Alice's exports correspond to Bob's imports, and vice versa. Each entry in the export table has a signed integer ID, which is used to reference it. You can think of these IDs like file descriptors in a POSIX system. Unlike file descriptors, though, IDs can be negative, and an ID is never reused over the lifetime of a connection.</p><p>At the start of the connection, Alice and Bob each populate their export tables with a single entry, numbered zero, representing their "main" interfaces. Typically, when one side is acting as the "server", they will export their main public RPC interface as ID zero, whereas the "client" will export an empty interface. However, this is up to the application: either side can export whatever they want.</p><p>From there, new exports are added in two ways:</p><ul><li><p>When Alice sends a message to Bob that contains within it an object or function reference, Alice adds the target object to her export table. 
IDs assigned in this case are always negative, starting from -1 and counting downwards.</p></li><li><p>Alice can send a "push" message to Bob to request that Bob add a value to his export table. The "push" message contains an expression which Bob evaluates, exporting the result. Usually, the expression describes a method call on one of Bob's existing exports – this is how an RPC is made. Each "push" is assigned a positive ID on the export table, starting from 1 and counting upwards. Since positive IDs are only assigned as a result of pushes, Alice can predict the ID of each push she makes, and can immediately use that ID in subsequent messages. This is how promise pipelining is achieved.</p></li></ul><p>After sending a push message, Alice can subsequently send a "pull" message, which tells Bob that once he is done evaluating the "push", he should proactively serialize the result and send it back to Alice, as a "resolve" (or "reject") message. However, this is optional: Alice may not actually care to receive the return value of an RPC, if Alice only wants to use it in promise pipelining. In fact, the Cap'n Web implementation will only send a "pull" message if the application has actually awaited the returned promise.</p><p>Putting it together, a code sequence like this:</p>
            <pre><code>let namePromise = api.getMyName();
let result = await api.hello(namePromise);

console.log(result);</code></pre>
            <p>Might produce a message exchange like this:</p>
<pre><code>// Call api.getMyName(). `api` is the server's main export, so has export ID 0.
-&gt; ["push", ["pipeline", 0, "getMyName", []]]
// Call api.hello(namePromise). `namePromise` refers to the result of the first push,
// so has ID 1.
-&gt; ["push", ["pipeline", 0, "hello", [["pipeline", 1]]]]
// Ask that the result of the second push be proactively serialized and returned.
-&gt; ["pull", 2]
// Server responds.
&lt;- ["resolve", 2, "Hello, Alice!"]</code></pre>
            <p>For more details about the protocol, <a href="https://github.com/cloudflare/capnweb/blob/main/protocol.md"><u>check out the docs</u></a>.</p>
    <div>
      <h2>Try it out!</h2>
      <a href="#try-it-out">
        
      </a>
    </div>
    <p>Cap'n Web is new and still highly experimental. There may be bugs to shake out. But, we're already using it today. Cap'n Web is the basis of <a href="https://developers.cloudflare.com/changelog/2025-09-16-remote-bindings-ga/"><u>the recently-launched "remote bindings" feature in Wrangler</u></a>, allowing a local test instance of workerd to speak RPC to services in production. We've also begun to experiment with it in various frontend applications – expect more blog posts on this in the future.</p><p>In any case, Cap'n Web is open source, and you can start using it in your own projects now.</p><p><a href="https://github.com/cloudflare/capnweb"><u>Check it out on GitHub.</u></a></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/53YF87AtEsYhHMN3PV23UV/8e9a938099c71e6f274e95292b16b382/BLOG-2954_2.png" />
          </figure> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Open Source]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">4Du5F6RJFvwqqEbMMuuxTi</guid>
            <dc:creator>Kenton Varda</dc:creator>
            <dc:creator>Steve Faulkner</dc:creator>
        </item>
        <item>
            <title><![CDATA[Zero-latency SQLite storage in every Durable Object]]></title>
            <link>https://blog.cloudflare.com/sqlite-in-durable-objects/</link>
            <pubDate>Thu, 26 Sep 2024 13:00:00 GMT</pubDate>
            <description><![CDATA[ Traditional cloud storage is inherently slow because it is accessed over a network and must synchronize many clients. But what if we could instead put your application code deep into the storage layer ]]></description>
            <content:encoded><![CDATA[ <p>Traditional cloud storage is inherently slow, because it is normally accessed over a network and must carefully synchronize across many clients that could be accessing the same data. But what if we could instead put your application code deep into the storage layer, such that your code runs directly on the machine where the data is stored, and the database itself executes as a local library embedded inside your application?</p><p><a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects (DO)</u></a> are a novel approach to cloud computing which accomplishes just that: Your application code runs exactly where the data is stored. Not just on the same machine: your storage lives in the same thread as the application, requiring not even a context switch to access. With proper use of caching, storage latency is essentially zero, while nevertheless being durable and consistent.</p><p>Until today, DOs only offered key/value oriented storage. But now, they support a full SQL query interface with tables and indexes, through the power of SQLite.</p><p><a href="https://www.sqlite.org/"><u>SQLite</u></a> is the most-used SQL database implementation in the world, with billions of installations. It’s on practically every phone and desktop computer, and many embedded devices use it as well. It's known to be blazingly fast and rock solid. But it's been less common on the server. This is because traditional cloud architecture favors large distributed databases that live separately from application servers, while SQLite is designed to run as an embedded library. In this post, we'll show you how Durable Objects turn this architecture on its head and unlock the full power of SQLite in the cloud.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3epYvfKAWLT38mjUPsnMCj/d6566573e999ea874fc713cf31c65d62/BLOG-2536_2.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7c3of8FUDi3R4HC18qo2iP/9a372797b67d1e33075b9c1ef0780762/BLOG-2536_3.png" />
          </figure>
    <div>
      <h2>Refresher: what are Durable Objects?</h2>
      <a href="#refresher-what-are-durable-objects">
        
      </a>
    </div>
    <p><a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a> (DOs) are a part of the Cloudflare <a href="https://developers.cloudflare.com/workers/"><u>Workers</u></a> serverless platform. A DO is essentially a small server that can be addressed by a unique name and can keep state both in-memory and on-disk. Workers running anywhere on Cloudflare's network can send messages to a DO by its name, and all messages addressed to the same name — from anywhere in the world — will find their way to the same DO instance.</p><p>DOs are intended to be small and numerous. A single application can create billions of DOs distributed across our global network. Cloudflare automatically decides where a DO should live based on where it is accessed, automatically starts it up as needed when requests arrive, and shuts it down when idle. A DO has in-memory state while running and can also optionally store long-lived durable state. Since there is exactly one DO for each name, a DO can be used to coordinate between operations on the same logical object.</p><p>For example, imagine a real-time collaborative document editor application. Many users may be editing the same document at the same time. Each user's changes must be broadcast to other users in real time, and conflicts must be resolved. An application built on DOs would typically create one DO for each document. The DO would receive edits from users, resolve conflicts, broadcast the changes back out to other users, and keep the document content updated in its local storage.</p><p>DOs are especially good at real-time collaboration, but are by no means limited to this use case. They are general-purpose servers that can implement any logic you desire to serve requests. Even more generally, <b>DOs are a basic building block for distributed systems</b>.</p><p>When using Durable Objects, it's important to remember that they are intended to scale <i>out</i>, not <i>up</i>. 
A single object is inherently limited in throughput since it runs on a single thread of a single machine. To handle more traffic, you create more objects. This is easiest when different objects can handle different logical units of state (like different documents, different users, or different "shards" of a database), where each unit of state has low enough traffic to be handled by a single object. But sometimes, a lot of traffic needs to modify the same state: consider a vote counter with a million users all trying to cast votes at once. To handle such cases with Durable Objects, you would need to create a set of objects that each handle a subset of traffic and then replicate state to each other. Perhaps they use <a href="https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type"><u>CRDTs</u></a> in a <a href="https://en.wikipedia.org/wiki/Gossip_protocol"><u>gossip network</u></a>, or perhaps they implement a fan-in/fan-out approach to a single primary object. Whatever approach you take, Durable Objects make it fast and easy to create more stateful nodes as needed.</p>
    <div>
      <h2>Why is SQLite-in-DO so fast?</h2>
      <a href="#why-is-sqlite-in-do-so-fast">
        
      </a>
    </div>
    <p>In traditional cloud architecture, stateless application servers run business logic and communicate over the network to a database. Even if the network is local, database requests still incur latency, typically measured in milliseconds.</p><p>When a Durable Object uses SQLite, SQLite is invoked as a library. This means the database code runs not just on the same machine as the DO, not just in the same process, but in the very same thread. Latency is effectively zero, because there is no communication barrier between the application and SQLite. A query can complete in microseconds.</p>
    <div>
      <h4>Reads and writes are synchronous</h4>
      <a href="#reads-and-writes-are-synchronous">
        
      </a>
    </div>
    <p>The SQL query API in DOs does not require you to await results — they are returned synchronously:</p>
            <pre><code>// No awaits!
let cursor = sql.exec("SELECT name, email FROM users");
for (let user of cursor) {
  console.log(user.name, user.email);
}
</code></pre>
            <p>This may come as a surprise to some. Querying a database is I/O, right? I/O should always be asynchronous, right? Isn't this a violation of the natural order of JavaScript?</p><p>It's OK! The database content is probably cached in memory already, and SQLite is being called as a library in the same thread as the application, so the query often actually won't spend any time at all waiting for I/O. Even if it does have to go to disk, it's a local SSD. You might as well consider the local disk as just another layer in the memory cache hierarchy: L5 cache, if you will. In any case, it will respond quickly.</p><p>Meanwhile, synchronous queries provide some big benefits. First, the logistics of asynchronous event loops have a cost, so in the common case where the data is already in memory, a synchronous query will actually complete faster than an async one.</p><p>More importantly, though, synchronous queries help you avoid subtle bugs. Any time your application awaits a promise, it's possible that some other code executes while you wait. The state of the world may have changed by the time your await completes. Maybe even other SQL queries were executed. This can lead to subtle bugs that are hard to reproduce because they require events to happen at just the wrong time. With a synchronous API, though, none of that can happen. Your code always executes in the order you wrote it, uninterrupted.</p>
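<p>To make that hazard concrete, here is a contrived sketch (our own illustration, using a plain in-memory Map rather than any Workers API) of the check-then-write bug that an <code>await</code> can introduce:</p>

```javascript
// Illustrative only: a plain in-memory Map stands in for a SQL table, and
// none of these names come from the Workers API.
let seats = new Map([["3B", null]]);

// Async flavor: an `await` sits between the check and the write, so other
// code can run in that window, and two callers can both pass the check.
async function assignSeatAsync(seatId, occupant) {
  if (seats.get(seatId) !== null) {            // check
    throw new Error("Seat is occupied: " + seatId);
  }
  await new Promise((r) => setTimeout(r, 10)); // e.g. an awaited DB query
  seats.set(seatId, occupant);                 // act, but the check is now stale
}

// Sync flavor, as with SQLite-in-DO: nothing else can run between the check
// and the write, so the race is impossible by construction.
function assignSeatSync(seatId, occupant) {
  if (seats.get(seatId) !== null) {
    throw new Error("Seat is occupied: " + seatId);
  }
  seats.set(seatId, occupant);
}

// With the async flavor, this double-books seat 3B, because both calls pass
// the check before either one writes:
//   await Promise.allSettled([
//     assignSeatAsync("3B", "alice"),
//     assignSeatAsync("3B", "bob"),
//   ]);
// The sync flavor would throw for the second caller instead.
```

<p>With the synchronous API, the check and the write cannot be separated by other work, so the second caller reliably sees the seat as taken.</p>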
    <div>
      <h4>Fast writes with Output Gates</h4>
      <a href="#fast-writes-with-output-gates">
        
      </a>
    </div>
    <p>Database experts might have a deeper objection to synchronous queries: Yes, caching may mean we can perform reads and writes very fast. However, in the case of a write, just writing to cache isn't good enough. Before we return success to our client, we must <i>confirm</i> that the write is actually <i>durable</i>, that is, it has actually made it onto disk or network storage such that it cannot be lost if the power suddenly goes out.</p><p>Normally, a database would confirm all writes before returning to the application. So if the query is successful, it is confirmed. But confirming writes can be slow, because it requires waiting for the underlying storage medium to respond. Normally, this is OK because the write is performed asynchronously, so the program can go on and work on other things while it waits for the write to finish. It looks kind of like this:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1vuBOUXdFxUKM5gTyhKvx5/4e55fa2d8fb5a68af8a84453cf2b5d14/BLOG-2536_4.png" />
          </figure><p>But I just told you that in Durable Objects, writes are synchronous. While a synchronous call is running, no other code in the program can run (because JavaScript does not have threads). This is convenient, as mentioned above, because it means you don't need to worry that the state of the world may have changed while you were waiting. However, if write queries have to wait a while, and the whole program must pause and wait for them, then throughput will suffer.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/35QfJmRKIo763Sv3EU0tAv/96e6819df3504d9d893557f43cc30fab/BLOG-2536_5.png" />
          </figure><p>Luckily, in Durable Objects, writes do not have to wait, due to a little trick we call "Output Gates".</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2UhnMY6J6QrjWAAYH1rgTq/39532506841293c1a5c41d9843df1037/BLOG-2536_6.png" />
          </figure><p>In DOs, when the application issues a write, it continues executing without waiting for confirmation. However, when the DO then responds to the client, the response is blocked by the "Output Gate". This system holds the response until all storage writes relevant to the response have been confirmed, then sends the response on its way. In the rare case that the write fails, the response will be replaced with an error and the Durable Object itself will restart. So, even though the application constructed a "success" response, nobody can ever see that this happened, and thus nobody can be misled into believing that the data was stored.</p><p>Let's see what this looks like with multiple requests:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5nwypzObwvZ73JJ4Ho2VkS/5bd72d9c80944eb47c43b0b202cb3f86/BLOG-2536_7.png" />
          </figure><p>If you compare this against the first diagram above, you should notice a few things:</p><ul><li><p>The timing of requests and confirmations is the same.</p></li><li><p>But, all responses were sent to the client <i>sooner</i> than in the first diagram. Latency was reduced! This is because the application is able to work on constructing the response in parallel with the storage layer confirming the write.</p></li><li><p>Request handling is no longer interleaved between the three requests. Instead, each request runs to completion before the next begins. The application does not need to worry, during the handling of one request, that its state might change unexpectedly due to a concurrent request.</p></li></ul><p>With Output Gates, we get the ease-of-use of synchronous writes, while also getting lower latency and no loss of throughput.</p>
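<p>For intuition, the gate can be modeled in a few lines. This is our toy sketch, not Cloudflare's implementation: the storage layer registers each unconfirmed write with the gate, the application builds its response immediately, and delivery waits until every registered write has confirmed.</p>

```javascript
// Toy model of the Output Gate idea (our sketch, not the real runtime code).
function makeOutputGate() {
  const pendingWrites = [];
  return {
    // The storage layer registers a promise that resolves on confirmation.
    waitOn(confirmation) {
      pendingWrites.push(confirmation);
    },
    // Sending a response first waits for all registered writes. If any write
    // fails, this rejects, and the caller delivers an error instead.
    async send(response) {
      await Promise.all(pendingWrites);
      return response;
    },
  };
}
```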
    <div>
      <h4>N+1 selects? No problem.</h4>
      <a href="#n-1-selects-no-problem">
        
      </a>
    </div>
    <p>Zero-latency queries aren't just faster, they allow you to structure your code differently, often making it simpler. A classic example is the "N+1 selects" or "N+1 queries" problem. Let's illustrate this problem with an example:</p>
            <pre><code>// N+1 SELECTs example

// Get the 100 most-recently-modified docs.
let docs = sql.exec(`
  SELECT title, authorId FROM documents
  ORDER BY lastModified DESC
  LIMIT 100
`).toArray();

// For each returned document, get the author name from the users table.
for (let doc of docs) {
  doc.authorName = sql.exec(
      "SELECT name FROM users WHERE id = ?", doc.authorId).one().name;
}
</code></pre>
            <p>If you are an experienced SQL user, you are probably cringing at this code, and for good reason: this code does 101 queries! If the application is talking to the database across a network with 5ms latency, this will take 505ms to run, which is slow enough for humans to notice.</p>
            <pre><code>// Do it all in one query with a join?
let docs = sql.exec(`
  SELECT documents.title, users.name
  FROM documents JOIN users ON documents.authorId = users.id
  ORDER BY documents.lastModified DESC
  LIMIT 100
`).toArray();
</code></pre>
            <p>Here we've used SQL features to turn our 101 queries into one query. Great! Except, what does it mean? We used an inner join, which is not to be confused with a left, right, or cross join. What's the difference? Honestly, I have no idea! I had to look up joins just to write this example and I'm already confused.</p><p>Well, good news: You don't need to figure it out. Because <b>when using SQLite as a library, the first example above </b><b><i>works just fine</i></b><b>.</b> It'll perform about the same as the second fancy version.</p><p>More generally, when using SQLite as a library, you don't have to learn how to do fancy things in SQL syntax. Your logic can be in regular old application code in your programming language of choice, orchestrating the most basic SQL queries that are easy to learn. It's fine. <a href="https://www.sqlite.org/np1queryprob.html"><u>The creators of SQLite have made this point themselves.</u></a></p>
    <div>
      <h4>Point-in-Time Recovery</h4>
      <a href="#point-in-time-recovery">
        
      </a>
    </div>
    <p>While not necessarily related to speed, SQLite-backed Durable Objects offer another feature: any object can be reverted to the state it had at any point in time in the last 30 days. So if you accidentally execute a buggy query that corrupts all your data, don't worry: you can recover. There's no need to opt into this feature in advance; it's on by default for all SQLite-backed DOs. See the <a href="https://developers.cloudflare.com/durable-objects/api/storage-api/#point-in-time-recovery"><u>docs</u></a> for details.</p>
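<p>From inside a Durable Object, a recovery looks roughly like the sketch below. The helper name <code>restoreTo</code> is ours; the three calls it makes are the bookmark API described in the linked docs, so check there for exact semantics before relying on this.</p>

```javascript
// Sketch: revert this Durable Object's storage to its state at `when`.
// `restoreTo` is a hypothetical helper; the storage/ctx calls are from the
// documented point-in-time recovery ("bookmark") API.
async function restoreTo(ctx, when) {
  // Get a bookmark for the database state at `when` (a Date or ms timestamp).
  const bookmark = await ctx.storage.getBookmarkForTime(when);

  // Ask the runtime to restore to that bookmark when the object next starts,
  // then restart the object so the restore takes effect.
  await ctx.storage.onNextSessionRestoreBookmark(bookmark);
  ctx.abort();
}
```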
    <div>
      <h2>How do I use it?</h2>
      <a href="#how-do-i-use-it">
        
      </a>
    </div>
    <p>Let's say we're an airline, and we are implementing a way for users to choose their seats on a flight. We will create a new Durable Object for each flight. Within that DO, we will use a SQL table to track the assignments of seats to passengers. The code might look something like this:</p>
            <pre><code>import {DurableObject} from "cloudflare:workers";

// Manages seat assignment for a flight.
//
// This is an RPC interface. The methods can be called remotely by other Workers
// running anywhere in the world. All Workers that specify the same object ID
// (probably based on the flight number and date) will reach the same instance of
// FlightSeating.
export class FlightSeating extends DurableObject {
  sql = this.ctx.storage.sql;

  // Application calls this when the flight is first created to set up the seat map.
  initializeFlight(seatList) {
    this.sql.exec(`
      CREATE TABLE seats (
        seatId TEXT PRIMARY KEY,  -- e.g. "3B"
        occupant TEXT             -- null if available
      )
    `);

    for (let seat of seatList) {
      this.sql.exec(`INSERT INTO seats VALUES (?, null)`, seat);
    }
  }

  // Get a list of available seats.
  getAvailable() {
    let results = [];

    // Query returns a cursor.
    let cursor = this.sql.exec(`SELECT seatId FROM seats WHERE occupant IS NULL`);

    // Cursors are iterable.
    for (let row of cursor) {
      // Each row is an object with a property for each column.
      results.push(row.seatId);
    }

    return results;
  }

  // Assign passenger to a seat.
  assignSeat(seatId, occupant) {
    // Check that seat isn't occupied.
    let cursor = this.sql.exec(`SELECT occupant FROM seats WHERE seatId = ?`, seatId);
    let result = [...cursor][0];  // Get the first result from the cursor.
    if (!result) {
      throw new Error("No such seat: " + seatId);
    }
    if (result.occupant !== null) {
      throw new Error("Seat is occupied: " + seatId);
    }

    // If the occupant is already in a different seat, remove them.
    this.sql.exec(`UPDATE seats SET occupant = null WHERE occupant = ?`, occupant);

    // Assign the seat. Note: We don't have to worry that a concurrent request may
    // have grabbed the seat between the two queries, because the code is synchronous
    // (no `await`s) and the database is private to this Durable Object. Nothing else
    // could have changed since we checked that the seat was available earlier!
    this.sql.exec(`UPDATE seats SET occupant = ? WHERE seatId = ?`, occupant, seatId);
  }
}
</code></pre>
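<p>To reach this class from the Internet, you pair it with a front-end Worker. The following is our sketch, not code from the example above: the binding name <code>FLIGHT_SEATING</code> and the URL scheme are assumptions, while the <code>idFromName()</code>/<code>get()</code> pattern is the standard Durable Objects API.</p>

```javascript
// Minimal front-end Worker routing each flight's requests to its own
// FlightSeating object. FLIGHT_SEATING is a hypothetical binding name that
// would be configured in wrangler.toml (config not shown).
const worker = {
  async fetch(request, env) {
    // e.g. GET /flights/UA100-2024-09-26/seats
    const [, , flight] = new URL(request.url).pathname.split("/");

    // The same name always routes to the same Durable Object instance,
    // no matter where in the world this Worker runs.
    const id = env.FLIGHT_SEATING.idFromName(flight);
    const stub = env.FLIGHT_SEATING.get(id);

    // Call the object's RPC methods directly.
    return Response.json(await stub.getAvailable());
  },
};

export default worker;
```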
            <p>(With just a little more code, we could extend this example to allow clients to subscribe to seat changes with <a href="https://developers.cloudflare.com/durable-objects/reference/websockets/#_top"><u>WebSockets</u></a>, so that if multiple people are choosing their seats at the same time, they can see in real time as seats become unavailable. But, that's outside the scope of this blog post, which is just about SQL storage.)</p><p>Then in wrangler.toml, <a href="https://developers.cloudflare.com/durable-objects/reference/durable-objects-migrations/"><u>define a migration</u></a> setting up your DO class like usual, but instead of using new_classes, use new_sqlite_classes:</p>
            <pre><code>[[migrations]]
tag = "v1"
new_sqlite_classes = ["FlightSeating"]
</code></pre>
            <p>SQLite-backed objects also support the existing <a href="https://developers.cloudflare.com/durable-objects/api/transactional-storage-api/"><u>key/value-based storage API</u></a>: KV data is stored into a hidden table in the SQLite database. So, existing applications built on DOs will work when deployed using SQLite-backed objects.</p><p>However, because SQLite-backed objects are based on an all-new storage backend, it is currently not possible to switch an existing deployed DO class to use SQLite. You must ask for SQLite when initially deploying the new DO class; you cannot change it later. We plan to begin migrating existing DOs to the new storage backend in 2025.</p>
    <div>
      <h4>Pricing</h4>
      <a href="#pricing">
        
      </a>
    </div>
    <p>We’ve kept <a href="https://developers.cloudflare.com/durable-objects/platform/pricing/#sql-storage-billing"><u>pricing</u></a> for SQLite-in-DO similar to D1, Cloudflare’s <a href="https://www.cloudflare.com/developer-platform/products/d1/">serverless SQL database</a>, by billing for SQL queries (based on rows) and SQL storage. SQL storage per object is limited to 1 GB during the beta period, and will be increased to 10 GB on general availability. DO <a href="https://developers.cloudflare.com/durable-objects/platform/pricing/#billing-metrics"><u>requests and duration billing</u></a> are unchanged and apply to all DOs regardless of storage backend. </p><p>During the initial beta, billing is not enabled for SQL queries (rows read and rows written) and SQL storage. SQLite-backed objects will incur charges for requests and duration. We plan to enable SQL billing in the first half of 2025 with advance notice.</p>
<div><table><thead>
  <tr>
    <th></th>
    <th><span>Workers Paid</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>Rows read</span></td>
    <td><span>First 25 billion / month included + $0.001 / million rows</span></td>
  </tr>
  <tr>
    <td><span>Rows written</span></td>
    <td><span>First 50 million / month included + $1.00 / million rows</span></td>
  </tr>
  <tr>
    <td><span>SQL storage</span></td>
    <td><span>First 5 GB-months included + $0.20 / GB-month</span></td>
  </tr>
</tbody></table></div><p>For more on how to use SQLite-in-Durable Objects, check out the <a href="https://developers.cloudflare.com/durable-objects/best-practices/access-durable-objects-storage/"><u>documentation</u></a>. </p>
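<p>As a back-of-envelope example of how the rows-read and rows-written lines in the table combine (our own arithmetic, not an official calculator, and covering only those two rows):</p>

```javascript
// Monthly SQL query cost at the Workers Paid rates shown above.
function sqlQueryBillUSD({ rowsRead, rowsWritten }) {
  const over = (used, included) => Math.max(0, used - included);
  const readCost = (over(rowsRead, 25e9) / 1e6) * 0.001; // $0.001 / million rows
  const writeCost = (over(rowsWritten, 50e6) / 1e6) * 1.0; // $1.00 / million rows
  return readCost + writeCost;
}

// e.g. 30 billion reads and 60 million writes in a month:
//   reads:  (30e9 - 25e9) / 1e6 * $0.001 = $5
//   writes: (60e6 - 50e6) / 1e6 * $1.00  = $10, so about $15 total
console.log(sqlQueryBillUSD({ rowsRead: 30e9, rowsWritten: 60e6 }));
```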
    <div>
      <h2>What about D1?</h2>
      <a href="#what-about-d1">
        
      </a>
    </div>
    <p>Cloudflare Workers already offers another SQLite-backed database product: <a href="https://developers.cloudflare.com/d1/"><u>D1</u></a>. In fact, D1 is itself built on SQLite-in-DO. So, what's the difference? Why use one or the other?</p><p>In short, you should think of D1 as a more "managed" database product, while SQLite-in-DO is more of a lower-level “compute with storage” building block.</p><p>D1 fits into a more traditional cloud architecture, where stateless application servers talk to a separate database over the network. Those application servers are typically Workers, but could also be clients running outside of Cloudflare. D1 also comes with a pre-built HTTP API and managed observability features like query insights. With D1, where your application code and SQL database queries are not colocated like in SQLite-in-DO, Workers has <a href="https://developers.cloudflare.com/workers/configuration/smart-placement"><u>Smart Placement</u></a> to dynamically run your Worker in the best location to reduce total request latency, considering everything your Worker talks to, including D1. By the end of 2024, D1 will support automatic read replication for scalability and low-latency access around the world. If this managed model appeals to you, use D1.</p><p>Durable Objects require a bit more effort, but in return, give you more power. With DO, you have two pieces of code that run in different places: a front-end Worker which routes incoming requests from the Internet to the correct DO, and the DO itself, which runs on the same machine as the SQLite database. You may need to think carefully about which code to run where, and you may need to build some of your own tooling that exists out-of-the-box with D1. But because you are in full control, you can tailor the solution to your application's needs and potentially achieve more.</p>
    <div>
      <h2>Under the hood: Storage Relay Service</h2>
      <a href="#under-the-hood-storage-relay-service">
        
      </a>
    </div>
    <p><a href="https://blog.cloudflare.com/introducing-workers-durable-objects/"><u>When Durable Objects first launched in 2020</u></a>, they offered only a simple key/value-based interface for durable storage. Under the hood, these keys and values were stored in a well-known off-the-shelf database, with regional instances of this database deployed to locations in our data centers around the world. Durable Objects in each region would store their data to the regional database.</p><p>For SQLite-backed Durable Objects, we have completely replaced the persistence layer with a new system built from scratch, called Storage Relay Service, or SRS. SRS has already been powering D1 for over a year, and can now be used more directly by applications through Durable Objects.</p><p>SRS is based on a simple idea:</p><blockquote><p><i>Local disk is fast and randomly-accessible, but expensive and prone to disk failures. Object storage (like </i><a href="https://www.cloudflare.com/developer-platform/products/r2/"><i><u>R2</u></i></a><i>) is cheap and durable, but much slower than local disk and not designed for database-like access patterns. Can we get the best of both worlds by using a local disk as a cache on top of object storage?</i></p></blockquote><p>So, how does it work?</p>
    <div>
      <h4>The mismatch in functionality between local disk and object storage</h4>
      <a href="#the-mismatch-in-functionality-between-local-disk-and-object-storage">
        
      </a>
    </div>
    <p>A SQLite database on disk tends to undergo many small changes in rapid succession. Any row of the database might be updated by any particular query, but the database is designed to avoid rewriting parts that didn't change. Read queries may randomly access any part of the database. Assuming the right indexes exist to support the query, they should not require reading parts of the database that aren't relevant to the results, and should complete in microseconds.</p><p><a href="https://blog.cloudflare.com/sqlite-in-durable-objects/">Object storage</a>, on the other hand, is designed for an entirely different usage model: you upload an entire "object" (blob of bytes) at a time, and download an entire blob at a time. Each blob has a different name. For maximum efficiency, blobs should be fairly large, from hundreds of kilobytes to gigabytes in size. Latency is relatively high, measured in tens or hundreds of milliseconds.</p><p>So how do we back up our SQLite database to object storage? An obviously naive strategy would be to simply make a copy of the database files from time to time and upload it as a new "object". But, uploading the database on every change — and making the application wait for the upload to complete — would obviously be way too slow. We could choose to upload the database only occasionally — say, every 10 minutes — but this means in the case of a disk failure, we could lose up to 10 minutes of changes. Data loss is, uh, bad! And even then, for most databases, it's likely that most of the data doesn't change every 10 minutes, so we'd be uploading the same data over and over again.</p>
    <div>
      <h4>Trick one: Upload a log of changes</h4>
      <a href="#trick-one-upload-a-log-of-changes">
        
      </a>
    </div>
    <p>Instead of uploading the entire database, SRS records a log of <i>changes</i>, and uploads those.</p><p>Conveniently, SQLite itself already has a concept of a change log: the <a href="https://www.sqlite.org/wal.html"><u>Write-Ahead Log, or WAL</u></a>. SRS always configures SQLite to use WAL mode. In this mode, any changes made to the database are first written to a separate log file. From time to time, the database is "checkpointed", merging the changes back into the main database file. The WAL format is <a href="https://www.sqlite.org/fileformat2.html#the_write_ahead_log"><u>well-documented</u></a> and easy to understand: it's just a sequence of "frames", where each frame is an instruction to write some bytes to a particular offset in the database file.</p><p>SRS monitors changes to the WAL file (by hooking <a href="https://www.sqlite.org/vfs.html"><u>SQLite's VFS</u></a> to intercept file writes) to discover the changes being made to the database, and uploads those to object storage.</p><p>Unfortunately, SRS cannot simply upload every single change as a separate "object", as this would result in too many objects, each of which would be inefficiently small. Instead, SRS batches changes over a period of up to 10 seconds, or up to 16 MB worth, whichever happens first, then uploads the whole batch as a single object.</p><p>When reconstructing a database from object storage, we must download the series of change batches and replay them in order. Of course, if the database has undergone many changes over a long period of time, this can get expensive. In order to limit how far back it needs to look, SRS also occasionally uploads a snapshot of the entire content of the database. SRS will decide to upload a snapshot any time that the total size of logs since the last snapshot exceeds the size of the database itself. 
This heuristic implies that the total amount of data that SRS must download to reconstruct a database is limited to no more than twice the size of the database. Since we can delete data from object storage that is older than the latest snapshot, this also means that our total stored data is capped to 2x the database size.</p><p>Credit where credit is due: This idea — uploading WAL batches and snapshots to object storage — was inspired by <a href="https://litestream.io/"><u>Litestream</u></a>, although our implementation is different.</p>
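<p>The snapshot heuristic is simple enough to state in a few lines of code. This is our toy version for intuition, not SRS itself:</p>

```javascript
// Toy version of the snapshot heuristic: upload a snapshot once accumulated
// logs outgrow the database, capping restore data at roughly 2x its size.
function makeSnapshotPolicy() {
  let logBytesSinceSnapshot = 0;
  return {
    // Called after each batch of WAL frames is uploaded.
    onBatchUploaded(batchBytes, dbBytes) {
      logBytesSinceSnapshot += batchBytes;
      if (logBytesSinceSnapshot > dbBytes) {
        logBytesSinceSnapshot = 0; // the snapshot replaces the log history
        return "snapshot"; // restore cost: dbBytes + at most dbBytes of logs
      }
      return "log-only";
    },
  };
}
```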
    <div>
      <h4>Trick two: Relay through other servers in our global network</h4>
      <a href="#trick-two-relay-through-other-servers-in-our-global-network">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/D6uKAW3oEkzgAMKwYmQwr/453ce6d752339fb73022eb32e0e7b792/BLOG-2536_8.png" />
          </figure><p>Batches are only uploaded to object storage every 10 seconds. But obviously, we cannot make the application wait for 10 whole seconds just to confirm a write. So what happens if the application writes some data, returns a success message to the user, and then the machine fails 9 seconds later, losing the data?</p><p>To solve this problem, we take advantage of our global network. Every time SQLite commits a transaction, SRS will immediately forward the change log to five "follower" machines across our network. Once at least three of these followers respond that they have received the change, SRS informs the application that the write is confirmed. (As discussed earlier, the write confirmation opens the Durable Object's "output gate", unblocking network communications to the rest of the world.)</p><p>When a follower receives a change, it temporarily stores it in a buffer on local disk, and then awaits further instructions. Later on, once SRS has successfully uploaded the change to object storage as part of a batch, it informs each follower that the change has been persisted. At that point, the follower can simply delete the change from its buffer.</p><p>However, if the follower never receives the persisted notification, then, after some timeout, the follower itself will upload the change to object storage. Thus, if the machine running the database suddenly fails, as long as at least one follower is still running, it will ensure that all confirmed writes are safely persisted.</p><p>Each of a database's five followers is located in a different physical data center. Cloudflare's network consists of hundreds of data centers around the world, which means it is always easy for us to find four other data centers nearby any Durable Object (in addition to the one it is running in). 
In order for a confirmed write to be lost, then, at least four different machines in at least three different physical buildings would have to fail simultaneously (three of the five followers, plus the Durable Object's host machine). Of course, anything can happen, but this is exceedingly unlikely.</p><p>Followers also come in handy when a Durable Object's host machine is unresponsive. We may not know for sure if the machine has died completely, or if it is still running and responding to some clients but not others. We cannot start up a new instance of the DO until we know for sure that the previous instance is dead – or, at least, that it can no longer confirm writes, since the old and new instances could then confirm contradictory writes. To deal with this situation, if we can't reach the DO's host, we can instead try to contact its followers. If we can contact at least three of the five followers, and tell them to stop confirming writes for the unreachable DO instance, then we know that instance is unable to confirm any more writes going forward. We can then safely start up a new instance to replace the unreachable one.</p>
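<p>The confirmation rule itself is a classic quorum. Here is a toy model of it (ours, not SRS code): forward each change to five followers and confirm the write as soon as any three acknowledge.</p>

```javascript
// Toy quorum confirmation: resolve once 3 of the 5 followers have acked.
// `followers` is an array of async send functions, one per follower.
function confirmWrite(change, followers) {
  return new Promise((resolve) => {
    let acks = 0;
    for (const send of followers) {
      send(change).then(
        () => { if (++acks === 3) resolve(); }, // quorum reached
        () => {} // a failed follower simply never acknowledges
      );
    }
    // The real system also handles timeouts and fallback uploads; omitted here.
  });
}
```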
    <div>
      <h4>Bonus feature: Point-in-Time Recovery</h4>
      <a href="#bonus-feature-point-in-time-recovery">
        
      </a>
    </div>
    <p>I mentioned earlier that SQLite-backed Durable Objects can be asked to revert their state to any time in the last 30 days. How does this work?</p><p>This was actually an accidental feature that fell out of SRS's design. Since SRS stores a complete log of changes made to the database, we can restore to any point in time by replaying the change log from the last snapshot. The only thing we have to do is make sure we don't delete those logs too soon.</p><p>Normally, whenever a snapshot is uploaded, all previous logs and snapshots can then be deleted. But instead of deleting them immediately, SRS merely marks them for deletion 30 days later. In the meantime, if a point-in-time recovery is requested, the data is still there to work from.</p><p>For a database with a high volume of writes, this may mean we store a lot of data for a lot longer than needed. As it turns out, though, once data has been written at all, keeping it around for an extra month is pretty cheap — typically cheaper, even, than writing it in the first place. It's a small price to pay for always-on disaster recovery.</p>
    <div>
      <h2>Get started with SQLite-in-DO</h2>
      <a href="#get-started-with-sqlite-in-do">
        
      </a>
    </div>
    <p>SQLite-backed DOs are available in beta starting today. You can start building with SQLite-in-DO by visiting the <a href="https://developers.cloudflare.com/durable-objects/best-practices/access-durable-objects-storage/"><u>developer documentation</u></a> and providing beta feedback via the <a href="https://discord.com/channels/595317990191398933/773219443911819284"><u>#durable-objects channel</u></a> on our Developer Discord.</p><p>Do distributed systems like SRS excite you? Would you like to be part of building them at Cloudflare? <a href="https://boards.greenhouse.io/embed/job_app?token=5390243"><u>We're hiring!</u></a></p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Durable Objects]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">44R2ywetTc9l1c6D7oWTsz</guid>
            <dc:creator>Kenton Varda</dc:creator>
        </item>
        <item>
            <title><![CDATA[We've added JavaScript-native RPC to Cloudflare Workers]]></title>
            <link>https://blog.cloudflare.com/javascript-native-rpc/</link>
            <pubDate>Fri, 05 Apr 2024 13:00:38 GMT</pubDate>
            <description><![CDATA[ Cloudflare Workers now features a built-in RPC (Remote Procedure Call) system for use in Worker-to-Worker and Worker-to-Durable Object communication, with absolutely minimal boilerplate. We've designed an RPC system so expressive that calling a remote service can feel like using a library ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Cloudflare Workers now features a built-in RPC (Remote Procedure Call) system enabling seamless Worker-to-Worker and Worker-to-Durable Object communication, with almost no boilerplate. You just define a class:</p>
            <pre><code>export class MyService extends WorkerEntrypoint {
  sum(a, b) {
    return a + b;
  }
}</code></pre>
            <p>And then you call it:</p>
            <pre><code>let three = await env.MY_SERVICE.sum(1, 2);</code></pre>
            <p>No schemas. No routers. Just define methods of a class. Then call them. That's it.</p>
    <div>
      <h2>But that's not it</h2>
      <a href="#but-thats-not-it">
        
      </a>
    </div>
    <p>This isn't just any old RPC. We've designed an RPC system so expressive that calling a remote service can feel like using a library – without any need to actually import a library! This is important not just for ease of use, but also security: fewer dependencies means fewer critical security updates and less exposure to supply-chain attacks.</p><p>To this end, here are some of the features of Workers RPC:</p><ul><li><p>For starters, you can pass <a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm#supported_types"><b>Structured Clonable</b></a> <b>types</b> as the params or return value of an RPC. (That means that, unlike JSON, Dates just work, and you can even have cycles.)</p></li><li><p>You can additionally pass <b>functions</b> in the params or return value of other functions. When the other side calls the function you passed to it, they make a new RPC back to you.</p></li><li><p>Similarly, you can pass <b>objects with methods</b>. Method calls become further RPCs.</p></li><li><p>RPC to another Worker (over a Service Binding) usually does not even cross a network. In fact, the other Worker usually runs in the very same thread as the caller, <b>reducing latency to zero</b>. Performance-wise, it’s almost as fast as an actual function call.</p></li><li><p>When RPC does cross a network (e.g. 
to a Durable Object), you can invoke a method and then speculatively invoke further methods on the result in a <b>single network round trip</b>.</p></li><li><p>You can send a <b>byte stream over RPC</b>, and the system will automatically stream the bytes with proper flow control.</p></li><li><p>All of this is <i>secure</i>, based on the <b>object-capability model</b>.</p></li><li><p>The <a href="https://github.com/cloudflare/workerd/blob/03629a6553751d3614a8b91926e380213e100d94/src/workerd/io/worker-interface.capnp#L302">protocol</a> and <a href="https://github.com/cloudflare/workerd/blob/03629a6553751d3614a8b91926e380213e100d94/src/workerd/api/worker-rpc.c++">implementation</a> are fully open source as part of <a href="https://github.com/cloudflare/workerd/">workerd</a>.</p></li></ul><p>Workers RPC is a JavaScript-native RPC system. Under the hood, it is built on <a href="https://capnproto.org/rpc.html">Cap'n Proto</a>. However, unlike Cap'n Proto, Workers RPC does not require you to write a schema. (Of course, you can use TypeScript if you like, and we provide tools to help with this.)</p><p>In general, Workers RPC is designed to "just work" using idiomatic JavaScript code, so you shouldn't have to spend too much time looking at docs. We'll give you an overview in this blog post. But if you want to understand the full feature set, <a href="https://developers.cloudflare.com/workers/runtime-apis/rpc/">check out the documentation</a>.</p>
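    <p>The structured-clone serialization model mentioned in the feature list is easy to try outside Workers: the standard structuredClone() function implements the same model, so you can see for yourself that Dates survive and cycles round-trip where JSON gives up. This snippet runs in plain Node 17+ or any modern browser; no Workers APIs are involved.</p>

```javascript
// Structured clone handles values that JSON cannot: Dates stay Dates,
// and cyclic objects round-trip intact.
const original = { created: new Date("2024-04-05T13:00:38Z") };
original.self = original; // a cycle: the object references itself

// JSON.stringify rejects the cycle outright.
let jsonFailed = false;
try {
  JSON.stringify(original);
} catch (e) {
  jsonFailed = true; // TypeError: converting circular structure to JSON
}

// structuredClone preserves both the Date and the cycle.
const copy = structuredClone(original);
console.log(jsonFailed);                   // true
console.log(copy.created instanceof Date); // true
console.log(copy.self === copy);           // true: the cycle survives
```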
    <div>
      <h2>Why RPC? (And what is RPC anyway?)</h2>
      <a href="#why-rpc-and-what-is-rpc-anyway">
        
      </a>
    </div>
    <p>Remote Procedure Calls (RPC) are a way of expressing communications between two programs over a network. Without RPC, you might communicate using a protocol like HTTP. With HTTP, though, you must format and parse your communications as an HTTP request and response, perhaps designed in <a href="https://en.wikipedia.org/wiki/REST">REST</a> style. RPC systems try to make communications look like a regular function call instead, as if you were calling a library rather than a remote service. The RPC system provides a "stub" object on the client side which stands in for the real server-side object. When a method is called on the stub, the RPC system figures out how to serialize and transmit the parameters to the server, invoke the method on the server, and then transmit the return value back.</p><p>The merits of RPC have been subject to a great deal of debate. RPC is often accused of committing many of the <a href="https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing">fallacies of distributed computing</a>.</p><p>But this reputation is outdated. When RPC was first invented some 40 years ago, async programming barely existed. We did not have Promises, much less async and await. Early RPC was synchronous: calls would block the calling thread waiting for a reply. At best, latency made the program slow. At worst, network failures would hang or crash the program. No wonder it was deemed "broken".</p><p>Things are different today. We have Promise and async and await, and we can throw exceptions on network failures. We even understand how RPCs can be pipelined so that a chain of calls takes only one network round trip. Many large distributed systems you likely use every day are built on RPC. It works.</p><p>The fact is, RPC fits the programming model we're used to. Every programmer is trained to think in terms of APIs composed of function calls, not in terms of byte stream protocols nor even REST. 
Using RPC frees you from the need to constantly translate between mental models, allowing you to move faster.</p>
    <div>
      <h2>Example: Authentication Service</h2>
      <a href="#example-authentication-service">
        
      </a>
    </div>
    <p>Here's a common scenario: You have one Worker that implements an application, and another Worker that is responsible for authenticating user credentials. The app Worker needs to call the auth Worker on each request to check the user's cookie.</p><p>This example uses a <b>Service Binding</b>, which is a way of configuring one Worker with a private channel to talk to another, without going through a public URL. Here, we have an application Worker that has been configured with a service binding to the auth Worker.</p><p>Before RPC, all communications between Workers needed to use HTTP. So, you might write code like this:</p>
            <pre><code>// OLD STYLE: HTTP-based service bindings.
export default {
  async fetch(req, env, ctx) {
    // Call the auth service to authenticate the user's cookie.
    // We send it an HTTP request using a service binding.

    // Construct a JSON request to the auth service.
    let authRequest = {
      cookie: req.headers.get("Cookie")
    };

    // Send it to env.AUTH_SERVICE, which is our service binding
    // to the auth worker.
    let resp = await env.AUTH_SERVICE.fetch(
        "https://auth/check-cookie", {
      method: "POST",
      headers: {
        "Content-Type": "application/json; charset=utf-8",
      },
      body: JSON.stringify(authRequest)
    });

    if (!resp.ok) {
      return new Response("Internal Server Error", {status: 500});
    }

    // Parse the JSON result.
    let authResult = await resp.json();

    // Use the result.
    if (!authResult.authorized) {
      return new Response("Not authorized", {status: 403});
    }
    let username = authResult.username;

    return new Response(`Hello, ${username}!`);
  }
}</code></pre>
            <p>Meanwhile, your auth server might look like:</p>
            <pre><code>// OLD STYLE: HTTP-based auth server.
export default {
  async fetch(req, env, ctx) {
    // Parse URL to decide what endpoint is being called.
    let url = new URL(req.url);
    if (url.pathname == "/check-cookie") {
      // Parse the request.
      let authRequest = await req.json();

      // Look up cookie in Workers KV.
      let cookieInfo = await env.COOKIE_MAP.get(
          hash(authRequest.cookie), "json");

      // Construct the response.
      let result;
      if (cookieInfo) {
        result = {
          authorized: true,
          username: cookieInfo.username
        };
      } else {
        result = { authorized: false };
      }

      return Response.json(result);
    } else {
      return new Response("Not found", {status: 404});
    }
  }
}</code></pre>
            <p>This code has a lot of boilerplate involved in setting up an HTTP request to the auth service. With RPC, we can instead express this as a function call:</p>
            <pre><code>// NEW STYLE: RPC-based service bindings
export default {
  async fetch(req, env, ctx) {
    // Call the auth service to authenticate the user's cookie.
    // We invoke it using a service binding.
    let authResult = await env.AUTH_SERVICE.checkCookie(
        req.headers.get("Cookie"));

    // Use the result.
    if (!authResult.authorized) {
      return new Response("Not authorized", {status: 403});
    }
    let username = authResult.username;

    return new Response(`Hello, ${username}!`);
  }
}</code></pre>
            <p>And the server side becomes:</p>
            <pre><code>// NEW STYLE: RPC-based auth server.
import { WorkerEntrypoint } from "cloudflare:workers";

export class AuthService extends WorkerEntrypoint {
  async checkCookie(cookie) {
    // Look up cookie in Workers KV.
    let cookieInfo = await this.env.COOKIE_MAP.get(
        hash(cookie), "json");

    // Return result.
    if (cookieInfo) {
      return {
        authorized: true,
        username: cookieInfo.username
      };
    } else {
      return { authorized: false };
    }
  }
}</code></pre>
            <p>This is a pretty nice simplification… but we can do much more!</p>
    <div>
      <h3>Let's get fancy! Or should I say… classy?</h3>
      <a href="#lets-get-fancy-or-should-i-say-classy">
        
      </a>
    </div>
    <p>Let's say we want our auth service to do a little more. Instead of just checking cookies, it provides a whole API around user accounts. In particular, it should let you:</p><ul><li><p>Get or update the user's profile info.</p></li><li><p>Send the user an email notification.</p></li><li><p>Append to the user's activity log.</p></li></ul><p>But, these operations should only be allowed after presenting the user's credentials.</p><p>Here's what the server might look like:</p>
            <pre><code>import { WorkerEntrypoint, RpcTarget } from "cloudflare:workers";

// `User` is an RPC interface to perform operations on a particular
// user. This class is NOT exported as an entrypoint; it must be
// received as the result of the checkCookie() RPC.
class User extends RpcTarget {
  constructor(uid, env) {
    super();

    // Note: Instance members like these are NOT exposed over RPC.
    // Only class (prototype) methods and properties are exposed.
    this.uid = uid;
    this.env = env;
  }

  // Get/set user profile, backed by Workers KV.
  async getProfile() {
    return await this.env.PROFILES.get(this.uid, "json");
  }
  async setProfile(profile) {
    await this.env.PROFILES.put(this.uid, JSON.stringify(profile));
  }

  // Send the user a notification email.
  async sendNotification(message) {
    let addr = await this.env.EMAILS.get(this.uid);
    await this.env.EMAIL_SERVICE.send(addr, message);
  }

  // Append to the user's activity log.
  async logActivity(description) {
    // (Please excuse this somewhat problematic implementation,
    // this is just a dumb example.)
    let timestamp = new Date().toISOString();
    await this.env.ACTIVITY.put(
        `${this.uid}/${timestamp}`, description);
  }
}

// Now we define the entrypoint service, which can be used to
// get User instances -- but only by presenting the cookie.
export class AuthService extends WorkerEntrypoint {
  async checkCookie(cookie) {
    // Look up cookie in Workers KV.
    let cookieInfo = await this.env.COOKIE_MAP.get(
        hash(cookie), "json");

    if (cookieInfo) {
      return {
        authorized: true,
        user: new User(cookieInfo.uid, this.env),
      };
    } else {
      return { authorized: false };
    }
  }
}</code></pre>
            <p>Now we can write a Worker that uses this API while displaying a web page:</p>
            <pre><code>export default {
  async fetch(req, env, ctx) {
    // `using` is a new JavaScript feature. Check out the
    // docs for more on this:
    // https://developers.cloudflare.com/workers/runtime-apis/rpc/lifecycle/
    using authResult = await env.AUTH_SERVICE.checkCookie(
        req.headers.get("Cookie"));
    if (!authResult.authorized) {
      return new Response("Not authorized", {status: 403});
    }

    let user = authResult.user;
    let profile = await user.getProfile();

    await user.logActivity("You visited the site!");
    await user.sendNotification(
        `Thanks for visiting, ${profile.name}!`);

    return new Response(`Hello, ${profile.name}!`);
  }
}</code></pre>
            <p>Finally, this worker needs to be configured with a service binding pointing at the AuthService class. Its wrangler.toml may look like:</p>
            <pre><code>name = "app-worker"
main = "./src/app.js"

# Declare a service binding to the auth service.
[[services]]
binding = "AUTH_SERVICE"    # name of the binding in `env`
service = "auth-service"    # name of the worker in the dashboard
entrypoint = "AuthService"  # name of the exported RPC class</code></pre>
            
    <div>
      <h3>Wait, how?</h3>
      <a href="#wait-how">
        
      </a>
    </div>
    <p>What exactly happened here? The server created an instance of the class User and returned it to the client. It has methods that the client can then just call? Are we somehow transferring code over the wire?</p><p>No, absolutely not! All code runs strictly in the <a href="https://developers.cloudflare.com/workers/reference/how-workers-works/#isolates">isolate</a> where it was originally loaded. What actually happens is, when the return value is passed over RPC, all class instances are replaced with RPC stubs. The stub, when called, makes a new RPC back to the server, where it calls the method on the original User object that was created there:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3npoG6d0ArXOiSEtHb9uhx/301d9f07566dbe7cdc84db7ab07a69d3/image1-8.png" />
            
            </figure><p>But then you might ask: how does the RPC stub know what methods are available? Is a list of methods passed over the wire?</p><p>In fact, no. The RPC stub is a special object called a "<a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Proxy">Proxy</a>". It implements a "wildcard method", that is, it appears to have an infinite number of methods of every possible name. When you try to call a method, the name you called is sent to the server. If the original object has no such method, an exception is thrown.</p>
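    <p>You can build a toy version of this wildcard behavior yourself with a plain Proxy, no RPC involved. This sketch forwards the method name and arguments to a local object, where a real stub would serialize them and send them over the wire:</p>

```javascript
// Toy "RPC stub" built from a plain Proxy (illustration only, not the
// Workers implementation). The proxy appears to have every possible
// method; invoking one forwards the name and arguments to a target.
function makeStub(target) {
  return new Proxy({}, {
    get(_, methodName) {
      // Wildcard method: whatever name is accessed, return a callable.
      return (...args) => {
        const method = target[methodName];
        if (typeof method !== "function") {
          // Mirrors the RPC behavior: unknown methods fail at call time.
          throw new TypeError("No such method: " + String(methodName));
        }
        return method.apply(target, args);
      };
    },
  });
}

const service = { sum(a, b) { return a + b; } };
const stub = makeStub(service);

console.log(stub.sum(1, 2)); // 3, though the stub never declared sum()
```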
    <div>
      <h3>Did you spot the security?</h3>
      <a href="#did-you-spot-the-security">
        
      </a>
    </div>
    <p>In the above example, we see that RPC is easy to use. We made an object! We called it! It all just felt natural, like calling a local API! Hooray!</p><p>But there's another extremely important property that the AuthService API has which you may have missed: <b>As designed, you cannot perform any operation on a user without first checking the cookie.</b> This is true despite the fact that the individual method calls do not require sending the cookie again, and the User object itself doesn't store the cookie.</p><p>The trick is, the initial checkCookie() RPC is what returns a User object in the first place. The AuthService API does not provide any other way to obtain a User instance. The RPC client cannot create a User object out of thin air, and cannot call methods of an object without first explicitly receiving a reference to it.</p><p>This is called capability-based security: we say that the User reference received by the client is a "capability", because receiving it grants the client the ability to perform operations on the user. The checkCookie() method grants this capability only when the client has presented the correct cookie.</p><p>Capability-based security is often like this: security can be woven naturally into your APIs, rather than feeling like an additional concern bolted on top.</p>
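    <p>The pattern itself doesn't require RPC at all. Here is the same idea in ordinary JavaScript, with an invented session table standing in for Workers KV: the only way to obtain an object carrying user methods is a successful credential check, so holding the object is holding the permission.</p>

```javascript
// Capability-based security in plain JavaScript (invented example:
// the session table stands in for Workers KV). The object returned by
// checkCookie() is the capability: there is no other path to the user
// methods, so merely holding the object proves authorization.
const sessions = new Map([["good-cookie", { uid: "u1", name: "Ada" }]]);

function checkCookie(cookie) {
  const session = sessions.get(cookie);
  if (!session) return { authorized: false };

  return {
    authorized: true,
    user: {
      // Methods close over the session; callers never resend the cookie.
      getProfile() { return { name: session.name }; },
    },
  };
}

const denied = checkCookie("bad-cookie");
console.log(denied.authorized);         // false
console.log(denied.user === undefined); // true: no capability, no access

const granted = checkCookie("good-cookie");
console.log(granted.user.getProfile().name); // "Ada"
```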
    <div>
      <h3>More security: Named entrypoints</h3>
      <a href="#more-security-named-entrypoints">
        
      </a>
    </div>
    <p>Another subtle but important detail to call out: in the above example, the auth service's RPC API is exported as a named class:</p>
            <pre><code>export class AuthService extends WorkerEntrypoint {</code></pre>
            <p>And in our wrangler.toml for the calling worker, we had to specify an "entrypoint", matching the class name:</p>
            <pre><code>entrypoint = "AuthService"  # name of the exported RPC class</code></pre>
    <p>In the past, service bindings would bind to the "default" entrypoint, declared with <code>export default {</code>. But the default entrypoint is also typically exposed to the Internet, e.g. automatically mapped to a hostname under <code>workers.dev</code> (unless you explicitly turn that off), so it can be risky to assume that requests arriving at it are trusted.</p><p>With named entrypoints, this all changes. A named entrypoint is only accessible to Workers which have explicitly declared a binding to it. By default, only Workers on your own account can declare such bindings. Moreover, the binding must be declared at deploy time; a Worker cannot create new service bindings at runtime.</p><p>Thus, you can trust that requests arriving at a named entrypoint can only have come from Workers on your account and for which you explicitly created a service binding. In the future, we plan to extend this pattern further with the ability to lock down entrypoints, audit which Workers have bindings to them, tell the callee information about who is calling at runtime, and so on. With these tools, there is no need to write code in your app itself to authenticate access to internal APIs; the system does it for you.</p>
    <div>
      <h3>What about type safety?</h3>
      <a href="#what-about-type-safety">
        
      </a>
    </div>
    <p>Workers RPC works in an entirely dynamically-typed way, just as JavaScript itself does. But just as you can apply TypeScript on top of JavaScript in general, you can apply it to Workers RPC.</p><p>The <a href="https://www.npmjs.com/package/@cloudflare/workers-types"><code>@cloudflare/workers-types</code></a> package defines the type <code>Service&lt;MyEntrypointType&gt;</code>, which describes the type of a service binding. <code>MyEntrypointType</code> is the type of your server-side interface. <code>Service&lt;MyEntrypointType&gt;</code> applies all the necessary transformations to turn this into a client-side type, such as converting all methods to async, replacing functions and <code>RpcTargets</code> with (properly-typed) stubs, and so on.</p><p>It is up to you to share the definition of <code>MyEntrypointType</code> between your server app and its clients. You might do this by defining the interface in a separate shared TypeScript file, or by extracting a <code>.d.ts</code> type declaration file from your server code using <a href="https://www.typescriptlang.org/tsconfig/#declaration"><code>tsc --declaration</code></a>.</p><p>With that done, you can apply types to your client:</p>
            <pre><code>import { WorkerEntrypoint } from "cloudflare:workers";

// The interface that your server-side entrypoint implements.
// (This would probably be imported from a .d.ts file generated
// from your server code.)
declare class MyEntrypointType extends WorkerEntrypoint {
  sum(a: number, b: number): number;
}

// Define an interface Env specifying the bindings your client-side
// worker expects.
interface Env {
  MY_SERVICE: Service&lt;MyEntrypointType&gt;;
}

// Define the client worker's fetch handler with typed Env.
export default &lt;ExportedHandler&lt;Env&gt;&gt; {
  async fetch(req, env, ctx) {
    // Now env.MY_SERVICE is properly typed!
    const result = await env.MY_SERVICE.sum(1, 2);
    return new Response(result.toString());
  }
}</code></pre>
            
    <div>
      <h2>RPC to Durable Objects</h2>
      <a href="#rpc-to-durable-objects">
        
      </a>
    </div>
    <p><a href="https://developers.cloudflare.com/durable-objects/">Durable Objects</a> allow you to create a "named" worker instance somewhere on the network that multiple other workers can then talk to, in order to coordinate between them. Each Durable Object also has its own private on-disk storage where it can store state long-term.</p><p>Previously, communications with a Durable Object had to take the form of HTTP requests and responses. With RPC, you can now just declare methods on your Durable Object class, and call them on the stub. One catch: to opt into RPC, you must declare your Durable Object class with <code>extends DurableObject</code>, like so:</p>
            <pre><code>import { DurableObject } from "cloudflare:workers";

export class Counter extends DurableObject {
  async increment() {
    // Increment our stored value and return it.
    let value = await this.ctx.storage.get("value");
    value = (value || 0) + 1;
    // No await needed here: the Durable Object's output gate holds
    // any outgoing response until this write is confirmed.
    this.ctx.storage.put("value", value);
    return value;
  }
}</code></pre>
            <p>Now we can call it like:</p>
            <pre><code>let stub = env.COUNTER_NAMESPACE.get(id);
let value = await stub.increment();</code></pre>
            <p>TypeScript is supported here too, by defining your binding with type <code>DurableObjectNamespace&lt;ServerType&gt;</code>:</p>
            <pre><code>interface Env {
  COUNTER_NAMESPACE: DurableObjectNamespace&lt;Counter&gt;;
}</code></pre>
            
    <div>
      <h3>Eliding awaits with speculative calls</h3>
      <a href="#eliding-awaits-with-speculative-calls">
        
      </a>
    </div>
    <p>When talking to a Durable Object, the object may be somewhere else in the world from the caller. RPCs must cross the network. This takes time: despite our best efforts, we still haven't figured out how to make information travel faster than the speed of light.</p><p>When you have a complex RPC interface where one call returns an object on which you wish to make further method calls, it's easy to end up with slow code that makes too many round trips over the network.</p>
            <pre><code>// Makes three round trips.
let foo = await stub.foo();
let baz = await foo.bar.baz();
let corge = await baz.qux[3].corge();</code></pre>
            <p>Workers RPC features a way to avoid this: If you know that a call will return a value containing a stub, and all you want to do with it is invoke a method on that stub, you can <i>skip awaiting it</i>:</p>
            <pre><code>// Same thing, only one round trip.
let foo = stub.foo();
let baz = foo.bar.baz();
let corge = await baz.qux[3].corge();</code></pre>
            <p>Whoa! How does this work?</p><p>RPC methods do not return normal promises. Instead, they return special RPC promises. These objects are "<a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise#thenables">custom thenables</a>", which means you can use them in all the ways you'd use a regular Promise, like awaiting it or calling .then() on it.</p><p>But an RPC promise is more than just a thenable. It is also a proxy. Like an RPC stub, it has a wildcard property. You can use this to express speculative RPC calls on the eventual result, before it has actually resolved. These speculative calls will be sent to the server immediately, so that they can begin executing as soon as the first RPC has finished there, before the result has actually made its way back over the network to the client.</p><p>This feature is also known as "Promise Pipelining". Although it isn't explicitly a security feature, it is commonly provided by object-capability RPC systems like Cap'n Proto.</p>
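    <p>As a rough illustration (this is a toy, not the Workers implementation), here is how a thenable Proxy can queue speculative operations on a result that hasn't arrived yet:</p>

```javascript
// Toy sketch of promise pipelining: an "RPC promise" that is both a
// thenable and a Proxy. Property reads and calls queue up speculatively
// on the eventual result; only the final await actually waits. A real
// system would ship the whole queued chain to the server at once.
function pipeline(promise) {
  return new Proxy(function () {}, {
    get(_, prop) {
      if (prop === "then") {
        // Behave like a Promise so the proxy can be awaited.
        return (resolve, reject) => promise.then(resolve, reject);
      }
      // Speculatively read a property of the not-yet-known result,
      // binding methods so later calls keep the right `this`.
      return pipeline(promise.then((value) => {
        const v = value[prop];
        return typeof v === "function" ? v.bind(value) : v;
      }));
    },
    apply(_, __, args) {
      // Speculatively call the eventual result as a function.
      return pipeline(promise.then((fn) => fn(...args)));
    },
  });
}

// A fake in-process "service" shaped like the example above.
const remote = {
  async foo() {
    return { bar: { baz: async () => 42 } };
  },
};

async function demo() {
  const stub = pipeline(Promise.resolve(remote));
  // Same shape as the example: no intermediate awaits.
  return await stub.foo().bar.baz();
}

demo().then((answer) => console.log(answer)); // 42
```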
    <div>
      <h2>The future: Custom Bindings Marketplace?</h2>
      <a href="#the-future-custom-bindings-marketplace">
        
      </a>
    </div>
    <p>For now, Service Bindings and Durable Objects only allow communication between Workers running on the same account. So, RPC can only be used to talk between your own Workers.</p><p>But we'd like to take it further.</p><p>We have previously explained <a href="/workers-environment-live-object-bindings">why Workers environments contain live objects</a>, also known as "bindings". But today, only Cloudflare can add new binding types to the Workers platform – like Queues, KV, or D1. But what if <i>anyone</i> could invent their own binding type, and give it to other people?</p><p>Previously, we thought this would require creating a way to automatically load client libraries into the calling Workers. That seemed scary: it meant using someone's binding would require trusting their code to run inside your isolate. With RPC, there's no such trust. The binding only sees exactly what you explicitly pass to it. It cannot compromise the rest of your Worker.</p><p>Could Workers RPC provide the basis for a "bindings marketplace", where people can offer rich JavaScript APIs to each other in an easy and secure way? We're excited to explore and find out.</p>
    <div>
      <h2>Try it now</h2>
      <a href="#try-it-now">
        
      </a>
    </div>
    <p>Workers RPC is available today for all Workers users. To get started, <a href="https://developers.cloudflare.com/workers/runtime-apis/rpc/">check out the docs</a>.</p> ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <guid isPermaLink="false">7gv73eZlGWczMqEnCIHuVh</guid>
            <dc:creator>Kenton Varda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Why Workers environment variables contain live objects]]></title>
            <link>https://blog.cloudflare.com/workers-environment-live-object-bindings/</link>
            <pubDate>Mon, 01 Apr 2024 13:00:10 GMT</pubDate>
            <description><![CDATA[ Bindings don't just reduce boilerplate. They are a core design feature of the Workers platform which simultaneously improve developer experience and application security in several ways. Usually these two goals are in opposition to each other, but bindings elegantly solve for both at the same time ]]></description>
            <content:encoded><![CDATA[ <p></p><p>If you've ever written a Cloudflare Worker using Workers KV for storage, you may have noticed something unsettling.</p>
            <pre><code>// A simple Worker that always returns the value named "content",
// read from Workers KV storage.
export default {
  async fetch(request, env, ctx) {
    return new Response(await env.MY_KV.get("content"));
  }
}</code></pre>
            <p>Do you feel something is… missing? Like… Where is the setup? The authorization keys? The client library instantiation? Aren't environment variables normally strings? How is it that <code>env.MY_KV</code> seems to be an object with a <code>get()</code> method that is already hooked up?</p><p>Coming from any other platform, you might expect to see something like this instead:</p>
            <pre><code>// How would a "typical cloud platform" do it?

// Import KV client library?
import { KV } from "cloudflare:kv";

export default {
  async fetch(request, env, ctx) {
    // Connect to the database?? Using my secret auth key???
    // Which comes from an environment variable????
    let myKv = KV.connect("my-kv-namespace", env.MY_KV_AUTHKEY);

    return new Response(await myKv.get("content"));
  }
}</code></pre>
            <p>As another example, consider service bindings, which allow a Worker to send requests to another Worker.</p>
            <pre><code>// A simple Worker that greets an authenticated user, delegating to a
// separate service to perform authentication.
export default {
  async fetch(request, env, ctx) {
    // Forward headers to auth service to get user info.
    let authResponse = await env.AUTH_SERVICE.fetch(
        "https://auth/getUser",
        {headers: request.headers});
    let userInfo = await authResponse.json();
    return new Response("Hello, " + userInfo.name);
  }
}</code></pre>
            <p>Notice in particular the use of <code>env.AUTH_SERVICE.fetch()</code> to send the request. This sends the request directly to the auth service, regardless of the hostname we give in the URL.</p><p>On "typical" platforms, you'd expect to use a real (perhaps internal) hostname to route the request instead, and also include some credentials proving that you're allowed to use the auth service API:</p>
            <pre><code>// How would a "typical cloud platform" do it?
export default {
  async fetch(request, env, ctx) {
    // Forward headers to auth service, via some internal hostname?
    // Hostname needs to be configurable, so get it from an env var.
    let authRequest = new Request(
        "https://" + env.AUTH_SERVICE_HOST + "/getUser",
        {headers: request.headers});

    // We also need to prove that our service is allowed to talk to
    // the auth service API. Add a header for that, containing a
    // secret token from our environment.
    authRequest.headers.set("X-Auth-Service-Api-Key",
        env.AUTH_SERVICE_API_KEY);

    // Now we can make the request.
    let authResponse = await fetch(authRequest);
    let userInfo = await authResponse.json();
    return new Response("Hello, " + userInfo.name);
  }
}</code></pre>
            <p>As you can see, in Workers, the "environment" is not just a bunch of strings. It contains full-fledged objects. We call each of these objects a "<a href="https://developers.cloudflare.com/workers/configuration/bindings/">binding</a>", because it binds the environment variable name to a resource. You configure exactly what resource a name is bound to when you deploy your Worker – again, just like a traditional environment variable, but not limited to strings.</p><p>We can clearly see above that bindings eliminate a little bit of boilerplate, which is nice. But, there's so much more.</p><blockquote><p><i>Bindings don't just reduce boilerplate. They are a core design feature of the Workers platform which simultaneously improve developer experience and application security in several ways. Usually these two goals are in opposition to each other, but bindings elegantly solve for both at the same time.</i></p></blockquote>
    <div>
      <h2>Security</h2>
      <a href="#security">
        
      </a>
    </div>
    <p>It may not be obvious at first glance, but bindings neatly solve a number of common security problems in distributed systems.</p>
    <div>
      <h4>SSRF is Not A Thing</h4>
      <a href="#ssrf-is-not-a-thing">
        
      </a>
    </div>
    <p>Bindings, when used properly, make Workers immune to Server-Side Request Forgery (SSRF) attacks, one of the most common yet deadly security vulnerabilities in application servers today. In an SSRF attack, an attacker tricks a server into making requests to other internal services that only it can see, thus giving the attacker access to those internal services.</p><p>As an example, imagine we have built a social media application where users are able to set their avatar image. Imagine that, as a convenience, instead of uploading an image from their local disk, a user can instead specify the URL of an image on a third-party server, and the application server will fetch that image to use as the avatar. Sounds reasonable, right? We can imagine the app contains some code like:</p>
            <pre><code>let resp = await fetch(userAvatarUrl);
let data = await resp.arrayBuffer();
await setUserAvatar(data);</code></pre>
            <p>One problem: What if the user claims their avatar URL is something like "<a href="https://auth-service.internal/status">https://auth-service.internal/status</a>"? Whoops, now the above code will actually fetch a status page from the internal auth service, and set it as the user's avatar. Presumably, the user can then download their own avatar, and it'll contain the content of this status page, which they were not supposed to be able to access!</p><p>But using bindings, this is impossible: There is no URL that the attacker can specify to reach the auth service. The application must explicitly use the binding <code>env.AUTH_SERVICE</code> to reach it. The global <code>fetch()</code> function cannot reach the auth service no matter what URL it is given; it can only make requests to the public Internet.</p><p><b>A legacy caveat:</b> When we originally designed Workers in 2017, the primary use case was implementing a middleware layer in front of an origin server, integrated with Cloudflare's CDN. At the time, <a href="https://developers.cloudflare.com/workers/configuration/bindings/">bindings</a> weren't a thing yet, and we were primarily trying to implement the <a href="https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API">Service Workers interface</a>. To that end, we made a design decision: when a Worker runs on Cloudflare in front of some origin server, if you invoke the global <code>fetch()</code> function with a URL that is within your zone's domain, the request will be sent directly to the origin server, bypassing most logic Cloudflare would normally apply to a request received from the Internet. Sadly, this means that Workers which run in front of an origin server are not immune to SSRF – they must worry about it just like traditional servers on private networks must. Although this puts Workers in the same place as most servers, we now see a path to make SSRF a thing you never have to worry about when writing Workers. 
We will be introducing "origin bindings", where the origin server is represented by an explicit binding. That is, to send a request to your origin, you'd need to do <code>env.ORIGIN.fetch()</code>. Then, the global <code>fetch()</code> function can be restricted to only talk to the public Internet, fully avoiding SSRF. This is a big change and we need to handle backwards-compatibility carefully – expect to see more in the coming months. Meanwhile, for Workers that do not have an origin server behind them, or where the origin server does not rely on Cloudflare for security, global <code>fetch()</code> is SSRF-safe today.</p><p><b>And a reminder:</b> Requests originating from Workers have a header, <code>CF-Worker</code>, identifying the domain name that owns the Worker. This header is intended for abuse mitigation: if your server is receiving abusive requests from a Worker, it tells you who to blame and gives you a way to filter those requests. This header is not intended for authorization. You should not implement a private API that grants access to your Workers based solely on the <code>CF-Worker</code> header matching your domain. If you do, you may re-open the opportunity for SSRF vulnerabilities within any Worker running on that domain.</p>
    <div>
      <h4>You can't leak your API key if there is no API key</h4>
      <a href="#you-cant-leak-your-api-key-if-there-is-no-api-key">
        
      </a>
    </div>
    <p>Usually, if your web app needs access to a protected resource, you will have to obtain some sort of API key that grants access to the resource. But typically <i>anyone</i> who has this key can access the resource as if they were the Worker. This makes handling auth keys tricky. You can't put the key directly in a config file, unless the entire config file is considered a secret. You can't check it into source control – you don't want to publish your keys to GitHub! You probably shouldn't even store the key on your hard drive – what if your laptop is compromised? And so on.</p><p>Even if you have systems in place to deliver auth keys to services securely (like <a href="https://developers.cloudflare.com/workers/configuration/secrets/">Workers Secrets</a>), if the key is just a string, the service itself can easily leak it. For instance, a developer might carelessly insert a log statement for debugging which logs the service's configuration – including keys. Now anyone who can access your logs can discover the secret, and there's probably no practical way to tell whether such a leak has occurred.</p><p>With Workers, we endeavor to make bindings live objects, <i>not</i> secret keys. For instance, as seen in the first example in this post, when using a Workers KV binding, you never see a key at all. It's therefore impossible for a Worker to accidentally leak access to a KV namespace.</p>
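<p>To make this concrete, here is a minimal sketch of a Worker reading from a KV binding (the binding name <code>MY_KV</code> is an assumption – it's whatever name you configure at deploy time). Notice that no key material appears anywhere in the code:</p>

```javascript
// Minimal sketch: reading from a KV binding. The binding name MY_KV
// is an assumption -- it is configured when the Worker is deployed.
const worker = {
  async fetch(request, env, ctx) {
    // env.MY_KV is a live client object, not a secret string, so
    // there is no key that a stray log statement could leak.
    let value = await env.MY_KV.get("greeting");
    return new Response(value ?? "not found");
  }
};

export default worker;
```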
    <div>
      <h4>No certificate management</h4>
      <a href="#no-certificate-management">
        
      </a>
    </div>
    <p>This is similar to the API key problem, but arguably worse. When internal services talk to each other over a network, you presumably want them to use secure transports, but typically that requires that every service have a certificate and a private key signed by some CA, and clients must be configured to trust that CA. This is all a big pain to manage, and often the result is that developers don't bother; they set up a <a href="https://www.cloudflare.com/learning/cloud/what-is-a-virtual-private-cloud/">VPC</a> and assume the network is trusted.</p><p>In Workers, since all intra-service communications happen over a binding, the system itself can take on all the work of ensuring the transport is secure and goes to the right place.</p>
    <div>
      <h4>No frustrating ACL management – but also no lazy "allow all"</h4>
      <a href="#no-frustrating-acl-management-but-also-no-lazy-allow-all">
        
      </a>
    </div>
    <p>At this point you might be thinking: Why are we talking about API keys at all? Cloudflare knows which Worker is sending any request. Can't it handle the authentication that way?</p><p>Consider the earlier example where we imagined that KV namespaces could be opened by name:</p>
            <pre><code>// Imagine KV namespaces could be opened by name?
let myKv = KV.connect("my-kv-namespace", env.MY_KV_AUTHKEY);</code></pre>
            <p>What if we made it simply:</p>
            <pre><code>// No authkey, because the system knows whether the Worker has
// permission?
let myKv = KV.connect("my-kv-namespace");</code></pre>
            <p>We could then imagine that we could separately configure each KV namespace with an Access Control List (ACL) that specifies which Workers can access it.</p><p>Of course, the first problem with this is that it's vulnerable to SSRF. But, we discussed that already, so let's discuss another problem.</p><p>Many platforms use ACLs for security, but have you ever noticed how everyone hates them? You end up with two choices:</p><ul><li><p>Tediously maintain ACLs on every resource. Inevitably, this is always a huge pain. First you deploy your code, which you think is properly configured. Then you discover that it's failing with permissions errors causing a production outage! So you go fiddle with the IAM system. There are 533,291 roles to choose from and none of them are actually what you want. It turns out you're supposed to create a custom role, but that's not obvious, and once you get there, the UI is confusing. Also it's easy to confuse your team's service account with your team's email group, so you give the permissions to the wrong principal, but it takes you an hour of staring at it to realize what you did wrong. Then somehow you manage to remove your own access to the resource and you can't add it back even though you're a project admin? (Why yes, all this did in fact happen to me, while using a cloud provider that shall remain nameless.)</p></li><li><p>Give up and grant everything access to everything. Just put all your services in a single VPC where they can all freely talk to each other. 
This is what most developers are inclined to do, if their security team doesn't step in to stop them.</p></li></ul><p>Much of this pain comes about because connecting a server to a resource today involves two steps that should really be one step:</p><ol><li><p>Configure the server to point at the resource.</p></li><li><p>Configure the resource to accept requests from the server.</p></li></ol><p>Developers are primarily concerned with step 1, and forget that step 2 exists until it blows up in their faces. Then it's a mad scramble to learn how step 2 even works.</p><p>What if step 1 just <i>implied</i> step 2? Obviously, if you're trying to configure a service to access a resource, then you also want the resource to allow access to the service. As long as the person trying to set this up has permissions to both, then there is no reason for this to be a two-step process.</p><p>But in typical platforms, the platform itself has no way of knowing that a service has been configured to talk to a resource, because the configuration is just a string.</p><p>Bindings fix that. When you define a binding from a Worker to a particular KV namespace, the platform inherently understands that you are telling the Worker to use the KV namespace. Therefore, it can implicitly ensure that the correct permissions are granted. There is no step 2.</p><p>And conversely, if no binding is configured, then the Worker does not have access. That means that every Worker starts out with no access by default, and only receives access to exactly the things it needs. Secure by default.</p><p>As a related benefit, you can always accurately answer the question "What services are using this resource?" based on bindings. Since the system itself understands bindings and what they point to, the system can answer the query without knowing anything about the service's internals.</p>
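<p>For example, here is a hedged sketch of what this one-step configuration might look like in a Worker's <code>wrangler.toml</code> (the namespace ID is a placeholder):</p>

```toml
# Declaring the binding both points the Worker at the namespace and
# grants it access. There is no separate ACL to configure: no step 2.
name = "my-worker"
main = "src/index.js"

[[kv_namespaces]]
binding = "MY_KV"
id = "0000000000000000000000000000000a"  # placeholder namespace ID
```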
    <div>
      <h2>Developer Experience</h2>
      <a href="#developer-experience">
        
      </a>
    </div>
    <p>We've seen that bindings improve security in a number of ways. Usually, people expect security and developer friendliness to be a trade-off, with each security measure making life harder for developers. Bindings, however, are entirely the opposite! They actually make life easier!</p>
    <div>
      <h4>Easier setup</h4>
      <a href="#easier-setup">
        
      </a>
    </div>
    <p>As we saw in the intro, using a binding reduces setup boilerplate. Instead of receiving an environment variable containing an API key which must be passed into some sort of library, the environment variable itself is an already-initialized client library.</p>
    <div>
      <h4>Observability</h4>
      <a href="#observability">
        
      </a>
    </div>
    <p>Because the system understands what bindings a Worker has, and even exactly when those bindings are exercised, the system can answer a lot of questions that would normally require more manual instrumentation or analysis to answer, such as:</p><ul><li><p><b>For a given Worker, what resources does it use?</b> Since the system understands the types of all bindings and what they point to (it doesn't just see them as opaque strings), it can answer this question.</p></li><li><p><b>For a given resource, which Workers use it?</b> This is the reverse query. The system can maintain an index of bindings in order to find ones pointing at a given resource.</p></li><li><p><b>How often does a particular Worker use a particular resource?</b> Since bindings are invoked by calling methods on the binding itself, the system can observe these calls, log them, collect metrics, etc.</p></li></ul>
    <div>
      <h4>Testability via dependency injection</h4>
      <a href="#testability-via-dependency-injection">
        
      </a>
    </div>
    <p>When you deploy a test version of your service, you probably want it to operate on test resources rather than real production resources. For instance, you might have a separate testing KV namespace for storage. But, you probably want to deploy exactly the same code to test that you will eventually deploy to production. That means the names of these resources cannot be hard-coded.</p><p>On traditional platforms, the obvious way to avoid hard-coding resource names is to put the name in an environment variable. Going back to our example from the intro, if KV worked in a traditional way not using bindings, you might end up with code like this:</p>
            <pre><code>// Hypothetical non-binding-based KV.
let myKv = KV.connect(env.MY_KV_NAMESPACE, env.MY_KV_AUTHKEY);</code></pre>
            <p>At best, you now have two environment variables (which had better stay in sync) just to specify what namespace to use.</p><p>But at worst, developers might forget to parameterize their resources this way.</p><ul><li><p>A developer may write new code that is hard-coded to use a test database, and then forget to update it before pushing it to production, accidentally using the test database in prod.</p></li><li><p>A developer might prototype a new service using production resources from the start (or using new resources which become production resources), only later on deciding that they need to create a new deployment for testing. But by then, it may be a pain to find and parameterize all the different resources used.</p></li></ul><p>With bindings, it's impossible to have this problem. Since you can only connect to a KV namespace through a binding, it's always possible to make a separate deployment of the same code which talks to a test namespace instead of production, e.g. using <a href="https://developers.cloudflare.com/workers/wrangler/environments/">Wrangler Environments</a>.</p><p>In the testing world, this is sometimes called "dependency injection". With bindings, dependencies are always injectable.</p>
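<p>As a sketch of what that separate deployment can look like with Wrangler Environments (the namespace IDs are placeholders), the same binding name resolves to a different namespace per environment, while the Worker code itself stays unchanged:</p>

```toml
# The same code deploys against a different namespace per environment.
name = "my-worker"

[[kv_namespaces]]
binding = "MY_KV"
id = "0000000000000000000000000000000a"  # placeholder: production namespace

[[env.staging.kv_namespaces]]
binding = "MY_KV"
id = "0000000000000000000000000000000b"  # placeholder: test namespace
```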
    <div>
      <h4>Adaptability</h4>
      <a href="#adaptability">
        
      </a>
    </div>
    <p>Dependency injection isn't just for tests. A service whose dependencies can be changed out easily will be easier to deploy into new environments, including new production environments.</p><p>Say, for instance, you have a service that authenticates users. Now you are launching a new product, which, for whatever reason, has a separate userbase from the original product. You need to deploy a new version of the auth service that uses a different database to implement a separate user set. As long as all dependencies are injectable, this should be easy.</p><p>Again, bindings are not the only way to achieve dependency injection, but a bindings-based system will tend to lead developers to write dependency-injectable code by default.</p>
    <div>
      <h2>Q&amp;A</h2>
      <a href="#q-a">
        
      </a>
    </div>
    
    <div>
      <h4>Has anyone done this before?</h4>
      <a href="#has-anyone-done-this-before">
        
      </a>
    </div>
    <p>You have. Every time you write code.</p><p>As it turns out, this approach is used all the time at the programming language level. Bindings are analogous to parameters of a function, or especially parameters to a class constructor. In a memory-safe programming language, you can't access an object unless someone has passed you a pointer or reference to that object. Objects in memory don't have URLs that you use to access them.</p><p>Programming languages work this way because they are designed to manage complexity, and this proves to be an elegant way to do so. Yet this style, which we're used to at the programming language level, is much less common at the distributed system level. The Cloudflare Workers platform aims to treat the network as one big computer, and so it makes sense to extend programming language concepts across the network.</p><p>Of course, we're not the first to apply this to distributed systems, either. The paradigm is commonly called "capability-based security", which brings us to the next question…</p>
    <div>
      <h4>Is this capability-based security?</h4>
      <a href="#is-this-capability-based-security">
        
      </a>
    </div>
    <p>Bindings are very much inspired by capability-based security.</p><p>At present, bindings are not a complete capability system. In particular, there is currently no particular mechanism for a Worker to pass a binding to another Worker. However, this is something we can definitely imagine adding in the future.</p><p>Imagine, for instance, you want to call another Worker through a service binding, and as you do, you want to give that other Worker temporary access to a KV namespace for it to operate on. Wouldn't it be nice if you could just pass the object, and have it auto-revoked at the end of the request? In the future, we might introduce a notion of dynamic bindings which can bind to different resources on a per-request basis, where a calling Worker can pass in a particular value to use for a given request.</p><p>For the time being, bindings cannot really be called object capabilities. However, many of the benefits of bindings are the same benefits commonly attributed to capability systems. This is because of some basic similarities:</p><ul><li><p>Like a capability, a binding simultaneously designates a resource and also confers permission to access that resource, without referencing any separate ACL.</p></li><li><p>Like capabilities, bindings do not exist in any global namespace: they are scoped to the env object passed to a specific Worker.</p></li><li><p>Like a capability, to use a binding, the application must explicitly specify which binding it is trying to use, and <i>only</i> specifies the binding. In particular, the application does <i>not</i> separately specify the name of the resource in any other namespace (no URL, no global ID, etc.). The existence of the binding only affects the application's behavior when the application explicitly invokes that binding.</p></li></ul>
    <div>
      <h4>Why is env a parameter to fetch(), not global?</h4>
      <a href="#why-is-env-a-parameter-to-fetch-not-global">
        
      </a>
    </div>
    <p>This is a bit wonky, but the goal is to enable composition of Workers.</p><p>Imagine you have two Workers, one which implements your API, mapped to api.example.com, and one which serves static assets, mapped to assets.example.com. One day, for whatever reason, you decide you want to combine these two Workers into a single Worker. So you write this code:</p>
            <pre><code>import apiWorker from "api-worker.js";
import assetWorker from "asset-worker.js";

export default {
  async fetch(req, env, ctx) {
    let url = new URL(req.url);
    if (url.hostname == "api.example.com") {
      return apiWorker.fetch(req, env, ctx);
    } else if (url.hostname == "assets.example.com") {
      return assetWorker.fetch(req, env, ctx);
    } else {
      return new Response("Not found", {status: 404});
    }
  }
}</code></pre>
            <p>This is great! No code from either Worker needed to be modified at all. We just create a new file containing a router Worker that delegates to one or the other.</p><p>But then you discover a problem: both the API Worker and the assets Worker use a KV namespace binding, and it turns out that they both decided to name the binding <code>env.KV</code> – but these bindings are meant to point to different namespaces used for different purposes. Does this mean you have to go edit the Workers to change the name of the binding before you can merge them?</p><p>No, it doesn't, because you can just remap the environments before delegating:</p>
            <pre><code>import apiWorker from "api-worker.js";
import assetWorker from "asset-worker.js";

export default {
  async fetch(req, env, ctx) {
    let url = new URL(req.url);
    if (url.hostname == "api.example.com") {
      let subenv = {KV: env.API_KV};
      return apiWorker.fetch(req, subenv, ctx);
    } else if (url.hostname == "assets.example.com") {
      let subenv = {KV: env.ASSETS_KV};
      return assetWorker.fetch(req, subenv, ctx);
    } else {
      return new Response("Not found", {status: 404});
    }
  }
}</code></pre>
            <p>If environments were globals, this remapping would not be possible.</p><p>In fact, this benefit goes much deeper than this somewhat-contrived example. The fact that the environment is not a global essentially forces code to be internally designed for dependency injection (DI). Designing code to be DI-friendly sometimes seems tedious, but every time I've done it, I've been incredibly happy that I did. Such code tends to be much easier to test and to adapt to new circumstances, for the same reasons mentioned when we discussed dependency injection earlier, but applying at the level of individual modules rather than whole Workers.</p><p>With that said, if you really insist that you don't care about making your code explicitly DI-friendly, there is an alternative: Put your env into <a href="https://developers.cloudflare.com/workers/runtime-apis/nodejs/asynclocalstorage/">AsyncLocalStorage</a>. That way it is "ambiently" available anywhere in your code, but you can still get some composability.</p>
            <pre><code>import { AsyncLocalStorage } from 'node:async_hooks';

// Allocate a new AsyncLocalStorage to store the value of `env`.
const ambientEnv = new AsyncLocalStorage();

// We can now define a global function that reads a key from env.MY_KV,
// without having to pass `env` down to it.
function getFromKv(key) {
  // Get the env from AsyncLocalStorage.
  return ambientEnv.getStore().MY_KV.get(key);
}

export default {
  async fetch(req, env, ctx) {
    // Put the env into AsyncLocalStorage while we handle the request,
    // so that calls to getFromKv() work.
    return ambientEnv.run(env, async () =&gt; {
      // Handle request, including calling functions that may call
      // getFromKv().

      // ... (code) ...
    });
  }
};</code></pre>
            
    <div>
      <h4>How does a KV binding actually work?</h4>
      <a href="#how-does-a-kv-binding-actually-work">
        
      </a>
    </div>
    <p>Under the hood, a Workers KV binding encapsulates a secret key used to access the corresponding KV namespace. This key is actually the encryption key for the namespace. The key is distributed to the edge along with the Worker's code and configuration, using encrypted storage to keep it safe.</p><p>Although the key is distributed with the Worker, the Worker itself has no way to access the key. In fact, even the owner of the Cloudflare account cannot see the key – it is simply never revealed outside of Cloudflare's systems. (Cloudflare employees are also prevented from viewing these keys.)</p><p>Even if an attacker somehow got ahold of the key, it would not be useful to them as-is. Cloudflare's API does not provide any way for a user to upload a raw key to use in a KV binding. The API instead has the client specify the public ID of the namespace they want to use. The deployment system verifies that the KV namespace in question is on the same account as the Worker being uploaded (and that the client is authorized to deploy Workers on said account).</p>
    <div>
      <h2>Get Started</h2>
      <a href="#get-started">
        
      </a>
    </div>
    <p>To learn about all the types of bindings offered by Workers and how to use them, <a href="https://developers.cloudflare.com/workers/configuration/bindings/">check out the documentation</a>.</p> ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <guid isPermaLink="false">7DRm9XhM6mlFIuppgydO9F</guid>
            <dc:creator>Kenton Varda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Introducing workerd: the Open Source Workers runtime]]></title>
            <link>https://blog.cloudflare.com/workerd-open-source-workers-runtime/</link>
            <pubDate>Tue, 27 Sep 2022 13:01:00 GMT</pubDate>
            <description><![CDATA[ workerd is the JavaScript/Wasm runtime code that powers Cloudflare Workers, now open source under the Apache 2.0 license ]]></description>
            <content:encoded><![CDATA[ <p>Today I'm proud to introduce the first beta release of workerd, the JavaScript/Wasm runtime based on the same code that powers Cloudflare Workers. workerd is Open Source under the Apache License version 2.0.</p><p>workerd shares most of its code with the runtime that powers Cloudflare Workers, but with some changes designed to make it more portable to other environments. The name "workerd" (pronounced "worker dee") comes from the Unix tradition of naming servers with a "-d" suffix standing for "daemon". The name is not capitalized because it is a program name, and program names are traditionally lower-case in Unix-like environments.</p><p>
	<a href="https://github.com/cloudflare/workerd">Find the code on GitHub</a></p>
    <div>
      <h2>What it's for</h2>
      <a href="#what-its-for">
        
      </a>
    </div>
    
    <div>
      <h3>Self-hosting Workers</h3>
      <a href="#self-hosting-workers">
        
      </a>
    </div>
    <p>workerd can be used to self-host applications that you'd otherwise <a href="https://www.cloudflare.com/developer-platform/solutions/hosting/">run on Cloudflare Workers</a>. It is intended to be a production-ready web server for this purpose. workerd has been designed to be unopinionated about hosting environments, so that it should fit nicely into whatever server/VM/container hosting and orchestration system you prefer. It's just a web server.</p><p>Workers has always been based on standardized APIs, so that code is not locked into Cloudflare, and <a href="/introducing-the-wintercg/">we work closely with other runtimes to promote compatibility</a>. workerd provides another option to ensure that applications built on Workers can run anywhere, by leveraging the same underlying code to get exact, "bug-for-bug" compatibility.</p>
    <div>
      <h3>Local development and testing</h3>
      <a href="#local-development-and-testing">
        
      </a>
    </div>
    <p>workerd is also designed to facilitate realistic local testing of Workers. Up until now, this has been achieved using <a href="https://miniflare.dev/">Miniflare</a>, which simulated the Workers API within a Node.js environment. Miniflare has worked well, but in a number of cases its behavior did not exactly match Workers running on Cloudflare. With the release of workerd, Miniflare and the Wrangler CLI tool will now be able to provide a more accurate simulation by leveraging the same runtime code we use in production.</p>
    <div>
      <h3>Programmable proxies</h3>
      <a href="#programmable-proxies">
        
      </a>
    </div>
    <p>workerd can act as an application host, a proxy, or both. It supports both forward and reverse proxy modes. In all cases, JavaScript code can be used to intercept and process requests and responses before forwarding them on. Traditional web servers and proxies have used bespoke configuration languages with quirks that are hard to master. Programming proxies in JavaScript instead provides more power while making the configuration easier to write and understand.</p>
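<p>As a taste of what this looks like, here's a hedged sketch of request/response interception in a proxy Worker (the header names are made up for illustration; a real deployment also needs a workerd configuration, which is not shown):</p>

```javascript
// Sketch of a programmable proxy: intercept the request, tag it,
// forward it upstream, then post-process the response.
const worker = {
  async fetch(request, env, ctx) {
    // Clone the incoming request so we can modify its headers.
    let upstream = new Request(request);
    upstream.headers.set("X-Proxied-By", "workerd");  // illustrative header

    // Forward to the upstream server.
    let response = await fetch(upstream);

    // Re-wrap the response so its headers become mutable, then tag it.
    response = new Response(response.body, response);
    response.headers.set("X-Proxy", "workerd");  // illustrative header
    return response;
  }
};

export default worker;
```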
    <div>
      <h2>What it is</h2>
      <a href="#what-it-is">
        
      </a>
    </div>
    <p>workerd is not just another way to run JavaScript and Wasm. Our runtime is uniquely designed in a number of ways.</p>
    <div>
      <h3>Server-first</h3>
      <a href="#server-first">
        
      </a>
    </div>
    <p>Many non-browser JavaScript and Wasm runtimes are designed to be general-purpose: you can use them to build command-line apps, local GUI apps, servers, or anything in between. workerd is not. It specifically focuses on servers, in particular (for now, at least) HTTP servers.</p><p>This means in particular that workerd-based applications are event-driven at the top level. Applications do not open listen sockets and accept connections from them; instead, the runtime pushes events to the application. It may seem like a minor difference, but this basic change in perspective directly enables many of the features below.</p>
    <div>
      <h3>Web standard APIs</h3>
      <a href="#web-standard-apis">
        
      </a>
    </div>
    <p>Wherever possible, Workers (and workerd in particular) offers the same standard APIs found in web browsers, such as Fetch, URL, WebCrypto, and others. This means that code built on workerd is more likely to be portable to browsers as well as to other standards-based runtimes. When Workers launched five years ago, it was unusual for a non-browser runtime to offer web APIs, but we are pleased to see that the broader JavaScript ecosystem is now converging on them.</p>
    <div>
      <h3>Nanoservices</h3>
      <a href="#nanoservices">
        
      </a>
    </div>
    <p>workerd is a nanoservice runtime. What does that mean?</p><p>Microservices have become popular over the last decade as a way to split monolithic servers into smaller components that could be maintained and deployed independently. For example, a company that offers several web applications with a common user authentication flow might have a separate team that maintains the authentication logic. In a monolithic model, the authentication logic might have been offered to the application teams as a library. However, this could be frustrating for the maintainers of that logic, as making any change might require waiting for every application team to deploy an update to their respective server. By splitting the authentication logic into a separate server that all the others talk to, the authentication team is able to deploy changes on their own schedule.</p><p>However, microservices have a cost. What was previously a fast library call now requires communicating over a network. In addition to added overhead, this communication requires configuration and administration to ensure security and reliability. These costs become greater as the codebase is split into more and more services. Eventually, the costs outweigh the benefits.</p><p>Nanoservices are a new model that achieves the benefits of independent deployment with overhead closer to that of library calls. With workerd, many Workers can be configured to run in the same process. Each Worker runs in a separate "isolate", which gives the appearance of running independently of the others: each isolate loads separate code and has its own global scope. However, when one Worker explicitly sends a request to another Worker, the destination Worker actually runs in the same thread with zero latency. 
So, it performs more like a function call.</p><p>With nanoservices, teams can now break their code into many more independently-deployed pieces without worrying about the overhead.</p><p>(Some in the industry prefer to call nanoservices "functions", implying that each individual function making up an application could be its own service. I feel, however, that this puts too much emphasis on syntax rather than logical functionality. That said, it is the same concept.)</p><p>To really make nanoservices work well, we had to minimize the baseline overhead of each service. This required designing workerd very differently from most other runtimes, so that common resources could be shared between services as much as possible. First, as mentioned, we run many nanoservices within a single process, to share basic process overhead and minimize context switching costs. A second big architectural difference between workerd and other runtimes is how it handles built-in APIs. Many runtimes implement significant portions of their built-in APIs in JavaScript, which must then be loaded separately into each isolate. workerd does not; all the APIs are implemented in native code, so that all isolates may share the same copy of that code. These design choices would be difficult to retrofit into another runtime, and indeed these needs are exactly why we chose to build a custom runtime for Workers from the start.</p>
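            <p>To make this concrete, here is a minimal sketch of the nanoservice pattern. The binding name <code>AUTH</code> and both handlers are invented for illustration; in workerd, <code>env.AUTH</code> would be a service binding configured to point at the auth Worker:</p>

```javascript
// A stand-in "auth" nanoservice: just a fetch handler, like any Worker.
// (Illustrative only; real logic would verify the session properly.)
const authWorker = {
  async fetch(request) {
    const cookie = request.headers.get("Cookie") ?? "";
    const ok = cookie.includes("session=");
    return new Response(ok ? "ok" : "forbidden", { status: ok ? 200 : 403 });
  },
};

// An application Worker that calls the auth nanoservice via its binding.
const appWorker = {
  async fetch(request, env) {
    // This looks like a network call, but when the target is another Worker
    // in the same workerd process, it runs on the same thread, more like a
    // function call than an RPC over the network.
    const auth = await env.AUTH.fetch("https://auth/check", {
      headers: { Cookie: request.headers.get("Cookie") ?? "" },
    });
    if (auth.status !== 200) {
      return new Response("Unauthorized", { status: 401 });
    }
    return new Response("Hello!");
  },
};
```

            <p>Wiring <code>env.AUTH</code> to the auth Worker happens in workerd's configuration, not in code; the two objects above only illustrate the call shape.</p>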
    <div>
      <h3>Homogeneous deployment</h3>
      <a href="#homogeneous-deployment">
        
      </a>
    </div>
    <p>In a typical microservices model, you might deploy different microservices to containers running across a cluster of machines, connected over a local network. You might manually choose how many containers to dedicate to each service, or you might configure some form of auto-scaling based on resource usage.</p><p>workerd offers an alternative model: <b>Every machine runs every service.</b></p><p>workerd's nanoservices are much lighter-weight than typical containers. As a result, it's entirely reasonable to run a very large number of them – hundreds, maybe thousands – on a single server. This in turn means that you can simply deploy every service to every machine in your fleet.</p><p>Homogeneous deployment means that you don't have to worry about scaling individual services. Instead, you can simply load balance requests across the entire cluster, and scale the cluster as needed. Overall, this can greatly reduce the amount of administration work needed.</p><p>Cloudflare itself has used the homogeneous model on our network since the beginning. Every one of Cloudflare's edge servers runs our entire software stack, so any server can answer any kind of request on its own. We've found it works incredibly well. This is why services on Cloudflare – including ones that use Workers – are able to go from no traffic at all to millions of requests per second instantly without trouble.</p>
    <div>
      <h3>Capability bindings: cleaner configuration and SSRF safety</h3>
      <a href="#capability-bindings-cleaner-configuration-and-ssrf-safety">
        
      </a>
    </div>
    <p>workerd takes a different approach to most runtimes – indeed, to most software development platforms – in how an application accesses external resources.</p><p>Most development platforms start from assuming that the application can talk to the whole world. It is up to the application to figure out exactly what it wants to talk to, and name it in some global namespace, such as using a URL. So, an application server that wants to talk to the authentication microservice might use code like this:</p>
            <pre><code>// Traditional approach without capability bindings.
fetch("https://auth-service.internal-network.example.com/api", {
  method: "POST",
  body: JSON.stringify(authRequest),
  headers: { "Authorization": env.AUTH_SERVICE_TOKEN }
});</code></pre>
            <p>In workerd, we do things differently. An application starts out with no ability to talk to the rest of the world, and must be configured with specific <i>capability bindings</i> that provide it access to <i>specific</i> external resources. So, an application which needs to be able to talk to the authentication service would be configured with a binding called <code>AUTH_SERVICE</code>, and the code would look something like this:</p>
            <pre><code>// Capability-based approach. Hostname doesn't matter; all
// requests to AUTH_SERVICE.fetch() go to the auth service.
env.AUTH_SERVICE.fetch("https://auth/api", {
  method: "POST",
  body: JSON.stringify(authRequest),
});</code></pre>
            <p>This may at first appear to be a trivial difference. In both cases, we have to use configuration to <a href="https://www.cloudflare.com/learning/access-management/what-is-access-control/">control access</a> to external services. In the traditional approach, we'd provide access tokens (and probably the service's hostname) as environment variables. In the new approach, the environment goes a bit further to provide a full-fledged object. Is this just syntax sugar?</p><p>It turns out, this slight change has huge advantages:</p><p>First, we can now restrict the global fetch() function to accept only publicly-routable URLs. <b>This makes applications totally immune to SSRF attacks!</b> You cannot trick an application into accessing an internal service unintentionally if the code to access internal services is explicitly different. (In fact, the global fetch() is itself backed by a binding, which can be configured. workerd defaults to connecting it to the public internet, but you can also override it to permit private addresses if you want, or to route to a specific proxy service, or to be blocked entirely.)</p><p>With that done, we now have an interesting property: All internal services which an application uses <i>must</i> be configurable. This means:</p><ul><li><p>You can easily see a complete list of the internal services an application talks to, without reading all the code.</p></li><li><p>You can always replace these services with mocks for testing purposes.</p></li><li><p>You can always configure an application to authenticate itself differently (e.g. client certificates) or use a different back end, without changing code.</p></li></ul><p>The receiving end of a binding benefits, too. Take the authentication service example, above. The auth service may be another Worker running in workerd as a nanoservice. In this case, the auth service does not need to be bound to any actual network address. 
Instead, it may be made available strictly to other Workers through their bindings. In this case, the authentication service doesn't necessarily need to verify that a request received came from an allowed client – because only allowed clients are able to send requests to it in the first place.</p><p>Overall, capability bindings allow simpler code that is secure by default, more composable, easier to test, and easier to understand and maintain.</p>
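            <p>To illustrate the SSRF point with a toy example (this is <i>not</i> workerd's implementation, which enforces the restriction in native code via binding configuration; the function below is invented for the example), a "public addresses only" guard in front of the global <code>fetch()</code> might look like:</p>

```javascript
// Toy sketch of a fetch guard that refuses obviously-private addresses.
// Real private-range detection must also handle IPv6, alternate IP
// encodings, DNS rebinding, and more; this is only to show the idea.
function assertPubliclyRoutable(url) {
  const host = new URL(url).hostname;
  const privatePatterns = [
    /^localhost$/i,
    /^127\./,      // loopback
    /^10\./,       // RFC 1918
    /^192\.168\./, // RFC 1918
    /^169\.254\./, // link-local
  ];
  if (privatePatterns.some((p) => p.test(host))) {
    throw new TypeError(`fetch() refused non-public address: ${host}`);
  }
  return url;
}
```

            <p>Because reaching internal services requires an explicit binding rather than a guessable URL, tricking application code into fetching an attacker-chosen internal address accomplishes nothing.</p>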
    <div>
      <h3>Always backwards compatible</h3>
      <a href="#always-backwards-compatible">
        
      </a>
    </div>
    <p>Cloudflare Workers has a hard rule against ever breaking a live Worker running in production. This same dedication to backwards compatibility extends to workerd.</p><p>workerd shares <a href="/backwards-compatibility-in-cloudflare-workers/">Workers' compatibility date system</a> to manage breaking changes. Every Worker must be configured with a "compatibility date". The runtime then ensures that the API behaves exactly as it did on that date. At your leisure, you may <a href="https://developers.cloudflare.com/workers/platform/compatibility-dates">check the documentation</a> to see which breaking changes are scheduled to take effect at future dates, and update your code for them. Most such changes are minor, and most code won't need updating. However, you are never obliged to update. Old dates will continue to be supported by newer versions of workerd. It is always safe to update workerd itself without updating your code.</p>
    <div>
      <h2>What it's not</h2>
      <a href="#what-its-not">
        
      </a>
    </div>
    <p>To avoid misleading or disappointing anyone, I need to take a moment to call out what workerd is not.</p>
    <div>
      <h3>workerd is not a Secure Sandbox</h3>
      <a href="#workerd-is-not-a-secure-sandbox">
        
      </a>
    </div>
    <p>It's important to note that workerd is not, on its own, a secure way to run possibly-malicious code. If you wish to run code you don't trust using workerd, you must enclose it in an additional sandboxing layer, such as a virtual machine configured for sandboxing.</p><p>workerd itself is designed such that a Worker should not be able to access any external resources to which it hasn't been granted a capability. However, a complete sandbox solution not only must be designed to restrict access, but also must account for the possibility of bugs – both in software and in hardware. workerd on its own is not sufficient to protect against hardware bugs like Spectre, nor can it adequately defend against the possibility of vulnerabilities in V8 or in workerd's own code.</p><p>The Cloudflare Workers service uses the same code found in workerd, but adds many additional layers of security on top to harden against such bugs. <a href="/mitigating-spectre-and-other-security-threats-the-cloudflare-workers-security-model/">I described some of these in a past blog post</a>. However, these measures are closely tied to our particular environment. For example, we rely on build automation to push V8 patches to production immediately upon becoming available; we separate customers according to risk profile; we rely on non-portable kernel features and assumptions about the host system to enforce security and resource limits. All of this is very specific to our environment, and cannot be packaged up in a reusable way.</p>
    <div>
      <h3>workerd is not an independent project</h3>
      <a href="#workerd-is-not-an-independent-project">
        
      </a>
    </div>
    <p>workerd is the core of Cloudflare Workers, a fast-moving project developed by a dedicated team at Cloudflare. We are not throwing code over the wall to forget about, nor are we expecting volunteers to do our jobs for us. workerd's GitHub repository will be the canonical source used by Cloudflare Workers and our team will be doing much of their work directly in this repository. Just like V8 is developed primarily by the Chrome team for use in Chrome, workerd will be developed primarily by the Cloudflare Workers team for use in Cloudflare Workers.</p><p>This means we cannot promise that external contributions will sit on a level playing field with internal ones. Code reviews take time, and work that is needed for Cloudflare Workers will take priority. We also cannot promise we will accept every feature contribution. Even if the code is already written, reviews and maintenance have a cost. Within Cloudflare, we have a product management team who carefully evaluates what features we should and shouldn't offer, and plenty of ideas generated internally ultimately don't make the cut.</p><p>If you want to contribute a big new feature to workerd, your best bet is to talk to us before you write code, by raising an issue on GitHub early to get input. That way, you can find out if we're likely to accept a PR before you write it. We also might be able to give hints on how best to implement.</p><p>It's also important to note that while workerd's internal interfaces may sometimes appear clean and reusable, we cannot make any guarantee that those interfaces won't completely change on a whim. If you are trying to build on top of workerd internals, you will need to be prepared either to accept a fair amount of churn, or pin to a specific version.</p>
    <div>
      <h3>workerd is not an off-the-shelf edge compute platform</h3>
      <a href="#workerd-is-not-an-off-the-shelf-edge-compute-platform">
        
      </a>
    </div>
    <p>As hinted above, the full Cloudflare Workers service involves a lot of technology beyond workerd itself, including additional security, deployment mechanisms, orchestration, and so much more. workerd itself is a portion of our runtime codebase, which is itself a small (albeit critical) piece of the overall Cloudflare Workers service.</p><p>We are pleased, though, that this means it is possible for us to release this code under a permissive Open Source license.</p>
    <div>
      <h2>Try the Beta</h2>
      <a href="#try-the-beta">
        
      </a>
    </div>
    <p>As of this blog post, workerd is in beta. If you want to try it out, <a href="https://github.com/cloudflare/workerd">find the README on GitHub</a>.</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Open Source]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">4PaKE41nYJr4E5KTs8vven</guid>
            <dc:creator>Kenton Varda</dc:creator>
        </item>
        <item>
            <title><![CDATA[A Workers optimization
that reduces your bill]]></title>
            <link>https://blog.cloudflare.com/workers-optimization-reduces-your-bill/</link>
            <pubDate>Fri, 14 Jan 2022 13:58:51 GMT</pubDate>
            <description><![CDATA[ Recently, we made an optimization to the Cloudflare Workers runtime which reduces the amount of time Workers need to spend in memory. We're passing the savings on to you for all your Unbound Workers ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/73dygAhS9763wLeQh24KQb/e66246cf1145aa84ffe53a598122982d/image2-20.png" />
            
            </figure><p>Recently, we made an optimization to the Cloudflare Workers runtime which reduces the amount of time Workers need to spend in memory. We're passing the savings on to you for all your Unbound Workers.</p>
    <div>
      <h3>Background</h3>
      <a href="#background">
        
      </a>
    </div>
    <p>Workers are often used to implement HTTP proxies, where JavaScript is used to rewrite an HTTP request before sending it on to an origin server, and then to rewrite the response before sending it back to the client. You can implement any kind of rewrite in a Worker, including both rewriting headers and bodies.</p><p>Many Workers, though, do not actually modify the response body, but instead simply allow the bytes to pass through from the origin to the client. In this case, the Worker's application code has finished executing as soon as the response headers are sent, before the body bytes have passed through. Historically, the Worker was nevertheless considered to be "in use" until the response body had fully finished streaming.</p><p>For billing purposes, under the Workers Unbound pricing model, we charge duration-memory (gigabyte-seconds) for the time in which the Worker is in use.</p>
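            <p>The pass-through pattern described above looks something like this (a sketch; the added header name is arbitrary):</p>

```javascript
// A proxy Worker that rewrites a response header but leaves the body
// untouched. Once the response below is returned, no further application
// code runs; the body bytes stream from origin to client without
// re-entering JavaScript. (In a real Worker, this object would be the
// module's default export.)
const proxyWorker = {
  async fetch(request) {
    const originResponse = await fetch(request);
    // Copy the response so its headers are mutable; passing
    // originResponse.body through avoids buffering it in memory.
    const response = new Response(originResponse.body, originResponse);
    response.headers.set("X-Proxied-By", "my-worker");
    return response;
  },
};
```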
    <div>
      <h3>The change</h3>
      <a href="#the-change">
        
      </a>
    </div>
    <p>On December 15-16, we made a change to the way we handle requests whose responses stream through the Worker without the content being modified. This change means that we can mark application code as “idle” as soon as the response headers are returned.</p><p>Since no further application code will execute on behalf of the request, the system does not need to keep the request state in memory – it only needs to track the low-level native sockets and pump the bytes through. So now, during this time, the Worker will be considered idle, and could even be evicted before the stream completes (though this would be unlikely unless the stream lasts for a very long time).</p><p>Visualized, it looks something like this:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7djse9b1v1MjBGgpym4A5F/fd4d66770445f7fa4978b88bf0a50810/DES-4045-Diagram.png" />
            
            </figure><p>As a result of this change, we've seen that the time a Worker is considered "in use" by any particular request has dropped by an average of 70%. Of course, this number varies a lot depending on the details of each Worker. Some may see no benefit, others may see an even larger benefit.</p><p>This change is totally invisible to the application. To any external observer, everything behaves as it did before. But, since the system now considers a Worker to be idle during response streaming, the response streaming time will no longer be billed. So, if you saw a drop in your bill, this is why!</p>
    <div>
      <h3>But it doesn’t stop there!</h3>
      <a href="#but-it-doesnt-stop-there">
        
      </a>
    </div>
    <p>The change also applies to a few other frequently used scenarios, namely WebSocket proxying, reading from the cache, and streaming from KV.</p><p><b>WebSockets</b>: once a Worker has arranged to proxy through a WebSocket, as long as it isn't handling individual messages in Worker code, the Worker does not remain in use during the proxying. The change applies to regular stateless Workers, but not to Durable Objects, which are not usually used for proxying.</p>
            <pre><code>export default {
  async fetch(request: Request) {
    // Do anything before
    const upgradeHeader = request.headers.get('Upgrade')
    if (upgradeHeader === 'websocket') {
      return await fetch(request)
    }
    // Handle other (non-WebSocket) requests here
  }
}</code></pre>
            <p><b>Reading from Cache</b>: If you return the response from a <code>cache.match</code> call, the Worker is considered idle as soon as the response headers are returned.</p>
            <pre><code>export default {
  async fetch(request: Request) {
    let response = await caches.default.match('https://example.com')
    if (response) {
      return response
    }
    // get/create response and put into cache
  }
}</code></pre>
            <p><b>Streaming from KV</b>: Lastly, the change applies when you stream from KV. This one is a bit trickier to get right, because people often retrieve the value from KV as a string or JSON object and then create a response with that value. But if you fetch the value as a stream, as done in the example below, you can create a Response with the ReadableStream directly.</p>
            <pre><code>interface Env {
  MY_KV_NAME: KVNamespace
}

export default {
  async fetch(request: Request, env: Env) {
    const readableStream = await env.MY_KV_NAME.get('hello_world.pdf', { type: 'stream' })
    if (readableStream) {
      return new Response(readableStream, { headers: { 'content-type': 'application/pdf' } })
    }
  },
}</code></pre>
            
    <div>
      <h3>Interested in Workers Unbound?</h3>
      <a href="#interested-in-workers-unbound">
        
      </a>
    </div>
    <p>If you are already using Unbound, your bill will have dropped automatically.</p><p>Now is a great time to check out Unbound if you haven’t already, especially since we’ve recently <a href="/workers-now-even-more-unbound/">removed the egress fees</a>. Unbound allows you to build more complex workloads on our platform and only pay for what you use.</p><p>We are always looking for opportunities to make Workers better. Often that improvement takes the form of powerful new features such as the soon-to-be-released <a href="/introducing-worker-services/">Service Bindings</a> and, of course, <a href="/cloudflare-workers-the-fast-serverless-platform/">performance enhancements</a>. This time, we are delighted to make Cloudflare Workers even cheaper than they already were.</p>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Workers Unbound]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">6zkd9alxKgty9fw7lNjmc9</guid>
            <dc:creator>Kenton Varda</dc:creator>
            <dc:creator>Erwin van der Koogh</dc:creator>
        </item>
        <item>
            <title><![CDATA[Backwards-compatibility in
Cloudflare Workers]]></title>
            <link>https://blog.cloudflare.com/backwards-compatibility-in-cloudflare-workers/</link>
            <pubDate>Tue, 19 Oct 2021 15:20:10 GMT</pubDate>
            <description><![CDATA[ On the Workers team, we have a policy:
A change to the Workers Runtime must never break an application that is live in production. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Cloudflare Workers is our serverless platform that runs your code in 250+ cities worldwide.</p><p>On the Workers team, we have a policy:</p><blockquote><p><i>A change to the Workers Runtime must never break an application that is live in production.</i></p></blockquote><p>It seems obvious enough, but this policy has deep consequences. What if our API has a bug, and some deployed Workers accidentally depend on that bug? Then, seemingly, we can't fix the bug! That sounds… bad?</p><p>This post will dig deeper into our policy, explaining why Workers is different from traditional server stacks in this respect, and how we're now making backwards-incompatible changes possible by introducing "compatibility dates".</p><p><b>TL;DR:</b> Developers may now opt into backwards-incompatible fixes by setting a <a href="https://developers.cloudflare.com/workers/platform/compatibility-dates">compatibility date</a>.</p>
    <div>
      <h2>Serverless demands strict compatibility</h2>
      <a href="#serverless-demands-strict-compatibility">
        
      </a>
    </div>
    <p>Workers is a serverless platform, which means we maintain the server stack for you. You do not have to manage the runtime version, you only manage your own code. This means that when we update the Workers Runtime, we update it for everyone. We do this at least once a week, sometimes more.</p><p>This means that if a runtime upgrade breaks someone's application, it's really bad. The developer didn't make any change, so won't be watching for problems. They may be asleep, or on vacation. If we want people to trust serverless, we can't let this happen.</p><p>This is very different from traditional server platforms, where the developer maintains their own stack. For example, when a developer maintains a traditional VM-based server running Node.js applications, then the developer must decide exactly when to upgrade to a new version of Node.js. Careful developers do not upgrade Node.js 14 to Node.js 16 in production without testing first. They typically verify that their application works in a staging environment before going to production. A developer who doesn't have time to spend testing each new version may instead choose to rely on a long-term support release, applying only low-risk security patches.</p><p>In the old world, if the Node.js maintainers decide to make a breaking change to an obscure API between releases, it's OK. Downstream developers are expected to test their code before upgrading, and address any breakages. But in the serverless world, it's not OK: developers have no control over when upgrades happen, therefore upgrades <i>must never break anything</i>.</p>
    <div>
      <h2>But sometimes we need to fix things</h2>
      <a href="#but-sometimes-we-need-to-fix-things">
        
      </a>
    </div>
    <p>Sometimes, we get things wrong, and we need to fix them. But sometimes, the fix would break people.</p><p>For example, in Workers, the <code>fetch()</code> function is used to make outgoing HTTP requests. Unfortunately, due to an oversight, our original implementation of <code>fetch()</code>, when given a non-HTTP URL, would silently interpret it as HTTP instead. For example, if you did <code>fetch("ftp://example.com")</code>, you'd get the same result as <code>fetch("http://example.com")</code>.</p><p>This is obviously not what we want and could lead to confusion or deeper bugs. Instead, <code>fetch()</code> should throw an exception in these cases. However, we couldn't simply fix the problem, because a surprising number of live Workers depended on the behavior. For whatever reason, some Workers fetch FTP URLs and expect to get a result back. Perhaps they are fetching from sites that support both FTP and HTTP, and they arbitrarily chose FTP and it worked. Perhaps the fetches aren't actually working, but changing a 404 error result into an exception would make things worse. When you have tens of thousands of new developers deploying applications every month, inevitably there's always someone relying on any bug. We can't "fix" the bug because it would break these applications.</p>
    <div>
      <h2>The obvious solutions don't work</h2>
      <a href="#the-obvious-solutions-dont-work">
        
      </a>
    </div>
    
    <div>
      <h3>Could we contact developers and ask them to fix their code?</h3>
      <a href="#could-we-contact-developers-and-ask-them-to-fix-their-code">
        
      </a>
    </div>
    <p>No, because the problem is our fault, not the application developer's, and the developer may not have time to help us fix our problems.</p><p>The fact that a Worker is doing something "wrong" (like using an FTP URL when it should be using HTTP) doesn't necessarily mean the developer did anything wrong. Everyone writes code with bugs. Good developers rely on careful testing to make sure their code does what it is supposed to.</p><p>But what if the test only worked because of a bug in the underlying platform that caused it to do the right thing by accident? Well, that's the platform's fault. The developer did everything they could: they tested their code thoroughly, and it worked.</p><p>Developers are busy people. Nobody likes hearing that they need to drop whatever they are doing to fix a problem in code that they thought worked, especially code that has been working fine for years without anyone touching it. We think developers have enough on their plates already; we shouldn't be adding more work.</p>
    <div>
      <h3>Could we run multiple versions of the Workers Runtime?</h3>
      <a href="#could-we-run-multiple-versions-of-the-workers-runtime">
        
      </a>
    </div>
    <p>No, for three reasons.</p><p>First, in order for edge computing to be effective, we need to be able to <a href="https://www.cloudflare.com/developer-platform/solutions/hosting/">host a very large number of applications</a> in each instance of the Workers Runtime. This is what allows us to run your code in hundreds of locations around the world at minimal cost. If we ran a separate copy of the runtime for each application, we'd need to charge a lot more, or deploy your code to far fewer locations. So, realistically it is infeasible for us to have different Workers asking for different versions of the runtime.</p><p>Second, part of the promise of serverless is that developers shouldn't have to worry about updating their stack. If we start letting people pin old versions, then we have to start telling people how long they are allowed to do so, alerting people about security updates, giving people documentation that differentiates versions, and so on. We don't want developers to have to think about any of that.</p><p>Third, this doesn't actually solve the real problem anyway. We can easily implement multiple behaviors within the same runtime binary. But how do we know which behavior to use for any particular Worker?</p>
    <div>
      <h2>Introducing Compatibility Dates</h2>
      <a href="#introducing-compatibility-dates">
        
      </a>
    </div>
    <p>Going forward, every Worker is assigned a "compatibility date", which must be a date in the past. The date is specified inside the project's metadata (for Wrangler projects, in wrangler.toml). This metadata is passed to the Cloudflare API along with the application code whenever it is updated and deployed. A compatibility date typically starts out as the date when the Worker was first created, but can be updated from time to time.</p>
            <pre><code># wrangler.toml
compatibility_date = "2021-09-20"</code></pre>
            <p>We can now introduce breaking changes. When we do, the Workers Runtime must implement both the old and the new behavior, and chooses behavior based on the compatibility date. Each time we introduce a new change, we choose a date in the future when that change will become the default. Workers with a later compatibility date will see the change; Workers with an older compatibility date will retain the old behavior.</p><p><a href="https://developers.cloudflare.com/workers/platform/compatibility-dates">A page in our documentation</a> lists the history of breaking changes, and only breaking changes. When you wish to update your Worker's compatibility date, you can refer to this page to quickly determine what might be affected, so that you can test for problems.</p><p>We will reserve the compatibility system strictly for changes which cannot be made without causing a breakage. We don't want to force people to update their compatibility date to get regular updates, including new features, non-breaking bug fixes, and so on.</p><p>If you'd prefer never to update your compatibility date, that's OK! Old compatibility dates are intended to be supported forever. However, if you are frequently updating your code, you should update your compatibility date along with it.</p>
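            <p>To illustrate the mechanism, here is a hypothetical sketch (not the runtime's actual code; the function, constant, and cutoff date below are invented for the example) of how a breaking change can be gated on the compatibility date. ISO-format dates compare correctly as plain strings:</p>

```javascript
// Hypothetical sketch of date-gated behavior in a runtime. The cutoff
// date and all names are invented for illustration.
const FETCH_REJECTS_UNKNOWN_PROTOCOLS_DATE = "2021-11-10";

function resolveFetchUrl(url, compatibilityDate) {
  const u = new URL(url);
  if (u.protocol !== "http:" && u.protocol !== "https:") {
    if (compatibilityDate >= FETCH_REJECTS_UNKNOWN_PROTOCOLS_DATE) {
      // New behavior: refuse non-HTTP(S) URLs outright.
      throw new TypeError(`Fetch cannot load ${url}: unsupported protocol`);
    }
    // Old behavior, preserved forever for Workers with earlier dates:
    // silently reinterpret the URL as HTTP.
    u.protocol = "http:";
  }
  return u.href;
}
```

            <p>Both behaviors live in the same runtime binary; the Worker's configured date simply selects which one it sees.</p>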
    <div>
      <h2>Acknowledgement</h2>
      <a href="#acknowledgement">
        
      </a>
    </div>
    <p>While the details are a bit different, we were inspired by Stripe's API versioning, as well as the absolute promise of backwards compatibility maintained by both the Linux kernel system call API and the Web Platform implemented by browsers.</p> ]]></content:encoded>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">6gIxpRNMVPrPdP7cbF6psP</guid>
            <dc:creator>Kenton Varda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Dynamic Process Isolation: Research by Cloudflare and TU Graz]]></title>
            <link>https://blog.cloudflare.com/spectre-research-with-tu-graz/</link>
            <pubDate>Tue, 12 Oct 2021 12:59:31 GMT</pubDate>
            <description><![CDATA[ Cloudflare worked with TU Graz to study the impact of Spectre on Cloudflare Workers and to develop new defenses against it. Today we're publishing a paper about our research. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Last year, <a href="/mitigating-spectre-and-other-security-threats-the-cloudflare-workers-security-model/">I wrote about the Cloudflare Workers security model, including how we fight Spectre attacks</a>. In that post, I explained that there is no known complete defense against Spectre — regardless of whether you're using isolates, processes, containers, or virtual machines to isolate tenants. What we do have, though, is a huge number of tools to increase the cost of a Spectre attack, to the point where it becomes infeasible. Cloudflare Workers has been designed from the very beginning with protection against side channel attacks in mind, and because of this we have been able to incorporate many defenses that other platforms — such as virtual machines and web browsers — cannot. However, the performance and scalability requirements of edge compute make it infeasible to run every Worker in its own private process, so we cannot rely on the usual defenses provided by the operating system kernel and address space separation.</p><p>Given our different approach, we cannot simply rely on others to tell us if we are safe. We had to do our own research. To do this we partnered with researchers at <a href="https://www.tugraz.at/">Graz University of Technology (TU Graz)</a> to study the impact of Spectre on our environment. The team at TU Graz are some of the foremost experts on the topic, having co-discovered Spectre initially as well as discovered several follow-on bugs like NetSpectre, ZombieLoad, Fallout, and others.</p><p>Today we are <a href="https://arxiv.org/abs/2110.04751">publishing a paper</a> describing our findings, authored by Martin Schwarzl, Pietro Borrello, Andreas Kogler, Thomas Schuster, Daniel Gruss, Michael Schwarz, and myself. This paper covers research done in 2019 and early 2020. 
The research both tests the possibility of attacking Workers using Spectre, and proposes a new defense mechanism, which we now employ in production.</p><p>For this research, the team at TU Graz had full access to the Workers Runtime source code and were able to compile and run it locally for testing.</p><p>The research has two basic components.</p>
    <div>
      <h3>Part 1: Develop an attack</h3>
      <a href="#part-1-develop-an-attack">
        
      </a>
    </div>
    <p>A side channel attack (of which Spectre is one variety) is kind of like playing poker with a CPU. In poker, players try to understand what their opponents are thinking by looking for subtle unconscious behaviors, such as a nervous look or a hand motion. These behaviors are called "tells". In a side channel attack, the attacker wants to find out secrets that the CPU knows. The CPU won't reveal these secrets directly, but they can sometimes subtly affect how long the CPU takes to perform certain operations, kind of like a poker tell. If an attacker can carefully time the CPU's actions, they can potentially discover the underlying secrets. Spectre attacks in particular focus on side channels that result from the CPU's use of speculative execution, in which the CPU executes code that it is not yet sure should be executed, and then attempts to roll it back if not. Speculative execution is a particularly potent tool in side channel attacks because it essentially allows the attacker to program custom side channels in speculatively-executed code.</p><p>Many Spectre defenses focus on eliminating the "tells" by trying to prevent the variability in the CPU's timing. This is hard, because CPUs are extremely complex and there are many ways that their timing can be affected. While many specific "tells" have been found and mitigated, there are undoubtedly many more that haven't been disclosed. This has led to a game of whack-a-mole, where researchers continuously find new "tells" while CPU vendors rush out kernel and microcode patches to solve them — often with large performance losses as a side effect.</p><p>In Workers, we have focused on a different approach: preventing the attacker from seeing the "tells". The Workers Runtime is designed to prevent a Worker from measuring its own execution time, as well as to prevent other forms of non-deterministic behavior like multithreading that could be used in place of a timer. 
<a href="/mitigating-spectre-and-other-security-threats-the-cloudflare-workers-security-model/#freezingaspectreattack">I described these techniques in detail in last year's post.</a></p><p>However, this approach can't be perfect as long as Workers are allowed to talk to the rest of the world. A Worker could always communicate with a remote time server to measure time. Such communications will be far less accurate than a local timer, and since the timing differences are extremely small, they will be hard to measure this way. But, by using amplification techniques to improve the strength of the signal, repeating the attack many times and applying statistics, it could still be possible to derive secrets.</p><p>We therefore set out to develop an attack based on this approach. Upon applying the best techniques available to us, we were indeed able to produce a working Spectre variant 1 attack that could leak memory at a rate of 120 bits per hour. Compared to attacks demonstrated on many other platforms, 120 bits per hour is pretty slow. However, it's obviously still fast enough to be a problem.</p><p>It's important to note, though, that this speed was achieved in an ideal scenario:</p><ul><li><p>Since the Workers Runtime prevents Workers from measuring their own execution time, any attack would need to rely on a remote time server. But for the purpose of our test, the "remote" server was in fact located on the same machine. In a real-world scenario, such a server would need to be accessed over the Internet, making the timing less accurate.</p></li><li><p>The machine running the test had no other load. A real-world machine would be processing hundreds or thousands of requests concurrently, creating noise.</p></li><li><p>The attack only demonstrated that it could read <i>some</i> bits that it shouldn't. 
In order to read interesting bits, an attacker would first need to locate those bits, which likely would require reading hundreds or thousands of other bits first.</p></li></ul><p>In the real world, these factors appear to make an attack too slow to be interesting. If an attack takes days or weeks to carry out, the contents of memory are highly likely to change before it can read them. For example, we update the Workers Runtime code at least once a week, which causes a restart of all processes.</p><p>That said, we did not feel comfortable relying on this argument as our defense. Instead, we set out to do better.</p>
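<p>The amplification idea above can be sketched numerically. The following is a self-contained illustration of the statistical principle only (the numbers are made up; no real timing or Spectre machinery is involved): a per-measurement signal 100x smaller than the noise becomes clearly distinguishable once enough repetitions are averaged, because the standard error of the mean shrinks with the square root of the sample count.</p>

```javascript
// Illustrative sketch, NOT the actual attack: recovering a tiny "timing
// difference" that is far below the per-measurement noise floor, by
// repeating the measurement many times and averaging.
function noisyMeasure(secretBit) {
  const signal = secretBit ? 0.01 : 0;       // tiny difference that encodes the bit
  const noise = (Math.random() - 0.5) * 2;   // noise roughly 100x larger than the signal
  return signal + noise;
}

function guessBit(secretBit, samples) {
  let sum = 0;
  for (let i = 0; i < samples; i++) sum += noisyMeasure(secretBit);
  // With enough samples the mean converges toward 0.01 or 0, so a
  // threshold halfway between the two hypotheses separates them.
  return sum / samples > 0.005 ? 1 : 0;
}

// A single measurement is useless; a million averaged ones are not.
console.log(guessBit(1, 1_000_000)); // very likely prints 1
console.log(guessBit(0, 1_000_000)); // very likely prints 0
```

<p>This is also why forcing an attacker to spread out trials (as described in Part 2) is effective: the required sample count stays the same, so a lower repetition rate directly multiplies the wall-clock time of the attack.</p>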
    <div>
      <h3>Part 2: Enhance our defenses</h3>
      <a href="#part-2-enhance-our-defenses">
        
      </a>
    </div>
    <p>In the second part of the research, we designed and implemented a novel Spectre defense which we call Dynamic Process Isolation.</p><p><a href="/mitigating-spectre-and-other-security-threats-the-cloudflare-workers-security-model/#step2dynamicprocessisolation">Dynamic Process Isolation was described in my blog post last year.</a> At the time, this system was still in testing, but it has since been fully deployed in production.</p><p>In short, our defense uses hardware performance counters to detect Workers whose performance characteristics could be indicative of an attack. Before the attack has had enough time to leak any bits, we move the Worker into a separate operating system process, thus taking advantage of the additional defenses implemented by the OS kernel. Crucially, since a benign Worker can still operate normally while in an isolated process, we are able to use a detector that produces false positives, as long as the rate is relatively low. This affordance made it possible for us to develop a working classifier where previous work in the area had struggled.</p><p>Specifically, we developed a detector based on measuring branch mispredictions. Spectre variant 1 attacks — the fastest and easiest kind of Spectre attack — work by fooling the CPU's branch predictor to trigger speculative code execution. Such an attack, when running in our environment, must trigger repeated mispredictions in a loop, in order to get enough data to apply statistics to overcome the noise floor. We can see these mispredictions in the hardware performance counters. While an attack could try to evade the detector by spreading out its trials over a longer time period, doing so would slow down the attack by orders of magnitude, which is exactly our goal. 
Classifiers for other Spectre variants might be straightforward to build as well; however, we find that other variants already produce much lower bandwidth or are otherwise effectively mitigated by our existing defenses.</p><p>This defense successfully detects and mitigates the attack we developed. We also tested it against a number of Spectre proofs of concept and found it caught all of them. Meanwhile, the rate of false positives is well within the range we can tolerate: out of many thousands of Workers running on our platform, we see only about 20 being falsely detected as attacks.</p><p>For more details, check out <a href="https://arxiv.org/abs/2110.04751">the paper</a> and <a href="/mitigating-spectre-and-other-security-threats-the-cloudflare-workers-security-model/#step2dynamicprocessisolation">my blog post from last year</a>.</p>
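<p>As a toy illustration of the detection policy (with purely hypothetical numbers and field names; the real detector reads hardware performance counters, which JavaScript cannot do), the classification step amounts to a threshold on the branch misprediction rate:</p>

```javascript
// Hypothetical sketch of the detection policy. A per-Worker counter sample
// is classified by its branch misprediction rate; Workers above the
// threshold get moved into their own private OS process. False positives
// are acceptable because isolated Workers still run normally, just with
// the extra (slower) process-level defenses applied.
function classify(sample, threshold = 0.05) {
  const rate = sample.branchMispredictions / sample.branchesRetired;
  return rate > threshold ? "isolate-in-private-process" : "shared-process";
}

console.log(classify({ branchMispredictions: 4, branchesRetired: 1000 }));
// → "shared-process"
console.log(classify({ branchMispredictions: 900, branchesRetired: 1000 }));
// → "isolate-in-private-process"
```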
    <div>
      <h3>Read the Paper</h3>
      <a href="#read-the-paper">
        
      </a>
    </div>
    <p>Collaborating with TU Graz was a great experience. We are very happy to work with some of the world's foremost experts on this problem, and to have produced not just an attack but also a constructive defense.</p><p><a href="https://arxiv.org/abs/2110.04751">For more details, download the full paper on arXiv.</a></p> ]]></content:encoded>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">5gY4CnWe7Qn8Iwb4FQrpo0</guid>
            <dc:creator>Kenton Varda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Durable Objects: Easy, Fast, Correct — Choose three]]></title>
            <link>https://blog.cloudflare.com/durable-objects-easy-fast-correct-choose-three/</link>
            <pubDate>Tue, 03 Aug 2021 13:24:44 GMT</pubDate>
            <description><![CDATA[ When multiple clients access the same storage concurrently, race conditions abound. Durable Objects can make it easier. We recently rolled out improvements to Durable Objects that automatically correct many common race conditions while actually making your code faster.  ]]></description>
<content:encoded><![CDATA[ <p>Storage in distributed systems is surprisingly hard to get right. Distributed databases and consensus are well-known to be extremely hard to build. But application code isn't necessarily easy to get right, either. There are many ways in which apps that use databases can have subtle timing bugs that could result in inconsistent results, or even data loss. Worse, these problems can be very hard to test for, as they'll often manifest only under heavy load, or only after a sudden machine failure.</p><p>Up until recently, Durable Objects were no exception. A Durable Object is a special kind of Cloudflare Worker that has access to persistent storage and processes requests in one of Cloudflare’s points of presence. Each Object has its own private storage, accessible through a classical key/value storage API. Like any classical database API, this storage API had to be used carefully to avoid possible race conditions and data loss, especially when performance mattered. And as with any classical database API, many apps got it wrong.</p><p>However, rather than fix the apps, we decided to fix the model. Last month, we rolled out deep changes to the Durable Objects runtime such that many applications which previously contained subtle race conditions are now correct by default, and many that were previously slow are now fast. Developers can now write their code in an intuitive way, and have it work. No changes at all are needed to your code in order to take advantage of these new features.</p><p>So, let me tell you about what changed…</p>
    <div>
      <h2>Background: Durable Objects are Single-Threaded</h2>
      <a href="#background-durable-objects-are-single-threaded">
        
      </a>
    </div>
    <p>To understand what changed, it's necessary to first understand Durable Objects. For a full introduction, see the <a href="/introducing-workers-durable-objects/">Durable Objects announcement blog post</a>.</p><p>The most important point is: Each Durable Object runs in exactly one location, in one single thread, at a time. Each object has its own private on-disk storage. This is a very different situation from a typical database, where many clients may be accessing the same data. In Durable Objects, any particular piece of data belongs to exactly one thread at a time.</p><p>Because a single Durable Object is single-threaded, it's possible, and even encouraged, to keep state and perform synchronization in memory. This is, indeed, the killer feature of Durable Objects. With classical databases, in-memory state is extremely difficult to keep synchronized between all database clients. But with Durable Objects, since each piece of data belongs to a specific thread, this synchronization is easy.</p><p>However, interacting with the disk is still an I/O (input/output) operation, which means that each operation returns a <code>Promise</code> which you must <code>await</code>. As we'll see, this re-introduces some of the synchronization difficulties that we were trying to avoid. However, it turns out, we can solve these difficulties within the system itself, without bothering application developers.</p>
    <div>
      <h2>An Example</h2>
      <a href="#an-example">
        
      </a>
    </div>
    <p>Consider this code:</p>
            <pre><code>// Used to be slow and racy -- but not anymore!
async function getUniqueNumber() {
  let val = await this.storage.get("counter");
  await this.storage.put("counter", val + 1);
  return val;
}</code></pre>
            <p>At first glance, this seems like reasonable code that returns a unique number each time it is called (incrementing each time).</p><p>Unfortunately, before now, this code had two problems:</p><ol><li><p>It had a subtle race condition (even though Durable Objects are single-threaded!).</p></li><li><p>It was kind of slow.</p></li></ol>
    <div>
      <h3>The Race Condition</h3>
      <a href="#the-race-condition">
        
      </a>
    </div>
    <p>A race condition occurs when two operations running concurrently might interfere with each other in a way that makes them behave incorrectly. Race conditions are commonly associated with code that uses multiple threads.</p><p>JavaScript, however, famously does not use threads. Instead, it uses event-driven programming, with callbacks. It's not possible for two pieces of JavaScript code to be running "at the same time" in the same isolate (and Durable Objects promises that no other isolate could possibly be accessing the same storage). Does that mean that race conditions aren't a problem in JavaScript, the way they are in multi-threaded apps?</p><p>Unfortunately, it does not. The problem is, the code above is an <code>async</code> function, containing two <code>await</code> statements. Each time <code>await</code> is used, execution pauses, waiting for the specified <code>Promise</code> to complete.</p><p>In the meantime, though, other code can run! For example, the Durable Object might receive two requests at the same time. If each of them calls <code>getUniqueNumber()</code>, then the two calls might be interleaved. Each time one call performs an <code>await</code>, execution may switch to the other call. So, the two calls might end up looking like this:</p><p></p>
<div><table><thead>
  <tr>
    <th>Request 1 timeline</th>
    <th>Request 2 timeline</th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>async</span><span> </span><span>function</span><span> </span><span>getUniqueNumber</span><span>()</span><span> </span><span>{</span><span>   </span><span>let</span><span> val </span><span>=</span><span> </span><span>await</span><span> </span><span>this</span><span>.</span><span>storage</span><span>.</span><span>get</span><span>(</span><span>"counter"</span><span>);</span></td>
    <td><span>   </span></td>
  </tr>
  <tr>
    <td><span>   </span></td>
    <td><span>async</span><span> </span><span>function</span><span> </span><span>getUniqueNumber</span><span>()</span><span> </span><span>{</span><span>   </span><span>let</span><span> val </span><span>=</span><span> </span><span>await</span><span> </span><span>this</span><span>.</span><span>storage</span><span>.</span><span>get</span><span>(</span><span>"counter"</span><span>);</span></td>
  </tr>
  <tr>
    <td><span>  </span><span>await</span><span> </span><span>this</span><span>.</span><span>storage</span><span>.</span><span>put</span><span>(</span><span>"counter"</span><span>,</span><span> val </span><span>+</span><span> </span><span>1</span><span>);</span></td>
    <td><span> </span></td>
  </tr>
  <tr>
    <td><span> </span></td>
    <td><span>  </span><span>await</span><span> </span><span>this</span><span>.</span><span>storage</span><span>.</span><span>put</span><span>(</span><span>"counter"</span><span>,</span><span> val </span><span>+</span><span> </span><span>1</span><span>);</span></td>
  </tr>
  <tr>
    <td><span>  </span><span>return</span><span> val</span><span>;</span><span> </span><span>}</span></td>
    <td><span>   </span></td>
  </tr>
  <tr>
    <td><span>   </span></td>
    <td><span>  </span><span>return</span><span> val</span><span>;</span><span> </span><span>}</span></td>
  </tr>
</tbody></table></div><p>There's a big problem here: Both of these two calls will call <code>get("counter")</code> before <i>either</i> of them calls <code>put("counter", val + 1)</code>. That means, both of them will return the same value!</p><p>This problem is especially bad because it only happens when multiple requests are being handled at the same time -- and even then, only sometimes. It is very hard to test for this kind of problem, and everything might seem just fine when the application is deployed, as long as it isn't getting too much traffic. But one day, when a lot of visitors try to use the same object at the same time, all of a sudden <code>getUniqueNumber()</code> starts returning duplicates!</p>
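<p>The interleaving above is easy to reproduce outside of Durable Objects. Here is a small self-contained sketch using a hypothetical in-memory mock of the storage API (the mock and its names are illustrative, not the real API). Two "requests" arriving at the same time both perform their <code>get()</code> before either <code>put()</code>, so both receive the same number:</p>

```javascript
// Minimal mock of an async key/value storage API (hypothetical; stands in
// for this.storage) to reproduce the interleaving described above.
function makeMockStorage() {
  const data = new Map();
  return {
    // Each operation yields to the event loop once, like real I/O would.
    async get(key) { await Promise.resolve(); return data.get(key); },
    async put(key, value) { await Promise.resolve(); data.set(key, value); },
  };
}

const storage = makeMockStorage();

async function getUniqueNumber() {
  // "?? 0" just lets the mock start from an empty store.
  let val = (await storage.get("counter")) ?? 0;
  await storage.put("counter", val + 1);
  return val;
}

// Two concurrent calls: both get() before either put().
const results = Promise.all([getUniqueNumber(), getUniqueNumber()]);
results.then(([a, b]) => console.log(a, b)); // prints "0 0" -- a duplicate!
```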
    <div>
      <h3>The Slowness</h3>
      <a href="#the-slowness">
        
      </a>
    </div>
    <p>To add insult to injury, <code>getUniqueNumber()</code> was (until recently) pretty slow. The problem is, it has to do two round trips to storage -- a <code>get()</code> and a <code>put()</code>. The <code>get()</code> might typically take a couple milliseconds. The <code>put()</code>, however, will take much longer, probably tens of milliseconds.</p><p>Why is <code>put()</code> so slow? Because we don't want to lose data. The worst thing an application can do is tell the user that their action was successful when it wasn't. If, for some reason, a write cannot be completed, then it's imperative that the application presents an error to the user, so that the user knows that something is wrong and they'll have to try again or look for a fix.</p><p>In order to make sure an application does not prematurely report success to the user, <code>await put()</code> has to make sure it doesn't return until the data is actually safe on disk. Disks are slow, so this might take a while.</p><p>But that's not all. Disks can fail. In order for the data to be really safe, we have to write the same data on multiple disks, in multiple machines. That means we have to wait for some network traffic.</p><p>But that's still not all. What if a meteor were to come out of the sky and land on a Cloudflare data center, completely destroying it? Or, more likely, what if the power or network connection failed? We don't want a user's data to be lost in this case, or even temporarily become unavailable. Therefore, Durable Object data is replicated to multiple Cloudflare locations. This requires communicating across long distances before any write can be confirmed. There is little we can do to make this faster, the speed of light being what it is.</p><p>A call to <code>getUniqueNumber()</code> will therefore always take tens of milliseconds. If an application calls it multiple times, <code>await</code>ing each call before beginning the next, it can easily become very slow very quickly. 
Or, at least, that was the case before our recent changes.</p>
    <div>
      <h2>The Wrong Fixes</h2>
      <a href="#the-wrong-fixes">
        
      </a>
    </div>
    <p>There are several ways that an application could fix these problems, but all of them have their own issues.</p>
    <div>
      <h3>Transactions?</h3>
      <a href="#transactions">
        
      </a>
    </div>
    <p>Many databases offer "transactions". A transaction allows an application to make sure some operation completes "atomically", with no interference from concurrent operations.</p><p>The Durable Objects storage API has always supported transactions. We could use them to fix our <code>getUniqueNumber()</code> implementation like so:</p>
            <pre><code>// No more race condition... but slow and complicated.
async function getUniqueNumber() {
  let val;
  await this.storage.transaction(async (txn) =&gt; {
    val = await txn.get("counter");
    await txn.put("counter", val + 1);
  });
  return val;
}</code></pre>
            <p>This fixes our race condition. Now, if <code>getUniqueNumber()</code> is called multiple times concurrently such that the storage operations interleave, the system will detect the problem. One of the concurrent calls will be chosen to be the "winner", and will complete normally. The other calls will be canceled and retried, so that they can see the value written by the first call.</p><p>This fixes our problems! But, at some cost:</p><ul><li><p><code>getUniqueNumber()</code> is now even slower than it was before. The difference typically won't be huge, but setting up a transaction does require some additional coordination in the database. Of course, if the transaction needs to be retried, then it may end up being much slower. And retries will tend to happen more when load gets high… the worst possible time.</p></li><li><p>Speaking of retries, many developers might not realize that the transaction callback can be called multiple times. It's difficult to test for this, since retries will only happen when concurrent operations cause conflicts. The problem is especially acute when the application is trying to synchronize not just on-disk state, but also in-memory state -- if the transaction callback modifies in-memory state, it must be careful to ensure that its changes are idempotent. The need for idempotency may not be top of mind for most developers, and tests won't catch the problem, making it very easy to end up deploying buggy code.</p></li></ul><p>So we solved our problem, but we did it with a foot-gun. If we keep using the foot-gun, we're probably going to shoot our own feet eventually.</p><p>Is there another way?</p>
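<p>To make the retry behavior concrete, here is a hedged sketch of how an optimistic transaction like this can behave, built on a hypothetical in-memory mock (this is not how Durable Objects implements transactions; it just models "buffer the writes, detect a conflicting commit, then re-run the callback"):</p>

```javascript
// Hypothetical optimistic-transaction mock. Each transaction notes the
// store's version when it starts and buffers its writes; if another
// transaction committed in the meantime, the buffered writes are discarded
// and the callback runs again -- which is why callbacks must be idempotent.
function makeMockStorage() {
  const data = new Map();
  let version = 0;
  return {
    async transaction(callback) {
      for (;;) {
        const seen = version;        // version observed at start
        const writes = new Map();    // buffered writes
        const txn = {
          async get(key) {
            await Promise.resolve();
            return writes.has(key) ? writes.get(key) : data.get(key);
          },
          async put(key, value) { await Promise.resolve(); writes.set(key, value); },
        };
        await callback(txn);
        if (version === seen) {      // no conflicting commit: apply writes
          for (const [k, v] of writes) data.set(k, v);
          version++;
          return;
        }
        // Conflict: another transaction committed first. Retry the callback.
      }
    },
  };
}

const storage = makeMockStorage();

async function getUniqueNumber() {
  let val;
  await storage.transaction(async (txn) => {
    val = (await txn.get("counter")) ?? 0;
    await txn.put("counter", val + 1);
  });
  return val;
}

const results = Promise.all([getUniqueNumber(), getUniqueNumber()]);
results.then(([a, b]) => console.log(a, b)); // two distinct numbers: 0 and 1
```

<p>Note that one of the two concurrent callbacks runs twice here, which is exactly the idempotency hazard described above.</p>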
    <div>
      <h3>In-memory caching?</h3>
      <a href="#in-memory-caching">
        
      </a>
    </div>
    <p>Durable Objects' superpower is their in-memory state. Each object has only one active instance at any particular time. All requests sent to that object are handled by that same instance. That means, you can store some state in memory.</p>
            <pre><code>// Much faster! But (used to be) wrong.
async function getUniqueNumber() {
  if (this.val === undefined) {
    this.val = await this.storage.get("counter");
  }

  let result = this.val;
  ++this.val;
  this.storage.put("counter", this.val);
  return result;
}</code></pre>
            <p>This code is MUCH faster than the previous implementation, because it stores the value in memory. In fact, after the function runs once, further calls won't wait for any I/O at all -- they will return immediately. This is because by caching the value in memory, we avoid waiting for a <code>get()</code> (except for the first time), and we don't wait for the <code>put()</code> either, trusting that it will complete asynchronously later on.</p><p>Returning immediately also means that there's no opportunity for concurrency, so the calls that return immediately will always return unique numbers! This means that not only is this implementation faster than our original implementation, it is also <i>more correct</i>. This is only possible because the Durable Objects platform guarantees that there will only be one instance, and therefore only one copy of <code>this.val</code>.</p><p>Unfortunately, there are two problems with this code:</p><ul><li><p>We still have a race condition on initialization. If the <i>first</i> two calls to <code>getUniqueNumber()</code> happen to occur at about the same time, then initialization will be performed multiple times. The second call will likely clobber what the first call did, and the two calls will end up returning the same number. We could solve this problem by making initialization more complicated -- the first call could create an initialization promise, and other concurrent calls could wait on it, so that initialization really only happens once. But this creates even deeper complexity: What if initialization fails for some reason? The object could be placed in a permanently broken state. It's possible to get this right, but it's surprisingly tricky.</p></li><li><p>Because we don't wait for the <code>put()</code> to report success, it's possible that it could be silently lost. 
For example, if the machine hosting the Durable Object suffered a sudden power failure, then the Durable Object would be transferred to some other machine. When it starts up there, calls to <code>getUniqueNumber()</code> might return numbers that had already been returned under the old instance before it failed, because the <code>put()</code>s hadn't actually completed before the failure occurred. But if we <code>await</code> the <code>put()</code>, then our function becomes slow again, and creates more opportunities for race conditions (e.g. in the calling code).</p></li></ul>
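<p>The initialization-promise fix alluded to above can be sketched as follows (a hypothetical <code>Counter</code> class against a mock storage object, not real Durable Objects code). Note the <code>catch</code> handler that clears the promise so a failed initialization can be retried, rather than leaving the object permanently broken:</p>

```javascript
// Hypothetical mock storage standing in for this.storage.
const data = new Map();
const storage = {
  async get(key) { return data.get(key); },
  async put(key, value) { data.set(key, value); },
};

class Counter {
  constructor(storage) {
    this.storage = storage;
    this.initPromise = null;  // set once initialization begins
    this.val = undefined;
  }

  ensureInitialized() {
    if (this.initPromise === null) {
      // The first caller starts initialization; concurrent callers await
      // the same promise, so the get() really happens only once.
      this.initPromise = this.storage.get("counter")
        .then((v) => { this.val = v ?? 0; })
        .catch((err) => {
          this.initPromise = null;  // allow a later call to retry
          throw err;
        });
    }
    return this.initPromise;
  }

  async getUniqueNumber() {
    await this.ensureInitialized();
    const result = this.val;
    ++this.val;
    this.storage.put("counter", this.val);  // still not awaited -- still lossy
    return result;
  }
}

const counter = new Counter(storage);
const results = Promise.all([counter.getUniqueNumber(), counter.getUniqueNumber()]);
results.then(([a, b]) => console.log(a, b)); // prints "0 1"
```

<p>This fixes the initialization race, but as the comment notes, the un-<code>await</code>ed <code>put()</code> problem remains.</p>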
    <div>
      <h2>Our answer: Make it automatic</h2>
      <a href="#our-answer-make-it-automatic">
        
      </a>
    </div>
    <p>When looking at this, we had two options:</p><ol><li><p>Try to carefully document these problems and educate developers about them, so that they could write code that does the right thing.</p></li><li><p>Change the system so that naturally-written code just does the right thing by default -- and runs quickly.</p></li></ol><p>We chose option 2. We accomplished this in three parts.</p>
    <div>
      <h3>Part 1: Input Gates</h3>
      <a href="#part-1-input-gates">
        
      </a>
    </div>
    <p>Let's go back to our original example. Can we make this example "just work", even in the face of concurrent requests?</p>
            <pre><code>// Can this "just work" please?
async function getUniqueNumber() {
  let val = await this.storage.get("counter");
  await this.storage.put("counter", val + 1);
  return val;
}</code></pre>
            <p>It turns out we can! We create a new rule:</p><blockquote><p><b>Input gates:</b> While a storage operation is executing, no events shall be delivered to the object except for storage completion events. Any other events will be deferred until such a time as the object is no longer executing JavaScript code <i>and</i> is no longer waiting for any storage operations. We say that these events are waiting for the "input gate" to open.</p></blockquote><p>If we do this, then our storage operations above are no longer an opportunity for concurrency. Our concurrent requests now look like this:</p>
<div><table><thead>
  <tr>
    <th>Request 1 timeline</th>
    <th>Request 2 timeline</th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>async</span><span> </span><span>function</span><span> </span><span>getUniqueNumber</span><span>()</span><span> </span><span>{</span><span>   </span><span>let</span><span> val </span><span>=</span><span> </span><span>await</span><span> </span><span>this</span><span>.</span><span>storage</span><span>.</span><span>get</span><span>(</span><span>"counter"</span><span>);</span></td>
    <td><span>   </span></td>
  </tr>
  <tr>
    <td><span>   </span></td>
    <td><span>// Request 2 delivery is blocked because</span><span> </span><span>// request 1 is waiting for storage.</span></td>
  </tr>
  <tr>
    <td><span>  </span><span>await</span><span> </span><span>this</span><span>.</span><span>storage</span><span>.</span><span>put</span><span>(</span><span>"counter"</span><span>,</span><span> val </span><span>+</span><span> </span><span>1</span><span>);</span></td>
    <td><span> </span></td>
  </tr>
  <tr>
    <td><span>   </span></td>
    <td><span>// Request 2 delivery is blocked because</span><span> </span><span>// request 1 is waiting for storage.</span></td>
  </tr>
  <tr>
    <td><span>  </span><span>return</span><span> val</span><span>;</span><span> </span><span>}</span></td>
    <td><span>   </span></td>
  </tr>
  <tr>
    <td><span>         </span></td>
    <td><span>async</span><span> </span><span>function</span><span> </span><span>getUniqueNumber</span><span>()</span><span> </span><span>{</span><span>   </span><span>let</span><span> val </span><span>=</span><span> </span><span>await</span><span> </span><span>this</span><span>.</span><span>storage</span><span>.</span><span>get</span><span>(</span><span>"counter"</span><span>);</span><span>   </span><span>await</span><span> </span><span>this</span><span>.</span><span>storage</span><span>.</span><span>put</span><span>(</span><span>"counter"</span><span>,</span><span> val </span><span>+</span><span> </span><span>1</span><span>);</span><span>   </span><span>return</span><span> val</span><span>;</span><span> </span><span>}</span></td>
  </tr>
</tbody></table></div><p>The two calls return unique numbers, as expected. Hooray! (Unfortunately, we did it by delaying the second request, creating latency and reducing throughput -- but we'll address that in part 3, below.)</p><p>Note that our rule does not preclude making multiple concurrent requests to storage at the same time. You can still say:</p>
            <pre><code>let promise1 = this.storage.get("foo");
let promise2 = this.storage.put("bar", 123);
await promise1;
frob();
await promise2;</code></pre>
            <p>Here, the <code>get()</code> and <code>put()</code> execute concurrently. Moreover, the call to <code>frob()</code> may execute before the <code>put()</code> has completed (but strictly after the <code>get()</code> completes, since we <code>await</code>ed that promise). However, no <i>other</i> event -- such as receiving a new request -- can unexpectedly happen in the meantime.</p><p>On the other hand, the rule protects you not just against concurrent incoming requests, but also concurrent responses to outgoing requests. For example, say you have:</p>
            <pre><code>async function task1() {
  await fetch("https://example.com/api1");
  return await this.getUniqueNumber();
}
async function task2() {
  await fetch("https://example.com/api2");
  return await this.getUniqueNumber();
}
let promise1 = task1();
let promise2 = task2();
let val1 = await promise1;
let val2 = await promise2;</code></pre>
            <p>This code launches two <code>fetch()</code> calls concurrently. After each fetch completes, <code>getUniqueNumber()</code> is invoked. Could the two calls interfere with each other?</p><p>No, they will not. The completion of a <code>fetch()</code> is itself a kind of event. Our rule states that such events cannot be delivered while storage events are in progress. When the first of the two fetches returns, the app calls <code>getUniqueNumber()</code>, which starts performing some storage operations. If the second <code>fetch()</code> also returns while these storage operations are still outstanding, that return will be deferred until after the storage operations are done. Once again, our code ends up correct!</p><p>At this point, the async programming experts in the audience are probably starting to feel like something is fishy here. Indeed, there is a catch. What if we do:</p>
            <pre><code>// Still a problem even with input gates.
let promise1 = getUniqueNumber();
let promise2 = getUniqueNumber();
let val1 = await promise1;
let val2 = await promise2;</code></pre>
            <p>In this case, there is, in fact, a problem. Two calls to <code>getUniqueNumber()</code> are initiated <i>by the same event</i>. The application does not <code>await</code> the first call before starting the second, so the two calls end up running concurrently. Our special rule doesn't protect us here, because there is no incoming event that can be deferred between when the two calls are made. From the system's point of view, there's no way to distinguish this code from code which legitimately decided to perform two storage operations in parallel.</p><p>As such, in this case, the two calls to <code>getUniqueNumber()</code> will interfere with each other. However, this problem is far less likely to come about by accident, and is far easier to catch in testing. This bug is <i>deterministic</i>, not caused by the unpredictable timing of network events. We consider this an acceptable caveat in order to solve the larger problem posed by concurrent requests.</p>
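<p>The observable effect of the rule on our example can be modeled with a drastically simplified gate that serializes event delivery. This sketch is illustrative only: the real input gate lives inside the Workers Runtime and is finer-grained, holding events back only during storage operations and while JavaScript runs, so that non-storage waits like <code>fetch()</code> can still overlap.</p>

```javascript
// Crude model of an input gate: each delivered event runs to completion
// (including all of its storage waits) before the next event is delivered.
class InputGate {
  constructor() { this.idle = Promise.resolve(); }
  deliverEvent(handler) {
    const result = this.idle.then(() => handler());
    this.idle = result.catch(() => {});  // keep the chain alive on failure
    return result;
  }
}

// Same hypothetical mock storage as before.
const data = new Map();
const storage = {
  async get(key) { await Promise.resolve(); return data.get(key); },
  async put(key, value) { await Promise.resolve(); data.set(key, value); },
};

async function getUniqueNumber() {
  const val = (await storage.get("counter")) ?? 0;
  await storage.put("counter", val + 1);
  return val;
}

const gate = new InputGate();
const results = Promise.all([
  gate.deliverEvent(getUniqueNumber),
  gate.deliverEvent(getUniqueNumber),
]);
results.then(([a, b]) => console.log(a, b));
// prints "0 1" -- each event now sees the previous event's write
```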
    <div>
      <h3>Part 2: Output Gates</h3>
      <a href="#part-2-output-gates">
        
      </a>
    </div>
    <p>Let's go back to our in-memory caching example. Can we make it work?</p>
            <pre><code>// Can we make this "just work"?
async function getUniqueNumber() {
  if (this.val === undefined) {
    this.val = await this.storage.get("counter");
  }

  let result = this.val;
  ++this.val;
  this.storage.put("counter", this.val);
  return result;
}</code></pre>
            <p>With input gates (part 1), we've solved one of the two problems this code had: the race condition of initialization. We no longer need to worry that two requests will call this at the same time, leading <code>this.val</code> to be initialized twice.</p><p>However, the problem with not <code>await</code>ing the <code>put()</code> is still there. If we don't <code>await</code> it, then we could lose data. If we do <code>await</code> it, then the call is slow.</p><p>We make another new rule:</p><blockquote><p><b>Output gates:</b> When a storage write operation is in progress, any new outgoing network messages will be held back until the write has completed. We say that these messages are waiting for the "output gate" to open. If the write ultimately fails, the outgoing network messages will be discarded and replaced with errors, while the Durable Object will be shut down and restarted from scratch.</p></blockquote><p>With this rule, we no longer have to <code>await</code> the result of <code>put()</code>. Our code can happily continue executing and just <i>assume</i> the put() will succeed. If the <code>put()</code> doesn't succeed, then anything the application does here will never be observable to the rest of the world anyway. For example, if the app prematurely sends a response to the user saying that the operation succeeded, this response will not actually be delivered until after the <code>put()</code> completes successfully. So, by the time the user <i>receives</i> the message, it is no longer "premature"! In the very rare event that the write operation fails, the user will not receive the premature confirmation at all.</p><p>Note that output gates apply not only to responses sent back to a client, but also to new outgoing requests made with <code>fetch()</code> -- those requests will be delayed from being sent until all prior writes are confirmed. 
So, once again, it is impossible for anything else in the world to observe a premature confirmation.</p><p>With this change, our <code>getUniqueNumber()</code> implementation with in-memory caching is now fully correct, while retaining most of its speed advantage over the non-caching implementation. Except for the very first call, the application will never be blocked waiting for <code>getUniqueNumber()</code> to finish. The final response from the app to the client will be delayed pending write confirmation, but that write can be performed in parallel with any writes the application performs after <code>getUniqueNumber()</code> completes.</p>
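            <p>The mechanism can be sketched in a few lines. The following is a toy model of the output-gate idea in plain JavaScript; the class and method names are invented for illustration, and the real implementation (including the discard-and-restart path on write failure) lives inside the Workers runtime:</p>

```javascript
// Toy model of an output gate: outgoing messages are held until every
// previously-initiated storage write has settled. Names are invented;
// error handling (discarding messages on write failure) is omitted.
class OutputGate {
  constructor() { this.pendingWrites = []; }

  // Record a write without forcing the caller to await it.
  trackWrite(writePromise) {
    this.pendingWrites.push(writePromise);
    return writePromise;
  }

  // Release a message only after all tracked writes have completed.
  // In a real system, this is the point where bytes leave the machine.
  async send(message) {
    await Promise.all(this.pendingWrites);
    return message;
  }
}

// Usage: the handler "responds" immediately; the response is only
// released once the (slow, simulated) write confirms.
const gate = new OutputGate();
let confirmed = false;
gate.trackWrite(new Promise(r => setTimeout(() => { confirmed = true; r(); }, 20)));
gate.send("OK").then(msg => console.log(msg, "write confirmed:", confirmed));
```

            <p>The application code never awaits the write, yet no external observer can see the response before the write has settled.</p>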
    <div>
      <h3>Part 3: Automatic in-memory caching</h3>
      <a href="#part-3-automatic-in-memory-caching">
        
      </a>
    </div>
    <p>Our in-memory caching example now works great. But, it's still a little bit complicated and unnatural to write. Let's go back to our original, simple code one more time… can we make it fast by default?</p>
            <pre><code>// Can we make this not just work, but just work FAST?
async function getUniqueNumber() {
  let val = await this.storage.get("counter");
  await this.storage.put("counter", val + 1);
  return val;
}</code></pre>
            <p>The answer to this part is a classic one: we can add automatic caching to the storage layer, just like most operating systems do for disk storage.</p><p>We have rolled out an in-memory caching layer for Durable Objects. This layer keeps up to several megabytes worth of data directly in memory in the process where the object runs.</p><p>When a <code>get()</code> requests a key that is in cache, the operation returns immediately, without even context-switching out of the thread and isolate where the object is hosted. If the key is not in cache, then a storage request will still be needed, but reads complete relatively quickly.</p><p>Better yet, <code>put()</code> requests now always complete "instantaneously". A <code>put()</code> simply writes to cache. We rely on output gates ("part 2", above) to prevent the premature confirmation of writes to any external party. Writes will be coalesced (even if you <code>await</code> them), so that the output gate waits only for O(1) network round trips of latency, not O(n).</p><p>Moreover, because <code>get()</code> and <code>put()</code> now complete instantly in most or all cases, the negative impact of input gates on throughput is largely mitigated, because the gate now spends relatively little time blocked.</p><p>With Durable Objects built-in caching, our simple code is now <i>just as fast</i> as our code that manually implemented in-memory caching. Combined with input and output gates, our code is now simple, fast, <i>and</i> correct, all at the same time.</p>
    <div>
      <h4>Bonus Correctness</h4>
      <a href="#bonus-correctness">
        
      </a>
    </div>
    <p>Our caching layer provides some bonus consistency guarantees, in addition to performance.</p><p>First, writes are automatically coalesced. That is, if you perform multiple <code>put()</code> or <code>delete()</code> operations without <code>await</code>ing them or anything else in between, then the operations are automatically grouped together and stored atomically. In the case of a sudden power failure, after coming back up, either all of the writes will have been stored, or none of them will. For example:</p>
            <pre><code>// Move a value from "foo" to "bar".
let val = await this.storage.get("foo");

this.storage.delete("foo");
this.storage.put("bar", val);
// There's no possibility of data loss, because the delete() and the
// following put() are automatically coalesced into one atomic
// operation. This is true as long as you do not `await` anything
// in between.</code></pre>
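            <p>This coalescing behavior can be modeled with a small sketch. The following is illustrative only (class and method names are invented, and the real engine's batching policy is inferred from the description above): writes issued without an intervening <code>await</code> are flushed to the backend as a single atomic batch.</p>

```javascript
// Toy model of write coalescing: puts issued within one batch -- i.e.
// without awaiting anything in between -- reach the backend as a single
// atomic write. Names are invented for illustration.
class CoalescingCache {
  constructor(backend) {
    this.backend = backend;       // must expose an atomic writeBatch(Map)
    this.dirty = new Map();
    this.flushScheduled = false;
  }

  put(key, value) {
    this.dirty.set(key, value);
    if (!this.flushScheduled) {
      this.flushScheduled = true;
      // Flush after the current event's synchronous code finishes, so
      // consecutive puts land in the same batch.
      queueMicrotask(() => this.flush());
    }
  }

  delete(key) { this.put(key, undefined); }  // tombstone

  flush() {
    const batch = this.dirty;
    this.dirty = new Map();
    this.flushScheduled = false;
    this.backend.writeBatch(batch); // one atomic write for the whole batch
  }
}

// Usage: the delete and put below reach the backend as ONE batch.
const batches = [];
const cache = new CoalescingCache({ writeBatch: b => batches.push(b) });
cache.delete("foo");
cache.put("bar", 123);
queueMicrotask(() => console.log(batches.length)); // 1
```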
            <p>Second, the storage API now provides stronger ordering guarantees for reads. Previously, overlapping storage operations did not have guaranteed ordering. For example, if you issued a <code>get()</code> and a <code>put()</code> on the same key at the same time (without <code>await</code>ing one before starting the other), then it was not deterministic whether the <code>get()</code> might return the value written by the <code>put()</code> -- regardless of the ordering of the statements in your code. The caching layer fixes this. Now, operations are performed in exactly the order in which they were initiated, regardless of when they complete.</p><p>These two features eliminate more subtle bugs that might otherwise be hard to catch in testing, so that you don't have to be a database expert to write code that works.</p>
    <div>
      <h3>Optional Bypass</h3>
      <a href="#optional-bypass">
        
      </a>
    </div>
    <p>We expect gates and caching will be a win in the vast majority of use cases, but not always. In some use cases, concurrency won't lead to any problems, and so blocking it may be a loss. Sometimes, the application is OK with prematurely confirming writes in order to minimize latency. And sometimes, caching may just waste memory because the same keys are not frequently accessed.</p><p>For those cases, we offer explicit bypasses:</p>
            <pre><code>this.storage.get("foo", {allowConcurrency: true, noCache: true});
this.storage.put("foo", "bar", {allowUnconfirmed: true, noCache: true});</code></pre>
            <p>Developers who have taken the time to think carefully about these issues can use these flags to tune performance to their specific needs. For those who don't want to think about it, the defaults should work well.</p>
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>Concurrency is hard. It doesn't matter if you're a novice or an expert: even experts regularly get it wrong. It's difficult to think about all the ways that concurrent operations might overlap to corrupt your application state.</p><p>The traditional answer has been to make applications stateless, and defer all concurrency control to the database layer using transactions. However, transactions are slow, which is a big reason why so many web applications today take hundreds of milliseconds or more to respond to basic actions.</p><p>Durable Objects are all about state. By keeping state in memory in addition to on disk, and directing requests for the same data to be coordinated through the same instance, we can make applications much faster. But until recently, this was extremely tricky to get right.</p><p>With input gates, output gates, and caching, code written in the most intuitive way now "just works", and runs fast. This means you can focus on building your application, without wasting time optimizing I/O performance and debugging obscure race conditions.</p> ]]></content:encoded>
            <category><![CDATA[Durable Objects]]></category>
            <guid isPermaLink="false">3Jqpb0snfffIYkIguxuzNT</guid>
            <dc:creator>Kenton Varda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Containers at the edge: it’s not what you think, or maybe it is]]></title>
            <link>https://blog.cloudflare.com/containers-on-the-edge/</link>
            <pubDate>Sat, 17 Apr 2021 13:00:00 GMT</pubDate>
            <description><![CDATA[ Today, we’re thrilled to announce that we’re exploring a new type of service at the edge: containers. If you have a use case for running containers at our edge, we’d love to know about it! ]]></description>
            <content:encoded><![CDATA[ <p>At Cloudflare, we’re committed to making it as easy as possible for developers to make their ideas come to life. Our announcements this week aim to give developers all the tools they need to build their next application on the edge. These include things like static site hosting, certificate management, and image services, just to name a few.</p><p>Today, we’re thrilled to announce that we’re exploring a new type of service at the edge: containers.</p><p>This announcement will be exciting to some and surprising to many. <a href="/cloud-computing-without-containers/">On this very blog</a>, we’ve talked about why we believe isolates — rather than containers on the edge — will be the future model for applications on the web.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7bmuoxrsRFAkAgRUtNcsyh/adc41520cab34621b224235232e4fb02/image2-21.png" />
            
            </figure>
    <div>
      <h3>Isolates are best for Distributed Systems</h3>
      <a href="#isolates-are-best-for-distributed-systems">
        
      </a>
    </div>
    <p>Let us be clear: isolates are the best way to do edge compute, period. The Workers platform is designed to allow developers to treat our global network as one big computer. This has been a long-held dream of generations of engineers, inspiring slogans like "The Network is the Computer" — a trademark which, incidentally, we <a href="/the-network-is-the-computer/">now own</a>. Isolates and Durable Objects are finally making that vision possible.</p><p>In short, isolates excel at distributed systems. They are perfect for treating the network as one big computer.</p><p>Isolates are great for distributed systems because, by being extremely lightweight, they enable us to reduce the unit of compute to a very fine granularity. That in turn allows work to be more effectively distributed across a large network. It is completely reasonable and efficient (takes just a few milliseconds, <a href="/eliminating-cold-starts-with-cloudflare-workers/">less than a TLS handshake</a>) to spin up an isolate to handle a single HTTP request on the edge, which means we can choose the ideal location for each request to be processed. In contrast, because containers and virtual machines are heavier-weight, it's necessary to centralize traffic on a few instances to achieve economies of scale.</p>
    <div>
      <h3>But there's still a place for containers</h3>
      <a href="#but-theres-still-a-place-for-containers">
        
      </a>
    </div>
    <p>Some applications are not really meant to be distributed. Consider, for example, a modern, single-player 3D video game. Such a game can be processing dozens of gigabytes of data every second, which by some measures sounds like "Big Data." Can we make games like that better by running them as a distributed system across a cluster of machines? It turns out… probably not. The problem is that all that data is being compiled down into a single output stream (video frames) which must be delivered in a specific sequence with minimal latency. With today's technology, it just doesn't make sense to distribute this work across a network. As such, isolates don't offer much benefit for this use case.</p><p>Meanwhile, at least today, isolates present a challenge when supporting legacy systems. The ecosystem of tooling and technology stacks for isolates is still young and developing. Writing a new application on isolates is great, but taking a complex existing codebase and porting it to run in isolates takes considerable effort. In the case of something like a 3D game, it may not even be possible, as the APIs to access GPUs may not be available. We expect this to improve, but it won't happen overnight.</p><table><tr><td><p><b>Isolates</b></p></td><td><p><b>Containers</b></p></td></tr><tr><td><p>Distributed/global systems</p></td><td><p>Legacy/single-user applications</p></td></tr><tr><td><p>Web application servers</p></td><td><p>3D rendering</p></td></tr><tr><td><p>Big data (e.g. MapReduce)</p></td><td><p>CI builds</p></td></tr></table>
    <div>
      <h3>We needed them too</h3>
      <a href="#we-needed-them-too">
        
      </a>
    </div>
    <p>We even have a small confession to make: we already built the capability to run containers at the edge for ourselves, specifically for our <a href="/browser-isolation-for-teams-of-all-sizes/">Browser Isolation</a> product. This product lets you run your web browser on Cloudflare's servers and stream the graphics back to your client machine, increasing security and performance. We didn't build our own browser for this — our technology is based on Chromium.</p><p>Chromium is a big existing codebase that cannot realistically run inside isolates today. In fact, the "isolate engine" that Workers is built on — V8 — is itself a piece of Chromium. It's not designed to nest within itself — maybe someday, but not today.</p><p>Moreover, a web browser is another example of an application that doesn't make sense to be "distributed." A browser is extremely complex, but serves only one user. It doesn't need to be infinitely scalable or run all around the world at once.</p><p>So, instead of trying to build Browser Isolation on Workers, we deployed a container engine to our edge to run Chromium.</p>
    <div>
      <h3>Another way to run isolates at the edge</h3>
      <a href="#another-way-to-run-isolates-at-the-edge">
        
      </a>
    </div>
    <p>“The edge”, of course, doesn’t have to mean running in all 200+ data centers all the time. We’ve also been able to use containers on the edge ourselves by running them in off-peak locations and for non-latency sensitive tasks. The scheduler for scheduled Workers, for example, runs on our internal container service. Since scheduled events don’t have an end user waiting on a timely response, we’re able to run events in data centers where it’s nighttime and the traffic levels are low.</p><p>Another great use case is running CI builds on the edge, though not for the reason you think. Web traffic in any particular location goes through daily cycles. During off-peak hours, a lot of compute is not used. These off-peak locations would be perfect for running batch work like builds in order to maximize compute efficiency.</p>
    <div>
      <h3>What about migrating my containers to the edge to make them faster?</h3>
      <a href="#what-about-migrating-my-containers-to-the-edge-to-make-them-faster">
        
      </a>
    </div>
    <p>While there are some use cases better suited for containers, moving your container workload from its centralized location to the edge may not be the silver bullet you were hoping for.</p><p>A container-based web application running in Node.js or Django, for example, is unlikely to reap the same benefits from running on the edge. Due to the high overhead required by containers, your application will experience hundreds of milliseconds and often upwards of seconds of cold starts even when running on the edge. In that context, the saved network latency becomes negligible.</p><p>Even if the average request to a warmed-up container was faster, would you be willing to pay a premium for distributing it to 200+ data centers, rather than your current one or two?</p><p>Another thing to keep in mind is that being at the edge may introduce considerable cognitive overhead for legacy server stacks in containers. Managing the state of your application running in 200+ locations around the world is very different from managing it in one, two, or even three data centers. We've specifically designed Workers and Durable Objects to abstract away these concerns, but with classical server stacks running in containers, it may not be so easy.</p><p>With Cloudflare Workers and now Durable Objects — which were built with the edge in mind — we believe we have the right abstractions to allow developers to build for the edge first.</p><p>Container support is for a more limited class of applications that can’t be easily migrated today.</p>
    <div>
      <h3>Still can’t contain your excitement?</h3>
      <a href="#still-cant-contain-your-excitement">
        
      </a>
    </div>
    <p>If you have a use case for running containers at our edge, we’d love to know about it! <a href="https://forms.gle/msrkBLBYNFFYRaqY8">Sign up</a> for our early access (currently restricted to our enterprise plans) and let us know.</p> ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Remote Browser Isolation]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">3vEjRHQf7clE5zH3DvTXff</guid>
            <dc:creator>Kenton Varda</dc:creator>
            <dc:creator>Rita Kozlov</dc:creator>
        </item>
        <item>
            <title><![CDATA[Workers Durable Objects Beta: A New Approach to Stateful Serverless]]></title>
            <link>https://blog.cloudflare.com/introducing-workers-durable-objects/</link>
            <pubDate>Mon, 28 Sep 2020 13:00:00 GMT</pubDate>
            <description><![CDATA[ Durable Objects provide a truly serverless approach to storage and state: consistent, low-latency, distributed, yet effortless to maintain and scale. They also enable coordination and real-time collaboration between clients. ]]></description>
            <content:encoded><![CDATA[ <p>We <a href="/introducing-cloudflare-workers/">launched Cloudflare Workers® in 2017</a> with a radical vision: code running at the network edge could not only improve performance, but also be easier to deploy and cheaper to run than code running in a single datacenter. That vision means Workers is about more than just edge compute -- we're rethinking how applications are built.</p><p>Using a "serverless" approach has allowed us to make deploys dead simple, and using <a href="/cloud-computing-without-containers/">isolate technology</a> has allowed us to deliver serverless more cheaply and <a href="/eliminating-cold-starts-with-cloudflare-workers/">without the lengthy cold starts</a> that hold back other providers. We added easy-to-use eventually-consistent edge storage to the platform with <a href="/workers-kv-is-ga/">Workers KV</a>.</p><p>But up until today, it hasn't been possible to manage state with strong consistency, or to coordinate in real time between multiple clients, entirely on the edge. Thus, these parts of your application still had to be hosted elsewhere.</p><p>Durable Objects provide a truly serverless approach to storage and state: consistent, low-latency, distributed, yet effortless to maintain and scale. They also provide an easy way to coordinate between clients, whether it be users in a particular chat room, editors of a particular document, or IoT devices in a particular smart home. Durable Objects are the missing piece in the Workers stack that makes it possible for whole applications to run entirely on the edge, with no centralized "origin" server at all.</p><p>Today we are beginning a closed beta of Durable Objects.</p><a href="http://www.cloudflare.com/cloudflare-workers-durable-objects-beta">Request a beta invite »</a>
    <div>
      <h2>What is a "Durable Object"?</h2>
      <a href="#what-is-a-durable-object">
        
      </a>
    </div>
    <p>I'm going to be honest: naming this product was hard, because it's not quite like any other cloud technology that is widely-used today. This proverbial bike shed has many layers of paint, but ultimately we settled on "Unique Durable Objects", or "Durable Objects" for short. Let me explain what they are by breaking that down:</p><ul><li><p><b>Objects:</b> Durable Objects are objects in the sense of Object-Oriented Programming. A Durable Object is an instance of a class -- literally, a class definition written in JavaScript (or <a href="/cloudflare-workers-announces-broad-language-support/">your language of choice</a>). The class has methods which define its public interface. An object is an instance of this class, combining the code with some private state.</p></li><li><p><b>Unique:</b> Each object has a globally-unique identifier. That object exists in only one location in the whole world at a time. Any Worker running anywhere in the world that knows the object's ID can send messages to it. All those messages end up delivered to the same place.</p></li><li><p><b>Durable:</b> Unlike a normal object in JavaScript, Durable Objects can have persistent state stored on disk. Each object's durable state is private to it, which means not only that access to storage is fast, but the object can even safely maintain a consistent copy of the state in memory and operate on it with zero latency. The in-memory object will be shut down when idle and recreated later on-demand.</p></li></ul>
    <div>
      <h2>What can they do?</h2>
      <a href="#what-can-they-do">
        
      </a>
    </div>
    <p>Durable Objects have two primary abilities:</p><ul><li><p><b>Storage:</b> Each object has attached durable storage. Because this storage is private to a specific object, the storage is always co-located with the object. This means the storage can be very fast while providing strong, transactional consistency. Durable Objects apply the serverless philosophy to storage, splitting the traditional large monolithic databases up into many small, logical units. In doing so, we get the advantages you've come to expect from serverless: effortless scaling with zero maintenance burden.</p></li><li><p><b>Coordination:</b> Historically, with Workers, each request would be randomly load-balanced to a Worker instance. Since there was no way to control which instance received a request, there was no way to force two clients to talk to the same Worker, and therefore no way for clients to coordinate through Workers. Durable Objects change that: requests related to the same topic can be forwarded to the same object, which can then coordinate between them, without any need to touch storage. For example, this can be used to facilitate real-time chat, collaborative editing, video conferencing, pub/sub message queues, game sessions, and much more.</p></li></ul><p>The astute reader may notice that many coordination use cases call for WebSockets -- and indeed, conversely, most WebSocket use cases require coordination. Because of this complementary relationship, along with the Durable Objects beta, we've also added WebSocket support to Workers. For more on this, <a href="#can-durable-objects-serve-websockets">see the Q&amp;A below</a>.</p>
    <div>
      <h2>Region: Earth</h2>
      <a href="#region-earth">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/41hmxqTyMhQIZ7scR0xi17/bee0d4adde9913c3c2495e1a0f37c109/Earth-desktop-background-copy-2_2x.png" />
            
            </figure><p>When using Durable Objects, Cloudflare automatically determines the Cloudflare datacenter that each object will live in, and can transparently migrate objects between locations as needed.</p><p>Traditional databases and stateful infrastructure usually require you to think about geographical "regions", so that you can be sure to store data close to where it is used. Thinking about regions can often be an unnatural burden, especially for applications that are not inherently geographical.</p><p>With Durable Objects, you instead design your storage model to match your application's logical data model. For example, a document editor would have an object for each document, while a chat app would have an object for each chat. There is no problem creating millions or billions of objects, as each object has minimal overhead.</p>
    <div>
      <h2>Killer app: Real-time collaborative document editing</h2>
      <a href="#killer-app-real-time-collaborative-document-editing">
        
      </a>
    </div>
    <p>Let's say you have a spreadsheet editor application -- or, really, any kind of app where users edit a complex document. It works great for one user, but now you want multiple users to be able to edit it at the same time. How do you accomplish this?</p><p>For the standard web application stack, this is a hard problem. Traditional databases simply aren't designed to be real-time. When Alice and Bob are editing the same spreadsheet, you want every one of Alice's keystrokes to appear immediately on Bob's screen, and vice versa. But if you merely store the keystrokes to a database, and have the users repeatedly poll the database for new updates, at best your application will have poor latency, and at worst you may find database transactions repeatedly fail as users on opposite sides of the world fight over editing the same content.</p><p>The secret to solving this problem is to have a live coordination point. Alice and Bob connect to the same coordinator, typically using WebSockets. The coordinator then forwards Alice's keystrokes to Bob and Bob's keystrokes to Alice, without having to go through a storage layer. When Alice and Bob edit the same content at the same time, the coordinator resolves conflicts instantly. The coordinator can then take responsibility for updating the document in storage -- but because the coordinator keeps a live copy of the document in-memory, writing back to storage can happen asynchronously.</p><p>Every big-name real-time collaborative document editor works this way. But for many web developers, especially those building on serverless infrastructure, this kind of solution has long been out-of-reach. Standard serverless infrastructure -- and even cloud infrastructure more generally -- just does not make it easy to assign these coordination points and direct users to talk to the same instance of your server.</p><p>Durable Objects make this easy. 
Not only do they make it easy to assign a coordination point, but Cloudflare will automatically create the coordinator close to the users using it and migrate it as needed, minimizing latency. The availability of local, durable storage means that changes to the document can be saved reliably in an instant, even if the eventual long-term storage is slower. Or, you can even store the entire document on the edge and abandon your database altogether.</p><p>With Durable Objects lowering the barrier, we hope to see real-time collaboration become the norm across the web. There's no longer any reason to make users refresh for updates.</p>
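    <p>The relay pattern described above is small enough to sketch. The following is a minimal, illustrative hub in plain JavaScript; the class and method names are invented, and a real coordinator would sit behind WebSockets rather than plain callbacks:</p>

```javascript
// Minimal sketch of the coordinator pattern: one hub relays each
// participant's edits to all other participants, without a storage
// round trip. Names are invented for illustration.
class Coordinator {
  constructor() { this.sessions = new Set(); }

  // `send` is a callback that delivers a message to one participant.
  // Returns a function that removes the participant again.
  join(send) {
    this.sessions.add(send);
    return () => this.sessions.delete(send);
  }

  // Relay a message from one participant to everyone else.
  broadcast(from, message) {
    for (const send of this.sessions) {
      if (send !== from) send(message);
    }
  }
}

// Usage: Alice's keystroke reaches Bob (and not Alice herself).
const room = new Coordinator();
const aliceInbox = [], bobInbox = [];
const alice = msg => aliceInbox.push(msg);
const bob = msg => bobInbox.push(msg);
room.join(alice);
room.join(bob);
room.broadcast(alice, "A1=42");
console.log(bobInbox);   // [ 'A1=42' ]
console.log(aliceInbox); // []
```

    <p>A Durable Object makes this pattern practical because every client editing the same document is guaranteed to reach the same instance, where the hub's in-memory state lives.</p>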
    <div>
      <h2>Example: An atomic counter</h2>
      <a href="#example-an-atomic-counter">
        
      </a>
    </div>
    <p>Here's a very simple example of a Durable Object which can be incremented, decremented, and read over HTTP. This counter is <i>consistent</i> even when receiving simultaneous requests from multiple clients -- none of the increments or decrements will be lost. At the same time, reads are served entirely from memory, no disk access needed.</p>
            <pre><code>export class Counter {
  // Constructor called by the system when the object is needed to
  // handle requests.
  constructor(controller, env) {
    // `controller.storage` is an interface to access the object's
    // on-disk durable storage.
    this.storage = controller.storage;
  }

  // Private helper method called from fetch(), below.
  async initialize() {
    let stored = await this.storage.get("value");
    this.value = stored || 0;
  }

  // Handle HTTP requests from clients.
  //
  // The system calls this method when an HTTP request is sent to
  // the object. Note that these requests strictly come from other
  // parts of your Worker, not from the public internet.
  async fetch(request) {
    // Make sure we're fully initialized from storage.
    if (!this.initializePromise) {
      this.initializePromise = this.initialize();
    }
    await this.initializePromise;

    // Apply requested action.
    let url = new URL(request.url);
    switch (url.pathname) {
      case "/increment":
        ++this.value;
        await this.storage.put("value", this.value);
        break;
      case "/decrement":
        --this.value;
        await this.storage.put("value", this.value);
        break;
      case "/":
        // Just serve the current value. No storage calls needed!
        break;
      default:
        return new Response("Not found", {status: 404});
    }

    // Return current value.
    return new Response(this.value.toString());
  }
}</code></pre>
            <p>Once the class has been bound to a Durable Object namespace, a particular instance of <code>Counter</code> can be accessed from anywhere in the world using code like:</p>
            <pre><code>// Derive the ID for the counter object named "my-counter".
// This name is associated with exactly one instance in the
// whole world.
let id = COUNTER_NAMESPACE.idFromName("my-counter");

// Send a request to it.
let response = await COUNTER_NAMESPACE.get(id).fetch(request);</code></pre>
            
    <div>
      <h2>Demo: Chat</h2>
      <a href="#demo-chat">
        
      </a>
    </div>
    <p>Chat is arguably real-time collaboration in its purest form. And to that end, we have built a demo open source chat app that runs entirely at the edge using Durable Objects.</p><a href="https://github.com/cloudflare/workers-chat-demo">See the source code on GitHub »</a><p>This chat app uses a Durable Object to control each chat room. Users connect to the object using WebSockets. Messages from one user are broadcast to all the other users. The chat history is also stored in durable storage, but this is only for history. Real-time messages are relayed directly from one user to others without going through the storage layer.</p><p>Additionally, this demo uses Durable Objects for a second purpose: Applying a rate limit to messages from any particular IP. Each IP is assigned a Durable Object that tracks recent request frequency, so that users who send too many messages can be temporarily blocked -- even across multiple chat rooms. Interestingly, these objects don't actually store any durable state at all, because they only care about very recent history, and it's not a big deal if a rate limiter randomly resets on occasion. So, these rate limiter objects are an example of a pure coordination object with no storage.</p><p>This chat app is only a few hundred lines of code. The deployment configuration is only a few lines. Yet, it will scale seamlessly to any number of chat rooms, limited only by Cloudflare's available resources. Of course, any <i>individual</i> chat room's scalability has a limit, since each object is single-threaded. But, that limit is far beyond what a human participant could keep up with anyway.</p>
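    <p>The rate limiter is a nice illustration of a storage-free coordination object. A minimal sliding-window version might look like the sketch below; the names and the exact windowing policy are illustrative, not the demo's precise algorithm (see the linked repo for that):</p>

```javascript
// Sketch of a pure-coordination rate limiter: it tracks only recent
// history in memory, so losing its state on a restart is harmless.
// Names and the sliding-window policy are illustrative.
class RateLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.timestamps = [];
  }

  // Returns true if a message is allowed at time `now`, false if the
  // sender has already sent `limit` messages within the sliding window.
  allow(now = Date.now()) {
    this.timestamps = this.timestamps.filter(t => now - t < this.windowMs);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}

// Usage: at most 2 messages per second.
const limiter = new RateLimiter(2, 1000);
console.log(limiter.allow(0));    // true
console.log(limiter.allow(100));  // true
console.log(limiter.allow(200));  // false -- over the limit
console.log(limiter.allow(1500)); // true -- the window has slid past
```

    <p>Because each IP maps to its own object, this check is enforced consistently across chat rooms without any shared database.</p>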
    <div>
      <h2>Other use cases</h2>
      <a href="#other-use-cases">
        
      </a>
    </div>
    <p>Durable Objects have infinite uses. Here are just a few ideas, beyond the ones described above:</p><ul><li><p><b>Shopping cart:</b> An online storefront could track a user's shopping cart in an object. The rest of the storefront could be served as a fully static web site. Cloudflare will automatically host the cart object close to the end user, minimizing latency.</p></li><li><p><b>Game server:</b> A multiplayer game could track the state of a match in an object, hosted on the edge close to the players.</p></li><li><p><b>IoT coordination:</b> Devices within a family's house could coordinate through an object, avoiding the need to talk to distant servers.</p></li><li><p><b>Social feeds:</b> Each user could have a Durable Object that aggregates their subscriptions.</p></li><li><p><b>Comment/chat widgets:</b> A web site that is otherwise static content can add a comment widget or even a live chat widget on individual articles. Each article would use a separate Durable Object to coordinate. This way the origin server can focus on static content only.</p></li></ul>
    <div>
      <h2>The Future: True Edge Databases</h2>
      <a href="#the-future-true-edge-databases">
        
      </a>
    </div>
    <p>We see Durable Objects as a low-level primitive for building distributed systems. Some applications, like those mentioned above, can use objects directly to implement a coordination layer, or maybe even as their sole storage layer.</p><p>However, Durable Objects today are not a complete database solution. Each object can see only its own data. To perform a query or transaction across multiple objects, the application needs to do some extra work.</p><p>That said, every big distributed database – whether it be relational, document, graph, etc. – is, at some low level, composed of "chunks" or "shards" that store one piece of the overall data. The job of a distributed database is to coordinate between chunks.</p><p>We see a future of edge databases that store each "chunk" as a Durable Object. By doing so, it will be possible to build <a href="https://www.cloudflare.com/developer-platform/products/d1/">databases that operate entirely at the edge</a>, fully distributed with no regions or home location. These databases need not be built by us; anyone can potentially build them on top of Durable Objects. Durable Objects are only the first step in the edge storage journey.</p>
    <div>
      <h2>Join the Beta</h2>
      <a href="#join-the-beta">
        
      </a>
    </div>
    <p>Storing data is a big responsibility which we do not take lightly. Because of the critical importance of getting it right, we are being careful. We will be making Durable Objects available gradually over the next several months.</p><p>As with any beta, this product is a work in progress, and some of what is described in this post is not fully enabled yet. Full details of beta limitations can be found in <a href="https://developers.cloudflare.com/workers/learning/using-durable-objects">the documentation</a>.</p><p>If you'd like to try out Durable Objects now, tell us about your use case. We'll be selecting the most interesting use cases for early access.</p><a href="http://www.cloudflare.com/cloudflare-workers-durable-objects-beta">Request a beta invite »</a>
    <div>
      <h2>Q&amp;A</h2>
      <a href="#q-a">
        
      </a>
    </div>
    
    <div>
      <h3>Can Durable Objects serve WebSockets?</h3>
      <a href="#can-durable-objects-serve-websockets">
        
      </a>
    </div>
    <p>Yes.</p><p>As part of the Durable Objects beta, we've made it possible for Workers to act as WebSocket endpoints -- including as a client or as a server. Before now, Workers could proxy WebSocket connections on to a back-end server, but could not speak the protocol directly.</p><p>While technically any Worker can speak WebSocket in this way, WebSockets are most useful when combined with Durable Objects. When a client connects to your application using a WebSocket, you need a way for server-generated events to be sent back to the existing socket connection. Without Durable Objects, there's no way to send an event to the specific Worker holding a WebSocket. With Durable Objects, you can now forward the WebSocket to an Object. Messages can then be addressed to that Object by its unique ID, and the Object can then forward those messages down the WebSocket to the client.</p><p>The chat app demo presented above uses WebSockets. <a href="https://github.com/cloudflare/workers-chat-demo">Check out the source code</a> to see how it works.</p>
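The relay pattern described here (the object holds the sockets and fans messages out) can be sketched as follows. This is a simplified illustration, not the demo's actual code, and the `FakeSocket` stands in for real WebSockets so the sketch is self-contained:

```javascript
// Sketch of a chat-room object's broadcast loop: it keeps a list of
// connected sockets and relays each incoming message to all the others.
class ChatRoom {
  constructor() {
    this.sessions = [];
  }

  connect(socket) {
    this.sessions.push(socket);
    socket.onmessage = (msg) => this.broadcast(msg, socket);
  }

  broadcast(message, sender) {
    for (const s of this.sessions) {
      if (s !== sender) s.send(message);
    }
  }
}

// Stand-in for a real WebSocket, used only to make the sketch runnable.
class FakeSocket {
  constructor() { this.received = []; }
  send(msg) { this.received.push(msg); }
}
```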
    <div>
      <h3>How does this compare to Workers KV?</h3>
      <a href="#how-does-this-compare-to-workers-kv">
        
      </a>
    </div>
    <p>Two years ago, we introduced Workers KV, a global key-value data store. KV is a fairly minimalist global data store that serves certain purposes well, but is not for everyone. KV is <b>eventually consistent</b>, which means that writes made in one location may not be visible in other locations immediately. Moreover, it implements "last write wins" semantics, which means that if a single key is being modified from multiple locations in the world at once, it's easy for those writes to overwrite each other. KV is designed this way to support low-latency reads for data that doesn't frequently change. However, these design decisions make KV inappropriate for state that changes frequently, or when changes need to be immediately visible worldwide.</p><p>Durable Objects, in contrast, are not primarily a storage product at all -- many use cases for them do not actually utilize durable storage. To the extent that they do provide storage, Durable Objects sit at the opposite end of the storage spectrum from KV. They are extremely well-suited to workloads requiring transactional guarantees and immediate consistency. However, since transactions inherently must be coordinated in a single location, clients on the opposite side of the world from that location will experience moderate latency due to the inherent limitations of the speed of light. Durable Objects will combat this problem by auto-migrating to live close to where they are used.</p><p>In short, Workers KV remains the best way to serve static content, configuration, and other rarely-changing data around the world, while Durable Objects are better for managing dynamic state and coordination.</p><p>Going forward, we plan to utilize Durable Objects in the implementation of Workers KV itself, in order to deliver even better performance.</p>
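To make the "last write wins" semantics concrete, here is a toy illustration (not KV's actual implementation): when the same key is written concurrently in two locations, the write carrying the later timestamp survives and the other is silently lost.

```javascript
// Toy "last write wins" resolution: given two concurrent writes to the same
// key, keep the one with the later timestamp. The record shape is illustrative.
function lastWriteWins(a, b) {
  return b.timestamp >= a.timestamp ? b : a;
}
```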
    <div>
      <h3>Why not use CRDTs?</h3>
      <a href="#why-not-use-crdts">
        
      </a>
    </div>
    <p>You can build CRDT-based storage on top of Durable Objects, but Durable Objects do not require you to use CRDTs.</p><p>Conflict-free Replicated Data Types (CRDTs), or their cousins, Operational Transforms (OTs), are a technology that allows data to be edited from multiple places in the world simultaneously without synchronization, and without data loss. For example, these technologies are commonly used in the implementation of real-time collaborative document editors, so that a user's keypresses can show up in their local copy of the document in real time, without waiting to see if anyone else edited another part of the document first. Without getting into details, you can think of these techniques like a real time version of "git fork" and "git merge", where all merge conflicts are resolved automatically in a deterministic way, so that everyone ends up with the same state in the end.</p><p>CRDTs are a powerful technology, but applying them correctly can be challenging. Only certain kinds of data structures lend themselves to automatic conflict resolution in a way that doesn't lead to easy data loss. Any developer familiar with git can see the problem: arbitrary conflict resolution is hard, and any automated algorithm for it will likely get things wrong sometimes. It's all the more difficult if the algorithm has to handle merges in arbitrary order and still get the same answer.</p><p>We feel that, for most applications, CRDTs are overly complex and not worth the effort. Worse, the set of data structures that can be represented as a CRDT is too limited for many applications. It's usually much easier to assign a single authoritative coordination point for each document, which is exactly what Durable Objects accomplish.</p><p>With that said, CRDTs can be used on top of Durable Objects. 
If an object's state lends itself to CRDT treatment, then an application could replicate that object into several objects serving different regions, which then synchronize their states via CRDT. This would make sense for applications to implement as an optimization if and when they find it is worth the effort.</p>
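To make the automatic-merge idea concrete, here is one of the simplest CRDTs, a grow-only counter (a standard textbook example, not something specific to Durable Objects): each replica increments only its own slot, and a merge takes the element-wise maximum, so merges commute and every replica converges to the same total.

```javascript
// G-Counter: a grow-only counter CRDT. Each replica increments its own entry;
// merging takes the per-replica maximum, so merge order doesn't matter.
class GCounter {
  constructor(replicaId) {
    this.replicaId = replicaId;
    this.counts = {}; // replicaId -> count
  }

  increment() {
    this.counts[this.replicaId] = (this.counts[this.replicaId] || 0) + 1;
  }

  merge(other) {
    for (const [id, n] of Object.entries(other.counts)) {
      this.counts[id] = Math.max(this.counts[id] || 0, n);
    }
  }

  value() {
    return Object.values(this.counts).reduce((a, b) => a + b, 0);
  }
}
```

Note how restrictive the structure is: the counter can only grow. Supporting richer data types is where the real difficulty of CRDTs lies.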
    <div>
      <h2>Last thoughts: What does it mean for state to be "serverless"?</h2>
      <a href="#last-thoughts-what-does-it-mean-for-state-to-be-serverless">
        
      </a>
    </div>
    <p>Traditionally, serverless has focused on stateless compute. In serverless architectures, the logical unit of compute is reduced to something fine-grained: a single event, such as an HTTP request. This works especially well because <i>events</i> just happened to be the logical unit of work that we think about when designing server applications. No one thinks about their business logic in units of "servers" or "containers" or "processes" -- we think about <i>events</i>. It is exactly because of this semantic alignment that serverless succeeds in shifting so much of the logistical burden of maintaining servers away from the developer and towards the cloud provider.</p><p>However, <a href="https://www.cloudflare.com/learning/serverless/what-is-serverless/">serverless architecture</a> has traditionally been stateless. Each event executes in isolation. If you wanted to store data, you had to connect to a traditional database. If you wanted to coordinate between requests, you had to connect to some other service that provides that ability. These external services have tended to re-introduce the operational concerns that serverless was intended to avoid. Developers and service operators have to worry not just about scaling their databases to handle increasing load, but also about how to split their database into "regions" to effectively handle global traffic. The latter concern can be especially cumbersome.</p><p>So how can we apply the serverless philosophy to state? Just like serverless compute is about splitting compute into fine-grained pieces, serverless state is about splitting state into fine-grained pieces. Again, we seek to find a unit of state that corresponds to logical units in our application. The logical unit of state in an application is not a "table" or a "collection" or a "graph". Instead, it depends on the application. The logical unit of state in a chat app is a chat room. The logical unit of state in an online spreadsheet editor is a spreadsheet. 
The logical unit of state in an online storefront is a shopping cart. By making the <i>physical</i> unit of storage provided by the storage layer match the <i>logical</i> unit of state inherent in the application, we can allow the underlying storage provider (Cloudflare) to take responsibility for a wide array of logistical concerns that previously fell on the developer, including scalability and regionality.</p><p>This is what Durable Objects do.</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">6hMgSfNEsBqHHmM92oo5jo</guid>
            <dc:creator>Kenton Varda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Mitigating Spectre and Other Security Threats: The Cloudflare Workers Security Model]]></title>
            <link>https://blog.cloudflare.com/mitigating-spectre-and-other-security-threats-the-cloudflare-workers-security-model/</link>
            <pubDate>Wed, 29 Jul 2020 13:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare is a security company, and the heart of Workers is, in my view, a security project. Running code written by third parties is always a scary proposition, and the primary concern of the Workers team is to make that safe. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Hello, I'm an engineer on the Workers team, and today I want to talk to you about security.</p><p>Cloudflare is a security company, and the heart of Workers is, in my view, a security project. Running code written by third parties is always a scary proposition, and the primary concern of the Workers team is to make that safe.</p><p>For a project like this, it is not enough to pass a security review and say "ok, we're secure" and move on. It's not even enough to consider security at every stage of design and implementation. For Workers, security in and of itself is an ongoing project, and that work is never done. There are always things we can do to reduce the risk and impact of future vulnerabilities.</p><p>Today, I want to give you an overview of our security architecture, and then address two specific issues that we are frequently asked about: V8 bugs, and Spectre.</p>
    <div>
      <h2>Architectural Overview</h2>
      <a href="#architectural-overview">
        
      </a>
    </div>
    <p>Let's start with a quick overview of the Workers Runtime architecture.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/j8LEoe7npsKBqp8pPJnQe/79e962ae5dfc8aa5ab8a41d45f6982c7/Workers-architecture.svg" />
            
            </figure><p>There are two fundamental parts of designing a code sandbox: secure isolation and API design.</p>
    <div>
      <h3>Isolation</h3>
      <a href="#isolation">
        
      </a>
    </div>
    <p>First, we need to create an execution environment where code can't access anything it's not supposed to.</p><p>For this, our primary tool is V8, the JavaScript engine developed by Google for use in Chrome. V8 executes code inside "isolates", which prevent that code from accessing memory outside the isolate -- even within the same process. Importantly, this means we can run many isolates within a single process. This is essential for an edge compute platform like Workers where we must host many thousands of guest apps on every machine, and rapidly switch between these guests thousands of times per second with minimal overhead. If we had to run a separate process for every guest, the number of tenants we could support would be drastically reduced, and we'd have to limit edge compute to a small number of big enterprise customers who could pay a lot of money. With isolate technology, we can make edge compute available to everyone.</p><p>Sometimes, though, we do decide to schedule a worker in its own private process. We do this if it uses certain features that we feel need an extra layer of isolation. For example, when a developer uses the devtools debugger to inspect their worker, we run that worker in a separate process. This is because historically, in the browser, the inspector protocol has only been usable by the browser's trusted operator, and therefore has not received as much security scrutiny as the rest of V8. In order to hedge against the increased risk of bugs in the inspector protocol, we move inspected workers into a separate process with a process-level sandbox. We also use process isolation as an extra defense against Spectre, which I'll describe later in this post.</p><p>Additionally, even for isolates that run in a shared process with other isolates, we run multiple instances of the whole runtime on each machine, which we call "cordons". 
Workers are distributed among cordons by assigning each worker a level of trust and separating low-trusted workers from those we trust more highly. As one example of this in operation: a customer who signs up for our <a href="https://www.cloudflare.com/plans/free/">free plan</a> will not be scheduled in the same process as an enterprise customer. This provides some defense-in-depth in the case a zero-day security vulnerability is found in V8. But I'll talk more about V8 bugs, and how we address them, later in this post.</p><p>At the whole-process level, we apply another layer of sandboxing for defense in depth. The "layer 2" sandbox uses Linux namespaces and seccomp to prohibit all access to the filesystem and network. Namespaces and seccomp are commonly used to implement containers. However, our use of these technologies is much stricter than what is usually possible in container engines, because we configure namespaces and seccomp after the process has started (but before any isolates have been loaded). This means, for example, we can (and do) use a totally empty filesystem (mount namespace) and use seccomp to block absolutely all filesystem-related system calls. Container engines can't normally prohibit all filesystem access because doing so would make it impossible to use <code>exec()</code> to start the guest program from disk; in our case, our guest programs are not native binaries, and the Workers runtime itself has already finished loading before we block filesystem access.</p><p>The layer 2 sandbox also totally prohibits network access. Instead, the process is limited to communicating only over local Unix domain sockets, to talk to other processes on the same system. Any communication to the outside world must be mediated by some other local process outside the sandbox.</p><p>One such process in particular, which we call the "supervisor", is responsible for fetching worker code and configuration from disk or from other internal services. 
The supervisor ensures that the sandbox process cannot read any configuration except that which is relevant to the workers that it should be running.</p><p>For example, when the sandbox process receives a request for a worker it hasn't seen before, that request includes the encryption key for that worker's code (including attached secrets). The sandbox can then pass that key to the supervisor in order to request the code. The sandbox cannot request any worker for which it has not received the appropriate key. It cannot enumerate known workers. It also cannot request configuration it doesn't need; for example, it cannot request the TLS key used for HTTPS traffic to the worker.</p><p>Aside from reading configuration, the other reason for the sandbox to talk to other processes on the system is to implement APIs exposed to Workers. Which brings us to API design.</p>
    <div>
      <h3>API Design</h3>
      <a href="#api-design">
        
      </a>
    </div>
    <p>There is a saying: "If a tree falls in the forest, but no one is there to hear it, does it make a sound?" I have a related saying: "If a Worker executes in a fully-isolated environment in which it is totally prevented from communicating with the outside world, does it actually run?"</p><p>Complete code isolation is, in fact, useless. In order for Workers to do anything useful, they have to be allowed to communicate with users. At the very least, a Worker needs to be able to receive requests and respond to them. It would also be nice if it could send requests to the world, safely. For that, we need <a href="https://www.cloudflare.com/learning/security/api/what-is-an-api/">APIs</a>.</p><p>In the context of sandboxing, API design takes on a new level of responsibility. Our APIs define exactly what a Worker can and cannot do. We must be very careful to design each API so that it can only express operations which we want to allow, and no more. For example, we want to allow Workers to make and receive HTTP requests, while we do not want them to be able to access the local filesystem or internal network services.</p><p>Let's dig into the easier example first. Currently, Workers does not allow any access to the local filesystem. Therefore, we do not expose a filesystem API at all. No API means no access.</p><p>But, imagine if we did want to support local filesystem access in the future. How would we do that? We obviously wouldn't want Workers to see the whole filesystem. Imagine, though, that we wanted each Worker to have its own private directory on the filesystem where it can store whatever it wants.</p><p>To do this, we would use a design based on <a href="https://en.wikipedia.org/wiki/Capability-based_security">capability-based security</a>. Capabilities are a big topic, but in this case, what it would mean is that we would give the worker an object of type <code>Directory</code>, representing a directory on the filesystem. 
This object would have an API that allows creating and opening files and subdirectories, but does not permit traversing "up" the tree to the parent directory. Effectively, each worker would see its private <code>Directory</code> as if it were the root of their own filesystem.</p><p>How would such an API be implemented? As described above, the sandbox process cannot access the real filesystem, and we'd prefer to keep it that way. Instead, file access would be mediated by the supervisor process. The sandbox talks to the supervisor using <a href="https://capnproto.org/rpc.html">Cap'n Proto RPC</a>, a capability-based RPC protocol. (Cap'n Proto is an open source project currently maintained by the Cloudflare Workers team.) This protocol makes it very easy to implement capability-based APIs, so that we can strictly limit the sandbox to accessing only the files that belong to the Workers it is running.</p><p>Now what about network access? Today, Workers are allowed to talk to the rest of the world only via HTTP -- both incoming and outgoing. There is no API for other forms of network access, therefore it is prohibited (though we plan to support other protocols in the future).</p><p>As mentioned before, the sandbox process cannot connect directly to the network. Instead, all outbound HTTP requests are sent over a Unix domain socket to a local proxy service. That service implements restrictions on the request. For example, it verifies that the request is either addressed to a public Internet service, or to the Worker's zone's own origin server, not to internal services that might be visible on the local machine or network. It also adds a header to every request identifying the worker from which it originates, so that abusive requests can be traced and blocked. Once everything is in order, the request is sent on to our HTTP caching layer, and then out to the Internet.</p><p>Similarly, inbound HTTP requests do not go directly to the Workers Runtime. 
They are first received by an inbound proxy service. That service is responsible for TLS termination (the Workers Runtime never sees TLS keys), as well as identifying the correct Worker script to run for a particular request URL. Once everything is in order, the request is passed over a Unix domain socket to the sandbox process.</p>
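The capability discipline behind the hypothetical `Directory` API described above can be sketched in miniature. This is purely illustrative (the API does not exist today, and a real implementation would proxy to the supervisor over Cap'n Proto rather than hold a Map): the object a worker holds exposes only operations scoped beneath its own root, and there is simply no operation that names a parent.

```javascript
// Illustrative capability-style Directory: an in-memory Map stands in for
// the real filesystem. The only way "outward" is absent by construction.
class Directory {
  constructor() {
    this.entries = new Map(); // name -> file contents or Directory
  }

  writeFile(name, contents) {
    // Path traversal is not expressible: names are single path components.
    if (name.includes("/") || name === "..") {
      throw new Error("invalid file name");
    }
    this.entries.set(name, contents);
  }

  readFile(name) {
    return this.entries.get(name);
  }

  // Opening a subdirectory hands out a new, narrower capability.
  // Notice there is no openParent(): the API simply cannot express it.
  openSubdirectory(name) {
    if (!(this.entries.get(name) instanceof Directory)) {
      this.entries.set(name, new Directory());
    }
    return this.entries.get(name);
  }
}
```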
    <div>
      <h2>V8 bugs and the "patch gap"</h2>
      <a href="#v8-bugs-and-the-patch-gap">
        
      </a>
    </div>
    <p>Every non-trivial piece of software has bugs, and sandboxing technologies are no exception. Virtual machines have bugs, containers have bugs, and yes, isolates (which we use) also have bugs. We can't live life pretending that no further bugs will ever be discovered; instead, we must assume they will and plan accordingly.</p><p>We rely heavily on isolation provided by V8, the JavaScript engine built by Google for use in Chrome. This has good sides and bad sides. On one hand, V8 is an extraordinarily complicated piece of technology, creating a wider "attack surface" than virtual machines. More complexity means more opportunities for something to go wrong. On the bright side, though, an extraordinary amount of effort goes into finding and fixing V8 bugs, owing to its position as arguably the most popular sandboxing technology in the world. Google regularly pays out 5-figure bounties to anyone finding a V8 sandbox escape. Google also operates fuzzing infrastructure that automatically finds bugs faster than most humans can. Google's investment does a lot to minimize the danger of V8 "zero-days" -- bugs that are found by the bad guys and not known to Google.</p><p>But, what happens after a bug is found and reported by the good guys? V8 is open source, so fixes for security bugs are developed in the open and released to everyone at the same time -- good guys and bad guys. It's important that any patch be rolled out to production as fast as possible, before the bad guys can develop an exploit.</p><p>The time between publishing the fix and deploying it is known as the "patch gap". Earlier this year, <a href="https://www.zdnet.com/article/google-cuts-chrome-patch-gap-in-half-from-33-to-15-days/">Google announced that Chrome's patch gap had been reduced from 33 days to 15 days</a>.</p><p>Fortunately, we have an advantage here, in that we directly control the machines on which our system runs. 
We have automated almost our entire build and release process, so the moment a V8 patch is published, our systems automatically build a new release of the Workers Runtime and, after one-click sign-off from the necessary (human) reviewers, automatically push that release out to production.</p><p>As a result, our patch gap is now under 24 hours. A patch published by V8's team in Munich during their work day will usually be in production before the end of our work day in the US.</p>
    <div>
      <h2>Spectre: Introduction</h2>
      <a href="#spectre-introduction">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4WEQXH8Xp38VMUTyfFF5LW/7d527aa17b4aa1f5a638dae83abcde43/Spectre-vulnerability-_2x.png" />
            
            </figure><p>We get a lot of questions about Spectre. The V8 team at Google has stated in no uncertain terms that <a href="https://arxiv.org/abs/1902.05178">V8 itself cannot defend against Spectre</a>. Since Workers relies on V8 for sandboxing, many have asked if that leaves Workers vulnerable. However, we do not need to depend on V8 for this; the Workers environment presents many alternative approaches to mitigating Spectre.</p><p>Spectre is complicated and nuanced, and there's no way I can cover everything there is to know about it or how Workers addresses it in a single blog post. But, hopefully I can clear up some of the confusion and concern.</p>
    <div>
      <h3>What is it?</h3>
      <a href="#what-is-it">
        
      </a>
    </div>
    <p>Spectre is a class of attacks in which a malicious program can trick the CPU into "speculatively" performing computation using data that the program is not supposed to have access to. The CPU eventually realizes the problem and does not allow the program to see the results of the speculative computation. However, the program may be able to derive bits of the secret data by looking at subtle side effects of the computation, such as the effects on cache.</p><p><a href="https://www.cloudflare.com/learning/security/threats/meltdown-spectre/">For more background about Spectre, check out our Learning Center page on the topic.</a></p>
    <div>
      <h3>Why does it matter for Workers?</h3>
      <a href="#why-does-it-matter-for-workers">
        
      </a>
    </div>
    <p>Spectre encompasses a wide variety of vulnerabilities present in modern CPUs. The specific vulnerabilities vary by architecture and model, and it is likely that many vulnerabilities exist which haven't yet been discovered.</p><p>These vulnerabilities are a problem for every cloud compute platform. Any time you have more than one tenant running code on the same machine, Spectre attacks come into play. However, the "closer together" the tenants are, the more difficult it can be to mitigate specific vulnerabilities. Many of the known issues can be mitigated at the kernel level (protecting processes from each other) or at the hypervisor level (protecting VMs), often with the help of CPU microcode updates and various tricks (many of which, unfortunately, come with serious performance impact).</p><p>In Cloudflare Workers, we isolate tenants from each other using V8 isolates -- not processes nor VMs. This means that we cannot necessarily rely on OS or hypervisor patches to "solve" Spectre for us. We need our own strategy.</p>
    <div>
      <h3>Why not use process isolation?</h3>
      <a href="#why-not-use-process-isolation">
        
      </a>
    </div>
    <p>Cloudflare Workers is designed to run your code in every single Cloudflare location, of which there are currently 200 worldwide and growing.</p><p>We wanted Workers to be a platform that is accessible to everyone -- not just big enterprise customers who can pay megabucks for it. We need to handle a huge number of tenants, where many tenants get very little traffic.</p><p>Combine these two points, and things get tricky.</p><p>A typical, non-edge serverless provider could handle a low-traffic tenant by sending all of that tenant's traffic to a single machine, so that only one copy of the application needs to be loaded. If the machine can handle, say, a dozen tenants, that's plenty. That machine can be hosted in a mega-datacenter with literally millions of machines, achieving economies of scale. However, this centralization incurs latency and worldwide bandwidth costs when the users don't happen to be nearby.</p><p>With Workers, on the other hand, every tenant, regardless of traffic level, currently runs in every Cloudflare location. And in our quest to get as close to the end user as possible, we sometimes choose locations that only have space for a limited number of machines. The net result is that we need to be able to host thousands of active tenants per machine, with the ability to rapidly spin up inactive ones on-demand. That means that each guest cannot take more than a couple megabytes of memory -- hardly enough space for a call stack, much less everything else that a process needs.</p><p>Moreover, we need context switching to be extremely cheap. Many Workers resident in memory will only handle an event every now and then, and many Workers spend less than a fraction of a millisecond on any particular event. In this environment, a single core can easily find itself switching between thousands of different tenants every second. 
Moreover, to handle one event, a significant amount of communication needs to happen between the guest application and its host, meaning still more switching and communications overhead. If each tenant lives in its own process, all this overhead is orders of magnitude larger than if many tenants live in a single process. When using strict process isolation in Workers, we find the CPU cost can easily be 10x what it is with a shared process.</p><p>In order to keep Workers inexpensive, fast, and accessible to everyone, we must solve these issues, and that means we must find a way to host multiple tenants in a single process.</p>
    <div>
      <h3>There is no "fix" for Spectre</h3>
      <a href="#there-is-no-fix-for-spectre">
        
      </a>
    </div>
    <p>A dirty secret that the industry doesn't like to admit: no one has "fixed" Spectre. Not even when using heavyweight virtual machines. Everyone is still vulnerable.</p><p>The current approach being taken by most of the industry is essentially a game of whack-a-mole. Every couple months, researchers uncover a new Spectre vulnerability. CPU vendors release some new microcode, OS vendors release kernel patches, and everyone has to update.</p><p>But is it enough to merely deploy the latest patches?</p><p>It is abundantly clear that many more vulnerabilities exist, but haven't yet been publicized. Who might know about those vulnerabilities? Most of the bugs being published are being found by (very smart) graduate students on a shoestring budget. Imagine, for a minute, how many more bugs a well-funded government agency, able to buy the very best talent in the world, could be uncovering.</p><p>To truly defend against Spectre, we need to take a different approach. It's not enough to block individual known vulnerabilities. We must address the entire class of vulnerabilities at once.</p>
    <div>
      <h3>We can't stop it, but we can slow it down</h3>
      <a href="#we-cant-stop-it-but-we-can-slow-it-down">
        
      </a>
    </div>
    <p>Unfortunately, it's unlikely that any catch-all "fix" for Spectre will be found. But for the sake of argument, let's try.</p><p>Fundamentally, all Spectre vulnerabilities use side channels to detect hidden processor state. Side channels, by definition, involve observing some non-deterministic behavior of a system. Conveniently, most software execution environments try hard to eliminate non-determinism, because non-deterministic execution makes applications unreliable.</p><p>However, there are a few sorts of non-determinism that are still common. The most obvious among these is timing. The industry long ago gave up on the idea that a program should take the same amount of time every time it runs, because deterministic timing is fundamentally at odds with heuristic performance optimization. Sure enough, most Spectre attacks focus on timing as a way to detect the hidden microarchitectural state of the CPU.</p><p>Some have proposed that we can solve this by making timers inaccurate or adding random noise. However, it turns out that this does not stop attacks; it only makes them slower. If the timer tracks real time at all, then anything you can do to make it inaccurate can be overcome by running an attack multiple times and using statistics to filter out the noise.</p><p>Many security researchers see this as the end of the story. What good is slowing down an attack, if the attack is still possible? Once the attacker gets your private key, it's game over, right? What difference does it make if it takes them a minute or a month?</p>
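<p>The statistical point above can be illustrated with a quick sketch. Assume, hypothetically, a platform injects up to ±5ms of uniform random jitter into every timer reading; averaging many samples still recovers the true value:</p>
<pre><code>// Illustrative sketch (not attack code): a timer with random noise added
// still reveals the true duration once enough samples are averaged.
function noisyTimer(trueDurationMs) {
  // assumption: +/-5ms of uniform jitter injected by the platform
  return trueDurationMs + (Math.random() - 0.5) * 10;
}

const N = 100000;
let sum = 0;
for (let i = 0; i &lt; N; i++) sum += noisyTimer(3);
const estimate = sum / N;
// estimate converges toward 3ms: the noise averages out</code></pre>
<p>The error of the mean shrinks with the square root of the sample count, so noise only multiplies the number of trials an attacker needs; it never makes the measurement impossible.</p>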
    <div>
      <h3>Cascading Slow-downs</h3>
      <a href="#cascading-slow-downs">
        
      </a>
    </div>
    <p>We find that, actually, measures that slow down an attack can be powerful.</p><p>Our key insight is this: as an attack becomes slower, new techniques become practical to make it even slower still. The goal, then, is to chain together enough techniques that an attack becomes so slow as to be uninteresting.</p><p>Much of cryptography, after all, is technically vulnerable to "brute force" attacks -- technically, with enough time, you can break it. But when the time required is thousands (or even billions) of years, we decide that this is good enough.</p><p>So, what do we do to slow down Spectre attacks to the point of meaninglessness?</p>
    <div>
      <h2>Freezing a Spectre Attack</h2>
      <a href="#freezing-a-spectre-attack">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5y7HqWBzP2bdM5HCvoJ2Mq/782bdcf77c38675687244b292d4c02a7/freeze-Spectre_2x.png" />
            
            </figure>
    <div>
      <h3>Step 0: Don't allow native code</h3>
      <a href="#step-0-dont-allow-native-code">
        
      </a>
    </div>
    <p>We do not allow our customers to upload native-code binaries to run on our network. We only accept JavaScript and WebAssembly. Of course, many other languages, like Python, Rust, or even Cobol, can be compiled or transpiled to one of these two formats; the important point is that we do another pass on our end, using V8, to convert these formats into true native code.</p><p>This, in itself, doesn't necessarily make Spectre attacks harder. However, I present this as step 0 because it is fundamental to enabling everything else below.</p><p>Accepting native code programs implies being beholden to an existing CPU architecture (typically, x86). In order to execute code with reasonable performance, it is usually necessary to run the code directly on real hardware, severely limiting the host's control over how that execution plays out. For example, a kernel or hypervisor has no ability to prohibit applications from invoking the CLFLUSH instruction, an instruction <a href="https://gruss.cc/files/flushflush.pdf">which is very useful in side channel attacks</a> and almost nothing else.</p><p>Moreover, supporting native code typically implies supporting whole existing operating systems and software stacks, which bring with them decades of expectations about how the architecture works under them. For example, x86 CPUs allow a kernel or hypervisor to disable the RDTSC instruction, which reads a high-precision timer. Realistically, though, disabling it will break many programs because they are implemented to use RDTSC any time they want to know the current time.</p><p>Supporting native code would bind our hands in terms of mitigation techniques. By using an abstract intermediate format, we have much greater freedom.</p>
    <div>
      <h3>Step 1: Disallow timers and multi-threading</h3>
      <a href="#step-1-disallow-timers-and-multi-threading">
        
      </a>
    </div>
    <p>In Workers, you can get the current time using the JavaScript Date API, for example by calling <code>Date.now()</code>. However, the time value returned by this is not really the current time. Instead, it is the time at which the network message was received which caused the application to begin executing. While the application executes, time is locked in place. For example, say an attacker writes:</p>
            <pre><code>let start = Date.now();
for (let i = 0; i &lt; 1e6; i++) {
  doSpectreAttack();
}
let end = Date.now();</code></pre>
            <p>The values of <code>start</code> and <code>end</code> will always be exactly the same. The attacker cannot use <code>Date</code> to measure the execution time of their code, which they would need to do to carry out an attack.</p><blockquote><p><b>As an aside:</b> This is a measure we actually implemented in mid-2017, long before Spectre was announced (and before we knew about it). We implemented this measure because we were worried about timing side channels in general. Side channels have been a concern of the Workers team from day one, and we have designed our system from the ground up with this concern in mind.</p></blockquote><p>Related to our taming of <code>Date</code>, we also do not permit multi-threading or shared memory in Workers. Everything related to the processing of one event happens on the same thread -- otherwise, it would be possible to "race" threads in order to "MacGyver" an implicit timer. We don't even allow multiple Workers operating on the same request to run concurrently. For example, if you have installed a Cloudflare App on your zone which is implemented using Workers, and your zone itself also uses Workers, then a request to your zone may actually be processed by two Workers in sequence. These run in the same thread.</p><p>So, we have prevented code execution time from being measured <i>locally</i>. However, that doesn't actually prevent it from being measured: it can still be measured remotely. For example, the HTTP client that is sending a request to trigger the execution of the Worker can measure how long it takes for the Worker to respond. Of course, such a measurement is likely to be very noisy, since it would have to traverse the Internet. 
Such noise can be overcome, in theory, by executing the attack many times and taking an average.</p><blockquote><p><b>Another aside:</b> Some people have suggested that if a serverless platform like Workers were to completely reset an application's state between requests, so that every request "starts fresh", this would make attacks harder. That is, imagine that a Worker's global variables were reset after every request, meaning you cannot store state in globals in one request and then read that state in the next. Then, doesn't that mean the attack has to start over from scratch for every request? If each request is limited to, say, 50ms of CPU time, does that mean that a Spectre attack isn't possible, because there's not enough time to carry it out? Unfortunately, it's not so simple. State doesn't have to be stored in the Worker; it could instead be stored in a conspiring client. The server can return its state to the client in each response, and the client can send it back to the server in the next request.</p></blockquote><p>But is an attack based on remote timers really feasible in practice? <b>In adversarial testing, with help from leading Spectre experts, we have not been able to develop an attack that actually works in production.</b></p><p>However, we don't feel the lack of a working attack means we should stop building defenses. Instead, we're currently testing some more advanced measures, which we plan to roll out in the coming weeks.</p>
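<p>The state round-trip described in the aside can be simulated in a few lines. This is a hypothetical illustration, not attack code; the point is only that a server which resets between requests does not prevent progress from accumulating:</p>
<pre><code>// Minimal simulation of the "conspiring client" pattern: the server keeps
// no state between requests, so the client ships the state back each time.
function statelessServer(request) {
  const progress = request.state ?? 0;   // server starts fresh every request
  return { state: progress + 1 };        // updated state returned to the client
}

let clientState;                         // the client is the real "memory"
for (let i = 0; i &lt; 5; i++) {
  const response = statelessServer({ state: clientState });
  clientState = response.state;
}
// after 5 round trips, clientState is 5: work accumulated anyway</code></pre>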
    <div>
      <h3>Step 2: Dynamic Process Isolation</h3>
      <a href="#step-2-dynamic-process-isolation">
        
      </a>
    </div>
    <p>We know that if an attack is possible at all, it would take a very long time to run -- hours at the very least, maybe as long as weeks. But once an attack has been running even for a second, we have a huge amount of new data that we can use to trigger further measures.</p><p>Spectre attacks, you see, do a lot of "weird stuff" that you wouldn't usually expect to see in a normal program. These attacks intentionally try to create pathological performance scenarios in order to amplify microarchitectural effects. This is especially true when the attack has already been forced to run billions of times in a loop in order to overcome other mitigations, like those discussed above. This tends to show up in metrics like CPU performance counters.</p><p>Now, the usual problem with using performance metrics to detect Spectre attacks is that sometimes you get false positives. Sometimes, a legitimate program behaves really badly. You can't go around shutting down every app that has bad performance.</p><p>Luckily, we don't have to. Instead, we can choose to reschedule any Worker with suspicious performance metrics into its own process. As I described above, we can't do this with every Worker, because the overhead would be too high. But, it's totally fine to process-isolate just a few Workers, defensively. If the Worker is legitimate, it will keep operating just fine, albeit with a little more overhead. Fortunately for us, the nature of our platform is such that we can reschedule a Worker into its own process at basically any time.</p><p>In fact, fancy performance-counter based triggering may not even be necessary here. If a Worker merely uses a large amount of CPU time per event, then the overhead of isolating it in its own process is relatively less, because it switches context less often. 
So, we might as well use process isolation for any Worker that is CPU-hungry.</p><p>Once a Worker is isolated, we can rely on the operating system's Spectre defenses, just as, for example, most desktop web browsers now do.</p><p>Over the past year we've been working with the experts at Graz University of Technology to develop this approach. (TU Graz's team co-discovered Spectre itself, and has been responsible for a huge number of the follow-on discoveries since then.) We have developed the ability to dynamically isolate Workers, and we have identified metrics which reliably detect attacks. The whole system is currently undergoing testing to work out any remaining bugs, and we expect to roll it out fully within the next several weeks.</p><p>But wait, didn't I say earlier that even process isolation isn't a complete defense, because it only addresses known vulnerabilities? Yes, this is still true. However, the trend over time is that new Spectre attacks tend to be slower and slower to carry out, and hence we can reasonably guess that by imposing process isolation we have further slowed down even attacks that we don't know about yet.</p>
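<p>The triggering logic described above might be sketched as follows. The function name, metric fields, and thresholds here are all illustrative assumptions, not Cloudflare's actual implementation:</p>
<pre><code>// Hypothetical sketch of a dynamic-isolation heuristic (illustrative
// names and thresholds only; not the production scheduler).
const CPU_MS_THRESHOLD = 10;    // assumed: heavy CPU use per event
const ANOMALY_THRESHOLD = 0.9;  // assumed: perf-counter anomaly score

function shouldProcessIsolate(metrics) {
  // CPU-hungry Workers context-switch less often, so isolating them is
  // relatively cheap -- isolate them unconditionally.
  if (metrics.cpuMsPerEvent &gt; CPU_MS_THRESHOLD) return true;
  // Suspicious performance-counter patterns trigger defensive isolation;
  // false positives merely pay a little extra overhead.
  return metrics.anomalyScore &gt; ANOMALY_THRESHOLD;
}</code></pre>
<p>The key design property is that a false positive is harmless: a legitimate Worker moved into its own process keeps working, just with slightly more overhead.</p>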
    <div>
      <h3>Step 3: Periodic Whole-Memory Shuffling</h3>
      <a href="#step-3-periodic-whole-memory-shuffling">
        
      </a>
    </div>
    <p>After Step 2, we already think we've prevented all known attacks, and we're only worried about hypothetical unknown attacks. How long does a hypothetical unknown attack take to carry out? Well, obviously, nobody knows. But with all the mitigations in place so far, and considering that new attacks have generally been slower than older ones, we think it's reasonable to guess attacks will take days or longer.</p><p>On a time scale of a day, we have new things we can do. In particular, it's totally reasonable to restart the entire Workers runtime on a daily basis, which resets the locations of everything in memory, forcing attacks to restart the process of discovering the locations of secrets.</p><p>We can also reschedule Workers across physical machines or cordons, so that the window to attack any particular neighbor is limited.</p><p>In general, because Workers are fundamentally preemptible (unlike containers or VMs), we have a lot of freedom to frustrate attacks.</p><p>Once we have dynamic process isolation fully deployed, we plan to develop these ideas next. We see this as an ongoing investment, not something that will ever be "done".</p>
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>Phew. You just read <i>twelve pages</i> about Workers security. Hopefully I've convinced you that designing a secure sandbox is only the beginning of building a secure compute platform, and the real work is never done. Popular security culture often dwells on clever hacks and clean fixes. But for the difficult real-world problems, often there is no right answer or simple fix, only the hard work of building defenses thicker and thicker.</p> ]]></content:encoded>
            <category><![CDATA[Serverless Week]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[DNS Security]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">4OiEUD4StStS2HaMVdWSIZ</guid>
            <dc:creator>Kenton Varda</dc:creator>
        </item>
        <item>
            <title><![CDATA[WebAssembly on Cloudflare Workers]]></title>
            <link>https://blog.cloudflare.com/webassembly-on-cloudflare-workers/</link>
            <pubDate>Mon, 01 Oct 2018 15:00:00 GMT</pubDate>
            <description><![CDATA[ Today, we're extending Cloudflare Workers with support for WebAssembly. All Workers customers can now augment their applications with WASM at no additional cost. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>We just announced ten major new products and initiatives over <a href="/crypto-week-2018/">Crypto Week</a> and <a href="/birthday-week-2018-wrap-up/">Birthday Week</a>, but our work is never finished. We're continuously upgrading our existing products with new functionality.</p><p>Today, we're extending Cloudflare Workers with support for WebAssembly. All Workers customers can now augment their applications with WASM at no additional cost.</p>
    <div>
      <h3>What is WebAssembly?</h3>
      <a href="#what-is-webassembly">
        
      </a>
    </div>
    <p>WebAssembly -- often abbreviated as "WASM" -- is a technology that extends the web platform to support compiled languages like C, C++, Rust, Go, and <a href="https://github.com/appcypher/awesome-wasm-langs">more</a>. These languages can be compiled to a special WASM binary format and then loaded in a browser.</p><p>WASM code is securely sandboxed, just like JavaScript. But, because it is based on compiled lower-level languages, it can be much faster for certain kinds of resource-intensive tasks where JavaScript is not a good fit. In addition to performance benefits, WASM allows you to reuse existing code written in languages other than JavaScript.</p>
    <div>
      <h3>What are Workers?</h3>
      <a href="#what-are-workers">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3Df7Lp8N7thbdS74T6rUdC/0a85ed4b92db8a8e692e5fff89d56fd6/workers-illustration.png" />
            
            </figure><p>For those that don't know: Cloudflare Workers lets you deploy "serverless" JavaScript code directly to our 153-and-growing datacenters. Your Worker handles your site's HTTP traffic directly at the location closest to your end user, allowing you to achieve lower latency and reduce serving costs. <a href="/building-with-workers-kv/">Last week we added storage to Workers</a>, making it possible to build applications that run entirely on Cloudflare.</p><p>Until now, Workers has only supported JavaScript. With the addition of WebAssembly, you can now use a wide range of languages and do more, faster. As always, when you deploy code on Cloudflare, it is distributed to every one of our locations world-wide in under 30 seconds.</p>
    <div>
      <h3>When to use WebAssembly</h3>
      <a href="#when-to-use-webassembly">
        
      </a>
    </div>
    <p>It's important to note that WASM is not always the right tool for the job. For lightweight tasks like redirecting a request to a different URL or checking an authorization token, sticking to pure JavaScript is probably both faster and easier than WASM. WASM programs operate in their own separate memory space, which means that it's necessary to copy data in and out of that space in order to operate on it. Code that mostly interacts with external objects without doing any serious "number crunching" likely does not benefit from WASM.</p><p>On the other hand, WASM really shines when you need to perform a resource-hungry, self-contained operation, like resizing an image, or processing an audio stream. These operations require lots of math and careful memory management. While it's possible to perform such tasks in pure JavaScript — and engines like V8 have gone to impressive lengths to optimize such code — in the end nothing beats a compiled language with static types and explicit allocation.</p><p>As an example, the image below is resized dynamically by a Cloudflare Worker using a WebAssembly module to decode and resize the image. Only the original image is cached — the resize happens on-the-fly at our edge when you move the slider. <a href="https://github.com/cloudflare/cloudflare-workers-wasm-demo">Find the code here.</a></p>
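<p>The copy-in/copy-out cost mentioned above looks roughly like this. This is a minimal sketch using the standard <code>WebAssembly.Memory</code> API; a real module's exported function would do the processing step:</p>
<pre><code>// A WASM module operates on its own linear memory, so JavaScript data
// must be copied across the boundary in both directions.
const memory = new WebAssembly.Memory({ initial: 1 }); // one 64 KiB page
const heap = new Uint8Array(memory.buffer);

// Copy in: encode a JS string into WASM-visible bytes.
const input = new TextEncoder().encode("hello, wasm");
heap.set(input, 0);

// (A module's exported function would process heap[0..input.length) here.)

// Copy out: decode the bytes back into a JS string.
const output = new TextDecoder().decode(heap.subarray(0, input.length));</code></pre>
<p>For code that mostly shuffles small objects back and forth, these copies can easily cost more than the computation itself — which is why pure JavaScript often wins for lightweight tasks.</p>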
    <div>
      <h3>How to use WebAssembly with Cloudflare Workers</h3>
      <a href="#how-to-use-webassembly-with-cloudflare-workers">
        
      </a>
    </div>
    <p>WASM used in a Worker must be deployed together with the Worker. When editing a script in the online Workers editor, click on the "Resources" tab. Here, you can add a WebAssembly module.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/zMO03lewWSD1IEL3jN85k/b08dd850526497e8a7e546a46da8bbfa/resources-tab.png" />
            
            </figure><p>You will be prompted to upload your WASM module file and assign it a global variable name. Once uploaded, your module will appear as a global variable of type <code>WebAssembly.Module</code> in your worker script. You can then instantiate it like this:</p>
            <pre><code>// Define imported functions that your WASM can call.
const imports = { exampleImport(a, b) { return a + b; } }

// Instantiate the module.
const instance = new WebAssembly.Instance(MY_WASM_MODULE, imports)

// Now you can call the functions that your WASM exports.
instance.exports.exampleExport(123);</code></pre>
            <p>Check out the <a href="https://developer.mozilla.org/en-US/docs/WebAssembly/Using_the_JavaScript_API">MDN WebAssembly API documentation</a> for more details on instantiating WebAssembly modules.</p><p>You can also, of course, upload WebAssembly modules via our API instead of the online editor.</p><p><a href="https://developers.cloudflare.com/workers/api/resource-bindings/webassembly-modules/">Check out the documentation for details »</a></p>
    <div>
      <h3>Building WASM modules</h3>
      <a href="#building-wasm-modules">
        
      </a>
    </div>
    <p>Today, building a WebAssembly module for Cloudflare is a somewhat manual process involving low-level tools. <a href="https://github.com/cloudflare/cloudflare-workers-wasm-demo">Check out our demo repository for details.</a></p><p>Now that the basic support is in place, we plan to work with <a href="http://emscripten.org">Emscripten</a> and the rest of the WASM community to make sure building WASM for Cloudflare Workers is as seamless as building for a web browser. Stay tuned for further developments.</p>
    <div>
      <h3>The Future</h3>
      <a href="#the-future">
        
      </a>
    </div>
    <p>We're excited by the possibilities that WebAssembly opens up. Perhaps, by integrating with <a href="https://www.cloudflare.com/products/cloudflare-spectrum/">Cloudflare Spectrum</a>, we could allow existing C/C++ server code to handle arbitrary TCP and UDP protocols on the edge, like a sort of massively-distributed <code>inetd</code>. Perhaps game servers could reduce latency by running on Cloudflare, as close to the player as possible. Maybe, with the help of some GPUs and OpenGL bindings, you could do 3D rendering and real-time streaming directly from the edge. <a href="https://community.cloudflare.com/c/developers/workers">Let us know what you'd like to see »</a></p><p>Want to help us build it? <a href="https://www.cloudflare.com/careers/">We're hiring!</a></p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">7jIsILrVV12iCe6g7iU1TB</guid>
            <dc:creator>Kenton Varda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Everyone can now run JavaScript on Cloudflare with Workers]]></title>
            <link>https://blog.cloudflare.com/cloudflare-workers-unleashed/</link>
            <pubDate>Tue, 13 Mar 2018 13:00:00 GMT</pubDate>
            <description><![CDATA[ We believe the true dream of cloud computing is that your code lives in the network itself. Your code doesn't run in "us-west-4", it runs everywhere. ]]></description>
            <content:encoded><![CDATA[ <p><i>This post is also available in </i><a href="/ja-jp/cloudflare-workers-unleashed-ja-jp/"><i>日本語</i></a><i>.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3Q5P6uXQ3lvPtobTplWq5U/7ef904ddcd1f9c2284caf65fd3b4511d/workers-social.png" />
            
            </figure><p>Exactly one year ago today, Cloudflare gave me a mission: Make it so people can run code on Cloudflare's edge. At the time, we didn't yet know what that would mean. Would it be container-based? A new Turing-incomplete domain-specific language? Lua? "Functions"? There were lots of ideas.</p><p>Eventually, we settled on what now seems the obvious choice: JavaScript, using the standard Service Workers API, running in a new environment built on V8. Five months ago, we <a href="/introducing-cloudflare-workers/">gave you a preview</a> of what we were building, and started the beta.</p><p>Today, with thousands of scripts deployed and many billions of requests served, <a href="https://www.cloudflare.com/developer-platform/workers/">Cloudflare Workers</a> is now ready for everyone.</p><p>"Moving away from VCL and adopting Cloudflare Workers will allow us to do some creative routing that will let us deliver JavaScript to npm's millions of users even faster than we do now. We will be building our next generation of services on Cloudflare's platform and we get to do it in JavaScript!"</p><p>— CJ Silverio, CTO, npm, Inc.</p>
    <div>
      <h3>What is the Cloud, really?</h3>
      <a href="#what-is-the-cloud-really">
        
      </a>
    </div>
    <p>Historically, web application code has been split between servers and browsers. Between them lies a vast but fundamentally <i>dumb</i> network which merely ferries data from point to point.</p><p>We don't believe this lives up to the promise of "The Cloud."</p><p>We believe the true dream of cloud computing is that your code lives in the network itself. Your code doesn't run in "us-west-4" or "South Central Asia (Mumbai)", it runs <i>everywhere</i>.</p><p>More concretely, it should run where it is most needed. When responding to a user in New Zealand, your code should run in New Zealand. When crunching data in your database, your code should run on the machines that store the data. When interacting with a third-party API, your code should run wherever that API is hosted. When human explorers reach Mars, they aren't going to be happy waiting a half an hour for your app to respond -- your code needs to be running on Mars.</p><p>Cloudflare Workers are our first step towards this vision. When you deploy a Worker, it is deployed to Cloudflare's entire edge network of over a hundred locations worldwide in under 30 seconds. Each request for your domain will be handled by your Worker at a Cloudflare location close to the end user, with no need for you to think about individual locations. The more locations we bring online, the more your code just "runs everywhere."</p><p>Well, OK… it won't run on Mars. Yet. You out there, Elon?</p>
    <div>
      <h3>What's a Worker?</h3>
      <a href="#whats-a-worker">
        
      </a>
    </div>
    <p>Cloudflare Workers derive their name from Web Workers, and more specifically Service Workers, the W3C standard API for scripts that run in the background in a web browser and intercept HTTP requests. Cloudflare Workers are written against the same standard API, but run on Cloudflare's servers, not in a browser.</p><p>Here are the tools you get to work with:</p><ul><li><p>Execute any JavaScript code, using the latest standard language features.</p></li><li><p>Intercept and modify HTTP request and response URLs, status, headers, and body content.</p></li><li><p>Respond to requests directly from your Worker, or forward them elsewhere.</p></li><li><p>Send HTTP requests to third-party servers.</p></li><li><p>Send multiple requests, in serial or parallel, and use the responses to compose a final response to the original request.</p></li><li><p>Send asynchronous requests after the response has already been returned to the client (for example, for logging or analytics).</p></li><li><p>Control other Cloudflare features, such as caching behavior.</p></li></ul><p>The possible uses for Workers are infinite, and we're excited to see what our customers come up with. 
Here are some ideas we've seen in the beta:</p><ul><li><p>Route different types of requests to different origin servers.</p></li><li><p>Expand HTML templates on the edge, to <a href="https://www.cloudflare.com/learning/cdn/how-cdns-reduce-bandwidth-cost/">reduce bandwidth costs</a> at your origin.</p></li><li><p>Apply <a href="https://www.cloudflare.com/learning/access-management/what-is-access-control/">access control</a> to cached content.</p></li><li><p>Redirect a fraction of users to a staging server.</p></li><li><p>Perform A/B testing between two entirely different back-ends.</p></li><li><p>Build "<a href="https://www.cloudflare.com/learning/serverless/what-is-serverless/">serverless</a>" applications that rely entirely on web APIs.</p></li><li><p>Create custom security filters to block unwanted traffic unique to your app.</p></li><li><p>Rewrite requests to <a href="https://www.cloudflare.com/learning/cdn/what-is-a-cache-hit-ratio/">improve cache hit rate</a>.</p></li><li><p>Implement custom <a href="https://www.cloudflare.com/learning/performance/what-is-load-balancing/">load balancing</a> and failover logic.</p></li><li><p>Apply quick fixes to your application without having to update your production servers.</p></li><li><p>Collect analytics without running code in the user's browser.</p></li><li><p>Much more.</p></li></ul><p>Here's an example.</p>
            <pre><code>// A Worker which:
// 1. Redirects visitors to the home page ("/") to a
//    country-specific page (e.g. "/US/").
// 2. Blocks hotlinks.
// 3. Serves images directly from Google Cloud Storage.
addEventListener('fetch', event =&gt; {
  event.respondWith(handle(event.request))
})

async function handle(request) {
  let url = new URL(request.url)
  if (url.pathname == "/") {
    // This is a request for the home page ("/").
    // Redirect to country-specific path.
    // E.g. users in the US will be sent to "/US/".
    let country = request.headers.get("CF-IPCountry")
    url.pathname = "/" + country + "/"
    return Response.redirect(url, 302)

  } else if (url.pathname.startsWith("/images/")) {
    // This is a request for an image (under "/images").
    // First, block third-party referrers to discourage
    // hotlinking.
    let referer = request.headers.get("Referer")
    if (referer &amp;&amp;
        new URL(referer).hostname != url.hostname) {
      return new Response(
          "Hotlinking not allowed.",
          { status: 403 })
    }

    // Hotlink check passed. Serve the image directly
    // from Google Cloud Storage, to save serving
    // costs. The image will be cached at Cloudflare's
    // edge according to its Cache-Control header.
    url.hostname = "example-bucket.storage.googleapis.com"
    return fetch(url, request)
  } else {
    // Regular request. Forward to origin server.
    return fetch(request)
  }
}</code></pre>
            
    <div>
      <h3>It's Really Fast</h3>
      <a href="#its-really-fast">
        
      </a>
    </div>
    <p>Sometimes people ask us if JavaScript is "slow". Nothing could be further from the truth.</p><p>Workers uses the V8 JavaScript engine built by Google for Chrome. V8 is not only one of the fastest implementations of JavaScript, but one of the fastest implementations of any dynamically-typed language, period. Due to the immense amount of work that has gone into optimizing V8, it outperforms just about any popular server programming language with the possible exceptions of C/C++, Rust, and Go. (Incidentally, we will support those soon, via WebAssembly.)</p><p>The bottom line: <b>A typical Worker script executes in less than one millisecond.</b> Most users are unable to measure any <a href="https://www.cloudflare.com/learning/performance/glossary/what-is-latency/">latency</a> difference when they enable Workers -- except, of course, when their worker actually <i>improves</i> latency by responding directly from the edge.</p><p>On another speed-related note, Workers deploy fast, too. <b>Workers deploy globally in under 30 seconds</b> from the time you save and enable the script.</p>
    <div>
      <h3>Pricing</h3>
      <a href="#pricing">
        
      </a>
    </div>
    <p>Workers are a paid add-on to Cloudflare. We wanted to keep the pricing as simple as possible, so here's the deal:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4FhizNXvRr918DQvbZPx9V/218889d4e4623f62dfe041a5590e650f/workers-pricing-text_4x.png" />
            
            </figure>
    <div>
      <h3>Get Started</h3>
      <a href="#get-started">
        
      </a>
    </div>
    <ul><li><p><a href="https://www.cloudflare.com/a/overview">Log into your Cloudflare account</a> and visit the "Workers" section to configure Workers.</p></li><li><p><a href="https://cloudflareworkers.com/">Experiment with Workers in the Playground</a>, no account required.</p></li><li><p><a href="https://developers.cloudflare.com/workers/">Read the documentation</a> to learn how Workers are written.</p></li><li><p><a href="/introducing-cloudflare-workers/">Check out the original announcement blog post</a> for more technical details.</p></li><li><p><a href="https://community.cloudflare.com/c/developers/workers">Discuss Workers in the Cloudflare Community.</a></p></li></ul><p>"Cloudflare Workers saves us a great deal of time. Managing bot traffic without Workers would consume valuable development and server resources that are better spent elsewhere."</p><p>— John Thompson, Senior System Administrator, MaxMind</p> ]]></content:encoded>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[Speed & Reliability]]></category>
            <category><![CDATA[Compression]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">4srwZERwM1P2louT4gzvty</guid>
            <dc:creator>Kenton Varda</dc:creator>
        </item>
        <item>
            <title><![CDATA[How to Monkey-Patch the Linux Kernel]]></title>
            <link>https://blog.cloudflare.com/how-to-monkey-patch-the-linux-kernel/</link>
            <pubDate>Mon, 23 Oct 2017 23:28:00 GMT</pubDate>
            <description><![CDATA[ I have a weird setup. I type in Dvorak. But, when I hold ctrl or alt, my keyboard reverts to Qwerty. You see, the classic text-editing hotkeys, ctrl+Z, ctrl+X, ctrl+C, and ctrl+V are all located optimally for a Qwerty layout. ]]></description>
            <content:encoded><![CDATA[ <p>I have a weird setup. I type in Dvorak. But, when I hold ctrl or alt, my keyboard reverts to Qwerty.</p><p>You see, the classic text-editing hotkeys, ctrl+Z, ctrl+X, ctrl+C, and ctrl+V are all located optimally for a Qwerty layout: next to the control key, easy to reach with your left hand while mousing with your right. In Dvorak, unfortunately, these hotkeys are scattered around mostly on the right half of the keyboard, making them much less convenient. Using Dvorak for typing but Qwerty for hotkeys turns out to be a nice compromise.</p><p>But, the only way I could find to make this work on Linux / X was to write a program that uses X "grabs" to intercept key events and rewrite them. That was mostly fine, until recently, when my machine, unannounced, updated to Wayland. Remarkably, I didn't even notice at first! But at some point, I realized my hotkeys weren't working right. You see, Wayland, unlike X, actually has some sensible security rules, and as a result, random programs can't just capture all keyboard events anymore. Which broke my setup.</p><p>Yes, that's right, I'm that guy:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3ffvMlsnnxSWwVHiNIsgj8/9d45189d5b725626f00a5f1f56b78ea1/workflow.png" />
            
            </figure><p>Source: <a href="https://xkcd.com/1172/">xkcd 1172</a></p><p>So what was I to do? I began worrying that I'd need to modify the keyboard handling directly in Wayland or in the Linux kernel. Maintaining my own fork of core system infrastructure that changes frequently was not an attractive thought.</p><p>Desperate, I asked the Cloudflare Engineering chat channel if anyone knew a better way. That's when Marek Kroemeke came to the rescue:</p>
            <figure>
            <a href="https://sourceware.org/systemtap/examples/general/keyhack.stp">
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3DPnIgdkERgExwaoI3VQjq/697b1dcddd133aebc28bd85cb04f7832/a-good-way.png" />
            </a>
            </figure><p>Following <a href="https://sourceware.org/systemtap/examples/general/keyhack.stp">Marek's link</a>, I found:</p>
            <pre><code>#! /usr/bin/env stap

# This is not useful, but it demonstrates that
# Systemtap can modify variables in a running kernel.

# Usage: ./keyhack.stp -g

probe kernel.function("kbd_event") {
  # Changes 'm' to 'b' .
  if ($event_code == 50) $event_code = 48
}

probe end {
  printf("\nDONE\n")
}</code></pre>
            <p>Oh my. What is this? What do you mean, "this is not useful"? This is almost exactly what I want!</p>
    <div>
      <h3>SystemTap: Not just for debugging?</h3>
      <a href="#systemtap-not-just-for-debugging">
        
      </a>
    </div>
    <p>SystemTap is a tool designed to allow you to probe the Linux kernel for debugging purposes. It lets you hook any kernel function (yes, any C function defined anywhere in the kernel) and log the argument values, or other system state. Scripts are written in a special language designed to prevent you from doing anything that could break your system.</p><p>But it turns out you can do more than just read: with the <code>-g</code> flag (for "guru mode", in which you accept responsibility for your actions), you can not only read, but also modify. Moreover, you can inject raw C code, escaping the restrictions of SystemTap's normal language.</p><p>SystemTap's command-line tool, <code>stap</code>, compiles your script into a Linux kernel module and loads it. The module, on load, will find the function you want to probe and will overwrite it with a jump to your probing code. The probe code does what you specify, then jumps back to the original function body to continue as usual. When you terminate <code>stap</code> (e.g. via ctrl+C on the command line), it unloads the module, restoring the probed function to its original state.</p><p>This means it's easy and relatively safe to inject a probe into your running system at any time. If it doesn't do what you want, you can safely remove it, modify it, and try again. There's no need to modify the actual kernel code nor recompile your kernel. You can make your changes without maintaining a fork.</p><p>This is, of course, a well-known practice in dynamic programming languages, where it's generally much easier. We call it "Monkey-Patching".</p>
    <div>
      <h3>When is it OK to Monkey-Patch?</h3>
      <a href="#when-is-it-ok-to-monkey-patch">
        
      </a>
    </div>
    <p>"Monkey-patch" is often used as a pejorative. Many developers cringe at the thought. It's an awful hack! Never do that!</p><p>Indeed, in a lot of contexts, monkey-patching is a terrible idea. At a previous job, <a href="https://sandstorm.io/news/2016-09-30-fiber-bomb-debugging-story">I spent weeks</a> debugging problems caused by a bad (but well-meaning) monkey-patch made by one of our dependencies.</p><p>But, often, a little monkey-patch can save a lot of work. By monkey-patching my kernel, I can get the keyboard behavior I want without maintaining a fork forever, and without spending weeks developing a feature worthy of pushing upstream. And when patching my own machine, I can't hurt anyone but myself.</p><p>I would propose two rules for monkey patching:</p><ol><li><p><b>Only the exclusive owner of the environment may monkey-patch it.</b> The "owner" is an entity who has complete discretion and control over all code that exists within the environment in which the monkey-patch is visible. For a self-contained application which specifies all its dependencies precisely, the application developer may be permitted to monkey-patch libraries within the application's runtime -- but libraries and frameworks <i>must never</i> apply monkey-patches. When we're talking about the kernel, the "owner" is the system administrator.</p></li><li><p><b>The owner takes full responsibility for any breakages caused.</b> If something doesn't work right, it's up to the owner to deal with it or abandon their patch.</p></li></ol><p>In this case, I'm the owner of my system, and therefore I have the right to monkey-patch it. If my monkey-patch breaks (say, because the kernel functions I was patching changed in a later kernel version), or if it breaks other programs I use, that's my problem and I'll deal with it.</p>
    <div>
      <h3>Setting Up</h3>
      <a href="#setting-up">
        
      </a>
    </div>
    <p>To use SystemTap, you must have the kernel headers and debug symbols installed. I found the documentation was not quite right on my Debian system. I managed to get everything installed by running:</p>
            <pre><code>sudo apt install systemtap linux-headers-amd64 linux-image-amd64-dbg</code></pre>
            <p>Note that the debug symbols are a HUGE package (~500MB). Such is the price you pay, it seems.</p>
    <div>
      <h3>False Starts</h3>
      <a href="#false-starts">
        
      </a>
    </div>
    <p>Starting from the sample script that remaps 'm' to 'b', it seemed obvious how to proceed. I saved the script to a file and did:</p>
            <pre><code>sudo stap -g keyhack.stp</code></pre>
            <p>But… nothing happened. My 'm' key still typed 'm'.</p><p>To debug, I added some <code>printf()</code> statements (which conveniently print to the terminal where <code>stap</code> runs). But, it appeared the keyboard events were indeed being captured. So why did 'm' still type 'm'?</p><p>It turns out, no one was listening. The <code>kbd_event</code> function is part of the text-mode terminal support. Sure enough, if I switched virtual terminals over to a text terminal, the key was being remapped. But Wayland uses a totally different code path to receive key events -- the <code>/dev/input</code> devices. These devices are implemented by the <code>evdev</code> module.</p><p>Looking through <a href="http://elixir.free-electrons.com/linux/v4.13/source/drivers/input/evdev.c">evdev.c</a>, at first <code>evdev_event()</code> looks tempting as a probe point: it has almost the same signature as <code>kbd_event()</code>. Unfortunately, this function is not usually called by the driver; rather, the multi-event version, <code>evdev_events()</code>, usually is. But that version takes an array, which seems more tedious to deal with.</p><p>Looking further, I came across <code>__pass_event()</code>, which <code>evdev_events()</code> calls for each event. It's slightly different from <code>kbd_event()</code> in that the event is encapsulated in a struct, but at least it only takes one event at a time. This seemed like the easiest place to probe, so I tried it:</p>
            <pre><code># DOES NOT WORK
probe module("evdev").function("__pass_event") {
  # Changes 'm' to 'b'.
  if ($event-&gt;code == 50) $event-&gt;code = 48
}</code></pre>
            <p>Alas, this didn't quite work. When running <code>stap</code>, I got:</p>
            <pre><code>semantic error: failed to retrieve location attribute for 'event'</code></pre>
            <p>This error seems strange. The function definitely has a parameter called <code>event</code>!</p><p>The problem is, <code>__pass_event()</code> is a <code>static</code> function that is called from only one place. As a result, the compiler inlines it. When a function is inlined, its parameters often cease to have a well-defined location in memory, so reading and modifying them becomes infeasible. SystemTap relies on debug info tables that specify where to find parameters, but in this case the tables don't have an answer.</p>
    <div>
      <h3>The Working Version</h3>
      <a href="#the-working-version">
        
      </a>
    </div>
    <p>Alas, it seemed I'd need to use <code>evdev_events()</code> and deal with the array after all. This function takes an array of events to deliver at once, so its parameters aren't quite as convenient. But, it has multiple call sites, so it isn't inlined. I just needed a little loop:</p>
            <pre><code>probe module("evdev").function("evdev_events") {
  for (i = 0; i &lt; $count; i++) {
    # Changes 'm' to 'b'.
    if ($vals[i]-&gt;code == 50) $vals[i]-&gt;code = 48
  }
}</code></pre>
            <p>Success! This script works. I no longer have any way to type 'm'.</p><p>From here, implementing the Dvorak-Qwerty key-remapping behavior I wanted was a simple matter of writing some code to track modifier key state and remap keys. <a href="https://github.com/kentonv/dvorak-qwerty/blob/master/unix/dvorak-qwerty.stp">You can find my full script on GitHub.</a></p>
            <figure>
            <a href="https://github.com/kentonv/dvorak-qwerty/blob/master/unix/dvorak-qwerty.stp">
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6EhZxOXAn8Vuq1pAzJdQq5/bd011d215808d369acc310bd301f30fe/os-level-monkeypatch.png" />
            </a>
            </figure> ]]></content:encoded>
            <category><![CDATA[Linux]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[Life at Cloudflare]]></category>
            <guid isPermaLink="false">5AQ6DAoqmICtgeKaiF0Q9R</guid>
            <dc:creator>Kenton Varda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Introducing Cloudflare Workers: Run JavaScript Service Workers at the Edge]]></title>
            <link>https://blog.cloudflare.com/introducing-cloudflare-workers/</link>
            <pubDate>Fri, 29 Sep 2017 13:00:00 GMT</pubDate>
            <description><![CDATA[ You see this everywhere, but as a lifelong gamer, my personal favorite example is probably graphics cards. In the '90s, graphics hardware generally provided a fixed set of functionality.  ]]></description>
            <content:encoded><![CDATA[ <p><b>UPDATE 2018/3/13:</b> <a href="/cloudflare-workers-unleashed/">Cloudflare Workers is now available to everyone.</a></p><p><b>TL;DR:</b> You'll soon be able to deploy JavaScript to Cloudflare's edge, written against an API similar to Service Workers.</p><p><a href="https://cloudflareworkers.com">Try writing a Worker in the playground »</a></p>
    <div>
      <h3>Introduction</h3>
      <a href="#introduction">
        
      </a>
    </div>
    <p>Every technology, when sufficiently complicated, becomes programmable.</p><p>You see this everywhere, but as a lifelong gamer, my personal favorite example is probably graphics cards. In the '90s, graphics hardware generally provided a fixed set of functionality. The OpenGL standard specified that the geometry pipeline would project points from 3D space onto your viewport, then the raster pipeline would draw triangles between them, with gradient shading and perhaps a texture applied. You could only use one texture at a time. There was only one lighting algorithm, which more or less made every surface look like plastic. If you wanted to do anything else, you often had to give up the hardware entirely and drop back to software.</p><p>Of course, new algorithms and techniques were being developed all the time. So, hardware vendors would add the best ideas to their hardware as "extensions". OpenGL ended up with hundreds of vendor-specific extensions to support things like multi-texturing, bump maps, reflections, dynamic shadows, and more.</p><p>Then, in 2001, everything changed. The first GPU with a programmable shading pipeline was released. Now you could write little programs that ran directly on the hardware, processing each vertex or pixel in arbitrary ways. Now people could experiment with algorithms for rendering realistic skin, or cartoon shading, or so much else, without waiting for hardware vendors to implement their ideas for them.</p><p>Cloudflare is about to go through a similar transition. At its most basic level, Cloudflare is an HTTP cache that runs in 117 locations worldwide (and growing). The HTTP standard defines a fixed feature set for HTTP caches. Cloudflare, of course, does much more, such as providing <a href="https://www.cloudflare.com/dns/">DNS</a> and <a href="https://www.cloudflare.com/ssl/">SSL</a>, shielding your site against attacks, load balancing across your origin servers, and so much else.</p><p>But, these are all fixed functions. 
What if you want to load balance with a custom affinity algorithm? What if standard HTTP caching rules aren't quite right, and you need some custom logic to boost your cache hit rate? What if you want to write custom <a href="https://www.cloudflare.com/learning/ddos/glossary/web-application-firewall-waf/">WAF</a> rules tailored for your application?</p>
    <div>
      <h4>You want to write code</h4>
      <a href="#you-want-to-write-code">
        
      </a>
    </div>
    <p>We can keep adding features forever, but we'll never cover every possible use case this way. Instead, we're making <a href="https://www.cloudflare.com/network/">Cloudflare's edge network</a> programmable. We provide servers in 117+ locations around the world -- you decide how to use them.</p><p>Of course, when you have hundreds of locations and millions of customers, traditional means of hosting software don't quite work. We can't very well give every customer their own virtual machine in each location -- or even their own container. We need something both more scalable and easier for developers to manage. Of course, security is also a concern: we must ensure that code deployed to Cloudflare cannot damage our network nor harm other customers.</p><p>After looking at many possibilities, we settled on the most ubiquitous language on the web today: JavaScript.</p><p>We run JavaScript using V8, the JavaScript engine developed for Google Chrome. That means we can securely run scripts from multiple customers on our servers in much the same way Chrome runs scripts from multiple web sites -- using technology that has had nearly a decade of scrutiny. (Of course, we add a few sandboxing layers of our own on top of this.)</p><p>But what API is this JavaScript written against? For this, we looked to web standards -- specifically, the Service Worker API. Service Workers are a feature implemented by modern browsers which allow you to load a script which intercepts web requests destined for your server before they hit the network, allowing you a chance to rewrite them, redirect them, or even respond directly. Service Workers were designed to run in browsers, but it turns out that the Service Worker API is a perfect fit for what we wanted to support on the edge. If you've ever written a Service Worker, then you already know how to write a Cloudflare Service Worker.</p>
    <div>
      <h4>What it looks like</h4>
      <a href="#what-it-looks-like">
        
      </a>
    </div>
    <p>Here are some examples of Service Workers you might run on Cloudflare.</p><p>Remember: these are written against the <a href="https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API">standard Service Workers API</a>. The only difference is that they run on Cloudflare's edge rather than in the browser.</p><p>Here is a worker which skips the cache for requests that have a <code>Cookie</code> header (e.g. because the user is logged in). Of course, a real-life site would probably have more complicated conditions for caching, but this is code, so you can do anything.</p>
            <pre><code>// A Service Worker which skips cache if the request contains
// a cookie.
addEventListener('fetch', event =&gt; {
  let request = event.request
  if (request.headers.has('Cookie')) {
    // Cookie present. Add Cache-Control: no-cache.
    let newHeaders = new Headers(request.headers)
    newHeaders.set('Cache-Control', 'no-cache')
    event.respondWith(fetch(request, {headers: newHeaders}))
  }

  // Use default behavior.
  return
})</code></pre>
            <p>Here is a worker which performs a site-wide search-and-replace, replacing the word "Worker" with "Minion". <a href="https://cloudflareworkers.com/#c62c6c0002cb236166b794c440870cca:https://blog.cloudflare.com/introducing-cloudflare-workers">Try it out on this blog post.</a></p>
            <pre><code>// A Service Worker which replaces the word "Worker" with
// "Minion" in all site content.
addEventListener("fetch", event =&gt; {
  event.respondWith(fetchAndReplace(event.request))
})

async function fetchAndReplace(request) {
  // Fetch from origin server.
  let response = await fetch(request)

  // Make sure we only modify text, not images.
  let type = response.headers.get("Content-Type") || ""
  if (!type.startsWith("text/")) {
    // Not text. Don't modify.
    return response
  }

  // Read response body.
  let text = await response.text()

  // Modify it.
  let modified = text.replace(
      /Worker/g, "Minion")

  // Return modified response.
  return new Response(modified, {
    status: response.status,
    statusText: response.statusText,
    headers: response.headers
  })
}</code></pre>
            <p>Here is a worker which searches the page content for URLs wrapped in double-curly-brackets, fetches those URLs, and then substitutes them into the page. This implements a sort of primitive template engine supporting something like "Edge Side Includes".</p>
            <pre><code>// A Service Worker which replaces {{URL}} with the contents of
// the URL. (A simplified version of "Edge Side Includes".)
addEventListener("fetch", event =&gt; {
  event.respondWith(fetchAndInclude(event.request))
})

async function fetchAndInclude(request) {
  // Fetch from origin server.
  let response = await fetch(request)

  // Make sure we only modify text, not images.
  let type = response.headers.get("Content-Type") || ""
  if (!type.startsWith("text/")) {
    // Not text. Don't modify.
    return response
  }

  // Read response body.
  let text = await response.text()

  // Search for instances of {{URL}}.
  let regexp = /{{([^}]*)}}/g
  let parts = []
  let pos = 0
  let match
  while (match = regexp.exec(text)) {
    let url = new URL(match[1], request.url)
    parts.push({
      before: text.slice(pos, match.index),
      // Start asynchronous fetch of this URL.
      promise: fetch(url.toString())
          .then((response) =&gt; response.text())
    })
    pos = regexp.lastIndex
  }

  // Now that we've started all the subrequests,
  // wait for each and collect the text.
  let chunks = []
  for (let part of parts) {
    chunks.push(part.before)
    // Wait for the async fetch from earlier to complete.
    chunks.push(await part.promise)
  }
  chunks.push(text.slice(pos))
  // Concatenate all text and return.
  return new Response(chunks.join(""), {
    status: response.status,
    statusText: response.statusText,
    headers: response.headers
  })
}</code></pre>
            
    <div>
      <h4>Play with it yourself!</h4>
      <a href="#play-with-it-yourself">
        
      </a>
    </div>
    <p>We've created the Cloudflare Workers playground at <a href="https://cloudflareworkers.com">cloudflareworkers.com</a> where you can try writing your own script and applying it to your site.</p><p><a href="https://cloudflareworkers.com">Try it out now »</a></p>
    <div>
      <h3>Questions and Answers</h3>
      <a href="#questions-and-answers">
        
      </a>
    </div>
    
    <div>
      <h4>Is it "Cloudflare Workers" or "Cloudflare Service Workers"?</h4>
      <a href="#is-it-cloudflare-workers-or-cloudflare-service-workers">
        
      </a>
    </div>
    <p>A "Cloudflare Worker" is JavaScript you write that runs on Cloudflare's edge. A "Cloudflare Service Worker" is specifically a worker which handles HTTP traffic and is written against the Service Worker API. Currently, this is the only kind of worker we've implemented, but in the future we may introduce other worker types for certain specialized tasks.</p>
    <div>
      <h4>What can I do with Service Workers on the edge?</h4>
      <a href="#what-can-i-do-with-service-workers-on-the-edge">
        
      </a>
    </div>
    <p>Anything and everything. You're writing code, so the possibilities are infinite. Your Service Worker will intercept all HTTP requests destined for your domain, and can return any valid HTTP response. Your worker can make outgoing HTTP requests to any server on the public internet.</p><p>Here are just a few ideas how to use Service Workers on Cloudflare:</p><p><b>Improve performance</b></p><ul><li><p>Use custom logic to decide which requests are cacheable at the edge, and canonicalize them to improve cache hit rate.</p></li><li><p>Expand HTML templates directly on the edge, fetching only dynamic content from your server.</p></li><li><p>Respond to stateless requests directly from the edge without contacting your origin server at all.</p></li><li><p>Split one request into multiple parallel requests to different servers, then combine the responses.</p></li></ul><p><b>Enhance security</b></p><ul><li><p>Implement custom security rules and filters.</p></li><li><p>Implement custom authentication and authorization mechanisms.</p></li></ul><p><b>Increase reliability</b></p><ul><li><p>Deploy fast fixes to your site in seconds, without having to update your own servers.</p></li><li><p>Implement custom load balancing and failover logic.</p></li><li><p>Respond dynamically when your origin server is unreachable.</p></li></ul><p>But these are just examples. The whole point of Cloudflare Workers is that you can do things we haven't thought of!</p>
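    <p>One of the performance ideas above -- using custom logic to canonicalize requests and boost cache hit rate -- can be sketched as a small helper. This is an illustrative sketch, not Cloudflare code; the function name and the set of parameters to strip are hypothetical and would depend on your site:</p>

```javascript
// Sketch: normalize a URL before using it as a cache key, so that
// equivalent requests don't fragment the cache. The list of tracking
// parameters below is illustrative; tailor it to your application.
function canonicalizeForCache(urlString) {
  const url = new URL(urlString)
  // Drop tracking parameters that don't affect the response body.
  for (const param of ['utm_source', 'utm_medium', 'utm_campaign', 'fbclid']) {
    url.searchParams.delete(param)
  }
  // Sort remaining parameters so equivalent URLs yield identical keys.
  url.searchParams.sort()
  return url.toString()
}
```

    <p>Inside a Worker, you might pass the canonicalized URL to your subrequest so the cache sees one consistent key for what is really the same resource.</p>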
    <div>
      <h4>Why JavaScript?</h4>
      <a href="#why-javascript">
        
      </a>
    </div>
    <p>Cloudflare Workers are written in JavaScript, executed using the V8 JavaScript engine (from Google Chrome). We chose JavaScript and V8 for two main reasons:</p><ul><li><p><b>Security:</b> The V8 JavaScript engine is arguably the most scrutinized code sandbox in the history of computing, and the Chrome security team is one of the best in the world. Moreover, Google pays <a href="https://www.google.com/about/appsecurity/chrome-rewards/index.html">massive bug bounties</a> to anyone who can find a vulnerability. (That said, we have added additional layers of our own sandboxing on top of V8.)</p></li><li><p><b>Ubiquity:</b> JavaScript is everywhere. Anyone building a web application already needs to know it: whereas their server could be written in a variety of languages, the client has to be JavaScript, because that's what browsers run.</p></li></ul><p>We did consider several other possibilities:</p><ul><li><p><b>Lua:</b> Lua is already deeply integrated into nginx, providing exactly the kind of scripting hooks that we need -- indeed, much of our own business logic running at the edge today is written in Lua. Moreover, Lua already provides facilities for sandboxing. However, in practice, Lua's security as a sandbox has received limited scrutiny, as, historically, there has not been much value in finding a Lua sandbox breakout -- this would change rapidly if we chose it, probably leading to trouble. Moreover, Lua is not very widely known among web developers today.</p></li><li><p><b>Virtual machines:</b> Virtual machines are, of course, widely used and scrutinized as sandboxes, and most web service back-end developers are familiar with them already. However, virtual machines are heavy: each one must be allocated hundreds of megabytes of RAM and typically takes tens of seconds to boot. We need a solution that allows us to deploy every customer's code to every one of our hundreds of locations. 
That means we need each one's RAM overhead to be as low as possible, and we need startup to be fast enough that we can do it on-demand, so that we can safely shut down workers that aren't receiving traffic. Virtual machines do not scale to these needs.</p></li><li><p><b>Containers:</b> My personal background is in container-based sandboxing. With careful use of Linux "namespaces" paired with a strong seccomp-bpf filter and other <a href="https://www.cloudflare.com/learning/security/what-is-an-attack-surface/">attack surface reduction techniques</a>, it's possible to set up a pretty secure sandbox which can run native Linux binaries. This would have the advantage that we could allow developers to deploy native code, or code written in any language that runs on Linux. However, even though containers are much more efficient than virtual machines, they still aren't efficient enough. Each worker would have to run in its own OS-level process, consuming RAM and inducing context-switching overhead. And while native code can load quickly, many server-oriented language environments are not optimized for startup time. Finally, container security is still immature: although a properly-configured container can be pretty secure, we still see container breakout bugs being found in the Linux kernel every now and then.</p></li><li><p><b>Vx32:</b> We considered a fascinating little sandbox known as Vx32, which uses "software fault isolation" to be able to run native-code binaries in a sandboxed way with multiple sandboxes per process. While this approach was tantalizing in its elegance, it had the down side that developers would need to cross-compile their code to a different platform, meaning we'd have to spend a great deal of time on tooling for a smooth experience. 
Moreover, while it would mitigate some of the context switching overhead compared to multiple processes, RAM usage would still likely be high as very little of the software stack could be shared between sandboxes.</p></li></ul><p>Ultimately, it became clear to us that V8 was the best choice. The final nail in the coffin was the realization that V8 includes WebAssembly out-of-the-box, meaning that people who really need to deploy code written in other languages can still do so.</p>
    <div>
      <h4>Why not Node.js?</h4>
      <a href="#why-not-node-js">
        
      </a>
    </div>
    <p>Node.js is a server-oriented JavaScript runtime that also uses V8. At first glance, it would seem to make a lot of sense to reuse Node.js rather than build on V8 directly.</p><p>However, as it turns out, despite being built on V8, Node is not designed to be a sandbox. Yes, we know about the <a href="https://nodejs.org/api/vm.html">vm module</a>, but if you look closely, it says right there on the page: "Note: The vm module is not a security mechanism. <b>Do not use it to run untrusted code.</b>"</p><p>As such, if we were to build on Node, we'd lose the benefits of V8's sandbox. We'd instead have to do process-level (a.k.a. container-based) sandboxing, which, as discussed above, is less secure and less efficient.</p>
    <div>
      <h4>Why the Service Worker API?</h4>
      <a href="#why-the-service-worker-api">
        
      </a>
    </div>
    <p>Early on in the design process, we nearly made a big mistake.</p><p>Nearly everyone who has spent a lot of time scripting nginx or otherwise working with HTTP proxy services (so, basically everyone at Cloudflare) tends to have a very specific idea of what the API should look like. We all start from the assumption that we'd provide two main "hooks" where the developer could insert a callback: a request hook and a response hook. The request hook callback could modify the request, and the response hook callback modify the response. Then we think about the cache, and we say: ah, some hooks should run pre-cache and some post-cache. So now we have four hooks. Generally, it was assumed these hooks would be pure, non-blocking functions.</p><p>Then, between design meetings at our London office, I had lunch with Ingvar Stepanyan, who among other things had been doing work with Service Workers in the browser. Ingvar pointed out the obvious: This is exactly the use case for which the W3C Service Workers API was designed. Service Workers implement proxies and control caching, traditionally in the browser.</p><p>But the Service Worker API is not based on a request hook and a response hook. Instead, a Service Worker implements an <i>endpoint</i>: it registers <i>one</i> event handler which receives requests and <i>responds</i> to those requests. That handler, though, is asynchronous, meaning it can do other I/O before producing a response. Among other things, it can make its own HTTP requests (which we call "subrequests"). So, a simple service worker can modify the original request, forward it to the origin as a subrequest, receive a response, modify that, and then return it: everything the hooks model can do.</p><p>But a Service Worker is much more powerful. It can make multiple subrequests, in series or in parallel, and combine the results. It can respond directly without making a subrequest at all. 
It can even make an asynchronous subrequest after it has already responded to the original request. A Service Worker can also directly manipulate the cache through the <a href="https://developer.mozilla.org/en-US/docs/Web/API/Cache">Cache API</a>. So, there's no need for "pre cache" and "post cache" hooks. You just stick the cache lookup into your code where you want it.</p><p>To add icing to the cake, the Service Worker API, and related modern web APIs like Fetch and Streams, have been designed very carefully by some incredibly smart people. It uses modern JavaScript idioms like Promises, and it is well-documented by MDN and others. Had we designed our own API, it would surely have been worse on all counts.</p><p>It quickly became obvious to us that the Service Worker API was the correct API for our use case.</p>
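    <p>The fan-out that the endpoint model allows is easy to sketch. The example below is illustrative, not official: a handler makes two subrequests in parallel and combines their bodies. The <code>fetchFn</code> parameter is a hypothetical injected function (returning a promise of body text) so the same logic works with the real <code>fetch()</code> inside a Worker or with a stub elsewhere:</p>

```javascript
// Sketch: one handler fans out to several subrequests in parallel and
// combines the results. fetchFn is an injected, hypothetical function
// that resolves to a response body as text.
async function combineSubrequests(fetchFn, urlA, urlB) {
  // Start both subrequests before awaiting either, so they run in parallel.
  const [bodyA, bodyB] = await Promise.all([fetchFn(urlA), fetchFn(urlB)])
  return bodyA + '\n' + bodyB
}
```

    <p>In a real Worker, you would call something like this from your <code>fetch</code> event handler and wrap the combined text in a <code>new Response(...)</code> passed to <code>event.respondWith()</code>.</p>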
    <div>
      <h4>When can I use it?</h4>
      <a href="#when-can-i-use-it">
        
      </a>
    </div>
    <p><a href="https://workers.cloudflare.com/">Cloudflare Workers</a> is a big change to Cloudflare, and we're rolling it out slowly. If you'd like to get early access -- or just want to be notified when it's ready -- sign up at <a href="https://workers.cloudflare.com/">workers.cloudflare.com</a>.</p><p>If you want to read more, refer to our <a href="https://developers.cloudflare.com/workers/">documentation</a>.</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Programming]]></category>
            <guid isPermaLink="false">6XJ1JVc9iPipjdNSuJdUS7</guid>
            <dc:creator>Kenton Varda</dc:creator>
        </item>
    </channel>
</rss>