
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Tue, 14 Apr 2026 21:33:42 GMT</lastBuildDate>
        <item>
            <title><![CDATA[We deserve a better streams API for JavaScript]]></title>
            <link>https://blog.cloudflare.com/a-better-web-streams-api/</link>
            <pubDate>Fri, 27 Feb 2026 06:00:00 GMT</pubDate>
            <description><![CDATA[ The Web streams API has become ubiquitous in JavaScript runtimes but was designed for a different era. Here's what a modern streaming API could (should?) look like. ]]></description>
            <content:encoded><![CDATA[ <p>Handling data in streams is fundamental to how we build applications. To make streaming work everywhere, the <a href="https://streams.spec.whatwg.org/"><u>WHATWG Streams Standard</u></a> (informally known as "Web streams") was designed to establish a common API to work across browsers and servers. It shipped in browsers, was adopted by Cloudflare Workers, Node.js, Deno, and Bun, and became the foundation for APIs like <a href="https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API"><u>fetch()</u></a>. It's a significant undertaking, and the people who designed it were solving hard problems with the constraints and tools they had at the time.</p><p>But after years of building on Web streams – implementing them in both Node.js and Cloudflare Workers, debugging production issues for customers and runtimes, and helping developers work through far too many common pitfalls – I've come to believe that the standard API has fundamental usability and performance issues that cannot be fixed easily with incremental improvements alone. The problems aren't bugs; they're consequences of design decisions that may have made sense a decade ago, but don't align with how JavaScript developers write code today.</p><p>This post explores some of the fundamental issues I see with Web streams and presents an alternative approach built around JavaScript language primitives that demonstrates something better is possible.</p><p>In benchmarks, this alternative can run anywhere from 2x to <i>120x</i> faster than Web streams in every runtime I've tested it on (including Cloudflare Workers, Node.js, Deno, Bun, and every major browser). The improvements are not due to clever optimizations, but to fundamentally different design choices that more effectively leverage modern JavaScript language features. I'm not here to disparage the work that came before; I'm here to start a conversation about what can potentially come next.</p>
    <div>
      <h2>Where we're coming from</h2>
      <a href="#where-were-coming-from">
        
      </a>
    </div>
    <p>The Streams Standard was developed between 2014 and 2016 with an ambitious goal to provide "APIs for creating, composing, and consuming streams of data that map efficiently to low-level I/O primitives." Before Web streams, the web platform had no standard way to work with streaming data.</p><p>Node.js already had its own <a href="https://nodejs.org/api/stream.html"><u>streaming API</u></a> at the time that was ported to also work in browsers, but WHATWG chose not to use it as a starting point given that it is chartered to only consider the needs of Web browsers. Server-side runtimes only adopted Web streams later, after Cloudflare Workers and Deno each emerged with first-class Web streams support and cross-runtime compatibility became a priority.</p><p>The design of Web streams predates async iteration in JavaScript. The <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/for-await...of"><code><u>for await...of</u></code></a> syntax didn't land until <a href="https://262.ecma-international.org/9.0/"><u>ES2018</u></a>, two years after the Streams Standard was initially finalized. This timing meant the API couldn't initially leverage what would eventually become the idiomatic way to consume asynchronous sequences in JavaScript. Instead, the spec introduced its own reader/writer acquisition model, and that decision rippled through every aspect of the API.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3X0niHShBlgF4LlpWYB7eC/f0bbf35f12ecc98a3888e6e3835acf3a/1.png" />
          </figure>
    <div>
      <h4>Excessive ceremony for common operations</h4>
      <a href="#excessive-ceremony-for-common-operations">
        
      </a>
    </div>
    <p>The most common task with streams is reading them to completion. Here's what that looks like with Web streams:</p>
            <pre><code>// First, we acquire a reader that gives an exclusive lock
// on the stream...
const reader = stream.getReader();
const chunks = [];
try {
  // Second, we repeatedly call read and await on the returned
  // promise to either yield a chunk of data or indicate we're
  // done.
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    chunks.push(value);
  }
} finally {
  // Finally, we release the lock on the stream
  reader.releaseLock();
}</code></pre>
            <p>You might assume this pattern is inherent to streaming. It isn't. The reader acquisition, the lock management, and the <code>{ value, done }</code> protocol are all just design choices, not requirements. They are artifacts of how and when the Web streams spec was written. Async iteration exists precisely to handle sequences that arrive over time, but async iteration did not yet exist when the streams specification was written. The complexity here is pure API overhead, not fundamental necessity.</p><p>Consider the alternative approach now that Web streams do support <code>for await...of</code>:</p>
            <pre><code>const chunks = [];
for await (const chunk of stream) {
  chunks.push(chunk);
}</code></pre>
            <p>This is better in that there is far less boilerplate, but it doesn't solve everything. Async iteration was retrofitted onto an API that wasn't designed for it, and it shows. Features like <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamBYOBReader"><u>BYOB (bring your own buffer)</u></a> reads aren't accessible through iteration. The underlying complexity of readers, locks, and controllers is still there, just hidden. When something does go wrong, or when additional features of the API are needed, developers find themselves back in the weeds of the original API, trying to understand why their stream is "locked", why <code>releaseLock()</code> didn't do what they expected, or hunting down bottlenecks in code they don't control.</p>
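<p>Even the iteration form hides sharp edges. Breaking out of a plain <code>for await</code> loop cancels the underlying stream by default; keeping the remaining data readable requires the lesser-known <code>values()</code> iterator option. A sketch, using a trivial in-memory stream:</p>

```javascript
const stream = new ReadableStream({
  start(controller) {
    controller.enqueue('a');
    controller.enqueue('b');
    controller.enqueue('c');
    controller.close();
  }
});

// Breaking out of `for await (const chunk of stream)` would cancel
// the stream. values({ preventCancel: true }) releases the lock
// instead, leaving the remaining chunks readable.
let first;
for await (const chunk of stream.values({ preventCancel: true })) {
  first = chunk; // 'a'
  break;
}

// A second consumer can pick up where the first left off.
const rest = [];
for await (const chunk of stream) {
  rest.push(chunk); // 'b', then 'c'
}
```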
    <div>
      <h4>The locking problem</h4>
      <a href="#the-locking-problem">
        
      </a>
    </div>
    <p>Web streams use a locking model to prevent multiple consumers from interleaving reads. When you call <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/getReader"><code><u>getReader()</u></code></a>, the stream becomes locked. While locked, nothing else can read from the stream directly, pipe it, or even cancel it – only the code that is actually holding the reader can.</p><p>This sounds reasonable until you see how easily it goes wrong:</p>
            <pre><code>async function peekFirstChunk(stream) {
  const reader = stream.getReader();
  const { value } = await reader.read();
  // Oops — forgot to call reader.releaseLock()
  // And the reader is no longer available when we return
  return value;
}

const first = await peekFirstChunk(stream);
// TypeError: Cannot obtain lock — stream is permanently locked
for await (const chunk of stream) { /* never runs */ }</code></pre>
            <p>Forgetting <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamDefaultReader/releaseLock"><code><u>releaseLock()</u></code></a> permanently breaks the stream. The <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/locked"><code><u>locked</u></code></a> property tells you that a stream is locked, but not why, by whom, or whether the lock is even still usable. <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/pipeTo"><u>Piping</u></a> internally acquires locks, making streams unusable during pipe operations in ways that aren't obvious.</p><p>The semantics around releasing locks with pending reads were also unclear for years. If you called <code>read()</code> but didn't await it, then called <code>releaseLock()</code>, what happened? The spec was recently clarified to cancel pending reads on lock release – but implementations varied, and code that relied on the previous unspecified behavior can break.</p><p>That said, it's important to recognize that locking in itself is not bad. It serves an important purpose: ensuring that applications consume and produce data in an orderly way. The key challenge is with the original manual implementation of it using APIs like <code>getReader()</code> and <code>releaseLock()</code>. With the arrival of automatic lock and reader management with async iterables, dealing with locks from the user's point of view became a lot easier.</p><p>For implementers, the locking model adds a fair amount of non-trivial internal bookkeeping. Every operation must check lock state, readers must be tracked, and the interplay between locks, cancellation, and error states creates a matrix of edge cases that must all be handled correctly.</p>
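<p>The fix for the peek example above is mechanical but easy to forget: always pair <code>getReader()</code> with <code>releaseLock()</code> in a <code>finally</code> block. A sketch (note that the peeked chunk is still consumed – another wrinkle of the model):</p>

```javascript
async function peekFirstChunk(stream) {
  const reader = stream.getReader();
  try {
    const { value } = await reader.read();
    return value;
  } finally {
    // Release even if read() throws, so the stream can be
    // locked again by later consumers.
    reader.releaseLock();
  }
}

const stream = new ReadableStream({
  start(controller) {
    controller.enqueue('first');
    controller.enqueue('second');
    controller.close();
  }
});

const first = await peekFirstChunk(stream);
// The stream is unlocked again; iteration resumes at 'second'.
const remaining = [];
for await (const chunk of stream) remaining.push(chunk);
```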
    <div>
      <h4>BYOB: complexity without payoff</h4>
      <a href="#byob-complexity-without-payoff">
        
      </a>
    </div>
    <p><a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamBYOBReader"><u>BYOB (bring your own buffer)</u></a> reads were designed to let developers reuse memory buffers when reading from streams, an important optimization intended for high-throughput scenarios. The idea is sound: instead of allocating new buffers for each chunk, you provide your own buffer and the stream fills it.</p><p>In practice, (and yes, there are always exceptions to be found) BYOB is rarely used to any measurable benefit. The API is substantially more complex than default reads, requiring a separate reader type (<code>ReadableStreamBYOBReader</code>) and other specialized classes (e.g. <code>ReadableStreamBYOBRequest</code>), careful buffer lifecycle management, and understanding of <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer#transferring_arraybuffers"><code><u>ArrayBuffer</u></code><u> detachment</u></a> semantics. When you pass a buffer to a BYOB read, the buffer becomes detached – transferred to the stream – and you get back a different view over potentially different memory. This transfer-based model is error-prone and confusing:</p>
            <pre><code>const reader = stream.getReader({ mode: 'byob' });
const buffer = new ArrayBuffer(1024);
let view = new Uint8Array(buffer);

const result = await reader.read(view);
// 'view' should now be detached and unusable
// (though not every implementation enforces this)
// result.value is a NEW view, possibly over different memory
view = result.value; // Must reassign</code></pre>
            <p>BYOB also can't be used with async iteration or TransformStreams, so developers who want zero-copy reads are forced back into the manual reader loop.</p><p>For implementers, BYOB adds significant complexity. The stream must track pending BYOB requests, handle partial fills, manage buffer detachment correctly, and coordinate between the BYOB reader and the underlying source. The <a href="https://github.com/web-platform-tests/wpt/tree/master/streams/readable-byte-streams"><u>Web Platform Tests for readable byte streams</u></a> include dedicated test files just for BYOB edge cases: detached buffers, bad views, response-after-enqueue ordering, and more.</p><p>BYOB ends up being complex for both users and implementers, yet sees little adoption in practice. Most developers stick with default reads and accept the allocation overhead.</p><p>Most userland implementations of custom ReadableStream instances do not bother with all the ceremony required to correctly implement both default and BYOB read support in a single stream – and for good reason. It's difficult to get right, and most of the time, consuming code is simply going to fall back on the default read path. The example below shows what a "correct" implementation would need to do. It's big, complex, and error-prone – not a level of complexity that the typical developer really wants to have to deal with:</p>
            <pre><code>let offset = 0;
const totalBytes = 1024 * 1024; // however much data the source has

new ReadableStream({
  type: 'bytes',

  async pull(controller) {
    if (offset &gt;= totalBytes) {
      controller.close();
      return;
    }

    // Check for a BYOB request FIRST
    const byobRequest = controller.byobRequest;

    if (byobRequest) {
      // === BYOB PATH ===
      // Consumer provided a buffer - we MUST fill it (or part of it)
      const view = byobRequest.view;
      const bytesAvailable = totalBytes - offset;
      const bytesToWrite = Math.min(view.byteLength, bytesAvailable);

      // Create a view into the consumer's buffer and fill it;
      // not critical, but safer when bytesToWrite != view.byteLength
      const dest = new Uint8Array(
        view.buffer,
        view.byteOffset,
        bytesToWrite
      );

      // Fill with sequential bytes (our "data source");
      // anything that writes into the view works here
      for (let i = 0; i &lt; bytesToWrite; i++) {
        dest[i] = (offset + i) &amp; 0xFF;
      }

      offset += bytesToWrite;

      // Signal how many bytes we wrote
      byobRequest.respond(bytesToWrite);

    } else {
      // === DEFAULT READER PATH ===
      // No BYOB request - allocate and enqueue a chunk
      const bytesAvailable = totalBytes - offset;
      const chunkSize = Math.min(1024, bytesAvailable);

      const chunk = new Uint8Array(chunkSize);
      for (let i = 0; i &lt; chunkSize; i++) {
        chunk[i] = (offset + i) &amp; 0xFF;
      }

      offset += chunkSize;
      controller.enqueue(chunk);
    }
  },

  cancel(reason) {
    console.log('Stream canceled:', reason);
  }
});</code></pre>
            <p>When a host runtime provides a byte-oriented ReadableStream itself, for instance as the <code>body</code> of a fetch <code>Response</code>, it is often far easier for the runtime to provide an optimized implementation of BYOB reads. But that implementation still needs to handle both default and BYOB reading patterns, and that requirement brings with it a fair amount of complexity.</p>
    <div>
      <h4>Backpressure: good in theory, broken in practice</h4>
      <a href="#backpressure-good-in-theory-broken-in-practice">
        
      </a>
    </div>
    <p>Backpressure – the ability for a slow consumer to signal a fast producer to slow down – is a first-class concept in Web streams. In theory. In practice, the model has some serious flaws.</p><p>The primary signal is <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamDefaultController/desiredSize"><code><u>desiredSize</u></code></a> on the controller. It can be positive (wants data), zero (at capacity), negative (over capacity), or null (closed). Producers are supposed to check this value and stop enqueueing when it's not positive. But there's nothing enforcing this: <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamDefaultController/enqueue"><code><u>controller.enqueue()</u></code></a> always succeeds, even when desiredSize is deeply negative.</p>
            <pre><code>new ReadableStream({
  start(controller) {
    // Nothing stops you from doing this
    while (true) {
      controller.enqueue(generateData()); // desiredSize: -999999
    }
  }
});</code></pre>
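<p>Contrast that with a pull-based source: <code>pull()</code> is only invoked while the internal queue is below the <code>highWaterMark</code>, so production naturally throttles. A sketch, with a stubbed <code>generateData()</code>:</p>

```javascript
const generateData = () => 'chunk'; // stand-in for a real producer

// pull() is called by the runtime only when the queue has room,
// so this source cannot run ahead of its consumer by more than
// the highWaterMark.
const stream = new ReadableStream({
  pull(controller) {
    controller.enqueue(generateData());
  }
}, new CountQueuingStrategy({ highWaterMark: 4 }));
```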
            <p>Stream implementations can and do ignore backpressure, and some spec-defined features explicitly break backpressure. <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/tee"><code><u>tee()</u></code></a>, for instance, creates two branches from a single stream. If one branch reads faster than the other, data accumulates in an internal buffer with no limit. A fast consumer can cause unbounded memory growth while the slow consumer catches up, and there's no way to configure this or opt out beyond canceling the slower branch.</p><p>Web streams do provide clear mechanisms for tuning backpressure behavior in the form of the <code>highWaterMark</code> option and customizable size calculations, but these are just as easy to ignore as <code>desiredSize</code>, and many applications simply fail to pay attention to them.</p><p>The same issues exist on the <code>WritableStream</code> side. A <code>WritableStream</code> has a <code>highWaterMark</code> and <code>desiredSize</code>. There is a <code>writer.ready</code> promise that producers of data are supposed to pay attention to, but often don't.</p>
            <pre><code>const writable = getWritableStreamSomehow();
const writer = writable.getWriter();

// Producers are supposed to await writer.ready.
// It is a promise that, when it resolves, indicates that
// the writable's internal backpressure has cleared and
// it is OK to write more data.
await writer.ready;
await writer.write(...);</code></pre>
            <p>For implementers, backpressure adds complexity without providing guarantees. The machinery to track queue sizes, compute <code>desiredSize</code>, and invoke <code>pull()</code> at the right times must all be implemented correctly. However, since these signals are advisory, all that work doesn't actually prevent the problems backpressure is supposed to solve.</p>
    <div>
      <h4>The hidden cost of promises</h4>
      <a href="#the-hidden-cost-of-promises">
        
      </a>
    </div>
    <p>The Web streams spec requires promise creation at numerous points, often in hot paths and often invisible to users. Each <code>read()</code> call doesn't just return a promise; internally, the implementation creates additional promises for queue management, <code>pull()</code> coordination, and backpressure signaling.</p><p>This overhead is mandated by the spec's reliance on promises for buffer management, completion, and backpressure signals. While some of it is implementation-specific, much of it is unavoidable if you're following the spec as written. For high-frequency streaming – video frames, network packets, real-time data – this overhead is significant.</p><p>The problem compounds in pipelines. Each <code>TransformStream</code> adds another layer of promise machinery between source and sink. The spec doesn't define synchronous fast paths, so even when data is available immediately, the promise machinery still runs.</p><p>For implementers, this promise-heavy design constrains optimization opportunities. The spec mandates specific promise resolution ordering, making it difficult to batch operations or skip unnecessary async boundaries without risking subtle compliance failures. There are many hidden internal optimizations that implementers do make but these can be complicated and difficult to get right.</p><p>While I was writing this blog post, Vercel's Malte Ubl published their own <a href="https://vercel.com/blog/we-ralph-wiggumed-webstreams-to-make-them-10x-faster"><u>blog post</u></a> describing some research work Vercel has been doing around improving the performance of Node.js' Web streams implementation. In that post they discuss the same fundamental performance optimization problem that every implementation of Web streams face:</p><blockquote><p>"Or consider pipeTo(). Each chunk passes through a full Promise chain: read, write, check backpressure, repeat. An {value, done} result object is allocated per read. 
Error propagation creates additional Promise branches.</p><p>None of this is wrong. These guarantees matter in the browser where streams cross security boundaries, where cancellation semantics need to be airtight, where you do not control both ends of a pipe. But on the server, when you are piping React Server Components through three transforms at 1KB chunks, the cost adds up.</p><p>We benchmarked native WebStream pipeThrough at 630 MB/s for 1KB chunks. Node.js pipeline() with the same passthrough transform: ~7,900 MB/s. That is a 12x gap, and the difference is almost entirely Promise and object allocation overhead." 
- Malte Ubl, <a href="https://vercel.com/blog/we-ralph-wiggumed-webstreams-to-make-them-10x-faster"><u>https://vercel.com/blog/we-ralph-wiggumed-webstreams-to-make-them-10x-faster</u></a></p></blockquote><p>As part of their research, they have put together a set of proposed improvements for Node.js' Web streams implementation that eliminate promises in certain code paths, yielding performance boosts of up to 10x. This only goes to prove the point: promises, while useful, add significant overhead. As one of the core maintainers of Node.js, I am looking forward to helping Malte and the folks at Vercel get their proposed improvements landed!</p><p>In a recent update to Cloudflare Workers, I made similar kinds of modifications to an internal data pipeline, reducing the number of JavaScript promises created in certain application scenarios by up to 200x. The result is a performance improvement of several orders of magnitude in those applications.</p>
    <div>
      <h3>Real-world failures</h3>
      <a href="#real-world-failures">
        
      </a>
    </div>
    
    <div>
      <h4>Exhausting resources with unconsumed bodies</h4>
      <a href="#exhausting-resources-with-unconsumed-bodies">
        
      </a>
    </div>
    <p>When <code>fetch()</code> returns a response, the body is a <a href="https://developer.mozilla.org/en-US/docs/Web/API/Response/body"><code><u>ReadableStream</u></code></a>. If you only check the status and don't consume or cancel the body, what happens? The answer varies by implementation, but a common outcome is resource leakage.</p>
            <pre><code>async function checkEndpoint(url) {
  const response = await fetch(url);
  return response.ok; // Body is never consumed or cancelled
}

// In a loop, this can exhaust connection pools
for (const url of urls) {
  await checkEndpoint(url);
}</code></pre>
            <p>This pattern has caused connection pool exhaustion in Node.js applications using <a href="https://nodejs.org/api/globals.html#fetch"><u>undici</u></a> (the <code>fetch() </code>implementation built into Node.js), and similar issues have appeared in other runtimes. The stream holds a reference to the underlying connection, and without explicit consumption or cancellation, the connection may linger until garbage collection – which may not happen soon enough under load.</p><p>The problem is compounded by APIs that implicitly create stream branches. <a href="https://developer.mozilla.org/en-US/docs/Web/API/Request/clone"><code><u>Request.clone()</u></code></a> and <a href="https://developer.mozilla.org/en-US/docs/Web/API/Response/clone"><code><u>Response.clone()</u></code></a> perform implicit <code>tee()</code> operations on the body stream – a detail that's easy to miss. Code that clones a request for logging or retry logic may unknowingly create branched streams that need independent consumption, multiplying the resource management burden.</p><p>Now, to be certain, these types of issues <i>are</i> implementation bugs. The connection leak was definitely something that undici needed to fix in its own implementation, but the complexity of the specification does not make dealing with these types of issues easy.</p><blockquote><p>"Cloning streams in Node.js's fetch() implementation is harder than it looks. When you clone a request or response body, you're calling tee() - which splits a single stream into two branches that both need to be consumed. If one consumer reads faster than the other, data buffers unbounded in memory waiting for the slow branch. If you don't properly consume both branches, the underlying connection leaks. The coordination required between two readers sharing one source makes it easy to accidentally break the original request or exhaust connection pools. 
It's a simple API call with complex underlying mechanics that are difficult to get right." - Matteo Collina, Ph.D. - Platformatic Co-Founder &amp; CTO, Node.js Technical Steering Committee Chair</p></blockquote>
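<p>The narrow fix for the connection-leak pattern above is to explicitly cancel the body you never intend to read, which releases the underlying connection promptly instead of waiting for garbage collection:</p>

```javascript
async function checkEndpoint(url) {
  const response = await fetch(url);
  // Cancel the unread body so the underlying connection is
  // released immediately rather than lingering until GC.
  await response.body?.cancel();
  return response.ok;
}
```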
    <div>
      <h4>Falling headlong off the tee() memory cliff</h4>
      <a href="#falling-headlong-off-the-tee-memory-cliff">
        
      </a>
    </div>
    <p><a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/tee"><code><u>tee()</u></code></a> splits a stream into two branches. It seems straightforward, but the implementation requires buffering: if one branch is read faster than the other, the data must be held somewhere until the slower branch catches up.</p>
            <pre><code>const [forHash, forStorage] = response.body.tee();

// Hash computation is fast
const hash = await computeHash(forHash);

// Storage write is slow — meanwhile, the entire stream
// may be buffered in memory waiting for this branch
await writeToStorage(forStorage);</code></pre>
            <p>The spec does not mandate buffer limits for <code>tee()</code>. And to be fair, the spec allows implementations to implement the actual internal mechanisms for <code>tee()</code> and other APIs in any way they see fit so long as the observable normative requirements of the specification are met. But if an implementation chooses to implement <code>tee()</code> in the specific way described by the streams specification, then <code>tee()</code> will come with a built-in memory management issue that is difficult to work around.</p><p>Implementations have had to develop their own strategies for dealing with this. Firefox initially used a linked-list approach that led to <code>O(n)</code> memory growth proportional to the consumption rate difference. In Cloudflare Workers, we opted to implement a shared buffer model where backpressure is signaled by the slowest consumer rather than the fastest.</p>
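<p>When the payload is known to fit comfortably in memory, the simplest escape hatch is to not <code>tee()</code> at all: buffer the body once and hand the same bytes to both consumers. A sketch, with stand-ins for the hash and storage steps from the example above:</p>

```javascript
// Stand-ins for the two consumers in the earlier example.
const computeHash = async (bytes) => bytes.byteLength;
const writeToStorage = async (bytes) => { /* persist somewhere */ };

const response = new Response('some payload');

// Read the body exactly once; neither consumer can now force
// the other's data to pile up in an internal tee() buffer.
const data = new Uint8Array(await response.arrayBuffer());
const [hash] = await Promise.all([
  computeHash(data),
  writeToStorage(data),
]);
```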
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5cl4vqYfaHaVXiHjLSXv0a/03a0b9fe4c9c0594e181ffee43b63998/2.png" />
          </figure>
    <div>
      <h4>Transform backpressure gaps</h4>
      <a href="#transform-backpressure-gaps">
        
      </a>
    </div>
    <p><code>TransformStream</code> creates a <code>readable/writable</code> pair with processing logic in between. The <code>transform()</code> function executes on <i>write</i>, not on read. The transform runs eagerly as data arrives, regardless of whether any consumer is ready. This causes unnecessary work when consumers are slow, and the backpressure signaling between the two sides has gaps that can cause unbounded buffering under load. The expectation in the spec is that the producer of the data being transformed is paying attention to the <code>writer.ready</code> signal on the writable side of the transform, but quite often producers simply ignore it.</p><p>If the transform's <code>transform()</code> operation is synchronous and always enqueues output immediately, it never signals backpressure back to the writable side, even when the downstream consumer is slow. This is a consequence of the spec design that many developers completely overlook. In browsers, where there's only a single user and typically only a small number of stream pipelines active at any given time, this type of foot gun is often of no consequence, but it has a major impact on server-side or edge performance in runtimes that serve thousands of concurrent requests.</p>
            <pre><code>const fastTransform = new TransformStream({
  transform(chunk, controller) {
    // Synchronously enqueue — this never applies backpressure
    // Even if the readable side's buffer is full, this succeeds
    controller.enqueue(processChunk(chunk));
  }
});

// Pipe a fast source through the transform to a slow sink
fastSource
  .pipeThrough(fastTransform)
  .pipeTo(slowSink);  // Buffer grows without bound</code></pre>
            <p>What TransformStreams are supposed to do is check for backpressure on the controller and use promises to communicate that back to the writer:</p>
            <pre><code>const fastTransform = new TransformStream({
  async transform(chunk, controller) {
    if (controller.desiredSize &lt;= 0) {
      // Wait on the backpressure to clear somehow
    }

    controller.enqueue(processChunk(chunk));
  }
});</code></pre>
            <p>A difficulty here, however, is that the <code>TransformStreamDefaultController</code> does not have a ready promise mechanism like writers do, so the <code>TransformStream</code> implementation would need to implement a polling mechanism to periodically check when <code>controller.desiredSize</code> becomes positive again.</p><p>The problem gets worse in pipelines. When you chain multiple transforms – say, parse, transform, then serialize – each <code>TransformStream</code> has its own internal readable and writable buffers. If implementers follow the spec strictly, data cascades through these buffers in a push-oriented fashion: the source pushes to transform A, which pushes to transform B, which pushes to transform C, each accumulating data in intermediate buffers before the final consumer has even started pulling. With three transforms, you can have six internal buffers filling up simultaneously.</p><p>Developers using the streams API are expected to remember to use options like <code>highWaterMark</code> when creating their sources, transforms, and writable destinations, but often they either forget or simply choose to ignore them.</p>
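<p>The polling workaround described above might look something like this crude sketch (the poll interval and the explicit readable-side <code>highWaterMark</code> are arbitrary choices; <code>processChunk()</code> is a placeholder):</p>

```javascript
const processChunk = (chunk) => chunk; // placeholder transform logic

const backpressureAwareTransform = new TransformStream({
  async transform(chunk, controller) {
    // No writer.ready equivalent exists on the controller, so poll
    // desiredSize until the readable side's queue has drained.
    while (controller.desiredSize !== null && controller.desiredSize <= 0) {
      await new Promise((resolve) => setTimeout(resolve, 5));
    }
    controller.enqueue(processChunk(chunk));
  }
}, undefined, { highWaterMark: 1 }); // readable side defaults to 0
```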
            <pre><code>source
  .pipeThrough(parse)      // buffers filling...
  .pipeThrough(transform)  // more buffers filling...
  .pipeThrough(serialize)  // even more buffers...
  .pipeTo(destination);    // consumer hasn't started yet</code></pre>
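<p>Bounding those intermediate buffers requires passing queuing strategies explicitly at every stage – something few developers actually do. A sketch, with identity transforms standing in for real parse/transform/serialize logic:</p>

```javascript
// Wrap a transform function with tight buffers on both sides.
const bounded = (transform) => new TransformStream(
  { transform },
  new CountQueuingStrategy({ highWaterMark: 1 }), // writable side
  new CountQueuingStrategy({ highWaterMark: 1 })  // readable side
);

const parse = bounded((chunk, controller) => controller.enqueue(chunk));
const transform = bounded((chunk, controller) => controller.enqueue(chunk));
const serialize = bounded((chunk, controller) => controller.enqueue(chunk));

// At most one chunk can now sit in each internal buffer at a time.
```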
            <p>Implementations have found ways to optimize transform pipelines by collapsing identity transforms, short-circuiting non-observable paths, deferring buffer allocation, or falling back to native code that does not run JavaScript at all. Deno, Bun, and Cloudflare Workers have all successfully implemented "native path" optimizations that can help eliminate much of the overhead, and Vercel's recent <a href="https://vercel.com/blog/we-ralph-wiggumed-webstreams-to-make-them-10x-faster"><u>fast-webstreams</u></a> research is working on similar optimizations for Node.js. But the optimizations themselves add significant complexity and still can't fully escape the inherently push-oriented model that TransformStream uses.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/64FcAUPYrTvOSYOPoT2FkR/cc91e0d32dd47320e8ac9d6f431a2fda/3.png" />
          </figure>
    <div>
      <h4>GC thrashing in server-side rendering</h4>
      <a href="#gc-thrashing-in-server-side-rendering">
        
      </a>
    </div>
    <p>Streaming server-side rendering (SSR) is a particularly painful case. A typical SSR stream might render thousands of small HTML fragments, each passing through the streams machinery:</p>
            <pre><code>// Each component enqueues a small chunk
function renderComponent(controller) {
  controller.enqueue(encoder.encode(`&lt;div&gt;${content}&lt;/div&gt;`));
}

// Hundreds of components = hundreds of enqueue calls
// Each one triggers promise machinery internally
for (const component of components) {
  renderComponent(controller);  // Promises created, objects allocated
}</code></pre>
            <p>Every fragment means promises created for <code>read()</code> calls, promises for backpressure coordination, intermediate buffer allocations, and <code>{ value, done } </code>result objects – most of which become garbage almost immediately.</p><p>Under load, this creates GC pressure that can devastate throughput. The JavaScript engine spends significant time collecting short-lived objects instead of doing useful work. Latency becomes unpredictable as GC pauses interrupt request handling. I've seen SSR workloads where garbage collection accounts for a substantial portion (up to and beyond 50%) of total CPU time per request. That's time that could be spent actually rendering content.</p><p>The irony is that streaming SSR is supposed to improve performance by sending content incrementally. But the overhead of the streams machinery can negate those gains, especially for pages with many small components. Developers sometimes find that buffering the entire response is actually faster than streaming through Web streams, defeating the purpose entirely.</p>
    <div>
      <h3>The optimization treadmill</h3>
      <a href="#the-optimization-treadmill">
        
      </a>
    </div>
    <p>To achieve usable performance, every major runtime has resorted to non-standard internal optimizations for Web streams. Node.js, Deno, Bun, and Cloudflare Workers have all developed their own workarounds. This is particularly true for streams wired up to system-level I/O, where much of the machinery is non-observable and can be short-circuited.</p><p>Finding these optimization opportunities can itself be a significant undertaking. It requires end-to-end understanding of the spec to identify which behaviors are observable and which can safely be elided. Even then, whether a given optimization is actually spec-compliant is often unclear. Implementers must make judgment calls about which semantics they can relax without breaking compatibility. This puts enormous pressure on runtime teams to become spec experts just to achieve acceptable performance.</p><p>These optimizations are difficult to implement, frequently error-prone, and lead to inconsistent behavior across runtimes. Bun's "<a href="https://bun.sh/docs/api/streams#direct-readablestream"><u>Direct Streams</u></a>" optimization takes a deliberately and observably non-standard approach, bypassing much of the spec's machinery entirely. Cloudflare Workers' <a href="https://developers.cloudflare.com/workers/runtime-apis/streams/transformstream/"><code><u>IdentityTransformStream</u></code></a> provides a fast-path for pass-through transforms but is Workers-specific and implements behaviors that are not standard for a <code>TransformStream</code>. Each runtime has its own set of tricks and the natural tendency is toward non-standard solutions, because that's often the only way to make things fast.</p><p>This fragmentation hurts portability. Code that performs well on one runtime may behave differently (or poorly) on another, even though it's using "standard" APIs. 
The complexity burden on runtime implementers is substantial, and the subtle behavioral differences create friction for developers writing cross-runtime code, particularly those maintaining frameworks that must run efficiently across many runtime environments.</p><p>It's worth emphasizing that many optimizations are only possible in parts of the spec that are unobservable to user code. The alternative, as with Bun's "Direct Streams", is to intentionally diverge from the spec-defined observable behaviors. This means optimizations often feel incomplete: they work in some scenarios but not others, in some runtimes but not others. Every such case adds to the unsustainable complexity of the Web streams approach, which is why most runtime implementers rarely put significant effort into further improving their streams implementations once the conformance tests pass.</p><p>Implementers shouldn't need to jump through these hoops. When you find yourself needing to relax or bypass spec semantics just to achieve reasonable performance, that's a sign something is wrong with the spec itself. A well-designed streaming API should be efficient by default, not require each runtime to invent its own escape hatches.</p>
    <div>
      <h3>The compliance burden</h3>
      <a href="#the-compliance-burden">
        
      </a>
    </div>
<p>A complex spec creates complex edge cases. The <a href="https://github.com/web-platform-tests/wpt/tree/master/streams"><u>Web Platform Tests for streams</u></a> span over 70 test files, and while comprehensive testing is a good thing, what's telling is what needs to be tested.</p><p>Consider some of the more obscure tests that implementations must pass:</p><ul><li><p>Prototype pollution defense: One test patches <code>Object.prototype.then</code> to intercept promise resolutions, then verifies that <code>pipeTo()</code> and <code>tee()</code> operations don't leak internal values through the prototype chain. This tests a security property that only exists because the spec's promise-heavy internals create an attack surface.</p></li><li><p>WebAssembly memory rejection: BYOB reads must explicitly reject ArrayBuffers backed by WebAssembly memory, which look like regular buffers but can't be transferred. This edge case exists because of the spec's buffer detachment model – a simpler API wouldn't need to handle it.</p></li><li><p>Crash regression for state machine conflicts: A test specifically checks that calling <code>byobRequest.respond()</code> after <code>enqueue()</code> doesn't crash the runtime. This sequence creates a conflict in the internal state machine — the <code>enqueue()</code> fulfills the pending read and should invalidate the <code>byobRequest</code>, but implementations must gracefully handle the subsequent <code>respond()</code> rather than corrupting memory, because it's very likely that developers will not use such a complex API correctly.</p></li></ul><p>These aren't contrived scenarios invented by test authors in a vacuum. They're consequences of the spec's design and reflect real-world bugs.</p><p>For runtime implementers, passing the WPT suite means handling intricate corner cases that most application code will never encounter. The tests encode not just the happy path but the full matrix of interactions between readers, writers, controllers, queues, strategies, and the promise machinery that connects them all.</p><p>A simpler API would mean fewer concepts, fewer interactions between concepts, and fewer edge cases to get right, resulting in more confidence that implementations actually behave consistently.</p>
    <div>
      <h3>The takeaway</h3>
      <a href="#the-takeaway">
        
      </a>
    </div>
    <p>Web streams are complex for users and implementers alike. The problems with the spec aren't bugs; they emerge from using the API exactly as designed. They aren't issues that can be fixed solely through incremental improvements; they're consequences of fundamental design choices. To improve things, we need different foundations.</p>
    <div>
      <h2>A better streams API is possible</h2>
      <a href="#a-better-streams-api-is-possible">
        
      </a>
    </div>
    <p>After implementing the Web streams spec multiple times across different runtimes and seeing the pain points firsthand, I decided it was time to explore what a better, alternative streaming API could look like if designed from first principles today.</p><p>What follows is a proof of concept: it's not a finished standard, not a production-ready library, not even necessarily a concrete proposal for something new, but a starting point for discussion that demonstrates the problems with Web streams aren't inherent to streaming itself; they're consequences of specific design choices that could be made differently. Whether this exact API is the right answer is less important than whether it sparks a productive conversation about what we actually need from a streaming primitive.</p>
    <div>
      <h3>What is a stream?</h3>
      <a href="#what-is-a-stream">
        
      </a>
    </div>
    <p>Before diving into API design, it's worth asking: what is a stream?</p><p>At its core, a stream is just a sequence of data that arrives over time. You don't have all of it at once. You process it incrementally as it becomes available.</p><p>Unix pipes are perhaps the purest expression of this idea:</p>
            <pre><code>cat access.log | grep "error" | sort | uniq -c</code></pre>
            <p>
Data flows left to right. Each stage reads input, does its work, writes output. There's no pipe reader to acquire, no controller lock to manage. If a downstream stage is slow, upstream stages naturally slow down as well. Backpressure is implicit in the model, not a separate mechanism to learn (or ignore).</p><p>In JavaScript, the natural primitive for "a sequence of things that arrive over time" is already in the language: the async iterable. You consume it with <code>for await...of</code>. You stop consuming by stopping iteration.</p><p>This is the intuition the new API tries to preserve: streams should feel like iteration, because that's what they are. The complexity of Web streams – readers, writers, controllers, locks, queuing strategies – obscures this fundamental simplicity. A better API should make the simple case simple and only add complexity where it's genuinely needed.</p>
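<p>To make that concrete, here's the Unix pipeline idea rebuilt from nothing but async generators — a sketch with no streams library involved. Each stage pulls from the previous one, so backpressure falls out of the model for free:</p>

```javascript
// A tiny "cat | grep | wc -l" built only from async generators.
// Each stage runs only when the stage after it asks for a value.
async function* lines() {
  yield "GET /index.html 200";
  yield "GET /missing 404 error";
  yield "POST /api 500 error";
}

async function* grep(source, needle) {
  for await (const line of source) {
    if (line.includes(needle)) yield line; // pulled on demand
  }
}

async function count(source) {
  let n = 0;
  for await (const _ of source) n++;
  return n;
}

// Data flows left to right, just like the shell pipeline
const errors = await count(grep(lines(), "error"));
console.log(errors); // 2
```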
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3AUAA4bitbTOVSQg7Pd7fv/0856b44d78899dcffc4493f4146fb64f/4.png" />
          </figure>
    <div>
      <h3>Design principles</h3>
      <a href="#design-principles">
        
      </a>
    </div>
    <p>I built the proof-of-concept alternative around a different set of principles.</p>
    <div>
      <h4>Streams are iterables.</h4>
      <a href="#streams-are-iterables">
        
      </a>
    </div>
    <p>No custom <code>ReadableStream</code> class with hidden internal state. A readable stream is just an <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Iteration_protocols#the_async_iterator_and_async_iterable_protocols"><code><u>AsyncIterable&lt;Uint8Array[]&gt;</u></code></a>. You consume it with <code>for await...of</code>. No readers to acquire, no locks to manage.</p>
    <div>
      <h4>Pull-through transforms</h4>
      <a href="#pull-through-transforms">
        
      </a>
    </div>
    <p>Transforms don't execute until the consumer pulls. There's no eager evaluation, no hidden buffering. Data flows on-demand from source, through transforms, to the consumer. If you stop iterating, processing stops.</p>
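<p>You don't need any library to see why pull semantics matter; plain async generators already behave this way. A quick sketch:</p>

```javascript
// Laziness demo: wrapping a source in a transform performs no work by
// itself — generator bodies don't run until the first pull.
let transformRan = false;

async function* source() {
  yield new TextEncoder().encode("hello");
}

async function* upper(src) {
  for await (const chunk of src) {
    transformRan = true;
    const text = new TextDecoder().decode(chunk);
    yield new TextEncoder().encode(text.toUpperCase());
  }
}

const pipeline = upper(source());
console.log(transformRan); // false — the pipeline exists, but nothing has run

const { value } = await pipeline.next(); // the first pull drives the transform
console.log(transformRan); // true
console.log(new TextDecoder().decode(value)); // "HELLO"
```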
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4bEXBTEOHBMnCRKGA7odt5/cf51074cce3bb8b2ec1b5158c7560b68/5.png" />
          </figure>
    <div>
      <h4>Explicit backpressure</h4>
      <a href="#explicit-backpressure">
        
      </a>
    </div>
    <p>Backpressure is strict by default. When a buffer is full, writes reject rather than silently accumulating. You can configure alternative policies – block until space is available, drop oldest, drop newest – but you have to choose explicitly. No more silent memory growth.</p>
    <div>
      <h4>Batched chunks</h4>
      <a href="#batched-chunks">
        
      </a>
    </div>
    <p>Instead of yielding one chunk per iteration, streams yield <code>Uint8Array[]</code>: arrays of chunks. This amortizes the async overhead across multiple chunks, reducing promise creation and microtask latency in hot paths.</p>
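<p>The effect is easy to see with plain async iteration. In the sketch below, 1,000 chunks flow through in only 10 async steps:</p>

```javascript
// Batching sketch: yielding Uint8Array[] batches means one promise per
// batch instead of one per chunk — here, 1000 chunks cost only 10 awaits.
async function* batchedSource() {
  for (let b = 0; b < 10; b++) {
    const batch = [];
    for (let i = 0; i < 100; i++) batch.push(new Uint8Array(64));
    yield batch; // one async step covers 100 chunks
  }
}

let awaits = 0;
let chunks = 0;
for await (const batch of batchedSource()) {
  awaits++;                    // one microtask round-trip per batch...
  for (const chunk of batch) {
    chunks += 1;               // ...while the inner loop stays synchronous
  }
}
console.log(awaits, chunks); // 10 1000
```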
    <div>
      <h4>Bytes only</h4>
      <a href="#bytes-only">
        
      </a>
    </div>
    <p>The API deals exclusively with bytes (<a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array"><code><u>Uint8Array</u></code></a>). Strings are UTF-8 encoded automatically. There's no "value stream" vs "byte stream" dichotomy. If you want to stream arbitrary JavaScript values, use async iterables directly. While the API uses <code>Uint8Array</code>, it treats chunks as opaque. There is no partial consumption, no BYOB patterns, no byte-level operations within the streaming machinery itself. Chunks go in, chunks come out, unchanged unless a transform explicitly modifies them.</p>
    <div>
      <h4>Synchronous fast paths matter</h4>
      <a href="#synchronous-fast-paths-matter">
        
      </a>
    </div>
    <p>The API recognizes that synchronous data sources are both necessary and common. The application should not be forced to always accept the performance cost of asynchronous scheduling simply because that's the only option provided. At the same time, mixing sync and async processing can be dangerous. Synchronous paths should always be an option and should always be explicit.</p>
    <div>
      <h3>The new API in action</h3>
      <a href="#the-new-api-in-action">
        
      </a>
    </div>
    
    <div>
      <h4>Creating and consuming streams</h4>
      <a href="#creating-and-consuming-streams">
        
      </a>
    </div>
    <p>In Web streams, creating a simple producer/consumer pair requires <code>TransformStream</code>, manual encoding, and careful lock management:</p>
            <pre><code>const { readable, writable } = new TransformStream();
const enc = new TextEncoder();
const writer = writable.getWriter();
await writer.write(enc.encode("Hello, World!"));
await writer.close();
writer.releaseLock();

const dec = new TextDecoder();
let text = '';
for await (const chunk of readable) {
  text += dec.decode(chunk, { stream: true });
}
text += dec.decode();</code></pre>
            <p>Even this relatively clean version requires: a <code>TransformStream</code>, manual <code>TextEncoder</code> and <code>TextDecoder</code>, and explicit lock release.</p><p>Here's the equivalent with the new API:</p>
            <pre><code>import { Stream } from 'new-streams';

// Create a push stream
const { writer, readable } = Stream.push();

// Write data — backpressure is enforced
await writer.write("Hello, World!");
await writer.end();

// Consume as text
const text = await Stream.text(readable);</code></pre>
            <p>The readable is just an async iterable. You can pass it to any function that expects one, including <code>Stream.text()</code>, which collects and decodes the entire stream.</p><p>The writer has a simple interface: <code>write()</code>, <code>writev()</code> for batched writes, <code>end()</code> to signal completion, and <code>abort()</code> for errors. That's essentially it.</p><p>The Writer is not a concrete class. Any object that implements <code>write()</code>, <code>end()</code>, and <code>abort()</code> can be a writer, making it easy to adapt existing APIs or create specialized implementations without subclassing. There's no complex <code>UnderlyingSink</code> protocol with <code>start()</code>, <code>write()</code>, <code>close()</code>, and <code>abort()</code> callbacks that must coordinate through a controller whose lifecycle and state are independent of the <code>WritableStream</code> it is bound to.</p><p>Here's a simple in-memory writer that collects all written data:</p>
            <pre><code>// A minimal writer implementation — just an object with methods
function createBufferWriter() {
  const chunks = [];
  let totalBytes = 0;
  let closed = false;

  const addChunk = (chunk) =&gt; {
    chunks.push(chunk);
    totalBytes += chunk.byteLength;
  };

  return {
    get desiredSize() { return closed ? null : 1; },

    // Async variants
    write(chunk) { addChunk(chunk); },
    writev(batch) { for (const c of batch) addChunk(c); },
    end() { closed = true; return totalBytes; },
    abort(reason) { closed = true; chunks.length = 0; },

    // Sync variants return boolean (true = accepted)
    writeSync(chunk) { addChunk(chunk); return true; },
    writevSync(batch) { for (const c of batch) addChunk(c); return true; },
    endSync() { closed = true; return totalBytes; },
    abortSync(reason) { closed = true; chunks.length = 0; return true; },

    getChunks() { return chunks; }
  };
}

// Use it
const writer = createBufferWriter();
await Stream.pipeTo(source, writer);
const allData = writer.getChunks();</code></pre>
            <p>No base class to extend, no abstract methods to implement, no controller to coordinate with. Just an object with the right shape.</p>
    <div>
      <h4>Pull-through transforms</h4>
      <a href="#pull-through-transforms">
        
      </a>
    </div>
    <p>Under the new API design, transforms should not perform any work until the data is being consumed. This is a fundamental principle.</p>
            <pre><code>// Nothing executes until iteration begins
const output = Stream.pull(source, compress, encrypt);

// Transforms execute as we iterate
for await (const chunks of output) {
  for (const chunk of chunks) {
    process(chunk);
  }
}</code></pre>
            <p><code>Stream.pull()</code> creates a lazy pipeline. The <code>compress</code> and <code>encrypt</code> transforms don't run until you start iterating output. Each iteration pulls data through the pipeline on demand.</p><p>This is fundamentally different from Web streams' <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/pipeThrough"><code><u>pipeThrough()</u></code></a>, which starts actively pumping data from the source to the transform as soon as you set up the pipe. Pull semantics mean you control when processing happens, and stopping iteration stops processing.</p><p>Transforms can be stateless or stateful. A stateless transform is just a function that takes chunks and returns transformed chunks:</p>
            <pre><code>// Stateless transform — a pure function
// Receives chunks or null (flush signal)
const toUpperCase = (chunks) =&gt; {
  if (chunks === null) return null; // End of stream
  return chunks.map(chunk =&gt; {
    const str = new TextDecoder().decode(chunk);
    return new TextEncoder().encode(str.toUpperCase());
  });
};

// Use it directly
const output = Stream.pull(source, toUpperCase);</code></pre>
            <p>Stateful transforms are simple objects with member functions that maintain state across calls:</p>
            <pre><code>// Stateful transform — a generator that wraps the source
function createLineParser() {
  // Helper to concatenate Uint8Arrays
  const concat = (...arrays) =&gt; {
    const result = new Uint8Array(arrays.reduce((n, a) =&gt; n + a.length, 0));
    let offset = 0;
    for (const arr of arrays) { result.set(arr, offset); offset += arr.length; }
    return result;
  };

  return {
    async *transform(source) {
      let pending = new Uint8Array(0);
      
      for await (const chunks of source) {
        if (chunks === null) {
          // Flush: yield any remaining data
          if (pending.length &gt; 0) yield [pending];
          continue;
        }
        
        // Concatenate pending data with new chunks
        const combined = concat(pending, ...chunks);
        const lines = [];
        let start = 0;

        for (let i = 0; i &lt; combined.length; i++) {
          if (combined[i] === 0x0a) { // newline
            lines.push(combined.slice(start, i));
            start = i + 1;
          }
        }

        pending = combined.slice(start);
        if (lines.length &gt; 0) yield lines;
      }
    }
  };
}

const output = Stream.pull(source, createLineParser());</code></pre>
            <p>For transforms that need cleanup on abort, add an abort handler:</p>
            <pre><code>// Stateful transform with resource cleanup
function createGzipCompressor() {
  // Hypothetical compression API...
  const deflate = new Deflater({ gzip: true });

  return {
    async *transform(source) {
      for await (const chunks of source) {
        if (chunks === null) {
          // Flush: finalize compression
          deflate.push(new Uint8Array(0), true);
          if (deflate.result) yield [deflate.result];
        } else {
          for (const chunk of chunks) {
            deflate.push(chunk, false);
            if (deflate.result) yield [deflate.result];
          }
        }
      }
    },
    abort(reason) {
      // Clean up compressor resources on error/cancellation
    }
  };
}</code></pre>
            <p>For implementers, there's no Transformer protocol with <code>start()</code>, <code>transform()</code>, <code>flush()</code> methods and controller coordination passed into a <code>TransformStream</code> class that has its own hidden state machine and buffering mechanisms. Transforms are just functions or simple objects: far simpler to implement and test.</p>
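<p>One practical payoff: because a transform is just a function over arrays of chunks, it can be unit-tested with plain data and no streaming machinery at all. Using the stateless <code>toUpperCase</code> transform from earlier:</p>

```javascript
// Transforms are plain functions, so testing one means just calling it.
// This mirrors the stateless toUpperCase transform shown above.
const enc = new TextEncoder();
const dec = new TextDecoder();

const toUpperCase = (chunks) => {
  if (chunks === null) return null; // flush signal passes through
  return chunks.map((c) => enc.encode(dec.decode(c).toUpperCase()));
};

// No stream, no controller — call it with ordinary values
const out = toUpperCase([enc.encode("abc"), enc.encode("def")]);
console.log(out.map((c) => dec.decode(c))); // ["ABC", "DEF"]
console.log(toUpperCase(null)); // null
```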
    <div>
      <h4>Explicit backpressure policies</h4>
      <a href="#explicit-backpressure-policies">
        
      </a>
    </div>
    <p>When a bounded buffer fills up and a producer wants to write more, there are only a few things you can do:</p><ol><li><p>Reject the write: refuse to accept more data</p></li><li><p>Wait: block until space becomes available</p></li><li><p>Discard old data: evict what's already buffered to make room</p></li><li><p>Discard new data: drop what's incoming</p></li></ol><p>That's it. Any other response is either a variation of these (like "resize the buffer," which is really just deferring the choice) or domain-specific logic that doesn't belong in a general streaming primitive. Web streams always chooses Wait.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/68339c8QsvNmb7JcZ2lSDO/e52a86a9b8f52b52eb9328d5ee58f23a/6.png" />
          </figure><p>The new API makes you choose one of these four explicitly:</p><ul><li><p><code>strict</code> (default): Rejects writes when the buffer is full and too many writes are pending. Catches "fire-and-forget" patterns where producers ignore backpressure.</p></li><li><p><code>block</code>: Writes wait until buffer space is available. Use when you trust the producer to await writes properly.</p></li><li><p><code>drop-oldest</code>: Drops the oldest buffered data to make room. Useful for live feeds where stale data loses value.</p></li><li><p><code>drop-newest</code>: Discards incoming data when full. Useful when you want to process what you have without being overwhelmed.</p></li></ul>
            <pre><code>const { writer, readable } = Stream.push({
  highWaterMark: 10,
  backpressure: 'strict' // or 'block', 'drop-oldest', 'drop-newest'
});</code></pre>
            <p>No more hoping producers cooperate. The policy you choose determines what happens when the buffer fills.</p><p>Here's how each policy behaves when a producer writes faster than the consumer reads:</p>
            <pre><code>// strict: Catches fire-and-forget writes that ignore backpressure
const strict = Stream.push({ highWaterMark: 2, backpressure: 'strict' });
strict.writer.write(chunk1);  // ok (not awaited)
strict.writer.write(chunk2);  // ok (fills slots buffer)
strict.writer.write(chunk3);  // ok (queued in pending)
strict.writer.write(chunk4);  // ok (pending buffer fills)
strict.writer.write(chunk5);  // throws! too many pending writes

// block: Wait for space (unbounded pending queue)
const blocking = Stream.push({ highWaterMark: 2, backpressure: 'block' });
await blocking.writer.write(chunk1);  // ok
await blocking.writer.write(chunk2);  // ok
await blocking.writer.write(chunk3);  // waits until consumer reads
await blocking.writer.write(chunk4);  // waits until consumer reads
await blocking.writer.write(chunk5);  // waits until consumer reads

// drop-oldest: Discard old data to make room
const dropOld = Stream.push({ highWaterMark: 2, backpressure: 'drop-oldest' });
await dropOld.writer.write(chunk1);  // ok
await dropOld.writer.write(chunk2);  // ok
await dropOld.writer.write(chunk3);  // ok, chunk1 discarded

// drop-newest: Discard incoming data when full
const dropNew = Stream.push({ highWaterMark: 2, backpressure: 'drop-newest' });
await dropNew.writer.write(chunk1);  // ok
await dropNew.writer.write(chunk2);  // ok
await dropNew.writer.write(chunk3);  // silently dropped</code></pre>
            
    <div>
      <h4>Explicit multi-consumer patterns</h4>
      <a href="#explicit-multi-consumer-patterns">
        
      </a>
    </div>
    
            <pre><code>// Share with explicit buffer management
const shared = Stream.share(source, {
  highWaterMark: 100,
  backpressure: 'strict'
});

const consumer1 = shared.pull();
const consumer2 = shared.pull(decompress);</code></pre>
            <p>Instead of <code>tee()</code> with its hidden unbounded buffer, you get explicit multi-consumer primitives. <code>Stream.share()</code> is pull-based: consumers pull from a shared source, and you configure the buffer limits and backpressure policy upfront.</p><p>There's also <code>Stream.broadcast()</code> for push-based multi-consumer scenarios. Both require you to think about what happens when consumers run at different speeds, because that's a real concern that shouldn't be hidden.</p>
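<p>The reason sharing needs an explicit primitive is that naively handing the same async iterator to two consumers doesn't duplicate the data — it splits it. A quick sketch:</p>

```javascript
// Two consumers pulling from the same async iterator split the sequence
// between them; neither sees a complete copy. Explicit share/broadcast
// primitives exist precisely because duplication must be opted into.
async function* numbers() {
  for (let i = 1; i <= 6; i++) yield i;
}

const src = numbers();
const a = [];
const b = [];

// Interleaved pulls: each value goes to exactly one consumer
for (let i = 0; i < 3; i++) {
  a.push((await src.next()).value);
  b.push((await src.next()).value);
}
console.log(a); // [1, 3, 5]
console.log(b); // [2, 4, 6]
```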
    <div>
      <h4>Sync/async separation</h4>
      <a href="#sync-async-separation">
        
      </a>
    </div>
    <p>Not all streaming workloads involve I/O. When your source is in-memory and your transforms are pure functions, async machinery adds overhead without benefit: you're paying to coordinate waiting that never happens.</p><p>The new API has complete parallel sync versions: <code>Stream.pullSync()</code>, <code>Stream.bytesSync()</code>, <code>Stream.textSync()</code>, and so on. If your source and transforms are all synchronous, you can process the entire pipeline without a single promise.</p>
            <pre><code>// Async — when source or transforms may be asynchronous
const textAsync = await Stream.text(source);

// Sync — when all components are synchronous
const textSync = Stream.textSync(source);</code></pre>
            <p>Here's a complete synchronous pipeline – compression, transformation, and consumption with zero async overhead:</p>
            <pre><code>// Synchronous source from in-memory data
const source = Stream.fromSync([inputBuffer]);

// Synchronous transforms
const compressed = Stream.pullSync(source, zlibCompressSync);
const encrypted = Stream.pullSync(compressed, aesEncryptSync);

// Synchronous consumption — no promises, no event loop trips
const result = Stream.bytesSync(encrypted);</code></pre>
            <p>The entire pipeline executes in a single call stack. No promises are created, no microtask queue scheduling occurs, and no GC pressure from short-lived async machinery. For CPU-bound workloads like parsing, compression, or transformation of in-memory data, this can be significantly faster than the equivalent Web streams code – which would force async boundaries even when every component is synchronous.</p><p>Web streams has no synchronous path. Even if your source has data ready and your transform is a pure function, you still pay for promise creation and microtask scheduling on every operation. Promises are fantastic for cases in which waiting is actually necessary, but they aren't always necessary. The new API lets you stay in sync-land when that's what you need.</p>
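<p>This is essentially what the sync path formalizes: plain generators compose the same way, entirely without promises. A dependency-free sketch of a fully synchronous text pipeline:</p>

```javascript
// A fully synchronous pipeline sketched with plain generators: no
// promises, no microtasks — everything runs in one call stack.
function* fromSync(buffers) {
  yield* buffers;
}

function* upperSync(src) {
  const dec = new TextDecoder();
  const enc = new TextEncoder();
  for (const chunk of src) {
    yield enc.encode(dec.decode(chunk).toUpperCase());
  }
}

function textSync(src) {
  const dec = new TextDecoder();
  let out = "";
  for (const chunk of src) out += dec.decode(chunk, { stream: true });
  return out + dec.decode();
}

const input = new TextEncoder().encode("hello, sync world");
const result = textSync(upperSync(fromSync([input])));
console.log(result); // "HELLO, SYNC WORLD"
```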
    <div>
      <h4>Bridging the gap between this and web streams</h4>
      <a href="#bridging-the-gap-between-this-and-web-streams">
        
      </a>
    </div>
    <p>The async iterator based approach provides a natural bridge between this alternative and Web streams. Coming from a ReadableStream, simply passing the readable in as input works as expected, as long as the ReadableStream yields bytes:</p>
            <pre><code>const readable = getWebReadableStreamSomehow();
const input = Stream.pull(readable, transform1, transform2);
for await (const chunks of input) {
  // process chunks
}</code></pre>
            <p>When adapting to a ReadableStream, a bit more work is required since the alternative approach yields batches of chunks, but the adaptation layer is straightforward:</p>
            <pre><code>async function* adapt(input) {
  for await (const chunks of input) {
    for (const chunk of chunks) {
      yield chunk;
    }
  }
}

const input = Stream.pull(source, transform1, transform2);
const readable = ReadableStream.from(adapt(input));</code></pre>
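<p>The same bridging works in the other direction for sinks. Since a writer is just an object with the right methods, wrapping a Web WritableStream takes only a few lines — a sketch, with error handling elided:</p>

```javascript
// Adapting a Web WritableStream into a writer for the new API. No base
// class needed — any object with write/end/abort has the right shape.
function fromWebWritable(webWritable) {
  const w = webWritable.getWriter();
  return {
    write(chunk) { return w.write(chunk); },
    writev(batch) {
      // Web streams has no vectored write, so fall back to sequential writes
      return batch.reduce((p, c) => p.then(() => w.write(c)), Promise.resolve());
    },
    end() { return w.close(); },
    abort(reason) { return w.abort(reason); },
  };
}

// Usage: collect everything written into an array
const received = [];
const writer = fromWebWritable(new WritableStream({
  write(chunk) { received.push(chunk); },
}));

await writer.write(new Uint8Array([1]));
await writer.writev([new Uint8Array([2]), new Uint8Array([3])]);
await writer.end();
console.log(received.length); // 3
```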
            
    <div>
      <h4>How this addresses the real-world failures from earlier</h4>
      <a href="#how-this-addresses-the-real-world-failures-from-earlier">
        
      </a>
    </div>
    <ul><li><p>Unconsumed bodies: Pull semantics mean nothing happens until you iterate. No hidden resource retention. If you don't consume a stream, there's no background machinery holding connections open.</p></li><li><p>The <code>tee()</code> memory cliff: <code>Stream.share()</code> requires explicit buffer configuration. You choose the <code>highWaterMark</code> and backpressure policy upfront: no more silent unbounded growth when consumers run at different speeds.</p></li><li><p>Transform backpressure gaps: Pull-through transforms execute on-demand. Data doesn't cascade through intermediate buffers; it flows only when the consumer pulls. Stop iterating, stop processing.</p></li><li><p>GC thrashing in SSR: Batched chunks (<code>Uint8Array[]</code>) amortize async overhead. Sync pipelines via <code>Stream.pullSync()</code> eliminate promise allocation entirely for CPU-bound workloads.</p></li></ul>
    <div>
      <h3>Performance</h3>
      <a href="#performance">
        
      </a>
    </div>
    <p>The design choices have performance implications. Here are benchmarks from the reference implementation of this possible alternative compared to Web streams (Node.js v24.x, Apple M1 Pro, averaged over 10 runs):</p><table><tr><td><p><b>Scenario</b></p></td><td><p><b>Alternative</b></p></td><td><p><b>Web streams</b></p></td><td><p><b>Difference</b></p></td></tr><tr><td><p>Small chunks (1KB × 5000)</p></td><td><p>~13 GB/s</p></td><td><p>~4 GB/s</p></td><td><p>~3× faster</p></td></tr><tr><td><p>Tiny chunks (100B × 10000)</p></td><td><p>~4 GB/s</p></td><td><p>~450 MB/s</p></td><td><p>~8× faster</p></td></tr><tr><td><p>Async iteration (8KB × 1000)</p></td><td><p>~530 GB/s</p></td><td><p>~35 GB/s</p></td><td><p>~15× faster</p></td></tr><tr><td><p>Chained 3× transforms (8KB × 500)</p></td><td><p>~275 GB/s</p></td><td><p>~3 GB/s</p></td><td><p><b>~80–90× faster</b></p></td></tr><tr><td><p>High-frequency (64B × 20000)</p></td><td><p>~7.5 GB/s</p></td><td><p>~280 MB/s</p></td><td><p>~25× faster</p></td></tr></table><p>The chained transform result is particularly striking: pull-through semantics eliminate the intermediate buffering that plagues Web streams pipelines. Instead of each <code>TransformStream</code> eagerly filling its internal buffers, data flows on-demand from consumer to source.</p><p>Now, to be fair, Node.js really has not yet put significant effort into fully optimizing the performance of its Web streams implementation. There's likely significant room for improvement in Node.js' performance results through a bit of applied effort to optimize the hot paths there. 
That said, running these benchmarks in Deno and Bun also shows a significant performance improvement for this alternative, iterator-based approach over either of their Web streams implementations.</p><p>Browser benchmarks (Chrome/Blink, averaged over 3 runs) show consistent gains as well:</p><table><tr><td><p><b>Scenario</b></p></td><td><p><b>Alternative</b></p></td><td><p><b>Web streams</b></p></td><td><p><b>Difference</b></p></td></tr><tr><td><p>Push 3KB chunks</p></td><td><p>~135k ops/s</p></td><td><p>~24k ops/s</p></td><td><p>~5–6× faster</p></td></tr><tr><td><p>Push 100KB chunks</p></td><td><p>~24k ops/s</p></td><td><p>~3k ops/s</p></td><td><p>~7–8× faster</p></td></tr><tr><td><p>3 transform chain</p></td><td><p>~4.6k ops/s</p></td><td><p>~880 ops/s</p></td><td><p>~5× faster</p></td></tr><tr><td><p>5 transform chain</p></td><td><p>~2.4k ops/s</p></td><td><p>~550 ops/s</p></td><td><p>~4× faster</p></td></tr><tr><td><p>bytes() consumption</p></td><td><p>~73k ops/s</p></td><td><p>~11k ops/s</p></td><td><p>~6–7× faster</p></td></tr><tr><td><p>Async iteration</p></td><td><p>~1.1M ops/s</p></td><td><p>~10k ops/s</p></td><td><p><b>~40–100× faster</b></p></td></tr></table><p>These benchmarks measure throughput in controlled scenarios; real-world performance depends on your specific use case. The difference between Node.js and browser gains reflects the distinct optimization paths each environment takes for Web streams.</p><p>It's worth noting that these benchmarks compare a pure TypeScript/JavaScript implementation of the new API against the native (JavaScript/C++/Rust) implementations of Web streams in each runtime. The new API's reference implementation has had no performance optimization work; the gains come entirely from the design.
A native implementation would likely show further improvement.</p><p>The gains illustrate how fundamental design choices compound: batching amortizes async overhead, pull semantics eliminate intermediate buffering, and implementations are free to take synchronous fast paths when data is available immediately.</p><blockquote><p>"We’ve done a lot to improve performance and consistency in Node streams, but there’s something uniquely powerful about starting from scratch. New streams’ approach embraces modern runtime realities without legacy baggage, and that opens the door to a simpler, performant and more coherent streams model." 
- Robert Nagy, Node.js TSC member and Node.js streams contributor</p></blockquote>
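<p>To make the pull-through point concrete, here is a minimal sketch of the idea using nothing but plain async generators. This is illustrative only, not the proposed API: each stage pulls from the stage before it, so chaining transforms composes functions instead of stacking intermediate queues.</p>

```javascript
// Illustrative sketch only: pull-through streaming with plain async
// generators, not the proposed API. Nothing is produced until the
// consumer asks for it, and chained stages add no intermediate buffers.
async function* source(chunks) {
  for (const chunk of chunks) yield chunk; // pulled one chunk at a time
}

// A "transform" is just an async generator wrapping another iterable.
async function* uppercase(input) {
  for await (const chunk of input) yield chunk.toUpperCase();
}

async function* exclaim(input) {
  for await (const chunk of input) yield chunk + '!';
}

async function collect(input) {
  const out = [];
  for await (const chunk of input) out.push(chunk);
  return out;
}

// Chaining is just function composition; data flows on demand.
const result = await collect(exclaim(uppercase(source(['a', 'b']))));
// result: ['A!', 'B!']
```

<p>Because the consumer drives the pipeline, backpressure falls out of the control flow itself rather than being tracked in per-stage queues.</p>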
    <div>
      <h2>What's next</h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>I'm publishing this to start a conversation. What did I get right? What did I miss? Are there use cases that don't fit this model? What would a migration path for this approach look like? The goal is to gather feedback from developers who've felt the pain of Web streams and have opinions about what a better API should look like.</p>
    <div>
      <h3>Try it yourself</h3>
      <a href="#try-it-yourself">
        
      </a>
    </div>
    <p>A reference implementation for this alternative approach is available now and can be found at <a href="https://github.com/jasnell/new-streams"><u>https://github.com/jasnell/new-streams</u></a>.</p><ul><li><p>API Reference: See the <a href="https://github.com/jasnell/new-streams/blob/main/API.md"><u>API.md</u></a> for complete documentation</p></li><li><p>Examples: The <a href="https://github.com/jasnell/new-streams/tree/main/samples"><u>samples directory</u></a> has working code for common patterns</p></li></ul><p>I welcome issues, discussions, and pull requests. If you've run into Web streams problems I haven't covered, or if you see gaps in this approach, let me know. But again, the idea here is not to say "Let's all use this shiny new object!"; it is to kick off a discussion that looks beyond the current status quo of Web streams and returns to first principles.</p><p>Web streams was an ambitious project that brought streaming to the web platform when nothing else existed. The people who designed it made reasonable choices given the constraints of 2014 – before async iteration, before years of production experience revealed the edge cases.</p><p>But we've learned a lot since then. JavaScript has evolved. A streaming API designed today can be simpler, more aligned with the language, and more explicit about the things that matter, like backpressure and multi-consumer behavior.</p><p>We deserve a better streams API. So let's talk about what that could look like.</p> ]]></content:encoded>
            <category><![CDATA[Standards]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[TypeScript]]></category>
            <category><![CDATA[Open Source]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Node.js]]></category>
            <category><![CDATA[Performance]]></category>
            <category><![CDATA[API]]></category>
            <guid isPermaLink="false">37h1uszA2vuOfmXb3oAnZr</guid>
            <dc:creator>James M Snell</dc:creator>
        </item>
        <item>
            <title><![CDATA[What came first: the CNAME or the A record?]]></title>
            <link>https://blog.cloudflare.com/cname-a-record-order-dns-standards/</link>
            <pubDate>Wed, 14 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[ A recent change to 1.1.1.1 accidentally altered the order of CNAME records in DNS responses, breaking resolution for some clients. This post explores the technical root cause, examines the source code of affected resolvers, and dives into the inherent ambiguities of the DNS RFCs.   ]]></description>
            <content:encoded><![CDATA[ <p>On January 8, 2026, a routine update to 1.1.1.1 aimed at reducing memory usage accidentally triggered a wave of DNS resolution failures for users across the Internet. The root cause wasn't an attack or an outage, but a subtle shift in the order of records within our DNS responses.  </p><p>While most modern software treats the order of records in DNS responses as irrelevant, we discovered that some implementations expect CNAME records to appear before everything else. When that order changed, resolution started failing. This post explores the code change that caused the shift, why it broke specific DNS clients, and the 40-year-old protocol ambiguity that makes the "correct" order of a DNS response difficult to define.</p>
    <div>
      <h2>Timeline</h2>
      <a href="#timeline">
        
      </a>
    </div>
    <p><i>All timestamps referenced are in Coordinated Universal Time (UTC).</i></p><table><tr><th><p><b>Time</b></p></th><th><p><b>Description</b></p></th></tr><tr><td><p>2025-12-02</p></td><td><p>The record reordering is introduced to the 1.1.1.1 codebase</p></td></tr><tr><td><p>2025-12-10</p></td><td><p>The change is released to our testing environment</p></td></tr><tr><td><p>2026-01-07 23:48</p></td><td><p>A global release containing the change starts</p></td></tr><tr><td><p>2026-01-08 17:40</p></td><td><p>The release reaches 90% of servers</p></td></tr><tr><td><p>2026-01-08 18:19</p></td><td><p>Incident is declared</p></td></tr><tr><td><p>2026-01-08 18:27</p></td><td><p>The release is reverted</p></td></tr><tr><td><p>2026-01-08 19:55</p></td><td><p>Revert is completed. Impact ends</p></td></tr></table>
    <div>
      <h2>What happened?</h2>
      <a href="#what-happened">
        
      </a>
    </div>
    <p>While making some improvements to lower the memory usage of our cache implementation, we introduced a subtle change to CNAME record ordering. The change was introduced on December 2, 2025, released to our testing environment on December 10, and began deployment on January 7, 2026.</p>
    <div>
      <h3>How DNS CNAME chains work</h3>
      <a href="#how-dns-cname-chains-work">
        
      </a>
    </div>
    <p>When you query for a domain like <code>www.example.com</code>, you might get a <a href="https://www.cloudflare.com/learning/dns/dns-records/dns-cname-record/"><u>CNAME (Canonical Name)</u></a> record that indicates one name is an alias for another name. It’s the job of public resolvers, such as <a href="https://www.cloudflare.com/learning/dns/what-is-1.1.1.1/"><u>1.1.1.1</u></a>, to follow this chain of aliases until they reach a final answer:</p><p><code>www.example.com → cdn.example.com → server.cdn-provider.com → 198.51.100.1</code></p><p>As 1.1.1.1 traverses this chain, it caches every intermediate record. Each record in the chain has its own <a href="https://www.cloudflare.com/learning/cdn/glossary/time-to-live-ttl/"><u>TTL (Time-To-Live)</u></a>, indicating how long we can cache it. Not all the TTLs in a CNAME chain need to be the same:</p><p><code>www.example.com → cdn.example.com (TTL: 3600 seconds) # Still cached
cdn.example.com → 198.51.100.1    (TTL: 300 seconds)  # Expired</code></p><p>When one or more records in a CNAME chain expire, it’s considered partially expired. Fortunately, since parts of the chain are still in our cache, we don’t have to resolve the entire CNAME chain again — only the part that has expired. In our example above, we would take the still valid <code>www.example.com → cdn.example.com</code> chain, and only resolve the expired <code>cdn.example.com</code> <a href="https://www.cloudflare.com/learning/dns/dns-records/dns-a-record/"><u>A record</u></a>. Once that’s done, we combine the existing CNAME chain and the newly resolved records into a single response.</p>
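<p>A minimal sketch of that split, using hypothetical record shapes rather than 1.1.1.1's actual cache structures: given a cached chain with per-record expiry times, keep the still-valid prefix and re-resolve only the expired tail.</p>

```javascript
// Hypothetical record shapes for illustration; not 1.1.1.1's real cache.
// Each cached record carries an absolute expiry time. The chain is split
// at the first expired record: the prefix is reused, the tail re-resolved.
function splitChain(chain, now) {
  const firstExpired = chain.findIndex((rr) => now >= rr.expiresAt);
  if (firstExpired === -1) return { valid: chain, expired: [] };
  return {
    valid: chain.slice(0, firstExpired),
    expired: chain.slice(firstExpired),
  };
}

const chain = [
  // www -> cdn CNAME, cached for 3600s
  { name: 'www.example.com.', type: 'CNAME', data: 'cdn.example.com.', expiresAt: 3600 },
  // cdn A record, cached for only 300s
  { name: 'cdn.example.com.', type: 'A', data: '198.51.100.1', expiresAt: 300 },
];

const { valid, expired } = splitChain(chain, 1000);
// valid:   the www -> cdn CNAME (still cached)
// expired: the cdn A record (must be re-resolved)
```

<p>After the expired tail is re-resolved, the two pieces are concatenated back into a single answer, and that merge step is exactly where the ordering change occurred.</p>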
    <div>
      <h3>The logic change</h3>
      <a href="#the-logic-change">
        
      </a>
    </div>
    <p>The code that merges these two chains is where the change occurred. Previously, the code would create a new list, insert the existing CNAME chain, and then append the new records:</p>
            <pre><code>impl PartialChain {
    /// Merges records to the cache entry to make the cached records complete.
    pub fn fill_cache(&amp;self, entry: &amp;mut CacheEntry) {
        let mut answer_rrs = Vec::with_capacity(entry.answer.len() + self.records.len());
        answer_rrs.extend_from_slice(&amp;self.records); // CNAMEs first
        answer_rrs.extend_from_slice(&amp;entry.answer); // Then A/AAAA records
        entry.answer = answer_rrs;
    }
}
</code></pre>
            <p>However, to save some memory allocations and copies, the code was changed to instead append the CNAMEs to the existing answer list:</p>
            <pre><code>impl PartialChain {
    /// Merges records to the cache entry to make the cached records complete.
    pub fn fill_cache(&amp;self, entry: &amp;mut CacheEntry) {
        entry.answer.extend_from_slice(&amp;self.records); // CNAMEs last
    }
}
</code></pre>
            <p>As a result, the responses that 1.1.1.1 returned now sometimes had the CNAME records appearing at the bottom, after the final resolved answer.</p>
    <div>
      <h3>Why this caused impact</h3>
      <a href="#why-this-caused-impact">
        
      </a>
    </div>
    <p>When DNS clients receive a response with a CNAME chain in the answer section, they also need to follow this chain to find out that <code>www.example.com</code> points to <code>198.51.100.1</code>. Some DNS client implementations handle this by keeping track of the expected name for the records as they’re iterated sequentially. When a CNAME is encountered, the expected name is updated:</p>
            <pre><code>;; QUESTION SECTION:
;; www.example.com.        IN    A

;; ANSWER SECTION:
www.example.com.    3600   IN    CNAME  cdn.example.com.
cdn.example.com.    300    IN    A      198.51.100.1
</code></pre>
            <p></p><ol><li><p>Find records for <code>www.example.com</code></p></li><li><p>Encounter <code>www.example.com. CNAME cdn.example.com</code></p></li><li><p>Find records for <code>cdn.example.com</code></p></li><li><p>Encounter <code>cdn.example.com. A 198.51.100.1</code></p></li></ol><p>When the CNAME suddenly appears at the bottom, this no longer works:</p>
            <pre><code>;; QUESTION SECTION:
;; www.example.com.	       IN    A

;; ANSWER SECTION:
cdn.example.com.    300    IN    A      198.51.100.1
www.example.com.    3600   IN    CNAME  cdn.example.com.
</code></pre>
            <p></p><ol><li><p>Find records for <code>www.example.com</code></p></li><li><p>Ignore <code>cdn.example.com. A 198.51.100.1</code> as it doesn’t match the expected name</p></li><li><p>Encounter <code>www.example.com. CNAME cdn.example.com</code></p></li><li><p>Find records for <code>cdn.example.com</code></p></li><li><p>No more records are present, so the response is considered empty</p></li></ol><p>One such implementation that broke is the <a href="https://man7.org/linux/man-pages/man3/getaddrinfo.3.html"><code><u>getaddrinfo</u></code></a> function in glibc, which is commonly used on Linux for DNS resolution. When looking at its <code>getanswer_r</code> implementation, we can indeed see it expects to find the CNAME records before any answers:</p>
            <pre><code>for (; ancount &gt; 0; --ancount)
  {
    // ... parsing DNS records ...
    
    if (rr.rtype == T_CNAME)
      {
        /* Record the CNAME target as the new expected name. */
        int n = __ns_name_unpack (c.begin, c.end, rr.rdata,
                                  name_buffer, sizeof (name_buffer));
        expected_name = name_buffer;  // Update what we're looking for
      }
    else if (rr.rtype == qtype
             &amp;&amp; __ns_samebinaryname (rr.rname, expected_name)  // Must match!
             &amp;&amp; rr.rdlength == rrtype_to_rdata_length (qtype))
      {
        /* Address record matches - store it */
        ptrlist_add (addresses, (char *) alloc_buffer_next (abuf, uint32_t));
        alloc_buffer_copy_bytes (abuf, rr.rdata, rr.rdlength);
      }
  }
</code></pre>
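<p>The sequential-matching strategy can be sketched in a few lines of JavaScript (hypothetical record objects, not glibc's actual structures), which makes the failure mode easy to reproduce:</p>

```javascript
// Hypothetical record objects for illustration; not glibc's structures.
// Sequential matching in the style of getanswer_r: walk the answer section
// once, updating the expected name whenever its CNAME is encountered.
function resolveSequential(question, answers) {
  let expected = question;
  const addresses = [];
  for (const rr of answers) {
    if (rr.type === 'CNAME' && rr.name === expected) {
      expected = rr.data; // follow the alias from here on
    } else if (rr.type === 'A' && rr.name === expected) {
      addresses.push(rr.data); // address record matches: store it
    }
  }
  return addresses;
}

const cname = { name: 'www.example.com.', type: 'CNAME', data: 'cdn.example.com.' };
const a = { name: 'cdn.example.com.', type: 'A', data: '198.51.100.1' };

resolveSequential('www.example.com.', [cname, a]); // ['198.51.100.1']
resolveSequential('www.example.com.', [a, cname]); // [] (resolution fails)
```

<p>With the CNAME first, the single pass finds the alias and then the matching A record; with the A record first, it is skipped before the alias is seen, and the response parses as empty.</p>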
            <p>Another notably affected implementation was the DNSC process in three models of Cisco Ethernet switches. Switches configured to use 1.1.1.1 experienced spontaneous reboot loops when they received a response containing the reordered CNAMEs. <a href="https://www.cisco.com/c/en/us/support/docs/smb/switches/Catalyst-switches/kmgmt3846-cbs-reboot-with-fatal-error-from-dnsc-process.html"><u>Cisco has published a service document describing the issue</u></a>.</p>
    <div>
      <h3>Not all implementations break</h3>
      <a href="#not-all-implementations-break">
        
      </a>
    </div>
    <p>Most DNS clients don’t have this issue. For example, <a href="https://www.freedesktop.org/software/systemd/man/latest/systemd-resolved.service.html"><u>systemd-resolved</u></a> first parses the records into an ordered set:</p>
            <pre><code>typedef struct DnsAnswerItem {
        DnsResourceRecord *rr; // The actual record
        DnsAnswerFlags flags;  // Which section it came from
        // ... other metadata
} DnsAnswerItem;


typedef struct DnsAnswer {
        unsigned n_ref;
        OrderedSet *items;
} DnsAnswer;
</code></pre>
            <p>When following a CNAME chain it can then search the entire answer set, even if the CNAME records don’t appear at the top.</p>
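<p>A sketch of this order-independent strategy (again with hypothetical record objects, not systemd-resolved's actual types): collect the whole answer set first, then search it on every hop.</p>

```javascript
// Hypothetical record objects for illustration; not systemd-resolved's
// actual types. Because the whole answer set is searched on every hop,
// the order of records in the message is irrelevant.
function resolveAnyOrder(question, answers, qtype = 'A') {
  let expected = question;
  for (const _ of answers) { // bounded loop also guards against CNAME cycles
    const alias = answers.find((rr) => rr.type === 'CNAME' && rr.name === expected);
    if (!alias) break;
    expected = alias.data;
  }
  return answers
    .filter((rr) => rr.type === qtype && rr.name === expected)
    .map((rr) => rr.data);
}

// CNAME appears after the A record, as in the incident:
const answers = [
  { name: 'cdn.example.com.', type: 'A', data: '198.51.100.1' },
  { name: 'www.example.com.', type: 'CNAME', data: 'cdn.example.com.' },
];

resolveAnyOrder('www.example.com.', answers); // ['198.51.100.1']
```

<p>This approach also handles CNAME chains whose own links arrive shuffled, since each hop re-searches the full set instead of relying on what comes next in the message.</p>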
    <div>
      <h2>What the RFC says</h2>
      <a href="#what-the-rfc-says">
        
      </a>
    </div>
    <p><a href="https://datatracker.ietf.org/doc/html/rfc1034"><u>RFC 1034</u></a>, published in 1987, defines much of the behavior of the DNS protocol, and should give us an answer on whether the order of CNAME records matters. <a href="https://datatracker.ietf.org/doc/html/rfc1034#section-4.3.1"><u>Section 4.3.1</u></a> contains the following text:</p><blockquote><p>If recursive service is requested and available, the recursive response to a query will be one of the following:</p><p>- The answer to the query, possibly preface by one or more CNAME RRs that specify aliases encountered on the way to an answer.</p></blockquote><p>While "possibly preface" can be interpreted as a requirement for CNAME records to appear before everything else, it does not use normative key words, such as <a href="https://datatracker.ietf.org/doc/html/rfc2119"><u>MUST and SHOULD</u></a> that modern RFCs use to express requirements. This isn’t a flaw in RFC 1034, but simply a result of its age. <a href="https://datatracker.ietf.org/doc/html/rfc2119"><u>RFC 2119</u></a>, which standardized these key words, was published in 1997, 10 years <i>after</i> RFC 1034.</p><p>In our case, we did originally implement the specification so that CNAMEs appear first. However, we did not have any tests asserting the behavior remains consistent due to the ambiguous language in the RFC.</p>
    <div>
      <h3>The subtle distinction: RRsets vs RRs in message sections</h3>
      <a href="#the-subtle-distinction-rrsets-vs-rrs-in-message-sections">
        
      </a>
    </div>
    <p>To understand why this ambiguity exists, we need to understand a subtle but important distinction in DNS terminology.</p><p>RFC 1034 <a href="https://datatracker.ietf.org/doc/html/rfc1034#section-3.6"><u>section 3.6</u></a> defines Resource Record Sets (RRsets) as collections of records with the same name, type, and class. For RRsets, the specification is clear about ordering:</p><blockquote><p>The order of RRs in a set is not significant, and need not be preserved by name servers, resolvers, or other parts of the DNS.</p></blockquote><p>However, RFC 1034 doesn’t clearly specify how message sections relate to RRsets. While modern DNS specifications have shown that message sections can indeed contain multiple RRsets (consider <a href="https://www.cloudflare.com/learning/dns/dnssec/how-dnssec-works/">DNSSEC</a> responses with signatures), RFC 1034 doesn’t describe message sections in those terms. Instead, it treats message sections as containing individual Resource Records (RRs).</p><p>The problem is that the RFC primarily discusses ordering in the context of RRsets but doesn't specify the ordering of different RRsets relative to each other within a message section. This is where the ambiguity lives.</p><p>RFC 1034 <a href="https://datatracker.ietf.org/doc/html/rfc1034#section-6.2.1"><u>section 6.2.1</u></a> includes an example that demonstrates this ambiguity further. It mentions that the order of Resource Records (RRs) is not significant either:</p><blockquote><p>The difference in ordering of the RRs in the answer section is not significant.</p></blockquote><p>However, this example only shows two A records for the same name within the same RRset. It doesn't address whether this applies to different record types like CNAMEs and A records.</p>
    <div>
      <h2>CNAME chain ordering</h2>
      <a href="#cname-chain-ordering">
        
      </a>
    </div>
    <p>It turns out that this issue extends beyond putting CNAME records before other record types. Even when CNAMEs appear before other records, sequential parsing can still break if the CNAME chain itself is out of order. Consider the following response:</p>
            <pre><code>;; QUESTION SECTION:
;; www.example.com.              IN    A

;; ANSWER SECTION:
cdn.example.com.           3600  IN    CNAME  server.cdn-provider.com.
www.example.com.           3600  IN    CNAME  cdn.example.com.
server.cdn-provider.com.   300   IN    A      198.51.100.1
</code></pre>
            <p>Each CNAME belongs to a different RRset, as they have different owners, so the statement about RRset order being insignificant doesn’t apply here.</p><p>However, RFC 1034 doesn't specify that CNAME chains must appear in any particular order. There's no requirement that <code>www.example.com. CNAME cdn.example.com.</code> must appear before <code>cdn.example.com. CNAME server.cdn-provider.com.</code>. With sequential parsing, the same issue occurs:</p><ol><li><p>Find records for <code>www.example.com</code></p></li><li><p>Ignore <code>cdn.example.com. CNAME server.cdn-provider.com.</code> as it doesn’t match the expected name</p></li><li><p>Encounter <code>www.example.com. CNAME cdn.example.com</code></p></li><li><p>Find records for <code>cdn.example.com</code></p></li><li><p>Ignore <code>server.cdn-provider.com. A 198.51.100.1</code> as it doesn’t match the expected name</p></li></ol>
    <div>
      <h2>What should resolvers do?</h2>
      <a href="#what-should-resolvers-do">
        
      </a>
    </div>
    <p>RFC 1034 section 5 describes resolver behavior. <a href="https://datatracker.ietf.org/doc/html/rfc1034#section-5.2.2"><u>Section 5.2.2</u></a> specifically addresses how resolvers should handle aliases (CNAMEs): </p><blockquote><p>In most cases a resolver simply restarts the query at the new name when it encounters a CNAME.</p></blockquote><p>This suggests that resolvers should restart the query upon finding a CNAME, regardless of where it appears in the response. However, it's important to distinguish between different types of resolvers:</p><ul><li><p>Recursive resolvers, like 1.1.1.1, are full DNS resolvers that perform recursive resolution by querying authoritative nameservers</p></li><li><p>Stub resolvers, like glibc’s getaddrinfo, are simplified local interfaces that forward queries to recursive resolvers and process the responses</p></li></ul><p>The RFC sections on resolver behavior were primarily written with full resolvers in mind, not the simplified stub resolvers that most applications actually use. Some stub resolvers evidently don’t implement certain parts of the spec, such as the CNAME-restart logic described in the RFC. </p>
    <div>
      <h2>The DNSSEC specifications provide contrast</h2>
      <a href="#the-dnssec-specifications-provide-contrast">
        
      </a>
    </div>
    <p>Later DNS specifications demonstrate a different approach to defining record ordering. <a href="https://datatracker.ietf.org/doc/html/rfc4035"><u>RFC 4035</u></a>, which defines protocol modifications for <a href="https://www.cloudflare.com/learning/dns/dnssec/how-dnssec-works/"><u>DNSSEC</u></a>, uses more explicit language:</p><blockquote><p>When placing a signed RRset in the Answer section, the name server MUST also place its RRSIG RRs in the Answer section. The RRSIG RRs have a higher priority for inclusion than any other RRsets that may have to be included.</p></blockquote><p>The specification uses "MUST" and explicitly defines "higher priority" for <a href="https://www.cloudflare.com/learning/dns/dnssec/how-dnssec-works/"><u>RRSIG</u></a> records. However, "higher priority for inclusion" refers to whether RRSIGs should be included in the response, not where they should appear. This provides unambiguous guidance to implementers about record inclusion in DNSSEC contexts, while not mandating any particular behavior around record ordering.</p><p>For unsigned zones, however, the ambiguity from RFC 1034 remains. The word "preface" has guided implementation behavior for nearly four decades, but it has never been formally specified as a requirement.</p>
    <div>
      <h2>Do CNAME records come first?</h2>
      <a href="#do-cname-records-come-first">
        
      </a>
    </div>
    <p>While in our interpretation the RFCs do not require CNAMEs to appear in any particular order, it’s clear that at least some widely-deployed DNS clients rely on it. As some systems using these clients might be updated infrequently, or never updated at all, we believe it’s best to require CNAME records to appear in order before any other records.</p><p>Based on what we have learned during this incident, we have reverted the CNAME re-ordering and do not intend to change the order in the future.</p><p>To prevent any future incidents or confusion, we have written a proposal in the form of an <a href="https://www.ietf.org/participate/ids/"><u>Internet-Draft</u></a> to be discussed at the IETF. If consensus is reached on the clarified behavior, this would become an RFC that explicitly defines how to correctly handle CNAMEs in DNS responses, helping us and the wider DNS community navigate the protocol. The proposal can be found at <a href="https://datatracker.ietf.org/doc/draft-jabley-dnsop-ordered-answer-section/">https://datatracker.ietf.org/doc/draft-jabley-dnsop-ordered-answer-section</a>. If you have suggestions or feedback, we would love to hear them, most usefully via the <a href="https://datatracker.ietf.org/wg/dnsop/about/"><u>DNSOP working group</u></a> at the IETF.</p> ]]></content:encoded>
            <category><![CDATA[1.1.1.1]]></category>
            <category><![CDATA[Post Mortem]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Resolver]]></category>
            <category><![CDATA[Standards]]></category>
            <category><![CDATA[Bugs]]></category>
            <category><![CDATA[Consumer Services]]></category>
            <guid isPermaLink="false">3fP84BsxwSxKr7ffpmVO6s</guid>
            <dc:creator>Sebastiaan Neuteboom</dc:creator>
        </item>
        <item>
            <title><![CDATA[MoQ: Refactoring the Internet's real-time media stack]]></title>
            <link>https://blog.cloudflare.com/moq/</link>
            <pubDate>Fri, 22 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Media over QUIC (MoQ) is a new IETF standard that resolves the long-standing tension between latency, scale, and complexity, creating a single foundation for sub-second, interactive streaming at a global scale. ]]></description>
            <content:encoded><![CDATA[ <p>For over two decades, we've built real-time communication on the Internet using a patchwork of specialized tools. RTMP gave us ingest. <a href="https://www.cloudflare.com/learning/video/what-is-http-live-streaming/"><u>HLS</u></a> and <a href="https://www.mpeg.org/standards/MPEG-DASH/"><u>DASH</u></a> gave us scale. WebRTC gave us interactivity. Each solved a specific problem for its time, and together they power the global streaming ecosystem we rely on today.</p><p>But using them together in 2025 feels like building a modern application with tools from different eras. The seams are starting to show—in complexity, in latency, and in the flexibility needed for the next generation of applications, from sub-second live auctions to massive interactive events. We're often forced to make painful trade-offs between latency, scale, and operational complexity.</p><p>Today Cloudflare is launching the first Media over QUIC (MoQ) relay network, running on every Cloudflare server in datacenters in 330+ cities. MoQ is an open protocol being developed at the <a href="https://www.ietf.org/"><u>IETF</u></a> by engineers from across the industry—not a proprietary Cloudflare technology. MoQ combines the low-latency interactivity of WebRTC, the scalability of HLS/DASH, and the simplicity of a single architecture, all built on a modern transport layer. We're joining Meta, Google, Cisco, and others in building implementations that work seamlessly together, creating a shared foundation for the next generation of real-time applications on the Internet.</p>
    <div>
      <h3><b>An evolutionary ladder of compromise</b></h3>
      <a href="#an-evolutionary-ladder-of-compromise">
        
      </a>
    </div>
    <p>To understand the promise of MoQ, we first have to appreciate the history that led us here—a journey defined by a series of architectural compromises where solving one problem inevitably created another.</p><p><b>The RTMP era: Conquering latency, compromising on scale</b></p><p>In the early 2000s, <b>RTMP (Real-Time Messaging Protocol)</b> was a breakthrough. It solved the frustrating "download and wait" experience of early video playback on the web by creating a persistent, stateful TCP connection between a <a href="https://en.wikipedia.org/wiki/Adobe_Flash"><u>Flash</u></a> client and a server. This enabled low-latency streaming (2-5 seconds), powering the first wave of live platforms like <a href="http://justin.tv"><u>Justin.tv</u></a> (which later became Twitch).</p><p>But its strength was its weakness. That stateful connection, which had to be maintained for every viewer, was architecturally hostile to scale. It required expensive, specialized media servers and couldn't use the commodity HTTP-based <a href="https://www.cloudflare.com/learning/cdn/what-is-a-cdn/"><u>Content Delivery Networks (CDNs)</u></a> that were beginning to power the rest of the web. Its reliance on TCP also meant that a single lost packet could freeze the entire stream—a phenomenon known as <a href="https://blog.cloudflare.com/the-road-to-quic/#head-of-line-blocking"><u>head-of-line blocking</u></a>—creating jarring latency spikes. The industry retained RTMP for the "first mile" from the camera to servers (ingest), but a new solution was needed for the "last mile" from servers to your screen (delivery).</p><p><b>The HLS &amp; DASH era: Solving for scale, compromising on latency</b></p><p>The catalyst for the next era was the iPhone's rejection of Flash. In response, Apple created <a href="https://www.cloudflare.com/learning/video/what-is-http-live-streaming/"><b><u>HLS (HTTP Live Streaming)</u></b></a>. 
HLS, and its open-standard counterpart <b>MPEG-DASH</b>, abandoned stateful connections and treated video as a sequence of small, static files delivered over standard HTTP.</p><p>This enabled much greater scalability. By moving to the interoperable open standard of HTTP for the underlying transport, video could now be distributed by any web server and cached by global CDNs, allowing platforms to reach millions of viewers reliably and relatively inexpensively. The compromise? A <i>significant</i> trade-off in latency. To ensure smooth playback, players needed to buffer at least three video segments before starting. With segment durations of 6-10 seconds, this baked 15-30 seconds of latency directly into the architecture.</p><p>While extensions like <a href="https://developer.apple.com/documentation/http-live-streaming/enabling-low-latency-http-live-streaming-hls"><u>Low-Latency HLS (LL-HLS)</u></a> have more recently emerged to achieve latencies in the 3-second range, they remain complex patches <a href="https://blog.cloudflare.com/the-road-to-quic/#head-of-line-blocking"><u>fighting against the protocol's fundamental design</u></a>. These extensions introduce a layer of stateful, real-time communication—using clever workarounds like holding playlist requests open—that ultimately strains the stateless request-response model central to HTTP's scalability and composability.</p><p><b>The WebRTC era: Conquering conversational latency, compromising on architecture</b></p><p>In parallel, <b>WebRTC (Web Real-Time Communication)</b> emerged to solve a different problem: plugin-free, two-way conversational video with sub-500ms latency within a browser. It worked by creating direct peer-to-peer (P2P) media paths, removing central servers from the equation.</p><p>But this P2P model is fundamentally at odds with broadcast scale. 
<a href="https://blog.cloudflare.com/cloudflare-calls-anycast-webrtc/#webrtc-growing-pains"><u>In a mesh network, the number of connections grows quadratically with each new participant</u></a> (the "N-squared problem"). For more than a handful of users, the model collapses under the weight of its own complexity. To work around this, the industry developed server-based topologies like the Selective Forwarding Unit (SFU) and Multipoint Control Unit (MCU). These are effective but require building what is essentially a <a href="https://blog.cloudflare.com/cloudflare-calls-anycast-webrtc/#is-cloudflare-calls-a-real-sfu"><u>private, stateful, real-time CDN</u></a>—a complex and expensive undertaking that is not standardized across infrastructure providers.</p><p>This journey has left us with a fragmented landscape of specialized, non-interoperable silos, forcing developers to stitch together multiple protocols and accept a painful three-way tension between <b>latency, scale, and complexity</b>.</p>
    <div>
      <h3><b>Introducing MoQ</b></h3>
      <a href="#introducing-moq">
        
      </a>
    </div>
    <p>This is the context into which Media over QUIC (MoQ) emerges. It's not just another protocol; it's a new design philosophy built from the ground up to resolve this historical trilemma. Born out of an open, community-driven effort at the IETF, <u>MoQ aims to be a foundational Internet technology, not a proprietary product</u>.</p><p>Its promise is to unify the disparate worlds of streaming by delivering:</p><ol><li><p><b>Sub-second latency at broadcast scale:</b> Combining the latency of WebRTC with the scale of HLS/DASH and the simplicity of RTMP.</p></li><li><p><b>Architectural simplicity:</b> Creating a single, flexible protocol for ingest, distribution, and interactive use cases, eliminating the need to transcode between different technologies.</p></li><li><p><b>Transport efficiency:</b> Building on <a href="https://blog.cloudflare.com/the-road-to-quic/"><u>QUIC</u></a>, a <a href="https://www.cloudflare.com/learning/ddos/glossary/user-datagram-protocol-udp/"><u>UDP</u></a>-based protocol, to eliminate bottlenecks like TCP <a href="https://blog.cloudflare.com/the-road-to-quic/#head-of-line-blocking"><u>head-of-line blocking</u></a>.</p></li></ol><p>The initial focus was "Media" over QUIC, but the core concepts—named tracks of timed, ordered, but independent data—are so flexible that the working group is now simply calling the protocol "MoQ." The name reflects the power of the abstraction: it's a generic transport for any real-time data that needs to be delivered efficiently and at scale.</p><p>MoQ is now generic enough that it’s a data fanout or pub/sub system for everything from audio/video (high bandwidth data) to sports score updates (low bandwidth data).</p>
    <div>
      <h3><b>A deep dive into the MoQ protocol stack</b></h3>
      <a href="#a-deep-dive-into-the-moq-protocol-stack">
        
      </a>
    </div>
    <p>MoQ's elegance comes from solving the right problem at the right layer. Let's build up from the foundation to see how it achieves sub-second latency at scale.</p><p>The choice of QUIC as MoQ's foundation isn't arbitrary—it addresses issues that have plagued streaming protocols for decades.</p><p>By building on <b>QUIC</b> (the transport protocol that also powers <a href="https://www.cloudflare.com/learning/performance/what-is-http3/"><u>HTTP/3</u></a>), MoQ solves some key streaming problems:</p><ul><li><p><b>No head-of-line blocking:</b> Unlike TCP where one lost packet blocks everything behind it, QUIC streams are independent. A lost packet on one stream (e.g., an audio track) doesn't block another (e.g., the main video track). This alone eliminates the stuttering that plagued RTMP.</p></li><li><p><b>Connection migration:</b> When your device switches from Wi-Fi to cellular mid-stream, the connection seamlessly migrates without interruption—no rebuffering, no reconnection.</p></li><li><p><b>Fast connection establishment:</b> QUIC's <a href="https://blog.cloudflare.com/even-faster-connection-establishment-with-quic-0-rtt-resumption/"><u>0-RTT resumption</u></a> means returning viewers can start playing instantly.</p></li><li><p><b>Baked-in, mandatory encryption:</b> All QUIC connections are encrypted by default with <a href="https://blog.cloudflare.com/rfc-8446-aka-tls-1-3/"><u>TLS 1.3</u></a>.</p></li></ul>
    <div>
      <h4>The core innovation: Publish/subscribe for media</h4>
      <a href="#the-core-innovation-publish-subscribe-for-media">
        
      </a>
    </div>
    <p>With QUIC solving transport issues, MoQ introduces its key innovation: treating media as subscribable tracks in a publish/subscribe system. But unlike traditional pub/sub, this is designed specifically for real-time media at CDN scale.</p><p>Instead of complex session management (WebRTC) or file-based chunking (HLS), <b>MoQ lets publishers announce named tracks of media that subscribers can request</b>. A relay network handles the distribution without needing to understand the media itself.</p>
    <div>
      <h4>How MoQ organizes media: The data model</h4>
      <a href="#how-moq-organizes-media-the-data-model">
        
      </a>
    </div>
    <p>Before we see how media flows through the network, let's understand how MoQ structures it. MoQ organizes data in a hierarchy:</p><ul><li><p><b>Tracks</b>: Named streams of media, like "video-1080p" or "audio-english". Subscribers request specific tracks by name.</p></li><li><p><b>Groups</b>: Independently decodable chunks of a track. For video, this typically means a GOP (Group of Pictures) starting with a keyframe. New subscribers can join at any Group boundary.</p></li><li><p><b>Objects</b>: The actual packets sent on the wire. Each Object belongs to a Track and has a position within a Group.</p></li></ul><p>This simple hierarchy enables two capabilities:</p><ol><li><p>Subscribers can start playback at <b>Group</b> boundaries without waiting for the next keyframe</p></li><li><p>Relays can forward <b>Objects</b> without parsing or understanding the media format</p></li></ol>
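<p>The hierarchy above can be sketched as a toy in-memory model. The class names and fields below are illustrative only (they are not the MoQT wire format), but they show how Tracks contain Groups, and Groups contain Objects:</p>

```javascript
// A hypothetical, in-memory sketch of MoQ's data model. An Object is
// the unit actually sent on the wire; it belongs to a Track and has a
// position within a Group.
class MoqObject {
  constructor(groupId, objectId, payload) {
    this.groupId = groupId;   // which Group this Object belongs to
    this.objectId = objectId; // position within the Group
    this.payload = payload;
  }
}

class Track {
  constructor(name) {
    this.name = name;        // e.g. "video-1080p" or "audio-english"
    this.groups = new Map(); // groupId -> array of Objects
  }
  addObject(obj) {
    if (!this.groups.has(obj.groupId)) this.groups.set(obj.groupId, []);
    this.groups.get(obj.groupId).push(obj);
  }
  // New subscribers can join at any Group boundary; the latest Group
  // is the natural starting point for live playback.
  latestGroupId() {
    return Math.max(...this.groups.keys());
  }
}

const track = new Track("video-1080p");
track.addObject(new MoqObject(0, 0, "keyframe"));
track.addObject(new MoqObject(0, 1, "delta"));
track.addObject(new MoqObject(1, 0, "keyframe"));
console.log(track.latestGroupId()); // 1
```

Because each Group starts at an independently decodable point (for video, typically a keyframe), a player joining late can begin at `latestGroupId()` rather than waiting mid-Group.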
    <div>
      <h5>The network architecture: From publisher to subscriber</h5>
      <a href="#the-network-architecture-from-publisher-to-subscriber">
        
      </a>
    </div>
    <p>MoQ’s network components are also simple:</p><ul><li><p><b>Publishers</b>: Announce track namespaces and send Objects</p></li><li><p><b>Subscribers</b>: Request specific tracks by name</p></li><li><p><b>Relays</b>: Connect publishers to subscribers by forwarding immutable Objects without parsing or <a href="https://www.cloudflare.com/learning/video/video-encoding-formats/"><u>transcoding</u></a> the media</p></li></ul><p>A Relay acts as a subscriber to receive tracks from upstream (like the original publisher) and simultaneously acts as a publisher to forward those same tracks downstream. This model is the key to MoQ's scalability: one upstream subscription can fan out to serve thousands of downstream viewers.</p>
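<p>The relay's dual role can be illustrated with a minimal sketch (our own simplification, not Cloudflare's actual relay code): one upstream subscription per track fans out to any number of downstream subscribers, and the payload is never inspected:</p>

```javascript
// Illustrative relay fan-out: a relay acts as a subscriber upstream
// and as a publisher downstream, forwarding Objects verbatim.
class Relay {
  constructor() {
    this.subscribers = new Map();         // trackName -> Set of callbacks
    this.upstreamSubscriptions = new Set();
  }
  // Called when a downstream client SUBSCRIBEs to a track.
  subscribe(trackName, onObject) {
    if (!this.subscribers.has(trackName)) {
      this.subscribers.set(trackName, new Set());
      // Only the first subscriber triggers an upstream subscription;
      // later subscribers reuse it.
      this.upstreamSubscriptions.add(trackName);
    }
    this.subscribers.get(trackName).add(onObject);
  }
  // Called when an Object arrives from upstream: forwarded without
  // parsing or transcoding.
  forward(trackName, object) {
    for (const deliver of this.subscribers.get(trackName) ?? []) {
      deliver(object);
    }
  }
}

const relay = new Relay();
const received = [];
relay.subscribe("video-1080p", (obj) => received.push(`a:${obj}`));
relay.subscribe("video-1080p", (obj) => received.push(`b:${obj}`));
relay.forward("video-1080p", "object-0");
console.log(relay.upstreamSubscriptions.size); // 1 upstream feeds 2 viewers
```

The key property is visible in the last line: the number of upstream subscriptions stays constant as downstream viewers grow.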
    <div>
      <h5>The MoQ Stack</h5>
      <a href="#the-moq-stack">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4g2MroH24otkzH3LQsFWZe/84ca43ad6c1c933ac395bf4ac767c584/image1.png" />
          </figure><p>MoQ's architecture can be understood as three distinct layers, each with a clear job:</p><ol><li><p><b>The Transport Foundation (QUIC or WebTransport):</b> This is the modern foundation upon which everything is built. MoQT can run directly over raw <b>QUIC</b>, which is ideal for native applications, or over <b>WebTransport</b>, which is required for use in a web browser. Crucially, the <a href="https://www.ietf.org/archive/id/draft-ietf-webtrans-http3-02.html"><u>WebTransport protocol</u></a> and its corresponding <a href="https://w3c.github.io/webtransport/"><u>W3C browser API</u></a> make QUIC's multiplexed reliable streams and unreliable datagrams directly accessible to browser applications. This is a game-changer. Protocols like <a href="https://blog.cloudflare.com/stream-now-supports-srt-as-a-drop-in-replacement-for-rtmp/"><u>SRT</u></a> may be efficient, but their lack of native browser support relegates them to ingest-only roles. WebTransport gives MoQ first-class citizenship on the web, making it suitable for both ingest and massive-scale distribution directly to clients.</p></li><li><p><b>The MoQT Layer:</b> Sitting on top of QUIC (or WebTransport), the MoQT layer provides the signaling and structure for a publish-subscribe system. This is the primary focus of the IETF working group. It defines the core control messages—like ANNOUNCE and SUBSCRIBE—and the basic data model we just covered. MoQT itself is intentionally spartan; it doesn't know or care if the data it's moving is <a href="https://www.cloudflare.com/learning/video/what-is-h264-avc/"><u>H.264</u></a> video, Opus audio, or game state updates.</p></li><li><p><b>The Streaming Format Layer:</b> This is where media-specific logic lives. A streaming format defines things like manifests, codec metadata, and packaging rules.
 <a href="https://datatracker.ietf.org/doc/draft-ietf-moq-warp/"><b><u>WARP</u></b></a> is one such format being developed alongside MoQT at the IETF, but it isn't the only one. Another standards body, like DASH-IF, could define a <a href="https://www.iso.org/standard/85623.html"><u>CMAF</u></a>-based streaming format over MoQT. A company that controls both original publisher and end subscriber can develop its own proprietary streaming format to experiment with new codecs or delivery mechanisms without being constrained by the transport protocol.</p></li></ol><p>This separation of layers is why different organizations can build interoperable implementations while still innovating at the streaming format layer.</p>
    <div>
      <h4>End-to-End Data Flow</h4>
      <a href="#end-to-end-data-flow">
        
      </a>
    </div>
    <p>Now that we understand the architecture and the data model, let's walk through how these pieces come together to deliver a stream. The protocol is flexible, but a typical broadcast flow relies on the <code>ANNOUNCE</code> and <code>SUBSCRIBE</code> messages to establish a data path from a publisher to a subscriber through the relay network.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2iRTJFdtCjIOcyg7ezoYgJ/e303ea8d1eb438328b60fdb28be47e84/image2.png" />
          </figure><p>Here is a step-by-step breakdown of what happens in this flow:</p><ol><li><p><b>Initiating Connections:</b> The process begins when the endpoints, acting as clients, connect to the relay network. The Original Publisher initiates a connection with its nearest relay (we'll call it Relay A). Separately, an End Subscriber initiates a connection with its own local relay (Relay B). These endpoints perform a <code>SETUP</code> handshake with their respective relays to establish a MoQ session and declare supported parameters.</p></li><li><p><b>Announcing a Namespace:</b> To make its content discoverable, the Publisher sends an <code>ANNOUNCE</code> message to Relay A. This message declares that the publisher is the authoritative source for a given <b>track namespace</b>. Relay A receives this and registers in a shared control plane (a conceptual database) that it is now a source for this namespace within the network.</p></li><li><p><b>Subscribing to a Track:</b> When the End Subscriber wants to receive media, it sends a <code>SUBSCRIBE</code> message to its relay, Relay B. This message is a request for a specific <b>track name</b> within a specific <b>track namespace</b>.</p></li><li><p><b>Connecting the Relays:</b> Relay B receives the <code>SUBSCRIBE</code> request and queries the control plane. It looks up the requested namespace and discovers that Relay A is the source. Relay B then initiates a session with Relay A (if it doesn't already have one) and forwards the <code>SUBSCRIBE</code> request upstream.</p></li><li><p><b>Completing the Path and Forwarding Objects:</b> Relay A, having received the subscription request from Relay B, forwards it to the Original Publisher. With the full path now established, the Publisher begins sending the <code>Objects</code> for the requested track. The Objects flow from the Publisher to Relay A, which forwards them to Relay B, which in turn forwards them to the End Subscriber. 
If another subscriber connects to Relay B and requests the same track, Relay B can immediately start sending them the Objects without needing to create a new upstream subscription.</p></li></ol>
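<p>Steps 2 and 4 of the flow above hinge on the shared control plane. A deliberately tiny simulation (the function names and namespace strings here are made up for illustration) captures the essential lookup:</p>

```javascript
// Toy control plane: maps an announced namespace to the relay that
// can source it, so SUBSCRIBE requests can be routed upstream.
const controlPlane = new Map(); // namespace -> source relay id

function announce(relayId, namespace) {
  // Step 2: the publisher's relay registers itself as the source
  // for this namespace.
  controlPlane.set(namespace, relayId);
}

function resolveSubscribe(namespace) {
  // Step 4: a subscriber's relay looks up which relay to contact
  // upstream, or learns that nobody has announced the namespace.
  return controlPlane.get(namespace) ?? null;
}

announce("relay-A", "live/alice");
console.log(resolveSubscribe("live/alice")); // "relay-A"
console.log(resolveSubscribe("live/bob"));   // null (never announced)
```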
    <div>
      <h5>An Alternative Flow: The <code>PUBLISH</code> Model</h5>
      <a href="#an-alternative-flow-the-publish-model">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6KJYU1eWNyuSZEHNYonDDn/3898003d5a7f5904787c7ef009b22fe0/image3.png" />
          </figure><p>More recent drafts of the MoQ specification have introduced an alternative, push-based model using a <code>PUBLISH</code> message. In this flow, a publisher can effectively ask for permission to send a track's objects to a relay <i>without</i> waiting for a <code>SUBSCRIBE</code> request. The publisher sends a <code>PUBLISH</code> message, and the relay's <code>PUBLISH_OK</code> response indicates whether it will accept the objects. This is particularly useful for ingest scenarios, where a publisher wants to send its stream to an entry point in the network immediately, ensuring the media is available the instant the first subscriber connects.</p>
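<p>A minimal sketch of this handshake, with an invented acceptance policy (real relays would apply their own authorization rules, and the exact refusal semantics are beyond this sketch):</p>

```javascript
// Illustrative push-based PUBLISH flow: the publisher offers a track
// before any subscriber exists, and the relay's PUBLISH_OK response
// indicates whether it will accept the objects.
function handlePublish(relay, trackName) {
  // The acceptance policy here is made up for illustration: this
  // relay only accepts tracks under the "ingest/" namespace.
  if (!trackName.startsWith("ingest/")) {
    return null; // refusal; how a real relay signals this is out of scope here
  }
  relay.acceptedTracks.add(trackName);
  return { type: "PUBLISH_OK" };
}

const ingestRelay = { acceptedTracks: new Set() };
console.log(handlePublish(ingestRelay, "ingest/camera-1")); // { type: "PUBLISH_OK" }
console.log(handlePublish(ingestRelay, "other/track"));     // null
```

Once accepted, the relay can buffer or forward the track's Objects immediately, so the first subscriber never waits for an upstream round trip.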
    <div>
      <h4>Advanced capabilities: Prioritization and congestion control</h4>
      <a href="#advanced-capabilities-prioritization-and-congestion-control">
        
      </a>
    </div>
    <p>MoQ’s benefits really shine when networks get congested. MoQ includes mechanisms for handling the reality of network traffic. One such mechanism is Subgroups.</p><p><b>Subgroups</b> are subdivisions within a Group that effectively map directly to the underlying QUIC streams. All Objects within the same Subgroup are generally sent on the same QUIC stream, guaranteeing their delivery order. Subgroup numbering also presents an opportunity to encode prioritization: within a Group, lower-numbered Subgroups are considered higher priority. </p><p>This enables intelligent quality degradation, especially with layered codecs (e.g. SVC):</p><ul><li><p><b>Subgroup 0</b>: Base video layer (360p) - must deliver</p></li><li><p><b>Subgroup 1</b>: Enhancement to 720p - deliver if bandwidth allows</p></li><li><p><b>Subgroup 2</b>: Enhancement to 1080p - first to drop under congestion</p></li></ul><p>When a relay detects congestion, it can drop Objects from higher-numbered Subgroups, preserving the base layer. Viewers see reduced quality instead of buffering.</p><p>The MoQ specification defines a scheduling algorithm that determines the order for all objects that are "ready to send." When a relay has multiple objects ready, it prioritizes them first by <b>group order</b> (ascending or descending) and then, within a group, by <b>subgroup id</b>. Our implementation supports the <b>group order</b> preference, which can be useful for low-latency broadcasts. If a viewer falls behind and its subscription uses descending group order, the relay prioritizes sending Objects from the newest "live" Group, potentially canceling unsent Objects from older Groups. This can help viewers catch up to the live edge quickly, a highly desirable feature for many interactive streaming use cases. The optimal strategies for using these features to improve QoE for specific use cases are still an open research question. 
We invite developers and researchers to use our network to experiment and help find the answers.</p>
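<p>The drop behavior described above can be sketched as a simple priority filter. This is an illustration of the idea, not a real scheduler; the "budget" is a stand-in for whatever the congestion controller allows:</p>

```javascript
// Illustrative subgroup-based degradation: when the send budget
// shrinks, keep the lowest-numbered (highest-priority) Subgroups and
// drop the rest, preserving the base layer in Subgroup 0.
function selectUnderCongestion(objects, budget) {
  // Lower subgroup number = higher priority within a Group.
  const byPriority = [...objects].sort((a, b) => a.subgroup - b.subgroup);
  return byPriority.slice(0, budget);
}

const ready = [
  { subgroup: 2, layer: "1080p enhancement" },
  { subgroup: 0, layer: "360p base" },
  { subgroup: 1, layer: "720p enhancement" },
];

// Budget for only two objects: the 1080p enhancement is dropped,
// so viewers see reduced quality instead of buffering.
const sent = selectUnderCongestion(ready, 2);
console.log(sent.map((o) => o.layer)); // base + 720p enhancement survive
```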
    <div>
      <h3><b>Implementation: building the Cloudflare MoQ relay</b></h3>
      <a href="#implementation-building-the-cloudflare-moq-relay">
        
      </a>
    </div>
    <p>Theory is one thing; implementation is another. To validate the protocol and understand its real-world challenges, we've been building one of the first global MoQ relay networks. Cloudflare's network, which places compute and logic at the edge, is very well suited for this.</p><p>Our architecture connects the abstract concepts of MoQ to the Cloudflare stack. In our deep dive, we mentioned that when a publisher <code>ANNOUNCE</code>s a namespace, relays need to register this availability in a "shared control plane" so that <code>SUBSCRIBE</code> requests can be routed correctly. For this critical piece of state management, we use <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a>.</p><p>When a publisher announces a new namespace to a relay in, say, London, that relay uses a Durable Object—our strongly consistent, single-threaded storage solution—to record that this namespace is now available at that specific location. When a subscriber in Paris wants a track from that namespace, the network can query this distributed state to find the nearest source and route the <code>SUBSCRIBE</code> request accordingly. This architecture builds upon the technology we developed for Cloudflare's real-time services and provides a solution to the challenge of state management at a global scale.</p>
    <div>
      <h4>An Evolving Specification</h4>
      <a href="#an-evolving-specification">
        
      </a>
    </div>
    <p>Building on a new protocol in the open means implementing against a moving target. To get MoQ into the hands of the community, we made a deliberate trade-off: our current relay implementation is based on a <b>subset of the features defined in </b><a href="https://www.ietf.org/archive/id/draft-ietf-moq-transport-07.html"><b><u>draft-ietf-moq-transport-07</u></b></a>. This version became a de facto target for interoperability among several open-source projects, and pausing there allowed us to put effort towards other aspects of deploying our relay network.</p><p>This draft of the protocol makes a distinction between accessing "past" and "future" content. <code><b>SUBSCRIBE</b></code> is used to receive <b>future</b> objects for a track as they arrive—like tuning into a live broadcast to get everything from that moment forward. In contrast, <code><b>FETCH</b></code> provides a mechanism for accessing <b>past</b> content that a relay may already have in its cache—like asking for a recording of a song that just played.</p><p>Both are part of the same specification, but for the most pressing low-latency use cases, a performant implementation of <code>SUBSCRIBE</code> is what matters most. For that reason, we have focused our initial efforts there and have not yet implemented <code>FETCH</code>.</p><p>This is where our roadmap is flexible and where the community can have a direct impact. Do you need <code>FETCH</code> to build on-demand or catch-up functionality? Or is more complete support for the prioritization features within <code>SUBSCRIBE</code> more critical for your use case? The feedback we receive from early developers will help us decide what to build next.</p><p>As always, we will announce updates and changes to our implementation on our <a href="https://developers.cloudflare.com/moq"><u>developer docs pages</u></a> as development continues.</p>
    <div>
      <h3>Kick the tires on the future</h3>
      <a href="#kick-the-tires-on-the-future">
        
      </a>
    </div>
    <p>We believe in building in the open and in interoperability across the community. MoQ is not a Cloudflare technology but a foundational Internet technology. To that end, the first demo client we’re presenting is an open source, community example.</p><p><b>You can access the demo here: </b><a href="https://moq.dev/publish/"><b><u>https://moq.dev/publish/</u></b></a></p><p>Even though this is a preview release, we are running MoQ relays at Cloudflare’s full scale, just as we do for every production service. This means every server that is part of the Cloudflare network in more than 330 cities is now a MoQ relay.</p><p>We invite you to experience the "wow" moment of near-instant, sub-second streaming latency that MoQ enables. How would you use a protocol that offers the speed of a video call with the scale of a global broadcast?</p>
    <div>
      <h3><b>Interoperability</b></h3>
      <a href="#interoperability">
        
      </a>
    </div>
    <p>We’ve been working with others in the IETF WG community and beyond on interoperability of publishers, players and other parts of the MoQ ecosystem. So far, we’ve tested with:</p><ul><li><p>Luke Curley’s <a href="https://moq.dev"><u>moq.dev</u></a></p></li><li><p>Lorenzo Miniero’s <a href="https://github.com/meetecho/imquic"><u>imquic</u></a></p></li><li><p>Meta’s <a href="https://github.com/facebookexperimental/moxygen"><u>Moxygen</u></a> </p></li><li><p><a href="https://github.com/englishm/moq-rs"><u>moq-rs</u></a></p></li><li><p><a href="https://github.com/englishm/moq-js"><u>moq-js</u></a></p></li><li><p><a href="https://norsk.video/"><u>Norsk</u></a></p></li><li><p><a href="https://vindral.com/"><u>Vindral</u></a></p></li></ul>
    <div>
      <h3>The Road Ahead</h3>
      <a href="#the-road-ahead">
        
      </a>
    </div>
    <p>The Internet's media stack is being refactored. For two decades, we've been forced to choose between latency, scale, and complexity. The compromises we made solved some problems, but also led to a fragmented ecosystem.</p><p>MoQ represents a promising new foundation—a chance to unify the silos and build the next generation of real-time applications on a scalable protocol. We're committed to helping build this foundation in the open, and we're just getting started.</p><p>MoQ is a realistic way forward: built on QUIC for future-proofing, easier to understand than WebRTC, and, unlike RTMP, compatible with browsers.</p><p>The protocol is evolving, the implementations are maturing, and the community is growing. Whether you're building the next generation of live streaming, exploring real-time collaboration, or pushing the boundaries of interactive media, consider whether MoQ may provide the foundation you need.</p>
    <div>
      <h3>Availability and pricing</h3>
      <a href="#availability-and-pricing">
        
      </a>
    </div>
    <p>We want developers to start building with MoQ today. To make that possible, MoQ at Cloudflare is in tech preview: it's available free of charge for testing (at any scale). Visit our <a href="https://developers.cloudflare.com/moq/"><u>developer homepage </u></a>for updates and potential breaking changes.</p><p>Indie developers and large enterprises alike ask about pricing early in their adoption of new technologies. We will be transparent and clear about MoQ pricing. In general availability, self-serve customers should expect to pay 5 cents/GB outbound, with no cost for traffic sent towards Cloudflare.</p><p>Enterprise customers can expect pricing in line with regular media delivery pricing, competitive with incumbent protocols. This means that if you’re already using Cloudflare for media delivery, cost should be no barrier to adopting new technologies. We will support you.</p><p>If you’re interested in partnering with Cloudflare in adopting the protocol early or contributing to its development, please reach out to us at <a href="#"><u>moq@cloudflare.com</u></a>! Engineers excited about the future of the Internet are standing by.</p>
    <div>
      <h3>Get involved:</h3>
      <a href="#get-involved">
        
      </a>
    </div>
    <ul><li><p><b>Try the demo:</b> <a href="https://moq.dev/publish/"><u>https://moq.dev/publish/</u></a></p></li><li><p><b>Read the Internet draft:</b> <a href="https://datatracker.ietf.org/doc/draft-ietf-moq-transport/"><u>https://datatracker.ietf.org/doc/draft-ietf-moq-transport/</u></a></p></li><li><p><b>Contribute</b> to the protocol’s development: <a href="https://datatracker.ietf.org/group/moq/documents/"><u>https://datatracker.ietf.org/group/moq/documents/</u></a></p></li><li><p><b>Visit </b>our developer homepage: <a href="https://developers.cloudflare.com/moq/"><u>https://developers.cloudflare.com/moq/</u></a></p></li></ul><p></p> ]]></content:encoded>
            <category><![CDATA[Video]]></category>
            <category><![CDATA[QUIC]]></category>
            <category><![CDATA[Live Streaming]]></category>
            <category><![CDATA[WebRTC]]></category>
            <category><![CDATA[IETF]]></category>
            <category><![CDATA[Standards]]></category>
            <guid isPermaLink="false">2XgF5NjmAy3cqybLPkpMFu</guid>
            <dc:creator>Mike English</dc:creator>
            <dc:creator>Renan Dincer</dc:creator>
        </item>
        <item>
            <title><![CDATA[New URLPattern API brings improved pattern matching to Node.js and Cloudflare Workers]]></title>
            <link>https://blog.cloudflare.com/improving-web-standards-urlpattern/</link>
            <pubDate>Mon, 24 Mar 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ Today we're announcing our latest contribution to Node.js, now available in v23.8.0: URLPattern.  ]]></description>
            <content:encoded><![CDATA[ <p>Today, we are excited to announce that we have contributed an implementation of the <a href="https://urlpattern.spec.whatwg.org"><u>URLPattern</u></a> API to Node.js, and it is available starting with <a href="https://nodejs.org/en/blog/release/v23.8.0"><u>the v23.8.0 update</u></a>. We've done this by adding our URLPattern implementation to <a href="https://github.com/ada-url/ada"><u>Ada URL</u></a>, the high-performance URL parser that now powers URL handling in both Node.js and Cloudflare Workers. This marks an important step toward bringing this API to the broader JavaScript ecosystem.</p><p>Cloudflare Workers has, from the beginning, embraced a standards-based JavaScript programming model, and Cloudflare was one of the founding companies for what has evolved into <a href="https://ecma-international.org/technical-committees/tc55/"><u>ECMA's 55th Technical Committee</u></a>, focusing on interoperability between Web-interoperable runtimes like Workers, Node.js, Deno, and others. This contribution reflects our commitment to this ongoing philosophy. Ensuring that all the JavaScript runtimes work consistently and offer at least a minimally consistent set of features is critical to the ongoing health of the ecosystem as a whole.</p><p>The URLPattern API contribution is just one example of Cloudflare’s ongoing commitment to the open-source ecosystem. We actively contribute to numerous open-source projects including Node.js, V8, and Ada URL, while also maintaining our own open-source initiatives like <a href="https://github.com/cloudflare/workerd"><u>workerd</u></a> and <a href="https://github.com/cloudflare/workers-sdk"><u>wrangler</u></a>. By upstreaming improvements to foundational technologies that power the web, we strengthen the entire developer ecosystem while ensuring consistent features across JavaScript runtimes. 
 This collaborative approach reflects our belief that open standards and shared implementations benefit everyone: reducing fragmentation, improving developer experience, and creating a better Internet.</p>
    <div>
      <h2>What is URLPattern?</h2>
      <a href="#what-is-urlpattern">
        
      </a>
    </div>
    <p>URLPattern is a standard published by the <a href="https://whatwg.org/"><u>WHATWG (Web Hypertext Application Technology Working Group)</u></a> which provides a pattern-matching system for URLs. This specification is available at <a href="http://urlpattern.spec.whatwg.org"><u>urlpattern.spec.whatwg.org</u></a>. The API provides developers with an easy-to-use, <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions"><u>regular expression (regex)</u></a>-based approach to handling route matching, with built-in support for named parameters, wildcards, and more complex pattern matching that works uniformly across all URL components.</p><p>URLPattern is part of the <a href="https://min-common-api.proposal.wintertc.org/"><u>WinterTC Minimum Common API</u></a>, a soon-to-be standardized subset of web platform APIs designed to ensure interoperability across JavaScript runtimes, particularly for server-side and non-browser environments, and includes other APIs such as <a href="https://url.spec.whatwg.org/#url"><u>URL</u></a> and <a href="https://url.spec.whatwg.org/#urlsearchparams"><u>URLSearchParams</u></a>.</p><p>Cloudflare Workers has supported URLPattern for a number of years now, reflecting our commitment to enabling developers to use standard APIs across both browsers and server-side JavaScript runtimes. Contributing to Node.js and unifying the URLPattern implementation simplifies the ecosystem by reducing fragmentation, while at the same time improving our own implementation in Cloudflare Workers by making it faster and more specification compliant.</p><p>The following example demonstrates how URLPattern is used by creating a pattern that matches URLs with a “/blog/:year/:month/:slug” path structure, then tests if one specific URL string matches this pattern, and extracts the named parameters from a second URL using the exec method.</p>
            <pre><code>const pattern = new URLPattern({
  pathname: '/blog/:year/:month/:slug'
});

if (pattern.test('https://example.com/blog/2025/03/urlpattern-launch')) {
  console.log('Match found!');
}

const result = pattern.exec('https://example.com/blog/2025/03/urlpattern-launch');
console.log(result.pathname.groups.year); // "2025"
console.log(result.pathname.groups.month); // "03"
console.log(result.pathname.groups.slug); // "urlpattern-launch"</code></pre>
            <p>The URLPattern constructor accepts pattern strings or objects defining patterns for individual URL components. The <code>test()</code> method returns a boolean indicating whether a URL matches the pattern. The <code>exec()</code> method provides detailed match results including captured groups. Behind this simple API, sophisticated machinery is at work:</p><ol><li><p>When a URLPattern is used, it internally breaks down a URL, matching it against eight distinct components: protocol, username, password, hostname, port, pathname, search, and hash. This component-based approach gives the developer control over which parts of a URL to match.</p></li><li><p>Upon creation of the instance, URLPattern parses your input patterns for each component and compiles them internally into eight specialized regular expressions (one for each component type). This compilation step happens just once when you create a URLPattern object, optimizing subsequent matching operations.</p></li><li><p>During a match operation (whether using <code>test()</code> or <code>exec()</code>), these regular expressions are used to determine if the input matches the given properties. The <code>test()</code> method tells you if there’s a match, while <code>exec()</code> provides detailed information about what was matched, including any named capture groups from your pattern.</p></li></ol>
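<p>To build intuition for the compilation step, here is a deliberately simplified sketch of how a pathname pattern could be turned into a regular expression with named capture groups. The real specification algorithm handles far more (wildcards, modifiers, escaping); this is just the core idea:</p>

```javascript
// Simplified illustration of pattern compilation: each ":name"
// segment becomes a named capture group that matches one path segment.
function compilePathname(pattern) {
  const source = pattern
    .split("/")
    .map((seg) => (seg.startsWith(":") ? `(?<${seg.slice(1)}>[^/]+)` : seg))
    .join("/");
  return new RegExp(`^${source}$`);
}

const re = compilePathname("/blog/:year/:month/:slug");
const match = re.exec("/blog/2025/03/urlpattern-launch");
console.log(match.groups.year); // "2025"
console.log(match.groups.slug); // "urlpattern-launch"
```

Doing this once per component at construction time is what makes repeated `test()` and `exec()` calls cheap.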
    <div>
      <h2>Fixing things along the way</h2>
      <a href="#fixing-things-along-the-way">
        
      </a>
    </div>
    <p>While implementing URLPattern, we discovered some inconsistencies between the specification and the <a href="https://github.com/web-platform-tests/wpt/pull/49782"><u>web-platform tests</u></a>, a cross-browser test suite maintained by all major browsers to test conformance to web standard specifications. For instance, we found that <a href="https://github.com/whatwg/urlpattern/issues/240"><u>URLs with non-special protocols (opaque-paths)</u></a> and URLs with invalid characters in hostnames were not correctly defined and processed within the URLPattern specification. We worked actively with the Chromium and the Safari teams to address these issues.</p><p>URLPatterns constructed from hostname components that contain newline or tab characters were expected to fail in the corresponding web-platform tests. This was due to an inconsistency with the original URLPattern implementation and the URLPattern specification.</p>
            <pre><code>const pattern = new URLPattern({ "hostname": "bad\nhostname" });
const matched = pattern.test({ "hostname": "badhostname" });
// This now returns true.</code></pre>
            <p>We opened <a href="https://github.com/whatwg/urlpattern/issues/239"><u>several issues</u></a> to document these inconsistencies and followed up with <a href="https://github.com/whatwg/urlpattern/pull/243"><u>a pull-request to fix the specification</u></a>, ensuring that all implementations will eventually converge on the same corrected behavior. This also resulted in fixing several inconsistencies in web-platform tests, particularly around handling certain types of white space (such as newline or tab characters) in hostnames. </p>
    <div>
      <h2>Getting started with URLPattern</h2>
      <a href="#getting-started-with-urlpattern">
        
      </a>
    </div>
    <p>If you’re interested in using URLPattern today, you can:</p><ul><li><p>Use it natively in modern browsers by accessing the global URLPattern class</p></li><li><p>Try it in Cloudflare Workers (which has had URLPattern support for some time, now with improved spec compliance and performance)</p></li><li><p>Try it in Node.js, <a href="https://nodejs.org/en/blog/release/v23.8.0"><u>starting from v23.8.0</u></a></p></li><li><p>Try it in NativeScript on iOS and Android, <a href="https://blog.nativescript.org/nativescript-8-9-announcement/"><u>starting from v8.9.0</u></a></p></li><li><p>Try it in <a href="https://docs.deno.com/api/web/~/URLPattern"><u>Deno</u></a></p></li></ul><p>Here is a more complex example showing how URLPattern can be used for routing in a Cloudflare Worker — a common use case when building API endpoints or web applications that need to handle different URL paths in different ways. The following example shows a pattern for <a href="https://en.wikipedia.org/wiki/REST"><u>REST APIs</u></a> that matches both “/users” and “/users/:userId”.</p>
            <pre><code>const routes = [
  new URLPattern({ pathname: '/users{/:userId}?' }),
];

export default {
  async fetch(request, env, ctx): Promise&lt;Response&gt; {
    const url = new URL(request.url);
    for (const route of routes) {
      const match = route.exec(url);
      if (match) {
        const { userId } = match.pathname.groups;
        if (userId) {
          return new Response(`User ID: ${userId}`);
        }
        return new Response('List of users');
      }
    }
    // No matching route found
    return new Response('Not Found', { status: 404 });
  },
} satisfies ExportedHandler&lt;Env&gt;;</code></pre>
            
    <div>
      <h2>What does the future hold?</h2>
      <a href="#what-does-the-future-hold">
        
      </a>
    </div>
    <p>The contribution of URLPattern to Ada URL and Node.js is just the beginning. We’re excited about the possibilities this opens up for developers across different JavaScript environments.</p><p>In the future, we expect to contribute additional improvements to URLPattern’s performance, enabling more use cases for web application routing. Additionally, efforts to standardize the <a href="https://github.com/whatwg/urlpattern/pull/166"><u>URLPatternList proposal</u></a> will help deliver faster matching capabilities for server-side runtimes. We look forward to these developments and encourage you to try URLPattern in your projects today.</p><p>Try it and let us know what you think by creating an issue on the <a href="https://github.com/cloudflare/workerd"><u>workerd repository</u></a>. Your feedback is invaluable as we work to further enhance URLPattern.</p><p>We hope to do our part to build a unified JavaScript ecosystem, and encourage others to do the same. This may mean looking for opportunities, such as we have with URLPattern, to share API implementations across backend runtimes. It could mean using or contributing to <a href="https://web-platform-tests.org/"><u>web-platform-tests</u></a> if you are working on a server-side runtime or web-standard APIs, or it might mean joining <a href="https://wintertc.org/faq"><u>WinterTC</u></a> to help define web-interoperable standards for server-side JavaScript.</p> ]]></content:encoded>
            <category><![CDATA[Node.js]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Standards]]></category>
            <guid isPermaLink="false">55t98SAXi3erhs7Wn5dgno</guid>
            <dc:creator>Yagiz Nizipli</dc:creator>
            <dc:creator>James M Snell</dc:creator>
            <dc:creator>Daniel Lemire (Guest author)</dc:creator>
        </item>
        <item>
            <title><![CDATA[Enhance your website's security with Cloudflare’s free security.txt generator]]></title>
            <link>https://blog.cloudflare.com/security-txt/</link>
            <pubDate>Sun, 06 Oct 2024 23:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare’s free security.txt generator lets users create and manage security.txt files. Enhance vulnerability disclosure, align with industry standards, and integrate into the dashboard. ]]></description>
            <content:encoded><![CDATA[ 
    <div>
      <h2>A story of security and simplicity</h2>
      <a href="#a-story-of-security-and-simplicity">
        
      </a>
    </div>
    <p>Meet Georgia, a diligent website administrator at a growing e-commerce company. Every day, Georgia juggles multiple tasks, from managing server uptime to ensuring customer data security. One morning, Georgia receives an email from a security researcher who discovered a potential vulnerability on the website. The researcher struggled to find the right contact information, leading to delays in reporting the issue. Georgia realizes the need for a standardized way to communicate with security researchers, ensuring that vulnerabilities are reported swiftly and efficiently. This is where security.txt comes in.</p>
    <div>
      <h2>Why security.txt matters</h2>
      <a href="#why-security-txt-matters">
        
      </a>
    </div>
    <p><a href="https://securitytxt.org/"><u>Security.txt</u></a> is becoming a widely adopted standard among security-conscious organizations. By providing a common location and format for vulnerability disclosure information, it helps bridge the gap between security researchers and organizations. This initiative is supported by major companies and aligns with global security best practices. By offering an automated security.txt generator for free, we aim to empower all of our users to enhance their security measures without additional costs.</p><p>In 2020, Cloudflare published the Cloudflare Worker for the security.txt generator as an <a href="https://github.com/cloudflare/securitytxt-worker"><u>open-source project on GitHub</u></a>, demonstrating our commitment to enhancing web security. This tool is actively used by Cloudflare to streamline vulnerability disclosure processes. However, over the past few years, we've observed a growing demand from our customers for an easier way to implement this standard. In response to this demand and to further support the adoption of security.txt across the Internet, we integrated it directly into our dashboard, making it simple for all our users to enhance their security practices. You can learn more about the initial release and its impact in our previous blog post <a href="https://blog.cloudflare.com/security-dot-txt/"><u>here</u></a>. </p>
    <div>
      <h3>Who can use the free Cloudflare security.txt generator</h3>
      <a href="#who-can-use-the-free-cloudflare-security-txt-generator">
        
      </a>
    </div>
    <p>This feature is designed for any Cloudflare user who manages a website, from <a href="https://www.cloudflare.com/small-business/">small business owners</a> to large enterprises, from developers to security professionals. Whether you're a seasoned security expert or new to website management, this tool provides an easy way to create and manage your security.txt file in your Cloudflare account, ensuring that you're prepared to handle vulnerability reports effectively.</p>
    <div>
      <h3>Technical insights: leveraging Cloudflare’s tools</h3>
      <a href="#technical-insights-leveraging-cloudflares-tools">
        
      </a>
    </div>
    <p>Our security.txt generator is seamlessly integrated into our dashboard. Here's how it works:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2z7tEph5hu4T7LCkZU5KFQ/8bc9c8efe332cda618c5dd8bb51e38da/image1.png" />
          </figure><p>When the user enters their data in the Cloudflare Dashboard, the information is immediately stored in a highly available and geo-redundant <a href="https://blog.cloudflare.com/performance-isolation-in-a-multi-tenant-database-environment/"><u>PostgreSQL database</u></a>. This ensures that all user data is securely kept and can be accessed quickly from any location within our global network.</p><p>Instead of creating a static file at the point of data entry, we use a dynamic approach. When a request for the security.txt file is made via the standard .well-known path specified by <a href="https://www.rfc-editor.org/rfc/rfc9116"><u>RFC 9116</u></a>, our system dynamically constructs the file using the latest data from our database. This method ensures that any updates made by users are reflected in real-time without requiring manual intervention or file regeneration. The data entered by users is synchronized across Cloudflare’s global network using our <a href="https://blog.cloudflare.com/introducing-quicksilver-configuration-distribution-at-internet-scale/"><u>Quicksilver</u></a> technology. This allows for rapid propagation of changes, ensuring that any updates to the security.txt file are available almost instantaneously across all servers.</p><p>Each security.txt file includes an expiration timestamp, which is set during the initial configuration. This timestamp helps alert users when their information may be outdated, encouraging them to review and update their details regularly. For example, if a user sets an expiration date 365 days into the future, they will receive notifications as this date approaches, prompting them to refresh their information.</p><p>To ensure compliance with best practices, we also support optional fields such as encryption keys and signatures within the security.txt file. 
Users can link to their PGP keys for secure communications or include signatures to verify authenticity, enhancing trust with security researchers.</p><p>Users who prefer automation can manage their security.txt files through our <a href="https://developers.cloudflare.com/api/operations/update-security-txt"><u>API</u></a>, allowing seamless integration with existing workflows and tools. This feature enables developers to programmatically update their security.txt configurations without manual dashboard interactions.</p><p>Users can also find a view of any missing security.txt files via <a href="https://developers.cloudflare.com/security-center/security-insights/"><u>Security Insights</u></a> under Security Center.</p>
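<p>For illustration, here is a hypothetical sketch (not Cloudflare's actual implementation) of what dynamically serving an RFC 9116 file from a Worker-style handler looks like, including the expiration timestamp and the optional encryption field; the helper name and its inputs are made up for this example:</p>

```javascript
// Hypothetical helper: renders an RFC 9116 security.txt body from stored fields.
function renderSecurityTxt({ contact, encryption, expiresInDays = 365 }) {
  // Expires is required by RFC 9116 and signals when the data may be stale.
  const expires = new Date(Date.now() + expiresInDays * 24 * 60 * 60 * 1000);
  const lines = [`Contact: ${contact}`, `Expires: ${expires.toISOString()}`];
  if (encryption) lines.push(`Encryption: ${encryption}`); // link to a PGP key
  return lines.join('\n') + '\n';
}

// Sketch of a handler serving the file at the standard .well-known path,
// so edits to the stored fields are reflected on the very next request
// instead of requiring a static file to be regenerated.
const handler = {
  async fetch(request) {
    const url = new URL(request.url);
    if (url.pathname === '/.well-known/security.txt') {
      const body = renderSecurityTxt({ contact: 'mailto:security@example.com' });
      return new Response(body, { headers: { 'content-type': 'text/plain' } });
    }
    return new Response('Not Found', { status: 404 });
  },
};
```

<p>Building the body per request, rather than at data-entry time, is what makes the real-time updates described above possible.</p>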
    <div>
      <h3>Available now, and free for all Cloudflare users</h3>
      <a href="#available-now-and-free-for-all-cloudflare-users">
        
      </a>
    </div>
    <p>By making this feature available to all our users at no cost, we aim to support the security efforts of our entire community, helping you protect your digital assets and foster trust with your audience.</p><p>With the introduction of our free security.txt generator, we're taking a significant step towards simplifying security management for everyone. Whether you're a small business owner or a large enterprise, this tool empowers you to adopt industry best practices and ensure that you're ready to handle vulnerability reports effectively. <a href="https://developers.cloudflare.com/security-center/infrastructure/security-file/"><u>Set up security.txt</u></a> on your websites today!</p> ]]></content:encoded>
            <category><![CDATA[Better Internet]]></category>
            <category><![CDATA[Security Posture]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Standards]]></category>
            <category><![CDATA[security.txt]]></category>
            <guid isPermaLink="false">1uvkAn3IB6vSEO91XsPyAO</guid>
            <dc:creator>Alexandra Moraru</dc:creator>
            <dc:creator>Sam Khawasé</dc:creator>
        </item>
        <item>
            <title><![CDATA[You can now use WebGPU in Cloudflare Workers]]></title>
            <link>https://blog.cloudflare.com/webgpu-in-workers/</link>
            <pubDate>Wed, 27 Sep 2023 13:00:56 GMT</pubDate>
            <description><![CDATA[ Today, we are introducing WebGPU support to Cloudflare Workers. This blog will explain why it's important, why we did it, how you can use it, and what comes next ]]></description>
            <content:encoded><![CDATA[ <p>The browser as an app platform is real and stronger every day; long gone are the Browser Wars. Vendors and standards bodies have worked together amazingly well over the past several years, advancing web standards with new <a href="https://www.cloudflare.com/learning/security/api/what-is-an-api/">APIs</a> that allow developers to build fast and powerful applications, finally comparable to those we got used to seeing in native OS environments.</p><p>Today, browsers can render web pages and run code that interfaces with an <a href="https://developer.mozilla.org/en-US/docs/Web/API">extensive catalog of modern Web APIs</a>. Things like networking, rendering accelerated graphics, or even accessing low-level hardware features like USB devices are all now possible within the browser sandbox.</p><p>One of the most exciting new browser APIs that browser vendors have been rolling out over recent months is WebGPU, a modern, low-level GPU programming interface designed for high-performance 2D and 3D graphics and general-purpose GPU compute.</p><p>Today, we are introducing <a href="https://developer.chrome.com/blog/webgpu-release/">WebGPU</a> support to Cloudflare Workers. This blog will explain why it's important, why we did it, how you can use it, and what comes next.</p>
    <div>
      <h3>The history of the GPU in the browser</h3>
      <a href="#the-history-of-the-gpu-in-the-browser">
        
      </a>
    </div>
    <p>To understand why WebGPU is a big deal, we must revisit history and see how browsers went from relying only on the CPU for everything in the early days to taking advantage of GPUs over the years.</p><p>In 2011, <a href="https://en.wikipedia.org/wiki/WebGL">WebGL 1</a>, a limited port of <a href="https://www.khronos.org/opengles/">OpenGL ES 2.0</a>, was introduced, providing an API for fast, accelerated 3D graphics in the browser for the first time. At the time, this was something of a revolution, enabling gaming and 3D visualizations in the browser. Some of the most popular 3D animation frameworks, like <a href="https://threejs.org/">Three.js</a>, launched in the same period. Who doesn't remember going to the (now defunct) <a href="https://en.wikipedia.org/wiki/Google_Chrome_Experiments">Google Chrome Experiments</a> page and spending hours in awe exploring the demos? Another option then was the Flash Player, which was still dominant in the desktop environment, and its <a href="https://en.wikipedia.org/wiki/Stage3D">Stage 3D</a> API.</p><p>Later, in 2017, building on the learnings and shortcomings of its predecessor, WebGL 2 arrived as a significant upgrade, bringing more advanced GPU capabilities like shaders and more flexible textures and rendering.</p><p>WebGL, however, presents a steep learning curve for developers who want low-level control of 3D graphics on the GPU without relying on third-party abstraction libraries.</p><p>More importantly, with the advent of <a href="https://www.cloudflare.com/learning/ai/what-is-machine-learning/">machine learning</a> and cryptography, we discovered that GPUs are great not only at drawing graphics: workloads that benefit from high memory bandwidth or blazing-fast matrix multiplications can use them to perform general computation. 
This became known as <a href="https://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units">GPGPU</a>, short for general-purpose computing on graphics processing units.</p><p>With this in mind, in the native desktop and mobile operating system worlds, developers started using more advanced frameworks like <a href="https://en.wikipedia.org/wiki/CUDA">CUDA</a>, <a href="https://developer.apple.com/metal/">Metal</a>, <a href="https://en.wikipedia.org/wiki/DirectX#DirectX_12">DirectX 12</a>, or <a href="https://www.vulkan.org/learn#key-resources">Vulkan</a>. WebGL stayed behind. To fill this void and bring the browser up to date, in 2017, companies like Google, Apple, Intel, Microsoft, Khronos, and Mozilla created the <a href="https://www.w3.org/community/gpu/">GPU for the Web Community Group</a> to collaboratively design the successor of WebGL and create the next generation of 3D graphics and compute APIs for the Web.</p>
    <div>
      <h3>What is WebGPU</h3>
      <a href="#what-is-webgpu">
        
      </a>
    </div>
    <p>WebGPU was developed with the following advantages in mind:</p><ul><li><p><b>Lower Level Access</b> - WebGPU provides lower-level, direct access to the GPU vs. the high-level abstractions in WebGL. This enables more control over GPU resources.</p></li><li><p><b>Multi-Threading</b> - WebGPU can leverage multi-threaded rendering and compute, allowing improved CPU/GPU parallelism compared to WebGL, which relies on a single thread.</p></li><li><p><b>Compute Shaders</b> - First-class support for general-purpose compute shaders for GPGPU tasks, not just graphics. WebGL compute is limited.</p></li><li><p><b>Safety</b> - WebGPU ensures memory and GPU access safety, avoiding common WebGL pitfalls.</p></li><li><p><b>Portability</b> - WGSL shader language targets cross-API portability across GPU vendors vs. GLSL in WebGL.</p></li><li><p><b>Reduced Driver Overhead</b> - The lower-level Vulkan/Metal/D3D12 basis reduces driver overhead vs. OpenGL drivers in WebGL.</p></li><li><p><b>Pipeline State Objects</b> - Predefined pipeline configs avoid per-draw driver overhead in WebGL.</p></li><li><p><b>Memory Management</b> - Finer-grained buffer and resource management vs. WebGL.</p></li></ul><p>The “too long; didn't read” version is that WebGPU provides lower-level control over the GPU hardware with reduced overhead. It's safer, has multi-threading, is focused on compute, not just graphics, and has portability advantages compared to WebGL.</p><p>If these aren't reasons enough to get excited, developers are also looking at WebGPU as an option for native platforms, not just the Web. For instance, you can use this <a href="https://github.com/webgpu-native/webgpu-headers/blob/main/webgpu.h">C API</a> that mimics the JavaScript specification. 
If you think about this and the power of WebAssembly, you can effectively have a truly platform-agnostic GPU hardware layer that you can use to <a href="https://developer.chrome.com/blog/webgpu-cross-platform/">develop</a> platforms for any operating system or browser.</p>
    <div>
      <h3>More than just graphics</h3>
      <a href="#more-than-just-graphics">
        
      </a>
    </div>
    <p>As explained above, besides being a graphics API, WebGPU makes it possible to perform tasks such as:</p><ul><li><p><b>Machine Learning</b> - Implement ML applications like <a href="https://www.cloudflare.com/learning/ai/what-is-neural-network/">neural networks</a> and computer vision algorithms using WebGPU compute shaders and matrices.</p></li><li><p><b>Scientific Computing</b> - Perform complex scientific computation like physics simulations and mathematical modeling using the GPU.</p></li><li><p><b>High Performance Computing</b> - Unlock breakthrough performance for parallel workloads by connecting WebGPU to languages like Rust, C/C++ via <a href="https://webassembly.org/">WebAssembly</a>.</p></li></ul><p><a href="https://gpuweb.github.io/gpuweb/wgsl/">WGSL</a>, the shader language for WebGPU, is what enables the general-purpose compute feature. Shaders, or more precisely, <a href="https://www.khronos.org/opengl/wiki/Compute_Shader">compute shaders</a>, have no user-defined inputs or outputs and are used for computing arbitrary information. Here are <a href="https://webgpufundamentals.org/webgpu/lessons/webgpu-compute-shaders.html">some examples</a> of simple WebGPU compute shaders if you want to learn more.</p>
    <div>
      <h3>WebGPU in Workers</h3>
      <a href="#webgpu-in-workers">
        
      </a>
    </div>
    <p>We've been watching WebGPU since the API was published. Its general-purpose compute features perfectly fit our Workers' ecosystem and capabilities and align well with our vision of providing our customers multiple compute and hardware options and bringing GPU workloads to our global network, close to clients.</p><p>Cloudflare also has a track record of pioneering support for emerging web standards on our network and services, accelerating their adoption for our customers. Examples of these are <a href="https://developers.cloudflare.com/workers/runtime-apis/web-crypto/">Web Crypto API</a>, <a href="/introducing-http2/">HTTP/2</a>, <a href="/http3-the-past-present-and-future/">HTTP/3</a>, <a href="/introducing-tls-1-3/">TLS 1.3</a>, or <a href="/early-hints/">Early Hints</a>, but <a href="https://developers.cloudflare.com/workers/runtime-apis/">there are more</a>.</p><p>Bringing WebGPU to Workers was both natural and timely. Today, we are announcing that we have released a version of <a href="https://github.com/cloudflare/workerd">workerd</a>, the open-sourced JavaScript / Wasm runtime that powers Cloudflare Workers, with <a href="https://github.com/cloudflare/workerd/tree/main/src/workerd/api/gpu">WebGPU support</a>, which you can use to start developing applications locally.</p><p>Starting today, anyone can run this on their personal computer and experiment with WebGPU-enabled Workers. Implementing local development first allows us to put this API in the hands of our customers and developers earlier and get feedback that will guide the development of this feature for production use.</p><p>But before we dig into code examples, let's explain how we built it.</p>
    <div>
      <h3>How we built WebGPU on top of Workers</h3>
      <a href="#how-we-built-webgpu-on-top-of-workers">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2rem1ZyAcVa3LO7ue6OTC7/debccf54201fe93a9221a6dd01bc5338/image2-22.png" />
            
            </figure><p>To implement the WebGPU API, we took advantage of <a href="https://dawn.googlesource.com/dawn/">Dawn</a>, an open-source library backed by Google, the same used in Chromium and Chrome, that provides applications with an implementation of the WebGPU standard. It also provides the <a href="https://github.com/webgpu-native/webgpu-headers/blob/main/webgpu.h">webgpu.h</a> headers file, the de facto reference for all the other implementations of the standard.</p><p>Dawn can interoperate with Linux, MacOS, and Windows GPUs by interfacing with each platform's native GPU frameworks. For example, when an application makes a WebGPU draw call, Dawn will convert that draw command into the equivalent Vulkan, Metal, or Direct3D 12 API call, depending on the platform.</p><p>From an application standpoint, Dawn handles the interactions with the underlying native graphics APIs that communicate directly with the GPU drivers. Dawn essentially acts as a middle layer that translates the WebGPU API calls into calls for the platform's native graphics API.</p><p>Cloudflare <a href="/workerd-open-source-workers-runtime/">workerd</a> is the underlying open-source runtime engine that executes Workers code. It shares most of its code with the same runtime that powers Cloudflare Workers' production environment but with some changes designed to make it more portable to other environments. We then have release cycles that aim to synchronize both codebases; more on that later. 
Workerd is also used with <a href="https://github.com/cloudflare/workers-sdk">wrangler</a>, our command-line tool for building and interacting with Cloudflare Workers, to support local development.</p><p>The WebGPU code that interfaces with the Dawn library can be found <a href="https://github.com/cloudflare/workerd/tree/main/src/workerd/api/gpu">here</a>, and can easily be enabled with a flag, checked <a href="https://github.com/cloudflare/workerd/blob/main/src/workerd/api/global-scope.c%2B%2B#L728">here</a>.</p>
            <pre><code>jsg::Ref&lt;api::gpu::GPU&gt; Navigator::getGPU(CompatibilityFlags::Reader flags) {
  // is this a durable object?
  KJ_IF_MAYBE (actor, IoContext::current().getActor()) {
    JSG_REQUIRE(actor-&gt;getPersistent() != nullptr, TypeError,
                "webgpu api is only available in Durable Objects (no storage)");
  } else {
    JSG_FAIL_REQUIRE(TypeError, "webgpu api is only available in Durable Objects");
  };

  JSG_REQUIRE(flags.getWebgpu(), TypeError, "webgpu needs the webgpu compatibility flag set");

  return jsg::alloc&lt;api::gpu::GPU&gt;();
}</code></pre>
            <p>The WebGPU API can only be accessed using <a href="https://developers.cloudflare.com/durable-objects/">Durable Objects</a>, which are essentially global singleton instances of Cloudflare Workers. There are two important reasons for this:</p><ul><li><p>WebGPU code typically wants to store state between requests, for example, loading an <a href="https://www.cloudflare.com/learning/ai/what-is-artificial-intelligence/">AI model</a> into GPU memory once and using it multiple times for inference.</p></li><li><p>Not all Cloudflare servers have GPUs yet, so although the worker that receives the request is typically the closest one available, the Durable Object that uses WebGPU will be instantiated where there are GPU resources available, which may not be on the same machine.</p></li></ul><p>Using Durable Objects instead of regular Workers allows us to address both of these issues.</p>
    <div>
      <h3>The WebGPU Hello World in Workers</h3>
      <a href="#the-webgpu-hello-world-in-workers">
        
      </a>
    </div>
    <p>Wrangler uses Miniflare 3, a <a href="/wrangler3/">fully local simulator for Workers</a>, which in turn is powered by workerd. This means you can start experimenting and writing WebGPU code locally on your machine right now, before we prepare things in our production environment.</p><p>Let’s get coding then.</p><p>Since Workers doesn't render graphics yet, we started by implementing the general-purpose GPU (GPGPU) APIs in the <a href="https://www.w3.org/TR/webgpu/">WebGPU specification</a>. In other words, we fully support the part of the API that the <a href="https://www.w3.org/TR/webgpu/#gpucomputepipeline">compute shaders and the compute pipeline</a> require, but we are not yet focused on fragment or vertex shaders used in rendering pipelines.</p><p>Here’s a typical “hello world” in WebGPU. This Durable Object script will output the name of the GPU device that workerd found in your machine to your console.</p>
            <pre><code>const adapter = await navigator.gpu.requestAdapter();
const adapterInfo = await adapter.requestAdapterInfo(["device"]);
console.log(adapterInfo.device);</code></pre>
            <p>A more interesting example, though, is a simple compute shader. In this case, we will fill a results buffer with an incrementing value taken from the iteration number via <code>global_invocation_id</code>.</p><p>For this, we need two buffers, one to store the results of the computations as they happen (<code>storageBuffer</code>) and another to copy the results to at the end (<code>mappedBuffer</code>).</p><p>We then dispatch four workgroups, meaning that the increments can happen in parallel. This parallelism and programmability are two key reasons why compute shaders and GPUs provide an advantage for things like machine learning inference workloads. Other advantages are:</p><ul><li><p><b>Bandwidth</b> - GPUs have very high memory bandwidth, up to 10-20x more than CPUs. This allows fast reading and writing of all the model parameters and data needed for inference.</p></li><li><p><b>Floating-point performance</b> - GPUs are optimized for high-throughput floating-point operations, which are used extensively in neural networks. They can deliver much higher <a href="https://www.tomshardware.com/reviews/gpu-hierarchy,4388.html">TFLOPs than CPUs</a>.</p></li></ul><p>Let’s look at the code:</p>
            <pre><code>// Create device and command encoder
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();
const encoder = device.createCommandEncoder();

// Storage buffer
const storageBuffer = device.createBuffer({
  size: 4 * Float32Array.BYTES_PER_ELEMENT, // 4 float32 values
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
});

// Mapped buffer
const mappedBuffer = device.createBuffer({
  size: 4 * Float32Array.BYTES_PER_ELEMENT,
  usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
});

// Create shader that writes incrementing numbers to storage buffer
const computeShaderCode = `
    @group(0) @binding(0)
    var&lt;storage, read_write&gt; result : array&lt;f32&gt;;

    @compute @workgroup_size(1)
    fn main(@builtin(global_invocation_id) gid : vec3&lt;u32&gt;) {
      result[gid.x] = f32(gid.x);
    }
`;

// Create compute pipeline
const computePipeline = device.createComputePipeline({
  layout: "auto",
  compute: {
    module: device.createShaderModule({ code: computeShaderCode }),
    entryPoint: "main",
  },
});

// Bind group
const bindGroup = device.createBindGroup({
  layout: computePipeline.getBindGroupLayout(0),
  entries: [{ binding: 0, resource: { buffer: storageBuffer } }],
});

// Dispatch compute work
const computePass = encoder.beginComputePass();
computePass.setPipeline(computePipeline);
computePass.setBindGroup(0, bindGroup);
computePass.dispatchWorkgroups(4);
computePass.end();

// Copy from storage to mapped buffer
encoder.copyBufferToBuffer(
  storageBuffer,
  0,
  mappedBuffer,
  0,
  4 * Float32Array.BYTES_PER_ELEMENT //mappedBuffer.size
);

// Submit and read back result
const gpuBuffer = encoder.finish();
device.queue.submit([gpuBuffer]);

await mappedBuffer.mapAsync(GPUMapMode.READ);
console.log(new Float32Array(mappedBuffer.getMappedRange()));
// [0, 1, 2, 3]</code></pre>
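<p>For intuition about what the shader itself computes, here is the serial CPU-side equivalent of the shader body in plain JavaScript; the difference on the GPU is that the four dispatched invocations perform their writes in parallel rather than in a loop:</p>

```javascript
// CPU version of the shader body: result[gid.x] = f32(gid.x),
// executed once per dispatched workgroup (4 in the example above).
const workgroups = 4;
const result = new Float32Array(workgroups);
for (let gidX = 0; gidX < workgroups; gidX++) {
  result[gidX] = gidX; // on the GPU, each invocation performs one of these writes
}
console.log(Array.from(result)); // [ 0, 1, 2, 3 ]
```

<p>The rest of the WebGPU code above exists to move this result out of GPU memory: the storage buffer receives the writes, and the mapped buffer is how the CPU reads them back.</p>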
            <p>Now that we covered the basics of WebGPU and compute shaders, let's move to something more demanding. What if we could perform machine learning inference using Workers and GPUs?</p>
    <div>
      <h3>ONNX WebGPU demo</h3>
      <a href="#onnx-webgpu-demo">
        
      </a>
    </div>
    <p>The <a href="https://github.com/microsoft/onnxruntime">ONNX runtime</a> is a popular open-source, cross-platform, high-performance machine learning inference accelerator. <a href="https://github.com/webonnx/wonnx">Wonnx</a> is a GPU-accelerated version of the same engine, written in Rust, that can be compiled to WebAssembly and take advantage of WebGPU in the browser. We are going to run it in Workers using a combination of <a href="https://github.com/cloudflare/workers-rs">workers-rs</a>, our Rust bindings for Cloudflare Workers, and the workerd WebGPU APIs.</p><p>For this demo, we are using <a href="https://www.kdnuggets.com/2016/09/deep-learning-reading-group-squeezenet.html">SqueezeNet</a>. This small image classification model runs with modest resources while achieving accuracy on the <a href="https://en.wikipedia.org/wiki/ImageNet">ImageNet</a> image classification validation dataset similar to that of larger models like <a href="https://en.wikipedia.org/wiki/AlexNet">AlexNet</a>.</p><p>In essence, our worker will receive any uploaded image and attempt to classify it according to the 1000 ImageNet classes. Once ONNX runs the machine learning model using the GPU, it will return the list of classes with the highest probability scores. Let’s go step by step.</p><p>First, we load the model from R2 into GPU memory the first time the Durable Object is called:</p>
            <pre><code>#[durable_object]
pub struct Classifier {
    env: Env,
    session: Option&lt;wonnx::Session&gt;,
}

impl Classifier {
    async fn ensure_session(&amp;mut self) -&gt; Result&lt;()&gt; {
        match self.session {
            Some(_) =&gt; worker::console_log!("DO already has a session"),
            None =&gt; {
                // No session, so this should be the first request. In this case
                // we will fetch the model from R2, build a wonnx session, and
                // store it for subsequent requests.
                let model_bytes = fetch_model(&amp;self.env).await?;
                let session = wonnx::Session::from_bytes(&amp;model_bytes)
                    .await
                    .map_err(|err| err.to_string())?;
                worker::console_log!("session created in DO");
                self.session = Some(session);
            }
        };
        Ok(())
    }
}</code></pre>
            <p>This is only required once, when the Durable Object is instantiated. For subsequent requests, we retrieve the model input tensor, call the existing session for the inference, and return to the calling worker the result tensor converted to JSON:</p>
            <pre><code>        let request_data: ArrayBase&lt;OwnedRepr&lt;f32&gt;, Dim&lt;[usize; 4]&gt;&gt; =
            serde_json::from_str(&amp;req.text().await?)?;
        let mut input_data = HashMap::new();
        input_data.insert("data".to_string(), request_data.as_slice().unwrap().into());

        let result = self
            .session
            .as_ref()
            .unwrap() // we know the session exists
            .run(&amp;input_data)
            .await
            .map_err(|err| err.to_string())?;
...
        let probabilities: Vec&lt;f32&gt; = result
            .into_iter()
            .next()
            .ok_or("did not obtain a result tensor from session")?
            .1
            .try_into()
            .map_err(|err: TensorConversionError| err.to_string())?;

        let do_response = serde_json::to_string(&amp;probabilities)?;
        Response::ok(do_response)</code></pre>
            <p>On the Worker script itself, we load the uploaded image and pre-process it into a model input tensor:</p>
            <pre><code>    let image_file: worker::File = match req.form_data().await?.get("file") {
        Some(FormEntry::File(buf)) =&gt; buf,
        Some(_) =&gt; return Response::error("`file` part of POST form must be a file", 400),
        None =&gt; return Response::error("missing `file`", 400),
    };
    let image_content = image_file.bytes().await?;
    let image = load_image(&amp;image_content)?;</code></pre>
            <p>Finally, we call the GPU Durable Object, which runs the model and returns the most likely class for our image:</p>
            <pre><code>    let probabilities = execute_gpu_do(image, stub).await?;
    let mut probabilities = probabilities.iter().enumerate().collect::&lt;Vec&lt;_&gt;&gt;();
    probabilities.sort_unstable_by(|a, b| b.1.partial_cmp(a.1).unwrap());
    Response::ok(LABELS[probabilities[0].0])</code></pre>
            <p>We packaged this demo in a public repository, so you can also run it. Make sure that you have a <a href="https://www.rust-lang.org/">Rust</a> compiler, <a href="https://nodejs.org/en">Node.js</a>, <a href="https://git-scm.com/">Git</a> and <a href="https://curl.se/">curl</a> installed, then clone the repository:</p>
            <pre><code>git clone https://github.com/cloudflare/workers-wonnx.git
cd workers-wonnx</code></pre>
            <p>Upload the model to the local R2 simulator:</p>
            <pre><code>npx wrangler@latest r2 object put model-bucket-dev/opt-squeeze.onnx --local --file models/opt-squeeze.onnx</code></pre>
            <p>And then run the Worker locally:</p>
            <pre><code>npx wrangler@latest dev</code></pre>
            <p>With the Worker running and waiting for requests, you can open another terminal window and upload one of the example images included in the repository using curl:</p>
            <pre><code>&gt; curl -F "file=@images/pelican.jpeg" http://localhost:8787
n02051845 pelican</code></pre>
            <p>If everything goes according to plan, the curl command will print the most likely class of the image.</p>
    <div>
      <h3>Next steps and final words</h3>
    </div>
    <p>Over the upcoming weeks, we will merge the workerd WebGPU code into the Cloudflare Workers production environment and make it available globally, on top of our growing fleet of GPU nodes. We didn't do this earlier because that environment is subject to strict security and isolation requirements. For example, we can't break the <a href="https://developers.cloudflare.com/workers/learning/security-model/">security model</a> of our process sandbox by having V8 talk to the GPU hardware directly; instead, we must create a configuration where a separate process sits closer to the GPU and communicates with the sandbox over IPC (inter-process communication). Other details, like resource allocation and billing, are still being sorted out.</p><p>For now, we wanted to share the good news that we will support WebGPU in Cloudflare Workers, and to make sure you can start playing and coding with it today. WebGPU and general-purpose computing on GPUs are still in their early days. We presented a machine-learning demo, but we can imagine other applications taking advantage of this new feature, and we hope you can show us some of them.</p><p>As usual, you can talk to us on our <a href="https://discord.cloudflare.com/">Developers Discord</a> or the <a href="https://community.cloudflare.com/c/developers/39">Community forum</a>; the team will be listening. We are eager to hear from you and learn about what you're building.</p>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Standards]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">4osLizDbNHndEk9BG23KFi</guid>
            <dc:creator>André Cruz</dc:creator>
            <dc:creator>Celso Martinho</dc:creator>
        </item>
        <item>
            <title><![CDATA[Privacy Gateway: a privacy preserving proxy built on Internet standards]]></title>
            <link>https://blog.cloudflare.com/building-privacy-into-internet-standards-and-how-to-make-your-app-more-private-today/</link>
            <pubDate>Thu, 27 Oct 2022 13:01:00 GMT</pubDate>
            <description><![CDATA[ Privacy Gateway enables privacy-forward applications to use Cloudflare as a trusted Relay, limiting which identifying information, including IP addresses, is visible to their infrastructure ]]></description>
            <content:encoded><![CDATA[ <p>If you’re running a privacy-oriented application or service on the Internet, your options to provably protect users’ privacy are limited. You can minimize logs and data collection, but even then, at a network level, every HTTP request needs to come from <i>somewhere.</i> Information generated by HTTP requests, like users’ IP addresses and TLS fingerprints, can be sensitive, especially when combined with application data.</p><p>Meaningful improvements to your users’ privacy require a change in how HTTP requests are sent from client devices to the server that runs your application logic. This was the motivation for Privacy Gateway: a service that relays encrypted HTTP requests and responses between a client and application server. With Privacy Gateway, Cloudflare knows where the request is coming from, but not what it contains, and applications can see what the request contains, but not where it comes from. <b>Neither Cloudflare nor the application server has the full picture</b>, improving end-user privacy.</p><p>We recently deployed Privacy Gateway for <a href="https://flo.health/">Flo Health Inc.</a>, a leading female health app, for the launch of their <a href="https://www.theverge.com/2022/9/14/23351957/flo-period-tracker-privacy-anonymous-mode">Anonymous Mode</a>. With Privacy Gateway in place, all request data for Anonymous Mode users is encrypted between the app user and Flo, which prevents Flo from seeing the IP addresses of those users and Cloudflare from seeing the contents of that request data.</p><p>Privacy Gateway makes several other privacy-critical applications possible:</p><ul><li><p>Browser developers can collect user telemetry in a privacy-respecting manner – what extensions are installed, what defaults a user might have changed – while removing what is still a potentially personal identifier (the IP address) from that data.</p></li><li><p>Users can visit a healthcare site to report a Covid-19 exposure without worrying that the site is tracking their IP address and/or location.</p></li><li><p>DNS resolvers can serve DNS queries without linking who made the request with what website they’re visiting – a pattern we’ve implemented with <a href="/oblivious-dns/">Oblivious DNS</a>.</p></li></ul><p>Privacy Gateway is based on <a href="https://datatracker.ietf.org/doc/draft-ietf-ohai-ohttp/">Oblivious HTTP (OHTTP), an emerging IETF standard</a>, and is built upon standard <a href="https://datatracker.ietf.org/doc/html/rfc9180">hybrid public-key cryptography</a>.</p>
    <div>
      <h2>How does it work?</h2>
    </div>
    <p>The main innovation in the Oblivious HTTP standard – beyond a basic proxy service – is that these messages are encrypted <i>to the application server</i>, such that Privacy Gateway learns nothing of the application data beyond the source and destination of each message.</p><p>Privacy Gateway enables application developers and platforms, especially those with strong privacy requirements, to build something that closely resembles a “<a href="https://en.wikipedia.org/wiki/Mix_network">Mixnet</a>”: an approach to obfuscating the source and destination of a message across a network. To that end, Privacy Gateway consists of three main components:</p><ol><li><p><b>Client:</b> the user’s device, or any client that’s configured to forward requests to Privacy Gateway.</p></li><li><p><b>Privacy Gateway:</b> a service operated by Cloudflare and designed to relay requests between the Client and the application server, without being able to observe the contents within.</p></li><li><p><b>Application server:</b> the origin or application web server responsible for decrypting requests from clients, and encrypting responses back.</p></li></ol><p>If you imagine the request data as the contents of an envelope (a letter) and the IP address and request metadata as the address on the outside, then with Privacy Gateway, Cloudflare can see the envelope’s address and safely forward it to its destination without being able to see what’s inside.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2sYdEzCvuyKU7h3wIUKju0/b845a85835291ffe7c9751355b9c6b5f/image4-9.png" />
            
            </figure><p>An Oblivious HTTP transaction using Privacy Gateway</p><p>In slightly more detail, the data flow is as follows:</p><ol><li><p>The client <a href="https://datatracker.ietf.org/doc/html/draft-thomson-http-oblivious-02#section-5.1">encapsulates an HTTP request</a> using the public key of the application server, and sends it to Privacy Gateway over an HTTPS connection.</p></li><li><p>Privacy Gateway forwards the request to the application server over its own, separate HTTPS connection.</p></li><li><p>The application server decapsulates the request, forwarding it to the target server, which produces the response.</p></li><li><p>The application server returns an <a href="https://datatracker.ietf.org/doc/html/draft-thomson-http-oblivious-02#section-5.2">encapsulated response</a> to Privacy Gateway, which then forwards the result to the client.</p></li></ol><p>As specified in the protocol, requests from the client to the server are encrypted using HPKE, a state-of-the-art standard for public-key encryption – which you can read more about <a href="/hybrid-public-key-encryption/">here</a>. We’ve taken additional measures to ensure that OHTTP’s use of HPKE is secure by conducting a <a href="/stronger-than-a-promise-proving-oblivious-http-privacy-properties/">formal analysis of the protocol</a>, and we expect to publish a deeper analysis in the coming weeks.</p>
    <div>
      <h2>How Privacy Gateway improves end-user privacy</h2>
    </div>
    <p>This interaction offers two types of privacy, which we informally refer to as <i>request privacy</i> and <i>client privacy</i>.</p><p>Request privacy means that the application server does not learn information that would otherwise be revealed by an HTTP request, such as IP address, geolocation, TLS and HTTPS fingerprints, and so on. Because Privacy Gateway uses a separate HTTPS connection between itself and the application server, all of this per-request information revealed to the application server represents that of Privacy Gateway, not of the client. However, developers need to take care to not send personally identifying information in the contents of requests. If the request, once decapsulated, includes information like users’ email, phone number, or credit card info, for example, Privacy Gateway will not meaningfully improve privacy.</p><p>Client privacy is a stronger notion. Because Cloudflare and the application server are not colluding to share individual user’s data, from the server’s perspective, each individual transaction came from some unknown client behind Privacy Gateway. In other words, a properly configured Privacy Gateway deployment means that applications cannot link any two requests to the same client. In particular, with Privacy Gateway, privacy loves company. If there is only one end-user making use of Privacy Gateway, then it only provides request privacy (since the client IP address remains hidden from the Gateway). It would not provide client privacy, since the server would know that each request corresponds to the same, single client. Client privacy requires that there be many users of the system, so the application server cannot make this determination.</p><p>To better understand request and client privacy, consider the following HTTP request between a client and server:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6tEmiR5nN2XWpa1EqATvBY/e74edca6ab779548c43aaa357fd88578/image3-7.png" />
            
            </figure><p>Normal HTTP configuration with a client anonymity set of size 1</p><p>If a client connects directly to the server (or “Gateway” in OHTTP terms), the server is likely to see information about the client, including the IP address, TLS cipher used, and a degree of location data based on that IP address:</p>
            <pre><code>- ipAddress: 192.0.2.33 # the client’s real IP address
- ASN: 7922
- AS Organization: Comcast Cable
- tlsCipher: AEAD-CHACHA20-POLY1305-SHA256 # potentially unique
- tlsVersion: TLSv1.3
- Country: US
- Region: California
- City: Campbell</code></pre>
            <p>There’s plenty of sensitive information here that might be unique to the end-user. In other words, the connection offers neither request nor client privacy.</p><p>With Privacy Gateway, clients do not connect directly to the application server itself. Instead, they connect to Privacy Gateway, which in turn connects to the server. This means that the server only observes connections from Privacy Gateway, not individual connections from clients, yielding a different view:</p>
            <pre><code>- ipAddress: 104.16.5.5 # a Cloudflare IP
- ASN: 13335
- AS Organization: Cloudflare
- tlsCipher: ECDHE-ECDSA-AES128-GCM-SHA256 # shared across several clients
- tlsVersion: TLSv1.3
- Country: US
- Region: California
- City: Los Angeles</code></pre>
            
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/34pNp8QRamMWLMqBpT2Q05/4b400b029df3a7ad8821e7da69dd30b9/image1-18.png" />
            
            </figure><p>Privacy Gateway configuration with a client anonymity set of size k</p><p>This is request privacy. All information about the client’s location and identity is hidden from the application server, and all details about the application data are hidden from Privacy Gateway. For sensitive applications and protocols like DNS, keeping this metadata separate from the application data is an important step towards improving end-user privacy.</p><p>Moreover, applications should take care not to reveal sensitive, per-client information in their individual requests. Privacy Gateway cannot guarantee that applications do not send identifying information – such as email addresses, full names, etc. – in request bodies, since it cannot observe plaintext application data. Applications that reveal user-identifying information in requests may violate client privacy, but not request privacy. This is why – unlike our full <a href="/privacy-edge-making-building-privacy-first-apps-easier/">application-level Privacy Proxy</a> product – Privacy Gateway is <i>not</i> meant to be used as a generic proxy-based protocol for arbitrary applications and traffic. It is meant to be a special-purpose protocol for sensitive applications, including DNS (as is evidenced by <a href="/oblivious-dns/">Oblivious DNS-over-HTTPS</a>), telemetry data, or generic API requests as discussed above.</p>
    <div>
      <h2>Integrating Privacy Gateway into your application</h2>
    </div>
    <p>Integrating with Privacy Gateway requires applications to implement the client and server side of the OHTTP protocol. Let’s walk through what this entails.</p>
    <div>
      <h3>Server Integration</h3>
    </div>
    <p>The server-side part of the protocol is responsible for two basic tasks:</p><ol><li><p>Publishing a public key for request encapsulation; and</p></li><li><p>Decrypting encapsulated client requests, processing the resulting request, and encrypting the corresponding response.</p></li></ol><p>A <a href="https://ietf-wg-ohai.github.io/oblivious-http/draft-ietf-ohai-ohttp.html#name-key-configuration">public encapsulation key</a>, called a key configuration, consists of a key identifier (so the server can support multiple keys at once for rotation purposes), cryptographic algorithm identifiers for encryption and decryption, and a public key:</p>
            <pre><code>HPKE Symmetric Algorithms {
  HPKE KDF ID (16),
  HPKE AEAD ID (16),
}

OHTTP Key Config {
  Key Identifier (8),
  HPKE KEM ID (16),
  HPKE Public Key (Npk * 8),
  HPKE Symmetric Algorithms Length (16),
  HPKE Symmetric Algorithms (32..262140),
}</code></pre>
            <p>Clients need this public key to create their request, and there are lots of ways to do this. Servers could fix a public key and then bake it into their application, but this would require a software update to rotate the key. Alternatively, clients could discover the public key some other way. Many discovery mechanisms exist and vary based on your threat model – see <a href="https://datatracker.ietf.org/doc/html/draft-wood-key-consistency">this document</a> for more details. To start, a simple approach is to have clients fetch the public key directly from the server over some API. Below is a snippet of the API that our <a href="https://github.com/cloudflare/app-relay-gateway-go/blob/main/gateway.go#L116-L134">open source OHTTP server provides</a>:</p>
            <pre><code>func (s *GatewayResource) configHandler(w http.ResponseWriter, r *http.Request) {
	config, err := s.Gateway.Config(s.keyID)
	if err != nil {
		http.Error(w, http.StatusText(http.StatusInternalServerError), http.StatusInternalServerError)
		return
	}
	w.Write(config.Marshal())
}</code></pre>
            <p>Once public key generation and distribution is solved, the server then needs to handle encapsulated requests from clients. For each request, the server needs to decrypt the request, translate the plaintext to a corresponding HTTP request that can be resolved, and then encrypt the resulting response back to the client.</p><p>Open source OHTTP libraries typically offer functions for <a href="https://github.com/chris-wood/ohttp-go/blob/main/ohttp.go#L455">request decryption</a> and <a href="https://github.com/chris-wood/ohttp-go/blob/main/ohttp.go#L502-L541">response encryption</a>, whereas plaintext translation from <a href="https://datatracker.ietf.org/doc/html/rfc9292">binary HTTP</a> to an HTTP request is handled separately. For example, our open source server delegates this translation to a different library that is specific to how Go HTTP requests are represented in memory. In particular, the function to translate from a plaintext request to a <a href="https://pkg.go.dev/net/http#Request">Go HTTP request</a> is done with a function that has the following signature:</p>
            <pre><code>func UnmarshalBinaryRequest(data []byte) (*http.Request, error) {
	...
}</code></pre>
            <p>Conversely, translating a <a href="https://pkg.go.dev/net/http#Response">Go HTTP response</a> to a plaintext binary HTTP response message is done with a function that has the following signature:</p>
            <pre><code>type BinaryResponse http.Response

func (r *BinaryResponse) Marshal() ([]byte, error) {
	...
}</code></pre>
            <p>While there exist several open source libraries that one can use to implement OHTTP server support, we’ve packaged all of it up in our open source server implementation <a href="https://github.com/cloudflare/app-relay-gateway-go">available here</a>. It includes instructions for building, testing, and deploying to make it easy to get started.</p>
    <div>
      <h3>Client integration</h3>
    </div>
    <p>Naturally, the client-side behavior of OHTTP mirrors that of the server. In particular, the client must:</p><ol><li><p>Discover or obtain the server public key; and</p></li><li><p>Encode and encrypt HTTP requests, send them to Privacy Gateway, and decrypt and decode the HTTP responses.</p></li></ol><p>Discovery of the server public key depends on the server’s chosen deployment model. For example, if the public key is available over an API, clients can simply fetch it directly:</p>
            <pre><code>$ curl https://server.example/ohttp-configs &gt; config.bin</code></pre>
            <p><a href="https://github.com/chris-wood/ohttp-go/blob/main/bhttp.go#L66">Encoding</a>, <a href="https://github.com/chris-wood/ohttp-go/blob/main/ohttp.go#L321">encrypting</a>, <a href="https://github.com/chris-wood/ohttp-go/blob/main/ohttp.go#L373">decrypting</a>, and decoding are again best handled by OHTTP libraries when available. With these functions in hand, building client support is rather straightforward. A trivial example Go client using the library functions linked above is as follows:</p>
            <pre><code>configEnc := ... // encoded public key
config, err := ohttp.UnmarshalPublicConfig(configEnc)
if err != nil {
	return err
}

request, err := http.NewRequest(http.MethodGet, "https://test.example/index.html", nil)
if err != nil {
	return err
}

binaryRequest := ohttp.BinaryRequest(*request)
encodedRequest, err := binaryRequest.Marshal()
if err != nil {
	return err
}

ohttpClient := ohttp.NewDefaultClient(config)
encapsulatedReq, reqContext, err := ohttpClient.EncapsulateRequest(encodedRequest)

relayRequest, err := http.NewRequest(http.MethodPost, "https://relay.example", bytes.NewReader(encapsulatedReq.Marshal()))
if err != nil {
	return err
}
relayRequest.Header.Set("Content-Type", "message/ohttp-req")

client := http.Client{}
relayResponse, err := client.Do(relayRequest)
if err != nil {
	return err
}
bodyBytes, err := ioutil.ReadAll(relayResponse.Body)
if err != nil {
	return err
}
encapsulatedResp, err := ohttp.UnmarshalEncapsulatedResponse(bodyBytes)
if err != nil {
	return err
}

receivedResp, err := reqContext.DecapsulateResponse(encapsulatedResp)
if err != nil {
	return err
}

response, err := ohttp.UnmarshalBinaryResponse(receivedResp)
if err != nil {
	return err
}

fmt.Println(response)</code></pre>
            <p>A standalone client like this isn’t very useful if you have an existing application. To help you integrate Privacy Gateway into an existing application, we created a <a href="https://github.com/cloudflare/app-relay-client-library">sample OHTTP client library</a> that’s compatible with iOS and macOS applications. Additionally, if there’s language or platform support you would like to see to help ease integration on either the client or server side, please let us know!</p>
    <div>
      <h2>Interested?</h2>
    </div>
    <p>Privacy Gateway is currently in early access – available to select privacy-oriented companies and partners. If you’re interested, <a href="https://www.cloudflare.com/lp/privacy-edge/">please get in touch</a>.</p> ]]></content:encoded>
            <category><![CDATA[Privacy]]></category>
            <category><![CDATA[Protocols]]></category>
            <category><![CDATA[Standards]]></category>
            <guid isPermaLink="false">4KXLduwaLrRcnGTcApXBpH</guid>
            <dc:creator>Mari Galicer</dc:creator>
            <dc:creator>Christopher Wood</dc:creator>
        </item>
        <item>
            <title><![CDATA[HPKE: Standardizing public-key encryption (finally!)]]></title>
            <link>https://blog.cloudflare.com/hybrid-public-key-encryption/</link>
            <pubDate>Thu, 24 Feb 2022 23:12:36 GMT</pubDate>
            <description><![CDATA[ HPKE (RFC 9180) was made to be simple, reusable, and future-proof by building upon knowledge from prior PKE schemes and software implementations. This article provides an overview of this new standard, going back to discuss its motivation, design goals, and development process ]]></description>
    <content:encoded><![CDATA[ <p>For the last three years, the <a href="https://irtf.org/cfrg">Crypto Forum Research Group</a> of the <a href="https://irtf.org/">Internet Research Task Force (IRTF)</a> has been working on specifying the next generation of (hybrid) public-key encryption (PKE) for Internet protocols and applications. The result is Hybrid Public Key Encryption (HPKE), published today as <a href="https://www.rfc-editor.org/rfc/rfc9180.html">RFC 9180</a>.</p><p>HPKE was made to be simple, reusable, and future-proof by building upon knowledge from prior PKE schemes and software implementations. It is already in use in a large assortment of emerging Internet standards, including TLS <a href="https://datatracker.ietf.org/doc/draft-ietf-tls-esni/">Encrypted Client Hello</a> and <a href="https://datatracker.ietf.org/doc/draft-pauly-dprive-oblivious-doh/">Oblivious DNS-over-HTTPS</a>, and has many interoperable implementations, including one in <a href="https://github.com/cloudflare/circl/tree/master/hpke">CIRCL</a>. This article provides an overview of this new standard, going back to discuss its motivation, design goals, and development process.</p>
    <div>
      <h3>A primer on public-key encryption</h3>
    </div>
    <p>Public-key cryptography is decades old, with its roots going back to the seminal work of Diffie and Hellman in 1976, entitled “<a href="https://ee.stanford.edu/~hellman/publications/24.pdf">New Directions in Cryptography</a>.” Their proposal – today called Diffie-Hellman key exchange – was a breakthrough. It allowed one to transform small secrets into big secrets for cryptographic applications and protocols. For example, one can bootstrap a secure channel for exchanging messages with confidentiality and integrity using a key exchange protocol.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/49zWFnGoaVQPUcKjDT52uO/09610fee57bd627cb23faae14bc05ca4/Screen-Shot-2022-02-24-at-11.09.26-AM.png" />
            
            </figure><p>Unauthenticated Diffie-Hellman key exchange</p><p>In this example, Sender and Receiver exchange freshly generated public keys with each other, and then combine their own secret key with their peer’s public key. Algebraically, this yields the same value \(g^{xy} = (g^x)^y = (g^y)^x\). Both parties can then use this as a <i>shared secret</i> for performing other tasks, such as encrypting messages to and from one another.</p><p>The <a href="https://datatracker.ietf.org/doc/html/rfc8446">Transport Layer Security</a> (TLS) protocol is one such application of this concept. Shortly after Diffie-Hellman was unveiled to the world, RSA came into the fold. The RSA cryptosystem is another public key algorithm that has been used to build digital signature schemes, PKE algorithms, and key transport protocols. A key transport protocol is similar to a key exchange algorithm in that the sender, Alice, generates a random symmetric key and then encrypts it under the receiver’s public key. Upon successful decryption, both parties then share this secret key. (This fundamental technique, known as static RSA, was used pervasively in the context of TLS. See <a href="/rfc-8446-aka-tls-1-3/">this post</a> for details about this old technique in TLS 1.2 and prior versions.)</p><p>At a high level, PKE between a sender and receiver is a protocol for encrypting messages under the receiver’s public key. One way to do this is via a so-called non-interactive key exchange protocol.</p><p>To illustrate how this might work, let \(g^y\) be the receiver’s public key, and let \(m\) be a message that one wants to send to this receiver. 
The flow looks like this:</p><ol><li><p>The sender generates a fresh private and public key pair, \((x, g^x)\).</p></li><li><p>The sender computes \(g^{xy} = (g^y)^x\), which can be done without involvement from the receiver, that is, non-interactively.</p></li><li><p>The sender then uses this shared secret to derive an encryption key, which it uses to encrypt \(m\).</p></li><li><p>The sender packages up \(g^x\) and the encryption of \(m\), and sends both to the receiver.</p></li></ol><p>The general paradigm here is called "hybrid public-key encryption" because it combines a non-interactive key exchange based on public-key cryptography for establishing a shared secret, and a symmetric encryption scheme for the actual encryption. To decrypt \(m\), the receiver computes the same shared secret \(g^{xy} = (g^x)^y\), derives the same encryption key, and then decrypts the ciphertext.</p><p>Conceptually, PKE of this form is quite simple. General designs of this form date back many years and include the <a href="https://www.cs.ucdavis.edu/~rogaway/papers/dhies.pdf">Diffie-Hellman Integrated Encryption Scheme</a> (DHIES) and ElGamal encryption. However, despite this apparent simplicity, there are numerous subtle design decisions one has to make when designing this type of protocol, including:</p><ul><li><p>What type of key exchange protocol should be used for computing the shared secret? Should this protocol be based on modern elliptic curve groups like Curve25519? Should it support future post-quantum algorithms?</p></li><li><p>How should encryption keys be derived? Are there other keys that should be derived? How should additional application information be included in the encryption key derivation, if at all?</p></li><li><p>What type of encryption algorithm should be used? 
What types of messages should be encrypted?</p></li><li><p>How should sender and receiver encode and exchange public keys?</p></li></ul><p>These and other questions are important for a protocol, since they are required for interoperability. That is, senders and receivers should be able to communicate without having to use the same source code.</p><p>There have been a number of efforts in the past to standardize PKE, most of which focus on elliptic curve cryptography. Some examples of past standards include: ANSI X9.63 (ECIES), IEEE 1363a, ISO/IEC 18033-2, and SECG SEC 1.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/85SKzkEqGOEuyw4brVcak/fe2a769b6c8210d6851f265e46dec90a/Screen-Shot-2022-02-24-at-11.09.53-AM.png" />
            
            </figure><p>Timeline of related standards and software</p><p>A paper by <a href="https://ieeexplore.ieee.org/abstract/document/5604194/">Martinez et al.</a> provides a thorough and technical comparison of these different standards. The key points are that all these existing schemes have shortcomings. They either rely on outdated or not-commonly-used primitives such as <a href="https://en.wikipedia.org/wiki/RIPEMD">RIPEMD</a> and CMAC-AES, lack accommodations for moving to modern primitives (e.g., <a href="https://datatracker.ietf.org/doc/html/rfc5116">AEAD</a> algorithms), lack proofs of <a href="https://link.springer.com/chapter/10.1007/BFb0055718">IND-CCA2</a> security, or, importantly, fail to provide test vectors and interoperable implementations.</p><p>The lack of a single standard for public-key encryption has led to inconsistent and often non-interoperable support across libraries. In particular, hybrid PKE implementation support is fractured across the community, ranging from the hugely popular and simple-to-use <a href="https://nacl.cr.yp.to/box.html">NaCl box</a> and <a href="https://libsodium.gitbook.io/doc/public-key_cryptography/sealed_boxes">libsodium box seal</a> implementations based on modern algorithm variants like XSalsa20-Poly1305 for authenticated encryption, to <a href="https://www.bouncycastle.org/specifications.html">BouncyCastle</a> implementations based on “classical” algorithms like AES and elliptic curves.</p><p>Despite the lack of a single standard, this hasn’t stopped the adoption of ECIES instantiations for widespread and critical applications. 
For example, the Apple and Google <a href="https://covid19-static.cdn-apple.com/applications/covid19/current/static/contact-tracing/pdf/ENPA_White_Paper.pdf">Exposure Notification Privacy-preserving Analytics</a> (ENPA) platform uses ECIES for public-key encryption.</p><p>When designing protocols and applications that need a simple, reusable, and agile abstraction for public-key encryption, existing standards are not fit for purpose. That’s where HPKE comes into play.</p>
    <div>
      <h3>Construction and design goals</h3>
      <a href="#construction-and-design-goals">
        
      </a>
    </div>
    <p>HPKE is a public-key encryption construction that is designed from the outset to be simple, reusable, and future-proof. It lets a sender encrypt arbitrary-length messages under a receiver’s public key, as shown below. You can try this out in the browser at <a href="https://www.franziskuskiefer.de/p/tldr-hybrid-public-key-encryption/">Franziskus Kiefer’s blog post on HPKE</a>!</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2QjgXQ7cSPcN5kLUpdpYEO/ba5071877cd0b2b6168d2c4e9493ea52/image5-2.png" />
            
            </figure><p>HPKE overview</p><p>HPKE is built in stages. It starts with a Key Encapsulation Mechanism (KEM), which is similar to the key transport protocol described earlier and, in fact, can be constructed from the Diffie-Hellman key agreement protocol. A KEM has two algorithms: Encapsulation and Decapsulation, or Encap and Decap for short. The Encap algorithm creates a symmetric secret and wraps it for a public key such that only the holder of the corresponding private key can unwrap it. An attacker knowing this encapsulated key cannot recover even a single bit of the shared secret. Decap takes the encapsulated key and the private key associated with the public key, and computes the original shared secret. From this shared secret, HPKE computes a series of derived keys that are then used to encrypt and authenticate plaintext messages between sender and receiver.</p><p>This simple construction was driven by several high-level design goals and principles. We will discuss these goals and how they were met below.</p>
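    <p>To make the Encap/Decap interface concrete, here is a toy sketch in Python. It substitutes classic finite-field Diffie-Hellman over a small Mersenne prime and SHA-256 for a real DHKEM, so it is for illustration only and is not secure; HPKE’s actual KEMs are built on curves such as X25519 and P-256 with a labeled KDF.</p>
    <pre><code>import hashlib
import secrets

# Toy KEM built from finite-field Diffie-Hellman (illustration only,
# NOT secure -- real HPKE DHKEMs use elliptic curves and a labeled KDF).
P = 2**127 - 1   # a Mersenne prime; far too small for real security
G = 5

def generate_key_pair():
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(G, sk, P)

def encap(pk_receiver):
    # Create a fresh shared secret and wrap ("encapsulate") it for pk_receiver.
    e = secrets.randbelow(P - 2) + 1       # ephemeral secret
    enc = pow(G, e, P)                     # the encapsulated key: g^e
    dh = pow(pk_receiver, e, P)            # g^(e * sk)
    return hashlib.sha256(dh.to_bytes(16, "big")).digest(), enc

def decap(enc, sk_receiver):
    # Recover the same shared secret using the receiver's private key.
    dh = pow(enc, sk_receiver, P)
    return hashlib.sha256(dh.to_bytes(16, "big")).digest()

sk, pk = generate_key_pair()
shared_sender, enc = encap(pk)
shared_receiver = decap(enc, sk)
assert shared_sender == shared_receiver</code></pre>
    <p>An eavesdropper who sees only <code>enc</code> learns nothing useful about the shared secret, which is exactly the property the KEM abstraction demands.</p>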
    <div>
      <h3>Algorithm agility</h3>
      <a href="#algorithm-agility">
        
      </a>
    </div>
    <p>Different applications, protocols, and deployments have different constraints, and locking any single use case into a specific (set of) algorithm(s) would be overly restrictive. For example, some applications may wish to use post-quantum algorithms when available, whereas others may wish to use different authenticated encryption algorithms for symmetric-key encryption. To accomplish this goal, HPKE is designed as a composition of a Key Encapsulation Mechanism (KEM), Key Derivation Function (KDF), and Authenticated Encryption Algorithm (AEAD). Any combination of the three algorithms yields a valid instantiation of HPKE, subject to certain security constraints about the choice of algorithm.</p><p>One important point worth noting here is that HPKE is not a <i>protocol</i>, and therefore does nothing to ensure that sender and receiver agree on the HPKE ciphersuite or shared context information. Applications and protocols that use HPKE are responsible for choosing or negotiating a specific HPKE ciphersuite that fits their purpose. This allows applications to be opinionated about their choice of algorithms to simplify implementation and analysis, as is common with protocols like <a href="https://www.wireguard.com/">WireGuard</a>, or be flexible enough to support choice and agility, as is the approach taken with TLS.</p>
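    <p>In code, an HPKE ciphersuite is nothing more than a triple of algorithm identifiers. The hex values below are registered code points from the HPKE specification; the <code>Ciphersuite</code> class itself is hypothetical, purely to illustrate the composition.</p>
    <pre><code>from dataclasses import dataclass

# An HPKE ciphersuite is just a (KEM, KDF, AEAD) triple.
# The identifiers are code points registered in RFC 9180.
@dataclass(frozen=True)
class Ciphersuite:
    kem_id: int    # e.g., 0x0020 = DHKEM(X25519, HKDF-SHA256)
    kdf_id: int    # e.g., 0x0001 = HKDF-SHA256
    aead_id: int   # e.g., 0x0001 = AES-128-GCM

suite = Ciphersuite(kem_id=0x0020, kdf_id=0x0001, aead_id=0x0001)</code></pre>
    <p>Any valid combination of the three identifiers yields a working instantiation, which is what makes the construction agile: swapping in a different (say, post-quantum) KEM is a one-field change rather than a redesign.</p>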
    <div>
      <h3>Authentication modes</h3>
      <a href="#authentication-modes">
        
      </a>
    </div>
    <p>At a high level, public-key encryption ensures that only the holder of the private key can decrypt messages encrypted for the corresponding public key (being able to decrypt the message is an implicit authentication of the receiver). However, there are other ways in which applications may wish to authenticate messages from sender to receiver. For example, if both parties have a pre-shared key, they may wish to ensure that both can demonstrate possession of this pre-shared key as well. It may also be desirable for senders to demonstrate knowledge of their own private key in order for recipients to decrypt the message (this is functionally similar to signing an encrypted message, but has some <a href="https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-hpke-12#section-9.1.1">subtle and important differences</a>).</p><p>To support these various use cases, HPKE admits different modes of authentication, allowing various combinations of pre-shared key and sender private key authentication. The additional private key contributes to the shared secret between the sender and receiver, and the pre-shared key contributes to the derivation of the application data encryption secrets. This process is referred to as the “key schedule”, and a simplified version of it is shown below.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3kF11CKLHDthJSduTAD10i/d47cc6df3c62d35970674fcd6e25df9e/image2-11.png" />
            
            </figure><p>Simplified HPKE key schedule</p><p>These modes come at a price, however: not all KEM algorithms will work with all authentication modes. For example, for most post-quantum KEM algorithms, no private-key authentication variant is known.</p>
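    <p>A drastically simplified version of that key schedule can be sketched with HKDF-SHA256 built from the Python standard library. The real schedule in the specification adds labels, length prefixes, and a ciphersuite identifier to every derivation; the labels and lengths below are illustrative only.</p>
    <pre><code>import hashlib
import hmac

def hkdf_extract(salt, ikm):
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk, info, length):
    out, block = b"", b""
    n_blocks = (length + 31) // 32             # SHA-256 output is 32 bytes
    for counter in range(1, n_blocks + 1):
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        out += block
    return out[:length]

def key_schedule(shared_secret, psk, info):
    # Mix the KEM shared secret with the (possibly empty) pre-shared key...
    secret = hkdf_extract(shared_secret, psk)
    # ...then derive the AEAD key and base nonce from the mixed secret.
    key = hkdf_expand(secret, b"key" + info, 16)          # AES-128-GCM key size
    base_nonce = hkdf_expand(secret, b"base_nonce" + info, 12)
    return key, base_nonce</code></pre>
    <p>Because the pre-shared key feeds the extract step, both parties must know it to derive the same encryption secrets, which is how the PSK authentication mode gets its guarantee.</p>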
    <div>
      <h3>Reusability</h3>
      <a href="#reusability">
        
      </a>
    </div>
    <p>The core of HPKE’s construction is its key schedule. It allows secrets produced and shared with KEMs and pre-shared keys to be mixed together to produce additional shared secrets between sender and receiver for performing authenticated encryption and decryption. HPKE allows applications to build on this key schedule without using the corresponding AEAD functionality, for example, by <a href="https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-hpke-12#section-5.3">exporting a shared application-specific secret</a>. Using HPKE in an “export-only” fashion allows applications to use other, non-standard AEAD algorithms for encryption, should that be desired. It also allows applications to use a KEM different from those specified in the standard, as is done in the proposed <a href="https://claucece.github.io/draft-celi-wiggers-tls-authkem/draft-celi-wiggers-tls-authkem.html">TLS AuthKEM draft</a>.</p>
    <div>
      <h3>Interface simplicity</h3>
      <a href="#interface-simplicity">
        
      </a>
    </div>
    <p>HPKE hides the complexity of message encryption from callers. Encrypting a message with additional authenticated data from sender to receiver for their public key is as simple as the following two calls:</p>
            <pre><code>// Create an HPKE context to send messages to the receiver
encapsulatedKey, senderContext = SetupBaseS(receiverPublicKey, "shared application info")

// AEAD encrypt the message using the context
ciphertext = senderContext.Seal(aad, message)</code></pre>
            <p>In fact, many implementations are likely to offer a simplified <a href="https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-hpke-12#section-6">“single-shot” interface</a> that does context creation and message encryption with one function call.</p><p>Notice that this interface does not expose anything like nonce ("number used once") or sequence numbers to the callers. The HPKE context manages nonces and sequence numbers internally, which means the application is responsible for delivering messages in the order they were encrypted. This was an important design decision made to hedge against key and nonce reuse, <a href="https://fahrplan.events.ccc.de/congress/2010/Fahrplan/events/4087.en.html">which</a> <a href="https://link.springer.com/chapter/10.1007/978-3-662-45611-8_14">can</a> <a href="https://eprint.iacr.org/2014/161">be</a> <a href="https://eprint.iacr.org/2019/023.pdf">catastrophic</a> for <a href="https://eprint.iacr.org/2020/615">security</a>.</p><p>Consider what would be necessary if HPKE delegated nonce management to the application. The sending application using HPKE would need to communicate the nonce along with each ciphertext value for the receiver to successfully decrypt the message. If this nonce were ever reused, then security of the <a href="https://eprint.iacr.org/2016/475">AEAD may fall apart</a>. Thus, a sending application would necessarily need some way to ensure that nonces were never reused. Moreover, by sending the nonce to the receiver, the application is effectively implementing a message sequencer. The application could just as easily implement and use this sequencer to ensure in-order message delivery and processing. Thus, at the end of the day, exposing the nonce seemed both harmful and, ultimately, redundant.</p>
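    <p>The internal nonce handling is simple to sketch: the context keeps a sequence counter, XORs it into a base nonce for each message, and then increments it, so a nonce can never repeat within one context. This mirrors the nonce computation in the specification, though the class below is a hypothetical outline rather than a full HPKE context.</p>
    <pre><code>class SenderContext:
    # Sketch of HPKE's internal nonce handling (not a full context).
    def __init__(self, base_nonce):
        self.base_nonce = base_nonce
        self.seq = 0

    def next_nonce(self):
        # nonce = base_nonce XOR big-endian(seq); seq then increments,
        # so the same nonce is never produced twice by one context.
        seq_bytes = self.seq.to_bytes(len(self.base_nonce), "big")
        self.seq += 1
        return bytes(a ^ b for a, b in zip(self.base_nonce, seq_bytes))

ctx = SenderContext(bytes(12))
assert ctx.next_nonce() != ctx.next_nonce()   # distinct nonce per message</code></pre>
    <p>Because the receiver advances the same counter, messages must be decrypted in the order they were sealed; that is the ordering responsibility the API deliberately leaves with the application.</p>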
    <div>
      <h3>Wire format</h3>
      <a href="#wire-format">
        
      </a>
    </div>
    <p>Another hallmark of HPKE is that all messages that do not contain application data are fixed length. This means that serializing and deserializing HPKE messages is trivial and there is no room for application choice. In contrast, some implementations of hybrid PKE deferred choice of wire format details, such as whether to use elliptic curve point compression, to applications. HPKE handles this under the KEM abstraction.</p>
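    <p>Fixed-length non-application messages make framing trivial. For example, since the encapsulated key for a given KEM always has the same size, a single-shot message can be framed by plain concatenation. The framing below is hypothetical (HPKE itself leaves message framing to the application), but it shows why no length fields or negotiation are needed.</p>
    <pre><code># Hypothetical single-shot framing: enc || ciphertext.
# For DHKEM(X25519, HKDF-SHA256), encapsulated keys are always 32 bytes.
ENC_LEN = 32

def serialize(enc, ciphertext):
    assert len(enc) == ENC_LEN
    return enc + ciphertext

def deserialize(wire):
    # The fixed prefix length is all a receiver needs to split the message.
    return wire[:ENC_LEN], wire[ENC_LEN:]</code></pre>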
    <div>
      <h3>Development process</h3>
      <a href="#development-process">
        
      </a>
    </div>
    <p>HPKE is the result of a three-year development cycle between industry practitioners, protocol designers, and academic cryptographers. In particular, HPKE built upon prior art in public-key encryption and iterated on its design in a tight specification, implementation, experimentation, and analysis loop, with the ultimate goal of real-world use.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/11GwbhP4mtylFViu3VGoqP/580f50018acc54ba7d8fef892f339bf4/image3-15.png" />
            
            </figure><p>HPKE development process</p><p>This process isn’t new. TLS 1.3 and QUIC famously demonstrated this as an effective way of producing high quality technical specifications that are maximally useful for their consumers.</p><p>One particular point worth highlighting in this process is the value of interoperability and analysis. From the very first draft, interop between multiple, independent implementations was a goal. And since then, every revision was carefully checked by multiple library maintainers for soundness and correctness. This helped catch a number of mistakes and improved overall clarity of the technical specification.</p><p>From a formal analysis perspective, HPKE brought novel work to the community. Unlike protocol design efforts like those around TLS and QUIC, HPKE was simpler, but still came with plenty of sharp edges. As a new cryptographic construction, analysis was needed to ensure that it was sound and, importantly, to understand its limits. This analysis led to a number of important contributions to the community, including a <a href="https://eprint.iacr.org/2020/1499.pdf">formal analysis of HPKE</a>, new understanding of the <a href="https://dl.acm.org/doi/abs/10.1145/3460120.3484814">limits of ChaChaPoly1305 in a multi-user security setting</a>, as well as a new CFRG specification documenting <a href="https://datatracker.ietf.org/doc/draft-irtf-cfrg-aead-limits/">limits for AEAD algorithms</a>. For more information about the analysis effort that went into HPKE, check out this <a href="https://www.benjaminlipp.de/p/hpke-cryptographic-standard/">companion blog</a> by Benjamin Lipp, an HPKE co-author.</p>
    <div>
      <h3>HPKE’s future</h3>
      <a href="#hpkes-future">
        
      </a>
    </div>
    <p>While HPKE may be a new standard, it has already seen a tremendous amount of adoption in the industry. As mentioned earlier, it’s an essential part of the TLS Encrypted Client Hello and Oblivious DoH standards, both of which are deployed protocols on the Internet today. Looking ahead, it’s also been integrated as part of the emerging <a href="https://datatracker.ietf.org/doc/charter-ietf-ohai/">Oblivious HTTP</a>, <a href="https://datatracker.ietf.org/wg/mls/about/">Message Layer Security</a>, and <a href="https://www.ietf.org/id/draft-gpew-priv-ppm-00.html">Privacy Preserving Measurement</a> standards. HPKE’s hallmark is its generic construction that lets it adapt to a wide variety of application requirements. If an application needs public-key encryption with a <a href="https://eprint.iacr.org/2020/1153.pdf">key-committing AEAD</a>, one can simply instantiate HPKE using a key-committing AEAD.</p><p>Moreover, there exists a huge assortment of interoperable implementations built on popular cryptographic libraries, including <a href="https://github.com/cisco/mlspp/tree/main/lib/hpke">OpenSSL</a>, <a href="https://boringssl.googlesource.com/boringssl/+/refs/heads/master/include/openssl/hpke.h">BoringSSL</a>, <a href="https://hg.mozilla.org/projects/nss/file/tip/lib/pk11wrap">NSS</a>, and <a href="https://github.com/cloudflare/circl/tree/master/hpke">CIRCL</a>. There are also formally verified implementations in <a href="https://www.franziskuskiefer.de/p/an-executable-hpke-specification/">hacspec and F*</a>; check out this <a href="https://tech.cryspen.com/hpke-spec">blog post</a> for more details. The complete set of known implementations is tracked <a href="https://github.com/cfrg/draft-irtf-cfrg-hpke#existing-hpke-implementations">here</a>. More implementations will undoubtedly follow in their footsteps.</p><p>HPKE is ready for prime time. I look forward to seeing how it simplifies protocol design and development in the future. 
Welcome, <a href="https://www.rfc-editor.org/rfc/rfc9180.html">RFC 9180</a>.</p> ]]></content:encoded>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[IETF]]></category>
            <category><![CDATA[Cryptography]]></category>
            <category><![CDATA[Standards]]></category>
            <guid isPermaLink="false">2y7fDoXJoE5tJvjDMO7kap</guid>
            <dc:creator>Christopher Wood</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare and the IETF]]></title>
            <link>https://blog.cloudflare.com/cloudflare-and-the-ietf/</link>
            <pubDate>Wed, 13 Oct 2021 12:59:37 GMT</pubDate>
            <description><![CDATA[ Cloudflare helps build a better Internet through collaboration on open and interoperable standards. This post will describe how Cloudflare contributes to the standardization process to enable incremental innovation and drive long-term architectural change. ]]></description>
            <content:encoded><![CDATA[ <p>The Internet, far from being just a series of tubes, is a huge, incredibly complex, decentralized system. Every action and interaction in the system is enabled by a complicated mass of protocols woven together to accomplish their task, each handing off to the next like trapeze artists high above a virtual circus ring. Stop to think about details, and it is a marvel.</p><p>Consider one of the simplest tasks enabled by the Internet: Sending a message from sender to receiver.</p><p>The location (address) of a receiver is discovered using <a href="https://www.cloudflare.com/learning/dns/what-is-dns/">DNS</a>, a connection between sender and receiver is established using a transport protocol like TCP, and (hopefully!) secured with a protocol like TLS. The sender's message is encoded in a format that the receiver can recognize and parse, like HTTP, because the two disparate parties need a common language to communicate. Then, ultimately, the message is sent and carried in an IP datagram that is forwarded from sender to receiver based on routes established with BGP.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5Z79TEfHR8kGEqa8qMWBCQ/eecb98d60c7bbcbf5baae72ee10d8357/image1-35.png" />
            
            </figure><p>Even an explanation this dense is laughably oversimplified. For example, the four protocols listed are just the start, and ignore many others with acronyms of their own. The truth is that things are complicated. And because things are complicated, how these protocols and systems interact and influence the user experience on the Internet is complicated. Extra round trips to establish a secure connection increase the amount of time before useful work is done, harming user performance. The use of unauthenticated or unencrypted protocols reveals potentially sensitive information to the network or, worse, to malicious entities, which harms user security and privacy. And finally, consolidation and centralization — seemingly a prerequisite for reducing costs and protecting against attacks — makes it challenging to provide high availability even for essential services. (What happens when that one system goes down or is otherwise unavailable, or to extend our earlier metaphor, when a trapeze isn’t there to catch?)</p><p>These four properties — performance, security, privacy, and availability — are crucial to the Internet. At Cloudflare, and especially in the Cloudflare Research team, where we use all these various protocols, we're committed to improving them at every layer in the stack. 
We work on problems as diverse as <a href="https://www.cloudflare.com/network-security/">improving network security</a> and privacy with <a href="https://datatracker.ietf.org/doc/html/rfc8446">TLS 1.3</a> and <a href="https://datatracker.ietf.org/doc/html/rfc9000">QUIC</a>, improving DNS privacy via <a href="/oblivious-dns/">Oblivious DNS-over-HTTPS</a>, reducing end-user CAPTCHA annoyances with Privacy Pass and <a href="/introducing-cryptographic-attestation-of-personhood/">Cryptographic Attestation of Personhood (CAP)</a>, performing Internet-wide measurements to understand how things work in the real world, and much, much more.</p><p>Above all else, these projects are meant to do one thing: focus beyond the horizon to help build a better Internet. We do that by developing, advocating, and advancing open standards for the many protocols in use on the Internet, all backed by implementation, experimentation, and analysis.</p>
    <div>
      <h3>Standards</h3>
      <a href="#standards">
        
      </a>
    </div>
    <p>The Internet is a network of interconnected autonomous networks. Computers attached to these networks have to be able to route messages to each other. However, even if we can send messages back and forth across the Internet, much like the storied Tower of Babel, to achieve anything those computers have to use a common language, a lingua franca, so to speak. And for the Internet, standards are that common language.</p><p>Many of the parts of the Internet that Cloudflare is interested in are standardized by the IETF, which is a standards development organization responsible for producing technical specifications for the Internet's most important protocols, including IP, BGP, DNS, TCP, TLS, QUIC, HTTP, and so on. The <a href="https://www.ietf.org/about/mission/">IETF's mission</a> is:</p><blockquote><p>to make the Internet work better by producing high-quality, relevant technical documents that influence the way people design, use, and manage the Internet.</p></blockquote><p>Our individual contributions to the IETF help further this mission, especially given our role on the Internet. We can only do so much on our own to improve the end-user experience. So, through standards, we engage with those who use, manage, and operate the Internet to achieve three simple goals that lead to a better Internet:</p><ol><li><p>Incrementally improve existing and deployed protocols with innovative solutions;</p></li><li><p>Provide holistic solutions to long-standing architectural problems and enable new use cases; and</p></li><li><p>Identify key problems and help specify reusable, extensible, easy-to-implement abstractions for solving them.</p></li></ol><p>Below, we’ll give an example of how we helped achieve each goal, touching on a number of important technical specifications produced in recent years, including DNS-over-HTTPS, QUIC, and (the still work-in-progress) TLS Encrypted Client Hello.</p>
    <div>
      <h3>Incremental innovation: metadata privacy with DoH and ECH</h3>
      <a href="#incremental-innovation-metadata-privacy-with-doh-and-ech">
        
      </a>
    </div>
    <p>The Internet is not only complicated — it is leaky. Metadata seeps like toxic waste from nearly every protocol in use, from DNS to TLS, and even to HTTP at the application layer.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1t1ZVnKH9ZGQnKCgx6I8Pr/ada911c196fb971b19b8a4a3f7767362/image6-14.png" />
            
            </figure><p>One critically important piece of metadata that still leaks today is the name of the server that clients connect to. When a client opens a connection to a server, it reveals the name and identity of that server in many places, including DNS, TLS, and even sometimes at the IP layer (if the destination IP address is unique to that server). Linking client identity (IP address) to target server names enables third parties to build a profile of per-user behavior without end-user consent. The result is a set of protocols that does not respect end-user privacy.</p><p>Fortunately, it’s possible to incrementally address this problem without regressing security. For years, Cloudflare has been working with the standards community to plug all of these individual leaks through separate specialized protocols:</p><ul><li><p><a href="https://datatracker.ietf.org/doc/html/rfc8484">DNS-over-HTTPS</a> encrypts DNS queries between clients and recursive resolvers, ensuring only clients and trusted recursive resolvers see plaintext DNS traffic.</p></li><li><p><a href="https://datatracker.ietf.org/doc/html/draft-ietf-tls-esni-13">TLS Encrypted Client Hello</a> encrypts metadata in the TLS handshake, ensuring only the client and authoritative TLS server see sensitive TLS information.</p></li></ul><p>These protocols impose a barrier between the client and server and everyone else. However, neither of them prevent the server from building per-user profiles. Servers can track users via one critically important piece of information: the client IP address. Fortunately, for the overwhelming majority of cases, the IP address is not essential for providing a service. For example, DNS recursive resolvers do not need the full client IP address to provide accurate answers, as is evidenced by the <a href="https://datatracker.ietf.org/doc/html/rfc7871">EDNS(0) Client Subnet</a> extension. 
To further reduce information exposure on the web, we helped push further with two more incremental improvements:</p><ul><li><p><a href="https://datatracker.ietf.org/doc/html/draft-pauly-dprive-oblivious-doh-07">Oblivious DNS-over-HTTPS</a> (ODoH) uses cryptography and network proxies to break linkability between client identity (IP address) and DNS traffic, ensuring that recursive resolvers have only the minimal amount of information to provide DNS answers -- the queries themselves, without any per-client information.</p></li><li><p><a href="https://datatracker.ietf.org/doc/html/draft-ietf-masque-h3-datagram-04">MASQUE</a> is standardizing techniques for proxying UDP and IP protocols over QUIC connections, similar to the existing <a href="https://www.rfc-editor.org/rfc/rfc7231.html#section-4.3.6">HTTP CONNECT</a> method for TCP-based protocols. Generally, the CONNECT method allows clients to use services without revealing any client identity (IP address).</p></li></ul><p>While each of these protocols may seem only an incremental improvement over what we have today, together, they raise many possibilities for the future of the Internet. Are DoH and ECH sufficient for end-user privacy, or are technologies like ODoH and MASQUE necessary? How do proxy technologies like MASQUE complement or even subsume protocols like ODoH and ECH? These are questions the Cloudflare Research team strives to answer through experimentation, analysis, and deployment together with other stakeholders on the Internet through the IETF. And we could not ask the questions without first laying the groundwork.</p>
    <div>
      <h3>Architectural advancement: QUIC and HTTP/3</h3>
      <a href="#architectural-advancement-quic-and-http-3">
        
      </a>
    </div>
    <p><a href="https://quicwg.org">QUIC</a> and <a href="https://datatracker.ietf.org/doc/html/draft-ietf-quic-http-34">HTTP/3</a> are transformative technologies. Whilst the TLS handshake forms the heart of QUIC’s security model, QUIC improves on TLS over TCP in many respects, including more encryption (privacy), better protection against active attacks and ossification at the network layer, fewer round trips to establish a secure connection, and generally better security properties. QUIC and HTTP/3 give us a clean slate for future innovation.</p><p>Perhaps one of QUIC’s most important contributions is that it challenges and even breaks many established conventions and norms used on the Internet. For example, the antiquated socket API for networking, which treats the network connection as an in-order bit pipe, is no longer appropriate for modern applications and developers. Modern networking APIs such as Apple’s <a href="https://developer.apple.com/documentation/network">Network.framework</a> provide high-level interfaces that take advantage of the new transport features provided by QUIC. Applications using this or even higher-level HTTP abstractions can take advantage of the many security, privacy, and performance improvements of QUIC and HTTP/3 today with minimal code changes, and without being constrained by sockets and their inherent limitations.</p><p>Another salient feature of QUIC is its wire format. Nearly every bit of every QUIC packet is encrypted and authenticated between sender and receiver. And within a QUIC packet, individual frames can be rearranged, repackaged, and otherwise transformed by the sender.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4qpdpgnX8A8M6iHvf0ECWP/aae602a63abed5400ffa431b4ce3cdce/image2-22.png" />
            
            </figure><p>Together, these are powerful tools to help mitigate future network ossification and enable continued extensibility. (TLS’s wire format ultimately led to the <a href="https://datatracker.ietf.org/doc/html/rfc8446#appendix-D.4">middlebox compatibility mode</a> for TLS 1.3 due to the many middlebox ossification problems that were encountered during early deployment tests.)</p><p>Exercising these features of QUIC is important for the <a href="https://datatracker.ietf.org/doc/html/draft-iab-use-it-or-lose-it-03">long-term health</a> of the protocol and applications built on top of it. Indeed, this sort of extensibility is what enables innovation.</p><p>In fact, we've already seen a flurry of new work based on QUIC: extensions to enable multipath QUIC, different congestion control approaches, and ways to carry data unreliably in the DATAGRAM frame.</p><p>Beyond functional extensions, we’ve also seen a number of new use cases emerge as a result of QUIC. DNS-over-QUIC is an upcoming proposal that complements DNS-over-TLS for recursive to authoritative DNS query protection. As mentioned above, MASQUE is a working group focused on standardizing methods for proxying arbitrary UDP and IP protocols over QUIC connections, enabling a number of fascinating solutions and unlocking the future of proxy and VPN technologies. In the context of the web, the WebTransport working group is standardizing methods to use QUIC as a “supercharged WebSocket” for transporting data efficiently between client and server while also depending on the WebPKI for security.</p><p>By definition, these extensions are nowhere near complete. The future of the Internet with QUIC is sure to be a fascinating adventure.</p>
    <div>
      <h3>Specifying abstractions: Cryptographic algorithms and protocol design</h3>
      <a href="#specifying-abstractions-cryptographic-algorithms-and-protocol-design">
        
      </a>
    </div>
    <p>Standards allow us to build abstractions. An ideal standard is one that is usable in many contexts and contains all the information a sufficiently skilled engineer needs to build a compliant implementation that successfully interoperates with other independent implementations. Writing a new standard is sort of like creating a new Lego brick. Creating a new Lego brick allows us to build things that we couldn’t have built before. For example, one new “brick” that’s nearly finished (as of this writing) is <a href="https://www.ietf.org/archive/id/draft-irtf-cfrg-hpke-12.html">Hybrid Public Key Encryption (HPKE)</a>. HPKE allows us to efficiently encrypt arbitrary plaintexts under the recipient’s public key.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5eWRfVLYtCcnUohsI2X8SE/48ccbddb899b98e65baea220bd7c06f6/image4-21.png" />
            
            </figure><p>Mixing asymmetric and symmetric cryptography for efficiency is a common technique that has been used for many years in all sorts of protocols, from TLS to <a href="https://en.wikipedia.org/wiki/Pretty_Good_Privacy">PGP</a>. However, each of these applications has come up with its own design, each with its own security properties. HPKE is intended to be a single, standard, interoperable version of this technique that turns this complex and technical corner of protocol design into an easy-to-use black box. The standard has undergone extensive analysis by cryptographers throughout its development and has numerous implementations available. The end result is a simple abstraction that protocol designers can include without having to consider how it works under the hood. In fact, HPKE is already a dependency for a number of other draft protocols in the IETF, such as <a href="https://datatracker.ietf.org/doc/html/draft-ietf-tls-esni-13">TLS Encrypted Client Hello</a>, <a href="https://datatracker.ietf.org/doc/html/draft-pauly-dprive-oblivious-doh-07">Oblivious DNS-over-HTTPS</a>, and <a href="https://datatracker.ietf.org/doc/html/draft-ietf-mls-architecture-07.html">Message Layer Security</a>.</p>
    <div>
      <h3>Modes of Interaction</h3>
      <a href="#modes-of-interaction">
        
      </a>
    </div>
    <p>We engage with the IETF in the specification, implementation, experimentation, and analysis phases of a standard to help achieve our three goals of incremental innovation, architectural advancement, and production of simple abstractions.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/0tbHRFLSsWV7qBNiKi4WN/34d3b7742fe21500bcaa4970729bd4e6/image3-20.png" />
            
            </figure><p>Our participation in the standards process hits all four phases. Individuals in Cloudflare bring a diversity of knowledge and domain expertise to each phase, especially in the production of technical specifications. Today, we also published <a href="/exported-authenticators-the-long-road-to-rfc/">a blog post</a> about an upcoming standard that we’ve been working on for a number of years and will be sharing details about how we used formal analysis to make sure that we ruled out as many security issues in the design as possible. We work in close collaboration with people from all around the world as an investment in the future of the Internet. Open standards mean that everyone can take advantage of the latest and greatest in protocol design, whether they use Cloudflare or not.</p><p>Cloudflare’s scale and perspective on the Internet are essential to the standards process. We have experience rapidly implementing, deploying, and experimenting with emerging technologies to gain confidence in their maturity. We also have a proven track record of publishing the results of these experiments to help inform the standards process. Moreover, we open source as much of the code we use for these experiments as possible to enable reproducibility and transparency. Our unique collection of engineering expertise and wide perspective allows us to help build standards that work in a wide variety of use cases. By investing time in developing standards that everyone can benefit from, we can make a clear contribution to building a better Internet.</p><p>One final contribution we make to the IETF is more procedural and based around building consensus in the community. A challenge to any open process is gathering consensus to make forward progress and avoiding deadlock. We help build consensus through the production of running code, leadership on technical documents such as QUIC and ECH, and even logistically by chairing working groups. 
(Working groups at the IETF are chaired by volunteers, and Cloudflare numbers a few working group chairs amongst its employees, covering a broad spectrum of the IETF (and its related research-oriented group, the <a href="https://irtf.org/">IRTF</a>) from security and privacy to transport and applications.) Collaboration is a cornerstone of the standards process and a hallmark of Cloudflare Research, and we apply it most prominently in the standards process.</p><p>If you too want to help build a better Internet, check out some IETF Working Groups and mailing lists. All you need to start contributing is an Internet connection and an email address, so why not give it a go? And if you want to join us on our mission to help build a better Internet through open and interoperable standards, check out our <a href="https://www.cloudflare.com/careers/jobs/?department=Technology%20Research&amp;location=default">open</a> <a href="https://boards.greenhouse.io/cloudflare/jobs/3271134?gh_jid=3271134">positions</a>, <a href="/visiting-researcher-program/">visiting researcher program</a>, and <a href="https://www.cloudflare.com/careers/jobs/?department=University&amp;location=default">many internship opportunities</a>!</p> ]]></content:encoded>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[IETF]]></category>
            <category><![CDATA[Protocols]]></category>
            <category><![CDATA[Standards]]></category>
            <guid isPermaLink="false">72sMlOH9eqnKfiCmxSGHnU</guid>
            <dc:creator>Jonathan Hoyland</dc:creator>
            <dc:creator>Christopher Wood</dc:creator>
        </item>
    </channel>
</rss>