
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Tue, 14 Apr 2026 19:07:43 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Introducing Foundations - our open source Rust service foundation library]]></title>
            <link>https://blog.cloudflare.com/introducing-foundations-our-open-source-rust-service-foundation-library/</link>
            <pubDate>Wed, 24 Jan 2024 14:00:17 GMT</pubDate>
            <description><![CDATA[ Foundations is a foundational Rust library, designed to help scale programs for distributed, production-grade systems ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2yQdeNHftkPvZAq7tcEYN9/eaf572b5329ea631b4df66b36e14e538/image1-4.png" />
            
</figure><p>In this blog post, we're excited to present Foundations, our foundational library for Rust services, now released as <a href="https://github.com/cloudflare/foundations">open source on GitHub</a>. Foundations is designed to help scale Rust programs for distributed, production-grade systems. It enables engineers to concentrate on the core business logic of their services, rather than the intricacies of production operation setups.</p><p>Originally developed as part of our <a href="/introducing-oxy/">Oxy proxy framework</a>, Foundations has evolved to serve a wider range of applications. For those interested in exploring its technical capabilities, we recommend consulting the library’s <a href="https://docs.rs/foundations/latest/foundations/">API documentation</a>. Additionally, this post will cover the motivations behind Foundations' creation and provide a concise summary of its key features. Stay with us to learn more about how Foundations can support your Rust projects.</p>
    <div>
      <h2>What is Foundations?</h2>
    </div>
    <p>In software development, seemingly minor tasks can become complex when scaled up. This complexity is particularly evident when comparing the deployment of services on server hardware globally to running a program on a personal laptop.</p><p>The key question is: what fundamentally changes when transitioning from a simple laptop-based prototype to a full-fledged service in a production environment? Through our experience in developing numerous services, we've identified several critical differences:</p><ul><li><p><b>Observability</b>: locally, developers have access to various tools for monitoring and debugging. However, these tools are not as accessible or practical when dealing with thousands of software instances running on remote servers.</p></li><li><p><b>Configuration</b>: local prototypes often use basic, sometimes hardcoded, configurations. This approach is impractical in production, where changes require a more flexible and dynamic configuration system. Hardcoded settings are cumbersome, and command-line options, while common, don't always suit complex hierarchical configurations or align with the "Configuration as Code" paradigm.</p></li><li><p><b>Security</b>: services in production face a myriad of security challenges, exposed to diverse threats from external sources. Basic security hardening becomes a necessity.</p></li></ul><p>Addressing these distinctions, Foundations emerges as a comprehensive library, offering solutions to these challenges. Derived from our Oxy proxy framework, Foundations brings the tried-and-tested functionality of Oxy to a broader range of Rust-based applications at Cloudflare.</p><p>Foundations was developed with these guiding principles:</p><ul><li><p><b>High modularity</b>: recognizing that many services predate Foundations, we designed it to be modular. 
Teams can adopt individual components at their own pace, facilitating a smooth transition.</p></li><li><p><b>API ergonomics</b>: a top priority for us is user-friendly library interaction. Foundations leverages Rust's procedural macros to offer an intuitive, well-documented API, aiming for minimal friction in usage.</p></li><li><p><b>Simplified setup and configuration</b>: our goal is for engineers to spend minimal time on setup. Foundations is designed to be 'plug and play', with essential functions working immediately and adjustable settings for fine-tuning. We understand that this focus on ease of setup over extreme flexibility might be debatable, as it implies a trade-off. Unlike other libraries that cater to a wide range of environments with potentially verbose setup requirements, Foundations is tailored for specific, production-tested environments and workflows. This doesn't restrict Foundations’ adaptability to other settings, but we approach this with compile-time features to manage setup workflows, rather than a complex setup API.</p></li></ul><p>Next, let's delve into the components Foundations offers. To better illustrate the functionality that Foundations provides, we will refer to the <a href="https://github.com/cloudflare/foundations/tree/main/examples/http_server">example web server</a> from Foundations’ source code repository.</p>
    <div>
      <h3>Telemetry</h3>
    </div>
    <p>In any production system, <a href="https://www.cloudflare.com/learning/performance/what-is-observability/">observability</a>, which we refer to as telemetry, plays an essential role. Generally, three primary types of telemetry are adequate for most service needs:</p><ul><li><p><b>Logging</b>: this involves recording arbitrary textual information, which can be enhanced with tags or structured fields. It's particularly useful for documenting operational errors that aren't critical to the service.</p></li><li><p><b>Tracing</b>: this method offers a detailed timing breakdown of various service components. It's invaluable for identifying performance bottlenecks and investigating issues related to timing.</p></li><li><p><b>Metrics</b>: these are quantitative data points about the service, crucial for monitoring the overall health and performance of the system.</p></li></ul><p>Foundations integrates an API that encompasses all these telemetry aspects, consolidating them into a unified package for ease of use.</p>
    <div>
      <h3>Tracing</h3>
    </div>
    <p>Foundations’ tracing API shares similarities with <a href="https://github.com/tokio-rs/tracing">tokio/tracing</a>, employing a comparable approach with implicit context propagation, instrumentation macros, and futures wrapping:</p>
            <pre><code>#[tracing::span_fn("respond to request")]
async fn respond(
    endpoint_name: Arc&lt;String&gt;,
    req: Request&lt;Body&gt;,
    routes: Arc&lt;Map&lt;String, ResponseSettings&gt;&gt;,
) -&gt; Result&lt;Response&lt;Body&gt;, Infallible&gt; {
    …
}</code></pre>
            <p>Refer to the <a href="https://github.com/cloudflare/foundations/blob/347548000cab0ac549f8f23e2a0ce9e1147b7640/examples/http_server/main.rs#L154">example web server</a> and <a href="https://docs.rs/foundations/latest/foundations/telemetry/tracing/index.html">documentation</a> for more comprehensive examples.</p><p>However, Foundations distinguishes itself in a few key ways:</p><ul><li><p><b>Simplified API</b>: we've streamlined the setup process for tracing, aiming for a more minimalistic approach compared to tokio/tracing.</p></li><li><p><b>Enhanced trace sampling flexibility</b>: Foundations allows for selective override of the sampling ratio in specific code branches. This feature is particularly useful for detailed performance bug investigations, enabling a balance between global trace sampling for overall <a href="https://www.cloudflare.com/application-services/solutions/app-performance-monitoring/">performance monitoring</a> and targeted sampling for specific accounts, connections, or requests.</p></li><li><p><b>Distributed trace stitching</b>: our API supports the integration of trace data from multiple services, contributing to a comprehensive view of the entire pipeline. This functionality includes fine-tuned control over sampling ratios, allowing upstream services to dictate the sampling of specific traffic flows in downstream services.</p></li><li><p><b>Trace forking capability</b>: addressing the challenge of long-lasting connections with numerous multiplexed requests, Foundations introduces trace forking. This feature enables each request within a connection to have its own trace, linked to the parent connection trace. This method significantly simplifies the analysis and improves performance, particularly for connections handling thousands of requests.</p></li></ul><p>We regard telemetry as a vital component of our software, not merely an optional add-on. 
As such, we believe in rigorous testing of this feature, considering it our primary tool for monitoring software operations. Consequently, Foundations includes an API and user-friendly macros to facilitate the collection and analysis of tracing data within tests, presenting it in a format conducive to assertions.</p>
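            <p>As an aside, the branch-specific sampling override described above ultimately comes down to making a deterministic, per-trace sampling decision. The following self-contained sketch (hypothetical code for illustration, not Foundations' API) shows one way such a decision can be made by hashing the trace ID, so the same trace always yields the same verdict:</p>

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Deterministically decide whether to sample a trace from its ID.
/// A hypothetical sketch of ratio-based sampling, not Foundations' internals.
fn should_sample(trace_id: u64, ratio: f64) -> bool {
    if ratio >= 1.0 {
        return true; // "sample everything"
    }
    if ratio <= 0.0 {
        return false; // "don't sample anything"
    }
    let mut hasher = DefaultHasher::new();
    trace_id.hash(&mut hasher);
    // Map the hash onto [0, 1) and compare against the ratio, so the
    // decision is stable for a given trace ID.
    (hasher.finish() as f64 / u64::MAX as f64) < ratio
}
```

            <p>A code branch that needs more detail can then evaluate the decision with a higher ratio for its own traffic, while the global ratio continues to apply everywhere else.</p>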
    <div>
      <h3>Logging</h3>
    </div>
    <p>Foundations’ logging API shares its foundation with tokio/tracing and <a href="https://github.com/slog-rs/slog">slog</a>, but introduces several notable enhancements.</p><p>During our work on various services, we recognized the hierarchical nature of logging contextual information. For instance, in a scenario involving a connection, we might want to tag each log record with the connection ID and HTTP protocol version. Additionally, for requests served over this connection, it would be useful to attach the request URL to each log record, while still including connection-specific information.</p><p>Typically, achieving this would involve creating a new logger for each request, copying tags from the connection’s logger, and then manually passing this new logger throughout the relevant code. This method, however, is cumbersome, requiring explicit handling and storage of the logger object.</p><p>To streamline this process and prevent telemetry from obstructing business logic, we adopted a technique similar to tokio/tracing's approach for tracing, applying it to logging. This method relies on future instrumentation machinery (<a href="https://docs.rs/tracing/latest/tracing/struct.Span.html#in-asynchronous-code">tracing-rs documentation</a> has a good explanation of the concept), allowing for implicit passing of the current logger. This enables us to "fork" logs for each request and use this forked log seamlessly within the current code scope, automatically propagating it down the call stack, including through asynchronous function calls:</p>
            <pre><code> let conn_tele_ctx = TelemetryContext::current();

 let on_request = service_fn({
        let endpoint_name = Arc::clone(&amp;endpoint_name);

        move |req| {
            let routes = Arc::clone(&amp;routes);
            let endpoint_name = Arc::clone(&amp;endpoint_name);

            // Each request gets an independent log inherited from the connection log and a
            // separate trace linked to the connection trace.
            conn_tele_ctx
                .with_forked_log()
                .with_forked_trace("request")
                .apply(async move { respond(endpoint_name, req, routes).await })
        }
});</code></pre>
            <p>Refer to the <a href="https://github.com/cloudflare/foundations/blob/347548000cab0ac549f8f23e2a0ce9e1147b7640/examples/http_server/main.rs#L155-L198">example web server</a> and <a href="https://docs.rs/foundations/latest/foundations/telemetry/log/index.html">documentation</a> for more comprehensive examples.</p><p>In an effort to simplify the user experience, we merged all APIs related to context management into a single TelemetryContext object that is implicitly available in each code scope. This integration not only simplifies the process but also lays the groundwork for future advanced features. These features could blend tracing and logging information into a cohesive narrative by cross-referencing each other.</p><p>Like tracing, Foundations also offers a user-friendly API for testing a service’s logging.</p>
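            <p>The hierarchical tag inheritance behind log forking can be illustrated with a minimal stdlib-only sketch (hypothetical types, not Foundations' internals): a forked context starts with a copy of its parent's tags, so request-level records automatically carry the connection-level fields:</p>

```rust
/// A toy logging context that carries hierarchical key-value tags.
/// Hypothetical types for illustration only, not Foundations' internals.
#[derive(Clone, Default)]
struct LogContext {
    tags: Vec<(String, String)>,
}

impl LogContext {
    /// Attach a tag to every record emitted in this scope.
    fn with_tag(mut self, key: &str, value: &str) -> Self {
        self.tags.push((key.into(), value.into()));
        self
    }

    /// Forking clones the parent's tags, so a per-request context
    /// inherits the connection-level tags automatically.
    fn fork(&self) -> Self {
        self.clone()
    }
}
```

            <p>What Foundations adds on top of this basic idea is the implicit propagation of the current context down the call stack, including through asynchronous calls, so no logger object needs to be passed around by hand.</p>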
    <div>
      <h3>Metrics</h3>
    </div>
    <p>Foundations incorporates the official <a href="https://github.com/prometheus/client_rust">Prometheus Rust client library</a> for its metrics functionality, with a few enhancements for ease of use. One key addition is a procedural macro provided by Foundations, which simplifies the definition of new metrics with typed labels, reducing boilerplate code:</p>
            <pre><code>use foundations::telemetry::metrics::{metrics, Counter, Gauge};
use std::sync::Arc;

#[metrics]
pub(crate) mod http_server {
    /// Number of active client connections.
    pub fn active_connections(endpoint_name: &amp;Arc&lt;String&gt;) -&gt; Gauge;

    /// Number of failed client connections.
    pub fn failed_connections_total(endpoint_name: &amp;Arc&lt;String&gt;) -&gt; Counter;

    /// Number of HTTP requests.
    pub fn requests_total(endpoint_name: &amp;Arc&lt;String&gt;) -&gt; Counter;

    /// Number of failed requests.
    pub fn requests_failed_total(endpoint_name: &amp;Arc&lt;String&gt;, status_code: u16) -&gt; Counter;
}</code></pre>
            <p>Refer to the <a href="https://github.com/cloudflare/foundations/blob/347548000cab0ac549f8f23e2a0ce9e1147b7640/examples/http_server/metrics.rs">example web server</a> and <a href="https://docs.rs/foundations/latest/foundations/telemetry/metrics/index.html">documentation</a> for more information on how metrics can be defined and used.</p><p>In addition to this, we have refined the approach to metrics collection and structuring. Foundations offers a streamlined, user-friendly API for both these tasks, focusing on simplicity and minimalism.</p>
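            <p>To make the notion of labeled metrics concrete, here is a toy, stdlib-only counter keyed by a label value (hypothetical types for illustration; Foundations delegates the real work to the Prometheus client library): each distinct label value gets its own monotonically increasing count, which is what the <code>#[metrics]</code> macro's generated functions manage for you:</p>

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// A toy labeled counter: one monotonically increasing value per label value.
/// Hypothetical sketch for illustration, not Foundations' implementation.
#[derive(Default)]
struct LabeledCounter {
    values: Mutex<HashMap<String, u64>>,
}

impl LabeledCounter {
    /// Increment the count for the given label value.
    fn inc(&self, endpoint_name: &str) {
        let mut values = self.values.lock().unwrap();
        *values.entry(endpoint_name.to_string()).or_insert(0) += 1;
    }

    /// Read the current count for the given label value (0 if unseen).
    fn get(&self, endpoint_name: &str) -> u64 {
        *self.values.lock().unwrap().get(endpoint_name).unwrap_or(&0)
    }
}
```
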
    <div>
      <h3>Memory profiling</h3>
    </div>
    <p>Recognizing the <a href="https://mjeanroy.dev/2021/04/19/Java-in-K8s-how-weve-reduced-memory-usage-without-changing-any-code.html">efficiency</a> of <a href="https://jemalloc.net/">jemalloc</a> for long-lived services, Foundations includes a feature for enabling jemalloc memory allocation. A notable aspect of jemalloc is its memory profiling capability. Foundations packages this functionality into a straightforward and safe Rust API, making it accessible and easy to integrate.</p>
    <div>
      <h3>Telemetry server</h3>
    </div>
    <p>Foundations comes equipped with a built-in, customizable telemetry server endpoint. This server automatically handles a range of functions including health checks, metric collection, and memory profiling requests.</p>
    <div>
      <h2>Security</h2>
    </div>
    <p>A vital component of Foundations is its robust and ergonomic API for <a href="https://en.wikipedia.org/wiki/Seccomp">seccomp</a>, a Linux kernel feature for syscall sandboxing. This feature enables the setting up of hooks for syscalls used by an application, allowing actions like blocking or logging. Seccomp acts as a formidable line of defense, offering an additional layer of security against threats like arbitrary code execution.</p><p>Foundations provides a simple way to define lists of all allowed syscalls, also allowing a composition of multiple lists (in addition, Foundations ships predefined lists for common use cases):</p>
            <pre><code>use foundations::security::common_syscall_allow_lists::{ASYNC, NET_SOCKET_API, SERVICE_BASICS};
use foundations::security::{allow_list, enable_syscall_sandboxing, ViolationAction};

allow_list! {
    static ALLOWED = [
        ..SERVICE_BASICS,
        ..ASYNC,
        ..NET_SOCKET_API
    ]
}

enable_syscall_sandboxing(ViolationAction::KillProcess, &amp;ALLOWED)</code></pre>
            <p>Refer to the <a href="https://github.com/cloudflare/foundations/blob/347548000cab0ac549f8f23e2a0ce9e1147b7640/examples/http_server/main.rs#L239-L254">web server example</a> and <a href="https://docs.rs/foundations/latest/foundations/security/index.html">documentation</a> for more comprehensive examples of this functionality.</p>
    <div>
      <h2>Settings and CLI</h2>
    </div>
    <p>Foundations simplifies the management of service settings and command-line argument parsing. Services built on Foundations typically use YAML files for configuration. We advocate for a design where every service comes with a default configuration that's functional right off the bat. This philosophy is embedded in Foundations’ settings functionality.</p><p>In practice, applications define their settings and defaults using Rust structures and enums. Foundations then transforms Rust documentation comments into configuration annotations. This integration allows the CLI interface to generate a default, fully annotated YAML configuration file. As a result, service users can quickly and easily understand the service settings:</p>
            <pre><code>use foundations::settings::collections::Map;
use foundations::settings::net::SocketAddr;
use foundations::settings::settings;
use foundations::telemetry::settings::TelemetrySettings;

#[settings]
pub(crate) struct HttpServerSettings {
    /// Telemetry settings.
    pub(crate) telemetry: TelemetrySettings,
    /// HTTP endpoints configuration.
    #[serde(default = "HttpServerSettings::default_endpoints")]
    pub(crate) endpoints: Map&lt;String, EndpointSettings&gt;,
}

impl HttpServerSettings {
    fn default_endpoints() -&gt; Map&lt;String, EndpointSettings&gt; {
        let mut endpoint = EndpointSettings::default();

        endpoint.routes.insert(
            "/hello".into(),
            ResponseSettings {
                status_code: 200,
                response: "World".into(),
            },
        );

        endpoint.routes.insert(
            "/foo".into(),
            ResponseSettings {
                status_code: 403,
                response: "bar".into(),
            },
        );

        [("Example endpoint".into(), endpoint)]
            .into_iter()
            .collect()
    }
}

#[settings]
pub(crate) struct EndpointSettings {
    /// Address of the endpoint.
    pub(crate) addr: SocketAddr,
    /// Endpoint's URL path routes.
    pub(crate) routes: Map&lt;String, ResponseSettings&gt;,
}

#[settings]
pub(crate) struct ResponseSettings {
    /// Status code of the route's response.
    pub(crate) status_code: u16,
    /// Content of the route's response.
    pub(crate) response: String,
}</code></pre>
            <p>The settings definition above automatically generates the following default configuration YAML file:</p>
            <pre><code>---
# Telemetry settings.
telemetry:
  # Distributed tracing settings
  tracing:
    # Enables tracing.
    enabled: true
    # The address of the Jaeger Thrift (UDP) agent.
    jaeger_tracing_server_addr: "127.0.0.1:6831"
    # Overrides the bind address for the reporter API.
    # By default, the reporter API is only exposed on the loopback
    # interface. This won't work in environments where the
    # Jaeger agent is on another host (for example, Docker).
    # Must have the same address family as `jaeger_tracing_server_addr`.
    jaeger_reporter_bind_addr: ~
    # Sampling ratio.
    #
    # This can be any fractional value between `0.0` and `1.0`.
    # Where `1.0` means "sample everything", and `0.0` means "don't sample anything".
    sampling_ratio: 1.0
  # Logging settings.
  logging:
    # Specifies log output.
    output: terminal
    # The format to use for log messages.
    format: text
    # Set the logging verbosity level.
    verbosity: INFO
    # A list of field keys to redact when emitting logs.
    #
    # This might be useful to hide certain fields in production logs as they may
    # contain sensitive information, but allow them in testing environment.
    redact_keys: []
  # Metrics settings.
  metrics:
    # How the metrics service identifier defined in `ServiceInfo` is used
    # for this service.
    service_name_format: metric_prefix
    # Whether to report optional metrics in the telemetry server.
    report_optional: false
  # Server settings.
  server:
    # Enables telemetry server
    enabled: true
    # Telemetry server address.
    addr: "127.0.0.1:0"
# HTTP endpoints configuration.
endpoints:
  Example endpoint:
    # Address of the endpoint.
    addr: "127.0.0.1:0"
    # Endpoint's URL path routes.
    routes:
      /hello:
        # Status code of the route's response.
        status_code: 200
        # Content of the route's response.
        response: World
      /foo:
        # Status code of the route's response.
        status_code: 403
        # Content of the route's response.
        response: bar</code></pre>
            <p>Refer to the <a href="https://github.com/cloudflare/foundations/blob/347548000cab0ac549f8f23e2a0ce9e1147b7640/examples/http_server/settings.rs">example web server</a> and documentation for <a href="https://docs.rs/foundations/latest/foundations/settings/index.html">settings</a> and <a href="https://docs.rs/foundations/latest/foundations/cli/index.html">CLI API</a> for more comprehensive examples of how settings can be defined and used with the Foundations-provided CLI API.</p>
    <div>
      <h2>Wrapping Up</h2>
    </div>
    <p>At Cloudflare, we greatly value the contributions of the open source community and are eager to reciprocate by sharing our work. Foundations has been instrumental in reducing our development friction, and we hope it can do the same for others. We welcome external contributions to Foundations, aiming to integrate diverse experiences into the project for the benefit of all.</p><p>If you're interested in working on projects like Foundations, consider joining our team — <a href="https://www.cloudflare.com/en-gb/careers/">we're hiring</a>!</p> ]]></content:encoded>
            <category><![CDATA[Open Source]]></category>
            <category><![CDATA[Rust]]></category>
            <category><![CDATA[Observability]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Oxy]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">5R4Wv17SBVwevN7lzcL2GW</guid>
            <dc:creator>Ivan Nikulin</dc:creator>
        </item>
        <item>
            <title><![CDATA[Oxy is Cloudflare's Rust-based next generation proxy framework]]></title>
            <link>https://blog.cloudflare.com/introducing-oxy/</link>
            <pubDate>Thu, 02 Mar 2023 15:05:00 GMT</pubDate>
            <description><![CDATA[ In this blog post, we are proud to introduce Oxy - our modern proxy framework, developed using the Rust programming language ]]></description>
            <content:encoded><![CDATA[ <p></p><p>In this blog post, we are proud to introduce Oxy - our modern proxy framework, developed using the Rust programming language. Oxy is a foundation of several Cloudflare projects, including the <a href="https://www.cloudflare.com/products/zero-trust/gateway/">Zero Trust Gateway</a>, the iCloud Private Relay <a href="/icloud-private-relay/">second hop proxy</a>, and the internal <a href="/cloudflare-servers-dont-own-ips-anymore/">egress routing service</a>.</p><p>Oxy leverages our years of experience building high-load proxies to implement the latest communication protocols, enabling us to effortlessly build sophisticated services that can accommodate massive amounts of daily traffic.</p><p>We will be exploring Oxy in greater detail in upcoming technical blog posts, providing a comprehensive and in-depth look at its capabilities and potential applications. For now, let us embark on this journey and discover what Oxy is and how we built it.</p>
    <div>
      <h2>What Oxy does</h2>
    </div>
    <p>We refer to Oxy as our "next-generation proxy framework". But what do we really mean by “proxy framework”? Picture a server (like NGINX, which many readers will be familiar with) that can proxy traffic with an array of protocols, including various predefined common traffic flow scenarios that enable you to route traffic to specific destinations or even egress with a different protocol than the one used for ingress. This server can be configured in many ways for specific flows and boasts tight integration with the surrounding infrastructure, whether telemetry consumers or networking services.</p><p>Now, take all of that and add in the ability to programmatically control every aspect of the proxying: protocol decapsulation, traffic analysis, routing, tunneling logic, DNS resolution, and so much more. And this is what the Oxy proxy framework is: a feature-rich proxy server tightly integrated with our internal infrastructure that's customizable to meet application requirements, allowing engineers to tweak every component.</p><p>This design is in line with our belief in an iterative approach to development, where a basic solution is built first and then gradually improved over time. With Oxy, you can start with a basic solution that can be deployed to our servers and then add additional features as needed, taking advantage of the many extensibility points offered by Oxy. In fact, you can avoid writing any code, besides a few lines of bootstrap boilerplate, and get a production-ready server with a wide variety of startup configuration options and traffic flow scenarios.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5nk7Ri6viC85BdWoRSiB9v/f40a9971fdad71cb07ee0b3aebf99fd9/image3-2.png" />
            
</figure><p><i>High-level Oxy architecture</i></p><p>For example, suppose you'd like to implement an HTTP firewall. With Oxy, you can proxy HTTP(S) requests right out of the box, eliminating the need to write any code related to production services, such as request metrics and logs. You simply need to implement an Oxy hook handler for HTTP requests and responses. If you've used <a href="https://developers.cloudflare.com/workers/examples/respond-with-another-site/">Cloudflare Workers</a> before, then you should be familiar with this extensibility model.</p><p>Similarly, you can implement a <a href="https://en.wikipedia.org/wiki/OSI_model">layer 4</a> firewall by providing application hooks that handle ingress and egress connections. This goes beyond a simple block/accept scenario, as you can build authentication functionality or a traffic router that sends traffic to different destinations based on the geographical information of the ingress connection. The capabilities are incredibly rich, and we've made the extensibility model as ergonomic and flexible as possible. As an example, if information obtained from layer 4 is insufficient to make an informed firewall decision, the app can simply ask Oxy to decapsulate the traffic and process it with the HTTP firewall.</p><p>The aforementioned scenarios are prevalent in many products we build at Cloudflare, so having a foundation that incorporates ready solutions is incredibly useful. This foundation has absorbed lots of experience we've gained over the years, taking care of many sharp and dark corners of high-load service programming. As a result, application implementers can stay focused on the business logic of their application with Oxy taking care of the rest. In fact, we've been able to create a few privacy proxy applications using Oxy that now serve massive amounts of traffic in production with less than a couple of hundred lines of code. 
This is something that would have taken multiple orders of magnitude more time and lines of code before.</p><p>As previously mentioned, we'll dive deeper into the technical aspects in future blog posts. However, for now, we'd like to provide a brief overview of Oxy's capabilities. This will give you a glimpse of the many ways in which Oxy can be customized and used.</p>
    <div>
      <h3>On-ramps</h3>
    </div>
    <p>An on-ramp defines a combination of transport layer socket types and protocols that server listeners can use for ingress traffic.</p><p>Oxy supports a wide variety of traffic on-ramps:</p><ul><li><p>HTTP 1/2/3 (including various CONNECT protocols for layer 3 and 4 traffic)</p></li><li><p>TCP and UDP traffic over Proxy Protocol</p></li><li><p>general-purpose IP traffic, including ICMP</p></li></ul><p>With Oxy, you have the ability to analyze and manipulate traffic at multiple layers of the OSI model - from layer 3 to layer 7. This allows for a wide range of possibilities in terms of how you handle incoming traffic.</p><p>One of the most notable and powerful features of Oxy is the ability for applications to force decapsulation. This means that an application can analyze traffic at a higher level, even if it originally arrived at a lower level. For example, if an application receives IP traffic, it can choose to analyze the UDP traffic encapsulated within the IP packets. With just a few lines of code, the application can tell Oxy to upgrade the IP flow to a UDP tunnel, effectively allowing the same code to be used for different on-ramps.</p><p>The application can even go further and ask Oxy to sniff UDP packets and check if they contain <a href="https://www.cloudflare.com/learning/performance/what-is-http3/">HTTP/3 traffic</a>. In this case, Oxy can upgrade the UDP traffic to HTTP and handle HTTP/3 requests that were originally received as raw IP packets. This allows for the simultaneous processing of traffic at all three layers (L3, L4, L7), enabling applications to analyze, filter, and manipulate the traffic flow from multiple perspectives. This provides a robust toolset for developing advanced traffic processing applications.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4tVlLbQeNVeN2lYN9ovJNH/d87cc5adb53ff0fc441530520540f781/image1-1.png" />
            
            </figure><p><i>Multi-layer traffic processing in Oxy applications</i></p>
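            <p>The UDP sniffing step described above can be approximated with a simple heuristic (an illustration only, not Oxy's actual detector): a QUIC long-header packet, the shape an HTTP/3 connection opens with, sets the high bit of its first byte and carries a 4-byte version field (0x00000001 for QUIC v1, per RFC 9000):</p>

```rust
/// Heuristic check for a QUIC v1 long-header packet inside a raw UDP payload.
/// A simplified illustration of protocol sniffing, not Oxy's implementation.
fn looks_like_quic_long_header(payload: &[u8]) -> bool {
    // Long headers set the high bit of the first byte; the next four
    // bytes are the version field (0x00000001 for QUIC v1).
    payload.len() >= 5
        && payload[0] & 0x80 != 0
        && payload[1..5] == [0x00, 0x00, 0x00, 0x01]
}
```

            <p>A real detector has to handle version negotiation, short headers and non-QUIC traffic sharing the port, but the principle is the same: peek at the payload, and upgrade the flow only when the bytes match the expected shape.</p>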
    <div>
      <h3>Off-ramps</h3>
    </div>
    <p>An off-ramp defines a combination of transport layer socket types and protocols that proxy server connectors can use for egress traffic.</p><p>Oxy offers versatility in its egress methods, supporting a range of protocols including HTTP 1 and 2, UDP, TCP, and IP. It is equipped with internal DNS resolution and caching, as well as customizable resolvers, with automatic fallback options for maximum system reliability. Oxy implements <a href="https://www.rfc-editor.org/rfc/rfc8305">happy eyeballs</a> for TCP and advanced tunnel timeout logic, and can route traffic to internal services with accompanying metadata.</p><p>Additionally, through collaboration with one of our internal services (which is an Oxy application itself!) <a href="/geoexit-improving-warp-user-experience-larger-network/">Oxy is able to offer geographical egress</a> — allowing applications to route traffic to the public Internet from various locations in our extensive network covering numerous cities worldwide. This complex and powerful feature can be easily utilized by Oxy application developers at no extra cost, simply by adjusting configuration settings.</p>
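            <p>The happy eyeballs idea from RFC 8305 can be sketched in a few lines of stdlib-only Rust (an illustration of the concept, not Oxy's implementation): connection attempts to the candidate addresses are staggered by a short delay, and the first one to succeed wins:</p>

```rust
use std::net::{SocketAddr, TcpStream};
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Simplified happy eyeballs: race staggered connection attempts to the
/// candidate addresses and return the first stream that connects.
/// An illustration of the RFC 8305 idea, not Oxy's implementation.
fn happy_eyeballs(addrs: &[SocketAddr], stagger: Duration) -> Option<TcpStream> {
    let (tx, rx) = mpsc::channel();
    for (i, addr) in addrs.iter().cloned().enumerate() {
        let tx = tx.clone();
        thread::spawn(move || {
            // Each later candidate waits i * stagger before attempting,
            // giving earlier (preferred) candidates a head start.
            thread::sleep(stagger * i as u32);
            if let Ok(stream) = TcpStream::connect_timeout(&addr, Duration::from_secs(1)) {
                let _ = tx.send(stream);
            }
        });
    }
    drop(tx);
    rx.recv().ok() // None if every attempt failed
}
```

            <p>A production implementation additionally interleaves address families, cancels the losing attempts, and feeds results back into DNS preference, but the core race is the part that hides slow or unreachable addresses from the user.</p>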
    <div>
      <h3>Tunneling and request handling</h3>
      <a href="#tunneling-and-request-handling">
        
      </a>
    </div>
    <p>We've discussed Oxy's communication capabilities with the outside world through on-ramps and off-ramps. In the middle, Oxy handles efficient stateful tunneling of various traffic types including TCP, UDP, QUIC, and IP, while giving applications full control over traffic blocking and redirection.</p><p>Additionally, Oxy effectively handles HTTP traffic, providing full control over requests and responses, and allowing it to serve as a direct HTTP or API service. With built-in tools for streaming analysis of HTTP bodies, Oxy makes it easy to extract and process data, such as form data from uploads and downloads.</p><p>In addition to its multi-layer traffic processing capabilities, Oxy also supports advanced HTTP tunneling methods, such as <a href="https://datatracker.ietf.org/doc/html/rfc9298">CONNECT-UDP</a> and <a href="https://datatracker.ietf.org/doc/draft-ietf-masque-connect-ip/">CONNECT-IP</a>, using the latest extensions to HTTP 3 and 2 protocols. It can even process HTTP CONNECT request payloads on layer 4 and recursively process the payload as HTTP if the encapsulated traffic is HTTP.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4a80AwmzUmUyxx7q8j2hcK/c2bcd1903e037852e57186510f6bac58/image2-2.png" />
            
            </figure><p><i>Recursive processing of HTTP CONNECT body payload in HTTP pipeline</i></p>
    <div>
      <h3>TLS</h3>
      <a href="#tls">
        
      </a>
    </div>
    <p>The modern Internet is unimaginable without traffic encryption, and Oxy, of course, provides this essential aspect. Oxy's cryptography and TLS are based on BoringSSL, providing both a FIPS-compliant version with a limited set of certified features and the latest version that supports all the currently available TLS features. Oxy also allows applications to switch between the two versions in real-time, on a per-request or per-connection basis.</p><p>Oxy's TLS client is designed to make HTTPS requests to <a href="https://en.wikipedia.org/wiki/Upstream_server">upstream servers</a>, with the functionality and security of a browser-grade client. This includes the reconstruction of certificate chains, certificate revocation checks, and more. In addition, Oxy applications can be secured with TLS v1.3, and optionally mTLS, allowing for the extraction of client authentication information from x509 certificates.</p><p>Oxy has the ability to inspect and filter HTTPS traffic, including HTTP/3, and provides the means for dynamically generating certificates, serving as a foundation for implementing data loss prevention (DLP) products. Additionally, Oxy's internal fork of BoringSSL, which is not FIPS-compliant, supports the use of <a href="https://datatracker.ietf.org/doc/html/rfc7250">raw public keys</a> as an alternative to WebPKI, making it ideal for internal service communication. This allows for all the benefits of TLS without the hassle of managing root certificates.</p>
    <div>
      <h3>Gluing everything together</h3>
      <a href="#gluing-everything-together">
        
      </a>
    </div>
    <p>Oxy is more than just a set of building blocks for network applications. It acts as a cohesive glue, handling the bootstrapping of the entire proxy application with ease, including parsing and applying configurations, setting up an asynchronous runtime, applying seccomp hardening, and providing automated graceful restart functionality.</p><p>With built-in support for panic reporting to Sentry, Prometheus metrics with a Rust-macro based API, Kibana logging, distributed tracing, and memory and runtime profiling, Oxy offers comprehensive <a href="https://www.cloudflare.com/application-services/solutions/app-performance-monitoring/">monitoring</a> and analysis capabilities. It can also generate detailed audit logs for layer 4 traffic, useful for billing and network analysis.</p><p>To top it off, Oxy includes an integration testing framework, allowing for easy testing of application interactions using TypeScript-based tests.</p>
    <div>
      <h3>Extensibility model</h3>
      <a href="#extensibility-model">
        
      </a>
    </div>
    <p>To take full advantage of Oxy's capabilities, one must understand how to extend and configure its features. Oxy applications are configured using YAML configuration files, offering numerous options for each feature. Additionally, application developers can extend these options by leveraging the convenient macros provided by the framework, making customization a breeze.</p><p>Suppose the Oxy application uses a key-value database to retrieve user information. In that case, it would be beneficial to expose a YAML configuration settings section for this purpose. With Oxy, defining a structure and annotating it with the <code>#[oxy_app_settings]</code> attribute is all it takes to accomplish this:</p>
            <pre><code>/// Application's key-value (KV) database settings
#[oxy_app_settings]
pub struct MyAppKVSettings {
    /// Key prefix.
    pub prefix: Option&lt;String&gt;,
    /// Path to the UNIX domain socket for the appropriate KV 
    /// server instance.
    pub socket: Option&lt;String&gt;,
}</code></pre>
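            <p>For illustration, the section of the generated YAML configuration file corresponding to the struct above might look like the following (a hypothetical sketch - the actual section name and default values depend on how the application registers its settings; note how the Rust doc comments become YAML comments):</p>
            <pre><code># Application's key-value (KV) database settings
kv:
  # Key prefix.
  prefix: ~
  # Path to the UNIX domain socket for the appropriate KV
  # server instance.
  socket: ~</code></pre>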
            <p>Oxy can then generate a default YAML configuration file listing available options and their default values, including those extended by the application. The configuration options are automatically documented in the generated file from the Rust doc comments, following best Rust practices.</p><p>Moreover, Oxy supports multi-tenancy, allowing a single application instance to expose multiple on-ramp endpoints, each with a unique configuration. But sometimes even a YAML configuration file is not enough to build the desired application; this is where Oxy's comprehensive set of hooks comes in handy. These hooks can be used to extend the application with Rust code and cover almost all aspects of traffic processing.</p><p>To give you an idea of how easy it is to write an Oxy application, here is an example of basic Oxy code:</p>
            <pre><code>struct MyApp;

// Defines types for various application extensions to Oxy's
// data types. Contexts provide information and control knobs for
// the different parts of the traffic flow, and applications can
// extend all of them with their custom data. As mentioned before,
// applications can also define their custom configuration.
// It’s just a matter of defining a configuration object with
// `#[oxy_app_settings]` attribute and providing the object type here.
impl OxyExt for MyApp {
    type AppSettings = MyAppKVSettings;
    type EndpointAppSettings = ();
    type EndpointContext = ();
    type IngressConnectionContext = MyAppIngressConnectionContext;
    type RequestContext = ();
    type IpTunnelContext = ();
    type DnsCacheItem = ();

}
   
#[async_trait]
impl OxyApp for MyApp {
    fn name() -&gt; &amp;'static str {
        "My app"
    }

    fn version() -&gt; &amp;'static str {
        env!("CARGO_PKG_VERSION")
    }

    fn description() -&gt; &amp;'static str {
        "This is an example of Oxy application"
    }

    async fn start(
        settings: ServerSettings&lt;MyAppKVSettings, ()&gt;
    ) -&gt; anyhow::Result&lt;Hooks&lt;Self&gt;&gt; {
        // Here the application initializes various hooks, with each
        // hook being a trait implementation containing multiple
        // optional callbacks invoked during the lifecycle of the
        // traffic processing.
        let ingress_hook = create_ingress_hook(&amp;settings);
        let egress_hook = create_egress_hook(&amp;settings);
        let tunnel_hook = create_tunnel_hook(&amp;settings);
        let http_request_hook = create_http_request_hook(&amp;settings);
        let ip_flow_hook = create_ip_flow_hook(&amp;settings);

        Ok(Hooks {
            ingress: Some(ingress_hook),
            egress: Some(egress_hook),
            tunnel: Some(tunnel_hook),
            http_request: Some(http_request_hook),
            ip_flow: Some(ip_flow_hook),
            ..Default::default()
        })
    }
}

// The entry point of the application
fn main() -&gt; OxyResult&lt;()&gt; {
    oxy::bootstrap::&lt;MyApp&gt;()
}</code></pre>
            
    <div>
      <h2>Technology choice</h2>
      <a href="#technology-choice">
        
      </a>
    </div>
    <p>Oxy leverages the safety and performance benefits of Rust as its implementation language. At Cloudflare, Rust has emerged as a popular choice for new product development, and there are ongoing efforts to migrate some of the existing products to the language as well.</p><p>Rust offers memory and concurrency safety through its ownership and borrowing system, preventing issues like null pointers and data races. This safety is achieved without sacrificing performance, as Rust provides low-level control and the ability to write code with minimal runtime overhead. Rust's balance of safety and performance has made it popular for building safe, performance-critical applications, like proxies.</p><p>We intentionally tried to stand on the shoulders of giants with this project and avoid reinventing the wheel. Oxy heavily relies on open-source dependencies, with <a href="https://github.com/hyperium/hyper">hyper</a> and <a href="https://github.com/tokio-rs/tokio">tokio</a> being the backbone of the framework. Our philosophy is that we should pull from existing solutions as much as we can, allowing for faster iteration, but also use widely battle-tested code. If something doesn't work for us, we try to collaborate with maintainers and contribute back our fixes and improvements. In fact, we now have two team members who are core team members of the tokio and hyper projects.</p><p>Even though Oxy is a proprietary project, we try to give back some love to the open-source community without which the project wouldn’t be possible by open-sourcing some of the building blocks such as <a href="https://github.com/cloudflare/boring">https://github.com/cloudflare/boring</a> and <a href="https://github.com/cloudflare/quiche">https://github.com/cloudflare/quiche</a>.</p>
    <div>
      <h2>The road to implementation</h2>
      <a href="#the-road-to-implementation">
        
      </a>
    </div>
    <p>At the beginning of our journey, we set out to implement a proof-of-concept for an HTTP firewall in Rust for what would eventually become the Zero Trust Gateway product. This project was originally part of the <a href="/1111-warp-better-vpn/">WARP</a> service repository. However, as the PoC rapidly advanced, it became clear that it needed to be separated into its own Gateway proxy for both technical and operational reasons.</p><p>Later on, when tasked with implementing a relay proxy for iCloud Private Relay, we saw the opportunity to reuse much of the code from the Gateway proxy. The Gateway project could also benefit from the HTTP/3 support that was being added for the Private Relay project. In fact, early iterations of the relay service were forks of the Gateway server.</p><p>It was then that we realized we could extract common elements from both projects to create a new framework, Oxy. The history of Oxy can be traced back to its origins in the commit history of the Gateway and Private Relay projects, up until its separation as a standalone framework.</p><p>Since its inception, we have leveraged the power of Oxy to efficiently roll out multiple projects that would have required a significant amount of time and effort without it. Our iterative development approach has been a strength of the project, as we have been able to identify common, reusable components through hands-on testing and implementation.</p><p>Our small core team is supplemented by internal contributors from across the company, ensuring that the best subject-matter experts are working on the relevant parts of the project. This contribution model also allows us to shape the framework's API to meet the functional and ergonomic needs of its users, while the core team ensures that the project stays on track.</p>
    <div>
      <h2>Relation to <a href="/how-we-built-pingora-the-proxy-that-connects-cloudflare-to-the-internet/">Pingora</a></h2>
      <a href="#relation-to">
        
      </a>
    </div>
    <p>Although Pingora, another proxy server developed by us in Rust, shares some similarities with Oxy, it was intentionally designed as a separate proxy server with a different objective. Pingora was created to serve traffic from millions of our clients’ upstream servers, including those with ancient and unusual configurations. Non-UTF-8 URLs and TLS settings that are not supported by most TLS libraries are just a few such quirks among many others. This focus on handling technically challenging, unusual configurations sets Pingora apart from other proxy servers.</p><p>The concept of Pingora came about during the same period when we were beginning to develop Oxy, and we initially considered merging the two projects. However, we quickly realized that their objectives were too different to do that. Pingora is specifically designed to establish Cloudflare’s HTTP connectivity with the Internet, even in its most technically obscure corners. On the other hand, Oxy is a multipurpose platform that supports a wide variety of communication protocols and aims to provide a simple way to develop high-performance proxy applications with business logic.</p>
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>Oxy is a proxy framework that we have developed to meet the demanding needs of modern services. It has been designed to provide a flexible and scalable solution that can be adapted to meet the unique requirements of each project, and by leveraging the power of Rust we made it both safe and fast.</p><p>Looking forward, Oxy is poised to play a critical role in our company's larger effort to modernize and improve our architecture. It provides a solid building block in the foundation on which we can keep building a better Internet.</p><p>As the framework continues to evolve and grow, we remain committed to our iterative approach to development, constantly seeking out new opportunities to reuse existing solutions and improve our codebase. This collaborative, community-driven approach has already yielded impressive results, and we are confident that it will continue to drive the future success of Oxy.</p><p>Stay tuned for more tech-savvy blog posts on the subject!</p> ]]></content:encoded>
            <category><![CDATA[Proxying]]></category>
            <category><![CDATA[Rust]]></category>
            <category><![CDATA[Performance]]></category>
            <category><![CDATA[Edge]]></category>
            <category><![CDATA[iCloud Private Relay]]></category>
            <category><![CDATA[Cloudflare Gateway]]></category>
            <category><![CDATA[Oxy]]></category>
            <guid isPermaLink="false">1HAnoThlPiFQ4Bgpn04CM0</guid>
            <dc:creator>Ivan Nikulin</dc:creator>
        </item>
        <item>
            <title><![CDATA[A History of HTML Parsing at Cloudflare: Part 2]]></title>
            <link>https://blog.cloudflare.com/html-parsing-2/</link>
            <pubDate>Fri, 29 Nov 2019 08:00:00 GMT</pubDate>
            <description><![CDATA[ The second blog post in the series on HTML rewriters picks up the story in 2017 after the launch of the Cloudflare edge compute platform Cloudflare Workers. It became clear that the developers using workers wanted the same HTML rewriting capabilities that we used internally,  ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2sfVCRtS6lSIai6MEJKe6d/62b69f0f5a6d768aa220daa866346337/HTML-rewrriter-GA_2x-1.png" />
            
            </figure><p>The second blog post in the series on HTML rewriters picks up the story in 2017 after the launch of the Cloudflare edge compute platform <a href="https://workers.cloudflare.com/">Cloudflare Workers</a>. It became clear that the developers using Workers wanted the same HTML rewriting capabilities that we used internally, but accessible via a JavaScript API.</p><p>This blog post describes the building of a streaming HTML rewriter/parser with a CSS-selector based API in Rust. It is used as the back-end for the Cloudflare Workers <a href="https://developers.cloudflare.com/workers/reference/apis/html-rewriter/">HTMLRewriter</a>. We have open-sourced the library (<a href="https://github.com/cloudflare/lol-html"><i>LOL HTML</i></a>) as it can also be used as a stand-alone HTML rewriting/parsing library.</p><p>The major change compared to <a href="https://github.com/cloudflare/lazyhtml">LazyHTML</a>, the previous rewriter, is the dual-parser architecture required to overcome the additional performance overhead of wrapping/unwrapping each token when propagating tokens to the Workers runtime. The remainder of the post describes a CSS selector matching engine inspired by a Virtual Machine approach to regular expression matching.</p>
    <div>
      <h2>v2 : Give it to everyone and make it faster</h2>
      <a href="#v2-give-it-to-everyone-and-make-it-faster">
        
      </a>
    </div>
    <p>In 2017, Cloudflare introduced an edge compute platform - <a href="https://workers.cloudflare.com/">Cloudflare Workers</a>. It was no surprise that customers quickly required the same HTML rewriting capabilities that we were using internally. Our team was impressed with the platform and decided to migrate some of our features to Workers. The goal was to improve our developer experience by working with modern JavaScript rather than statically linked NGINX modules implemented in C with a Lua API.</p><p>It is possible to rewrite HTML in Workers, though for that you need a third-party JavaScript package (such as <a href="http://cheerio.js.org/">Cheerio</a>). These packages are not designed for HTML rewriting on the edge due to the latency, speed and memory considerations described in the previous post.</p><p>JavaScript is really fast but it still can’t always produce performance comparable to native code for some tasks - parsing being one of those. Customers typically needed to buffer the whole content of the page to do the rewriting, resulting in considerable output latency and memory consumption that often exceeded the memory limits enforced by the Workers runtime.</p><p>We started to think about how we could reuse the technology in Workers. LazyHTML was a perfect fit in terms of parsing performance, but it had two issues:</p><ol><li><p><b>API ergonomics</b>: LazyHTML produces a stream of HTML tokens. This is sufficient for our internal needs. However, for an average user, it is not as convenient as the jQuery-like API of Cheerio.</p></li><li><p><b>Performance</b>: Even though LazyHTML is tremendously fast, integration with the Workers runtime adds even more limitations. LazyHTML operates as a simple parse-modify-serialize pipeline, which means that it produces tokens for the whole content of the page. 
All of these tokens then have to be propagated to the Workers runtime and wrapped inside a JavaScript object and then unwrapped and fed back to LazyHTML for serialization. This is an extremely expensive operation which would nullify the performance benefit of LazyHTML.</p></li></ol>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4GmYB4iPMNqvDypCQbb2t/3f2010375f3c32cdf8b26730121981e0/image8-1.png" />
            
            </figure><p><i>LazyHTML with V8</i></p>
    <div>
      <h3>LOL HTML</h3>
      <a href="#lol-html">
        
      </a>
    </div>
    <p>We needed something new, designed with Workers requirements in mind, using a language with native speed and safety guarantees (it’s incredibly easy to shoot yourself in the foot doing parsing). Rust was the obvious choice, as it provides native speed and the strongest guarantees of memory safety, minimising the attack surface for untrusted input. Wherever possible, the Low Output Latency HTML rewriter (LOL HTML) uses all the previous optimizations developed for LazyHTML, such as tag name hashing.</p>
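<p>As a rough illustration of the tag name hashing idea (our own simplified sketch, not LOL HTML’s exact scheme): because standard tag names are short and drawn from a small alphabet, each character can be packed into a few bits of a single 64-bit integer, so tag name comparisons become plain integer comparisons.</p>

```rust
/// Simplified sketch of tag-name hashing: pack each character of a
/// (lowercased) tag name into 6 bits of a u64. Roughly 10 characters
/// fit; names that don't fit, or that contain unsupported characters,
/// get no hash and fall back to string comparison. Illustrative only.
fn tag_name_hash(name: &str) -> Option<u64> {
    let mut hash: u64 = 0;
    for c in name.bytes() {
        // Map 'a'..='z' -> 1..=26 and '0'..='9' -> 27..=36.
        let code = match c {
            b'a'..=b'z' => (c - b'a') as u64 + 1,
            b'0'..=b'9' => (c - b'0') as u64 + 27,
            _ => return None, // unsupported character
        };
        if hash >> 54 != 0 {
            return None; // name is too long to fit into 64 bits
        }
        hash = (hash << 6) | code;
    }
    Some(hash)
}

fn main() {
    // Equal names produce equal hashes; different names differ.
    assert_eq!(tag_name_hash("div"), tag_name_hash("div"));
    assert_ne!(tag_name_hash("div"), tag_name_hash("span"));
    assert!(tag_name_hash("h1").is_some());
    // '-' is outside the supported alphabet in this sketch.
    assert!(tag_name_hash("x-foo").is_none());
}
```

With such a scheme, a selector component like <code>div</code> can be pre-hashed once, and the tag scanner only compares integers on the hot path.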
    <div>
      <h4>Dual-parser architecture</h4>
      <a href="#dual-parser-architecture">
        
      </a>
    </div>
    <p>Most developers are familiar with, and prefer to use, CSS selector-based APIs (as in Cheerio, jQuery or the DOM itself) for HTML mutation tasks. We decided to base our API on CSS selectors as well. Although this meant additional implementation complexity, the decision created even more opportunities for parsing optimizations.</p><p>As selectors define the scope of the content that should be rewritten, we realised we can skip the content that is not in this scope and not produce tokens for it. This not only significantly speeds up the parsing itself, but also avoids the performance burden of the back and forth interactions with the JavaScript VM. As ever, the best optimization is not to do something at all.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1bnIfLwJ1sYEYM34uXAleV/f03f74021ce0a7c89fba1e53e2073cb0/image7-2.png" />
            
            </figure><p>Considering the tasks required, LOL HTML’s parser consists of two internal parsers:</p><ul><li><p><b>Lexer</b> - a regular full parser that produces output for all types of content that it encounters;</p></li><li><p><b>Tag scanner</b> - looks for start and end tags and skips parsing the rest of the content. The tag scanner parses only the tag name and feeds it to the selector matcher. The matcher will switch the parser to the lexer if there is a match or if additional information about the tag (such as attributes) is required for matching.</p></li></ul><p>The parser switches back to the tag scanner as soon as input leaves the scope of all selector matches. The tag scanner may also sometimes switch the parser to the lexer - if it requires additional tag information for the parsing feedback simulation.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1U4yaNQ6DOcGtl5JeUZfVS/b61ecda06aa67d2704345854f61c4e45/image1-3.png" />
            
            </figure><p>LOL HTML architecture</p><p>Having two different parser implementations for the same grammar will increase development costs and is error-prone due to implementation inconsistencies. We minimize these risks by implementing a small Rust macro-based DSL which is similar in spirit to Ragel. The DSL program describes <a href="https://en.wikipedia.org/wiki/Nondeterministic_finite_automaton">Nondeterministic finite automaton</a> states and actions associated with each state transition and matched input byte.</p><p>An example of a DSL state definition:</p>
            <pre><code>tag_name_state {
   whitespace =&gt; ( finish_tag_name?; --&gt; before_attribute_name_state )
   b'/'       =&gt; ( finish_tag_name?; --&gt; self_closing_start_tag_state )
   b'&gt;'       =&gt; ( finish_tag_name?; emit_tag?; --&gt; data_state )
   eof        =&gt; ( emit_raw_without_token_and_eof?; )
   _          =&gt; ( update_tag_name_hash; )
}</code></pre>
            <p>The DSL program gets expanded by the Rust compiler into not quite as beautiful, but extremely efficient Rust code.</p><p>We no longer need to reimplement the code that drives the parsing process for each of our parsers. All we need to do is to define different action implementations for each. In the case of the tag scanner, the majority of these actions are a no-op, so the Rust compiler does the NFA optimization job for us: it optimizes away state branches with no-op actions and even whole states if all of the branches have no-op actions. Now that’s cool.</p>
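<p>For intuition, here is a hand-written approximation of the kind of code such a DSL state might expand into (purely illustrative - the names mirror the DSL listing above, but the generated code and the hashing function are different in reality). Each state is a <code>match</code> on the next input byte that runs actions and performs a transition:</p>

```rust
/// States corresponding to the DSL listing above (subset).
#[derive(Debug, PartialEq)]
enum State {
    TagName,
    BeforeAttributeName,
    SelfClosingStartTag,
    Data,
}

struct Parser {
    state: State,
    tag_name_hash: u64,
}

impl Parser {
    /// One byte-driven step of the state machine.
    fn step(&mut self, b: u8) {
        match self.state {
            State::TagName => match b {
                b' ' | b'\t' | b'\n' | b'\r' => {
                    self.finish_tag_name();
                    self.state = State::BeforeAttributeName;
                }
                b'/' => {
                    self.finish_tag_name();
                    self.state = State::SelfClosingStartTag;
                }
                b'>' => {
                    self.finish_tag_name();
                    self.emit_tag();
                    self.state = State::Data;
                }
                _ => self.update_tag_name_hash(b),
            },
            // ... other states elided in this sketch
            _ => {}
        }
    }

    // In the tag scanner most actions are no-ops, which lets the
    // compiler optimize away whole branches.
    fn finish_tag_name(&mut self) {}
    fn emit_tag(&mut self) {}
    fn update_tag_name_hash(&mut self, b: u8) {
        // Placeholder hash update, not the real algorithm.
        self.tag_name_hash = self.tag_name_hash.wrapping_mul(31).wrapping_add(b as u64);
    }
}

fn main() {
    let mut p = Parser { state: State::TagName, tag_name_hash: 0 };
    for &b in b"div>" {
        p.step(b);
    }
    assert_eq!(p.state, State::Data);
}
```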
    <div>
      <h4>Byte slice processing optimisations</h4>
      <a href="#byte-slice-processing-optimisations">
        
      </a>
    </div>
    <p>Moving to a memory-safe language provided new challenges. Rust has great memory safety mechanisms; however, sometimes they come with a runtime performance cost.</p><p>The task of the parser is to scan through the input and find the boundaries of lexical units of the language - tokens and their internal parts. For example, an HTML start tag token consists of multiple parts: a byte slice of input that represents the tag name and multiple pairs of input slices that represent attributes and values:</p>
            <pre><code>struct StartTagToken&lt;'i&gt; {
   name: &amp;'i [u8],
   attributes: Vec&lt;(&amp;'i [u8], &amp;'i [u8])&gt;,
   self_closing: bool
}</code></pre>
            <p>As Rust uses bounds checks on memory access, construction of a token might be a relatively expensive operation. We need to be capable of constructing thousands of them in a fraction of a second, so every CPU instruction counts.</p><p>Following the principle of doing as little as possible to improve performance, we use a “token outline” representation of tokens: instead of having memory slices for token parts we use numeric ranges which are lazily transformed into a byte slice when required.</p>
            <pre><code>struct StartTagTokenOutline {
   name: Range&lt;usize&gt;,
   attributes: Vec&lt;(Range&lt;usize&gt;, Range&lt;usize&gt;)&gt;,
   self_closing: bool
}</code></pre>
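<p>To make the lazy-slice idea concrete, here is a small self-contained sketch (the struct mirrors the one above, but the helper method and buffer handling are our own illustration): the outline stores integer ranges into the input buffer, and a part of the token is resolved into a byte slice only when it is actually requested - even after the buffer has grown.</p>

```rust
use std::ops::Range;

/// Outline of a start tag: numeric ranges into the input buffer
/// instead of borrowed slices.
struct StartTagTokenOutline {
    name: Range<usize>,
    attributes: Vec<(Range<usize>, Range<usize>)>,
    self_closing: bool,
}

impl StartTagTokenOutline {
    /// Lazily resolve the tag name against the current buffer.
    fn name<'i>(&self, input: &'i [u8]) -> &'i [u8] {
        &input[self.name.clone()]
    }
}

fn main() {
    // The first chunk ends mid-tag: `<div cla`.
    let mut buffer = b"<div cla".to_vec();
    let outline = StartTagTokenOutline {
        name: 1..4, // the bytes of "div"
        attributes: vec![],
        self_closing: false,
    };

    // A new chunk arrives and the buffer grows, but the stored
    // ranges stay valid - no token reconstruction is needed.
    buffer.extend_from_slice(b"ss=\"foo\">");
    assert_eq!(outline.name(&buffer), &b"div"[..]);
}
```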
            <p>As you might have noticed, with this approach we are no longer bound to the lifetime of the input chunk, which turns out to be very useful. If a start tag is spread across multiple input chunks we can easily update the token that is currently under construction as new chunks of input arrive, by just adjusting integer indices. This allows us to avoid constructing a new token with slices from the new input memory region (it could be the input chunk itself or the internal parser’s buffer).</p><p>This time we can’t get away with avoiding the conversion of input character encoding; we expose a user-facing API that operates on JavaScript strings, and input HTML can be of any encoding. Luckily, we can still parse without decoding, and only encode and decode within token bounds on request (though we still can’t do that for UTF-16 encoding).</p><p>So, when a user requests an element’s tag name in the API, internally it is still represented as a byte slice in the character encoding of the input, but when provided to the user it gets dynamically decoded. The opposite process happens when a user sets a new tag name.</p><p>For selector matching we can still operate on the original encoding representation - because we know the input encoding ahead of time, we preemptively convert values in a selector to the page’s character encoding, so comparisons can be done without decoding fields of each token.</p><p>As you can see, the new parser architecture along with all these optimizations produced great performance results:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3e73RS0xEBDmg1jxWQGKsJ/ce5739700a08dea4e6e48c3a5cc96d7b/image4-2.png" />
            
            </figure><p><i>Average parsing time depending on the input size - lower is better</i></p><p>LOL HTML’s tag scanner is typically twice as fast as LazyHTML, and the lexer has comparable performance, outperforming LazyHTML on bigger inputs. Both are a few times faster than the tokenizer from <a href="https://github.com/servo/html5ever">html5ever</a> - another parser implemented in Rust, used in Mozilla’s Servo browser engine.</p>
    <div>
      <h4>CSS selector matching VM</h4>
      <a href="#css-selector-matching-vm">
        
      </a>
    </div>
    <p>With an impressively fast parser on our hands we had only one thing missing - the CSS selector matcher. Initially we thought we could just use Servo’s <a href="https://crates.io/crates/selectors">CSS selector matching engine</a> for this purpose. After a couple of days of experimentation it turned out that it is not quite suitable for our task.</p><p>It did not work well with our dual-parser architecture. We first need to match just a tag name from the tag scanner, and then, if we fail, query the lexer for the attributes. The selectors library wasn’t designed with this architecture in mind, so we needed ugly hacks to bail out from matching in case of insufficient information. It was inefficient, as we needed to start matching again after the bailout, doing twice the work. There were other problems, such as the integration of lazy character decoding and of tag name comparison using tag name hashes.</p>
    <div>
      <h5>Matching direction</h5>
      <a href="#matching-direction">
        
      </a>
    </div>
    <p>The main problem encountered was the need to backtrack all the open elements for matching. Browsers match selectors from right to left and traverse all ancestors of an element. This <a href="https://stackoverflow.com/a/5813672">StackOverflow answer</a> has a good explanation of why they do it this way. We would need to store information about all open elements and their attributes - something that we can’t do while operating with tight memory constraints. This matching approach would be inefficient for our case - unlike browsers, we expect to have just a few selectors and a lot of elements. In this case it is much more efficient to match selectors from left to right.</p><p>And this is when we had a revelation. Consider the following CSS selector:</p>
            <pre><code>body &gt; div.foo  img[alt] &gt; div.foo ul</code></pre>
            <p>It can be split into individual components attributed to a particular element with hierarchical combinators in between:</p>
            <pre><code>body &gt; div.foo img[alt] &gt; div.foo  ul
---    ------- --------   -------  --</code></pre>
            <p>Each component is easy to match having a start tag token - it’s just a matter of comparison of token fields with values in the component. Let’s dive into abstract thinking and imagine that each such component is a character in the infinite alphabet of all possible components:</p>  <table>
        <tr>
            <th>Selector  component</th>
            <th>Character</th>
        </tr>
        <tr>
            <td>body</td>
            <td>a</td>
        </tr>
        <tr>
            <td>div.foo</td>
            <td>b</td>
        </tr>
        <tr>
            <td>img[alt]</td>
            <td>c</td>
        </tr>
        <tr>
            <td>ul</td>
            <td>d</td>
        </tr>
    </table><p>Let’s rewrite our selector with selector components replaced by our imaginary characters:</p>
            <pre><code>a &gt; b c &gt; b d</code></pre>
            <p>Does this remind you of something?</p><p>The <code>&gt;</code> combinator can be read as “immediately followed by” - it denotes a child element.</p><p>The space combinator denotes a descendant element: there may be zero or more elements in between.</p><p>There is a very well known abstraction to express these relations - regular expressions. Replacing the combinators with regular expression syntax, the selector becomes:</p>
            <pre><code>ab.*cb.*d</code></pre>
            <p>We transformed our CSS selector into a regular expression that can be executed on the sequence of start tag tokens. Note that not all CSS selectors can be converted to such a regular grammar and the input on which we match has some specifics, which we’ll discuss later. However, it was a good starting point: it allowed us to express a significant subset of selectors.</p>
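<p>The translation above can be sketched in a few lines of Rust (our own illustrative code, not LOL HTML’s internals): assign each distinct compound selector component a character from the “imaginary alphabet”, keep <code>&gt;</code> combinators as direct concatenation, and turn descendant combinators into <code>.*</code>. Real selector parsing is more involved than splitting on whitespace; this assumes a pre-tokenized selector.</p>

```rust
use std::collections::HashMap;

/// Translate a whitespace/`>`-separated selector into an
/// "imaginary alphabet" regex string, as described above.
fn selector_to_regex(selector: &str) -> String {
    let mut alphabet: HashMap<String, char> = HashMap::new();
    let mut next = b'a';
    let mut out = String::new();
    let mut last_was_child = true; // no `.*` before the first component

    for part in selector.split_whitespace() {
        if part == ">" {
            last_was_child = true;
            continue;
        }
        if !last_was_child {
            out.push_str(".*"); // descendant combinator
        }
        // Reuse the character for a component we've already seen.
        let ch = *alphabet.entry(part.to_string()).or_insert_with(|| {
            let c = next as char;
            next += 1;
            c
        });
        out.push(ch);
        last_was_child = false;
    }
    out
}

fn main() {
    assert_eq!(
        selector_to_regex("body > div.foo img[alt] > div.foo ul"),
        "ab.*cb.*d"
    );
    assert_eq!(selector_to_regex("body span"), "a.*b");
}
```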
    <div>
      <h5>Implementing a Virtual Machine</h5>
      <a href="#implementing-a-virtual-machine">
        
      </a>
    </div>
    <p>Next, we started looking at non-backtracking algorithms for regular expressions. The virtual machine approach seemed suitable for our task, as it was possible to have a non-backtracking implementation that was flexible enough to work around the differences between real regular expression matching on strings and our abstraction.</p><p>VM-based regular expression matching is implemented as one of the engines in many regular expression libraries, such as regexp2 and Rust’s regex. The basic idea is that instead of building an NFA or DFA for a regular expression, it is converted into a DSL assembly language with instructions later executed by the virtual machine - regular expressions are treated as programs that accept strings for matching.</p><p>Since the VM program is just a representation of an <a href="https://en.wikipedia.org/wiki/Nondeterministic_finite_automaton#NFA_with_%CE%B5-moves">NFA with ε-transitions</a>, it can exist in multiple states simultaneously during the execution, or, in other words, it spawns multiple threads. The regular expression matches if one or more states succeed.</p><p>For example, consider the following VM instructions:</p><ul><li><p><i>expect c</i> - waits for the next input character and aborts the thread if it doesn’t equal the instruction’s operand;</p></li><li><p><i>jmp L</i> - jumps to label ‘L’;</p></li><li><p><i>thread L1, L2</i> - spawns threads for labels L1 and L2, effectively splitting the execution;</p></li><li><p><i>match</i> - succeeds the thread with a match.</p></li></ul><p>Using this instruction set, the regular expression “<i>ab*c</i>” can be translated into:</p>
            <pre><code>    expect a
L1: thread L2, L3
L2: expect b
    jmp L1
L3: expect c
    match</code></pre>
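<p>To make the execution model concrete, here is a minimal sketch of such a thread-spawning VM in JavaScript (chosen for brevity - this is an illustration of the technique, not the actual Rust implementation):</p>

```javascript
// A program is an array of instructions; labels are instruction indices.
// Supported: {op:"expect",ch}, {op:"jmp",to}, {op:"thread",a,b}, {op:"match"}.
function vmMatch(program, input) {
  // Resolve jmp/thread instructions until a thread sits on expect or match.
  const advance = (ip, out) => {
    const ins = program[ip];
    if (ins.op === "jmp") advance(ins.to, out);
    else if (ins.op === "thread") { advance(ins.a, out); advance(ins.b, out); }
    else out.push(ip);
  };
  let frontier = [];
  advance(0, frontier);
  for (const ch of input) {
    const next = [];
    for (const ip of frontier) {
      const ins = program[ip];
      // "expect c": the thread survives only if the character matches.
      if (ins.op === "expect" && ins.ch === ch) advance(ip + 1, next);
    }
    frontier = next;
    if (frontier.length === 0) return false; // all threads aborted
  }
  // The input matches if at least one surviving thread reached "match".
  return frontier.some(ip => program[ip].op === "match");
}

// "ab*c" compiled as in the listing above.
const abStarC = [
  { op: "expect", ch: "a" },    // 0:     expect a
  { op: "thread", a: 2, b: 4 }, // 1: L1: thread L2, L3
  { op: "expect", ch: "b" },    // 2: L2: expect b
  { op: "jmp", to: 1 },         // 3:     jmp L1
  { op: "expect", ch: "c" },    // 4: L3: expect c
  { op: "match" },              // 5:     match
];
```

<p>Inputs like “abbc” and “ac” succeed, while “abd” aborts all threads - and no backtracking is ever needed, because every viable state is tracked simultaneously.</p>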
            <p>Let’s try to translate the regular expression ab.*cb.*d from the selector we saw earlier:</p>
            <pre><code>    expect a
    expect b
L1: thread L2, L3
L2: expect [any]
    jmp L1
L3: expect c
    expect b
L4: thread L5, L6
L5: expect [any]
    jmp L4
L6: expect d
    match</code></pre>
<p>That looks complex! This assembly language, though, is designed for regular expressions in general, and regular expressions can be much more complex than our case. For us, the only kind of repetition that matters is “<i>.*</i>”. So, instead of expressing it with multiple instructions, we can use just one called <i>hereditary_jmp</i>:</p>
            <pre><code>    expect a
    expect b
    hereditary_jmp L1
L1: expect c
    expect b
    hereditary_jmp L2
L2: expect d
    match</code></pre>
<p>The instruction tells the VM to memoize the instruction’s label operand and unconditionally spawn a thread with a jump to this label on each input character.</p><p>There is one significant distinction between the string input of regular expressions and the input provided to our VM: the input can shrink!</p><p>A regular string is just a contiguous sequence of characters, whereas we operate on a sequence of open elements. As new tokens arrive this sequence can grow as well as shrink. Assume we represent <code>&lt;div&gt;</code> as the character ‘a’ in our imaginary language, so having <code>&lt;div&gt;&lt;div&gt;&lt;div&gt;</code> as input we can represent it as <code>aaa</code>; if the next token in the input is <code>&lt;/div&gt;</code> then our “string” shrinks to <code>aa</code>.</p><p>You might think at this point that our abstraction doesn’t work and we should try something else. But what we have as input for our machine is a stack of open elements, and we need a stack-like structure to store the <i>hereditary_jmp</i> instruction labels that the VM has seen so far. So, why not store them on the open element stack? If we store the next instruction pointer on each stack item on which the <code>expect</code> instruction was successfully executed, we’ll have a full snapshot of the VM state, so we can easily roll back to it if our stack shrinks.</p><p>With this implementation we don’t need to store anything except a tag name on the stack, and, considering that we can use the tag name hashing algorithm, that is just a 64-bit integer per open element. As an additional small optimization, to avoid traversing the whole stack in search of active hereditary jumps on each new input, we store on each stack item the index of the first ancestor with a hereditary jump.</p><p>For example, for the selector “<i>body &gt; div span</i>” we’ll have the following VM program (let’s get rid of labels and just use instruction indices instead):</p>
            <pre><code>0| expect &lt;body&gt;
1| expect &lt;div&gt;
2| hereditary_jmp 3
3| expect &lt;span&gt;
4| match</code></pre>
<p>Having “<code>&lt;body&gt;&lt;div&gt;&lt;div&gt;</code>” as input we’ll have the following stack:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/F9vqNHbnLYJDeg4qSjvbx/32b541fac6a9f1c5478c25d8a0be6da8/image2-3.png" />
            
</figure><p>Now, if the next token is a <code>&lt;span&gt;</code> start tag, the VM will first try to execute the selector’s program from the beginning and will fail on the first instruction. However, it will also look for any active hereditary jumps on the stack. We have one which jumps to the instruction at index 3. After jumping to this instruction the VM successfully produces a match. If we get yet another <code>&lt;span&gt;</code> start tag later it will match as well, following the same steps, which is exactly what we expect for the descendant selector.</p><p>If we then receive a sequence of <code>&lt;/div&gt;</code> end tags our stack will contain only one item:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2Y5I0Az6dXlOkEvQd239bI/356ff04c062c308d94703a74cd2ff773/image5-2.png" />
            
</figure><p>which instructs the VM to jump to the instruction at index 1, effectively rolling back to matching the <i>div</i> component of the selector.</p><p>Remember we mentioned earlier that we can bail out of the matching process if we only have a tag name from the tag scanner and need to obtain more information by running the lexer? With the VM approach it is as easy as stopping the execution of the current instruction and resuming it later, when we get the required information.</p>
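<p>The stack-based rollback can be sketched as follows - again a JavaScript illustration, not the Rust implementation. Instruction pointers saved on stack items are discarded automatically on pops, which gives us rollback for free (the sketch scans the whole stack for hereditary jumps; the real implementation stores an ancestor index instead):</p>

```javascript
// The program compiled from "body > div span" above.
const program = [
  { op: "expect", tag: "body" },   // 0
  { op: "expect", tag: "div" },    // 1
  { op: "hereditary_jmp", to: 3 }, // 2
  { op: "expect", tag: "span" },   // 3
  { op: "match" },                 // 4
];

function createMatcher(program) {
  const stack = []; // open elements: { nextIp, hereditaryIp }
  return function onToken(token) {
    if (token.type === "end") { stack.pop(); return false; }
    // Candidate instruction pointers for this start tag: a fresh program
    // run, the parent's saved continuation, and memoized hereditary jumps.
    const candidates = new Set([0]);
    const parent = stack[stack.length - 1];
    if (parent && parent.nextIp !== null) candidates.add(parent.nextIp);
    for (const item of stack) {
      if (item.hereditaryIp !== null) candidates.add(item.hereditaryIp);
    }
    const frame = { nextIp: null, hereditaryIp: null };
    let matched = false;
    for (const ip of candidates) {
      const ins = program[ip];
      if (ins.op !== "expect" || ins.tag !== token.tag) continue;
      const next = program[ip + 1];
      if (next.op === "hereditary_jmp") frame.hereditaryIp = next.to;
      else if (next.op === "match") matched = true;
      else frame.nextIp = ip + 1; // continuation for direct children
    }
    stack.push(frame);
    return matched;
  };
}
```

<p>Feeding it <code>&lt;body&gt;&lt;div&gt;&lt;span&gt;</code> reports a match on the <code>&lt;span&gt;</code> token; once the <code>&lt;/div&gt;</code> tags pop the stack, a <code>&lt;span&gt;</code> directly under <code>&lt;body&gt;</code> no longer matches, exactly as the rollback description requires.</p>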
    <div>
      <h5>Duplicate selectors</h5>
      <a href="#duplicate-selectors">
        
      </a>
    </div>
<p>As we need a separate program for each selector we want to match, how can we avoid having identical simple components do the same job multiple times? The AST for our selector matching program is a <a href="https://en.wikipedia.org/wiki/Radix_tree">radix tree</a>-like structure whose edge labels are simple selector components and whose nodes are hierarchical combinators. For example, for the following selectors:</p>
            <pre><code>body &gt; div &gt; link[rel]
body &gt; span
body &gt; span a</code></pre>
            <p>we’ll get the following AST:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3LczcYVznE4d44KCIO8JJw/56a4b10a75a124a2dcff5830381885a8/image3-1.png" />
            
</figure><p>If selectors have common prefixes we can match the common part just once for all of them. In the compilation process, we flatten this structure into a vector of instructions.</p>
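<p>The prefix sharing can be sketched with a plain trie (the real structure is radix-tree-like, with edge compression; the component strings and <code>id</code> field here are illustrative assumptions):</p>

```javascript
// Sketch: merge selectors with common prefixes so shared components are
// stored (and matched) only once. Combinators appear as edges too.
function buildSelectorTrie(selectors) {
  const root = { children: new Map(), matches: [] };
  for (const { id, components } of selectors) {
    let node = root;
    for (const part of components) {
      if (!node.children.has(part)) {
        node.children.set(part, { children: new Map(), matches: [] });
      }
      node = node.children.get(part);
    }
    node.matches.push(id); // this selector is fully matched at this node
  }
  return root;
}

// The three selectors from the example, pre-split into components.
const trie = buildSelectorTrie([
  { id: 0, components: ["body", ">", "div", ">", "link[rel]"] },
  { id: 1, components: ["body", ">", "span"] },
  { id: 2, components: ["body", ">", "span", " ", "a"] },
]);
```

<p>The shared “body &gt;” prefix is stored once, and the “span” node both completes the second selector and continues toward the third - matching the AST figure above.</p>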
    <div>
      <h5>[not] JIT-compilation</h5>
      <a href="#not-jit-compilation">
        
      </a>
    </div>
<p>For performance reasons compiled instructions are macro-instructions - they incorporate multiple basic VM instruction calls. This way the VM can execute only one macro instruction per input token. Each of the macro instructions is compiled using the so-called “<a href="/building-fast-interpreters-in-rust/#-not-jit-compilation">[not] JIT-compilation</a>” technique (the same approach to compilation is used in our other Rust project - wirefilter).</p><p>Internally a macro instruction contains an <code>expect</code> and the following <code>jmp</code>, <code>hereditary_jmp</code> and <code>match</code> basic instructions. In that sense macro-instructions resemble <a href="https://en.wikipedia.org/wiki/Microcode">microcode</a>, making it easy to suspend execution of a macro instruction if we need to request attribute information from the lexer.</p>
    <div>
      <h2>What’s next</h2>
      <a href="#whats-next">
        
      </a>
    </div>
<p>It is obviously not the end of the road, but hopefully we’ve got a bit closer to it. There are still multiple bits of functionality that need to be implemented, and certainly there is space for more optimizations.</p><p>If you are interested in the topic, don’t hesitate to join us in the development of <a href="https://github.com/cloudflare/lazyhtml">LazyHTML</a> and <a href="https://github.com/cloudflare/lol-html">LOL HTML</a> on GitHub and, of course, we are always happy to see people passionate about technology here at Cloudflare, so don’t hesitate to <a href="https://www.cloudflare.com/careers">contact us</a> if you are too :).</p>
            <category><![CDATA[Rust]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Workers Sites]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Deep Dive]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">3QbtvZ1bIRvLpb1lE9PPcD</guid>
            <dc:creator>Andrew Galloni</dc:creator>
            <dc:creator>Ivan Nikulin</dc:creator>
        </item>
        <item>
            <title><![CDATA[Too Old To Rocket Load, Too Young To Die]]></title>
            <link>https://blog.cloudflare.com/too-old-to-rocket-load-too-young-to-die/</link>
            <pubDate>Wed, 04 Jul 2018 14:58:21 GMT</pubDate>
            <description><![CDATA[ Rocket Loader is in the news again. One of Cloudflare's earliest web performance products has been re-engineered for contemporary browsers and Web standards.

It controls the load and execution of your JavaScript, ensuring useful, meaningful page content is unblocked and displayed sooner. ]]></description>
            <content:encoded><![CDATA[ <p>Rocket Loader is in the news again. One of Cloudflare's earliest web performance products has been re-engineered for contemporary browsers and Web standards.</p><p>No longer a beta product, Rocket Loader controls the load and execution of your page JavaScript, ensuring useful and meaningful page content is unblocked and displayed sooner.</p><p>For a high-level discussion of Rocket Loader aims, please refer to our sister post, <a href="/we-have-lift-off-rocket-loader-ga-is-mobile/">We have lift off - Rocket Loader GA is mobile!</a></p><p>Below, we offer a lower-level outline of how Rocket Loader actually achieves its goals.</p>
    <div>
      <h3>Prehistory</h3>
      <a href="#prehistory">
        
      </a>
    </div>
<p>Early humans looked upon Netscape 2.0, with its new ability to script HTML using LiveScript, and <code>&lt;BLINK&gt;</code>ed to assure themselves they weren’t dreaming. They decided to use this technology, soon to be re-christened JavaScript (a story told often and elsewhere), for everything they didn’t know they needed: form input validation, image substitution, frameset manipulation, popup windows, and more. The sole requirement was a few interpreted commands enclosed in a <code>&lt;script&gt;</code> tag. The possibilities were endless.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3koyiGY3ofb0n0pf4qzuIn/abff6cdaf842ca710c41d560f4396328/prehistory-_4x.png" />
            
            </figure><p>Soon, the introduction of the <code>src</code> attribute allowed them to import a file full of JS into their pages. Little need to fiddle with the markup, when all the requisite JS for the page could be included in a single, or a few, external files, specified in the page’s <code>&lt;HEAD&gt;</code>. It didn’t take our ancestors long before they decided that the same JS file(s) should be in <i>all</i> pages, throughout their website, containing JS for the complete site. No worries about bloat; after all, the browser would cache it.</p><p>A clear, sunny, road to dynamic, interactive sites lay ahead. What could go wrong?</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3kYBa0v72xkrGHmbZlVApH/de4fe237c97739dc8b8359b4ec122ff1/blockage_4x.png" />
            
            </figure>
    <div>
      <h3>Blockage</h3>
      <a href="#blockage">
        
      </a>
    </div>
<p>Those early JS adopters deduced that when the HTML parser encountered an external script, it suspended visual rendering of the page while it went off to retrieve and execute it. Simple. The more numerous and larger the scripts, the longer the wait for the page to paint. JavaScript, therefore, was very soon, most often unnecessarily, blocking page rendering.</p><p>The solutions poured in, both from the developer community and browser vendors:</p><ul><li><p><i>Community</i>: Move script location to the end of the HTML page. A classic <i>duh!</i> moment. Amazingly, this simple suggestion helped, unless the script was required to help build the page, e.g. using <code>document.write</code> for markup.</p></li><li><p><i>Vendor</i>: Use <code>&lt;script defer&gt;</code>. It’s 1997, and IE4 introduces the <code>defer</code> attribute. Scripts that do not contribute to the initial rendering of the page should be marked with <code>defer</code>, and they will load in parallel, without blocking, and be executed in their markup order before <code>window.load</code> is fired (later, before <code>document.DOMContentLoaded</code>). Script tags could remain in the <code>&lt;head&gt;</code>, and execute as if they were at the end of the page. The main benefit to page rendering was the saving in script retrieval time.</p></li><li><p><i>Community</i>: Reduce latency by reducing actual script size. What began as script <i>obfuscation</i> for intellectual property and vanity reasons quickly became script <a href="https://en.wikipedia.org/wiki/Minification_(programming)"><i>minification</i></a>, still used widely.</p></li><li><p><i>Community</i>: Reduce latency and http handshake instances through concatenation of all scripts, delivered as one.</p></li><li><p><i>Vendor</i>: Use <code>&lt;script async&gt;</code>. In 2010, 13 years (yes, 13, thirteen) after <code>defer</code> was born, HTML5 provided <code>defer</code> with a sibling, <code>async</code>. Scripts can be loaded asynchronously, be non-blocking, and be executed when they load. Markup order is irrelevant to execution order. A clear benefit over <code>defer</code> was that <code>load/DOMContentLoaded</code> events were not delayed.</p></li><li><p><i>Community</i>: Lazy Loading. Use JS to load JS by dynamically creating non-blocking script tags.</p></li><li><p><i>Cloudflare</i>: Rocket Loader. It's 2011, and Cloudflare enters the fray, leveraging our network to reduce http requests for 1st party scripts, “bag”ging 3rd party scripts into a single file, and delaying and controlling JS execution. See <a href="/we-have-lift-off-rocket-loader-ga-is-mobile/">Combining Javascript &amp; CSS, a Better Way</a></p></li><li><p><i>Vendor</i>: Use <code>&lt;link rel="preload"&gt;</code> in the <code>&lt;head&gt;</code>. Important resources like scripts, in our case, can be specified for <i>preload</i>. The browser will load scripts in parallel and not block render-parsing.</p></li></ul>
    <div>
      <h3>Rocket Loader, The Early Years</h3>
      <a href="#rocket-loader-the-early-years">
        
      </a>
    </div>
<p>Much has been written in this blog space about <a href="/tag/rocketloader/">Rocket Loader</a>, from its <a href="/how-cloudflare-rocket-loader-redefines-the-modern-cdn/">initial launch</a>, to the <a href="/we-have-lift-off-rocket-loader-ga-is-mobile/">current one</a>.</p><p>If reading outdated blog posts is not your thing, perhaps watching an extremely short video of a high-profile early Rocket Loader success (June 9, 2011) is: <a href="https://vimeo.com/24900882">CloudFlare Rocket Loader makes the Financial Times website (FT.com) faster</a></p><p>Rocket Loader improved page load times by:</p><ol><li><p>Minimising network requests through the bundling of JS files, including third-party, speeding up page rendering</p></li><li><p>Asynchronously loading the bundles, avoiding HTML parsing blockage</p></li><li><p>Caching scripts locally (using LocalStorage), reducing refetch requests.</p></li></ol><p>As browsers matured, Rocket Loader fell behind, leading to several severe shortcomings:</p><ul><li><p>It did not honour Content-Security-Policy. Rocket Loader was unaware of CSP headers, and loaded scripts indiscriminately.</p></li><li><p>It did not honour Subresource Integrity. Rocket Loader loaded scripts through XHR, so browsers could not validate the fetched script.</p></li><li><p>It allowed for <a href="https://www.cloudflare.com/learning/security/threats/cross-site-scripting/">XSS</a> persistence. Since Rocket Loader stored scripts in LocalStorage, a site’s compromised script could exist as a trojan in a customer’s storage, loading whenever the customer visited the site.</p></li><li><p>It was just out-of-date</p><ul><li><p>Script bundling fell out of favour with the introduction of http2.</p></li><li><p>The use of <code>eval()</code> was finally recognised as <i>evil</i>.</p></li><li><p>Mobile use skyrocketed; mobile browsers became sophisticated; eventually Rocket Loader was unable to support mobile.</p></li></ul></li></ul>
    <div>
      <h3>New and Improved Rocket Loader</h3>
      <a href="#new-and-improved-rocket-loader">
        
      </a>
    </div>
    <p>We recently rebuilt Rocket Loader from the ground up.</p><p>Although our aim remains the same, to improve customer page performance, we incorporated lessons learned. Most importantly, we learned not to aim too high. In order to satisfy all permutations of page layout, the old Rocket Loader created a virtual DOM, a decision that ultimately led to fragility. We've gone the simple, elegant route, knowing full well that there will be a minority of websites that will not benefit.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/515Gnp9rXBxLS4eicMWX5A/49004b2f6d659dff5b9bf10fba657629/new-and-improved-_4x.png" />
            
</figure><p>The main concept behind Rocket Loader is quite straightforward: execute blocking scripts after all other page assets have loaded.</p><p>The scripts need to be loaded and executed in the originally intended order. Only external blocking scripts hold up other page resources, but any script may rely on another one. We must simulate the <i>loading process of scripts</i>, mimicking how the browser would handle them during page load, but do it <i>after the page has actually fully loaded</i>.</p>
    <div>
      <h3>On the Server</h3>
      <a href="#on-the-server">
        
      </a>
    </div>
<p>Rocket Loader has both a server-side and a client-side component. The goal of the former is to</p><ol><li><p>rewrite <code>&lt;script&gt;</code> tags in the page markup to make them non-executable, and</p></li><li><p>insert the client-side component of Rocket Loader into the page.</p></li></ol><p>The server-side component is built on top of our CF-HTML pipeline. CF-HTML is an nginx module that provides streaming HTML parsing and rewriting functionality with a SAX-style (<a href="https://en.wikipedia.org/wiki/Simple_API_for_XML">Simple API for XML</a>) API on top of it.</p><p>To make the scripts non-executable, we simply prefix their <code>type</code> attribute value with a randomly generated value (<a href="https://en.wikipedia.org/wiki/Cryptographic_nonce">nonce</a>), unique for each page request. Having a unique prefix for each page prevents Rocket Loader from being used as an XSS gadget to bypass various XSS filters.</p><p>Markup that looked like this:</p>
            <pre><code>&lt;!DOCTYPE html&gt;
&lt;html&gt;
  &lt;head&gt;
    &lt;script src="example.org/1.js"&gt;&lt;/script&gt;
    &lt;script src="example.org/2.js" type="text/javascript"&gt;&lt;/script&gt;
  &lt;/head&gt;
  &lt;body&gt;
    ...body markup... 
    &lt;script src="example.org/3.js" type="text/javascript"&gt;&lt;/script&gt;
    ...more body markup... 
  &lt;/body&gt;
&lt;/html&gt;</code></pre>
            <p>becomes:</p>
            <pre><code>&lt;!DOCTYPE html&gt;
&lt;html&gt;
  &lt;head&gt;
    &lt;script src="example.org/1.js" type="42deadbeef-"&gt;&lt;/script&gt;
    &lt;script src="example.org/2.js" type="42deadbeef-text/javascript"&gt;&lt;/script&gt;
  &lt;/head&gt;
  &lt;body&gt;
    ...body markup... 
    &lt;script src="example.org/3.js" type="42deadbeef-text/javascript"&gt;&lt;/script&gt;
    ...more body markup... 
    &lt;script src="https://ajax.cloudflare.com/rocket-loader.js"
            data-cf-nonce="42deadbeef" defer&gt;
  &lt;/body&gt;
&lt;/html&gt;</code></pre>
            <p>So far, no rocket science, but by making most, or all, scripts non-executable, Rocket Loader has unblocked page-parsing. Browsers display content sooner, improving perceived page load metrics, and engaging the user.</p>
    <div>
      <h4>On The Client</h4>
      <a href="#on-the-client">
        
      </a>
    </div>
    <p>Generally, scripts can be divided into four categories, each having distinct load and execution behaviours when inserted into the DOM:</p><ol><li><p><i>Inline scripts</i> - executed immediately upon insertion.</p></li><li><p><i>External blocking scripts</i> - start loading upon insertion, preventing other scripts from loading and executing.</p></li><li><p><i>External</i> <code>defer</code> <i>scripts</i> - start loading upon insertion, without preventing other scripts from loading and executing. Execution should happen right before <code>DOMContentLoaded</code> event.</p></li><li><p><i>External </i><code><i>async</i></code><i> scripts</i> - start loading upon insertion, without preventing other scripts from loading and executing. Executed when loaded.</p></li></ol>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4jwj7cYA0lu9sgSsf9boRx/86993ee16e0cd1eda9a3178017961b53/loadExecute1.png" />
            
            </figure><p>Modified diagram from <a href="https://html.spec.whatwg.org/#attr-script-defer">HTML Standard</a></p><p>To handle load and execution of all script types, Rocket Loader needs two passes.</p>
    <div>
      <h4>Pass One</h4>
      <a href="#pass-one">
        
      </a>
    </div>
    <p>On the first pass, we collect all scripts with our nonce onto a stack, then re-insert them into the DOM, with nonce removed, and wrapped in a comment node. These serve as our placeholders.</p>
            <pre><code>&lt;!DOCTYPE html&gt;
&lt;html&gt;
  &lt;head&gt;
    &lt;!--&lt;script src="example.org/1.js"&gt;&lt;/script&gt;--&gt; 
    &lt;!--&lt;script src="example.org/2.js" type="text/javascript"&gt;&lt;/script&gt;--&gt;
  &lt;/head&gt;
  &lt;body&gt;
    ...body markup...
    &lt;!--&lt;script src="example.org/3.js" type="text/javascript"&gt;&lt;/script&gt;--&gt;
    ...more body markup... 
    &lt;script src="https://ajax.cloudflare.com/rocket-loader.js"
            data-cf-nonce="42deadbeef" defer&gt;
  &lt;/body&gt;
&lt;/html&gt;</code></pre>
            <p>Rocket Loader now iterates through the scripts in our stack and re-inserts them, maintaining their intended position in relevant DOM collections (<code>document.scripts</code>, <code>document.querySelectorAll("script")</code>, <code>document.getElementsByTagName("script")</code>, etc.).</p><p>This process of script insertion and execution differs for each script category:</p><p><i>Inline scripts</i> - Placeholder is replaced with the original script element, without nonce, making the script executable. Browsers execute such scripts immediately upon insertion, <i>in the same execution tick</i>.</p><p><i>External blocking scripts</i> - As above, but Rocket Loader waits for the script’s <code>load</code> event before unwinding the script stack further. This delay simulates the script's blocking behaviour <b><i>manually</i></b>. Only parser-inserted external scripts (i.e. scripts present in the original HTML markup) are naturally blocking. External scripts inserted or created via a DOM API are considered async. This behaviour can’t be overridden, so we need our simulation.</p><p><i>External </i><code><i>async</i></code><i> scripts</i> - The same insertion procedure as inline scripts. Browsers treat all inserted external scripts as async, so the default behaviour suits us.</p><p><i>External </i><code><i>defer</i></code><i> scripts</i> - These are not executed during the first pass, since in the simulated environment we haven’t reached the <code>DOMContentLoaded</code> event yet. If we encounter a <code>defer</code> script on the stack we re-insert it, as is, without removing the nonce prefix. It remains non-executable, but in the correct DOM position.</p>
    <div>
      <h4>Pass Two</h4>
      <a href="#pass-two">
        
      </a>
    </div>
    <p>The second pass loads the <code>defer</code> scripts. Again, Rocket Loader collects all scripts with the nonce prefix (these are now just <code>defer</code> scripts) onto the execution stack, but does not replace them with placeholders. They remain in the DOM, since at this point in our simulated environment the complete document has loaded. We then activate them by replacing the <code>&lt;script&gt;</code> elements with themselves, nonce removed, and let the browser do the rest.</p>
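<p>The ordering rules of the two passes can be summarised in a DOM-free sketch (an illustration of the activation order only - the script objects and <code>run()</code> callback are assumptions, not Rocket Loader's API):</p>

```javascript
// Each collected script is { kind: "inline"|"blocking"|"async"|"defer", run }
// where run() returns a promise resolving when the script has executed.
async function activate(scripts, log) {
  const deferred = [];
  // Pass one: inline and blocking scripts execute strictly in markup order
  // (we await them, simulating parser blocking); async scripts are fired
  // off without waiting; defer scripts stay inactive for now.
  for (const s of scripts) {
    if (s.kind === "defer") { deferred.push(s); continue; }
    const done = s.run();
    if (s.kind !== "async") await done;
  }
  // Pass two: with the simulated document "fully parsed", activate the
  // defer scripts in markup order, then signal simulated DOMContentLoaded.
  for (const s of deferred) await s.run();
  log.push("DOMContentLoaded");
}
```

<p>A <code>defer</code> script placed before an inline script still executes after it, right before the simulated <code>DOMContentLoaded</code>, mirroring the browser behaviour described above.</p>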
    <div>
      <h4>Quirks I: Taming the Waterfall</h4>
      <a href="#quirks-i-taming-the-waterfall">
        
      </a>
    </div>
    <p>Ostensibly, we have now simulated browser script loading and execution behaviours. However, there are some one-off issues we must deal with, quirks if you will.</p><p>There is one not-so-obvious difference between our algorithm and the real behaviour of browsers. Modern browsers try to be clever with the way they manage page resources, engaging various heuristics to improve performance during page load. These are, generally, implementation-specific and not set-in-stone by any specification.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1UoWRV4FFktlhVFLGetJeC/093faf054eb7ba695d3f1ef05a184d3a/noRocketCroppedSm.png" />
            
</figure><p>One such optimisation that affects us is <i>speculative parsing</i>. Despite the official specification requiring a browser to block parsing on script execution, browsers continue parsing received HTML markup speculatively, and prefetch found external resources. For example, even with blocking scripts on a page, Chrome loads them simultaneously, in parallel.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1uUQ3PUV8f5FCh9SCY2Y6V/370f69793a57e917b9765e28a7a99406/prevRocketCroppedSm-1.png" />
            
            </figure><p>With Rocket Loader, browsers don’t prefetch scripts, as our nonce makes them non-executable during page load. Later, when we sequentially re-insert activated scripts, we witness a sequential “waterfall” graph.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1tjsG9hdMmd8QDZduzsrXF/bfbd69cda8f237c8eed1826a1196b58f/withRocketCroppedSm.png" />
            
            </figure><p>In our attempt to improve page load performance, we significantly slowed down some script loading. Ironic. Fortunately, we have a workaround: we can insert <i>preload hints</i> (see <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Preloading_content">Preloading content with rel="preload"</a>) before we begin unwinding our script stack, giving the browser notice that we’ll soon be requiring these scripts. It begins fetching them as it would do during speculative parsing.</p><p>Our waterfall is replaced with improved parallel loading and better load metrics.</p>
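<p>The shape of the generated hints is simple; a string-level sketch (in the real client the hints are created as DOM <code>&lt;link&gt;</code> elements rather than markup strings):</p>

```javascript
// Sketch: before unwinding the script stack, emit a preload hint for each
// external script URL so the browser starts fetching them in parallel,
// recovering the behaviour of speculative parsing.
function preloadHints(urls) {
  return urls.map(url => `<link rel="preload" as="script" href="${url}">`);
}
```
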
    <div>
      <h4>Quirks II: <code>document.write()</code> is not dead yet</h4>
      <a href="#quirks-ii-document-write-is-not-dead-yet">
        
      </a>
    </div>
<p>We've simulated script execution and insertion. We still need to deal with dynamic markup insertion. We can’t use <code>document.write()</code> directly, since the document is already parsed and <code>document.close()</code> has been implicitly executed. Calling <code>write()</code> will create a new document, erasing the entire current document. We must manually parse content created by the <code>document.write</code> function and insert it in the intended location.</p><p>Not so simple, if one considers that <code>document.write</code> can insert partial markup. In the following example, if we parse and insert content on the first <code>document.write</code> call, we’ll completely ignore the completion of the <code>id</code> attribute that should be inserted with the second call:</p>
            <pre><code>document.write('&lt;div id="elm');
document.write(Date.now());
document.write('"&gt;some content&lt;/div&gt;');</code></pre>
            <p>So, we have a hard choice:</p><ul><li><p>We can buffer <i>all</i> content inserted via <code>document.write</code> during script execution and flush it afterwards, in which case already executed code expecting elements to be in the DOM will fail, or</p></li><li><p>We can flush inserted markup immediately, but not handle partial markup writes.</p></li></ul><p>Choosing the lesser of two evils, we decided to go with the first option: our observations showed cases like these are more common.</p><p>(Actually, there is a third option that allows for handling of both cases, but it requires proxying of a significant number of DOM APIs, a rabbit hole that we don’t want to dive into, KISS FTW, you know…).</p>
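<p>The chosen strategy - buffer everything written during a script's execution and flush it as one string afterwards - can be sketched as (an illustration; the names are not Rocket Loader's internals):</p>

```javascript
// Sketch: partial markup written across several document.write calls is
// concatenated in a buffer, so it is well-formed by the time it is parsed
// and inserted after the script finishes executing.
function makeBufferedWriter() {
  let buffer = "";
  return {
    write(chunk) { buffer += chunk; }, // stands in for document.write
    flush() { const out = buffer; buffer = ""; return out; },
  };
}
```

<p>Running the three partial writes from the example above through such a buffer yields one complete, well-formed <code>&lt;div&gt;</code> tag on flush.</p>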
    <div>
      <h4>Quirks III: I ain't got no<code>&lt;body&gt;</code></h4>
      <a href="#quirks-iii-i-aint-got-no-body">
        
      </a>
    </div>
    <p>As mentioned, it’s not enough to just insert parsed markup. There are various modifications of the DOM performed by the parser during full document parsing that contend with <i>malformed markup</i>. We felt we should simulate at least some of them, because, well… scripts may rely on malformed markup.</p><p>Our initial implementation even included simulation of relatively exotic mechanisms such as <a href="https://html.spec.whatwg.org/multipage/parsing.html#unexpected-markup-in-tables"><i>foster parenting</i></a>, but eventually we decided to keep things simple and the only thing that Rocket Loader simulates is the squeezing out of unallowed content from the <code>&lt;head&gt;</code> element.</p><p>To perform this simulation we wrap our <code>document.write</code> buffer in a <code>&lt;head&gt;</code> element and feed this markup to the <a href="https://developer.mozilla.org/en-US/docs/Web/API/DOMParser">DOM Parser</a>.</p><p>Using the resulting document from the parser, we identify all nodes in its <code>&lt;head&gt;</code> and move them into the page, immediately following the script that performed the <code>document.write</code>. If we encounter any nodes in the parsed document's <code>&lt;body&gt;</code> element, we copy all nodes that follow the current script to the <code>&lt;body&gt;</code> element, prepended with the nodes in the parsed document.</p><p>To illustrate this simulation, consider the following page markup:</p>
            <pre><code>&lt;!DOCTYPE&gt;
&lt;head&gt;
  &lt;script&gt;
    document.write('&lt;link rel="stylesheet" href="1.css"&gt;');
    document.write('&lt;div&gt;&lt;/div&gt;');
    document.write('&lt;link rel="stylesheet" href="2.css"&gt;');
  &lt;/script&gt;
  &lt;link rel="stylesheet" href="3.css"&gt;
&lt;/head&gt;
&lt;body&gt;
  &lt;div&gt;Hey!&lt;/div&gt;
&lt;/body&gt;</code></pre>
            <p>The buffered, dynamically inserted, markup after script execution will be</p>
<pre><code>&lt;link rel="stylesheet" href="1.css"&gt;
&lt;div&gt;&lt;/div&gt;
&lt;link rel="stylesheet" href="2.css"&gt;</code></pre>
            <p>and the string that we’ll feed to the DOMParser will be</p>
            <pre><code>&lt;!DOCTYPE&gt;
&lt;head&gt;
  &lt;link rel="stylesheet" href="1.css"&gt;
  &lt;div&gt;&lt;/div&gt;
  &lt;link rel="stylesheet" href="2.css"&gt;
&lt;/head&gt;</code></pre>
            <p>The parser will produce the following document structure from the provided markup (note that <code>&lt;div&gt;</code> is not allowed in <code>&lt;head&gt;</code> and was squeezed out to the <code>&lt;body&gt;</code>):</p>
            <pre><code>&lt;!DOCTYPE&gt;
&lt;html&gt;
&lt;head&gt;
  &lt;link rel="stylesheet" href="1.css"&gt;
&lt;/head&gt;
&lt;body&gt;
  &lt;div&gt;&lt;/div&gt;
  &lt;link rel="stylesheet" href="2.css"&gt;
&lt;/body&gt;
&lt;/html&gt;</code></pre>
            <p>Now we move all nodes that we found in parsed document's <code>&lt;head&gt;</code> to the original document:</p>
            <pre><code>&lt;!DOCTYPE&gt;
&lt;head&gt;
  &lt;script&gt;
    document.write('&lt;link rel="stylesheet" href="1.css"&gt;');
    document.write('&lt;div&gt;&lt;/div&gt;');
    document.write('&lt;link rel="stylesheet" href="2.css"&gt;');
  &lt;/script&gt;
  &lt;link rel="stylesheet" href="1.css"&gt;
  &lt;link rel="stylesheet" href="3.css"&gt;
&lt;/head&gt;
&lt;body&gt;
  &lt;div&gt;Hey!&lt;/div&gt;
&lt;/body&gt;</code></pre>
            <p>We see that the parsed document's <code>&lt;body&gt;</code> contains some nodes, so we prepend them to the original document’s <code>&lt;body&gt;</code>:</p>
            <pre><code>&lt;!DOCTYPE&gt;
&lt;head&gt;
  &lt;script&gt;
    document.write('&lt;link rel="stylesheet" href="1.css"&gt;');
    document.write('&lt;div&gt;&lt;/div&gt;');
    document.write('&lt;link rel="stylesheet" href="2.css"&gt;');
  &lt;/script&gt;
  &lt;link rel="stylesheet" href="1.css"&gt;
  &lt;link rel="stylesheet" href="3.css"&gt;
&lt;/head&gt;
&lt;body&gt;
  &lt;div&gt;&lt;/div&gt;
  &lt;link rel="stylesheet" href="2.css"&gt;
  &lt;div&gt;Hey!&lt;/div&gt;
&lt;/body&gt;</code></pre>
            <p>And as a final step, we move all nodes in the <code>&lt;head&gt;</code> that initially followed the current script to a position after the nodes we’ve just inserted in the <code>&lt;body&gt;</code>:</p>
            <pre><code>&lt;!DOCTYPE&gt;
&lt;head&gt;
  &lt;script&gt;
    document.write('&lt;link rel="stylesheet" href="1.css"&gt;');
    document.write('&lt;div&gt;&lt;/div&gt;');
    document.write('&lt;link rel="stylesheet" href="2.css"&gt;');
  &lt;/script&gt;
  &lt;link rel="stylesheet" href="1.css"&gt;
&lt;/head&gt;
&lt;body&gt;
  &lt;div&gt;&lt;/div&gt;
  &lt;link rel="stylesheet" href="2.css"&gt;
  &lt;link rel="stylesheet" href="3.css"&gt;
  &lt;div&gt;Hey!&lt;/div&gt;
&lt;/body&gt;</code></pre>
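            <p>The simulation above can be sketched roughly as follows. This is a simplified illustration with hypothetical names (<code>simulateWrite</code> and its parameters), not Rocket Loader's actual code:</p>

```javascript
// Sketch of the <head>-squeeze simulation (hypothetical names).
// `buffer` holds the markup collected from document.write calls;
// `script` is the element whose writes we are simulating.
function simulateWrite(buffer, script) {
  // Wrap the buffer in <head> so the parser applies its rules for
  // content that is not allowed there.
  const doc = new DOMParser().parseFromString(
    '<head>' + buffer + '</head>', 'text/html');

  // Remember the <head> nodes that originally follow the script:
  // these may have to move to <body> as well.
  const following = [];
  for (let n = script.nextSibling; n; n = n.nextSibling) following.push(n);

  // Nodes the parser kept in <head> go right after the writing script.
  script.after(...doc.head.childNodes);

  // Anything squeezed out to <body> is prepended to the page's <body>,
  // followed by the nodes that originally trailed the script.
  if (doc.body.childNodes.length > 0) {
    document.body.prepend(...doc.body.childNodes, ...following);
  }
}
```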
            
    <div>
      <h4>Quirks IV: Handling handlers</h4>
      <a href="#quirks-iv-handling-handlers">
        
      </a>
    </div>
    <p>There is one edge case which drastically changes the behaviour of our script-loading simulation. If we encounter elements with inline event handlers in the HTML markup, we need to execute <i>all scripts that precede such elements</i>, since the handlers may rely on them.</p><p>We insert the Rocket Loader client-side script in a special "bailout" mode immediately before such elements. In bailout mode, we activate scripts the same way as in regular mode, except we do it in a blocking manner (remember, we need to prevent the element from being parsed until all preceding scripts have been activated).</p><p>As noted, it’s impossible to dynamically create blocking external scripts using DOM APIs such as <code>document.head.appendChild</code>. However, we have a solution to overcome this limitation.</p><p>Since the page is still loading, we can <code>document.write</code> the <code>outerHTML</code> of an activatable script into the document, forcing the browser to mark it as parser-inserted and, thus, blocking. However, the script will be inserted at a DOM position different from its original, intended position, which may break traversal of surrounding nodes from within the script (e.g. using <code>document.currentScript</code> as a starting point).</p><p>There is a trick: browsers parse content generated by <code>document.write</code> in the same execution tick as the call that produced it, so we have immediate access to the written element. The <i>execution</i> of the external script, however, is always scheduled for one of the <i>next</i> ticks. So we can simply move the script to its original position right after writing it, and it will await execution in the correct DOM position.</p>
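    <p>A minimal sketch of this trick, with hypothetical names (<code>writeBlockingScript</code>, <code>anchor</code>) standing in for the real implementation:</p>

```javascript
// Sketch of the "bailout" blocking activation trick (hypothetical names).
// `script` is the activatable <script> element; `anchor` marks its
// original, intended position in the document.
function writeBlockingScript(script, anchor) {
  // Writing the markup while the page is still loading makes the
  // browser treat the script as parser-inserted, and thus blocking.
  document.write(script.outerHTML);

  // The written markup is parsed in the same execution tick, so the
  // element already exists; its *execution* is scheduled for a later
  // tick, which gives us time to relocate it.
  const written = document.scripts[document.scripts.length - 1];

  // Move the script to its intended position before it runs, so code
  // that traverses from document.currentScript still sees the
  // expected surroundings.
  anchor.after(written);
}
```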
    <div>
      <h4>"I can resist everything except temptation"<a href="#fn1">[1]</a></h4>
      <a href="#i-can-resist-everything-except-temptation">
        
      </a>
    </div>
    <p>The temptation to account for every quirk, every variation in browser parsing, is strong, but giving in to it would eventually only weaken our product. We've handled the better part of browser parser behaviours, enough to benefit the majority of our customers.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5vV1rYeaoGyrndNCqb0qyq/733478aacd708279692d6dc75d4d085b/rock-house-_4x.png" />
            
            </figure>
    <div>
      <h3>What's Next?</h3>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>As Rocket Loader matures, and inevitably is affected by changes in Web technologies, it may be expanded and improved. For now, we're monitoring its use, identifying issues, and ensuring that it's worthy of its predecessor, which lasted through so many advances and changes in Web technology.</p><hr /><ol><li><p>Oscar Wilde, Lady Windermere's Fan (1892), and apologies to <a href="http://jethrotull.com/too-old/">Jethro Tull</a> for the blog post title. <a href="#fnref1">↩︎</a></p></li></ol> ]]></content:encoded>
            <category><![CDATA[Rocket Loader]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Speed & Reliability]]></category>
            <category><![CDATA[Optimization]]></category>
            <guid isPermaLink="false">6cHm33ksyTK14oTt1047Nr</guid>
            <dc:creator>Peter Belesis</dc:creator>
            <dc:creator>Ivan Nikulin</dc:creator>
        </item>
    </channel>
</rss>