
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Mon, 13 Apr 2026 20:31:02 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Zaraz uses Workers to make third-party tools secure and fast]]></title>
            <link>https://blog.cloudflare.com/zaraz-use-workers-to-make-third-party-tools-secure-and-fast/</link>
            <pubDate>Wed, 08 Dec 2021 14:00:00 GMT</pubDate>
            <description><![CDATA[ Zaraz fundamentally changes how third parties are loaded on the web. Learn how we built it from the ground up, and why we chose Cloudflare Workers to power it. ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4WT0OzykEX9fSzalaoD8GP/138b4dab26180f9995f46a9172708078/image4-14.png" />
            
            </figure><p>We decided to create Zaraz around the end of March 2020. We were working on another product when we noticed everyone was asking us about the performance impact of having many third parties on their website. Third-party content is an important part of the majority of websites today, powering analytics, chatbots, conversion pixels, widgets — you name it. A third party is an asset, often JavaScript, hosted outside the primary site-user relationship, that is not under the direct control of the site owner but is present with ‘approval’. <a href="/cloudflare-acquires-zaraz-to-enable-cloud-loading-of-third-party-tools">Yair wrote in detail about the process of measuring the impact of these third-party tools, and how we pivoted our startup</a>, but I wanted to write about how we built Zaraz and what it actually does behind the scenes.</p><p>Third parties are great in that they let you integrate ready-made solutions with your website, and you barely need to do any coding. Analytics? Just drop this code snippet. Chat widget? Just add this one. Third-party vendors will usually instruct you on how to add their tool, and from that point on things should just be working. Right? But when you add third-party code, it usually fetches even more code from remote sources, meaning you have less and less control over whatever is happening in your visitors’ browsers. How can you guarantee that none of the multitude of third parties you have on your website was hacked and started <a href="https://www.theregister.com/2018/12/12/ticketmaster_denies_fault_website_magecart_infection/">stealing information</a>, <a href="https://www.wired.co.uk/article/browsealoud-ico-texthelp-cryptomining-how-cryptomining-work">mining cryptocurrencies</a>, or logging key presses on your visitors' computers?</p><p>It doesn’t even have to be a deliberate hack. As we investigated more and more third-party tools, we noticed a pattern — sometimes it’s easier for a third-party vendor to collect everything, rather than being selective or careful about it. More often than not, user emails would find their way into a third-party tool, which could very easily put the website owner in trouble under GDPR, CCPA, or similar regulations.</p>
    <div>
      <h2>How third-party tools work today</h2>
      <a href="#how-third-party-tools-work-today">
        
      </a>
    </div>
    <p>Usually, when you add a third party to your page, you’re asked to add a piece of JavaScript code to the <code>&lt;head&gt;</code> of your HTML. Google Analytics is by far the most popular third party, so let’s see how it’s done there:</p>
            <pre><code>&lt;!-- Google Analytics --&gt;
&lt;script&gt;
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-XXXXX-Y', 'auto');
ga('send', 'pageview');
&lt;/script&gt;
&lt;!-- End Google Analytics --&gt;</code></pre>
            <p>In this case, and in most other cases, the snippet that you’re pasting actually calls more JavaScript code to be executed. The snippet above creates a new <code>&lt;script&gt;</code> element, gives it the <code>https://www.google-analytics.com/analytics.js</code> <code>src</code> attribute, and appends it to the DOM. The browser then loads the <code>analytics.js</code> script, which includes more JavaScript code than the snippet itself, and sometimes asks the browser to download even more scripts, some of them bigger than <code>analytics.js</code> itself. So far, however, no analytics data has been captured at all, even though capturing it is why you added Google Analytics in the first place.</p><p>The last line in the snippet, <code>ga('send', 'pageview');</code>, uses a function defined in the <code>analytics.js</code> file to finally <code>send</code> the <code>pageview</code>. The function is needed because it is what actually captures the analytics data — it collects the browser type, the screen resolution, the language, and so on. It then constructs a URL that includes all this data and sends a request to it. It’s only after this step that the analytics information gets captured. Every user behavior event you record using Google Analytics will result in another request.</p><p>The reality is that the vast majority of tools consist of more than one resource file, and that it’s practically impossible to know in advance what a tool is going to load without testing it on your website. You can use <a href="https://requestmap.webperf.tools/">Request Map Generator</a> to get a visual representation of all the resources loaded on your website, including how they call each other. Below is a Request Map of a demo e-commerce website we created:</p>
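The URL-building step described above can be sketched in a few lines. This is an illustration only: the endpoint and parameter names below are placeholders, not Google's actual wire format.

```javascript
// Sketch of how an analytics script assembles its beacon URL from collected
// page data. The endpoint and parameter names are illustrative placeholders.
function buildBeaconUrl(endpoint, data) {
  const params = new URLSearchParams(data);
  return endpoint + '?' + params.toString();
}

const beacon = buildBeaconUrl('https://www.google-analytics.com/collect', {
  t: 'pageview',                        // event type
  dl: 'https://example.com/pricing',    // document location
  sr: '1920x1080',                      // screen resolution
  ul: 'en-us',                          // user language
});
```

Every tracked event repeats this collect-then-request cycle, which is why each additional tool multiplies the browser's network work.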
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6d0x1iH8o1cnjj1qJkJXt2/9cf3432bf125d8e6cd4bee4136a6d633/image6-6.png" />
            
            </figure><p>That big blue circle is our website’s resources, and all other circles are third-party tools. You can see how the big green circle is actually a sub-request of the main Facebook pixel (fbevents.js), and how many tools, like LinkedIn at the top right, create a redirect chain in order to sync some data, at the expense of forcing the browser to make more and more network requests.</p>
    <div>
      <h2>A new place to run a tag manager — the edge</h2>
      <a href="#a-new-place-to-run-a-tag-manager-the-edge">
        
      </a>
    </div>
    <p>Since we want to make third parties faster, more secure, and private, we had to develop a fundamentally new way of thinking about them and a new system for how they run. We came up with a plan: build a platform where third parties can run code outside the browser, while still getting access to the information they need and being able to talk with the DOM when necessary. We don’t believe third parties are evil: they never intended to slow down the Internet for everyone, they just didn’t have another option. Being able to run code on the edge and run it fast opened up new possibilities and changed all that, but the transition is hard.</p><p>By moving third-party code to run outside the browser, we get multiple wins.</p><ul><li><p>The website will load faster and be more interactive. The browser rendering your website can now focus on the most important thing — your website. The downloading, parsing, and execution of all the third-party scripts will no longer compete with, or even block, the rendering and interactivity of your website.</p></li><li><p>Control over the data sent to third parties. Third-party tools often automatically collect information from the page and from the browser to, for example, measure site behaviour/usage. In many cases, <a href="https://www.backblaze.com/blog/privacy-update-third-party-tracking/">this information should stay private</a>. For example, most tools collect the <code>document.location</code>, but we often see a “reset password” page including the user email in the URL, meaning emails are unknowingly being sent and saved by third-party providers, usually without consent. Moving the execution of the third parties to the edge means we have full visibility into what is being sent. This means we can provide alerts and filters in case tools are trying to collect Personally Identifiable Information, or mask the private parts of the data before they reach third-party servers. 
This feature is currently not available in the public beta, but contact us if you want to start using it today.</p></li><li><p>By reducing the amount of code being executed in the browser and by scanning all code that is executed in it, we can continuously verify that the code hasn’t been tampered with and that it only does what it is intended to do. We are working to connect Zaraz with <a href="https://www.cloudflare.com/page-shield/">Cloudflare Page Shield</a> to do this automatically.</p></li></ul><p>When you configure a third-party tool through a normal tag manager, a lot happens in the browsers of your visitors that is out of your control. The tag manager loads and then evaluates all trigger rules to decide which tools to load. It then usually appends the script tags of those tools to the DOM of the page, making the browser fetch the scripts and execute them. These scripts come from untrusted or unknown origins, increasing the risk of malicious code execution in the browser. They can also block the browser from becoming interactive until they are completely executed. They are generally free to do whatever they want in the browser, but most commonly they collect some information and send it to some endpoint on the third-party server. With Zaraz, the browser essentially does none of that.</p>
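The in-browser injection step a traditional tag manager performs can be sketched as follows. The `injectTool` helper is hypothetical, and `doc` is passed in to make the logic visible; in a real page it would simply be `document`.

```javascript
// Sketch of what a traditional tag manager does for each tool whose trigger
// rules matched: append a <script> element so the browser fetches and runs
// third-party code from another origin.
function injectTool(doc, src) {
  const script = doc.createElement('script');
  script.async = true;   // avoid blocking parsing, but the code still runs in the browser
  script.src = src;      // an origin the site owner must fully trust
  doc.head.appendChild(script);
  return script;
}
```

Each injected script is then free to inject further scripts of its own, which is how a single snippet fans out into the request maps shown earlier.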
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4FfeOuSS5KXe1m3JtPX8LB/9c7de59bbd06987f1f2a9b9fdac17df0/BLOG-713---Pageload.png" />
            
            </figure>
    <div>
      <h2>Choosing Cloudflare Workers</h2>
      <a href="#choosing-cloudflare-workers">
        
      </a>
    </div>
    <p>When we set about coding Zaraz, we quickly understood that our infrastructure decisions would have a massive impact on our service. In fact, choosing the wrong one could mean we have no service at all. The most common alternative to Zaraz is traditional Tag Management software. They generally have no server-side component: whenever a user “publishes” a configuration, a JavaScript file is rendered and hosted as a static asset on a CDN. With Zaraz the idea is to move most of the evaluation of code out of the browser, and respond with a dynamically generated JavaScript code each time. We needed to find a solution that would allow us to have a server-side component, but would be as fast as a <a href="https://www.cloudflare.com/learning/cdn/what-is-a-cdn/">CDN</a>. Otherwise, there was a risk we might end up slowing down websites instead of making them faster.</p><p>We needed Zaraz to be served from a place close to the visiting user. Since setting up servers all around the world seemed like too big of a task for a very young startup, we looked at a few distributed serverless platforms. We approached this search with a small list of requirements:</p><ul><li><p><b>Run JavaScript:</b> Third-party tools all use JavaScript. If we were to port them to run in a cloud environment, the easiest way to do so would be to be able to use JavaScript as well.</p></li><li><p><b>Secure:</b> We are processing sensitive data. We can’t afford the risk of someone hacking into our EC2 instance. We wanted to make sure that data doesn’t stay on some server after we sent our HTTP response.</p></li><li><p><b>Fully programmable:</b> Some CDNs allow setting complicated rules for handling a request, but altering HTTP headers, setting redirects or HTTP response codes isn’t enough. We need to generate JavaScript code on the fly, meaning we need full control over the responses. 
We also needed to use some external JavaScript libraries.</p></li><li><p><b>Extremely fast and globally distributed:</b> In the very early stages of the company, we already had customers in the USA, Europe, India, and Israel. As we were preparing to show them a Proof of Concept, we needed to be sure it would be fast wherever they are. We were competing with tag managers and Customer Data Platforms that have a pretty fast response time, so we needed to be able to respond as fast as if our content was statically hosted on a CDN, or faster.</p></li></ul><p>Initially we thought we would need to create Docker containers that would run around the globe and would use their own HTTP server, but then a friend from our Y Combinator batch said we should check out Cloudflare Workers.</p><p>At first, we thought it wouldn’t work — Workers doesn’t work like a Node.js application, and we felt that limitation would prevent us from building what we wanted. We planned to let Workers handle the requests coming from users’ browsers, and then use an AWS Lambda function for the heavy lifting of actually processing data and sending it to third-party vendors.</p><p>Our first attempt with Workers was very simple: just confirming we could use it to actually return dynamic browser-side JavaScript that is generated on-the-fly:</p>
            <pre><code>addEventListener('fetch', (event) =&gt; {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  let code = '(function() {'

  // The User-Agent header can be missing, so fall back to an empty string
  const userAgent = request.headers.get('user-agent') || ''

  if (userAgent.includes('Firefox')) {
    code += `console.log('Hello Firefox!');`
  } else {
    code += `console.log('Hey other browsers...');`
  }

  code += '})();'

  return new Response(code, {
    headers: { 'content-type': 'text/javascript' }
  })
}</code></pre>
            <p>It was a tiny example, but I remember calling Yair afterwards and saying “this could actually work!”. It proved the flexibility of Workers. We had just created an endpoint that served a dynamically generated JavaScript file, and the response time was less than 10ms. We could now put <code>&lt;script src="path/to/worker.js"&gt;</code> in our HTML and treat this Worker like a normal JavaScript file.</p><p>As we took a deeper look, we found Workers meeting requirement after requirement from our list, and learned we could even do the most complicated things inside Workers. The Lambda function started doing less and less, and was eventually removed. Our little Node.js proof-of-concept was easily converted to Workers.</p>
    <div>
      <h2>Using the Cloudflare Workers platform: “standing on the shoulders of giants”</h2>
      <a href="#using-the-cloudflare-workers-platform-standing-on-the-shoulders-of-giants">
        
      </a>
    </div>
    <p>When we raised our seed round we heard many questions like “if this can work, how come it wasn’t built before?” We often said that while the problem has been a long-standing one, accessible edge computing is a new possibility. Later, in our first investor update after creating the prototype, we told them about the unbelievably fast response time we managed to achieve and got much praise for it — talk about “standing on the shoulders of giants”. Workers simply checked all our boxes. Running JavaScript and using the same V8 engine as the browser meant that we could keep the same environment when porting tools to run on the cloud (it also helped with hiring). It also opened the possibility of later on using WebAssembly for certain tasks. The fact that Workers are serverless and stateless by default was a selling point for our own trustworthiness: we told customers we couldn’t save their personal data even by mistake, which was true. The integration between webpack and Wrangler meant that we could write a full-blown application — with modules and external dependencies — and shift 100% of our logic into our Worker. And the performance helped us ace all our demos.</p><p>As we were building Zaraz, the Workers platform got more advanced. We ended up using Workers KV for storing user configuration, and Durable Objects for communicating between Workers. Our main Worker holds server-side implementations of more than 50 popular third-party tools, replacing hundreds of thousands of lines of JavaScript code that traditionally run inside browsers. It’s an ever-growing list, and we recently also published an SDK that allows third-party vendors to build support for their tools by themselves. For the first time, they can do it in a secure, private, and fast environment.</p>
    <div>
      <h2>A new way to build third-parties</h2>
      <a href="#a-new-way-to-build-third-parties">
        
      </a>
    </div>
    <p>Most third-party tools do two fundamental things: First, they collect some information from the browser such as screen resolution, current URL, page title or cookie content. Second, they send it to their server. It is often simple, but when a website has tens of these tools, and each of them queries for the information it needs and then sends its requests, it can cause a real slowdown. On Zaraz, this looks very different: Every tool provides a <code>run</code> function, and when Zaraz evaluates the user request and decides to load a tool, it executes this <code>run</code> function. This is how we built integrations for over 50 different tools, all from different categories, and this is how we’re inviting third-party vendors to write their own integrations into Zaraz.</p>
            <pre><code>run({system, utils}) { 
  // The `system` object includes information about the current page, browser, and more 
  const { device, page, cookies } = system
  // The `utils` are a set of functions we found useful across multiple tools
  const { getCookieString, waitUntil } = utils

  // Get the existing cookie content, or create a new UUID instead
  const cookieName = 'visitor-identifier'
  const sessionCookie = cookies[cookieName] || crypto.randomUUID()

  // Build the payload
  const payload = {
    session: sessionCookie,
    ip: device.ip,
    resolution: device.resolution,
    ua: device.userAgent,
    url: page.url.href,
    title: page.title,
  }

  // Construct the URL
  const baseURL = 'https://example.com/collect?'
  const params = new URLSearchParams(payload)
  const finalURL = baseURL + params

  // Send a request to the third-party server from the edge
  waitUntil(fetch(finalURL))
  
  // Save or update the cookie in the browser
  return getCookieString(cookieName, sessionCookie)
}</code></pre>
            <p>The above code runs in our Cloudflare Worker, instead of the browser. Previously, having 10x more tools meant 10x more requests the browsers rendering your website needed to make, and 10x more JavaScript code they needed to evaluate. This code would often be repetitive; for example, almost every tool implements its own “get cookie” function. It’s also 10x more origins you have to trust not to tamper with anything. When running tools on the edge, this doesn’t affect the browser at all: you can add as many tools as you want, but they won’t load in the browser, so they will have no effect on it.</p><p>In this example, we first check for the existence of a cookie that identifies the session, called “visitor-identifier”. If it exists, we read its value; if not, we generate a new UUID for it. Note that the power of Workers is all accessible here: we use <code>crypto.randomUUID()</code> just like we can use any other Workers functionality. We then collect all the information our example tool needs — user agent, current URL, page title, screen resolution, client IP address — and the content of the “visitor-identifier” cookie. We construct the final URL that the Worker needs to send a request to, and we then use <code>waitUntil</code> to make sure the request gets there. Zaraz’s version of fetch gives our tools automatic logging, data loss prevention, and retry capabilities.</p><p>Lastly, we return the value of the <code>getCookieString</code> function. Whatever string is returned by the <code>run</code> function is passed to the visitor as browser-side JavaScript. In this case, <code>getCookieString</code> returns something like <code>document.cookie = 'visitor-identifier=5006e6fa-7ce6-45ef-8724-c846f1953369; Path=/; Max-age=31536000';</code>, causing the browser to create a first-party cookie. 
The next time a user loads a page, the <code>visitor-identifier</code> cookie should exist, causing Zaraz to reuse the UUID instead of creating a new one.</p><p>This system of <code>run</code> functions allows us to separate and isolate each tool to run independently of the rest of the system, while still providing it with all the required context and data coming from the browser, and the capabilities of Workers. We are inviting third-party vendors to work with us to build the future of secure, private and fast third-party tools.</p>
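Zaraz's real `getCookieString` utility isn't shown here, but based on the output quoted above, a minimal sketch of such a helper might look like this; the function body and its `maxAge` parameter are an assumption for illustration.

```javascript
// Hypothetical sketch of a getCookieString-style helper: it returns browser-side
// JavaScript (as a string) that, when executed in the page, sets a first-party cookie.
function getCookieString(name, value, maxAge = 31536000) {
  return `document.cookie = '${name}=${value}; Path=/; Max-age=${maxAge}';`;
}

const snippet = getCookieString('visitor-identifier', 'abc-123');
```

The key design point is that the tool never touches the DOM directly: it only returns a string, and Zaraz decides when and whether that string runs in the visitor's browser.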
    <div>
      <h2>A new events system</h2>
      <a href="#a-new-events-system">
        
      </a>
    </div>
    <p>Many third-party tools need to collect behavioral information during a user visit. For example, you might want to place a conversion pixel right after a user clicked “submit” on the credit card form. Since we moved tools to the cloud, you can’t access their libraries from the browser context anymore. For that we created <code>zaraz.track()</code> — a method that allows you to call tools programmatically, and optionally provide them with more information:</p>
            <pre><code>document.getElementById("credit-card-form").addEventListener("submit", () =&gt; {
  zaraz.track("card-submission", {
    value: document.getElementById("total").innerHTML,
    transaction: "X-98765",
  });
});</code></pre>
            <p>In this example, we’re letting Zaraz know about a trigger called “card-submission”, and we associate some data with it — the <code>value</code> of the transaction that we’re taking from an element with the ID <code>total</code>, and a transaction code that is hardcoded and gets printed directly from our backend.</p><p>In the Zaraz interface, configured tools can be subscribed to different and multiple triggers. When the code above gets triggered, Zaraz checks, on the edge, what tools are subscribed to the card-submission trigger, and it then calls them with the right additional data supplied, populating their requests with the transaction code and its value.</p><p>This is different from how traditional tag managers work: GTM’s <code>dataLayer.push</code> serves a similar purpose, but is evaluated client-side. The result is that GTM itself, when used intensively, can grow its script so much that it becomes the heaviest tool a website loads. Each event sent using <code>dataLayer.push</code> causes repeated evaluation of code in the browser, and each tool that matches the evaluation executes code in the browser, and might call more external assets again. As these events are usually coupled with user interactions, this often makes interacting with a website feel slow, because running the tools is occupying the main thread. With Zaraz, these tools exist and are evaluated only at the edge, improving the website’s speed and security.</p>
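The edge-side matching can be sketched roughly as follows. The tool objects and field names here are invented for illustration and are not Zaraz's actual data model.

```javascript
// Sketch of edge-side trigger dispatch: only tools subscribed to the incoming
// trigger are run, and nothing is evaluated in the visitor's browser.
function toolsForTrigger(tools, triggerName) {
  return tools.filter((tool) => tool.triggers.includes(triggerName));
}

const tools = [
  { name: 'analytics', triggers: ['pageview', 'card-submission'] },
  { name: 'chat-widget', triggers: ['pageview'] },
];

// A 'card-submission' event would dispatch only to the analytics tool
const matched = toolsForTrigger(tools, 'card-submission');
```

Because this filtering happens on the edge rather than in a client-side script, adding more tools or triggers does not grow the code the browser has to evaluate.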
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/zfeXBueyJMyduWMF5VzXv/37af1f8270a57bab5edaad333ed6ff85/unnamed-7.png" />
            
            </figure><p>You don’t have to be a coder to use triggers. The Zaraz dashboard allows you to choose from a predefined set of templates, like click listeners, scroll events, and more, that you can attach to any element on your website without touching your code. When you combine <code>zaraz.track()</code> with the ability to program your own tools, what you get is essentially a one-liner integration of Workers into your website. You can write any backend code you want and Zaraz will take care of calling it exactly at the right time with the right parameters.</p>
    <div>
      <h2>Joining Cloudflare</h2>
      <a href="#joining-cloudflare">
        
      </a>
    </div>
    <p>When new customers started using Zaraz, we noticed a pattern: the best teams we worked with chose Cloudflare, and some were also moving parts of their backend infrastructure to Workers. We figured we could further improve performance and integration for companies using Cloudflare as well. We could inline parts of the code inside the page and then further reduce the amount of network requests. Integration also allowed us to remove the time it takes to DNS-resolve our script, because we could use Workers to proxy Zaraz into our customers' domains. Integrating with Cloudflare made our offering even more compelling.</p><p>Back when we were doing Y Combinator in Winter 2020 and realized how much third parties could affect a website’s performance, we saw a grand mission ahead of us: creating a faster, private, and secure web by reducing the amount of third-party bloat. This mission has remained the same to this day. As our conversations with Cloudflare got deeper, we were excited to realize that we were talking with people who share the same vision. We are thrilled for the opportunity to scale our solutions to millions of websites on the Internet, making them faster and safer and even reducing carbon emissions.</p><p>If you would like to explore the free beta version, <a href="https://dash.cloudflare.com/?to=/:account/:zone/zaraz">please click here</a>. If you are an enterprise and have additional/custom requirements, please <a href="https://www.cloudflare.com/cloudflare-zaraz-third-party-tool-manager-waitlist/">click here</a> to join the waitlist. To join our Discord channel, <a href="https://discord.gg/2TRr6nSxdd">click here</a>.</p> ]]></content:encoded>
            <category><![CDATA[CIO Week]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">WKGRmdzjfhPVQe2iAYCmP</guid>
            <dc:creator>Yo'av Moshe</dc:creator>
            <dc:creator>Andrew Galloni</dc:creator>
        </item>
        <item>
            <title><![CDATA[Prototyping optimizations with Cloudflare Workers and WebPageTest]]></title>
            <link>https://blog.cloudflare.com/workers-and-webpagetest/</link>
            <pubDate>Wed, 08 Jan 2020 15:00:00 GMT</pubDate>
            <description><![CDATA[ Have you ever wanted to quickly test a new performance idea, or see if the latest performance wisdom is beneficial to your site? As web performance appears to be a stochastic process, it is really important to be able to iterate quickly and review the effects of different experiments. ]]></description>
            <content:encoded><![CDATA[ <p><i>This article was originally published as part of </i><a href="https://calendar.perfplanet.com/2019/"><i>Perf Planet's 2019 Web Performance Calendar</i></a><i>.</i></p><p>Have you ever wanted to quickly test a new performance idea, or see if the latest performance wisdom is beneficial to your site? As web performance appears to be a stochastic process, it is really important to be able to iterate quickly and review the effects of different experiments. The challenge is to be able to arbitrarily change requests and responses without the overhead of setting up another internet-facing server. This can be straightforward to implement by combining two of my favourite technologies: <a href="https://webpagetest.org/">WebPageTest</a> and <a href="https://www.slideshare.net/patrickmeenan/getting-the-most-out-of-webpagetest">Cloudflare Workers</a>. Pat Meenan sums this up with the following slide from a recent <a href="https://www.slideshare.net/patrickmeenan/getting-the-most-out-of-webpagetest">Getting the Most Out of WebPageTest</a> presentation:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/63FuSzxUDPepqCC577lP24/0e40f5e48ed07432017fe9bdf4abef6b/magic.jpg" />
            
            </figure><p>So what is Cloudflare Workers and why is it ideally suited to easy prototyping of optimizations?</p>
    <div>
      <h2>Cloudflare Workers</h2>
      <a href="#cloudflare-workers">
        
      </a>
    </div>
    <p>From the documentation:</p><blockquote><p>Cloudflare Workers provides a lightweight JavaScript execution environment that allows developers to augment existing applications or create entirely new ones without configuring or maintaining infrastructure. A Cloudflare Worker is a programmable proxy which brings the simplicity and flexibility of the <a href="https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API">Service Workers event-based fetch API</a> from the browser to the edge. This allows a worker to intercept and modify requests and responses.</p></blockquote>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3y9pmmtBYue73BYEUOHtpN/74e7e29b940d04cebe403e8cdacb26b5/worker-4.svg" />
            
            </figure><p>With the Service Worker API you can add an <code>EventListener</code> to any fetch event that is routed through the worker script and modify the request to come from a different origin.</p><p>Cloudflare Workers also provides a <a href="https://developers.cloudflare.com/workers/reference/apis/html-rewriter/">streaming HTMLRewriter</a> to enable on the fly modification of HTML as it passes through the worker. The streaming nature of this parser ensures latency is minimised as the entire HTML document does not have to be buffered before rewriting can happen.</p>
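As a small illustration of the handler shape HTMLRewriter expects, here is a sketch of an element handler that upgrades insecure image URLs. Inside a Worker it would be attached with <code>new HTMLRewriter().on('img', imgHandler).transform(response)</code>; the handler logic itself is a hypothetical example.

```javascript
// Sketch of an HTMLRewriter element handler: rewrite http:// image sources
// to https:// as the HTML streams through the Worker.
const imgHandler = {
  element(element) {
    const src = element.getAttribute('src');
    if (src && src.startsWith('http://')) {
      element.setAttribute('src', src.replace('http://', 'https://'));
    }
  },
};
```

Because the rewriter is streaming, handlers like this run on each element as it is parsed, without buffering the whole document first.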
    <div>
      <h3>Setting up a worker</h3>
      <a href="#setting-up-a-worker">
        
      </a>
    </div>
    <p>It is really quick and easy to <a href="https://workers.cloudflare.com/">sign up</a> for a free subdomain at <code>workers.dev</code>, which provides you with 100,000 free requests per day. There is a quick-start guide available <a href="https://developers.cloudflare.com/workers/quickstart/">here</a>. To be able to run the examples in this post you will need to <a href="https://github.com/cloudflare/wrangler#installation">install Wrangler</a>, the CLI tool for deploying workers. Once Wrangler is installed, run the following command to download the example worker project:</p>
            <pre><code>wrangler generate wpt-proxy https://github.com/xtuc/WebPageTest-proxy</code></pre>
            <p>You will then need to update the <code>wrangler.toml</code> with your account_id, which can be found in the <a href="https://dash.cloudflare.com">dashboard</a> in the right sidebar. Then configure an API key with the command:</p><p><code>wrangler config</code></p><p>Finally, you can publish the worker with:</p><p><code>wrangler publish</code></p><p>At this point, the worker will be active at</p><p><code>https://wpt-proxy.&lt;your-subdomain&gt;.workers.dev</code>.</p>
    <div>
      <h2>WebPageTest OverrideHost  </h2>
      <a href="#webpagetest-overridehost">
        
      </a>
    </div>
    <p>Now that your worker is configured, the next step is to configure WebPageTest to redirect requests through the worker. WebPageTest has a feature where it can re-point arbitrary origins to a different domain. To access the feature in WebPageTest, you need to use the WebPageTest scripting language "overrideHost" command, as shown:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3obNudq5ry3Gg8kYI0basR/5ca5e9269e797c9d05cc46baf66bc534/samplescript.png" />
            
            </figure><p>This example will redirect all network requests for <a href="http://www.bbc.co.uk">www.bbc.co.uk</a> to wpt-proxy.prf.workers.dev instead. WebPageTest also adds an <code>x-host</code> header to each redirected request so that the destination can determine for which host the request was originally intended:</p><p><code>x-host: www.bbc.co.uk</code></p><p>The script can process multiple overrideHost commands to override multiple different origins. If HTTPS is used, WebPageTest can use HTTP/2 and benefit from <a href="https://daniel.haxx.se/blog/2016/08/18/http2-connection-coalescing/">connection coalescing</a>:</p>
            <pre><code>overrideHost www.bbc.co.uk wpt-proxy.prf.workers.dev    
overrideHost nav.files.bbci.co.uk wpt-proxy.prf.workers.dev
navigate https://www.bbc.co.uk</code></pre>
            <p> It also supports wildcards:  </p>
            <pre><code>overrideHost *bbc.co.uk wpt-proxy.prf.workers.dev    
navigate https://www.bbc.co.uk</code></pre>
            <p>There are a few special strings that can be used in a script when bulk testing, so a single script can be re-used across multiple URLs:</p><ul><li><p><code>%URL%</code> - Replaces with the URL of the current test</p></li><li><p><code>%HOST%</code> - Replaces with the hostname of the URL of the current test</p></li><li><p><code>%HOSTR%</code> - Replaces with the hostname of the final URL in case the test URL does a redirect.</p></li></ul><p>A more generic script would look like this:    </p>
            <pre><code>overrideHost %HOSTR% wpt-proxy.prf.workers.dev    
navigate %URL% </code></pre>
            
    <div>
      <h2>Basic worker</h2>
      <a href="#basic-worker">
        
      </a>
    </div>
    <p>In the base example below, the worker listens for the fetch event, looks for the <code>x-host</code> header that WebPageTest has set and responds by fetching the content from the original URL:</p>
            <pre><code>/* 
* Handle all requests. 
* Proxy requests with an x-host header and return 403
* for everything else
*/

addEventListener("fetch", event =&gt; {    
   const host = event.request.headers.get('x-host');        
   if (host) {          
      const url = new URL(event.request.url);          
      const originUrl = url.protocol + '//' + host + url.pathname + url.search;             
      let init = {             
         method: event.request.method,             
         redirect: "manual",             
         headers: [...event.request.headers]          
      };          
      event.respondWith(fetch(originUrl, init));        
   } 
   else {           
     const response = new Response('x-host header missing', {status: 403});
     event.respondWith(response);        
   }    
});</code></pre>
            <p>The source code can be found <a href="https://github.com/xtuc/WebPageTest-proxy">here</a> and instructions to download and deploy this worker are described in the earlier section.</p><p>So what happens if we point all the domains on the BBC website through this worker, using the following config:  </p>
            <pre><code>overrideHost    *bbci.co.uk wpt.prf.workers.dev    
overrideHost    *bbc.co.uk  wpt.prf.workers.dev    
navigate    https://www.bbc.co.uk</code></pre>
            <p>The test was run from a UK location, configured with a 3G Fast connection profile.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6y7lIsq1654FwvbSlcpBvh/68324705a4c88d6ae450da4e1e405e17/simpleworkercompare.png" />
            
            </figure><p>Comparison of the BBC website before and after using a single connection.</p><p>Before</p><p>After</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2pwdrZ0TRNcU2x2AosI09b/7ee354130eab1e1a05d0f74e93bf85c7/beforeBBC2-1.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7DE8XJvQoNTKguQR4Y9a9J/3a588557c9048328de75816bd666c07a/afterBBC2.png" />
            
            </figure><p>The potential performance improvement of loading a page over a single connection, eliminating the additional DNS lookup, TCP connection and TLS handshakes, can be seen  by comparing the filmstrips and waterfalls. There are several reasons why you may not want or be able to move everything to a single domain, but at least it is now easy to see what the performance difference would be.  </p>
    <div>
      <h2>HTMLRewriter</h2>
      <a href="#htmlrewriter">
        
      </a>
    </div>
    <p>With the HTMLRewriter, it is possible to change the HTML response as it passes through the worker. A jQuery-like syntax provides CSS-selector matching and a standard set of DOM mutation methods. For instance you could rewrite your page to measure the effects of different preload/prefetch strategies, review the performance savings of removing or using different third-party scripts, or you could stock-take the HEAD of your document. One piece of performance advice is to self-host some third-party scripts. This <a href="https://github.com/xtuc/rewrite-3d-party.git">example</a> script invokes the HTMLRewriter to listen for a script tag with a <code>src</code> attribute. If the script is from a proxiable domain the src is rewritten to be first-party, with a specific path prefix.</p>
            <pre><code>async function rewritePage(request) {
  const response = await fetch(request);
  return new HTMLRewriter()
    .on("script[src]", {
      element: el =&gt; {
        const src = el.getAttribute("src");
        if (PROXIED_URL_PREFIXES_RE.test(src)) {
          el.setAttribute("src", createProxiedScriptUrl(src));
        }
      }
    })
    .transform(response);
}</code></pre>
            <p>Subsequently, when the browser makes a request with the specific prefix, the worker fetches the asset from the original URL. This example can be downloaded with this command:</p><p><code>wrangler generate test https://github.com/xtuc/rewrite-3d-party.git</code></p>
    <div>
      <h2>Request Mangling</h2>
      <a href="#request-mangling">
        
      </a>
    </div>
    <p>As well as rewriting content, it is also possible to change or delay a request. Below is an example of how to randomly add a delay of a second to a request:</p>
            <pre><code>addEventListener("fetch", event =&gt; {
    const host = event.request.headers.get('x-host');
    if (host) {
//....
        event.respondWith(delayedFetch(originUrl, init));
    }
//...
});

// The delay needs to happen inside an async function; respondWith
// itself must be called synchronously in the fetch handler.
async function delayedFetch(originUrl, init) {
    // Add the delay if necessary
    if (Math.random() * 100 &lt; DELAY_PERCENT) {
        await new Promise(resolve =&gt; setTimeout(resolve, DELAY_MS));
    }
    return fetch(originUrl, init);
}</code></pre>
            
    <div>
      <h2>HTTP/2 prioritization</h2>
      <a href="#http-2-prioritization">
        
      </a>
    </div>
    <p>What if you want to see what effect changing the HTTP/2 prioritization of assets would have on your website? Cloudflare Workers provide <a href="/better-http-2-prioritization-for-a-faster-web/">custom HTTP/2 prioritization schemes</a> that can be applied by setting a custom header on the response. The <code>cf-priority</code> header is defined as <code>&lt;priority&gt;/&lt;concurrency&gt;</code>, so adding:</p><p><code>response.headers.set('cf-priority', '30/0');</code></p><p>would set the priority of that response to 30 with a concurrency of 0. Similarly, '30/1' would set concurrency to 1 and '30/n' would set concurrency to n. With this flexibility, you can prioritize the bytes that are important for your website or run a bulk test to prove that your new prioritization scheme is better than any of the existing browser implementations.</p>
    <div>
      <h2>Summary</h2>
      <a href="#summary">
        
      </a>
    </div>
    <p>A major barrier to understanding and innovation is the amount of time it takes to get feedback. Having a quick and easy framework to try out a new idea and comprehend its impact is key. I hope this post has convinced you that combining WebPageTest and Cloudflare Workers is an easy solution to this problem, and is indeed magic.</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Speed]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Workers Sites]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">4GVvwYx4x8CBGH3mW0gmTr</guid>
            <dc:creator>Andrew Galloni</dc:creator>
        </item>
        <item>
            <title><![CDATA[A History of HTML Parsing at Cloudflare: Part 2]]></title>
            <link>https://blog.cloudflare.com/html-parsing-2/</link>
            <pubDate>Fri, 29 Nov 2019 08:00:00 GMT</pubDate>
            <description><![CDATA[ The second blog post in the series on HTML rewriters picks up the story in 2017 after the launch of the Cloudflare edge compute platform Cloudflare Workers. It became clear that the developers using workers wanted the same HTML rewriting capabilities that we used internally,  ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2sfVCRtS6lSIai6MEJKe6d/62b69f0f5a6d768aa220daa866346337/HTML-rewrriter-GA_2x-1.png" />
            
            </figure><p>The second blog post in the series on HTML rewriters picks up the story in 2017 after the launch of the Cloudflare edge compute platform <a href="https://workers.cloudflare.com/">Cloudflare Workers</a>. It became clear that the developers using workers wanted the same HTML rewriting capabilities that we used internally, but accessible via a JavaScript API.</p><p>This blog post describes the building of a streaming HTML rewriter/parser with a CSS-selector based API in Rust. It is used as the back-end for the Cloudflare Workers <a href="https://developers.cloudflare.com/workers/reference/apis/html-rewriter/">HTMLRewriter</a>. We have open-sourced the library (<a href="https://github.com/cloudflare/lol-html"><i>LOL HTML</i></a>) as it can also be used as a stand-alone HTML rewriting/parsing library.</p><p>The major change compared to <a href="https://github.com/cloudflare/lazyhtml">LazyHTML</a>, the previous rewriter, is the dual-parser architecture required to overcome the additional performance overhead of wrapping/unwrapping each token when propagating tokens to the workers runtime. The remainder of the post describes a CSS selector matching engine inspired by a Virtual Machine approach to regular expression matching.</p>
    <div>
      <h2>v2 : Give it to everyone and make it faster</h2>
      <a href="#v2-give-it-to-everyone-and-make-it-faster">
        
      </a>
    </div>
    <p>In 2017, Cloudflare introduced an edge compute platform - <a href="https://workers.cloudflare.com/">Cloudflare Workers</a>. It was no surprise that customers quickly required the same HTML rewriting capabilities that we were using internally. Our team was impressed with the platform and decided to migrate some of our features to Workers. The goal was to improve our developer experience working with modern JavaScript rather than statically linked NGINX modules implemented in C with a Lua API.</p><p>It was possible to rewrite HTML in Workers, though for that you needed a third-party JavaScript package (such as <a href="http://cheerio.js.org/">Cheerio</a>). These packages are not designed for HTML rewriting on the edge due to the latency, speed and memory considerations described in the previous post.</p><p>JavaScript is really fast but it still can’t always produce performance comparable to native code for some tasks - parsing being one of those. Customers typically needed to buffer the whole content of the page to do the rewriting, resulting in considerable output latency and memory consumption that often exceeded the memory limits enforced by the Workers runtime.</p><p>We started to think about how we could reuse the technology in Workers. LazyHTML was a perfect fit in terms of parsing performance, but it had two issues:</p><ol><li><p><b>API ergonomics</b>: LazyHTML produces a stream of HTML tokens. This is sufficient for our internal needs. However, for an average user, it is not as convenient as the jQuery-like API of Cheerio.</p></li><li><p><b>Performance</b>: Even though LazyHTML is tremendously fast, integration with the Workers runtime adds even more limitations. LazyHTML operates as a simple parse-modify-serialize pipeline, which means that it produces tokens for the whole content of the page. 
All of these tokens then have to be propagated to the Workers runtime and wrapped inside a JavaScript object and then unwrapped and fed back to LazyHTML for serialization. This is an extremely expensive operation which would nullify the performance benefit of LazyHTML.</p></li></ol>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4GmYB4iPMNqvDypCQbb2t/3f2010375f3c32cdf8b26730121981e0/image8-1.png" />
            
            </figure><p>LazyHTML with V8</p>
    <div>
      <h3>LOL HTML</h3>
      <a href="#lol-html">
        
      </a>
    </div>
    <p>We needed something new, designed with Workers requirements in mind, using a language with the native speed and safety guarantees (it’s incredibly easy to shoot yourself in the foot doing parsing). Rust was the obvious choice as it provides the native speed and the best guarantee of memory safety which minimises the attack surface of untrusted input. Wherever possible the Low Output Latency HTML rewriter (LOL HTML) uses all the previous optimizations developed for LazyHTML such as tag name hashing.</p>
    <div>
      <h4>Dual-parser architecture</h4>
      <a href="#dual-parser-architecture">
        
      </a>
    </div>
    <p>Most developers are familiar with, and prefer to use, CSS selector-based APIs (as in Cheerio, jQuery or the DOM itself) for HTML mutation tasks. We decided to base our API on CSS selectors as well. Although this meant additional implementation complexity, the decision created even more opportunities for parsing optimizations.</p><p>As selectors define the scope of the content that should be rewritten, we realised we could skip the content that is not in this scope and not produce tokens for it. This not only significantly speeds up the parsing itself, but also avoids the performance burden of the back and forth interactions with the JavaScript VM. As ever, the best optimization is not to do something.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1bnIfLwJ1sYEYM34uXAleV/f03f74021ce0a7c89fba1e53e2073cb0/image7-2.png" />
            
            </figure><p>Considering the tasks required, LOL HTML’s parser consists of two internal parsers:</p><ul><li><p><b>Lexer</b> - a regular full parser that produces output for all types of content it encounters;</p></li><li><p><b>Tag scanner</b> - looks for start and end tags and skips parsing the rest of the content. The tag scanner parses only the tag name and feeds it to the selector matcher. The matcher will switch the parser to the lexer if there is a match, or if additional information about the tag (such as attributes) is required for matching.</p></li></ul><p>The parser switches back to the tag scanner as soon as input leaves the scope of all selector matches. The tag scanner may also sometimes switch the parser to the lexer - if it requires additional tag information for the parsing feedback simulation.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1U4yaNQ6DOcGtl5JeUZfVS/b61ecda06aa67d2704345854f61c4e45/image1-3.png" />
            
            </figure><p>LOL HTML architecture</p><p>Having two different parser implementations for the same grammar will increase development costs and is error-prone due to implementation inconsistencies. We minimize these risks by implementing a small Rust macro-based DSL which is similar in spirit to Ragel. The DSL program describes <a href="https://en.wikipedia.org/wiki/Nondeterministic_finite_automaton">Nondeterministic finite automaton</a> states and actions associated with each state transition and matched input byte.</p><p>An example of a DSL state definition:</p>
            <pre><code>tag_name_state {
   whitespace =&gt; ( finish_tag_name?; --&gt; before_attribute_name_state )
   b'/'       =&gt; ( finish_tag_name?; --&gt; self_closing_start_tag_state )
   b'&gt;'       =&gt; ( finish_tag_name?; emit_tag?; --&gt; data_state )
   eof        =&gt; ( emit_raw_without_token_and_eof?; )
   _          =&gt; ( update_tag_name_hash; )
}</code></pre>
            <p>The DSL program gets expanded by the Rust compiler into not quite as beautiful, but extremely efficient Rust code.</p><p>We no longer need to reimplement the code that drives the parsing process for each of our parsers. All we need to do is to define different action implementations for each. In the case of the tag scanner, the majority of these actions are a no-op, so the Rust compiler does the NFA optimization job for us: it optimizes away state branches with no-op actions and even whole states if all of the branches have no-op actions. Now that’s cool.</p>
    <div>
      <h4>Byte slice processing optimisations</h4>
      <a href="#byte-slice-processing-optimisations">
        
      </a>
    </div>
    <p>Moving to a memory-safe language provided new challenges. Rust has great memory safety mechanisms, however sometimes they have a runtime performance cost.</p><p>The task of the parser is to scan through the input and find the boundaries of lexical units of the language - tokens and their internal parts. For example, an HTML start tag token consists of multiple parts: a byte slice of input that represents the tag name and multiple pairs of input slices that represent attributes and values:</p>
            <pre><code>struct StartTagToken&lt;'i&gt; {
   name: &amp;'i [u8],
   attributes: Vec&lt;(&amp;'i [u8], &amp;'i [u8])&gt;,
   self_closing: bool
}</code></pre>
            <p>As Rust uses bounds checks on memory access, constructing a token can be a relatively expensive operation. We need to be capable of constructing thousands of them in a fraction of a second, so every CPU instruction counts.</p><p>Following the principle of doing as little as possible to improve performance, we use a “token outline” representation of tokens: instead of holding memory slices for token parts, we use numeric ranges which are lazily transformed into a byte slice when required.</p>
            <pre><code>struct StartTagTokenOutline {
   name: Range&lt;usize&gt;,
   attributes: Vec&lt;(Range&lt;usize&gt;, Range&lt;usize&gt;)&gt;,
   self_closing: bool
}</code></pre>
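<p>To make the idea concrete outside Rust, here is a minimal JavaScript sketch of the same outline with lazy decoding (illustrative only - the real implementation is the Rust struct above):</p>

```javascript
// Token fields are numeric ranges into the input buffer; bytes are only
// decoded when a field is actually requested.
const decoder = new TextDecoder();

class StartTagTokenOutline {
  constructor(nameRange, attributeRanges) {
    this.nameRange = nameRange;             // { start, end } into the input
    this.attributeRanges = attributeRanges; // [[nameRange, valueRange], ...]
  }
  // Lazily decode the tag name; if the tag spans two input chunks, only
  // these integer ranges need adjusting - no slices are invalidated.
  name(input) {
    return decoder.decode(input.subarray(this.nameRange.start, this.nameRange.end));
  }
}

const input = new TextEncoder().encode('<div class="foo">');
const tag = new StartTagTokenOutline({ start: 1, end: 4 }, [
  [{ start: 5, end: 10 }, { start: 12, end: 15 }], // class="foo"
]);
```

<p>Nothing is copied or decoded until <code>tag.name(input)</code> is called, which mirrors the on-demand decoding described below.</p>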
            <p>As you might have noticed, with this approach we are no longer bound to the lifetime of the input chunk, which turns out to be very useful. If a start tag is spread across multiple input chunks we can easily update the token currently under construction as new chunks of input arrive, by just adjusting integer indices. This allows us to avoid constructing a new token with slices from the new input memory region (it could be the input chunk itself or the internal parser’s buffer).</p><p>This time we can’t get away with avoiding the conversion of input character encoding; we expose a user-facing API that operates on JavaScript strings, and input HTML can be of any encoding. Luckily, we can still parse without decoding, and only encode and decode within token bounds on request (though we still can’t do that for UTF-16 encoding).</p><p>So, when a user requests an element’s tag name in the API, internally it is still represented as a byte slice in the character encoding of the input, but when provided to the user it gets dynamically decoded. The opposite process happens when a user sets a new tag name.</p><p>For selector matching we can still operate on the original encoding representation - because we know the input encoding ahead of time we preemptively convert values in a selector to the page’s character encoding, so comparisons can be done without decoding fields of each token.</p><p>As you can see, the new parser architecture along with all these optimizations produced great performance results:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3e73RS0xEBDmg1jxWQGKsJ/ce5739700a08dea4e6e48c3a5cc96d7b/image4-2.png" />
            
            </figure><p>Average parsing time depending on the input size - lower is better</p><p>LOL HTML’s tag scanner is typically twice as fast as LazyHTML and the lexer has comparable performance, outperforming LazyHTML on bigger inputs. Both are a few times faster than the tokenizer from <a href="https://github.com/servo/html5ever">html5ever</a> - another parser implemented in Rust, used in Mozilla’s Servo browser engine.</p>
    <div>
      <h4>CSS selector matching VM</h4>
      <a href="#css-selector-matching-vm">
        
      </a>
    </div>
    <p>With an impressively fast parser on our hands we had only one thing missing - the CSS selector matcher. Initially we thought we could just use Servo’s <a href="https://crates.io/crates/selectors">CSS selector matching engine</a> for this purpose. After a couple of days of experimentation it turned out that it was not quite suitable for our task.</p><p>It did not work well with our dual parser architecture. We first need to match just a tag name from the tag scanner, and then, if we fail, query the lexer for the attributes. The selectors library wasn’t designed with this architecture in mind, so we needed ugly hacks to bail out from matching in case of insufficient information. It was inefficient as we needed to start matching again after the bailout, doing twice the work. There were other problems, such as the integration of lazy character decoding and of tag name comparison using tag name hashes.</p>
    <div>
      <h5>Matching direction</h5>
      <a href="#matching-direction">
        
      </a>
    </div>
    <p>The main problem encountered was the need to backtrack all the open elements for matching. Browsers match selectors from right to left and traverse all ancestors of an element. This <a href="https://stackoverflow.com/a/5813672">StackOverflow answer</a> has a good explanation of why they do it this way. We would need to store information about all open elements and their attributes - something that we can’t do while operating under tight memory constraints. This matching approach would be inefficient for our case - unlike browsers, we expect to have just a few selectors and a lot of elements. In this case it is much more efficient to match selectors from left to right.</p><p>And this is when we had a revelation. Consider the following CSS selector:</p>
            <pre><code>body &gt; div.foo  img[alt] &gt; div.foo ul</code></pre>
            <p>It can be split into individual components attributed to a particular element with hierarchical combinators in between:</p>
            <pre><code>body &gt; div.foo img[alt] &gt; div.foo  ul
---    ------- --------   -------  --</code></pre>
            <p>Each component is easy to match given a start tag token - it’s just a matter of comparing token fields with the values in the component. Let’s dive into abstract thinking and imagine that each such component is a character in the infinite alphabet of all possible components:</p>  <table>
        <tr>
            <th>Selector  component</th>
            <th>Character</th>
        </tr>
        <tr>
            <td>body</td>
            <td>a</td>
        </tr>
        <tr>
            <td>div.foo</td>
            <td>b</td>
        </tr>
        <tr>
            <td>img[alt]</td>
            <td>c</td>
        </tr>
        <tr>
            <td>ul</td>
            <td>d</td>
        </tr>
    </table><p>Let’s rewrite our selector with selector components replaced by our imaginary characters:</p>
            <pre><code>a &gt; b c &gt; b d</code></pre>
            <p>Does this remind you of something?</p><p>The <code>&gt;</code> combinator means a child element - “immediately followed by”.</p><p>The <code> </code> (space) combinator means a descendant element - there might be zero or more elements in between.</p><p>There is a very well known abstraction for expressing these relations - regular expressions. Replacing the combinators in our selector with regular expression syntax, we get:</p>
            <pre><code>ab.*cb.*d</code></pre>
            <p>We transformed our CSS selector into a regular expression that can be executed on the sequence of start tag tokens. Note that not all CSS selectors can be converted to such a regular grammar and the input on which we match has some specifics, which we’ll discuss later. However, it was a good starting point: it allowed us to express a significant subset of selectors.</p>
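<p>The transformation can be sketched in a few lines, using the hypothetical component-to-character table from above:</p>

```javascript
// Sketch of the selector-to-"regex" transformation described in the text.
// The alphabet is the illustrative mapping from the table above.
const alphabet = { "body": "a", "div.foo": "b", "img[alt]": "c", "ul": "d" };

function selectorToRegex(selector) {
  let regex = "";
  let combinator = null; // null = leftmost component, nothing pending
  for (const token of selector.trim().split(/\s+/)) {
    if (token === ">") {
      combinator = ">"; // child: "immediately followed by", no gap
    } else {
      if (combinator === " ") regex += ".*"; // descendant: anything between
      regex += alphabet[token];
      combinator = " "; // default combinator until we see ">"
    }
  }
  return regex;
}
```

<p>For the selector from the text this produces <code>ab.*cb.*d</code>. (The sketch assumes combinators are whitespace-separated; a real parser would tokenize properly.)</p>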
    <div>
      <h5>Implementing a Virtual Machine</h5>
      <a href="#implementing-a-virtual-machine">
        
      </a>
    </div>
    <p>Next, we started looking at non-backtracking algorithms for regular expressions. The virtual machine approach seemed suitable for our task as it was possible to have a non-backtracking implementation that was flexible enough to work around the differences between real regular expression matching on strings and our abstraction.</p><p>VM-based regular expression matching is implemented as one of the engines in many regular expression libraries, such as regexp2 and Rust’s regex. The basic idea is that instead of building an NFA or DFA for a regular expression, it is converted into a DSL assembly language whose instructions are later executed by the virtual machine - regular expressions are treated as programs that accept strings for matching.</p><p>Since the VM program is just a representation of an <a href="https://en.wikipedia.org/wiki/Nondeterministic_finite_automaton#NFA_with_%CE%B5-moves">NFA with ε-transitions</a> it can exist in multiple states simultaneously during the execution, or, in other words, spawn multiple threads. The regular expression matches if one or more states succeed.</p><p>For example, consider the following VM instructions:</p><ul><li><p><i>expect c</i> - waits for the next input character and aborts the thread if it doesn’t equal the instruction’s operand;</p></li><li><p><i>jmp L</i> - jumps to label ‘L’;</p></li><li><p><i>thread L1, L2</i> - spawns threads for labels L1 and L2, effectively splitting the execution;</p></li><li><p><i>match</i> - succeeds the thread with a match.</p></li></ul><p>Using this instruction set, the regular expression <i>“ab*c”</i> can be translated into:</p>
            <pre><code>    expect a
L1: thread L2, L3
L2: expect b
    jmp L1
L3: expect c
    match</code></pre>
            <p>Let’s try to translate the regular expression ab.*cb.*d from the selector we saw earlier:</p>
            <pre><code>    expect a
    expect b
L1: thread L2, L3
L2: expect [any]
    jmp L1
L3: expect c
    expect b
L4: thread L5, L6
L5: expect [any]
    jmp L4
L6: expect d
    match</code></pre>
            <p>That looks complex! This assembly language, though, is designed for regular expressions in general, and regular expressions can be much more complex than our case. For us, the only kind of repetition that matters is “<i>.*</i>”. So, instead of expressing it with multiple instructions, we can use just one, called <i>hereditary_jmp</i>:</p>
            <pre><code>    expect a
    expect b
    hereditary_jmp L1
L1: expect c
    expect b
    hereditary_jmp L2
L2: expect d
    match</code></pre>
            <p>The instruction tells the VM to memoize the instruction’s label operand and unconditionally spawn a thread with a jump to this label on each input character.</p><p>There is one significant distinction between the string input of regular expressions and the input provided to our VM. The input can shrink!</p><p>A regular string is just a contiguous sequence of characters, whereas we operate on a sequence of open elements. As new tokens arrive this sequence can grow as well as shrink. Assume we represent <code>&lt;div&gt;</code> as the character ‘a’ in our imaginary language, so having the input <code>&lt;div&gt;&lt;div&gt;&lt;div&gt;</code> we can represent it as <code>aaa</code>; if the next token in the input is <code>&lt;/div&gt;</code> then our “string” shrinks to <code>aa</code>.</p><p>You might think at this point that our abstraction doesn’t work and we should try something else. What we have as an input for our machine is a stack of open elements, and we need a stack-like structure to store the hereditary_jmp instruction labels that the VM has seen so far. So, why not store it on the open element stack? If we store the next instruction pointer on each of the stack items on which the <code>expect</code> instruction was successfully executed, we’ll have a full snapshot of the VM state, so we can easily roll back to it if our stack shrinks.</p><p>With this implementation we don’t need to store anything except a tag name on the stack, and, considering that we can use the tag name hashing algorithm, it is just a 64-bit integer per open element. As an additional small optimization, to avoid traversing the whole stack in search of active hereditary jumps on each new input, we store an index of the first ancestor with a hereditary jump on each stack item.</p><p>For example, having the following selector “<i>body &gt; div span</i>” we’ll have the following VM program (let’s get rid of labels and just use instruction indices instead):</p>
            <pre><code>0| expect &lt;body&gt;
1| expect &lt;div&gt;
2| hereditary_jmp 3
3| expect &lt;span&gt;
4| match</code></pre>
            <p>Having the input <code>&lt;body&gt;&lt;div&gt;&lt;div&gt;</code> we’ll have the following stack:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/F9vqNHbnLYJDeg4qSjvbx/32b541fac6a9f1c5478c25d8a0be6da8/image2-3.png" />
            
            </figure><p>Now, if the next token is a <code>&lt;span&gt;</code> start tag, the VM will first try to execute the selector program from the beginning and will fail on the first instruction. However, it will also look for any active hereditary jumps on the stack. We have one which jumps to the instruction at index 3. After jumping to this instruction the VM successfully produces a match. If we get yet another <code>&lt;span&gt;</code> start tag later it will match as well, following the same steps, which is exactly what we expect for the descendant selector.</p><p>If we then receive a sequence of end tags our stack will contain only one item:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2Y5I0Az6dXlOkEvQd239bI/356ff04c062c308d94703a74cd2ff773/image5-2.png" />
            
            </figure><p>which instructs the VM to jump to the instruction at index 1, effectively rolling back to matching the <i>div</i> component of the selector.</p><p>Remember we mentioned earlier that we can bail out from the matching process if we only have a tag name from the tag scanner, and need to obtain more information by running the lexer? With a VM approach it is as easy as stopping the execution of the current instruction and resuming it later when we get the required information.</p>
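<p>The whole walkthrough can be condensed into a toy version of the VM (illustrative only - the instruction names follow the post, but the data layout does not mirror LOL HTML’s internals):</p>

```javascript
// A compiled form of the selector "body > div span" from the text.
const program = [
  { op: "expect", tag: "body" }, // 0
  { op: "expect", tag: "div" },  // 1
  { op: "hjmp", to: 3 },         // 2: hereditary_jmp
  { op: "expect", tag: "span" }, // 3
  { op: "match" },               // 4
];

function createMatcher(program) {
  const stack = []; // one entry per open element: { tag, nextPc, hjmp, matched }

  // Try the expect at `pc` against `tag`; on success, keep executing
  // hjmp/match instructions until the next expect, as described above.
  function tryAt(pc, tag) {
    const instr = program[pc];
    if (!instr || instr.op !== "expect" || instr.tag !== tag) return null;
    const item = { tag, nextPc: null, hjmp: null, matched: false };
    for (pc += 1; program[pc]; pc += 1) {
      if (program[pc].op === "hjmp") item.hjmp = program[pc].to;
      else if (program[pc].op === "match") item.matched = true;
      else { item.nextPc = pc; break; } // stop at the next expect
    }
    return item;
  }

  return {
    open(tag) {
      // Candidate resume points: the program start, the top item's next
      // instruction, and every active hereditary jump on the stack.
      const candidates = new Set([0]);
      const top = stack[stack.length - 1];
      if (top && top.nextPc !== null) candidates.add(top.nextPc);
      for (const item of stack) if (item.hjmp !== null) candidates.add(item.hjmp);

      let accepted = null;
      for (const pc of candidates) if ((accepted = tryAt(pc, tag))) break;
      stack.push(accepted ?? { tag, nextPc: null, hjmp: null, matched: false });
      return stack[stack.length - 1].matched;
    },
    // Popping restores the previous VM snapshot - the "shrinking input".
    close() { stack.pop(); },
  };
}
```

<p>Running <code>&lt;body&gt;&lt;div&gt;&lt;div&gt;&lt;span&gt;</code> through it reproduces the walkthrough: the span matches via the memoized hereditary jump, and closing tags roll the state back for free.</p>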
    <div>
      <h5>Duplicate selectors</h5>
      <a href="#duplicate-selectors">
        
      </a>
    </div>
    <p>As we need a separate program for each selector we want to match, how can we avoid doing the same work for identical simple components? The AST for our selector matching program is a <a href="https://en.wikipedia.org/wiki/Radix_tree">radix tree</a>-like structure whose edge labels are simple selector components and whose nodes are hierarchical combinators. For example, for the following selectors:</p>
            <pre><code>body &gt; div &gt; link[rel]
body &gt; span
body &gt; span a</code></pre>
            <p>we’ll get the following AST:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3LczcYVznE4d44KCIO8JJw/56a4b10a75a124a2dcff5830381885a8/image3-1.png" />
            
            </figure><p>If selectors have common prefixes we can match each prefix just once for all of these selectors. In the compilation process, we flatten this structure into a vector of instructions.</p>
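The prefix sharing can be illustrated with a toy trie. The node layout and the `insert`/`count_edges` helpers here are hypothetical, for illustration only; each edge is a (combinator, simple selector) pair, and selectors with a common prefix share trie nodes, so the shared components are matched once.

```rust
use std::collections::BTreeMap;

#[derive(Default)]
struct TrieNode {
    // edge label: (combinator, simple selector component);
    // ' ' stands for the descendant combinator, '>' for child
    children: BTreeMap<(char, String), TrieNode>,
    selector_ids: Vec<usize>, // selectors whose program ends at this node
}

fn insert(root: &mut TrieNode, path: &[(char, String)], id: usize) {
    let mut node = root;
    for edge in path {
        node = node.children.entry(edge.clone()).or_default();
    }
    node.selector_ids.push(id);
}

// total number of edges: with shared prefixes this is smaller than the
// sum of the individual selectors' lengths
fn count_edges(node: &TrieNode) -> usize {
    node.children.values().map(|c| 1 + count_edges(c)).sum()
}
```

For the three selectors from the example above, the naive total is 3 + 2 + 3 = 8 components, but the trie holds only 5 edges: "body", "span" and their subtrees are shared.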
    <div>
      <h5>[not] JIT-compilation</h5>
      <a href="#not-jit-compilation">
        
      </a>
    </div>
    <p>For performance reasons the compiled instructions are macro-instructions: they incorporate multiple basic VM instruction calls. This way the VM can execute only one macro-instruction per input token. Each macro-instruction is compiled using the so-called “<a href="/building-fast-interpreters-in-rust/#-not-jit-compilation">[not] JIT-compilation</a>” technique (the same compilation approach is used in our other Rust project - wirefilter).</p><p>Internally a macro-instruction contains an <code>expect</code> followed by <code>jmp</code>, <code>hereditary_jmp</code> and <code>match</code> basic instructions. In that sense macro-instructions resemble <a href="https://en.wikipedia.org/wiki/Microcode">microcode</a>, making it easy to suspend execution of a macro-instruction if we need to request attribute information from the lexer.</p>
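A minimal sketch of the closure-based “[not] JIT” idea, with hypothetical `Outcome`/`MacroInstr` names (the real macro-instructions are richer and support suspension): each compiled macro-instruction is one boxed closure fusing several basic operations, so the interpreter performs a single indirect call per input token.

```rust
enum Outcome {
    Advance(usize), // expect succeeded; continue at this instruction index
    Matched,        // expect succeeded and the selector fully matched
    Fail,           // expect failed for this token
}

// a compiled macro-instruction: one closure, one indirect call per token
type MacroInstr = Box<dyn Fn(&str) -> Outcome>;

// fuse `expect <tag>` + `jmp next` into one macro-instruction
fn compile_expect(tag: &'static str, next: usize) -> MacroInstr {
    Box::new(move |input| {
        if input == tag { Outcome::Advance(next) } else { Outcome::Fail }
    })
}

// fuse `expect <tag>` + `match` into one macro-instruction
fn compile_expect_match(tag: &'static str) -> MacroInstr {
    Box::new(move |input| {
        if input == tag { Outcome::Matched } else { Outcome::Fail }
    })
}

// drive a compiled program over a stream of start-tag names
fn run(program: &[MacroInstr], tokens: &[&str]) -> bool {
    let mut pc = 0;
    for &tok in tokens {
        match program[pc](tok) {
            Outcome::Advance(next) => pc = next,
            Outcome::Matched => return true,
            Outcome::Fail => return false,
        }
    }
    false
}
```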
    <div>
      <h2>What’s next</h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>This is obviously not the end of the road, but hopefully we’ve got a bit closer to it. There are still multiple bits of functionality to implement, and certainly there is space for more optimizations.</p><p>If you are interested in the topic, don’t hesitate to join us in the development of <a href="https://github.com/cloudflare/lazyhtml">LazyHTML</a> and <a href="https://github.com/cloudflare/lol-html">LOL HTML</a> on GitHub. And, of course, we are always happy to see people passionate about technology here at Cloudflare, so don’t hesitate to <a href="https://www.cloudflare.com/careers">contact us</a> if you are too :).</p>
            <category><![CDATA[Rust]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Workers Sites]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Deep Dive]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">3QbtvZ1bIRvLpb1lE9PPcD</guid>
            <dc:creator>Andrew Galloni</dc:creator>
            <dc:creator>Ivan Nikulin</dc:creator>
        </item>
        <item>
            <title><![CDATA[A History of HTML Parsing at Cloudflare: Part 1]]></title>
            <link>https://blog.cloudflare.com/html-parsing-1/</link>
            <pubDate>Thu, 28 Nov 2019 08:44:00 GMT</pubDate>
            <description><![CDATA[ To coincide with the launch of streaming HTML rewriting functionality for Cloudflare Workers we are open sourcing the Rust HTML rewriter (LOL HTML) used to back the Workers HTMLRewriter API. We also thought it was about time to review the history of HTML rewriting at Cloudflare. ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/30jkHPpVT3WDwbmxy2U4P4/b7c107bdf8240fa50da653f3eed806bd/HTML-rewrriter_1_3x.png" />
            
            </figure><p>To coincide with the launch of streaming HTML rewriting functionality for <a href="https://workers.cloudflare.com/">Cloudflare Workers</a> we are open sourcing the Rust HTML rewriter (<a href="https://github.com/cloudflare/lol-html">LOL HTML</a>) used to back the Workers <a href="https://developers.cloudflare.com/workers/reference/apis/html-rewriter/">HTMLRewriter API</a>. We also thought it was about time to review the history of HTML rewriting at Cloudflare.</p><p>The first blog post will explain the basics of a streaming HTML rewriter and our particular requirements. We start around 8 years ago by describing the group of ‘ad-hoc’ parsers that were created for specific functionality, such as rewriting e-mail addresses or minifying HTML. By 2016, the state machine defined in the HTML5 specification could be used to build a single spec-compliant, pluggable HTML rewriter to replace the existing collection of parsers. The source code for this rewriter is now public and available here: <a href="https://github.com/cloudflare/lazyhtml">https://github.com/cloudflare/lazyhtml</a>.</p><p>The second blog post will describe the next iteration of the rewriter. With the launch of the edge compute platform <a href="https://workers.cloudflare.com/">Cloudflare Workers</a> we came to realise that developers wanted the same HTML rewriting capabilities with a JavaScript API. The post describes the thoughts behind a low latency streaming HTML rewriter with a CSS-selector based API. We open-sourced the Rust library as it can also be used as a stand-alone HTML rewriting/parsing library.</p>
    <div>
      <h3>What is a streaming HTML rewriter?</h3>
      <a href="#what-is-a-streaming-html-rewriter">
        
      </a>
    </div>
    <p>A streaming HTML rewriter takes either an HTML string or a byte stream as input and parses it into tokens or another structured <a href="https://en.wikipedia.org/wiki/Intermediate_representation">intermediate representation</a> (IR), such as an <a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax Tree</a> (AST). It then performs transformations on the tokens before converting them back to HTML. This provides the ability to modify, extract or add to an existing HTML document as the bytes are being processed. Compare this with a standard HTML tree parser, which needs to retrieve the entire file to generate a full DOM tree. The tree-based rewriter will both take longer to deliver the first processed bytes and require significantly more memory.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/REe13uX61XBtlevdzEg50/b1398b68b2024b5a5c65f5c84d2c8589/image8.png" />
            
            </figure><p>HTML rewriter</p><p>For example, suppose you own a large site with a lot of historical content that you now want to serve over HTTPS. You will quickly run into the problem of resources (images, scripts, videos) being served over HTTP. This ‘mixed content’ opens a security hole, and browsers will warn about or block these resources. It can be difficult or even impossible to update every link on every page of a website. With a streaming HTML rewriter you can select the URI attribute of any HTML tag and change any HTTP links to HTTPS. We built this very feature, <a href="/fixing-the-mixed-content-problem-with-automatic-https-rewrites/">Automatic HTTPS rewrites</a>, back in 2016 to solve mixed content issues for our customers.</p><p>The reader may already be wondering: “Isn’t this a solved problem? Aren’t there many widely used open-source browsers out there with HTML parsers that can be used for this purpose?” The reality is that writing code to run in 190+ PoPs around the world with a strict low latency requirement turns even seemingly trivial problems into complex engineering challenges.</p><p>The following blog posts will detail the journey of how starting with the simple idea of finding email addresses within an HTML page led to building an almost spec-compliant HTML parser, and then on to a CSS selector matching Virtual Machine. We learned a lot on this journey. I hope you find some of this as interesting as we did.</p>
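As a toy illustration of such a rewrite step (not Cloudflare's implementation), once a start tag's attributes have been parsed into name/value pairs, upgrading insecure URLs might look like this; the `upgrade_mixed_content` helper and the attribute list it covers are hypothetical:

```rust
// Upgrade http:// to https:// in URI attributes of a parsed start-tag token.
// Attribute names in HTML are ASCII case-insensitive, hence the lowercasing.
fn upgrade_mixed_content(attrs: &mut Vec<(String, String)>) {
    const URI_ATTRS: [&str; 2] = ["href", "src"]; // illustrative subset
    for (name, value) in attrs.iter_mut() {
        if URI_ATTRS.contains(&name.to_ascii_lowercase().as_str()) {
            if let Some(rest) = value.strip_prefix("http://") {
                *value = format!("https://{rest}");
            }
        }
    }
}
```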
    <div>
      <h2>Rewriting at the edge</h2>
      <a href="#rewriting-at-the-edge">
        
      </a>
    </div>
    <p>When rewriting content through Cloudflare we do not want to impact site performance. The balance in designing a streaming HTML rewriter is to minimise the pause in response byte flow by holding onto as little information as possible whilst retaining the ability to rewrite matching tokens.</p><p>The difference in requirements compared to an HTML parser used in a browser include:</p>
    <div>
      <h4>Output latency</h4>
      <a href="#output-latency">
        
      </a>
    </div>
    <p>For browsers, the Document Object Model (DOM) is the end product of the parsing process, but in our case we have to parse, rewrite and serialize back to HTML. In the case of Cloudflare’s reverse proxy, any content processing on the edge server results in latency between the server and an eyeball, so it is desirable to minimize the latency impact of HTML handling. In all of these stages (parsing, rewriting and serializing) we want to be as fast as possible.</p>
    <div>
      <h4>Parser throughput</h4>
      <a href="#parser-throughput">
        
      </a>
    </div>
    <p>Let’s assume that browsers rarely need to deal with HTML pages bigger than 1 MB in size, and that an average page load time is around 3 seconds at best. HTML parsing is not the main bottleneck of the page loading process, as the browser will be blocked on running scripts and loading other render-critical resources. We can roughly estimate that ~3 Mbps is an acceptable throughput for a browser’s HTML parser. At Cloudflare, we have hundreds of megabytes of traffic per CPU, so we need a parser that is faster by an order of magnitude.</p>
    <div>
      <h4>Memory limitations</h4>
      <a href="#memory-limitations">
        
      </a>
    </div>
    <p>As most users have probably noticed, browsers have the luxury of being able to consume large amounts of memory. For example, this simple HTML markup, when opened in a browser, will consume a significant chunk of your system memory before eventually halting the browser tab (and all this memory will be consumed by the parser):</p>
            <pre><code>&lt;script&gt;
   document.write('&lt;');
   while(true) {
      document.write('aaaaaaaaaaaaaaaaaaaaaaaa');
   }
&lt;/script&gt;</code></pre>
            <p>Unfortunately, buffering some fraction of the input is inevitable, even for streaming HTML rewriting. Consider these two HTML snippets:</p>
            <pre><code>&lt;div foo="bar" qux="qux"&gt;</code></pre>
            
            <pre><code>&lt;div foo="bar" qux="qux"</code></pre>
            <p>These seemingly similar fragments of HTML will be treated completely differently when encountered at the end of an HTML page. The first fragment will be parsed as a start tag and the second one will be ignored. By just seeing a `&lt;` character followed by a tag name, the parser can’t determine whether it has found a start tag or not. It needs to traverse the input in search of the closing `&gt;` to make a decision, buffering all content in between so it can later be emitted to the consumer as a start tag token.</p><p>This requirement forces browsers to buffer content indefinitely before eventually giving up with an out-of-memory error.</p><p>In our case, we can’t afford to spend hundreds of megabytes of memory parsing a single HTML file (the actual constraints are even tighter - even using a dozen kilobytes for each request would be unacceptable). We need to be much more sophisticated than other implementations in terms of memory usage and gracefully handle all the situations where the provided memory capacity is insufficient to accomplish parsing.</p>
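The buffering decision above can be sketched as a tiny state machine. Names are hypothetical, and a real lexer needs many more states (comments, quoted attribute values containing `>`, and so on); the point is only that text is emitted immediately, while a potential tag is buffered until the closing `>` proves it complete, possibly across chunk boundaries.

```rust
// Emit text immediately, but buffer from `<` until the closing `>` proves a
// complete tag. An unterminated tag at end of input is dropped, mirroring the
// spec: `<div foo="bar"` at EOF is ignored rather than emitted as a tag.
#[derive(Default)]
struct TagBuffer {
    pending: Vec<u8>, // bytes of a possibly incomplete tag
}

#[derive(Debug)]
enum Event {
    Text(u8),     // plain text, safe to emit right away
    Tag(Vec<u8>), // a complete tag, `<` through `>`
}

impl TagBuffer {
    fn feed(&mut self, chunk: &[u8], out: &mut Vec<Event>) {
        for &b in chunk {
            if !self.pending.is_empty() {
                self.pending.push(b);
                if b == b'>' {
                    out.push(Event::Tag(std::mem::take(&mut self.pending)));
                }
            } else if b == b'<' {
                self.pending.push(b);
            } else {
                out.push(Event::Text(b));
            }
        }
        // at a chunk boundary, only `pending` stays buffered;
        // everything else has already been emitted
    }

    // end of input: return whatever was buffered but never completed
    fn finish(&mut self) -> Vec<u8> {
        std::mem::take(&mut self.pending)
    }
}
```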
    <div>
      <h2>v0 : “Ad-hoc parsers”</h2>
      <a href="#v0-ad-hoc-parsers">
        
      </a>
    </div>
    <p>As usual with big projects, it all started pretty innocently.</p>
    <div>
      <h4>Find and obfuscate an email</h4>
      <a href="#find-and-obfuscate-an-email">
        
      </a>
    </div>
    <p>In 2010, Cloudflare decided to provide a feature that would stop popular email scrapers. The basic idea of this protection was to find and obfuscate emails on pages and later decode them back in the browser with injected JavaScript code. Sounds easy, right? You search for anything that looks like an email, encode it, and then decode it with some JavaScript magic and present the result to the end user.</p><p>However, even such a seemingly simple task already requires solving several issues. First of all, we need to define what an email is, and there is no simple answer. Even the infamous <a href="http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html">regex</a> supposedly covering the entire RFC is, in fact, outdated and incomplete, as newer RFCs added lots of valid email constructions, including Unicode support. Let’s not go down that rabbit hole for now and instead focus on a higher-level issue: transforming streaming content.</p><p>Content from the network comes in packets, which have to be buffered and parsed as HTTP by our servers. You can’t predict how the content will be split, which means you always need to buffer some of it, because content that is going to be replaced can be present in multiple input chunks.</p><p>Let’s say we decided to go with a simple regex like `[\w.]+@[\w.]+`. If the content that comes through contains the email “<a href="#">test@example.org</a>”, it might be split into the following chunks:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4SbMCSfBNwSI7ctqCY8ko0/ffd823fb60c086753d7d0c34f0c16a1e/image3.png" />
            
            </figure><p>In order to keep good Time To First Byte (TTFB) and consistent speed, we want to ensure that the preceding chunk is emitted as soon as we determine that it’s not interesting for replacement purposes.</p><p>The easiest way to do that is to transform our regex into a state machine, or finite automaton. While you could do that by hand, you would end up with hard-to-maintain and error-prone code. Instead, <a href="http://www.colm.net/open-source/ragel/">Ragel</a> was chosen to transform regular expressions into efficient native state machine code. Ragel doesn’t try to take care of buffering or anything other than traversing the state machine. It provides a syntax that not only describes patterns, but can also associate custom actions (code in a host language) with any given state.</p><p>In our case we can pass through buffers until we match the beginning of an email. If we subsequently find out the pattern is not an email we can bail out from buffering as soon as the pattern stops matching. Otherwise, we can retrieve the matched email and replace it with new content.</p><p>To turn our pattern into a streaming parser we can remember the position of the potential start of an email and, unless it was already discarded or replaced by the end of the current input, store the unhandled part in a permanent buffer. Then, when a new chunk comes, we can process it separately, resuming from a state Ragel remembers itself, but then use both the buffered chunk and the new one to either emit or obfuscate.</p><p>Now that we have solved the problem of matching email patterns in text, we need to deal with the fact that they need to be obfuscated on pages. This is when the first hints of HTML “parsing” were introduced.</p><p>I’ve put “parsing” in quotes because, rather than implementing the whole parser, the email filter (as the module was called) didn’t attempt to replicate the whole HTML grammar, but rather added custom Ragel patterns just for skipping over comments and tags where emails should not be obfuscated.</p><p>This was a reasonable approach, especially back in 2010 - four years before the HTML5 specification, when all browsers had their own quirky handling of HTML. However, as you can imagine, this approach did not scale well. If you’re trying to work around quirks in other parsers, you start gaining more and more quirks in your own, and then have to work around these too. Simultaneously, new features started to be added, which also required modifying HTML on the fly (like the automatic insertion of the Google Analytics script), and the existing module seemed to be the best place for that. It grew to handle more and more tags, operations and syntactic edge cases.</p>
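A drastically simplified sketch of this buffering strategy: a crude character class stands in for the Ragel-generated state machine, and a placeholder replaces the real obfuscation encoding. The key property it illustrates is that only the candidate match is held back across chunk boundaries; everything before it is emitted immediately.

```rust
// Pass bytes straight through unless they could be part of an email; buffer
// those until the candidate either completes (obfuscate) or fails (flush).
fn is_email_char(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'.' || b == b'_'
}

#[derive(Default)]
struct EmailFilter {
    pending: Vec<u8>, // candidate match, possibly spanning chunk boundaries
}

impl EmailFilter {
    fn feed(&mut self, chunk: &[u8], out: &mut Vec<u8>) {
        for &b in chunk {
            if is_email_char(b) || b == b'@' {
                self.pending.push(b);
            } else {
                self.flush(out);
                out.push(b);
            }
        }
        // at a chunk boundary, everything except `pending` has been emitted
    }

    fn flush(&mut self, out: &mut Vec<u8>) {
        let cand = std::mem::take(&mut self.pending);
        let ats = cand.iter().filter(|&&b| b == b'@').count();
        if ats == 1 && !cand.starts_with(b"@") && !cand.ends_with(b"@") {
            out.extend_from_slice(b"[obfuscated]"); // stand-in for the encoder
        } else {
            out.extend_from_slice(&cand); // false alarm: emit unchanged
        }
    }

    fn finish(&mut self, out: &mut Vec<u8>) {
        self.flush(out);
    }
}
```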
    <div>
      <h4>Now let’s minify…</h4>
      <a href="#now-lets-minify">
        
      </a>
    </div>
    <p>In 2011, Cloudflare decided to also add minification to allow customers to speed up their websites even if they had not employed minification themselves. For that, we decided to use an existing streaming minifier - <a href="https://github.com/brianpane/jitify-core">jitify</a>. It already had NGINX bindings, which made it a great candidate for integration into the existing pipeline.</p><p>Unfortunately, just like most other parsers from that time, as well as ours described above, it had its own processing rules for HTML, JavaScript and CSS, which weren’t precise but rather tried to parse content on a best-effort basis. This left us with two independent streaming parsers that were incompatible and could produce bugs either individually or only when combined.</p>
    <div>
      <h2>v1 : "(Almost) HTML5 Spec compliant parser"</h2>
      <a href="#v1-almost-html5-spec-compliant-parser">
        
      </a>
    </div>
    <p>Over the years engineers kept adding new features to the ever-growing state machines, while fixing new bugs arising from imprecise syntax implementations, conflicts between various parsers, and problems in the features themselves.</p><p>By 2016, it was time to get out of the multiple ad hoc parsers business and do things ‘the right way’.</p><p>The next sections describe how we built our HTML5-compliant parser, starting from the specification’s state machine. Using only this state machine, it should have been straightforward to build a parser. However, you may be aware that historically the parsing of HTML has not been entirely strict, which means that, to avoid breaking existing implementations, building an actual DOM is required for parsing. This is not possible for a streaming rewriter, so a simulator of the parser feedback was developed. In terms of performance, it is always better not to do something, so we then describe why the rewriter can be ‘lazy’ and not perform the expensive encoding and decoding of text when rewriting HTML. Finally, the surprisingly difficult problem of deciding whether a response is HTML at all is detailed.</p>
    <div>
      <h4>HTML5</h4>
      <a href="#html5">
        
      </a>
    </div>
    <p>By 2016, HTML5 had defined precise syntax rules for parsing, including compatibility with legacy content and custom browser implementations. It was already implemented by all browsers and by many third-party parsers.</p><p>The <a href="https://html.spec.whatwg.org/multipage/parsing.html">HTML5 parsing specification</a> defines basic HTML syntax in the form of a state machine. We already had experience with <a href="http://www.colm.net/open-source/ragel/">Ragel</a> for similar use cases, so there was no question about what to use for the new streaming parser. Despite the complexity of the grammar, the translation of the specification into Ragel syntax was straightforward. The code looks simpler than the formal description of the state machine, thanks to the ability to mix regex syntax with explicit transitions.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3lfOyfSPrShxKNNEw5OoF1/7c70383207caeb3741f265725bc5bbf1/image6-1.png" />
            
            </figure><p>A visualisation of a small fraction of the HTML state machine. Source: <a href="https://twitter.com/RReverser/status/715937136520916992">https://twitter.com/RReverser/status/715937136520916992</a></p>
    <div>
      <h3>HTML5 parsing requires a ‘DOM’</h3>
      <a href="#html5-parsing-requires-a-dom">
        
      </a>
    </div>
    <p>However, HTML has a history. To avoid breaking existing implementations, HTML5 is specified with recovery procedures for incorrect tag nesting, ordering, unclosed tags, missing attributes and all the other possible quirks that used to work in older browsers. In order to resolve these issues, the specification expects a tree builder to drive the lexer, essentially meaning that you can’t correctly tokenize HTML (split it into separate tags) without a DOM.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6WiV28d2K2jxXL3TbpNJJk/2468b04ab896d5ab46721b3446ac8680/image2-2.png" />
            
            </figure><p>HTML parsing flow as defined by the specification</p><p>For this reason, most parsers don’t even try to perform streaming parsing and instead take the input as a whole and produce a document tree as an output. This is not something we could do for streaming transformation without adding significant delays to page loading.</p><p>An existing HTML5 JavaScript parser - <a href="https://github.com/inikulin/parse5">parse5</a> - had already implemented spec-compliant tree parsing using a streaming tokenizer and rewriter. To avoid having to create a full DOM the concept of a “parser feedback simulator” was introduced.</p>
    <div>
      <h4>Tree builder feedback</h4>
      <a href="#tree-builder-feedback">
        
      </a>
    </div>
    <p>As you can guess from the name, this is a module that aims to simulate a full parser’s feedback to the tokenizer, without actually building the whole DOM, but instead preserving only the required information and context necessary for correctly driving the state machine.</p><p>After rigorous testing and upstreaming a test runner to parse5, we found this technique to be suitable for the majority of even poorly written pages on the Internet, and employed it in LazyHTML.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7156cYXNTDlDNGfLV8YCrq/7edc57eb142f4daded43a653d66460fa/image7-1.png" />
            
            </figure><p>LazyHTML architecture</p>
    <div>
      <h3>Avoiding decoding - everything is ASCII</h3>
      <a href="#avoiding-decoding-everything-is-ascii">
        
      </a>
    </div>
    <p>Now that we had a streaming tokenizer working, we wanted to make sure that it was fast enough so that users didn’t notice any slowdowns to their pages as they went through the parser and transformations. Otherwise it would completely circumvent any optimisations we’d want to attempt on the fly.</p><p>One way to stay fast is to avoid decoding the input into characters and instead operate directly on bytes. Decoding and re-encoding any modified HTML content would not only cause a performance hit, but would also significantly complicate our implementation, due to the multiple sources of potential encoding information required to <a href="https://html.spec.whatwg.org/multipage/parsing.html#determining-the-character-encoding">determine the character encoding</a>, including sniffing of the first 1 KB of the content.</p><p>The “living” HTML Standard specification permits only encodings defined in the <a href="https://encoding.spec.whatwg.org/">Encoding Standard</a>. If we look carefully through those encodings, as well as the remark in the Character encodings section of the HTML spec, we find that all of them are ASCII-compatible with the exception of UTF-16 and ISO-2022-JP.</p><p>This means that any ASCII text will be represented in such encodings exactly as it would be in ASCII, and any non-ASCII text will be represented by bytes outside of the ASCII range. This property allows us to safely tokenize, compare and even modify the original HTML without decoding it, or even knowing which particular encoding it uses. This is possible because all the token boundaries in the HTML grammar are represented by ASCII characters.</p><p>We still need to detect UTF-16 by sniffing, and either decode such documents or skip them without modification. We chose the latter, to avoid potential security-sensitive bugs which are common with UTF-16, and because, luckily, this encoding is seen in less than 0.1% of content.</p><p>The only issue left with this approach is that in most places the <a href="https://html.spec.whatwg.org/multipage/parsing.html#tokenization">HTML tokenization</a> specification requires you to replace U+0000 (NUL) characters with U+FFFD (the replacement character) during parsing. Presumably, this was added as a security precaution against bugs in C implementations of old engines which could treat a NUL character, encoded in ASCII / UTF-8 / ... as a 0x00 byte, as the end of the string (yay, null-terminated strings…). This is problematic for us because U+FFFD is outside of the ASCII range and will be represented by different sequences of bytes in different encodings. Since we don’t know the encoding of the document, this would lead to corruption of the output.</p><p>Luckily, we’re not in the same business as browser vendors, and don’t worry about NUL characters in strings as much - we use a “fat pointer” string representation, in which the length of the string is determined not by the position of the NUL character, but stored along with the data pointer as an integer field:</p>
            <pre><code>typedef struct {
   const char *data;
   size_t length;
} lhtml_string_t;</code></pre>
            <p>Instead, we can quietly ignore this part of the spec (sorry!) and keep U+0000 characters as-is, adding them as such to tag names, attribute names and other strings, and later re-emitting them to the document. This is safe to do because it doesn’t affect any state machine transitions, but merely preserves the original 0x00 bytes and delegates their replacement to the parser in the end user’s browser.</p>
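The byte-level, decoding-free approach described above can be demonstrated with a small sketch (an illustration, not the LazyHTML lexer): since in ASCII-compatible encodings the bytes 0x00-0x7F always mean their ASCII characters, token boundaries like `<` and `>` can be found without knowing or decoding the document's encoding.

```rust
// Find (start, end) byte spans of tags by scanning for ASCII `<` and `>`.
// Non-ASCII text never produces false positives in an ASCII-compatible
// encoding, because its bytes lie outside the 0x00-0x7F range.
fn find_tag_spans(input: &[u8]) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    let mut start = None;
    for (i, &b) in input.iter().enumerate() {
        match b {
            b'<' => start = Some(i),
            b'>' => {
                if let Some(s) = start.take() {
                    spans.push((s, i + 1)); // span covers `<` through `>`
                }
            }
            _ => {}
        }
    }
    spans
}
```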
    <div>
      <h3>Content type madness</h3>
      <a href="#content-type-madness">
        
      </a>
    </div>
    <p>We want to be lazy and minimise false positives: we only want to spend time parsing, decoding and rewriting actual HTML, rather than breaking images or JSON. So the question is, how do you decide whether something is an HTML document? Can you just use the Content-Type header, for example? A comment left in the source code best describes the reality.</p>
            <pre><code>/*
Dear future generations. I didn't like this hack either and hoped
we could do the right thing instead. Unfortunately, the Internet
was a bad and scary place at the moment of writing. If this
ever changes and websites become more standards compliant,
please do remove it just like I tried.
Many websites use PHP which sets Content-Type: text/html by
default. There is no error or warning if you don't provide
your own one, so most websites don't bother to change it and
serve JSON API responses, private keys and binary data like
images with this default Content-Type, which we would happily
try to parse and transform. This not only hurts performance, but also
easily breaks response data itself whenever some sequence inside
it happens to look like a valid HTML tag that we are interested
in. It gets even worse when JSON contains valid HTML inside of it
and we treat it as such, and append random scripts to the end
breaking APIs critical for popular web apps.
This hack attempts to mitigate the risk by ensuring that the
first significant character (ignoring whitespaces and BOM)
is actually `&lt;` - which increases the chances that it's indeed HTML.
That way we can potentially skip some responses that otherwise
could be rendered by a browser as part of AJAX response, but this
is still better than the opposite situation.
*/</code></pre>
            <p>The reader might think that this is a rare edge case; however, our observations show that almost 25% of the traffic served through Cloudflare with the “text/html” content type is unlikely to be HTML.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/a7BORxQP4K6bUBa06Mr4z/fc6f1596815a90154d35f4d2bcc6b958/image9-1.png" />
            
            </figure><p>The trouble doesn’t end there: it turns out that there is a considerable amount of XML content served with the “text/html” content type, which can’t always be processed correctly when treated as HTML.</p><p>Over time, bailouts for binary data, JSON, AMP, and correctly identifying HTML fragments led to content sniffing logic that can be described by the following diagram:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2aEUXB2iTqRDZIlMxwBZKQ/12b43668197625afce880c56603a7516/image4-1.png" />
            
            </figure><p>This is a good example of divergence between formal specifications and reality.</p>
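The first-significant-character check from the source comment above can be sketched as follows (a hypothetical helper, not the production code): treat a response body as HTML only if its first character, after an optional UTF-8 BOM and ASCII whitespace, is `<`.

```rust
// Heuristic from the comment: skip a UTF-8 BOM and leading ASCII whitespace,
// then require the first significant byte to be `<` before treating the
// body as HTML.
fn looks_like_html(body: &[u8]) -> bool {
    let body = body.strip_prefix(b"\xEF\xBB\xBF").unwrap_or(body);
    body.iter()
        .find(|b| !b.is_ascii_whitespace())
        .map_or(false, |&b| b == b'<')
}
```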
    <div>
      <h3>Tag name comparison optimisation</h3>
      <a href="#tag-name-comparison-optimisation">
        
      </a>
    </div>
    <p>But just having fast parsing is not enough - we have functionality that consumes the output of the parser, rewrites it and feeds it back for serialization. And all the memory and time constraints that we have for the parser apply to this code as well, as it is part of the same content processing pipeline.</p><p>It’s a common requirement to compare parsed HTML tag names, e.g. to determine if the current tag should be rewritten or not. A naive implementation will use regular per-byte comparison, which can require traversing the whole tag name. We were able to narrow this operation down to a single integer comparison instruction in the majority of cases by using a specially designed hashing algorithm.</p><p>The tag names of all <a href="https://html.spec.whatwg.org/multipage/semantics.html#semantics">standard HTML elements</a> contain only alphabetical ASCII characters and the digits from 1 to 6 (in the numbered heading tags, i.e. &lt;h1&gt; - &lt;h6&gt;). Comparison of tag names is case-insensitive, so we only need 26 characters to represent the alphabetical characters. Using the same basic idea as <a href="https://en.wikipedia.org/wiki/Arithmetic_coding">arithmetic coding</a>, we can represent each of the possible 32 characters of a tag name using just 5 bits and, thus, fit up to <i>floor(64 / 5) = 12</i> characters in a 64-bit integer, which is enough for all the standard tag names and any other tag names that satisfy the same requirements! The great part is that we don’t even need an additional traversal of the tag name to hash it - we can do it as we parse the tag name, consuming the input byte by byte.</p><p>However, there is one problem with this hashing algorithm, and the culprit is not so obvious: to fit all 32 characters in 5 bits we need to use all possible bit combinations, including 00000. This means that if the leading character of a tag name is represented with 00000, we will not be able to differentiate between varying numbers of consecutive repetitions of this character.</p><p>For example, assuming that ‘a’ is encoded as 00000 and ‘b’ as 00001:</p><table>
    <tr>
    <th>Tag name</th>
    <th>Bit representation</th>
    <th>Encoded value</th>
    </tr>
    <tr>
        <td>ab</td>
        <td>00000 00001</td>
        <td>1</td>
    </tr>
    <tr>
        <td>aab</td>
        <td>00000 00000 00001</td>
        <td>1</td>
    </tr>
</table><p>Luckily, we know that the HTML grammar <a href="https://html.spec.whatwg.org/multipage/parsing.html#tag-open-state">doesn’t allow</a> the first character of a tag name to be anything other than an ASCII alphabetical character, so reserving the numbers from 0 to 5 (00000b - 00101b) for digits and the numbers from 6 to 31 (00110b - 11111b) for ASCII alphabetical characters solves the problem.</p>
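A sketch of the resulting packing scheme (an illustrative reimplementation, not the exact LazyHTML code, which hashes incrementally while parsing): digits '1'-'6' take codes 0-5, letters take 6-31, so a leading letter never encodes to all zeroes and repeated characters stay distinguishable.

```rust
// Pack a tag name into a u64, 5 bits per character; returns None for names
// that are too long or contain other characters, which then fall back to a
// regular string comparison.
fn tag_name_hash(name: &str) -> Option<u64> {
    if name.is_empty() || name.len() > 12 {
        return None; // floor(64 / 5) = 12 characters fit in a u64
    }
    let mut hash: u64 = 0;
    for c in name.bytes() {
        let code = match c.to_ascii_lowercase() {
            d @ b'1'..=b'6' => (d - b'1') as u64,     // digits: codes 0..=5
            l @ b'a'..=b'z' => (l - b'a') as u64 + 6, // letters: codes 6..=31
            _ => return None,
        };
        hash = (hash << 5) | code;
    }
    Some(hash)
}
```

Because the hash is case-insensitive by construction, comparing two hashable tag names becomes a single integer comparison.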
    <div>
      <h3>LazyHTML</h3>
      <a href="#lazyhtml">
        
      </a>
    </div>
    <p>After taking everything mentioned above into consideration, the LazyHTML (<a href="https://github.com/cloudflare/lazyhtml">https://github.com/cloudflare/lazyhtml</a>) library was created. It is a fast streaming HTML parser and serializer with a token-based C API, derived from the HTML5 lexer written in Ragel. It provides a pluggable transformation pipeline that allows multiple transformation handlers to be chained together.</p><p>An example of a handler that transforms the <code>href</code> property of links:</p>
            <pre><code>// define static string to be used for replacements
static const lhtml_string_t REPLACEMENT = {
   .data = "[REPLACED]",
   .length = sizeof("[REPLACED]") - 1
};

static void token_handler(lhtml_token_t *token, void *extra /* this can be your state */) {
  if (token-&gt;type == LHTML_TOKEN_START_TAG) { // we're interested only in start tags
    const lhtml_token_starttag_t *tag = &amp;token-&gt;start_tag;
    if (tag-&gt;type == LHTML_TAG_A) { // check whether tag is of type &lt;a&gt;
      const size_t n_attrs = tag-&gt;attributes.count;
      const lhtml_attribute_t *attrs = tag-&gt;attributes.items;
      for (size_t i = 0; i &lt; n_attrs; i++) { // iterate over attributes
        const lhtml_attribute_t *attr = &amp;attrs[i];
        if (lhtml_name_equals(attr-&gt;name, "href")) { // match the attribute name
          attr-&gt;value = REPLACEMENT; // set the attribute value
        }
      }
    }
  }
  lhtml_emit(token, extra); // pass transformed token(s) to next handler(s)
}</code></pre>
            
    <div>
      <h3>So, is it correct and how fast is it?</h3>
      <a href="#so-is-it-correct-and-how-fast-is-it">
        
      </a>
    </div>
    <p>It is HTML5-compliant, as tested against the official test suites. As part of the work, several contributions were sent to the specification itself for clarification / simplification of the spec language.</p><p>Unlike the previous parser(s), it didn't bail out on any of the 2,382,625 documents from HTTP Archive, although 0.2% of documents exceeded the expected bufferization limits, as they were in fact JavaScript, RSS or other types of content incorrectly served with Content-Type: text/html; since anything is valid HTML5, the parser tried to parse e.g. a&lt;b; x=3; y=4 as an incomplete tag with attributes. This is very rare (and drops to just 0.03% when two error-prone advertisement networks are excluded from the results), but still needs to be accounted for and is a valid case for bailing out.</p><p>As for the benchmarks: in September 2016 we measured an example that transforms the HTML spec itself (a 7.9 MB HTML file) by replacing every href property of every link (only that property, and only in those tags) with a static value. LazyHTML was compared against a few existing, popular HTML parsers (only tokenization mode was used, for a fair comparison, so that they don't need to build an AST and so on). The timings in milliseconds for 100 iterations are the following (lazy mode means that we use raw strings whenever possible; the other mode serializes each token, just for comparison):</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7BUi5lG0xlZjW0jjZuSL6S/68b1cccd68c8fe5fec844de53e80dbee/image5-1.png" />
            
            </figure><p>The results show that the LazyHTML parser is around an order of magnitude faster than the alternatives.</p><p>That concludes the first post in our series on HTML rewriters at Cloudflare. The next post describes how we built a new streaming rewriter on top of the ideas of LazyHTML. The major update was to provide an easier-to-use CSS selector API. It provides the back-end for the Cloudflare Workers HTMLRewriter JavaScript API.</p> ]]></content:encoded>
            <category><![CDATA[Rust]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Workers Sites]]></category>
            <category><![CDATA[JavaScript]]></category>
            <guid isPermaLink="false">6ME5A32CsufvgVgNpOie59</guid>
            <dc:creator>Andrew Galloni</dc:creator>
            <dc:creator>Ingvar Stepanyan</dc:creator>
        </item>
        <item>
            <title><![CDATA[One more thing... new Speed Page]]></title>
            <link>https://blog.cloudflare.com/new-speed-page/</link>
            <pubDate>Mon, 20 May 2019 13:00:00 GMT</pubDate>
            <description><![CDATA[ With the Speed Page redesign, we are emphasizing the performance benefits of using Cloudflare and the additional improvements possible from our features. ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/XqOoxFvK0VJ3KVmpQ12uk/12abe99caca7f6677cde4f88b1c38748/new-speed-page_3x.png" />
            
            </figure><p>Congratulations on making it through Speed Week. Over the last week, Cloudflare has: described how <a href="/argo-and-the-cloudflare-global-private-backbone">our global network</a> speeds up the Internet, launched an <a href="/better-http-2-prioritization-for-a-faster-web">HTTP/2 prioritisation model</a> that will improve web experiences on all browsers, launched an <a href="/announcing-cloudflare-image-resizing-simplifying-optimal-image-delivery">image resizing service</a> which will deliver the optimal image to every device, <a href="/introducing-concurrent-streaming-acceleration">optimized live video delivery</a>, detailed how to <a href="/parallel-streaming-of-progressive-images">stream progressive images</a> so that they render twice as fast (using the flexibility of our new HTTP/2 prioritisation model), and finally prototyped a new <a href="/binary-ast">over-the-wire format for JavaScript</a> that could improve application start-up performance, especially on mobile devices. As a bonus, we’re also rolling out one more new feature: “TCP Turbo”, which automatically chooses the TCP settings to further accelerate your website.</p><p>As a company, we want to help every one of our customers improve web experiences. The growth of Cloudflare, along with the increase in features, has often made simple questions difficult to answer:</p><ul><li><p>How fast is my website?</p></li><li><p>How should I be thinking about performance features?</p></li><li><p>How much faster would the site be if I were to enable a particular feature?</p></li></ul><p>This post describes the exciting changes we have made to the Speed Page on the Cloudflare dashboard to give our customers a much clearer understanding of how their websites are performing and how they can be made even faster. The new Speed Page consists of:</p><ul><li><p>A visual comparison of your website loading on Cloudflare, with caching enabled, compared to connecting directly to the origin.</p></li><li><p>The measured improvement expected if a given performance feature is enabled.</p></li><li><p>A report describing how fast your website is on desktop and mobile.</p></li></ul><p>We want to simplify the complexity of making web experiences fast and give our customers control. Take a look - we hope you like it.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7HG9zZiFbLdNOXolin2M33/a463e5a1537c803fe5467ad2808a859e/SpeedPage--1-.png" />
            
            </figure>
    <div>
      <h3>Why do fast web experiences matter?</h3>
      <a href="#why-do-fast-web-experiences-matter">
        
      </a>
    </div>
    <p><b>Customer experience:</b> No one likes slow service. Imagine going to a restaurant where the service is slow, especially when you first arrive; you are not likely to go back or recommend it to your friends. It turns out the web works the same way, and Internet customers are even more demanding. As many as <a href="https://blog.hubspot.com/marketing/page-load-time-conversion-rates">79% of customers</a> who are “dissatisfied” with a website’s performance are less likely to buy from that site again.</p><p><b>Engagement and Revenue:</b> There are many studies explaining how speed affects customer engagement, bounce rates and revenue.</p><p><b>Reputation:</b> There is also brand reputation to consider, as customers associate an online experience with the brand. One study found that for <a href="https://www.singlestoneconsulting.com/articles/patience-is-a-virtue">66%</a> of those sampled, website performance influences their impression of the company.</p><p><b>Diversity:</b> Mobile traffic has grown larger than its desktop counterpart over the last few years. Mobile customers have become increasingly demanding and expect seamless Internet access regardless of location.</p><p>Mobile brings a new set of challenges, including the diversity of device specifications. When testing, be aware that the average mobile device is significantly less capable than the top-of-the-range models. For example, there can be orders-of-magnitude disparity in the time different mobile devices take to run JavaScript. Another challenge is the variance in mobile performance, as customers move from a strong, high-quality office network to mobile networks of differing speed (3G/5G) and quality within the same browsing session.</p>
    <div>
      <h3>New Speed Page</h3>
      <a href="#new-speed-page">
        
      </a>
    </div>
    <p>There is compelling evidence that a faster web experience is important for anyone online. Most of the major studies involve the largest tech companies, who have whole teams dedicated to measuring and improving web experiences for their own services. At Cloudflare we are on a mission to help build a better and faster Internet for everyone - not just a select few.</p><p>Delivering fast web experiences is not a simple matter; that much is clear. To know what to send and when requires a deep understanding of every layer of the stack, from TCP tuning, protocol-level prioritisation and content delivery formats through to the intricate mechanics of browser rendering. You also need a global network that strives to be within 10 ms of every Internet user. The intrinsic value of such a network should be clear to everyone. Cloudflare <i>has</i> this network, but it also offers many additional performance features.</p><p>With the Speed Page redesign, we are emphasizing the performance benefits of using Cloudflare and the additional improvements possible from our features.</p><p>The de facto standard for measuring website performance has been <a href="https://www.webpagetest.org/">WebPageTest</a>. Having its creator in-house at Cloudflare encouraged us to use it as the basis for website performance measurement. So, what is the easiest way to understand how a web page loads? A list of statistics does not paint a full picture of the actual user experience. One of the cool features of WebPageTest is that it can generate a filmstrip of screen snapshots taken during a web page load, enabling us to quantify, visually, how a page loads. This view makes it significantly easier to determine how long the page is blank, and how long it takes for the most important content to render. Being able to look at the results in this way provides the ability to empathise with the user.</p>
    <div>
      <h3>How fast on Cloudflare?</h3>
      <a href="#how-fast-on-cloudflare">
        
      </a>
    </div>
    <p>After moving your website to Cloudflare, you may have asked: How fast did this decision make my website? Well, now we provide the answer:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/ul30Mk0yn4elSNrAodrkg/2428b0205cf0c82f2cc812f04fad3638/OnlyFilmstrip-1.png" />
            
            </figure><p>Comparison of website performance using Cloudflare.</p><p>As well as the increase in speed, we provide filmstrips of before and after, so that it is easy to compare and understand how differently a user will experience the website. If our tests are unable to reach your origin and you are already set up on Cloudflare, we will test with development mode enabled, which disables caching and minification.</p><p><b>Site performance statistics</b></p><p>How can we measure the user experience of a website?</p><p>Traditionally, <i>page load</i> was the important metric. Page load is a technical measurement used by browser vendors that has no bearing on the presentation or usability of a page. The metric reports how long it takes to load not only the important content but also all of the third-party content (social network widgets, advertising, tracking scripts etc.). A user may very well not see anything until after all the page content has loaded, or they may be able to interact with a page immediately, while content continues to load.</p><p>A user will not decide whether a page is fast by a single measure or moment. A user will perceive how fast a website is from a combination of factors:</p><ul><li><p>when they <i>see</i> any response</p></li><li><p>when they see the <i>content</i> they expect</p></li><li><p>when they can <i>interact</i> with the page</p></li><li><p>when they can <i>perform</i> the task they intended</p></li></ul><p>Experience has shown that if you focus on one measure, it will likely be to the detriment of the others.</p><p><b>Importance of visual response</b></p><p>If an impatient user navigates to your site and sees no content, or no valuable content, for several seconds, they are likely to get frustrated and leave.
The <a href="https://w3c.github.io/paint-timing/">paint timing spec</a> defines a set of <a href="https://docs.google.com/document/d/1BR94tJdZLsin5poeet0XoTW60M0SjvOJQttKT-JK8HI/view">paint metrics</a> (moments when content appears on a page) to measure the key points in how a user perceives performance.</p><p><b>First Contentful Paint</b> (FCP) is the time when the browser first renders any DOM content.</p><p><b>First Meaningful Paint</b> (FMP) is the point in time when the page’s “primary” content appears on the screen. This metric should relate to what the user has come to the site to see, and is designed as the point in time when the largest visible layout change happens.</p><p><b>Speed Index</b> attempts to quantify the value of the filmstrip rather than using a single paint timing. The <a href="https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/metrics/speed-index">speed index</a> measures the rate at which content is displayed - essentially the area above the curve. In the chart below, from our progressive image feature, you can see that reaching 80% visual completeness happens much earlier for the parallelized (red) load than for the regular (blue) one.</p><img src="http://staging.blog.mrk.cfdata.org/content/images/2019/05/speedindex-1.png" />
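<p>To make the “area above the curve” idea concrete, a Speed Index-style score can be computed from filmstrip samples. The following is a hypothetical helper, not WebPageTest’s actual code; it treats visual completeness as a step function over time:</p>

```javascript
// Speed Index sketch: sum each interval's duration weighted by how
// incomplete the page still was during that interval (lower is better).
// `samples` is a sorted list of [timeMs, visualCompleteness 0..1] pairs.
function speedIndex(samples) {
  let index = 0;
  let lastTime = 0;
  let lastComplete = 0;
  for (const [time, complete] of samples) {
    index += (time - lastTime) * (1 - lastComplete);
    lastTime = time;
    lastComplete = complete;
  }
  return index;
}

// Progressive load: 80% complete at 1s, fully done at 3s.
speedIndex([[1000, 0.8], [3000, 1.0]]); // 1400
// Regular load: 80% complete only at 2.5s, fully done at 3s.
speedIndex([[2500, 0.8], [3000, 1.0]]); // 2600
```

<p>Both loads finish at the same time, but the first scores much better because most of the content was visible earlier - exactly the effect the filmstrip view captures.</p>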

    <div>
      <h3>Importance of interactivity</h3>
      <a href="#importance-of-interactivity">
        
      </a>
    </div>
    <p>The same impatient user is now happy that the content they want to see has appeared. They will still become frustrated if they are unable to interact with the site. <b>Time to Interactive</b> is the time it takes for content to be rendered and for the page to be ready to receive input from the user. Technically, this is defined as the point when the browser’s main processing thread has been idle for several seconds after first meaningful paint.</p><p>The Speed Tab displays these key metrics for mobile and desktop.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/50EYtQgFSjMevEDSsIk4CG/c327d48e0b7b24fe8d650de11fd88834/CriticalLoadingTime-5.png" />
            
            </figure>
    <div>
      <h3>How much faster on Cloudflare?</h3>
      <a href="#how-much-faster-on-cloudflare">
        
      </a>
    </div>
    <p>The Cloudflare Dashboard provides a list of performance features which can, admittedly, be both confusing and daunting. What would be the benefit of turning on Rocket Loader, and on which performance metrics would it have the most impact? If you upgrade to Pro, what would be the value of the enhanced HTTP/2 prioritisation? The optimization section answers these questions.</p><p>Tests are run with each performance feature turned on and off. The values of the appropriate performance metrics are displayed for each test, along with the improvement. You can enable or upgrade the feature from this view. Here are a few examples:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6LkuEkCNY8bloX9JvJEAOS/1c067dbce64927affb3e4420044dd4af/RocketLoader.png" />
            
            </figure><p>If Rocket Loader were enabled for this website, the render-blocking JavaScript would be deferred, causing first paint time to drop from 1.25s to 0.81s - an improvement of 32% on desktop.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/kyJrVQRTpGxQb3HuFBIJn/82475df75c20123ec9d81a7ed2631c3f/Mirage.png" />
            
            </figure><p>Image heavy sites do not perform well on slow mobile connections. If you enable Mirage, your customers on 3G connections would see meaningful content 1s sooner - an improvement of 29.4%.</p><p>So how about our new features?</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5erNk1iTWG0gx7vm8dwITX/288529be98b01aa378dc4489a8c690fd/H2.png" />
            
            </figure><p>We tested the enhanced HTTP/2 prioritisation feature on an Edge browser on desktop and saw meaningful content display 2s sooner - an improvement of 64%.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5DSkRUoK50wHoHsUtWHcc1/32099750c650930c1d9c89e7a40033bb/ProImgStreaming.png" />
            
            </figure><p>This is a more interesting result, taken from the blog example used to illustrate progressive image streaming. At first glance, the 29% improvement in speed index is good. The filmstrip comparison shows a more significant difference: the page, with no images shown yet, is already 43% visually complete in both scenarios after 1.5s, but at 2.5s the pages are 77% and 50% visually complete respectively.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2tf37dOPAwHL1lKPQNkxVq/6fd8cc88184349c12f57bec4adeba5e0/filmstrip.png" />
            
            </figure><p>This is a great example of how metrics do not tell the full story. They cannot completely replace viewing the page loading flow and understanding what is important for your site.</p>
    <div>
      <h3>How to try</h3>
      <a href="#how-to-try">
        
      </a>
    </div>
    <p>This is our first iteration of the new Speed Page, and we are eager to get your feedback. We will be rolling it out to beta customers who are interested in seeing how their sites perform. To be added to the queue for activation of the new Speed Page, please click on the banner on the overview page,</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1eHf4V1lYUprNu6EuYGVLc/fa2b5e64ed05508e9139cdd37e180596/BannerOverview.png" />
            
            </figure><p>or click on the banner on the existing Speed Page.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5sJqPyA2OJDHOkxRDiIwvB/de77d2e82ab87960f06c3385e3accda5/BannerSpeed.png" />
            
            </figure> ]]></content:encoded>
            <category><![CDATA[Speed Week]]></category>
            <category><![CDATA[Speed & Reliability]]></category>
            <category><![CDATA[HTTP2]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Dashboard]]></category>
            <guid isPermaLink="false">24ufNjE5kwbiozkOO2zlWM</guid>
            <dc:creator>Andrew Galloni</dc:creator>
        </item>
        <item>
            <title><![CDATA[Parallel streaming of progressive images]]></title>
            <link>https://blog.cloudflare.com/parallel-streaming-of-progressive-images/</link>
            <pubDate>Tue, 14 May 2019 16:00:00 GMT</pubDate>
            <description><![CDATA[ Image-optimized HTTP/2 multiplexing makes all progressive images across the page appear visually complete in half of the time. ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/277BtPJSXREVafvPqyX3ZH/60d4273f08458c42a0190203a4bf9b7b/880BAE29-39B3-4733-96DA-735FE76443D5.png" />
            
            </figure><p>Progressive image rendering and HTTP/2 multiplexing technologies have existed for a while, but now we've combined them in a new way that makes them much more powerful. With Cloudflare progressive streaming <b>images appear to load in half of the time, and browsers can start rendering pages sooner</b>.</p>
<p>Over HTTP/1.1 connections, servers didn't have any choice about the order in which resources were sent to the client; they had to send each response, as a whole, in the exact order it was requested by the web browser. HTTP/2 improved on this by adding multiplexing and prioritization, which allow servers to decide exactly what data is sent and when. We’ve taken advantage of these new HTTP/2 capabilities to improve the perceived loading speed of progressive images by sending the most important fragments of image data sooner.</p>
    <div>
      <h3>What is progressive image rendering?</h3>
      <a href="#what-is-progressive-image-rendering">
        
      </a>
    </div>
    <p>Basic images load strictly from top to bottom. If a browser has received only half of an image file, it can show only the top half of the image. Progressive images have their content arranged not from top to bottom, but from a low level of detail to a high level of detail. Receiving a fraction of image data allows browsers to show the entire image, only with a lower fidelity. As more data arrives, the image becomes clearer and sharper.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/Mlb3C96uo4KLWgNe835l2/31ccbe118c20ce15f4c35c5ceb0cd377/image6.jpg" />
            
            </figure><p>This works great in the JPEG format, where only about 10-15% of the data is needed to display a preview of the image, and at 50% of the data the image looks almost as good as when the whole file is delivered. Progressive JPEG images contain exactly the same data as baseline images, merely reshuffled in a more useful order, so progressive rendering doesn’t add any cost to the file size. This is possible because JPEG doesn't store the image as pixels. Instead, it represents the image as frequency coefficients, which are like a set of predefined patterns that can be blended together, in any order, to reconstruct the original image. The inner workings of JPEG are really fascinating, and you can learn more about them from my recent <a href="https://www.youtube.com/watch?v=jTXhYj2aCDU">performance.now() conference talk</a>.</p><p>The end result is that the images can look almost fully loaded in half of the time, for free! The page appears to be visually complete and can be used much sooner. The rest of the image data arrives shortly after, upgrading images to their full quality, before visitors have time to notice anything is missing.</p>
    <div>
      <h3>HTTP/2 progressive streaming</h3>
      <a href="#http-2-progressive-streaming">
        
      </a>
    </div>
    <p>But there's a catch. Websites have more than one image (sometimes even hundreds of images). When the server sends image files naïvely, one after another, the progressive rendering doesn’t help that much, because overall the images still load sequentially:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7KLYwVqXQSyp0QjnTTbCX4/b70c3a915dd7afaf7b0beec2aedf9f1d/image5.gif" />
            
            </figure><p>Having complete data for half of the images (and no data for the other half) doesn't look as good as having half of the data for all images.</p><p>And there's another problem: when the browser doesn't know image sizes yet, it lays the page out with placeholders instead, and re-lays out the page as each image loads. This can make pages jump during loading, which is inelegant, distracting and annoying for the user.</p><p>Our new progressive streaming feature greatly improves the situation: we can send all of the images at once, in parallel. This way the browser gets size information for all of the images as soon as possible, can paint a preview of all images without having to wait for a lot of data, and large images don’t delay loading of styles, scripts and other more important resources.</p><p>This idea of streaming progressive images in parallel is as old as HTTP/2 itself, but it needs special handling in low-level parts of web servers, and so far it hasn't been implemented at a large scale.</p><p>When we were improving <a href="/better-http-2-prioritization-for-a-faster-web">our HTTP/2 prioritization</a>, we realized it could also be used to implement this feature. Image files as a whole are neither high nor low priority. The priority changes within each file, and dynamic re-prioritization gives us the behavior we want:</p><ul><li><p>The image header that contains the image size is very high priority, because the browser needs to know the size as soon as possible to do page layout. The image header is small, so it doesn't hurt to send it ahead of other data.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7qRxoj4KuoVnSD7nEpWbMe/6bd553099bbb167660c6a65fb782ea7c/image7.jpg" />
            
            </figure></li><li><p>The minimum amount of data in the image required to show a preview of the image has a medium priority (we'd like to plug "holes" left for unloaded images as soon as possible, but also leave some bandwidth available for scripts, fonts and other resources)</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Cm6w0yIFtMJctt42maKMF/f585afb934ef9e1f98ecd4d3750e7b09/secondhalf.png" />
            
            </figure></li><li><p>The remainder of the image data is low priority. Browsers can stream it last to refine image quality once there's no rush, since the page is already fully usable.</p></li></ul><p>Knowing the exact amount of data to send in each phase requires understanding the structure of image files, but it seemed weird to us to make our web server parse image responses and have format-specific behavior hardcoded at a protocol level. By framing the problem as a dynamic change of priorities, we were able to elegantly separate low-level networking code from knowledge of image formats. We can use Workers or offline image processing tools to analyze the images, and instruct our server to change HTTP/2 priorities accordingly.</p><p>The great thing about parallel streaming of images is that it doesn’t add any overhead. We’re still sending the same data, in the same amount; we’re just sending it in a smarter order. This technique takes advantage of existing web standards, so it’s compatible with all browsers.</p>
    <div>
      <h3>The waterfall</h3>
      <a href="#the-waterfall">
        
      </a>
    </div>
    <p>Here are waterfall charts from <a href="https://webpagetest.org">WebPageTest</a> showing comparison of regular HTTP/2 responses and progressive streaming. In both cases the files were exactly the same, the amount of data transferred was the same, and the overall page loading time was the same (within measurement noise). In the charts, blue segments show when data was transferred, and green shows when each request was idle.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4gWR5w8eEc3VASAU81KwBS/986a63df2f51f9f88237bc1bc74d5273/image8.png" />
            
            </figure><p>The first chart shows a typical server behavior that makes images load mostly sequentially. The chart itself looks neat, but the actual experience of loading that page was not great — the last image didn't start loading until almost the end.</p><p>The second chart shows images loaded in parallel. The blue vertical streaks throughout the chart are image headers sent early followed by a couple of stages of progressive rendering. You can see that useful data arrived sooner for all of the images. You may notice that one of the images has been sent in one chunk, rather than split like all the others. That’s because at the very beginning of a TCP/IP connection we don't know the true speed of the connection yet, and we have to sacrifice some opportunity to do prioritization in order to maximize the connection speed.</p>
    <div>
      <h3>The metrics compared to other solutions</h3>
      <a href="#the-metrics-compared-to-other-solutions">
        
      </a>
    </div>
    <p>There are other techniques intended to provide image previews quickly, such as low-quality image placeholders (LQIP), but they have several drawbacks: they add unnecessary data for the placeholders, they usually interfere with browsers' preload scanner, and they delay loading of full-quality images due to a dependence on JavaScript to upgrade the previews to full images.</p><ul><li><p>Our solution doesn't cause any additional requests, and doesn't add any extra data. Overall page load time is not delayed.</p></li><li><p>Our solution doesn't require any JavaScript. It takes advantage of functionality supported natively in browsers.</p></li><li><p>Our solution doesn't require any changes to the page's markup, so it's very safe and easy to deploy site-wide.</p></li></ul><p>The improvement in user experience is reflected in performance metrics such as <b>Speed Index</b> and time to visually complete. Notice that with regular image loading the visual progress is linear, but with progressive streaming it quickly jumps to mostly complete:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1lKfDWRojqzIpdPg7MwFGj/70c9f7df2546b5ea50d69e4bbafde7cc/image1-5.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/n9HPq3RQ8Ew4CTxAf8NP4/7ed0c32f1592de89fd2f3a95b18d74e4/image4.png" />
            
            </figure>
    <div>
      <h3>Getting the most out of progressive rendering</h3>
      <a href="#getting-the-most-out-of-progressive-rendering">
        
      </a>
    </div>
    <p>Avoid ruining the effect with JavaScript. Scripts that hide images and wait until the <code>onload</code> event to reveal them (with a fade in, etc.) will defeat progressive rendering. Progressive rendering works best with the good old <code>&lt;img&gt;</code> element.</p>
    <div>
      <h3>Is it JPEG-only?</h3>
      <a href="#is-it-jpeg-only">
        
      </a>
    </div>
    <p>Our implementation is format-independent, but progressive streaming is useful only for certain file types. For example, it wouldn't make sense to apply it to scripts or stylesheets: these resources are rendered as all-or-nothing.</p><p>Prioritizing of image headers (containing image size) works for all file formats.</p><p>The benefits of progressive rendering are unique to JPEG (supported in all browsers) and JPEG 2000 (supported in Safari). GIF and PNG have interlaced modes, but these modes come at a cost of worse compression. WebP doesn't even support progressive rendering at all. This creates a dilemma: WebP is usually 20%-30% smaller than a JPEG of equivalent quality, but progressive JPEG <i>appears</i> to load 50% faster. There are next-generation image formats that support progressive rendering better than JPEG, and compress better than WebP, but they're not supported in web browsers yet. In the meantime you can choose between the bandwidth savings of WebP or the better perceived performance of progressive JPEG by changing Polish settings in your Cloudflare dashboard.</p>
    <div>
      <h3>Custom header for experimentation</h3>
      <a href="#custom-header-for-experimentation">
        
      </a>
    </div>
    <p>We also support a custom HTTP header that allows you to experiment with and optimize the streaming of other resources on your site. For example, you could make our servers send the first frame of animated GIFs with high priority and deprioritize the rest. Or you could prioritize loading of resources mentioned in <code>&lt;head&gt;</code> of HTML documents before <code>&lt;body&gt;</code> is loaded.</p><p>The custom header can be set only from a Worker. The syntax is a comma-separated list of file positions with priority and concurrency. The priority and concurrency are the same as in the whole-file <code>cf-priority</code> header described in the previous blog post.</p>
            <pre><code>cf-priority-change: &lt;offset in bytes&gt;:&lt;priority&gt;/&lt;concurrency&gt;, ...</code></pre>
            <p>For example, for a progressive JPEG we use something like (this is a fragment of JS to use in a Worker):</p>
            <pre><code>let headers = new Headers(response.headers);
headers.set("cf-priority", "30/0");
headers.set("cf-priority-change", "512:20/1, 15000:10/n");
return new Response(response.body, {headers});</code></pre>
            <p>This instructs the server to use priority 30 initially, while it sends the first 512 bytes; then to switch to priority 20 with some concurrency (<code>/1</code>); and finally, after sending 15000 bytes of the file, to switch to low priority and high concurrency (<code>/n</code>) to deliver the rest of the file.</p><p>We’ll try to split HTTP/2 frames to match the offsets specified in the header to change the sending priority as soon as possible. However, priorities don’t guarantee that data of different streams will be multiplexed exactly as instructed, since the server can prioritize only when it has data of multiple streams waiting to be sent at the same time. If some of the responses arrive much sooner from the upstream server or the cache, the server may send them right away, without waiting for other responses.</p>
    <div>
      <h3>Try it!</h3>
      <a href="#try-it">
        
      </a>
    </div>
    <p>Enable our <a href="/better-http-2-prioritization-for-a-faster-web"><i>Enhanced HTTP/2 Prioritization</i></a> feature, and JPEG images optimized by <a href="/introducing-polish-automatic-image-optimizati/"><i>Polish</i></a> or <a href="/announcing-cloudflare-image-resizing-simplifying-optimal-image-delivery/"><i>Image Resizing</i></a> will be streamed automatically.</p> ]]></content:encoded>
            <category><![CDATA[Speed & Reliability]]></category>
            <category><![CDATA[Speed Week]]></category>
            <category><![CDATA[Optimization]]></category>
            <guid isPermaLink="false">75xuDiShO45CbDQuf5hkLY</guid>
            <dc:creator>Andrew Galloni</dc:creator>
            <dc:creator>Kornel Lesiński</dc:creator>
        </item>
        <item>
            <title><![CDATA[Building fast interpreters in Rust]]></title>
            <link>https://blog.cloudflare.com/building-fast-interpreters-in-rust/</link>
            <pubDate>Mon, 04 Mar 2019 16:00:00 GMT</pubDate>
            <description><![CDATA[ In the previous post we described the Firewall Rules architecture and how the different components are integrated together. We created a configurable Rust library for writing and executing Wireshark®-like filters in different parts of our stack written in Go, Lua, C, C++ and JavaScript Workers. ]]></description>
            <content:encoded><![CDATA[ <p>In the <a href="/how-we-made-firewall-rules/">previous post</a> we described the Firewall Rules architecture and how the different components are integrated together. We also mentioned that we created a configurable Rust library for writing and executing <a href="https://www.wireshark.org/">Wireshark</a>®-like filters in different parts of our stack written in Go, Lua, C, C++ and JavaScript Workers.</p><blockquote><p>With a mixed set of requirements of performance, memory safety, low memory use, and the capability to be part of other products that we’re working on like Spectrum, Rust stood out as the strongest option.</p></blockquote>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3emjeRzzAw9z6ipj1FIjoD/fbfb5538cf10d6a5f0c096676dabfa63/Langs.png" />
            
            </figure><p>We have now open-sourced this library under our Github account: <a href="https://github.com/cloudflare/wirefilter">https://github.com/cloudflare/wirefilter</a>. This post will dive into its design, explain why we didn’t use a parser generator and how our execution engine balances security, runtime performance and compilation cost for the generated filters.</p>
    <div>
      <h3>Parsing Wireshark syntax</h3>
      <a href="#parsing-wireshark-syntax">
        
      </a>
    </div>
    <p>When building a custom Domain Specific Language (DSL), the first thing we need to be able to do is parse it. This should result in an intermediate representation (usually called an Abstract Syntax Tree) that can be inspected, traversed, analysed and, potentially, serialised.</p><p>There are different ways to perform such a conversion, such as:</p><ol><li><p>Manual char-by-char parsing using state machines, regular expressions and/or native string APIs.</p></li><li><p>Parser combinators, which use higher-level functions to combine different parsers together (in Rust-land these are represented by <a href="https://github.com/Geal/nom">nom</a>, <a href="https://github.com/m4rw3r/chomp">chomp</a>, <a href="https://github.com/Marwes/combine">combine</a> and <a href="https://crates.io/keywords/parser-combinators">others</a>).</p></li><li><p>Fully automated generators which, provided with a grammar, can generate a fully working parser for you (examples are <a href="https://github.com/kevinmehall/rust-peg">peg</a>, <a href="https://github.com/pest-parser/pest">pest</a>, <a href="https://github.com/lalrpop/lalrpop">LALRPOP</a>, etc.).</p></li></ol>
    <div>
      <h4>Wireshark syntax</h4>
      <a href="#wireshark-syntax">
        
      </a>
    </div>
    <p>But before trying to figure out which approach would work best for us, let’s take a look at some of the simple <a href="https://wiki.wireshark.org/DisplayFilters">official Wireshark examples</a>, to understand what we’re dealing with:</p><ul><li><p><code>ip.len le 1500</code></p></li><li><p><code>udp contains 81:60:03</code></p></li><li><p><code>sip.To contains "a1762"</code></p></li><li><p><code>http.request.uri matches "gl=se$"</code></p></li><li><p><code>eth.dst == ff:ff:ff:ff:ff:ff</code></p></li><li><p><code>ip.addr == 192.168.0.1</code></p></li><li><p><code>ipv6.addr == ::1</code></p></li></ul><p>You can see that the right hand side of a comparison can be a number, an IPv4 / IPv6 address, a set of bytes or a string. They are used interchangeably, without any special notion of a type, which is fine given that they are easily distinguishable… or are they?</p><p>Let’s take a look at some <a href="https://en.wikipedia.org/wiki/IPv6#Address_representation">IPv6 forms</a> on Wikipedia:</p><ul><li><p><code>2001:0db8:0000:0000:0000:ff00:0042:8329</code></p></li><li><p><code>2001:db8:0:0:0:ff00:42:8329</code></p></li><li><p><code>2001:db8::ff00:42:8329</code></p></li></ul><p>So IPv6 can be written as a set of up to 8 colon-separated hexadecimal numbers, each containing up to 4 digits with leading zeros omitted for convenience. This appears suspiciously similar to the syntax for byte sequences. 
Indeed, if we try writing out a sequence like <code>2f:31:32:33:34:35:36:37</code>, it’s simultaneously a valid IPv6 address and a byte sequence in terms of Wireshark syntax.</p><p>There is no way of telling what this sequence actually represents without looking at the type of the field it’s being compared with, and if you try using this sequence in Wireshark, you’ll notice that it does just that:</p><ul><li><p><code>ipv6.addr == 2f:31:32:33:34:35:36:37</code>: right hand side is parsed and used as an IPv6 address</p></li><li><p><code>http.request.uri == 2f:31:32:33:34:35:36:37</code>: right hand side is parsed and used as a byte sequence (will match a URL <code>"/1234567"</code>)</p></li></ul><p>Are there other examples of such ambiguities? Yup - for example, we can try using a single number with two decimal digits:</p><ul><li><p><code>tcp.port == 80</code>: matches any traffic on the port 80 (HTTP)</p></li><li><p><code>http.file_data == 80</code>: matches any HTTP request/response with body containing a single byte (0x80)</p></li></ul><p>We could also do the same with an Ethernet address, which is defined as a separate type in Wireshark, but, for simplicity, we represent it as a regular byte sequence in our implementation, so there is no ambiguity here.</p>
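<p>To make the ambiguity concrete, here is a minimal, std-only sketch of type-directed right-hand-side parsing. The names <code>Type</code>, <code>Rhs</code> and <code>parse_rhs</code> are illustrative, not wirefilter's, and the byte-sequence parser is simplified (each colon-separated group must fit in a byte):</p>

```rust
use std::net::Ipv6Addr;

// The declared type of a field, known ahead of time from the scheme.
#[derive(Clone, Copy)]
enum Type { Bytes, Ip }

#[derive(Debug, PartialEq)]
enum Rhs { Bytes(Vec<u8>), Ip(Ipv6Addr) }

// Parse the right-hand side of a comparison according to the field's
// declared type, since the textual form alone is ambiguous.
fn parse_rhs(ty: Type, input: &str) -> Option<Rhs> {
    match ty {
        Type::Ip => input.parse::<Ipv6Addr>().ok().map(Rhs::Ip),
        Type::Bytes => input
            .split(':')
            .map(|b| u8::from_str_radix(b, 16).ok())
            .collect::<Option<Vec<u8>>>()
            .map(Rhs::Bytes),
    }
}

fn main() {
    let s = "2f:31:32:33:34:35:36:37";
    // Same text, two different parses depending on the field type:
    assert!(matches!(parse_rhs(Type::Ip, s), Some(Rhs::Ip(_))));
    assert_eq!(
        parse_rhs(Type::Bytes, s),
        Some(Rhs::Bytes(vec![0x2f, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37]))
    );
}
```

<p>The same input string produces an IPv6 address for <code>ipv6.addr</code> and a byte sequence for <code>http.request.uri</code>, which is exactly why the parser needs the scheme for context.</p>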
    <div>
      <h4>Choosing a parsing approach</h4>
      <a href="#choosing-a-parsing-approach">
        
      </a>
    </div>
    <p>This is an interesting syntax design decision. It means that we need to store a mapping between field names and types ahead of time - a Scheme, as we call it - and use it for contextual parsing. This restriction also immediately rules out many, if not most, parser generators.</p><p>We could still use one of the more sophisticated ones (like LALRPOP) that allow replacing the default regex-based lexer with your own custom code, but at that point we’re so close to having a full parser for our DSL that the complexity outweighs any benefits of using a black-box parser generator.</p><p>Instead, we went with a manual parsing approach. While (for a good reason) this might sound scary in unsafe languages like C / C++, in Rust all strings are bounds checked by default. Rust also provides a rich string manipulation API, which we can use to build more complex helpers, eventually ending up with a full parser.</p><p>This approach is, in fact, pretty similar to parser combinators in that the parser doesn’t have to keep state and only passes the unprocessed part of the input down to smaller, narrower-scoped functions. Just as in parser combinators, the absence of mutable state also makes it easy to test and maintain each of the parsers for different parts of the syntax independently of the others.</p><p>Compared with popular parser combinator libraries in Rust, one of the differences is that our parsers are not standalone functions but rather types that implement common traits:</p>
            <pre><code>pub trait Lex&lt;'i&gt;: Sized {
   fn lex(input: &amp;'i str) -&gt; LexResult&lt;'i, Self&gt;;
}
pub trait LexWith&lt;'i, E&gt;: Sized {
   fn lex_with(input: &amp;'i str, extra: E) -&gt; LexResult&lt;'i, Self&gt;;
}</code></pre>
            <p>The <code>lex</code> method or its contextual variant <code>lex_with</code> can either return a successful pair of <code>(instance of the type, rest of input)</code> or a pair of <code>(error kind, relevant input span)</code>.</p>
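<p>As a sketch of what implementing such a trait looks like, here is a toy <code>Lex</code> implementation for an unsigned integer literal. The <code>LexResult</code> here is simplified (the error kind is reduced to a plain string; the real library's types are richer):</p>

```rust
// On success: the parsed value plus the unconsumed remainder of the input.
// On failure: an error kind plus the offending span. (Illustrative, not the
// exact wirefilter definitions.)
type LexResult<'i, T> = Result<(T, &'i str), (&'static str, &'i str)>;

trait Lex<'i>: Sized {
    fn lex(input: &'i str) -> LexResult<'i, Self>;
}

// A toy context-free parser: an unsigned decimal literal.
struct UintLiteral(u64);

impl<'i> Lex<'i> for UintLiteral {
    fn lex(input: &'i str) -> LexResult<'i, Self> {
        // Take the leading run of ASCII digits, leave the rest untouched.
        let digits_end = input
            .find(|c: char| !c.is_ascii_digit())
            .unwrap_or(input.len());
        let (digits, rest) = input.split_at(digits_end);
        match digits.parse() {
            Ok(n) => Ok((UintLiteral(n), rest)),
            Err(_) => Err(("expected digit", digits)),
        }
    }
}

fn main() {
    let (lit, rest) = UintLiteral::lex("1500 and more").unwrap();
    assert_eq!(lit.0, 1500);
    assert_eq!(rest, " and more");
    assert!(UintLiteral::lex("abc").is_err());
}
```

<p>Each parser consumes only its own prefix of the input and hands the remainder back, which is what lets small parsers like this compose into a full one.</p>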
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5L9MIL21iug4jVm8Eo1bGM/a4996c058b046ea785ff40d315772c53/parse.png" />
            
            </figure><p>The <code>Lex</code> trait is used for target types that can be parsed independently of the context (like field names or literals), while <code>LexWith</code> is used for types that need a <code>Scheme</code> or a part of it to be parsed unambiguously.</p><p>A bigger difference is that, instead of relying on higher-level functions for parser combinators, we use the usual imperative function call syntax. For example, when we want to perform sequential parsing, all we do is call several parsers in a row, using tuple destructuring for intermediate results:</p>
            <pre><code>let input = skip_space(input);
let (op, input) = CombinedExpr::lex_with(input, scheme)?;
let input = skip_space(input);
let input = expect(input, ")")?;</code></pre>
            <p>And, when we want to try different alternatives, we can use native pattern matching and ignore the errors:</p>
            <pre><code>if let Ok(input) = expect(input, "(") {
   ...
   (SimpleExpr::Parenthesized(Box::new(op)), input)
} else if let Ok((op, input)) = UnaryOp::lex(input) {
   ...
} else {
   ...
}</code></pre>
            <p>Finally, when we want to automate parsing of some more complicated common cases - say, enums - Rust provides a powerful macro syntax:</p>
            <pre><code>lex_enum!(#[repr(u8)] OrderingOp {
   "eq" | "==" =&gt; Equal = EQUAL,
   "ne" | "!=" =&gt; NotEqual = LESS | GREATER,
   "ge" | "&gt;=" =&gt; GreaterThanEqual = GREATER | EQUAL,
   "le" | "&lt;=" =&gt; LessThanEqual = LESS | EQUAL,
   "gt" | "&gt;" =&gt; GreaterThan = GREATER,
   "lt" | "&lt;" =&gt; LessThan = LESS,
});</code></pre>
            <p>This gives an experience similar to parser generators, while still using native language syntax and keeping us in control of all the implementation details.</p>
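<p>A hand-written sketch of roughly what such a macro saves you from typing - trying each textual alternative in order, longest tokens first - might look like this (an illustration of the idea, not the macro's actual expansion, and without the bit-flag values):</p>

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum OrderingOp { Equal, NotEqual, GreaterThanEqual, LessThanEqual, GreaterThan, LessThan }

fn lex_ordering_op(input: &str) -> Option<(OrderingOp, &str)> {
    use OrderingOp::*;
    // Longer tokens come before their prefixes, so ">=" is not
    // lexed as ">" with a stray "=" left over.
    let table: &[(&str, OrderingOp)] = &[
        ("==", Equal), ("eq", Equal),
        ("!=", NotEqual), ("ne", NotEqual),
        (">=", GreaterThanEqual), ("ge", GreaterThanEqual),
        ("<=", LessThanEqual), ("le", LessThanEqual),
        (">", GreaterThan), ("gt", GreaterThan),
        ("<", LessThan), ("lt", LessThan),
    ];
    // First alternative that matches wins; the rest of the input is returned.
    table
        .iter()
        .find_map(|&(tok, op)| input.strip_prefix(tok).map(|rest| (op, rest)))
}

fn main() {
    assert_eq!(lex_ordering_op(">= 10"), Some((OrderingOp::GreaterThanEqual, " 10")));
    assert_eq!(lex_ordering_op("le 1500"), Some((OrderingOp::LessThanEqual, " 1500")));
    assert_eq!(lex_ordering_op("~"), None);
}
```

<p>The macro version keeps this boilerplate out of sight while generating essentially the same alternation.</p>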
    <div>
      <h3>Execution engine</h3>
      <a href="#execution-engine">
        
      </a>
    </div>
    <p>Because our grammar and operations are fairly simple, initially we used direct AST interpretation by requiring all nodes to implement a trait that includes an <code>execute</code> method.</p>
            <pre><code>trait Expr&lt;'s&gt; {
    fn execute(&amp;self, ctx: &amp;ExecutionContext&lt;'s&gt;) -&gt; bool;
}</code></pre>
            <p>The <code>ExecutionContext</code> is pretty similar to a <code>Scheme</code>, but instead of mapping arbitrary field names to their types, it maps them to the runtime input values provided by the caller.</p><p>As with <code>Scheme</code>, initially <code>ExecutionContext</code> used an internal <code>HashMap</code> for registering these arbitrary <code>String</code> -&gt; <code>RhsValue</code> mappings. During the <code>execute</code> call, the AST implementation would evaluate itself recursively, and look up each field reference in this map, either returning a value or raising an error on missing slots and type mismatches.</p><p>This worked well enough for an initial implementation, but using a <code>HashMap</code> has a non-trivial cost which we would like to eliminate. We already used a more efficient hasher - <a href="https://github.com/servo/rust-fnv"><code>Fnv</code></a> - because we are in control of all keys and so are not worried about hash DoS attacks, but there was still more we could do.</p>
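<p>FNV-1a is simple enough to write from scratch, which makes the trade-off easy to see: it is just an XOR and a multiply per byte, with no protection against adversarial keys. This from-scratch sketch plugs it into a standard <code>HashMap</code> (the article itself uses the servo/rust-fnv crate rather than this hand-rolled version):</p>

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// FNV-1a: a tiny, fast, non-cryptographic hash. Appropriate here because
// all keys are trusted field names, so hash-flooding DoS is not a concern.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self { Fnv1a(0xcbf29ce484222325) } // FNV-1a offset basis
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV prime
        }
    }
    fn finish(&self) -> u64 { self.0 }
}

// Drop the hasher into a standard HashMap via BuildHasherDefault.
type FnvHashMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

fn main() {
    let mut fields: FnvHashMap<&str, u32> = FnvHashMap::default();
    fields.insert("ip.len", 0);
    fields.insert("tcp.port", 1);
    assert_eq!(fields["tcp.port"], 1);
}
```

<p>Swapping the hasher this way changes nothing about the map's API, only the per-lookup hashing cost.</p>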
    <div>
      <h4>Speeding up field access</h4>
      <a href="#speeding-up-field-access">
        
      </a>
    </div>
    <p>If we look at the data structures involved, we can see that the scheme is always well-defined in advance, and all our runtime values in the execution engine are expected to eventually match it, even if the order or a precise set of fields is not guaranteed:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/mMOLvyXxOj9FxO3dIbYwr/6b308db1a7860c67f52209a689226b56/fieldaccess.png" />
            
            </figure><p>So what if we ditch the second map altogether and instead use a fixed-size array of values? Array indexing should be much cheaper than looking up in a map, so it might be well worth the effort.</p><p>How can we do it? We already know the number of items (thanks to the predefined scheme) so we can use that for the size of the backing storage, and, in order to simulate <code>HashMap</code> “holes” for unset values, we can wrap each item in an <code>Option&lt;...&gt;</code>:</p>
            <pre><code>pub struct ExecutionContext&lt;'e&gt; {
    scheme: &amp;'e Scheme,
    values: Box&lt;[Option&lt;LhsValue&lt;'e&gt;&gt;]&gt;,
}</code></pre>
            <p>The only missing piece is an index that could map both structures to each other. As you might remember, <code>Scheme</code> still uses a <code>HashMap</code> for field registration, and a <code>HashMap</code> is normally expected to be randomised and indexed only by the predefined key.</p><p>While we could wrap a value and an auto-incrementing index together into a custom struct, there is already a better solution: <a href="https://github.com/bluss/indexmap"><code>IndexMap</code></a>. <code>IndexMap</code> is a drop-in replacement for a <code>HashMap</code> that preserves ordering and provides a way to get an index of any element and vice versa - exactly what we needed.</p><p>After replacing a <code>HashMap</code> in the <code>Scheme</code> with <code>IndexMap</code>, we can change parsing to resolve all the parsed field names to their indices in-place and store that in the AST:</p>
            <pre><code>impl&lt;'i, 's&gt; LexWith&lt;'i, &amp;'s Scheme&gt; for Field&lt;'s&gt; {
   fn lex_with(mut input: &amp;'i str, scheme: &amp;'s Scheme) -&gt; LexResult&lt;'i, Self&gt; {
       ...
       let field = scheme
           .get_field_index(name)
           .map_err(|err| (LexErrorKind::UnknownField(err), name))?;
       Ok((field, input))
   }
}</code></pre>
            <p>After that, in the <code>ExecutionContext</code> we allocate a fixed-size array and use these indices for resolving values during runtime:</p>
            <pre><code>impl&lt;'e&gt; ExecutionContext&lt;'e&gt; {
   /// Creates an execution context associated with a given scheme.
   ///
   /// This scheme will be used for resolving any field names and indices.
   pub fn new&lt;'s: 'e&gt;(scheme: &amp;'s Scheme) -&gt; Self {
       ExecutionContext {
           scheme,
           values: vec![None; scheme.get_field_count()].into(),
       }
   }
   ...
}</code></pre>
            <p>This gave significant (~2x) speed ups on our standard benchmarks:</p><p><i>Before:</i></p>
            <pre><code>test matching ... bench:       2,548 ns/iter (+/- 98)
test parsing  ... bench:     192,037 ns/iter (+/- 21,538)</code></pre>
            <p><i>After:</i></p>
            <pre><code>test matching ... bench:       1,227 ns/iter (+/- 29)
test parsing  ... bench:     197,574 ns/iter (+/- 16,568)</code></pre>
            <p>This change also improved the usability of our API, as any type errors are now detected and reported much earlier, when the values are just being set on the context, and not delayed until filter execution.</p>
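<p>Put together, the idea fits in a small, std-only sketch. The real code uses the indexmap crate and richer value types; <code>add_field</code>, <code>field_count</code> and the <code>u64</code> values here are illustrative. The scheme resolves names to stable indices once at parse time, and the execution context is then a plain array indexed by them:</p>

```rust
use std::collections::HashMap;

// The scheme maps field names to stable, dense indices.
#[derive(Default)]
struct Scheme { indices: HashMap<String, usize> }

impl Scheme {
    fn add_field(&mut self, name: &str) -> usize {
        let next = self.indices.len();
        *self.indices.entry(name.to_string()).or_insert(next)
    }
    fn get_field_index(&self, name: &str) -> Option<usize> {
        self.indices.get(name).copied()
    }
    fn field_count(&self) -> usize { self.indices.len() }
}

// The execution context is a fixed-size array; None simulates an unset slot.
struct ExecutionContext { values: Box<[Option<u64>]> }

impl ExecutionContext {
    fn new(scheme: &Scheme) -> Self {
        ExecutionContext { values: vec![None; scheme.field_count()].into() }
    }
    fn set(&mut self, idx: usize, value: u64) { self.values[idx] = Some(value); }
    fn get(&self, idx: usize) -> Option<u64> { self.values[idx] }
}

fn main() {
    let mut scheme = Scheme::default();
    let port = scheme.add_field("tcp.port");
    let len = scheme.add_field("ip.len");
    // Parsing resolves "tcp.port" to an index once...
    assert_eq!(scheme.get_field_index("tcp.port"), Some(port));
    // ...and execution is a plain array access, no hashing per lookup.
    let mut ctx = ExecutionContext::new(&scheme);
    ctx.set(port, 80);
    assert_eq!(ctx.get(port), Some(80));
    assert_eq!(ctx.get(len), None);
}
```

<p>The hashing cost is paid once per field at parse time instead of once per field reference on every execution, which is where the speedup comes from.</p>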
    <div>
      <h4>[not] JIT compilation</h4>
      <a href="#not-jit-compilation">
        
      </a>
    </div>
    <p>Of course, as with any respectable DSL, one of the other ideas we had from the beginning was “...at some point we’ll add native compilation to make everything super-fast, it’s just a matter of time...”.</p><p>In practice, however, native compilation is a complicated matter, but not due to a lack of tools.</p><p>First of all, there is the question of storage for the native code. We could compile each filter statically into some sort of a library and publish it to a key-value store, but that would not be easy to maintain:</p><ul><li><p>We would have to compile each filter to several platforms (x86-64, ARM, WASM, …).</p></li><li><p>The overhead of native library formats would significantly outweigh the useful executable size, as most filters tend to be small.</p></li><li><p>Each time we’d like to change our execution logic, whether to optimise it or to fix a bug, we would have to recompile and republish all the previously stored filters.</p></li><li><p>Finally, even if we’re sure of the reliability of the chosen store, executing dynamically retrieved native code on the edge as-is is not something that can be taken lightly.</p></li></ul><p>The usual flexible alternative that addresses most of these issues is Just-in-Time (JIT) compilation.</p><p>When you compile code directly on the target machine, you get to re-verify the input (still expressed as a restricted DSL), you can compile it just for the current platform in-place, and you never need to republish the actual rules.</p><p>Looks like a perfect fit? Not quite. As with any technology, there are tradeoffs, and you only get to choose those that make more sense for your use cases. JIT compilation is no exception.</p><p>First of all, even though you’re not loading untrusted code over the network, you still need to generate it into memory, mark that memory as executable and trust that it will always contain valid code and not garbage or something worse. 
Depending on your choice of libraries and complexity of the DSL, you might be willing to trust it or put heavy sandboxing around it, but, either way, it’s a risk that one must explicitly be willing to take.</p><p>Another issue is the cost of compilation itself. Usually, when measuring the speed of native code vs interpretation, the cost of compilation is not taken into account because it happens out of the process.</p><p>With JIT compilers though, it’s different, as you’re now compiling things the moment they’re used and caching the native code only for a limited time. It turns out that generating native code can be rather expensive, so you must be absolutely sure that the compilation cost doesn’t offset any benefits you might gain from the native execution speedup.</p><p>I’ve talked a bit more about this at a <a href="https://www.meetup.com/rust-atx/">Rust Austin meetup</a> and, I believe, this topic deserves a separate blog post, so I won’t go into much more detail here, but feel free to check out the slides: <a href="https://www.slideshare.net/RReverser/building-fast-interpreters-in-rust">https://www.slideshare.net/RReverser/building-fast-interpreters-in-rust</a>. Oh, and if you’re in Austin, you should pop into our office for the next meetup!</p><p>Let’s get back to our original question: is there anything else we can do to get the best balance between security, runtime performance and compilation cost? Turns out, there is.</p>
    <div>
      <h4>Dynamic dispatch and closures to the rescue</h4>
      <a href="#dynamic-dispatch-and-closures-to-the-rescue">
        
      </a>
    </div>
    <p>Introducing the <code>Fn</code> trait!</p><p>In Rust, the <code>Fn</code> trait and friends (<code>FnMut</code>, <code>FnOnce</code>) are automatically implemented on eligible functions and closures. In the simple <code>Fn</code> case, the restriction is that they must not modify their captured environment and can only borrow from it.</p><p>Normally, you would want to use it in generic contexts to support arbitrary callbacks with given argument and return types. This is important because in Rust, each function and closure implements a unique type and any generic usage would compile down to a specific call just to that function.</p>
            <pre><code>fn just_call(me: impl Fn(), maybe: bool) {
  if maybe {
    me()
  }
}</code></pre>
            <p>Such behaviour (called static dispatch) is the default in Rust and is preferable for performance reasons.</p><p>However, if we don’t know all the possible types at compile-time, Rust allows us to opt in to dynamic dispatch instead:</p>
            <pre><code>fn just_call(me: &amp;dyn Fn(), maybe: bool) {
  if maybe {
    me()
  }
}</code></pre>
            <p>Dynamically dispatched objects don't have a statically known size, because it depends on the implementation details of the particular type being passed. They need to be passed as a reference or stored in a heap-allocated <code>Box</code>, and then used just like in a generic implementation.</p><p>In our case, this allows us to create, return and store arbitrary closures, and later call them as regular functions:</p>
            <pre><code>trait Expr&lt;'s&gt; {
    fn compile(self) -&gt; CompiledExpr&lt;'s&gt;;
}

pub(crate) struct CompiledExpr&lt;'s&gt;(Box&lt;dyn 's + Fn(&amp;ExecutionContext&lt;'s&gt;) -&gt; bool&gt;);

impl&lt;'s&gt; CompiledExpr&lt;'s&gt; {
   /// Creates a compiled expression IR from a generic closure.
   pub(crate) fn new(closure: impl 's + Fn(&amp;ExecutionContext&lt;'s&gt;) -&gt; bool) -&gt; Self {
       CompiledExpr(Box::new(closure))
   }

   /// Executes a filter against a provided context with values.
   pub fn execute(&amp;self, ctx: &amp;ExecutionContext&lt;'s&gt;) -&gt; bool {
       self.0(ctx)
   }
}</code></pre>
            <p>The closure (an <code>Fn</code> box) will also automatically include the environment data it needs for the execution.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7x17xAapCcN3PjapVfoIyh/89ca29faa4b157fc2dcd7af0179eacb6/box.png" />
            
            </figure><p>This means that we can optimise the runtime data representation as part of the “compile” process without changing the AST or the parser. For example, when we wanted to optimise IP range checks by splitting them for different IP types, we could do that without having to modify any existing structures:</p>
            <pre><code>RhsValues::Ip(ranges) =&gt; {
   let mut v4 = Vec::new();
   let mut v6 = Vec::new();
   for range in ranges {
       match range.clone().into() {
           ExplicitIpRange::V4(range) =&gt; v4.push(range),
           ExplicitIpRange::V6(range) =&gt; v6.push(range),
       }
   }
   let v4 = RangeSet::from(v4);
   let v6 = RangeSet::from(v6);
   CompiledExpr::new(move |ctx| {
       match cast!(ctx.get_field_value_unchecked(field), Ip) {
           IpAddr::V4(addr) =&gt; v4.contains(addr),
           IpAddr::V6(addr) =&gt; v6.contains(addr),
       }
   })
}</code></pre>
            <p>Moreover, boxed closures can be part of that captured environment, too. This means that we can convert each simple comparison into a closure, and then combine it with other closures, and keep going until we end up with a single top-level closure that can be invoked as a regular function to evaluate the entire filter expression.</p><p>It’s <s>turtles</s> closures all the way down:</p>
            <pre><code>let items = items
   .into_iter()
   .map(|item| item.compile())
   .collect::&lt;Vec&lt;_&gt;&gt;()
   .into_boxed_slice();

match op {
   CombiningOp::And =&gt; {
       CompiledExpr::new(move |ctx| items.iter().all(|item| item.execute(ctx)))
   }
   CombiningOp::Or =&gt; {
       CompiledExpr::new(move |ctx| items.iter().any(|item| item.execute(ctx)))
   }
   CombiningOp::Xor =&gt; CompiledExpr::new(move |ctx| {
       items
           .iter()
           .fold(false, |acc, item| acc ^ item.execute(ctx))
   }),
}</code></pre>
    <p>What’s nice about this approach is:</p><ul><li><p>Our execution is no longer tied to the AST, and we can be as flexible with optimising the implementation and data representation as we want without affecting the parser-related parts of code or output format.</p></li><li><p>Even though we initially “compile” each node to a single closure, in the future we can pretty easily specialise certain combinations of expressions into their own closures and so improve execution speed for common cases. All that would be required is a separate <code>match</code> branch returning a closure optimised for just that case.</p></li><li><p>Compilation is very cheap compared to real code generation. While it might seem that allocating many small objects (one <code>Box</code>ed closure per expression) is not very efficient and that it would be better to replace it with some sort of a memory pool, in practice we saw a negligible performance impact.</p></li><li><p>No native code is generated at runtime, which means that we execute only code that was statically verified by Rust at compile-time and compiled down to a static function. All we do at runtime is call existing functions with different values.</p></li><li><p>Execution turns out to be faster too. This initially came as a surprise, because dynamic dispatch is widely believed to be costly and we were worried that it would get slightly worse than AST interpretation. However, it showed an immediate ~10-15% runtime improvement in benchmarks and on real examples.</p></li></ul><p>The only obvious downside is that each level of the AST requires a separate dynamically-dispatched call instead of a single piece of inlined code for the entire expression, like you would have even with a basic template JIT.</p><p>Unfortunately, such output could be achieved only with real native code generation, and, for our case, the mentioned downsides and risks would outweigh runtime benefits, so we went with the safe &amp; flexible closure approach.</p>
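<p>The whole approach fits in a small, self-contained miniature, simplified to <code>u64</code> fields and equality comparisons only (<code>Expr</code>, <code>Compiled</code> and <code>Ctx</code> are illustrative names, not wirefilter's): each node compiles to a boxed <code>Fn</code>, and combinators just capture their children's closures:</p>

```rust
// Field values, indexed as in the scheme.
type Ctx = Vec<u64>;

// The "compiled" form of an expression: a boxed closure over the context.
struct Compiled(Box<dyn Fn(&Ctx) -> bool>);

enum Expr {
    Eq(usize, u64), // field[i] == constant
    And(Vec<Expr>),
    Or(Vec<Expr>),
}

impl Expr {
    fn compile(self) -> Compiled {
        match self {
            Expr::Eq(i, rhs) => Compiled(Box::new(move |ctx| ctx[i] == rhs)),
            Expr::And(items) => {
                let items: Vec<Compiled> = items.into_iter().map(Expr::compile).collect();
                Compiled(Box::new(move |ctx| items.iter().all(|item| item.0(ctx))))
            }
            Expr::Or(items) => {
                let items: Vec<Compiled> = items.into_iter().map(Expr::compile).collect();
                Compiled(Box::new(move |ctx| items.iter().any(|item| item.0(ctx))))
            }
        }
    }
}

fn main() {
    // (port == 80 or port == 443) and len == 1500,
    // with port as field 0 and len as field 1.
    let filter = Expr::And(vec![
        Expr::Or(vec![Expr::Eq(0, 80), Expr::Eq(0, 443)]),
        Expr::Eq(1, 1500),
    ])
    .compile();
    assert!(filter.0(&vec![443, 1500]));
    assert!(!filter.0(&vec![22, 1500]));
}
```

<p>Note that once <code>compile</code> returns, the AST can be dropped entirely: everything needed for execution lives inside the captured environments of the closures.</p>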
    <div>
      <h3>Bonus: WebAssembly support</h3>
      <a href="#bonus-webassembly-support">
        
      </a>
    </div>
    <p>As was mentioned earlier, we chose Rust as a safe high-level language that allows easy integration with other parts of our stack written in Go, C and Lua via C FFI. But Rust has one more target it invests in and supports exceptionally well: WebAssembly.</p><p>Why would we be interested in that? Apart from the parts of the stack where our rules would run, and the API that publishes them, we also have users who like to write their own rules. To do that, they use a UI editor that allows either writing raw expressions in Wireshark syntax or as a WYSIWYG builder.</p><p>We thought it would be great to expose the parser - the same one as we use on the backend - to the frontend JavaScript for a consistent real-time editing experience. And, honestly, we were just looking for an excuse to play with WASM support in Rust.</p><p>WebAssembly could be targeted via regular C FFI, but in that case you would need to manually provide all the glue for the JavaScript side to hold and convert strings, arrays and objects back and forth.</p><p>In Rust, this is all handled by <a href="https://github.com/rustwasm/wasm-bindgen">wasm-bindgen</a>. While it provides various attributes and methods for direct conversions, the simplest way to get started is to activate the “serde” feature, which will automatically convert types using <code>JSON.parse</code>, <code>JSON.stringify</code> and <a href="https://docs.serde.rs/serde_json/"><code>serde_json</code></a> under the hood.</p><p>In our case, creating a wrapper for the parser with only 20 lines of code was enough to get started and have all the WASM code + JavaScript glue required:</p>
            <pre><code>#[wasm_bindgen]
pub struct Scheme(wirefilter::Scheme);

fn into_js_error(err: impl std::error::Error) -&gt; JsValue {
   js_sys::Error::new(&amp;err.to_string()).into()
}

#[wasm_bindgen]
impl Scheme {
   #[wasm_bindgen(constructor)]
   pub fn try_from(fields: &amp;JsValue) -&gt; Result&lt;Scheme, JsValue&gt; {
       fields.into_serde().map(Scheme).map_err(into_js_error)
   }

   pub fn parse(&amp;self, s: &amp;str) -&gt; Result&lt;JsValue, JsValue&gt; {
       let filter = self.0.parse(s).map_err(into_js_error)?;
       JsValue::from_serde(&amp;filter).map_err(into_js_error)
   }
}</code></pre>
            <p>And by using a higher-level tool called <a href="https://github.com/rustwasm/wasm-pack">wasm-pack</a>, we also got automated npm package generation and publishing for free.</p><p>This is not used in the production UI yet because we still need to figure out some details for unsupported browsers, but it’s great to have all the tooling and packages ready with minimal effort. Extending and reusing the same package, it should even be possible to run filters in Cloudflare Workers too (which <a href="/webassembly-on-cloudflare-workers/">also support WebAssembly</a>).</p>
    <div>
      <h3>The future</h3>
      <a href="#the-future">
        
      </a>
    </div>
    <p>The code in the current state is already doing its job well in production and we’re happy to share it with the open-source Rust community.</p><p>This is definitely not the end of the road though - we have many more fields to add, features to implement and planned optimisations to explore. If you find this sort of work interesting and would like to help us by working on firewalls, parsers or just any Rust projects at scale, give us a shout!</p> ]]></content:encoded>
            <category><![CDATA[Rust]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[IPv4]]></category>
            <category><![CDATA[IPv6]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">2IkqAbjbvhsOMUuOPkvsnL</guid>
            <dc:creator>Ingvar Stepanyan</dc:creator>
            <dc:creator>Andrew Galloni</dc:creator>
        </item>
    </channel>
</rss>