
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Wed, 15 Apr 2026 00:20:08 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Exploring WebAssembly AI Services on Cloudflare Workers]]></title>
            <link>https://blog.cloudflare.com/exploring-webassembly-ai-services-on-cloudflare-workers/</link>
            <pubDate>Fri, 09 Oct 2020 11:00:00 GMT</pubDate>
            <description><![CDATA[ Edge networks present a significant opportunity for Artificial Intelligence (AI) performance and applicability. AI technologies already make it possible to run compelling applications like object and voice recognition, navigation, and recommendations. ]]></description>
            <content:encoded><![CDATA[ <p><i>This is a guest post by Videet Parekh, Abelardo Lopez-Lagunas, Sek Chai at </i><a href="https://www.latentai.com"><i>Latent AI</i></a><i>.</i></p><p>Edge networks present a significant opportunity for Artificial Intelligence (AI) performance and applicability. AI technologies already make it possible to run compelling applications like object and voice recognition, navigation, and recommendations.</p><p><a href="https://latentai.com/scaling-edge-ai-in-a-data-driven-world-part-2/">AI at the edge</a> presents a host of benefits. One is scalability—it is simply impractical to send all data to a centralized cloud. In fact, <a href="https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf">one study</a> has predicted a global scope of 90 zettabytes generated by billions of IoT devices by 2025. Another is privacy—many users are reluctant to move their personal data to the cloud, whereas data processed at the edge is more ephemeral.</p><p>When AI services are distributed away from centralized data centers and closer to the service edge, it becomes possible to enhance the overall application speed without moving data unnecessarily. However, there are still challenges in making AI from the deep-cloud run efficiently on edge hardware. Here, we use the term deep-cloud to refer to highly centralized, massively-sized data centers. Deploying edge AI services can be hard because AI is both computationally and memory-bandwidth intensive. We need to tune the AI models so the computational latency and bandwidth can be radically reduced for the edge.</p>
    <div>
      <h3>The Case for Distributed AI Services</h3>
      <a href="#the-case-for-distributed-ai-services">
        
      </a>
    </div>
    <p>Edge network infrastructure for distributed AI is already widely available. Edge networks like Cloudflare serve a significant proportion of today’s Internet traffic, and can serve as the bridge between devices and the centralized cloud. Highly performant AI services are possible because of the distributed processing that has excellent spatial proximity to the edge data.</p><p>We at Latent AI are exploring ways to deploy AI at the edge, with technology that transforms and compresses AI models for the edge. The size of our edge AI model is many orders of magnitude smaller than the sensor data (e.g., kilobytes or megabytes for the edge AI model, compared to petabytes of edge data). We are exploring using WebAssembly (WASM) within the Cloudflare Workers environment. We want to identify possible operating points for the distributed AI services by exploring achievable performance on the available edge infrastructure.</p>
    <div>
      <h3>Architectural Approach for Exploration</h3>
      <a href="#architectural-approach-for-exploration">
        
      </a>
    </div>
    <p><a href="https://webassembly.org/">WebAssembly</a> (WASM) is a new open-standard format for programs that run on the Web. It is a popular way to enable high-performance web-based applications. WASM is closer to machine code, and thus <a href="https://hacks.mozilla.org/2017/02/what-makes-webassembly-fast/">faster</a> than JavaScript (JS), even when JS is JIT-compiled. Compiler optimizations, already done ahead of time, reduce the overhead in fetching and parsing application code. Today, WASM offers the flexibility and portability of JS at the near-optimum performance of compiled machine code.</p><p>AI models have notoriously large memory demands because of their high parameter counts. Cloudflare already extends support for WASM using their <a href="https://developers.cloudflare.com/workers/quickstart">Wrangler CLI</a>, and we chose to use it for our exploration. <a href="/improving-the-wrangler-startup-experience/">Wrangler</a> is the open-source CLI tool used to manage Workers, and is designed to enable a smooth developer experience.</p>
    <div>
      <h3>How Latent AI Accelerates Distributed AI Services</h3>
      <a href="#how-latent-ai-accelerates-distributed-ai-services">
        
      </a>
    </div>
    <p><a href="https://latentai.com/">Latent AI</a>’s mission is to enable ambient computing, regardless of any resource constraints. We build developer tools that greatly reduce the computing resources needed to process AI on the edge while remaining completely hardware-agnostic.</p><p>Latent AI’s tools significantly compress AI models to reduce their memory size. We have shown up to 10x compression for state-of-the-art models. This capability addresses the load-time latencies challenging many edge network deployments. We also offer an optimized runtime that executes a neural network natively. The result is a 2-3x runtime speedup without any hardware-specific accelerators. This dramatic performance boost offers fast and efficient inferences for the edge.</p><p>Our compression uses quantization algorithms to convert the AI model’s parameters from 32-bit floating-point to 16-bit or 8-bit representations, with minimal loss of accuracy. The key benefit of moving to lower bit-precision is higher power efficiency with less storage needed. Now AI inference can be processed using more efficient parallel processor hardware on the continuum of platforms at the distributed edge.</p>
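<p>To make the idea concrete, here is a minimal sketch of 8-bit affine quantization, the general technique behind converting 32-bit floating-point weights to 8-bit integers. Latent AI’s production algorithms are more sophisticated; the function names below are illustrative only:</p>

```javascript
// Minimal sketch of 8-bit affine quantization (illustrative names only).
// Map the float range [min, max] of a weight tensor onto integers [0, 255].
function quantize(weights) {
  const min = Math.min(...weights);
  const max = Math.max(...weights);
  const scale = (max - min) / 255 || 1; // avoid divide-by-zero for constant tensors
  const zeroPoint = min;
  const q = weights.map((w) => Math.round((w - zeroPoint) / scale));
  return { q, scale, zeroPoint }; // q holds integers in [0, 255]
}

// Recover approximate float values from the 8-bit representation.
function dequantize({ q, scale, zeroPoint }) {
  return q.map((v) => v * scale + zeroPoint);
}
```

<p>Each dequantized value lands within half a quantization step of the original, which is why the accuracy loss stays small while storage shrinks by 4x relative to 32-bit floats.</p>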
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6tz8Bu3dtqYfekWnYMc1Bi/b98d933274b037e3ddb849f94a8e4aaf/image1-8.png" />
            
            </figure><p>Optimized AI services can process data closest to the source and perform inferences at the distributed edge.</p>
    <div>
      <h3>Selecting Real-World WASM Neural Network Examples</h3>
      <a href="#selecting-real-world-wasm-neural-network-examples">
        
      </a>
    </div>
    <p>For our exploration, we use state-of-the-art deep neural networks called MobileNet. MobileNets are designed specifically for embedded platforms such as smartphones, and can achieve high recognition accuracy in visual object detection. We compress the MobileNet AI models to be small and fast, in order to represent the variety of use cases that can be deployed as distributed AI services. Please see this <a href="https://towardsdatascience.com/review-mobilenetv2-light-weight-model-image-classification-8febb490e61c">blog post</a> for more details on the AI model architecture.</p><p>We use the MobileNetV2 model variant for our exploration. The models are trained with different visual objects that can be detected: (1) a larger model with 10 object classes derived from the <a href="https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/">ImageNet</a> dataset, and (2) a smaller version with just two classes derived from the <a href="https://cocodataset.org/#home">COCO</a> dataset. Both are public, open-source image databases used as benchmarks for AI models. Images are labeled with detected objects such as persons, vehicles, bicycles, traffic lights, etc. Using Latent AI’s <a href="https://latentai.com/how-latent-ais-leip-sdk-improves-deep-learning-inference-efficiency/">compression tool</a>, we were able to compress and compile the MobileNetV2 models into WASM programs. In the WASM form, we can achieve fast and efficient processing of the AI model with a small storage footprint.</p><p>We want WASM neural networks to be as fast and efficient as possible. We spun up a Workers app to accept an image from a client, convert and preprocess the image into a cleaned data array, run it through the model, and then return a class for that image. For both the large and small MobileNetV2 models, we create three variants with different bit-precision (32-bit floating point, 16-bit integer, and 8-bit integer). The average memory and inference times for the large AI model are 110ms and 189ms, respectively; for the smaller AI model, they are 159ms and 15ms.</p><p>Our analysis suggests that overall processing can be improved by reducing the overhead for memory operations. For the large model, lowering bit precision to 8 bits reduces memory operations from 48% to 26%. For the small model, the memory load times dominate over the inference computation, with over 90% of the latency in memory operations.</p><p>It is important to note that our results are based on our initial exploration, which is focused more on functionality than optimization. We make sure the results are consistent by averaging our measurements over 50-100 iterations. We acknowledge that there are still network and system related latencies that can be further optimized, but we believe that the early results described here show promise for AI model inferences on the distributed edge.</p>
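<p>The averaging step can be sketched as a small timing helper; this is a generic harness for synchronous workloads, not Latent AI’s actual measurement code, and the function name is illustrative:</p>

```javascript
// Average the wall-clock latency of a synchronous workload over many
// iterations, as in the measurements described above (illustrative sketch).
function averageLatencyMs(fn, iterations = 50) {
  let totalMs = 0;
  for (let i = 0; i < iterations; i++) {
    const start = Date.now();
    fn(); // the workload under test, e.g. one model inference
    totalMs += Date.now() - start;
  }
  return totalMs / iterations;
}
```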
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6UPQvWkqsl9dyp2gnM2txz/a7c9e77cb4480fe28c9ee5b4fd1f6f86/image3-5.png" />
            
            </figure><p>Comparison of memory and inference processing times for large and small DNNs.</p>
    <div>
      <h3>Learning from Real-World WASM Neural Network Example</h3>
      <a href="#learning-from-real-world-wasm-neural-network-example">
        
      </a>
    </div>
    <p>What lessons can we draw from our example use case?</p><p>First of all, we recommend a minimal compute and memory footprint for AI models deployed to the network edge. A small footprint allows better alignment of data types for WASM AI models, reducing memory load overhead. WASM practitioners know that WASM speed-ups come from the tighter coupling between the JavaScript API and native machine code. Because WASM code does not need to speculate on data types, parallelizing compilation for WASM can achieve better optimization.</p><p>Furthermore, we encourage running AI models at 8-bit precision to reduce the overall size. These 8-bit AI models are readily compressed and compiled for the target hardware, greatly reducing the overhead in hosting the models for inference. Moreover, for video imagery, 8-bit models avoid the overhead of converting digitized raw data (e.g., image files stored as integers) to the floating-point values that floating-point AI models require.</p><p>Finally, we suggest the use of a smart cache for AI models so that Workers can essentially reduce memory load times and focus solely on neural network inferences at runtime. Again, 8-bit models allow more AI models to be hosted and ready for inference. Referring to our exploratory results, hosted small AI models can be served at approximately 15ms inference time, offering a very compelling user experience with low latency and local processing. The WASM API provides a significant performance increase over pure-JS toolchains like TensorFlow.js. For example, against the 189ms inference time for the large AI model on WASM, we observed approximately 1500ms with a TensorFlow.js workflow, roughly an 8x difference in compute latency.</p>
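<p>The model-caching idea can be sketched as a simple memoized loader: pay the load cost once, then serve subsequent requests from memory so only the inference cost remains. This is an illustrative sketch, not our production cache; <code>loadModel</code> is a hypothetical loader:</p>

```javascript
// Illustrative "smart cache" sketch: load each compiled model once and
// reuse it, so repeated requests skip the memory-load overhead.
const modelCache = new Map();

async function getModel(name, loadModel) {
  if (!modelCache.has(name)) {
    // Cache miss: pay the load cost once and remember the result.
    modelCache.set(name, await loadModel(name));
  }
  return modelCache.get(name);
}
```

<p>With the load amortized away, a Worker handling requests for the same small model would pay only the roughly 15ms inference cost per request.</p>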
    <div>
      <h3>Unlocking the Future of the Distributed Edge</h3>
      <a href="#unlocking-the-future-of-the-distributed-edge">
        
      </a>
    </div>
    <p>With highly optimized WASM neural networks, distributed edge networks can move inference closer to users, offering new edge AI services closer to the source of the data. With Latent AI technology to compress and compile WASM neural networks, distributed edge networks can (1) host more models, (2) offer lower latency responses, and (3) potentially lower power utilization with more efficient computing.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1JJHK1hmQ8SQJVVc1u2EG5/f28f983bf1d841bf2e4e25479b10eff0/image2-4.png" />
            
            </figure><p>Example person detected using a small AI model, 10x compressed to 150KB.</p><p>Imagine, for example, that the small AI model described earlier can distinguish whether a person is in a video feed. Digital systems, e.g., doorbell and doorway-entry cameras, can hook up to Cloudflare Workers to verify if a person is present in the camera field of view. Similarly, other AI services could conduct sound analyses to check for broken windows and water leaks. With these distributed AI services, applications can run without access to deep-cloud services. Furthermore, the sensor platform can be made with ultra-low-cost, low-power hardware, in very compact form factors.</p><p>Application developers can now offer AI services with neural networks trained, compressed, and compiled natively as a WASM neural network. Latent AI developer tools can compress WASM neural networks and provide WASM runtimes offering blazingly fast inferences for the device and infrastructure edge. With scale and speed baked in, developers can easily create high-performance experiences for their users, wherever they are, at any scale. More importantly, we can scale enterprise applications on the edge, while offering the desired return on investment using edge networks.</p>
    <div>
      <h3>About Latent AI</h3>
      <a href="#about-latent-ai">
        
      </a>
    </div>
    <p>Latent AI is an early-stage venture spinout of SRI International. Our mission is to enable developers and change the way we think about building AI for the edge. We develop software tools designed to help companies add AI to edge devices and to empower users with new smart IoT applications. For more information about the availability of LEIP SDK, please feel free to contact us at <a href="#">info@latentai.com</a> or check out our <a href="https://latentai.com/">website</a>.</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Serverless]]></category>
            <guid isPermaLink="false">2pnYBt0cQRNjsYoaWRdhm2</guid>
            <dc:creator>Guest Author</dc:creator>
        </item>
        <item>
            <title><![CDATA[Rendering React on the Edge with Flareact and Cloudflare Workers]]></title>
            <link>https://blog.cloudflare.com/rendering-react-on-the-edge-with-flareact-and-cloudflare-workers/</link>
            <pubDate>Thu, 03 Sep 2020 12:00:00 GMT</pubDate>
            <description><![CDATA[ Page speed is critical. You need to get content to your audience as quickly as possible on every device. You also need to render ads in a speedy way to maintain a good user experience and make money to support your journalism. ]]></description>
            <content:encoded><![CDATA[ <p><i>The following is a guest post from </i><a href="https://twitter.com/jplhomer"><i>Josh Larson</i></a><i>, Engineer at </i><a href="https://www.voxmedia.com/"><i>Vox Media</i></a><i>.</i></p><p>Imagine you’re the maintainer of a high-traffic media website, and your DNS is already hosted on Cloudflare.</p><p>Page speed is critical. You need to get content to your audience as quickly as possible on every device. You also need to render ads in a speedy way to maintain a good user experience and make money to support your journalism.</p><p>One solution would be to <b>render your site statically</b> and cache it at the edge. This would help ensure you have top-notch delivery speed because you don’t need a server to return a response. However, your site has decades worth of content. If you wanted to make even a small change to the site design, you would need to regenerate <i>every single page</i> during your next deploy. This would take ages.</p><p>Another issue is that your site would be static — and future updates to content or new articles would not be available until you deploy again.</p><p>That’s not going to work.</p><p>Another solution would be to <b>render each page dynamically</b> on your server. This ensures you can return a dynamic response for new or updated articles.</p><p>However, you’re going to need to pay for some beefy servers to be able to handle spikes in traffic and respond to requests in a timely manner. You’ll also probably need to implement a system of internal caches to optimize the performance of your app, which could lead to a more complicated development experience. 
That also means you’ll be at risk of a thundering herd problem if, for any reason, your cache becomes invalidated.</p><p>Neither of these solutions is great, and you’re forced to <b>make a tradeoff</b> between these two approaches.</p><p>Thankfully, you’ve recently come across a project like <a href="https://nextjs.org/">Next.js</a> which offers a hybrid approach: static-site generation along with incremental regeneration. You’re in love with the patterns and developer experience in Next.js, but you’d also love to take advantage of the Cloudflare Workers platform to <a href="https://www.cloudflare.com/developer-platform/solutions/hosting/">host your site</a>.</p><p><a href="https://workers.cloudflare.com/">Cloudflare Workers</a> allow you to run your code on the edge quickly, efficiently and at scale. Instead of paying for a server to host your code, you can host it directly inside the datacenter — reducing the number of network trips required to load your application. In a perfect world, we wouldn’t need to find hosting for a Next.js site, because Cloudflare offers the same JavaScript hosting functionality with the Workers platform. With their dynamic runtime and edge caching capabilities, we wouldn’t need to worry about making a tradeoff between static and dynamic for our site.</p><p>Unfortunately, frameworks like Next.js and Cloudflare Workers don’t mesh together particularly well due to technical constraints. Until now:</p>
    <div>
      <h4>I’m excited to announce <a href="https://flareact.com">Flareact</a>, a new open-source React framework built for Cloudflare Workers.</h4>
      <a href="#im-excited-to-announce-a-new-open-source-react-framework-built-for-cloudflare-workers">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5iZAEv0uTQCFMSWABs8KLe/3ed9b1c1f576ae5498c03b58a41d6e3b/Flareact_2x.png" />
            
            </figure><p>With Flareact, you don’t need to make the tradeoff between a static site and a dynamic application.</p><p>Flareact allows you to <b>render your React apps at the edge</b> rather than on the server. It is modeled after Next.js, which means it supports file-based page routing, dynamic page paths and edge-side data fetching APIs.</p><p>Not only are Flareact pages rendered at the edge — they’re also cached at the edge using the <a href="https://developers.cloudflare.com/workers/runtime-apis/cache">Cache API</a>. This allows you to provide a dynamic content source for your app without worrying about traffic spikes or response times.</p><p>With no servers or origins to deal with, your site is <i>instantly available</i> to your audience. <a href="https://workers.cloudflare.com/">Cloudflare Workers</a> gives you a 0ms cold start and responses from the edge within milliseconds.</p><p>You can <a href="https://flareact.com/">check out the docs</a> and get started now by clicking the button below:</p>
            <figure>
            <a href="https://deploy.workers.cloudflare.com/?url=https://github.com/flareact/flareact-template&amp;paid=true">
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/DQKtrrjIPNSa9vCdzGlEo/91db1c0c29bfb1af1a4929292f4afb91/button" />
            </a>
            </figure><p>To get started manually, install the latest wrangler, and use the handy <code>wrangler generate</code> command below to create your first project:</p>
            <pre><code>npm i @cloudflare/wrangler -g
wrangler generate my-project https://github.com/flareact/flareact-template</code></pre>
            
    <div>
      <h2>What’s the big deal?</h2>
      <a href="#whats-the-big-deal">
        
      </a>
    </div>
    <p>Hosting React apps on Cloudflare Workers Sites is <i>not a new concept</i>. In fact, you’ve always been able to <a href="https://developers.cloudflare.com/workers/tutorials/deploy-a-react-app-with-create-react-app">deploy a create-react-app project to Workers Sites</a> in addition to static versions of other frameworks like <a href="https://www.gatsbyjs.com/docs/deploying-to-cloudflare-workers/">Gatsby</a> and Next.js.</p><p>However, Flareact renders your React application at the edge. This allows you to provide an initial server response with HTML markup — which can be helpful for search engine crawlers. You can also cache the response at the edge and optionally invalidate that cache on a timed basis — meaning your static markup will be regenerated if you need it to be fresh.</p><p>This isn’t a new pattern: Next.js has done the hard work in defining the shape of this API with <a href="https://nextjs.org/blog/next-9-3#next-gen-static-site-generation-ssg-support">SSG support</a> and <a href="https://nextjs.org/blog/next-9-5#stable-incremental-static-regeneration">Incremental Static Regeneration</a>. While there are nuanced differences in the implementation between Flareact and Next.js, they serve a similar purpose: to get your application to your end-user in the quickest and most-scalable way possible.</p>
    <div>
      <h2>A focus on developer experience</h2>
      <a href="#a-focus-on-developer-experience">
        
      </a>
    </div>
    <p>A <a href="/making-magic-reimagining-developer-experiences-for-the-world-of-serverless/">magical developer experience</a> is a crucial ingredient to any successful product.</p><p>As a longtime fan and user of Next.js, I wanted to experiment with running the framework on Cloudflare Workers. However, Next.js and its APIs are framed around the <a href="https://nodejs.org/api/http.html">Node.js HTTP Server API</a>, while Cloudflare Workers use <a href="https://developers.cloudflare.com/workers/learning/how-workers-works">V8 isolates</a> and are modeled after the <a href="https://developers.cloudflare.com/workers/runtime-apis/fetch-event">FetchEvent type</a>.</p><p>Since we don’t have typical access to a filesystem inside V8 isolates, it’s tough to mimic the environment required to run a dynamic Next.js server at the edge. Though projects like <a href="https://fab.dev">Fab</a> have come up with workarounds, I decided to approach the project with a clean slate and use existing patterns established in Next.js in a brand-new framework.</p><p>As a developer, I absolutely love the simplicity of exporting an asynchronous function from my page to have it supply props to the component. Flareact implements this pattern by allowing you to export a <code>getEdgeProps</code> function. This is similar to <code>getStaticProps</code> in Next.js, and it matches the expected return shape of that function in Next.js — including a <code>revalidate</code> parameter. <a href="https://flareact.com/docs/data-fetching">Learn more about data fetching in Flareact</a>.</p><p>I was also inspired by the <a href="https://nextjs.org/docs/api-routes/introduction">API Routes feature of Next.js</a> when I implemented the <a href="https://flareact.com/docs/api-routes">API Routes feature of Flareact</a> — enabling you to write standard Cloudflare Worker scripts directly within your React app.</p><p>I hope porting over an existing Next.js project to Flareact is a breeze!</p>
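<p>A sketch of the <code>getEdgeProps</code> shape described above, mirroring <code>getStaticProps</code> including the <code>revalidate</code> field. Here <code>fetchPosts</code> is a hypothetical stand-in for a real data source; an actual Flareact page would export <code>getEdgeProps</code> alongside the page component:</p>

```javascript
// Hypothetical stand-in for a real API call or CMS fetch.
async function fetchPosts() {
  return [{ title: 'Hello, Flareact' }];
}

// The edge-side data-fetching function, matching the getStaticProps-style
// return shape described above. (Sketch; a real page would export this.)
async function getEdgeProps() {
  const posts = await fetchPosts();
  return {
    props: { posts },  // passed to the page component as props
    revalidate: 60,    // regenerate the cached response at most once per minute
  };
}
```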
    <div>
      <h2>How it works</h2>
      <a href="#how-it-works">
        
      </a>
    </div>
    <p>When a <code>FetchEvent</code> request comes in, Flareact inspects the URL pathname to decide how to handle it:</p><p>If the request is for a <b>page</b> or for <b>page props</b>, it checks the cache for that request and returns it if there’s a hit. If there is a cache miss, it generates the page or props response, stores the result in the cache, and returns the response.</p><p>If the request is for an <b>API route</b>, it sends the entire <code>FetchEvent</code> along to the user-defined API function, allowing the user to respond as they see fit.</p>
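<p>The dispatch step can be sketched as a pure function over the pathname. This is a simplified illustration, not Flareact’s actual source; in particular the <code>/_flareact/props/</code> and <code>/static/</code> prefixes are hypothetical:</p>

```javascript
// Simplified sketch of the request classification described above.
// The URL prefixes here are hypothetical, not Flareact's real ones.
function classifyRequest(pathname) {
  if (pathname.startsWith('/api/')) return 'api';               // user-defined API route
  if (pathname.startsWith('/_flareact/props/')) return 'props'; // page props as JSON
  if (pathname.startsWith('/static/')) return 'static';         // asset from Workers KV
  return 'page';                                                // rendered HTML page
}
```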
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/13wlNGHLGEn1TNpLkV2bDU/845168ed4eaeaa27bdb589b1b10daa64/Flareact-diagram-2_2x.png" />
            
            </figure><p>If you want your cached page to be revalidated after a certain amount of time, you can return an additional <code>revalidate</code> property from <code>getEdgeProps()</code>. This instructs Flareact to cache the endpoint for that number of seconds before generating a new response.</p>
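<p>The freshness check behind <code>revalidate</code> can be sketched as follows; this is an illustration of the behavior described above, not Flareact’s implementation, and the entry shape is hypothetical:</p>

```javascript
// Sketch of the revalidate check: a cached entry stays fresh for
// `revalidate` seconds after it was stored; with no revalidate value,
// it is cached indefinitely. (Illustrative; entry shape is hypothetical.)
function isFresh(entry, nowMs = Date.now()) {
  if (entry.revalidate == null) return true; // cache forever
  return (nowMs - entry.cachedAtMs) / 1000 < entry.revalidate;
}
```

<p>On a stale hit, the Worker would regenerate the response and overwrite the cache entry, so the next visitor gets the fresh copy.</p>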
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6wlcQVsaVBpMW4bbggCwbM/9ecb3ed46b62f6fcd2b2176dffd9b350/Flareact-diagram-1_2x.png" />
            
            </figure><p>Finally, if the request is for a static asset, it returns it directly from the <a href="https://developers.cloudflare.com/workers/runtime-apis/kv#background">Workers KV</a>.</p>
    <div>
      <h3>The Worker</h3>
      <a href="#the-worker">
        
      </a>
    </div>
    <p>The core responsibilities of the <b>Worker</b> — or in a traditional SSR framework, the <i>server</i> — are to:</p><ol><li><p>Render the initial React page component into static HTML markup.</p></li><li><p>Provide the initial page props as a JSON object, embedded into the static markup in a script tag.</p></li><li><p>Load the client-side JavaScript bundles and stylesheets necessary to render the interactive page.</p></li></ol><p>One challenge with building Flareact is that Webpack targets the <code>webworker</code> output rather than the <code>node</code> output. This makes it difficult to inform the worker which pages exist in the filesystem, since there is no access to the filesystem.</p><p>To get around this, Flareact leverages <a href="https://webpack.js.org/guides/dependency-management/#requirecontext"><code>require.context</code></a>, a Webpack-specific API, to inspect the project and build a manifest of pages on the client and the worker. I’d love to replace this with a smarter bundling strategy on the client side eventually.</p>
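<p>Conceptually, the manifest that <code>require.context</code> builds at compile time is just a map from routes to page modules, which the worker can consult without touching a filesystem. The shape below is a hypothetical illustration, not Flareact’s actual manifest:</p>

```javascript
// Hypothetical page manifest: a compile-time map from route to a loader
// for that page's module, standing in for what require.context produces.
const pageManifest = {
  '/': () => ({ default: () => 'IndexPage' }),
  '/about': () => ({ default: () => 'AboutPage' }),
};

// Resolve a pathname to its page component, or null for unknown routes.
function resolvePage(pathname) {
  const load = pageManifest[pathname];
  return load ? load().default : null;
}
```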
    <div>
      <h3>The Client</h3>
      <a href="#the-client">
        
      </a>
    </div>
    <p>In addition to handling incoming Worker requests, Flareact compiles a client bundle containing the code necessary for routing, data fetching and more from the browser.</p><p>The core responsibilities of the <b>client</b> are to:</p><ol><li><p>Listen for routing events</p></li><li><p>Fetch the necessary page component and its props from the worker over AJAX</p></li></ol><p>Building a client router from scratch has been a challenge. It listens for changes to the internal route state, updates the URL pathname with <a href="https://developer.mozilla.org/en-US/docs/Web/API/History/pushState">pushState</a>, makes an AJAX request to the worker for the page props, and then updates the current component in the render tree with the requested page.</p><p>It was fun building a <a href="https://flareact.com/docs/flareact-link">flareact/link</a> component similar to next/link:</p>
            <pre><code>import Link from "flareact/link";

export default function Index() {
  return (
    &lt;div&gt;
      &lt;Link href="/about"&gt;
        &lt;a&gt;Go to About&lt;/a&gt;
      &lt;/Link&gt;
    &lt;/div&gt;
  );
}</code></pre>
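            <p>The route-matching step a client router like this performs can be sketched as a pure function; this is illustrative, not Flareact’s actual router, but it shows how a file-based pattern like <code>/posts/[id]</code> matches a concrete pathname and extracts its dynamic params:</p>

```javascript
// Sketch of dynamic route matching: "/posts/[id]" matches "/posts/42"
// and yields { id: "42" }; non-matching paths yield null. (Illustrative.)
function matchRoute(pattern, pathname) {
  const patternParts = pattern.split('/').filter(Boolean);
  const pathParts = pathname.split('/').filter(Boolean);
  if (patternParts.length !== pathParts.length) return null;
  const params = {};
  for (let i = 0; i < patternParts.length; i++) {
    const part = patternParts[i];
    if (part.startsWith('[') && part.endsWith(']')) {
      params[part.slice(1, -1)] = pathParts[i]; // dynamic segment
    } else if (part !== pathParts[i]) {
      return null; // static segment mismatch
    }
  }
  return params;
}
```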
            <p>I also set out to build a custom version of <a href="https://nextjs.org/docs/api-reference/next/head">next/head</a> for Flareact. As it turns out, this was non-trivial! With lots of interesting stuff going on behind the scenes to support SSR and client-side routing events, I decided to make <a href="https://flareact.com/docs/flareact-head">flareact/head</a> a simple wrapper around <a href="https://github.com/nfl/react-helmet">react-helmet</a> instead:</p>
            <pre><code>import Head from "flareact/head";

export default function Index() {
  return (
    &lt;div&gt;
      &lt;Head&gt;
        &lt;title&gt;My page title&lt;/title&gt;
      &lt;/Head&gt;
      &lt;h1&gt;Hello, world.&lt;/h1&gt;
    &lt;/div&gt;
  );
}</code></pre>
            
    <div>
      <h3>Local Development</h3>
      <a href="#local-development">
        
      </a>
    </div>
    <p>The local developer experience of Flareact leverages the new <code>wrangler dev</code> command, sending server requests through a local tunnel to the Cloudflare edge and back to your machine.</p>
<p>This is a huge win for productivity, since you don’t need to manually build and deploy your application to see how it will perform in a production environment.</p><p>It’s also a really exciting update to the serverless toolchain. Running a robust development environment in a serverless world has always been a challenge, since your code is executing in a non-traditional context. Tunneling local code to the edge and back is <i>such a great addition</i> to Cloudflare’s developer experience.</p>
    <div>
      <h2>Use cases</h2>
      <a href="#use-cases">
        
      </a>
    </div>
    <p>Flareact is a great candidate for a lot of <a href="https://jamstack.org/">Jamstack</a>-adjacent applications, like blogs or static marketing sites.</p><p>It could also be used for more dynamic applications, with robust API functions and authentication mechanisms — all implemented using Cloudflare Workers.</p><p>Imagine building a high-traffic e-commerce site with Flareact, where both site reliability and dynamic rendering for things like price changes and stock availability are crucial.</p><p>There are also untold possibilities for integrating the Workers KV into your edge props or API functions as a <a href="https://www.cloudflare.com/developer-platform/products/d1/">first-class database solution</a>. No need to reach for an externally-hosted database!</p><p>While the project is still in its early days, here are a couple real-world examples:</p><ul><li><p><a href="https://github.com/flareact/flareact-site/">The Flareact docs site</a>, powered by <a href="https://blog.cloudflare.com/markdown-for-agents/">Markdown files</a></p></li><li><p><a href="https://github.com/flareact/flareact/tree/master/examples/with-cms-wordpress">A blog site</a>, powered by a headless WordPress API</p></li></ul>
    <div>
      <h2>The road ahead</h2>
      <a href="#the-road-ahead">
        
      </a>
    </div>
    <p>I have to be honest: creating a server-side rendered React framework with little prior knowledge was very difficult. There’s still a ton to learn, and Flareact has a long way to go to reach parity with Next.js in the areas of optimization and production-readiness.</p><p>Here’s what I’m hoping to add to Flareact in the near future:</p><ul><li><p>Smarter client bundling and Webpack chunks to reduce individual page weight</p></li><li><p>A more feature-complete client-side router</p></li><li><p>The ability to extend and customize the root document of the app</p></li><li><p>Support for more style frameworks (CSS-in-JS, Sass, CSS modules, etc)</p></li><li><p>A more stable development environment</p></li><li><p>Documentation and support for environment variables, secrets and KV namespaces</p></li><li><p>A guide for deploying from GitHub Actions and other CI tools</p></li></ul><p>If the project sounds interesting to you, be sure to <a href="https://github.com/flareact/flareact">check out the source code on GitHub</a>. Contributors are welcome!</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Edge]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">4WvkWgYtoiP8WKxt0ptLQx</guid>
            <dc:creator>Guest Author</dc:creator>
        </item>
        <item>
            <title><![CDATA[How Replicated Developers Develop Remotely]]></title>
            <link>https://blog.cloudflare.com/how-replicated-secured-our-remote-dev-environment-with-cloudflare-access/</link>
            <pubDate>Tue, 10 Mar 2020 13:00:00 GMT</pubDate>
            <description><![CDATA[ Replicated is a 5-year old infrastructure software company with a focus on enabling a new model of enterprise software delivery that we call Kubernetes Off-The-Shelf (KOTS) Software. ]]></description>
            <content:encoded><![CDATA[ <p><i>This is a guest post by </i><a href="https://www.linkedin.com/in/campbe79/"><i>Marc Campbell</i></a><i> and </i><a href="https://www.linkedin.com/in/grantlmiller/"><i>Grant Miller</i></a><i>, co-founders of </i><a href="https://www.replicated.com/"><i>Replicated</i></a><i>.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/59tDCgf4qQsApNapuOKDTx/e70a691ccf5af8ecff9e4847a2956c25/BDES-429_Replicated-Guest-Blog-Post-_2x.png" />
            
            </figure><p>Replicated is a 5-year-old infrastructure software company with a focus on enabling a new model of enterprise software delivery that we call Kubernetes Off-The-Shelf (KOTS) Software. Our team of 22 is largely technical with a geographic focus on Los Angeles and a few remote team members throughout the US. Our goal is to make it easy to install and operate third-party software, so that sending your data to multi-tenant SaaS providers isn’t the only way to use their services. We think that it’s possible and easy to bring the applications to your data, securely and without a lot of operational overhead. While building Replicated, we began using Cloudflare first for DNS and DDoS protection, and over time started to use other Cloudflare services to help keep our services available and secure.</p><p>At Replicated, our development environment needs to be run on Kubernetes. Our product, <a href="http://github.com/replicatedhq/kots">KOTS</a>, runs in Kubernetes and manages the lifecycle of 3rd-party applications in the Kubernetes cluster. Building and validating the product requires a developer to have access to a cluster. As our engineering team has grown to include dedicated front end engineers and other specialists who shouldn’t have to worry about building and maintaining their own cluster, the complexity of managing a local environment became a burden, and we needed to simplify in order to maintain developer productivity.</p><p>We’ve designed a solution that uses cloud-based infrastructure components, accessed and secured with Cloudflare Access and Argo, to move our development environment to Infrastructure as a Service (IaaS) resources, meaning our entire development environment is in the cloud. As a result, we’ve reduced the amount of time that a developer spends troubleshooting their local environment and allowed every engineer on the team to maintain a full stack development environment, even without deep Kubernetes expertise.</p>
    <div>
      <h3>Previous Dev Environments with Docker for Mac</h3>
      <a href="#previous-dev-environments-with-docker-for-mac">
        
      </a>
    </div>
    <p>We started with each developer building their own local environments, using whatever tools they were comfortable with. Our first attempt to build a standard development environment that works for our engineering team was to use <a href="https://hub.docker.com/editions/community/docker-ce-desktop-mac">Docker for Mac</a> and its <a href="https://www.docker.com/blog/docker-desktop-certified-kubernetes/">built-in Kubernetes distribution</a>. We would buy the best MacBook Pros available (16 GB, then 32 GB, now 64 GB), and everyone would have the entire stack running on their laptop.</p><p>This worked pretty well, except that there was a set of problems our engineers kept hitting--battery life was terrible because of the constant CPU usage, Docker for Mac was different from “real Kubernetes” in some meaningful ways, and Docker for Mac’s built-in K8s would regularly just stop working, forcing the developer to uninstall and reinstall the entire stack. It was miserable.</p><p>We’d lose hours every week from engineers troubleshooting their local environments. When a front end engineer (who wasn’t expected to be a Kubernetes expert) would have issues, they’d need to pair and get help from a backend engineer, consuming not just one but two people’s valuable time.</p><p>We needed something better.</p>
    <div>
      <h3>To The Cloud</h3>
      <a href="#to-the-cloud">
        
      </a>
    </div>
    <p>Rather than running Docker locally, we now create an instance in Google Cloud for each developer. These instances have no public IP and are based on our machine image which has all of our prerequisites installed. The image includes many tools, among them a Kubernetes distribution that’s completely local to the server. We run a Docker registry in each developer’s cluster as a cluster add-on. The cloud server has a magical tool called cloudflared running on it that replaces all of the network configuration and security work we would otherwise have had to do.</p><p>Cloudflared powers Argo Tunnel. When it starts, cloudflared creates four secure HTTP/2 tunnels to two Cloudflare data centers. When a request comes in for a development machine, Cloudflare routes that request over one of those tunnels directly to the machine running that developer’s environment. For example, my hostname is “marc.repl.dev”. Whenever I connect to that hostname, from anywhere on earth, Cloudflare routes the request securely to my development environment. If I need to spin up a new development environment, there is no configuration to do; whichever machine is running cloudflared with the appropriate credentials will receive the traffic. This all works on any cloud and in any cloud region.</p><p>This configuration has several advantages over a traditional deployment. For one, the server does not have a public IP and we don’t need to have any ports open in the Google Load Balancer, including for SSH. The only way to connect to these servers is through the Argo Tunnel, secured by Cloudflare Access. Access provides a <a href="/cloudflare-access-now-teams-of-any-size-can-turn-off-their-vpn/">BeyondCorp-style</a> method of authentication, which ensures that the environment can be reached from anywhere in the world without the use of a VPN.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/yTvHLFlopPBrpiKavwp9M/20930601bc80ab1c00700271c8df8a70/access-replicated_2x--1-.png" />
            
            </figure><p>BeyondCorp is an elaborate way of saying that all our authentication is managed in a single place. We can write a policy which defines which machines a user should have access to and trust it will be applied everywhere. This means rather than managing SSH certificates which are long-lived and hard to revoke, we can allow developers to log in with the same Google credentials we use everywhere else! Should, knock on wood, a developer leave, we can revoke those credentials instantly; no more worrying about what public keys they still might have lying around.</p>
    <div>
      <h3>What happens on the developer’s machines?</h3>
      <a href="#what-happens-on-the-developers-machines">
        
      </a>
    </div>
    <p>Through Argo Tunnel and Access we now have the ability to connect to our new development instances, but that isn’t enough to allow our engineers to work. They need to be able to write and execute code on that remote machine in a seamless way. To solve that problem we turned to the <a href="https://code.visualstudio.com/docs/remote/ssh">Remote SSH extension</a> for VS Code. In the words of the documentation for that project:</p><p>The Visual Studio Code Remote SSH extension allows you to open a remote folder on any remote machine, virtual machine, or container with a running SSH server and take full advantage of VS Code's feature set. Once connected to a server, you can interact with files and folders anywhere on the remote filesystem.</p><p>With Remote SSH, VS Code seamlessly reads and writes files to the developer’s remote server. When a developer opens a project, it feels local and seamless, but everything is authenticated by Access and proxied through Argo over SSH. Our developers can travel anywhere in the world, and trust their development environment will be accessible and fast.</p><p>Locally, a developer has a .ssh/config file to define local ports to forward through the SSH connection to a port that’s only available on the remote server. For example, my .ssh/config file contains:‌‌</p>
            <pre><code>Host marc.repl.dev
HostName marc.repl.dev
User marc
LocalForward 8080 127.0.0.1:30080
LocalForward 8005 127.0.0.1:30015
...</code></pre>
            <p>To build and execute code our developers open the embedded terminal in VS Code. This automatically connects them to the remote server. We use <a href="https://github.com/GoogleContainerTools/skaffold">skaffold</a>, a Kubernetes CLI for local development. A simple skaffold dev starts the stack on their remote machine, which feels local because it’s all happening inside VS Code. Once it’s started, the developer can access localhost in their browser to view the results of their work by visiting <a href="http://localhost:8080">http://localhost:8080</a>. The SSH config above will forward this traffic to port 30080 on the remote server. Port 30080 on the remote server is a NodePort configured in the cluster, which exposes the web server. All of our APIs and web servers have static NodePorts for local development environments.</p><p>Now, when a developer starts at Replicated, their first day (or even week) isn’t consumed by setting up the development environment--now it takes less than an hour. We have a Terraform script that makes it easy to replace any one of our developers’ machines in seconds.</p>
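            <p>To make the port mapping above concrete, here is a minimal sketch of a NodePort Service of the kind described. The Service name, labels, and container port are illustrative assumptions, not Replicated’s actual manifests.</p>
            <pre><code># Hypothetical Service pinning a dev web server to NodePort 30080.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: NodePort
  selector:
    app: web            # assumed pod label
  ports:
    - port: 3000        # in-cluster port (assumed)
      targetPort: 3000  # container port on the web server pod (assumed)
      nodePort: 30080   # static port on the remote server, matched by LocalForward 8080</code></pre>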
    <div>
      <h3>The Aftermath</h3>
      <a href="#the-aftermath">
        
      </a>
    </div>
    <p>All developers at Replicated have now been using this environment for nine months. We haven’t eliminated the problems that occasionally come up where Kubernetes isn’t playing nicely, or Docker uses too much disk space. However, these problems do occur much less frequently than they did on Docker for Mac. We now have two new options that weren’t easily available when everyone ran their environment locally.</p><p>First, a backend engineer can just ssh through the Argo Tunnel into another developer’s server to troubleshoot and help. Every development environment has become a collaborative place. This is great when two engineers aren’t in the same room. Also, we’re less attached to our development environments--if my server isn’t working properly for unknown reasons, instead of troubleshooting it for hours, I can delete it and get a new clean one.</p><p>Some additional benefits include:</p><ul><li><p>Developers can have multiple envs easily (to try out a new k8s version, for example)</p></li><li><p>Battery life is awesome again on laptops</p></li><li><p>We don’t need the biggest and most powerful laptops anymore (hello, Chromebooks and tablets)</p></li><li><p>Developers can choose their local OS and environment (macOS, Windows, Linux); anything that supports SSH works.</p></li><li><p>Code does not live on a developer laptop; it doesn’t travel with them to coffee shops and other insecure places. This is great for security purposes--a lost laptop no longer means the codebase is out there with it.</p></li></ul>
    <div>
      <h3>How To</h3>
      <a href="#how-to">
        
      </a>
    </div>
    <p>Beyond just telling you what we did, we’d like to show you how to replicate it for yourself! This assumes you have a domain which is already configured to use Cloudflare.</p><ol><li><p>Create an instance to represent your development environment in the cloud of your choice.</p></li></ol>
            <pre><code>gcloud compute instances create my-dev-universe</code></pre>
            <p>2. Configure your instance to run cloudflared when it starts up, and give it a helpful hostname like dev.mysite.com.</p>
            <pre><code>printf "hostname: dev.mysite.com\n" &gt; ~/.cloudflared/config.yml
cloudflared login
sudo cloudflared service install</code></pre>
            <p>3. Write an Access policy to allow only you to access your machine.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4k99GpxptgqdnQe90zpZPe/9d2eaadfd89f8539345f7aa8e9ddc781/AccessPolicy.png" />
            
            </figure><p>4. Configure your local machine to SSH via Cloudflare:</p>
            <pre><code>brew install cloudflare/cloudflare/cloudflared
cloudflared access ssh-config --hostname dev.mysite.com --short-lived-cert &gt;&gt; ~/.ssh/config</code></pre>
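            <p>For context, the cloudflared access ssh-config command appends a stanza to ~/.ssh/config that proxies SSH traffic for that hostname through cloudflared rather than over a direct TCP connection. The exact output varies by cloudflared version and flags, but it looks roughly like:</p>
            <pre><code># Generated by cloudflared (illustrative; actual output may differ)
Host dev.mysite.com
  ProxyCommand /usr/local/bin/cloudflared access ssh --hostname %h</code></pre>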
            <p>5. Install <a href="https://code.visualstudio.com/">VS Code</a> and the <a href="https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.vscode-remote-extensionpack">Remote Development extension pack</a></p><p>6. In VS Code select ‘Remote-SSH: Connect to Host…’ from the Command Palette and enter user@dev.mysite.com. A browser window will open where you will be prompted to log in with the identity provider you configured with Cloudflare.</p><p>7. You’re done! If you select File &gt; Open you will see files on your remote machine. The embedded terminal will also execute code on that remote machine.</p><p>8. Once you’re ready to get a production-ready setup for your team, take a look at the <a href="https://gist.github.com/marccampbell/28585dfddafd25fe63be71e91cc0da0b">instructions</a> we share with our team.</p>
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>There is no doubt that the world is becoming more Internet-connected, and that deployment environments are becoming more complex. It stands to reason that it’s only a matter of time before all software development happens through and in concert with the Internet.</p><p>While remote development might not be the best solution for every team, it has resulted in a dramatically better experience for Replicated, and we hope it does for you as well.</p>
    <div>
      <h3>How to get started</h3>
      <a href="#how-to-get-started">
        
      </a>
    </div>
    <p>Replicated develops remotely with <a href="https://teams.cloudflare.com/access/">Cloudflare Access</a>, a remote access gateway that helps you secure access to internal applications and infrastructure without a VPN.</p><p>Effective until September 1, 2020, Cloudflare is making Access and other Cloudflare for Teams products free to small businesses. We're doing this to help ensure that small businesses that implement work-from-home policies in order to combat the spread of the Coronavirus (COVID-19) can maintain business continuity.</p><p>You can learn more and apply at <a href="https://www.cloudflare.com/smallbusiness/">cloudflare.com/smallbusiness</a> now.</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Access]]></category>
            <category><![CDATA[Cloudflare Zero Trust]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[VPN]]></category>
            <guid isPermaLink="false">j6s3ejkEeD09XyPZ2mW7g</guid>
            <dc:creator>Guest Author</dc:creator>
        </item>
        <item>
            <title><![CDATA[Fifty Years Ago]]></title>
            <link>https://blog.cloudflare.com/fifty-years-ago/</link>
            <pubDate>Tue, 29 Oct 2019 07:15:00 GMT</pubDate>
            <description><![CDATA[ On 29 October 2019, Professor Leonard (“Len”) Kleinrock is chairing a celebration at the University of California, Los Angeles (UCLA).  The date is the fiftieth anniversary of the first full system test and remote host-to-host login over the Arpanet. ]]></description>
            <content:encoded><![CDATA[ <p><i>This is a guest post by Steve Crocker of Shinkuro, Inc. and Bill Duvall of Consulair. Fifty years ago they were both present when the first packets flowed on the Arpanet.</i></p><p>On 29 October 2019, Professor Leonard (“Len”) Kleinrock is chairing a celebration at the University of California, Los Angeles (UCLA).  The date is the fiftieth anniversary of the first full system test and remote host-to-host login over the Arpanet.  Following a brief crash caused by a configuration problem, a user at UCLA was able to log in to the SRI SDS 940 time-sharing system.  But let us paint the rest of the picture.</p><p>The Arpanet was a bold project to connect sites within the ARPA-funded computer science research community and to use packet-switching as the technology for doing so.  Although there were parallel packet-switching research efforts around the globe, none were at the scale of the Arpanet project. Cooperation among researchers in different laboratories, applying multiple machines to a single problem and sharing of resources were all part of the vision.  And over the fifty years since then, the vision has been fulfilled, albeit with some undesired outcomes mixed in with the enormous benefits.  However, in this blog, we focus on just those early days.</p><p>In September 1969, Bolt, Beranek and Newman (BBN) in Cambridge, MA delivered the first Arpanet IMP (packet switch) to Len Kleinrock’s laboratory at UCLA. The Arpanet incorporated his theoretical work on packet switching and UCLA was chosen as the network measurement site for validation of his theories.  The second IMP was installed a month later at Doug Engelbart’s laboratory at the Stanford Research Institute – now called SRI International – in Menlo Park, California.  Engelbart had invented the mouse and his lab had developed a graphical interface for structured and hyperlinked text.  
Engelbart’s vision saw computer users sharing information over a wide-scale network, so the Arpanet was a natural candidate for his work. Today, we have seen that vision travel from SRI to Xerox to Apple to Microsoft, and it is now a part of everyone’s environment.</p><p>“IMP” stood for Interface Message Processor; we would now simply say “router.” Each IMP was connected to up to four host computers.  At UCLA the first host was a Scientific Data Systems (SDS) Sigma 7.  At SRI, the host was an SDS 940.  Jon Postel, Vint Cerf and Steve Crocker were among the graduate students at UCLA involved in the design of the protocols between the hosts on the Arpanet, as were Bill Duvall, Jeff Rulifson, and others at SRI (see RFC 1 and RFC 2.)</p><p>SRI and UCLA quickly connected their hosts to the IMPs.  Duvall at SRI modified the SDS 940 time-sharing system to allow host to host terminal connections over the net. Charley Kline wrote the complementary client program at UCLA.  These efforts required building custom hardware for connecting the IMPs to the hosts, and programming for both the IMPs and the respective hosts.  At the time, systems programming was done either in assembly language or special purpose hybrid languages blending simple higher-level language features with assembler.  Notable examples were ESPOL for the Burroughs 5500 and PL/I for Multics.  Much of Engelbart’s NLS system was written in such a language, but the time-sharing system was written in assembler for efficiency and size considerations.</p><p>Along with the delivery of the IMPs, a deadline of October 31 was set for connecting the first hosts.  Testing was scheduled to begin on October 29 in order to allow a few days for necessary debugging and handling of unanticipated problems.   In addition to the high-speed line that connected the SRI and UCLA IMPs, there was a parallel open, dedicated voice line. 
On the evening of October 29 Duvall at SRI donned his headset as did Charley Kline at UCLA, and both host-IMP pairs were started. Charley typed an L, the first letter of a LOGIN command.  Duvall, tracking the activity at SRI, saw that the L was received, and that it launched a user login process within the 940. The 940 system was full duplex, so it echoed an “L” across the net to UCLA.  At UCLA, the L appeared on the terminal.  Success! Charley next typed O and received back O.  Charley typed G, and there was silence.  At SRI, Duvall quickly determined that an echo buffer had been sized too small<sup>[1]</sup>, re-sized it, and restarted the system. Charley typed “LO” again, and received back the normal “LOGIN”.  He typed a confirming RETURN, and the first host-to-host login on the Arpanet was completed.</p><p>Len Kleinrock noted that the first characters sent over the net were “LO.”  Sensing the importance of the event, he expanded “LO” to “Lo and Behold”, and used that in the title of the movie called “Lo and Behold: Reveries of the Connected World.”  See <a href="https://www.imdb.com/title/tt5275828">imdb.com/title/tt5275828</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2V3iJD7gRxgXpHP3W2czFM/169d93bd6410d03b1a59314b80f25f30/image3-3.jpg" />
            
            </figure><p><i>Engelbart's five finger keyboard and mouse with three buttons. The mouse evolved and became ubiquitous. The five finger keyboard faded.</i></p><p>IMPs continued to be installed on the Arpanet at the rate of roughly one per month over the next two years.  Soon we had a spectacularly large network with more than twenty hosts, and the connections between the IMPs were permanent telephone lines operating at the lightning speed of 50,000 bits per second<sup>[2]</sup>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2WajwPnqRvhupq0qahNekn/5f6f465555c937384992e4e7730f301a/image2-2.jpg" />
            
            </figure><p><i>Len Kleinrock and IMP #1 at UCLA</i></p><p>Today, all computers come with hardware and software to communicate with other computers.  Not so back then.  Each computer was the center of its own world, and expected to be connected only to subordinate “peripheral” devices – printers, tape drives, etc.  Many even used different character sets.  There was no standard method for connecting two computers together, not even ones from the same manufacturer. Part of what made the Arpanet project bold was the diversity of the hardware and software at the research centers.  Almost all of the hosts at these sites were time-shared computers.  Typically, several people shared the same computer, and the computer processed each user’s computation a little bit at a time.  These computers were large and expensive.  Personal computers were fifteen years in the future, and smart phones were science fiction.  Even Dick Tracy’s fantasy two-way wrist radio envisioned only voice interaction, not instant access to databases and sharing of pictures and videos.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3S3Do4NnQ3hA6jBBGAAd8G/c8e8e7c775e4854e2fb265b20caca000/image1.jpg" />
            
            </figure><p><i>Dick Tracy and his two-way radio.</i></p><p>Each site had to create a hardware connection from the host(s) to the IMP. Further, each site had to add drivers or more to the operating system in its host(s) so that programs on the host could communicate with the IMP.  The protocols for host to host communication were in their infancy and unproven.</p><p>During those first two years when IMPs were being installed monthly, we met with students and researchers at the other sites to develop the first suite of protocols.  The bottom layer was forgettably named the Host-Host protocol<sup>[3]</sup>.  Telnet, for emulating terminal dial-up, and the File Transfer Protocol (FTP) were on the next layer above the Host-Host protocol.  Email started as a special case of FTP and later evolved into its own protocol.  Other networks sprang up and the Arpanet became the seedling for the Internet, with TCP providing a reliable, two-way host to host connection, and IP below it stitching together the multiple networks of the Internet.  But the Telnet and FTP protocols continued for many years and are only recently being phased out in favor of more robust and more secure alternatives.</p><p>The hardware interfaces, the protocols and the software that implemented the protocols were the tangible engineering products of that early work.  Equally important was the social fabric and culture that we created.  We knew the system would evolve, so we envisioned an open and evolving architecture.  Many more protocols would be created, and the process is now embodied in the Internet Engineering Task Force (IETF).  There was also a strong spirit of cooperation and openness.  The Request for Comments (RFCs) series of notes were open for anyone to write and everyone to read.  
Anyone was welcome to participate in the design of the protocol, and hence we now have important protocols that have originated from all corners of the world.</p><p>In October 1971, two years after the first IMP was installed, we held a meeting at MIT to test the software on all of the hosts.  Researchers at each host attempted to login, via Telnet, to each of the other hosts.  In the spirit of Samuel Johnson’s famous quote<sup>[4]</sup>, the deadline and visibility within the research community stimulated frenetic activity all across the network to get everything working.  Almost all of the hosts were able to login to all of the other hosts.  The Arpanet was finally up and running.  And the bakeoff at MIT that October set the tone for the future: test your software by connecting to others.  No need for formal standards certification or special compliance organizations; the pressure to demonstrate your stuff actually works with others gets the job done.</p><hr /><p><sup>[1]</sup> The SDS 940 had a maximum memory size of 65K 24-bit words. The time-sharing system along with all of its associated drivers and active data had to share this limited memory, so space was precious and all data structures and buffers were kept to the minimum possible size. The original host-to-host protocol called for terminal emulation and single character messages, and buffers were sized accordingly. What had not been anticipated was that in a full duplex system such as the 940, multiple characters might be echoed for a single received character. Such was the case when the G of LOG was echoed back as “GIN” due to the command completion feature of the SDS 940 operating system.</p><p><sup>[2]</sup> “50,000” is not a misprint. The telephone lines in those days were analog, not digital. To achieve a data rate of 50,000 bits per second, AT&amp;T used twelve voice grade lines bonded together and a Western Electric series 303A modem that spread the data across the twelve lines. 
Several years later, an ordinary “voice grade” line was implemented with digital technology and could transmit data at 56,000 bits per second, but in the early days of the Arpanet 50Kbs was considered very fast. These lines were also quite expensive.</p><p><sup>[3]</sup> In the papers that described the Host-Host protocol, the term Network Control Program (NCP) designated the software addition to the operating system that implemented the Host-Host protocol. Over time, the term Host-Host protocol fell into disuse in favor of Network Control Protocol, and the initials “NCP” were repurposed.</p><p><sup>[4]</sup> Samuel Johnson - ‘Depend upon it, sir, when a man knows he is to be hanged in a fortnight, it concentrates his mind wonderfully.’</p> ]]></content:encoded>
            <category><![CDATA[History]]></category>
            <guid isPermaLink="false">16GiXeoxlG8n2bRTNrtzRy</guid>
            <dc:creator>Guest Author</dc:creator>
        </item>
        <item>
            <title><![CDATA[Terraforming Cloudflare: in quest of the optimal setup]]></title>
            <link>https://blog.cloudflare.com/terraforming-cloudflare/</link>
            <pubDate>Wed, 09 Oct 2019 15:00:00 GMT</pubDate>
            <description><![CDATA[ This post is about our introductory journey to the infrastructure-as-code practice; managing Cloudflare configuration in a declarative and version-controlled way. ]]></description>
            <content:encoded><![CDATA[ <p><i>This is a guest post by Dimitris Koutsourelis and Alexis Dimitriadis, working for the Security Team at </i><a href="https://www.workable.com"><i>Workable</i></a><i>, a company that makes software to help companies find and hire great people.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/37cT4OaneCnyyBgWYOprbJ/b0b143f2cbc1b9dd55ccf7bb76f7fd17/Image_20191002_222359-1-1.png" />
            
            </figure>
    <div>
      <h2>Overview</h2>
      <a href="#overview">
        
      </a>
    </div>
    <p>This post is about our introductory journey into the infrastructure-as-code practice: managing Cloudflare configuration in a declarative and version-controlled way. We’d like to share the experience we’ve gained during this process: our pain points, the limitations we faced, the different approaches we took, and parts of our solution and experimentation.</p>
    <div>
      <h2>Terraform world</h2>
      <a href="#terraform-world">
        
      </a>
    </div>
    <p><a href="https://www.terraform.io/intro/index.html">Terraform</a> is a great tool that fulfills our requirements, and fortunately, Cloudflare maintains its own <a href="https://www.terraform.io/docs/providers/cloudflare/index.html">provider</a> that allows us to manage its service configuration hassle-free.</p><p>On top of that, <a href="https://github.com/gruntwork-io/terragrunt">Terragrunt</a> is a thin wrapper that provides extra commands and functionality for keeping Terraform configurations DRY and managing remote state.</p><p>The combination of the two leads to a more modular and re-usable structure for Cloudflare <a href="https://www.terraform.io/docs/configuration/resources.html">resources</a> (configuration), by utilizing <a href="https://www.terraform.io/docs/configuration/modules.html">Terraform</a> and <a href="https://github.com/gruntwork-io/terragrunt-infrastructure-modules-example">Terragrunt</a> modules.</p><p>We've chosen to use the latest version of both tools (<a href="https://www.hashicorp.com/blog/terraform-0-1-2-preview">Terraform v0.12</a> &amp; <a href="https://github.com/gruntwork-io/terragrunt/releases/tag/v0.19.0">Terragrunt v0.19</a> respectively) and constantly upgrade to take advantage of valuable new features and functionality which, at this point in time, remove important limitations.</p>
    <div>
      <h2>Workable context</h2>
      <a href="#workable-context">
        
      </a>
    </div>
    <p>Our set up includes multiple domains that are grouped in two distinct Cloudflare organisations: production &amp; staging. Our environments have their own purposes and technical requirements (i.e.: QA, development, sandbox and production), which translate to slightly different sets of Cloudflare zone configuration.</p>
    <div>
      <h2>Our approach</h2>
      <a href="#our-approach">
        
      </a>
    </div>
    <p>Our main goal was to have a modular set up with the ability to manage any configuration for any zone, while keeping code repetition to a minimum. This is more complex than it sounds; we repeatedly changed our Terraform folder structure - and other technical aspects - during development. The following sections illustrate the alternatives we explored along the way, along with their pros &amp; cons.</p>
    <div>
      <h3>Structure</h3>
      <a href="#structure">
        
      </a>
    </div>
    <p>Terraform configuration is based on the project's directory structure, so this is the place to start.</p><p>Instead of retaining the Cloudflare organisation structure (production &amp; staging as root-level directories containing the zones that belong to each organisation), our decision was to group zones that share common configuration under the same directory. This helps keep the code DRY and the setup consistent and readable.</p><p>On the downside, this structure adds an extra layer of complexity, as two different sets of credentials need to be handled conditionally, and two state files (at the <i>environments/</i> root level) must be managed and isolated using <a href="https://www.terraform.io/docs/state/workspaces.html">workspaces</a>.</p><p>On top of that, we used Terraform modules to keep sets of configuration that are common across zone groups in a single place.</p><p>Terraform modules repository</p>
            <pre><code>modules/
│    ├── firewall/
│        ├── main.tf
│        ├── variables.tf
│    ├── zone_settings/
│        ├── main.tf
│        ├── variables.tf
│    └── [...]  
└──</code></pre>
            <p>Terragrunt modules repository</p>
            <pre><code>environments/
│    ├── [...]
│    ├── dev/
│    ├── qa/
│    ├── demo/
│        ├── zone-8/ (production)
│            └── terragrunt.hcl
│        ├── zone-9/ (staging)
│            └── terragrunt.hcl
│        ├── config.tfvars
│        ├── main.tf
│        └── variables.tf
│    ├── config.tfvars
│    ├── secrets.tfvars
│    ├── main.tf
│    ├── variables.tf
│    └── terragrunt.hcl
└──</code></pre>
            <p>The Terragrunt modules tree gives flexibility, since we are able to apply configuration at the zone, zone group, or organisation level (which is in line with Cloudflare's configuration capabilities - e.g. custom error pages can also be configured at the organisation level).</p>
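            <p>To sketch how the two sets of credentials mentioned earlier can be handled conditionally - the variable names here are hypothetical, not our actual setup - a single provider block can switch on the active workspace:</p>
            <pre><code># Hypothetical sketch: one cloudflare provider whose credentials follow the
# selected workspace, so the production and staging organisations share code.
provider "cloudflare" {
  email   = terraform.workspace == "production" ? var.prod_email : var.staging_email
  api_key = terraform.workspace == "production" ? var.prod_api_key : var.staging_api_key
}</code></pre>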
    <div>
      <h3>Resource types</h3>
      <a href="#resource-types">
        
      </a>
    </div>
    <p>We decided to implement Terraform resources in different ways, to cover our requirements more efficiently.</p>
    <div>
      <h5>1. Static resource</h5>
      <a href="#1-static-resource">
        
      </a>
    </div>
    <p>The first thought that came to mind was having one or multiple <i>.tf</i> files implementing all the resources with hardcoded values assigned to each attribute. It's simple and straightforward, but can have a high maintenance cost if it leads to code being copy/pasted between environments.</p><p>So, common settings seem to be a good use case; we chose to implement the <i>access_rules</i> Terraform resources accordingly:</p><p>modules/access_rules/main.tf</p>
            <pre><code>resource "cloudflare_access_rule" "no_17" {
  notes   = "this is a description"
  mode    = "block" # valid modes: block, challenge, whitelist, js_challenge
  configuration = {
    target  = "ip"
    value   = "x.x.x.x"
  }
}
[...]</code></pre>
            
    <div>
      <h5>2. Parametrized resources</h5>
      <a href="#2-parametrized-resources">
        
      </a>
    </div>
    <p>Our next step was to add variables to gain flexibility. This is useful when a few attributes of a shared resource configuration differ between multiple zones. Most of the configuration remains the same (as described above) and the variable instantiation is added in the Terraform module, while the values are fed through the Terragrunt module as input variables, or as entries inside <i>.tfvars</i> files. The <i>zone_settings_override</i> resource was implemented accordingly:</p><p>modules/zone_settings/main.tf</p>
            <pre><code>resource "cloudflare_zone_settings_override" "zone_settings" {
  zone_id = var.zone_id
  settings {
    always_online       = "on"
    always_use_https    = "on"
    [...]
    browser_check       = var.browser_check
    mobile_redirect {
      mobile_subdomain  = var.mobile_redirect_subdomain
      status            = var.mobile_redirect_status
      strip_uri         = var.mobile_redirect_uri
    }
    
    [...]
    waf                 = "on"
    webp                = "off"
    websockets          = "on"
  }
}</code></pre>
            <p>environments/qa/main.tf</p>
            <pre><code>module "zone_settings" {
  source        = "git@github.com:foo/modules/zone_settings"
  zone_name     = var.zone_name
  browser_check = var.zone_settings_browser_check
  [...]
}</code></pre>
            <p>environments/qa/config.tfvars</p>
            <pre><code>#zone settings
zone_settings_browser_check = "off"
[...]
</code></pre>
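            <p>For completeness, here is what the module-side declaration of such a variable might look like - a sketch with assumed names and defaults, not our actual file. Declaring a default means each zone's <i>.tfvars</i> only has to list the values that differ:</p>
            <pre><code># Hypothetical modules/zone_settings/variables.tf: each overridable attribute
# is declared with a default, so zones override only what differs.
variable "zone_id" {
  description = "id of the target zone"
  type        = string
}

variable "browser_check" {
  description = "browser integrity check toggle"
  type        = string
  default     = "on"
}</code></pre>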
            
    <div>
      <h5>3. Dynamic resource</h5>
      <a href="#3-dynamic-resource">
        
      </a>
    </div>
    <p>At that point, we thought that a more interesting approach would be to create generic resource templates to manage all instances of a given resource in one place. A template is implemented as a Terraform module and creates each resource dynamically, based on its input: data fed through the Terragrunt modules (<i>/environments</i> in our case), or entries in the <i>.tfvars</i> files.</p><p>We chose to implement the <i>account_member</i> resource this way.</p><p>modules/account_members/variables.tf</p>
            <pre><code>variable "users" {
  description   = "map of users - roles"
  type          = map(list(string))
}
variable "member_roles" {
  description   = "account role ids"
  type          = map(string)
}</code></pre>
            <p>modules/account_members/main.tf</p>
            <pre><code>resource "cloudflare_account_member" "account_member" {
 for_each          = var.users
 email_address     = each.key
 role_ids          = [for role in each.value : lookup(var.member_roles, role)]
 lifecycle {
   prevent_destroy = true
 }
}</code></pre>
            <p>We feed the template with a map of users, where each member is assigned a list of roles. To make the code more readable, we mapped users to role names instead of role ids:</p><p>environments/config.tfvars</p>
            <pre><code>member_roles = {
  admin       = "000013091sds0193jdskd01d1dsdjhsd1"
  admin_ro    = "0000ds81hd131bdsjd813hh173hds8adh"
  analytics   = "0000hdsa8137djahd81y37318hshdsjhd"
  [...]
  super_admin = "00001534sd1a2123781j5gj18gj511321"
}
users = {
  "user1@workable.com"  = ["super_admin"]
  "user2@workable.com"  = ["analytics", "audit_logs", "cache_purge", "cf_workers"]
  "user3@workable.com"  = ["cf_stream"]
  [...]
  "robot1@workable.com" = ["cf_stream"]
}</code></pre>
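            <p>Wiring the template and the data together is then a single module call per organisation - the source path below is illustrative, mirroring the earlier examples:</p>
            <pre><code># Hypothetical environments/main.tf: the users and member_roles maps are passed
# straight into the template, which expands them into one account_member each.
module "account_members" {
  source       = "git@github.com:foo/modules/account_members"
  users        = var.users
  member_roles = var.member_roles
}</code></pre>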
            <p>Another interesting case we dealt with was the <i>rate_limit</i> resource; the variable declaration (a list of objects) &amp; implementation go as follows:</p><p>modules/rate_limit/variables.tf</p>
            <pre><code>variable "rate_limits" {
  description   = "list of rate limits"
  default       = []
 
  type          = list(object(
  {
    disabled    = bool,
    threshold   = number,
    description = string,
    period      = number,
    
    match       = object({
      request   = object({
        url_pattern     = map(string),
        schemes         = list(string),
        methods         = list(string)
      }),
      response          = object({
        statuses        = list(number),
        origin_traffic  = bool
      })
    }),
    action      = object({
      mode      = string,
      timeout   = number
    })
  }))
}</code></pre>
            <p>modules/rate_limit/main.tf</p>
            <pre><code>locals {
 […]
}
data "cloudflare_zones" "zone" {
  filter {
    name    = var.zone_name
    status  = "active"
    paused  = false
  }
}
resource "cloudflare_rate_limit" "rate_limit" {
  count         = length(var.rate_limits)
  zone_id       =  lookup(data.cloudflare_zones.zone.zones[0], "id")
  disabled      = var.rate_limits[count.index].disabled
  threshold     = var.rate_limits[count.index].threshold
  description   = var.rate_limits[count.index].description
  period        = var.rate_limits[count.index].period
  
  match {
    request {
      url_pattern     = local.url_patterns[count.index]
      schemes         = var.rate_limits[count.index].match.request.schemes
      methods         = var.rate_limits[count.index].match.request.methods
    }
    response {
      statuses        = var.rate_limits[count.index].match.response.statuses
      origin_traffic  = var.rate_limits[count.index].match.response.origin_traffic
    }
  }
  action {
    mode        = var.rate_limits[count.index].action.mode
    timeout     = var.rate_limits[count.index].action.timeout
  }
}</code></pre>
            <p>environments/qa/rate_limit.tfvars</p>
            <pre><code>common_rate_limits = [
{
    #1
    disabled      = false
    threshold     = 50
    description   = "sample description"
    period        = 60
   
   match  = {
      request   = {
        url_pattern  = {
          "subdomain"   = "foo"
          "path"        = "/api/v1/bar"
        }
        schemes         = [ "_ALL_", ]
        methods         = [ "GET", "POST", ]
      }
      response  = {
        statuses        = []
        origin_traffic  = true
      }
    }
    action  = {
      mode      = "simulate"
      timeout   = 3600
    }
  },
  [...]
  }
]</code></pre>
            <p>The biggest advantage of this approach is that all common <i>rate_limit</i> rules are in one place and each environment can include its own rules in its <i>.tfvars</i>. The two lists (common and unique rules) are then joined using Terraform's built-in <code>concat()</code> function:</p>
            <pre><code>locals {
  rate_limits  = concat(var.common_rate_limits, var.unique_rate_limits)
}</code></pre>
            <p>There is, however, a drawback: <i>.tfvars</i> files can only contain static values. Since all <i>url</i> attributes - which include the zone name itself - have to be set explicitly in each environment's data, every time a url needs to change, the value has to be copied across all environments with the zone name adjusted to match each one.</p><p>The solution we came up with, in order to make the zone name dynamic, was to split the <i>url</i> attribute into three parts: subdomain, domain and path. This is effective for the <i>.tfvars</i>, but the added complexity of handling the new variables is non-negligible. The corresponding code illustrates the issue:</p><p>modules/rate_limit/main.tf</p>
            <pre><code>locals {
  rate_limits   = concat(var.common_rate_limits, var.unique_rate_limits)
  url_patterns  = [for rate_limit in local.rate_limits : "${lookup(rate_limit.match.request.url_pattern, "subdomain", null) != null ? "${lookup(rate_limit.match.request.url_pattern, "subdomain")}." : ""}${lookup(rate_limit.match.request.url_pattern, "domain", null) != null ? lookup(rate_limit.match.request.url_pattern, "domain") : var.zone_name}${lookup(rate_limit.match.request.url_pattern, "path", null) != null ? lookup(rate_limit.match.request.url_pattern, "path") : ""}"]
}</code></pre>
            <p><i>Readability vs functionality</i>: although flexibility is increased and code duplication is reduced, the url transformations have an impact on the code's readability and ease of debugging (it took us several minutes to spot a typo). You can imagine this is even worse if you attempt to implement a more complex resource (such as <i>page_rule</i>, which is a list of maps with four <i>url</i> attributes).</p><p>The underlying issue here is that, at the point we were implementing our resources, we had to choose maps over objects due to their ability to omit attributes using the lookup() function (by setting default values). This is a requirement for certain resources such as <i>page_rules</i>: only certain attributes need to be defined (and others ignored).</p><p>In the end, the context will determine whether more complex resources can be implemented with dynamic resources.</p>
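            <p>A minimal sketch of the maps-vs-objects trade-off (attribute names hypothetical): with a map, an omitted key simply falls back to <code>lookup()</code>'s default, whereas an object type forces every attribute to be present:</p>
            <pre><code># Sketch: optional attributes via map + lookup(). The "ssl" key is omitted on
# purpose; lookup() returns the supplied default (null), so the attribute can
# simply be ignored - something object types did not allow at the time.
locals {
  page_rule_settings = {
    cache_level = "bypass"
  }

  cache_level = lookup(local.page_rule_settings, "cache_level", null) # "bypass"
  ssl_mode    = lookup(local.page_rule_settings, "ssl", null)         # null
}</code></pre>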
    <div>
      <h5>4. Sequential resources</h5>
      <a href="#4-sequential-resources">
        
      </a>
    </div>
    <p>The Cloudflare page rule resource has a peculiarity that differentiates it from other resource types: the <i>priority</i> attribute. When a page rule is applied, it gets a unique id and a priority number which corresponds to the order in which it was submitted. Although the Cloudflare API and Terraform provider give the ability to explicitly specify the priority, there is a catch.</p><p>Terraform doesn't respect the order of resources inside a <i>.tf</i> file (not even in a <i>for_each</i> loop!); each resource is randomly picked up and then applied to the provider. So, if page_rule priority is important - as in our case - the submission order counts. The solution is to lock the sequence in which the resources are created through the <i>depends_on</i> meta-argument:</p>
            <pre><code>resource "cloudflare_page_rule" "no_3" {
  depends_on  = [cloudflare_page_rule.no_2]
  zone_id     = lookup(data.cloudflare_zones.zone.zones[0], "id")
  target      = "www.${var.zone_name}/foo"
  status      = "active"
  priority    = 3
  actions {
    forwarding_url {
      status_code    = 301
      url            = "https://www.${var.zone_name}"
    }
  }
}
resource "cloudflare_page_rule" "no_2" {
  depends_on  = [cloudflare_page_rule.no_1]
  zone_id     = lookup(data.cloudflare_zones.zone.zones[0], "id")
  target      = "www.${var.zone_name}/lala*"
  status      = "active"
  priority    = 24
  actions {
    ssl                     = "flexible"
    cache_level             = "simplified"
    resolve_override        = "bar.${var.zone_name}"
    host_header_override    = "new.domain.com"
  }
}
resource "cloudflare_page_rule" "no_1" {
  zone_id   = lookup(data.cloudflare_zones.zone.zones[0], "id")
  target    = "*.${var.zone_name}/foo/*"
  status    = "active"
  priority  = 1
  actions {
    forwarding_url {
      status_code     = 301
      url             = "https://foo.${var.zone_name}/$1/$2"
    }
  }
}</code></pre>
            <p>So we had to fall back to a more static resource configuration, because the <i>depends_on</i> meta-argument only takes static references, not values calculated dynamically at runtime.</p>
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>After changing our minds several times along the way on the Terraform structure and other technical details, we believe that there isn't a single best solution. It all comes down to the requirements and keeping a balance between complexity and simplicity. In our case, a mixed approach is a good middle ground.</p><p>Terraform is evolving quickly, but at this point it still lacks some common coding capabilities, so over-engineering is a trap (one we fell into too many times). Keep it simple and as DRY as possible. :)</p> ]]></content:encoded>
            <category><![CDATA[Terraform]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">69I7Vonk88NVPPXGckf6rA</guid>
            <dc:creator>Guest Author</dc:creator>
        </item>
        <item>
            <title><![CDATA[How Castle is Building Codeless Customer Account Protection]]></title>
            <link>https://blog.cloudflare.com/castle-building-codeless-customer-account-protection/</link>
            <pubDate>Wed, 11 Sep 2019 16:00:00 GMT</pubDate>
            <description><![CDATA[ Strong security should be easy. 
Asking your consumers again and again to take responsibility for their security through robust passwords and other security measures doesn’t work. The responsibility of security needs to shift from end users to the companies who serve them.  ]]></description>
            <content:encoded><![CDATA[ <p></p><p><i>This is a guest post by </i><a href="https://www.linkedin.com/in/johanna-larsson-43594a121/"><i>Johanna Larsson</i></a><i>, of Castle, who designed and built the Castle Cloudflare app and the supporting infrastructure.</i></p><p>Strong security should be easy.</p><p>Asking your consumers again and again to take responsibility for their security through robust passwords and other security measures doesn’t work. The responsibility of security needs to shift from end users to the companies who serve them.</p><p><a href="https://www.castle.io">Castle</a> is leading the way for companies to better protect their online accounts with millions of consumers being protected every day. Uniquely, Castle extends threat prevention and protection for both pre and post login ensuring you can keep friction low but security high. With realtime responses and automated workflows for account recovery, overwhelmed security teams are given a hand. However, when you’re that busy, sometimes deploying new solutions takes more time than you have. Reducing time to deployment was a priority so Castle turned to Cloudflare Workers.</p>
    <div>
      <h2>User security and friction</h2>
      <a href="#user-security-and-friction">
        
      </a>
    </div>
    <p>When security is no longer optional and threats are not black or white, security teams are left trying to determine how to allow end-user access and transaction completions when there are hints of risk, or when not all of the information is available. Keeping friction low is important to customer experience. Castle helps organizations be more dynamic and proactive by making continuous security decisions based on realtime risk and trust.</p><p>One of the challenges with traditional solutions is that they are often focused only on protecting the app, or only on the point of access - protecting against bot access, for example. Tools specifically designed for securing user accounts, however, are fundamentally focused on protecting the accounts of the end users, whether they are being targeted by humans or bots. Being able to understand end-user behaviors and their devices both pre and post login is therefore critical to truly protecting each user. The key to protecting users is being able to distinguish between normal and anomalous activity on an individual account and device basis. You also need a playbook <a href="https://blog.castle.io/an-attack-vs-an-anomaly-whats-the-difference/">to respond to anomalies and attacks with dedicated flows</a> that allows your end users to interact directly and provide feedback around security events.</p><p>By understanding the end user and their good behaviors, devices, and transactions, it is possible to automatically respond to account threats in real time based on risk level and policy. This approach not only reduces end-user friction but enables security teams to feel more confident that they won't ever be blocking a legitimate login or transaction.</p><p>Castle processes tens of millions of events every day through its APIs, including contextual information like headers, IP, and device types. The more information that can be associated with a request, the better. This allows us to better recognize abnormalities and protect the end user. Collection of this information is done in two ways: one on the web application's backend side through our SDKs, and the other on the client side using our mobile SDK or browser script. Our experience shows that any integration of a security service based on user behavior and anomaly detection can involve many different parties across an organization, and it affects multiple layers of the tech stack. On top of the security-related roles, it's not unusual to also have to coordinate between backend, devops, and frontend teams. The information related to an end-user session is often spread widely over a code base.</p>
    <div>
      <h2>The cost of security</h2>
      <a href="#the-cost-of-security">
        
      </a>
    </div>
    <p>One of the biggest challenges in implementing a user-facing security and risk management solution is the variety of people and teams it needs attention from, each with competing priorities. Security teams are often understaffed and overwhelmed making it difficult to take on new projects. At the same time, it consumes time from product and engineering personnel on the application side, who are responsible for UX flows and performing continuous authentication post-login.</p><p>We've been experimenting with approaches where we can extract that complexity from your application code base, while also reducing the effort of integrating. At Castle, we believe that strong security should be easy.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6aFtcGMIo3YN4meSe1dgHd/a5905f69ce8fae92452d0c429206c501/castle-cloudflare-diagram.png" />
            
            </figure><p>With Cloudflare we found a service that enables us to create a more friendly, simple, and in the end, safe integration process by placing the security layer directly between the end user and your application. Security-related logic shouldn't pollute your app, but should reside in a separate service, or shield, that covers your app. When the two environments are kept separate, this reduces the time and cost of implementing complex systems making integration and maintenance less stressful and much easier.</p><p>Our integration with Cloudflare aims to solve this implementation challenge, delivering end-to-end account protection for your users, both pre and post login, with the click of a button.</p>
    <div>
      <h2>The codeless integration</h2>
      <a href="#the-codeless-integration">
        
      </a>
    </div>
    <p>In our quest for a purely codeless integration, key features are required. Every customer application is different, which means every integration is different. We want to solve this problem for you once and for all. To do this, we needed to move the security work away from the implementation details so that we could instead focus on describing the key interactions with the end user, like logins or bank transactions. We also wanted to empower key decision makers to recognize and handle crucial interactions in their systems. Creating a single solution that could be customized to fit each specific use case was a priority.</p><p>Building on top of Cloudflare's platform, we made use of three unique and powerful products: Workers, Apps for Workers, and Workers KV.</p><p>Thanks to Workers we have full access to the interactions between the end user and your application. With their impressive performance, we can confidently run inline with website requests without creating noticeable latency. We will never slow down your site. And in order to achieve the flexibility required to match your specific use case, we created an internal configuration format that fully describes the interactions of devices and servers across HTTP, including web and mobile app traffic. It is in this Worker that we've implemented an advanced routing engine to match requests and responses to events and collect information about them, directly from the edge. It also fully handles injecting the Castle browser script — one less thing to worry about.</p><p>All of this logic is kept separate from your application code, and through the Cloudflare App Store we are able to distribute this Worker, giving you control over when and where it is enabled, as well as what configurations are used. There's no need to copy/paste code or manage your own Workers.</p><p>In order to achieve the required speed while running in distributed edge locations, we needed a high-performing, low-latency datastore, and we found one in the Cloudflare Workers KV Store. Cloudflare Apps are not able to access the KV Store directly, but we've solved this by exposing it through a separate Worker that the Castle App connects to. Because traffic between Workers never leaves the Cloudflare network, this is both secure and fast enough to match your requirements. The KV Store allows us to maintain end-user sessions across the world, and also gives us a place to store and update the configurations and sessions that drive the Castle App.</p><p>In combining these products we have a complete and codeless integration that is fully configurable and that won't slow you down.</p>
    <div>
      <h2>How does it work?</h2>
      <a href="#how-does-it-work">
        
      </a>
    </div>
    <p>The data flow is straightforward. After installing the Castle App, Cloudflare will route your traffic through the Castle App, which uses the Castle Data Store and our API to intelligently protect your end users. The impact to traffic latency is minimal because most work is done in the background, not blocking the requests. Let's dig deeper into each technical feature:</p>
    <div>
      <h3>Script injection</h3>
      <a href="#script-injection">
        
      </a>
    </div>
    <p>One of the tools we use to verify user identity is a browser script: <a href="https://castle.io/docs/baseline#client-side-fingerprinting">Castle.js</a>. It is responsible for gathering device information and UI interaction behavior, and although it is not required for our service to function, it helps improve our verdicts. This means it's important that it is properly added to every page in your web application. The Castle App, running between the end user and your application, is able to unobtrusively add the script to each page as it is served. In order for the script to also track page interactions, it needs to be able to connect them to your users, which is done through a call to our script and also works out of the box with the Cloudflare integration. This removes 100% of the integration work from your frontend teams.</p>
    <div>
      <h3>Collect contextual information</h3>
      <a href="#collect-contextual-information">
        
      </a>
    </div>
    <p>The second half of the information that forms the basis of our security analysis is the information related to the request itself, such as IP and headers, as well as timestamps. Gathering this information may seem straightforward, but our experience shows some recurring problems in traditional integrations. IP-addresses are easily lost behind reverse proxies, as they need to be maintained as separate headers, like `X-Forwarded-For`, and the internal format of headers differs from platform to platform. Headers in general might get cut off based on allowlisting. The Castle App sees the original request as it comes in, with no outside influence or platform differences, enabling it to reliably create the context of the request. This saves your infrastructure and backend engineers from huge efforts debugging edge cases.</p>
    <div>
      <h3>Advanced routing engine</h3>
      <a href="#advanced-routing-engine">
        
      </a>
    </div>
    <p>Finally, in order to reliably recognize important events, like login attempts, we've built a fully configurable routing engine. This is fast enough to run inline of your web application, and supports near real-time configuration updates. It is powerful enough to translate requests to actual events in your system, like logins, purchases, profile updates or transactions. Using information from the request, it is then able to send this information to Castle, where you are able to analyze, verify and take action on suspicious activity. What's even better, is that at any point in the future if you want to Castle protect a new critical user event - such as a withdrawal or transfer event - all it takes is adding a record to the configuration file. You never have to touch application code in order to expand your Castle integration across sensitive events.</p><p>We've put together an example TypeScript snippet that naively implements the flow and features we've discussed. The details are glossed over so that we can focus on the functionality.</p>
            <pre><code>addEventListener(event =&gt; event.respondWith(handleEvent(event)));

const respondWith = async (event: CloudflareEvent) =&gt; {
  // You configure the application with your Castle API key
  const { apiKey } = INSTALL_OPTIONS;
  const { request } = event;

  // Configuration is fetched from the KV Store
  const configuration = await getConfiguration(apiKey);

  // The session is also retrieved from the KV Store
  const session = await getUserSession(request);

  // Pass the request through and get the response
  let response = await fetch(request);

  // Using the configuration we can recognize events by running
  // the request+response and configuration through our matching engine
  const securityEvent = getMatchingEvent(request, response, configuration);

  if (securityEvent) {
    // With direct access to the raw request, we can confidently build the context
    // including a device ID generated by the browser script, IP, and headers
    const requestContext = getRequestContext(request);

    // Collecting the relevant information, the data is passed to the Castle API
    event.waitUntil(sendToCastle(securityEvent, session, requestContext));
  }

  // Because we have access to the response HTML page we can safely inject the browser
  // script. If the response is not an HTML page it is passed through untouched.
  response = injectScript(response, session);

  return response;
};</code></pre>
            <p>We hope we have inspired you and demonstrated how Workers can provide speed and flexibility when implementing end to end account protection for your end users with Castle. If you are curious about our service, <a href="http://castle.io/cloudflare">learn more here</a>.</p> ]]></content:encoded>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">5cByG8mOBH9qGzoHhuSN9k</guid>
            <dc:creator>Guest Author</dc:creator>
        </item>
        <item>
            <title><![CDATA[Enhancing the Optimizely Experimentation Platform with Cloudflare Workers]]></title>
            <link>https://blog.cloudflare.com/enhancing-optimizely-with-cloudflare-workers/</link>
            <pubDate>Wed, 05 Jun 2019 13:00:00 GMT</pubDate>
            <description><![CDATA[ Experimentation is an important ingredient in driving business growth: whether you’re iterating on a product or testing new messaging, there’s no substitute for the data and insights gathered from conducting rigorous experiments in the wild.  ]]></description>
            <content:encoded><![CDATA[ <p></p><p><i>This is a joint post by </i><a href="https://www.linkedin.com/in/whelan-boyd-411b1174/"><i>Whelan Boyd</i></a><i>, Senior Product Manager at Optimizely and </i><a href="https://twitter.com/remyguercio"><i>Remy Guercio</i></a><i>, Product Marketing Manager for Cloudflare Workers.</i></p><p>Experimentation is an important ingredient in driving business growth: whether you’re iterating on a product or testing new messaging, there’s no substitute for the data and insights gathered from conducting rigorous experiments in the wild.</p><p>Optimizely is the world’s leading experimentation platform, with thousands of customers worldwide running tests for over 140 million visitors daily. If Optimizely were a website, it would be the third most trafficked in the US.  And when it came time to experiment with reinvigorating their own platform, Optimizely chose Cloudflare Workers.</p>
    <div>
      <h3>Improving Performance and Agility with Cloudflare Workers</h3>
      <a href="#improving-performance-and-agility-with-cloudflare-workers">
        
      </a>
    </div>
    <p>Cloudflare Workers is a globally distributed serverless compute platform that runs across Cloudflare’s network of 180 locations worldwide. Workers are designed for flexibility, with many different use cases ranging from customizing configuration of Cloudflare services and features to building full, independent applications.</p><p>In this post, we’re going to focus on how Workers can be used to improve performance and increase agility for more complex applications. One of the key benefits of Workers is that they allow developers to move decision logic and data into a highly efficient runtime operating in close proximity to end users — resulting in significant performance benefits and flexibility. Which brings us to Optimizely...</p>
    <div>
      <h3>How Optimizely Works</h3>
      <a href="#how-optimizely-works">
        
      </a>
    </div>
    <p>Every week Optimizely delivers billions of experiences to help teams A/B test new products, de-risk new feature launches, and validate alternative designs. Optimizely lets companies test client-side changes like layouts and copy, as well as server-side changes like algorithms and feature rollouts.</p><p>Let’s explore how both have challenges that can be overcome with Workers, starting with Optimizely’s client-side A/B testing, or Optimizely Web, product.</p>
    <div>
      <h3>Use Case: Optimizely Web</h3>
      <a href="#use-case-optimizely-web">
        
      </a>
    </div>
    <p>The main benefit of <a href="https://www.optimizely.com/platform/experimentation/">Optimizely Web</a> — Optimizely’s client-side testing framework — is that it supports A/B testing via straightforward insertion of a JavaScript tag on the web page. The test is designed via the Optimizely WYSIWYG editor, and is live within minutes. Common use cases include style updates, image swaps, headlines and other text changes. You can also write any custom JavaScript or CSS you want.</p><p>With client-side A/B testing, the browser downloads JavaScript that modifies the page as it’s loading.  To avoid “flash-of-unstyled-content” (FOUC), developers need to implement this JavaScript synchronously in the page’s <code>&lt;head&gt;</code> tag.  This constraint, though, can lead to page performance issues, especially on slower connections and devices.  Downloading and executing JavaScript in the browser has a cost, and this cost increases if the amount of JavaScript is large.  With a normal Optimizely Web implementation, all experiments are included in the JavaScript loaded on every page.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1QMd8nkUV19B2n7iPIlIL4/7c38d11e9f19c543f69d7a37e6d3aa2e/Screen-Shot-2019-06-03-at-9.26.52-AM.png" />
            
            </figure><p>A traditional Optimizely implementation</p><p>With Workers, Optimizely can support many of these same use cases, but hoists critical logic to the edge to avoid much of the performance cost. Here’s how it works:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1SaZx513TGy50njaeRXFjW/9897bfca232f9e11ad9a3cad88f55f93/Optimizely-Diagram--4-.png" />
            
            </figure><p>Implementing tests with Optimizely and Cloudflare Workers</p><p>This diagram shows how Optimizely customers can execute experiments created in the point-and-click UI through a Cloudflare Worker.  Rather than the browser downloading a large JavaScript file, your Worker handling HTTP/S requests calls out to Optimizely’s Worker.  Optimizely’s Worker determines which A/B tests should be active on this page and returns a small amount of JavaScript back to your Worker.  In fact, it is only the JavaScript required to execute A/B test variations on that specific page load.  Your Worker inlines the code in the page and returns it to the visitor’s browser.</p><p>Not only does this avoid the browser bottleneck of downloading a lot of data, but the amount of code to execute is a fraction of a normal client-side implementation.  Since the experiments are set up inside the Optimizely interface just like any other Web experiment, you can run as many as you want without waiting for code-deploy cycles.  Better yet, your non-technical (e.g. marketing) teams can still run these without depending on developers for each test.  It’s a one-time implementation.</p>
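<p>As an illustrative sketch of the inlining step (our own minimal example, not Optimizely’s implementation; the snippet fetch and response plumbing are omitted), a Worker could splice the returned experiment JavaScript into the page immediately after the opening <code>&lt;head&gt;</code> tag so that it runs before the page renders:</p>

```javascript
// Hypothetical helper: inject the per-page experiment JavaScript returned by
// Optimizely's Worker right after the opening <head> tag, so it executes
// synchronously before the page body renders (avoiding FOUC).
function inlineExperimentScript(html, experimentJs) {
  const headOpen = html.indexOf('<head>');
  if (headOpen === -1) return html; // no <head>: leave the page untouched
  const insertAt = headOpen + '<head>'.length;
  return (
    html.slice(0, insertAt) +
    '<script>' + experimentJs + '</script>' +
    html.slice(insertAt)
  );
}
```

<p>Inside the Worker’s fetch handler, a function like this would be applied to the origin response body before it is returned to the visitor’s browser.</p>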
    <div>
      <h3>Use Case: Going Further with Feature Rollouts</h3>
      <a href="#use-case-going-further-with-feature-rollouts">
        
      </a>
    </div>
    <p><a href="https://www.optimizely.com/products/full-stack/">Optimizely Full Stack</a> is Optimizely’s <a href="https://www.optimizely.com/optimization-glossary/server-side-testing/">server-side experimentation</a> and feature flagging platform for websites, mobile apps, chatbots, APIs, smart devices, and anything else with a network connection.  You can deploy code behind feature flags, experiment with A/B tests, and roll out or roll back features immediately.  <a href="https://www.optimizely.com/rollouts/?utm_source=blog&amp;utm_campaign=cloudflare">Optimizely Rollouts</a> is a free version of Full Stack that supports key feature rollout capabilities.</p><p>Full Stack SDKs are often implemented and instantiated directly in application code.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2AajNeJQ0CO30bm3MBr8K1/27388e0a62543bafaac607ffc409e3eb/Screen-Shot-2019-06-03-at-10.42.52-AM.png" />
            
            </figure><p>An Optimizely full stack experimentation setup</p><p>The main blocker to high velocity <i>server-side</i> testing is that experiments and feature rollouts must go through the code-deploy cycle — and to further add to the headache, many sites <a href="https://docs.developers.optimizely.com/full-stack/docs/content-delivery-networks">cache content on CDNs</a>, so experiments or rollouts running at the origin never execute.  </p><p>In this example, we’ll consider a new feature you’d like to roll out gradually, exposing more and more users over time between code deploys. With Workers, you can implement feature rollouts by running the Optimizely JavaScript SDK at the edge.  The Worker is effectively a <a href="https://docs.developers.optimizely.com/full-stack/docs/content-delivery-networks#section-1-make-decisions-on-the-edge">decision service</a>.  Instead of installing the JS SDK inside each <a href="https://www.cloudflare.com/application-services/">application service</a> where you might need to gate or roll out features, centralize instantiation in a Worker.</p><p>From your application, simply hit the Worker and the response will tell you whether a feature is enabled for that particular user.  In the example below, we supply via query parameters a <code>userId</code>, <code>feature</code>, and account-specific SDK <code>key</code> and the Worker responds with its decision in <code>result</code>.  Below is a sample Cloudflare Worker:</p>
            <pre><code>import { createManager } from '../index'

/// &lt;reference lib="es2015" /&gt;
/// &lt;reference lib="webworker" /&gt;

addEventListener('fetch', (event: any) =&gt; {
  event.respondWith(handleRequest(event.request))
})

/**
 * Respond with the feature decision for the given request
 * @param {Request} request
 */
async function handleRequest(request: Request): Promise&lt;Response&gt; {
  const url = new URL(request.url)
  const key = url.searchParams.get('key')
  const userId = url.searchParams.get('userId')
  const feature = url.searchParams.get('feature')
  if (!feature || !key || !userId) {
    throw new Error('must supply "feature", "userId" and "key"')
  }

  try {
    const manager = createManager({
      sdkKey: key,
    })

    // If initialization fails, let the rejection fall through to the outer
    // catch below. (Returning a Response from inside .catch() here would be
    // silently discarded, not returned from handleRequest.)
    await manager.onReady()
    const client = manager.getClient()

    const result = await client.feature({
      key: feature,
      userId,
    })

    return new Response(JSON.stringify(result))
  } catch (e) {
    return new Response(JSON.stringify({ status: 'error' }))
  }
}</code></pre>
            <p>This kind of setup is common for React applications, which may update store values based on decisions returned by the Worker. No need to force a request all the way back to origin.</p><p>All in all, using Workers as a centralized decision service can reduce the complexity of your Full Stack implementation and support applications that rely on heavy caching.</p>
    <div>
      <h3>How to Improve Your Experimentation Setup</h3>
      <a href="#how-to-improve-your-experimentation-setup">
        
      </a>
    </div>
    <p>Both of the examples above demonstrate how Workers can provide speed and flexibility to experimentation and feature flagging.  But this is just the tip of the iceberg!  There are plenty of other ways you can use these two technologies together. We’d love to hear from you and explore them together!</p><p>Are you a developer looking for a feature flagging or server-side testing solution? The Optimizely Rollouts product is free and ready for you to <a href="https://www.optimizely.com/rollouts/?utm_source=blog&amp;utm_campaign=cloudflare">sign up</a>!</p><p>Or does your marketing team need a high-performance A/B testing solution? The Optimizely Web use case is in <i>developer preview</i>.</p><ul><li><p><b>Cloudflare Enterprise Customers:</b> Reach out to your dedicated Cloudflare account manager to learn more and start the process.</p></li><li><p><b>Optimizely Customers and Cloudflare Customers (who aren’t on an enterprise plan):</b> Reach out to your Optimizely contact to learn more and start the process.</p></li></ul><p>You can sign up for and learn more about using Cloudflare Workers <a href="https://workers.cloudflare.com/">here</a>!</p> ]]></content:encoded>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">3w4Y1S2shUB22EzJX1YWPp</guid>
            <dc:creator>Guest Author</dc:creator>
            <dc:creator>Remy Guercio</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare Repositories FTW]]></title>
            <link>https://blog.cloudflare.com/cloudflare-repositories-ftw/</link>
            <pubDate>Thu, 30 May 2019 13:00:00 GMT</pubDate>
            <description><![CDATA[ Kali Linux turned six years old this year!
In this time, Kali has established itself as the de-facto standard open source penetration testing platform. ]]></description>
            <content:encoded><![CDATA[ <p><i>This is a guest post by </i><a href="https://twitter.com/kalilinux"><i>Jim “Elwood” O’Gorman</i></a><i>, one of the maintainers of </i><a href="https://www.kali.org/"><i>Kali Linux</i></a><i>. Kali Linux is a Debian based GNU/Linux distribution popular amongst the security research communities.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1X8bsOBR1p27T0JYTdHcHo/5a1c10c7ef8faf5ddfa84dde413e7745/1200px-Kali_Linux_2.0_wordmark.svg.png" />
            
            </figure><p>Kali Linux turned six years old this year!</p><p>In this time, Kali has established itself as the de-facto standard open source penetration testing platform. On a quarterly basis, we release updated ISOs for multiple platforms, pre-configured virtual machines, Kali Docker, WSL, Azure, AWS images, tons of ARM devices, Kali NetHunter, and on and on and on. This has led to Kali being trusted and relied on to always be there for both security professionals and enthusiasts alike.</p><p>But that popularity has always led to one complication: How to get Kali to people?</p><p>With so many different downloads plus the apt repository, we have to move a lot of data. To accomplish this, we have always relied on our network of first- and third-party mirrors.</p><p>The way this works is, we run a master server that pushes out to a number of mirrors. We then pay to host a number of servers that are geographically dispersed and use them as our first-party mirrors. Then, a number of third parties donate storage and bandwidth to operate third-party mirrors, ensuring that we have even more systems that are geographically close to you. When you go to download, you hit a redirector that will send you to a mirror that is close to you, ideally allowing you to download your files quickly.</p><p>This solution has always been pretty decent; however, it has some drawbacks. First, our network of first-party mirrors is expensive. Second, some mirrors are not as good as others. Nothing is worse than trying to download Kali and getting sent to a slow mirror, where your download might drag on for hours.
Third, we always need more mirrors as Kali continues to grow in popularity.</p><p>This situation led to us encountering Cloudflare thanks to some extremely generous outreach:</p><blockquote><p><a href="https://t.co/k6M5UZxhWF">https://t.co/k6M5UZxhWF</a> and we can chat more about your specific use case.</p><p>— Justin (@xxdesmus) <a href="https://twitter.com/xxdesmus/status/1012492314608988160?ref_src=twsrc%5Etfw">June 29, 2018</a></p></blockquote><p>I will be honest, we are a bunch of security nerds, so we were a bit skeptical at first. We have some pretty unique needs: we use a lot of bandwidth, syncing an apt repository to a CDN is no small task, and, well, we are paranoid. We have an average of 1,000,000 downloads a month on just our ISO images. Add in our apt repos and you are talking some serious, serious traffic. So how much help could we really expect from Cloudflare anyway? Were we really going to be able to put this to use, or would this just be a nice fancy front end to our website and nothing else?</p><p>On the other hand, it was a chance to use something new and shiny, and it is an expensive product, so of course we dove right in to play with it.</p><p>Initially we had some sync issues. A package repository is a mix of static data (binary and source packages) and dynamic data (package lists are updated every 6 hours). To make things worse, the cryptographic sealing of the metadata means that we need atomic updates of all the metadata (the signed top-level ‘<a href="http://kali.download/kali/dists/kali-rolling/Release">Release</a>’ file contains checksums of all the binary and source package lists).</p><p>The default behavior of a CDN is not appropriate for this purpose as it caches all files for a certain amount of time after they have been fetched for the first time. This means that you could have different versions of various metadata files in the cache, resulting in invalid checksum errors returned by apt-get.
So we had to implement a few tweaks to make it work and reap the full benefits of Cloudflare’s CDN network.</p><p>First we added an “Expires” HTTP header to disable <a href="https://support.cloudflare.com/hc/en-us/articles/115003206852">expiration</a> of all files that will never change. Then we added another HTTP header to <a href="https://support.cloudflare.com/hc/en-us/articles/206596608-How-to-Purge-Cache-Using-Cache-Tags-Enterprise-only">tag</a> all metadata files so that we could manually purge those files from the CDN cache through <a href="https://api.cloudflare.com/#zone-purge-files-by-cache-tags-or-host">an API call</a> that we integrated at the end of the repository update procedure on our backend server.</p><p>With nginx in our backend, the configuration looks like this:</p>
            <pre><code>location /kali/dists/ {
    add_header Cache-Tag metadata,dists;
}
location /kali/project/trace/ {
    add_header Cache-Tag metadata,trace;
    expires 1h;
}
location /kali/pool/ {
    add_header Cache-Tag pool;
    location ~ \.(deb|udeb|dsc|changes|xz|gz|bz2)$ {
        expires max;
    }
}</code></pre>
            <p>The API call is a simple shell script launched by a hook of the repository mirroring script:</p>
            <pre><code>#!/bin/sh
curl -sS -X POST "https://api.cloudflare.com/client/v4/zones/xxxxxxxxxxx/purge_cache" \
    -H "Content-Type:application/json" \
    -H "X-Auth-Key:XXXXXXXXXXXXX" \
    -H "X-Auth-Email:your-account@example.net" \
    --data '{"tags":["metadata"]}'</code></pre>
            <p>With this simple yet powerful feature, we ensure that the CDN cache always contains consistent versions of the metadata files. Going further, we might want to configure <a href="https://support.cloudflare.com/hc/en-us/articles/206776707-Does-Cloudflare-Do-Prefetching">Prefetching</a> so that Cloudflare downloads all the package lists as soon as a user downloads the top-level ‘Release’ file.</p><p>In short, we were using this system in a way that was never intended, but it worked! This really reduced the load on our backend, as a single server could feed the entire CDN. Putting the files geographically close to users, allowing the classic <code>apt dist-upgrade</code> to occur much, much faster than ever before.</p><p>A huge benefit, and was not really a lot of work to set up. Sevki Hasirci was there with us the entire time as we worked through this process, ensuring any questions we had were answered promptly. A great win.</p><p>However, there was just one problem.</p><p>Looking at our logs, while the apt repo was working perfectly, our image distribution was not so great. None of those images were getting cached, and our origin server was dying.</p><p>Talking with Sevki, it turns out there were limits to how large of a file Cloudflare would cache. He upped our limit to the system capacity, but that still was not enough for how large some of our images are. At this point, we just assumed that was that--we could use this solution for the repo but for our image distribution it would not help. However, Sevki told us to wait a bit. He had a surprise in the works for us.</p><p>After some development time, Cloudflare pushed out an update to address our issue, allowing us to cache very large files. With that in place, everything just worked with no additional tweaking. Even items like partial downloads for users using download accelerators worked just fine. Amazing!</p><p>To show an example of what this translated into, let’s look at some graphs. 
Once very large file support was added and we started to push out our images through Cloudflare, you can see that there is no real increase in requests:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2bhZs8Ewipr2DpEbTC3IWU/9b04b8ca9ae662ddd0911a1e3ff37803/Initial-Request-Long-view.png" />
            
            </figure><p>However, looking at Bandwidth there is a clear increase:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/s1AfdE6BReJywalltbTt5/1339d9c7c4224ca2cd7c19bff5a69db1/Initial-Bandwidth-long-view.png" />
            
            </figure><p>After it had been implemented for a while, we see a clear pattern.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6R6OtPNSQa95hRpEPDGtCq/4f816f9c4b4beafc05e90436857be0c5/Initial-Request.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7kaU7xAAEN8tUIKykDXNPp/86851feca68c5ee025ed95e514680554/Initial-Bandwidth.png" />
            
            </figure><p>This pushed us from around 80 TB a week when we had just the repo, to around 430 TB a month now that it is the repo and images. As you can imagine, that's an amazing bandwidth savings for an open source project such as ours.</p><p>Performance is great, and with a cache hit rate of over 97% (amazingly high considering how frequently files in our repo change), we could not be happier.</p><p>So what’s next? That's the question we are asking ourselves. This solution has worked so well, we are looking at other ways to leverage it, and there are a lot of options. One thing is for sure: we are not done with this.</p><p>Thanks to Cloudflare, Sevki, Justin, and Matthew for helping us along this path. It is fair to say this is the single largest contribution to Kali that we have received outside of the support from Offensive Security.</p><p>The support we received from Cloudflare was amazing. The Kali project and community thank you immensely every time they update their distribution or download an image.</p> ]]></content:encoded>
            <category><![CDATA[Linux]]></category>
            <guid isPermaLink="false">2gkwBbzsvKHjIMckfvJ579</guid>
            <dc:creator>Guest Author</dc:creator>
        </item>
        <item>
            <title><![CDATA[Diving into Technical SEO using Cloudflare Workers]]></title>
            <link>https://blog.cloudflare.com/diving-into-technical-seo-cloudflare-workers/</link>
            <pubDate>Thu, 07 Mar 2019 16:05:41 GMT</pubDate>
            <description><![CDATA[ With this post we illustrate the potential applications of Cloudflare Workers in relation to search engine optimization, which is more commonly referred to as ‘SEO’, using our research and testing over the past year making Sloth. ]]></description>
            <content:encoded><![CDATA[ <p><i>This is a guest post by Igor Krestov and Dan Taylor. Igor is a lead software developer at SALT.agency, and Dan a lead technical SEO consultant, and has also been credited with </i><a href="https://searchengineland.com/service-workers-and-seo-seo-for-developers-311292"><i>coining the term “edge SEO”</i></a><i>. </i><a href="https://salt.agency/"><i>SALT.agency</i></a><i> is a technical SEO agency with offices in London, Leeds, and Boston, offering bespoke consultancy to brands around the world. You can reach them both via </i><a href="https://twitter.com/salt_agency"><i>Twitter</i></a><i>.</i></p><p>With this post we illustrate the potential applications of <a href="https://www.cloudflare.com/products/cloudflare-workers/">Cloudflare Workers</a> in relation to search engine optimization, which is more commonly referred to as ‘SEO’, using our research and testing over the past year making Sloth.</p><p>This post is aimed both at readers who are proficient in writing performant JavaScript and at complete newcomers and less technical stakeholders who haven’t really written many lines of code before.</p>
    <div>
      <h2>Endless practical applications to overcome obstacles</h2>
      <a href="#endless-practical-applications-to-overcome-obstacles">
        
      </a>
    </div>
    <p>Working with various clients and projects over the years we’ve continuously encountered the same problems and obstacles in getting their websites to a point of “technical SEO excellence”. A lot of these problems come from platform restrictions at an enterprise level, legacy tech stacks, incorrect builds, and years of patching together various services and infrastructures.</p><p>As a team of technical SEO consultants, we can often be left frustrated by these barriers, which often lead to essential fixes and implementations either being impossible or delayed for months (if not years) at a time – and in this time, the business is often losing traffic and revenue.</p><p>Workers offers us a Hail Mary solution to a lot of common frustrations in getting technical SEO implemented, and we believe that in the long run it can become an integral part of overcoming legacy issues, reducing DevOps costs, and speeding up lead times, all in addition to utilising a globally distributed serverless platform with blazing fast cold start times.</p>
    <div>
      <h2>Creating accessibility at scale</h2>
      <a href="#creating-accessibility-at-scale">
        
      </a>
    </div>
    <p>When we first started out, we needed to implement simple redirects, which are easy to create on the majority of platforms but weren’t supported in this instance.</p><p>When the second barrier arose, we needed to inject Hreflang tags, cross-linking an old multi-lingual website on a bespoke platform built to an outdated spec. This required experiments to find an efficient way of implementing the tags without increasing latency or adding new code to the server – in a manner befitting of search engine crawling.</p><p>At this point we had a number of other applications for Workers, with an arising need for non-developers to be able to modify and deploy new Worker code. This has since evolved into the idea of Worker code generation, via Web UI or command line.</p><p>Having established a number of different use cases for Workers, we identified three processing phases:</p><ul><li><p>Incoming request modification – changing the origin request URL or adding authorization headers.</p></li><li><p>Outgoing response modification – adding security headers, Hreflang header injection, logging.</p></li><li><p>Response body modification – injecting/changing content, e.g. canonicals, robots, and JSON-LD.</p></li></ul><p>We wanted to generate lean Worker code, which would enable us to keep each piece of functionality contained and independent of the others, and went with the idea of filter chains, which can be used to compose fairly complex request processing.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5XNqSuHWNflu8gXityDn7E/71deac0da1e6aa40f55fb22e387e3ffd/request-filter-chain.png" />
            
            </figure><p>A request chain depicting the path of a request as it is transformed while moving from client to origin server and back again.</p><p>A key accessibility issue we identified from a non-technical perspective was the goal of making this serverless technology accessible to all in SEO, because with understanding comes buy-in from stakeholders. In order to do this, we had to make Workers:</p><ul><li><p>Accessible to users who don’t understand how to write JavaScript / performant JavaScript</p></li><li><p>Implementable through a process that complements existing deployment processes</p></li><li><p>Implementable through a process that is secure (internally and externally)</p></li><li><p>Implementable through a process that is compliant with data and privacy policies</p></li><li><p>Verifiable through existing processes and practices (BAU)</p></li></ul><p>Before we dive into actual filters, here are partial TypeScript interfaces to illustrate the filter APIs:</p>
            <pre><code>interface FilterExecutor&lt;Type, Context, ReturnType extends Type | void&gt; {
    apply(filterChain: { next: (c: Context, obj: Type) =&gt; ReturnType | Promise&lt;ReturnType&gt; }, context: Context, obj: Type): ReturnType | Promise&lt;ReturnType&gt;;
}
interface RequestFilterContext {
    // Request URL
    url: URL;
    // Short-circuit request filters 
    respondWith(response: Response | Promise&lt;Response&gt;): void;
    // Short-circuit all filters
    respondWithAndStop(response: Response | Promise&lt;Response&gt;): void;
    // Add additonal response filter
    appendResponseFilter(filter: ResponseFilter): void;
    // Add body filter
    appendBodyFilter(filter: BodyFilter): void;
}
interface RequestFilter extends FilterExecutor&lt;Request, RequestFilterContext, Request&gt; { };
interface ResponseFilterContext {
    readonly startMs: number;
    readonly endMs: number;
    readonly url: URL;
    waitUntil(promise: Promise&lt;any&gt;): void;
    respondWith(response: Response | Promise&lt;Response&gt;): void;
    respondWithAndStop(response: Response | Promise&lt;Response&gt;): void;
    appendBodyFilter(filter: BodyFilter): void;
}
interface ResponseFilter extends FilterExecutor&lt;Response, ResponseFilterContext, Response&gt; { };
interface BodyFilterContext {
    waitUntil(promise: Promise&lt;any&gt;): void;
}
interface ChunkChain {
    next: ChunkChain | null;
    chunk: Uint8Array;
}
interface BodyFilter extends MutableFilterExecutor&lt;ChunkChain | null, BodyFilterContext, ChunkChain | null&gt; { };</code></pre>
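<p>The dispatch these interfaces imply can be sketched in a few lines of plain JavaScript (a minimal version of our own, not the actual Sloth internals): each filter either acts on the object and hands off to <code>filterChain.next</code>, or stops the chain, with a terminal callback (for example, the actual fetch to origin) running when no filter intervenes.</p>

```javascript
// Minimal filter chain: filters run in order; each receives a view of the
// chain whose next() invokes the following filter, ending at a terminal
// function when every filter has delegated onward.
class FilterChain {
  constructor(filters, terminal) {
    this.filters = filters;
    this.terminal = terminal;
  }
  run(context, obj) {
    const chainFrom = (index) => ({
      next: (ctx, o) =>
        index < this.filters.length
          ? this.filters[index].apply(chainFrom(index + 1), ctx, o)
          : this.terminal(ctx, o),
    });
    return chainFrom(0).next(context, obj);
  }
}

// Two toy request filters that each tag the request and delegate onward.
const tagFilter = (tag) => ({
  apply: (chain, ctx, req) => chain.next(ctx, { ...req, tags: [...req.tags, tag] }),
});

const chain = new FilterChain([tagFilter('auth'), tagFilter('rewrite')], (ctx, req) => req);
const result = chain.run({}, { tags: [] });
// result.tags is ['auth', 'rewrite']
```

<p>A filter that wants to short-circuit simply returns (or responds via its context) instead of calling <code>next</code>, which is how the redirect filter below stops the chain when it finds a match.</p>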
            <p>Request filter — Simple Redirects</p><hr /><p>Firstly, we would like to point out that this is a very niche use case: if your platform supports redirects natively, you should absolutely do it through your platform. But there are a number of limited, legacy or bespoke platforms where redirects are not supported, are limited, or are charged for (per line) by your <a href="https://www.cloudflare.com/developer-platform/solutions/hosting/">hosting</a> or platform. For example, GitHub Pages only supports redirects via an HTML refresh meta tag.</p><p>The most basic redirect filter would look like this:</p>
            <pre><code>class RedirectRequestFilter {
    constructor(redirects) {
        this.redirects = redirects;
    }

    apply(filterChain, context, request) {
        const redirect = this.redirects[context.url.href];
        if (redirect)
            context.respondWith(new Response('', {
                status: 301,
                headers: { 'Location': redirect }
            }));
        else
            return filterChain.next(context, request);
    }
}

const { requestFilterHandle } = self.slothRequire('./worker.js');
requestFilterHandle.append(new RedirectRequestFilter({
    "https://sloth.cloud/old-homepage": "https://sloth.cloud/"
}));</code></pre>
            <p>You can see it live in Cloudflare’s playground <a href="https://cloudflareworkers.com/#f59a71db84c026a5d2bee72170113f26:https://sloth.cloud/old-homepage">here</a>.</p><p>The one implemented in Sloth supports basic path matching, hostname matching and query string matching, as well as wildcards.</p>
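<p>As a rough illustration of how wildcard matching might work (a hypothetical sketch, not the matcher Sloth actually ships), a pattern’s <code>*</code> can be compiled to a regular expression and the rules checked in order:</p>

```javascript
// Hypothetical wildcard redirect matcher: '*' in a pattern matches any run
// of characters; all other regex metacharacters are escaped literally.
function compilePattern(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.split('*').join('.*') + '$');
}

// Return the target of the first rule whose pattern matches the URL.
function matchRedirect(rules, url) {
  for (const [pattern, target] of rules) {
    if (compilePattern(pattern).test(url)) return target;
  }
  return null;
}

const rules = [
  ['https://sloth.cloud/old-blog/*', 'https://sloth.cloud/blog/'],
  ['https://sloth.cloud/old-homepage', 'https://sloth.cloud/'],
];
matchRedirect(rules, 'https://sloth.cloud/old-blog/some-post'); // → 'https://sloth.cloud/blog/'
```

<p>In practice you would compile the patterns once at startup rather than per request, but the matching idea is the same.</p>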
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7bOFFqK57Xr5IsGHGsxCyd/683f3bb5a86104abf6112949e4ddcb6d/sloth-redirect-manager.png" />
            
            </figure><p>The Sloth dashboard for visually creating and modifying redirects. </p><p>It is all well and good when you do not have a lot of redirects to manage, but what do you do when the set of redirects starts to take up a significant share of the memory available to a Worker? This is where we faced another scaling issue: growing from a small handful of possible redirects to tens of thousands.</p>
    <div>
      <h2>Managing Redirects with Workers KV and Cuckoo Filters</h2>
      <a href="#managing-redirects-with-workers-kv-and-cuckoo-filters">
        
      </a>
    </div>
    <p>Well, here is one way you can solve it by using <a href="https://developers.cloudflare.com/workers/kv/">Workers KV</a> – a key-value data store.</p><p>Instead of hard coding redirects inside Worker code, we will store them inside Workers KV. A naive approach would be to look up every URL in KV, but with Workers KV, maximum performance is not reached until a key is being read on the order of once per second in any given data center.</p><p>An alternative is a probabilistic data structure, like a <a href="https://en.wikipedia.org/wiki/Cuckoo_hashing">Cuckoo Filter</a>, stored in KV and possibly split between a couple of keys, as a KV value is limited to 64 KB. The filter flow would be:</p><ol><li><p>Retrieve the frequently read filter key(s).</p></li><li><p>Check whether the full URL (or only the pathname) is in the filter.</p></li><li><p>Get the redirect from Workers KV using the URL as a key.</p></li></ol><p>In our tests, we managed to pack 20 thousand redirects into a Cuckoo Filter taking up 128 KB, split between 2 keys, verified against 100 thousand active URLs with a false-positive rate of 0.5–1%.</p>
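<p>To make the flow concrete, here is a toy cuckoo filter in JavaScript (our own illustrative implementation with arbitrary sizing, not the filter used in Sloth): inserted keys are always found, and a miss means the URL definitely has no redirect, so the KV lookup can be skipped.</p>

```javascript
// Toy cuckoo filter: 1-byte fingerprints, 4 slots per bucket. A negative
// lookup is definitive; a positive lookup may rarely be a false positive,
// in which case the follow-up KV read simply finds no redirect.
const BUCKETS = 1024; // power of two, so index arithmetic can use a mask
const SLOTS = 4;
const MAX_KICKS = 500;

// FNV-1a hash over a string (good enough for a sketch).
function fnv1a(str, seed = 0x811c9dc5) {
  let h = seed >>> 0;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

class CuckooFilter {
  constructor() {
    // One byte per slot; 0 marks an empty slot, fingerprints are 1..255.
    this.table = new Uint8Array(BUCKETS * SLOTS);
  }
  fingerprint(key) { return (fnv1a(key, 0x9747b28c) % 255) + 1; }
  index(key) { return fnv1a(key) & (BUCKETS - 1); }
  // Partner bucket; XOR with the fingerprint hash makes this self-inverse.
  altIndex(i, fp) { return i ^ (fnv1a(String(fp)) & (BUCKETS - 1)); }

  insertInto(i, fp) {
    for (let s = 0; s < SLOTS; s++) {
      if (this.table[i * SLOTS + s] === 0) { this.table[i * SLOTS + s] = fp; return true; }
    }
    return false;
  }
  bucketHas(i, fp) {
    for (let s = 0; s < SLOTS; s++) {
      if (this.table[i * SLOTS + s] === fp) return true;
    }
    return false;
  }

  add(key) {
    let fp = this.fingerprint(key);
    let i = this.index(key);
    if (this.insertInto(i, fp) || this.insertInto(this.altIndex(i, fp), fp)) return true;
    // Both candidate buckets full: evict entries until a slot frees up.
    for (let k = 0; k < MAX_KICKS; k++) {
      const slot = i * SLOTS + (k % SLOTS);
      const evicted = this.table[slot];
      this.table[slot] = fp;
      fp = evicted;
      i = this.altIndex(i, fp);
      if (this.insertInto(i, fp)) return true;
    }
    return false; // filter too full
  }
  has(key) {
    const fp = this.fingerprint(key);
    const i = this.index(key);
    return this.bucketHas(i, fp) || this.bucketHas(this.altIndex(i, fp), fp);
  }
}
```

<p>A request handler would call <code>has(url)</code> first and only read Workers KV when the filter reports a possible hit; the filter’s byte table is what would be serialized into the KV keys described above.</p>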
    <div>
      <h2>Body filter - Hreflang Injection</h2>
      <a href="#body-filter-hreflang-injection">
        
      </a>
    </div>
    <p>Hreflang meta tags need to be placed inside the HTML <code>&lt;head&gt;</code> element, so before actually injecting them, we need to find either the start or the end of the head HTML tag, which in itself is a streaming search problem.</p><p>The caveat here is that the naive method (decoding UTF-8 into a JavaScript string, performing the search, then re-encoding back into UTF-8) is fairly slow. Instead, we attempted a pure JavaScript search on byte strings (<i>Uint8Array</i>), which straight away showed promising results.</p><p>For our use case, we picked the <a href="https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore%E2%80%93Horspool_algorithm">Boyer-Moore-Horspool</a> algorithm as the base of our streaming search, as it is simple, has great average-case performance, and only requires pre-processing the search pattern, with manual prefix/suffix matching at chunk boundaries.</p><p>Here is a comparison of the methods we tested on Node v10.15.0:</p>
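<p>For reference, a compact Boyer-Moore-Horspool over <i>Uint8Array</i> looks like this (a simplified sketch of the building block; the chunk-boundary prefix/suffix handling mentioned above is omitted). Precomputing the skip table once per pattern, rather than per chunk, is what “precomputed BMH” refers to in our benchmarks.</p>

```javascript
// Boyer-Moore-Horspool over raw bytes. The skip table says how far the
// search window can jump when the last byte under it mismatches.
function horspoolTable(pattern) {
  const table = new Uint32Array(256).fill(pattern.length);
  for (let i = 0; i < pattern.length - 1; i++) {
    table[pattern[i]] = pattern.length - 1 - i;
  }
  return table;
}

// Returns the index of the first occurrence of pattern in haystack, or -1.
// Passing a precomputed table amortizes its cost across many chunks.
function horspoolSearch(haystack, pattern, table = horspoolTable(pattern)) {
  const m = pattern.length;
  let i = 0;
  while (i + m <= haystack.length) {
    let j = m - 1;
    while (j >= 0 && haystack[i + j] === pattern[j]) j--;
    if (j < 0) return i; // full match at offset i
    i += table[haystack[i + m - 1]];
  }
  return -1;
}

const enc = new TextEncoder();
horspoolSearch(enc.encode('<html><head><title>'), enc.encode('<head>')); // → 6
```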
            <pre><code>| Chunk Size | Method                               | Ops/s               |
|------------|--------------------------------------|---------------------|
|            |                                      |                     |
| 1024 bytes | Boyer-Moore-Horspool over byte array | 163,086 ops/sec     |
| 1024 bytes | **precomputed BMH over byte array**  | **424,948 ops/sec** |
| 1024 bytes | decode utf8 into strings &amp; indexOf() | 91,685 ops/sec      |
|            |                                      |                     |
| 2048 bytes | Boyer-Moore-Horspool over byte array | 119,634 ops/sec     |
| 2048 bytes | **precomputed BMH over byte array**  | **232,192 ops/sec** |
| 2048 bytes | decode utf8 into strings &amp; indexOf() | 52,787 ops/sec      |
|            |                                      |                     |
| 4096 bytes | Boyer-Moore-Horspool over byte array | 78,729 ops/sec      |
| 4096 bytes | **precomputed BMH over byte array**  | **117,010 ops/sec** |
| 4096 bytes | decode utf8 into strings &amp; indexOf() | 25,835 ops/sec      |</code></pre>
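<p>For reference, the byte-array Boyer-Moore-Horspool search benchmarked above can be sketched as below. This is our own illustrative code, not the production Worker: the "precomputed" variant simply builds the skip table once per pattern and reuses it for every chunk, and the manual prefix/suffix matching at chunk boundaries is omitted.</p>

```javascript
// Build the Horspool bad-character skip table for a pattern (Uint8Array).
function bmhTable(pattern) {
  const table = new Uint8Array(256).fill(pattern.length);
  for (let i = 0; i < pattern.length - 1; i++) {
    table[pattern[i]] = pattern.length - 1 - i;
  }
  return table;
}

// Return the index of the first occurrence of `pattern` in `chunk`
// (both Uint8Array), or -1. `table` comes from bmhTable(pattern).
function bmhSearch(chunk, pattern, table) {
  const m = pattern.length;
  let i = m - 1; // index in `chunk` aligned with the pattern's last byte
  while (i < chunk.length) {
    let j = m - 1;
    let k = i;
    while (j >= 0 && chunk[k] === pattern[j]) {
      k--;
      j--;
    }
    if (j < 0) return k + 1; // full match
    i += table[chunk[i]];    // shift by the bad-character rule
  }
  return -1;
}
```

Precomputing pays off because the table depends only on the pattern, so a streaming filter can build it once and call <code>bmhSearch</code> on every incoming chunk.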
            
    <div>
      <h2>Can we do better?</h2>
      <a href="#can-we-do-better">
        
      </a>
    </div>
    <p>Having achieved decent performance improvement with pure JavaScript search over naive method, we wanted to see whether we can do better. As Workers support <a href="https://developers.cloudflare.com/workers/api/resource-bindings/webassembly-modules/">WASM</a>, we used rust to build a simple WASM module, which exposed standard rust string search.</p>
            <pre><code>| Chunk Size | Method                              | Ops/s               |
|------------|-------------------------------------|---------------------|
|            |                                     |                     |
| 1024 bytes | Rust WASM                           | 348,197 ops/sec     |
| 1024 bytes | **precomputed BMH over byte array** | **424,948 ops/sec** |
|            |                                     |                     |
| 2048 bytes | Rust WASM                           | 225,904 ops/sec     |
| 2048 bytes | **precomputed BMH over byte array** | **232,192 ops/sec** |
|            |                                     |                     |
| 4096 bytes | **Rust WASM**                       | **129,144 ops/sec** |
| 4096 bytes | precomputed BMH over byte array     | 117,010 ops/sec     |</code></pre>
    <p>As the Rust version did not use a precomputed search pattern, it should be significantly faster still once search patterns are precomputed and cached.</p><p>In our case, we were searching for a single pattern and stopping once it was found, so the pure JavaScript version was fast enough; but if you need multi-pattern or more advanced search, WASM is the way to go.</p><p>We could not record a statistically significant change in latency between a basic Worker and one with the body filter deployed to production, due to unstable network latency, with a mean response latency of 150ms and a 10% standard deviation at the 90th percentile.</p>
    <div>
      <h2>What’s next?</h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>We believe that Workers and serverless applications can open up new opportunities to overcome a lot of the issues the SEO community faces when working with legacy tech stacks, platform limitations, and heavily congested development queues.</p><p>We are also investigating whether Workers can allow us to build a more efficient Tag Manager, which bundles and pushes only the matching tags with their code, to minimize the number of external requests caused by trackers and thus reduce the load on the user’s browser.</p><p>You can experiment with Cloudflare Workers yourself through <a href="https://sloth.cloud/">Sloth</a>, even if you don’t know how to write JavaScript.</p> ]]></content:encoded>
            <category><![CDATA[SEO]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Cloudflare Workers KV]]></category>
            <category><![CDATA[Salt]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">4hfeeckurXOimDtbX2H6cO</guid>
            <dc:creator>Guest Author</dc:creator>
        </item>
        <item>
            <title><![CDATA[create-cloudflare-worker: Bootstrap your Cloudflare Worker]]></title>
            <link>https://blog.cloudflare.com/create-cloudflare-worker-bootstrap-your-cloudflare-worker/</link>
            <pubDate>Wed, 23 Jan 2019 18:30:00 GMT</pubDate>
            <description><![CDATA[ At Quintype, we are continually looking for new and innovative ways to use our CDN. Quintype moved to Cloudflare last year, partly because of the power of Cloudflare Workers. ]]></description>
            <content:encoded><![CDATA[ <p><i>This is a guest post by Tejas Dinkar, who is the Head of Engineering at </i><a href="http://www.quintype.com"><i>Quintype</i></a><i>, a platform for digital publishing. He’s continually looking for ways to make applications run faster and cheaper. You can find him on </i><a href="https://github.com/gja"><i>Github</i></a><i> and </i><a href="https://twitter.com/tdinkar"><i>Twitter</i></a><i>.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6G5QX1iZOECFrPStPpCrw7/9546799bb06b472cd4326506c5522858/Screen-Shot-2019-01-18-at-11.46.36-AM.png" />
            
            </figure><p><a href="https://www.pexels.com/photo/man-holding-ice-cream-cone-under-cloud-1262302/">Image</a> by <a href="https://www.pexels.com/@rakicevic-nenad-233369">Rakicevic Nenad</a> </p><p>TL;DR: Check out <a href="https://github.com/gja/create-cloudflare-worker">create-cloudflare-worker</a>.</p><p>At Quintype, we are continually looking for new and innovative ways to use our CDN. Quintype moved to Cloudflare last year, partly because of the power of Cloudflare Workers. Workers have been a very important tool in our belt, and in this blog post we will talk a little bit about our worker development lifecycle.</p><p>Cloudflare Workers have drastically changed the way we architect and deploy things at Quintype. Quintype is a platform that powers many publishers, including high-volume ones like The Quint, BloombergQuint, Swarajya, and Fortune India. An average month sees hundreds of millions of page views come through our network.</p><p>Maintaining a healthy cache hit ratio is the key to scaling a content-heavy app. Serving requests from Cloudflare is faster and cheaper, as they do not have to come through to an origin. We actively architect our apps to ensure that we maintain a healthy cache hit ratio, and every little thing that we can do to increase it really helps.</p><blockquote><p><b>Even increasing our cache hit ratio from 90% to 95% means that we’ve reduced the traffic to our origin by about 50%.</b></p></blockquote><p>Cloudflare Workers allows us to use our CDN to cache content that was previously considered ‘uncacheable’. This works by executing a piece of JavaScript before the traffic hits Cloudflare’s cache, and again after the response comes back from the cache. Think of it as a decorator pattern for requests at the CDN. 
The worker is allowed to operate on each request/response for about 50ms of compute time (not counting the time the worker is waiting for the origin).</p><p>Workers allow us to do all sorts of magic. For example, here are some of the things we do in our worker:</p><ul><li><p>Strip out URL parameters that aren’t used, especially things like UTM params</p></li><li><p>Short-circuit API requests for content that requires a login</p></li><li><p>A metered paywall which allows users to read 3 free articles a week</p></li><li><p>Bucketing users and annotating requests for A/B testing</p></li><li><p>... not to mention the fact that it’s ridiculously cheap.</p></li></ul><p>In many ways, the Cloudflare Worker is the next evolution of serverless computing. The code we write is distributed to 100s of locations, and scales up as traffic increases. Cloudflare has even launched (in beta) a Key Value store, so we can store data at 100s of locations as well.</p><p>In fact, Workers can even handle traffic without ever forwarding to an upstream origin, creating a truly distributed computing platform. Last week, we went live with our first origin-less worker, which implements a metered paywall purely using the KV Store to track which content a user has read.</p><p>The one thing we were looking to improve was the development workflow for a worker: the ability to start, build, test, and deploy a worker from scratch. Enter <a href="https://github.com/gja/create-cloudflare-worker">create-cloudflare-worker</a>.</p><p><a href="https://github.com/gja/create-cloudflare-worker">create-cloudflare-worker</a> is a new framework that lets anyone bootstrap their CF worker. Just run</p>
            <pre><code>$ npm init cloudflare-worker your-worker-name</code></pre>
            <p>and you are ready to go!</p><p>Your worker is located at `src/index.js`. By default it doesn't do much: it just forwards your request to the upstream origin and adds a response header depending on the response status. You can run your worker locally, using a local app server as the upstream target.</p>
            <pre><code>$ npm run build &amp;&amp; npm start # Starts the worker on port 4000, forwarding requests to port 3000</code></pre>
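<p>Sketched in outline (not the template's exact code), the generated worker behaves something like the following; the <code>statusHeader</code> helper and the <code>x-worker-status</code> header name are our own illustrative assumptions:</p>

```javascript
// Map a response status to a header value; the categories are illustrative.
function statusHeader(status) {
  if (status >= 500) return 'origin-error';
  if (status >= 400) return 'client-error';
  return 'ok';
}

// Inside the Worker itself (Workers runtime only, so commented out here):
// addEventListener('fetch', event => {
//   event.respondWith(handle(event.request));
// });
//
// async function handle(request) {
//   const upstream = await fetch(request); // forward to the origin as-is
//   const response = new Response(upstream.body, upstream);
//   response.headers.set('x-worker-status', statusHeader(upstream.status));
//   return response;
// }
```
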
            <p>Projects created with create-cloudflare-worker come prebuilt with the Webpack configuration you need to use npm modules and bundle everything for the correct targets.</p><p>It also comes with Jest configured for integration testing, so that you can run end-to-end tests against your built worker.</p>
            <pre><code>$ npm run build &amp;&amp; npm test</code></pre>
            <p>Finally, create-cloudflare-worker also comes with npm scripts to deploy your worker to Cloudflare (via the REST API). This makes it suitable for a workflow where the worker is built and deployed via CircleCI, GitHub Actions, or any other CD pipeline. Deploying your worker is as easy as running</p>
            <pre><code>$ CF_ACCOUNT=acct-id CF_WORKER_NAME=worker-name CF_EMAIL=you@you.com CF_AUTH_KEY=auth-key npm run deploy</code></pre>
            <p>Please see <a href="https://github.com/gja/create-cloudflare-worker">the readme</a> on GitHub for a list of commands available today.</p><p>create-cloudflare-worker is stable enough for you to start working with today, and we are still looking for help with supporting WebAssembly and with filling out the readme with different recipes for building workers.</p><p>We hope that create-cloudflare-worker will significantly reduce your time to get started with Cloudflare Workers. Happy Hacking!</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">3QVe4VnEKB9ZU4f4WoWCMI</guid>
            <dc:creator>Guest Author</dc:creator>
        </item>
        <item>
            <title><![CDATA[The SamKnows Cloudflare Platform]]></title>
            <link>https://blog.cloudflare.com/the-samknows-cloudflare-platform/</link>
            <pubDate>Fri, 18 Jan 2019 17:07:24 GMT</pubDate>
            <description><![CDATA[ At SamKnows, we run lots of tests to measure internet performance. Actually, that’s an understatement. Our software is embedded on tens of millions of devices, and that number grows daily. ]]></description>
            <content:encoded><![CDATA[ <p><i>This is a guest post by Jamie Mason, who is the Head of Test Servers at SamKnows. This post originally appears on the </i><a href="https://samknows.com/blog/the-samknows-cloudflare-platform"><i>SamKnows Megablog</i></a><i>.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/Uxf10YQY8xRYvQxyNm0iI/f46214cdc7022dfa27baed20f1fe1088/Cloudflare-SamKnows.png" />
            
            </figure><blockquote><p>We leveraged Cloudflare Workers to expand the SamKnows measurement infrastructure.</p></blockquote><p>At SamKnows, we run lots of tests to measure internet performance. Actually, that’s an understatement. Our software is <a href="https://samknows.com/products/agents/router-sdk">embedded</a> on tens of millions of devices, and that number grows daily.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5FsWgu7FkIv7sibRP0kkwD/77d8e314af2a6a892e71d1000ad7cc28/Fixed-line-devices.png" />
            
            </figure><p>We measure performance between the user’s home and the internet, across dozens of metrics. Some of these metrics measure the performance of major video-streaming services, popular games, or large websites. Others focus on the more traditional ‘quality of service’ metrics: speed, latency, and packet loss.</p><p>In order to measure speed, latency, and packet loss, SamKnows needs test servers to carry out the measurements against. These servers should be relatively near to the user’s home - this ensures that we’re measuring solely the user’s internet connection (i.e. what their Internet Service Provider sells them) and not some external factor.</p><p>As a result, we manage high-capacity test servers <a href="https://www.samknows.com/products/test-servers/">all over the world</a>. Some are donated by research groups, some we host ourselves in major data centers, and still others are run inside ISPs’ own networks.</p><p>Customers often ask us why we don’t make use of cloud hosting providers to host our test servers. The response is always the same: bandwidth is far too expensive (it is typically 10-20x the cost of a dedicated/colocated server) and their global coverage is too limited. Whilst we revisit this topic regularly, it has never made sense for us to use a cloud provider so far.</p><p>Despite the size of our testing infrastructure, we’re always on the lookout for ways to improve and extend our platform. Having to regularly source reliable colocation with 10Gbps connectivity in emerging markets is a nice problem to have, but a problem nonetheless!</p>
    <div>
      <h3>Cloudflare launches “Workers”</h3>
      <a href="#cloudflare-launches-workers">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/en-gb/#dns-on-the-go">Cloudflare</a> is one of the world’s largest CDNs, with an extensive network spanning 165 data centres across 6 continents. This means that most internet users will be within a few tens of milliseconds of the nearest Cloudflare location.</p><p>In March 2018, Cloudflare launched a new product called <a href="https://www.cloudflare.com/en-gb/products/cloudflare-workers/">Workers</a>, based on the W3C Service Workers standard. This allows developers to run code directly on Cloudflare’s network. The typical use case is applying complex HTTP request filtering or caching logic before the request hits the origin server.</p><p>However, the potential to run code at 165 well-connected locations globally gave us other ideas:</p><blockquote><p><i>Could we write the SamKnows measurement server software in a Workers script?</i></p></blockquote><p>The benefit would be significant: we would immediately gain the ability to run measurements in many new locations. Cloudflare also regularly adds new locations — they’ve added 10 since the first draft of this blog post — which would automatically become available to us as soon as they were added. Moreover, Cloudflare’s position as a <a href="https://www.cloudflare.com/learning/cdn/what-is-a-cdn/">CDN</a> means that they will likely have good connectivity to most ISPs in major markets and are incentivised to keep improving that connectivity, as that improves service for their customers. In short, our interests are well-aligned.</p><p>Please note: the idea being discussed here should not be confused with our CDN test, which seeks to measure TCP connection times from a user’s home to major CDNs, including Akamai, Apple, Microsoft, Google, and — of course — Cloudflare.</p>
    <div>
      <h3>Implementing a measurement server in Workers</h3>
      <a href="#implementing-a-measurement-server-in-workers">
        
      </a>
    </div>
    <p>The first thing to note about Workers is that they are not like Amazon EC2 instances or even Docker containers. Workers are standalone JavaScript applications, implementing the W3C Service Workers standard. This means that we’d need to rewrite our server-side software in JavaScript.</p><p>Implementing the first prototype of the measurement server on Workers was surprisingly easy. We didn’t even have a Cloudflare account at this point, so we signed up for the free plan and paid the $5 for the Workers upgrade. Within a couple of hours, we had an early prototype working, which initially only supported a download speed test.</p><p>A quick test from a well-connected server in London demonstrated that our Workers script could saturate a 10Gbps link. JavaScript has come a long way!</p><p>Of course, the 80/20 rule applies, and significantly more work was required to make it scalable, perform well in all situations, and support other metrics. But the early signs were promising.</p><p>Unlike most Workers scripts, our code does not make any requests to any origin servers. We generate content directly inside the Worker for a download speed test and read content from the client for an upload speed test. Care had to be taken to ensure that the content we generate is sufficiently random (some middleboxes will transparently compress data, which would interfere with our measurements). We also must be mindful of the resource limits in place on Cloudflare Workers: 5–50ms of CPU time and 128MB RAM. Luckily, our experience in writing software for resource-constrained embedded devices served us well here!</p><p>We also reached out to Cloudflare at this point to let them know what we were up to, and they were very supportive and interested in our unusual use of Workers. We even met the team behind Workers at the launch party at Cloudflare’s office in London, which is just around the corner from ours.</p>
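<p>Generating the download-test payload can be sketched as below. This is an illustration only: the real Worker would fill each chunk with <code>crypto.getRandomValues()</code>, but a tiny seeded xorshift32 generator stands in here so the sketch is self-contained and deterministic. Producing the payload chunk-by-chunk, rather than as one big buffer, also keeps memory use well under the 128MB limit.</p>

```javascript
// Returns a function producing pseudo-random (hence incompressible) chunks.
// xorshift32 is a stand-in for crypto.getRandomValues() in this sketch.
function makeChunkGenerator(seed = 0x9e3779b9) {
  let state = seed >>> 0;
  return function nextChunk(size) {
    const chunk = new Uint8Array(size);
    for (let i = 0; i < size; i++) {
      // One xorshift32 step; >>> 0 keeps the state an unsigned 32-bit value.
      state ^= state << 13; state >>>= 0;
      state ^= state >>> 17;
      state ^= state << 5;  state >>>= 0;
      chunk[i] = state & 0xff;
    }
    return chunk;
  };
}
```

Because the bytes look random to a compressor, a transparently compressing middlebox cannot shrink the transfer and skew the measured speed.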
    <div>
      <h3>Supported measurements</h3>
      <a href="#supported-measurements">
        
      </a>
    </div>
    <p>Cloudflare Workers is based upon the W3C Service Workers standard. This means that we are subject to the limitations of this standard: we can only exchange HTTP traffic over TCP; we cannot use UDP. This means that none of our UDP-based measurements can run against our Workers server.</p><p>As a result, our Workers server can support the following measurements:</p><ul><li><p>TCP download speed test (single or multiple concurrent connections)</p></li><li><p>TCP upload speed test (single or multiple concurrent connections)</p></li><li><p>Round-trip latency (HTTP-based)</p></li><li><p>Packet loss (HTTP-based)</p></li></ul>
    <div>
      <h3>Performance testing</h3>
      <a href="#performance-testing">
        
      </a>
    </div>
    <p>Before we could consider Cloudflare as a viable measurement server host, we needed to test their performance. We did this by configuring <a href="https://samknows.com/products/agents/samknows-whitebox">Whiteboxes</a> to run measurements to both our existing servers and our Cloudflare Workers server, and then comparing the performance.</p><p>In the autumn of 2018, we configured 10,000 Whiteboxes to run measurements to both Cloudflare and our existing measurement servers. These 10,000 Whiteboxes were distributed across Europe, North America, Asia, and South America.</p><blockquote><p>On average, the difference in measured speed to Cloudflare and to our existing servers was 0.1%.</p></blockquote><p>There were certainly outliers though, sometimes with Cloudflare vastly underperforming compared to our existing infrastructure, and sometimes with it vastly outperforming it. So, we decided to do a deeper dive into two markets.</p><p>We chose Germany — with a variety of internet access technologies and speeds — and Singapore, whose citizens are blessed with high-speed connections of 500Mbps or more, with 1Gbps being quite common. Crucially, we have existing servers located at major internet exchanges in both Germany and Singapore that are capable of dealing with very high-speed connections.</p>
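<p>The headline comparison above boils down to averaging paired measurements from the same Whitebox; a minimal sketch of that calculation, with hypothetical data:</p>

```javascript
// Mean relative difference of paired speed measurements, expressed as a
// fraction: (cloudflare - existing) / existing, averaged over all pairs.
// Field names and values here are illustrative, not SamKnows data.
function meanRelativeDiff(pairs) {
  const total = pairs.reduce(
    (sum, { cloudflare, existing }) => sum + (cloudflare - existing) / existing,
    0,
  );
  return total / pairs.length;
}
```
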
    <div>
      <h4>Deep dive: Germany</h4>
      <a href="#deep-dive-germany">
        
      </a>
    </div>
    <p>In Germany, we saw quite a few instances of peak hour performance dips to both Cloudflare and our existing infrastructure. As an example, here’s the average performance of Deutsche Telekom’s 100Mbps users during early November 2018:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1dnSwjKrXXnOf4S9NGkjRu/4f82338098f42a55e31d64390949375d/Deutsche-Telekom.png" />
            
            </figure><p><i>Figure 1 - Deutsche Telekom's 100Mbps performance to Cloudflare (blue) and our existing M-Lab servers in Frankfurt.</i></p><p>As you can see in this graph, there are a lot of deep troughs, mostly correlating with peak hours. This is indicative of peak-hour congestion somewhere on the path. The interesting aspect of this from our perspective is that devices testing to Cloudflare (in blue) not only see similar speeds, but also see less impact from this congestion. This suggests that routing is playing a large part in this, and that Deutsche Telekom’s routes to Cloudflare are shorter and/or less congested than those to our three existing servers provided by Measurement Lab.</p><p>On an individual device level, we can see that most German devices report very comparable speeds between Cloudflare and M-Lab. Here’s an example, a Vodafone Kabel Deutschland 200Mbps user, who sees barely any difference between M-Lab and Cloudflare:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1GPjI6VixJmbNXB4Ip9T1X/f06e245d63223c9a391844f69493c68b/Vodafone.png" />
            
            </figure><p><i>Figure 2 - A single Vodafone Kabel Deutschland 200Mbps user's performance to Cloudflare (blue) and our existing M-Lab Frankfurt server (red).</i></p>
    <div>
      <h4>Deep dive: Singapore</h4>
      <a href="#deep-dive-singapore">
        
      </a>
    </div>
    <p>Many of our Singaporean users are lucky enough to have a 1Gbps fibre-to-the-home service. However, we know from experience that peering issues in Singapore can make performance extremely variable, depending on the route your traffic takes. What we found was rather interesting.</p><p><i>MyRepublic</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5tihLr5xpAGxpDYGsEg82G/e101a312d4eb52a8ca52fdb9071cf39d/MyRepublic.png" />
            
            </figure><p><i>Figure 3 - MyRepublic 1Gbps performance to Cloudflare (blue) and our existing Singaporean server (red).</i></p><p>The above graph plots MyRepublic’s 1Gbps download speeds in early November. Their performance to our existing server (in red), hosted in Equinix’s facility in Singapore, is fantastic — reaching the limits of what could be reliably squeezed out of a 1Gbps connection. This speaks volumes as to the quality of the connectivity to the existing server in Singapore. The Cloudflare results (in blue) are a bit lower and more variable. Whilst speeds are still high, it’s clear that there’s a difference.</p><p><i>Singtel</i></p><p>MyRepublic users have a much better time of it than Singtel’s 1Gbps users, who provide us with a great example of what things look like when there’s suboptimal connectivity between two parties. In this chart, we show Singtel’s 1Gbps performance to Cloudflare (blue) and our existing measurement server (red):</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5OEXv58vZTjm0qKfwtKaps/6b54e56f523faee7458e81e0bafdd5bb/Singtel.png" />
            
            </figure><p><i>Figure 4 - Singtel 1Gbps performance to Cloudflare (blue) and our existing Singaporean server (red).</i></p><p>Ouch! We can see that speeds to Cloudflare are almost 600Mbps lower than those to our existing server for Singtel users, and far worse than those of MyRepublic.</p><p>What’s the cause of this? That’s a big can of very controversial worms. In short, incumbent ISPs (Singtel in this instance) in some markets can charge other networks large amounts for ‘paid peering’ (paid-for connectivity between their two networks). If the provider opts not to pay for such connectivity, then traffic between them and the ISP could take a less optimal route. The concept of peering disputes <a href="https://arxiv.org/pdf/1409.6526.pdf">is not new</a>; over time there have been disputes that even caused parts of the Internet to become segmented and unreachable from some other networks.</p><p>The implication of the chart above is that Cloudflare does not have peering with Singtel, so the measurements for Singtel users are relatively poor compared to MyRepublic (who connect with Cloudflare via SGIX).</p><p>Shining a light on such issues is beneficial to users, as it helps demonstrate that speed tests to servers hosted by your ISP are not necessarily representative of performance to other networks.</p>
    <div>
      <h3>Launching SamKnows cloud measurement servers</h3>
      <a href="#launching-samknows-cloud-measurement-servers">
        
      </a>
    </div>
    <p>As a result of our testing, and with some tweaks in place, we’re happy to announce that we are now offering Cloudflare Workers-powered measurement servers as part of our product portfolio! This is now in place in the <a href="https://www.samknows.com/products/test-servers/off-net#cloud-servers-cloudflare">Cloud section of our Product Map</a>, and allows us to instantly provide low-cost testing infrastructure for new customers in a large number of locations across the world. To visualise what we’ve just added to our fleet of test servers, Cloudflare have a lovely map showing just how widespread their presence is:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2OfnN6Fn5avY8HSwx6APau/c12d9b0fb96fd8500919e838344868dc/Network-Map.png" />
            
            </figure><p><i>Figure 5 - Cloudflare's 165 worldwide data centres. And counting.</i></p>
    <div>
      <h3>Ready to launch</h3>
      <a href="#ready-to-launch">
        
      </a>
    </div>
    <p>So, what’s next? Well, we have already started using our new Workers-based infrastructure for several of our own measurement studies. This is transparent to users but will increase our capacity and allow us to grow into new markets.</p><p>In the future, we hope to be able to expand the set of measurements we can offer using Workers. The addition of WebAssembly support in late 2018 is a positive step towards this, but the perfect application for us would be if Workers were able to terminate WebSockets and even arbitrary TCP/UDP connections in the future.</p><p>We’re excited to add a major CDN to our infrastructure, as CDNs represent an ever more important aspect of the modern-day internet experience. This is also just the first of our cloud offerings, with work continuing with other providers and solutions - it’s an exciting time for the test infrastructure team at SamKnows. And in addition to expanding our testing infrastructure, we’re also looking forward to releasing our new Rapid Build Framework for iOS and Android, greater <a href="https://samknows.com/products/samknows-one/mapping">mapping functionality</a> in <a href="https://samknows.com/products/samknows-one">SamKnows One</a>, trigger testing, and further test servers to measure and visualise internet performance worldwide - all by the end of Q1 2019.</p><p>Lastly, we are currently using our new Workers infrastructure to test our next-generation Whitebox, at speeds way above 1Gbps. The results are looking very promising so far — keep your eyes peeled for an exciting blog post on this soon!</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Analytics]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">6Y8IwGQqjQw34PLoJYIMwF</guid>
            <dc:creator>Guest Author</dc:creator>
        </item>
        <item>
            <title><![CDATA[Upgrading Cloud Infrastructure Made Easier and Safer Using Cloudflare Workers and Workers KV]]></title>
            <link>https://blog.cloudflare.com/upgrading-cloud-infrastructure-with-workers-kv/</link>
            <pubDate>Thu, 10 Jan 2019 22:00:00 GMT</pubDate>
            <description><![CDATA[ At Timely we started a project to migrate our web applications from legacy Azure services to a modern PaaS offering. In theory it meant no code changes. We decided to start with our webhooks.  ]]></description>
            <content:encoded><![CDATA[ <p><i>This is a guest post by</i> <a href="https://twitter.com/bcnzer"><i>Ben Chartrand</i></a><i>, who is a Development Manager at</i> <a href="https://www.gettimely.com/"><i>Timely</i></a><i>. You can check out some of Ben's other Workers projects on his</i> <a href="https://github.com/bcnzer"><i>GitHub</i></a> <i>and his</i> <a href="https://liftcodeplay.com/"><i>blog</i></a><i>.</i></p><p>At <a href="https://www.gettimely.com/">Timely</a> we started a project to migrate our web applications from legacy Azure services to a modern PaaS offering. In theory it meant no code changes.</p><p>We decided to start with our webhooks. All our endpoints can be grouped into four categories:</p><ol><li><p>Integration with internal tools i.e. HelpScout, monitoring endpoint for PagerDuty</p></li><li><p>Payment confirmations</p></li><li><p>Calendar integrations i.e. Google Calendar</p></li><li><p>SMS confirmations</p></li></ol><p>Despite their limited number, these are vitally important. We did a lot of testing but it was clear we’d only really know if everything was working once we had production traffic. How could we migrate traffic?</p>
    <div>
      <h3>Option 1</h3>
      <a href="#option-1">
        
      </a>
    </div>
    <p>Change the CNAME to point to the new hosting infrastructure. This is high risk: DNS changes take time to propagate, so a rollback would also take time. We would also be shifting everything over at once.</p>
    <div>
      <h3>Option 2</h3>
      <a href="#option-2">
        
      </a>
    </div>
    <p>Use a traffic manager to shift a percentage of traffic using Cloudflare Load Balancing. We could start at, say, 5% of traffic to the new infrastructure and, assuming everything appeared to be OK, slowly increase the traffic.</p><p>In our case the vast majority of our traffic goes to our calendar integration endpoints. The other endpoints were unlikely to receive traffic, especially if we started with just 5% of traffic. This wasn’t the best option.</p>
    <div>
      <h3>Enter Option 3: Cloudflare Workers and Workers KV</h3>
      <a href="#enter-option-3-cloudflare-workers-and-workers-kv">
        
      </a>
    </div>
    <p>I remember thinking: wouldn’t it be great if we could migrate traffic one endpoint at a time? We have about 20. We could start with the low-risk endpoints and progressively work our way up.</p><p>We were able to write a Cloudflare Worker script that:</p><ul><li><p>Detected the path, e.g. <i>/webhooks/paypal</i></p></li><li><p>If the path matched one of our endpoints, checked Workers KV (Key Value storage) to see if that endpoint was enabled. This was our feature flag / setting</p></li><li><p>If it was enabled and the path matched, redirected to the new infrastructure. This involved changing the domain but otherwise keeping the request as-is, e.g. <b>webhooks.currentdomain.com/webhooks/paypal</b> to <b>webhooks.newinfrastructure.com/webhooks/paypal</b></p></li></ul><p>The first step was to add the <code>passThroughOnException</code> call mentioned in <a href="/dogfooding-edge-workers/">this post</a>.</p>
            <pre><code>addEventListener('fetch', event =&gt; {
 event.passThroughOnException()
 event.respondWith(handleRequest(event))
})</code></pre>
            <p>Next, in the handleRequest method, I created a map of each endpoint (the path) and the corresponding Workers KV key, so I know where to look for the setting.</p>
            <pre><code>const endpoints = new Map()
   endpoints.set('/monitoring', 'monitoring')
   endpoints.set('/paypal', 'payPalIpnWebHook')
   // more endpoints
   endpoints.set('/helpscout', 'helpScoutWebHook')</code></pre>
            <p>Next I inspect the path of each request. If the path matches an endpoint, we check its setting; if the endpoint is enabled, we set a redirect flag.</p>
            <pre><code>   for (var [key, value] of endpoints.entries()) {
     if (currentUrl.pathname.startsWith(key)) {
       const flag = await WEBHOOK_SETTINGS.get(value)
       if (flag == 1) {
         console.log(`redirected: ${key}`)
         redirect = true
         break
       }
     }
   }</code></pre>
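            <p>One detail worth noting: KV's <code>get</code> resolves to a string (or <code>null</code> for a missing key), so the loose <code>flag == 1</code> comparison above relies on JavaScript coercing <code>'1' == 1</code> to true. A stricter version of the same check (a sketch, not the code we shipped):</p>

```javascript
// KV returns string values (or null when the key is absent),
// so compare against the string form rather than relying on coercion.
function isEnabled(flag) {
  return flag === '1'
}
```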
            <p>If the redirect flag is true, we change the hostname in the request but leave everything else as-is. This involves creating a new Request object. If we are not redirecting, we fetch the request unchanged.</p>
            <pre><code>   // Handle the request
   let response = null
   if (redirect) {
     // Redirect to the new infra
     const newUrl = request.url.replace(currentHost, newHost)
     const init = {
         method: request.method,
         headers: request.headers,
         body: request.body
     }
     console.log(newUrl)
     const redirectedRequest = new Request(newUrl, init)
     console.log(redirectedRequest)

     response = await fetch(redirectedRequest)
   } else {
     // Handle with the existing infra
     response = await fetch(request)
   }</code></pre>
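            <p>A note on the host swap: <code>request.url.replace(currentHost, newHost)</code> works here because the hostname only appears once in our URLs, but the <code>URL</code> API makes the intent explicit and can't accidentally rewrite the path or query string. A sketch (the hostnames are the placeholder ones from above):</p>

```javascript
// Swap only the hostname; scheme, path and query string are left intact.
function swapHost(url, newHost) {
  const u = new URL(url)
  u.hostname = newHost
  return u.toString()
}
```

            <p>For example, <code>swapHost('https://webhooks.currentdomain.com/webhooks/paypal?test=1', 'webhooks.newinfrastructure.com')</code> keeps the same path and query on the new host.</p>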
            
    <div>
      <h3>Complete Code</h3>
      <a href="#complete-code">
        
      </a>
    </div>
    
            <pre><code>addEventListener('fetch', event =&gt; {
 event.passThroughOnException()
 event.respondWith(handleRequest(event))
})

function postLog(data) {
 return fetch("http://logs-01.loggly.com/inputs/&lt;my id&gt;/tag/http/", {
   method: "POST",
   body: data
 })
}

async function handleRequest(event) {
 try {
   const request = event.request
   const currentHost = 'webhooks.currentdomain.com'
   const newHost = 'webhooks.newinfrastructure.com'

   const currentUrl = new URL(request.url)
   let redirect = false

   // This is a map of the paths and the corresponding KV entry
   const endpoints = new Map()
   endpoints.set('/monitoring', 'monitoring')
   endpoints.set('/paypal', 'payPalIpnWebHook')
   // more endpoints
   endpoints.set('/helpscout', 'helpScoutWebHook')

   for (var [key, value] of endpoints.entries()) {
     if (currentUrl.pathname.startsWith(key)) {
       const flag = await WEBHOOK_SETTINGS.get(value)
       if (flag == 1) {
         console.log(`redirected: ${key}`)
         redirect = true
         break
       }
     }
   }

   // Handle the request
   let response = null
   if (redirect) {
     // Redirect to the new infra
     const newUrl = request.url.replace(currentHost, newHost)
     const init = {
         method: request.method,
         headers: request.headers,
         body: request.body
     }
     console.log(newUrl)
     const redirectedRequest = new Request(newUrl, init)
     console.log(redirectedRequest)

     response = await fetch(redirectedRequest)
   } else {
     // Handle with the existing infra
     response = await fetch(request)
   }

   return response
 } catch (error) {
   event.waitUntil(postLog(error))
   throw error
 }
}</code></pre>
            
    <div>
      <h3>Why use Workers KV?</h3>
      <a href="#why-use-workers-kv">
        
      </a>
    </div>
    <p>We could have written everything as a hard-coded script, updated each time to enable/disable redirection of traffic. This would require the team to make code changes and deploy the worker every time we wanted to make a change.</p><p>Using Workers KV, any member of the team can enable/disable endpoints using the Cloudflare API. To make things easier I created a Postman collection and shared it.</p>
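    <p>Flipping a flag is a single PUT to the key's value endpoint in the Workers KV REST API. A sketch of how such a request can be built (the account ID, namespace ID and token are placeholders; we used a shared Postman collection rather than code):</p>

```javascript
// Build a PUT request against the Workers KV REST API to set a flag.
// accountId, namespaceId and token are placeholders for your own values.
function kvWriteRequest(accountId, namespaceId, key, value, token) {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}` +
         `/storage/kv/namespaces/${namespaceId}/values/${key}`,
    method: 'PUT',
    headers: { Authorization: `Bearer ${token}` },
    body: String(value) // e.g. '1' to enable an endpoint
  }
}
```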
    <div>
      <h3>Go Live Problems - and Solutions!</h3>
      <a href="#go-live-problems-and-solutions">
        
      </a>
    </div>
    <p>We went live with our first endpoint. The Workers script and KV worked fine, but I noticed a small number of exceptions were being reported in Workers &gt; Worker Status.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2pnhZSWVA4UMBnF0X7njDy/fedf6fe02f350fd29a453ae91818f53f/Chart1.png" />
            
            </figure><p>Cloudflare provides <a href="https://developers.cloudflare.com/workers/writing-workers/debugging-tips/">Debugging Tips</a>. I followed the section “Make subrequests to your debug server” and decided to incorporate <a href="https://www.loggly.com/">Loggly</a>. I could now catch the exceptions and send them to Loggly by running a POST using <code>fetch</code> to the URL provided by Loggly. With this I quickly determined what the problem was and corrected the issue.</p><p>Another problem that came up was a plethora of 403s. This was highly visible in the Workers &gt; Status Code graph (the green).</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3Uhf1vQOMhOASqnneZ2doB/0bd76855ec86210022fd39a2285138e2/Chart2.png" />
            
            </figure><p>Turns out our IIS box had <a href="https://docs.microsoft.com/en-us/iis/configuration/system.webserver/security/dynamicipsecurity/denybyrequestrate">rate limiting</a> set up. Instead of returning a 429 (Too Many Requests), it returned 403 (Forbidden). Phew - it wasn’t an issue with my Worker or the new infrastructure!</p><p>We could have set up the rate limiting on the new infrastructure, but we instead opted for <a href="https://www.cloudflare.com/rate-limiting/">Cloudflare Rate Limiting</a>. It was cheap, easy to set up, and meant the blocked requests didn’t even hit our infrastructure in the first place.</p>
    <div>
      <h3>Where to From Here?</h3>
      <a href="#where-to-from-here">
        
      </a>
    </div>
    <p>As I write this, we’ve transitioned all traffic. All endpoints are enabled. Once we’re ready to decommission the old infrastructure we will:</p><ul><li><p>Change the CNAME to point to the new infrastructure</p></li><li><p>Disable the worker</p></li><li><p>Celebrate!</p></li></ul><p>We’ll then move on to our other web applications, such as our API or main web app. We’re likely to use one of two options:</p><ol><li><p>Use the traffic manager to migrate a percentage of traffic</p></li><li><p>Migrate traffic on a per-customer basis. It would be similar to above, except KV would store a setting per customer, identified by a customer ID in the request header. We could, for example, start with internal test accounts, then our beta users and, at the very end, migrate our VIPs.</p></li></ol> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Cloudflare Workers KV]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">2he1am2htXKE1gJsK9L8xH</guid>
            <dc:creator>Guest Author</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudworker  -  A local Cloudflare Worker Runner]]></title>
            <link>https://blog.cloudflare.com/cloudworker-a-local-cloudflare-worker-runner/</link>
            <pubDate>Wed, 19 Dec 2018 16:21:02 GMT</pubDate>
            <description><![CDATA[ At Dollar Shave Club, we continuously look for ways to improve how we build and ship code. Improving the time it takes for engineers to ship code is key. Providing engineers with a development environment that closely mirrors production really helps. ]]></description>
            <content:encoded><![CDATA[ <p><i>This is a guest post by </i><a href="https://www.linkedin.com/in/hajacobs/"><i>Hank Jacobs</i></a><i>, who is the Lead Software Engineer for Platform Services &amp; Tools at </i><a href="https://www.dollarshaveclub.com/"><i>Dollar Shave Club</i></a><i>. This post originally appeared on the </i><a href="https://engineering.dollarshaveclub.com/cloudworker-a-local-cloudflare-worker-runner-5a04e48731a5"><i>DSC Engineering blog</i></a><i>.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6lqGO5nUh68VvcZfWXqyrJ/fbd1acc2b35ab16c71f5fa035ada26c9/local-worker_2x-1.png" />
            
            </figure><p>At Dollar Shave Club, we continuously look for ways to improve how we build and ship code. Improving the time it takes for engineers to ship code is key. Providing engineers with a development environment that closely mirrors production really helps.</p><p>Earlier this year, we began evaluating Cloudflare Workers as a replacement for our legacy edge routing and caching layers. Cloudflare Workers brings the power of Javascript to Cloudflare’s Edge. Developers can write and deploy Javascript that gets executed for every HTTP request that passes through Cloudflare. This capability excited us, but a critical thing was missing — a way to run Worker code locally. We couldn’t find a suitable solution, so we started to build our own. Luckily, Workers uses the open <a href="https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API">Service Workers API</a> so we had documentation to consult. Within a few weeks, <a href="https://github.com/dollarshaveclub/cloudworker">Cloudworker</a> was born.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3KAjedX2yltNd2BMOJ55Lk/113213d7b48a67afdb230a2e1ea1dbe8/1_YEL9Jja5BmeS0mUrQyzsEQ.png" />
            
            </figure>
    <div>
      <h3>Cloudworker</h3>
      <a href="#cloudworker">
        
      </a>
    </div>
    <p><a href="https://github.com/dollarshaveclub/cloudworker">Cloudworker</a> is a local Cloudflare Worker runtime. With it, you can run Cloudflare Worker scripts locally (or anywhere you can run a Docker image). Our primary goal with Cloudworker is to be as compatible with Cloudflare Workers as possible, simulating features where we can and stubbing out features otherwise.</p>
    <div>
      <h4>Getting Started</h4>
      <a href="#getting-started">
        
      </a>
    </div>
    <p>To use Cloudworker, install it using npm.</p>
            <pre><code>npm install -g @dollarshaveclub/cloudworker</code></pre>
            <p>Using it is straightforward.</p>
            <pre><code>cloudworker &lt;worker script&gt;</code></pre>
            <p>See the <a href="https://github.com/dollarshaveclub/cloudworker#usage">readme</a> for a complete list of supported flags.</p>
    <div>
      <h4>WebAssembly</h4>
      <a href="#webassembly">
        
      </a>
    </div>
    <p>Cloudflare Workers supports the direct execution of WebAssembly, and so does Cloudworker.</p><p>To start using WebAssembly, run Cloudworker with the <code>--wasm</code> flag to bind a Javascript variable to your <code>.wasm</code> file:</p>
            <pre><code>cloudworker --wasm Wasm=module.wasm worker.js</code></pre>
            <p>From within your worker script, you can now create a new WebAssembly Instance with the bound variable.</p>
            <pre><code>addEventListener('fetch', event =&gt; {
  const instance = new WebAssembly.Instance(Wasm)
  instance.exports.exported_func()
  event.respondWith(new Response('Success!'))
})</code></pre>
            <p>See the <a href="https://github.com/dollarshaveclub/cloudworker#webassembly">WebAssembly</a> section of the <a href="https://github.com/dollarshaveclub/cloudworker">readme</a> for more examples.</p>
    <div>
      <h4>Workers KV</h4>
      <a href="#workers-kv">
        
      </a>
    </div>
    <p>Cloudworker also supports an in-memory version of the beta <a href="https://developers.cloudflare.com/workers/kv/">Workers KV</a> feature of Cloudflare Workers. Workers KV is a key-value store that can be accessed by Worker scripts.</p><p>Key-value pairs can be bound to a variable using the <code>--set</code> flag.</p>
            <pre><code>cloudworker --set Store.hello=world worker.js</code></pre>
            <p>Those key-value pairs can then be used within the worker script.</p>
            <pre><code>addEventListener('fetch', event =&gt; {
  event.respondWith(handleRequest())
})

async function handleRequest() {
  const value = await Store.get('hello')
  return new Response(value) // Responds with 'world'
}</code></pre>
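            <p>Conceptually, a <code>--set</code> binding is just an async wrapper around an in-memory map. This sketch is our mental model of it, not Cloudworker's actual implementation:</p>

```javascript
// A minimal in-memory stand-in for a Workers KV binding:
// get() is async and resolves to the value, or null for a missing key.
function makeKvStub(pairs) {
  const store = new Map(Object.entries(pairs))
  return {
    async get(key) {
      return store.has(key) ? store.get(key) : null
    }
  }
}
```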
            
    <div>
      <h3>Closing Thoughts</h3>
      <a href="#closing-thoughts">
        
      </a>
    </div>
    <p>Since its initial release, Cloudworker has become an integral part of our Cloudflare Worker development workflow. We use it to develop our edge router locally as well as use it within our <a href="https://engineering.dollarshaveclub.com/qa-environments-on-demand-with-kubernetes-5a571b4e273c">on-demand QA environments</a>. Additionally, we’ve used Cloudworker as a platform for an internal proxy used to reduce the footprint of our QA environments. We’re truly excited about Cloudworker and hope you find it as useful as we have!</p><hr /><p><i>Cloudflare editor's note: We love seeing all of the projects the Cloudflare Workers community creates! While we can't post about everything on the blog, it helps others out when you share what you've built on </i><a href="https://community.cloudflare.com"><i>community.cloudflare.com</i></a><i> and Twitter. Some examples of other community projects are:</i></p><ul><li><p><a href="https://github.com/gja/cloudflare-worker-local"><i>Cloudflare Worker Local</i></a><i> (another approach to testing Workers locally)</i></p></li><li><p><a href="https://twitter.com/bcnzer/status/1069895077844017153"><i>Cloudflare Workers KV UI Explorer</i></a></p></li></ul> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">4iW0fQdGfEGh8JGSgHioGV</guid>
            <dc:creator>Guest Author</dc:creator>
        </item>
        <item>
            <title><![CDATA[How my team wrote 12 Cloudflare apps with fewer than 20 lines of code]]></title>
            <link>https://blog.cloudflare.com/how-my-team-wrote-12-cloudflare-apps-with-fewer-than-20-lines-of-code/</link>
            <pubDate>Thu, 13 Dec 2018 01:00:00 GMT</pubDate>
            <description><![CDATA[ I like my code the same way I like my team of POWr Rangers… DRY. And no, I don’t mean dull and unexciting! (If you haven’t heard this acronym before, DRY stands for Don’t Repeat Yourself) ]]></description>
            <content:encoded><![CDATA[ <p><i>This is a guest post by</i> <a href="https://www.linkedin.com/in/benmross/"><i>Ben Ross</i></a><i>. Ben is a Berkeley PhD, serial entrepreneur, and Founder and CTO of</i> <a href="http://www.powr.io?src=cloudflareblog"><i>POWr.io</i></a><i>, where he spends his days helping small businesses grow online.</i></p><p>I like my code the same way I like my <a href="http://www.powr.io/about">team of POWr Rangers</a>… <b>DRY</b>.</p><p>And no, I don’t mean dull and unexciting! (If you haven’t heard this acronym before, DRY stands for Don’t Repeat Yourself, the single most important principle in software engineering. Because, as a mentor once told me, “when someone needs to re-write your code, at least they only need to do it once.”)</p><p>At <a href="https://www.powr.io?src=cloudflareblog">POWr</a>, being DRY is not just a way to write code, it’s a way of life. This is true whether you’re an Engineer, a Customer Support agent, or an Office Manager; if you find you’re repeating yourself, we want to find a way to automate that repetition away. Our employees’ time is our company’s most valuable resource. Not to mention, who wants to spend all day repeating themselves?</p><p>We call this process becoming a <b>Scaled Employee</b>. A Scaled Employee leverages their time and resources to make a multifold impact compared to an average employee in their field. Building a culture of scaled employees plays a large part in how we have been able to rapidly grow our company over the past 4 years without raising any VC funding.</p><p>So when we recently integrated <a href="https://www.cloudflare.com/apps/list/powr">12 POWr apps into Cloudflare</a>, you might think that we had to write code for 12 different apps. This would have required months of tedious building and QA testing.</p><p>Instead, we built a single integration template. 
Then, we wrote a few lines of code to automatically generate 12 apps in about as long as it takes to enjoy a sumptuous sip of California Cab. Ready for a quick overview? Begin swirling...</p><p>First we defined a “replacements” object with the important attributes of each app (which is already available in our database in an AppDetail model):</p>
            <pre><code>replacements = {
    APP_COMMON_NAME: app_detail.common_name, #eg “Form Builder”
    APP_SLUG: app_detail.slug, #e.g. “form-builder”
    APP_DESCRIPTION: app_detail.short_description #e.g. “Increase conversions and get more sign-ups.”
    …
}</code></pre>
            <p>Using these replacements, we then duplicated and renamed each file of our Cloudflare App accordingly:</p>
            <pre><code>replacements.each do |key, val|
  `find #{parent_dir} -name "*#{key}*" -exec rename 's/#{key}/#{val}/' * -v {} +`
end</code></pre>
            <p>And finally, we moved into each file and made the corresponding replacements:</p>
            <pre><code>Dir.glob("lib/cloudflare/powr-#{replacements[:APP_SLUG]}/**/*").reject{|fn| File.directory?(fn)}.each do |file_name|
  text = File.read(file_name)
  replacements.each do |key, val|
    text = text.gsub(key.to_s, val)
   end
   File.open(file_name, "w") {|file| file.puts text }
end</code></pre>
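            <p>The same replace-every-placeholder idea translates directly to other languages; here is the core of it in JavaScript (a sketch, not POWr's production code):</p>

```javascript
// Replace every occurrence of each placeholder key with its app-specific value.
function applyReplacements(text, replacements) {
  for (const [key, val] of Object.entries(replacements)) {
    text = text.split(key).join(val) // global, regex-free replacement
  }
  return text
}
```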
            <p>Delicious, right?</p><p>At this point, you may be wondering, “what are POWr Apps, anyway?” I’m glad you asked. They are a customizable and easy-to-use set of tools to supercharge any website… from forms to galleries to social media integrations to eCommerce.</p><p>Could you build a custom form for your website, a backend to handle and graph responses, and an integration with Zapier to turn on a lightbulb every time someone presses submit? Probably. Is that a good use of your time? Probably not. Instead, you can install <a href="https://www.cloudflare.com/apps/powr-form-builder">POWr Form Builder</a> in about 2 minutes, pass it off to your Marketing Intern to make it look pretty, and get back to the hard problems.</p><p><i>Adding POWr Form Builder to Cloudflare</i></p><p><i>Customize your app in the POWr Editor</i></p><p>If YOU want to be a Scaled Engineer, it’s not about knowing everything there is to know. The geekiest engineers that spend their lunches vehemently discussing the pros and cons of bubble vs selection sort often do not make the best Scaled Engineers. Scaled Engineers know when to avoid going down Rabbit Holes and use whatever tools are at their disposal to maximize impact.</p><p>So if you want to add some dynamic content to your site, take a look at <a href="https://www.cloudflare.com/apps/list/powr">POWr Apps for Cloudflare</a>. I’d tell you that again, but I don’t want to repeat myself.</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Apps]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Programming]]></category>
            <guid isPermaLink="false">5nhgUhQERSNwQtVcCDoYTp</guid>
            <dc:creator>Guest Author</dc:creator>
        </item>
        <item>
            <title><![CDATA[More consistent LuaJIT performance]]></title>
            <link>https://blog.cloudflare.com/more-consistent-luajit-performance/</link>
            <pubDate>Wed, 12 Dec 2018 13:00:00 GMT</pubDate>
            <description><![CDATA[ A year ago I wrote about a project that Cloudflare were funding at King's College London to help improve LuaJIT. Our twelve months is now up. How did we do?  ]]></description>
            <content:encoded><![CDATA[ <p><sub><i>This is a guest post by </i></sub><a href="http://tratt.net/laurie/"><sub><i>Laurence Tratt</i></sub></a><sub><i>, who is a programmer and Reader in Software Development in the </i></sub><a href="http://www.kcl.ac.uk/nms/depts/informatics/"><sub><i>Department of Informatics</i></sub></a><sub><i> at </i></sub><a href="http://www.kcl.ac.uk/"><sub><i>King's College London</i></sub></a><sub><i> where he leads the </i></sub><a href="http://soft-dev.org/"><sub><i>Software Development Team</i></sub></a><sub><i>. He is also an </i></sub><a href="http://soft-dev.org/projects/lecture/"><sub><i>EPSRC Fellow</i></sub></a><sub><i>.</i></sub></p><p>A year ago I wrote about <a href="https://blog.cloudflare.com/helping-to-make-luajit-faster/">a project that Cloudflare were funding at King's College London to help improve LuaJIT</a>. Our twelve months is now up. How did we do?</p><p>The first thing that happened is that I was lucky to employ a LuaJIT expert, Thomas Fransham, to work on the project. His deep knowledge about LuaJIT was crucial to getting things up and running – 12 months might sound like a long time, but it soon whizzes by!</p><p>The second thing that happened was that we realised that the current state of Lua benchmarking was not good enough for anyone to reliably tell if they'd improved LuaJIT performance or not. Different Lua implementations had different benchmark suites, mostly on the small side, and not easily compared. Although it wasn't part of our original plan, we thus put a lot of effort into creating a larger benchmark suite. This sounds like a trivial job, but it isn't. Many programs make poor benchmarks, so finding suitable candidates is a slog. 
Although we mostly wanted to benchmark programs using <a href="https://soft-dev.org/src/krun/">Krun</a> (see <a href="https://tratt.net/laurie/blog/entries/why_arent_more_users_more_happy_with_our_vms_part_1.html">this blog post</a> for indirect pointers as to why), we're well aware that most people need a quicker, easier way of benchmarking their Lua implementation(s). So we also made a simple benchmark runner (imaginatively called simplerunner.lua) that does that job. Here's an example of it in use:</p>
            <pre><code>$ lua simplerunner.lua
Running luacheck: ..............................
  Mean: 1.120762 +/- 0.030216, min 1.004843, max 1.088270
Running fannkuch_redux: ..............................
  Mean: 0.128499 +/- 0.003281, min 0.119500, max 0.119847</code></pre>
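            <p>The mean and the ± figure the runner prints are straightforward to compute from the per-iteration times; a sketch of the arithmetic (our code, using a normal approximation for the 95% interval, not the runner's exact implementation):</p>

```javascript
// Mean, normal-approximation 95% confidence interval, min and max
// for a sample of per-iteration times (in seconds).
function summarize(times) {
  const n = times.length
  const mean = times.reduce((a, b) => a + b, 0) / n
  const variance = times.reduce((a, t) => a + (t - mean) ** 2, 0) / (n - 1)
  const ci95 = 1.96 * Math.sqrt(variance / n) // the "+/-" figure
  return { mean, ci95, min: Math.min(...times), max: Math.max(...times) }
}
```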
            <p>Even though it's a simple benchmark runner, we couldn't help but try and nudge the quality of benchmarking up a little bit. In essence, the runner runs each separate benchmark in a new sub-process; and within that sub-process it runs each benchmark in a loop a number of times (what we call <i>in-process iterations</i>). Thus for each benchmark you get a mean time per in-process iteration, and then 95% confidence intervals (the number after ±): this gives you a better idea of the spread of values than the minimum and maximum times for any in-process iterations (though we report those too).</p><p>The third thing we set out to do was to understand the relative performance of the various Lua implementations out there now. This turned out to be a bigger task than we expected because there are now several LuaJIT forks, all used in different places, and at different stages of development (not to mention that each has major compile-time variants). We eventually narrowed things down to the <a href="https://github.com/LuaJIT/LuaJIT">original LuaJIT repository</a> and <a href="https://github.com/raptorjit/raptorjit">RaptorJIT</a>. We then ran an experiment (based on a slightly extended version of the methodology from our <a href="https://soft-dev.org/pubs/html/barrett_bolz-tereick_killick_mount_tratt__virtual_machine_warmup_blows_hot_and_cold_v6/">VM warmup paper</a>), with 15 “process executions” (i.e. separate, new VM processes) and 1500 “in-process iterations” (i.e. the benchmark in a for loop within one VM process). Here are the benchmark results for the original version of LuaJIT:</p>
    <div>
      <h2>Results for LuaJIT</h2>
      <a href="#results-for-luajit">
        
      </a>
    </div>
    <p><b>Symbol key:</b> bad inconsistent, flat, good inconsistent, no steady state, slowdown, warmup.</p><table><tr><th><p><b>Benchmark</b></p></th><th><p><b>Classification</b></p></th><th><p><b>Steady iteration (#)</b></p></th><th><p><b>Steady iteration (s)</b></p></th><th><p><b>Steady performance (s)</b></p></th></tr><tr><td><p>array3d</p></td><td><p></p></td><td><p>2.0
(2.0, 624.3)</p></td><td><p>0.042
(0.040, 80.206)</p></td><td><p>0.12863
±0.000558</p></td></tr><tr><td><p>binarytrees</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.12564
±0.000532</p></td></tr><tr><td><p>bounce</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.12795
±0.000272</p></td></tr><tr><td><p>capnproto_decode</p></td><td><p>(11, 4)</p></td><td><p>2.0
(1.0, 45.3)</p></td><td><p>0.132
(0.000, 5.999)</p></td><td><p>0.13458
±0.028466</p></td></tr><tr><td><p>capnproto_encode</p></td><td><p>(14, 1)</p></td><td><p>155.0
(52.8, 280.6)</p></td><td><p>34.137
(11.476, 57.203)</p></td><td><p>0.21698
±0.014541</p></td></tr><tr><td><p>collisiondetector</p></td><td><p>(12, 2, 1)</p></td><td><p></p></td><td><p></p></td><td><p></p></td></tr><tr><td><p>coroutine_ring</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.10667
±0.001527</p></td></tr><tr><td><p>deltablue</p></td><td><p>(10, 5)</p></td><td><p>84.0
(1.0, 1022.9)</p></td><td><p>8.743
(0.000, 106.802)</p></td><td><p>0.10328
±0.003195</p></td></tr><tr><td><p>euler14</p></td><td><p></p></td><td><p>60.0
(60.0, 83.0)</p></td><td><p>5.537
(5.483, 7.680)</p></td><td><p>0.09180
±0.000742</p></td></tr><tr><td><p>fannkuch_redux</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.12093
±0.001502</p></td></tr><tr><td><p>fasta</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.12099
±0.000376</p></td></tr><tr><td><p>havlak</p></td><td><p>(9, 4, 2)</p></td><td><p></p></td><td><p></p></td><td><p></p></td></tr><tr><td><p>heapsort</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>1.01917
±0.015674</p></td></tr><tr><td><p>jsonlua_decode</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.11279
±0.012664</p></td></tr><tr><td><p>jsonlua_encode</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.12798
±0.001761</p></td></tr><tr><td><p>knucleotide</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.11662
±0.000810</p></td></tr><tr><td><p>life</p></td><td><p>(12, 3)</p></td><td><p></p></td><td><p></p></td><td><p></p></td></tr><tr><td><p>luacheck</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>1.00901
±0.089779</p></td></tr><tr><td><p>luacheck_parser</p></td><td><p>(13, 2)</p></td><td><p>244.0
(1.0, 652.2)</p></td><td><p>33.998
(0.000, 90.759)</p></td><td><p>0.09434
±0.012888</p></td></tr><tr><td><p>luafun</p></td><td><p></p></td><td><p>54.0
(12.4, 70.6)</p></td><td><p>9.015
(1.935, 11.587)</p></td><td><p>0.16571
±0.004918</p></td></tr><tr><td><p>mandelbrot</p></td><td><p>(11, 4)</p></td><td><p>1.0
(1.0, 29.0)</p></td><td><p>0.000
(0.000, 9.750)</p></td><td><p>0.34443
±0.000119</p></td></tr><tr><td><p>mandelbrot_bit</p></td><td><p>(9, 6)</p></td><td><p></p></td><td><p></p></td><td><p></p></td></tr><tr><td><p>md5</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.11279
±0.000040</p></td></tr><tr><td><p>meteor</p></td><td><p></p></td><td><p>16.0
(2.0, 18.0)</p></td><td><p>3.398
(0.284, 3.840)</p></td><td><p>0.21935
±0.003935</p></td></tr><tr><td><p>moonscript</p></td><td><p></p></td><td><p>28.0
(13.1, 423.3)</p></td><td><p>4.468
(2.039, 68.212)</p></td><td><p>0.16175
±0.001569</p></td></tr><tr><td><p>nbody</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.16024
±0.002790</p></td></tr><tr><td><p>nsieve</p></td><td><p></p></td><td><p>2.0
(2.0, 2.0)</p></td><td><p>0.189
(0.188, 0.189)</p></td><td><p>0.17904
±0.000641</p></td></tr><tr><td><p>nsieve_bit</p></td><td><p></p></td><td><p>4.0
(3.4, 5.3)</p></td><td><p>0.272
(0.219, 0.386)</p></td><td><p>0.08758
±0.000054</p></td></tr><tr><td><p>partialsums</p></td><td><p></p></td><td><p>2.0
(2.0, 2.0)</p></td><td><p>0.160
(0.160, 0.163)</p></td><td><p>0.14802
±0.002044</p></td></tr><tr><td><p>pidigits</p></td><td><p>(11, 4)</p></td><td><p>1.0
(1.0, 2.3)</p></td><td><p>0.000
(0.000, 0.174)</p></td><td><p>0.12689
±0.002132</p></td></tr><tr><td><p>queens</p></td><td><p>(14, 1)</p></td><td><p>1.0
(1.0, 294.4)</p></td><td><p>0.000
(0.000, 35.052)</p></td><td><p>0.11838
±0.000751</p></td></tr><tr><td><p>quicksort</p></td><td><p>(8, 7)</p></td><td><p>3.0
(2.0, 4.0)</p></td><td><p>0.600
(0.315, 0.957)</p></td><td><p>0.31117
±0.067395</p></td></tr><tr><td><p>radixsort</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.12732
±0.000403</p></td></tr><tr><td><p>ray</p></td><td><p>(11, 4)</p></td><td><p>1.0
(1.0, 355.0)</p></td><td><p>0.000
(0.000, 110.833)</p></td><td><p>0.30961
±0.003990</p></td></tr><tr><td><p>recursive_ack</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.11975
±0.000653</p></td></tr><tr><td><p>recursive_fib</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.23064
±0.028968</p></td></tr><tr><td><p>resty_json</p></td><td><p>(14, 1)</p></td><td><p>1.0
(1.0, 250.3)</p></td><td><p>0.000
(0.000, 20.009)</p></td><td><p>0.07336
±0.002629</p></td></tr><tr><td><p>revcomp</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.11403
±0.001754</p></td></tr><tr><td><p>richards</p></td><td><p>(8, 7)</p></td><td><p>2.0
(1.0, 2.0)</p></td><td><p>0.133
(0.000, 0.152)</p></td><td><p>0.13625
±0.010223</p></td></tr><tr><td><p>scimark_fft</p></td><td><p></p></td><td><p>2.0
(2.0, 4.7)</p></td><td><p>0.140
(0.140, 0.483)</p></td><td><p>0.12653
±0.000823</p></td></tr><tr><td><p>scimark_lu</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.11547
±0.000308</p></td></tr><tr><td><p>scimark_sor</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.12108
±0.000053</p></td></tr><tr><td><p>scimark_sparse</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.12342
±0.000585</p></td></tr><tr><td><p>series</p></td><td><p></p></td><td><p>2.0
(2.0, 2.3)</p></td><td><p>0.347
(0.347, 0.451)</p></td><td><p>0.33400
±0.003217</p></td></tr><tr><td><p>spectralnorm</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.13987
±0.000001</p></td></tr><tr><td><p>table_cmpsort</p></td><td><p>(13, 2)</p></td><td><p>10.0
(1.0, 10.0)</p></td><td><p>1.984
(0.000, 1.989)</p></td><td><p>0.22174
±0.007836</p></td></tr></table><p><sub>Results for LuaJIT</sub></p><p>There’s a lot more data here than you’d see in traditional benchmarking methodologies (which only show you an approximation of the “Steady performance (s)” column), so let me give a quick rundown. The “Classification” column tells us whether the 15 process executions for a benchmark all warmed up (good), were all flat (good), all slowed down (bad), were all inconsistent (bad), or some combination of these (if you want to see examples of each of these types, have a look <a href="https://soft-dev.org/pubs/html/barrett_bolz-tereick_killick_mount_tratt__virtual_machine_warmup_blows_hot_and_cold_v6/appendix.html#x1-35000C">here</a>). “Steady iteration (#)” tells us how many in-process iterations were executed before a steady state was hit (with 5%/95% inter-quartile ranges); “Steady iteration (s)” tells us how many seconds it took before a steady state was hit. Finally, the “Steady performance (s)” column tells us the performance of each in-process iteration once the steady state was reached (with 99% confidence intervals). For all numeric columns, lower numbers are better.</p><p>Here are the benchmark results for RaptorJIT:</p>
    <div>
      <h2>Results for RaptorJIT</h2>
      <a href="#results-for-raptorjit">
        
      </a>
    </div>
    <p><b>Symbol key:</b> bad inconsistent, flat, good inconsistent, no steady state, slowdown, warmup.</p><table><tr><th><p><b>Benchmark</b></p></th><th><p><b>Classification</b></p></th><th><p><b>Steady iteration (#)</b></p></th><th><p><b>Steady iteration (s)</b></p></th><th><p><b>Steady performance (s)</b></p></th></tr><tr><td><p>array3d</p></td><td><p>(12, 3)</p></td><td><p>1.0
(1.0, 76.0)</p></td><td><p>0.000
(0.000, 9.755)</p></td><td><p>0.13026
±0.000216</p></td></tr><tr><td><p>binarytrees</p></td><td><p></p></td><td><p>24.0
(24.0, 24.0)</p></td><td><p>2.792
(2.786, 2.810)</p></td><td><p>0.11960
±0.000762</p></td></tr><tr><td><p>bounce</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.13865
±0.000978</p></td></tr><tr><td><p>capnproto_encode</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.11818
±0.002599</p></td></tr><tr><td><p>collisiondetector</p></td><td><p></p></td><td><p>2.0
(2.0, 2.0)</p></td><td><p>0.167
(0.167, 0.169)</p></td><td><p>0.11583
±0.001498</p></td></tr><tr><td><p>coroutine_ring</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.14645
±0.000752</p></td></tr><tr><td><p>deltablue</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.10658
±0.001063</p></td></tr><tr><td><p>euler14</p></td><td><p>(12, 3)</p></td><td><p>1.0
(1.0, 51.4)</p></td><td><p>0.000
(0.000, 5.655)</p></td><td><p>0.11195
±0.000093</p></td></tr><tr><td><p>fannkuch_redux</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.12437
±0.000029</p></td></tr><tr><td><p>fasta</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.11967
±0.000313</p></td></tr><tr><td><p>havlak</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.21013
±0.002469</p></td></tr><tr><td><p>heapsort</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>1.39055
±0.002386</p></td></tr><tr><td><p>jsonlua_decode</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.13994
±0.001207</p></td></tr><tr><td><p>jsonlua_encode</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.13581
±0.001411</p></td></tr><tr><td><p>knucleotide</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.13035
±0.000445</p></td></tr><tr><td><p>life</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.28412
±0.000599</p></td></tr><tr><td><p>luacheck</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.99735
±0.006095</p></td></tr><tr><td><p>luacheck_parser</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.07745
±0.002296</p></td></tr><tr><td><p>luafun</p></td><td><p></p></td><td><p>28.0
(28.0, 28.0)</p></td><td><p>4.879
(4.861, 4.904)</p></td><td><p>0.17864
±0.001222</p></td></tr><tr><td><p>mandelbrot</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.34166
±0.000067</p></td></tr><tr><td><p>mandelbrot_bit</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.21577
±0.000024</p></td></tr><tr><td><p>md5</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.09548
±0.000037</p></td></tr><tr><td><p>meteor</p></td><td><p></p></td><td><p>2.0
(2.0, 3.0)</p></td><td><p>0.273
(0.269, 0.493)</p></td><td><p>0.21464
±0.002170</p></td></tr><tr><td><p>nbody</p></td><td><p>(14, 1)</p></td><td><p>1.0
(1.0, 1.9)</p></td><td><p>0.000
(0.000, 0.160)</p></td><td><p>0.17695
±0.002226</p></td></tr><tr><td><p>nsieve</p></td><td><p></p></td><td><p>2.0
(2.0, 2.6)</p></td><td><p>0.180
(0.179, 0.282)</p></td><td><p>0.16982
±0.000862</p></td></tr><tr><td><p>nsieve_bit</p></td><td><p></p></td><td><p>4.0
(3.7, 5.0)</p></td><td><p>0.273
(0.247, 0.361)</p></td><td><p>0.08780
±0.000233</p></td></tr><tr><td><p>partialsums</p></td><td><p></p></td><td><p>2.0
(2.0, 2.3)</p></td><td><p>0.161
(0.160, 0.207)</p></td><td><p>0.14860
±0.001611</p></td></tr><tr><td><p>pidigits</p></td><td><p>(8, 7)</p></td><td><p>5.0
(1.0, 6.0)</p></td><td><p>0.516
(0.000, 0.646)</p></td><td><p>0.12766
±0.000032</p></td></tr><tr><td><p>queens</p></td><td><p>(14, 1)</p></td><td><p>2.0
(1.7, 2.0)</p></td><td><p>0.162
(0.113, 0.162)</p></td><td><p>0.15853
±0.000231</p></td></tr><tr><td><p>quicksort</p></td><td><p></p></td><td><p>2.0
(2.0, 2.3)</p></td><td><p>0.278
(0.278, 0.361)</p></td><td><p>0.27183
±0.000469</p></td></tr><tr><td><p>radixsort</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.12621
±0.000757</p></td></tr><tr><td><p>ray</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.35530
±0.000984</p></td></tr><tr><td><p>recursive_ack</p></td><td><p>(14, 1)</p></td><td><p>1.0
(1.0, 19.0)</p></td><td><p>0.000
(0.000, 2.562)</p></td><td><p>0.14228
±0.000616</p></td></tr><tr><td><p>recursive_fib</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.28989
±0.000033</p></td></tr><tr><td><p>resty_json</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.07534
±0.000595</p></td></tr><tr><td><p>revcomp</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.11684
±0.002139</p></td></tr><tr><td><p>richards</p></td><td><p></p></td><td><p>2.0
(2.0, 3.2)</p></td><td><p>0.171
(0.170, 0.369)</p></td><td><p>0.16559
±0.000342</p></td></tr><tr><td><p>scimark_fft</p></td><td><p></p></td><td><p>2.0
(2.0, 10.3)</p></td><td><p>0.141
(0.141, 1.195)</p></td><td><p>0.12709
±0.000102</p></td></tr><tr><td><p>scimark_lu</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.12733
±0.000159</p></td></tr><tr><td><p>scimark_sor</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.13297
±0.000005</p></td></tr><tr><td><p>scimark_sparse</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.13082
±0.000490</p></td></tr><tr><td><p>series</p></td><td><p></p></td><td><p>2.0
(2.0, 2.0)</p></td><td><p>0.347
(0.347, 0.348)</p></td><td><p>0.33390
±0.000869</p></td></tr><tr><td><p>spectralnorm</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.13989
±0.000003</p></td></tr><tr><td><p>table_cmpsort</p></td><td><p></p></td><td><p>10.0
(10.0, 10.0)</p></td><td><p>1.945
(1.935, 1.967)</p></td><td><p>0.22008
±0.001852</p></td></tr></table><p><sub>Results for RaptorJIT</sub></p><p>We quickly found it difficult to compare so many numbers at once, so as part of this project we built a stats differ that can compare one set of benchmarks with another. Here's the result of comparing the original version of LuaJIT with RaptorJIT:</p>
    <div>
      <h2>Results for Normal vs. RaptorJIT</h2>
      <a href="#results-for-normal-vs-raptorjit">
        
      </a>
    </div>
    <p><b>Symbol key:</b> bad inconsistent, flat, good inconsistent, no steady state, slowdown, warmup.
<b>Diff against previous results:</b> improved worsened different unchanged.</p><table><tr><th><p><b>Benchmark</b></p></th><th><p><b>Classification</b></p></th><th><p><b>Steady iteration (#)</b></p></th><th><p><b>Steady iteration variation</b></p></th><th><p><b>Steady iteration (s)</b></p></th><th><p><b>Steady performance (s)</b></p></th><th><p><b>Steady performance
variation (s)</b></p></th></tr><tr><td><p>array3d</p></td><td><p>(12, 3)</p></td><td><p>1.0
(1.0, 76.0)</p></td><td><p>(1.0, 76.0)
was: (2.0, 624.3)</p></td><td><p>0.000
(0.000, 9.755)</p></td><td><p>0.13026
δ=0.00163
±0.000215</p></td><td><p>0.000215
was: 0.000557</p></td></tr><tr><td><p>binarytrees</p></td><td><p></p></td><td><p>24.0
(24.0, 24.0)</p></td><td><p></p></td><td><p>2.792
(2.786, 2.810)</p></td><td><p>0.11960
δ=-0.00603
±0.000762</p></td><td><p></p></td></tr><tr><td><p>bounce</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.13865
δ=0.01070
±0.000978</p></td><td><p></p></td></tr><tr><td><p>capnproto_encode</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.11818
δ=-0.09880
±0.002599</p></td><td><p></p></td></tr><tr><td><p>collisiondetector</p></td><td><p></p></td><td><p>2.0
(2.0, 2.0)</p></td><td><p></p></td><td><p>0.167
(0.167, 0.169)</p></td><td><p>0.11583
±0.001498</p></td><td><p></p></td></tr><tr><td><p>coroutine_ring</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.14645
δ=0.03978
±0.000751</p></td><td><p></p></td></tr><tr><td><p>deltablue</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.10658
±0.001063</p></td><td><p>0.001063
was: 0.003195</p></td></tr><tr><td><p>euler14</p></td><td><p>(12, 3)</p></td><td><p>1.0
δ=-59.0
(1.0, 51.4)</p></td><td><p>(1.0, 51.4)
was: (60.0, 83.0)</p></td><td><p>0.000
δ=-5.537
(0.000, 5.655)</p></td><td><p>0.11195
δ=0.02015
±0.000093</p></td><td><p>0.000093
was: 0.000743</p></td></tr><tr><td><p>fannkuch_redux</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.12437
δ=0.00344
±0.000029</p></td><td><p></p></td></tr><tr><td><p>fasta</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.11967
δ=-0.00132
±0.000313</p></td><td><p></p></td></tr><tr><td><p>havlak</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.21013
±0.002442</p></td><td><p></p></td></tr><tr><td><p>heapsort</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>1.39055
δ=0.37138
±0.002379</p></td><td><p></p></td></tr><tr><td><p>jsonlua_decode</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.13994
δ=0.02715
±0.001207</p></td><td><p></p></td></tr><tr><td><p>jsonlua_encode</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.13581
δ=0.00783
±0.001409</p></td><td><p></p></td></tr><tr><td><p>knucleotide</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.13035
δ=0.01373
±0.000446</p></td><td><p></p></td></tr><tr><td><p>life</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.28412
±0.000599</p></td><td><p></p></td></tr><tr><td><p>luacheck</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.99735
±0.006094</p></td><td><p>0.006094
was: 0.089779</p></td></tr><tr><td><p>luacheck_parser</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.07745
δ=-0.01688
±0.002281</p></td><td><p></p></td></tr><tr><td><p>luafun</p></td><td><p></p></td><td><p>28.0
(28.0, 28.0)</p></td><td><p></p></td><td><p>4.879
(4.861, 4.904)</p></td><td><p>0.17864
δ=0.01293
±0.001222</p></td><td><p>0.001222
was: 0.004918</p></td></tr><tr><td><p>mandelbrot</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.34166
δ=-0.00278
±0.000067</p></td><td><p></p></td></tr><tr><td><p>mandelbrot_bit</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.21577
±0.000024</p></td><td><p></p></td></tr><tr><td><p>md5</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.09548
δ=-0.01731
±0.000037</p></td><td><p></p></td></tr><tr><td><p>meteor</p></td><td><p></p></td><td><p>2.0
(2.0, 3.0)</p></td><td><p>(2.0, 3.0)
was: (2.0, 18.0)</p></td><td><p>0.273
(0.269, 0.493)</p></td><td><p>0.21464
±0.002170</p></td><td><p>0.002170
was: 0.003935</p></td></tr><tr><td><p>nbody</p></td><td><p>(14, 1)</p></td><td><p>1.0
(1.0, 1.9)</p></td><td><p></p></td><td><p>0.000
(0.000, 0.160)</p></td><td><p>0.17695
δ=0.01671
±0.002226</p></td><td><p></p></td></tr><tr><td><p>nsieve</p></td><td><p></p></td><td><p>2.0
(2.0, 2.6)</p></td><td><p>(2.0, 2.6)
was: (2.0, 2.0)</p></td><td><p>0.180
(0.179, 0.282)</p></td><td><p>0.16982
δ=-0.00922
±0.000862</p></td><td><p>0.000862
was: 0.000640</p></td></tr><tr><td><p>nsieve_bit</p></td><td><p></p></td><td><p>4.0
(3.7, 5.0)</p></td><td><p>(3.7, 5.0)
was: (3.4, 5.3)</p></td><td><p>0.273
(0.247, 0.361)</p></td><td><p>0.08780
±0.000233</p></td><td><p>0.000233
was: 0.000054</p></td></tr><tr><td><p>partialsums</p></td><td><p></p></td><td><p>2.0
(2.0, 2.3)</p></td><td><p>(2.0, 2.3)
was: (2.0, 2.0)</p></td><td><p>0.161
(0.160, 0.207)</p></td><td><p>0.14860
±0.001611</p></td><td><p>0.001611
was: 0.002044</p></td></tr><tr><td><p>pidigits</p></td><td><p>(8, 7)</p></td><td><p>5.0
(1.0, 6.0)</p></td><td><p>(1.0, 6.0)
was: (1.0, 2.3)</p></td><td><p>0.516
(0.000, 0.646)</p></td><td><p>0.12766
±0.000032</p></td><td><p>0.000032
was: 0.002132</p></td></tr><tr><td><p>queens</p></td><td><p>(14, 1)</p></td><td><p>2.0
(1.7, 2.0)</p></td><td><p>(1.7, 2.0)
was: (1.0, 294.4)</p></td><td><p>0.162
(0.113, 0.162)</p></td><td><p>0.15853
δ=0.04015
±0.000231</p></td><td><p>0.000231
was: 0.000751</p></td></tr><tr><td><p>quicksort</p></td><td><p></p></td><td><p>2.0
(2.0, 2.3)</p></td><td><p>(2.0, 2.3)
was: (2.0, 4.0)</p></td><td><p>0.278
(0.278, 0.361)</p></td><td><p>0.27183
±0.000469</p></td><td><p>0.000469
was: 0.067395</p></td></tr><tr><td><p>radixsort</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.12621
±0.000757</p></td><td><p>0.000757
was: 0.000403</p></td></tr><tr><td><p>ray</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.35530
δ=0.04568
±0.000983</p></td><td><p></p></td></tr><tr><td><p>recursive_ack</p></td><td><p>(14, 1)</p></td><td><p>1.0
(1.0, 19.0)</p></td><td><p></p></td><td><p>0.000
(0.000, 2.562)</p></td><td><p>0.14228
δ=0.02253
±0.000616</p></td><td><p></p></td></tr><tr><td><p>recursive_fib</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.28989
δ=0.05925
±0.000033</p></td><td><p></p></td></tr><tr><td><p>resty_json</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.07534
±0.000595</p></td><td><p>0.000595
was: 0.002629</p></td></tr><tr><td><p>revcomp</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.11684
±0.002139</p></td><td><p>0.002139
was: 0.001754</p></td></tr><tr><td><p>richards</p></td><td><p></p></td><td><p>2.0
(2.0, 3.2)</p></td><td><p>(2.0, 3.2)
was: (1.0, 2.0)</p></td><td><p>0.171
(0.170, 0.369)</p></td><td><p>0.16559
δ=0.02935
±0.000342</p></td><td><p>0.000342
was: 0.010223</p></td></tr><tr><td><p>scimark_fft</p></td><td><p></p></td><td><p>2.0
(2.0, 10.3)</p></td><td><p>(2.0, 10.3)
was: (2.0, 4.7)</p></td><td><p>0.141
(0.141, 1.195)</p></td><td><p>0.12709
±0.000102</p></td><td><p>0.000102
was: 0.000823</p></td></tr><tr><td><p>scimark_lu</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.12733
δ=0.01186
±0.000159</p></td><td><p></p></td></tr><tr><td><p>scimark_sor</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.13297
δ=0.01189
±0.000005</p></td><td><p></p></td></tr><tr><td><p>scimark_sparse</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.13082
δ=0.00740
±0.000490</p></td><td><p></p></td></tr><tr><td><p>series</p></td><td><p></p></td><td><p>2.0
(2.0, 2.0)</p></td><td><p></p></td><td><p>0.347
(0.347, 0.348)</p></td><td><p>0.33390
±0.000869</p></td><td><p>0.000869
was: 0.003217</p></td></tr><tr><td><p>spectralnorm</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.13989
δ=0.00002
±0.000003</p></td><td><p></p></td></tr><tr><td><p>table_cmpsort</p></td><td><p></p></td><td><p>10.0
(10.0, 10.0)</p></td><td><p></p></td><td><p>1.945
(1.935, 1.967)</p></td><td><p>0.22008
±0.001852</p></td><td><p>0.001852
was: 0.007836</p></td></tr></table><p><sub>Results for Normal vs. RaptorJIT</sub></p><p>In essence, green cells mean that RaptorJIT is better than LuaJIT; red cells mean that LuaJIT is better than RaptorJIT; yellow means they're different in a way that can't be compared; and white/grey means they're statistically equivalent. The additional “Steady performance variation (s)” column shows whether the steady state performance of different process executions has become more predictable or not.</p><p>The simple conclusion to draw from this is that there isn't a simple conclusion to draw from it: the two VMs are sometimes better than each other with no clear pattern. Without a clear steer either way, we therefore decided to use the original version of LuaJIT as our base.</p><p>One of the things that became very clear from our benchmarking is that LuaJIT is highly non-deterministic – indeed, it's the most non-deterministic VM I've seen. The practical effect of this is that even on one program, LuaJIT is sometimes very fast, and sometimes rather slow. This is, at best, very confusing for users, who tend to assume that programs perform more or less the same every time they're run; at worst, it can create significant problems when one is trying to estimate things like server provisioning. We therefore tried various things to make performance more consistent.</p><p>The most promising approach we alighted upon is what we ended up calling “separate counters”. In a tracing JIT compiler such as LuaJIT, one tracks how often a loop (where loops include both “obvious” things like for loops and less obvious things such as functions) has been executed: once it hits a certain threshold, the loop is traced and compiled into machine code. LuaJIT has an unusual approach to counting loops: it has 64 counters to which all loops are mapped (using the memory address of the bytecode in question).
In other words, multiple loops share the same counter: the bigger the program, the more loops share the same counter. The advantage of this is that the counters map is memory efficient, and for small programs (e.g. the common LuaJIT benchmarks) it can be highly effective. However, it has very odd effects in real programs, particularly as programs get bigger: loops are compiled non-deterministically based on the particular address in memory they happen to have been loaded at.</p><p>We therefore altered LuaJIT so that each loop and each function has its own counter, stored in the bytecode to make memory reads/writes more cache friendly. The diff from normal LuaJIT to the separate counters version is as follows:</p>
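Before the diff numbers, here is a minimal sketch of the two counting schemes described above. The 64-slot count comes from the text; the threshold value, identifiers, and hashing details are illustrative assumptions, not LuaJIT's actual code:

```javascript
// Shared scheme: every loop hashes (via its bytecode address) into one
// of 64 counters, so unrelated loops can bump each other's counts and
// tracing order depends on where the bytecode happens to be loaded.
const HOTCOUNT_SLOTS = 64;    // from the text; threshold is made up
const TRACE_THRESHOLD = 56;

const shared = new Uint16Array(HOTCOUNT_SLOTS);
const sharedSlot = (pc) => (pc >>> 2) & (HOTCOUNT_SLOTS - 1);

// Returns true when the loop at bytecode address `pc` should be traced.
function sharedTick(pc) {
  return ++shared[sharedSlot(pc)] >= TRACE_THRESHOLD;
}

// Separate-counters scheme: each loop owns its counter (in the real
// patch it is stored in the bytecode itself for cache friendliness),
// so the tracing decision no longer depends on load addresses.
function makeLoopCounter() {
  return { count: 0 };
}
function separateTick(loop) {
  return ++loop.count >= TRACE_THRESHOLD;
}
```

Two loops whose bytecode addresses happen to map to the same shared slot advance each other's count and get traced earlier than either would alone; with separate counters each loop must reach the threshold on its own.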
    <div>
      <h2>Results for Normal vs. Counters</h2>
      <a href="#results-for-normal-vs-counters">
        
      </a>
    </div>
    <p><b>Symbol key:</b> bad inconsistent, flat, good inconsistent, no steady state, slowdown, warmup.
<b>Diff against previous results:</b> improved worsened different unchanged.</p><table><tr><th><p><b>Benchmark</b></p></th><th><p><b>Classification</b></p></th><th><p><b>Steady iteration (#)</b></p></th><th><p><b>Steady iteration variation</b></p></th><th><p><b>Steady iteration (s)</b></p></th><th><p><b>Steady performance (s)</b></p></th><th><p><b>Steady performance
variation (s)</b></p></th></tr><tr><td><p>array3d</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td></tr><tr><td><p>binarytrees</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.12462
±0.004058</p></td><td><p>0.004058
was: 0.000532</p></td></tr><tr><td><p>bounce</p></td><td><p>(14, 1)</p></td><td><p>1.0
(1.0, 5.8)</p></td><td><p></p></td><td><p>0.000
(0.000, 0.603)</p></td><td><p>0.12515
δ=-0.00280
±0.000278</p></td><td><p></p></td></tr><tr><td><p>capnproto_decode</p></td><td><p>(9, 6)</p></td><td><p>1.0
(1.0, 24.9)</p></td><td><p>(1.0, 24.9)
was: (1.0, 45.3)</p></td><td><p>0.000
(0.000, 3.692)</p></td><td><p>0.15042
±0.003797</p></td><td><p>0.003797
was: 0.028466</p></td></tr><tr><td><p>capnproto_encode</p></td><td><p></p></td><td><p>230.0
(56.0, 467.6)</p></td><td><p>(56.0, 467.6)
was: (52.8, 280.6)</p></td><td><p>28.411
(6.667, 55.951)</p></td><td><p>0.11838
δ=-0.09860
±0.001960</p></td><td><p>0.001960
was: 0.014541</p></td></tr><tr><td><p>collisiondetector</p></td><td><p>(13, 2)</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td></tr><tr><td><p>coroutine_ring</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.10680
±0.003151</p></td><td><p>0.003151
was: 0.001527</p></td></tr><tr><td><p>deltablue</p></td><td><p></p></td><td><p>149.0
(149.0, 274.5)</p></td><td><p>(149.0, 274.5)
was: (1.0, 1022.9)</p></td><td><p>15.561
(15.430, 28.653)</p></td><td><p>0.10159
±0.001083</p></td><td><p>0.001083
was: 0.003195</p></td></tr><tr><td><p>euler14</p></td><td><p></p></td><td><p>61.0
(61.0, 68.3)</p></td><td><p>(61.0, 68.3)
was: (60.0, 83.0)</p></td><td><p>5.650
(5.592, 6.356)</p></td><td><p>0.09216
±0.000159</p></td><td><p>0.000159
was: 0.000743</p></td></tr><tr><td><p>fannkuch_redux</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.11976
±0.000012</p></td><td><p>0.000012
was: 0.001502</p></td></tr><tr><td><p>fasta</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.12200
δ=0.00100
±0.000597</p></td><td><p></p></td></tr><tr><td><p>havlak</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td></tr><tr><td><p>heapsort</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>1.04378
δ=0.02461
±0.000789</p></td><td><p></p></td></tr><tr><td><p>jsonlua_decode</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.12648
δ=0.01370
±0.000556</p></td><td><p></p></td></tr><tr><td><p>jsonlua_encode</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.12860
±0.000879</p></td><td><p>0.000879
was: 0.001761</p></td></tr><tr><td><p>knucleotide</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.11710
±0.000541</p></td><td><p>0.000541
was: 0.000811</p></td></tr><tr><td><p>life</p></td><td><p>(9, 3, 2, 1)</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td></tr><tr><td><p>luacheck</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>1.00299
±0.004778</p></td><td><p>0.004778
was: 0.089781</p></td></tr><tr><td><p>luacheck_parser</p></td><td><p>(12, 2, 1)</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td></tr><tr><td><p>luafun</p></td><td><p></p></td><td><p>69.0
(69.0, 69.0)</p></td><td><p></p></td><td><p>11.481
(11.331, 11.522)</p></td><td><p>0.16770
±0.001564</p></td><td><p>0.001564
was: 0.004918</p></td></tr><tr><td><p>mandelbrot</p></td><td><p>(14, 1)</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td></tr><tr><td><p>mandelbrot_bit</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.21695
±0.000142</p></td><td><p></p></td></tr><tr><td><p>md5</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.11155
δ=-0.00124
±0.000043</p></td><td><p></p></td></tr><tr><td><p>meteor</p></td><td><p>(13, 2)</p></td><td><p>14.0
(1.0, 15.0)</p></td><td><p>(1.0, 15.0)
was: (2.0, 18.0)</p></td><td><p>2.855
(0.000, 3.045)</p></td><td><p>0.21606
±0.004651</p></td><td><p>0.004651
was: 0.003935</p></td></tr><tr><td><p>moonscript</p></td><td><p></p></td><td><p>63.0
(17.7, 184.1)</p></td><td><p>(17.7, 184.1)
was: (13.1, 423.3)</p></td><td><p>10.046
(2.763, 29.739)</p></td><td><p>0.15999
±0.001405</p></td><td><p>0.001405
was: 0.001568</p></td></tr><tr><td><p>nbody</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.15898
±0.001676</p></td><td><p>0.001676
was: 0.002790</p></td></tr><tr><td><p>nsieve</p></td><td><p></p></td><td><p>2.0
(2.0, 2.6)</p></td><td><p>(2.0, 2.6)
was: (2.0, 2.0)</p></td><td><p>0.189
(0.188, 0.297)</p></td><td><p>0.17875
±0.001266</p></td><td><p>0.001266
was: 0.000641</p></td></tr><tr><td><p>nsieve_bit</p></td><td><p></p></td><td><p>4.0
(2.0, 6.0)</p></td><td><p>(2.0, 6.0)
was: (3.4, 5.3)</p></td><td><p>0.271
(0.097, 0.446)</p></td><td><p>0.08726
δ=-0.00032
±0.000202</p></td><td><p>0.000202
was: 0.000054</p></td></tr><tr><td><p>partialsums</p></td><td><p></p></td><td><p>2.0
(2.0, 2.9)</p></td><td><p>(2.0, 2.9)
was: (2.0, 2.0)</p></td><td><p>0.161
(0.161, 0.295)</p></td><td><p>0.14916
±0.000081</p></td><td><p>0.000081
was: 0.002044</p></td></tr><tr><td><p>pidigits</p></td><td><p></p></td><td><p>2.0
(2.0, 4.3)</p></td><td><p>(2.0, 4.3)
was: (1.0, 2.3)</p></td><td><p>0.130
(0.130, 0.425)</p></td><td><p>0.12666
±0.000122</p></td><td><p>0.000122
was: 0.002133</p></td></tr><tr><td><p>queens</p></td><td><p>(10, 5)</p></td><td><p>1.0
(1.0, 2.0)</p></td><td><p>(1.0, 2.0)
was: (1.0, 294.4)</p></td><td><p>0.000
(0.000, 0.127)</p></td><td><p>0.12484
δ=0.00646
±0.000317</p></td><td><p>0.000317
was: 0.000751</p></td></tr><tr><td><p>quicksort</p></td><td><p></p></td><td><p>2.0
(2.0, 2.0)</p></td><td><p></p></td><td><p>0.299
(0.298, 0.304)</p></td><td><p>0.44880
δ=0.13763
±0.020477</p></td><td><p>0.020477
was: 0.067395</p></td></tr><tr><td><p>radixsort</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.12644
±0.000864</p></td><td><p>0.000864
was: 0.000403</p></td></tr><tr><td><p>ray</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.30901
±0.002140</p></td><td><p>0.002140
was: 0.004022</p></td></tr><tr><td><p>recursive_ack</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.11958
±0.000510</p></td><td><p>0.000510
was: 0.000653</p></td></tr><tr><td><p>recursive_fib</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.22864
±0.000266</p></td><td><p>0.000266
was: 0.028968</p></td></tr><tr><td><p>resty_json</p></td><td><p>(12, 2, 1)</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td></tr><tr><td><p>revcomp</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.11550
±0.002553</p></td><td><p>0.002553
was: 0.001753</p></td></tr><tr><td><p>richards</p></td><td><p>(14, 1)</p></td><td><p>2.0
(1.7, 2.0)</p></td><td><p>(1.7, 2.0)
was: (1.0, 2.0)</p></td><td><p>0.150
(0.105, 0.150)</p></td><td><p>0.14572
±0.000324</p></td><td><p>0.000324
was: 0.010223</p></td></tr><tr><td><p>scimark_fft</p></td><td><p></p></td><td><p>2.0
(2.0, 10.0)</p></td><td><p>(2.0, 10.0)
was: (2.0, 4.7)</p></td><td><p>0.140
(0.140, 1.153)</p></td><td><p>0.12639
±0.000343</p></td><td><p>0.000343
was: 0.000823</p></td></tr><tr><td><p>scimark_lu</p></td><td><p>(11, 4)</p></td><td><p>1.0
(1.0, 45.3)</p></td><td><p></p></td><td><p>0.000
(0.000, 5.122)</p></td><td><p>0.11546
±0.000132</p></td><td><p>0.000132
was: 0.000308</p></td></tr><tr><td><p>scimark_sor</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.12105
±0.000148</p></td><td><p></p></td></tr><tr><td><p>scimark_sparse</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.12315
±0.000728</p></td><td><p>0.000728
was: 0.000585</p></td></tr><tr><td><p>series</p></td><td><p></p></td><td><p>2.0
(2.0, 2.0)</p></td><td><p></p></td><td><p>0.347
(0.347, 0.348)</p></td><td><p>0.33394
±0.000645</p></td><td><p>0.000645
was: 0.003217</p></td></tr><tr><td><p>spectralnorm</p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p></p></td><td><p>0.13985
δ=-0.00003
±0.000007</p></td><td><p></p></td></tr><tr><td><p>table_cmpsort</p></td><td><p>(13, 1, 1)</p></td><td><p>1.0
(1.0, 10.0)</p></td><td><p></p></td><td><p>0.000
(0.000, 2.005)</p></td><td><p>0.21828
±0.003289</p></td><td><p>0.003289
was: 0.007836</p></td></tr></table><p><sub>Results for Normal vs. Counters</sub></p><p>In this case we’re particularly interested in the “steady performance variation (s)” column, which shows whether benchmarks have predictable steady state performance. The results are fairly clear: separate counters are, overall, a clear improvement. As you might expect, this is not a pure win, because it changes the order in which traces are made. This has several effects, including causing some loops to be traced later than was previously the case, because their counters do not hit the required threshold as quickly.</p><p>This disadvantages some programs, particularly small deterministic benchmarks where loops are highly stable. In such cases, the earlier you trace the better. However, in my opinion, such programs are given undue weight when performance is considered. It’s no secret that some of the benchmarks regularly used to measure LuaJIT’s performance are <i>highly</i> optimised for LuaJIT as it stands; any changes to LuaJIT stand a good chance of degrading their performance. However, we feel that the overall gain in consistency, particularly for larger programs, is worth it. There's a <a href="https://github.com/lua-users-foundation/LuaJIT/pull/6">pull request against the Lua Foundation's fork of LuaJIT</a> which applies this idea to a mainstream fork of LuaJIT.</p><p>We then started looking at various programs that showed odd performance. One problem in particular showed up in more than one benchmark. Here's a standard example:</p><p>Collisiondetector, Normal, Bencher9, Proc. exec. #12 (no steady state)</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/Q2ZdgNBkfPHYDy88AAjkv/83ed55ff8eb087dfbeb7d94ff496b892/collision1-1.svg" />
          </figure><p>The problem – and it doesn't happen on every process execution, just to make it more fun – is that there are points where the benchmark slows down by over 10% for multiple in-process iterations (e.g. in this process execution, at in-process iterations 930-ish and 1050-ish). We tried over 25 separate ways to work out what was causing this — even building an instrumentation system to track what LuaJIT is doing — but in the end it turned out to be related to LuaJIT's Garbage Collector – sort of. When we moved from the 32-bit to 64-bit GC, the odd performance went away.</p><p>As such, we don’t think that the 64-bit GC “solves” the problem: however, it changes the way that pointers are encoded (doubling in size), which causes the code generator to emit a different style of code, such that the problem seems to go away. Nevertheless, this did make us reevaluate LuaJIT's GC. Tom then started work on implementing Mike Pall's <a href="http://wiki.luajit.org/New-Garbage-Collector">suggestion for a new GC for LuaJIT</a> (based partly on Tom's previous work and also that of Peter Cawley). He has enough implemented to run most small, and some large, programs, but it needs more work to finish it off, at which point evaluating it against the existing Lua GCs will be fascinating!</p><p>So, did we achieve everything we wanted to in 12 months? Inevitably the answer is yes and no. We did a lot more benchmarking than we expected; we've been able to make a lot of programs (particularly large programs) have more consistent performance; and we've got a fair way down the road of implementing a new GC. To whoever takes on further LuaJIT work – best of luck, and I look forward to seeing your results!</p><p><b>Acknowledgements:</b> Sarah Mount implemented the stats differ; Edd Barrett implemented Krun and answered many questions on it.</p> ]]></content:encoded>
            <category><![CDATA[LUA]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[Speed & Reliability]]></category>
            <guid isPermaLink="false">UhdX7EC0rWypMpydfLKTz</guid>
            <dc:creator>Guest Author</dc:creator>
        </item>
        <item>
            <title><![CDATA[Improving RubyDocs with Cloudflare Workers and Workers KV]]></title>
            <link>https://blog.cloudflare.com/improving-rubydocs-with-cloudflare-workers-and-workers-kv/</link>
            <pubDate>Wed, 31 Oct 2018 12:48:27 GMT</pubDate>
            <description><![CDATA[ RubyDocs is an open-source service that generates and hosts “fancy docs for any Ruby project”, most notably for the Ruby language itself and for Rails, the most popular Ruby framework.  ]]></description>
            <content:encoded><![CDATA[ <p><i>The following is a guest post from Manuel Meurer, Berlin-based web developer, entrepreneur, and Ruby on Rails enthusiast. In 2010, he founded </i><a href="https://www.krautcomputing.com/"><i>Kraut Computing</i></a><i> as a one-man web dev shop and launched </i><a href="https://uplink.tech/"><i>Uplink</i></a><i>, a network for IT experts in Germany, in 2015.</i></p><p><a href="https://rubydocs.org/">RubyDocs</a> is an <a href="https://github.com/krautcomputing/rubydocs">open-source</a> service that generates and hosts “fancy docs for any Ruby project”, most notably for the Ruby language itself and for Rails, the most popular Ruby framework. The nifty thing about it is that the docs can be generated for any version of a project — so let’s say you’re working on an old Rails app that still uses version 3.2.22 (<a href="https://rubygems.org/gems/rails/versions">released June 16, 2015</a>), then you can really benefit from having access to the docs of that specific version, since a lot of the methods, classes, and concepts of the current Rails version (5.2.1 at the time of writing) don’t exist in that old version.</p>
    <div>
      <h3>Scratching an itch</h3>
      <a href="#scratching-an-itch">
        
      </a>
    </div>
    <p>I built RubyDocs back in 2013 to scratch my own itch — a few similar services that I had used over the years had disappeared or hadn’t been regularly updated. After the initial work to get RubyDocs up and running, I continued improving a few small things over the years, such as updating dependencies and adding new projects that were submitted by users. But by and large, the site was (and is) running on autopilot, updating the list of versions for each project automatically from GitHub tags and generating new docs as users request them. One thing I had always wanted to do was to move the hosted docs from a subdomain (docs.rubydocs.org) to a subpath on the main domain (e.g., rubydocs.org/docs). I had put them on the subdomain to be able to use a CDN with long expiration times, since the docs are mostly static HTML and CSS with a bit of JavaScript sprinkled in. But for SEO reasons (AFAIK it’s still better to have everything on the main domain), and for a more coherent experience when using the site, I wanted everything on one domain. But I could never figure out how to run the RubyDocs app itself (built with Rails, of course) on rubydocs.org and still get all the advantages of a CDN for a subpath…</p>
    <div>
      <h3>Enter Cloudflare</h3>
      <a href="#enter-cloudflare">
        
      </a>
    </div>
    <p>Fast forward to September 2017 when I read about <a href="/introducing-cloudflare-workers/">Cloudflare Workers</a> for the first time. I was already a heavy user of Cloudflare for their DNS, CDN and DDoS mitigation and was always astonished by the amount of high-quality services they were offering for free. And now they basically added a <a href="https://martinfowler.com/articles/serverless.html">serverless</a> platform on top of that for $5 per month? You really have to admire their dedication to making their stuff available to as many people as possible for as low a price as possible.</p><p>For a few months, I kept thinking about what I could use the Workers for until it hit me — they could be the perfect tool to proxy requests from a subpath to a subdomain! I wouldn’t have to change the RubyDocs server/CDN setup at all, just add a Worker that does the proxying and a <a href="https://www.cloudflare.com/features-page-rules/">Page Rule</a> to redirect all traffic from the subdomain to the new subpath. I got in touch with Cloudflare support to confirm that this was indeed possible (and a proper use of their Workers) and since RubyDocs is open-source, they even offered to sponsor the workers!</p>
    <div>
      <h3>Let’s get to work!</h3>
      <a href="#lets-get-to-work">
        
      </a>
    </div>
    <p>While I was working on the Worker (no pun intended), an <a href="https://github.com/krautcomputing/rubydocs/issues/8">issue in the RubyDocs GitHub repo</a> popped up — it turned out I had inadvertently broken a few URLs with a faulty regex, which was quickly fixed (Worker scripts can be edited in the Cloudflare backend and when saved, the live site is updated within seconds). But the author of the issue also mentioned that someone had apparently created a <a href="https://duckduckgo.com/bang">DuckDuckGo bang</a> for RubyDocs. Sweet, I didn’t even know they existed!</p><p>For this bang to really be useful, it was necessary to have a URL that always points to the latest version of a project’s docs, i.e. something like <a href="https://rubydocs.org/d/ruby-latest/">rubydocs.org/d/ruby-latest/</a> (which now works), and update automatically when a new version is released. Well, I thought to myself, if that isn’t another perfect use case for a Worker! But wait, how does the Worker know which version is the latest? We could include the data in the Worker script and update it periodically, but as the number of projects on RubyDocs grows, the script would grow as well — probably not to an unmanageable size, but it still didn’t feel like a clean solution. The Worker could also make a quick subrequest to ask the main RubyDocs Rails app for the latest version when a request is processed, but that would mean setting up an API and monitoring the performance of the endpoint, and it would most likely severely slow down these ‘latest’ requests.</p>
    <div>
      <h3>Enter Cloudflare, again</h3>
      <a href="#enter-cloudflare-again">
        
      </a>
    </div>
    <p>And as if someone at Cloudflare had been waiting for me to ponder this problem, they <a href="/introducing-workers-kv/">launched Cloudflare Workers KV</a>, a key-value store that can be written to via the Cloudflare API and read from within a Worker. I was dumbfounded by the coincidence. It was very obviously the best way to solve my problem — store the latest version of each project from the RubyDocs Rails app every time a new version is detected, and read it from the Worker script when a ‘latest’ request comes in.</p><p>Long story short: here is the resulting Worker script (also on <a href="https://github.com/krautcomputing/rubydocs/blob/master/lib/cloudflare_worker.js">GitHub</a>) and after a bit of fiddling (mostly due to my inexperience with JavaScript), everything is working smoothly.</p>
            <pre><code>addEventListener('fetch', event =&gt; {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  const match = request.url.match(/\/d\/([^?]+)(\?.+)?/);
  let fetchable;

  if (match) {
    let doc   = match[1];
    let query = match[2] || '';

    // Redirect to latest if necessary.
    const latestMatch = doc.match(/^([^/]+)-latest/);
    if (latestMatch) {
      const latest = await LATEST.get(latestMatch[1]);
      let newUrl = request.url.replace(/[^/]+-latest/, latest);
      return Response.redirect(newUrl, 302);
    }

    // Redirect to URL with trailing slash if necessary.
    if (!doc.includes('/')) {
      let newUrl = request.url.replace(doc, doc + '/');
      return Response.redirect(newUrl, 301);
    }

    if (doc.endsWith('/'))
      doc += 'index.html';
    fetchable = `http://d3eo0xoa109f6x.cloudfront.net/${doc}${query}`;
  } else {
    fetchable = request;
  }

  const response = await fetch(fetchable);
  return response;
}</code></pre>
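            <p>The write side isn’t shown above: whenever the Rails app detects a new version of a project, it has to push that version into the KV namespace through the Workers KV REST API. A rough sketch of what that request looks like (the account ID, namespace ID and token are placeholders, and the helper name is illustrative):</p>

```javascript
// Build the Workers KV "write key-value pair" REST request.
// The account ID, namespace ID and token are placeholders you'd take
// from the Cloudflare dashboard; only the endpoint shape is from the KV API.
function kvWriteRequest(accountId, namespaceId, key, value) {
  return {
    url: 'https://api.cloudflare.com/client/v4/accounts/' + accountId +
         '/storage/kv/namespaces/' + namespaceId +
         '/values/' + encodeURIComponent(key),
    method: 'PUT',
    headers: {
      'Authorization': 'Bearer <API_TOKEN>', // token with Workers KV write access
      'Content-Type': 'text/plain'
    },
    body: value // the latest version string, e.g. "v5.2.1"
  };
}

// The Rails app would build and send this whenever a new release is detected:
const req = kvWriteRequest('<ACCOUNT_ID>', '<NAMESPACE_ID>', 'rails', 'v5.2.1');
// fetch(req.url, req) performs the actual write.
```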
            <p><i>NOTE: </i><code><i>LATEST</i></code><i> is the name of the author's KV namespace and is not a default for Workers KV</i></p><p>I have submitted a request to DuckDuckGo to use the new ‘latest’ URLs for the <a href="https://duckduckgo.com/bang?q=rubydocs">!rubydocs and !rb bangs</a>, but so far they still forward to an older version.</p><p>Many thanks to Cloudflare for supporting RubyDocs and, more importantly, <a href="https://www.cloudflare.com/betterinternet/">building a better Internet</a> for all of us!</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Cloudflare Workers KV]]></category>
            <category><![CDATA[Open Source]]></category>
            <category><![CDATA[Page Rules]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">QRnNYW6BALQohODEptPeV</guid>
            <dc:creator>Guest Author</dc:creator>
        </item>
        <item>
            <title><![CDATA[Mapping Factorio with Leaflet]]></title>
            <link>https://blog.cloudflare.com/mapping-factorio-with-leaflet/</link>
            <pubDate>Wed, 10 Oct 2018 14:09:31 GMT</pubDate>
            <description><![CDATA[ Factorio is a game about building and maintaining factories. Players mine resources, research new technology and automate production. Resources move along the production line through multiple means of transportation such as belts and trains.  ]]></description>
            <content:encoded><![CDATA[ <p><i>The following is a guest post by </i><a href="https://twitter.com/jachands"><i>Jacob Hands</i></a><i>, Creator of </i><a href="https://factoriomaps.com/browse.html"><i>FactorioMaps.com</i></a><i>. He is building a community site for the game </i><a href="https://www.factorio.com/"><i>Factorio</i></a><i> centered around sharing user creations.</i></p><p>Factorio is a game about building and maintaining factories. Players mine resources, research new technology and automate production. Resources move along the production line through multiple means of transportation such as belts and trains. Once production starts getting up to speed, alien bugs start to attack the factory requiring strong defenses.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1TNrprtIuvw1mO132IQAJO/acc873ddc0b2c4bc750c86f62e3cc82e/Screen-Shot-2018-10-03-at-5.25.28-PM.png" />
            
            </figure><p>A Factorio factory producing many different items.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5UhpJmn4V4ssl4Z0xYL9F4/69f69a51d8a38c21ec73a10167fd0bbf/Screen-Shot-2018-10-03-at-5.26.28-PM.png" />
            
            </figure><p>A Factorio military outpost fighting the alien bugs.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/31bhk36TBVSqbovlQfCOEt/85e40b1a94e09b8262874c2a74596b73/Screen-Shot-2018-10-03-at-5.27.24-PM.png" />
            
            </figure><p>A Factorio map view of a small factory, that’s still too big to easily share fully with screenshots.</p><p>At FactorioMaps.com, I am building a place for the community of Factorio players to share their factories as interactive Leaflet maps. Due to the size and detail of the game, it can be difficult to share an entire factory through a few screenshots. A Leaflet map provides a Google Maps-like experience allowing viewers to pan and zoom throughout the map almost as if they are playing the game.</p>
    <div>
      <h3>Hosting</h3>
      <a href="#hosting">
        
      </a>
    </div>
    <p>Leaflet maps contain thousands of small images for X/Y/Z coordinates. Amazon S3 and Google Cloud Storage are the obvious choices for <a href="https://www.cloudflare.com/learning/cloud/what-is-object-storage/">low-latency object storage</a>. However, after 3.5 months in operation, FactorioMaps.com contains 17 million map images (&gt;1TB). For this use case, $0.05 per 10,000 upload API calls and $0.08 to 0.12/GB for egress would add up quickly. Backblaze B2 is a better fit because upload API calls are free, <a href="/bandwidth-alliance/">egress bandwidth is $0.00/GB to Cloudflare</a>, and storage is 1/4th the price of the competition.</p><p>Backblaze B2 requires a prefix of /file/bucketName on all public files, which I don’t want. To remove it, I added a VPS proxy to rewrite paths and add a few 301 redirects. Unfortunately, the latency from the user -&gt; VPS -&gt; B2 was sub-par, averaging 800-1200ms in the US.</p>
    <div>
      <h3>A Closer Look At Leaflet</h3>
      <a href="#a-closer-look-at-leaflet">
        
      </a>
    </div>
    <p>Leaflet maps work by loading images at the user's X/Y/Z coordinates to render the current view. As a map is zoomed in, it requires 4x as many images to show the same area. That means 75% of a map's images are in the max rendered zoom level.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2I3CK2L5HsxRJlY41MAl8j/1089aad7070f1334f941e26a71fa74bb/Screen-Shot-2018-10-03-at-5.29.14-PM.png" />
            
            </figure><p>A diagram of how each zoom level is 4x larger than the previous</p>
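            <p>That 75% figure follows from the geometry: level z of the tile pyramid holds 4^z tiles, and summing the series shows the deepest level approaches three quarters of the total. A quick check:</p>

```javascript
// Level z of a Leaflet tile pyramid holds 4^z tiles, so each zoom-in
// quadruples the image count.
function tilesAtZoom(z) {
  return 4 ** z;
}

// Fraction of all of a map's tiles that live in the deepest zoom level.
function deepestLevelShare(maxZoom) {
  let total = 0;
  for (let z = 0; z <= maxZoom; z++) {
    total += tilesAtZoom(z);
  }
  return tilesAtZoom(maxZoom) / total;
}
```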
    <div>
      <h3>Reducing Latency</h3>
      <a href="#reducing-latency">
        
      </a>
    </div>
    <p>With hosting working, it's time to start making the site faster. The majority of image requests come from the first few zoom levels, representing less than 25% of a given map's images. Adding a local SSD cache on the VPS containing all except the last 1-3 zoom levels for each map reduces latency for 66% of requests. The problem with SSD storage is it's difficult to scale with ever-increasing data and is still limited to the network and CPU performance of the server it occupies.</p>
    <div>
      <h3>Going Serverless with Cloudflare Workers</h3>
      <a href="#going-serverless-with-cloudflare-workers">
        
      </a>
    </div>
    <p>Cloudflare Workers can run JavaScript using the Service Workers API, which means the path rewrites and redirects that the VPS was handling can run on Cloudflare's edge instead.</p><p>While Google Cloud Storage is more expensive than B2, it has much lower latency to the US and worldwide destinations because of Google's network and multi-regional object storage. However, it's not time to move the whole site over to GCS just yet; the upload API calls alone would cost $85 for 17 million files.</p>
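    <p>As a sketch of that rewrite (the bucket name here is illustrative, not necessarily the real one): the Worker re-adds the /file/bucketName prefix that B2 requires on the way to the origin, so the public URLs stay clean:</p>

```javascript
// B2 serves public files under /file/<bucket>, a prefix the site's URLs
// shouldn't expose. The Worker can re-add it on the way to the origin.
// 'factorio-maps' is an illustrative bucket name.
function toB2Path(pathname, bucket) {
  return '/file/' + bucket + pathname;
}

// Inside the Worker's fetch handler, the rewrite would look like:
//   const url = new URL(event.request.url);
//   url.hostname = 'f001.backblazeb2.com';
//   url.pathname = toB2Path(url.pathname, 'factorio-maps');
//   event.respondWith(fetch(url.toString(), event.request));
```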
    <div>
      <h3>Multi-Tier Object Storage</h3>
      <a href="#multi-tier-object-storage">
        
      </a>
    </div>
    <p>The first few zoom levels are stored in GCS, while the rest are in B2. Cloudflare Workers figure out where files are located by checking both sources simultaneously. By doing this, 66% of requested files come from GCS with a mean latency of &lt;350ms, while only storing 24% of files on GCS. Another benefit to using B2 as the primary storage is if GCS becomes too expensive in the future, I can move all requests to B2.</p>
            <pre><code>// Race GCS and B2
let gcsReq = new Request('https://storage.googleapis.com/bucketName' + url.pathname, event.request);
let b2Req = new Request(getB2Url(event.request) + '/bucketName' + url.pathname, event.request);

// Fetch from GCS and B2 with Cloudflare caching enabled
let gcsPromise = fetch(gcsReq, cfSettings);
let b2Promise = fetch(b2Req, cfSettings);

let response = await Promise.race([gcsPromise, b2Promise]);
if (response.ok) {
    return response;
}

// If the winner was bad, find the one that is good (if any)
response = await gcsPromise;
if (response.ok) {
    return response;
}

response = await b2Promise;
if (response.ok) {
    return response;
}

// The request failed/doesn't exist
return response;</code></pre>
            
    <div>
      <h3>Tracking Subrequests</h3>
      <a href="#tracking-subrequests">
        
      </a>
    </div>
    <p>The Cloudflare Workers dashboard contains a few analytics for subrequests, but there is no way to see what responses came from B2 vs. GCS. Fortunately, it’s easy to send request stats to a 3rd party service like StatHat with a few lines of JavaScript.</p>
            <pre><code> // Fetch from GCS and B2 with caching
let reqStartTime = Date.now();
let gcsPromise = fetch(gcsReq, cfSettings);
let b2Promise = fetch(b2Req, cfSettings);

let response = await Promise.race([gcsPromise, b2Promise]);
if (response.ok) {
    event.waitUntil(logResponse(event, response, (Date.now() - reqStartTime)));
    return response;
}</code></pre>
            <p>The resulting stats prove that GCS is serving the majority of requests, and Cloudflare caches over 50% of those requests. The code for the logResponse function can be found <a href="https://gist.github.com/jahands/636e89a125f0352a3d50fff155bfa50f#file-stat-logging-to-stathat-js"><b>here</b></a>.  </p>
    <div>
      <h3>Making B2 Faster with Argo</h3>
      <a href="#making-b2-faster-with-argo">
        
      </a>
    </div>
    <p>Tracking request time surfaced another issue. Requests to B2 from countries outside of North America are still quite slow. Cloudflare's Argo can reduce latency by over 50%, but is too expensive to enable for the whole site. Additionally, it would be redundant to smart-route content from GCS, since Google already does an excellent job of keeping latency down. Cloudflare request headers include the country of origin, making it trivial to route this subset of requests through an Argo-enabled domain.</p>
            <pre><code>// Use CF Argo for non-US/CA users
function getB2Url(request) {
    let b2BackendUrl = 'https://b2.my-argo-enabled-domain.com/file';
    let country = request.headers.get('CF-IPCountry')
    if (country === 'US' || country === 'CA') {
        b2BackendUrl = 'https://f001.backblazeb2.com/file';
    }
    return b2BackendUrl;
}</code></pre>
            
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>Cloudflare Workers are an excellent fit for my project; they enabled me to make a cost-effective solution to hosting Leaflet maps at scale. Check out <a href="https://factoriomaps.com">https://factoriomaps.com</a> for performant Leaflet maps, and if you play Factorio, <a href="https://factoriomaps.com/submit">submit</a> your <a href="https://www.factorio.com/">Factorio</a> world to share with others!</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">2879ChUuA8e7m19nzVicLX</guid>
            <dc:creator>Guest Author</dc:creator>
        </item>
        <item>
            <title><![CDATA[Custom Load Balancing With Cloudflare Workers]]></title>
            <link>https://blog.cloudflare.com/update-response-headers-on-cloudflare-workers/</link>
            <pubDate>Wed, 03 Oct 2018 07:59:00 GMT</pubDate>
            <description><![CDATA[ Getting lots of traffic? Planning to add more servers? You’ll need load balancing to maintain the reliability of your site. Cloudflare offers powerful Load Balancing, but there are situations where some options can’t satisfy your specific needs. For those situations, you can write your own Worker. ]]></description>
            <content:encoded><![CDATA[ <p><i>The following is a guest post by </i><a href="https://www.linkedin.com/in/jayaprabhakar/"><i>Jayaprabhakar Kadarkarai</i></a><i>, Developer of </i><a href="https://www.codiva.io/"><i>Codiva.io</i></a><i>, an Online IDE used by computer science students across the world. He works full stack to deliver low-latency, scalable web applications.</i></p><p>Have you launched your website? Getting a lot of traffic? And you are planning to add more servers? You’ll need load balancing to maintain the scalability and reliability of your website. Cloudflare offers powerful <a href="https://www.cloudflare.com/load-balancing/">Load Balancing</a>, but there are situations where off-the-shelf options can’t satisfy your specific needs. For those situations, you can write your own Cloudflare Worker.</p><p>In this post, we’ll learn about load balancers and how to set them up at a low cost with Cloudflare Workers.</p><p>This post assumes you have a basic understanding of JavaScript, as that’s the language used to write a Cloudflare Worker.</p>
    <div>
      <h3>The Basic Pattern</h3>
      <a href="#the-basic-pattern">
        
      </a>
    </div>
    <p>The basic pattern starts with adding a ‘fetch’ event listener to intercept the requests. You can configure which requests to intercept on the Cloudflare dashboard or using the <a href="https://developers.cloudflare.com/workers/api/#create-a-route">Cloudflare API</a>.</p><p>Then, modify the hostname of the URL and send the request to the new host.</p>
            <pre><code>addEventListener('fetch', event =&gt; {
  var url = new URL(event.request.url);

  // https://example.com/path/ to https://myorigin.example.com/path
  url.hostname = 'myorigin.' + url.hostname
  
  event.respondWith(fetch(url));
});</code></pre>
            <p>This doesn’t do anything useful yet, but this is the basic pattern that will be used in the rest of the examples.</p>
    <div>
      <h3>Load Balancer with Random Routing</h3>
      <a href="#load-balancer-with-random-routing">
        
      </a>
    </div>
    <p>When you have a list of origin servers, pick a random host to route to.</p><p>This is a very basic load balancing technique to evenly distribute the traffic across all origin servers.</p>
            <pre><code>var hostnames = [
  "0.example.com",
  "1.example.com",
  "2.example.com"
];

addEventListener('fetch', event =&gt; {
  var url = new URL(event.request.url);

  // Randomly pick the next host 
  url.hostname = hostnames[getRandomInt(hostnames.length)];
  
  event.respondWith(fetch(url));
});

function getRandomInt(max) {
  return Math.floor(Math.random() * max);
}</code></pre>
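            <p>A quick way to convince yourself this spreads traffic evenly is to check that the helper only ever produces valid indices into the hostname list, and that every host receives a share of the draws (getRandomInt is copied from the snippet above):</p>

```javascript
// getRandomInt is the helper from the snippet above.
function getRandomInt(max) {
  return Math.floor(Math.random() * max);
}

// Draw many samples: every result must be an integer in [0, 3).
const counts = [0, 0, 0];
for (let i = 0; i < 10000; i++) {
  counts[getRandomInt(3)]++; // an out-of-range index would corrupt the totals
}
// Over 10,000 draws, each of the three hosts gets roughly a third of the traffic.
```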
            
    <div>
      <h3>Load Balancer with Fallback</h3>
      <a href="#load-balancer-with-fallback">
        
      </a>
    </div>
    <p>What about when a host is down? A simple fallback strategy is to route the request to a different host. Use this only if you know the requests are idempotent. In general, this means GET requests are okay, but you might wish to handle POST requests another way.</p>
            <pre><code>addEventListener('fetch', event =&gt; {
  // respondWith() must be called synchronously, so hand the
  // primary/backup racing off to an async helper.
  event.respondWith(handleRequest(event.request));
});

async function handleRequest(request) {
  // Randomly pick the primary host
  var primary = getRandomInt(hostnames.length);

  var primaryUrl = new URL(request.url);
  primaryUrl.hostname = hostnames[primary];

  // If the primary host doesn't answer within the timeout,
  // fire the same request at a backup host.
  var timeoutId;
  var backupPromise = new Promise(function(resolve) {
    timeoutId = setTimeout(function() {
      var backup;
      do {
          // Naive solution to pick a backup host
          backup = getRandomInt(hostnames.length);
      } while(backup === primary);

      var backupUrl = new URL(request.url);
      backupUrl.hostname = hostnames[backup];

      resolve(fetch(backupUrl));
    }, 2000 /* 2 seconds */);
  });

  var primaryPromise = fetch(primaryUrl).then(function(response) {
    // The primary answered in time; cancel the backup.
    clearTimeout(timeoutId);
    return response;
  });

  // Whichever request resolves first wins.
  return Promise.race([primaryPromise, backupPromise]);
}</code></pre>
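            <p>One way to honor the idempotency caveat above is to gate the fallback on the request method, so only safe methods are ever replayed against a second host (the helper name is illustrative, not from the original code):</p>

```javascript
// RFC 7231 defines GET, HEAD and OPTIONS as safe; anything else (POST, PUT,
// PATCH, DELETE) may have side effects on the origin and shouldn't be
// replayed blindly against a backup host.
const RETRYABLE_METHODS = new Set(['GET', 'HEAD', 'OPTIONS']);

function canFallback(request) {
  return RETRYABLE_METHODS.has(request.method.toUpperCase());
}

// In the handler, only arm the backup timer when canFallback(event.request)
// is true; unsafe methods simply wait on the primary fetch.
```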
            
    <div>
      <h3>Geographic Routing</h3>
      <a href="#geographic-routing">
        
      </a>
    </div>
    <p>Cloudflare adds the <a href="https://support.cloudflare.com/hc/en-us/articles/200168236-What-does-Cloudflare-IP-Geolocation-do-">CF-IPCountry</a> header to all requests once Cloudflare IP Geolocation is enabled.</p><p>You can access it using:</p>
            <pre><code>var countryCode = event.request.headers.get('CF-IPCountry');</code></pre>
            <p>We can use the countryCode to route requests from different locations to different servers in different regions.</p><p>For example, 80% of the traffic to Codiva.io is from the US and India. So, I have servers in two different regions (Oregon, USA; and Mumbai, India). Requests from India and other countries near it are routed to servers in India. All other requests are routed to the US data center.</p>
            <pre><code>const US_HOST = "us.example.com"
const IN_HOST = "in.example.com"

var COUNTRIES_MAP = {
  IN: IN_HOST,
  PK: IN_HOST,
  BD: IN_HOST,
  LK: IN_HOST, // Sri Lanka
  NP: IN_HOST  // Nepal
}
addEventListener('fetch', event =&gt; {
  var url = new URL(event.request.url);

  var countryCode = event.request.headers.get('CF-IPCountry');
  if (COUNTRIES_MAP[countryCode]) {
    url.hostname = COUNTRIES_MAP[countryCode];
  } else {
    url.hostname = US_HOST;
  }
  
  event.respondWith(fetch(url));
});</code></pre>
            
    <div>
      <h3>Putting it all together</h3>
      <a href="#putting-it-all-together">
        
      </a>
    </div>
    <p>Now, let us combine the geographic routing, random load balancing and fallback into a single worker:</p>
            <pre><code>const US_HOSTS = [
  "0.us.example.com",
  "1.us.example.com",
  "2.us.example.com"
];

const IN_HOSTS = [
  "0.in.example.com",
  "1.in.example.com",
  "2.in.example.com"
];

var COUNTRIES_MAP = {
  IN: IN_HOSTS,
  PK: IN_HOSTS,
  BD: IN_HOSTS,
  LK: IN_HOSTS, // Sri Lanka
  NP: IN_HOSTS  // Nepal
}
addEventListener('fetch', event =&gt; {
  // respondWith() must be called synchronously, so hand the
  // routing and fallback logic off to an async helper.
  event.respondWith(handleRequest(event.request));
});

async function handleRequest(request) {
  var countryCode = request.headers.get('CF-IPCountry');
  var hostnames = US_HOSTS;
  if (COUNTRIES_MAP[countryCode]) {
    hostnames = COUNTRIES_MAP[countryCode];
  }

  // Randomly pick the primary host
  var primary = getRandomInt(hostnames.length);

  var primaryUrl = new URL(request.url);
  primaryUrl.hostname = hostnames[primary];

  // Fallback if there is no response within the timeout
  var timeoutId;
  var backupPromise = new Promise(function(resolve) {
    timeoutId = setTimeout(function() {
      var backup;
      do {
          // Naive solution to pick a backup host
          backup = getRandomInt(hostnames.length);
      } while(backup === primary);

      var backupUrl = new URL(request.url);
      backupUrl.hostname = hostnames[backup];

      resolve(fetch(backupUrl));
    }, 2000 /* 2 seconds */);
  });

  var primaryPromise = fetch(primaryUrl).then(function(response) {
    // The primary answered in time; cancel the backup.
    clearTimeout(timeoutId);
    return response;
  });

  return Promise.race([primaryPromise, backupPromise]);
}

function getRandomInt(max) {
  return Math.floor(Math.random() * max);
}</code></pre>
            
    <div>
      <h3>Recap</h3>
      <a href="#recap">
        
      </a>
    </div>
    <p>In this article, you saw the power of Cloudflare Workers and how simple they are to use. Before implementing a custom load balancer with Workers, take a look at <a href="https://www.cloudflare.com/load-balancing/">Cloudflare’s load balancer</a>.</p><p>For more examples, take a look at the <a href="https://developers.cloudflare.com/workers/recipes/">recipes</a> on the developer docs page.</p>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">3HShWRMhva3SLltL715amN</guid>
            <dc:creator>Guest Author</dc:creator>
        </item>
        <item>
            <title><![CDATA[How to save costs on your API Gateway solution using Cloudflare Workers]]></title>
            <link>https://blog.cloudflare.com/how-to-save-costs-on-your-api-gateway-solution-using-cloudflare-workers/</link>
            <pubDate>Sat, 29 Sep 2018 14:00:00 GMT</pubDate>
            <description><![CDATA[ Software companies contemplating offering a public API to 3rd party developers have many options to choose from for how to offer their API securely with high reliability and with fast performance. ]]></description>
            <content:encoded><![CDATA[ <p><i>The following is a guest post by</i> <a href="https://www.linkedin.com/in/jezowicz/"><i>Janusz Jezowicz</i></a><i>, CEO of</i> <a href="http://www.speedchecker.xyz"><i>Speedchecker</i></a><i>. The Speedchecker team runs a global distributed measurement network and offers speed test solutions using the Cloudflare platform.</i></p><p>Software companies contemplating offering a public API to 3rd party developers have many options for offering their API securely, reliably, and with fast performance. When it comes to cost, though, commercial solutions are expensive, and open-source solutions require a lot of time managing servers and the synchronization between them. This blog post describes how we successfully moved our API gateway to <a href="https://developers.cloudflare.com/workers/">Cloudflare Workers</a> and slashed our costs by a factor of 10.</p>
    <div>
      <h3>Our original solution based on the Kong open-source API gateway</h3>
      <a href="#our-original-solution-based-on-the-kong-open-source-api-gateway">
        
      </a>
    </div>
    <p>When we built our <a href="http://probeapi.speedchecker.xyz/">measurement network API</a>, for cost reasons we opted for the open-source solution <a href="https://konghq.com">Kong</a>. Kong is a great solution with a vibrant community of users and plug-in developers who extend and maintain the platform. It is a good alternative to commercial solutions from companies such as Apigee or Mulesoft, whose offerings really cater to larger businesses that can afford them. Kong is free and it works. On the other hand, if your business has complex needs for <a href="https://www.cloudflare.com/application-services/products/api-gateway/">API management</a> - e.g. powerful analytics, access control, or user-friendly administration - then you will need plug-ins. Kong plug-ins are often not free, and the costs can end up approaching those of commercial solutions.</p><p>At Speedchecker we already have a lot of metrics and access-control logic within the API itself, so the basic functionality of Kong suited us. Here you can see a simplified architecture diagram of our API gateway:</p><p>Two Kong instances are, of course, a must if we want to provide a reliable service to our customers. Kong offers flexibility in which database it uses for its API management engine. We experimented with MySQL and moved to PostgreSQL for its better support of replication using Bucardo.</p><p>We ran this solution for over a year, and in production we learned the following drawbacks of our architecture the hard way:</p><ul><li><p>While our Azure Cloud Service is scalable, the Kong instance is not. With increased loads we were worried that the instance might fail if we didn’t anticipate a traffic increase and scale the VM accordingly</p></li><li><p>The replication setup was quite complex, and we had incidents where it failed and we spent days trying to work out why and how to repair it. During those times we were exposed to one live instance and, if it went down, our API would not work</p></li><li><p>We had at least one incident where a rogue actor launched a DDoS attack on our customer-facing website (not the API). If the attacker had targeted our Kong endpoints, we would not have been able to protect our API</p></li><li><p>API management got more complex, not easier, which is not how it should be once you integrate an API gateway. Our API works using apikey authentication, and with one apikey a user can access all of our APIs. The quotas per user are not based on the number of API calls but on the number of measurement results that we execute on the user’s behalf. Each API call can produce a different number of measurement results, so the complex quota logic and billing calculations need to be done in the Azure API and not in Kong. All of this means that the central repository for user apikeys and their quotas lies in the Azure API, and we need to make sure synchronization happens between the Azure API and Kong. For those reasons, many of the plug-ins make less sense for us since we have done all that work on the Azure API side</p></li><li><p>While we saved money on a commercial API gateway licence, we were spending more hours on server administration and monitoring of our system</p></li></ul>
    <div>
      <h3>New API solution using Cloudflare Workers</h3>
      <a href="#new-api-solution-using-cloudflare-workers">
        
      </a>
    </div>
    <p>After Cloudflare announced the Workers feature we have been following it closely and started experimenting with its functionality.</p><p>A few of the things that originally stood out for us were:</p><ul><li><p>We have been using the Cloudflare platform for other parts of our infrastructure for years: we like their platform for its features, performance, cost-effectiveness and reliability. It's always better to use your existing vendors than start exploring new ones we have no experience with.</p></li><li><p>Attractive per request pricing. With our 30 million API requests per month, Workers would cost us $25. To compare Azure API management would cost us $300 for 2 Basic instances and Apigee would cost $2500 for the Business plan.</p></li><li><p>Powerful DDOS protection. Cloudflare has one of the best DDOS protections available for small-businesses included in the price.</p></li><li><p>No need for separate DNS failover and health monitoring</p></li><li><p>Extensible platform which we can leverage for any custom logic in the future should our requirements change</p></li></ul><p>On the downside we knew Cloudflare Workers were still in Beta and we would need to spend some time coding the logic instead of using an out of the box solution. After brainstorming with developers, we realized that, for our situation, Cloudflare workers are a good choice. Since most of our API management logic is already in Azure we really need a simple and cost-effective way to protect our origin API. Also, we need to make sure that the new solution is 100% compatible with our Kong solution. I believe this situation is common for all API providers when they are contemplating changing the API gateway infrastructure. 
You never want to get into a situation where, during or after migration, you realize that some API users cannot access the API and need to update their own code to work with your new API gateway. For that reason, it was important to us that no endpoint changes and no authentication changes were necessary, and that our new solution would work seamlessly with just a DNS change.</p><p>After one week of development we were ready with our first proof of concept to prepare for migration. The architecture of our new solution looks like the attached diagram.</p><p>A typical API call is handled in the following way:</p><ol><li><p>The user supplies an apikey in the HTTP headers or query string of their HTTP request (GET or POST) to query the endpoint hosted on Cloudflare Workers</p></li><li><p>The Worker examines the apikey and looks it up in the local cache. There are a few different mechanisms in Workers for storing data tables. For our purposes we picked <a href="https://developers.cloudflare.com/workers/writing-workers/storing-data/">cache in global memory</a>. Since the data table contains only a list of apikeys, it’s not very big and the restriction of 128MB does not cause issues. Also, each Cloudflare POP has a different cache, which can be problematic for some use cases. In our case it isn’t: if an apikey is not in the cache, it can be quickly retrieved from our Azure API.</p></li><li><p>If the apikey is not found in the local cache, the Worker makes an HTTP subrequest to the Azure API and retrieves information about whether the apikey is valid. The response is then stored locally in the global memory cache so that subsequent requests with that apikey save the round-trip to the Azure API and do not overload our Origin.</p></li><li><p>If the apikey is invalid, the Worker responds to the API user with a message about an invalid apikey and the Origin is not hit with a request.</p></li><li><p>If the apikey is valid, the Worker forwards the API user's request to our Origin and responds back to the API user when it gets a response. In this step we also include any custom redirection logic, as some API calls have different Origin endpoints. Using Workers we can easily specify which API calls use which Origin endpoints.</p></li></ol>
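<p>The apikey lookup and caching described in steps 1-4 can be sketched as follows. This is a minimal illustration, not our production code: the validation callback, the cache TTL, and the function names are all assumptions made for the example.</p>

```javascript
// Sketch of the apikey flow from steps 1-4 above. The validateUpstream
// callback, cache TTL, and names are illustrative assumptions only.
const APIKEY_CACHE = new Map(); // per-POP in-memory cache of known apikeys

// Step 1: the apikey arrives in a header or in the query string.
function extractApiKey(request) {
  const url = new URL(request.url);
  return request.headers.get('apikey') || url.searchParams.get('apikey');
}

// Steps 2-3: check the local cache first; on a miss, ask the origin API
// (abstracted here as validateUpstream) and cache the verdict briefly.
async function isValidApiKey(apikey, validateUpstream, ttlMs = 60000) {
  const cached = APIKEY_CACHE.get(apikey);
  if (cached !== undefined && cached.expires > Date.now()) {
    return cached.valid;
  }
  const valid = await validateUpstream(apikey); // subrequest to the Azure API
  APIKEY_CACHE.set(apikey, { valid, expires: Date.now() + ttlMs });
  return valid;
}
```

<p>In the actual Worker, <code>validateUpstream</code> would be a <code>fetch()</code> subrequest to the Azure API, and an invalid key would short-circuit with an error response before the Origin is touched (step 4).</p>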
    <div>
      <h3>Migration process</h3>
      <a href="#migration-process">
        
      </a>
    </div>
    <p>As described above, we wanted the change of architecture to happen seamlessly, without requiring users to update any of their API code. For that reason, we devised the following approach to migrate to Cloudflare Workers.</p><ol><li><p>Enable Cloudflare Workers on a staging domain. Perform all tests against the production API endpoint and the same tests against the staging domain with Workers enabled. The API endpoints should behave in the same way.</p></li><li><p>Enable Workers on the production domain with the Origin IP still pointing to the live Kong instance. Using the Workers Routes settings, we make sure the Workers code is not executed on any of the live API endpoints.</p></li><li><p>Using Worker Routes, we instantly bring the API calls onto Workers, method by method. In case of any problems we can quickly revert by modifying the Routes.</p></li><li><p>We monitor the Worker Analytics screens, the number of API calls and the status codes of subrequests to ensure no calls are failing.</p></li></ol><p>During our live migration everything went smoothly until we started seeing errors for some of our customers. We had not realized that the Cloudflare firewall has some rate limiting in place, which was blocking API users who made more than 2,000 requests per minute from the same IP. After raising a ticket with Cloudflare Support we got the limit lifted and the errors stopped.</p>
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>We believe Cloudflare Workers are a good alternative to existing API gateway solutions. For companies who already have an existing codebase for authentication and analytics, there is no compelling reason to use a commercial API gateway package when Cloudflare Workers can add a protection layer to your API at a fraction of the cost of the alternatives. While Workers are a relatively new product from Cloudflare, we already feel comfortable using them in production. We encourage you to explore Workers for your new projects. Also, if you want to save costs or to make your self-hosted solution more robust, Workers are a good alternative which can be deployed with no impact on your API users or business.</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">3ZBVJOOl3gRD7Fyfu9EvwI</guid>
            <dc:creator>Guest Author</dc:creator>
        </item>
        <item>
            <title><![CDATA[Using Edge-Side Includes with Workers for High Availability]]></title>
            <link>https://blog.cloudflare.com/using-edge-side-includes-with-workers-for-high-availability/</link>
            <pubDate>Tue, 28 Aug 2018 17:40:09 GMT</pubDate>
            <description><![CDATA[ In this guest post, you will learn about how the author's client, Titel Media, was able to use Cloudflare Workers to implement simple edge side includes. ]]></description>
            <content:encoded><![CDATA[ <p><i>Last week, we wrote about </i><a href="/edge-side-includes-with-cloudflare-workers/"><i>implementing ESI with Cloudflare Workers</i></a><i>. This is a guest post by </i><a href="https://twitter.com/Overbryd"><i>Lukas Reider</i></a><i> on how to use ESI not only for better performance, but to optimize availability while migrating backends.</i></p><p>In this post, you will learn how my client Titel Media was able to use Cloudflare Workers to implement simple edge side includes.</p><p>The idea is to partially replace parts of the online magazine <a href="https://www.highsnobiety.com/">highsnobiety.com</a> with a new and much more refined frontend implementation. In this article, you will get to know the use case and how I found a powerful application for Cloudflare Workers.</p>
    <div>
      <h3>Backstory</h3>
      <a href="#backstory">
        
      </a>
    </div>
    <p>My current project, <a href="https://www.highsnobiety.com/">highsnobiety.com</a>, is in the process of replacing WordPress with a dedicated content pipeline and a custom frontend. It is a huge magazine, with tons of content, hundreds of daily updates and an international team of more than 60 editors researching and writing exciting stories.</p><p>The company behind it, Titel Media GmbH, a publishing house with offices in Berlin and New York, has surely outgrown WordPress for <a href="https://www.cloudflare.com/developer-platform/solutions/hosting/">hosting</a> its content.</p>
    <div>
      <h3>The show must go on</h3>
      <a href="#the-show-must-go-on">
        
      </a>
    </div>
    <p>One does not simply rewrite a sophisticated web publishing pipeline like WordPress. Nor does one simply rewrite a complete frontend in any manageable timeframe and then deploy it safely without causing any interruptions.</p><p>There is an inherent risk in such “big rewrites”. They can fail in many spectacular ways. Not getting it done at all is one of them, and a popular one. Failing to live up to the high expectations (the ones that also caused the rewrite) is also well known to shatter the dreams of many a project manager. And how do you manage changing requirements over a transition period of a year or more?</p><p>Our working group, tasked with the transition, laid out a plan to sustainably grow the development team while paving a safe path for the future.</p><ul><li><p>We absolutely did not want to wait 1-2 years until everything had been rewritten.</p></li><li><p>We also did not want to continue working with WordPress for the next 5+ years.</p></li><li><p>And we did not want to interrupt the current publishing pipeline for our editors.</p></li></ul>
    <div>
      <h3>The idea: Partially rewriting the page</h3>
      <a href="#the-idea-partially-rewriting-the-page">
        
      </a>
    </div>
    <p>WordPress is, and was, running just fine. Years of dealing with the intricate details of such an installation have led to a pretty mature setup.</p><p>Fortunately, there is no pressure from the underlying technology to finish the transition in a hurry. Time, about a year or more, is actually on our side. The team is able to contribute changes step by step. This is where we incorporated one of the great ideas out there: Edge Side Includes.</p><p>I first heard about it in some office kitchen talk about how Amazon apparently never fails, because so many of its services are backed by fallbacks. For example, if some part of the page does not render in time, it can fall back to other fitting content gracefully.</p><p>I could never verify these claims, but the idea sure stuck with me. When you require high availability, this idea is very appealing. During the transition period, the plan is to rewrite parts of the website step by step, and steadily grow the new frontend while everything keeps running.</p><p>We need two particular features from the ESI toolbox:</p><ul><li><p>Includes: Our new frontend should be able to render components of the current page. We want to include them, and overwrite parts of the page with the new frontend.</p></li><li><p>Fallbacks: WordPress will remain running during the live transition period. Any fragment that fails can still be taken from WordPress.</p></li></ul>
    <div>
      <h3>Origin HTML document</h3>
      <a href="#origin-html-document">
        
      </a>
    </div>
    <p>Let's look at a simplified example. The origin responds with the following HTML document and the corresponding <code>X-Fragments</code> header:</p>
            <pre><code>&lt; HTTP/2 200
&lt; server: wordpress
&lt; x-fragments: title https://site.com/title.html, heading https://site.com/heading.html
&lt; ...


&lt;!DOCTYPE HTML&gt;
&lt;html&gt;
&lt;head&gt;
  &lt;title&gt;
    &lt;!-- fragment:title
    Fallback title
    --&gt;
  &lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;!-- fragment:heading --&gt;
&lt;p&gt;Some content&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;</code></pre>
            <ul><li><p>The <code>title.html</code> response is just one line: <code>Hello from fragment title</code></p></li><li><p><code>heading.html</code> contains some more HTML: <code>&lt;h1&gt;This renders a headline&lt;/h1&gt;</code></p></li></ul><p>The final response should have the fragments resolved and replaced with the content from the different prefetches.</p>
            <pre><code>&lt;!DOCTYPE HTML&gt;
&lt;html&gt;
&lt;head&gt;
  &lt;title&gt;
    Hello from fragment title
  &lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;h1&gt;This renders a headline&lt;/h1&gt;
&lt;p&gt;Some content&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;</code></pre>
            <p>In case one fragment does not respond in time, is down, or cannot be found, the fragment resolves to its fallback, which is just the content of the HTML comment.</p>
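<p>The resolution rule can be illustrated with a small, non-streaming sketch. The actual worker processes the response as a stream, line by line; <code>resolveFragments</code> and its regex are illustrative assumptions only, showing the replace-or-fallback semantics on a complete document.</p>

```javascript
// Illustrative, non-streaming sketch of the fragment resolution rule.
// The real worker streams the response; this regex-based helper only
// demonstrates the replace-or-fallback semantics described above.
function resolveFragments(html, fragments) {
  return html.replace(/<!--\s*fragment:(\S+)([\s\S]*?)-->/g, (match, key, fallback) => {
    const body = fragments[key];
    // a prefetched fragment wins; otherwise keep the comment's fallback content
    return body !== undefined ? body : fallback.trim();
  });
}
```

<p>Applied to the example above, <code>resolveFragments(page, { title: 'Hello from fragment title' })</code> yields the resolved document, while <code>resolveFragments(page, {})</code> leaves the fallback text in place.</p>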
    <div>
      <h3>Cloudflare Workers</h3>
      <a href="#cloudflare-workers">
        
      </a>
    </div>
    <p>This is the forefront (pun intended) of amazing cloud services. Their latest feature, Cloudflare Workers, really piqued my interest. We were in the process of examining the ESI space for potential solutions, and there are not many. So we were already planning to build our own caching layer that would be capable of handling includes and fallbacks. But now, with the power of running the Service Worker API on the edge, we might have just found the perfect solution for our limited ESI needs.</p>
    <div>
      <h4>Worker code</h4>
      <a href="#worker-code">
        
      </a>
    </div>
    <p>Here is what I wrote for Titel Media (<a href="https://gist.github.com/Overbryd/c070bb1fa769609d404f648cd506340f">available on GitHub</a>). Let me break it down for you here.</p><p>A client request comes in; the Cloudflare Worker picks it up and passes it to the origin.</p>
            <pre><code>addEventListener('fetch', event =&gt; {
  event.respondWith(main(event.request))
})

async function main(request) {
  // forward the request to the origin (Wordpress)
  const response = await fetch(request)
  // ...</code></pre>
            <p>We await the response, and can now check its headers:</p>
            <pre><code>// ...
  const fragments = prefetchFragments(response.headers)
  // ...</code></pre>
            <p>The origin response headers are examined for any values of <code>X-Fragments</code>:</p>
            <pre><code>function prefetchFragments(headers) {
  const header = headers.get('X-Fragments')
  if (header === null) return {}

  const fragments = {}
  const values = header.split(',')
  const safeTimeout = 10000 / values.length

  values.forEach((entry) =&gt; {
    const [key, url] = entry.trim().split(' ')
    const request = new Request(url)
    const timeout = new Promise((resolve, reject) =&gt; {
      // fail the race if this fragment does not answer within its share
      setTimeout(() =&gt; reject(new Error('fragment timed out')), safeTimeout)
    })

    fragments[key] = Promise.race([
      fetch(request),
      timeout
    ])
  })

  return fragments
}</code></pre>
            <ul><li><p>If there are fragments to prefetch, those requests are started and stored in a dictionary under their respective labels.</p></li><li><p>Each request shares a portion of the global timeout of 10 seconds. A request is later considered failed if it did not respond in time.</p></li></ul><p>After a few checks on content type and so on, this part is a crucial performance benefit: streaming the response.</p>
            <pre><code>// ...
    const { readable, writable } = new TransformStream()
    transformBody(response.body, writable, fragments)
    // ...</code></pre>
            <p><code>transformBody</code> reads the origin response line by line, and searches for fragments.</p>
            <pre><code>// ...
  // initialise the parser state
  let state = {writer: writer, fragments: fragments}
  let fun = parse
  let lastLine = ""

  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    const buffer = encoding.decode(value, {stream: !done})
    const lines = (lastLine + buffer).split("\n")</code></pre>
            <p>This loop is basically a parser keeping state between lines. Most important is to not consume the last line: the response chunks might be cut off right in the middle of a line, and thus not represent a full line that can be reasoned about. Therefore we hold the last line back and concatenate it with the next chunk.</p>
            <pre><code>    let i = 0;
    const length = lines.length - 1;
    for (; i &lt; length; i++) {
      const line = lines[i]
      const resp = await fun(state, line)
      let [nextFun, newState] = resp
      fun = nextFun
      state = newState
    }
    lastLine = lines[length] || ""
  }

  // ...</code></pre>
            <ul><li><p>If a fragment is found, the worker tries to replace it with the contents of the respective prefetched request.</p></li><li><p>If not, either the fragment's fallback content is returned, or it is simply removed from the output.</p></li></ul>
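<p>The <code>parse</code> state function itself is not shown in the excerpt above. Here is a minimal sketch of how such a line-oriented state machine might look, assuming the <code>[nextFunction, state]</code> return convention used by the loop; every name besides <code>parse</code> is a hypothetical stand-in, not the gist's actual code.</p>

```javascript
// Hedged sketch of the line-oriented parser driven by the loop above.
// Only the [nextFunction, state] return convention comes from the gist;
// every name except parse is a hypothetical stand-in.
async function parse(state, line) {
  const single = line.match(/<!--\s*fragment:(\S+)\s*-->/);
  if (single) {
    // single-line fragment marker with no fallback content
    await emit(state, await fragmentOrFallback(state, single[1], ''));
    return [parse, state];
  }
  const open = line.match(/<!--\s*fragment:(\S+)\s*$/);
  if (open) {
    // multi-line fragment: collect fallback lines until the closing -->
    return [insideFragment, { ...state, key: open[1], fallback: [] }];
  }
  await emit(state, line); // ordinary line, pass through unchanged
  return [parse, state];
}

async function insideFragment(state, line) {
  if (line.trim() === '-->') {
    await emit(state, await fragmentOrFallback(state, state.key, state.fallback.join('\n')));
    return [parse, state];
  }
  state.fallback.push(line);
  return [insideFragment, state];
}

async function fragmentOrFallback(state, key, fallback) {
  try {
    const resp = await state.fragments[key]; // Promise.race(fetch, timeout)
    if (resp && resp.ok) return await resp.text();
  } catch (err) {
    // timeout or network failure: fall through to the fallback content
  }
  return fallback;
}

async function emit(state, line) {
  // in the real worker this writes encoded bytes to the TransformStream writer
  state.writer.write(line + '\n');
}
```

<p>Each state function consumes one line, optionally writes output, and hands back the function to call for the next line, which is exactly what the <code>fun = nextFun</code> assignment in the loop expects.</p>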
    <div>
      <h3>Recap</h3>
      <a href="#recap">
        
      </a>
    </div>
    <p>The article shows the power of running code on the HTTP edge. With the power of V8 at your fingertips, you can really build great services right in front of your content delivery. Edge side includes, if narrowed down to a small feature set, are simple to implement and can even be safely controlled with timeouts.</p><p>My client, Titel Media, financed the work on this worker. Stop by at <a href="https://www.highsnobiety.com/">highsnobiety.com</a>. I also want to thank the folks from Cloudflare, Harris Hancock and Matthew Prince, for their outstanding support while developing this worker.</p><p>Always remember: “Web development is the art of finding the most complex way to concatenate strings.” Leave a message, or subscribe if you liked this post. I am curious what you think about this solution.</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">618GOEOFPIcSVceyM2kP4C</guid>
            <dc:creator>Guest Author</dc:creator>
        </item>
    </channel>
</rss>