Subscribe to receive notifications of new posts:

Why Cloudflare Chose AMD EPYC for Gen X Servers

2020-03-01

2 min read

From the very beginning Cloudflare used Intel CPU-based servers (and, also, Intel components for things like NICs and SSDs). But we're always interested in optimizing the cost of running our service so that we can provide products at a low cost and high gross margin.

We're also mindful of events like the Spectre and Meltdown vulnerabilities and have been working with outside parties on research into mitigation and exploitation which we hope to publish later this year.

We looked very seriously at ARM-based CPUs and continue to keep our software up to date for the ARM architecture so that we can use ARM-based CPUs when the requests per watt is interesting to us.

In the meantime, we've deployed AMD's EPYC processors as part of Gen X server platform and for the first time are not using any Intel components at all. This week, we announced details of this tenth generation of servers. Below is a recap of why we're excited about the design, specifications, and performance of our newest hardware.

Every server can run every service. This architectural decision has helped us achieve higher efficiency across the Cloudflare network. It has also given us more flexibility to build new software or adopt the newest available hardware.

Notably, Intel is not inside. We are not using their hardware for any major server components such as the CPU, board, memory, storage, network interface card (or any type of accelerator).

This time, AMD is inside.

Compared with our prior server (Gen 9), our Gen X server processes as much as 36% more requests while costing substantially less. Additionally, it enables a ~50% decrease in L3 cache miss rate and up to 50% decrease in NGINX p99 latency, powered by a CPU rated at 25% lower TDP (thermal design power) per core.

To identify the most efficient CPU for our software stack, we ran several benchmarks for key workloads such as cryptography, compression, regular expressions, and LuaJIT. Then, we simulated typical requests we see, before testing servers in live production to measure requests per watt.    

Based on our findings, we selected the single socket 48-core AMD 2nd Gen EPYC 7642.

The single AMD EPYC 7642 performed very well during our lab testing, beating our Gen 9 server with dual Intel Xeon Platinum 6162 with the same total number of cores. Key factors we noticed were its large L3 cache, which led to a low L3 cache miss rate, as well as a higher sustained operating frequency.

Partnering with AMD, we tuned the 2nd Gen EPYC 7642 processor to achieve additional 6% performance. We achieved this by using power determinism and configuring the CPU's Thermal Design Power (TDP).

Finally, we described how we use Secure Memory Encryption (SME), an interesting security feature within the System on a Chip architecture of the AMD EPYC line. We were impressed by how we could achieve RAM encryption without significant decrease in performance. This reduces the worry that any data could be exfiltrated from a stolen server.

We enjoy designing hardware that improves the security, performance and reliability of our global network, trusted by over 26 million Internet properties.

Want to help us evaluate new hardware? Join us!

Cloudflare's connectivity cloud protects entire corporate networks, helps customers build Internet-scale applications efficiently, accelerates any website or Internet application, wards off DDoS attacks, keeps hackers at bay, and can help you on your journey to Zero Trust.

Visit 1.1.1.1 from any device to get started with our free app that makes your Internet faster and safer.

To learn more about our mission to help build a better Internet, start here. If you're looking for a new career direction, check out our open positions.
EPYCAMDGen XCacheHardwareSecurity

Follow on X

Nitin Rao|@NitinBRao
Cloudflare|@cloudflare

Related posts

October 08, 2024 1:00 PM

Cloudflare acquires Kivera to add simple, preventive cloud security to Cloudflare One

The acquisition and integration of Kivera broadens the scope of Cloudflare’s SASE platform beyond just apps, incorporating increased cloud security through proactive configuration management of cloud services. ...

October 07, 2024 1:00 PM

Thermal design supporting Gen 12 hardware: cool, efficient and reliable

Great thermal solutions play a crucial role in hardware reliability and performance. Gen 12 servers have implemented an exhaustive thermal analysis to ensure optimal operations within a wide variety of temperature conditions and use cases. By implementing new design and control features for improved power efficiency on the compute nodes we also enabled the support of powerful accelerators to serve our customers....