
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Tue, 14 Apr 2026 23:05:18 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Redesigning Workers KV for increased availability and faster performance]]></title>
            <link>https://blog.cloudflare.com/rearchitecting-workers-kv-for-redundancy/</link>
            <pubDate>Fri, 08 Aug 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ Workers KV is Cloudflare's global key-value store. After the incident on June 12, we re-architected KV’s redundant storage backend, removed single points of failure, and made substantial performance improvements.
 ]]></description>
            <content:encoded><![CDATA[ <p>On June 12, 2025, Cloudflare suffered a significant service outage that affected a large set of our critical services. As explained in <a href="https://blog.cloudflare.com/cloudflare-service-outage-june-12-2025/"><u>our blog post about the incident</u></a>, the cause was a failure in the underlying storage infrastructure used by our Workers KV service. Workers KV is not only relied upon by many customers, but serves as critical infrastructure for many other Cloudflare products, handling configuration, authentication and asset delivery across the affected services. Part of this infrastructure was backed by a third-party cloud provider, which experienced an outage on June 12 and directly impacted availability of our KV service.</p><p>Today we're providing an update on the improvements that have been made to Workers KV to ensure that a similar outage cannot happen again. We are now storing all data on our own infrastructure. We are also serving all requests from our own infrastructure in addition to any third-party cloud providers used for redundancy, ensuring high availability and eliminating single points of failure. Finally, the work has meaningfully improved performance and set a clear path for the removal of any reliance on third-party providers as redundant back-ups.</p>
    <div>
      <h2>Background: The Original Architecture</h2>
      <a href="#background-the-original-architecture">
        
      </a>
    </div>
    <p>Workers KV is a global key-value store that supports high read volumes with low latency. Behind the scenes, the service stores data in regional storage and caches data across Cloudflare's network to deliver exceptional read performance, making it ideal for configuration data, static assets, and user preferences that need to be available instantly around the globe.</p><p>Workers KV was initially launched in September 2018, predating Cloudflare-native storage services like Durable Objects and R2. As a result, Workers KV's original design leveraged <a href="https://www.cloudflare.com/learning/cloud/what-is-object-storage/">object storage</a> offerings from multiple third-party cloud service providers, maximizing availability via provider redundancy. The system operated in an active-active configuration, successfully serving requests even when one of the providers was unavailable, experiencing errors, or performing slowly.</p><p>Requests to Workers KV were handled by Storage Gateway Worker (SGW), a service running on Cloudflare Workers. When it received a write request, SGW would simultaneously write the key-value pair to two different third-party object storage providers, ensuring that data was always available from multiple independent sources. Deletes were handled similarly, by writing a special tombstone value in place of the object to mark the key as deleted, with these tombstones garbage collected later.</p><p>Reads from Workers KV could usually be served from Cloudflare's cache, providing reliably low latency. For reads of data not in cache, the system would race requests against both providers and return whichever response arrived first, typically from the geographically closer provider. This racing approach optimized read latency by always taking the fastest response while providing resilience against provider issues.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/47G3VI7yhmPK7xMblLqYH9/1c69708e7e408d8f235964b26586249f/image3.png" />
          </figure><p>Given the inherent difficulty of keeping two independent storage providers synchronized, the architecture included sophisticated machinery to handle data consistency issues between backends. Despite this machinery, consistency edge cases occurred more often than consumers could tolerate, due to the inherently imperfect availability of upstream object storage systems and the challenges of maintaining perfect synchronization across independent providers.</p><p>Over the years, the system's implementation evolved significantly, including<a href="https://blog.cloudflare.com/faster-workers-kv/"> <u>a variety of performance improvements we discussed last year</u></a>, but the fundamental dual-provider architecture remained unchanged. This provided a reliable foundation for the massive growth in Workers KV usage while maintaining the performance characteristics that made it valuable for global applications.</p>
    <div>
      <h2>Scaling Challenges and Architectural Trade-offs</h2>
      <a href="#scaling-challenges-and-architectural-trade-offs">
        
      </a>
    </div>
    <p>As Workers KV usage scaled dramatically and access patterns became more diverse, the dual-provider architecture faced mounting operational challenges. The providers had fundamentally different limits, failure modes, APIs, and operational procedures that required constant adaptation. </p><p>The scaling issues extended beyond provider reliability. As KV traffic increased, the total number of IOPS exceeded what we could write to local cache infrastructure, forcing us to rely on traditional caching approaches when data was fetched from origin storage. This shift exposed additional consistency edge cases that hadn't been apparent at smaller scales, as the caching behavior became less predictable and more dependent on upstream provider performance characteristics.</p><p>Eventually, the combination of consistency issues, provider reliability disparities, and operational overhead led to a strategic decision to reduce complexity by moving to a single object storage provider earlier this year. This decision was made with awareness of the increased risk profile, but we believed the operational benefits outweighed the risks and viewed this as a temporary intermediate state while we developed our own storage infrastructure.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6ZNIELjE1IHxvygKmVnFEX/f942b19fc2eb20055c350f5a8d83ad60/image5.png" />
          </figure><p>Unfortunately, on June 12, 2025, that risk materialized when our remaining third-party cloud provider experienced a global outage, causing a high percentage of Workers KV requests to fail for a period that lasted over two hours. The cascading impact to customers and to other Cloudflare services was severe: Access failed all identity-based logins, Gateway proxy became unavailable, WARP clients couldn't connect, and dozens of other services experienced significant disruptions.</p>
    <div>
      <h2>Designing the Solution</h2>
      <a href="#designing-the-solution">
        
      </a>
    </div>
    <p>The immediate goal after the incident was clear: bring at least one other fully redundant provider online such that another single-provider outage would not bring KV down. The new provider needed to handle massive scale along several dimensions: hundreds of billions of key-value pairs, petabytes of data stored, millions of GET requests per second, tens of thousands of steady-state PUT/DELETE requests per second, and tens of gigabits per second of throughput—all with high availability and low single-digit millisecond internal latency.</p><p>One obvious option was to bring back the provider that we had disabled earlier in the year. However, we could not just flip the switch back. The infrastructure to run in the dual-backend configuration on the prior third-party storage provider was gone and the code had experienced some bit rot, making it infeasible to quickly revert to the previous dual-provider setup.</p><p>Additionally, the other provider had frequently been a source of its own operational problems, with relatively high error rates and concerningly low request throughput limits, which made us hesitant to rely on it again. Ultimately, we decided that our second provider should be entirely owned and operated by Cloudflare.</p><p>The next option was to build directly on top of <a href="https://www.cloudflare.com/developer-platform/products/r2/">Cloudflare R2</a>. We already had a private beta version of Workers KV running on R2, but this experience helped us better understand Workers KV's unique storage requirements. Workers KV's traffic patterns are characterized by hundreds of billions of small objects with a median size of just 288 bytes—very different from typical object storage workloads that assume larger file sizes.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7HwNtmJvOyyLDKPtMwaSxh/035283328f33e80d37832762b994a97b/3.png" />
          </figure><p>For workloads dominated by sub-1KB objects at this scale, database storage becomes significantly more efficient and cost-effective than traditional object storage. When you need to store billions of very small values with minimal per-value overhead, a database is a natural architectural fit. We're working on optimizations for R2 such as inlining small objects with metadata to eliminate additional retrieval hops that will improve performance for small objects, but for our immediate needs, a database-backed solution offered the most promising path forward.</p><p>After thorough evaluation of possible options, we decided to use a distributed database already in production at Cloudflare. This same database is used behind the scenes by both R2 and Durable Objects, giving us several key advantages: we have deep in-house expertise and existing automation for deployment and operations, and we knew we could depend on its reliability and performance characteristics at scale.</p><p>We sharded data across multiple database clusters, each with three-way replication for durability and availability. This approach allows us to scale capacity horizontally while maintaining strong consistency guarantees within each shard. We chose to run multiple clusters rather than one massive system to ensure a smaller blast radius if any cluster becomes unhealthy and to avoid pushing the practical limits of single-cluster scalability as Workers KV continues to grow.</p>
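<p>The sharding idea, spreading a namespace's keys across several clusters so that no single cluster takes a hot namespace's full load, can be sketched with a stable hash. (This is a simplified modular-hash stand-in for the real striping logic; the function names and the FNV-1a choice are ours, not Cloudflare's.)</p>

```typescript
// Hypothetical sketch of striping keys across database clusters. A stable
// hash of namespace + key picks the cluster, so a popular namespace's
// keys are spread over all clusters instead of concentrating on one.
// (FNV-1a is our illustrative choice, not necessarily what Cloudflare uses.)
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return hash;
}

function clusterFor(namespaceId: string, key: string, clusterCount: number): number {
  return fnv1a(`${namespaceId}/${key}`) % clusterCount;
}
```

<p>Because the mapping is deterministic, every point of presence routes a given key to the same cluster without any coordination.</p>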
    <div>
      <h2>Implementing the Solution</h2>
      <a href="#implementing-the-solution">
        
      </a>
    </div>
    <p>One immediate challenge that we ran into when implementing the system was connectivity. The SGW needed to communicate with database clusters running in our core datacenters, but databases typically use binary protocols over persistent TCP connections—not the HTTP-based communication patterns that work efficiently across our global network.</p><p>We built KV Storage Proxy (KVSP) to bridge this gap. KVSP is a service that provides an HTTP interface that our SGW can use while managing the complex database connectivity, authentication, and shard routing behind the scenes. KVSP stripes namespaces across multiple clusters using consistent hashing, preventing hotspotting where popular namespaces could overwhelm single clusters, eliminating noisy neighbor issues, and ensuring capacity limitations are distributed rather than concentrated.</p><p>The biggest downside of using a distributed database for Workers KV’s storage is that, while it excels at handling the small objects that dominate KV traffic, it is not optimal for the occasional large values of up to 25 MiB that some users store. Rather than compromise on either use case, we extended KVSP to automatically route larger objects to Cloudflare R2, creating a hybrid storage architecture that optimizes the backend choice based on object characteristics. From the perspective of SGW, this complexity is completely transparent—the same HTTP API works for all objects regardless of size.</p><p>We also restored the dual-provider capabilities from KV’s prior architecture and adapted them to work with the changes made to KV’s implementation since it moved to a single provider. The modified system now operates by racing writes to both backends simultaneously, but returns success to the client as soon as the first backend confirms the write.</p><p>This improvement minimizes latency while ensuring durability across both systems. 
When one backend succeeds but the other fails—due to temporary network issues, rate limiting, or service degradation—the failed write is queued for background reconciliation, which serves as part of our synchronization machinery that is described in more detail below.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2rUmyvzl6LsWdsjoD8H7vv/23d25498c1dc01cc3df84d92fa2c500e/image6.png" />
          </figure>
    <div>
      <h2>Deploying the Solution</h2>
      <a href="#deploying-the-solution">
        
      </a>
    </div>
    <p>With the hybrid architecture implemented, we began a careful rollout process designed to validate the new system while maintaining service availability.</p><p>The first step was introducing background writes from SGW to the new Cloudflare backend. This allowed us to validate write performance and error rates under real production load without affecting read traffic or user experience. It also was a necessary step in copying all data over to the new backend.</p><p>Next, we copied existing data from the third-party provider to our new backend running on Cloudflare infrastructure, routing the data through KVSP. This brought us to a critical milestone: we were now in a state where we could manually failover all operations to the new backend within minutes in the event of another provider outage. The single point of failure that caused the June incident had been eliminated.</p><p>With confidence in the failover capability, we began enabling our first namespaces in active-active mode, starting with internal Cloudflare services where we had sophisticated monitoring and deep understanding of the workloads. We dialed up traffic very slowly, carefully comparing results between backends. The fact that SGW could see responses from both backends asynchronously—after already returning a response to the user—allowed us to perform detailed comparisons and catch any discrepancies without impacting user-facing latency.</p><p>During testing, we discovered an important consistency regression compared to our single provider setup, which caused us to briefly roll back the change to put namespaces in active-active mode. 
While Workers KV<a href="https://developers.cloudflare.com/kv/concepts/how-kv-works/"> <u>is eventually consistent by design</u></a>, with changes taking up to 60 seconds to propagate globally as cached versions time out, we had inadvertently regressed read-your-own-write (RYOW) consistency for requests routed through the same Cloudflare point of presence.</p><p>In the previous dual provider active-active setup, RYOW was provided within each PoP because we wrote PUT operations directly to a local cache instead of relying on the traditional caching system in front of upstream storage. However, KV throughput had outscaled the number of IOPS that the caching infrastructure could support, so we could no longer rely on that approach. This wasn't a documented property of Workers KV, but it is behavior that some customers have come to rely on in their applications.</p><p>To understand the scope of this issue, we created an adversarial test framework designed to maximize the likelihood of hitting consistency edge cases by rapidly interspersing reads and writes to a small set of keys from a handful of locations around the world. This framework allowed us to measure the percentage of reads where we observed a violation of RYOW consistency—scenarios where a read immediately following a write from the same point of presence would return stale data instead of the value that was just written. This allowed us to design and verify a new approach to how KV populates and invalidates data in cache, which restored the RYOW behavior that customers expect while maintaining the performance characteristics that make Workers KV effective for high-read workloads.</p>
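<p>At its core, the adversarial framework boils down to a simple probe: hammer a small key set with interleaved writes and immediate reads, and count how often a read returns stale data. A minimal sketch (with hypothetical function shapes, not the actual test framework):</p>

```typescript
// Minimal sketch of a read-your-own-write (RYOW) probe: write a unique
// value, read it straight back through the same path, and report the
// fraction of reads that saw stale data. (Our shapes, not the real framework.)
type Store = {
  put(key: string, value: string): Promise<void>;
  get(key: string): Promise<string | undefined>;
};

async function ryowViolationRate(store: Store, iterations: number): Promise<number> {
  let stale = 0;
  for (let i = 0; i < iterations; i++) {
    const key = `probe-${i % 8}`; // a small key set maximizes contention
    const value = `v-${i}`;
    await store.put(key, value);
    if ((await store.get(key)) !== value) stale++; // RYOW violated
  }
  return stale / iterations;
}
```

<p>Running a probe like this from a handful of locations gives a direct, comparable number for how often each caching strategy violates RYOW.</p>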
    <div>
      <h2>How KV Maintains Consistency Across Multiple Backends</h2>
      <a href="#how-kv-maintains-consistency-across-multiple-backends">
        
      </a>
    </div>
    <p>With writes racing to both backends and reads potentially returning different results, maintaining data consistency across independent storage providers requires a sophisticated multi-layered approach. While the details have evolved over time, KV has always taken the same basic approach, consisting of three complementary mechanisms that work together to reduce the likelihood of inconsistencies and minimize the window for data divergence.</p><p>The first line of defense happens during write operations. When SGW sends writes to both backends simultaneously, we treat the write as successful as soon as either provider confirms persistence. However, if a write succeeds on one provider but fails on the other—due to network issues, rate limiting, or temporary service degradation—the failed write is captured and sent to a background reconciliation system. This system deduplicates failed keys and initiates a synchronization process to resolve the inconsistency.</p><p>The second mechanism activates during read operations. When SGW races reads against both providers and notices different results, it triggers the same background synchronization process. This helps ensure that keys that become inconsistent are brought back into alignment when first accessed rather than remaining divergent indefinitely.</p><p>The third layer consists of background crawlers that continuously scan data across both providers, identifying and fixing any inconsistencies missed by the previous mechanisms. These crawlers also provide valuable data on consistency drift rates, helping us understand how frequently keys slip through the reactive mechanisms and address any underlying issues.</p><p>The synchronization process itself relies on version metadata that we attach to every key-value pair. Each write automatically generates a new version consisting of a high-precision timestamp plus a random nonce, stored alongside the actual data. 
When comparing values between providers, we can determine which version is newer based on these timestamps. The newer value is then copied to the provider with the older version.</p><p>In rare cases where timestamps are within milliseconds of each other, clock skew could theoretically cause incorrect ordering, though given the tight bounds we maintain on our clocks through<a href="https://www.cloudflare.com/time/"> <u>Cloudflare Time Services</u></a> and typical write latencies, such conflicts would only occur with nearly simultaneous overlapping writes.</p><p>To prevent data loss during synchronization, we use conditional writes that verify that the last timestamp is older before writing instead of blindly overwriting values. This allows us to avoid introducing new inconsistency issues in cases where requests in close proximity succeed to different backends and the synchronization process copies older values over newer values.</p><p>Similarly, we can’t just delete data when the user requests it because if the delete only succeeded to one backend, the synchronization process would see this as missing data and copy it from the other backend. Instead, we overwrite the value with a tombstone that has a newer timestamp and no actual data. Only after both providers have the tombstone do we proceed with actually removing the keys from storage.</p><p>This layered consistency architecture doesn't guarantee strong consistency, but in practice it does eliminate most mismatches between backends while maintaining a performance profile that makes Workers KV attractive for latency-sensitive, high-read workloads while also providing high availability in the case of any backend errors. 
In distributed systems terms, KV chooses availability (AP) over consistency (CP) in the <a href="https://en.wikipedia.org/wiki/CAP_theorem"><u>CAP theorem</u></a>, and more interestingly also chooses latency over consistency in the absence of a partition, meaning it’s PA/EL under the <a href="https://en.wikipedia.org/wiki/PACELC_design_principle"><u>PACELC theorem</u></a>. Most inconsistencies are resolved within seconds through the reactive mechanisms, while the background crawlers ensure that even edge cases are typically corrected over time.</p><p>The above description applies to both our historical dual-provider setup and today's implementation, but two key improvements in the current architecture lead to significantly better consistency outcomes. First, KVSP maintains a much lower steady-state error rate compared to our previous third-party providers, reducing the frequency of write failures that create inconsistencies in the first place. Second, we now race all reads against both backends, whereas the previous system optimized for cost and latency by preferentially routing reads to a single provider after an initial learning period.</p><p>In the original dual-provider architecture, each SGW instance would initially race reads against both providers to establish baseline performance characteristics. Once an instance determined that one provider consistently outperformed the other for its geographic region, it would route subsequent reads exclusively to the faster provider, only falling back to the slower provider when the primary experienced failures or abnormal latency. While this approach effectively controlled third-party provider costs and optimized read performance, it created a significant blind spot in our consistency detection mechanisms—inconsistencies between providers could persist indefinitely if reads were consistently served from only one backend.</p>
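<p>The version scheme, conditional writes, and tombstones described above can be sketched like this (illustrative types; the real system stores versions alongside values in each backend):</p>

```typescript
// Illustrative last-write-wins reconciliation using the version scheme the
// post describes: a timestamp plus a random nonce attached to every value.
interface Versioned {
  value: string | null; // null marks a tombstone (deleted key)
  ts: number;           // high-precision write timestamp
  nonce: string;        // random tie-breaker for near-simultaneous writes
}

function newer(a: Versioned, b: Versioned): boolean {
  return a.ts !== b.ts ? a.ts > b.ts : a.nonce > b.nonce;
}

// Conditional write: only overwrite when the incoming version is newer,
// so reconciliation can never clobber a fresher value with a stale one.
function conditionalPut(store: Map<string, Versioned>, key: string, incoming: Versioned): boolean {
  const current = store.get(key);
  if (current && !newer(incoming, current)) return false;
  store.set(key, incoming);
  return true;
}

// Deletes write a tombstone with a fresh version rather than removing the
// key, so sync cannot mistake a delete for missing data and resurrect it.
function tombstone(ts: number, nonce: string): Versioned {
  return { value: null, ts, nonce };
}
```

<p>Only once both backends hold the tombstone is the key physically garbage-collected.</p>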
    <div>
      <h2>Results: Performance and Availability Gains</h2>
      <a href="#results-performance-and-availability-gains">
        
      </a>
    </div>
    <p>With these consistency mechanisms in place and our careful rollout strategy validated through internal services, we continued expanding active-active operation to additional namespaces across both internal and external workloads, and we were thrilled with what we saw. Not only did the new architecture provide the increased availability we needed for Workers KV, it also delivered significant performance improvements.</p><p>These performance gains were particularly pronounced in Europe, where our new storage backend is located, but the benefits extended far beyond what geographic locality alone could explain. The internal latency improvements compared to the third-party object store we were writing to in parallel were remarkable.</p><p>For example, p99 internal latency for reads to KVSP was below 5 milliseconds. For comparison, non-cached reads to the third-party object store from our closest location—after normalizing for transit time to create an apples-to-apples comparison—were typically around 80ms at p50 and 200ms at p99.</p><p>The graphs below show the closest thing that we can get to an apples-to-apples comparison: our observed internal latency for requests to KVSP compared with observed latency for requests that are cache misses and end up being forwarded to the external service provider from the closest point of presence, which includes an additional 5-10 milliseconds of request transit time.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6MtjLi4ajLQTTcqQ77Qgm0/10c9b30285d2725f6bf94a519c2d3a78/5.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3zOE8nlQm46xejHuC6ns7b/780bdba6913e45b4332841cfb287400c/6.png" />
          </figure><p>These performance improvements translated directly into faster response times for the many internal Cloudflare services that depend on Workers KV, creating cascading benefits across our platform. The database-optimized storage proved particularly effective for the small object access patterns that dominate Workers KV traffic.</p><p>After seeing these positive results, we continued expanding the rollout, copying data and enabling groups of namespaces for both internal and external customers. The combination of improved availability and better performance validated our architectural approach and demonstrated the value of building critical infrastructure on our own platform.</p>
    <div>
      <h2>What’s next?</h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>Our immediate plans focus on expanding this hybrid architecture to provide even greater resilience and performance for Workers KV. We're rolling out the KVSP solution to additional locations, creating a truly global distributed backend that can serve traffic entirely from our own infrastructure while also working to further improve how quickly we reach consistency between providers and in cache after writes.</p><p>Our ultimate goal is to eliminate our remaining third-party storage dependency entirely, achieving full infrastructure independence for Workers KV. This will remove the external single points of failure that led to the June incident while giving us complete control over the performance and reliability characteristics of our storage layer.</p><p>Beyond Workers KV, this project has demonstrated the power of hybrid architectures that combine the best aspects of different storage technologies. The patterns we've developed—using KVSP as a translation layer, automatically routing objects based on size characteristics, and leveraging our existing database expertise—can be leveraged by other services that need to balance global scale with strong consistency requirements. The journey from a single-provider setup to a resilient hybrid architecture running on Cloudflare infrastructure demonstrates how thoughtful engineering can turn operational challenges into competitive advantages. With dramatically improved performance and active-active redundancy, Workers KV is well positioned to serve as an even more reliable foundation for the growing set of customers that depend on it.</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers KV]]></category>
            <category><![CDATA[Post Mortem]]></category>
            <guid isPermaLink="false">6Cd705JORQK737rDeTEDX8</guid>
            <dc:creator>Alex Robinson</dc:creator>
            <dc:creator>Tyson Trautmann</dc:creator>
        </item>
        <item>
            <title><![CDATA[Hyperdrive: making databases feel like they’re global]]></title>
            <link>https://blog.cloudflare.com/hyperdrive-making-regional-databases-feel-distributed/</link>
            <pubDate>Thu, 28 Sep 2023 13:02:00 GMT</pubDate>
            <description><![CDATA[ Hyperdrive makes accessing your existing databases from Cloudflare Workers, wherever they are running, hyper fast ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/38AL8kYOfUuUOlGD2p2Wfw/596490c5c841fb416154cbce56cc830b/image1-33.png" />
            
            </figure><p>Hyperdrive makes accessing your existing databases from Cloudflare Workers, wherever they are running, hyper fast. You connect Hyperdrive to your database, change one line of code to connect through Hyperdrive, and voilà: connections and queries get faster (and spoiler: <a href="https://developers.cloudflare.com/hyperdrive/">you can use it today</a>).</p><p>In a nutshell, Hyperdrive uses our global network to speed up queries to your existing databases, whether they’re in a legacy cloud provider or with <a href="https://www.cloudflare.com/developer-platform/products/d1/">your favorite serverless database provider;</a> dramatically reduces the <a href="https://www.cloudflare.com/learning/performance/glossary/what-is-latency/">latency</a> incurred from repeatedly setting up new database connections; and caches the most popular read queries against your database, often avoiding the need to go back to your database at all.</p><p>Without Hyperdrive, that core database — the one with your user profiles, product inventory, or running your critical web app — sitting in the us-east1 region of a legacy cloud provider is going to be really slow to access for users in Paris, Singapore and Dubai and slower than it should be for users in Los Angeles or Vancouver. With each round trip taking up to 200ms, it’s easy to burn up to a second (or more!) on the multiple round-trips needed just to set up a connection, before you’ve even made the query for your data. Hyperdrive is designed to fix this.</p><p>To demonstrate Hyperdrive’s performance, we built a <a href="https://hyperdrive-demo.pages.dev/">demo application</a> that makes back-to-back queries against the same database: both with Hyperdrive and without Hyperdrive (directly). 
The app selects a database in a neighboring continent: if you’re in Europe, it selects a database in the US — an all-too-common experience for many European Internet users — and if you’re in Africa, it selects a database in Europe (and so on). It returns raw results from a straightforward <code>SELECT</code> query, with no carefully selected averages or cherry-picked metrics.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3VWco8QZERMlkgBpiilOyA/99245c5a6ce8a208e7fc793121aeaef3/image2-25.png" />
            
            </figure><p><i>We</i> <a href="https://hyperdrive-demo.pages.dev/"><i>built a demo app</i></a> <i>that makes real queries to a PostgreSQL database, with and without Hyperdrive</i></p><p>Throughout internal testing, initial user reports and the multiple runs in our benchmark, Hyperdrive delivers a 17 - 25x performance improvement vs. going direct to the database for cached queries, and a 6 - 8x improvement for uncached queries and writes. The cached latency might not surprise you, but we think that being 6 - 8x faster on uncached queries changes “I can’t query a centralized database from Cloudflare Workers” to “where has this been all my life?!”. We’re also continuing to work on performance improvements: we’ve already identified additional latency savings, and we’ll be pushing those out in the coming weeks.</p><p>The best part? Developers with a Workers paid plan can <a href="https://developers.cloudflare.com/hyperdrive/">start using the Hyperdrive open beta immediately</a>: there are no waiting lists or special sign-up forms to navigate.</p>
    <div>
      <h3>Hyperdrive? Never heard of it?</h3>
      <a href="#hyperdrive-never-heard-of-it">
        
      </a>
    </div>
    <p>We’ve been working on Hyperdrive in secret for a short while, but allowing developers to connect to databases they already have — with their existing data, queries and tooling — has been on our minds for quite some time.</p><p>In a modern distributed cloud environment like Workers, where compute is globally distributed (so it’s close to users) and functions are short-lived (so you’re billed no more than is needed), connecting to traditional databases has been both slow and unscalable. Slow because it takes upwards of seven round-trips (<a href="https://www.cloudflare.com/learning/ddos/glossary/tcp-ip/">TCP handshake</a>; <a href="https://www.cloudflare.com/learning/ssl/what-happens-in-a-tls-handshake/">TLS negotiation</a>; then auth) to establish the connection, and unscalable because databases like PostgreSQL have a <a href="https://www.postgresql.org/message-id/flat/31cc6df9-53fe-3cd9-af5b-ac0d801163f4%40iki.fi">high resource cost per connection</a>. Even just a couple of hundred connections to a database can consume non-negligible memory, separate from any memory needed for queries.</p><p>Our friends over at Neon (a popular serverless Postgres provider) wrote about this, and <a href="https://neon.tech/blog/serverless-driver-for-postgres">even released a WebSocket proxy and driver to reduce the</a> connection overhead, but are still fighting uphill in the snow: even with a custom driver, we’re down to four round-trips, each still potentially taking 50-200 milliseconds or more. When those connections are long-lived, that’s OK — it might happen once every few hours at best. But when they’re scoped to an individual function invocation, and are only useful for a few milliseconds to minutes at best, your code spends more time waiting on connections than doing useful work. 
It’s effectively another kind of cold start: having to initiate a fresh connection to your database before making a query means that using a traditional database in a distributed or serverless environment is (to put it lightly) <i>really slow</i>.</p><p>To combat this, Hyperdrive does two things.</p><p>First, it maintains a set of regional database connection pools across Cloudflare’s network, so a Cloudflare Worker avoids making a fresh connection to a database on every request. Instead, the Worker can establish a connection to Hyperdrive (fast!), with Hyperdrive maintaining a pool of ready-to-go connections back to the database. Since a database can be anywhere from 30ms to (often) 300ms away over a <i>single</i> round-trip (let alone the seven or more you need for a new connection), having a pool of available connections dramatically reduces the latency issue that short-lived connections would otherwise suffer.</p><p>Second, it understands the difference between read (non-mutating) and write (mutating) queries and transactions, and can automatically cache your most popular read queries, which represent over 80% of the queries made to databases in typical web applications. The product listing page that tens of thousands of users visit every hour; open jobs on a major careers site; even queries for config data that changes only occasionally: a tremendous amount of what is queried does not change often, and caching it closer to where the user is querying it from can dramatically speed up access to that data for the next ten thousand users. Write queries, which can’t be safely cached, still benefit from both Hyperdrive’s connection pooling <i>and</i> Cloudflare’s <a href="https://www.cloudflare.com/network/">global network</a>: taking the fastest routes across the Internet over our backbone cuts down latency there, too.</p>
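<p>As a sketch of the read/write split described above, here is a deliberately naive TypeScript version: classify a query as a cacheable read or a mutating write, and only cache the reads. This is not Hyperdrive's actual implementation (the real logic also handles invalidation, TTLs, and far more SQL than this regex does); the helper names here are hypothetical.</p>

```typescript
// Naive sketch of read-vs-write classification and read-query caching.
// Hyperdrive's real logic is far more complete; this only conveys the idea.
function isCacheableRead(sql: string): boolean {
  // Plain SELECTs are reads; SELECT ... FOR UPDATE/SHARE takes row locks,
  // so treat it as non-cacheable, along with INSERT/UPDATE/DELETE and DDL.
  return /^\s*select\b/i.test(sql) && !/\bfor\s+(update|share)\b/i.test(sql);
}

const queryCache = new Map<string, unknown>();

async function cachedQuery(
  sql: string,
  run: (sql: string) => Promise<unknown>
): Promise<unknown> {
  if (!isCacheableRead(sql)) {
    return run(sql); // writes always go through to the origin database
  }
  if (!queryCache.has(sql)) {
    queryCache.set(sql, await run(sql)); // first reader pays the full cost
  }
  return queryCache.get(sql); // subsequent readers are served from cache
}
```

<p>The first reader of a popular query still pays the full round trip; everyone after them is served without going back to the database, which is where the cached-query speedups come from.</p>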
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2ZPuh9S7KNGNfcIOW1FSqF/07cd95b1d45dd66d7b13fcab51cf4189/image4-16.png" />
            
            </figure><p><i>Even if your database is on the other side of the country, 70ms x 6 round-trips is a lot of time for a user to be waiting for a query response.</i></p><p>Hyperdrive works not only with PostgreSQL databases, including <a href="https://neon.tech/">Neon</a>, Google Cloud SQL, AWS RDS, and <a href="https://www.timescale.com/">Timescale</a>, but also with PostgreSQL-compatible databases like <a href="https://materialize.com/">Materialize</a> (a powerful stream-processing database), <a href="https://www.cockroachlabs.com/">CockroachDB</a> (a major distributed database), Google Cloud’s <a href="https://cloud.google.com/alloydb">AlloyDB</a>, and AWS Aurora Postgres.</p><p>We’re also working on bringing support for MySQL, including providers like PlanetScale, by the end of the year, with more database engines planned in the future.</p>
    <div>
      <h3>The magic connection string</h3>
      <a href="#the-magic-connection-string">
        
      </a>
    </div>
    <p>One of the major design goals for Hyperdrive was the need for developers to keep using their existing driver, query builder and ORM (Object-Relational Mapper) libraries. It wouldn’t have mattered how fast Hyperdrive was if we required you to migrate away from your favorite ORM and/or rewrite hundreds of lines (or more) of code &amp; tests to benefit from Hyperdrive’s performance.</p><p>To achieve this, we worked with the maintainers of popular open-source drivers — including <a href="https://node-postgres.com/">node-postgres</a> and <a href="https://github.com/porsager/postgres">Postgres.js</a> — to help their libraries support <a href="/workers-tcp-socket-api-connect-databases/">Worker’s new TCP socket API</a>, which is going through the <a href="https://github.com/wintercg/proposal-sockets-api">standardization process</a> and which we expect to land in Node.js, Deno and Bun as well.</p><p>The humble database connection string is the shared language of database drivers, and typically takes on this format:</p>
            <pre><code>postgres://user:password@some.database.host.example.com:5432/postgres</code></pre>
            <p>The magic behind Hyperdrive is that you can start using it in your existing Workers applications, with your existing queries, just by swapping out your connection string for the one Hyperdrive generates instead.</p>
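<p>For the curious: a connection string is just a URL with a <code>postgres://</code> scheme, so the standard WHATWG <code>URL</code> parser can pull the example above apart. Drivers do this parsing for you; this TypeScript snippet only illustrates what the string encodes, and why swapping it out is enough to reroute your driver through Hyperdrive.</p>

```typescript
// Illustrative: the pieces a driver extracts from a connection string.
const connStr =
  'postgres://user:password@some.database.host.example.com:5432/postgres';
const url = new URL(connStr);

const parts = {
  user: url.username,              // "user"
  host: url.hostname,              // "some.database.host.example.com"
  port: Number(url.port),          // 5432, Postgres' default port
  database: url.pathname.slice(1), // "postgres"
};
console.log(parts);
```

<p>Since the credentials, host, and port all live inside the string, pointing your driver at the string Hyperdrive generates redirects every connection through Hyperdrive without the driver knowing the difference.</p>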
    <div>
      <h3>Creating a Hyperdrive</h3>
      <a href="#creating-a-hyperdrive">
        
      </a>
    </div>
    <p>With an existing database ready to go — in this example, we’ll use a Postgres database from <a href="https://neon.tech/">Neon</a> — it takes less than a minute to get Hyperdrive running (yes, we timed it).</p><p>If you don’t have an existing Cloudflare Workers project, you can quickly create one:</p>
            <pre><code>$ npm create cloudflare@latest
# Call the application "hyperdrive-demo"
# Choose "Hello World Worker" as your template</code></pre>
            <p>From here, we just need the database connection string for our database and a quick <a href="https://developers.cloudflare.com/workers/wrangler/install-and-update/">wrangler command-line</a> invocation to have Hyperdrive connect to it.</p>
            <pre><code># Using wrangler v3.10.0 or above
wrangler hyperdrive create a-faster-database --connection-string="postgres://user:password@neon.tech:5432/neondb"

# This will return an ID: we'll use this in the next step</code></pre>
            <p>Add our Hyperdrive to the <a href="https://developers.cloudflare.com/workers/configuration/bindings/">wrangler.toml configuration</a> file for our Worker:</p>
            <pre><code>[[hyperdrive]]
binding = "HYPERDRIVE"
id = "cdb28782-0dfc-4aca-a445-a2c318fb26fd"</code></pre>
            <p>We can now write a <a href="https://developers.cloudflare.com/workers/">Worker</a> — or take an existing Worker script — and use Hyperdrive to speed up connections and queries to our existing database. We use <a href="https://node-postgres.com/">node-postgres</a> here, but we could just as easily use <a href="https://orm.drizzle.team/">Drizzle ORM</a>.</p>
            <pre><code>import { Client } from 'pg';

export interface Env {
	HYPERDRIVE: Hyperdrive;
}

export default {
	async fetch(request: Request, env: Env, ctx: ExecutionContext) {
		// Create a database client that connects to our database via Hyperdrive
		//
		// Hyperdrive generates a unique connection string you can pass to
		// supported drivers, including node-postgres, Postgres.js, and the many
		// ORMs and query builders that use these drivers.
		const client = new Client({ connectionString: env.HYPERDRIVE.connectionString });

		try {
			// Connect to our database
			await client.connect();

			// A very simple test query
			let result = await client.query({ text: 'SELECT * FROM pg_tables' });

			// Clean up the client once the response has been sent
			ctx.waitUntil(client.end());

			// Return our result rows as JSON
			return Response.json({ result: result });
		} catch (e) {
			console.log(e);
			return Response.json({ error: JSON.stringify(e) }, { status: 500 });
		}
	},
};</code></pre>
            <p>The code above is intentionally simple, but hopefully you can see the magic: our database driver gets a connection string from Hyperdrive, and is none the wiser. It doesn’t need to know anything about Hyperdrive; we don’t have to toss out our favorite query builder library; and we can immediately realize the speed benefits when making queries.</p><p>Connections are automatically pooled and kept warm, our most popular queries are cached, and our entire application gets faster.</p><p>We’ve also built out <a href="https://developers.cloudflare.com/hyperdrive/examples/">guides for every major database provider</a> to make it easy to get what you need from them (a connection string) into Hyperdrive.</p>
    <div>
      <h3>Going fast can’t be cheap, right?</h3>
      <a href="#going-fast-cant-be-cheap-right">
        
      </a>
    </div>
    <p>We think Hyperdrive is critical to accessing your existing databases when building on Cloudflare Workers: traditional databases were just never designed for a world where clients are globally distributed.</p><p><b>Hyperdrive’s connection pooling will always be free</b>, for both database protocols we support today and new database protocols we add in the future. Just like <a href="https://www.cloudflare.com/ddos/">DDoS protection</a> and our global <a href="https://www.cloudflare.com/application-services/products/cdn/">CDN</a>, we think access to Hyperdrive’s core feature is too useful to hold back.</p><p>During the open beta, Hyperdrive itself will not incur any charges for usage, regardless of how you use it. We’ll be announcing more details on how Hyperdrive will be priced closer to GA (early in 2024), with plenty of notice.</p>
    <div>
      <h3>Time to query</h3>
      <a href="#time-to-query">
        
      </a>
    </div>
    <p>So where to from here for Hyperdrive?</p><p>We’re planning on bringing Hyperdrive to GA in early 2024 — and we’re focused on landing more controls over how we cache &amp; automatically invalidate based on writes, detailed query and performance analytics (soon!), support for more database engines (including MySQL) as well as continuing to work on making it even faster.</p><p>We’re also working to enable private network connectivity via <a href="https://developers.cloudflare.com/magic-wan/">Magic WAN</a> and Cloudflare Tunnel, so that you can connect to databases that aren’t (or can’t be) exposed to the public Internet.</p><p>To connect Hyperdrive to your existing database, visit our <a href="https://developers.cloudflare.com/hyperdrive/">developer docs</a> — it takes less than a minute to create a Hyperdrive and update existing code to use it. Join the <i>#hyperdrive-beta</i> channel in our <a href="https://discord.cloudflare.com/">Developer Discord</a> to ask questions, surface bugs, and talk to our Product &amp; Engineering teams directly.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1BbZolVBokU7h0EedVawor/e98def78df39541f03a595ef2e083387/image3-31.png" />
            
            </figure><p></p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Database]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">3buVHDU7WwOkwln1S36n02</guid>
            <dc:creator>Matt Silverlock</dc:creator>
            <dc:creator>Alex Robinson</dc:creator>
        </item>
    </channel>
</rss>